Sequentially-Allocated Merge-Split Sampler for Conjugate and Nonconjugate Dirichlet Process Mixture Models

David B. Dahl*
Texas A&M University

November 18, 2005

Abstract

This paper proposes a new efficient merge-split sampler for both conjugate and nonconjugate Dirichlet process mixture (DPM) models. These Bayesian nonparametric models are usually fit using Markov chain Monte Carlo (MCMC) or sequential importance sampling (SIS). The latest generation of Gibbs and Gibbs-like samplers for both conjugate and nonconjugate DPM models effectively update the model parameters, but can have difficulty in updating the clustering of the data. To overcome this deficiency, merge-split samplers have been developed, but until now these have been limited to conjugate or conditionally-conjugate DPM models. This paper proposes a new MCMC sampler, called the sequentially-allocated merge-split (SAMS) sampler. The sampler borrows ideas from sequential importance sampling. Splits are proposed by sequentially allocating observations to one of two split components using allocation probabilities that condition on previously allocated data. The SAMS sampler is applicable to general nonconjugate DPM models as well as conjugate models. Further, the proposed sampler is substantially more efficient than existing conjugate and nonconjugate samplers.

KEYWORDS: Sequential importance sampling; Partial conditioning; Markov chain Monte Carlo; Metropolis-Hastings algorithm; Bayesian nonparametrics.

*David B. Dahl is Assistant Professor, Department of Statistics, Texas A&M University, College Station, TX (dahl@stat.tamu.edu). The author thanks Michael Newton for many helpful discussions and valuable suggestions. Sonia Jain graciously provided the simulated data used in Jain and Neal (2004). Finally, Marina Vannucci and Gordon Dahl provided helpful advice in the preparation of the manuscript. The Monte Carlo study was performed using free, open source software written by the author and available at dahl/sams.

1 Introduction

The Dirichlet process mixture (DPM) model is arguably the best studied class of Bayesian nonparametric models. Theoretical groundwork for the model was laid in the pioneering work of Ferguson (1973), Blackwell and MacQueen (1973), Antoniak (1974), Ferguson (1974), Korwar and Hollander (1973), and others. With the advent of modern computational methods in the 1990s, much research has been devoted to techniques for fitting DPM models. All methods for fitting DPM models must accommodate the following two features: (1) The allocation ("clustering") of the data into components ("clusters") and (2) The model parameters of each component ("cluster locations").

Escobar (1994) was the first to provide a Gibbs sampler for fitting DPM models. The sampler removes a data point from its component and reallocates it to one of the existing components or to a new component by itself. One Gibbs scan performs this procedure for every data point. A major deficiency of the original algorithm is that model parameters only change by completely emptying a component and/or creating a new component. Due to the sticky model parameters inherent in this approach, this original algorithm has great difficulty in exploring the posterior distribution.

Computational strategies have been proposed to address the sticky model parameters. One strategy is to integrate away the model parameters, as suggested by Liu (1994), MacEachern (1994), and Neal (1992). Since the model parameters can easily be sampled given the clustering, this strategy of integrating away the model parameters does not preclude inference on them. Unfortunately, this integration is typically only feasible in conjugate DPM models. In the case of nonconjugate DPM models, sticky model parameters can be addressed by adding an additional element to the sampling algorithm specifically designed to move the model parameters, as suggested by Bush and MacEachern (1996), Damien et al.
(1999), MacEachern and Müller (2000), and Neal (2000), among others. These two strategies for sticky model parameters have been used in the context of Markov chain Monte Carlo (MCMC) and sequential importance sampling (Liu 1996). For reviews and comparisons of various computational methods for fitting DPM models, see MacEachern et al. (1999), MacEachern and Müller (2000), Neal (2000), and Quintana and Newton (2000).

These two strategies for sticky model parameters do not directly address the allocation of the data into components. With the exception of Jain and Neal (2004), all the MCMC samplers in the DPM model literature repeatedly reallocate one data point at a time. Researchers have noticed, however, that these Gibbs and Gibbs-like samplers can get stuck in particular clustering configurations due to the one-at-a-time nature of their updates. Celeux et al. (2000) state that the main defect of the Gibbs sampler from our perspective is the ultimate attraction of the local modes; that is, the almost impossible simultaneous reallocation of a group of observations to a different component. Even when these samplers are able to move between modes, they may do so rather slowly and thus require a major expenditure of computational resources. Thus, practitioners may find they are unable to use DPM models in moderately large or complex datasets. Efficient algorithms for DPM models are essential for the practical application of DPM models.

Recent work has sought to address the deficiencies of the conjugate Gibbs sampler for DPM models. For conjugate DPM models, Jain and Neal (2004) propose a multivariate update mechanism (instead of a mechanism providing one-at-a-time updates) based on merging and splitting existing clusters. Their sampler can provide a significant reduction in the computational effort required to fit conjugate DPM models. Nevertheless, the computational burden is still formidable. New samplers which further improve computational efficiency would be able to fit existing models more readily and would facilitate the use of DPM models that are currently impractical.
Jain and Neal have extended their method to conditionally-conjugate DPM models in a recent technical report (Jain and Neal 2005). A conditionally-conjugate DPM model is one in which the centering distribution of the Dirichlet process prior is conjugate to the observational distribution when conditioning on all the other parameters in the model. A conditionally-conjugate setup is appropriate in many situations, but a large class of DPM models are not conditionally-conjugate. For example, DPM models in a generalized linear model framework will usually not be conditionally-conjugate. It is important to have efficient samplers which are applicable to general nonconjugate DPM models, as well as conjugate and conditionally-conjugate models.

This paper proposes a new merge-split sampler for DPM models which has two advantages over existing methods: (1) It is applicable to general nonconjugate DPM models as well as conjugate DPM models, and (2) It is substantially more efficient computationally than existing samplers. The new sampler, called the sequentially-allocated merge-split (SAMS) sampler, borrows ideas from sequential importance sampling (Liu 1996; MacEachern et al. 1999). Splits are proposed by sequentially allocating observations to one of two split components using allocation probabilities that are conditional on previously allocated data. The SAMS sampler is computationally efficient since the proposed splits are likely to be supported by the model and can quickly be generated. While the conditional allocation of observations is similar to that of sequential importance sampling, the output from the SAMS sampler has the correct stationary distribution due to the use of the Metropolis-Hastings ratio.

To assess the computational efficiency of the SAMS sampler and existing samplers for DPM models, a simulation study is presented. Three scenarios are considered, each having a different model and dataset. It is shown that, for a fixed amount of computing resources, the SAMS sampler (in both its conjugate and nonconjugate forms) yields an effective sample size that is substantially better than that of existing methods. The implication is that the SAMS sampler allows its users to fit DPM models in a much shorter amount of time and to consider DPM models that were previously impractical.

The remainder of the paper is organized as follows: Section 2 reviews DPM models and introduces the notation used to describe the sampling algorithms.
In Section 3, the leading existing samplers for DPM models are described. Section 4 presents the conjugate and nonconjugate versions of the proposed SAMS sampler. The simulation study comparing the various samplers is described in Section 5. Finally, concluding remarks are found in Section 6.

2 The Dirichlet Process Mixture Model

The DPM model assumes that observed data y = (y_1, ..., y_n) is generated from the following hierarchical model: For i = 1, ..., n,

    y_i | θ_i ~ p(y_i | θ_i)
    θ_i | F ~ F(θ_i)                                                  (1)
    F(θ) ~ DP(η_0 F_0(θ)),

where p(y | θ) is a known parametric family of distributions (for the random variable y) indexed by θ and DP(η_0 F_0(θ)) is the Dirichlet process (Ferguson 1973) centered about the distribution F_0(θ) (for the random variable θ) and having mass parameter η_0 > 0. The notation follows Neal (2000) and is meant to imply the obvious independence relationships (e.g., y_1 given θ_1 is independent of the other y's, the other θ's, and F(θ)).

Methods for fitting DPM models rely on the dimension reduction proposed by Blackwell and MacQueen (1973). Upon integrating with respect to the Dirichlet process, θ = (θ_1, ..., θ_n) follows a general Polya urn scheme:

    θ_1 ~ F_0(θ_1)
    θ_i | θ_1, ..., θ_{i-1} ~ [ η_0 F_0(θ_i) + Σ_{j=1}^{i-1} ψ_{θ_j}(θ_i) ] / (η_0 + i - 1),  for i = 2, ..., n,    (2)

where ψ_µ(θ) is the point-mass distribution (for the random variable θ) at µ. Notice that (2) implies that θ_1, ..., θ_n may share values in common, a fact that is used below in an alternative parameterization of θ. The model is simplified by integrating out the random mixing distribution F over its prior distribution in (2). Thus, the model in (1) becomes:

    y_i | θ_i ~ p(y_i | θ_i),  for i = 1, ..., n,
    θ ~ p(θ) given in (2).                                            (3)

An alternative parameterization of θ is given in terms of a set partition π = {S_1, ..., S_q} of S_0 = {1, ..., n} and a vector of model parameters φ = {φ_{S_1}, ..., φ_{S_q}}, where φ_S is associated with component S. A set partition π of S_0 is a set of components (i.e., subsets) S_1, ..., S_q such that: (1) ∪_{S∈π} S = S_0, (2) S ∩ S′ = ∅ for all S ≠ S′, and (3) S ≠ ∅ for all S ∈ π. The prior in (2) can be decomposed into two parts: (1) A prior on π, and (2) A conditional prior on φ given π. Equation (2) implies that the partition prior p(π) can be written as:

    p(π) = b ∏_{S∈π} η_0 Γ(|S|),                                      (4)

where |S| is the number of elements of the component S, Γ(x) is the gamma function evaluated at x, and b^{-1} is a constant equal to ∏_{i=1}^{n} (η_0 + i - 1). The specification of the prior under this alternative parameterization is completed by noting that Korwar and Hollander (1973) showed that φ_{S_1}, ..., φ_{S_q} are independently drawn from F_0(φ). Thus, θ is equivalent to (π, φ) and the model in (1) and (3) may be expressed as:

    y | π, φ ~ ∏_{i=1}^{n} p(y_i | φ_{S^i})
    φ | π ~ ∏_{S∈π} F_0(φ_S)                                          (5)
    π ~ p(π) given in (4),

where S^i is the component of π containing index i.

Bayesian inference regarding the model parameters φ and set partition π is made via the posterior distribution p(π, φ | y). Markov chain Monte Carlo (MCMC) techniques can be used to sample from the posterior distribution and sequential importance sampling (SIS) can be used to numerically integrate the posterior distribution. Fitting DPM models is usually computationally demanding, in part due to sticky model parameters φ, but mostly because inference on the set partition π requires sampling from a very large discrete state space.

2.1 Conjugacy

If F_0(φ) is conjugate to p(y | φ) in φ, a DPM model is said to be conjugate; otherwise, the model is nonconjugate. For conjugate DPM models, the challenge of sticky model parameters can be

eliminated by integrating away the model parameters φ, as proposed by MacEachern (1994), Liu (1994), and Neal (1992). The task then reduces to fitting the posterior distribution of the partition π, denoted p(π | y). This integration technique is merely a computational device used for model fitting, and the full posterior distribution p(π, φ | y) can easily be obtained since:

    p(π, φ | y) = p(φ | π, y) p(π | y)                                (6)

and p(φ | π, y) is readily available in conjugate models. Thus, the typical approach utilizing this integration strategy draws B set partitions π_1, ..., π_B from p(π | y) through an MCMC sampler and then samples the vectors of model parameters φ_1, ..., φ_B from p(φ | π, y) using straight Monte Carlo.

To implement the integration strategy, the partition likelihood p(y | π) must be obtained. It is given as a product over components in π = {S_1, ..., S_q}:

    p(y | π) = ∏_{S∈π} p(y_S),                                        (7)

where:

    p(y_S) = ∫ ∏_{k∈S} p(y_k | φ) dF_0(φ)                             (8)
           = ∏_{k∈S} ∫ p(y_k | φ) p(φ | y_1, ..., y_{k-1}) dφ,        (9)

where p(φ | y_1, ..., y_{k-1}) is the density of the posterior distribution of a model parameter φ based on the prior F_0(φ) and the data preceding the index k in the product above. If F_0(φ) is conjugate to p(y | φ) in φ, the integral in (9) may be evaluated analytically. Combining the partition likelihood and the partition prior, Bayes' rule gives the partition posterior as:

    p(π | y) ∝ p(y | π) p(π),                                         (10)

where p(y | π) and p(π) are given in (7) and (4), respectively.
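To make the conjugate strategy concrete, the following sketch evaluates log p(π | y) up to the constant b for a hypothetical Beta-Binomial DPM model (Binomial observations with a Beta centering distribution, as in the tack example of Section 5); the function names and default hyperparameters are illustrative, not from the author's software.

```python
# Sketch: log p(pi | y) up to the normalizing constant b, combining the
# partition prior (4) with the partition likelihood (7)-(9) for a
# hypothetical Beta-Binomial DPM component model.
from math import lgamma, log, comb

def log_beta_fn(a, b):
    # log of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(y_S, trials=9, a=1.0, b=1.0):
    # log p(y_S): the integral in (8) evaluated analytically for
    # Binomial(trials, theta) data and a Beta(a, b) prior on theta
    successes = sum(y_S)
    failures = trials * len(y_S) - successes
    binom_terms = sum(log(comb(trials, y)) for y in y_S)
    return (binom_terms + log_beta_fn(a + successes, b + failures)
            - log_beta_fn(a, b))

def log_partition_posterior(partition, y, eta0=1.0, trials=9):
    # sum over components S of log(eta0) + log Gamma(|S|) + log p(y_S)
    total = 0.0
    for S in partition:
        total += log(eta0) + lgamma(len(S))
        total += log_marginal([y[k] for k in S], trials)
    return total
```

Comparing this quantity across two partitions gives the posterior ratio needed in a Metropolis-Hastings step, since the constant b cancels.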

3 Existing Samplers

Using the notation from the previous section, this section describes three popular samplers for DPM models. Two of the three samplers are conjugate samplers; that is, they require quantities that are only available in conjugate models. Nonconjugate samplers do not make use of quantities specific to conjugate models. Although they are likely not as efficient as conjugate samplers, nonconjugate samplers can be applied to conjugate models.

3.1 Conjugate Gibbs Sampler for Conjugate DPM Models

The conjugate Gibbs sampler for conjugate DPM models was introduced by MacEachern (1994) for Gaussian models and Neal (1992) for models of categorical data. This is the benchmark conjugate sampler to which other conjugate samplers are compared. One scan of the Gibbs sampler successively reallocates all indices, one at a time, according to the full conditional distribution as described below:

Algorithm for Gibbs Sampler: For i in {1, 2, ..., n}:

- Remove index i from its component S^i to obtain π^{-i}, a set partition of {1, 2, ..., i-1, i+1, ..., n} containing the (potentially empty) component S^i. If S^i is empty, let S* = S^i. Otherwise, add a new empty component S* to π^{-i}.
- Form π from π^{-i} by allocating index i to one of the components in π^{-i} with probability given by the full conditional distribution:

      Pr(i ∈ S | π^{-i}, y) ∝ w_1(S) ∫ p(y_i | φ) p(φ | y_S) dφ   for each S ∈ π^{-i},

  where:

      w_m(S) = |S| if |S| > 0, and η_0/m otherwise,                   (11)

  and p(φ | y_S) is the density of the posterior distribution of a component location φ based on the centering distribution F_0(φ) and the data corresponding to the indices in S. If S is empty, p(φ | y_S) reduces to the density of F_0(φ).
- If S* is empty, remove it from π.

3.2 Auxiliary Gibbs Sampler for Nonconjugate DPM Models

Several nonconjugate samplers exist and are reviewed by Neal (2000). As with the Gibbs sampler, all of these samplers update the clustering of each index one at a time, conditional on the allocation of the other indices. Being nonconjugate, however, these samplers must deal with model parameters in addition to the clustering of the observations.

The Auxiliary Gibbs sampler of Neal (2000) is a nonconjugate analog to the Gibbs sampler for conjugate DPM models. Neal's simulation study shows that the Auxiliary Gibbs sampler (with a properly chosen tuning parameter) has the best computational efficiency of the one-at-a-time nonconjugate samplers for DPM models. The tuning parameter is the number of auxiliary parameters in each MCMC update, and Auxiliary Gibbs (m) denotes the Auxiliary Gibbs sampler with m auxiliary parameters. One scan of the Auxiliary Gibbs (m) sampler is described below:

Algorithm for Auxiliary Gibbs Sampler: For i in {1, 2, ..., n}:

- Remove index i from its component S^i to obtain π^{-i}. If S^i is empty, let S*_1 = S^i. Otherwise, add a new empty component S*_1 to π^{-i} with component location φ_{S*_1} sampled from F_0(φ).

- If m > 1, add new empty components S*_2, ..., S*_m to π^{-i} with component locations φ_{S*_2}, ..., φ_{S*_m} sampled independently from F_0(φ).
- Allocate index i to one of the components in π^{-i} with probability given by the full conditional distribution:

      Pr(i ∈ S | φ, π^{-i}, y) ∝ w_m(S) p(y_i | φ_S)   for each S ∈ π^{-i},

  where w_m(S) is defined in (11).
- Remove from π all of S*_1, ..., S*_m that are empty.
- For each S in π: Perform an MCMC update for the component location φ_S given the data y_S in component S. For example, one could sample from p(φ_S | y_S). This quantity, however, will likely not be available for nonconjugate models, in which case another MCMC update which leaves its distribution invariant should be used.

3.3 Restricted Gibbs Split-Merge Sampler for Conjugate DPM Models

The conjugate Gibbs and Auxiliary Gibbs samplers effectively deal with sticky model parameters, but can get stuck in particular clustering configurations due to the one-at-a-time nature of their updates. Jain and Neal (2004) propose a merge-split sampler for conjugate DPM models. This is the state-of-the-art sampler for conjugate DPM models, which they show can greatly improve the rate of convergence to the posterior distribution. Being a merge-split algorithm, their method updates groups of indices simultaneously and thus is able to move between high-probability modes by jumping over depressions of low probability.

Jain and Neal (2004) refer to their method as the restricted Gibbs sampling split-merge procedure, which is abbreviated RGSM in this paper. As the name implies, the RGSM sampler involves a modified Gibbs sampler. Instead of proposing naive random splits of components which are unlikely to be supported by the model, the RGSM proposes splits that are more probable by

improving upon a random split through a series of t restricted Gibbs scans. The scans are restricted in the sense that an index involved in a split can only move between the two split components. The number of scans t to perform is a tuning parameter; thus RGSM is in fact a class of samplers, with RGSM(t) being a particular case. The algorithm for one iteration of the RGSM(t) sampler is described below:

Algorithm for RGSM(t) Sampler:

- Uniformly select a pair of distinct indices i and j.
- If i and j belong to the same component in π, say S, propose π* by splitting S:
  - Remove indices i and j from S and form singleton sets S_i = {i} and S_j = {j}.
  - Make a naive split by allocating each remaining index in S to either S_i or S_j with equal probability.
  - Perform t restricted Gibbs scans of the indices in S. A restricted Gibbs scan differs from a regular conjugate Gibbs scan in that: Instead of considering all the indices {1, 2, ..., n}, only the indices in S (the original companions of i and j, but not including i and j themselves) are considered, and, when reallocating an index, it can only be placed in one of two components: S_i and S_j. An index cannot be placed in any other component and a singleton component cannot be formed.
  - Perform one final restricted Gibbs scan, this time keeping track of the Gibbs sampling transition probabilities for use in computing the Metropolis-Hastings ratio. Let π* be the set partition after the t + 1 restricted Gibbs scans.
  - Compute the Metropolis-Hastings ratio and accept π* as the new state of π with probability given by this ratio.
- Otherwise, i and j belong to different components in π, say S_i and S_j. Propose π* by merging S_i and S_j:

  - Form a merged component S = S_i ∪ S_j.
  - Propose the following merged set partition: π* = π ∪ {S} \ {S_i, S_j}. In words, π* differs from π in that the two components containing indices i and j are merged into one component.
  - Compute the Metropolis-Hastings ratio and accept π* as the new state of π with probability given by this ratio.

Notice that the RGSM(t) sampler requires the specification of t, the number of intermediate restricted Gibbs scans to perform before the final restricted Gibbs scan. This tuning parameter can be beneficial in that it allows flexibility in tailoring the algorithm to a particular situation. On the other hand, a practitioner may have little insight as to what a good value might be for the tuning parameter. In this case, experimentation is called for. A large number of restricted scans makes the split proposal more reasonable and therefore more likely to be accepted. The marginal benefit of another restricted scan decreases, however, and each additional scan takes time that might be better spent on another update. Based on their example, Jain and Neal (2004) recommend four to six restricted Gibbs scans.

While the RGSM(t) sampling algorithm can theoretically stand alone in sampling from the posterior distribution, Jain and Neal (2004) show that convergence is greatly improved by combining the RGSM(t) sampler with the conjugate Gibbs sampler. They propose cycling between the two samplers by attempting x RGSM(t) updates and performing y Gibbs scans, where x and y are integers chosen by the practitioner. As a result, the conjugate Gibbs sampler fine-tunes the current state while RGSM(t) potentially makes very large updates.
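A single restricted Gibbs scan from the split proposal above can be sketched as follows; the Beta-Binomial predictive density is a hypothetical stand-in for the conjugate integral in the full conditional (the binomial coefficient cancels in the ratio), and all names are illustrative.

```python
# Sketch: one restricted Gibbs scan over the original companions of i and j.
# Each movable index is reallocated between the two split components only,
# with probability proportional to |S| times the conjugate predictive
# density (Beta-Binomial here as a hypothetical component model).
import random
from math import lgamma, log, exp

def log_beta_fn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_predictive(y_k, y_S, trials=9, a=1.0, b=1.0):
    # log of the integral of p(y_k | theta) p(theta | y_S) dtheta,
    # up to the binomial coefficient (which cancels between components)
    s = sum(y_S)
    f = trials * len(y_S) - s
    return (log_beta_fn(a + s + y_k, b + f + trials - y_k)
            - log_beta_fn(a + s, b + f))

def restricted_gibbs_scan(S_i, S_j, movable, y, rng=random):
    # Reallocates each movable index in place between the sets S_i and S_j;
    # returns the log probability of the transitions actually chosen
    # (the quantity tracked during the final scan).
    log_q = 0.0
    for k in movable:
        (S_i if k in S_i else S_j).discard(k)
        li = log(len(S_i)) + log_predictive(y[k], [y[m] for m in S_i])
        lj = log(len(S_j)) + log_predictive(y[k], [y[m] for m in S_j])
        m = max(li, lj)
        p_i = exp(li - m) / (exp(li - m) + exp(lj - m))
        if rng.random() < p_i:
            S_i.add(k)
            log_q += log(p_i)
        else:
            S_j.add(k)
            log_q += log(1.0 - p_i)
    return log_q
```

The anchors i and j themselves are never in `movable`, so neither component can become empty during a scan.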

4 Sequentially-Allocated Merge-Split Sampler

This section introduces the novel sequentially-allocated merge-split (SAMS) sampler proposed in this paper. The SAMS sampler is applicable to general nonconjugate DPM models, as well as conjugate DPM models. Further, the SAMS sampler is substantially more efficient than the existing leading samplers. By using the SAMS sampler, researchers can fit existing models faster and are able to consider rich models that would otherwise be impractical to fit.

In the SAMS sampler, splits are proposed by sequentially allocating observations to one of two split components using allocation probabilities that condition on previously allocated data. The SAMS sampler is computationally efficient because the proposed splits are generated with relatively few computations, yet the proposed splits are likely to be supported by the model. The method is reminiscent of sequential importance sampling (Liu 1996; MacEachern et al. 1999), but the draws from the SAMS sampler have the correct stationary distribution (whereas draws from sequential importance sampling are weighted for Monte Carlo integration). In contrast to the RGSM(t) sampler, the SAMS sampler does not have a tuning parameter that must be specified by the practitioner, making its application more automatic. As with the RGSM(t) sampler, cycling between the SAMS sampler and a Gibbs sampler is recommended.

4.1 Conjugate Version of the Algorithm

The SAMS sampler has both conjugate and nonconjugate versions. The conjugate sampler is able to utilize the dimension reduction technique of Section 2.1 to eliminate the model parameters and leave only the set partition. The conjugate version of the SAMS sampler, denoted SAMS(conjugate), is described below:

Algorithm for SAMS(conjugate) Sampler:

- Uniformly select a pair of distinct indices i and j.
- If i and j belong to the same component in π, say S, propose π* by splitting S:
  - Remove indices i and j from S and form singleton sets S_i = {i} and S_j = {j}.
  - Letting k be successive values in a random permutation of the indices in S, add k to S_i with probability:

        Pr(k ∈ S_i | S_i, S_j, y) = |S_i| ∫ p(y_k | φ) p(φ | y_{S_i}) dφ / [ |S_i| ∫ p(y_k | φ) p(φ | y_{S_i}) dφ + |S_j| ∫ p(y_k | φ) p(φ | y_{S_j}) dφ ],    (12)

    where p(φ | y_S) is the posterior distribution of a component location φ based on the prior F_0(φ) and the data corresponding to the indices in S. Otherwise, add k to S_j. Note that, at each allocation above, either S_i or S_j gains an index. As a result, |S_i| and |S_j| grow as new indices are allocated. Further, in this conjugate version of the sampler, p(φ | y_{S_i}) and p(φ | y_{S_j}) evolve to account for each additional index.
  - Compute the Metropolis-Hastings ratio and accept π* as the new state of π with probability given by this ratio. See Section 4.3 for details.
- Otherwise, i and j belong to different components in π, say S_i and S_j. Propose π* by merging S_i and S_j:
  - Form a merged component S = S_i ∪ S_j.
  - Propose the following set partition: π* = π ∪ {S} \ {S_i, S_j}. In words, π* differs from π in that the two components containing indices i and j are merged into one component.
  - Compute the Metropolis-Hastings ratio and accept π* as the new state of π with probability given by this ratio. See Section 4.3 for details.
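The sequential allocation in (12) can be sketched as below, again using a hypothetical Beta-Binomial component model for the conjugate integral; note how the component sizes and posterior predictives update after every allocation, which is what distinguishes this proposal from a naive random split. Names are illustrative.

```python
# Sketch: SAMS split proposal via sequential allocation (equation (12)),
# with a Beta-Binomial predictive standing in for the conjugate integral.
# Returns the proposed components and the log proposal probability.
import random
from math import lgamma, log, exp

def log_beta_fn(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_predictive(y_k, y_S, trials=9, a=1.0, b=1.0):
    s = sum(y_S)
    f = trials * len(y_S) - s
    return (log_beta_fn(a + s + y_k, b + f + trials - y_k)
            - log_beta_fn(a + s, b + f))

def sams_split(S, i, j, y, rng=random):
    S_i, S_j = {i}, {j}
    rest = [k for k in S if k != i and k != j]
    rng.shuffle(rest)  # a random permutation of the remaining indices
    log_q = 0.0
    for k in rest:
        # weights |S| times the predictive, conditioning on the
        # previously allocated data (equation (12))
        li = log(len(S_i)) + log_predictive(y[k], [y[m] for m in S_i])
        lj = log(len(S_j)) + log_predictive(y[k], [y[m] for m in S_j])
        m = max(li, lj)
        p_i = exp(li - m) / (exp(li - m) + exp(lj - m))
        if rng.random() < p_i:
            S_i.add(k)
            log_q += log(p_i)
        else:
            S_j.add(k)
            log_q += log(1.0 - p_i)
    return S_i, S_j, log_q
```

Because the two resulting components can be merged back in only one way, `log_q` is exactly the proposal probability entering the Metropolis-Hastings ratio of Section 4.3.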

4.2 Nonconjugate Version of the Algorithm

The nonconjugate version of the SAMS sampler differs from the conjugate version in three ways. Firstly, the probability in (12) is replaced by:

    Pr(k ∈ S_i | S_i, S_j, φ, y) = |S_i| p(y_k | φ_{S_i}) / [ |S_i| p(y_k | φ_{S_i}) + |S_j| p(y_k | φ_{S_j}) ].    (13)

Secondly, when i and j belong to the same component, the model parameter φ_{S_i} of component S_i should be updated. Many methods can be used to propose new values for the model parameter φ_{S_i}. Here, two methods are considered: (1) Propose new values for the model parameter φ_{S_i} by sampling from the centering distribution F_0(φ), and (2) Propose new values for the model parameter φ_{S_i} using some random walk. Finally, when i and j belong to different components, the model parameter φ_S of the merged component S is set equal to φ_{S_j}, the model parameter of the original component S_j.

The nonconjugate SAMS sampler based on sampling from the centering distribution F_0(φ) is denoted SAMS(prior). The random walk version is denoted SAMS(random walk). One might expect the random walk version to be superior to the prior version and, indeed, the simulation results in Section 5 confirm this.

4.3 Metropolis-Hastings Ratio for SAMS Sampler

For the conjugate SAMS sampler, the Metropolis-Hastings ratio a(π* | π) gives the probability that, at the current partition π, a proposed partition π* is accepted. The MH ratio for the conjugate SAMS sampler is given as:

    a(π* | π) = min[ 1, ( p(π* | y) / p(π | y) ) ( q(π | π*) / q(π* | π) ) ],    (14)

where p(π | y) is the density of the partition posterior distribution given in (10) and q(π* | π) is the probability of proposing π* from the state π.

For the nonconjugate SAMS samplers, the Metropolis-Hastings ratio is denoted a(π*, φ* | π, φ) and is given as:

    a(π*, φ* | π, φ) = min[ 1, ( p(π*, φ* | y) / p(π, φ | y) ) ( q(π, φ | π*, φ*) / q(π*, φ* | π, φ) ) ],    (15)

where p(π, φ | y) is the density of the joint posterior distribution given in (6) and q(π*, φ* | π, φ) is the probability of proposing (π*, φ*) from the state (π, φ). In practice, only parts of the posterior distribution need to be evaluated, since contributions from components not involved in the split or merge cancel. Making use of this fact reduces rounding error and greatly speeds up computations. Computing the Metropolis-Hastings ratio for the SAMS sampler requires a similar amount of bookkeeping as is required when using the RGSM(t) sampler.

When the proposal involves a split, q(π* | π) is merely the product of the probabilities in (12) (or their complements) associated with the chosen allocations. Likewise, q(π*, φ* | π, φ) is the product of the probabilities in (13) times the proposal density used to move φ_{S_i}. Since the two split components could only be merged in one way, the reverse proposal probability is always 1.

Conversely, consider the case where the proposal is a merged partition. Since the two split components could only be merged in one way, the proposal probability is 1. In the conjugate case, the reverse proposal probability q(π | π*) is the product of the probabilities in (12) associated with the allocation choices that would need to be made to obtain the split partition π from the merged partition π*. That is, for the purposes of calculating q(π | π*), the computer merely temporarily reverses the current and proposed partitions and is forced to choose allocations which yield the split partition π from the merged partition π*. The calculation for the nonconjugate case follows that of the conjugate case, with the additional evaluation of the proposal density used to move from φ_{S_j} to φ_{S_i}.
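In log scale, the bookkeeping for the conjugate case reduces to a handful of quantities; a minimal sketch of the accept/reject step, assuming the partition posteriors and allocation probabilities have already been computed (names are illustrative):

```python
# Sketch: Metropolis-Hastings accept/reject for a SAMS move, all in logs.
# For a split: log_q_forward is the split's allocation log probability and
# log_q_reverse is 0 (the merge is the only reverse move). For a merge the
# roles are swapped: log_q_forward = 0, log_q_reverse = the log probability
# of re-splitting the merged component.
import random
from math import exp

def accept_move(log_post_new, log_post_old, log_q_forward, log_q_reverse,
                rng=random):
    # log a = [log p(new|y) - log p(old|y)] + [log q(reverse) - log q(forward)]
    log_a = (log_post_new - log_post_old) + (log_q_reverse - log_q_forward)
    return log_a >= 0.0 or rng.random() < exp(log_a)
```

Only the components involved in the split or merge contribute to the posterior log ratio, which is what makes the evaluation cheap.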

5 Comparison of SAMS Samplers to Other Samplers

All (properly constructed) MCMC samplers for DPM models have the posterior distribution of interest as their stationary distribution. Experience has shown, however, that the computational efficiency of the samplers can vary widely. A DPM model that may be completely impractical to fit using one sampler can suddenly be feasible when a more efficient sampler becomes available. To compare the computational efficiency of the proposed SAMS sampler with the leading conjugate and nonconjugate samplers presented in Section 3, a Monte Carlo study was conducted.

Jain and Neal (2004) note that their merge-split sampler performs better when used in combination with the conjugate Gibbs sampler. Following this observation, this simulation study compares the conjugate Gibbs sampler with hybrid samplers in which half of the CPU time was spent on conjugate Gibbs sampling and the other half was spent on the RGSM(1), RGSM(3), RGSM(5), or SAMS(conjugate) sampler. Also, the Auxiliary Gibbs (1) sampler is compared with hybrid samplers in which half of the CPU time was spent on Auxiliary Gibbs (1) sampling and the other half was spent on the Auxiliary Gibbs (2), SAMS(prior), or SAMS(random walk) sampler. The simulation study was conducted using code written in the Ruby programming language and executed on a computer running the GNU/Linux operating system with an AMD Athlon processor and 2 GB of RAM. The software used for the comparisons is available at dahl/sams.

5.1 Assessing Computational Efficiency

The computational effort required to fit a Bayesian model using MCMC can be divided into two parts: (1) Reaching a burned-in state (i.e., a state that is relatively probable under the posterior distribution) and (2) Sampling from the posterior distribution after burn-in. Jain and Neal (2004) provide a simulation study to compare their RGSM sampler to the conjugate Gibbs sampler.
Their study focused on the number of MCMC updates required for burn-in. They found that their

RGSM sampler (coupled with the conjugate Gibbs sampler) is able to attain a burned-in state in substantially fewer MCMC updates. Reaching a burned-in state is essential to being able to sample from the posterior, but the vast majority of the computational effort is typically spent on sampling from the posterior. For this reason, the simulation study in this paper focuses on the computational efficiency of the various methods in sampling from the posterior distribution. Each sampler starts from a state that is already well supported by the posterior (i.e., no further burn-in is required). In particular, for each replicate in the Monte Carlo study, every sampler uses a common burned-in state. This burned-in state is obtained by running a hybrid sampler in which half of its CPU time is spent on conjugate Gibbs sampling and the other half on SAMS(conjugate) sampling. In each example model and dataset, the effective sample size was computed for each of 20 independent burned-in states.

Comparing the various samplers by letting each perform a specified number of updates is problematic. To take an extreme example, consider comparing the RGSM(1000) sampler to the RGSM(3) sampler. Certainly the RGSM(1000) sampler will propose splits well supported by the posterior due to the 1,000 intermediate restricted Gibbs scans. These 1,000 scans come at a cost, however; for the same amount of CPU time used to perform one RGSM(1000) update, perhaps hundreds of RGSM(3) updates could have been performed. Hence, RGSM(3) may well be more computationally efficient. To make the samplers commensurate, each sampler is given the same amount of CPU time.

Even with a fixed CPU time, care must be taken in interpreting the results since successive draws from a Markov chain are not independent. The degree to which the draws are correlated affects the amount of information that each sample carries. The autocorrelation time (Ripley 1987; Kass et al.
1998) gives the effective amount by which the sample size is reduced when measuring 17

19 the expectation of a quantity. It is defined as: ρ i, i=1 where ρ i is the autocorrelation between a univariate summary of two states i lags part in the Markov chain. In practice, ρ i is estimated using the sample autocorrelations and the infinite sum is replace by a sum until the sample autocorrelations are statistically indistinguishable from zero. Several measures could be used to compute the effective sample size and thereby compare the samplers. Two measures are commonly used: (1) The size of the largest cluster in the state and (2) The number of clusters in the state. These measures, however, have major deficiencies for comparing samplers. The size of the largest cluster ignores information about the number and sizes of the other clusters in the state. A very dramatic change in the clustering may result from an update, but still not affect the size of the largest cluster. Likewise, the number of clusters ignores potentially important information: the size of the clusters. For example, a sampler may move many indices from existing clusters to form a new cluster. In terms of the number of clusters, this dramatic update is rewarded the same as the very modest update of forming a new singleton cluster. The measure used to compute the effective sample size in this simulation study is the clustering entropy: q i=1 S i ( n log Si ). n Experience has shown that entropy is more discriminatory than the size of the largest cluster or the number of clusters. Entropy captures changes both in the size and number of clusters. Small changes in the size or number of clusters lead to small changes in entropy; more dramatic clustering changes are reflected by large changes in the entropy. 18
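The two quantities just described, clustering entropy and the autocorrelation-based effective sample size, can be sketched as follows. This is a minimal illustration, not the code used in the study; in particular, the 2/√T cutoff for "statistically indistinguishable from zero" is one common convention and an assumption on my part.

```python
import numpy as np

def cluster_entropy(labels):
    # Clustering entropy: -sum_i (|S_i|/n) * log(|S_i|/n),
    # where |S_i| is the size of cluster i and n the number of items.
    labels = np.asarray(labels)
    n = labels.size
    sizes = np.array([np.sum(labels == c) for c in np.unique(labels)])
    p = sizes / n
    return float(-np.sum(p * np.log(p)))

def effective_sample_size(x):
    # ESS of a univariate chain summary: T / (1 + 2 * sum of sample
    # autocorrelations), truncating the sum at the first lag whose
    # sample autocorrelation is within 2/sqrt(T) of zero.
    x = np.asarray(x, dtype=float)
    T = x.size
    x = x - x.mean()
    var = float(np.dot(x, x)) / T
    if var == 0.0:  # constant summary carries no autocorrelation information
        return float(T)
    tau = 1.0
    for k in range(1, T):
        rho = float(np.dot(x[:-k], x[k:])) / (T * var)
        if abs(rho) < 2.0 / np.sqrt(T):
            break
        tau += 2.0 * rho
    return T / tau
```

In the setup of the simulation study, one would evaluate cluster_entropy at each sampled state and then apply effective_sample_size to the resulting series of entropies.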

5.2 Monte Carlo

Three models and datasets were used in the Monte Carlo study. All three of these DPM models are conjugate; hence either a conjugate or a nonconjugate DPM sampler can be used to fit them. The three cases considered are the Tack, Galaxy, and Jain/Neal examples.

5.2.1 Tack Example: Beckett and Diaconis (1994) presented data resulting from rolling thumbtacks. The data consist of 320 observations, each counting the number of times a tack lands pointing up in nine rolls. A DPM model for these data was presented by Liu (1996). The observational component p(y | θ) is Binomial(9, θ), the centering distribution F₀(θ) is Beta(1, 1), and the mass parameter η₀ is 1.0. The random walk version of the SAMS sampler updated the model parameter θ using a Gaussian proposal centered at the current state θ and having variance θ(1 − θ)/n, where n is the size of the cluster. For each replicate, the samplers were given 3.5 hours to sample from the posterior.

5.2.2 Galaxy Example: Roeder (1990) analyzed the velocities at which 82 galaxies are moving away from our galaxy (see also Richardson and Green 1997). In the DPM model used for this simulation study, the observational component p(y | θ) is Normal(θ, ), the centering distribution F₀(θ) is Normal(20 000, ), and the mass parameter η₀ is 1.0. The random walk version of the SAMS sampler updated the model parameter θ using a Gaussian proposal centered at the current state θ and having variance /n, where n is the size of the cluster. For each replicate, the samplers were given 30 minutes to sample from the posterior.
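As a concrete illustration of the random walk update just described, the sketch below performs one Metropolis-Hastings update of a single cluster's θ for the Tack model (Binomial(9, θ) likelihood with a Beta(1, 1) prior). This is my own minimal rendering, not the paper's code. Note that because the proposal variance θ(1 − θ)/n depends on the current state, the proposal is not symmetric, so the Hastings correction must include both proposal densities.

```python
import math
import random

def log_post(theta, ups, m=9):
    # Log posterior (up to a constant) of a cluster's theta under a
    # Binomial(m, theta) likelihood and a Beta(1, 1) prior;
    # ups holds the "up" counts for the observations in the cluster.
    if not (0.0 < theta < 1.0):
        return -math.inf
    n, s = len(ups), sum(ups)
    return s * math.log(theta) + (n * m - s) * math.log(1.0 - theta)

def rw_update(theta, ups, m=9):
    # One random-walk Metropolis-Hastings update of a cluster's theta,
    # with proposal N(theta, theta * (1 - theta) / n) as described above.
    n = len(ups)
    sd = math.sqrt(theta * (1.0 - theta) / n)
    prop = random.gauss(theta, sd)
    if not (0.0 < prop < 1.0):
        return theta  # zero posterior density outside (0, 1): reject
    sd_prop = math.sqrt(prop * (1.0 - prop) / n)

    def log_q(x, mean, s):
        # Gaussian log density up to the constant -0.5 * log(2 * pi)
        return -0.5 * ((x - mean) / s) ** 2 - math.log(s)

    # Hastings ratio includes the proposal densities because the
    # proposal variance depends on the current state.
    log_alpha = (log_post(prop, ups, m) - log_post(theta, ups, m)
                 + log_q(theta, prop, sd_prop) - log_q(prop, theta, sd))
    if math.log(random.random()) < log_alpha:
        return prop
    return theta
```

For a cluster whose observations all show s total "up" results out of n·m rolls, the posterior is Beta(1 + s, 1 + n·m − s), so a long run of rw_update should average near (1 + s)/(2 + n·m).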

5.2.3 Jain & Neal Example: To demonstrate their proposed sampler, Jain and Neal (2004) present synthetic datasets and a multivariate-Bernoulli/beta DPM model. In this simulation study, we consider their higher-dimension problem (Example 2 in their paper), which demonstrated both the efficiency of their method and the problems the Gibbs sampler can encounter. The random walk version of the SAMS sampler updates each element θᵢ of the model parameter using a Gaussian proposal centered at the current state θᵢ and having variance θᵢ(1 − θᵢ)/n, where n is the size of the cluster. For each replicate, the samplers were given 3.5 hours to sample from the posterior.

5.3 Results

The results of the simulation study are displayed in Tables 1, 2, and 3. Each table has two panels: panels a) and b) contain results from conjugate and nonconjugate samplers, respectively. For each sampler, the mean effective sample size based on the 20 replicates is shown, as well as the standard error of this mean. The last column shows the ratio of the effective sample size of each sampler to that of the reference sampler (i.e., the conjugate Gibbs sampler in the case of conjugate samplers and the Auxiliary Gibbs (1) sampler for nonconjugate samplers). This ratio gives the factor by which the effective sample size is multiplied; put another way, it shows how much faster a particular sampler is than its reference.

In the case of the Tack example (Table 1), the hybrid sampler based on the SAMS(conjugate) sampler leads to a sample size that is effectively 4.77 times larger than that of the conjugate Gibbs sampler. In other words, SAMS(conjugate) fits the model almost 5 times faster than the conjugate Gibbs sampler. The RGSM samplers with tuning parameters 1, 3, and 5 are statistically better than the conjugate Gibbs sampler, but not nearly as good as the SAMS(conjugate) sampler. For example, the SAMS(conjugate) sampler is about 4.77/1.53 = 3.12 times better than the best RGSM sampler (performing 1 restricted Gibbs scan). In the nonconjugate case, both the random walk and prior versions of the SAMS sampler are about 47% better than the Auxiliary Gibbs (1) sampler.

The Galaxy example (Table 2) also shows the benefit of using the SAMS sampler. The SAMS(conjugate) sampler is 64% better than the conjugate Gibbs sampler, and the SAMS(random walk) sampler is 80% better than the Auxiliary Gibbs (1) sampler. Notice that the prior version of the SAMS sampler is greatly inferior to the random walk version. This example also shows that the RGSM sampler can actually perform worse than the conjugate Gibbs sampler and, in the best case, performs no better than the conjugate Gibbs sampler.

The most striking demonstration of the benefit of the SAMS sampler is the Jain/Neal example. As Jain and Neal (2004) report, their RGSM sampler indeed performs better than the conjugate Gibbs sampler: this simulation study shows that the effective sample sizes of the RGSM samplers are 3.39, 3.25, and 2.79 times better than that of the conjugate Gibbs sampler. The SAMS(conjugate) sampler, however, performs 7.66 times better than the conjugate Gibbs sampler and 7.66/3.39 = 2.26 times better than the best RGSM sampler. The nonconjugate case reveals that the Auxiliary Gibbs samplers and the SAMS(prior) sampler are very inefficient for this model and dataset. The SAMS(random walk) sampler is about 34 times better than any other nonconjugate sampler for this model and dataset.

6 Conclusion

This article introduces a new efficient merge-split MCMC algorithm for both conjugate and nonconjugate Dirichlet process mixture models. The proposed sampler makes split proposals by sequentially allocating indices to one of two split components using allocation probabilities that are conditional on previously allocated indices. The algorithm is computationally efficient because the proposal mechanism is both fast and well supported by the model. No tuning parameter needs to be chosen; hence the application of the sampler is automatic.
While the conditional allocation of observations is reminiscent of sequential importance sampling, the output from the sampler has the correct stationary distribution due to the use of the Metropolis-Hastings ratio. The SAMS sampler bears a resemblance to the sequential importance sampling of Liu (1994) and MacEachern et al. (1999). In both sequential importance sampling and the SAMS sampler, indices are allocated one at a time, and previously allocated indices are used in choosing how to allocate the current index. Despite these similarities, the algorithms differ in at least two important ways. First, sequential importance sampling uses every draw from the proposal distribution, but these draws must be adjusted by importance weights when computing Monte Carlo integrals with respect to the posterior distribution of interest. Conversely, the SAMS sampler is a Markov chain Monte Carlo algorithm whose stationary distribution is the posterior distribution of interest; proposals are accepted only according to the probability given by the Metropolis-Hastings ratio. The second way in which the SAMS sampler and sequential importance sampling differ is in the allocation of indices. In sequential importance sampling, indices can be allocated to the full range of existing components or to a new component. In the SAMS sampler, indices involved in a split can only be allocated to one of two split components.

Simulation results indicate that the proposed SAMS (sequentially-allocated merge-split) sampler performs substantially better than the RGSM (restricted Gibbs merge-split) sampler of Jain and Neal (2004) and the conjugate Gibbs sampler. Perhaps more importantly for practical applications where conjugate models may not be available, the SAMS(random walk) sampler shows substantially improved performance over the Auxiliary Gibbs (1) sampler. Note that in two of the three examples, the SAMS(prior) sampler was significantly worse than the SAMS(random walk) sampler. Therefore, practitioners are strongly encouraged to use the random walk version of the SAMS sampler for nonconjugate DPM models.
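The sequential allocation underlying the split proposal can be sketched schematically. The fragment below is only an illustration of the idea, under assumptions of my own: `sequential_split` and `log_pred` are hypothetical names, `log_pred` is a user-supplied log predictive density of an observation given the data already allocated to a component, the first two indices serve as anchors for the two split components, and the remaining indices are visited in uniformly random order. Anchor selection and the full Metropolis-Hastings acceptance step of the actual SAMS sampler are omitted.

```python
import math
import random

def sequential_split(indices, data, log_pred):
    # Seed two components with two anchor indices, then allocate the
    # remaining indices one at a time, each with probability
    # proportional to the predictive density given the items already
    # in each component. Returns the two components and the log
    # probability of this particular allocation, which would enter
    # the Metropolis-Hastings ratio of the full sampler.
    i, j, *rest = indices
    comp_i, comp_j = [i], [j]
    log_prob = 0.0
    random.shuffle(rest)  # visit the remaining indices in random order
    for k in rest:
        li = log_pred(data[k], [data[t] for t in comp_i])
        lj = log_pred(data[k], [data[t] for t in comp_j])
        m = max(li, lj)  # stabilize the exponentials
        pi = math.exp(li - m) / (math.exp(li - m) + math.exp(lj - m))
        if random.random() < pi:
            comp_i.append(k)
            log_prob += math.log(pi)
        else:
            comp_j.append(k)
            log_prob += math.log(1.0 - pi)
    return comp_i, comp_j, log_prob
```

Because each allocation conditions on the items already placed in the two components, well-separated groups of observations tend to be split apart coherently, which is what makes such proposals likely to be accepted.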

References

Antoniak, C. E. (1974), "Mixtures of Dirichlet Processes With Applications to Bayesian Nonparametric Problems," The Annals of Statistics, 2.
Beckett, L. A. and Diaconis, P. W. (1994), "Spectral Analysis for Discrete Longitudinal Data," Advances in Mathematics, 103.
Blackwell, D. and MacQueen, J. B. (1973), "Ferguson Distributions Via Polya Urn Schemes," The Annals of Statistics, 1.
Bush, C. A. and MacEachern, S. N. (1996), "A Semiparametric Bayesian Model for Randomised Block Designs," Biometrika, 83.
Celeux, G., Hurn, M., and Robert, C. P. (2000), "Computational and Inferential Difficulties With Mixture Posterior Distributions," Journal of the American Statistical Association, 95.
Damien, P., Wakefield, J., and Walker, S. (1999), "Gibbs Sampling for Bayesian Non-conjugate and Hierarchical Models By Using Auxiliary Variables," Journal of the Royal Statistical Society, Series B, Methodological, 61.
Escobar, M. D. (1994), "Estimating Normal Means With a Dirichlet Process Prior," Journal of the American Statistical Association, 89.
Ferguson, T. S. (1973), "A Bayesian Analysis of Some Nonparametric Problems," The Annals of Statistics, 1.
— (1974), "Prior Distributions on Spaces of Probability Measures," The Annals of Statistics, 2.
Jain, S. and Neal, R. M. (2004), "A Split-Merge Markov Chain Monte Carlo Procedure for the Dirichlet Process Mixture Model," Journal of Computational and Graphical Statistics, 13.
— (2005), "Splitting and Merging Components of a Nonconjugate Dirichlet Process Mixture Model," Tech. Rep. 0507, Dept. of Statistics, University of Toronto.
Kass, R. E., Carlin, B. P., Gelman, A., and Neal, R. M. (1998), "Markov Chain Monte Carlo in Practice: A Roundtable Discussion," The American Statistician, 52.
Korwar, R. M. and Hollander, M. (1973), "Contributions to the Theory of Dirichlet Processes," The Annals of Probability, 1.
Liu, J. S. (1994), "The Collapsed Gibbs Sampler in Bayesian Computations With Applications to a Gene Regulation Problem," Journal of the American Statistical Association, 89.
— (1996), "Nonparametric Hierarchical Bayes Via Sequential Imputations," The Annals of Statistics, 24.
MacEachern, S. and Müller, P. (2000), "Efficient MCMC Schemes for Robust Model Extensions Using Encompassing Dirichlet Process Mixture Models," in Robust Bayesian Analysis.
MacEachern, S. N. (1994), "Estimating Normal Means With a Conjugate Style Dirichlet Process Prior," Communications in Statistics, Part B: Simulation and Computation, 23.
MacEachern, S. N., Clyde, M., and Liu, J. S. (1999), "Sequential Importance Sampling for Nonparametric Bayes Models: The Next Generation," The Canadian Journal of Statistics, 27.
Neal, R. M. (1992), "Bayesian Mixture Modeling," in Maximum Entropy and Bayesian Methods: Proceedings of the 11th International Workshop on Maximum Entropy and Bayesian Methods of Statistical Analysis.
— (2000), "Markov Chain Sampling Methods for Dirichlet Process Mixture Models," Journal of Computational and Graphical Statistics, 9.
Quintana, F. A. and Newton, M. A. (2000), "Computational Aspects of Nonparametric Bayesian Analysis With Applications to the Modeling of Multiple Binary Sequences," Journal of Computational and Graphical Statistics, 9.
Richardson, S. and Green, P. J. (1997), "On Bayesian Analysis of Mixtures With an Unknown Number of Components" (with discussion; correction: 1998, 60, 661), Journal of the Royal Statistical Society, Series B, Methodological, 59.
Ripley, B. D. (1987), Stochastic Simulation, John Wiley & Sons.
Roeder, K. (1990), "Density Estimation With Confidence Sets Exemplified By Superclusters and Voids in the Galaxies," Journal of the American Statistical Association, 85.

Table 1: Efficiency in Terms of the Effective Sample Size for Tack Example.

a) Conjugate Samplers (columns: Mean, Standard Error, Ratio with Gibbs)
    Conjugate Gibbs                         2,
    RGSM (1) & Conj. Gibbs                  3,
    RGSM (3) & Conj. Gibbs                  2,
    RGSM (5) & Conj. Gibbs                  2,
    SAMS (conjugate) & Conj. Gibbs         11,

b) Nonconjugate Samplers (columns: Mean, Standard Error, Ratio with Aux. Gibbs (1))
    Auxiliary Gibbs (1)                     3,
    Auxiliary Gibbs (2) & Aux. Gibbs (1)    2,
    SAMS (prior) & Aux. Gibbs (1)           4,
    SAMS (random walk) & Aux. Gibbs (1)     4,

Table 2: Efficiency in Terms of the Effective Sample Size for Galaxy Example.

a) Conjugate Samplers (columns: Mean, Standard Error, Ratio with Gibbs)
    Conjugate Gibbs                         2,
    RGSM (1) & Conj. Gibbs                  3,
    RGSM (3) & Conj. Gibbs                  2,
    RGSM (5) & Conj. Gibbs                  2,
    SAMS (conjugate) & Conj. Gibbs          4,

b) Nonconjugate Samplers (columns: Mean, Standard Error, Ratio with Aux. Gibbs (1))
    Auxiliary Gibbs (1)                     6,
    Auxiliary Gibbs (2) & Aux. Gibbs (1)    6,
    SAMS (prior) & Aux. Gibbs (1)           4,
    SAMS (random walk) & Aux. Gibbs (1)    11,

Table 3: Efficiency in Terms of the Effective Sample Size for Jain & Neal Example.

a) Conjugate Samplers (columns: Mean, Standard Error, Ratio with Gibbs)
    Conjugate Gibbs                         2,
    RGSM (1) & Conj. Gibbs                  7,
    RGSM (3) & Conj. Gibbs                  7,
    RGSM (5) & Conj. Gibbs                  6,
    SAMS (conjugate) & Conj. Gibbs         17,

b) Nonconjugate Samplers (columns: Mean, Standard Error, Ratio with Aux. Gibbs (1))
    Auxiliary Gibbs (1)
    Auxiliary Gibbs (2) & Aux. Gibbs (1)
    SAMS (prior) & Aux. Gibbs (1)
    SAMS (random walk) & Aux. Gibbs (1)    17,


Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles

A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles A Bayesian Nonparametric Model for Predicting Disease Status Using Longitudinal Profiles Jeremy Gaskins Department of Bioinformatics & Biostatistics University of Louisville Joint work with Claudio Fuentes

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Bayesian Mixture Modeling of Significant P Values: A Meta-Analytic Method to Estimate the Degree of Contamination from H 0 : Supplemental Material

Bayesian Mixture Modeling of Significant P Values: A Meta-Analytic Method to Estimate the Degree of Contamination from H 0 : Supplemental Material Bayesian Mixture Modeling of Significant P Values: A Meta-Analytic Method to Estimate the Degree of Contamination from H 0 : Supplemental Material Quentin Frederik Gronau 1, Monique Duizer 1, Marjan Bakker

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

A Brief Overview of Nonparametric Bayesian Models

A Brief Overview of Nonparametric Bayesian Models A Brief Overview of Nonparametric Bayesian Models Eurandom Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin Also at Machine

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing

Partially Collapsed Gibbs Samplers: Theory and Methods. Ever increasing computational power along with ever more sophisticated statistical computing Partially Collapsed Gibbs Samplers: Theory and Methods David A. van Dyk 1 and Taeyoung Park Ever increasing computational power along with ever more sophisticated statistical computing techniques is making

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition. Christian P. Robert The Bayesian Choice From Decision-Theoretic Foundations to Computational Implementation Second Edition With 23 Illustrations ^Springer" Contents Preface to the Second Edition Preface

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

MCMC Methods: Gibbs and Metropolis

MCMC Methods: Gibbs and Metropolis MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution

More information

Default priors for density estimation with mixture models

Default priors for density estimation with mixture models Bayesian Analysis ) 5, Number, pp. 45 64 Default priors for density estimation with mixture models J.E. Griffin Abstract. The infinite mixture of normals model has become a popular method for density estimation

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Spatial Normalized Gamma Process

Spatial Normalized Gamma Process Spatial Normalized Gamma Process Vinayak Rao Yee Whye Teh Presented at NIPS 2009 Discussion and Slides by Eric Wang June 23, 2010 Outline Introduction Motivation The Gamma Process Spatial Normalized Gamma

More information

A note on Reversible Jump Markov Chain Monte Carlo

A note on Reversible Jump Markov Chain Monte Carlo A note on Reversible Jump Markov Chain Monte Carlo Hedibert Freitas Lopes Graduate School of Business The University of Chicago 5807 South Woodlawn Avenue Chicago, Illinois 60637 February, 1st 2006 1 Introduction

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

The Indian Buffet Process: An Introduction and Review

The Indian Buffet Process: An Introduction and Review Journal of Machine Learning Research 12 (2011) 1185-1224 Submitted 3/10; Revised 3/11; Published 4/11 The Indian Buffet Process: An Introduction and Review Thomas L. Griffiths Department of Psychology

More information

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017 Bayesian Statistics Debdeep Pati Florida State University April 3, 2017 Finite mixture model The finite mixture of normals can be equivalently expressed as y i N(µ Si ; τ 1 S i ), S i k π h δ h h=1 δ h

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

Partially Collapsed Gibbs Samplers: Theory and Methods

Partially Collapsed Gibbs Samplers: Theory and Methods David A. VAN DYK and Taeyoung PARK Partially Collapsed Gibbs Samplers: Theory and Methods Ever-increasing computational power, along with ever more sophisticated statistical computing techniques, is making

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo Markov Chain Monte Carlo (and Bayesian Mixture Models) David M. Blei Columbia University October 14, 2014 We have discussed probabilistic modeling, and have seen how the posterior distribution is the critical

More information

MCMC notes by Mark Holder

MCMC notes by Mark Holder MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017 Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

Bayesian Inference: Probit and Linear Probability Models

Bayesian Inference: Probit and Linear Probability Models Utah State University DigitalCommons@USU All Graduate Plan B and other Reports Graduate Studies 5-1-2014 Bayesian Inference: Probit and Linear Probability Models Nate Rex Reasch Utah State University Follow

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

Density Modeling and Clustering Using Dirichlet Diffusion Trees

Density Modeling and Clustering Using Dirichlet Diffusion Trees BAYESIAN STATISTICS 7, pp. 619 629 J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West (Eds.) c Oxford University Press, 2003 Density Modeling and Clustering

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Hmms with variable dimension structures and extensions

Hmms with variable dimension structures and extensions Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Research Article Spiked Dirichlet Process Priors for Gaussian Process Models

Research Article Spiked Dirichlet Process Priors for Gaussian Process Models Hindawi Publishing Corporation Journal of Probability and Statistics Volume 200, Article ID 20489, 4 pages doi:0.55/200/20489 Research Article Spiked Dirichlet Process Priors for Gaussian Process Models

More information

Foundations of Nonparametric Bayesian Methods

Foundations of Nonparametric Bayesian Methods 1 / 27 Foundations of Nonparametric Bayesian Methods Part II: Models on the Simplex Peter Orbanz http://mlg.eng.cam.ac.uk/porbanz/npb-tutorial.html 2 / 27 Tutorial Overview Part I: Basics Part II: Models

More information

A Nonparametric Model for Stationary Time Series

A Nonparametric Model for Stationary Time Series A Nonparametric Model for Stationary Time Series Isadora Antoniano-Villalobos Bocconi University, Milan, Italy. isadora.antoniano@unibocconi.it Stephen G. Walker University of Texas at Austin, USA. s.g.walker@math.utexas.edu

More information