
Efficient Bayesian Inference for Conditionally Autoregressive Models

by

Justin Angevaare

A Thesis presented to The University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Mathematics and Statistics

Guelph, Ontario, Canada
© Justin Angevaare, April, 2014

ABSTRACT

EFFICIENT BAYESIAN INFERENCE FOR CONDITIONALLY AUTOREGRESSIVE MODELS

Justin Angevaare, University of Guelph, 2014
Advisors: Dr. D. Gillis, Dr. G. Darlington

We compare the performance of Metropolis-Hastings (MH) and Hamiltonian Monte Carlo (HMC) methods for Bayesian inference, with specific application to conditionally autoregressive (CAR) models. A simulation study is performed which investigates the efficiency of MH and HMC in estimation of the spatial correlation strength parameter of the CAR model. For this, data are simulated at various resolutions and spatial correlation strengths. An application to the relative abundance of Lake Whitefish in Lake Huron is also presented. Many new HMC-based methods have been recently developed, some of which offer significant benefit in performing inference for CAR models.

Acknowledgments

I have found statistics to be an incredibly rewarding research area. Undoubtedly, I owe much of this to my wonderful thesis advisors, Dr. Dan Gillis and Dr. Gerarda Darlington, who have been supportive during my many challenges and celebratory whenever things turned around. Without Dan's mentorship over the years, it is unlikely I would have given consideration to performing research in statistics. I am glad that I did. Thank you. Communicating ideas or problems in statistics to people outside the discipline can be difficult. To those in my life without a background in statistics who have shown the interest and patience in understanding what I spend my time doing (and occasionally being frustrated by), thank you sincerely. This research has received support through the Mitacs Accelerate program, which covered research equipment costs and provided a valuable internship opportunity.

Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Bayesian Inference
  1.3 Conditionally Autoregressive Models
  1.4 Metropolis-Hastings
  1.5 Hamiltonian Monte Carlo
  1.6 Goal and Objectives
2 Simulation Study
  2.1 Methodology
    Design
    Analysis
    Metrics
  2.2 Results
    Trace plots
    Computational time
    Effective sample size
    Effective samples per second
  2.3 Discussion
3 Application
  3.1 Methodology
    Data Description
    Analysis
  3.2 Results
    Trace plots
    Performance metrics
    Model predictions
  3.3 Discussion
4 Conclusions
A Appendices
  A.1 Tables
    A.1.1 Kruskal-Wallis tests
    A.1.2 Wilcoxon rank-sum tests

  A.2 Code
    A.2.1 Simulation Study
      Neighbourhood matrix generation
      PyMC
      Data organization
    A.2.2 Application
      Data organization and visualization
      PyMC
    A.2.3 Graphics
      Level Plots
      Line Graphs
      Stacked Histograms
    A.2.4 Table Production
      Metric summaries
      Kruskal-Wallis tests
      Wilcoxon rank-sum tests

Chapter 1
Introduction

1.1 Motivation

When exploring and analyzing spatially labelled data, it is important to consider relationships which may occur due to spatial proximity. Many statistical methods are available to explore these relationships for point (spatially continuous) and areal (spatially discrete) data alike. When working with areal data, one of the most widely used methods is the conditionally autoregressive model. This model considers spatial random effects, which can be used within simple regression or more complicated hierarchical models. In the case of hierarchical models, the Bayesian framework, and the computational methods available for Bayesian inference, are especially useful. When conditionally autoregressive models are used for high dimensional data, it becomes increasingly important to consider the efficiency of these computational methods. Two prominent computational methods in Bayesian inference are Metropolis-Hastings and Hamiltonian Monte Carlo. The relative efficiency of these methods is dependent on the shape of a model's parameter space. This thesis addresses the relative efficiency of Metropolis-Hastings and Hamiltonian Monte Carlo for conditionally autoregressive models.

1.2 Bayesian Inference

Bayesian inference is a framework for combining prior assumptions about model parameters with the likelihood of observed data given specific values of those parameters, into a posterior (the probability of parameter values given the observed data) (Neal, 1993). The shape of the posterior distribution is determined by how informative the observed data are versus how informative the chosen prior is. Calculation of the posterior distribution follows the rule of conditional probabilities, which states that for any two events A and B (Miller and Miller, 2004),

P(A|B)P(B) = P(B|A)P(A),

and therefore,

P(A|B) = P(B|A)P(A) / P(B).

By substituting a parameter vector, θ, for event A, and observed data, D, for event B, it follows that

P(θ|D) = P(D|θ)P(θ) / P(D).

The prior distribution of θ is represented by P(θ), the likelihood of D for specific values of θ by P(D|θ), and the posterior distribution of θ by P(θ|D). P(D) ensures that P(θ|D) is a true density (i.e. integrates to 1), and can be considered a normalization constant. P(D) can be found through an integration over the parameter space (Neal, 1993), such that

P(θ|D) = P(D|θ)P(θ) / ∫θ P(D|θ)P(θ) dθ.

However, the integration required to find P(D) is difficult or impossible in many applications. Markov chain Monte Carlo (MCMC) allows us to sample from and approximate the posterior distribution without performing this integration, allowing the posterior to be defined up to proportionality as (Besag et al., 1995)

P(θ|D) ∝ P(D|θ)P(θ).

MCMC is a widely used method for performing Bayesian inference (Cappé and Robert, 2000).
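As a concrete numerical illustration of the role of P(D) (not part of the thesis analysis, and using an assumed beta-binomial toy model), the following Python sketch evaluates the normalizing constant by direct integration over a grid. This is only feasible because the parameter space here is one-dimensional; MCMC methods instead work with the unnormalized product of likelihood and prior.

import numpy as np
from scipy import stats

y, n = 7, 10                                   # assumed toy data: 7 successes in 10 trials
theta = np.linspace(0.001, 0.999, 999)         # grid over the parameter space

prior = stats.beta.pdf(theta, a=2, b=2)        # P(theta)
likelihood = stats.binom.pmf(y, n, theta)      # P(D | theta)
unnormalized = likelihood * prior              # proportional to the posterior

p_data = np.trapz(unnormalized, theta)         # P(D): integral over the parameter space
posterior = unnormalized / p_data              # a true density (integrates to 1)

print("P(D) =", round(p_data, 4))
print("posterior integrates to", round(np.trapz(posterior, theta), 3))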

1.3 Conditionally Autoregressive Models

Conditionally autoregressive (CAR) models, first defined by Besag (1974), and later generalized by Besag et al. (1991), describe the spatial relationship between regions or areal units. For a supplied neighbour kernel, the CAR prior describes spatial random effects based upon the strength of correlation between neighbours and the scale of regional variability. Often this neighbour kernel is binary, indicating whether regions are adjacent or not (for examples of use see Fuentes et al. (2008) or Yu et al. (2008)). Other definitions or weighting schemes for determining neighbour relationships are possible. For instance, Kyung and Ghosh (2009) describe a directional CAR model where neighbours in one direction may be weighted differently from neighbours in another. This approach makes sense in many applications where factors such as prevailing wind patterns or other directional processes are expected to influence observations. The spatial random effects described by the CAR model are defined through a multivariate normal distribution, centred on zero, with covariance matrix (τ(D − ρW))⁻¹, where τ, ρ, D, and W are defined following Besag et al. (1991):

W, an r × r symmetric matrix, defines the neighbour relationships/weights amongst r areal units. The diagonal elements of W, (w_{1,1}, w_{2,2}, ..., w_{r,r}), are necessarily all zeros:

W = | 0        w_{1,2}  ...  w_{1,r} |
    | w_{2,1}  0        ...  w_{2,r} |
    | ...      ...      ...  ...     |
    | w_{r,1}  w_{r,2}  ...  0       |

For a binary neighbour weighting scheme,

w_{i,j} = 1 if i and j are neighbours, 0 otherwise.

D is an r × r diagonal matrix of neighbour counts, or total neighbour weights for the k-th areal unit, such that

D = | d_1  0    ...  0   |
    | 0    d_2  ...  0   |
    | ...  ...  ...  ... |
    | 0    0    ...  d_r |

where

d_k = Σ_{i=1}^{r} w_{k,i}.

The parameter ρ describes the strength of spatial correlation, or spatial dependency. The range of ρ must be restricted to (−1, 1) to ensure that the covariance matrix is positive definite. It is possible to further restrict ρ to only positive or negative values if we wish to place a stronger prior on the type of spatial correlation that may exist between areal units. For our purposes, an uninformative prior that allows for a positive or negative correlation amongst neighbouring areal units is selected, hence we allow

ρ ~ Uniform(−1, 1).

The parameter τ serves as a scaling factor for the inverse of the covariance matrix, and can be thought of as the overall variation amongst regions. τ must be greater than 0 to ensure the covariance matrix is positive definite. A gamma prior is selected, which ensures that τ is positive, and is flexible in terms of shape and location, hence

τ ~ Gamma(α, β),

where α is the shape parameter and β the rate parameter, the hyperparameters for τ. The CAR model described in full is of the form

CAR(τ, ρ, W) = MvNormal(μ = 0, Σ = (τ(D − ρW))⁻¹).

The standard method of performing inference for CAR models is with Metropolis-Hastings (MH) based algorithms. Typically the use of MH for these models will result in poor convergence and mixing properties (Haran et al., 2001).

1.4 Metropolis-Hastings

Metropolis-Hastings (MH) sampling is an MCMC method available for Bayesian inference (Tierney, 1993). Each sample of the joint posterior distribution exists as a position in an MCMC chain. These chains are Markovian in that a sample at location t in the chain only depends mathematically on the value of the previous sample at location t − 1. The MH algorithm is first initialized, then iterated. The iteration scheme is as follows (Chib and Greenberg, 1995).

A new value (or vector), x*_{t+1}, for the t + 1 position of the chain is proposed. This value is probabilistically generated from a transition kernel, based on the current value of the chain, x_t. The chosen transition kernel must be reversible, that is

P(x_t | x*_{t+1}) = P(x*_{t+1} | x_t).

It is also required that the chain that results from this transition kernel is aperiodic, meaning that movement through areas of the target density is not restricted to a multiple of an integer number of steps (Chib and Greenberg, 1995). The proposed value x*_{t+1} is accepted with probability

P(x_{t+1} = x*_{t+1}) = min(1, [π(x*_{t+1}) P(x_t | x*_{t+1})] / [π(x_t) P(x*_{t+1} | x_t)]).

Since the transition kernel is reversible, this simplifies to

P(x_{t+1} = x*_{t+1}) = min(1, π(x*_{t+1}) / π(x_t)),

where π(x) is some target density. If x*_{t+1} fails to be accepted, the chain remains at the same value, that is, x_{t+1} = x_t. In Bayesian inference, the values of the Markov chain, x, would be values of model parameters, θ. The probability of these values, π(x), would correspond to the posterior distribution, P(θ|D). The posterior need only be defined up to proportionality, i.e. the product of the likelihood and the prior. Normalization constants will cancel when calculating the probability that a proposed value is accepted:

P(θ_{t+1} = θ*_{t+1}) = min(1, [P(D|θ*_{t+1})P(θ*_{t+1}) / ∫ P(D|θ)P(θ)dθ] / [P(D|θ_t)P(θ_t) / ∫ P(D|θ)P(θ)dθ])
                      = min(1, [P(D|θ*_{t+1})P(θ*_{t+1})] / [P(D|θ_t)P(θ_t)]).
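The scheme above can be written compactly in code. The following is a minimal random-walk MH sketch, for illustration only (the thesis itself relies on PyMC's implementation); the target is supplied as an unnormalized log-posterior so that the normalization constant never needs to be evaluated, and the standard normal example target is an assumption made for the demonstration.

import numpy as np

def metropolis_hastings(log_post, theta0, n_samples=10000, proposal_sd=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    log_p = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    accepted = 0
    for t in range(n_samples):
        # Symmetric (reversible) normal transition kernel centred on the current value.
        proposal = theta + rng.normal(scale=proposal_sd, size=theta.size)
        log_p_prop = log_post(proposal)
        # Accept with probability min(1, pi(proposal) / pi(current)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
            accepted += 1
        chain[t] = theta
    return chain, accepted / n_samples

# Example: sample a standard normal target, supplied as a log-density up to a constant.
chain, acceptance_rate = metropolis_hastings(lambda th: -0.5 * np.sum(th ** 2), theta0=[0.0])
print("acceptance rate:", round(acceptance_rate, 2))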

The efficiency of this algorithm will be greatly dependent on the transition kernel used. Often a normal transition kernel is selected, and the variance of this kernel with respect to the parameter space determines efficiency. In modern uses of the MH algorithm, this variance is typically automatically tuned as samples are generated such that a desired rejection rate is achieved. That is, the transition kernel variance will be increased if the chain is remaining in a high acceptance region, and decreased if proposals are consistently rejected. With MH, obtaining an optimal rejection rate is the primary consideration in ensuring efficient exploration of the parameter space (Roberts et al., 1997; Chib and Greenberg, 1995). The optimal rejection rate will depend on the dimensionality of the parameter space (increasing with dimensionality), ranging from around 0.45 for a one-dimensional problem up to a maximum of about 0.77 for higher dimensions (Chib and Greenberg, 1995).

1.5 Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC), first described by Duane et al. (1987), is another probabilistic sampling method. Like MH, we can utilize HMC for Bayesian inference by sampling from a model's posterior distribution. HMC is known to perform comparatively well when the parameter space of a model is particularly difficult to explore, such as when model parameters are highly correlated (Brooks et al., 2011). The efficiency of HMC is based on its ability to generate distant, but high acceptance

proposals. These proposals are generated through a discretization of Hamiltonian dynamics, which requires gradient information with respect to the model parameters. The cost of computing this gradient, weighed against HMC's ability to quickly explore difficult parameter spaces, is the principal consideration in its use. With HMC, a realization from the posterior distribution of d model parameters is analogous to the position vector of a particle in a d-dimensional space. This particle exists in a Hamiltonian system, and as such its movements follow Hamiltonian dynamics. Describing a physical system in terms of Hamiltonian dynamics is an alternative to the Newtonian interpretation. Hamiltonian dynamics are preferred especially when analyzing or simulating complex systems. In general, Hamiltonian dynamics describe an object's state in terms of its energy, mass, and position. This state can be determined through the appropriate differentiation of the Hamiltonian. The Hamiltonian describes the total energy of the system (Neal, 1993). Energy in Hamiltonian systems is continuously converted through time, from potential, to kinetic, and back again. An object may exhibit convergent, periodic, or chaotic behaviour in this respect. Our description of Hamiltonian systems follows details presented by Hairer et al. (2006) and Greiner (2009).

The location of a particle in a Hamiltonian system with d dimensions is described by a vector q, of length d. The k-th element of q represents the location of the particle in the k-th dimension. The velocity of this particle in each of the d dimensions is described by the vector q̇, also of length d. The function T(q, q̇) describes the kinetic energy of the particle, typically defined as

T(q, q̇) = (1/2) q̇ᵀ M(q) q̇,

where M(q) represents a symmetric, positive definite, square, (possibly) position-dependent mass matrix of size d × d. The potential energy of the particle is described by the function U(q), which only depends on the particle's location. The Lagrangian function, L(q, q̇), is the difference between these energy functions, such that

L(q, q̇) = T(q, q̇) − U(q).

The Lagrangian follows a known relationship between differentials involving the velocity, time, and location:

d/dt (∂L(q, q̇)/∂q̇) = ∂L(q, q̇)/∂q.

The momentum of a particle in a Hamiltonian system is represented by a vector p of length d. Momentum is defined in the k-th dimension with respect to the Lagrangian and the velocity as

p_k = ∂L(q, q̇)/∂q̇_k.    (1.1)

With this momentum, we can finally define the Hamiltonian function, H(q, q̇, p), which depends on a particle's position, velocity, and momentum as

H(q, q̇, p) = pᵀq̇ − L(q, q̇).

The Hamiltonian can be shown to be the total energy in a system, and thus can also be defined as

H(q, q̇, p) = T(q, q̇) + U(q).

For the Hamiltonian to only depend on the current position and momentum, the position and the velocity must have one-to-one correspondence with the momentum, via equation 1.1, which must be continuously differentiable. Velocity, q̇_k, and change in momentum, ṗ_k, in the k-th dimension can then be described as:

q̇_k = ∂H(p, q)/∂p_k,

and

ṗ_k = −∂H(p, q)/∂q_k.

The state of a mass point following Hamiltonian dynamics through time can be approximated with second order accuracy through a leapfrog integration scheme. This integration method is used to approximate many systems described by differential equations. In the case of Hamiltonian dynamics, integration will find the state of the system at time intervals dictated by a step size, ε (Brooks et al., 2011). The step size will determine the resolution at which the dynamics are described: too large and the resolution may gloss over the features of interest; too small and the computation may be needlessly intensive (Hoffman and Gelman, 2011). Other discretization methods are possible (Neal, 1993), but the leapfrog integration method has proved to be the most practical and widely used for HMC. Description of the scheme in which a particle's momentum and position are updated follows Neal (1993):

p_k(t + ε/2) = p_k(t) − (ε/2) ∂U(q(t))/∂q_k,
q_k(t + ε) = q_k(t) + ε p_k(t + ε/2),
p_k(t + ε) = p_k(t + ε/2) − (ε/2) ∂U(q(t + ε))/∂q_k.
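A minimal implementation of these leapfrog updates is sketched below, assuming a unit mass matrix (so that velocity and momentum coincide) and a simple quadratic potential chosen only for the illustration; the near-conservation of the Hamiltonian over the trajectory is what makes distant, high-acceptance proposals possible.

import numpy as np

def leapfrog(q, p, grad_U, eps, n_steps):
    """Leapfrog integration of Hamiltonian dynamics with a unit mass matrix."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)            # initial half-step momentum update
    for _ in range(n_steps - 1):
        q += eps * p                      # full-step position update
        p -= eps * grad_U(q)              # full-step momentum update
    q += eps * p
    p -= 0.5 * eps * grad_U(q)            # final half-step momentum update
    return q, p

# Example: a quadratic potential U(q) = 0.5 * q^2 (a harmonic oscillator).
U = lambda q: 0.5 * np.sum(q ** 2)
grad_U = lambda q: q
q0, p0 = np.array([1.0]), np.array([0.0])
q1, p1 = leapfrog(q0, p0, grad_U, eps=0.1, n_steps=100)

# The Hamiltonian H = U(q) + 0.5 * p'p is approximately conserved by the scheme.
print(U(q0) + 0.5 * p0 @ p0, U(q1) + 0.5 * p1 @ p1)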

To sample from a posterior distribution using HMC, the position vector q now represents a vector of proposed parameters. We define our potential energy function U(q) with special reference to our likelihood and prior such that

U(q) = −log(P(D|q)P(q)).

Samples that are generated from this algorithm are then subject to a regular MH acceptance scheme (a minimal code sketch of a single HMC transition is given at the end of this chapter). In relation to MH, HMC produces samples with lower autocorrelation, but does so at a computational cost (Hoffman and Gelman, 2011).

1.6 Goal and Objectives

The goal of this study is to demonstrate and compare the merits of MH and HMC in performing Bayesian inference for CAR models. Specifically, we investigate the hypothesis that HMC is unequivocally superior in efficiency to MH in performing inference for CAR models. In order to achieve this goal, the following objectives must be met:

- Spatially correlated data must be simulated with a variety of resolutions and parameter values;
- Joint posterior densities of CAR model parameters from each simulated dataset must be sampled using MH and HMC methods;

- Efficiency must be measured and compared between MH and HMC;
- An application must be presented that demonstrates the abilities of MH and HMC in performing inference for a CAR model.
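The following sketch ties sections 1.4 and 1.5 together by implementing a single HMC transition: a leapfrog trajectory generated from the negative log of the unnormalized posterior, followed by the usual Metropolis acceptance step on the change in total energy. The correlated bivariate normal target, unit mass matrix, and fixed step size and trajectory length are assumptions made for this illustration; they are not the PyMC implementation used in the remainder of this thesis.

import numpy as np

rng = np.random.default_rng(1)

# Toy target: a strongly correlated bivariate normal posterior (assumed for illustration).
Sigma = np.array([[1.0, 0.95], [0.95, 1.0]])
Prec = np.linalg.inv(Sigma)
U = lambda q: 0.5 * q @ Prec @ q               # negative log of the unnormalized posterior
grad_U = lambda q: Prec @ q

def hmc_step(q, eps=0.1, n_steps=25):
    """One HMC transition: leapfrog trajectory plus Metropolis acceptance."""
    p0 = rng.standard_normal(q.size)           # momentum drawn from N(0, I)
    q_new, p = q.copy(), p0.copy()
    p -= 0.5 * eps * grad_U(q_new)
    for _ in range(n_steps - 1):
        q_new += eps * p
        p -= eps * grad_U(q_new)
    q_new += eps * p
    p -= 0.5 * eps * grad_U(q_new)
    # Accept or reject based on the change in total energy H = U(q) + K(p).
    h_current = U(q) + 0.5 * p0 @ p0
    h_proposed = U(q_new) + 0.5 * p @ p
    if np.log(rng.uniform()) < h_current - h_proposed:
        return q_new, True
    return q, False

q = np.zeros(2)
draws = np.empty((5000, 2))
n_accepted = 0
for t in range(5000):
    q, accepted = hmc_step(q)
    n_accepted += accepted
    draws[t] = q

print("acceptance rate:", n_accepted / 5000)
print("sample covariance:\n", np.cov(draws.T))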

Chapter 2
Simulation Study

2.1 Methodology

Design

A simulation study was designed to investigate the relative performance of HMC and MH in inferring the value of the spatial correlation strength parameter of the CAR model, ρ. Spatially correlated data were simulated for one of nine levels of correlation: high (ρ = ±0.95), medium-high (ρ = ±0.75), medium (ρ = ±0.5), medium-low (ρ = ±0.25), or zero (ρ = 0). In this simulation, an areal unit represented a cell from a regular, finite lattice. The resolution of the simulated spatial data, in other words the lattice dimensions, was one of four levels: 5 × 5, 10 × 10, 15 × 15, or 20 × 20. Each combination of spatial correlation strength and resolution was simulated ten times, yielding a total of 360 simulated datasets. Visualizations of three such datasets are shown in figure 2.1. The experiment follows a full factorial arrangement with three factors: MCMC method, spatial correlation strength, and resolution, with 10 replicates.
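A sketch of how a single such dataset could be generated is given below. The rook (four nearest neighbour) adjacency on the lattice and τ = 1 are assumptions made for this example; the thesis's own neighbourhood matrix generation and simulation code appear in appendix A.2.1. Data are drawn from MvNormal(0, (τ(D − ρW))⁻¹) by solving against the Cholesky factor of the precision matrix.

import numpy as np

def lattice_W(d):
    """Binary first-order (rook) neighbour matrix for a regular d x d lattice."""
    r = d * d
    W = np.zeros((r, r))
    for i in range(d):
        for j in range(d):
            k = i * d + j
            if i + 1 < d:                       # neighbour below
                W[k, k + d] = W[k + d, k] = 1
            if j + 1 < d:                       # neighbour to the right
                W[k, k + 1] = W[k + 1, k] = 1
    return W

def simulate_car(d, rho, tau=1.0, seed=0):
    """Draw one realization from the CAR model on a d x d lattice."""
    rng = np.random.default_rng(seed)
    W = lattice_W(d)
    D = np.diag(W.sum(axis=1))                  # diagonal matrix of neighbour counts
    Q = tau * (D - rho * W)                     # CAR precision matrix
    L = np.linalg.cholesky(Q)                   # Q = L L'
    z = rng.standard_normal(d * d)
    return np.linalg.solve(L.T, z).reshape(d, d)   # x ~ MvNormal(0, Q^{-1})

field = simulate_car(d=10, rho=0.95)
print(field.shape)                              # (10, 10)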

Figure 2.1: Three examples of simulated data for three different spatial correlation strengths (ρ = 0.95, 0.5, and −0.5), and three different lattice sizes (20 × 20, 15 × 15, and 10 × 10).

Analysis

Each simulated dataset was fit to a simple CAR model in Python using PyMC 3.0 (Patil et al., 2010). These models consisted only of a CAR component, i.e. observations were assumed to be direct observations from CAR(τ, ρ). Posterior distributions of τ and ρ were sampled using MH and HMC methods. Both algorithms were initiated at maximum a posteriori (MAP) points, and iterated for 10,000 steps each. Complete Python code for this simulation study is included in appendix A.2.1.

Metrics

The purpose of this simulation was to study the relative efficiency of MH and HMC when performing Bayesian inference of CAR models. Here, efficiency has two main components: (1) the speed with which samples can be generated from the joint posterior distribution, and (2) the level of temporal autocorrelation amongst these samples (i.e. how many independent samples do we actually have for the purpose of estimating the joint posterior distribution). The first component is measured simply with computation time. The second component can be measured through an effective sample size (ESS) calculation. A higher ESS indicates a better exploration of a parameter space. ESS is defined as (Kass et al., 1998; Pakman and Paninski, 2013)

ESS = n / δ,

where n is the number of posterior samples generated by an MCMC method (n = 10,000 in the present simulation study), and δ is the autocorrelation time, such that

δ = 1 + 2 Σ_k ψ(k),

with ψ(k), the autocorrelation at lag k for parameter θ_i, defined as

ψ(k) = corr(θ_i^(t), θ_i^(t+k)).

Girolami and Calderhead (2011) and Betancourt (2012) use these same metrics in the comparison of a variety of MCMC methods. Pakman and Paninski (2013) used ESS and CPU runtime in this situation, and Wang et al. (2013) and Hoffman and Gelman (2011) use ESS and the number of leapfrog steps when comparing HMC methods. Here, the number of leapfrog steps (i.e. gradient calculations) is a measure of computational cost in performing some form of HMC. ESS calculations were performed using R software for statistical computing (R Core Team, 2013), with the LaplacesDemon package (Statisticat, LLC., 2013). The LaplacesDemon ESS calculation was interfaced with Python using rpy2 (Gautier, 2013). The time (in seconds) required to generate the samples using MH and HMC was recorded directly within Python. Combined, ESS and time allow for the calculation of effective samples per second.
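A pure-Python sketch of this calculation is given below; the truncation of the lag sum at the first non-positive autocorrelation is one common convention and may differ in detail from the LaplacesDemon implementation used in the thesis. Effective samples per second is then simply the ESS divided by the recorded computation time.

import numpy as np

def autocorrelation(x, k):
    """Sample autocorrelation of a chain x at lag k."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return 1.0 if k == 0 else float(np.dot(x[:-k], x[k:]) / np.dot(x, x))

def effective_sample_size(x):
    """ESS = n / delta with delta = 1 + 2 * sum_k psi(k), truncating the sum
    once the autocorrelation is no longer positive."""
    n = len(x)
    delta = 1.0
    for k in range(1, n // 2):
        psi = autocorrelation(x, k)
        if psi <= 0:
            break
        delta += 2.0 * psi
    return n / delta

# Example: an AR(1) chain with strong autocorrelation has an ESS well below n.
rng = np.random.default_rng(0)
chain = np.zeros(10000)
for t in range(1, chain.size):
    chain[t] = 0.9 * chain[t - 1] + rng.standard_normal()
print("ESS for 10,000 autocorrelated samples:", round(effective_sample_size(chain)))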

2.2 Results

Trace plots

The performance of an MCMC algorithm can be quickly assessed through the visual examination of trace plots. Typical trace plots are included in figure 2.2, which correspond to data simulated for a lattice with ρ = 0.95. Through visual examination of trace plots from the simulation study, it appears that consecutive samples in trace plots from HMC are generally more distant than those in the trace plots from MH. Relatively distant consecutive samples in the trace plots of HMC suggest that there is less autocorrelation present in comparison to MH. In trace plots for both HMC and MH, convergence appears to be immediate. Immediate convergence suggests that the use of the maximum a posteriori point as an initial value for each algorithm has been effective in eliminating the need for a burn-in period.

Computational time

Computational time was found to be largely related to the number of regions for which data were simulated. Figure 2.3 illustrates how computational time differs between HMC and MH as a function of lattice size. We see in this figure that HMC consistently requires more time for computation for a given lattice size in comparison to MH. We also see both methods require more time for computation as lattice size increases. The rate of this increase is much higher for HMC. For instance, for 5 × 5 lattices, MH requires 3.05 seconds for computation on average, and HMC requires roughly 7 times longer than MH for

Figure 2.2: Trace plots for posterior samples of CAR model parameters as generated by MH (top right plots) and HMC (bottom right plots). Posterior densities corresponding to these trace plots are included on the left. The data were simulated for a lattice with ρ = 0.95.

computation at this lattice size. For 20 × 20 lattices, HMC requires roughly 21 times longer for computation than MH. Kruskal-Wallis tests were performed to determine if there was evidence that computation time differed amongst simulations with different spatial correlation strengths, for each method and lattice dimension. There was evidence (p-value < 0.05) that differences occur in computation time amongst spatial correlation strengths for both methods at every lattice dimension except one. This was followed by a Wilcoxon rank-sum post hoc analysis to determine if there were any patterns to when these differences in computation time with spatial correlation strength occurred. There was evidence (p-value < 0.05 / (9 choose 2)) that significant differences in computation time occurred amongst some, but not all, spatial correlation strengths with MH and HMC, but without any obvious patterns. The Kruskal-Wallis test results are included in appendix A.1.1 in table A.1. The complete set of Wilcoxon rank-sum test results are included in appendix A.1.2, in tables A.2 and A.3. As there was evidence computation time differs amongst spatial correlation strengths, separate lines for each spatial correlation strength and method are presented in figure 2.4 to show the relationship of computation time and lattice dimension. For individual simulation runs, the relative computation time of HMC and MH is presented in figure 2.5. This figure shows that HMC requires more computation time for all grid sizes, and for all spatial correlation strengths. Stacked histograms allow for examination of whether relative computational time depends on the spatial correlation strength. Based on visual examination of these figures, relative compu-

tation time appears to be consistent across all spatial correlation strengths. Median computation time is reported in table 2.1 for each lattice dimension. In table 2.2, median computation time is reported for each combination of lattice dimension and ρ. Computation time is bold in these tables when it is significantly lower for MH, as determined by one-sided Wilcoxon rank-sum tests (p-value < 0.05). The results of these tests are found in appendix A.1.2 in tables A.8 and A.9.

Effective sample size

The main feature of HMC is its ability to produce samples with limited autocorrelation. The degree of autocorrelation present in an MCMC chain can be measured with ESS. Figure 2.6 shows the relative ESS of the spatial correlation strength parameter, ρ, for each individual simulation run. Again, the use of stacked histograms allows us to see how differences in ESS occur with respect to lattice dimension and true spatial correlation strength. Values greater than 1 in this figure indicate that HMC has produced more effective samples than MH for a given simulation run. We see that this is nearly always the case. The difference in ESS is commonly more than 7500, which can be seen in tables 2.1 and 2.2. There are instances, however, in which HMC and MH produce comparable ESS. This appears to be more likely when the spatial correlation strength has been set at extreme values (i.e. ρ = 0.95 and ρ = −0.95). A difference which is close to zero is not due to MH producing more independent samples in these situations, but due to poorer performance of HMC. It seems this performance issue with HMC is reduced as lattice dimensionality increases. Kruskal-Wallis tests were performed to detect whether ESS differed signifi-

Figure 2.3: Median computation time for HMC and MH with respect to lattice dimension (d × d) is shown. The 95th percentile range of computation time is indicated with whiskers.
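The Kruskal-Wallis and post hoc Wilcoxon rank-sum comparisons reported in this section can be reproduced in Python with scipy, as sketched below on made-up computation times for three spatial correlation strengths; the thesis's own test code appears in appendix A.2.4.

import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(0)
times = {                                      # made-up replicate run times per rho level
    -0.95: rng.normal(21.0, 0.5, 10),
    0.0: rng.normal(21.5, 0.5, 10),
    0.95: rng.normal(20.8, 0.5, 10),
}

# Kruskal-Wallis: is there evidence computation time differs amongst correlation strengths?
h_stat, p_value = stats.kruskal(*times.values())
print("Kruskal-Wallis: H = %.2f, p = %.3f" % (h_stat, p_value))

# Post hoc pairwise Wilcoxon rank-sum tests, with the significance threshold
# divided by the number of pairwise comparisons.
pairs = list(combinations(times, 2))
threshold = 0.05 / len(pairs)
for a, b in pairs:
    stat, p = stats.ranksums(times[a], times[b])
    flag = "significant" if p < threshold else "not significant"
    print("rho = %s vs rho = %s: p = %.3f (%s)" % (a, b, p, flag))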

Figure 2.4: Median computation time for HMC and MH with respect to lattice dimension (d × d) and spatial correlation strength (ρ) is shown.

Figure 2.5: The above histograms depict the relative computation time required for HMC in comparison to MH for each simulation run, by lattice dimension.

cantly between different spatial correlation strengths within a method and lattice size. Significant differences in ESS were detected amongst spatial correlation strengths within each lattice size and method. Results from these tests are in appendix A.1.1 in table A.1. The Kruskal-Wallis tests were followed by a Wilcoxon rank-sum post hoc test to determine if there were any patterns for when significant differences in ESS occurred. For HMC, ESS for simulations with extreme spatial correlation strengths (ρ = ±0.95) was commonly significantly different from that for every other spatial correlation strength. For a specific spatial correlation strength, there were no significant differences detected in ESS between negative and positive values (e.g. between ρ = −0.5 and ρ = 0.5). Complete results from the Wilcoxon rank-sum tests are found in appendix A.1.2 in tables A.4 and A.5. Median ESS is reported in table 2.1 for each lattice dimension. In table 2.2, median ESS is reported for each combination of lattice dimension and ρ. ESS is bold in these tables when it is significantly higher for HMC, as determined by one-sided Wilcoxon rank-sum tests (p-value < 0.05). The results of these tests are found in appendix A.1.2 in tables A.8 and A.9.

Effective samples per second

Combining the previously mentioned metrics, computation time and ESS, we arrive at effective samples per second. This metric speaks to the overall efficiency of the MCMC algorithms. Computation time and ESS mean very little when not put in the context of one another. Effective samples per second for both methods decreases as lattice dimension increases. This decrease occurs more rapidly for HMC

Figure 2.6: The above histograms depict the relative effective sample size for HMC in comparison to MH for each simulation run, by lattice dimension.

in comparison to MH, as shown in figure 2.8. For a 20 × 20 lattice, HMC and MH generate very similar effective samples per second. We use stacked histograms to visualize the relative effective samples per second of HMC in comparison to MH for each individual simulation run, in figure 2.7. If relative effective samples per second displayed a pattern with respect to spatial correlation strength, that would suggest that the relative efficiency of the methods is situational. In general, HMC is able to generate more effective samples per unit time in comparison to MH. There are exceptions, however. Since HMC consistently requires more computation time, when HMC and MH produce comparable ESS, the effective samples per second is greater for MH than HMC. In our simulation study, this occurs when extreme values have been selected for the spatial correlation strength. Kruskal-Wallis tests were used to determine if significant differences occurred amongst effective samples per second due to spatial correlation strength, within each method and lattice size. Significant differences were detected by these tests for every method and lattice dimension. The results of these tests can be found in appendix A.1.1 in table A.1. These tests were followed by a Wilcoxon rank-sum post hoc analysis to determine if there were any patterns to when differences in effective samples per second occurred amongst spatial correlation strengths. Significant differences were found to occur amongst some, but not all, spatial correlation strengths for both methods without any obvious patterns. The results of these tests can be found in appendix A.1.2 in tables A.6 and A.7. As there was evidence of differences amongst spatial correlation strengths for both methods, figure 2.9 presents the median effective samples per second for each spatial correlation strength and method as

a function of lattice dimension. Median effective samples per second is reported in table 2.1 for each lattice dimension. In table 2.2, median effective samples per second is reported for each combination of lattice dimension and ρ. Effective samples per second is bold in these tables when it was significantly higher for a specific method, as determined by two-sided Wilcoxon rank-sum tests (p-value < 0.05). The results of these tests are found in appendix A.1.2 in tables A.8 and A.9.

2.3 Discussion

The purpose of this research has been to investigate the hypothesis that HMC is unequivocally superior in efficiency to MH in performing inference for CAR models. Here, efficiency has been defined as the number of effective samples generated per second for each method, which has been measured for a variety of scenarios through simulation. If this hypothesis were true, we would predict that HMC would have more effective samples per second for each of these scenarios. This was found to be the case the majority of the time, but with some interesting exceptions. HMC dropped in performance when regions had very strong positive or negative correlation with their neighbours. The true values in these cases, 0.95 and -0.95, approach the boundaries of the uniform prior set on the spatial correlation strength, 1 and -1. It seems natural that the effective sample size of an MCMC chain for a parameter, when its true value is near a hard boundary, may be lower than when this is not the case. This is due to the fact that any value proposed beyond this boundary will have an acceptance probability

Figure 2.7: The above histograms depict the difference in effective samples per second between HMC and MH methods for each simulation run, by grid size.

Figure 2.8: Median effective samples per second for HMC and MH with respect to lattice dimension (d × d) is shown. The 95th percentile range of effective samples per second is indicated with whiskers.

Figure 2.9: Median effective samples per second for HMC and MH with respect to lattice dimension (d × d) and spatial correlation strength (ρ) are shown.

of zero, and proposals in this region will be more common when near high density areas of the posterior distribution. Additionally, the gradient calculation involved in the generation of HMC proposals is not able to guide an MCMC chain away from the boundaries of a uniform prior. Pakman and Paninski (2013) have presented an interesting HMC-based MCMC algorithm to deal with truncated distributions, such as the one required for CAR models. In their algorithm, they account for boundaries by having particles (posterior samples) bounce off of them. When this occurs, an inversion of velocity occurs, which is shown to continue to satisfy the conditions of Hamiltonian dynamics. An approach such as this could mitigate the efficiency issues observed at boundaries in this simulation study. A second interesting result was that the efficiency advantages of HMC diminished as the number of regions for which data were simulated increased. For a 20 × 20 lattice, the efficiency differences between HMC and MH were slight. If this trend continued, for higher spatial resolutions it may actually be advantageous to use MH over HMC. The high computational cost of HMC is related to the required gradient calculation of the potential energy function. As the dimensionality of a model increases, so does the cost of computing this gradient. For MH, the cost of increased dimensionality is due to a higher rejection rate, and more complex posterior density calculations, something which HMC is also subject to. One option to lessen the burden of gradient calculations is to use a stochastic gradient approach with HMC. Chen et al. (2014) have implemented stochastic gradient HMC. In order to account for the noise generated from the use of stochastic gradients, they found that a friction term was necessary to maintain Hamiltonian dynamics. This friction term

itself is based on second-order Langevin dynamics. Chen et al. (2014) conclude that stochastic gradient HMC, with their simple friction term, presents a promising avenue for scaling HMC for practical use with high dimensional Bayesian models. Beyond the two HMC variants briefly described here, many others have been developed that seek to improve the efficiency and usability of HMC. Figure 2.10 lists some of these developments. Excitingly, there are opportunities for some of these new HMC-based methods to borrow from one another. The popularity of HMC for performing Bayesian inference will continue to increase as these advances are made.

Figure 2.10: Some of the recent HMC-based MCMC methods which have been developed: Hamiltonian (Duane et al., 1987; Neal, 1993); Stochastic Gradient (Chen et al., 2014); Exact (Pakman and Paninski, 2013); Advanced (Beskos et al., 2013); Split (Lan and Shahbaba, 2012); Rasmussen (Fielding and Liong, 2011); Parallel Tempering (Fielding and Liong, 2011); NUTS (Hoffman and Gelman, 2011); Riemann Manifold (Girolami and Calderhead, 2011); Adaptive (Wang et al., 2013).

d     Time HMC            Time MH              ESS HMC            ESS MH             ES/sec HMC        ES/sec MH
5     — (20, 25.7)        3.1 (2.9, 3.2)       — (7.4, 10000)     — (86.3, 817.7)    388 (0.4, 481)    76.6 (28.1, 282.3)
10    — (50.9, 62.1)      5.4 (5.3, 6.4)       — (1.9, 10000)     — (64, 862.6)      — (0, 196.3)      31.1 (11.7, 159.9)
15    — (232.1, 267.6)    14.8 (14.4, 17.5)    — (16.4, 10000)    — (47, 879.9)      39.9 (0.1, 42.9)  11.6 (3.2, 58.8)
20    — (1051.9, —)       55.2 (50.1, 151.7)   — (48.3, 10000)    — (27.6, 879.4)    8.3 (0, 9.5)      2.7 (0.5, 16.4)

Table 2.1: Median values of the metrics considered in this simulation study are presented by lattice dimension (d) and method, over all spatial correlation strengths. The range of each metric is included in brackets. Bold values indicate significantly better performance of a method for a specific metric. Relatively lower computation time, higher ESS, and higher effective samples per second are preferred in performance.

Table 2.2: Median values of the metrics considered in this simulation study (computation time, ESS, and effective samples per second) are presented by lattice dimension (d), spatial correlation strength (ρ), and method. The range of each metric is included in brackets. Bold values indicate significantly better performance of a method for a specific metric.

Chapter 3
Application

3.1 Methodology

Data Description

A CAR model was used to detect the spatial structure of, and provide spatially smoothed estimates of, catch per unit effort (CPUE) for the lake whitefish (Coregonus clupeaformis) fishery of the North Channel of Lake Huron. CPUE is considered a measure of relative abundance. The abundance of lake whitefish may vary spatially according to the degree to which local environmental conditions are in line with the species' habitat preferences. Local abundance may also be affected by commercial harvest intensity, and aggregation and dispersion behaviours of lake whitefish. It is likely that several more such spatial processes exist, combining to result in the spatial correlation of lake whitefish abundance. The raw CPUE data were calculated as the total harvest of lake whitefish in round kilograms divided by the total effort in terms of meters of gillnet for each actively fished 5 minute × 5 minute grid in the North Channel of Lake Huron. Only gillnet harvest was considered for simplicity, as it accounted for the vast majority of

commercial harvest (> 98% of all harvest). Harvest and effort were totalled across 34 years of commercial fisheries data (1979-2012 inclusive) for the calculation of CPUE. In total, 85 5 minute × 5 minute grid cells were considered for the CAR model. In other words, the harvest weight and effort from each gillnet harvest event (h) were aggregated over all years for each grid cell (j), and CPUE was calculated as

CPUE_j = ( Σ_i Σ_{h ∈ j} Harvest_h ) / ( Σ_i Σ_{h ∈ j} Effort_h ), for i = 1979, ..., 2012, and j = 1, ..., 85.

Analysis

A simple CAR model was assumed for the CPUE data, where

y = β_0 + u,

where y is a vector of CPUE observations for grid cells 1, ..., 85, β_0 is an intercept term, and u are the spatial random effects as described by the CAR model. The following priors were assumed for the model:

β_0 ~ Normal(μ = 0, σ² = 1),
u ~ MvNormal(μ = 0, Σ = (τ(D − ρW))⁻¹),
τ ~ Gamma(α = 1, β = 4),
ρ ~ Uniform(A = −1, B = 1).

The normal prior for β_0 allows for a positive or negative intercept. The use of a normal prior here is standard for Bayesian regression. The gamma prior on τ ensures positivity, which is necessary for its use in the covariance matrix associated with the CAR model. The uniform prior on ρ is uninformative, and allows for positive or negative correlation amongst the regions. Similar to the simulation study, samples from the joint posterior distribution were generated with MH and HMC, with sampling initiated at the MAP point. Time required for computation and the effective sample size were calculated for each MCMC method.
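For illustration, the model above can be expressed in PyMC3-style syntax as sketched below. The placeholder neighbour structure (a simple chain adjacency) and zeroed CPUE vector exist only so that the sketch is self-contained, the precision parameterization of MvNormal and the sampler calls reflect the PyMC3 API rather than the early PyMC 3.0 release cited in this thesis, and the thesis's actual application code appears in appendix A.2.2.

import numpy as np
import pymc3 as pm

# Placeholder data: in the application, W, D, and y come from the 85 grid cells
# of aggregated fisheries records. A chain adjacency is assumed here only so
# that the example runs end to end.
n = 85
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1
D = np.diag(W.sum(axis=1))
y = np.zeros(n)

with pm.Model() as cpue_model:
    rho = pm.Uniform("rho", lower=-1, upper=1)
    tau = pm.Gamma("tau", alpha=1, beta=4)
    beta0 = pm.Normal("beta0", mu=0, sd=1)
    # y = beta0 + u with u ~ CAR(tau, rho, W): equivalently, y is multivariate
    # normal with mean beta0 and precision tau * (D - rho * W).
    precision = tau * (D - rho * W)
    pm.MvNormal("y_obs", mu=beta0 * np.ones(n), tau=precision, observed=y)

    start = pm.find_MAP()                      # initialize both samplers at the MAP point
    mh_trace = pm.sample(10000, step=pm.Metropolis(), start=start, chains=1)
    hmc_trace = pm.sample(10000, step=pm.HamiltonianMC(), start=start, chains=1)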

3.2 Results

Trace plots

Trace plots associated with the fitting of the Lake Huron lake whitefish CPUE data to a CAR model, using both HMC and MH, can be found in figure 3.1. Substantial differences can be seen in the behaviour of these trace plots. MH seems to have performed especially poorly in the production of region estimates in comparison to HMC. HMC appears to produce samples for ρ and τ with very low autocorrelation. There seems to be slightly more autocorrelation in the samples generated by MH for these parameters, but their trace plots still show reasonable mixing.

Performance metrics

MH required 6.2 seconds for its iterations, whereas HMC required substantially longer. In these times, MH produced effective sample sizes of 109.9, 612.0, and 2.1 for ρ, τ, and β_0, respectively. For ρ, HMC produced an effective sample size of 538.9. In terms of effective samples per second, MH was found to have rates of 17.72, 98.70, and 0.36 for ρ, τ, and β_0, in comparison to HMC, which was found to have rates of 4.23, 48.67, and 2.15, respectively. These results are also presented in table 3.1. These results differ from the relative efficiency that the simulation study has led us to expect. In comparison, in the simulation study, the median effective samples per second for ρ for a 10 × 10 lattice was 31.1 for MH, with a higher median for HMC. The irregular spatial structure of the CPUE data, the differences in spatial correlation strengths, and/or

Figure 3.1: Comparison of trace plots and posterior densities resulting from MH (left) and HMC (right) sampling methods.

       ESS MH    ESS HMC    ESS/sec MH    ESS/sec HMC
ρ      109.9     538.9      17.72         4.23
τ      612.0     —          98.70         48.67
β_0    2.1       —          0.36          2.15

Table 3.1: ESS and effective samples per second (ESS/sec) for each parameter of the CPUE CAR model from MH and HMC.

the use of the CAR model in a predictive capacity in the application may explain these differences.

Model predictions

In figure 3.2 the observed CPUE data are compared side by side with the predicted data from the CAR model using MH and HMC. The relative CPUE as predicted with HMC in comparison to the observed CPUE is shown in figure 3.3. The largest differences occur where very high or very low values of CPUE had been observed for a region.

3.3 Discussion

While fitting the CAR model for Lake Huron lake whitefish relative abundance, it was observed that the shape of an area has implications for the CAR model. Indeed, it is noted by Wall (2004) that the original use of the CAR model was for doubly infinite regular lattices, and when it is applied to finite, irregular lattices, the implied spatial correlations are not well understood. In general, Wall (2004) showed that the implied spatial correlations of CAR models for irregular lattices are unin-

Figure 3.2: Observed CPUE (left), beside CPUE as predicted by the CAR model from MH (center) and HMC (right) sampling.

Figure 3.3: A plot of the observed CPUE relative to CPUE as predicted by the CAR model using HMC sampling. Brighter colours (e.g. yellow vs. red) indicate that the predicted CPUE is relatively higher in comparison to the observed CPUE.

tuitive. This implies that when there is an emphasis on understanding the spatial structure of data, rather than other model coefficients, alternative methods should be used. As HMC is particularly effective in exploring correlated parameter spaces, it excels when used in a predictive capacity for CAR models. That is, when measures from spatially correlated regions are treated as model parameters, as CPUE is here, HMC is able to effectively explore that parameter space. The trace plots in figure 3.1 show poor mixing of the regional estimates for MH, and ideal mixing of these same estimates when using HMC. We can represent the performance of MH and HMC for generating regional estimates with a single parameter, β_0. Over the same number of iterations, MH had an effective sample size of only 2.1 for β_0, where HMC generated far more effective samples. The consequences of MH's poor mixing can be seen in the regional estimates in figure 3.2. In this figure it is clear that the regional estimates produced by MH have excessive noise in comparison to those produced by HMC. In order to use MH for this type of application, it would need to be run for many more iterations.

Chapter 4
Conclusions

CAR models are a popular choice for spatially correlated areal data. When these data are high resolution, inference on the parameters of the CAR model becomes increasingly computationally intensive. In these situations, the relative efficiency of MCMC methods becomes of critical importance. Our simulation study compares two broad categories of MCMC methods, MH and HMC. No research has previously been conducted on the relative merits of these different MCMC methods for specific use with CAR models. Our simulation study suggested that HMC is generally the preferred MCMC method for these types of models. HMC was more efficient in the majority of simulation runs, but less so under extreme scenarios, and with a declining margin for increasing resolutions. However, two HMC-based algorithms recently described by Chen et al. (2014) and Pakman and Paninski (2013) present promising strategies for each of these concerns. The application that accompanied this simulation study found that MH had greater efficiency in performing inference for the CAR model parameters, τ and ρ, in comparison to HMC. However, in predicting regional CPUE, HMC outperformed MH. This raises further questions regarding the impact of fitting CAR models to irregular lattices on computational efficiency. In their standard forms, we cannot say that HMC is unequivocally superior in efficiency to MH in performing inference for CAR models.


More information

Bayesian Linear Models

Bayesian Linear Models Bayesian Linear Models Sudipto Banerjee September 03 05, 2017 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles Linear Regression Linear regression is,

More information

Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm Strength of the Gibbs sampler Metropolis-Hastings Algorithm Easy algorithm to think about. Exploits the factorization properties of the joint probability distribution. No difficult choices to be made to

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Delayed Rejection Algorithm to Estimate Bayesian Social Networks

Delayed Rejection Algorithm to Estimate Bayesian Social Networks Dublin Institute of Technology ARROW@DIT Articles School of Mathematics 2014 Delayed Rejection Algorithm to Estimate Bayesian Social Networks Alberto Caimo Dublin Institute of Technology, alberto.caimo@dit.ie

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions

A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2014 A Level-Set Hit-And-Run Sampler for Quasi- Concave Distributions Shane T. Jensen University of Pennsylvania Dean

More information

MCMC Methods: Gibbs and Metropolis

MCMC Methods: Gibbs and Metropolis MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution

More information

arxiv: v4 [stat.co] 4 May 2016

arxiv: v4 [stat.co] 4 May 2016 Hamiltonian Monte Carlo Acceleration using Surrogate Functions with Random Bases arxiv:56.5555v4 [stat.co] 4 May 26 Cheng Zhang Department of Mathematics University of California, Irvine Irvine, CA 92697

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Bayesian Phylogenetics:

Bayesian Phylogenetics: Bayesian Phylogenetics: an introduction Marc A. Suchard msuchard@ucla.edu UCLA Who is this man? How sure are you? The one true tree? Methods we ve learned so far try to find a single tree that best describes

More information

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case

Areal data models. Spatial smoothers. Brook s Lemma and Gibbs distribution. CAR models Gaussian case Non-Gaussian case Areal data models Spatial smoothers Brook s Lemma and Gibbs distribution CAR models Gaussian case Non-Gaussian case SAR models Gaussian case Non-Gaussian case CAR vs. SAR STAR models Inference for areal

More information

The Metropolis-Hastings Algorithm. June 8, 2012

The Metropolis-Hastings Algorithm. June 8, 2012 The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Creating Non-Gaussian Processes from Gaussian Processes by the Log-Sum-Exp Approach. Radford M. Neal, 28 February 2005

Creating Non-Gaussian Processes from Gaussian Processes by the Log-Sum-Exp Approach. Radford M. Neal, 28 February 2005 Creating Non-Gaussian Processes from Gaussian Processes by the Log-Sum-Exp Approach Radford M. Neal, 28 February 2005 A Very Brief Review of Gaussian Processes A Gaussian process is a distribution over

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

Web Appendices: Hierarchical and Joint Site-Edge Methods for Medicare Hospice Service Region Boundary Analysis

Web Appendices: Hierarchical and Joint Site-Edge Methods for Medicare Hospice Service Region Boundary Analysis Web Appendices: Hierarchical and Joint Site-Edge Methods for Medicare Hospice Service Region Boundary Analysis Haijun Ma, Bradley P. Carlin and Sudipto Banerjee December 8, 2008 Web Appendix A: Selecting

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

ST 740: Markov Chain Monte Carlo

ST 740: Markov Chain Monte Carlo ST 740: Markov Chain Monte Carlo Alyson Wilson Department of Statistics North Carolina State University October 14, 2012 A. Wilson (NCSU Stsatistics) MCMC October 14, 2012 1 / 20 Convergence Diagnostics:

More information

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina

Local Likelihood Bayesian Cluster Modeling for small area health data. Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modeling for small area health data Andrew Lawson Arnold School of Public Health University of South Carolina Local Likelihood Bayesian Cluster Modelling for Small Area

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Bayesian Phylogenetic Analysis COMP 571 - Spring 2015 Luay Nakhleh, Rice University Bayes Rule P(X = x Y = y) = P(X = x, Y = y) P(Y = y) = P(X = x)p(y = y X = x) P x P(X = x 0 )P(Y = y X

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

A Statistical Input Pruning Method for Artificial Neural Networks Used in Environmental Modelling

A Statistical Input Pruning Method for Artificial Neural Networks Used in Environmental Modelling A Statistical Input Pruning Method for Artificial Neural Networks Used in Environmental Modelling G. B. Kingston, H. R. Maier and M. F. Lambert Centre for Applied Modelling in Water Engineering, School

More information

MIT /30 Gelman, Carpenter, Hoffman, Guo, Goodrich, Lee,... Stan for Bayesian data analysis

MIT /30 Gelman, Carpenter, Hoffman, Guo, Goodrich, Lee,... Stan for Bayesian data analysis MIT 1985 1/30 Stan: a program for Bayesian data analysis with complex models Andrew Gelman, Bob Carpenter, and Matt Hoffman, Jiqiang Guo, Ben Goodrich, and Daniel Lee Department of Statistics, Columbia

More information

Calibrating Environmental Engineering Models and Uncertainty Analysis

Calibrating Environmental Engineering Models and Uncertainty Analysis Models and Cornell University Oct 14, 2008 Project Team Christine Shoemaker, co-pi, Professor of Civil and works in applied optimization, co-pi Nikolai Blizniouk, PhD student in Operations Research now

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Machine Learning. Probabilistic KNN.

Machine Learning. Probabilistic KNN. Machine Learning. Mark Girolami girolami@dcs.gla.ac.uk Department of Computing Science University of Glasgow June 21, 2007 p. 1/3 KNN is a remarkably simple algorithm with proven error-rates June 21, 2007

More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL

PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL Xuebin Zheng Supervisor: Associate Professor Josef Dick Co-Supervisor: Dr. David Gunawan School of Mathematics

More information

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017 Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Convergence Diagnostics For Markov chain Monte Carlo. Eric B. Ford (Penn State) Bayesian Computing for Astronomical Data Analysis June 9, 2017

Convergence Diagnostics For Markov chain Monte Carlo. Eric B. Ford (Penn State) Bayesian Computing for Astronomical Data Analysis June 9, 2017 Convergence Diagnostics For Markov chain Monte Carlo Eric B. Ford (Penn State) Bayesian Computing for Astronomical Data Analysis June 9, 2017 MCMC: A Science & an Art Science: If your algorithm is designed

More information

Infinite Mixtures of Gaussian Process Experts

Infinite Mixtures of Gaussian Process Experts in Advances in Neural Information Processing Systems 14, MIT Press (22). Infinite Mixtures of Gaussian Process Experts Carl Edward Rasmussen and Zoubin Ghahramani Gatsby Computational Neuroscience Unit

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

Quasi-Newton Methods for Markov Chain Monte Carlo

Quasi-Newton Methods for Markov Chain Monte Carlo Quasi-Newton Methods for Markov Chain Monte Carlo Yichuan Zhang and Charles Sutton School of Informatics University of Edinburgh Y.Zhang-60@sms.ed.ac.uk, csutton@inf.ed.ac.uk Abstract The performance of

More information

Bayesian philosophy Bayesian computation Bayesian software. Bayesian Statistics. Petter Mostad. Chalmers. April 6, 2017

Bayesian philosophy Bayesian computation Bayesian software. Bayesian Statistics. Petter Mostad. Chalmers. April 6, 2017 Chalmers April 6, 2017 Bayesian philosophy Bayesian philosophy Bayesian statistics versus classical statistics: War or co-existence? Classical statistics: Models have variables and parameters; these are

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31

Hierarchical models. Dr. Jarad Niemi. August 31, Iowa State University. Jarad Niemi (Iowa State) Hierarchical models August 31, / 31 Hierarchical models Dr. Jarad Niemi Iowa State University August 31, 2017 Jarad Niemi (Iowa State) Hierarchical models August 31, 2017 1 / 31 Normal hierarchical model Let Y ig N(θ g, σ 2 ) for i = 1,...,

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

Spatial Statistics with Image Analysis. Outline. A Statistical Approach. Johan Lindström 1. Lund October 6, 2016

Spatial Statistics with Image Analysis. Outline. A Statistical Approach. Johan Lindström 1. Lund October 6, 2016 Spatial Statistics Spatial Examples More Spatial Statistics with Image Analysis Johan Lindström 1 1 Mathematical Statistics Centre for Mathematical Sciences Lund University Lund October 6, 2016 Johan Lindström

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Nonparametric Bayesian Methods (Gaussian Processes)

Nonparametric Bayesian Methods (Gaussian Processes) [70240413 Statistical Machine Learning, Spring, 2015] Nonparametric Bayesian Methods (Gaussian Processes) Jun Zhu dcszj@mail.tsinghua.edu.cn http://bigml.cs.tsinghua.edu.cn/~jun State Key Lab of Intelligent

More information

Large Scale Bayesian Inference

Large Scale Bayesian Inference Large Scale Bayesian I in Cosmology Jens Jasche Garching, 11 September 2012 Introduction Cosmography 3D density and velocity fields Power-spectra, bi-spectra Dark Energy, Dark Matter, Gravity Cosmological

More information

Efficiency and Reliability of Bayesian Calibration of Energy Supply System Models

Efficiency and Reliability of Bayesian Calibration of Energy Supply System Models Efficiency and Reliability of Bayesian Calibration of Energy Supply System Models Kathrin Menberg 1,2, Yeonsook Heo 2, Ruchi Choudhary 1 1 University of Cambridge, Department of Engineering, Cambridge,

More information

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods arxiv:1705.08510v3 [stat.co] 7 Sep 2018 Akihiko Nishimura Department of Biomathematics, University of California

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

Bayesian Areal Wombling for Geographic Boundary Analysis

Bayesian Areal Wombling for Geographic Boundary Analysis Bayesian Areal Wombling for Geographic Boundary Analysis Haolan Lu, Haijun Ma, and Bradley P. Carlin haolanl@biostat.umn.edu, haijunma@biostat.umn.edu, and brad@biostat.umn.edu Division of Biostatistics

More information

Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters

Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters Journal of Modern Applied Statistical Methods Volume 13 Issue 1 Article 26 5-1-2014 Comparison of Three Calculation Methods for a Bayesian Inference of Two Poisson Parameters Yohei Kawasaki Tokyo University

More information