An Application of Bayesian Melding to Ecological Networks. Joshua Michael Gould. A research paper presented to the University of Waterloo.


An Application of Bayesian Melding to Ecological Networks

by Joshua Michael Gould

A research paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master of Mathematics in Statistics-Biostatistics

Waterloo, Ontario, Canada, 2008

Abstract

This paper considers Ecological Network Analysis (ENA) data (Ulanowicz, 2004) in the context of statistical inference. Background for this data type is provided, and a framework for statistical inference based on such data is developed. This framework comprises a method referred to as Bayesian melding, developed by Poole & Raftery (2000) to combine prior information with information induced by deterministic dynamics models to arrive at a melded Bayesian prior. We describe and illustrate this method in the context of ENA data; the method incorporates the melded prior into Bayesian inference and provides a statistical interpretation for a deterministic mass balance model, with reference to quantities for inference. For posterior inference, we further require iterative Gibbs sampling incorporating the Metropolis-Hastings algorithm, a type of Markov chain Monte Carlo simulation. We describe this sampling algorithm and implement it for two ENA datasets, observed from Cone Spring (Tilly, 1968) and the Chesapeake Bay mesohaline network (Baird & Ulanowicz, 1989). In each case, the ENA data quantify flows of energy or materials among different compartments of one or more species within a local ecosystem. The implementations described here consider the deterministic model wherein the expected medium dissipated from an ecosystem equals the difference between the expected medium into it and the expected medium out of it.

Acknowledgements

My most sincere thanks go to Dr. Grace Chiu for her detailed suggestions, advice, and support, both financial and organizational. I am also grateful for the resources and support provided by the Department of Statistics & Actuarial Science. Thanks also to Dr. Hugh Chipman for serving as second reader, and to Mary Lou Dufton for answering my many questions and helping with some last-minute delays. To my friends and family: I could not have done this without your support. Thank you.

Table of Contents

Abstract
Acknowledgements
Table of Contents
1 Introduction
  1.1 Inference for Ecological Networks
    1.1.1 Assumptions for Statistical Inference
  1.2 Cone Spring Example
    1.2.1 Results
2 Methods
  2.1 Bayesian Melding
    2.1.1 Notation and Method
    2.1.2 Metropolis-Hastings Algorithm
  2.2 Melding Example
3 Modelling and Implementation
  3.1 Prior Specification
  3.2 Sampling Algorithm
    3.2.1 Melding
    3.2.2 Gibbs Sampler
    3.2.3 Metropolis-Hastings
    3.2.4 Remarks
  3.3 Pseudo Code
    3.3.1 Prior Specification
    3.3.2 Melding
    3.3.3 Gibbs Sampler
  3.4 Results
4 Conclusions
  4.1 Future Work
Bibliography
Appendix

1 Introduction

At the level of a whole ecosystem, the research approach known as ecological network analysis (ENA) considers the transfer and dissipation of material and energy among and from the species comprising the system (Ulanowicz, 2004). Such trophic exchanges pertain to physical and chemical variables (e.g. nitrogen content, energy, etc.) exchanged among different organisms or groups of organisms in an ecosystem (i.e. different trophic levels), as in what is commonly known as a food chain. The corresponding data provide information about exogenous inputs to and outputs from the system for each species, as well as transfers to and from each species. A crucial component of this approach is the use of a deterministic linear balance model; it assumes that the interdependence of units (i.e. species or groups of species) in the system conserves mass or energy. In such a balance model, all inputs to a unit exactly balance all outputs, an assumption derived from thermodynamic theory. Note further that ENA often refers to such units as compartments, each defining a trophic level and in turn consisting of one or more species. For example, data for a given ecosystem might include two compartments, one each for all plants and for all animals, or the data might include a separate compartment for each species. However, ENA research has not yet incorporated stochastic modelling to any great degree, and the balance model approach itself lacks a statistical framework for assessing its suitability. In this paper, we consider the deterministic balance model from the standpoint of statistical inference. The approach employed here is Bayesian melding (Poole & Raftery, 2000), a method for proper statistical inference for deterministic dynamics models. In Bayesian melding, different sources of prior information are pooled in a sound probabilistic manner, taking into account not only prior knowledge about model inputs and outputs,

but also the impact of the model itself. This melding of premodel prior information and model-induced information gives rise to a melded prior distribution, which can then be employed as a usual Bayesian prior to arrive at posterior distributions for model inputs and outputs. Since Bayesian melding often yields priors for which analytic forms are unavailable, posteriors must be evaluated numerically. This entails the use of computationally intensive techniques such as Markov chain Monte Carlo (MCMC) to sample from posteriors. To demonstrate this, we present the results of applying Bayesian melding to a simple toy example and to two ENA datasets. The remainder of this chapter provides further background concerning ecological networks, the statistical approach taken, and an example of Bayesian melding for the simple ecosystem of Cone Spring (Tilly, 1968). Chapter 2 describes Bayesian melding in greater detail, along with the necessary computational techniques and a simple toy example. In Chapter 3, the results of melding implemented for data from the Chesapeake Bay mesohaline ecosystem (Baird & Ulanowicz, 1989) are presented, and the sampling algorithm is described in detail along with the computational methods employed. Note that all results were obtained using the R statistical computing package. Lastly, some general conclusions are provided in Chapter 4, along with prospects for future work.

1.1 Inference for Ecological Networks

As noted previously, the ENA approach does not incorporate its underlying assumptions in a statistically rigorous manner. In particular, conclusions made for a network depend on the balance model that equates the inflow and outflow of a certain medium for each species (or, more generally, each compartment) relative to others. A simple version of this physical model, taken from Ulanowicz (2004), assumes that in a given ecosystem the medium in balances the medium out exactly.
So, for species i in an ecosystem with n species, the physical balance model is as follows:

    X_i + T_{+i} = T_{i+} + E_i + R_i,    i = 1, ..., n    (1.1)

where X_i and E_i are the rates of exogenous transfer to and from the species, respectively, and T_{+i} and T_{i+} are the rates of transfer from other species to species i and from species i to other species, respectively. Finally, R_i denotes the rate of dissipation from species i. Note that exogenous transfers to a species, X_i, represent external inputs to the system, whereas exogenous transfers from a species, E_i, denote external outputs from the system which are still useful to other ecosystems of comparable scale. In contrast, dissipations R_i represent outputs from each species that are no longer useful, such as heat dissipated during respiration (Ulanowicz, 2004). In the context of energy, then, the balance model (1.1) maintains the law of conservation of energy, with usable energy inputs to the system exactly balancing all outputs from the system, consisting of both usable outputs and that which is lost. Necessarily, then, all terms in the model are assumed to be non-negative. It should be noted, however, that the thermodynamic balance of model (1.1) is not typically observed in practice. Instead, only a subset of the variables in a model such as (1.1) are observed ("input variables"), with the remaining "output variables" then deduced or estimated by a balancing algorithm (see Ulanowicz, 2004) which attempts to solve the system of equations in (1.1) to achieve balance for each species and over all important media (energy, nutrients, biomass, etc.). This balance assumption is illustrated with the small five-compartment dataset from Cone Spring (Tilly, 1968), where we have the following observed data:

    X = (11184, 0, 0, 0, 635)    E = (300, 355, 0, 0, 860)    (1.2a)

    T = [5 x 5 matrix of intercompartmental transfers T_ij; entries not reproduced]    (1.2b)

Applying the physical balance model (1.1) yields the deduced or unobserved data

    R = (2003, 3275, 1814, 203, 3109)

In this case, no special algorithm is required to solve for R. When some values of X_i, E_i, T_ij, and R_i are unobserved for several (i, j) combinations simultaneously, a special algorithm will be required. For the purposes of interpreting the data presented in this essay, we consider R_i to be unobserved, with the other values observed. The use of such algorithms to achieve balance, however, ignores sources of variability in the input such as measurement error and the use of multiple reference sources. Although such uncertainty has been subjected to sensitivity analyses via perturbation of input values (Bundy, 2005; Essington, 2007), the resulting confidence intervals for model output variables do not address statistical inference, and so do not necessarily pertain to the ecosystems under study.

1.1.1 Assumptions for Statistical Inference

Ultimately, the assumption of linear balance in models such as Ulanowicz's remains a concern for the ENA approach (Dame & Christian, 2006). Setting aside this concern, for our statistical approach we first consider each of the terms in (1.1) as random variables. Rather than assuming balance for each species i, our statistical model assumes that the average medium in balances the average medium out. That is,

    α + β_in = β_out + ε + φ
    E(X_i) + E(T_{+i}) = E(T_{i+}) + E(E_i) + E(R_i)    (1.3)

where α, β_in, β_out, ε, and φ denote the expectations of X_i, T_{+i}, T_{i+}, E_i, and R_i, respectively. This statistical version of the balance model takes into account the fact that certain quantities are unmeasurable or only indirectly observable. To illustrate our statistical approach, we consider that the dissipation from the system, R_i, is unmeasurable and must be deduced from balance.
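The deduction of R is a one-line computation per compartment; the following minimal Python sketch (the paper's own computations were done in R) uses the plant compartment of Cone Spring, for which X_1 = 11184 and E_1 = 300 appear in (1.2), and the transfer totals T_{+1} = 0, T_{1+} = 8881 are quoted in Section 1.2:

```python
def dissipation(x, t_in, t_out, e):
    # Solve the balance model (1.1) for the dissipation:
    #   R_i = X_i + T_{+i} - (T_{i+} + E_i)
    r = x + t_in - (t_out + e)
    if r < 0:
        raise ValueError("balance model (1.1) requires non-negative dissipation")
    return r

# Compartment 1 (plants) of Cone Spring, in kcal/m^2/y
r1 = dissipation(11184, 0, 8881, 300)
print(r1)  # 2003, the first entry of the deduced vector R
```

The non-negativity check mirrors the thermodynamic assumption that every term in (1.1) is non-negative.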
Since such deduced (output) values are not actual data, inference made on observed (input) parameters does not translate directly to inference on output

parameters. That is, our statistical model does not assume that X_i + T_{+i} = T_{i+} + E_i + R_i as in (1.1) or, equivalently, that R_i = X_i + T_{+i} - (T_{i+} + E_i). Instead, it assumes (1.3), so that inference made on X_i, T_{+i}, T_{i+}, and E_i does not directly correspond to inference on R_i. In Bayesian melding (Poole & Raftery, 2000), we consider input and output variables as random, with means θ and φ for inputs and outputs, respectively. The model M(·) defines the deterministic relationship between these parameters, with M(θ) = φ. Now, under a conventional empirical approach that ignores M(·), one would provide a premodel prior distribution for (θ, φ) and a set of likelihoods for input/output data specified according to identified properties. However, to incorporate the deterministic model M with the entirely empirical approach, the premodel prior for θ is now mapped by M onto an induced prior for φ. The premodel and induced priors for φ are then melded, yielding a melded prior for φ, which is then mapped to a melded prior for θ using M^(-1), the model inverse function. Finally, Bayes' rule is applied to obtain a posterior for θ, which is subsequently mapped to φ by M to yield a posterior for φ. Inference is then made from these posteriors. In the work of Poole & Raftery (2000), Bayesian melding has been applied in the case of deterministic dynamics models in which a single equation evolves over time. For ENA, multiple linear equations describe the state of an ecosystem at a single point in time. Hence, by assuming balance for expectations as in (1.3), we combine these multiple equations into one. This essay features our first attempt at employing Bayesian melding in such a context, involving the following statistical model, simplified from (1.3):

    θ_1 = α + β_in
    θ_2 = ε + β_out
    φ = M(θ) = θ_1 - θ_2    (1.4)

where θ = (θ_1, θ_2) denotes the inputs and φ the output.
Correspondingly, we write the totals of all transfers to and from a species i as single random variables, denoted W_i = X_i + T_{+i} and U_i = T_{i+} + E_i. Thus the model can be written equivalently as M(θ) = θ_1 - θ_2 = φ. Note that this model M is not invertible and that, for simplicity, all terms in (1.3) are assumed to be strictly positive. (Such an assumption is realistic

for most major media.) As we will discuss in Sections 2.1 and 3.2, we employ a technique proposed by Poole & Raftery (2000) to handle non-invertibility when computing the melded prior for θ. We now consider a simple ENA example in the context of our model (1.4) and Bayesian melding.

1.2 Cone Spring Example

To illustrate the application of Bayesian melding to a simple ecological network, we consider the five-unit ecosystem of Cone Spring (Tilly, 1968). As shown in Figure 1.1, the units (or compartments) correspond not to individual species but to groups of different types of organisms, each occupying a different ecological niche: plants, detritus, bacteria, detritivores, and carnivores. In this case, the network consists of energy flows among the compartments and into and out of the system, measured in kcal/m^2/y. In the flow diagram of Figure 1.1, exogenous inputs X_i are represented by arrows not emerging from any other compartment, whereas arrows not entering a compartment represent exogenous outputs E_i. Dissipative respirations R_i are represented by ground symbols. All remaining arrows denote transfers among the n = 5 compartments, with T_{+i} = Σ_{j=1}^n T_ji denoting the sum of all energy transfers to a compartment and T_{i+} = Σ_{j=1}^n T_ij the sum of all energy transfers out of a compartment. As noted previously, we simplify these variables by writing W_i = X_i + T_{+i} and U_i = T_{i+} + E_i, which denote all inputs to and outputs from the ith compartment, respectively. For example, for compartment 1, representing plants in the Cone Spring ecosystem, we have exogenous inputs X_1 = 11184 kcal/m^2/y, with transfers from other compartments T_{+1} = 0, yielding W_1 = 11184. Additionally, exogenous outputs E_1 = 300, with transfers to other compartments T_{1+} = 8881, yielding U_1 = 9181. Finally, dissipations from plants are given by R_1 = 2003. We are interested in making inference on the parameters of model (1.4).
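The compartment-1 bookkeeping just described is easy to verify; a small Python sketch (illustrative only; the paper's analyses were run in R):

```python
# Compartment 1 (plants) of Cone Spring, in kcal/m^2/y
X1, T_in1 = 11184, 0     # exogenous input; transfers from other compartments
T_out1, E1 = 8881, 300   # transfers to other compartments; exogenous output

W1 = X1 + T_in1          # all inputs to compartment 1
U1 = T_out1 + E1         # all (usable) outputs from compartment 1

print(W1, U1)            # 11184 9181
print(W1 - U1)           # 2003, which equals R_1, consistent with balance (1.1)
```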
In the Bayesian context, we specify a joint pre-(deterministic-)model distribution for θ_1, θ_2, and φ, which represent the expected medium (in this case, energy) into, out of, and dissipated from the ecosystem. Note that the model parameters are not assumed to be independent. In the context of data collection, however, the dissipations R_i and their expectation E(R_i) should not be deterministically influenced by {W_i, U_i} or θ = (E(W_i), E(U_i)). Hence φ has a premodel prior distribution arising as a marginal of the joint prior mentioned previously. Conversely, the model M(·)

Figure 1.1: Flow diagram (Ulanowicz, 2004) displaying the trophic exchanges of energy (kcal/m^2/y) in the Cone Spring ecosystem (Tilly, 1968). Arrows not emerging from a box denote exogenous inputs X_i, whereas those not entering a box denote exogenous outputs E_i. Ground symbols denote dissipations R_i.

coerces φ to equal θ_1 - θ_2, and so induces a separate prior for φ from the bivariate joint premodel prior for θ. Additionally, we must specify appropriate likelihoods for the data type. For an ecological network such as Cone Spring, we specify a trivariate lognormal prior for the model parameters and exponential likelihoods for the data, with the inverses of θ_1, θ_2, and φ serving as the rate parameters for these likelihoods. These specifications are described explicitly in chapter 3. For our purposes in this essay, it suffices to mention a few further details about the methods required for Bayesian inference on θ_1, θ_2, and φ. First, note that we rescaled the data down by 10^3 so as to avoid numerical difficulties involving large values under the lognormal prior parameterizations mentioned above. Additionally, since balance is assumed only for the expectations rather than for the data, W_i | θ_1, U_i | θ_2, and R_i | φ are assumed to be mutually independent. Note further that, since θ_1, θ_2, and φ are all positive, and φ = θ_1 - θ_2 > 0, we have that θ_1 > θ_2. Finally, in order to sample from the posterior distribution for model inputs (i.e. for θ_1, θ_2 | W, U, R), we must first compute the melded prior for θ_1 and θ_2, and subsequently use this melded prior in each cycle of a Gibbs sampler to obtain samples from the posterior. The procedures

required for Bayesian melding are described in detail in chapter 2 (also see Poole & Raftery (2000)), whereas the overall Gibbs sampling algorithm is given in chapter 3. Suitable references concerning Markov chain Monte Carlo and Gibbs sampling include Givens & Hoeting (2005) and Hoff (2007).

1.2.1 Results

Since the Cone Spring data comprise only five compartments and, hence, five observations, posterior samples for the model parameters are not greatly informed by the data. The following results should therefore be taken principally as illustrations of the type of output that is obtained when Bayesian melding is applied to the ENA data type. With uniform random starting values of θ_1^(0) = 49.2 and θ_2^(0) = 3.9, 5000 subsequent iterations of full Gibbs cycles were run, leading to two vectors of 5001 posterior samples, one for each parameter. Specifically, each corresponds to samples from the full conditional posterior of the θ parameters, i.e. θ_1 | θ_2, W, U, R and similarly for θ_2 | θ_1, W, U, R, which are shown in Figure 1.2. Applying the model M to these samples yields posterior samples for φ = θ_1 - θ_2, which are also shown in Figure 1.2. The Gibbs sampling algorithm ran relatively slowly, perhaps due to the lack of data, requiring just under 9.7 hours to complete on a machine with an Intel Core 2 Duo T7300 operating at 2.00 GHz with 2.00 GB of memory and running Windows Vista Service Pack 1. The same machine was used for all subsequent runs. Histograms of the posterior distributions are shown in Figure 1.3. In each case the first 500 samples are omitted as a suitable burn-in period, during which the chain has not yet converged to the stationary posterior distribution. Note that, for each parameter, both the posterior and premodel prior distributions are right-skewed; the posterior distributions, however, display significantly reduced variance and reduced means as well. The joint posterior distribution for θ_1 and θ_2 is also shown in Figure 1.4.
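The full sampler is specified in Chapter 3; purely to illustrate its structure, the following Python sketch alternates Metropolis-Hastings updates of θ_1 | θ_2 and θ_2 | θ_1 against stand-in log conditionals (hypothetical normal targets, not the paper's melded posteriors) that respect the constraint θ_1 > θ_2 > 0 implied by φ > 0:

```python
import math
import random

random.seed(1)

def mh_update(current, log_target, scale=0.5):
    """One random-walk Metropolis step with a symmetric normal proposal."""
    proposal = current + random.gauss(0.0, scale)
    log_r = log_target(proposal) - log_target(current)
    return proposal if math.log(random.random()) < log_r else current

def gibbs(theta1, theta2, n_iter, log_cond1, log_cond2):
    """Alternate MH updates of theta1 | theta2, data and theta2 | theta1, data."""
    chain = [(theta1, theta2)]
    for _ in range(n_iter):
        theta1 = mh_update(theta1, lambda t1: log_cond1(t1, theta2))
        theta2 = mh_update(theta2, lambda t2: log_cond2(theta1, t2))
        chain.append((theta1, theta2))
    return chain

# Stand-in full conditionals (hypothetical): normal log densities truncated
# to the ordering theta1 > theta2 > 0.
def log_cond1(t1, t2):
    return -0.5 * (t1 - 6.5) ** 2 if t1 > t2 else -math.inf

def log_cond2(t1, t2):
    return -0.5 * (t2 - 3.7) ** 2 if 0 < t2 < t1 else -math.inf

chain = gibbs(49.2, 3.9, 5000, log_cond1, log_cond2)  # 5001 states in total
```

With the starting value θ_1^(0) = 49.2 far in the tail, the chain takes some hundreds of iterations to reach the region of high density, which is why a burn-in portion is discarded.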
It is notably right-skewed (i.e. toward larger values of each parameter) and unimodal, and cuts off along the diagonal defining the space {(θ_1, θ_2) : θ_1 > θ_2}. In Figure 1.3 (d), the sample autocorrelations are shown for the posterior samples of θ_1; the evident high degree of autocorrelation suggests slow convergence of the Markov chain to the posterior distribution, indicating that running multiple or simply much longer chains is desirable for future analyses. Some specific comparisons are relevant to mention. In Table 1.1, means and standard deviations (naive,

Figure 1.2: Trace plot showing Markov chain Monte Carlo output for model input parameters θ_1 and θ_2 and output parameter φ.

negatively biased SDs in the case of the posterior) are given for each of the premodel priors and posterior distributions, with the standard errors given for the Cone Spring data. (In Table 1.1, "data given parameter" refers to the vectors W, U, and R.) These are normal standard errors, that is, the data standard deviations divided by the square root of the number of observations n = 5. As is evident from the table, the posterior means are below the prior means for all three parameters, and are closer in magnitude to those of the data. Naive standard deviations for the posterior samples are also lower than for the prior specifications, but since they are subject to significant negative bias due to high autocorrelations, they are not strictly comparable. In Chapter 3, we obtain more accurate posterior variance estimates by thinning the Monte Carlo samples to remove this autocorrelation.

Figure 1.3: Histograms of Bayesian melding posterior distributions for (a) θ_1, (b) θ_2, (c) φ, with premodel prior distributions shown as solid lines. The first 500 observations are omitted as a burn-in period. Sample autocorrelations for θ_1 are shown in (d).

Additionally, the R package boa (Smith, 2007) provides a variety of descriptive statistics and convergence diagnostics for MCMC output. Of interest here are quantiles of the full conditional posterior samples as well as 95% highest posterior density (HPD) regions for the θ and φ parameters. These regions correspond to the narrowest possible intervals containing 95% of the posterior probability (Givens & Hoeting, 2005); the HPD intervals are computed using the Monte Carlo method of Chen and Shao (1999), which assumes a unimodal marginal posterior distribution. For θ_1, the boa package yields 2.5%, 50.0%, and 97.5% quantiles of 4.05, 6.53, and 10.97, with a 95% HPD interval of (3.81, 10.24). The corresponding quantiles for θ_2 are 1.93, 3.71, and 7.30, with 95% HPD interval (1.79, 6.83). Finally, the quantiles for φ are 1.17, 2.58, and 5.99, with 95% HPD interval (0.98, 5.23).
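The 95% HPD intervals above are produced by boa via the Chen and Shao (1999) device: among all intervals spanning a fixed fraction of the sorted posterior draws, take the narrowest. A Python reimplementation sketch (boa itself is an R package; the lognormal draws below are stand-ins, not the paper's samples):

```python
import random

def hpd_interval(draws, prob=0.95):
    # Chen & Shao (1999): for a unimodal marginal posterior, the narrowest
    # interval containing `prob` of the sorted draws estimates the HPD region.
    s = sorted(draws)
    n = len(s)
    k = max(1, int(prob * n))  # number of draws per candidate interval
    i = min(range(n - k + 1), key=lambda j: s[j + k - 1] - s[j])
    return s[i], s[i + k - 1]

random.seed(0)
# right-skewed stand-in draws (lognormal), mimicking the shape in Figure 1.3
draws = [random.lognormvariate(1.8, 0.35) for _ in range(4000)]
lo, hi = hpd_interval(draws)
# For right-skewed samples the HPD interval is never wider than the
# equal-tailed 95% interval, and sits closer to the mode.
```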

Figure 1.4: Joint posterior distribution of θ_1 and θ_2 with the initial 500 observations removed as a burn-in period. Density values increase from red to orange to yellow.

In this chapter, we have presented the general framework for statistical inference via Bayesian melding on ENA data. An example of this data type has been described, namely the small Cone Spring dataset, and the results of applying Bayesian melding and a Gibbs sampler to obtain posterior model parameter distributions have been summarized. Further background concerning Bayesian melding is detailed in the next chapter, illustrated through a simple toy example. The chapter thereafter provides the explicit details of the Gibbs sampling algorithm as well as the results of implementing this algorithm for a larger ENA dataset.

Table 1.1: Summary statistics for Cone Spring data. Note that standard error values are for the data only.

                          Mean                  Std. Dev./Errors
parameter                 θ_1    θ_2    φ       θ_1    θ_2    φ
data given parameter        [numeric entries not recovered]
premodel prior              [numeric entries not recovered]
melded posterior            [numeric entries not recovered]

                          θ_1              θ_2             φ
95% HPD interval          (3.81, 10.24)    (1.79, 6.83)    (0.98, 5.28)
Quantiles   2.5%          4.05             1.93            1.17
            50.0%         6.53             3.71            2.58
            97.5%         10.97            7.30            5.99

2 Methods

This chapter presents the method of Bayesian melding in greater detail in the context of non-dynamical deterministic models. The method is subsequently illustrated with an example consisting of a simple invertible toy model with a single input θ and output φ, where φ = M(θ) = θ^3. We also introduce the Markov chain Monte Carlo procedure employed to sample from the posterior for θ, namely the Metropolis-Hastings algorithm.

2.1 Bayesian Melding

Bayesian melding allows for formal statistical inference on deterministic simulation models, while taking into full account information and uncertainty about inputs and outputs to the model (Poole & Raftery, 2000). In many such models, input and output parameters are often specified via a trial-and-error approach; plausible inputs are chosen based on previous research or knowledge and typically tuned until plausible outputs are obtained. The framework of Bayesian inference not only offers a formalization of this exercise, but also enables the detailed analysis available from proper statistical inference. Consequently, prior information about parameters can be employed in concert with data to obtain posterior inference about both input and output parameters. (See Hoff (2007) for an introductory reference on Bayesian inference.) In the case of deterministic models, the objective is to combine (a) prior information about m inputs θ and p outputs φ that is independent of the model with (b) model-based prior information. Note that, in general, we define a deterministic model M as some mapping M : Θ → Φ, with θ ∈ Θ ⊆ R^m and φ ∈ Φ ⊆ R^p. Poole & Raftery (2000) describe an earlier approach to this objective, referred to as Bayesian synthesis

(Raftery, Givens, & Zeh, 1995). In this approach, the joint premodel prior distribution p(θ, φ) incorporates all prior information independent of the model. Model information is integrated simply by restricting the premodel distribution to the submanifold {(θ, φ) : φ = M(θ)}, yielding a postmodel distribution π(θ, φ). However, Wolpert (1995) noted that such a postmodel distribution is ill-defined and, consequently, subject to a condition known as the Borel paradox. This has the further consequence that the postmodel distribution depends on how the model M is parameterized, resulting in an ill-defined conditional distribution. As Poole & Raftery (2000) note, this is not satisfactory, and as such they developed Bayesian melding to reformulate such model-based inference as a standard Bayesian procedure.

2.1.1 Notation and Method

For a model M, we consider inputs θ and outputs φ = M(θ). We denote the stated premodel prior distributions for θ and φ as p_1(θ) and p_2(φ), respectively, marginalized from their joint premodel prior stated irrespective of M. Additionally, applying the model M to p_1(θ) yields a model-induced prior for φ, denoted by p_1*(φ). Bayesian melding occurs when these two prior distributions for φ, the stated prior p_2(φ) and the induced prior p_1*(φ), are melded, a process which occurs via logarithmic pooling (Poole & Raftery, 2000). This pooling forms a combined prior distribution, called a melded prior, which is subsequently updated using Bayes' rule. The melded prior for the outputs φ is given by

    p̃(φ) = k_α [p_1*(φ)]^α [p_2(φ)]^(1-α)    (2.1)

where α ∈ [0, 1] is a pooling weight, so that when α = 0.5, equation (2.1) corresponds to taking the geometric mean of the two prior densities. Poole & Raftery (2000) show that this function is indeed a probability density; obtaining its form requires only the calculation of the normalization constant k_α.
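Equation (2.1) is simple to evaluate numerically; the sketch below pools two lognormal densities on a grid (both stand-ins, chosen only for illustration) with α = 0.5 and recovers k_α by trapezoidal quadrature:

```python
import math

def lognorm_pdf(x, mu, sigma):
    # density of a lognormal with log-mean mu and log-sd sigma
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

# grid over a plausible phi range (stand-in priors, illustrative only)
xs = [0.01 + i * (20.0 - 0.01) / 1999 for i in range(2000)]
p1_star = [lognorm_pdf(x, math.log(3.0), 0.8) for x in xs]  # "induced" prior
p2 = [lognorm_pdf(x, math.log(2.5), 0.6) for x in xs]       # "stated" prior

alpha = 0.5
pooled = [a ** alpha * b ** (1 - alpha) for a, b in zip(p1_star, p2)]

# trapezoidal quadrature for the normalization constant k_alpha in (2.1)
h = xs[1] - xs[0]
area = sum(0.5 * (pooled[i] + pooled[i + 1]) * h for i in range(len(xs) - 1))
k_alpha = 1.0 / area
melded = [k_alpha * v for v in pooled]  # now integrates to ~1 on the grid
```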
For certain forms of M and p_1(θ), it may be possible to express the induced and melded output priors, p_1*(φ) and p̃(φ), analytically, although this is rare in practice. Even in such cases, the form of M^(-1) could prevent the melded input prior p̃(θ) from having closed form, particularly if this inverse does not exist. For example, for a model with m inputs θ = (θ_1, ..., θ_m) and p outputs φ = (φ_1, ..., φ_p), where p < m, the model

φ = M(θ) is not one-to-one and is hence noninvertible (see Hogg et al. (2005) for background regarding transformations of random variables). Since we are interested in eventually obtaining p̃(θ), we first follow Poole & Raftery (2000), who obtain this melded input prior (their equation (16)) as follows:

    p̃(θ) = p̃(M(θ)) p_1(θ) / p_1*(M(θ)) = k_α p_1(θ) [ p_2(M(θ)) / p_1*(M(θ)) ]^(1-α)    (2.2)

Note that this corresponds to the original stated input prior p_1(·), weighted by the ratio of two densities in the φ space, p_2(·) and p_1*(·), both evaluated at φ = M(θ) induced from given values of θ ∈ Θ (Poole & Raftery, 2000). This technique eliminates the need to transform the melded output prior p̃(φ) back to the θ scale, and hence the need for an invertible M. The pooling weight α defines the essentially arbitrary relative importance of the induced and stated output priors p_1*(·) and p_2(·). We select α = 0.5, resulting in geometric pooling, where (2.1) amounts to computing the geometric mean of the two prior output densities (Poole & Raftery, 2000). If p_1*(·) has no closed form, it (and, hence, p̃(θ)) can be evaluated numerically by employing kernel density estimation. Finally, we obtain from Poole & Raftery (2000) the posterior distribution for the inputs, given by

    p̃(θ | X, Y) ∝ L(θ, M(θ)) p̃(θ)    (2.3)

where X and Y correspond to the input and output data, respectively. For ENA inference, we assume that E(X_i) = θ and E(Y_i) = φ. Additionally, L(θ, M(θ)) is the joint likelihood of X and Y. Note further that for deterministic models, the output data Y may not be actual observed data; for example, they may be determined via a balance algorithm, as with the dissipation variable R_i of ENA data from Section 1.1. In the general case, with m inputs and p outputs, X_i is the ith m x 1 vector of observed inputs and Y_i is the ith p x 1 vector of observed outputs. Since the posterior distribution (2.3) is frequently of non-standard form, we obtain samples iteratively via Markov chain Monte Carlo. Bayesian melding is incorporated into MCMC via the melded prior for inputs (2.2).
As shall be described in the following section, at each step of the Metropolis-Hastings algorithm the melded prior is evaluated, and it can also serve as the algorithm's proposal distribution. (See Hogg et al. (2005) for background regarding transformations of random variables; Poole & Raftery (2000) denote equation (2.2) as their equation (16).)
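A sketch of how (2.2) is evaluated in practice when p_1*(·) has no closed form: draw θ from p_1, push the draws through M, estimate p_1*(φ) by kernel density estimation, and weight p_1(θ) by the ratio in (2.2). The independent lognormal input priors and normal output prior below are hypothetical stand-ins, with M(θ) = θ_1 - θ_2 as in (1.4):

```python
import math
import random

random.seed(2)

def norm_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def lognorm_pdf(x, mu, sd):
    return norm_pdf(math.log(x), mu, sd) / x

# 1. Draw theta = (theta1, theta2) from a stand-in premodel prior p1
#    (independent lognormals; parameter values are purely illustrative).
draws = [(random.lognormvariate(2.0, 0.4), random.lognormvariate(1.3, 0.4))
         for _ in range(5000)]

# 2. Push the draws through M(theta) = theta1 - theta2 and smooth with a
#    Gaussian kernel (Silverman's rule bandwidth) to estimate p1*(phi).
phis = [t1 - t2 for t1, t2 in draws]
mean = sum(phis) / len(phis)
sd = (sum((p - mean) ** 2 for p in phis) / len(phis)) ** 0.5
bw = 1.06 * sd * len(phis) ** -0.2

def p1_star(phi):
    return sum(norm_pdf(phi, p, bw) for p in phis) / len(phis)

def p2(phi):
    # stand-in stated output prior
    return norm_pdf(phi, 3.0, 2.0)

# 3. Melded input prior (2.2) with alpha = 0.5, up to the constant k_alpha.
def melded_unnorm(t1, t2, alpha=0.5):
    p1_theta = lognorm_pdf(t1, 2.0, 0.4) * lognorm_pdf(t2, 1.3, 0.4)
    return p1_theta * (p2(t1 - t2) / p1_star(t1 - t2)) ** (1 - alpha)
```

Because only the ratio of (2.4) is needed, the constant k_α never has to be computed when this melded prior is used inside a Metropolis-Hastings sampler.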

2.1.2 Metropolis-Hastings Algorithm

The goal of MCMC methods is to construct a Markov chain whose stationary distribution equals the target distribution f, i.e. the posterior distribution. The Metropolis-Hastings algorithm is a general such method and the one employed here. For background concerning this algorithm and further topics in MCMC, see Givens & Hoeting (2005). The algorithm begins at stage t = 0 with the selection of starting values denoted by θ^(0), drawn at random from a suitable starting distribution with the requirement that f(θ^(0)) > 0. Then, given θ^(t) at stage t ∈ {0, 1, 2, ...}, the algorithm generates θ^(t+1) via the following steps:

1. Sample a candidate value θ* from a proposal distribution J(· | θ^(t)).

2. Compute the Metropolis-Hastings ratio

    R(θ^(t), θ*) = [ p̃(θ* | X, Y) J(θ^(t) | θ*) ] / [ p̃(θ^(t) | X, Y) J(θ* | θ^(t)) ]    (2.4)

where, in our context, the target distribution f(θ) equals p̃(θ | X, Y), the posterior distribution for the inputs θ. Note that R(θ^(t), θ*) is always defined, since the proposal θ* can only occur if f(θ^(t)) > 0 and J(θ* | θ^(t)) > 0.

3. Accept or reject θ*: set θ^(t+1) = θ* with probability min{R(θ^(t), θ*), 1}, and θ^(t+1) = θ^(t) otherwise.

4. Increment t and return to step 1.

Following sufficiently many iterations of the algorithm, the samples of θ converge to a stationary distribution, namely the target distribution f (Givens & Hoeting, 2005). In order to obtain samples from the posterior (2.3), we must choose an appropriate proposal distribution J(· | θ^(t)) and evaluate the melded prior p̃(θ), which appears in both the numerator and denominator of the Metropolis-Hastings ratio (2.4). With respect to the choice of proposal distribution, some details are worth mentioning. First, one choice in the context of Bayesian inference is to use the prior distribution itself as the proposal.
Since the (continuous) melded prior lacks an analytical form in many cases (and, indeed, in all examples considered in this paper),

a form of slice sampling can be employed (see Givens & Hoeting (2005) for some background and further references concerning slice sampling). In this sampling method, the support on which the melded prior is evaluated numerically is discretized into slices, and the discrete density corresponding to each slice is computed. These slice densities then serve as sampling weights, so that a particular slice is chosen with probability equal to its discrete density. After having sampled a slice, a sample from the melded prior is obtained by sampling uniformly between the endpoints of the selected slice. Alternatively, a proposal distribution could be chosen so that it covers the support of the stationary distribution and does not yield candidate values θ* that are accepted or rejected too frequently. It is also useful to use a proposal whose spread can be tuned; if it is too diffuse compared to the target distribution, the candidate values will be rejected too frequently, leading to slow convergence, and convergence is similarly slow if the proposal variance is too low. If, for example, a normal proposal distribution is chosen, its variance can be tuned to improve the speed of convergence to the target distribution. Note that a symmetric proposal such as a normal has the additional property that J(θ^(t) | θ*) = J(θ* | θ^(t)), in which case the proposal densities cancel from the Metropolis-Hastings ratio (2.4); the method is then simply called the Metropolis algorithm (Givens & Hoeting, 2005). In general, running many hundreds or thousands of iterations of the Metropolis-Hastings algorithm is necessary to ensure convergence of the Markov chain to the stationary target distribution. Convergence can be assessed in a number of ways, such as through the calculation of diagnostics as in the boa package in R, as well as through simple visual inspection of trace plots of the Markov chain. A useful alternative to running a single very long chain to ensure convergence is to run several chains from different random starting values to assess mixing.
By plotting two or more chains together, we can assess the quality of mixing visually by examining how well the separate chains overlap. Finally, as mentioned in the context of the Cone Spring example of chapter 1, it is generally advisable to discard the first portion of values in the chain as the burn-in period. The precise length of the burn-in period will depend on the starting values; in any case, we would not expect good convergence in this initial portion of the Markov chain, so these values would not be informative for approximating the target posterior distribution or, correspondingly, for making inference

6 See Givens & Hoeting (2005) for some background and further references concerning slice sampling.

about the corresponding parameters.

2.2 Melding Example

We now consider the case of a simple invertible toy model with a single input θ and a single output φ, where the deterministic model is given by M(θ) = θ³ = φ. We specify data likelihoods as follows:

X | θ ~ N(θ, 1)    (2.5a)
Y | φ ~ N(φ, 4)    (2.5b)

which have (premodel) priors given by

θ ~ U(−2, 2)    (2.5c)
φ ~ U(−3, 3)    (2.5d)

Simulated random data were generated consisting of 100 observations each for X_i and Y_i, with true means θ₀ = 1.2 and φ₀ = 2. We can thus compare these true values to the posterior distributions for θ and φ. Note that this differs from our ENA inference approach, where one would have obtained Y_i = X_i³ in imposing the model on the data. Nevertheless, the intention of the exercise here was to investigate (and demonstrate) the feasibility of implementing Bayesian melding with a simple equation relating the expectations of the input and output variables. As we demonstrate below, the attempt was successful, and it formed the basis of the implementation of our ENA inference for the datasets in Chapters 1 and 3.

Applying Bayesian melding to this example yields the priors and posteriors shown in Figure 2.1. Note that the induced prior for φ (Figure 2.1 (a)) and the melded prior for θ (Figure 2.1 (b)) are each symmetric. It should be mentioned, however, that although the melded prior in Figure 2.1 (b) is computed according to equation (2.2), we do not compute the proportionality constant k_α, since it is not required for the Metropolis-Hastings algorithm; this affects only the scaling of the melded prior. In contrast, the posterior distributions for θ (Figure 2.1 (c)) and φ (Figure 2.1 (d)) are slightly skewed, to the left in the case of θ and to the right in the case of φ. This is also evident in the posterior densities shown in Figure 2.2 (a) and (b).
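The melded prior used in this example can be approximated numerically by pushing prior draws through M and applying kernel density estimation. A minimal Python sketch follows (the paper's code is in R; scipy's `gaussian_kde` stands in for the R `density()` function, and the normalizing constant k_α is omitted, as in the text):

```python
import numpy as np
from scipy.stats import gaussian_kde, uniform

# Premodel priors theta ~ U(-2, 2), phi ~ U(-3, 3); model M(theta) = theta^3.
rng = np.random.default_rng(7)
theta_draws = rng.uniform(-2.0, 2.0, size=100_000)

# Induced prior p1*(phi): push prior draws of theta through M, then smooth.
induced = gaussian_kde(theta_draws**3)

p1 = uniform(-2, 4).pdf   # stated prior for theta (scipy's loc/scale form)
p2 = uniform(-3, 6).pdf   # stated prior for phi

def melded_prior_unnorm(theta, alpha=0.5):
    """Unnormalized melded prior
    p1(theta) * (p2(M(theta)) / p1*(M(theta)))**(1 - alpha);
    the constant k_alpha is not needed for Metropolis-Hastings."""
    phi = theta**3
    return float(p1(theta) * (p2(phi) / induced(phi)) ** (1 - alpha))
```

Note that the melded prior vanishes wherever the stated prior for φ assigns no mass, i.e. for |θ³| > 3, which is what produces its irregular shape.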

Figure 2.1: Prior and posterior distributions following Bayesian melding on the toy model: (a) induced prior for φ; (b) melded prior for θ; (c) posterior histogram for θ; (d) posterior histogram for φ.

As evidenced by the plot of autocorrelations for posterior samples of θ shown in Figure 2.2, slow convergence to the posterior is not a great concern here, as the autocorrelation rapidly decays to zero; in fact, by lag 5, the sample autocorrelation falls just below 0.1. For this implementation, the proposal distribution was normal with mean θ^(t) (the current value) and standard deviation δ = 0.2. Note that use of a symmetric normal proposal renders the algorithm Metropolis rather than Metropolis-Hastings. This method yielded a reasonable acceptance rate of 37.5% and required about 8 min 23 sec to complete 8000 iterations on the same machine used in the Cone Spring example. By comparison, the slice sampling method, using a discretization of slices, required more time (11 min 27 sec) to complete the same number of iterations on the same machine, and yielded a considerably poorer acceptance rate of 6.2%. This

Figure 2.2: Posterior densities for θ (a) and φ (b), with plots of the autocorrelations (c) and the MCMC trace (d) for θ.

is unacceptably low, so we can conclude that the sort of slice sampling employed here does not yield satisfactory results. Examination of the corresponding autocorrelation plot (not shown) reveals comparatively slow decay as well. The low acceptance rate may be due to the irregular shape of the melded prior (Figure 2.1 (b)) in this case. In Table 2.1, comparisons are given between the means and standard deviations for θ and φ for each of the simulated data, the premodel prior, and the posterior distribution. It is notable that the posterior means are similar to the data means; moreover, with the incorporation of the model φ = θ³ into the inference, the posterior expectation for φ is close to the cube of the posterior expectation for θ (i.e., 1.22³ ≈ 1.82). As in the Cone Spring example, we can also obtain quantiles and 95% HPD intervals from the R boa package. For θ in this example, we obtain 2.5%, 50.0%, and 97.5% quantiles equal to 1.09, 1.22, and 1.36, respectively, with

Table 2.1: Summary statistics for Bayesian melding applied to the toy model M(θ) = θ³ = φ: means and standard deviations (standard errors in the case of the data) of θ and φ for the data, the premodel prior, and the melded posterior. [Numeric entries of the first panel were not recovered in this transcription.]

    parameter    95% HPD interval    Quantiles (2.5%, 50.0%, 97.5%)
    input θ      (1.114, 1.362)      1.09, 1.22, 1.36
    output φ     (1.376, 2.517)      1.31, 1.84, 2.49

95% HPD interval (1.114, 1.362). Similarly, for φ we obtain quantiles of 1.31, 1.84, and 2.49, with 95% HPD interval (1.376, 2.517).

This chapter has described the method of Poole & Raftery (2000), referred to as Bayesian melding, with respect to its motivation and implementation in the context of non-dynamical ENA. The method was further illustrated via a simple univariate model with a single input and a single output, with posterior sampling effected by the Metropolis-Hastings algorithm. Some conclusions follow from the results described above. Although the algorithm runs quickly using either a normal proposal or the melded prior itself as proposal (which employs slice sampling), use of the melded proposal yields unsatisfactory results, with an acceptance rate of only 6.2%. This is in contrast to the reasonable acceptance rate of 37.5% obtained when the normal proposal is used with appropriate tuning; the normal proposal also yields faster run times, in which case the algorithm is simply Metropolis. In the next chapter, we implement Bayesian melding for a larger ENA dataset, the Chesapeake Bay

mesohaline ecosystem (Baird & Ulanowicz, 1989). Here we employ a Gibbs sampler with Metropolis-Hastings steps to obtain posterior samples for the two input parameters, recalling that the ENA data type for our simplified model from chapter 1 includes two inputs and one output.

3 Modelling and Implementation

In this chapter we apply Bayesian melding to a larger ENA dataset observed from the Chesapeake Bay mesohaline network (Baird & Ulanowicz, 1989). This ecosystem comprises 36 compartments, including such groups of organisms as phytoplankton, various bacteria, zooplankton, and other micro-organisms, as well as individual species such as catfish and striped bass. Correspondingly, the dataset contains 36 observations arranged in the format described in the Cone Spring example of chapter 1, with W_i, U_i, and R_i denoting the medium in, medium out, and medium dissipated for the ith compartment. This chapter first describes the motivation for the specification of priors, and continues with details of the computation of the melded prior, the posterior sampling algorithm, and some general information concerning the R code employed to evaluate the melded prior. Finally, the results of the implementation of a Gibbs sampler for the posteriors are discussed. As in previous chapters, suitable references for Gibbs sampling, the Metropolis-Hastings algorithm, and Markov chain Monte Carlo generally include Givens & Hoeting (2005) and Hoff (2007).

3.1 Prior Specification

Since W_i, U_i, and R_i are real numbers that range from zero to the tens of thousands, we specify the following exponential likelihoods:

W_i = X_i + T_{+i},  with  W_i | θ_1 ~ Exp(1/θ_1)    (3.1a)
U_i = T_{i+} + E_i,  with  U_i | θ_2 ~ Exp(1/θ_2)    (3.1b)
R_i | φ ~ Exp(1/φ)    (3.1c)

These likelihoods are subject to the expectations noted in (1.3); that is, E(W_i | θ_1) = θ_1, E(U_i | θ_2) = θ_2, and E(R_i | φ) = φ. The explicit forms of these likelihoods for W_i, U_i, and R_i, respectively, are given by

L_1(θ_1) = θ_1^{-n} exp(-Σ_i W_i / θ_1)
L_2(θ_2) = θ_2^{-n} exp(-Σ_i U_i / θ_2)    (3.2)
L_3(φ) = φ^{-n} exp(-Σ_i R_i / φ)

Additionally, W_i | θ_1, U_i | θ_2, and R_i | φ are assumed to be mutually independent. The dependence among W_i, U_i, and R_i implied by the usual notion of balance is instead reflected through their marginal joint distribution, as based on the following joint prior.

We specify the joint prior distribution for the parameters Φ = (θ_1, θ_2, φ) as trivariate lognormal; that is,

Φ ~ MVLN(µ, Σ).

Since Φ is the expectation parameter of {W, U, R}, the µ and Σ parameters are the mean and covariance parameters for log(Φ). Realizations of Φ are generated in R using the rlnorm.rplus() function from the compositions library, which permits the generation of multivariate lognormal samples with mean µ and covariance structure Σ.

In practice, the lognormal prior hyper-parameters are specified based on the order of the raw data. Although, as noted above, the ranges of W_i, U_i, and R_i extend from zero to the tens of thousands, we re-scale⁷ them down by a fixed factor. Hence, if values of W_i are on the order of 10 following rescaling, we specify a prior mean of 10. Prior variances are specified similarly, though with a greater degree of flexibility. For example, if we obtain a standard deviation also on the order of 10, we specify a prior variance of 100; we might also choose to decrease this prior variance to 50 to mitigate the effect of simply squaring the data order of 10. The goal here is to specify reasonable priors which reflect the data type. Furthermore, since the average medium in (θ_1) is assumed to equal the total average medium out (θ_2 + φ), we specify E(θ_2) and E(φ) each to be half
7 A different scaling factor could be employed, depending on what is reasonable for the dataset; we do not deviate from this choice here.

the magnitude of the prior mean of W_i, and similarly for variance. We then specify ψ = (E(θ_1), E(θ_2), E(φ)) and ω = (Var(θ_1), Var(θ_2), Var(φ)) as follows:

ψ = (100, 50, 50)
ω = (50000, 25000, 25000)

Since these prior specifications apply to the raw, not logged, data, to obtain µ and Σ we solve

ψ_1 = exp(µ_1 + σ_1²/2)  and  ω_1 = (exp(σ_1²) − 1) exp(2µ_1 + σ_1²)

numerically (equivalently, in closed form, σ_1² = log(1 + ω_1/ψ_1²) and µ_1 = log ψ_1 − σ_1²/2); the same procedure can be applied to solve for µ_2 and σ_2² with ψ_2 and ω_2. Since ψ_2 = ψ_3 and ω_2 = ω_3, we have µ_2 = µ_3 and σ_2 = σ_3, so a third set of solutions need not be obtained.

We further assume an exchangeable correlation structure with off-diagonal correlations all equal to 0.5, on the basis that the expected magnitude of inputs is correlated with the expected magnitudes of outputs and dissipations, which are themselves also correlated. If we consider the analogy of an individual's income playing the role of W_i, spending the role of U_i, and savings the role of R_i, then it is reasonable to assume that, marginally, all three are positively correlated with each other. This leads to the following covariance matrix:

        [ σ_1²          0.5(σ_1 σ_2)   0.5(σ_1 σ_2) ]
    Σ = [ 0.5(σ_1 σ_2)  σ_2²           0.5(σ_2²)    ]    (3.3)
        [ 0.5(σ_1 σ_2)  0.5(σ_2²)      σ_2²         ]

3.2 Sampling Algorithm

3.2.1 Melding

The melded prior for (θ_1, θ_2) must first be approximated. Note that p_1(θ_1, θ_2) is bivariate lognormal with logged-data means µ_1 and µ_2 and covariance structure corresponding to the upper-left 2 × 2 block of Σ. The melded prior⁸ for θ = (θ_1, θ_2) is then given by

p̃(θ) = k p_1(θ) [p_2(M(θ)) / p_1*(M(θ))]^{1−α}    (3.4)

8 Note that equation (3.4) for p̃(θ) corresponds to equation (16) in Poole & Raftery (2000).

where p_1(θ) is the joint prior bivariate lognormal density described above, p_2(φ) is the stated prior univariate lognormal density with logged-data mean µ_2 and variance σ_2², and p_1*(φ) is the induced distribution for φ (Poole & Raftery, 2000). Note that α denotes the pooling weight for the melding of the stated and induced priors; Poole & Raftery (2000) note that the choice of α is essentially arbitrary, and we set α = 0.5 for the reason stated earlier. The induced distribution p_1*(φ) is numerically evaluated by applying the model M(θ_1, θ_2) = θ_1 − θ_2 to the premodel prior realizations of θ_1 and θ_2 generated using rlnorm.rplus(); the resulting induced distribution for φ is then obtained using kernel density estimation via the R function density().

As we shall see in Section 3.2.3, for the Gibbs sampler we require prior conditionals for each of θ_1 and θ_2; that is, p̃(θ_1 | θ_2) and p̃(θ_2 | θ_1). These conditionals are given by

p̃(θ_i | θ_j) = p̃(θ_1, θ_2) / p̃(θ_j)    (3.5)

where i, j = 1, 2 and j ≠ i. The denominators in (3.5) will cancel in the Metropolis-Hastings ratios described in Section 3.2.3, so they need not be calculated.

3.2.2 Gibbs Sampler

We present the Gibbs sampling algorithm below. In each full scan of a Gibbs cycle, we iteratively generate new random samples of θ_1 and θ_2, in each case sampling θ_j conditional on θ_i (i ≠ j) and on the data {W, U, R}. These full (melded) conditional distributions appear in step 2 of the algorithm below, and their explicit forms are shown in Section 3.2.3:

1. Select starting values (θ_1^(0), θ_2^(0)) and set t = 0.

2. Generate, in turn, for j = 1, 2:

   θ_1^(t+1) ~ p̃(θ_1 | θ_2^(t), W, U, R)
   θ_2^(t+1) ~ p̃(θ_2 | θ_1^(t+1), W, U, R)

3. Having completed a full scan, increment t and return to step 2.
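The scan structure above can be sketched generically as follows (a Python illustration; the conditional samplers here are placeholders based on a standard bivariate normal, whose full conditionals are known in closed form, rather than the melded conditionals of this chapter):

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs(sample_t1_given_t2, sample_t2_given_t1, t1_0, t2_0, n_iter):
    """Two-block Gibbs scan: update theta_1 given the current theta_2,
    then theta_2 given the newly drawn theta_1, as in steps 1-3 above."""
    out = np.empty((n_iter + 1, 2))
    out[0] = (t1_0, t2_0)
    for t in range(n_iter):
        t1 = sample_t1_given_t2(out[t, 1])   # theta_1 | theta_2, data
        t2 = sample_t2_given_t1(t1)          # theta_2 | theta_1, data
        out[t + 1] = (t1, t2)
    return out

# Placeholder conditionals: standard bivariate normal with correlation rho,
# for which theta_i | theta_j ~ N(rho * theta_j, 1 - rho^2) exactly.
rho = 0.8
cond = lambda other: rng.normal(rho * other, np.sqrt(1 - rho**2))
draws = gibbs(cond, cond, 5.0, 5.0, 5000)
```

In the melded-prior setting, each placeholder conditional would be replaced by a Metropolis-Hastings step, as detailed in Section 3.2.3.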

3.2.3 Metropolis-Hastings

Note that the full conditional posteriors are sampled in step 2 of the Gibbs sampler. This is accomplished in both cases via Metropolis-Hastings steps (j = 1, 2). That is, we sample from the full conditional posteriors for θ_1 and θ_2 with the general form

p(θ_i | θ_j, W, U, R) ∝ L(θ_1, θ_2, φ) p̃(θ_i | θ_j).

Recall from Section 1.1 that, since balance is not assumed for the data values but only for the expectations, we may conveniently assume that W_i | θ_1, U_i | θ_2, and R_i | φ are mutually independent. Consequently, we can write L(θ_1, θ_2, φ) = L_1(θ_1) L_2(θ_2) L_3(φ). The Metropolis-Hastings step (j = 1) for θ_1^(t+1) proceeds as follows:

1. Sample a candidate value θ_1* from a proposal distribution J(θ_1* | θ_1^(t)).

2. (a) Compute the Metropolis-Hastings ratio R(θ_1^(t), θ_1*), where

   R(θ_1^(t), θ_1*) = [L_1(θ_1*) L_2(θ_2^(t)) L_3(φ*_{j=1}) p̃(θ_1* | θ_2^(t)) J(θ_1^(t) | θ_1*)] / [L_1(θ_1^(t)) L_2(θ_2^(t)) L_3(φ^(t)) p̃(θ_1^(t) | θ_2^(t)) J(θ_1* | θ_1^(t))],

   with φ*_{j=1} = θ_1* − θ_2^(t) and φ^(t) = θ_1^(t) − θ_2^(t).

   (b) Accept or reject θ_1* according to the following: θ_1^(t+1) = θ_1* with probability min{R(θ_1^(t), θ_1*), 1}, and θ_1^(t+1) = θ_1^(t) otherwise.

3. Proceed to the second Gibbs step (i.e., the Metropolis-Hastings step for θ_2 with j = 2) as follows.

   (a) For sampling θ_2^(t+1), compute the Metropolis-Hastings ratio R(θ_2^(t), θ_2*), where

   R(θ_2^(t), θ_2*) = [L_1(θ_1^(t+1)) L_2(θ_2*) L_3(φ*_{j=2}) p̃(θ_2* | θ_1^(t+1)) J(θ_2^(t) | θ_2*)] / [L_1(θ_1^(t+1)) L_2(θ_2^(t)) L_3(φ^(t+1)) p̃(θ_2^(t) | θ_1^(t+1)) J(θ_2* | θ_2^(t))],

   with φ^(t+1) = θ_1^(t+1) − θ_2^(t) and φ*_{j=2} = θ_1^(t+1) − θ_2*.

   (b) Accept or reject θ_2* according to the following: θ_2^(t+1) = θ_2* with probability min{R(θ_2^(t), θ_2*), 1}, and θ_2^(t+1) = θ_2^(t) otherwise.

4. Increment t and return to step 2 of the Gibbs sampler.

Remarks

1. Note that in R(θ_1^(t), θ_1*) the L_2(θ_2^(t)) terms cancel, and the same holds for the L_1(θ_1^(t+1)) terms in R(θ_2^(t), θ_2*). In neither case is any new sample being taken, so these terms naturally disappear from the ratio.

2. If the proposal distributions J(·) for each step (they need not be the same) are symmetric, they also disappear from the ratio, and we have Metropolis rather than Metropolis-Hastings steps (Givens & Hoeting, 2005).

3. A potentially suitable proposal distribution J(·) is a chi-square distribution with a high number of degrees of freedom, as this would match the magnitude of our data type. For greater flexibility, we generalize this to a Gamma distribution with shape parameter 2y and rate parameter 2, in which case

   J(x | y) = [Γ(2y) 2^{−2y}]^{−1} x^{2y−1} e^{−2x}.

   In the algorithm itself, we not only sample from J(x | y) but evaluate it as well in the Metropolis-Hastings ratios. In the denominator of each ratio, we evaluate J(θ* | θ^(t)) at θ* as a Gamma density with shape parameter 2θ^(t), yielding E(θ*) = θ^(t); the reverse occurs in the numerator of each ratio. Additionally, we introduce a tuning parameter for this proposal distribution by specifying a modified shape parameter 2(y + δ), where δ is some small real number. This allows for some control over the variance of the proposal, which is given by (y + δ)/2: positive values of δ increase the proposal variance, and vice-versa for negative values.
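One such Metropolis-Hastings step with the Gamma proposal of remark 3 can be sketched as follows (Python rather than the paper's R; the stand-in target below is a Gamma(5, 1) density, purely for illustration, not the melded conditional):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)

def mh_step_gamma(theta_t, log_target, delta=0.0):
    """One Metropolis-Hastings step with a Gamma(shape = 2(y + delta),
    rate = 2) proposal, so that E(theta*) = theta_t + delta and
    Var(theta*) = (theta_t + delta) / 2."""
    shape_fwd = 2.0 * (theta_t + delta)
    cand = rng.gamma(shape_fwd, 0.5)         # numpy uses scale = 1/rate
    shape_rev = 2.0 * (cand + delta)
    # log MH ratio: target ratio times J(theta_t | theta*) / J(theta* | theta_t)
    log_r = (log_target(cand) - log_target(theta_t)
             + gamma.logpdf(theta_t, a=shape_rev, scale=0.5)
             - gamma.logpdf(cand, a=shape_fwd, scale=0.5))
    return cand if np.log(rng.uniform()) < log_r else theta_t

# Stand-in target: Gamma(5, 1), log f(x) = 4 log x - x + const.
log_target = lambda x: 4.0 * np.log(x) - x if x > 0 else -np.inf
chain = [10.0]
for _ in range(2000):
    chain.append(mh_step_gamma(chain[-1], log_target))
```

Because the Gamma proposal is asymmetric, the proposal densities do not cancel and must be evaluated in both directions, as the code shows.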

3.3 Pseudo Code

In this section, we describe in pseudo code the R functions and procedures employed for the sampling algorithm.

3.3.1 Prior Specification

We specify the prior distribution for Φ = (θ_1, θ_2, φ) as trivariate lognormal, as described previously in section 3.1.

3.3.2 Melding

The goal here is to compute the melded density for inputs given by (3.4). This proceeds in several steps:

1. Denote by Φ the N × 3 matrix of realizations of Φ, where N denotes the number of realizations generated for each parameter. The bivariate lognormal density p_1(θ_1, θ_2) is then obtained by applying the dlnorm.rplus function from the compositions library to the first two columns of Φ, which correspond to the samples of θ_1 and θ_2. This function takes

       µ = (µ_1, µ_2)  and  Σ = [ σ_1²          0.5(σ_1 σ_2) ]
                                [ 0.5(σ_1 σ_2)  σ_2²         ]

   as arguments and yields density values corresponding to the marginal joint density of θ_1 and θ_2, denoted p_1(θ_1, θ_2).

2. We next compute the induced density for φ, that is, p_1*(φ). Applying the model φ = M(θ) = θ_1 − θ_2 to the realizations in Φ, we compute the numerical induced density for φ using the density() function, where we select 0 as the left endpoint. Use of the approx() function then allows this induced density to be evaluated at a given value (or values) of φ, where approx() takes the grid of realizations of φ and the corresponding density values produced by density(). Note that the grid for φ is that produced by applying the model M to the realized θ vectors; it does not involve the samples of φ in the third column of Φ.

3. Next, the stated premodel prior for φ, that is, p_2(φ), must be evaluated. The third column of Φ corresponds to samples of φ, so applying the dlnorm function to this column yields the numerical density of p_2(φ); in this case, the function takes µ_2 and σ_2² as parameters. As in step 2, use of the approx function then allows p_2(φ) to be evaluated on the grid corresponding to φ = M(θ) = θ_1 − θ_2.

4. Finally, the melded prior density p̃(θ_1, θ_2) given by (3.4) can be computed. The original samples contained in Φ are employed again to form a grid for θ_1, θ_2, and φ. We are concerned only with the grid defined by θ_1 and θ_2, however, as these exactly define the space of the melded prior density; hence, the samples of φ can be ignored apart from step 3, where they are employed to evaluate the stated prior for φ. Using the approximate evaluation routines described in steps 1-3, for given realizations of θ_1, θ_2, and φ = M(θ) = θ_1 − θ_2, we can compute p_1(θ_1, θ_2), p_1*(M(θ)), and p_2(M(θ)), which are then combined according to (3.4) to yield the melded prior for θ_1 and θ_2. Note that the proportionality constant k can be ignored, as it cancels in the Metropolis-Hastings ratio. Having obtained a numerical density for the melded prior for inputs, the interpp() function of the akima library can then be employed to evaluate p̃(θ_1, θ_2) as necessary in the Metropolis-Hastings ratio.

Note finally that existing R functions are employed to evaluate the bivariate and univariate lognormal densities (using dlnorm.rplus and dlnorm, respectively) even though closed forms of these densities are available; this avoids potential numerical instabilities from novel coding and, correspondingly, the pitfall of human programming error.

3.3.3 Gibbs Sampler

The Gibbs sampler is implemented primarily via a for loop, with each iteration performing two Metropolis-Hastings steps (j = 1, 2 for θ_j) in succession.
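Steps 1-4 above can be sketched compactly (a Python illustration only: scipy's multivariate_normal, gaussian_kde, and lognorm stand in for the R functions dlnorm.rplus(), density()/approx(), and dlnorm(); the hyper-parameter values are those implied by the ψ and ω of Section 3.1):

```python
import numpy as np
from scipy.stats import gaussian_kde, lognorm, multivariate_normal

rng = np.random.default_rng(11)

# Hyper-parameters implied by psi = (100, 50) and omega = (50000, 25000)
# through the lognormal moment equations of Section 3.1.
mu = np.array([3.7093, 2.7131])
s1, s2 = 1.3386, 1.5485
Sigma = np.array([[s1 * s1, 0.5 * s1 * s2],
                  [0.5 * s1 * s2, s2 * s2]])

# Step 1: bivariate lognormal prior p1(theta1, theta2) -- the normal
# density of the logs times the Jacobian 1/(theta1 * theta2).
mvn = multivariate_normal(mean=mu, cov=Sigma)
def p1_joint(t1, t2):
    return mvn.pdf(np.log([t1, t2])) / (t1 * t2)

# Step 2: induced density p1*(phi) for phi = M(theta) = theta1 - theta2,
# estimated by KDE over premodel realizations (positive part only,
# mirroring the left endpoint of 0 chosen for density() in the text).
theta = np.exp(rng.multivariate_normal(mu, Sigma, size=50_000))
phi_draws = theta[:, 0] - theta[:, 1]
induced = gaussian_kde(phi_draws[phi_draws > 0])

# Step 3: stated premodel prior p2(phi), univariate lognormal (mu_2, sigma_2).
p2 = lognorm(s=s2, scale=np.exp(mu[1])).pdf

# Step 4: unnormalized melded prior of equation (3.4) with alpha = 0.5;
# the constant k cancels in the Metropolis-Hastings ratio.
def melded(t1, t2, alpha=0.5):
    ph = t1 - t2
    if ph <= 0:
        return 0.0
    return float(p1_joint(t1, t2) * (p2(ph) / induced(ph)) ** (1 - alpha))
```

As in the R implementation, the melded density is zero wherever θ_1 ≤ θ_2, since the model then implies a negative dissipation φ.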
As noted above, the interpp() function is employed twice in each Metropolis-Hastings step to evaluate the melded prior p̃(θ_1, θ_2). Samples from the Gamma proposal distribution are obtained using rgamma(), with the proposal density evaluated using dgamma().

The exponential likelihoods are evaluated using dexp(). Finally, a logical flow structure is employed to avoid unnecessary calculations and improve efficiency: incorrect samples from the proposal are rejected immediately, thereby avoiding expensive and unnecessary interpolations. See the Appendix for the exact R code employed.

3.4 Results

With 36 compartments comprising the Chesapeake Bay mesohaline network dataset, the data inform the posterior samples more strongly than in the case of the Cone Spring data of chapter 1. For Bayesian melding applied to the Chesapeake Bay ENA data, we employed starting values of θ_1^(0) = and θ_2^(0) = . The resulting Gibbs sampler was run for 5000 subsequent iterations, yielding a vector of length 5001 of posterior samples for each parameter. In this case, the iterations required 7 h 38.8 min, a time comparable to other runs and better than the time required in the Cone Spring example. As before, the algorithm was run on a machine with an Intel Core 2 Duo T7300 chipset operating at 2.00 GHz with 2.00 GB of memory and running Windows Vista Service Pack 1. The first 500 observations were omitted as the burn-in period, and the tuning parameter δ mentioned in section 3.2.3 was simply set to zero. The bivariate acceptance rate was 33.3%, with marginal acceptance rates of 43.4% and 41.1% for θ_1 and θ_2, respectively. Note that the posterior samples for θ_1 correspond to θ_1 | θ_2, W, U, R, and similarly for θ_2. Applying the model M to these samples yields φ = θ_1 − θ_2; the samples for all three parameters are shown in the trace plot of Figure 3.1. Histograms of the posterior distributions are shown in Figure 3.2, with the corresponding premodel lognormal prior overlaying each. Note that the (melded) posteriors preserve some of the right-skewness of the priors, though they otherwise exhibit symmetry.
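Sample autocorrelation, used for the convergence diagnostics throughout this paper, can be computed directly; the following Python sketch uses a synthetic AR(1) series as a stand-in for MCMC output and also shows the effect of thinning the chain:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_k = gamma_hat(k) / gamma_hat(0)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    g0 = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / n / g0
                     for k in range(max_lag + 1)])

# Synthetic AR(1) series (coefficient 0.9) standing in for MCMC output.
rng = np.random.default_rng(5)
chain = np.empty(20_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

acf_full = sample_acf(chain, 10)
acf_thin = sample_acf(chain[::10], 10)   # thin: keep every 10th draw
```

Thinning reduces the lag-1 autocorrelation (here from roughly 0.9 toward 0.9¹⁰ ≈ 0.35) at the cost of discarding most of the samples, which mirrors the trade-off discussed for the Chesapeake Bay chains below.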
The marginal melded priors for θ_1 and θ_2, shown in Figure 3.3 (a) and (b), respectively, exhibit much more pronounced right-skewness, which is only weakly preserved in the corresponding posterior densities of Figure 3.3 (c) and (d), the kernel-smoothed densities of the histograms in Figure 3.2. The joint posterior distribution for θ_1 and θ_2 shown in Figure 3.4 similarly does not exhibit any extreme skewness, though the tails appear more dispersed for larger values

Figure 3.1: Trace plot showing Markov chain Monte Carlo output for posterior samples of the model input and output parameters θ_1, θ_2, and φ via Bayesian melding.

of each parameter. As with the marginal posterior densities, the joint distribution is unimodal. Note that the joint density is zero where θ_1 ≤ θ_2. The autocorrelations for samples of θ_1 shown in Figure 3.2 (d) are unfortunately very high, indicating that thinning of the chain is required to obtain more accurate variance estimates. However, thinning the chain by removing successive observations yields no great improvement: autocorrelations decay only very slowly even with half of all observations removed, and significant correlations remain until approximately lag 25 in this case. Such high autocorrelations suggest slow convergence; however, as Figure 3.5 shows, running two chains of length 5000 with different starting values⁹ results in relatively good mixing. As evident from the figure, the trace plots for each run overlap considerably, which indicates that, autocorrelations aside, the Gibbs samples are converging to the stationary posterior. Notably,

9 In this case, the same chain as mentioned above and another with θ_1^(0) = and θ_2^(0) =

Figure 3.2: Histograms of Bayesian melding posterior distributions for (a) θ_1, (b) θ_2, and (c) φ, with premodel prior distributions shown as solid lines. The first 500 observations are omitted as a burn-in period. Sample autocorrelations for θ_1 are shown in (d).

this does not suggest that tuning (i.e., δ ≠ 0) was required to achieve convergence for this particular dataset. We obtain summary statistics for the Chesapeake Bay data as given in Table 3.1. As before, note that normal standard errors are given for the data, whereas the standard deviations for the posterior omit the burn-in period and are, in any case, naive and significantly negatively biased. In order to obtain accurate variance estimates for the posterior samples, we would have to eliminate most of the significant autocorrelations, requiring the removal of 80% or more of each chain. For example, when some 4040 observations in each chain were removed, leaving only 960 remaining, non-significant autocorrelations were achieved at lag 9 for θ_1 and lag 10 for θ_2; this is shown for θ_1 in Figure 3.6 (a). Although the decay of autocorrelations

Figure 3.3: Melded priors for θ_1 (a) and θ_2 (b), with posterior densities for each in (c) and (d), respectively, the latter obtained with the first 500 observations omitted as a burn-in period.

exhibited following thinning is much faster, it is desirable to achieve an even faster decay. However, since we are now left with fewer than 1000 posterior samples, running the Markov chains for more iterations is clearly desirable unless tuning is employed. Following thinning (removal of the first 100 samples as a burn-in period, then keeping only every 9th observation for θ_1 and every 10th for θ_2), the posterior means for θ_1 and θ_2 are and 64.06, with posterior standard deviations 9.25 and 8.48, respectively. The posterior means are largely unchanged from those given in Table 3.1, with that for θ_2 identical at three digits of precision. In contrast, the standard deviation of θ_1 is increased, as we would expect given the high autocorrelation of the raw Monte Carlo output. Conversely, the standard deviation for θ_2 is actually decreased, which we might attribute to the presence of negative autocorrelations in the thinned data with 80% removed. This occurs

Figure 3.4: Joint posterior distribution of θ_1 and θ_2 with the initial 500 observations removed as a burn-in period. Density values increase from red to orange to yellow.

for both θ_1 and θ_2, with the negative autocorrelations for the former shown in Figure 3.6. Note that the samples for φ are not as highly correlated, with non-significant autocorrelation obtained at lag 6. Removing the initial 500 observations as a burn-in period and then taking every 6th observation leaves 750. The adjusted posterior mean and standard deviation are and 4.22, respectively, the latter of which is increased from that obtained without any thinning. Lastly, although this method has eliminated the autocorrelation of the posterior samples, only 96 observations remain, indicating that a longer chain (and subsequent thinning) and (or) proper tuning are required to obtain a good approximation to the posterior density.

Finally, as in the previous examples, we obtain quantiles for the model parameters and corresponding 95% highest posterior density (HPD) regions, which are given in Table 3.1. For θ_1 we obtain 2.5%, 50.0%, and 97.5% quantiles of 71.67, 88.70, and , respectively, with corresponding quantiles for θ_2 of 47.72, 63.62, and , and for φ of 18.23, 24.54, and . The 95% HPD intervals for θ_1, θ_2, and φ are (71.56,

Figure 3.5: Trace plot showing two different runs of Markov chain Monte Carlo output from different starting values for the model input and output parameters θ_1, θ_2, and φ. The significant overlap (apart from the burn-in period) between the chains for each run indicates convergence to the stationary posterior.

), (48.32, 82.61), and (17.99, 33.38), respectively. Note that these are considerably narrower than the frequentist 95% confidence intervals for the data means (i.e., E(W_i | θ_1) = θ_1 and E(U_i | θ_2) = θ_2) based on the Central Limit Theorem¹⁰, which are also given in Table 3.1. Similar to the Cone Spring results, the data, premodel prior, and posterior means in each case lie within the HPD regions for each input parameter, except for the prior mean for φ. Note that these results comprise posterior distributions for the expected energy into, out of, and dissipated from the Chesapeake Bay mesohaline network ecosystem, denoted by the parameters θ_1, θ_2, and φ, respectively. The posterior means given in Table 3.1 (both before and after thinning of the Monte

10 These correspond to W̄ ± z_{0.05} s(W)/√n, where W̄ denotes the sample mean of W_i and s(W) the sample standard deviation of W_i.


More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9 Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods

More information

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, )

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, ) Econometrica Supplementary Material SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, 653 710) BY SANGHAMITRA DAS, MARK ROBERTS, AND

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017 Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the

More information

Doing Bayesian Integrals

Doing Bayesian Integrals ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract Bayesian Estimation of A Distance Functional Weight Matrix Model Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies Abstract This paper considers the distance functional weight

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1 Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

Learning the hyper-parameters. Luca Martino

Learning the hyper-parameters. Luca Martino Learning the hyper-parameters Luca Martino 2017 2017 1 / 28 Parameters and hyper-parameters 1. All the described methods depend on some choice of hyper-parameters... 2. For instance, do you recall λ (bandwidth

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 5. Bayesian Computation Historically, the computational "cost" of Bayesian methods greatly limited their application. For instance, by Bayes' Theorem: p(θ y) = p(θ)p(y

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo 1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

The Recycling Gibbs Sampler for Efficient Learning

The Recycling Gibbs Sampler for Efficient Learning The Recycling Gibbs Sampler for Efficient Learning L. Martino, V. Elvira, G. Camps-Valls Universidade de São Paulo, São Carlos (Brazil). Télécom ParisTech, Université Paris-Saclay. (France), Universidad

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions R U T C O R R E S E A R C H R E P O R T Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions Douglas H. Jones a Mikhail Nediak b RRR 7-2, February, 2! " ##$%#&

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Advanced Statistical Methods. Lecture 6

Advanced Statistical Methods. Lecture 6 Advanced Statistical Methods Lecture 6 Convergence distribution of M.-H. MCMC We denote the PDF estimated by the MCMC as. It has the property Convergence distribution After some time, the distribution

More information

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Yong Huang a,b, James L. Beck b,* and Hui Li a a Key Lab

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Appendix: Modeling Approach

Appendix: Modeling Approach AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

MCMC for non-linear state space models using ensembles of latent sequences

MCMC for non-linear state space models using ensembles of latent sequences MCMC for non-linear state space models using ensembles of latent sequences Alexander Y. Shestopaloff Department of Statistical Sciences University of Toronto alexander@utstat.utoronto.ca Radford M. Neal

More information

MCMC notes by Mark Holder

MCMC notes by Mark Holder MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

an introduction to bayesian inference

an introduction to bayesian inference with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena

More information

An introduction to Bayesian statistics and model calibration and a host of related topics

An introduction to Bayesian statistics and model calibration and a host of related topics An introduction to Bayesian statistics and model calibration and a host of related topics Derek Bingham Statistics and Actuarial Science Simon Fraser University Cast of thousands have participated in the

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

arxiv: v1 [stat.co] 23 Apr 2018

arxiv: v1 [stat.co] 23 Apr 2018 Bayesian Updating and Uncertainty Quantification using Sequential Tempered MCMC with the Rank-One Modified Metropolis Algorithm Thomas A. Catanach and James L. Beck arxiv:1804.08738v1 [stat.co] 23 Apr

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Theory of Stochastic Processes 8. Markov chain Monte Carlo

Theory of Stochastic Processes 8. Markov chain Monte Carlo Theory of Stochastic Processes 8. Markov chain Monte Carlo Tomonari Sei sei@mist.i.u-tokyo.ac.jp Department of Mathematical Informatics, University of Tokyo June 8, 2017 http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)

More information

ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors. RicardoS.Ehlers

ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors. RicardoS.Ehlers ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors RicardoS.Ehlers Laboratório de Estatística e Geoinformação- UFPR http://leg.ufpr.br/ ehlers ehlers@leg.ufpr.br II Workshop on Statistical

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information