An Application of Bayesian Melding to Ecological Networks. Joshua Michael Gould. A research paper presented to the University of Waterloo.


An Application of Bayesian Melding to Ecological Networks

by Joshua Michael Gould

A research paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master of Mathematics in Statistics-Biostatistics

Waterloo, Ontario, Canada, 2008

Abstract

This paper considers Ecological Network Analysis (ENA) data (Ulanowicz, 2004) in the context of statistical inference. Background for this data type is provided, and a framework for statistical inference based on such data is developed. This framework comprises a method referred to as Bayesian melding, developed by Poole & Raftery (2000) to combine prior information with information induced by deterministic dynamics models to arrive at a melded Bayesian prior. We describe and illustrate this method in the context of ENA data; the method incorporates the melded prior into Bayesian inference and provides a statistical interpretation for a deterministic mass balance model, with reference to quantities for inference. For posterior inference, we further require iterative Gibbs sampling incorporating the Metropolis-Hastings algorithm, a type of Markov chain Monte Carlo simulation. We describe this sampling algorithm and implement it for two ENA datasets, observed from Cone Spring (Tilly, 1968) and the Chesapeake Bay mesohaline network (Baird & Ulanowicz, 1989). In each case, the ENA data quantify flows of energy or materials among different compartments of one or more species within a local ecosystem. The implementations described here consider the deterministic model wherein the expected medium dissipated from an ecosystem equals the difference between the expected medium into it and the expected medium out of it.

Acknowledgements

My most sincere thanks go to Dr. Grace Chiu for her detailed suggestions, advice, and support, both financial and organizational. I am also grateful for the resources and support provided by the Department of Statistics & Actuarial Science. Thanks also to Dr. Hugh Chipman for serving as second reader, and to Mary Lou Dufton for answering my many questions and helping with some last-minute delays. To my friends and family: I could not have done this without your support. Thank you.

Table of Contents

Abstract
Acknowledgements
Table of Contents
1 Introduction
  1.1 Inference for Ecological Networks
    1.1.1 Assumptions for Statistical Inference
  1.2 Cone Spring Example
    1.2.1 Results
2 Methods
  2.1 Bayesian Melding
    2.1.1 Notation and Method
    2.1.2 Metropolis-Hastings Algorithm
  2.2 Melding Example
3 Modelling and Implementation
  3.1 Prior Specification
  3.2 Sampling Algorithm
    3.2.1 Melding
    3.2.2 Gibbs Sampler
    3.2.3 Metropolis-Hastings
    3.2.4 Remarks
  3.3 Pseudo Code
    3.3.1 Prior Specification
    3.3.2 Melding
    3.3.3 Gibbs Sampler
  3.4 Results
4 Conclusions
  4.1 Future Work
Bibliography
Appendix

1 Introduction

At the level of a whole ecosystem, the research approach known as ecological network analysis (ENA) considers the transfer and dissipation of material and energy among and from the species comprising the system (Ulanowicz, 2004). Such trophic exchanges pertain to physical and chemical variables (e.g. nitrogen content, energy, etc.) exchanged among different organisms or groups of organisms in an ecosystem (i.e. different trophic levels), as in what is commonly known as a food chain. The corresponding data provide information about exogenous inputs to and outputs from the system for each species, as well as transfers to and from each species. A crucial component of this approach is the use of a deterministic linear balance model; it assumes that the interdependence of units (i.e. species or groups of species) in the system conserves mass or energy. In such a balance model, all inputs to a unit exactly balance all outputs, an assumption derived from thermodynamic theory. Note further that ENA often refers to such units as compartments, each defining a trophic level and in turn consisting of one or more species. For example, data for a given ecosystem might include two compartments, one each for all plants and for all animals, or the data might include a separate compartment for each species. However, ENA research has not yet incorporated stochastic modelling to any great degree, and the balance model approach itself lacks a statistical framework for assessing its suitability. In this paper, we consider the deterministic balance model from the standpoint of statistical inference. The approach employed here is Bayesian melding (Poole & Raftery, 2000), a method for proper statistical inference for deterministic dynamics models. In Bayesian melding, different sources of prior information are pooled in a sound probabilistic manner, taking into account not only prior knowledge about model inputs and outputs,

but also the impact of the model itself. This melding of premodel prior information and model-induced information gives rise to a melded prior distribution, which can then be employed as a usual Bayesian prior to arrive at posterior distributions for model inputs and outputs. Since Bayesian melding often yields priors for which analytic forms are unavailable, posteriors must be evaluated numerically. This entails the use of computationally intensive techniques such as Markov chain Monte Carlo (MCMC) to sample from posteriors. To demonstrate this, we present the results of applying Bayesian melding to a simple toy example and to two ENA datasets. The remainder of this chapter provides further background concerning ecological networks, the statistical approach taken, and an example of Bayesian melding for the simple ecosystem of Cone Spring (Tilly, 1968). Chapter 2 describes Bayesian melding in greater detail, along with the necessary computational techniques and a simple toy example. In Chapter 3, the results of melding implemented for data from the Chesapeake Bay mesohaline ecosystem (Baird & Ulanowicz, 1989) are presented, and the sampling algorithm is described in detail along with the computational methods employed. Note that all results were obtained using the R statistical computing package. Lastly, some general conclusions are provided in Chapter 4, along with prospects for future work.

1.1 Inference for Ecological Networks

As noted previously, the ENA approach does not incorporate its underlying assumptions in a statistically rigorous manner. In particular, conclusions made for a network depend on the balance model that equates the inflow and outflow of a certain medium for each species (or, more generally, each compartment) relative to others. A simple version of this physical model, taken from Ulanowicz (2004), assumes that in a given ecosystem the medium in balances the medium out exactly.
So, for species i in an ecosystem with n species, the physical balance model is as follows:

    X_i + T_{+i} = T_{i+} + E_i + R_i,    i = 1, ..., n    (1.1)

where X_i and E_i are the rates of exogenous transfer to and from the species, respectively, and T_{+i} and T_{i+} are the rates of transfer from other species to species i and from species i to other species, respectively. Finally, R_i denotes the rate of dissipation from species i. Note that exogenous transfers to a species, X_i, represent external inputs to the system, whereas exogenous transfers from a species, E_i, denote external outputs from the system which are still useful to other ecosystems of comparable scale. In contrast, dissipations R_i represent outputs from each species that are no longer useful, such as heat dissipated during respiration (Ulanowicz, 2004). In the context of energy, then, the balance model (1.1) maintains the law of conservation of energy, with usable energy inputs to the system exactly balancing all outputs from the system, consisting of both usable outputs and that which is lost. Necessarily, then, all terms in the model are assumed to be non-negative. It should be noted, however, that the thermodynamic balance of model (1.1) is not typically observed in practice. Instead, only a subset of the variables in a model such as (1.1) are observed ("input variables"), with the remaining "output variables" then deduced or estimated by a balancing algorithm (see Ulanowicz, 2004) which attempts to solve the system of equations in (1.1) to achieve balance for each species and over all important media (energy, nutrients, biomass, etc.). This balance assumption is illustrated with the small five-compartment dataset from Cone Spring (Tilly, 1968), where we have the following observed data:

    X = (11184, 0, 0, 0, 635)    E = (300, 355, 0, 0, 860)    (1.2a)

    T = [5 x 5 matrix of intercompartmental transfers T_ij; entries not reproduced]    (1.2b)

Applying the physical balance model (1.1) yields the deduced or unobserved data

    R = (2003, 3275, 1814, 203, 3109)

In this case, no special algorithm is required to solve for R. When some values of X_i, E_i, T_ij, and R_i are unobserved for several (i, j) combinations simultaneously, a special algorithm will be required. For the purposes of interpreting the data presented in this essay, we consider R_i to be unobserved, with the other values observed. The use of such algorithms to achieve balance, however, ignores sources of variability in the input such as measurement error and the use of multiple reference sources. Although such uncertainty has been subjected to sensitivity analyses via perturbation of input values (Bundy, 2005; Essington, 2007), the resulting confidence intervals for model output variables do not address statistical inference, and so do not necessarily pertain to the ecosystems under study.

1.1.1 Assumptions for Statistical Inference

Ultimately, the assumption of linear balance in models such as Ulanowicz's remains a concern for the ENA approach (Dame & Christian, 2006). Setting aside this concern, for our statistical approach we first consider each of the terms in (1.1) as random variables. Rather than assuming balance for each species i, our statistical model assumes that the average medium in balances the average medium out. That is,

    α + β_in = β_out + ε + φ
    E(X_i) + E(T_{+i}) = E(T_{i+}) + E(E_i) + E(R_i)    (1.3)

where α, β_in, β_out, ε, and φ denote the expectations of X_i, T_{+i}, T_{i+}, E_i, and R_i, respectively. This statistical version of the balance model takes into account the fact that certain quantities are unmeasurable or only indirectly observable. To illustrate our statistical approach, we consider that the dissipation from the system, R_i, is unmeasurable and must be deduced from balance.
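The deduction of R is a one-line computation per compartment; the following minimal Python sketch (the paper's own computations were done in R) uses the plant compartment of Cone Spring, for which X_1 = 11184 and E_1 = 300 appear in (1.2), and the transfer totals T_{+1} = 0, T_{1+} = 8881 are quoted in Section 1.2:

```python
def dissipation(x, t_in, t_out, e):
    # Solve the balance model (1.1) for the dissipation:
    #   R_i = X_i + T_{+i} - (T_{i+} + E_i)
    r = x + t_in - (t_out + e)
    if r < 0:
        raise ValueError("balance model (1.1) requires non-negative dissipation")
    return r

# Compartment 1 (plants) of Cone Spring, in kcal/m^2/y
r1 = dissipation(11184, 0, 8881, 300)
print(r1)  # 2003, the first entry of the deduced vector R
```

The non-negativity check mirrors the thermodynamic assumption that every term in (1.1) is non-negative.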
Since such deduced (output) values are not actual data, inference made on observed (input) parameters does not translate directly to inference on output

parameters. That is, our statistical model does not assume that X_i + T_{+i} = T_{i+} + E_i + R_i as in (1.1) or, equivalently, that R_i = X_i + T_{+i} - (T_{i+} + E_i). Instead, it assumes (1.3), so that inference made on X_i, T_{+i}, T_{i+}, and E_i does not directly correspond to inference on R_i. In Bayesian melding (Poole & Raftery, 2000), we consider input and output variables as random, with means θ and φ for inputs and outputs, respectively. The model M(·) defines the deterministic relationship between these parameters, with M(θ) = φ. Now, under a conventional empirical approach that ignores M(·), one would provide a premodel prior distribution for (θ, φ) and a set of likelihoods for input/output data specified according to identified properties. However, to incorporate the deterministic model M with the entirely empirical approach, the premodel prior for θ is now mapped by M onto an induced prior for φ. The premodel and induced priors for φ are then melded, yielding a melded prior for φ, which is then mapped to a melded prior for θ using M^(-1), the model inverse function. Finally, Bayes' rule is applied to obtain a posterior for θ, which is subsequently mapped to φ by M to yield a posterior for φ. Inference is then made from these posteriors. In the work of Poole & Raftery (2000), Bayesian melding has been applied in the case of deterministic dynamics models in which a single equation evolves over time. For ENA, multiple linear equations describe the state of an ecosystem at a single point in time. Hence, by assuming balance for expectations as in (1.3), we combine these multiple equations into one. This essay features our first attempt at employing Bayesian melding in such a context, involving the following statistical model, simplified from (1.3):

    θ_1 = α + β_in
    θ_2 = ε + β_out
    φ = M(θ) = θ_1 - θ_2    (1.4)

where θ = (θ_1, θ_2) denotes the inputs and φ the output.
Correspondingly, we write the totals of all transfers to and from a species i as single random variables, denoted W_i = X_i + T_{+i} and U_i = T_{i+} + E_i. Thus the model can be written equivalently as M(θ) = θ_1 - θ_2 = φ. Note that this model M is not invertible and that, for simplicity, all terms in (1.3) are assumed to be strictly positive. (Such an assumption is realistic

for most major media.) As we will discuss in Sections 2.1 and 3.2, we employ a technique proposed by Poole & Raftery (2000) to handle non-invertibility when computing the melded prior for θ. We now consider a simple ENA example in the context of our model (1.4) and Bayesian melding.

1.2 Cone Spring Example

To illustrate the application of Bayesian melding to a simple ecological network, we consider the five-unit ecosystem of Cone Spring (Tilly, 1968). As shown in Figure 1.1, the units (or compartments) correspond not to individual species but to groups of different types of organisms, each occupying a different ecological niche: plants, detritus, bacteria, detritivores, and carnivores. In this case, the network consists of energy flows among the compartments and into and out of the system, measured in kcal/m^2/y. In the flow diagram of Figure 1.1, exogenous inputs X_i are represented by arrows not emerging from any other compartment, whereas arrows not entering a compartment represent exogenous outputs E_i. Dissipative respirations R_i are represented by ground symbols. All remaining arrows denote transfers among the n = 5 compartments, with T_{+i} = Σ_{j=1}^n T_ji denoting the sum of all energy transfers to a compartment and T_{i+} = Σ_{j=1}^n T_ij the sum of all energy transfers out of a compartment. As noted previously, we simplify these variables by writing W_i = X_i + T_{+i} and U_i = T_{i+} + E_i, which denote all inputs to and outputs from the ith compartment, respectively. For example, for compartment 1, representing plants in the Cone Spring ecosystem, we have exogenous inputs X_1 = 11184 kcal/m^2/y, with transfers from other compartments T_{+1} = 0, yielding W_1 = 11184. Additionally, exogenous outputs E_1 = 300, with transfers to other compartments T_{1+} = 8881, yielding U_1 = 9181. Finally, dissipations from plants are given by R_1 = 2003. We are interested in making inference on the parameters of model (1.4).
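The compartment-1 bookkeeping just described is easy to verify; a small Python sketch (illustrative only; the paper's analyses were run in R):

```python
# Compartment 1 (plants) of Cone Spring, in kcal/m^2/y
X1, T_in1 = 11184, 0     # exogenous input; transfers from other compartments
T_out1, E1 = 8881, 300   # transfers to other compartments; exogenous output

W1 = X1 + T_in1          # all inputs to compartment 1
U1 = T_out1 + E1         # all (usable) outputs from compartment 1

print(W1, U1)            # 11184 9181
print(W1 - U1)           # 2003, which equals R_1, consistent with balance (1.1)
```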
In the Bayesian context, we specify a joint pre-(deterministic-)model distribution for θ_1, θ_2, and φ, which represent the expected medium (in this case, energy) into, out of, and dissipated from the ecosystem. Note that the model parameters are not assumed to be independent. In the context of data collection, however, the dissipations R_i and their expectation E(R_i) should not be deterministically influenced by {W_i, U_i} or θ = (E(W_i), E(U_i)). Hence φ has a premodel prior distribution arising as a marginal of the joint prior mentioned previously. Conversely, the model M(·)

Figure 1.1: Flow diagram (Ulanowicz, 2004) displaying the trophic exchanges of energy (kcal/m^2/y) in the Cone Spring ecosystem (Tilly, 1968). Arrows not emerging from a box denote exogenous inputs X_i, whereas those not entering a box denote exogenous outputs E_i. Ground symbols denote dissipations R_i.

coerces φ to equal θ_1 - θ_2, and so induces a separate prior for φ from the bivariate joint premodel prior for θ. Additionally, we must specify appropriate likelihoods for the data type. For an ecological network such as Cone Spring, we specify a trivariate lognormal prior for the model parameters and exponential likelihoods for the data, with the inverses of θ_1, θ_2, and φ serving as the rate parameters for these likelihoods. These specifications are described explicitly in chapter 3. For our purposes in this essay, it suffices to mention a few further details about the methods required for Bayesian inference on θ_1, θ_2, and φ. First, note that we rescaled the data down by 10^3 so as to avoid numerical difficulties involving large values under the lognormal prior parameterizations mentioned above. Additionally, since balance is assumed only for the expectations rather than for the data, W_i | θ_1, U_i | θ_2, and R_i | φ are assumed to be mutually independent. Note further that, since θ_1, θ_2, and φ are all positive, and φ = θ_1 - θ_2 > 0, we have that θ_1 > θ_2. Finally, in order to sample from the posterior distribution for model inputs (i.e. for θ_1, θ_2 | W, U, R), we must first compute the melded prior for θ_1 and θ_2, and subsequently use this melded prior in each cycle of a Gibbs sampler to obtain samples from the posterior. The procedures

required for Bayesian melding are described in detail in chapter 2 (also see Poole & Raftery (2000)), whereas the overall Gibbs sampling algorithm is given in chapter 3. Suitable references concerning Markov chain Monte Carlo and Gibbs sampling include Givens & Hoeting (2005) and Hoff (2007).

1.2.1 Results

Since the Cone Spring data comprise only five compartments and, hence, five observations, posterior samples for the model parameters are not greatly informed by the data. The following results should therefore be taken principally as illustrations of the type of output that is obtained when Bayesian melding is applied to the ENA data type. With uniform random starting values of θ_1^(0) = 49.2 and θ_2^(0) = 3.9, 5000 subsequent iterations of full Gibbs cycles were run, leading to two vectors of 5001 posterior samples, one for each parameter. Specifically, each corresponds to samples from the full conditional posterior of the θ parameters, i.e. θ_1 | θ_2, W, U, R and similarly for θ_2 | θ_1, W, U, R, which are shown in Figure 1.2. Applying the model M to these samples yields posterior samples for φ = θ_1 - θ_2, which are also shown in Figure 1.2. The Gibbs sampling algorithm ran relatively slowly, perhaps due to the lack of data, requiring just under 9.7 hours to complete on a machine with an Intel Core 2 Duo T7300 operating at 2.00 GHz with 2.00 GB of memory and running Windows Vista Service Pack 1. The same machine was used for all subsequent runs. Histograms of the posterior distributions are shown in Figure 1.3. In each case the first 500 samples are omitted as a suitable burn-in period, during which the chain has not yet converged to the stationary posterior distribution. Note that, for each parameter, both the posterior and premodel prior distributions are right-skewed; the posterior distributions, however, display significantly reduced variance and reduced means as well. The joint posterior distribution for θ_1 and θ_2 is also shown in Figure 1.4.
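The full sampler is specified in Chapter 3; purely to illustrate its structure, the following Python sketch alternates Metropolis-Hastings updates of θ_1 | θ_2 and θ_2 | θ_1 against stand-in log conditionals (hypothetical normal targets, not the paper's melded posteriors) that respect the constraint θ_1 > θ_2 > 0 implied by φ > 0:

```python
import math
import random

random.seed(1)

def mh_update(current, log_target, scale=0.5):
    """One random-walk Metropolis step with a symmetric normal proposal."""
    proposal = current + random.gauss(0.0, scale)
    log_r = log_target(proposal) - log_target(current)
    return proposal if math.log(random.random()) < log_r else current

def gibbs(theta1, theta2, n_iter, log_cond1, log_cond2):
    """Alternate MH updates of theta1 | theta2, data and theta2 | theta1, data."""
    chain = [(theta1, theta2)]
    for _ in range(n_iter):
        theta1 = mh_update(theta1, lambda t1: log_cond1(t1, theta2))
        theta2 = mh_update(theta2, lambda t2: log_cond2(theta1, t2))
        chain.append((theta1, theta2))
    return chain

# Stand-in full conditionals (hypothetical): normal log densities truncated
# to the ordering theta1 > theta2 > 0.
def log_cond1(t1, t2):
    return -0.5 * (t1 - 6.5) ** 2 if t1 > t2 else -math.inf

def log_cond2(t1, t2):
    return -0.5 * (t2 - 3.7) ** 2 if 0 < t2 < t1 else -math.inf

chain = gibbs(49.2, 3.9, 5000, log_cond1, log_cond2)  # 5001 states in total
```

With the starting value θ_1^(0) = 49.2 far in the tail, the chain takes some hundreds of iterations to reach the region of high density, which is why a burn-in portion is discarded.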
It is notably right-skewed (i.e. toward larger values of each parameter) and unimodal, and cuts off along the diagonal defining the space {(θ_1, θ_2) : θ_1 > θ_2}. In Figure 1.3 (d), the sample autocorrelations are shown for the posterior samples of θ_1; the evident high degree of autocorrelation suggests slow convergence of the Markov chain to the posterior distribution, indicating that running multiple or simply much longer chains is desirable for future analyses. Some specific comparisons are relevant to mention. In Table 1.1, means and standard deviations (naive,

Figure 1.2: Trace plot showing Markov chain Monte Carlo output for model input parameters θ_1 and θ_2 and output parameter φ.

negatively biased SDs in the case of the posterior) are given for each of the premodel priors and posterior distributions, with the standard errors given for the Cone Spring data. (In Table 1.1, "data given parameter" refers to the vectors W, U, and R.) These are normal standard errors, that is, the data standard deviations divided by the square root of the number of observations n = 5. As is evident from the table, the posterior means are below the prior means for all three parameters, and are closer in magnitude to those of the data. Naive standard deviations for the posterior samples are also lower than for the prior specifications, but since they are subject to significant negative bias due to high autocorrelations, they are not strictly comparable. In Chapter 3, we obtain more accurate posterior variance estimates by thinning the Monte Carlo samples to remove this autocorrelation.

Figure 1.3: Histograms of Bayesian melding posterior distributions for (a) θ_1, (b) θ_2, (c) φ, with premodel prior distributions shown as solid lines. The first 500 observations are omitted as a burn-in period. Sample autocorrelations for θ_1 are shown in (d).

Additionally, the R package boa (Smith, 2007) provides a variety of descriptive statistics and convergence diagnostics for MCMC output. Of interest here are quantiles of the full conditional posterior samples as well as 95% highest posterior density (HPD) regions for the θ and φ parameters. These regions correspond to the narrowest possible intervals containing 95% of the posterior probability (Givens & Hoeting, 2005); the HPD intervals are computed using the Monte Carlo method of Chen and Shao (1999), which assumes a unimodal marginal posterior distribution. For θ_1, the boa package yields 2.5%, 50.0%, and 97.5% quantiles of 4.05, 6.53, and 10.97, with a 95% HPD interval of (3.81, 10.24). The corresponding quantiles for θ_2 are 1.93, 3.71, and 7.30, with 95% HPD interval (1.79, 6.83). Finally, the quantiles for φ are 1.17, 2.58, and 5.99, with 95% HPD interval (0.98, 5.23).
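The 95% HPD intervals above are produced by boa via the Chen and Shao (1999) device: among all intervals spanning a fixed fraction of the sorted posterior draws, take the narrowest. A Python reimplementation sketch (boa itself is an R package; the lognormal draws below are stand-ins, not the paper's samples):

```python
import random

def hpd_interval(draws, prob=0.95):
    # Chen & Shao (1999): for a unimodal marginal posterior, the narrowest
    # interval containing `prob` of the sorted draws estimates the HPD region.
    s = sorted(draws)
    n = len(s)
    k = max(1, int(prob * n))  # number of draws per candidate interval
    i = min(range(n - k + 1), key=lambda j: s[j + k - 1] - s[j])
    return s[i], s[i + k - 1]

random.seed(0)
# right-skewed stand-in draws (lognormal), mimicking the shape in Figure 1.3
draws = [random.lognormvariate(1.8, 0.35) for _ in range(4000)]
lo, hi = hpd_interval(draws)
# For right-skewed samples the HPD interval is never wider than the
# equal-tailed 95% interval, and sits closer to the mode.
```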

Figure 1.4: Joint posterior distribution of θ_1 and θ_2 with the initial 500 observations removed as a burn-in period. Density values increase from red to orange to yellow.

In this chapter, we have presented the general framework for statistical inference via Bayesian melding on ENA data. An example of this data type has been described, namely the small Cone Spring dataset, and the results of applying Bayesian melding and a Gibbs sampler to obtain posterior model parameter distributions have been summarized. Further background concerning Bayesian melding is detailed in the next chapter, illustrated through a simple toy example. The chapter thereafter provides the explicit details of the Gibbs sampling algorithm as well as the results of implementing this algorithm for a larger ENA dataset.

Table 1.1: Summary statistics for Cone Spring data. Note that standard error values are for the data only.

                          Mean                  Std. Dev./Errors
parameter                 θ_1    θ_2    φ       θ_1    θ_2    φ
data given parameter        [numeric entries not recovered]
premodel prior              [numeric entries not recovered]
melded posterior            [numeric entries not recovered]

                          θ_1              θ_2             φ
95% HPD interval          (3.81, 10.24)    (1.79, 6.83)    (0.98, 5.28)
Quantiles   2.5%          4.05             1.93            1.17
            50.0%         6.53             3.71            2.58
            97.5%         10.97            7.30            5.99

2 Methods

This chapter presents the method of Bayesian melding in greater detail in the context of non-dynamical deterministic models. The method is subsequently illustrated with an example consisting of a simple invertible toy model with a single input θ and output φ, where φ = M(θ) = θ^3. We also introduce the Markov chain Monte Carlo procedure employed to sample from the posterior for θ, namely the Metropolis-Hastings algorithm.

2.1 Bayesian Melding

Bayesian melding allows for formal statistical inference on deterministic simulation models, while taking into full account information and uncertainty about inputs and outputs to the model (Poole & Raftery, 2000). In many such models, input and output parameters are often specified via a trial-and-error approach; plausible inputs are chosen based on previous research or knowledge and typically tuned until plausible outputs are obtained. The framework of Bayesian inference not only offers a formalization of this exercise, but also enables the detailed analysis available from proper statistical inference. Consequently, prior information about parameters can be employed in concert with data to obtain posterior inference about both input and output parameters. (See Hoff (2007) for an introductory reference on Bayesian inference.) In the case of deterministic models, the objective is to combine (a) prior information about m inputs θ and p outputs φ that is independent of the model with (b) model-based prior information. Note that, in general, we define a deterministic model M as some mapping M : Θ → Φ, with θ ∈ Θ ⊆ R^m and φ ∈ Φ ⊆ R^p. Poole & Raftery (2000) describe an earlier approach to this objective, referred to as Bayesian synthesis

(Raftery, Givens, & Zeh, 1995). In this approach, the joint premodel prior distribution p(θ, φ) incorporates all prior information independent of the model. Model information is integrated simply by restricting the premodel distribution to the submanifold {(θ, φ) : φ = M(θ)}, yielding a postmodel distribution π(θ, φ). However, Wolpert (1995) noted that such a postmodel distribution is ill-defined and, consequently, subject to a condition known as the Borel paradox. This has the further consequence that the postmodel distribution depends on how the model M is parameterized, resulting in an ill-defined conditional distribution. As Poole & Raftery (2000) note, this is not satisfactory, and as such they developed Bayesian melding to reformulate such model-based inference as a standard Bayesian procedure.

2.1.1 Notation and Method

For a model M, we consider inputs θ and outputs φ = M(θ). We denote the stated premodel prior distributions for θ and φ as p_1(θ) and p_2(φ), respectively, marginalized from their joint premodel prior stated irrespective of M. Additionally, applying the model M to p_1(θ) yields a model-induced prior for φ, denoted by p_1*(φ). Bayesian melding occurs when these two prior distributions for φ, the stated prior p_2(φ) and the induced prior p_1*(φ), are melded, a process which occurs via logarithmic pooling (Poole & Raftery, 2000). This pooling forms a combined prior distribution, called a melded prior, which is subsequently updated using Bayes' rule. The melded prior for the outputs φ is given by

    p̃(φ) = k_α [p_1*(φ)]^α [p_2(φ)]^(1-α)    (2.1)

where α ∈ [0, 1] is a pooling weight, so that when α = 0.5, equation (2.1) corresponds to taking the geometric mean of the two prior densities. Poole & Raftery (2000) show that this function is indeed a probability density; obtaining its form requires only the calculation of the normalization constant k_α.
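Equation (2.1) is simple to evaluate numerically; the sketch below pools two lognormal densities on a grid (both stand-ins, chosen only for illustration) with α = 0.5 and recovers k_α by trapezoidal quadrature:

```python
import math

def lognorm_pdf(x, mu, sigma):
    # density of a lognormal with log-mean mu and log-sd sigma
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

# grid over a plausible phi range (stand-in priors, illustrative only)
xs = [0.01 + i * (20.0 - 0.01) / 1999 for i in range(2000)]
p1_star = [lognorm_pdf(x, math.log(3.0), 0.8) for x in xs]  # "induced" prior
p2 = [lognorm_pdf(x, math.log(2.5), 0.6) for x in xs]       # "stated" prior

alpha = 0.5
pooled = [a ** alpha * b ** (1 - alpha) for a, b in zip(p1_star, p2)]

# trapezoidal quadrature for the normalization constant k_alpha in (2.1)
h = xs[1] - xs[0]
area = sum(0.5 * (pooled[i] + pooled[i + 1]) * h for i in range(len(xs) - 1))
k_alpha = 1.0 / area
melded = [k_alpha * v for v in pooled]  # now integrates to ~1 on the grid
```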
For certain forms of M and p_1(θ), it may be possible to express the induced and melded output priors, p_1*(φ) and p̃(φ), analytically, although this is rare in practice. Even in such cases, the form of M^(-1) could prevent the melded input prior p̃(θ) from having closed form, particularly if this inverse does not exist. For example, for a model with m inputs θ = (θ_1, ..., θ_m) and p outputs φ = (φ_1, ..., φ_p), where p < m, the model

φ = M(θ) is not one-to-one and is hence noninvertible (see Hogg et al. (2005) for background regarding transformations of random variables). Since we are interested in eventually obtaining p̃(θ), we first follow Poole & Raftery (2000), who obtain this melded input prior (their equation (16)) as follows:

    p̃(θ) = p̃(M(θ)) p_1(θ) / p_1*(M(θ)) = k_α p_1(θ) [ p_2(M(θ)) / p_1*(M(θ)) ]^(1-α)    (2.2)

Note that this corresponds to the original stated input prior p_1(·), weighted by the ratio of two densities in the φ space, p_2(·) and p_1*(·), both evaluated at φ = M(θ) induced from given values of θ ∈ Θ (Poole & Raftery, 2000). This technique eliminates the need to transform the melded output prior p̃(φ) back to the θ scale, and hence the need for an invertible M. The pooling weight α defines the essentially arbitrary relative importance of the induced and stated output priors p_1*(·) and p_2(·). We select α = 0.5, resulting in geometric pooling, where (2.1) amounts to computing the geometric mean of the two prior output densities (Poole & Raftery, 2000). If p_1*(·) has no closed form, it (and, hence, p̃(θ)) can be evaluated numerically by employing kernel density estimation. Finally, we obtain from Poole & Raftery (2000) the posterior distribution for the inputs, given by

    p̃(θ | X, Y) ∝ L(θ, M(θ)) p̃(θ)    (2.3)

where X and Y correspond to the input and output data, respectively. For ENA inference, we assume that E(X_i) = θ and E(Y_i) = φ. Additionally, L(θ, M(θ)) is the joint likelihood of X and Y. Note further that for deterministic models, the output data Y may not be actual observed data; for example, they may be determined via a balance algorithm, as with the dissipation variable R_i of ENA data from Section 1.1. In the general case, with m inputs and p outputs, X_i is the ith m x 1 vector of observed inputs and Y_i is the ith p x 1 vector of observed outputs. Since the posterior distribution (2.3) is frequently of non-standard form, we obtain samples iteratively via Markov chain Monte Carlo. Bayesian melding is incorporated into MCMC via the melded prior for inputs (2.2).
As shall be described in the following section, at each step of the Metropolis-Hastings algorithm the melded prior is evaluated, and it can also serve as the algorithm's proposal distribution. (See Hogg et al. (2005) for background regarding transformations of random variables; Poole & Raftery (2000) denote equation (2.2) as their equation (16).)
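A sketch of how (2.2) is evaluated in practice when p_1*(·) has no closed form: draw θ from p_1, push the draws through M, estimate p_1*(φ) by kernel density estimation, and weight p_1(θ) by the ratio in (2.2). The independent lognormal input priors and normal output prior below are hypothetical stand-ins, with M(θ) = θ_1 - θ_2 as in (1.4):

```python
import math
import random

random.seed(2)

def norm_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def lognorm_pdf(x, mu, sd):
    return norm_pdf(math.log(x), mu, sd) / x

# 1. Draw theta = (theta1, theta2) from a stand-in premodel prior p1
#    (independent lognormals; parameter values are purely illustrative).
draws = [(random.lognormvariate(2.0, 0.4), random.lognormvariate(1.3, 0.4))
         for _ in range(5000)]

# 2. Push the draws through M(theta) = theta1 - theta2 and smooth with a
#    Gaussian kernel (Silverman's rule bandwidth) to estimate p1*(phi).
phis = [t1 - t2 for t1, t2 in draws]
mean = sum(phis) / len(phis)
sd = (sum((p - mean) ** 2 for p in phis) / len(phis)) ** 0.5
bw = 1.06 * sd * len(phis) ** -0.2

def p1_star(phi):
    return sum(norm_pdf(phi, p, bw) for p in phis) / len(phis)

def p2(phi):
    # stand-in stated output prior
    return norm_pdf(phi, 3.0, 2.0)

# 3. Melded input prior (2.2) with alpha = 0.5, up to the constant k_alpha.
def melded_unnorm(t1, t2, alpha=0.5):
    p1_theta = lognorm_pdf(t1, 2.0, 0.4) * lognorm_pdf(t2, 1.3, 0.4)
    return p1_theta * (p2(t1 - t2) / p1_star(t1 - t2)) ** (1 - alpha)
```

Because only the ratio of (2.4) is needed, the constant k_α never has to be computed when this melded prior is used inside a Metropolis-Hastings sampler.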

2.1.2 Metropolis-Hastings Algorithm

The goal of MCMC methods is to construct a Markov chain whose stationary distribution equals the target distribution f, i.e. the posterior distribution. The Metropolis-Hastings algorithm is a general such method and the one employed here. For background concerning this algorithm and further topics in MCMC, see Givens & Hoeting (2005). The algorithm begins at stage t = 0 with the selection of starting values denoted by θ^(0), drawn at random from a suitable starting distribution with the requirement that f(θ^(0)) > 0. Then, given θ^(t) at stage t ∈ {0, 1, 2, ...}, the algorithm generates θ^(t+1) via the following steps:

1. Sample a candidate value θ* from a proposal distribution J(· | θ^(t)).

2. Compute the Metropolis-Hastings ratio

    R(θ^(t), θ*) = [ p̃(θ* | X, Y) J(θ^(t) | θ*) ] / [ p̃(θ^(t) | X, Y) J(θ* | θ^(t)) ]    (2.4)

where, in our context, the target distribution f(θ) equals p̃(θ | X, Y), the posterior distribution for the inputs θ. Note that R(θ^(t), θ*) is always defined, since the proposal θ* can only occur if f(θ^(t)) > 0 and J(θ* | θ^(t)) > 0.

3. Accept or reject θ*: set θ^(t+1) = θ* with probability min{R(θ^(t), θ*), 1}, and θ^(t+1) = θ^(t) otherwise.

4. Increment t and return to step 1.

Following sufficiently many iterations of the algorithm, the samples of θ converge to a stationary distribution, namely the target distribution f (Givens & Hoeting, 2005). In order to obtain samples from the posterior (2.3), we must choose an appropriate proposal distribution J(· | θ^(t)) and evaluate the melded prior p̃(θ), which appears in both the numerator and denominator of the Metropolis-Hastings ratio (2.4). With respect to the choice of proposal distribution, some details are worth mentioning. First, one choice in the context of Bayesian inference is to use the prior distribution itself as the proposal.
Since the (continuous) melded prior lacks an analytical form in many cases (and, indeed, in all examples considered in this paper),

a form of slice sampling can be employed (see Givens & Hoeting (2005) for some background and further references concerning slice sampling). In this sampling method, the support on which the melded prior is evaluated numerically is discretized into slices, and the discrete density corresponding to each slice is computed. These slice densities then serve as sampling weights, so that a particular slice is chosen with probability equal to its discrete density. After having sampled a slice, a sample from the melded prior is obtained by sampling uniformly between the endpoints of the selected slice. Alternatively, a proposal distribution could be chosen so that it covers the support of the stationary distribution and does not yield candidate values θ* that are accepted or rejected too frequently. It is also useful to use a proposal whose spread can be tuned; if it is too diffuse compared to the target distribution, the candidate values will be rejected too frequently, leading to slow convergence, and convergence is similarly slow if the proposal variance is too low. If, for example, a normal proposal distribution is chosen, its variance can be tuned to improve the speed of convergence to the target distribution. Note that a symmetric proposal such as a normal has the additional property that J(θ^(t) | θ*) = J(θ* | θ^(t)), in which case the proposal densities cancel from the Metropolis-Hastings ratio (2.4); the method is then simply called the Metropolis algorithm (Givens & Hoeting, 2005). In general, running many hundreds or thousands of iterations of the Metropolis-Hastings algorithm is necessary to ensure convergence of the Markov chain to the stationary target distribution. Convergence can be assessed in a number of ways, such as through the calculation of diagnostics as in the boa package in R, as well as through simple visual inspection of trace plots of the Markov chain. A useful alternative to running a single very long chain to ensure convergence is to run several chains from different random starting values to assess mixing.
By plotting two or more chains together, we can assess the quality of mixing visually by examining how well the separate chains overlap. Finally, as mentioned in the context of the Cone Spring example of chapter 1, it is generally advisable to discard the first portion of values in the chain as the burn-in period. The precise length of the burn-in period will depend on the starting values; in any case, we would not expect good convergence in this initial portion of the Markov chain, so these values would not be informative for approximating the target posterior distribution or, correspondingly, for making inference

6 See Givens & Hoeting (2005) for some background and further references concerning slice sampling.

about the corresponding parameters.

2.2 Melding Example

We now consider the case of a simple invertible toy model with a single input θ and a single output φ, where the deterministic model is given by M(θ) = θ³ = φ. We specify data likelihoods as follows:

X | θ ~ N(θ, 1)    (2.5a)
Y | φ ~ N(φ, 4)    (2.5b)

which have (premodel) priors given by

θ ~ U(−2, 2)    (2.5c)
φ ~ U(−3, 3)    (2.5d)

Simulated random data were generated consisting of 100 observations each for X_i and Y_i, with true means θ₀ = 1.2 and φ₀ = 2. We can thus compare these true values to the posterior distributions for θ and φ. Note that this differs from our ENA inference approach, where one would have obtained Y_i = X_i³ in imposing the model on the data. Nevertheless, the intention of the exercise here was to investigate (and demonstrate) the feasibility of implementing Bayesian melding with a simple equation relating the expectations of the input and output variables. As we demonstrate below, the attempt was successful, and it formed the basis of the implementation of our ENA inference for the datasets in Chapters 1 and 3.

Applying Bayesian melding to this example yields the priors and posteriors shown in Figure 2.1. Note that the induced prior for φ (Figure 2.1 (a)) and the melded prior for θ (Figure 2.1 (b)) are each symmetric. It should be mentioned, however, that although the melded prior in Figure 2.1 (b) is computed according to equation (2.2), we do not compute the proportionality constant k_α, since it is not required for the Metropolis-Hastings algorithm; this affects only the scaling of the melded prior. In contrast, the posterior distributions for θ (Figure 2.1 (c)) and φ (Figure 2.1 (d)) are slightly skewed, to the left in the case of θ and to the right in the case of φ. This is also evident in the posterior densities shown in Figure 2.2 (a) and (b).
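The melded prior used in this example can be approximated numerically by pushing prior draws through M and applying kernel density estimation. A minimal Python sketch follows (the paper's code is in R; scipy's `gaussian_kde` stands in for the R `density()` function, and the normalizing constant k_α is omitted, as in the text):

```python
import numpy as np
from scipy.stats import gaussian_kde, uniform

# Premodel priors theta ~ U(-2, 2), phi ~ U(-3, 3); model M(theta) = theta^3.
rng = np.random.default_rng(7)
theta_draws = rng.uniform(-2.0, 2.0, size=100_000)

# Induced prior p1*(phi): push prior draws of theta through M, then smooth.
induced = gaussian_kde(theta_draws**3)

p1 = uniform(-2, 4).pdf   # stated prior for theta (scipy's loc/scale form)
p2 = uniform(-3, 6).pdf   # stated prior for phi

def melded_prior_unnorm(theta, alpha=0.5):
    """Unnormalized melded prior
    p1(theta) * (p2(M(theta)) / p1*(M(theta)))**(1 - alpha);
    the constant k_alpha is not needed for Metropolis-Hastings."""
    phi = theta**3
    return float(p1(theta) * (p2(phi) / induced(phi)) ** (1 - alpha))
```

Note that the melded prior vanishes wherever the stated prior for φ assigns no mass, i.e. for |θ³| > 3, which is what produces its irregular shape.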

Figure 2.1: Prior and posterior distributions following Bayesian melding on the toy model: (a) induced prior for φ; (b) melded prior for θ; (c) posterior histogram for θ; (d) posterior histogram for φ.

As evidenced by the plot of autocorrelations for posterior samples of θ shown in Figure 2.2, slow convergence to the posterior is not a great concern here, as the autocorrelation rapidly decays to zero; in fact, by lag 5, the sample autocorrelation falls just below 0.1. For this implementation, the proposal distribution was normal with mean θ^(t) (the current value) and standard deviation δ = 0.2. Note that use of a symmetric normal proposal renders the algorithm Metropolis rather than Metropolis-Hastings. This method yielded a reasonable acceptance rate of 37.5% and required about 8 min 23 sec to complete 8000 iterations on the same machine used in the Cone Spring example. By comparison, the slice sampling method, using a discretization of slices, required more time (11 min 27 sec) to complete the same number of iterations on the same machine, and yielded a considerably poorer acceptance rate of 6.2%. This

Figure 2.2: Posterior densities for θ (a) and φ (b), with plots of the autocorrelations (c) and the MCMC trace (d) for θ.

is unacceptably low, so we can conclude that the sort of slice sampling employed here does not yield satisfactory results. Examination of the corresponding autocorrelation plot (not shown) reveals comparatively slow decay as well. The low acceptance rate may be due to the irregular shape of the melded prior (Figure 2.1 (b)) in this case. In Table 2.1, comparisons are given between the means and standard deviations for θ and φ for each of the simulated data, the premodel prior, and the posterior distribution. It is notable that the posterior means are similar to the data means; moreover, with the incorporation of the model φ = θ³ into the inference, the posterior expectation for φ is close to the cube of the posterior expectation for θ (i.e., 1.22³ ≈ 1.82). As in the Cone Spring example, we can also obtain quantiles and 95% HPD intervals from the R boa package. For θ in this example, we obtain 2.5%, 50.0%, and 97.5% quantiles equal to 1.09, 1.22, and 1.36, respectively, with

Table 2.1: Summary statistics for Bayesian melding applied to the toy model M(θ) = θ³ = φ: means and standard deviations (standard errors in the case of the data) of θ and φ for the data, the premodel prior, and the melded posterior. [Numeric entries of the first panel were not recovered in this transcription.]

    parameter    95% HPD interval    Quantiles (2.5%, 50.0%, 97.5%)
    input θ      (1.114, 1.362)      1.09, 1.22, 1.36
    output φ     (1.376, 2.517)      1.31, 1.84, 2.49

95% HPD interval (1.114, 1.362). Similarly, for φ we obtain quantiles of 1.31, 1.84, and 2.49, with 95% HPD interval (1.376, 2.517).

This chapter has described the method of Poole & Raftery (2000), referred to as Bayesian melding, with respect to its motivation and implementation in the context of non-dynamical ENA. The method was further illustrated via a simple univariate model with a single input and a single output, with posterior sampling effected by the Metropolis-Hastings algorithm. Some conclusions follow from the results described above. Although the algorithm runs quickly using either a normal proposal or the melded prior itself as proposal (which employs slice sampling), use of the melded proposal yields unsatisfactory results, with an acceptance rate of only 6.2%. This is in contrast to the reasonable acceptance rate of 37.5% obtained when the normal proposal is used with appropriate tuning; the normal proposal also yields faster run times, in which case the algorithm is simply Metropolis. In the next chapter, we implement Bayesian melding for a larger ENA dataset, the Chesapeake Bay

mesohaline ecosystem (Baird & Ulanowicz, 1989). Here we employ a Gibbs sampler with Metropolis-Hastings steps to obtain posterior samples for the two input parameters, recalling that the ENA data type for our simplified model from chapter 1 includes two inputs and one output.

3 Modelling and Implementation

In this chapter we apply Bayesian melding to a larger ENA dataset observed from the Chesapeake Bay mesohaline network (Baird & Ulanowicz, 1989). This ecosystem comprises 36 compartments, including such groups of organisms as phytoplankton, various bacteria, zooplankton, and other micro-organisms, as well as individual species such as catfish and striped bass. Correspondingly, the dataset contains 36 observations arranged in the format described in the Cone Spring example of chapter 1, with W_i, U_i, and R_i denoting the medium in, medium out, and medium dissipated for the ith compartment. This chapter first describes the motivation for the specification of priors, and continues with details of the computation of the melded prior, the posterior sampling algorithm, and some general information concerning the R code employed to evaluate the melded prior. Finally, the results of the implementation of a Gibbs sampler for the posteriors are discussed. As in previous chapters, suitable references for Gibbs sampling, the Metropolis-Hastings algorithm, and Markov chain Monte Carlo generally include Givens & Hoeting (2005) and Hoff (2007).

3.1 Prior Specification

Since W_i, U_i, and R_i are real numbers that range from zero to the tens of thousands, we specify the following exponential likelihoods:

W_i = X_i + T_{+i},  with  W_i | θ_1 ~ Exp(1/θ_1)    (3.1a)
U_i = T_{i+} + E_i,  with  U_i | θ_2 ~ Exp(1/θ_2)    (3.1b)
R_i | φ ~ Exp(1/φ)    (3.1c)

These likelihoods are subject to the expectations noted in (1.3); that is, E(W_i | θ_1) = θ_1, E(U_i | θ_2) = θ_2, and E(R_i | φ) = φ. The explicit forms of these likelihoods for W_i, U_i, and R_i, respectively, are given by

L_1(θ_1) = θ_1^{-n} exp(-Σ_i W_i / θ_1)
L_2(θ_2) = θ_2^{-n} exp(-Σ_i U_i / θ_2)    (3.2)
L_3(φ) = φ^{-n} exp(-Σ_i R_i / φ)

Additionally, W_i | θ_1, U_i | θ_2, and R_i | φ are assumed to be mutually independent. The dependence among W_i, U_i, and R_i implied by the usual notion of balance is instead reflected through their marginal joint distribution, as based on the following joint prior.

We specify the joint prior distribution for the parameters Φ = (θ_1, θ_2, φ) as trivariate lognormal; that is,

Φ ~ MVLN(µ, Σ).

Since Φ is the expectation parameter of {W, U, R}, the µ and Σ parameters are the mean and covariance parameters for log(Φ). Realizations of Φ are generated in R using the rlnorm.rplus() function from the compositions library, which permits the generation of multivariate lognormal samples with mean µ and covariance structure Σ.

In practice, the lognormal prior hyper-parameters are specified based on the order of the raw data. Although, as noted above, the ranges of W_i, U_i, and R_i extend from zero to the tens of thousands, we re-scale⁷ them down by a fixed factor. Hence, if values of W_i are on the order of 10 following rescaling, we specify a prior mean of 10. Prior variances are specified similarly, though with a greater degree of flexibility. For example, if we obtain a standard deviation also on the order of 10, we specify a prior variance of 100; we might also choose to decrease this prior variance to 50 to mitigate the effect of simply squaring the data order of 10. The goal here is to specify reasonable priors which reflect the data type. Furthermore, since the average medium in (θ_1) is assumed to equal the total average medium out (θ_2 + φ), we specify E(θ_2) and E(φ) each to be half
7 A different scaling factor could be employed, depending on what is reasonable for the dataset; we do not deviate from this choice here.

the magnitude of the prior mean of W_i, and similarly for variance. We then specify ψ = (E(θ_1), E(θ_2), E(φ)) and ω = (Var(θ_1), Var(θ_2), Var(φ)) as follows:

ψ = (100, 50, 50)
ω = (50000, 25000, 25000)

Since these prior specifications apply to the raw, not logged, data, to obtain µ and Σ we solve

ψ_1 = exp(µ_1 + σ_1²/2)  and  ω_1 = (exp(σ_1²) − 1) exp(2µ_1 + σ_1²)

numerically (equivalently, in closed form, σ_1² = log(1 + ω_1/ψ_1²) and µ_1 = log ψ_1 − σ_1²/2); the same procedure can be applied to solve for µ_2 and σ_2² with ψ_2 and ω_2. Since ψ_2 = ψ_3 and ω_2 = ω_3, we have µ_2 = µ_3 and σ_2 = σ_3, so a third set of solutions need not be obtained.

We further assume an exchangeable correlation structure with off-diagonal correlations all equal to 0.5, on the basis that the expected magnitude of inputs is correlated with the expected magnitudes of outputs and dissipations, which are themselves also correlated. If we consider the analogy of an individual's income playing the role of W_i, spending the role of U_i, and savings the role of R_i, then it is reasonable to assume that, marginally, all three are positively correlated with each other. This leads to the following covariance matrix:

        [ σ_1²          0.5(σ_1 σ_2)   0.5(σ_1 σ_2) ]
    Σ = [ 0.5(σ_1 σ_2)  σ_2²           0.5(σ_2²)    ]    (3.3)
        [ 0.5(σ_1 σ_2)  0.5(σ_2²)      σ_2²         ]

3.2 Sampling Algorithm

3.2.1 Melding

The melded prior for (θ_1, θ_2) must first be approximated. Note that p_1(θ_1, θ_2) is bivariate lognormal with logged-data means µ_1 and µ_2 and covariance structure corresponding to the upper-left 2 × 2 block of Σ. The melded prior⁸ for θ = (θ_1, θ_2) is then given by

p̃(θ) = k p_1(θ) [p_2(M(θ)) / p_1*(M(θ))]^{1−α}    (3.4)

8 Note that equation (3.4) for p̃(θ) corresponds to equation (16) in Poole & Raftery (2000).

where p_1(θ) is the joint prior bivariate lognormal density described above, p_2(φ) is the stated prior univariate lognormal density with logged-data mean µ_2 and variance σ_2², and p_1*(φ) is the induced distribution for φ (Poole & Raftery, 2000). Note that α denotes the pooling weight for the melding of the stated and induced priors; Poole & Raftery (2000) note that the choice of α is essentially arbitrary, and we set α = 0.5 for the reason stated earlier. The induced distribution p_1*(φ) is numerically evaluated by applying the model M(θ_1, θ_2) = θ_1 − θ_2 to the premodel prior realizations of θ_1 and θ_2 generated using rlnorm.rplus(); the resulting induced distribution for φ is then obtained using kernel density estimation via the R function density().

As we shall see in Section 3.2.3, for the Gibbs sampler we require prior conditionals for each of θ_1 and θ_2; that is, p̃(θ_1 | θ_2) and p̃(θ_2 | θ_1). These conditionals are given by

p̃(θ_i | θ_j) = p̃(θ_1, θ_2) / p̃(θ_j)    (3.5)

where i, j = 1, 2 and j ≠ i. The denominators in (3.5) will cancel in the Metropolis-Hastings ratios described in Section 3.2.3, so they need not be calculated.

3.2.2 Gibbs Sampler

We present the Gibbs sampling algorithm below. In each full scan of a Gibbs cycle, we iteratively generate new random samples of θ_1 and θ_2, in each case sampling θ_j conditional on θ_i (i ≠ j) and on the data {W, U, R}. These full (melded) conditional distributions appear in step 2 of the algorithm below, and their explicit forms are shown in Section 3.2.3:

1. Select starting values (θ_1^(0), θ_2^(0)) and set t = 0.

2. Generate, in turn, for j = 1, 2:

   θ_1^(t+1) ~ p̃(θ_1 | θ_2^(t), W, U, R)
   θ_2^(t+1) ~ p̃(θ_2 | θ_1^(t+1), W, U, R)

3. Having completed a full scan, increment t and return to step 2.
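The scan structure above can be sketched generically as follows (a Python illustration; the conditional samplers here are placeholders based on a standard bivariate normal, whose full conditionals are known in closed form, rather than the melded conditionals of this chapter):

```python
import numpy as np

rng = np.random.default_rng(2)

def gibbs(sample_t1_given_t2, sample_t2_given_t1, t1_0, t2_0, n_iter):
    """Two-block Gibbs scan: update theta_1 given the current theta_2,
    then theta_2 given the newly drawn theta_1, as in steps 1-3 above."""
    out = np.empty((n_iter + 1, 2))
    out[0] = (t1_0, t2_0)
    for t in range(n_iter):
        t1 = sample_t1_given_t2(out[t, 1])   # theta_1 | theta_2, data
        t2 = sample_t2_given_t1(t1)          # theta_2 | theta_1, data
        out[t + 1] = (t1, t2)
    return out

# Placeholder conditionals: standard bivariate normal with correlation rho,
# for which theta_i | theta_j ~ N(rho * theta_j, 1 - rho^2) exactly.
rho = 0.8
cond = lambda other: rng.normal(rho * other, np.sqrt(1 - rho**2))
draws = gibbs(cond, cond, 5.0, 5.0, 5000)
```

In the melded-prior setting, each placeholder conditional would be replaced by a Metropolis-Hastings step, as detailed in Section 3.2.3.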

3.2.3 Metropolis-Hastings

Note that the full conditional posteriors are sampled in step 2 of the Gibbs sampler. This is accomplished in both cases via Metropolis-Hastings steps (j = 1, 2). That is, we sample from the full conditional posteriors for θ_1 and θ_2 with the general form

p(θ_i | θ_j, W, U, R) ∝ L(θ_1, θ_2, φ) p̃(θ_i | θ_j).

Recall from Section 1.1 that, since balance is not assumed for the data values but only for the expectations, we may conveniently assume that W_i | θ_1, U_i | θ_2, and R_i | φ are mutually independent. Consequently, we can write L(θ_1, θ_2, φ) = L_1(θ_1) L_2(θ_2) L_3(φ). The Metropolis-Hastings step (j = 1) for θ_1^(t+1) proceeds as follows:

1. Sample a candidate value θ_1* from a proposal distribution J(θ_1* | θ_1^(t)).

2. (a) Compute the Metropolis-Hastings ratio R(θ_1^(t), θ_1*), where

   R(θ_1^(t), θ_1*) = [L_1(θ_1*) L_2(θ_2^(t)) L_3(φ*_{j=1}) p̃(θ_1* | θ_2^(t)) J(θ_1^(t) | θ_1*)] / [L_1(θ_1^(t)) L_2(θ_2^(t)) L_3(φ^(t)) p̃(θ_1^(t) | θ_2^(t)) J(θ_1* | θ_1^(t))],

   with φ*_{j=1} = θ_1* − θ_2^(t) and φ^(t) = θ_1^(t) − θ_2^(t).

   (b) Accept or reject θ_1* according to the following: θ_1^(t+1) = θ_1* with probability min{R(θ_1^(t), θ_1*), 1}, and θ_1^(t+1) = θ_1^(t) otherwise.

3. Proceed to the second Gibbs step (i.e., the Metropolis-Hastings step for θ_2 with j = 2) as follows.

   (a) For sampling θ_2^(t+1), compute the Metropolis-Hastings ratio R(θ_2^(t), θ_2*), where

   R(θ_2^(t), θ_2*) = [L_1(θ_1^(t+1)) L_2(θ_2*) L_3(φ*_{j=2}) p̃(θ_2* | θ_1^(t+1)) J(θ_2^(t) | θ_2*)] / [L_1(θ_1^(t+1)) L_2(θ_2^(t)) L_3(φ^(t+1)) p̃(θ_2^(t) | θ_1^(t+1)) J(θ_2* | θ_2^(t))],

   with φ^(t+1) = θ_1^(t+1) − θ_2^(t) and φ*_{j=2} = θ_1^(t+1) − θ_2*.

   (b) Accept or reject θ_2* according to the following: θ_2^(t+1) = θ_2* with probability min{R(θ_2^(t), θ_2*), 1}, and θ_2^(t+1) = θ_2^(t) otherwise.

4. Increment t and return to step 2 of the Gibbs sampler.

Remarks

1. Note that in R(θ_1^(t), θ_1*) the L_2(θ_2^(t)) terms cancel, and the same holds for the L_1(θ_1^(t+1)) terms in R(θ_2^(t), θ_2*). In neither case is any new sample being taken, so these terms naturally disappear from the ratio.

2. If the proposal distributions J(·) for each step (they need not be the same) are symmetric, they also disappear from the ratio, and we have Metropolis rather than Metropolis-Hastings steps (Givens & Hoeting, 2005).

3. A potentially suitable proposal distribution J(·) is a chi-square distribution with a high number of degrees of freedom, as this would match the magnitude of our data type. For greater flexibility, we generalize this to a Gamma distribution with shape parameter 2y and rate parameter 2, in which case

   J(x | y) = [Γ(2y) 2^{−2y}]^{−1} x^{2y−1} e^{−2x}.

   In the algorithm itself, we not only sample from J(x | y) but evaluate it as well in the Metropolis-Hastings ratios. In the denominator of each ratio, we evaluate J(θ* | θ^(t)) at θ* as a Gamma density with shape parameter 2θ^(t), yielding E(θ*) = θ^(t); the reverse occurs in the numerator of each ratio. Additionally, we introduce a tuning parameter for this proposal distribution by specifying a modified shape parameter 2(y + δ), where δ is some small real number. This allows for some control over the variance of the proposal, which is given by (y + δ)/2: positive values of δ increase the proposal variance, and vice-versa for negative values.
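One such Metropolis-Hastings step with the Gamma proposal of remark 3 can be sketched as follows (Python rather than the paper's R; the stand-in target below is a Gamma(5, 1) density, purely for illustration, not the melded conditional):

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)

def mh_step_gamma(theta_t, log_target, delta=0.0):
    """One Metropolis-Hastings step with a Gamma(shape = 2(y + delta),
    rate = 2) proposal, so that E(theta*) = theta_t + delta and
    Var(theta*) = (theta_t + delta) / 2."""
    shape_fwd = 2.0 * (theta_t + delta)
    cand = rng.gamma(shape_fwd, 0.5)         # numpy uses scale = 1/rate
    shape_rev = 2.0 * (cand + delta)
    # log MH ratio: target ratio times J(theta_t | theta*) / J(theta* | theta_t)
    log_r = (log_target(cand) - log_target(theta_t)
             + gamma.logpdf(theta_t, a=shape_rev, scale=0.5)
             - gamma.logpdf(cand, a=shape_fwd, scale=0.5))
    return cand if np.log(rng.uniform()) < log_r else theta_t

# Stand-in target: Gamma(5, 1), log f(x) = 4 log x - x + const.
log_target = lambda x: 4.0 * np.log(x) - x if x > 0 else -np.inf
chain = [10.0]
for _ in range(2000):
    chain.append(mh_step_gamma(chain[-1], log_target))
```

Because the Gamma proposal is asymmetric, the proposal densities do not cancel and must be evaluated in both directions, as the code shows.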

3.3 Pseudo Code

In this section, we describe in pseudo code the R functions and procedures employed for the sampling algorithm.

3.3.1 Prior Specification

We specify the prior distribution for Φ = (θ_1, θ_2, φ) as trivariate lognormal, as described previously in section 3.1.

3.3.2 Melding

The goal here is to compute the melded density for inputs given by (3.4). This proceeds in several steps:

1. Denote by Φ the N × 3 matrix of realizations of Φ, where N denotes the number of realizations generated for each parameter. The bivariate lognormal density p_1(θ_1, θ_2) is then obtained by applying the dlnorm.rplus function from the compositions library to the first two columns of Φ, which correspond to the samples of θ_1 and θ_2. This function takes

       µ = (µ_1, µ_2)  and  Σ = [ σ_1²          0.5(σ_1 σ_2) ]
                                [ 0.5(σ_1 σ_2)  σ_2²         ]

   as arguments and yields density values corresponding to the marginal joint density of θ_1 and θ_2, denoted p_1(θ_1, θ_2).

2. We next compute the induced density for φ, that is, p_1*(φ). Applying the model φ = M(θ) = θ_1 − θ_2 to the realizations in Φ, we compute the numerical induced density for φ using the density() function, where we select 0 as the left endpoint. Use of the approx() function then allows this induced density to be evaluated at a given value (or values) of φ, where approx() takes the grid of realizations of φ and the corresponding density values produced by density(). Note that the grid for φ is that produced by applying the model M to the realized θ vectors; it does not involve the samples of φ in the third column of Φ.

3. Next, the stated premodel prior for φ, that is, p_2(φ), must be evaluated. The third column of Φ corresponds to samples of φ, so applying the dlnorm function to this column yields the numerical density of p_2(φ); in this case, the function takes µ_2 and σ_2² as parameters. As in step 2, use of the approx function then allows p_2(φ) to be evaluated on the grid corresponding to φ = M(θ) = θ_1 − θ_2.

4. Finally, the melded prior density p̃(θ_1, θ_2) given by (3.4) can be computed. The original samples contained in Φ are employed again to form a grid for θ_1, θ_2, and φ. We are concerned only with the grid defined by θ_1 and θ_2, however, as these exactly define the space of the melded prior density; hence, the samples of φ can be ignored apart from step 3, where they are employed to evaluate the stated prior for φ. Using the approximate evaluation routines described in steps 1-3, for given realizations of θ_1, θ_2, and φ = M(θ) = θ_1 − θ_2, we can compute p_1(θ_1, θ_2), p_1*(M(θ)), and p_2(M(θ)), which are then combined according to (3.4) to yield the melded prior for θ_1 and θ_2. Note that the proportionality constant k can be ignored, as it cancels in the Metropolis-Hastings ratio. Having obtained a numerical density for the melded prior for inputs, the interpp() function of the akima library can then be employed to evaluate p̃(θ_1, θ_2) as necessary in the Metropolis-Hastings ratio.

Note finally that existing R functions are employed to evaluate the bivariate and univariate lognormal densities (using dlnorm.rplus and dlnorm, respectively) even though closed forms of these densities are available; this avoids potential numerical instabilities from novel coding and, correspondingly, the pitfall of human programming error.

3.3.3 Gibbs Sampler

The Gibbs sampler is implemented primarily via a for loop, with each iteration performing two Metropolis-Hastings steps (j = 1, 2 for θ_j) in succession.
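Steps 1-4 above can be sketched compactly (a Python illustration only: scipy's multivariate_normal, gaussian_kde, and lognorm stand in for the R functions dlnorm.rplus(), density()/approx(), and dlnorm(); the hyper-parameter values are those implied by the ψ and ω of Section 3.1):

```python
import numpy as np
from scipy.stats import gaussian_kde, lognorm, multivariate_normal

rng = np.random.default_rng(11)

# Hyper-parameters implied by psi = (100, 50) and omega = (50000, 25000)
# through the lognormal moment equations of Section 3.1.
mu = np.array([3.7093, 2.7131])
s1, s2 = 1.3386, 1.5485
Sigma = np.array([[s1 * s1, 0.5 * s1 * s2],
                  [0.5 * s1 * s2, s2 * s2]])

# Step 1: bivariate lognormal prior p1(theta1, theta2) -- the normal
# density of the logs times the Jacobian 1/(theta1 * theta2).
mvn = multivariate_normal(mean=mu, cov=Sigma)
def p1_joint(t1, t2):
    return mvn.pdf(np.log([t1, t2])) / (t1 * t2)

# Step 2: induced density p1*(phi) for phi = M(theta) = theta1 - theta2,
# estimated by KDE over premodel realizations (positive part only,
# mirroring the left endpoint of 0 chosen for density() in the text).
theta = np.exp(rng.multivariate_normal(mu, Sigma, size=50_000))
phi_draws = theta[:, 0] - theta[:, 1]
induced = gaussian_kde(phi_draws[phi_draws > 0])

# Step 3: stated premodel prior p2(phi), univariate lognormal (mu_2, sigma_2).
p2 = lognorm(s=s2, scale=np.exp(mu[1])).pdf

# Step 4: unnormalized melded prior of equation (3.4) with alpha = 0.5;
# the constant k cancels in the Metropolis-Hastings ratio.
def melded(t1, t2, alpha=0.5):
    ph = t1 - t2
    if ph <= 0:
        return 0.0
    return float(p1_joint(t1, t2) * (p2(ph) / induced(ph)) ** (1 - alpha))
```

As in the R implementation, the melded density is zero wherever θ_1 ≤ θ_2, since the model then implies a negative dissipation φ.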
As noted above, the interpp() function is employed twice in each Metropolis-Hastings step to evaluate the melded prior p̃(θ_1, θ_2). Samples from the Gamma proposal distribution are obtained using rgamma(), with the proposal density evaluated using dgamma().

The exponential likelihoods are evaluated using dexp(). Finally, a logical flow structure is employed to avoid unnecessary calculations and improve efficiency: incorrect samples from the proposal are rejected immediately, thereby avoiding expensive and unnecessary interpolations. See the Appendix for the exact R code employed.

3.4 Results

With 36 compartments comprising the Chesapeake Bay mesohaline network dataset, the data inform the posterior samples more strongly than in the case of the Cone Spring data of chapter 1. For Bayesian melding applied to the Chesapeake Bay ENA data, we employed starting values of θ_1^(0) = and θ_2^(0) = . The resulting Gibbs sampler was run for 5000 subsequent iterations, yielding a vector of length 5001 of posterior samples for each parameter. In this case, the iterations required 7 h 38.8 min, a time comparable to other runs and better than the time required in the Cone Spring example. As before, the algorithm was run on a machine with an Intel Core 2 Duo T7300 chipset operating at 2.00 GHz with 2.00 GB of memory and running Windows Vista Service Pack 1. The first 500 observations were omitted as the burn-in period, and the tuning parameter δ mentioned in section 3.2.3 was simply set to zero. The bivariate acceptance rate was 33.3%, with marginal acceptance rates of 43.4% and 41.1% for θ_1 and θ_2, respectively. Note that the posterior samples for θ_1 correspond to θ_1 | θ_2, W, U, R, and similarly for θ_2. Applying the model M to these samples yields φ = θ_1 − θ_2; the samples for all three parameters are shown in the trace plot of Figure 3.1. Histograms of the posterior distributions are shown in Figure 3.2, with the corresponding premodel lognormal prior overlaying each. Note that the (melded) posteriors preserve some of the right-skewness of the priors, though they otherwise exhibit symmetry.
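Sample autocorrelation, used for the convergence diagnostics throughout this paper, can be computed directly; the following Python sketch uses a synthetic AR(1) series as a stand-in for MCMC output and also shows the effect of thinning the chain:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_k = gamma_hat(k) / gamma_hat(0)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    g0 = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / n / g0
                     for k in range(max_lag + 1)])

# Synthetic AR(1) series (coefficient 0.9) standing in for MCMC output.
rng = np.random.default_rng(5)
chain = np.empty(20_000)
chain[0] = 0.0
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()

acf_full = sample_acf(chain, 10)
acf_thin = sample_acf(chain[::10], 10)   # thin: keep every 10th draw
```

Thinning reduces the lag-1 autocorrelation (here from roughly 0.9 toward 0.9¹⁰ ≈ 0.35) at the cost of discarding most of the samples, which mirrors the trade-off discussed for the Chesapeake Bay chains below.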
The marginal melded priors for θ_1 and θ_2, shown in Figure 3.3 (a) and (b), respectively, exhibit much more pronounced right-skewness, which is only weakly preserved in the corresponding posterior densities of Figure 3.3 (c) and (d), the kernel-smoothed densities of the histograms in Figure 3.2. The joint posterior distribution for θ_1 and θ_2 shown in Figure 3.4 similarly does not exhibit any extreme skewness, though the tails appear more dispersed for larger values

Figure 3.1: Trace plot showing Markov chain Monte Carlo output for posterior samples of the model input and output parameters θ_1, θ_2, and φ via Bayesian melding.

of each parameter. As with the marginal posterior densities, the joint distribution is unimodal. Note that the joint density is zero where θ_1 ≤ θ_2. The autocorrelations for samples of θ_1 shown in Figure 3.2 (d) are unfortunately very high, indicating that thinning of the chain is required to obtain more accurate variance estimates. However, thinning the chain by removing successive observations yields no great improvement: autocorrelations decay only very slowly even with half of all observations removed, and significant correlations remain until approximately lag 25 in this case. Such high autocorrelations suggest slow convergence; however, as Figure 3.5 shows, running two chains of length 5000 with different starting values⁹ results in relatively good mixing. As evident from the figure, the trace plots for each run overlap considerably, which indicates that, autocorrelations aside, the Gibbs samples are converging to the stationary posterior. Notably,

9 In this case, the same chain as mentioned above and another with θ_1^(0) = and θ_2^(0) =

Figure 3.2: Histograms of Bayesian melding posterior distributions for (a) θ_1, (b) θ_2, and (c) φ, with premodel prior distributions shown as solid lines. The first 500 observations are omitted as a burn-in period. Sample autocorrelations for θ_1 are shown in (d).

this does not suggest that tuning (i.e., δ ≠ 0) was required to achieve convergence for this particular dataset. We obtain summary statistics for the Chesapeake Bay data as given in Table 3.1. As before, note that normal standard errors are given for the data, whereas the standard deviations for the posterior omit the burn-in period and are, in any case, naive and significantly negatively biased. In order to obtain accurate variance estimates for the posterior samples, we would have to eliminate most of the significant autocorrelations, requiring the removal of 80% or more of each chain. For example, when some 4040 observations in each chain were removed, leaving only 960 remaining, non-significant autocorrelations were achieved at lag 9 for θ_1 and lag 10 for θ_2; this is shown for θ_1 in Figure 3.6 (a). Although the decay of autocorrelations

Figure 3.3: Melded priors for θ_1 (a) and θ_2 (b), with posterior densities for each in (c) and (d), respectively, the latter obtained with the first 500 observations omitted as a burn-in period.

exhibited following thinning is much faster, it is desirable to achieve an even faster decay. However, since we are now left with fewer than 1000 posterior samples, running the Markov chains for more iterations is clearly desirable unless tuning is employed. Following thinning (removal of the first 100 samples as a burn-in period, then keeping only every 9th observation for θ_1 and every 10th for θ_2), the posterior means for θ_1 and θ_2 are and 64.06, with posterior standard deviations 9.25 and 8.48, respectively. The posterior means are largely unchanged from those given in Table 3.1, with that for θ_2 identical at three digits of precision. In contrast, the standard deviation of θ_1 is increased, as we would expect given the high autocorrelation of the raw Monte Carlo output. Conversely, the standard deviation for θ_2 is actually decreased, which we might attribute to the presence of negative autocorrelations in the thinned data with 80% removed. This occurs

Figure 3.4: Joint posterior distribution of θ_1 and θ_2 with the initial 500 observations removed as a burn-in period. Density values increase from red to orange to yellow.

for both θ_1 and θ_2, with the negative autocorrelations for the former shown in Figure 3.6. Note that the samples for φ are not as highly correlated, with non-significant autocorrelation obtained at lag 6. Removing the initial 500 observations as a burn-in period and then taking every 6th observation leaves 750. The adjusted posterior mean and standard deviation are and 4.22, respectively, the latter of which is increased from that obtained without any thinning. Lastly, although this method has eliminated the autocorrelation of the posterior samples, only 96 observations remain, indicating that a longer chain (and subsequent thinning) and (or) proper tuning are required to obtain a good approximation to the posterior density.

Finally, as in the previous examples, we obtain quantiles for the model parameters and corresponding 95% highest posterior density (HPD) regions, which are given in Table 3.1. For θ_1 we obtain 2.5%, 50.0%, and 97.5% quantiles of 71.67, 88.70, and , respectively, with corresponding quantiles for θ_2 of 47.72, 63.62, and , and for φ of 18.23, 24.54, and . The 95% HPD intervals for θ_1, θ_2, and φ are (71.56,

Figure 3.5: Trace plot showing two different runs of Markov chain Monte Carlo output from different starting values for the model input and output parameters θ_1, θ_2, and φ. The significant overlap (apart from the burn-in period) between the chains for each run indicates convergence to the stationary posterior.

), (48.32, 82.61), and (17.99, 33.38), respectively. Note that these are considerably narrower than the frequentist 95% confidence intervals for the data means (i.e., E(W_i | θ_1) = θ_1 and E(U_i | θ_2) = θ_2) based on the Central Limit Theorem¹⁰, which are also given in Table 3.1. Similar to the Cone Spring results, the data, premodel prior, and posterior means in each case lie within the HPD regions for each input parameter, except for the prior mean for φ. Note that these results comprise posterior distributions for the expected energy into, out of, and dissipated from the Chesapeake Bay mesohaline network ecosystem, denoted by the parameters θ_1, θ_2, and φ, respectively. The posterior means given in Table 3.1 (both before and after thinning of the Monte

10 These correspond to W̄ ± z_{0.05} s(W)/√n, where W̄ denotes the sample mean of W_i and s(W) the sample standard deviation of W_i.


More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

16 : Approximate Inference: Markov Chain Monte Carlo

16 : Approximate Inference: Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution

More information

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016 EPSY 905: Intro to Bayesian and MCMC Today s Class An

More information

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9 Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods

More information

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, )

SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, ) Econometrica Supplementary Material SUPPLEMENT TO MARKET ENTRY COSTS, PRODUCER HETEROGENEITY, AND EXPORT DYNAMICS (Econometrica, Vol. 75, No. 3, May 2007, 653 710) BY SANGHAMITRA DAS, MARK ROBERTS, AND

More information

On Markov chain Monte Carlo methods for tall data

On Markov chain Monte Carlo methods for tall data On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational

More information

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait

A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute

More information

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017

Markov Chain Monte Carlo (MCMC) and Model Evaluation. August 15, 2017 Markov Chain Monte Carlo (MCMC) and Model Evaluation August 15, 2017 Frequentist Linking Frequentist and Bayesian Statistics How can we estimate model parameters and what does it imply? Want to find the

More information

Doing Bayesian Integrals

Doing Bayesian Integrals ASTR509-13 Doing Bayesian Integrals The Reverend Thomas Bayes (c.1702 1761) Philosopher, theologian, mathematician Presbyterian (non-conformist) minister Tunbridge Wells, UK Elected FRS, perhaps due to

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Supplementary Note on Bayesian analysis

Supplementary Note on Bayesian analysis Supplementary Note on Bayesian analysis Structured variability of muscle activations supports the minimal intervention principle of motor control Francisco J. Valero-Cuevas 1,2,3, Madhusudhan Venkadesan

More information

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract

Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies. Abstract Bayesian Estimation of A Distance Functional Weight Matrix Model Kazuhiko Kakamu Department of Economics Finance, Institute for Advanced Studies Abstract This paper considers the distance functional weight

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1 Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

ABC methods for phase-type distributions with applications in insurance risk problems

ABC methods for phase-type distributions with applications in insurance risk problems ABC methods for phase-type with applications problems Concepcion Ausin, Department of Statistics, Universidad Carlos III de Madrid Joint work with: Pedro Galeano, Universidad Carlos III de Madrid Simon

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

Bayesian Methods in Multilevel Regression

Bayesian Methods in Multilevel Regression Bayesian Methods in Multilevel Regression Joop Hox MuLOG, 15 september 2000 mcmc What is Statistics?! Statistics is about uncertainty To err is human, to forgive divine, but to include errors in your design

More information

Marginal Specifications and a Gaussian Copula Estimation

Marginal Specifications and a Gaussian Copula Estimation Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required

More information

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version)

A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) A quick introduction to Markov chains and Markov chain Monte Carlo (revised version) Rasmus Waagepetersen Institute of Mathematical Sciences Aalborg University 1 Introduction These notes are intended to

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics

Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Minicourse on: Markov Chain Monte Carlo: Simulation Techniques in Statistics Eric Slud, Statistics Program Lecture 1: Metropolis-Hastings Algorithm, plus background in Simulation and Markov Chains. Lecture

More information

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa

Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation. Luke Tierney Department of Statistics & Actuarial Science University of Iowa Markov Chain Monte Carlo Using the Ratio-of-Uniforms Transformation Luke Tierney Department of Statistics & Actuarial Science University of Iowa Basic Ratio of Uniforms Method Introduced by Kinderman and

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

eqr094: Hierarchical MCMC for Bayesian System Reliability

eqr094: Hierarchical MCMC for Bayesian System Reliability eqr094: Hierarchical MCMC for Bayesian System Reliability Alyson G. Wilson Statistical Sciences Group, Los Alamos National Laboratory P.O. Box 1663, MS F600 Los Alamos, NM 87545 USA Phone: 505-667-9167

More information

Learning the hyper-parameters. Luca Martino

Learning the hyper-parameters. Luca Martino Learning the hyper-parameters Luca Martino 2017 2017 1 / 28 Parameters and hyper-parameters 1. All the described methods depend on some choice of hyper-parameters... 2. For instance, do you recall λ (bandwidth

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 5. Bayesian Computation Historically, the computational "cost" of Bayesian methods greatly limited their application. For instance, by Bayes' Theorem: p(θ y) = p(θ)p(y

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo 1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior

More information

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations

The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations The Mixture Approach for Simulating New Families of Bivariate Distributions with Specified Correlations John R. Michael, Significance, Inc. and William R. Schucany, Southern Methodist University The mixture

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

The Recycling Gibbs Sampler for Efficient Learning

The Recycling Gibbs Sampler for Efficient Learning The Recycling Gibbs Sampler for Efficient Learning L. Martino, V. Elvira, G. Camps-Valls Universidade de São Paulo, São Carlos (Brazil). Télécom ParisTech, Université Paris-Saclay. (France), Universidad

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Bayesian Linear Regression

Bayesian Linear Regression Bayesian Linear Regression Sudipto Banerjee 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. September 15, 2010 1 Linear regression models: a Bayesian perspective

More information

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions

Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions R U T C O R R E S E A R C H R E P O R T Item Parameter Calibration of LSAT Items Using MCMC Approximation of Bayes Posterior Distributions Douglas H. Jones a Mikhail Nediak b RRR 7-2, February, 2! " ##$%#&

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Advanced Statistical Methods. Lecture 6

Advanced Statistical Methods. Lecture 6 Advanced Statistical Methods Lecture 6 Convergence distribution of M.-H. MCMC We denote the PDF estimated by the MCMC as. It has the property Convergence distribution After some time, the distribution

More information

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Yong Huang a,b, James L. Beck b,* and Hui Li a a Key Lab

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

Scaling up Bayesian Inference

Scaling up Bayesian Inference Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background

More information

Dynamic System Identification using HDMR-Bayesian Technique

Dynamic System Identification using HDMR-Bayesian Technique Dynamic System Identification using HDMR-Bayesian Technique *Shereena O A 1) and Dr. B N Rao 2) 1), 2) Department of Civil Engineering, IIT Madras, Chennai 600036, Tamil Nadu, India 1) ce14d020@smail.iitm.ac.in

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Appendix: Modeling Approach

Appendix: Modeling Approach AFFECTIVE PRIMACY IN INTRAORGANIZATIONAL TASK NETWORKS Appendix: Modeling Approach There is now a significant and developing literature on Bayesian methods in social network analysis. See, for instance,

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

MCMC for non-linear state space models using ensembles of latent sequences

MCMC for non-linear state space models using ensembles of latent sequences MCMC for non-linear state space models using ensembles of latent sequences Alexander Y. Shestopaloff Department of Statistical Sciences University of Toronto alexander@utstat.utoronto.ca Radford M. Neal

More information

MCMC notes by Mark Holder

MCMC notes by Mark Holder MCMC notes by Mark Holder Bayesian inference Ultimately, we want to make probability statements about true values of parameters, given our data. For example P(α 0 < α 1 X). According to Bayes theorem:

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS

ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS ECO 513 Fall 2009 C. Sims HIDDEN MARKOV CHAIN MODELS 1. THE CLASS OF MODELS y t {y s, s < t} p(y t θ t, {y s, s < t}) θ t = θ(s t ) P[S t = i S t 1 = j] = h ij. 2. WHAT S HANDY ABOUT IT Evaluating the

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:

More information

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH

Lecture 5: Spatial probit models. James P. LeSage University of Toledo Department of Economics Toledo, OH Lecture 5: Spatial probit models James P. LeSage University of Toledo Department of Economics Toledo, OH 43606 jlesage@spatial-econometrics.com March 2004 1 A Bayesian spatial probit model with individual

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

an introduction to bayesian inference

an introduction to bayesian inference with an application to network analysis http://jakehofman.com january 13, 2010 motivation would like models that: provide predictive and explanatory power are complex enough to describe observed phenomena

More information

An introduction to Bayesian statistics and model calibration and a host of related topics

An introduction to Bayesian statistics and model calibration and a host of related topics An introduction to Bayesian statistics and model calibration and a host of related topics Derek Bingham Statistics and Actuarial Science Simon Fraser University Cast of thousands have participated in the

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

arxiv: v1 [stat.co] 23 Apr 2018

arxiv: v1 [stat.co] 23 Apr 2018 Bayesian Updating and Uncertainty Quantification using Sequential Tempered MCMC with the Rank-One Modified Metropolis Algorithm Thomas A. Catanach and James L. Beck arxiv:1804.08738v1 [stat.co] 23 Apr

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods Prof. Daniel Cremers 11. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

Theory of Stochastic Processes 8. Markov chain Monte Carlo

Theory of Stochastic Processes 8. Markov chain Monte Carlo Theory of Stochastic Processes 8. Markov chain Monte Carlo Tomonari Sei sei@mist.i.u-tokyo.ac.jp Department of Mathematical Informatics, University of Tokyo June 8, 2017 http://www.stat.t.u-tokyo.ac.jp/~sei/lec.html

More information

Eco517 Fall 2004 C. Sims MIDTERM EXAM

Eco517 Fall 2004 C. Sims MIDTERM EXAM Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Brief introduction to Markov Chain Monte Carlo

Brief introduction to Markov Chain Monte Carlo Brief introduction to Department of Probability and Mathematical Statistics seminar Stochastic modeling in economics and finance November 7, 2011 Brief introduction to Content 1 and motivation Classical

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods

Computer Vision Group Prof. Daniel Cremers. 14. Sampling Methods Prof. Daniel Cremers 14. Sampling Methods Sampling Methods Sampling Methods are widely used in Computer Science as an approximation of a deterministic algorithm to represent uncertainty without a parametric

More information

DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

More information

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)

More information

ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors. RicardoS.Ehlers

ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors. RicardoS.Ehlers ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors RicardoS.Ehlers Laboratório de Estatística e Geoinformação- UFPR http://leg.ufpr.br/ ehlers ehlers@leg.ufpr.br II Workshop on Statistical

More information

Likelihood-free MCMC

Likelihood-free MCMC Bayesian inference for stable distributions with applications in finance Department of Mathematics University of Leicester September 2, 2011 MSc project final presentation Outline 1 2 3 4 Classical Monte

More information