An alternative marginal likelihood estimator for phylogenetic models


Serena Arima (1) and Luca Tardella (2)

(1) Dipartimento di studi geoeconomici, linguistici, statistici e storici per l'analisi regionale, SAPIENZA Università di Roma
(2) Dipartimento di Statistica, Probabilità e Statistiche Applicate, SAPIENZA Università di Roma

arXiv preprint, v1 [stat.CO], January 13, 2010

Abstract

Bayesian phylogenetic methods are generating noticeable enthusiasm in the field of molecular systematics. Several phylogenetic models are often at stake, and different approaches are used to compare them within a Bayesian framework. The Bayes factor, defined as the ratio of the marginal likelihoods of two competing models, plays a key role in Bayesian model selection. However, its computation is still a challenging problem: several computational solutions have been proposed, none of which outperforms the others simultaneously in terms of simplicity of implementation, computational burden, and precision of the estimates. Available Bayesian phylogenetic software has so far privileged the simplicity of the harmonic mean (HM) and arithmetic mean (AM) estimators. However, it is known that the resulting estimates of the Bayesian evidence in favor of one model are often biased and inaccurate, up to having an infinite variance, so that the reliability of the corresponding conclusions is doubtful. We focus on an alternative generalized harmonic mean (GHM) estimator which, by recycling MCMC simulations from the posterior, shares the computational simplicity of the original HM estimator but, unlike it, overcomes the infinite variance issue. We show that the Inflated Density Ratio (IDR) estimator, when applied to some standard phylogenetic benchmark data, produces fully satisfactory results, outperforming the simple estimators currently provided by most publicly available software.

Keywords: Bayes factor, harmonic mean, importance sampling, marginal likelihood, phylogenetic models.

Corresponding author: Serena Arima, Dipartimento di studi geoeconomici, linguistici, statistici e storici per l'analisi regionale, SAPIENZA Università di Roma, via del Castro Laurenziano 9, Roma; serena.arima@uniroma1.it

1 Introduction

The theory of evolution states that all organisms are related through a history of common ancestors and that life on Earth diversified in a tree-like pattern connecting all living species. Phylogenetics aims at inferring the tree that best represents the evolutionary relationships among species by studying differences and similarities in their genomic sequences. Since genes evolve by accumulating changes, the larger the number of differences in the genetic code of two species, the larger their evolutionary distance is likely to be. Alternative tree estimation methods have been proposed, such as parsimony methods (see [10], chapter 7) and distance methods [11, 6]; in this paper we focus on stochastic models for substitution rates and address model choice in a fully Bayesian framework with an alternative model evidence estimation procedure.

1.1 Substitution models: a brief overview

Comparing two species, we call a substitution the replacement, at the same site, of a nucleotide in one species by another nucleotide in the other species. The stochastic models describing this replacement process are called substitution models.
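As an aside on the link between observed differences and evolutionary distance, the following Python sketch (illustrative only; the two aligned sequences are made up) computes the proportion of differing sites between two sequences and the corresponding JC69-corrected distance d = -(3/4) log(1 - (4/3)p):

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of sites at which two aligned sequences differ."""
    assert len(seq1) == len(seq2), "sequences must be aligned"
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)

def jukes_cantor_distance(p: float) -> float:
    """JC69 correction: expected substitutions per site given the p-distance."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

s1 = "ACGTACGTACGTACGT"   # toy aligned sequences (hypothetical)
s2 = "ACGTACGAACGTACTT"
p = p_distance(s1, s2)
d = jukes_cantor_distance(p)
```

The correction always inflates the raw proportion of differences, since multiple substitutions at the same site are unobservable.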
A phylogeny, or phylogenetic tree, is a representation of the genealogical relationships among species, also called taxa. The tips (leaves or external nodes) represent the present-day species, while the internal nodes usually represent extinct ancestors for which genomic sequences are no longer available. The ancestor of all sequences is the root of the tree. The branching pattern of a tree is called its topology, denoted by τ, while the lengths ν_τ of the branches of the tree τ represent the time periods covered by the branches.

DNA substitution models are probabilistic models of the changes between nucleotides in homologous DNA strands; replacements within these sequences are modeled by a 4-state Markov process, in which each state represents a nucleotide. The nucleotide sites are usually assumed to evolve independently of each other. Substitutions at any particular site are described by a Markov chain with the four nucleotides as states. The evolution of this Markov chain is completely specified by a substitution rate matrix Q = {r_ij}, which defines the rates of substitution among the four bases: each off-diagonal element r_ij, i ≠ j, is the instantaneous rate of substitution from nucleotide i to nucleotide j. The diagonal elements of the rate matrix are defined as r_ii = -Σ_{j≠i} r_ij, so that Σ_{j=1}^4 r_ij = 0 for every i. The transition probability matrix P(t) = {p_ij(t)} gives the probability of change from state i to state j over a time interval of length t, that is p_ij(t) = Pr{X(t) = j | X(0) = i}. It is traditionally assumed that the substitution process is stationary, with equilibrium distribution Π = (π_A, π_C, π_G, π_T), and time-reversible, that is

    π_i r_ij = π_j r_ji    (1.1)

where π_i is the proportion of time the Markov chain spends in state i and π_i r_ij is the amount of flow from state i to state j. Equation (1.1) is known as the detailed-balance condition and means that the flow between any two states is the same in both directions. The substitution process is assumed homogeneous, with the parameters of the substitution process constant over time, so that Q(t) = Q for all t.
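As a self-contained sketch (with made-up numbers, not estimates from any data set), one standard way to construct a rate matrix satisfying the detailed-balance condition (1.1) is to set r_ij = s_ij π_j for a symmetric matrix of exchangeabilities s, and then obtain P(t) = exp(Qt):

```python
n = 4
pi = [0.1, 0.2, 0.3, 0.4]                      # hypothetical stationary frequencies
s = [[0, 1, 2, 3],
     [1, 0, 4, 5],
     [2, 4, 0, 6],
     [3, 5, 6, 0]]                             # hypothetical symmetric exchangeabilities

# Off-diagonal rates r_ij = s_ij * pi_j; diagonal fixed so every row sums to 0.
Q = [[s[i][j] * pi[j] for j in range(n)] for i in range(n)]
for i in range(n):
    Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)

def mat_mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, t, terms=40):
    """P(t) = exp(Qt) via a plain truncated power series (fine for small Qt)."""
    Qt = [[Q[i][j] * t for j in range(n)] for i in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in P]
    for k in range(1, terms):
        term = mat_mult(term, [[x / k for x in row] for row in Qt])
        P = [[P[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return P

P = transition_matrix(Q, 0.5)
```

Each row of P(t) is then a probability distribution, π_i Q_ij = π_j Q_ji holds by construction, and Π is invariant under P(t).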
Following the notation in [17], we define r_ij = ρ_ij π_j, i ≠ j, where ρ_ij is the exchangeability rate between nucleotides i and j. This reparameterization is particularly useful for the specification of substitution models, since it makes clear the distinction between the nucleotide frequencies π_A, π_C, π_G, π_T and the substitution rates ρ_ij, allowing for different assumptions on evolutionary patterns. Several substitution sub-models, reflecting specific biological assumptions, have been proposed. The most general time-reversible nucleotide substitution model is the so-called GTR, defined by the following rate matrix

        ( -          ρ_AC π_C   ρ_AG π_G   ρ_AT π_T )
    Q = ( ρ_AC π_A   -          ρ_CG π_G   ρ_CT π_T )    (1.2)
        ( ρ_AG π_A   ρ_CG π_C   -          ρ_GT π_T )
        ( ρ_AT π_A   ρ_CT π_C   ρ_GT π_G   -        )

and more thoroughly illustrated in [23]. Several substitution models, such as K80 and HKY85, can be obtained by simplifying the Q matrix according to specific biological assumptions: the simplest one is the JC69 model, originally proposed in [21], which assumes that all nucleotides are interchangeable and have the same rate of change, that is ρ_ij = ρ for all i, j and π_A = π_C = π_G = π_T. In this paper, for illustrative purposes, we consider the GTR and JC69 models; for a wider illustration of alternative substitution models see for example [10, 8].

1.2 Bayesian inference for substitution models

The parameter space of a phylogenetic model can be represented as Ω = {τ, ν_τ, θ}, where τ ∈ T is the tree topology, ν_τ is the set of branch lengths of topology τ, and θ = (ρ, π) collects the parameters of the rate matrix; N_T denotes the number of possible topologies. Notice that N_T is a huge number even for few species. Given a tree topology τ and branch lengths ν_τ, one can compute the likelihood L = P(X | τ, ν_τ, θ) using the pruning algorithm, a practical and efficient recursive algorithm proposed in [9]. One can then make inference on the unknown model parameters by looking for the parameters which maximize the likelihood. In a Bayesian framework, instead, the parameter space is endowed with a prior distribution π(τ, ν_τ, θ) and the likelihood is used to update the prior uncertainty about model parameters following the Bayes rule:

    p(τ, ν_τ, θ | X, M) = p(X | τ, ν_τ, θ) π(τ, ν_τ, θ) / [ Σ_{i=1}^{N_T} ∫_{ν_{τ_i}} ∫_θ p(X | τ_i, ν_{τ_i}, θ) π(τ_i, ν_{τ_i}, θ) dν_{τ_i} dθ ].

The resulting distribution p(τ, ν_τ, θ | X, M) is the posterior distribution, which coherently combines prior beliefs and data information. This distribution is clearly not analytically computable, but it can be approached through appropriate approximations. Indeed, over the last ten years, powerful numerical methods based on Markov chain Monte Carlo (MCMC) have been developed, allowing one to carry out Bayesian inference under a large class of probabilistic models, even when the dimension of the parameter space is very large. Several ad-hoc MCMC algorithms have been developed for phylogenetic models, such as [24, 27], and are currently implemented in publicly available software such as MrBayes [40] and PHASE [14].

2 Model selection for substitution models

Given the variety of substitution models, an important issue in any model-based phylogenetic analysis is to select the one that is most supported by the data. Several model selection procedures have been proposed. A standard approach is to perform the hierarchical likelihood ratio test (LRT) [35] for choosing between alternative nested models. A number of popular programs allow users to compare pairs of models using this test, such as PAUP [44], PAML [50] and the R package APE [36]. However, as shown in [34], performing systematic LRTs is not an optimal choice for model selection in phylogenetics, because the model that is finally selected can depend on the order in which the pairwise comparisons are performed [33]. Moreover, it is well known that the LRT tends to favor parameter-rich models. The Akaike Information Criterion (AIC) is another model-selection criterion commonly used in phylogenetics [34]: one of its advantages is that it allows non-nested models to be compared, and it can be easily implemented. However, the AIC, like the LRT, tends to favor parameter-rich models.
A slightly different approach proposed to overcome this selection bias is the Bayesian Information Criterion (BIC) [41], which penalizes parameter-rich models. All these criteria can, in practice, select very different substitution models, as shown in [1]; moreover, they compare ratios of likelihood values penalized for an increase in the dimension of one of the models, without directly accounting for the uncertainty in the estimates of model parameters. This aspect is fully accounted for within a Bayesian framework through the use of the Bayes factor. In fact, the Bayes factor directly incorporates this uncertainty, and it is more intuitive than other methods since it can be directly used to assess the comparative evidence provided by the data in terms of the most probable model under equal prior model probabilities. The Bayes factor can be thought of as the Bayesian counterpart of the likelihood ratio test. In past decades its use was limited to simple models because of analytical intractability and computational problems in most complex models. However, thanks to the development of effective computational tools combined with more efficient simulation methods, the Bayes factor is becoming more and more common in many fields of application [38]. Its use for comparing phylogenetic models was first introduced in [42] and [43]; since then its popularity has grown, and some publicly available software now outputs approximations of the Bayes factor. Still, the complexity of these models and the computational burden implied by a high-dimensional parameter space make the problem of finding alternative and more efficient computational strategies for the Bayes factor still open and in continuous development [25].

Suppose we aim at comparing two competing substitution models M_0 and M_1. The Bayes factor is defined as the ratio of the marginal likelihoods of the two models, that is

    BF_01 = p(Y | M_1) / p(Y | M_0)

where, for i = 0, 1,

    p(Y | M_i) = Σ_{τ=1}^{N_T} ∫_{ν_τ^(i)} ∫_{θ^(i)} p(Y | τ, ν_τ^(i), θ^(i), M_i) p(τ, ν_τ^(i), θ^(i) | M_i) dν_τ^(i) dθ^(i).

The marginal likelihood corresponds to the denominator of the Bayes rule formula and is indeed the normalizing constant of the unnormalized density obtained as the product of prior and likelihood at the numerator. [22] gives numerical guidelines for interpreting the evidence scale.
Values of BF_01 > 1 (log(BF_01) > 0) are interpreted as evidence in favor of M_1, but only values of BF_01 > 100 (log(BF_01) > 4.6) can be considered decisive.
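Since marginal likelihood estimators usually return values on the log scale, the comparison is most conveniently done there; a minimal sketch (the two marginal log-likelihood values below are hypothetical, and the cutpoints are only the ones quoted in the text):

```python
import math

def bf_evidence(log_bf_01: float) -> str:
    """Map log(BF_01) = log p(Y|M_1) - log p(Y|M_0) onto the rough scale quoted
    above: > 0 favors M_1, > log(100) ~ 4.6 is decisive."""
    if log_bf_01 <= 0:
        return "no evidence for M_1"
    if log_bf_01 > math.log(100):
        return "decisive evidence for M_1"
    return "some evidence for M_1"

log_m1, log_m0 = -2100.3, -2112.9   # hypothetical marginal log-likelihoods
category = bf_evidence(log_m1 - log_m0)
```

Working with differences of log marginal likelihoods also avoids the underflow that would occur if the marginal likelihoods themselves were exponentiated.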

3 Alternative computational tools for Bayesian model evidence

It is clear that the computation of the marginal likelihood for phylogenetic models is not straightforward: it involves a summation over all N_T possible topologies and the solution of a k-dimensional integral over the branch length parameters ν_τ and the substitution rate parameters θ = (ρ, π). Most of the marginal likelihood estimation methods proposed in the literature have also been applied extensively in molecular phylogenetics [28, 25, 43]. Many of these methods are valid only under very specific conditions. For instance, the Savage-Dickey density ratio [46], applied in phylogenetics in [43], assumes nested models. The Laplace estimator [22] and the Bayesian Information Criterion [41], first applied in phylogenetics in [28], require large-sample approximations around the maximum likelihood, which can be difficult to compute or approximate for very complex models. A recent appealing variation of the Laplace approximation has been proposed in [39]; however, its use is endangered when the posterior distribution deviates from normality, and the maximization of the likelihood can be neither straightforward nor accurate. The reversible jump approach [16, 5], where the MCMC is devised to jump between models according to a Metropolis-Hastings rule, is, in principle, a wholly general algorithm. In practice, the Metropolis-Hastings proposal moves between models have to be accepted at a sufficient rate for the method to be practical. The algorithm has been extended to phylogenetic model selection in [19]; however, its implementation is not straightforward for the end user and often requires delicate tuning. Moreover, it suffers extra implementation difficulties when comparing models based on entirely different parametric rationales [25].
Currently, to our knowledge, the most widely used methods for estimating the marginal likelihood of phylogenetic models are thermodynamic integration, reviewed in [13] and first applied in a phylogenetic context in [26], and the harmonic mean approach, originally proposed in [29]. Thermodynamic integration produces reliable estimates of Bayes factors in a large variety of phylogenetic models. Although it has the advantage of general applicability, it can incur high computational costs and may require specific adjustments; for certain model comparisons, a full thermodynamic integration may take weeks on a modern desktop computer, even under a fixed tree topology for small single-protein data sets [39]. On the other hand, the HM estimator can be easily computed and does not demand computational effort beyond that already made to draw inference on model parameters, since it only needs simulations from the posterior distribution. For this reason, in the next sections we focus on generalizations and alternative instances of the HM approach.

4 HM: Harmonic Mean estimators

In this section we introduce the basic ideas and formulae for the Harmonic Mean (HM) estimator and its generalized version, the Generalized Harmonic Mean (GHM) estimator. Since the marginal likelihood is nothing but the normalizing constant of the unnormalized density obtained as the product of prior and likelihood, we illustrate the derivation of the GHM as follows. Let the normalizing constant of a non-negative, integrable function g be defined as

    c = ∫_Ω g(θ) dθ

where θ ∈ Ω ⊆ R^k and g(θ) is the unnormalized version of the probability density g(θ)/c. The GHM estimator of c is based on the following identity

    c = 1 / E_g[ f(θ) / g(θ) ]    (4.1)

where f is a Lebesgue integrable function such that ∫_Ω f(θ) dθ = 1, and E_g denotes expectation with respect to the normalized density g(θ)/c. The GHM estimator, denoted ĉ_GHM, is the empirical counterpart of identity (4.1), namely

    ĉ_GHM = 1 / [ (1/T) Σ_{t=1}^T f(θ_t) / g(θ_t) ].    (4.2)

In Bayesian inference the very first instance of such a GHM estimator was introduced to estimate the marginal likelihood, regarded as the normalizing constant of the unnormalized density g(θ) = π(θ) L(θ). Hence, taking f(θ) = π(θ), one obtains as a special case of (4.2) the Harmonic Mean estimator

    ĉ_HM = 1 / [ (1/T) Σ_{t=1}^T 1 / L(θ_t) ]    (4.3)

which can be easily computed by recycling simulations θ_1, ..., θ_T from the target posterior distribution, available from an MC or MCMC sampling scheme. This probably explains the original enthusiasm in favor of ĉ_HM, which was indeed considered a potentially convenient competitor of the standard Monte Carlo importance sampling estimate given by the (prior) Arithmetic Mean (AM) estimator

    ĉ_AM = (1/T) Σ_{t=1}^T L(θ_t)    (4.4)

where θ_1, ..., θ_T are sampled from the prior π. The simplicity of implementation, combined with a light computational burden (recycling posterior simulations reduces computing time by a factor in the hundreds with respect to thermodynamic integration), has favored the widespread use of the Harmonic Mean estimator. In fact, the Harmonic Mean estimator is implemented in several Bayesian phylogenetic software packages, as shown in Table 1, and recent biological papers [49, 48, 30] report the HM as a routinely used model selection tool. In this paper we rely on the MrBayes software [40] to implement the MCMC machinery and obtain the posterior distribution of the parameters of interest.

    Software      Marginal likelihood estimation method
    MrBayes       Harmonic Mean
    PhyloBayes    Thermodynamic Integration under normal approximation
    PHASE         Reversible Jump and Harmonic Mean
    BEAST         Harmonic Mean or bootstrapped Harmonic Mean

Table 1: Marginal likelihood estimation methods in the most widely used phylogenetic software: despite its inaccuracy, the harmonic mean estimator is still the most widespread marginal likelihood estimation tool.

However, both ĉ_AM and ĉ_HM can end up with a very large variance and unstable behavior. For ĉ_AM this is simply explained by the fact that the likelihood usually gives support to a region with low prior weight; hence sampling from the prior yields a low chance of hitting the high-likelihood region and a large chance of hitting much lower likelihood regions, ending up in a large variance of the estimate ĉ_AM. For ĉ_HM, starting from the original paper [29] (see in particular R. Neal's discussion), it has been recognized that even in very simple and standard models the HM estimator may have infinite variance and lead to an unreliable approximation. Several generalizations and improved alternatives have been proposed and recently reviewed in [37]. These facts raise the question of whether such estimators are reliable tools, and they have certainly encouraged researchers to look for alternative solutions. In the following sections we consider a new marginal likelihood estimator, the Inflated Density Ratio (IDR) estimator, proposed in [32], which is a particular instance of the Generalized Harmonic Mean (GHM) approach as proposed in [7] and [12]. This new estimator shares the original simplicity and computational feasibility of the HM estimator but, unlike it, guarantees important theoretical properties, such as finiteness of the variance.

5 IDR: Inflated Density Ratio estimator

The IDR estimator is an alternative way of estimating the normalizing constant of a density function. It is obtained via a basic identity involving the ratio of two functions: the unnormalized target density and a perturbed version of it. Like the HM estimator, it can be easily computed by recycling MC or MCMC simulations from the target distribution. In [32] the theoretical properties of the IDR are investigated: it is shown to be a consistent estimator with guaranteed finite variance under fairly mild sufficient conditions, which are shown to hold for many typical families of densities with tail behavior ranging from Gaussian to double-exponential up to Cauchy. An asymptotic estimator of the relative mean square error is provided, as well as a simple confidence interval formula.
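Before detailing the IDR, the instability of the HM (4.3) and prior AM (4.4) estimators can be seen on a toy conjugate model where the exact marginal likelihood is available in closed form (everything below is an assumption chosen for illustration, not one of the paper's examples):

```python
import math
import random

random.seed(1)

# Toy conjugate model:  y_i | theta ~ N(theta, 1), i = 1..n,  theta ~ N(0, 1).
n = 20
y = [random.gauss(1.0, 1.0) for _ in range(n)]
ybar = sum(y) / n
s2 = sum(yi * yi for yi in y)

def log_lik(theta):
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((yi - theta) ** 2 for yi in y)

# Exact log marginal likelihood: y ~ N_n(0, I + 11'), |I + 11'| = n + 1.
log_m_exact = (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1)
               - 0.5 * (s2 - n * n * ybar * ybar / (n + 1)))

T = 50_000
post_mean, post_sd = n * ybar / (n + 1), math.sqrt(1.0 / (n + 1))
post_draws = [random.gauss(post_mean, post_sd) for _ in range(T)]  # stands in for MCMC
prior_draws = [random.gauss(0.0, 1.0) for _ in range(T)]

def log_mean_exp(vals):
    """log of the average of exp(vals), computed stably (log-sum-exp)."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals) / len(vals))

log_hm = -log_mean_exp([-log_lik(t) for t in post_draws])   # harmonic mean, eq. (4.3)
log_am = log_mean_exp([log_lik(t) for t in prior_draws])    # prior arithmetic mean, eq. (4.4)
```

In this easy one-dimensional setting the prior AM is close to the truth, while repeated runs of the HM typically show an upward bias and erratic jumps, reflecting the heavy-tailed distribution of 1/L under the posterior.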
A different formulation of the Generalized Harmonic Mean estimator, based on a particular choice of f(θ), is proposed in [32]. The function f(θ) is obtained through a perturbation of the original target function g(θ): the perturbed version of g, which we denote g_Pk, is defined so that its total mass has a known functional relation to the total mass c of the target density g. In particular, g_Pk is obtained as a parametric inflation of g so that

    ∫_Ω g_Pk(θ) dθ = c + k    (5.1)

where k is a known inflation constant which can be arbitrarily fixed. The perturbation method has been widely described in [31] and [32] for unidimensional and multidimensional densities. In the unidimensional case the perturbed density is defined as follows

              | g(θ + r_k)   if θ < -r_k
    g_Pk(θ) = | g(0)         if -r_k ≤ θ ≤ r_k    (5.2)
              | g(θ - r_k)   if θ > r_k

where 2 r_k = k / g(0) is the length of the interval centered around the origin. Figure 1 helps visualizing how the perturbation acts. The Inflated Density Ratio estimator ĉ_IDR of c is defined as

    ĉ_IDR = k / [ (1/T) Σ_{t=1}^T g_Pk(θ_t) / g(θ_t) - 1 ]    (5.3)

where θ_1, ..., θ_T is a sample from the normalized target density g/c.

Figure 1: Left panel: original density g. Right panel: perturbed density g_Pk defined as in (5.2). The total mass of the perturbed density is c + k; the shaded area corresponds to the inflated mass k, with k = 2 r_k g(0) as in (5.2).

The use of the perturbed density as importance function leads to some advantages with respect to the other instances of ĉ_GHM proposed in the literature. Indeed, the implied function f_k(θ) = (g_Pk(θ) - g(θ)) / k need not be positive, and it is shown to yield a finite-variance estimator for a wide range of target densities g; two simple sufficient conditions for the finiteness of the variance are introduced in [32]. Moreover, the use of a parametric perturbation makes the method more flexible and efficient with a moderate extra computational effort. Like all methods based on importance sampling strategies, the properties of the estimator ĉ_IDR strongly depend on the ratio g_Pk(θ)/g(θ): in particular, the asymptotic relative mean square error of the estimator is

    RMSE_ĉIDR = E_g[ ((ĉ_IDR - c) / c)^2 ] ≈ (1/T) (c/k)^2 Var_g[ g_Pk(θ) / g(θ) ].    (5.4)

A necessary and sufficient condition for the finiteness of RMSE_ĉIDR is the finiteness of the second moment of the ratio g_Pk(θ)/g(θ), that is

    RMSE_ĉIDR < ∞  ⟺  Var_g[ g_Pk(θ)/g(θ) ] < ∞  ⟺  E_g[ (g_Pk(θ)/g(θ))^2 ] < ∞.

The relative mean square error can be estimated as follows:

    RMSE_ĉIDR ≈ (1/T) (ĉ_IDR/k)^2 Vâr_g[ g_Pk(θ)/g(θ) ]    (5.5)

where Vâr_g is the sample variance of the ratio. Expression (5.5) clarifies the key role of the choice of k with respect to the error of the estimator: for k → 0 the variance of the ratio g_Pk(θ)/g(θ) tends to 0, since g_Pk is very close to g, but the factor (c/k)^2 tends to infinity. In other words, while the variance term favors values of k as small as possible, the factor 1/k^2 acts in the opposite direction. In order to address the choice of k, [32] suggest choosing the perturbation which minimizes the estimated error: in practice, one computes the estimator on a grid of perturbation values, ĉ_IDR(k), k ∈ {k_1, ..., k_K}, and chooses the optimal k_opt as the k for which RMSE_ĉIDR(k) is minimum. This calibration of k requires iterative evaluation of ĉ_IDR(k) and is hence somewhat heavier than the HM estimator, but it requires no extra simulations, which in the phylogenetic context are often the most time-consuming part. The computational cost is further alleviated by the fact that one uses the same sample from g throughout, and the only quantity to be evaluated K times is the inflated density g_Pk; once the ratio of the perturbed and original densities is available, the computation of ĉ_IDR(k) is straightforward. For practical purposes, the computation of the inflated density when the support of g is the whole R^k can be easily implemented in R using a publicly available function. In [32] and [4] it has been shown that, in order to improve the precision of ĉ_IDR, it is recommended to evaluate a (local) mode m̂_0 and the sample variance-covariance matrix Ŝ based on the simulated θ_1, ..., θ_T, and to apply the one-to-one affine transformation Ŝ^{-1/2}(θ_i - m̂_0), so that the correspondingly transformed density (which, after the appropriate change of measure, has the same normalizing constant) has a local mode at the origin, where the inflation occurs, and an approximately standard variance-covariance matrix. This is automatically implemented in the publicly available R code.
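The estimator (5.3) and the grid calibration of k just described can be sketched in a few lines; here we use a one-dimensional standard normal target with known mass c = sqrt(2π) (an illustration under assumed settings, not the authors' R implementation):

```python
import math
import random

random.seed(42)

# Unnormalized 1-D target g(x) = exp(-x^2 / 2), with true mass c = sqrt(2 pi).
def g(x):
    return math.exp(-0.5 * x * x)

c_true = math.sqrt(2 * math.pi)

def g_perturbed(x, r_k):
    # Inflated density of (5.2): plateau of height g(0) on [-r_k, r_k].
    if x < -r_k:
        return g(x + r_k)
    if x > r_k:
        return g(x - r_k)
    return g(0.0)

def idr_estimate(draws, k):
    """IDR estimate (5.3) with a plug-in error estimate in the spirit of (5.5)."""
    r_k = k / (2.0 * g(0.0))                  # added mass 2 r_k g(0) equals k
    ratios = [g_perturbed(x, r_k) / g(x) for x in draws]
    mean_ratio = sum(ratios) / len(ratios)    # estimates (c + k) / c
    var_ratio = sum((r - mean_ratio) ** 2 for r in ratios) / (len(ratios) - 1)
    c_hat = k / (mean_ratio - 1.0)
    rmse_hat = (c_hat / k) ** 2 * var_ratio / len(ratios)
    return c_hat, rmse_hat

draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Calibrate k on a grid, keeping the value with the smallest estimated error.
grid = [0.25, 0.5, 1.0, 2.0, 4.0]
results = {k: idr_estimate(draws, k) for k in grid}
k_opt = min(grid, key=lambda k: results[k][1])
c_hat = results[k_opt][0]
```

Note that the same posterior sample is reused for every value of k; only the perturbed density is re-evaluated, which is exactly what keeps the calibration cheap.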
Some adaptations are needed in contexts, such as phylogenetic evolutionary models, where the parameter space is constrained or of mixed nature; in the present paper we illustrate how one can successfully implement those adaptations and make ĉ_IDR available as a simple and effective alternative evaluation of model evidence in standard phylogenetic software.

Preliminary investigations of the effectiveness and comparative performance of ĉ_IDR have already been carried out in [32] and [4], and we briefly summarize the main encouraging findings. In order to assess the true effectiveness of the IDR method, it has been applied to data simulated from different distributions for which the normalizing constant is known. As shown in [32], the estimator produces fully convincing results for several known distributions, even for a 100-dimensional multivariate normal: in terms of estimator precision, these results are comparable with those obtained in [26] with the thermodynamic integration method. In [4], simple antithetic variates tricks allow the IDR estimator to perform well even for distributions with severe departures from the symmetric Gaussian case, such as asymmetric and even some multimodal distributions. Table 2 reports the estimates obtained by applying the IDR method in several scenarios: univariate and 100-dimensional normal distributions, univariate and multivariate (5- and 30-dimensional) skew normal distributions, and multivariate (2-, 3- and 10-dimensional) two-component normal mixtures. The method correctly reproduces the true value of the normalizing constant for different shapes and dimensions of the target function. Some real data implementations with standard generalized linear models have also been reported in [32]. These results encouraged us to extend the use of ĉ_IDR to more complex settings such as phylogenetic models.

6 Implementing IDR for substitution models with fixed topology

The Inflated Density Ratio approach can be extended in order to compute the marginal likelihood of more complex models, such as phylogenetic models. The main difficulty in computing the marginal likelihood for these models is that they involve parameters defined on a mixed parameter space: the parameters of the substitution model θ and the branch lengths ν_τ are defined on a continuous space, while the tree topology τ lives on a discrete one. We now show how to extend the flexibility of the IDR approach, whether or not the topology is fixed.
For a fixed topology τ and a sequence alignment X, the parameter space of a phylogenetic model M is defined by Ω = (θ, ν_τ); the joint posterior distribution on Ω is given by

    p(θ, ν_τ | X, M) = p(X | θ, ν_τ) p(θ, ν_τ) / m(X | M)    (6.1)

    Distribution       log c    k_opt    log ĉ_IDR    RMSE_ĉIDR
    N(µ, σ)
    N_100(µ, σ)
    SN(µ, σ, τ)
    SN_5(µ, σ, τ)
    SN_30(µ, σ, τ)
    Mix N_2
    Mix N_3
    Mix N_10

Table 2: IDR approach for normal distributions (univariate and 100-dimensional), univariate and multivariate (5- and 30-dimensional) skew normal distributions, and multivariate (2-, 3- and 10-dimensional) mixtures of two normal components. The value log c is the logarithm of the true normalizing constant; k_opt is the optimal inflation coefficient minimizing the relative mean square error RMSE_ĉIDR computed as in (5.5); log ĉ_IDR is the estimated normalizing constant on the logarithmic scale. A sensitivity study showed that the performance of the method is robust to changes of the (µ, σ, τ) parameter choices.

where

    m(X | M) = ∫_θ ∫_{ν_τ} p(X | θ, ν_τ) p(θ, ν_τ) dθ dν_τ    (6.2)

is the marginal likelihood we aim at estimating. When the topology is fixed, the parameter space Ω is homogeneous, in the sense that it consists of parameters all defined on a continuous support. In order to apply the Inflated Density Ratio we only need the following two ingredients:

- a sample (θ^(1), ν_τ^(1)), ..., (θ^(n), ν_τ^(n)) from the posterior distribution p(θ, ν_τ | X, M);
- the likelihood and the prior density evaluated at each posterior sampled value (θ^(k), ν_τ^(k)), that is p(X | θ^(k), ν_τ^(k)) and p(θ^(k), ν_τ^(k)).

The first ingredient is just the usual output of the Markov chain Monte Carlo simulations derived from model M and the data X. The computation of the likelihood and the joint prior is indeed already coded within the available software: the former is usually accomplished through the pruning algorithm, while computing the prior is straightforward.

A necessary condition for the inflation idea to work as prescribed in [32] is that the posterior density must have full support on the whole real k-dimensional space. In our phylogenetic models this is not always the case, and we explain here simple, fully automatic remedies to overcome this kind of obstacle. We start with parameters, like the branch lengths, that are constrained to lie on the positive half-line. In that case the remedy is straightforward: a simple logarithmic transformation reparameterizes the posterior density in terms of

    ν̃_τ = log(ν_τ)    (6.3)

so that the support of the reparameterized density is unconstrained. Obviously, the log(ν_τ) reparameterization calls for the appropriate change-of-measure Jacobian when evaluating the corresponding transformed density. For model parameters with linear constraints, like the substitution parameters θ = {ρ, π}, a little less obvious transformation is needed.
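Before turning to the linearly constrained parameters, the log reparameterization of a branch length can be checked numerically: with φ = log(ν), the transformed density is the original one times the Jacobian dν/dφ = exp(φ). The toy check below assumes a hypothetical Exp(1) prior on a single branch length (not the paper's code) and verifies that the total mass is preserved:

```python
import math

def log_density_nu(nu):
    """Log of the Exp(1) density on the positive half-line."""
    return -nu if nu > 0 else float("-inf")

def log_density_phi(phi):
    nu = math.exp(phi)
    return log_density_nu(nu) + phi      # + log|Jacobian| = log(exp(phi)) = phi

# Riemann sum over a wide phi-grid should recover the original mass (= 1 here).
lo, hi, m = -30.0, 10.0, 400_000
h = (hi - lo) / m
total = sum(math.exp(log_density_phi(lo + i * h)) for i in range(m + 1)) * h
```

Forgetting the Jacobian term is a classic source of silently wrong marginal likelihood values, which is why the check is worth automating.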
In this case θ = {ρ, π} are subject to the following set of constraints:

    Σ_{i ∈ {A,C,G,T}} π_i = 1,    Σ_{j ∈ {A,C,G,T}} ρ_ij π_j = 0  for each i ∈ {A, C, G, T}.
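Simplex constraints of this kind can be mapped to an unconstrained Euclidean space with the additive log-ratio transform; a minimal sketch (the frequency values are made up for illustration), including the change-of-measure term:

```python
import math

def alr(x):
    """Additive log-ratio transform: simplex point -> R^{D-1}."""
    return [math.log(xi / x[-1]) for xi in x[:-1]]

def alr_inverse(y):
    """Additive logistic transform: R^{D-1} -> simplex point."""
    expy = [math.exp(yi) for yi in y] + [1.0]
    s = sum(expy)
    return [e / s for e in expy]

def log_jacobian_alr_inverse(y):
    """log |dx/dy| of the additive logistic transform: sum of log(x_i)
    over all D components of x (needed for the change of measure)."""
    x = alr_inverse(y)
    return sum(math.log(xi) for xi in x)

# Round trip on hypothetical nucleotide frequencies:
pi = [0.1, 0.2, 0.3, 0.4]
y = alr(pi)
pi_back = alr_inverse(y)
```

The transformed coordinates are unconstrained, so the inflation of the density can be carried out on the whole real space as the method requires.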

Indeed, it is well known that, similarly to the first simplex constraint, the last set of constraints together with reversibility can be rephrased as another simplex constraint concerning only the off-diagonal entries of the substitution rate matrix (1.2), namely ρ_AC + ρ_AG + ρ_AT + ρ_CG + ρ_CT + ρ_GT = 1. We have relied on the so-called additive logistic transformation [45, 2], a one-to-one transformation from R^{D-1} to the (D-1)-dimensional simplex S^D = {(x_1, ..., x_D) : x_1 > 0, ..., x_D > 0; x_1 + ... + x_D = 1}. Hence we can use its inverse, called the additive log-ratio transformation, which is defined as follows

    y_i = log(x_i / x_D),    i = 1, ..., D-1

for any x = (x_1, ..., x_D) ∈ S^D. Applying these transformations to the nucleotide frequencies and to the exchangeability parameters, the transformed parameters take values on the entire real support and the IDR estimator can be computed. Again, the reparameterization calls for the appropriate change-of-measure Jacobian when evaluating the corresponding transformed density; details for this particular reparameterization are given in [2]. In this work, the application of the IDR method has been performed using the MCMC output of the simulations from the posterior distribution obtained with the MrBayes software, together with the R packages PML, for the evaluation of the likelihood, and HI, for the IDR perturbation f_k(θ). Some ad-hoc R functions have been developed to implement the aforementioned parameter transformations; R code is available upon request from the first author.

7 Numerical examples and comparative performance

In this section the successful implementation of the IDR estimator is illustrated in some typical phylogenetic models with literature benchmark datasets. Although it requires a little extra computation with respect to model evidence

18 estimators such as HM, IDR can produce more reliable and accurate model evidence assessments keeping the simplicity of the original HM idea. For illustrative purposes we restrict ourselves to check the comparative performance of the IDR with respect to two of the simplest and most favorite model evidence output in the publicly available software (yet most problematic ones): the Harmonic Mean estimator and the Arithmetic Mean estimator. Indeed, while the former is guaranteed to be a consistent estimate of the marginal likelihood but possibly with infinite variance, the latter one is consistent only when formula (4.4) is applied when θ 1,..., θ T are sampled from the prior. Since it is known such a prior AM is very unstable and unreliable it has often been replaced by a posterior AM where θ 1,..., θ T are sampled from the posterior rather than from the prior. In that case the resulting quantity can be interpreted only as a surrogate evidence in favor of one model and by no means confused with the rigorous concept of marginal likelihood and its essential relation with Bayes Factor. We now show how the performance of IDR in two phylogenetic examples where simulated data is used to have a better control of what one should expect from marginal likelihood and comparative evidence of alternative models. 7.1 Hadamard 1: marginal likelihood computation We first use the synthetic data set Hadamard 1 in [10]. It consists of a sequence 1000 of amino acid alignments of six species, A, B, C, D, E and F simulated from a GT R + Γ model. The true tree in shown in the left Panel of Figure 2. MrBayes software [3] uses the Metropolis Coupling algorithm simulate a Markov Chain which is then used as a sample from the posterior distribution in order to make posterior inference on the model parameters and provide estimates of the marginal likelihood via AM and HM. Appropriate constraints have been considered in order to fix the topology. 
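Returning to the reparameterization step described above, the additive logistic transformation and its inverse, the additive log-ratio transform, can be sketched as follows (Python for illustration; the paper's own helper functions are ad-hoc R code):

```python
import numpy as np

def alr(x):
    """Additive log-ratio transform: maps a point of the D-part simplex
    (strictly positive entries summing to one) to R^(D-1),
    y_i = log(x_i / x_D)."""
    x = np.asarray(x, dtype=float)
    return np.log(x[:-1] / x[-1])

def alr_inv(y):
    """Additive logistic transform (the inverse map): R^(D-1) -> open simplex."""
    e = np.exp(np.append(y, 0.0))   # append y_D = 0 for the reference part
    return e / e.sum()

pi = np.array([0.3, 0.2, 0.25, 0.25])   # e.g. nucleotide frequencies
y = alr(pi)                              # unconstrained, lives in R^3
assert np.allclose(alr_inv(y), pi)       # round trip recovers the frequencies
```

The same pair of maps applies to the six exchangeability parameters on their simplex; the Jacobian of the transformation must be included when evaluating the transformed posterior density, as noted above.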
In this case, the parameter space Ω consists of 18 parameters. In order to reduce the autocorrelation and improve the convergence of the sampled values, an initial portion of the chain has been discarded as burn-in and the remaining draws have been thinned with rate equal to 10. We have recycled this MCMC output to estimate the marginal likelihood of the GTR + Γ model for the known true topology through the IDR method.

Figure 2: True phylogenetic trees for simulated data: the tree in the left panel refers to the Hadamard 1 data and the tree in the right panel to Hadamard 2.

Results for different perturbation values k are shown in Table 3: the table reports the values of the IDR estimator for different perturbation values, the relative mean square error (RMSE_ĉIDR,Delta) and the corresponding confidence interval (ĈI). Since the smallest error corresponds to the perturbation value k_opt = 10^7, the IDR estimate (on a logarithmic scale) for the GTR + Γ model is the corresponding value of log ĉ_IDR in Table 3. We compare the results of the IDR method with those obtained with the Harmonic Mean (HM) and the posterior Arithmetic Mean (AM). The three methods produce somewhat different quantities, although they are sometimes compatible once the estimation error is accounted for. For each method, the Monte Carlo error of the estimates has been computed by simulating and re-estimating the model 10 times (RMSE_ĉ,MC). However, it is known that under critical conditions for HM the MC error cannot fully guarantee its precision, although it remains a necessary premise for an accurate estimate. Hence we look at three different precision assessments, based respectively on the asymptotic delta-method approximation of the RMSE (RMSE_ĉ,Delta), the estimated RMSE_ĉ,MC based on independent MC replicates of the posterior simulations and of ĉ, and an evaluation of the RMSE (RMSE_ĉ,Boot) based on bootstrap replication from a single simulation stream. We estimate the ratio bias of the estimator using a bootstrap strategy: B bootstrap samples have been considered and the marginal likelihood ĉ_b has been estimated for each bootstrap sample.
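The bootstrap assessment just described can be sketched as follows (Python for illustration; the ratio form ĉ = k / (mean of density ratios − 1) for the IDR estimator and the toy ratios are assumptions made here for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_rel_rmse(ratios, k, B=1000):
    """Bootstrap relative error of a ratio-type marginal likelihood
    estimator of the IDR form c_hat = k / (mean(ratios) - 1), where
    'ratios' are inflated-to-posterior density ratios evaluated at the
    posterior draws. Resamples the ratios, re-estimates c, and returns
    sqrt(mean((c_b / c_hat - 1)^2))."""
    r = np.asarray(ratios, dtype=float)
    c_hat = k / (r.mean() - 1.0)
    c_boot = np.empty(B)
    for b in range(B):
        rb = rng.choice(r, size=r.size, replace=True)
        c_boot[b] = k / (rb.mean() - 1.0)
    return np.sqrt(np.mean((c_boot / c_hat - 1.0) ** 2))

# toy density ratios (all >= 1, as for an inflated density) and a small k
ratios = 1.0 + rng.gamma(2.0, 0.05, size=2000)
rel_err = bootstrap_rel_rmse(ratios, k=0.1)
assert 0.0 < rel_err < 1.0
```

Because the bootstrap resamples a single simulation stream, it is far cheaper than the 10 independent re-estimations used for RMSE_ĉ,MC, at the price of ignoring the Monte Carlo variability of the chain itself.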

Table 3: Inflated Density Ratio method for the Hadamard 1 data (the parameters lie in an 18-dimensional space, that is Ω = R^18). The table reports the values of the IDR estimator log ĉ_IDR for a small grid of perturbation values k, the corresponding relative mean square errors RMSE_ĉIDR,Delta as in (5.5) without accounting for autocorrelation, the relative mean square errors corrected for autocorrelation (RMSE*_ĉIDR,Delta) and the corresponding confidence intervals (ĈI). Since the smallest error corresponds to the perturbation value k_opt = 10^7, the selected IDR estimate (on a logarithmic scale) for the GTR + Γ model is the corresponding log ĉ_IDR. RMSE*_ĉIDR,Delta has been obtained by applying formula (5.5) and considering an upper bound for the discounting factor of the sample variance as in (7.2), due to the autocorrelated simulations. In this application the discounting factor more than doubles the uncorrected RMSE_ĉIDR,Delta.
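To fix ideas about the estimator whose grid of perturbation values Table 3 explores, here is a one-dimensional toy version of the IDR idea (Python; the radial inflation around zero and the Gaussian target are purely illustrative assumptions, not the paper's 18-dimensional implementation):

```python
import numpy as np

rng = np.random.default_rng(7)

def idr_estimate(theta, log_g, r):
    """One-dimensional Inflated Density Ratio sketch. The unnormalised
    posterior g is inflated around 0 by radius r:
        g_r(t) = g(0)                    for |t| <= r,
        g_r(t) = g(sign(t) (|t| - r))    otherwise,
    which adds known mass k = 2 r g(0) to the total mass c of g.
    Since E_post[g_r/g] = (c + k)/c, we get c = k / (mean(g_r/g) - 1)."""
    theta = np.asarray(theta, dtype=float)
    k = 2.0 * r * np.exp(log_g(0.0))
    inflated = np.where(np.abs(theta) <= r,
                        log_g(0.0),
                        log_g(np.sign(theta) * (np.abs(theta) - r)))
    ratios = np.exp(inflated - log_g(theta))
    return k / (ratios.mean() - 1.0)

# toy check: g(t) = 5 * N(t; 0, 1), so the true normalising constant is 5
c_true = 5.0
log_g = lambda t: np.log(c_true) - 0.5 * np.asarray(t) ** 2 - 0.5 * np.log(2 * np.pi)
theta = rng.normal(size=200_000)      # draws from the normalised posterior
c_hat = idr_estimate(theta, log_g, r=0.5)
assert abs(c_hat / c_true - 1.0) < 0.05
```

Note that the density ratios g_r/g are bounded in the tails far more gently than the reciprocal likelihoods used by HM, which is the intuition behind the finite-variance property discussed later.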

The ratio bias B̂_boot and the ratio standard deviation Ŝ_boot are estimated from the bootstrap replicates ĉ_b and combined into the bootstrap relative root mean square error

RMSE_ĉ,Boot = [ (1/B) Σ_{b=1}^B ( ĉ_b / ĉ − 1 )² ]^{1/2}.   (7.1)

Table 4 shows the obtained results.

Table 4: Hadamard 1 data (Ω = R^18): marginal likelihood estimates obtained with the Inflated Density Ratio method ĉ_IDR, with the Harmonic Mean approach ĉ_HM and with the Arithmetic Mean approach ĉ_AM. Four different RMSE estimates are provided: RMSE_ĉ,Delta has been computed as in (5.5); RMSE_ĉ,MC comes from 10 independent Monte Carlo replicates of the estimation; RMSE_ĉ,Boot is a bootstrap approximation of the RMSE (B = 1000); RMSE*_ĉ,Delta comes from a delta-method approximation corrected for autocorrelation as in (7.2).

Indeed the MC errors are also very different. In fact, the smallest MC error is obtained by simply applying the arithmetic mean of the θ_t values simulated from the posterior. We have already stressed that the posterior AM does not aim at estimating the marginal likelihood, but we have nonetheless considered it in the following to verify how distant the corresponding values are and how different the conclusions can be when comparing alternative models via the AM estimator. For the Harmonic Mean method the MC error is quite high, reflecting the instability of the HM estimator. On the other hand, the Inflated Density Ratio approach seems to be a good compromise in terms of the order of magnitude of the error of the estimate ĉ_IDR and the robustness of its relative error estimation, ranging from 0.15 with independent Monte Carlo re-estimation to 0.30 with the delta method. Notice that, in order to take into account the autocorrelation of the posterior simulated values, a correction has been applied to the errors estimated with the delta and bootstrap methods. In particular, RMSE*_ĉIDR,Delta has been estimated as in (5.5) with a variance estimate inflated, in the MCMC context, to take into account the correlated samples, leading to a protective effective sample size given by

n_ESS = n / ( 1 + 2 Σ_{k=1}^K ρ̂_k ).   (7.2)

7.2 Hadamard 2: Bayes Factor computation

We have considered a second benchmark synthetic data set from [10], referred to as Hadamard 2: it consists of 200 sites for four species, A, B, C and D; these data have been simulated from the Jukes-Cantor model (JC69) and the true tree is shown in the right panel of Figure 2. For the true topology, we compute the marginal likelihood for the JC69 model and for the GTR+Γ model: the parameters lie in a 5-dimensional space for the JC69 model and in a 14-dimensional space for the GTR+Γ model. The simulated values from the Metropolis-Coupled algorithm have been rearranged to compute the marginal likelihoods for both models using the IDR method, the HM and the AM approach. As for the Hadamard 1 data, Monte Carlo errors have been computed by repeating the estimates 10 times. Tables 5 and 6 show the results obtained for the GTR+Γ and the JC69 models, respectively.

Table 5: Hadamard 2 data: marginal likelihood estimates of the GTR+Γ model (Ω = R^14) obtained with the IDR method ĉ_IDR, with the HM approach ĉ_HM and with the AM approach ĉ_AM. Four different RMSE estimates are provided: RMSE_ĉ,Delta has been computed as in (5.5); RMSE_ĉ,MC comes from 10 independent Monte Carlo replicates of the estimation; RMSE_ĉ,Boot is a bootstrap approximation of the RMSE (B = 1000); RMSE*_ĉ,Delta comes from a delta-method approximation of the RMSE corrected for autocorrelation as in (7.2).

All methods produce somewhat different results in terms of marginal likelihood; as in the previous example, the smallest Monte Carlo error is associated with the Arithmetic Mean method. Also in this case, the MC error of the Harmonic Mean method is much larger than the one produced by the Inflated Density Ratio method.

Table 6: Hadamard 2 data: marginal likelihood estimates of the Jukes-Cantor model (Ω = R^5) obtained with the IDR method ĉ_IDR, with the HM approach ĉ_HM and with the AM approach ĉ_AM. Four different RMSE estimates are provided: RMSE_ĉ,Delta has been computed as in (5.5); RMSE_ĉ,MC comes from 10 independent Monte Carlo replicates of the estimation; RMSE_ĉ,Boot is a bootstrap approximation of the RMSE (B = 1000); RMSE*_ĉ,Delta comes from a delta-method approximation of the RMSE corrected for autocorrelation as in (7.2).

The corresponding Bayes Factors (on a logarithmic scale) for JC69 against GTR+Γ are shown in Table 7: considering the reference values for the Bayes Factor defined in [22], all methods strongly support the Jukes-Cantor model, which is the true model. However, the strongest evidence in favor of the (known) correct model corresponds to the Bayes Factor as estimated by the Inflated Density Ratio.

Method   log(BF^JC69_GTR+Γ)   MC-ĈI log(BF)
IDR                           [      ,       ]
HM                            [5.0206,       ]
AM                            [1.0617,       ]

Table 7: Hadamard 2 data: Bayes Factors computed with the IDR, HM and AM approaches. MC-ĈI is the confidence interval obtained by considering the MC errors of 10 replications of the estimates.

7.3 Hadamard 2: tree selection

We now show how it is possible to extend the IDR approach to deal with non-fixed topologies and to select among competing trees. For a fixed substitution model, competing trees can be compared by considering the evidence of the data for a fixed tree topology: a tree topology τ_i can be evaluated by computing its posterior probability π(τ_i | X), derived from Bayes' theorem as

p(τ_i | X) = p(X | τ_i) p(τ_i) / p(X).

Unlike the topology, the parameters θ of the evolutionary process are continuous; in order to compare tree topologies, the continuous parameters are treated as nuisance parameters and are integrated out:

p(X | τ_i) = ∫_Ω p(X | τ_i, θ) p(θ | τ_i) dθ.

In testing tree τ_i against τ_j, the Bayes Factor is computed as the ratio of the probabilities of the data under each topology:

BF_ij = p(X | τ_i) / p(X | τ_j)   (7.3)

where p(X | τ_i) is the marginal likelihood of the tree τ_i with respect to the other parameters [43]. Consider the Hadamard 2 data: in the previous subsection, we verified the feasibility of the Inflated Density Ratio approach in comparing the true model (Jukes-Cantor) with the GTR+Γ model under a fixed topology, and the method produced more stable and plausible results. For the same data, we now fix the Jukes-Cantor model and compute the Bayes Factors in order to compare the alternative topologies (N_T = 3 possible topologies) with the true one (τ = 1). Results are shown in Table 8.

τ           Size   MC-ĈI log(BF_IDR)   MC-ĈI log(BF_HM)   MC-ĈI log(BF_AM)
log(BF_21)  10^4   [3.511, 3.599]      [2.37, 4.066]      [2.929, 2.989]
log(BF_31)  10^4   [3.817, 3.901]      [2.503, 3.163]     [2.053, 3.131]

Table 8: Hadamard 2 data: the Bayes Factor is computed in order to compare competing topologies with the true one. The Bayes Factor is approximated with the IDR method (BF_IDR), the HM (BF_HM) and the AM (BF_AM) approach. Confidence intervals have been constructed by considering the MC errors of 10 replications of the estimates.

Also in this case, the Inflated Density Ratio gains in precision and robustness of the estimate with respect to the Harmonic Mean estimator.
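The effective-sample-size discount in (7.2), used above to correct the delta-method and bootstrap errors for autocorrelation, can be sketched as follows (Python; the truncation lag K = 100 is a hypothetical choice, not taken from the paper):

```python
import numpy as np

def n_ess(x, K=100):
    """Effective sample size n / (1 + 2 * sum_{k=1}^K rho_hat_k), the
    factor used to discount the sample variance computed from
    autocorrelated MCMC draws, as in (7.2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    var = xc @ xc / n
    rho = [(xc[:-k] @ xc[k:]) / (n * var) for k in range(1, min(K, n - 1) + 1)]
    return n / (1.0 + 2.0 * np.sum(rho))

# toy check on an AR(1) chain with phi = 0.5, whose true ESS is about n/3
rng = np.random.default_rng(0)
n, phi = 50_000, 0.5
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
assert n / 4 < n_ess(x) < n / 2.4
```

In practice the truncation lag K (or an upper bound on the autocorrelation sum, as the paper uses) must be chosen conservatively, since summing noisy high-lag autocorrelations can destabilise the estimate.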

8 Extending IDR to unknown topology

In the previous sections, we applied the Inflated Density Ratio approach to estimate the marginal likelihood for competing substitution models under a fixed topology and to compare competing topologies under a fixed substitution model. In both cases, the parameter space involves only continuous parameters, so the IDR method can be applied directly as described in [32]. Moreover, the Inflated Density Ratio approach can also be extended to compute the marginal likelihood when both the substitution model parameters and the tree topology are unknown, that is

c = Σ_{i=1}^{N_T} ∫_{ν_{τ_i}} ∫_θ p(X | τ_i, ν_{τ_i}, θ) p(τ_i, ν_{τ_i}, θ) dν_{τ_i} dθ   (8.1)

where N_T is the number of possible tree topologies. Equation (8.1) involves the integration over the substitution model parameters and the branch lengths, and the summation over all possible tree topologies. The Inflated Density Ratio approach can be extended to compute the quantity in (8.1) as follows. To simplify notation, we denote by θ = (π, ρ, ν) the set of continuous parameters and by τ ∈ {1, 2, ..., N_T} the discrete parameter ranging over the tree space, so that

c = Σ_τ ∫_θ g(θ, τ) dθ   (8.2)

where g(θ, τ) is the unnormalized posterior. Let c_i = ∫_θ g(θ, τ_i) dθ and let g_{P_k}(θ, τ_i) denote the density obtained by perturbing g(·, τ_i) on the continuous space of θ conditionally on τ = τ_i, adding mass k_i, with k = k_1 + k_2 + ... + k_{N_T}. Then

c + k = Σ_τ ∫_θ g_{P_k}(θ, τ) dθ
      = ∫_θ g_{P_k}(θ, τ_1) dθ + ... + ∫_θ g_{P_k}(θ, τ_{N_T}) dθ
      = c_1 E_{g(θ,τ_1)}[ g_{P_k}(θ, τ_1) / g(θ, τ_1) ] + ... + c_{N_T} E_{g(θ,τ_{N_T})}[ g_{P_k}(θ, τ_{N_T}) / g(θ, τ_{N_T}) ].

Let c_1/c = w_1, c_2/c = w_2, ..., c_{N_T}/c = w_{N_T}; dividing both sides by c and solving for c yields

c = k / ( w_1 E_{g(θ,τ_1)}[ g_{P_k}(θ,τ_1)/g(θ,τ_1) ] + ... + w_{N_T} E_{g(θ,τ_{N_T})}[ g_{P_k}(θ,τ_{N_T})/g(θ,τ_{N_T}) ] − 1 )   (8.3)

in which each term w_i E_{g(θ,τ_i)}[·] can be estimated from the posterior sample as

(T_i / n) · (1 / T_i) Σ_{t=1}^{T_i} g_{P_k}(θ^{(t)}, τ_i) / g(θ^{(t)}, τ_i)   (8.4)

where T_i is the observed frequency of topology τ_i among the n posterior draws. The marginal likelihood estimator in Equation (8.3) has been applied to the Hadamard 2 data under the JC69 model. Results are shown in Table 9.

Table 9: Hadamard 2 data: marginal likelihood for the JC69 model with unknown topology. RMSEs have been estimated with the delta method (RMSE_ĉ,Delta) and with 10 Monte Carlo replicates (RMSE_ĉ,MC).

Also in this case, the MC error and the RMSE of the estimate are quite low, confirming the results obtained under a fixed topology. Moreover, in this way the method can be extended to more complex models defined on higher-dimensional spaces. The extension of the Inflated Density Ratio method to a mixed parameter space is thus straightforward; to our knowledge, the same extension is not straightforward for other marginal likelihood estimation methods. Indeed the path sampling in [26] is applied to very complex models, much more complex than those considered in this work, but the marginal likelihood is always computed under a fixed topology (in particular, the consensus tree topology). This fact testifies to the flexibility of the Inflated Density Ratio method and its possible applicability to very different and complex models.
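The estimator in (8.3)–(8.4) can be sketched as follows (Python; the two-topology toy target, with a Gaussian posterior within each topology and a one-dimensional radial inflation, is an assumption made here purely to exercise the formulas):

```python
import numpy as np

def idr_mixed(ratios, topologies, k):
    """Mixed-space IDR sketch: given per-draw density ratios
    g_Pk(theta_i, tau_i) / g(theta_i, tau_i) and the topology tau_i of
    each draw, estimate each w_t * E_t[.] term by
    (T_t / n) * (1 / T_t) * sum over the draws with topology t, as in
    (8.4), and combine the terms as in (8.3)."""
    ratios = np.asarray(ratios, dtype=float)
    topologies = np.asarray(topologies)
    n = ratios.size
    total = 0.0
    for t in np.unique(topologies):
        mask = topologies == t
        T_t = mask.sum()                       # observed frequency of topology t
        total += (T_t / n) * ratios[mask].mean()
    return k / (total - 1.0)

# toy target: two "topologies" with masses c1, c2; N(0,1) posterior within each
rng = np.random.default_rng(3)
c1, c2, r, n = 3.0, 2.0, 0.5, 200_000
tau = (rng.random(n) < c1 / (c1 + c2)).astype(int)   # topology indicator
theta = rng.normal(size=n)                            # within-topology draws
# density ratios for a radius-r inflation around 0 (independent of c1, c2)
ratios = np.where(np.abs(theta) <= r,
                  np.exp(0.5 * theta ** 2),
                  np.exp(0.5 * (theta ** 2 - (np.abs(theta) - r) ** 2)))
k = 2.0 * r * (c1 + c2) / np.sqrt(2 * np.pi)          # total added mass k1 + k2
c_hat = idr_mixed(ratios, tau, k)
assert abs(c_hat / (c1 + c2) - 1.0) < 0.05
```

Since the per-topology terms are weighted by the observed topology frequencies T_t/n, the combined sum reduces to the overall mean of the density ratios, so no separate estimate of the weights w_t is needed.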

9 Discussion

In this paper, we investigate the possibility of using simple effective recipes for estimating the marginal likelihood of complex models such as phylogenetic models. In a Bayesian framework, several methods have been proposed to estimate the model evidence of competing models and then eventually evaluate the Bayes Factor. To our knowledge, the most widely used methods for estimating the marginal likelihood of phylogenetic models are thermodynamic integration and the harmonic mean approach. Thermodynamic integration has been shown to produce more reliable estimates of Bayes Factors for a large variety of phylogenetic models. Although this method has the advantage of general applicability, it can involve substantial computational costs and may require tuning and adjustments. Moreover, for certain model comparisons, a full thermodynamic integration may take weeks on a modern desktop computer, even under a fixed tree topology for small single-protein data sets. Indeed, simplicity of implementation combined with a relatively low computational burden are the two appealing features which explain why the HM is still currently one of the favorite options for routine implementation (see [47]). However, the simplicity of HM is often not matched by its accuracy, and recent literature has highlighted the unreliability of HM estimators in phylogenetic models [26] as well as in more general biological applications [15]. In this paper, we have provided evidence of the effectiveness of a simple alternative marginal likelihood estimator, the Inflated Density Ratio estimator (IDR), belonging to the class of generalized harmonic mean estimators. It shares the original simplicity and computational feasibility of the HM estimator but, unlike it, enjoys important theoretical properties, such as the finiteness of the variance.
Moreover, it allows one to recycle the posterior simulations, which is particularly appealing in contexts, such as phylogenetic models, where the computational burden of the simulation is heavier than the evaluation of the likelihood, posterior densities and the like. Like all importance sampling techniques based on a single stream of simulations, the computational burden can be shared in a parallel computing environment, reducing the computing time. Also the grid search for optimizing the estimated RMSE can be sped up with a parallel evaluation for each inflated density.


More information

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013 Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Monte Carlo in Bayesian Statistics

Monte Carlo in Bayesian Statistics Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information

Sequential Monte Carlo Methods

Sequential Monte Carlo Methods University of Pennsylvania Bradley Visitor Lectures October 23, 2017 Introduction Unfortunately, standard MCMC can be inaccurate, especially in medium and large-scale DSGE models: disentangling importance

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Non-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice

Non-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice Non-homogeneous Markov Mixture of Periodic Autoregressions for the Analysis of Air Pollution in the Lagoon of Venice Roberta Paroli 1, Silvia Pistollato, Maria Rosa, and Luigi Spezia 3 1 Istituto di Statistica

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem?

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem? Who was Bayes? Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 The Reverand Thomas Bayes was born in London in 1702. He was the

More information

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny 008 by The University of Chicago. All rights reserved.doi: 10.1086/588078 Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny (Am. Nat., vol. 17, no.

More information

Bayesian Phylogenetics

Bayesian Phylogenetics Bayesian Phylogenetics Bret Larget Departments of Botany and of Statistics University of Wisconsin Madison October 6, 2011 Bayesian Phylogenetics 1 / 27 Who was Bayes? The Reverand Thomas Bayes was born

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

Bayesian Networks in Educational Assessment

Bayesian Networks in Educational Assessment Bayesian Networks in Educational Assessment Estimating Parameters with MCMC Bayesian Inference: Expanding Our Context Roy Levy Arizona State University Roy.Levy@asu.edu 2017 Roy Levy MCMC 1 MCMC 2 Posterior

More information

Notes on pseudo-marginal methods, variational Bayes and ABC

Notes on pseudo-marginal methods, variational Bayes and ABC Notes on pseudo-marginal methods, variational Bayes and ABC Christian Andersson Naesseth October 3, 2016 The Pseudo-Marginal Framework Assume we are interested in sampling from the posterior distribution

More information

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics Bayesian phylogenetics the one true tree? the methods we ve learned so far try to get a single tree that best describes the data however, they admit that they don t search everywhere, and that it is difficult

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Bayesian inference: what it means and why we care

Bayesian inference: what it means and why we care Bayesian inference: what it means and why we care Robin J. Ryder Centre de Recherche en Mathématiques de la Décision Université Paris-Dauphine 6 November 2017 Mathematical Coffees Robin Ryder (Dauphine)

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration Bayes Factors, posterior predictives, short intro to RJMCMC Thermodynamic Integration Dave Campbell 2016 Bayesian Statistical Inference P(θ Y ) P(Y θ)π(θ) Once you have posterior samples you can compute

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties

More information

An Extended BIC for Model Selection

An Extended BIC for Model Selection An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,

More information

SC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

Inferring Speciation Times under an Episodic Molecular Clock

Inferring Speciation Times under an Episodic Molecular Clock Syst. Biol. 56(3):453 466, 2007 Copyright c Society of Systematic Biologists ISSN: 1063-5157 print / 1076-836X online DOI: 10.1080/10635150701420643 Inferring Speciation Times under an Episodic Molecular

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Penalized Loss functions for Bayesian Model Choice

Penalized Loss functions for Bayesian Model Choice Penalized Loss functions for Bayesian Model Choice Martyn International Agency for Research on Cancer Lyon, France 13 November 2009 The pure approach For a Bayesian purist, all uncertainty is represented

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA

More information

Phylogenetic Assumptions

Phylogenetic Assumptions Substitution Models and the Phylogenetic Assumptions Vivek Jayaswal Lars S. Jermiin COMMONWEALTH OF AUSTRALIA Copyright htregulation WARNING This material has been reproduced and communicated to you by

More information

Development of Stochastic Artificial Neural Networks for Hydrological Prediction

Development of Stochastic Artificial Neural Networks for Hydrological Prediction Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

A Review of Pseudo-Marginal Markov Chain Monte Carlo

A Review of Pseudo-Marginal Markov Chain Monte Carlo A Review of Pseudo-Marginal Markov Chain Monte Carlo Discussed by: Yizhe Zhang October 21, 2016 Outline 1 Overview 2 Paper review 3 experiment 4 conclusion Motivation & overview Notation: θ denotes the

More information

Model comparison. Christopher A. Sims Princeton University October 18, 2016

Model comparison. Christopher A. Sims Princeton University October 18, 2016 ECO 513 Fall 2008 Model comparison Christopher A. Sims Princeton University sims@princeton.edu October 18, 2016 c 2016 by Christopher A. Sims. This document may be reproduced for educational and research

More information

Stat 451 Lecture Notes Numerical Integration

Stat 451 Lecture Notes Numerical Integration Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29

More information

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method

Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Slice Sampling with Adaptive Multivariate Steps: The Shrinking-Rank Method Madeleine B. Thompson Radford M. Neal Abstract The shrinking rank method is a variation of slice sampling that is efficient at

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

VCMC: Variational Consensus Monte Carlo

VCMC: Variational Consensus Monte Carlo VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object

More information

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment

Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Bayesian System Identification based on Hierarchical Sparse Bayesian Learning and Gibbs Sampling with Application to Structural Damage Assessment Yong Huang a,b, James L. Beck b,* and Hui Li a a Key Lab

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information