Roberto Casarin, Fabrizio Leisen, David Luengo, and Luca Martino. Adaptive Sticky Generalized Metropolis


Roberto Casarin, Fabrizio Leisen, David Luengo, and Luca Martino. Adaptive Sticky Generalized Metropolis. Working Paper No. 19/WP/2013

Working Papers, Department of Economics, Ca' Foscari University of Venice. No. 19/WP/2013. ISSN

Title: Adaptive Sticky Generalized Metropolis

Luca Martino, Universidad Carlos III de Madrid; Fabrizio Leisen, University of Kent; Roberto Casarin, Università Ca' Foscari Venezia; David Luengo, Universidad Politecnica de Madrid

August 2013

Abstract: We introduce a new class of adaptive Metropolis algorithms, called adaptive sticky algorithms, for efficient general-purpose simulation from a target probability distribution. The transition of the Metropolis chain is based on a multiple-try scheme and the different proposals are generated by adaptive nonparametric distributions. Our adaptation strategy uses the interpolation of support points from the past history of the chain, as in the adaptive rejection Metropolis. The efficiency of the algorithm is strengthened by a step that controls the evolution of the set of support points. This extra stage reduces the computational cost and accelerates the convergence of the proposal distribution to the target. Although the algorithms are presented for univariate target distributions, we show that they can easily be extended to the multivariate context by a Gibbs sampling strategy. We show the ergodicity of the proposed algorithms and illustrate their efficiency and effectiveness through some simulated examples involving target distributions with complex structures.

Keywords: Adaptive Markov chain Monte Carlo, Adaptive rejection Metropolis, Multiple-try Metropolis, Metropolis within Gibbs.

JEL Codes: C1, C15, C11, C4, C63.

Address for correspondence: Roberto Casarin, Department of Economics, Ca' Foscari University of Venice, Cannaregio 873, Fondamenta S. Giobbe, 30121 Venezia, Italy. Phone: (++39), Fax: (++39), r.casarin@unive.it

This Working Paper is published under the auspices of the Department of Economics of the Ca' Foscari University of Venice.
Opinions expressed herein are those of the authors and not those of the Department. The Working Paper series is designed to divulge preliminary or incomplete work, circulated to favour discussion and comments. Citation of this paper should consider its provisional character. The Working Paper Series is available only online. For editorial correspondence, please contact: wp.dse@unive.it, Department of Economics, Ca' Foscari University of Venice, Cannaregio 873, Fondamenta San Giobbe, 30121 Venice, Italy. Fax:

Adaptive Sticky Generalized Metropolis

Luca Martino (Universidad Carlos III de Madrid), Roberto Casarin (University Ca' Foscari, Venice), Fabrizio Leisen (University of Kent), David Luengo (Universidad Politecnica de Madrid)

Abstract

We introduce a new class of adaptive Metropolis algorithms, called adaptive sticky algorithms, for efficient general-purpose simulation from a target probability distribution. The transition of the Metropolis chain is based on a multiple-try scheme and the different proposals are generated by adaptive nonparametric distributions. Our adaptation strategy uses the interpolation of support points from the past history of the chain, as in the adaptive rejection Metropolis. The efficiency of the algorithm is strengthened by a step that controls the evolution of the set of support points. This extra stage reduces the computational cost and accelerates the convergence of the proposal distribution to the target. Although the algorithms are presented for univariate target distributions, we show that they can easily be extended to the multivariate context by a Gibbs sampling strategy. We show the ergodicity of the proposed algorithms and illustrate their efficiency and effectiveness through some simulated examples involving target distributions with complex structures.

Keywords: Adaptive Markov chain Monte Carlo, Adaptive rejection Metropolis, Multiple-try Metropolis, Metropolis within Gibbs.

Corresponding author: Fabrizio Leisen, fabrizio.leisen@gmail.com. Other contacts: r.casarin@unive.it (Roberto Casarin); luca.martino@uc3m.es (Luca Martino).

1 Introduction

Markov Chain Monte Carlo (MCMC) methods (see Liu (2004); Liang et al. (2010); Robert and Casella (2004) and references therein) are now a very

important numerical tool in statistics and in many other fields, because they can generate samples from any target distribution available up to a normalizing constant. Standard MCMC techniques require the specification of a proposal distribution and produce a Markov chain that converges to the target distribution. The main issue in MCMC is the choice of the proposal distribution, which can heavily affect the mixing of the MCMC chain when the target distribution has a complex structure, e.g., multimodality and heavy tails. Thus, in the last decade and after the seminal paper of Haario et al. (2001), a remarkable stream of literature has focused on adaptive proposal distributions, which allow for self-tuning procedures of the MCMC algorithms, for flexible movements within the state space, and for reasonable acceptance probabilities of the adaptive MCMC chain. Adaptive MCMC algorithms are used in many statistical applications (e.g., see Roberts and Rosenthal (2009), Craiu et al. (2009), Giordani and Kohn (2010) and Richardson et al. (2011)) and different adaptive strategies have been proposed in the literature. One strategy consists in updating the proposal distribution according to the past values of the chain (e.g., see Haario et al. (2001) and Andrieu and Robert (2001)). Another strategy relies on the use of auxiliary chains, which are run in parallel (e.g., see Jasra et al. (2007a), Jasra et al. (2007b), Campillo et al. (2009), Atchade (2010), Casarin et al. (2013)) and interact with the principal chain. One of the most widely used classes of MCMC algorithms is the Metropolis-Hastings (MH) algorithm (see Metropolis et al. (1953), Hastings (1970)) and its generalizations. Among the different variants of the MH, in this paper we focus on the multiple-try Metropolis (MTM) (see Liu et al. (2000)), which has proven to be efficient in different applications (e.g., see Craiu and Lemieux (2007) and So (2006)).
While in the MH formulation one accepts or rejects a single proposed move, the MTM is designed so that the next state of the chain is selected among multiple proposals. The multiple-proposal setup can be used effectively to explore the sample space of the target distribution. The MTM has been further generalized with the use of antithetic and quasi-Monte Carlo sampling (Craiu and Lemieux (2007) and Bédard et al. (2010)), the extension to a trans-dimensional setup (Pandolfi et al. (2010)) and the use of general weighting functions in the selection step of the MTM (Martino and Read (2012) and Martino and Read (2013)). In this paper, we contribute to the adaptive MCMC literature by proposing a new class of adaptive generalized Metropolis algorithms, which we call adaptive sticky algorithms. More specifically, we propose a new class of adaptive MTM algorithms called adaptive sticky MTM (ASMTM), which has the adaptive sticky Metropolis (ASM) as a special

case. Adaptation strategies for MTM based on interacting chains have been proposed in Casarin et al. (2013). We follow here an alternative route and use the past history of a single MTM chain to adapt the proposal distribution over the chain iterations. The proposal distribution is nonparametric and the construction method relies upon alternative interpolation strategies. Our adaptation mechanism builds on and extends the mechanism in the adaptive rejection sampling (ARS) of Gilks and Wild (1992) and in the adaptive rejection Metropolis sampling (ARMS) of Gilks et al. (1995b) and its extensions (see Meyer et al. (2008), Cai et al. (2008), Hörmann (1995), Görür and Teh (2011), Martino and Míguez (2011) and Martino et al. (2012)). We note that the interpolation approach has also been used in Krzykowski and Mackowiak (2006) and Shao et al. (2013), but not in an adaptive MH framework. Our extension of the algorithms in the ARMS class is twofold. First, we use the more efficient multiple-proposal transition instead of the single-proposal transition kernel. Secondly, we apply a random test procedure for the inclusion of new points in the support set of the proposal distribution. We discuss different testing procedures for the inclusion of new support points. They represent more efficient generalizations of the accept/reject rule of the ARMS algorithm. Another contribution of the paper regards the convergence of the proposed adaptive algorithms. Adaptive MCMC algorithms, which use previous iterations or auxiliary variables in their future transitions, violate the Markov property which provides the justification for conventional MCMC. Thus, their validity, in terms of convergence to the desired target distribution, has to be demonstrated. We note that convergence of adaptive MCMC is reached under various conditions (Haario et al. (2001), Atchade and Rosenthal (2005), Andrieu and Moulines (2006), Roberts and Rosenthal (2007), Saksman and Vihola (2010), Latuszynski et al. (2013), and Holden et al.
(2009)). In this paper we follow the Holden et al. (2009) approach and show the ergodicity of the adaptive Metropolis chain under suitable conditions on the proposal distribution. Our interpolation approach guarantees that the adaptive proposal distributions satisfy such conditions. These results extend to the adaptive MTM algorithm the previous results on adaptive MH due to Holden et al. (2009). Finally, we discuss some practical issues, such as acceleration techniques for the reduction of the computational cost. A possible extension to the multivariate setup is also proposed following a Gibbs sampling updating rule. The resulting ASM-within-Gibbs sampling algorithm represents an effective simulation technique thanks to the efficiency of the nonparametric proposal distributions used in the ASM. We study the

efficiency and effectiveness of the proposed algorithms on different simulation experiments involving target distributions with multiple modes, heavy tails and skewness. The structure of the paper is as follows. Section 2 introduces the adaptive sticky Metropolis and discusses convergence issues. Section 3 presents different updating schemes for the proposal distributions. Section 4 discusses some practical issues for the implementation and Section 5 presents a multivariate extension of the sampling scheme. Section 6 contains algorithm comparisons using simulated data. Section 7 contains conclusions and suggestions for further research.

2 Adaptive Generalized Metropolis

2.1 Adaptive Sticky Metropolis

Let π(x) be a real target distribution known up to a normalizing constant. Fix an initial state x_0 of the chain x_t, t = 0, 1, 2, ..., and an initial set of support points S_0 = {s_1, ..., s_{m_0}}, with m_0 > 0. Assume that the current state of the chain is x_t; then the general update of the proposed Adaptive Sticky Metropolis (ASM) algorithm is described in Algorithm 1.

Algorithm 1. Adaptive Sticky Metropolis (ASM)

For t = 1, ..., T:

1. Construction of the proposal: Build a proposal q_t(x|S_{t-1}) via a suitable interpolation procedure using the set of support points S_{t-1}.

2. MH step:

2.1 Draw x' from q_t(x|S_{t-1}).

2.2 Set x_{t+1} = x' and z = x_t with probability
$$\alpha(x_t, x', \mathcal{S}_{t-1}) = \min\left[1, \frac{\pi(x')\, q_t(x_t|\mathcal{S}_{t-1})}{\pi(x_t)\, q_t(x'|\mathcal{S}_{t-1})}\right],$$
and set x_{t+1} = x_t and z = x' with probability 1 - α(x_t, x', S_{t-1}).

3. Test to update S_t: Let η : ℝ⁺ → [0, 1] be a strictly increasing continuous function such that η(0) = 0. Then, set
$$\mathcal{S}_t = \begin{cases} \mathcal{S}_{t-1} \cup \{z\} & \text{with prob. } \eta(d_t(z)),\\ \mathcal{S}_{t-1} & \text{with prob. } 1 - \eta(d_t(z)), \end{cases}$$
where d_t(z) is a positive measure (at iteration t) of the distance at z between the target and the proposal distributions.

The proposal distribution changes along the iterations (see step 1 of Algorithm 1) following an adaptation scheme which relies upon a suitable interpolation of a set of support points. In Section 3 we provide several interpolation methods based on a partition of the support of π(x). The insight behind this adaptation strategy is to build a proposal that is closer and closer to the target as the number of iterations increases. The proposals generated in this way are then used in a standard accept/reject Metropolis-Hastings (MH) step (see step 2 of the algorithm); hence the resulting algorithm is in the class of adaptive MH. Another important feature of the proposed adaptation strategy is given by the test for updating the set of support points (see step 3). This step includes, with probability η, the rejected proposal from the MH step in the set of support points by applying an accept/reject rule. The rationale behind

this test is to use information from the target distribution in order to include in the set only the points where the proposal is far from the target. More specifically, we set the acceptance probability η as a function of a distance d_t(z). This allows us to design a strategy that incorporates the point z only if the distance at z between the proposal distribution and the target is large. Moreover, a suitable construction of the proposal leads to a probability of adding a new point that converges to zero. This implies that both the total number of points in the support set and the computational cost of building the proposals are kept bounded along the iterations, provided that η(0) = 0. Different choices of η, which ensure quick convergence of the proposal to the target, are presented in Section 4.1. Finally, it should be noted that Algorithm 1 is a special case of the adaptive sticky MTM presented in the next section (see Algorithm 2); the proof of the validity of the algorithm follows closely the proof given in the next section for the adaptive sticky MTM and, therefore, it is not given here.

2.2 Adaptive Sticky Multiple Try Metropolis

In the ASM one accepts or rejects a single proposed value. We extend the ASM by allowing for multiple proposals in order to further improve the ability of the Metropolis chain to explore the state space. We focus on the multiple-try Metropolis (MTM) (see Liu et al. (2000) and Craiu and Lemieux (2007)) and propose an Adaptive Sticky MTM (ASMTM). The ASMTM can also be seen as a generalization of the MTM which allows for adaptive proposal distributions. Note that our adaptation strategy can be combined with MTM algorithms with different proposal distributions and with interacting MTM algorithms (see Casarin et al. (2013)) to design new adaptive algorithms. Our adaptation can also be used within multi-point algorithms (e.g., Martino and Read (2012); Pandolfi et al. (2010)).
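Before turning to the multiple-try version, the single-proposal update of Algorithm 1 (Section 2.1) can be sketched in a few lines of code. The sketch below is our own illustration, not the authors' implementation: it restricts the target to a bounded interval (so the tail pieces discussed in Section 3 can be omitted), builds a crude piecewise-constant proposal from the support points, and makes the illustrative choices d_t(z) = |π(z) - q_t(z)| and η(d) = 1 - exp(-d); all function names are ours.

```python
import numpy as np

def build_proposal(support, log_target):
    """Piecewise-constant proposal on [support[0], support[-1]]: the height on
    (s_i, s_{i+1}] is max{pi(s_i), pi(s_{i+1})} (an Eq.-(8)-style construction;
    tail pieces are omitted because the domain is bounded)."""
    s = np.asarray(support)
    p = np.exp(log_target(s))                 # requires a vectorized log-target
    heights = np.maximum(p[:-1], p[1:])
    area = heights * np.diff(s)
    Z = area.sum()
    return s, heights / Z, area / Z           # edges, normalized density, piece probs

def q_eval(x, s, dens):
    k = np.clip(np.searchsorted(s, x) - 1, 0, len(dens) - 1)
    return dens[k]

def asm(log_target, support, x0, n_iter, rng):
    x, chain = x0, []
    support = sorted(support)
    for _ in range(n_iter):
        # Step 1: build the adaptive proposal from the current support set.
        s, dens, probs = build_proposal(support, log_target)
        # Step 2: independent MH step.
        k = rng.choice(len(probs), p=probs)
        xp = rng.uniform(s[k], s[k + 1])
        alpha = min(1.0, np.exp(log_target(xp) - log_target(x))
                         * q_eval(x, s, dens) / q_eval(xp, s, dens))
        if rng.uniform() < alpha:
            x, z = xp, x                      # accept: z is the previous state
        else:
            z = xp                            # reject: z is the discarded proposal
        # Step 3: add z with probability eta(d_t(z)), eta(d) = 1 - exp(-d)
        # (an assumed, illustrative choice of eta and d_t).
        d = abs(np.exp(log_target(z)) - q_eval(z, s, dens))
        if rng.uniform() < 1.0 - np.exp(-d):
            support = sorted(support + [z])
    # chain collection
        chain.append(x)
    return np.array(chain), support
```

With a vectorized log-target, e.g. the standard normal restricted to [-4, 4], the chain samples from the target while new support points accumulate only where the proposal is still far from π.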
At iteration t, the ASMTM builds the proposal distribution q_t(x|S_{t-1}) (step 1 of Algorithm 2) using the current set of support points S_{t-1}. Let x_t = x be the current value of the chain and x'_j, j = 1, ..., M, a set of i.i.d. proposals simulated from q_t(x|S_{t-1}) (see step 2). Define the unnormalized selection weights
$$w_{j,t}(x, x'_j) = \pi(x)\, q_t(x'_j|\mathcal{S}_{t-1})\, \lambda_t(x, x'_j|\mathcal{S}_{t-1}),$$
where λ_t(x, x'|S_{t-1}) is a non-negative function, symmetric in x and x'. It is worth noticing that in the adaptive MTM not only the proposal distribution changes over the iterations, but also the function λ_t may adapt following the update in the set of support points.

Algorithm 2. Adaptive Sticky Multiple Try (ASMTM)

For t = 1, ..., T:

1. Construction of the proposal: Build a proposal q_t(x|S_{t-1}) via a suitable interpolation procedure using the set of support points S_{t-1}. In Section 3 we provide several procedures that are based on a partition of the support of π(x).

2. MTM step:

2.1 Draw x'_1, ..., x'_M from q_t(x|S_{t-1}) and compute the unnormalized weights w_t(x'_i) = π(x'_i)/q_t(x'_i|S_{t-1}).

2.2 Select x' = x'_j among the M proposals with probability proportional to w_t(x'_i), i = 1, ..., M.

2.3 Set the auxiliary points x*_i = x'_i, for i ≠ j, and x*_j = x_t.

2.4 Set x_{t+1} = x' and z_i = x*_i with probability
$$\alpha(x_t, x', \mathbf{x}^*_{-j}, \mathcal{S}_{t-1}) = \min\left[1, \frac{w_t(x'_1) + \cdots + w_t(x'_M)}{w_t(x^*_1) + \cdots + w_t(x^*_M)}\right],$$
and set x_{t+1} = x_t and z_i = x'_i with probability 1 - α(x_t, x', x*_{-j}, S_{t-1}).

3. Test to update S_t: Let η_i : ℝ⁺ → [0, 1], i = 1, ..., M, be strictly increasing continuous functions such that η_i(0) = 0 for all i and Σ_{i=1}^M η_i ≤ 1. Then, set
$$\mathcal{S}_t = \begin{cases} \mathcal{S}_{t-1} \cup \{z_i\} & \text{with prob. } \eta_i(d_t(z_i)),\ i = 1, \ldots, M,\\ \mathcal{S}_{t-1} & \text{with prob. } 1 - \sum_{i=1}^M \eta_i(d_t(z_i)), \end{cases}$$
where d_t(z) is a positive measure (at iteration t) of the distance at z between the target and the proposal distributions.

Liu et al. (2000) discussed various possible specifications of the function λ_t and found in their experiments that the efficiency gain when using MTM is generally not sensitive to the choice of this function. However, in some of the experiments of Liu et al. (2000), and in nearly all the simulation experiments of Casarin et al. (2013), the choice λ_t(x, x'|S_{t-1}) = 1/(q_t(x|S_{t-1}) q_t(x'|S_{t-1})) leads to better performance of the MTM algorithms. Thus, in this work we

consider this choice of λ_t and focus on w_{j,t}(x, x') = w_t(x), for all j, where the w_t(x) are unnormalized importance weights,
$$w_t(x) = \frac{\pi(x)}{q_t(x|\mathcal{S}_{t-1})}.$$
The importance weights are used at step 2 of the ASMTM to select one of the proposals. The selected candidate is accepted or rejected with the generalized acceptance probability given at step 2. Finally, step 3 includes the selected proposal in the set of support points with probability η. This updating step can be extended to allow for more than one proposal to be included into the set of support points. This strategy recycles the proposals and possibly improves the adaptation of the proposal distributions. For the sake of simplicity, in the presentation of the ASMTM algorithm we consider the case where only one proposal is added, at each iteration, to S_{t-1}. We show the convergence of the ASMTM algorithm by extending to the MTM the results in Holden et al. (2009), who show the convergence of an independent MH scheme with adaptive proposal, avoiding the requirement of diminishing adaptation. The difference between the adaptive independent MH algorithm of Holden et al. (2009) and a standard independent MH algorithm is that the proposal distribution q_t(x|S_{t-1}) depends on the set of support points S_{t-1}, which can include part of the past history of the MH algorithm except for the current state of the MH chain (see Liang et al. (2010)). The main difference between our adaptive independent MTM algorithm and the adaptive independent MH algorithm of Holden et al. (2009) is that at each iteration multiple proposals can be used in the Metropolis transition. The following theorem implies that the ASMTM chain never leaves the stationary distribution π(x) once it is reached. Theorem 1.
The target distribution π(x) is invariant for the adaptive independent MTM algorithm; that is, p_t(x_t|S_{t-1}) = π(x_t) implies p_{t+1}(x_{t+1}|S_t) = π(x_{t+1}), where p_t(·|S_{t-1}) denotes the distribution of x_t conditional on the past samples.

Proof. Let ρ be the state appended to the history S_{t-1}. Without loss of generality, suppose that η_j = 1, where j is the index sampled at the selection step, so that S_t = {ρ} ∪ S_{t-1} with ρ = z_j. Moreover, let f_t(S_t) be the joint distribution of the history S_t and let q_{t,-j}(x'_{-j}|S_{t-1}) = Π_{i≠j} q_t(x'_i|S_{t-1}), where x'_{-j} = (x'_1, ..., x'_{j-1}, x'_{j+1}, ..., x'_M). Following Liu et al. (2000),

Theorem 1, and Casarin et al. (2013), Theorem 1, the actual transition probability of the MTM step in our ASMTM writes (after integrating out the selection index) as follows:
$$A(\rho, x_{t+1}) = \sum_{j=1}^{M} \int_{\mathcal{X}^{M-1}} q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, \alpha(\rho, x_{t+1}, \mathbf{x}'_{-j}, \mathcal{S}_{t-1})\, \frac{w_t(x_{t+1})\, q_t(x_{t+1}|\mathcal{S}_{t-1})}{\sum_{k \neq j} w_t(x'_k) + w_t(x_{t+1})}\, d\mathbf{x}'_{-j}.$$
We show that the chain with this transition probability never leaves the stationary distribution once it is reached:
$$\begin{aligned}
p_{t+1}(x_{t+1}|\mathcal{S}_t) f_t(\mathcal{S}_t) &= f_{t-1}(\mathcal{S}_{t-1}) \Bigg\{ \pi(\rho)\, q_t(x_{t+1}|\mathcal{S}_{t-1}) \sum_{j=1}^{M} \int_{\mathcal{X}^{M-1}} \frac{w_t(x_{t+1})}{\sum_{i \neq j} w_t(x'_i) + w_t(x_{t+1})}\, \alpha(\rho, x_{t+1}, \mathbf{x}'_{-j}, \mathcal{S}_{t-1})\, q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, d\mathbf{x}'_{-j} \\
&\quad + \pi(x_{t+1})\, q_t(\rho|\mathcal{S}_{t-1}) \sum_{j=1}^{M} \int_{\mathcal{X}^{M-1}} \frac{w_t(\rho)}{\sum_{i \neq j} w_t(x'_i) + w_t(\rho)}\, \big[1 - \alpha(x_{t+1}, \rho, \mathbf{x}'_{-j}, \mathcal{S}_{t-1})\big]\, q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, d\mathbf{x}'_{-j} \Bigg\} \\
&= f_{t-1}(\mathcal{S}_{t-1})\, \pi(x_{t+1})\, q_t(\rho|\mathcal{S}_{t-1}) \sum_{j=1}^{M} \int_{\mathcal{X}^{M-1}} \frac{w_t(\rho)}{\sum_{i \neq j} w_t(x'_i) + w_t(\rho)}\, q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, d\mathbf{x}'_{-j} \\
&= f_{t-1}(\mathcal{S}_{t-1})\, \pi(x_{t+1})\, q_t(\rho|\mathcal{S}_{t-1})\, g_t(\rho|\mathcal{S}_{t-1}),
\end{aligned}$$
where the second equality follows from the identity π(ρ) q_t(x_{t+1}|S_{t-1}) w_t(x_{t+1}) = π(x_{t+1}) q_t(ρ|S_{t-1}) w_t(ρ) and the form of α, and where
$$g_t(\rho|\mathcal{S}_{t-1}) = \sum_{j=1}^{M} \int_{\mathcal{X}^{M-1}} \frac{w_t(\rho)}{\sum_{i \neq j} w_t(x'_i) + w_t(\rho)}\, q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, d\mathbf{x}'_{-j},$$
and this concludes the proof.
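The MTM transition analyzed in the proof (steps 2.1-2.4 of Algorithm 2, with the importance-weight choice w_t(x) = π(x)/q_t(x|S_{t-1})) can be sketched in code. This is our own minimal illustration: the proposal is held fixed for clarity, whereas in the ASMTM it is rebuilt from S_{t-1} at every iteration, and unnormalized log-densities suffice because constant factors cancel in both the selection and the acceptance ratio.

```python
import numpy as np

def mtm_step(x, log_target, draw_q, log_q, M, rng):
    """One MTM transition with an independent proposal q and importance
    weights w(x) = pi(x)/q(x), i.e. the choice lambda = 1/(q*q)."""
    props = np.array([draw_q(rng) for _ in range(M)])            # step 2.1
    logw = np.array([log_target(v) - log_q(v) for v in props])
    shift = logw.max()                                           # for stability
    w = np.exp(logw - shift)
    j = rng.choice(M, p=w / w.sum())                             # step 2.2
    aux = props.copy()
    aux[j] = x                                                   # step 2.3
    logw_aux = np.array([log_target(v) - log_q(v) for v in aux])
    c = max(shift, logw_aux.max())
    alpha = min(1.0, np.exp(logw - c).sum() / np.exp(logw_aux - c).sum())  # 2.4
    if rng.uniform() < alpha:
        return props[j]
    return x
```

For instance, iterating mtm_step with a fixed zero-mean Gaussian proposal that is wider than the target reproduces a standard independent MTM chain.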

Let us assume that the proposal distribution q_t(x|S_{t-1}) satisfies the strong Doeblin condition
$$q_t(x|\mathcal{S}_{t-1}) \geq a_t(\mathcal{S}_{t-1})\, \pi(x) \qquad (1)$$
for all x ∈ X and S_{t-1} ∈ X^{t-1}, where X denotes the state space and a_t(S_{t-1}) ∈ (0, 1]. This condition is satisfied by the proposal distributions discussed in the next sections.

Theorem 2. Assume the proposal q_t(x|S_{t-1}) in the ASMTM algorithm satisfies condition (1) for all t. Then
$$\|p_t - \pi\|_{TV} \leq 2 \int_{\mathcal{X}^t} \prod_{j=1}^{t} \big(1 - a_j(\mathcal{S}_{j-1})\big)\, d\mu(\mathcal{S}_{t-1}). \qquad (2)$$
The algorithm converges if the product goes to zero as t → ∞.

Proof. Let x_t be the current value of the chain at iteration t and x' the j-th proposal, which is accepted if u_t < α(x_t, x', x*_{-j}, S_{t-1}), where u_t is a uniform random number on the [0, 1] interval. The acceptance probability satisfies
$$\alpha(x_t, x', \mathbf{x}^*_{-j}, \mathcal{S}_{t-1}) = \min\left\{1, \frac{\sum_{k \neq j} \frac{\pi(x'_k)}{q_t(x'_k|\mathcal{S}_{t-1})} + \frac{\pi(x'_j)}{q_t(x'_j|\mathcal{S}_{t-1})}}{\sum_{k \neq j} \frac{\pi(x'_k)}{q_t(x'_k|\mathcal{S}_{t-1})} + \frac{\pi(x_t)}{q_t(x_t|\mathcal{S}_{t-1})}}\right\} \geq \min\left\{1, \tilde{a}_t(\mathcal{S}_{t-1}, x')\, \frac{\pi(x')}{q_t(x'|\mathcal{S}_{t-1})}\right\},$$
where
$$\tilde{a}_t(\mathcal{S}_{t-1}, x') = \frac{a_t(\mathcal{S}_{t-1})}{M} \left( \sum_{k \neq j} \frac{\pi(x'_k)}{q_t(x'_k|\mathcal{S}_{t-1})} + \frac{\pi(x'_j)}{q_t(x'_j|\mathcal{S}_{t-1})} \right) \frac{q_t(x'_j|\mathcal{S}_{t-1})}{\pi(x'_j)},$$
since, by condition (1), each ratio π(x)/q_t(x|S_{t-1}) is bounded above by 1/a_t(S_{t-1}) and the denominator therefore does not exceed M/a_t(S_{t-1}). Let A_t be the event that u_t q_t(x'|S_{t-1})/π(x') ≤ ã_t(S_{t-1}, x'). Then

the conditional distribution of x', given S_{t-1}, x_t and A_t, is proportional to
$$\begin{aligned}
&\sum_{j=1}^{M} \int_{\mathcal{X}^{M-1}} \frac{w_t(x')}{\sum_{i \neq j} w_t(x'_i) + w_t(x')}\, P(A_t|\mathcal{S}_{t-1}, x', \mathbf{x}'_{-j}, x_t)\, q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, q_t(x'|\mathcal{S}_{t-1})\, d\mathbf{x}'_{-j} \\
&= \sum_{j=1}^{M} \int_{\mathcal{X}^{M-1}} \frac{w_t(x')}{\sum_{i \neq j} w_t(x'_i) + w_t(x')}\, \tilde{a}_t(\mathcal{S}_{t-1}, x')\, \frac{\pi(x')}{q_t(x'|\mathcal{S}_{t-1})}\, q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, q_t(x'|\mathcal{S}_{t-1})\, d\mathbf{x}'_{-j} \\
&= M\, \frac{a_t(\mathcal{S}_{t-1})}{M} \int_{\mathcal{X}^{M-1}} \pi(x')\, q_{t,-j}(\mathbf{x}'_{-j}|\mathcal{S}_{t-1})\, d\mathbf{x}'_{-j} = a_t(\mathcal{S}_{t-1})\, \pi(x').
\end{aligned}$$
Following Holden et al. (2009), we define
$$I_{t+1} = \begin{cases} 0 & \text{with probability } 1 - a_{t+1}(\mathcal{S}_t),\ \text{if } I_t = 0,\\ 1 & \text{otherwise}, \end{cases}$$
for t ≥ 1, with I_0 = 0; the probability of not being in the stationary regime after t steps is P(I_t = 0|S_{t-1}) = b_t(S_{t-1}), where
$$b_t(\mathcal{S}_{t-1}) = \prod_{j=1}^{t} \big(1 - a_j(\mathcal{S}_{j-1})\big).$$
Then the conditional distribution of x_t can be written as
$$p_t(x|\mathcal{S}_t) = \pi(x)\big(1 - b_t(\mathcal{S}_{t-1})\big) + v_t(x|\mathcal{S}_t)\, b_t(\mathcal{S}_{t-1}),$$
where v_t is a probability distribution. Then the distance between the limiting distribution and the conditional distribution of x_t can be bounded as follows:
$$\begin{aligned}
\|p_t - \pi\|_{TV} &= \int_{\mathcal{X}} \left| \int_{\mathcal{X}^t} p_t(x|\mathcal{S}_t)\, p_t(\mathcal{S}_t)\, d\mu(\mathcal{S}_t) - \pi(x) \right| d\mu(x) \\
&= \int_{\mathcal{X}} \left| \int_{\mathcal{X}^t} \big(v_t(x|\mathcal{S}_t) - \pi(x)\big)\, b_t(\mathcal{S}_{t-1})\, p_t(\mathcal{S}_t)\, d\mu(\mathcal{S}_t) \right| d\mu(x) \qquad (3) \\
&\leq \int_{\mathcal{X}^t} \int_{\mathcal{X}} \big|v_t(x|\mathcal{S}_t) - \pi(x)\big|\, d\mu(x)\, b_t(\mathcal{S}_{t-1})\, p_t(\mathcal{S}_t)\, d\mu(\mathcal{S}_t) \\
&\leq 2 \int_{\mathcal{X}^t} b_t(\mathcal{S}_{t-1})\, p_t(\mathcal{S}_t)\, d\mu(\mathcal{S}_t).
\end{aligned}$$
Thanks to this bound, the probability of having reached the stationary regime within t steps can be made arbitrarily close to one.
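To make condition (1) and bound (2) concrete, here is a small numeric sketch (our own illustration, not from the paper). For a standard normal target and a fixed zero-mean normal proposal with standard deviation 2, the ratio q/π equals 0.5 exp(3x²/8) and attains its minimum a = 1/2 at the origin; with a constant (non-adaptive) proposal, a_j = a for all j and the bound decays geometrically.

```python
import numpy as np

# Grid evaluation of the target pi = N(0,1) and the proposal q = N(0,4).
x = np.linspace(-8.0, 8.0, 2001)
pi_x = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
q_x = np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

# Largest a with q >= a * pi on the grid: the minimum of q/pi, here 1/2 at x = 0.
a = (q_x / pi_x).min()

# Bound (2) with constant a_j = a: ||p_t - pi||_TV <= 2 (1 - a)^t.
t = np.arange(1, 21)
bound = 2.0 * (1.0 - a) ** t
```

The bound certifies geometric convergence in total variation for any proposal satisfying the strong Doeblin condition with a uniform lower bound on a_t.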

3 Construction of sticky proposal functions

There are many alternatives available for the construction of a suitable proposal distribution in the ASM and ASMTM algorithms. In this section we focus on some procedures that approximate the target distribution by interpolating points that belong to the graph of the (unnormalized) target. The points are identified by evaluating the target at the support points, and the set of support points changes over the algorithm iterations. The name sticky, which we choose for this algorithm, highlights the ability of the adaptation schemes to generate a sequence of proposal distributions which converges to the target, allowing for a full adaptation of the proposal distribution. The adaptation relies upon interpolation schemes which are easy to improve by adding new points to the support set and easy to sample from. A general approach to interpolation is based on piecewise linear functions. We note that the resulting proposal density can be represented as a mixture of probability density functions, so that to draw from it one needs to compute the mixture weights, to sample from a discrete distribution in order to choose one of the mixture components, and finally to be able to draw samples from the selected component. The use of mixture distributions as proposals is common to many adaptive algorithms. In the class of importance sampling methods, the validity of the algorithms with adaptive proposals can easily be shown by an importance sampling argument. See, for example, the Population Monte Carlo (Cappé et al. (2004)), the iterative importance sampling (Cappé et al. (2008)) and the adaptive importance sampling (Hoogerheide et al. (2012)) algorithms. Adaptive proposal mixtures are less frequently used in Metropolis algorithms, mainly due to the difficulties in showing the validity of the algorithms. One of the recent papers in this direction is Holden et al.
(2009), who propose a Metropolis algorithm with proposals from adaptive mixture distributions and show the geometric ergodicity of the adaptive Metropolis chain. In this paper, we contribute to this stream of literature by proposing new adaptation schemes and extending the results of Holden et al. (2009). We will present three different adaptation strategies for the proposal distributions. Let us assume that a set S_t = {s_1, ..., s_{m_t}} of m_t support points is available at iteration t + 1 of a Metropolis algorithm. Define a sequence of m_t + 1 intervals: I_0 = (-∞, s_1], I_j = (s_j, s_{j+1}] for j = 1, ..., m_t - 1, and I_{m_t} = (s_{m_t}, +∞). In the first type of adaptation scheme, the proposal distribution is a mixture of m_t + 1 densities with disjoint supports I_j, j = 0, ..., m_t.
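Since the piecewise-linear log-domain constructions presented in the following sections yield proposals made of exponential pieces, i.e. densities proportional to exp(a_j x + b_j) on disjoint intervals, the three-stage sampling procedure just described (compute mixture weights, select a component, draw within it) can be sketched as follows. The helper names are ours and the inverse-CDF formula is standard.

```python
import numpy as np

def sample_exp_piece(rng, l, u, a):
    """Inverse-CDF draw from a density proportional to exp(a*x) on (l, u]."""
    if abs(a) < 1e-12:
        return rng.uniform(l, u)          # flat piece: plain uniform
    v = rng.uniform()
    # CDF(x) = (exp(a(x-l)) - 1) / (exp(a(u-l)) - 1); inverted with log1p/expm1
    # for numerical stability when a*(u-l) is small.
    return l + np.log1p(v * np.expm1(a * (u - l))) / a

def sample_piecewise_exp(rng, edges, slopes, icepts):
    """Mixture whose j-th component is proportional to exp(slopes[j]*x + icepts[j])
    on (edges[j], edges[j+1]]: pick a piece with probability proportional to its
    area, then sample inside the selected piece."""
    areas = np.empty(len(slopes))
    for j, (l, u, a, b) in enumerate(zip(edges[:-1], edges[1:], slopes, icepts)):
        areas[j] = np.exp(b) * ((np.exp(a * u) - np.exp(a * l)) / a
                                if abs(a) > 1e-12 else (u - l))
    k = rng.choice(len(areas), p=areas / areas.sum())
    return sample_exp_piece(rng, edges[k], edges[k + 1], slopes[k])
```

The same two-stage pattern (discrete component choice, then within-component draw) applies verbatim to the uniform and trapezoidal pieces discussed below; only the within-piece sampler changes.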

The addition of a new support point, say s', can change the shape of the densities associated with the different intervals. For instance, if s' ∈ I_k, then the algorithm will update the mixture components associated with I_k, I_{k-1} and I_{k+1}. This feature of the adaptation scheme has, as a special case, the construction in Gilks et al. (1995b). The proposal distribution, in the second type of adaptation scheme, is a mixture of densities with disjoint supports, like the one used in the first method, but the addition of a new support point, say s', can change only one component of the mixture. For instance, if s' ∈ I_k, then the k-th density of the mixture will be improved. This proposal updating scheme is a simpler alternative to Gilks et al. (1995b). Finally, we consider proposal adaptation schemes based on mixtures of densities with overlapping supports. This kind of adaptation strategy uses the points in S_t in a quite general approach to the construction of sticky proposal distributions. For instance, Shao et al. (2013) recently proposed B-spline techniques to build proposal distributions for accept/reject algorithms. In an adaptive importance sampling framework, Cappé et al. (2008) and Hoogerheide et al. (2012) use adaptive mixtures of Student-t distributions with overlapping supports. In the following sections, we discuss the three adaptation schemes and illustrate how our sticky proposal construction applies within these schemes. Moreover, we briefly discuss the construction of the tails of the mixture distribution and the different procedures to handle unbounded target distributions.

3.1 Disjoint supports and proposal changes in different intervals

The first adaptation strategy relies upon interpolation of points on the graph of the target. For the sake of simplicity, we describe the interpolation procedure representing the target and proposal densities in the log-domain.
Hence, let us define the log-density functions
$$W_{t+1}(x) \triangleq \log[q_{t+1}(x|\mathcal{S}_t)], \qquad V(x) \triangleq \log[\pi(x)], \qquad (4)$$
where q_{t+1}(x|S_t) is the proposal at iteration t + 1 of Algorithms 1 and 2, and π is the target distribution. Let us denote by L_{j,j+1}(x) the straight line passing through the points (s_j, V(s_j)) and (s_{j+1}, V(s_{j+1})), for j = 1, ..., m_t - 1, where s_j ∈ S_t. Also, set L_{-1,0}(x) = L_{0,1}(x) ≜ L_{1,2}(x),

L_{m_t,m_t+1}(x) = L_{m_t+1,m_t+2}(x) ≜ L_{m_t-1,m_t}(x). In Gilks et al. (1995b), W_{t+1}(x) is a piecewise linear function,
$$W_{t+1}(x) = \max\big[ L_{j,j+1}(x),\ \min[L_{j-1,j}(x),\ L_{j+1,j+2}(x)] \big], \qquad (5)$$
for x ∈ I_j, where I_j = (s_j, s_{j+1}], j = 1, ..., m_t - 1, I_0 = (-∞, s_1] and I_{m_t} = (s_{m_t}, +∞). The function W_{t+1}(x) can be rewritten as follows:
$$W_{t+1}(x) = \begin{cases} L_{1,2}(x), & x \in I_0;\\ \max\{L_{1,2}(x),\ L_{2,3}(x)\}, & x \in I_1;\\ \max\{L_{j,j+1}(x),\ \min\{L_{j-1,j}(x),\ L_{j+1,j+2}(x)\}\}, & x \in I_j,\ 2 \leq j \leq m_t - 2;\\ \max\{L_{m_t-1,m_t}(x),\ L_{m_t-2,m_t-1}(x)\}, & x \in I_{m_t-1};\\ L_{m_t-1,m_t}(x), & x \in I_{m_t}. \end{cases} \qquad (6)$$
Eqs. (5) and (6) show that the construction of the log-density in an interval I_j also depends on the points s_{j-1} and s_{j+2}. Therefore, the addition of a point in an interval can change the construction in the adjacent regions. For instance, let us assume S_t = {s_1, s_2, s_3, s_4, s_5}. Fig. 1(a) illustrates the construction using the points in the set S_t. Fig. 1(b) shows how the construction changes when a new point is added between the points s_1 and s_2 of the set S_t used in Fig. 1(a). As illustrated in Fig. 1(b), this construction requires modifying the lines for the intervals I_0 = (-∞, s_1] and I_1 = (s_1, s_2] of Fig. 1(a) and computing the intersection point between two straight lines (see interval I_2 = (s_2, s_3] of Fig. 1(b)) in order to be able to draw adequately from the corresponding proposal distribution. Note that a similar procedure, using pieces of quadratic functions in the log-domain (namely, pieces of truncated Gaussian densities in the pdf domain), has also been proposed in Meyer et al. (2008).

3.2 Disjoint supports and proposal changes in one interval

Gilks et al. (1995b) introduced, for the ARMS algorithm, the procedure to build q_{t+1}(x|S_t) described in the previous section. The computational complexity of the procedure arises from the need to construct a proposal function above the target in as many regions as possible, in order to take advantage of the rejection sampling step.
We note that a simpler approach to building the proposal is to define W_{t+1}(x) inside the i-th interval as the straight line passing through (s_i, V(s_i)) and (s_{i+1}, V(s_{i+1})), i.e., L_{i,i+1}(x), for 1 ≤ i ≤ m_t - 1, and extending the straight lines corresponding to I_1 and

Figure 1: Examples of the piecewise linear function W_{t+1}(x) built using the procedure described in Gilks et al. (1995b) for the set S_t = {s_1, ..., s_5} of support points (graph (a)) and for the set of support points {s_1, ..., s_6} (graph (b)), obtained by adding a new point between the two points s_1 and s_2 in S_t.

I_{m_t-1}. Formally, this can be expressed as
$$W_{t+1}(x) = \begin{cases} L_{1,2}(x), & x \in I_0 = (-\infty, s_1];\\ L_{i,i+1}(x), & x \in I_i = (s_i, s_{i+1}],\ 1 \leq i \leq m_t - 1;\\ L_{m_t-1,m_t}(x), & x \in I_{m_t} = (s_{m_t}, +\infty). \end{cases} \qquad (7)$$
This construction is illustrated in Fig. 2(a). Although this procedure looks similar to the one used in ARMS by Gilks et al. (1995b), it is in fact much simpler, since no minimization or maximization is involved, and thus it does not require the calculation of intersection points to determine when one straight line is above the other. Observe that the proposal q_{t+1}(x|S_t) = exp{W_{t+1}(x)}, with such a definition, is formed by exponential pieces (in the pdf-domain). Moreover, an even simpler procedure to construct W_{t+1}(x) can be devised using a piecewise constant approximation with two straight lines inside the first and last intervals. Mathematically, it can be expressed as
$$W_{t+1}(x) = \begin{cases} L_{1,2}(x), & x \in I_0 = (-\infty, s_1];\\ \max\{V(s_i),\ V(s_{i+1})\}, & x \in I_i = (s_i, s_{i+1}],\ 1 \leq i \leq m_t - 1;\\ L_{m_t-1,m_t}(x), & x \in I_{m_t} = (s_{m_t}, +\infty). \end{cases} \qquad (8)$$
The construction described above leads to the simplest proposal density, i.e.,

Figure 2: Examples of the construction of $W_{t+1}(x)$ using the procedures described in Eq. (7) (graph (a)) and in Eq. (8) (graph (b)).

a collection of uniform pdfs with two exponential tails. Fig. 2(b) shows an example of the construction of the proposal using this approach. Note that we can also apply the procedure proposed for adaptive trapezoid Metropolis sampling (ATRAMS, Cai et al. (2008)) to build the proposal distribution. However, the structure of the ATRAMS algorithm (Cai et al., 2008) is completely different from that of the ASM and ARMS-type techniques. In both cases the proposal is constructed in the domain of the target pdf, $\pi(x)$, rather than in the domain of the log-pdf, $V(x) = \log(\pi(x))$. For instance, the basic idea proposed for ATRAMS is using straight lines, $\tilde{L}_{i,i+1}(x)$, passing through the points $(s_i, \pi(s_i))$ and $(s_{i+1}, \pi(s_{i+1}))$ for $i = 1, \ldots, m_t - 1$, and two exponential pieces, $E_0(x)$ and $E_{m_t}(x)$, for the tails:

$$q_t(x|S_t) \propto
\begin{cases}
E_0(x), & x \in I_0 = (-\infty, s_1]; \\
\tilde{L}_{i,i+1}(x), & x \in I_i = (s_i, s_{i+1}], \ i = 1, \ldots, m_t - 1; \\
E_{m_t}(x), & x \in I_{m_t} = (s_{m_t}, +\infty).
\end{cases} \quad (9)$$

Unlike in Cai et al. (2008), here the tails $E_0(x)$ and $E_{m_t}(x)$ do not necessarily have to be equivalent in the areas they enclose. Note that $\tilde{L}$ denotes a straight line built directly in the domain of $\pi(x)$, whereas $L$ denotes the linear function constructed in the log-domain. Indeed, we may follow a much simpler approach, calculating two secant lines $L_{1,2}(x)$ and $L_{m_t-1,m_t}(x)$ passing through $(s_1, V(s_1))$, $(s_2, V(s_2))$ and through $(s_{m_t-1}, V(s_{m_t-1}))$, $(s_{m_t}, V(s_{m_t}))$, respectively, so that the two exponential tails are defined as $E_0(x) = \exp\{L_{1,2}(x)\}$ and $E_{m_t}(x) = \exp\{L_{m_t-1,m_t}(x)\}$. Fig. 3 depicts an example of the construction of $q_t(x)$ using this last procedure. Note that drawing samples from these trapezoidal pdfs inside

Figure 3: Example of the construction of the proposal density, $q_{t+1}(x|S_t)$, using a procedure described in Cai et al. (2008) within the ATRAMS algorithm, in the pdf domain (graph (a)) and in the log-domain (graph (b)).

$I_i = (s_i, s_{i+1}]$ can be easily done (Cai et al., 2008; Devroye, 1986). Indeed, given $u, v \sim \mathcal{U}([s_i, s_{i+1}])$ and $w \sim \mathcal{U}([0, 1])$, then

$$x =
\begin{cases}
\min\{u, v\}, & w < \dfrac{\pi(s_i)}{\pi(s_i) + \pi(s_{i+1})}; \\[2mm]
\max\{u, v\}, & w \ge \dfrac{\pi(s_i)}{\pi(s_i) + \pi(s_{i+1})};
\end{cases} \quad (10)$$

is distributed according to a trapezoidal density defined in the interval $I_i = [s_i, s_{i+1}]$.

3.3 Overlapping supports

It is possible to consider proposal densities of the type $q_{t+1}(x|S_t) \propto \sum_{j=1}^{m_t-1} \omega_j f_j(x)$, where the $f_j(x)$ could be B-spline functions (e.g., see Krzykowski and Mackowiak (2006); Shao et al. (2013)) or the densities of Gaussian or Student-t distributions (e.g., see Cappé et al. (2008)), for instance. Clearly, we need to be able to draw from each $f_j(x)$. It is possible to draw from B-splines; however, if the target has unbounded domain then, since B-splines always have finite support, they must be combined with other kinds of proposal densities specifically designed for the tails. In the case

mixtures of Gaussian or Student-t distributions are used instead, this problem is solved, since the densities of these distributions are defined on the whole of $\mathbb{R}$. In this type of adaptation scheme, the weights should be chosen to satisfy the passing conditions through the points $(s_i, \pi(s_i))$, $i = 1, \ldots, m_t$. However, it is necessary that

$$\omega_j \ge 0 \quad \text{for all } j = 1, \ldots, m_t, \quad (11)$$

in order to be able to draw from the proposal $q_t(x|S_t)$. The problem is that, in general, when the passing conditions are satisfied some weights can be negative, $\omega_k < 0$. For this reason, this approach can appear completely useless. To overcome this issue, the passing conditions can be relaxed by finding the weights that minimize a least squares cost function under the constraints $\omega_j \ge 0$, for instance. Several further considerations are needed; however, a detailed treatment of this case oversteps the aims of this work and deserves a specific and separate study.

3.4 Heavy-tail proposal distribution

The adaptation procedures presented in the previous sections refer to proposal distributions with exponential tails. It is worth mentioning that it is not strictly necessary to change the construction of the tails, but there are some benefits in handling the tails with different approaches. Specifically, we can diminish the dependence on the initial points and also speed up the convergence of the chain when the target has heavy tails. For instance, an alternative choice (in the log-domain) for the tails is to use functions of the type

$$h(x) = a + \log[1/x^{\gamma}], \quad \gamma > 1, \quad (12)$$

instead of the straight lines, in the intervals $I_0$ and $I_{m_t}$. In the pdf domain, the corresponding tail function $T(x) = \exp\{h(x)\}$ equals $\exp(a)/x^{\gamma}$, which is proportional to a Pareto-type pdf. By noting that $h(x) = a - \gamma \log[x]$, we can set $a$ and $\gamma$ for the left tail, in $I_0$, by solving the following linear system in $a$ and $\gamma$:

$$\begin{cases}
V(s_1) = \log[\pi(s_1)] = a - \gamma \log[s_1], \\
V(s_2) = \log[\pi(s_2)] = a - \gamma \log[s_2].
\end{cases} \quad (13)$$
Analogously, we can fix $a$ and $\gamma$ for the right tail by considering the points $(s_{m_t-1}, V(s_{m_t-1}))$ and $(s_{m_t}, V(s_{m_t}))$. This approach is suitable when we are in the presence of a heavy-tailed target.
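The system in Eq. (13) has a simple closed-form solution. The sketch below is our own helper (assuming two support points $0 < s_1 < s_2$ in the relevant tail), not part of the original algorithm description:

```python
import math

def fit_pareto_tail(s1, s2, v1, v2):
    """Solve Eq. (13): v1 = a - gamma*log(s1), v2 = a - gamma*log(s2),
    for the Pareto-type tail parameters (a, gamma); assumes 0 < s1 < s2."""
    gamma = (v1 - v2) / math.log(s2 / s1)
    a = v1 + gamma * math.log(s1)
    return a, gamma

# Sanity check: for a target with an exact Pareto tail pi(x) ∝ x^{-2},
# i.e. V(x) = -2 log x, the fit recovers gamma = 2 and a = 0:
a, gamma = fit_pareto_tail(2.0, 4.0, -2 * math.log(2.0), -2 * math.log(4.0))
```

The same helper applies verbatim to the right tail by passing $(s_{m_t-1}, s_{m_t})$ and the corresponding values of $V$.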

It is important to observe that, in the log-pdf domain, if the tails of the function $V(x) = \log[\pi(x)]$ are convex, then a proposal pdf with exponential decay can be perfectly adequate. On the other hand, if the tails of the function $V(x) = \log[\pi(x)]$ are concave, then the Pareto choice can be more suitable. If there is no information about the tails of the target, we can implement an automatic strategy for fixing $T(x)$; the resulting method is then a completely self-tuning algorithm, useful in several different frameworks. Consider the set of support points $S_t = \{s_1, \ldots, s_{m_t}\}$, sorted in ascending order. We can use the first three ($s_1$, $s_2$ and $s_3$) and the last three ($s_{m_t-2}$, $s_{m_t-1}$ and $s_{m_t}$) support points to estimate the concavity of the function $V(x)$ in the tails. Namely, we can consider the quadratic polynomial $y = \alpha x^2 + \beta x + c$ passing through $(s_1, V(s_1))$, $(s_2, V(s_2))$ and $(s_3, V(s_3))$, and/or through $(s_{m_t-2}, V(s_{m_t-2}))$, $(s_{m_t-1}, V(s_{m_t-1}))$ and $(s_{m_t}, V(s_{m_t}))$. Then, if $\alpha \ge 0$ we use light exponential tails, whereas if $\alpha < 0$ we use heavy tails. Clearly, the sign of $\alpha$ can vary along the iterations, depending on $S_t$, so that the type of tails used can change accordingly.

3.5 Unbounded density functions

The ASMTM algorithm, with the constructions described in the previous sections, can be applied to bounded target pdfs $\pi(x)$. A cautionary note is in order if the target pdf is unbounded: in this case, the sticky algorithms may need an unbounded proposal. As an example, consider a target function $\pi(x)$ with a vertical asymptote at $x = a$ and the set of support points $S_t = \{s_1, \ldots, s_k, \ldots, s_{m_t}\}$, sorted in ascending order with $s_k = a$. A suitable construction procedure should use specific functions for the intervals $I_{k-1} = (s_{k-1}, s_k)$ and $I_k = (s_k, s_{k+1}]$. For the rest of the intervals, the constructions in the previous sections are completely adequate.
For instance, we can use functions of the following form:

$$g_j(x) = \frac{1}{|x - a|^{\alpha}} + \beta_j, \quad j = k-1, \ k,$$

with $0 < \alpha < 1$, where the constants

$$\beta_{k-1} = \pi(s_{k-1}) - \frac{1}{|s_{k-1} - a|^{\alpha}} \quad \text{and} \quad \beta_k = \pi(s_{k+1}) - \frac{1}{|s_{k+1} - a|^{\alpha}}$$

are set in order to obtain $g_{k-1}(s_{k-1}) = \pi(s_{k-1})$ and $g_k(s_{k+1}) = \pi(s_{k+1})$, respectively.
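As an illustration of this construction (a sketch under our own choice of target and $\alpha$, not part of the paper), the following builds the piece $g_k$ on the interval $I_k = (a, s_{k+1}]$ to the right of the asymptote and checks that its area is finite even though $g_k$ blows up at $x = a$, thanks to $0 < \alpha < 1$:

```python
import math

def unbounded_piece(pi, a, s_next, alpha=0.5):
    """Proposal piece g(x) = |x - a|^{-alpha} + beta on I_k = (a, s_next],
    with beta chosen so that g(s_next) = pi(s_next), cf. Section 3.5."""
    beta = pi(s_next) - 1.0 / abs(s_next - a) ** alpha
    g = lambda x: 1.0 / abs(x - a) ** alpha + beta
    # Finite area despite the asymptote at x = a, since alpha < 1:
    # int_a^{s_next} |x - a|^{-alpha} dx = (s_next - a)^{1-alpha} / (1 - alpha)
    area = (s_next - a) ** (1 - alpha) / (1 - alpha) + beta * (s_next - a)
    return g, area

# Example: pi(x) = x^{-1/2} is unbounded at a = 0; with alpha = 1/2 the piece
# matches pi exactly on (0, 1], and its area over (0, 1] equals 2.
g, area = unbounded_piece(lambda x: x ** -0.5, a=0.0, s_next=1.0, alpha=0.5)
```

The mirrored piece $g_{k-1}$ on $I_{k-1} = (s_{k-1}, a)$ is obtained in the same way, matching $\pi$ at the left endpoint $s_{k-1}$.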

3.6 Approximation bounds

For the approximation methods presented in the previous sections it is possible to show that the proposal distributions generated by the interpolation procedure converge to the target distribution when the number of support points goes to infinity.

3.6.1 Convergence of the sequence of the unnormalized proposal

Theorem 3. Consider a continuous bounded target density $\pi(x)$, with $x \in \mathcal{X}$, with bounded second order derivative, and denote with $\pi$ the unnormalized density. Let $\{q_t(x|S_{t-1})\}_{t=1}^{+\infty}$ be a sequence of possibly unnormalized proposal density functions such that $q_t(x|S_{t-1}) > 0$ for all $x \in \mathcal{X}$. Then,

$$\int_{\mathcal{X}} |q_t(x|S_{t-1}) - \pi(x)| \, dx \longrightarrow 0 \quad \text{as } t \to \infty.$$

Proof. Let us consider a generic set of support points, $S_{t-1} = \{s_1, \ldots, s_{m_{t-1}}\}$, with $s_1 < \ldots < s_{m_{t-1}}$, at time step $t$. Note that, using any of the procedures described above in this section, the corresponding proposal density function, $q_t(x|S_{t-1})$, is a bounded function, since $\pi(x)$ is bounded. Moreover, since $\int_{\mathcal{X}} \pi(x) dx < +\infty$ and $\int_{\mathcal{X}} q_t(x|S_{t-1}) dx < +\infty$, the $L_1$-distance between $q_t(x|S_{t-1})$ and $\pi(x)$ is bounded for any $t$, i.e., $\int_{\mathcal{X}} |q_t(x|S_{t-1}) - \pi(x)| \, dx < +\infty$. Let us consider the finite interval $I = [s_1, s_{m_t}]$; then all the interpolation methods proposed in this section to build $q_t(x|S_{t-1})$ can be represented as a Taylor approximation of order zero or one inside each interval. Hence, the discrepancy between $q_t(x|S_{t-1})$ and $\pi(x)$ inside $I$ can be bounded as

$$\int_I |q_t(x|S_{t-1}) - \pi(x)| \, dx = \sum_{i=1}^{m_t-1} \int_{I_i} |q_t(x|S_{t-1}) - \pi(x)| \, dx \le \sum_{i=1}^{m_t-1} \int_{I_i} |r_l^{(i)}(x)| \, dx, \quad (14)$$

where $r_l^{(i)}(x)$ is the remainder associated to the $l$-th order (with $l \in \{0, 1\}$ in our case) polynomial approximation of $\pi(x)$ inside the interval $I_i$, as given by Taylor's theorem. Let us recall that the Lagrange form of this remainder is

$$r_l^{(i)}(x) = \frac{(x - s_i)^{l+1}}{(l+1)!} \left. \frac{d^{l+1}\pi(x)}{dx^{l+1}} \right|_{x=\xi}, \quad (15)$$

for a value $\xi \in [s_i, x]$. Moreover, since $x \in I_i = [s_i, s_{i+1}]$, it is straightforward to show that

$$|r_l^{(i)}(x)| \le \frac{(s_{i+1} - s_i)^{l+1}}{(l+1)!} \, C_l^{(i)}, \quad (16)$$

where $C_l^{(i)} = \max_{x \in I_i} |\pi^{(l+1)}(x)|$, and $\pi^{(l+1)}(x)$ denotes the $(l+1)$-th derivative of $\pi(x)$, i.e., $\pi^{(l+1)}(x) = \frac{d^{l+1}\pi(x)}{dx^{l+1}}$. Hence, replacing (16) in (14), we obtain

$$\sum_{i=1}^{m_t-1} \int_{I_i} |r_l^{(i)}(x)| \, dx \le \sum_{i=1}^{m_t-1} \frac{(s_{i+1} - s_i)^{l+2}}{(l+2)!} \, C_l^{(i)}. \quad (17)$$

Now, let us assume that a new point, $s' \in I_k = [s_k, s_{k+1}]$ for $1 \le k \le m_t - 1$, is added at the next iteration. In this case, the construction of the proposal changes only inside the interval $I_k$, as shown in this section. Indeed, assume that $I_k$ is now split into $I^{(1)} = [s_k, s']$ and $I^{(2)} = [s', s_{k+1}]$, i.e., $I_k = I^{(1)} \cup I^{(2)}$. Obviously, $\max_{x \in I^{(j)}} |\pi^{(l+1)}(x)| \le \max_{x \in I_k} |\pi^{(l+1)}(x)|$ with $j \in \{1, 2\}$, and

$$(s' - s_k)^{l+2} + (s_{k+1} - s')^{l+2} < (s_{k+1} - s_k)^{l+2}, \quad (18)$$

for any $l \ge 0$, since $A^{l+2} + B^{l+2} < (A + B)^{l+2}$ for any $A, B > 0$ thanks to Newton's binomial theorem, and here $A = s' - s_k > 0$ and $B = s_{k+1} - s' > 0$. Hence, the bound in Eq. (17) always decreases when a new support point is incorporated, and we can finally ensure that

$$\lim_{t \to +\infty} \sum_{i=1}^{m_t-1} \int_{I_i} |r_l^{(i)}(x)| \, dx = 0, \quad (19)$$

since the support points become arbitrarily close as $t \to \infty$ (i.e., $s_{i+1} - s_i \to 0$), and thus the bound on the right hand side of (17) tends to zero as $t \to \infty$. Hence, we can guarantee that $\int_I |q_t(x|S_{t-1}) - \pi(x)| \, dx \to 0$ for $t \to +\infty$. Note that we cannot guarantee a monotonic decrease of the distance between $q_t(x|S_{t-1})$ and $\pi(x)$ inside $I$, since adding a new support point might occasionally lead to an increase in the discrepancy. However, we can guarantee that the upper bound on this distance decreases monotonically, thus ensuring that $q_t(x|S_{t-1}) \to \pi(x)$ as $t \to \infty$, i.e., adding support points will eventually take us arbitrarily close to $\pi(x)$. Finally, w.r.t. the tails, note that the distance between $q_t$ and $\pi$ remains bounded even for heavy-tailed distributions.
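The behavior established so far can also be checked numerically. The sketch below is our own illustration (a standard Gaussian target, the order-zero construction of Eq. (8) inside $[-2, 2]$, and equally spaced support points): refining the support set reduces the $L_1$ discrepancy, in agreement with the decreasing bound of Eq. (17), even though the decrease of the distance itself need not be monotonic in general.

```python
import numpy as np

def l1_distance(support, target, grid):
    """L1 distance on [support[0], support[-1]] between the target and the
    order-zero (piecewise constant, Eq. (8)-style) approximation that takes
    the value max{pi(s_i), pi(s_{i+1})} inside each interval."""
    v = target(support)
    idx = np.clip(np.searchsorted(support, grid) - 1, 0, len(support) - 2)
    q = np.maximum(v[idx], v[idx + 1])
    err = np.abs(q - target(grid))
    return np.sum(0.5 * (err[:-1] + err[1:]) * np.diff(grid))  # trapezoid rule

target = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) pdf
grid = np.linspace(-2, 2, 20001)
d_coarse = l1_distance(np.linspace(-2, 2, 5), target, grid)   # 5 points
d_fine = l1_distance(np.linspace(-2, 2, 9), target, grid)     # + midpoints
```

Here `d_fine < d_coarse`: halving every interval roughly halves the discrepancy of the order-zero approximation.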
Furthermore, the interval I will become greater as t +, since there is always a non-null probability of adding new support points inside the tails. Therefore, the probability 21

mass associated to the tails decreases monotonically as $t \to \infty$. Hence, even though the distance between the target and the proposal may again increase occasionally due to the introduction of a new support point in the tails, we can guarantee that such a distance goes to zero as $t$ goes to infinity.

3.6.2 Convergence to the normalized target

For the sake of simplicity, in this section we denote as $\tilde{q}_t(x|S_{t-1})$ and $\tilde{\pi}(x)$ the unnormalized density functions, whereas $q_t(x|S_{t-1})$ and $\pi(x)$ indicate the normalized densities. We remark, however, that in the rest of this work we have considered $q_t(x|S_{t-1})$ and $\pi(x)$ as unnormalized pdfs. Therefore, so far the interpolation (or approximation) was applied to the unnormalized target $\tilde{\pi}(x)$, to deal with the general case, and the proposal function $\tilde{q}_t(x|S_{t-1})$ is unnormalized as well; namely, we build $\tilde{q}_t(x|S_{t-1})$ via interpolation using the information of $\tilde{\pi}(x)$. We denote the corresponding normalizing constants $1/c_t$ and $1/c_\pi$, respectively. Since $\tilde{q}_t$ converges to $\tilde{\pi}$ in $L_1$ as $t$ goes to infinity, the normalizing constants also converge, i.e., $c_t$ converges to $c_\pi$. Indeed, denoting as $d(f, g) = \int_{\mathcal{X}} |f(x) - g(x)| \, dx$ the $L_1$ distance between $f(x)$ and $g(x)$, we have the following result.

Theorem 4. Let $q_t(x|S_{t-1}) = \frac{1}{c_t} \tilde{q}_t(x|S_{t-1})$ and $\pi(x) = \frac{1}{c_\pi} \tilde{\pi}(x)$, where $c_\pi = \|\tilde{\pi}\| = \int_{\mathcal{X}} \tilde{\pi}(x) dx$ and $c_t = \|\tilde{q}_t\| = \int_{\mathcal{X}} \tilde{q}_t(x|S_{t-1}) dx$. If $d(\tilde{q}_t, \tilde{\pi}) \to 0$ as $t \to \infty$, then $d(q_t, \pi) \to 0$ as $t \to \infty$.

Proof. Let us denote $\tilde{D}_t = d(\tilde{q}_t, \tilde{\pi})$ and $D_t = d(q_t, \pi)$. We can use an extended triangle inequality of the type $d(A, E) \le d(A, B) + d(B, C) + d(C, E)$, with the points $A = q_t$, $B = \frac{1}{c_t} \tilde{q}_t$, $C = \frac{1}{c_\pi} \tilde{q}_t$ and $E = \pi = \frac{1}{c_\pi} \tilde{\pi}$, i.e.,

$$D_t = d(q_t, \pi) \le d\left(q_t, \frac{1}{c_t} \tilde{q}_t\right) + d\left(\frac{1}{c_t} \tilde{q}_t, \frac{1}{c_\pi} \tilde{q}_t\right) + d\left(\frac{1}{c_\pi} \tilde{q}_t, \frac{1}{c_\pi} \tilde{\pi}\right) = 0 + \left|1 - \frac{c_t}{c_\pi}\right| + \frac{1}{c_\pi} \tilde{D}_t,$$

where we have used $q_t = \frac{1}{c_t} \tilde{q}_t$ and $d\left(\frac{1}{c_t} \tilde{q}_t, \frac{1}{c_\pi} \tilde{q}_t\right) = \left|\frac{1}{c_t} - \frac{1}{c_\pi}\right| \|\tilde{q}_t\| = \left|1 - \frac{c_t}{c_\pi}\right|$.

Hence, setting $C_t = \left|1 - \frac{c_t}{c_\pi}\right|$, we can finally write

$$D_t \le C_t + \frac{1}{c_\pi} \tilde{D}_t. \quad (20)$$

Since $D_t \ge 0$, if $\lim_{t \to \infty} C_t = 0$ and $\lim_{t \to \infty} \tilde{D}_t = 0$, then $\lim_{t \to \infty} D_t = 0$ as well. Therefore, we just need to prove that $c_t \to c_\pi$ when $\lim_{t \to \infty} \tilde{D}_t = 0$. Clearly, $\int_{\mathcal{X}} |\tilde{\pi}(x) - \tilde{q}_t(x|S_{t-1})| \, dx \ge \left| \int_{\mathcal{X}} \tilde{\pi}(x) dx - \int_{\mathcal{X}} \tilde{q}_t(x|S_{t-1}) dx \right|$, since $\tilde{\pi}(x), \tilde{q}_t(x|S_{t-1}) \ge 0$, with equality if $\tilde{\pi}(x) \ge \tilde{q}_t(x|S_{t-1})$ for all $x$. Moreover, using again the triangle inequality, we can also write

$$\|\tilde{\pi}\| = \|(\tilde{\pi} - \tilde{q}_t) + \tilde{q}_t\| \le \|\tilde{\pi} - \tilde{q}_t\| + \|\tilde{q}_t\|, \qquad \|\tilde{q}_t\| = \|(\tilde{q}_t - \tilde{\pi}) + \tilde{\pi}\| \le \|\tilde{q}_t - \tilde{\pi}\| + \|\tilde{\pi}\|.$$

Combining the two previous inequalities, we obtain $\big| \|\tilde{\pi}\| - \|\tilde{q}_t\| \big| \le \|\tilde{\pi} - \tilde{q}_t\|$. Since $\tilde{D}_t = d(\tilde{\pi}, \tilde{q}_t) = \|\tilde{\pi} - \tilde{q}_t\|$, $c_\pi = \|\tilde{\pi}\|$ and $c_t = \|\tilde{q}_t\|$, we can finally rewrite this expression as

$$\tilde{D}_t \ge |c_\pi - c_t|. \quad (21)$$

The expression above is also called the reverse triangle inequality. Then, if $\lim_{t \to \infty} \tilde{D}_t = 0$, we also have

$$\lim_{t \to \infty} |c_\pi - c_t| = 0, \quad (22)$$

i.e., $c_t \to c_\pi$ for $t \to \infty$ and $C_t = \left|1 - \frac{c_t}{c_\pi}\right| \to 0$.

4 Practical Implementation

4.1 Updating of the set of support points

In this section, we focus on the update step of Algorithms 1 and 2, where a test is introduced for controlling the evolution of the set of support points. This step can be seen as a measure of similarity, at the proposed point, between the proposal and target distributions. It is a part of the algorithm

that is extremely important, since it controls the trade-off between better performance and greater computational cost. Indeed, the use of more support points improves the performance but, at the same time, increases the computational cost. In this step a choice of two functions, $\eta$ and $d_t$, is needed. The first one is a strictly increasing function with values in $[0, 1]$, and $d_t$ is a distance between the proposal and the target distribution. For instance, following the literature on adaptive mixture proposals, one can choose logistic weights and a local absolute distance between proposal and target, which has a low computational cost. These choices correspond to the following specification:

$$\eta(d_t(z)) = \frac{1}{1 + \exp\{-d_t(z)\}}, \qquad d_t(z) = |\pi(z) - q_t(z|S_{t-1})|. \quad (23)$$

In order to reduce the computational complexity of the algorithm, it is possible to recycle some of the outputs of the Metropolis steps of Algorithm 1. From this perspective, a natural choice of $\eta$ and $d_t$ could be

$$\eta(d_t(z)) = d_t(z), \qquad d_t(z) = \frac{|q_t(z|S_{t-1}) - \pi(z)|}{\max\{\pi(z), q_t(z|S_{t-1})\}}. \quad (24)$$

Note that the choice of a linear function for $\eta$ produces valid weights if $d_t \in [0, 1]$. This condition is satisfied in this case; in fact,

$$\eta(d_t(z)) = \frac{|q_t(z|S_{t-1}) - \pi(z)|}{\max\{\pi(z), q_t(z|S_{t-1})\}} = \frac{\max\{\pi(z), q_t(z|S_{t-1})\} - \min\{\pi(z), q_t(z|S_{t-1})\}}{\max\{\pi(z), q_t(z|S_{t-1})\}},$$

and then

$$\eta(d_t(z)) = 1 - \frac{\min\{\pi(z), q_t(z|S_{t-1})\}}{\max\{\pi(z), q_t(z|S_{t-1})\}}. \quad (25)$$

At first glance, this choice of $\eta(d_t(z))$ may appear arbitrary, but it becomes natural if one thinks of the classical construction in the ARS technique: $\eta(d_t(z))$ recalls the probability of adding a new support point in the ARS method and, if $q_t(z|S_{t-1}) \ge \pi(z)$ for all $z \in \mathcal{D}$ and all $t$, it becomes $1 - \frac{\pi(z)}{q_t(z|S_{t-1})}$, which is exactly the probability of incorporating $z$ into the set of support points in the ARS method. The updating rules presented above for Algorithm 1 require some changes when used in a multiple proposal algorithm such as Algorithm 2. Let us

consider the updating scheme in Eq. (24). Let $z_i$, $i = 1, \ldots, M$, be a set of proposals; then the updating step for $S_{t-1}$ splits into two parts. First, a $z$ is selected among the proposals, $z_1, \ldots, z_M$, with probability proportional to

$$\varphi_t(z_i) = \max\{1, w(z_i)\}, \qquad w(z_i) = \frac{\max\{\pi(z_i), q_t(z_i|S_{t-1})\}}{\min\{\pi(z_i), q_t(z_i|S_{t-1})\}}, \quad (26)$$

$i = 1, \ldots, M$. This step selects with high probability a sample at which the proposal value is far from the target. The second step is a control step, where $z$ is included in the set of support points with probability

$$d_t(z) = 1 - \frac{1}{\varphi_t(z)}.$$

This step is similar to the accept-reject step in the ARMS algorithm, and the probability of the point being included corresponds exactly to the probability of a proposal being rejected (and hence incorporated as a support point) in an ARMS algorithm. It can be shown that this two-step updating procedure corresponds to the one-step procedure

$$S_t =
\begin{cases}
S_{t-1} \cup \{z_i\} & \text{with prob. } \eta_i(d_t(z_i)) = \dfrac{\varphi_t(z_i) - 1}{\sum_{j=1}^{M} \varphi_t(z_j)}, \\[2mm]
S_{t-1} & \text{with prob. } \dfrac{M}{\sum_{i=1}^{M} \varphi_t(z_i)},
\end{cases}$$

where $\varphi_t(z_i) = \frac{1}{1 - d_t(z_i)}$ and $d_t(z_i) = 1 - \frac{\min\{\pi(z_i), q_t(z_i|S_{t-1})\}}{\max\{\pi(z_i), q_t(z_i|S_{t-1})\}}$. Finally, note that the updating rules in Eqs. (23) and (24) can be generalized. For instance, the updating rule

$$\eta(d_t(z)) = \frac{1}{1 + \exp\{-\gamma(d_t(z) - \varepsilon)\}}, \qquad d_t(z) = |\pi(z) - q_t(z|S_{t-1})|, \quad (27)$$

with $\gamma \in (0, +\infty)$ and $\varepsilon \in [0, +\infty)$, has the rule in Eq. (23) as a special case for $\gamma = 1$ and $\varepsilon = 0$. This generalization has an interesting limiting case that will be considered in our experiments: for $\gamma \to +\infty$ we obtain a sort of deterministic updating of the set of support points, since the function $\eta$ tends to $1$ if $d_t(z) > \varepsilon$ and to $0$ if $d_t(z) < \varepsilon$. Through the threshold parameter $\varepsilon$ it is possible to control the number of support points. The parameter can be updated over the iterations following a deterministic rule, in such a way as to stop the adaptation of the proposal and to reduce the computational cost of the algorithm.
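The equivalence between the two-step and the one-step procedures above can be checked numerically. The sketch below is our own illustration (the target and proposal values at the $M = 4$ proposals are arbitrary numbers):

```python
import numpy as np

def one_step_probs(pi_vals, q_vals):
    """Inclusion probabilities of the one-step rule: proposal z_i enters S_t
    with probability (phi_i - 1) / sum_j phi_j, where phi_i = max{pi, q} /
    min{pi, q} evaluated at z_i (cf. Eq. (26))."""
    phi = np.maximum(pi_vals, q_vals) / np.minimum(pi_vals, q_vals)
    return (phi - 1.0) / phi.sum()

def two_step_probs(pi_vals, q_vals):
    """Same probabilities via the two stages: select z_i with probability
    phi_i / sum_j phi_j, then include it with probability 1 - 1/phi_i."""
    phi = np.maximum(pi_vals, q_vals) / np.minimum(pi_vals, q_vals)
    select = phi / phi.sum()
    include = 1.0 - 1.0 / phi
    return select * include

pi_vals = np.array([0.8, 0.1, 0.5, 0.3])   # pi(z_i) at the M = 4 proposals
q_vals = np.array([0.4, 0.2, 0.5, 0.9])    # q_t(z_i | S_{t-1})
p1 = one_step_probs(pi_vals, q_vals)
p2 = two_step_probs(pi_vals, q_vals)
```

Here `p1` and `p2` coincide, and the leftover mass `1 - p1.sum()` equals $M / \sum_i \varphi_t(z_i)$, the probability of leaving $S_{t-1}$ unchanged.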


More information

13 Notes on Markov Chain Monte Carlo

13 Notes on Markov Chain Monte Carlo 13 Notes on Markov Chain Monte Carlo Markov Chain Monte Carlo is a big, and currently very rapidly developing, subject in statistical computation. Many complex and multivariate types of random data, useful

More information

5 Handling Constraints

5 Handling Constraints 5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Markov chain Monte Carlo Lecture 9

Markov chain Monte Carlo Lecture 9 Markov chain Monte Carlo Lecture 9 David Sontag New York University Slides adapted from Eric Xing and Qirong Ho (CMU) Limitations of Monte Carlo Direct (unconditional) sampling Hard to get rare events

More information

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm

On the Optimal Scaling of the Modified Metropolis-Hastings algorithm On the Optimal Scaling of the Modified Metropolis-Hastings algorithm K. M. Zuev & J. L. Beck Division of Engineering and Applied Science California Institute of Technology, MC 4-44, Pasadena, CA 925, USA

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 18-16th March Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 18-16th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Trans-dimensional Markov chain Monte Carlo. Bayesian model for autoregressions.

More information

Kernel adaptive Sequential Monte Carlo

Kernel adaptive Sequential Monte Carlo Kernel adaptive Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) December 7, 2015 1 / 36 Section 1 Outline

More information

Markov Chain Monte Carlo Lecture 4

Markov Chain Monte Carlo Lecture 4 The local-trap problem refers to that in simulations of a complex system whose energy landscape is rugged, the sampler gets trapped in a local energy minimum indefinitely, rendering the simulation ineffective.

More information

CSCI-6971 Lecture Notes: Monte Carlo integration

CSCI-6971 Lecture Notes: Monte Carlo integration CSCI-6971 Lecture otes: Monte Carlo integration Kristopher R. Beevers Department of Computer Science Rensselaer Polytechnic Institute beevek@cs.rpi.edu February 21, 2006 1 Overview Consider the following

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos Contents Markov Chain Monte Carlo Methods Sampling Rejection Importance Hastings-Metropolis Gibbs Markov Chains

More information

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018

Markov Chain Monte Carlo Inference. Siamak Ravanbakhsh Winter 2018 Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling

More information

Randomized Quasi-Monte Carlo for MCMC

Randomized Quasi-Monte Carlo for MCMC Randomized Quasi-Monte Carlo for MCMC Radu Craiu 1 Christiane Lemieux 2 1 Department of Statistics, Toronto 2 Department of Statistics, Waterloo Third Workshop on Monte Carlo Methods Harvard, May 2007

More information

A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection

A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection A generalization of the Multiple-try Metropolis algorithm for Bayesian estimation and model selection Silvia Pandolfi Francesco Bartolucci Nial Friel University of Perugia, IT University of Perugia, IT

More information

MCMC Methods: Gibbs and Metropolis

MCMC Methods: Gibbs and Metropolis MCMC Methods: Gibbs and Metropolis Patrick Breheny February 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/30 Introduction As we have seen, the ability to sample from the posterior distribution

More information

Robert Collins CSE586, PSU Intro to Sampling Methods

Robert Collins CSE586, PSU Intro to Sampling Methods Robert Collins Intro to Sampling Methods CSE586 Computer Vision II Penn State Univ Robert Collins A Brief Overview of Sampling Monte Carlo Integration Sampling and Expected Values Inverse Transform Sampling

More information

Advances and Applications in Perfect Sampling

Advances and Applications in Perfect Sampling and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC

More information

Propp-Wilson Algorithm (and sampling the Ising model)

Propp-Wilson Algorithm (and sampling the Ising model) Propp-Wilson Algorithm (and sampling the Ising model) Danny Leshem, Nov 2009 References: Haggstrom, O. (2002) Finite Markov Chains and Algorithmic Applications, ch. 10-11 Propp, J. & Wilson, D. (1996)

More information

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture)

18 : Advanced topics in MCMC. 1 Gibbs Sampling (Continued from the last lecture) 10-708: Probabilistic Graphical Models 10-708, Spring 2014 18 : Advanced topics in MCMC Lecturer: Eric P. Xing Scribes: Jessica Chemali, Seungwhan Moon 1 Gibbs Sampling (Continued from the last lecture)

More information

ORTHOGONAL PARALLEL MCMC METHODS FOR SAMPLING AND OPTIMIZATION

ORTHOGONAL PARALLEL MCMC METHODS FOR SAMPLING AND OPTIMIZATION ORTHOGONAL PARALLEL MCMC METHODS FOR SAMPLING AND OPTIMIZATION L Martino, V Elvira, D Luengo, J Corander, F Louzada Institute of Mathematical Sciences and Computing, Universidade de São Paulo, São Carlos

More information

A Review of Basic Monte Carlo Methods

A Review of Basic Monte Carlo Methods A Review of Basic Monte Carlo Methods Julian Haft May 9, 2014 Introduction One of the most powerful techniques in statistical analysis developed in this past century is undoubtedly that of Monte Carlo

More information

Bayesian Methods with Monte Carlo Markov Chains II

Bayesian Methods with Monte Carlo Markov Chains II Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3

More information

Jim Lambers MAT 460/560 Fall Semester Practice Final Exam

Jim Lambers MAT 460/560 Fall Semester Practice Final Exam Jim Lambers MAT 460/560 Fall Semester 2009-10 Practice Final Exam 1. Let f(x) = sin 2x + cos 2x. (a) Write down the 2nd Taylor polynomial P 2 (x) of f(x) centered around x 0 = 0. (b) Write down the corresponding

More information

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms

Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Simultaneous drift conditions for Adaptive Markov Chain Monte Carlo algorithms Yan Bai Feb 2009; Revised Nov 2009 Abstract In the paper, we mainly study ergodicity of adaptive MCMC algorithms. Assume that

More information

Markov chain Monte Carlo methods in atmospheric remote sensing

Markov chain Monte Carlo methods in atmospheric remote sensing 1 / 45 Markov chain Monte Carlo methods in atmospheric remote sensing Johanna Tamminen johanna.tamminen@fmi.fi ESA Summer School on Earth System Monitoring and Modeling July 3 Aug 11, 212, Frascati July,

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 4 Problem: Density Estimation We have observed data, y 1,..., y n, drawn independently from some unknown

More information

Reminder of some Markov Chain properties:

Reminder of some Markov Chain properties: Reminder of some Markov Chain properties: 1. a transition from one state to another occurs probabilistically 2. only state that matters is where you currently are (i.e. given present, future is independent

More information

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous

More information

Function Approximation

Function Approximation 1 Function Approximation This is page i Printer: Opaque this 1.1 Introduction In this chapter we discuss approximating functional forms. Both in econometric and in numerical problems, the need for an approximating

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Lecture XI. Approximating the Invariant Distribution

Lecture XI. Approximating the Invariant Distribution Lecture XI Approximating the Invariant Distribution Gianluca Violante New York University Quantitative Macroeconomics G. Violante, Invariant Distribution p. 1 /24 SS Equilibrium in the Aiyagari model G.

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Random Walks A&T and F&S 3.1.2

Random Walks A&T and F&S 3.1.2 Random Walks A&T 110-123 and F&S 3.1.2 As we explained last time, it is very difficult to sample directly a general probability distribution. - If we sample from another distribution, the overlap will

More information

Examples of Adaptive MCMC

Examples of Adaptive MCMC Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September, 2006.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters

More information

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M.

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M. 5 Vector fields Last updated: March 12, 2012. 5.1 Definition and general properties We first need to define what a vector field is. Definition 5.1. A vector field v on a manifold M is map M T M such that

More information

Exponential families also behave nicely under conditioning. Specifically, suppose we write η = (η 1, η 2 ) R k R p k so that

Exponential families also behave nicely under conditioning. Specifically, suppose we write η = (η 1, η 2 ) R k R p k so that 1 More examples 1.1 Exponential families under conditioning Exponential families also behave nicely under conditioning. Specifically, suppose we write η = η 1, η 2 R k R p k so that dp η dm 0 = e ηt 1

More information

19 : Slice Sampling and HMC

19 : Slice Sampling and HMC 10-708: Probabilistic Graphical Models 10-708, Spring 2018 19 : Slice Sampling and HMC Lecturer: Kayhan Batmanghelich Scribes: Boxiang Lyu 1 MCMC (Auxiliary Variables Methods) In inference, we are often

More information

Optimizing and Adapting the Metropolis Algorithm

Optimizing and Adapting the Metropolis Algorithm 6 Optimizing and Adapting the Metropolis Algorithm Jeffrey S. Rosenthal University of Toronto, Toronto, ON 6.1 Introduction Many modern scientific questions involve high-dimensional data and complicated

More information

Lect4: Exact Sampling Techniques and MCMC Convergence Analysis

Lect4: Exact Sampling Techniques and MCMC Convergence Analysis Lect4: Exact Sampling Techniques and MCMC Convergence Analysis. Exact sampling. Convergence analysis of MCMC. First-hit time analysis for MCMC--ways to analyze the proposals. Outline of the Module Definitions

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Sampling multimodal densities in high dimensional sampling space

Sampling multimodal densities in high dimensional sampling space Sampling multimodal densities in high dimensional sampling space Gersende FORT LTCI, CNRS & Telecom ParisTech Paris, France Journées MAS Toulouse, Août 4 Introduction Sample from a target distribution

More information

STAT232B Importance and Sequential Importance Sampling

STAT232B Importance and Sequential Importance Sampling STAT232B Importance and Sequential Importance Sampling Gianfranco Doretto Andrea Vedaldi June 7, 2004 1 Monte Carlo Integration Goal: computing the following integral µ = h(x)π(x) dx χ Standard numerical

More information

Gärtner-Ellis Theorem and applications.

Gärtner-Ellis Theorem and applications. Gärtner-Ellis Theorem and applications. Elena Kosygina July 25, 208 In this lecture we turn to the non-i.i.d. case and discuss Gärtner-Ellis theorem. As an application, we study Curie-Weiss model with

More information

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding

More information

I forgot to mention last time: in the Ito formula for two standard processes, putting

I forgot to mention last time: in the Ito formula for two standard processes, putting I forgot to mention last time: in the Ito formula for two standard processes, putting dx t = a t dt + b t db t dy t = α t dt + β t db t, and taking f(x, y = xy, one has f x = y, f y = x, and f xx = f yy

More information

Lecture 6: Markov Chain Monte Carlo

Lecture 6: Markov Chain Monte Carlo Lecture 6: Markov Chain Monte Carlo D. Jason Koskinen koskinen@nbi.ku.dk Photo by Howard Jackman University of Copenhagen Advanced Methods in Applied Statistics Feb - Apr 2016 Niels Bohr Institute 2 Outline

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

NUMERICAL METHODS. x n+1 = 2x n x 2 n. In particular: which of them gives faster convergence, and why? [Work to four decimal places.

NUMERICAL METHODS. x n+1 = 2x n x 2 n. In particular: which of them gives faster convergence, and why? [Work to four decimal places. NUMERICAL METHODS 1. Rearranging the equation x 3 =.5 gives the iterative formula x n+1 = g(x n ), where g(x) = (2x 2 ) 1. (a) Starting with x = 1, compute the x n up to n = 6, and describe what is happening.

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Gradient-based Monte Carlo sampling methods

Gradient-based Monte Carlo sampling methods Gradient-based Monte Carlo sampling methods Johannes von Lindheim 31. May 016 Abstract Notes for a 90-minute presentation on gradient-based Monte Carlo sampling methods for the Uncertainty Quantification

More information

Numerical Analysis for Statisticians

Numerical Analysis for Statisticians Kenneth Lange Numerical Analysis for Statisticians Springer Contents Preface v 1 Recurrence Relations 1 1.1 Introduction 1 1.2 Binomial CoefRcients 1 1.3 Number of Partitions of a Set 2 1.4 Horner's Method

More information

Effective Sample Size for Importance Sampling based on discrepancy measures

Effective Sample Size for Importance Sampling based on discrepancy measures Effective Sample Size for Importance Sampling based on discrepancy measures L. Martino, V. Elvira, F. Louzada Universidade de São Paulo, São Carlos (Brazil). Universidad Carlos III de Madrid, Leganés (Spain).

More information