Inexact approximations for doubly and triply intractable problems

1 Inexact approximations for doubly and triply intractable problems March 27th, 2014

2 Markov random fields Interacting objects
Markov random fields (MRFs) are used for modelling (often large numbers of) interacting objects, usually with symmetrical interactions. They are used widely in statistics, physics and computer science, e.g. image analysis; ferromagnetism; geostatistics; point processes; social networks.

3 Markov random fields Image analysis The log expression of 72 genes on a particular chromosome over 46 hours (Friel et al. 2009).

4 Markov random fields Pairwise Markov random fields

5 Markov random fields Intractable normalising constants
Pairwise MRFs correspond to the factorisation
$$f(y \mid \theta) \propto \gamma(y \mid \theta) = \prod_{(i,j) \in \text{Nei}(y)} \phi(y_i, y_j \mid \theta).$$
We also need to specify the normalising constant
$$Z(\theta) = \int_y \prod_{(i,j) \in \text{Nei}(y)} \phi(y_i, y_j \mid \theta) \, dy,$$
so that $f(y \mid \theta) = \gamma(y \mid \theta) / Z(\theta)$. In general we are interested in Gibbs random fields, models of the form
$$f(y \mid \theta) = \frac{\exp(\theta^T S(y))}{Z(\theta)}.$$

6 A doubly intractable problem Doubly intractable
Suppose we want to estimate parameters θ after observing Y = y. Use Bayesian inference to find $\pi(\theta \mid y) \propto f(y \mid \theta) p(\theta)$. Could use MCMC, but the Metropolis-Hastings acceptance probability is
$$\min\left\{1, \frac{q(\theta \mid \theta') \, p(\theta') \, \gamma(y \mid \theta')}{q(\theta' \mid \theta) \, p(\theta) \, \gamma(y \mid \theta)} \cdot \frac{Z(\theta)}{Z(\theta')}\right\},$$
which involves the intractable ratio $Z(\theta)/Z(\theta')$.

8 A doubly intractable problem ABC-MCMC
Approximate an intractable likelihood at θ with
$$\frac{1}{R} \sum_{r=1}^{R} \pi_\epsilon(S(x_r) \mid S(y)),$$
where the $x_r \sim f(\cdot \mid \theta)$ are R simulations from f (originally in Ratmann et al. (2009)). Often R = 1 and $\pi_\epsilon(\cdot \mid S(y)) = \mathcal{U}(\cdot; (S(y) - \epsilon, S(y) + \epsilon))$. This is essentially a nonparametric kernel estimator of the conditional distribution of the statistics given θ, based on simulations from f. ABC-MCMC is an MCMC algorithm that targets the resulting approximate posterior.
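The estimator above can be sketched as follows. This is a minimal illustration, not the slides' implementation: the function name and the `simulate`/`summary` callables are ours, and a scalar summary with the uniform kernel is assumed.

```python
import numpy as np

def abc_likelihood_estimate(theta, s_obs, simulate, summary, R=100, eps=0.1, rng=None):
    """Monte Carlo ABC estimate of the likelihood at theta:
    (1/R) sum_r pi_eps(S(x_r) | S(y)), with a uniform kernel of half-width eps."""
    rng = np.random.default_rng() if rng is None else rng
    hits = 0
    for _ in range(R):
        x = simulate(theta, rng)             # x_r ~ f(. | theta)
        if abs(summary(x) - s_obs) <= eps:   # indicator part of U(S(y)-eps, S(y)+eps)
            hits += 1
    return hits / (R * 2 * eps)              # kernel height 1/(2 eps) inside the window
```

With a large ε every simulation is accepted and the estimate collapses to the kernel height $1/(2\epsilon)$, which makes the bias of a loose tolerance visible.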

10 A doubly intractable problem ABC on ERGMs
[Figure: posterior for an exponential random graph model, comparing the true posterior with the ABC approximation.]

11 A doubly intractable problem Synthetic likelihood
An alternative approximation proposed in Wood (2010). Again take R simulations from f, $x_r \sim f(\cdot \mid \theta)$, and compute the summary statistics of each, but instead use a multivariate normal approximation to the distribution of the summary statistics given θ:
$$\mathcal{L}(S(y) \mid \theta) = \mathcal{N}\left(S(y); \hat\mu_\theta, \hat\Sigma_\theta\right),$$
where
$$\hat\mu_\theta = \frac{1}{R} \sum_{r=1}^{R} S(x_r), \qquad \hat\Sigma_\theta = \frac{s s^T}{R - 1}, \qquad s = \left(S(x_1) - \hat\mu_\theta, \ldots, S(x_R) - \hat\mu_\theta\right).$$
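A sketch of the synthetic log-likelihood computation, assuming vector-valued summaries and a `simulate_summary` callable of our own naming (this is the plug-in Gaussian fit described above, not code from the slides):

```python
import numpy as np

def synthetic_loglik(theta, s_obs, simulate_summary, R=100, rng=None):
    """Gaussian synthetic log-likelihood of Wood (2010): fit N(mu_theta, Sigma_theta)
    to R simulated summary-statistic vectors and evaluate the density at S(y)."""
    rng = np.random.default_rng() if rng is None else rng
    S = np.array([simulate_summary(theta, rng) for _ in range(R)])  # shape (R, d)
    mu = S.mean(axis=0)                       # mu_theta
    resid = S - mu
    Sigma = resid.T @ resid / (R - 1)         # Sigma_theta = s s^T / (R - 1)
    d = len(mu)
    diff = s_obs - mu
    sign, logdet = np.linalg.slogdet(Sigma)   # stable log-determinant
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
```

The log-scale evaluation with `slogdet` avoids underflow when the summaries are high-dimensional or the fitted covariance is nearly singular.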

12 A doubly intractable problem The single auxiliary variable (SAV) method
Møller et al. (2006) augment the target distribution with an extra variable u and use
$$\pi(\theta, u \mid y) \propto q_u(u \mid \theta, y) \, f(y \mid \theta) \, p(\theta),$$
where $q_u$ is some arbitrary (normalised) distribution and u is on the same space as y. As the MH proposal in $(\theta, u)$-space they use $(\theta', u') \sim f(u' \mid \theta') \, q(\theta' \mid \theta)$. This gives an acceptance probability of
$$\min\left\{1, \frac{q(\theta \mid \theta') \, p(\theta') \, \gamma(y \mid \theta') \, \gamma(u \mid \theta) \, q_u(u' \mid \theta', y)}{q(\theta' \mid \theta) \, p(\theta) \, \gamma(y \mid \theta) \, \gamma(u' \mid \theta') \, q_u(u \mid \theta, y)}\right\}.$$

13 A doubly intractable problem Exact approximations
Note that $q_u(u' \mid \theta', y) / \gamma(u' \mid \theta')$ is an unbiased importance sampling estimator of $1/Z(\theta')$, so the algorithm still targets the correct distribution! This was first seen in the pseudo-marginal methods of Beaumont (2003) and Andrieu and Roberts (2009). It relies on being able to simulate exactly from $f(\cdot \mid \theta')$, which is usually not possible or computationally expensive. Girolami et al. (2013) introduce an approach ("Russian Roulette") that does not require exact simulation.

16 A doubly intractable problem The exchange algorithm
Murray et al. (2006) propose instead to use $\gamma(u \mid \theta) / \gamma(u \mid \theta')$, with $u \sim f(\cdot \mid \theta')$, as an importance sampling estimator of $Z(\theta)/Z(\theta')$. This gives an acceptance probability of
$$\min\left\{1, \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} \frac{p(\theta')}{p(\theta)} \frac{\gamma(y \mid \theta')}{\gamma(y \mid \theta)} \frac{\gamma(u \mid \theta)}{\gamma(u \mid \theta')}\right\}.$$
This is an unbiased estimator of the acceptance probability rather than of the target, so it no longer fits into the exact-approximation framework; however, the method still has the correct target (something of a special case). It is simpler, and often more efficient, than SAV.
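A minimal sketch of one exchange-algorithm chain, under assumptions of ours: a scalar parameter, a symmetric random-walk proposal (so the q terms cancel), and a `simulate` callable providing the exact auxiliary draw the algorithm requires.

```python
import numpy as np

def exchange_mcmc(y, log_gamma, simulate, log_prior, theta0, n_iters=1000,
                  prop_sd=0.5, rng=None):
    """Exchange algorithm (Murray et al., 2006) with a random-walk proposal.
    log_gamma(x, theta) is the log unnormalised likelihood; simulate(theta, rng)
    must return an exact draw u ~ f(. | theta)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = theta0
    chain = []
    for _ in range(n_iters):
        theta_p = theta + prop_sd * rng.standard_normal()  # symmetric q cancels
        u = simulate(theta_p, rng)                         # auxiliary draw at theta'
        log_alpha = (log_prior(theta_p) - log_prior(theta)
                     + log_gamma(y, theta_p) - log_gamma(y, theta)
                     + log_gamma(u, theta) - log_gamma(u, theta_p))
        if np.log(rng.uniform()) < log_alpha:
            theta = theta_p
        chain.append(theta)
    return np.array(chain)
```

On a toy model whose normalising constant is actually constant (e.g. a unit-variance Gaussian likelihood), the auxiliary ratio has mean one and the chain recovers the usual posterior, which gives a cheap sanity check.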

17 A triply intractable problem Estimating the marginal likelihood
The marginal likelihood (also known as the evidence) is
$$p(y) = \int_\theta p(\theta) \, f(y \mid \theta) \, d\theta.$$
It is used in Bayesian model comparison, $p(M \mid y) \propto p(M) \, p(y \mid M)$, most commonly via the Bayes factor $p(y \mid M_1) / p(y \mid M_2)$ for comparing models. All commonly used methods require $f(y \mid \theta)$ to be tractable in θ, and the evidence usually cannot be estimated from MCMC output: a triply intractable problem (Friel, 2013).

18 A triply intractable problem Chib's method (via population exchange)
Friel (2013) details an approach that uses Chib's method. For any $\tilde\theta$:
$$p(y) = \frac{f(y \mid \tilde\theta) \, \pi(\tilde\theta)}{\pi(\tilde\theta \mid y)} = \frac{\gamma(y \mid \tilde\theta) \, \pi(\tilde\theta)}{Z(\tilde\theta) \, \pi(\tilde\theta \mid y)}.$$
A population variant of the exchange algorithm is used to simulate points from $\pi(\theta \mid y)$; this approach gives an estimate of $Z(\theta)$ for each θ drawn from the posterior. Chib's method is then applied, averaging the identity above over a number of high-probability draws from $\pi(\theta \mid y)$, using: the terms in the numerator directly; the estimate of $Z(\theta)$ from the population exchange algorithm; and a kernel density estimate of $\pi(\theta \mid y)$. This relies on θ being low dimensional.

21 A triply intractable problem Using importance sampling (IS)
Importance sampling returns a weighted sample $\{(\theta^{(p)}, w^{(p)}) : 1 \leq p \leq P\}$ approximating $\pi(\theta \mid y)$.
For p = 1 : P
  simulate $\theta^{(p)} \sim q(\cdot)$;
  weight $w^{(p)} = p(\theta^{(p)}) \, f(y \mid \theta^{(p)}) / q(\theta^{(p)})$.
Then $\hat p(y) = \frac{1}{P} \sum_{p=1}^{P} w^{(p)}$.
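The loop above translates directly into code. A sketch under our own naming conventions, with all densities passed on the log scale and the average computed via log-sum-exp for stability:

```python
import numpy as np

def is_evidence(log_prior, log_lik, sample_q, log_q, P=10000, rng=None):
    """Importance-sampling estimate of the marginal likelihood p(y):
    draw theta^(p) ~ q, weight by p(theta) f(y|theta) / q(theta), then average."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = [sample_q(rng) for _ in range(P)]
    log_w = np.array([log_prior(t) + log_lik(t) - log_q(t) for t in thetas])
    m = log_w.max()
    return np.exp(m) * np.mean(np.exp(log_w - m))   # (1/P) sum of weights, stably
```

With a conjugate Gaussian model (where $p(y)$ is available in closed form) the estimate can be checked against the exact evidence.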

22 A triply intractable problem Using ABC-IS
Didelot, Everitt, Johansen and Lawson (2011) investigate the use of the ABC approximation when using IS for marginal likelihoods. The weights are
$$w^{(p)} = \frac{p(\theta^{(p)}) \, \frac{1}{R} \sum_{r=1}^{R} \pi_\epsilon(S(x_r^{(p)}) \mid S(y))}{q(\theta^{(p)})},$$
where $\{x_r^{(p)}\}_{r=1}^{R} \sim f(\cdot \mid \theta^{(p)})$. This method estimates $p(S(y))$ rather than $p(y)$. Didelot et al. (2011), Grelaud et al. (2009), Robert et al. (2011) and Marin et al. (2014) discuss the choice of summary statistics.

23 A triply intractable problem Exponential family models
Didelot et al. (2011): when comparing two exponential family models, if $S_1(y)$ is sufficient for the parameters in model 1 and $S_2(y)$ is sufficient for the parameters in model 2, then using the vector $S(y) = (S_1(y), S_2(y))$ for both models gives
$$\frac{p(y \mid M_1)}{p(y \mid M_2)} = \frac{p(S(y) \mid M_1)}{p(S(y) \mid M_2)}.$$
Marin et al. (2014) give much more general guidance.

24 A triply intractable problem Synthetic likelihood IS
We could also use the SL approximation within IS. The weight is then
$$w^{(p)} = \frac{p(\theta^{(p)}) \, \mathcal{N}\left(S(y); \hat\mu_\theta, \hat\Sigma_\theta\right)}{q(\theta^{(p)})},$$
where $\hat\mu_\theta, \hat\Sigma_\theta$ are based on $\{x_r^{(p)}\}_{r=1}^{R} \sim f(\cdot \mid \theta^{(p)})$. This does not require choosing ε, but relies on the normality assumption.

25 A triply intractable problem Exact methods?
Importance sampling:
$$p(y) = \int_\theta \frac{f(y \mid \theta) \, p(\theta)}{q(\theta)} q(\theta) \, d\theta \approx \frac{1}{P} \sum_{p=1}^{P} \frac{f(y \mid \theta^{(p)}) \, p(\theta^{(p)})}{q(\theta^{(p)})} = \frac{1}{P} \sum_{p=1}^{P} \frac{\gamma(y \mid \theta^{(p)}) \, p(\theta^{(p)})}{q(\theta^{(p)})} \frac{1}{Z(\theta^{(p)})}.$$
Intractable...

27 A triply intractable problem SAV importance sampling
Consider the SAV target $\pi(\theta, u \mid y) \propto q_u(u \mid \theta, y) \, f(y \mid \theta) \, p(\theta)$, noting that it has the same marginal likelihood as $\pi(\theta \mid y)$. Suppose we do importance sampling on this SAV target, and choose the proposal to be $q(\theta, u) = f(u \mid \theta) \, q(\theta)$. We obtain
$$\hat p(y) = \frac{1}{P} \sum_{p=1}^{P} \frac{q_u(u \mid \theta^{(p)}, y) \, \gamma(y \mid \theta^{(p)}) \, p(\theta^{(p)}) \, Z(\theta^{(p)})}{\gamma(u \mid \theta^{(p)}) \, q(\theta^{(p)}) \, Z(\theta^{(p)})} = \frac{1}{P} \sum_{p=1}^{P} \frac{\gamma(y \mid \theta^{(p)}) \, p(\theta^{(p)})}{q(\theta^{(p)})} \frac{q_u(u \mid \theta^{(p)}, y)}{\gamma(u \mid \theta^{(p)})}.$$

29 A triply intractable problem Exact approximations revisited
Unbiased weight estimates have been used within importance sampling before: (IS)² (Tran et al., 2013); random weight particle filters (Fearnhead et al., 2010); (SMC)² (Chopin et al., 2011). For each θ, we could use multiple u variables and use the estimate
$$\widehat{\frac{1}{Z(\theta)}} = \frac{1}{M} \sum_{m=1}^{M} \frac{q_u(u^{(m)} \mid \theta, y)}{\gamma(u^{(m)} \mid \theta)}.$$
For u the proposal is pre-determined, but we need to choose $q_u(u \mid \theta, y)$. Møller et al. (2006): one possible choice is $q_u(u \mid \theta, y) = \gamma(u \mid \tilde\theta) / Z(\tilde\theta)$, where $\tilde\theta$ is an ML estimate (or some other appropriate estimate) of θ.

32 A triply intractable problem SAVIS / MAVIS
Using the suggested $q_u$ gives the following importance sampling estimate of $1/Z(\theta)$:
$$\widehat{\frac{1}{Z(\theta)}} = \frac{1}{Z(\tilde\theta)} \frac{1}{M} \sum_{m=1}^{M} \frac{\gamma(u^{(m)} \mid \tilde\theta)}{\gamma(u^{(m)} \mid \theta)}.$$
Or, using annealed importance sampling (Neal, 2001) with the sequence of targets
$$f_k(\cdot \mid \theta, \tilde\theta, y) \propto \gamma_k(\cdot \mid \theta, \tilde\theta) = \gamma(\cdot \mid \tilde\theta)^{(K+1-k)/(K+1)} \, \gamma(\cdot \mid \theta)^{k/(K+1)},$$
we obtain
$$\widehat{\frac{1}{Z(\theta)}} = \frac{1}{Z(\tilde\theta)} \frac{1}{M} \sum_{m=1}^{M} \prod_{k=0}^{K} \frac{\gamma_{k+1}(u_k^{(m)} \mid \theta, \tilde\theta, y)}{\gamma_k(u_k^{(m)} \mid \theta, \tilde\theta, y)}.$$
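The first (SAVIS, single-bridge) estimator can be sketched as below; the MAVIS version inserts the annealing product in place of the single ratio. Function names and the `simulate` callable are ours, and exact simulation from $f(\cdot \mid \theta)$ is assumed, as on the slide.

```python
import numpy as np

def savis_inv_Z(theta, theta_hat, inv_Z_hat, log_gamma, simulate, M=1000, rng=None):
    """SAVIS importance-sampling estimate of 1/Z(theta):
    (1/Z(theta_hat)) * (1/M) * sum_m gamma(u_m | theta_hat) / gamma(u_m | theta),
    with u_m ~ f(. | theta). inv_Z_hat = 1/Z(theta_hat) is assumed known."""
    rng = np.random.default_rng() if rng is None else rng
    log_ratios = []
    for _ in range(M):
        u = simulate(theta, rng)                         # exact draw from f(.|theta)
        log_ratios.append(log_gamma(u, theta_hat) - log_gamma(u, theta))
    log_ratios = np.array(log_ratios)
    m = log_ratios.max()
    return inv_Z_hat * np.exp(m) * np.mean(np.exp(log_ratios - m))
```

Unbiasedness follows since $\mathbb{E}_{u \sim f(\cdot \mid \theta)}[\gamma(u \mid \tilde\theta)/\gamma(u \mid \theta)] = Z(\tilde\theta)/Z(\theta)$; on a Gaussian-shaped $\gamma$ with known $Z$, the estimate can be checked numerically.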

34 A triply intractable problem Toy example: Poisson vs geometric
Consider i.i.d. observations $\{y_i\}_{i=1}^{n}$ of a discrete random variable taking values in $\mathbb{N}$. We find the Bayes factor for the models:
1. $Y \mid \lambda \sim \text{Poisson}(\lambda)$, $\lambda \sim \text{Exp}(1)$, with
$$f_1(\{y_i\}_{i=1}^{n} \mid \lambda) = \prod_i \frac{\lambda^{y_i} \exp(-\lambda)}{y_i!} = \frac{\exp(-n\lambda) \, \lambda^{\sum_i y_i}}{\prod_i y_i!};$$
2. $Y \mid p \sim \text{Geometric}(p)$, $p \sim \text{Unif}(0,1)$, with
$$f_2(\{y_i\}_{i=1}^{n} \mid p) = \prod_i p (1-p)^{y_i} = p^n (1-p)^{\sum_i y_i}.$$
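For this toy example both evidences are available in closed form (integrating the likelihoods above against their priors gives a Gamma integral and a Beta function respectively), which is what makes it a useful test case. A sketch, with function names of our own choosing:

```python
from math import lgamma, exp, log

def log_evidence_poisson(y):
    """p(y|M1) with Y_i ~ Poisson(lam), lam ~ Exp(1):
    the integral gives Gamma(sum(y)+1) / (prod(y_i!) * (n+1)^(sum(y)+1))."""
    n, s = len(y), sum(y)
    return lgamma(s + 1) - sum(lgamma(yi + 1) for yi in y) - (s + 1) * log(n + 1)

def log_evidence_geometric(y):
    """p(y|M2) with Y_i ~ Geometric(p) on {0,1,...}, p ~ Unif(0,1):
    the integral is the Beta function B(n+1, sum(y)+1)."""
    n, s = len(y), sum(y)
    return lgamma(n + 1) + lgamma(s + 1) - lgamma(n + s + 2)

def bayes_factor(y):
    """Exact Bayes factor p(y|M1) / p(y|M2)."""
    return exp(log_evidence_poisson(y) - log_evidence_geometric(y))
```

For instance, a single observation $y_1 = 0$ gives $p(y \mid M_1) = p(y \mid M_2) = 1/2$ and hence a Bayes factor of exactly 1.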

35 A triply intractable problem Results: box plots

36 A triply intractable problem Results: ABC-IS

37 A triply intractable problem Results: SL-IS

38 A triply intractable problem Results: MAVIS

39 A triply intractable problem Application to social networks
Compare the evidence for two alternative exponential random graph models, $p(y \mid \theta) \propto \exp(\theta^T S(y))$: in model 1, S(y) = number of edges; in model 2, S(y) = (number of edges, number of two-stars), so θ is now 2-dimensional. Use the prior $p(\theta) = \mathcal{N}(0, 25I)$, as in Friel (2013).

40 A triply intractable problem Results: social network
Friel (2013) finds that the evidence for model 1 is that for model 2. Using 1000 importance points (with 100 simulations from the likelihood for each point):
ABC: ε = 0.1 gives $\hat p(y \mid M_1)/\hat p(y \mid M_2) \approx 4$; ε = 0.05 gives $\hat p(y \mid M_1)/\hat p(y \mid M_2) \approx 20$, but has only 5 points with non-zero weight!
Synthetic likelihood obtains $\hat p(y \mid M_1)/\hat p(y \mid M_2) \approx 40$.
MAVIS finds $\log \hat p(y \mid M_1) =$ , $\log \hat p(y \mid M_2) =$ , giving $\hat p(y \mid M_1)/\hat p(y \mid M_2) \approx 41$.

45 A triply intractable problem Comparison of methods
ABC/SL vs MAVIS: both require the simulation of auxiliary variables, but in ABC/SL the use of summary statistics dramatically reduces the dimension of the space; MAVIS, on the other hand, only requires the auxiliary variable to look like a good simulation from $f(\cdot \mid \theta)$, not (the different requirement) that it is a good match to y. The standard drawbacks of ABC also remain: the choice of the tolerance ε and of S(·), and the fact that it can only estimate Bayes factors, not the evidence itself. SL vs ABC: SL fails when the Gaussian assumption is not appropriate, but it is surprisingly robust, and there is no need to choose an ε.

48 Inexact approximations An inexact approximation
MAVIS is exact only if: exact sampling from $f(\cdot \mid \theta)$ is possible (this also applies to ABC and synthetic likelihood); and $1/Z(\tilde\theta)$ is known. In practice we use an internal MCMC to simulate from $f(\cdot \mid \theta)$, and estimate $1/Z(\tilde\theta)$ offline, in advance of running the IS. Does the use of an inexact approximation matter? Everitt (2012) shows that the use of an internal MCMC within SAV-MCMC and ABC-MCMC does not result in large errors (adapted from the MCWM proof in Andrieu and Roberts (2009)).

52 Inexact approximations Returning to MCMC
In the previous section we started to examine the use of importance samplers with estimated weights: in practice, the small bias that is introduced matters less than the Monte Carlo variance; empirically, a similar observation applies to SMC samplers. Returning to MCMC, we might wonder about the performance of algorithms with estimated acceptance probabilities: "Noisy Monte Carlo: convergence of Markov chains with approximate transition kernels", Alquier, Friel, Everitt and Boland (2014). Is the MH method with acceptance probability $\alpha(\theta, \theta')$ close to the method using $\hat\alpha(\theta, \theta', x')$, with $x' \sim F(\cdot \mid \theta, \theta')$?

54 Inexact approximations Motivation: noisy exchange algorithm
Use $\gamma(u \mid \theta) / \gamma(u \mid \theta')$, with $u \sim f(\cdot \mid \theta')$, as an importance sampling estimator of $Z(\theta)/Z(\theta')$, giving an acceptance probability of
$$\min\left\{1, \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} \frac{p(\theta')}{p(\theta)} \frac{\gamma(y \mid \theta')}{\gamma(y \mid \theta)} \frac{\gamma(u \mid \theta)}{\gamma(u \mid \theta')}\right\}.$$
Could this be improved by simulating R importance points $u_r^*$, to give
$$\min\left\{1, \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)} \frac{p(\theta')}{p(\theta)} \frac{\gamma(y \mid \theta')}{\gamma(y \mid \theta)} \frac{1}{R} \sum_{r=1}^{R} \frac{\gamma(u_r^* \mid \theta)}{\gamma(u_r^* \mid \theta')}\right\}?$$
However, this no longer gives an exact algorithm: R = 1, exact; $1 < R < \infty$, inexact; $R = \infty$, exact.
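The averaged (noisy) acceptance probability above can be sketched as follows. This is an illustration under our own assumptions: a symmetric proposal q (so the q terms cancel) and callables of our own naming; the slides do not prescribe an implementation.

```python
import numpy as np

def noisy_exchange_log_alpha(y, theta, theta_p, log_gamma, simulate, log_prior,
                             R=10, rng=None):
    """Log acceptance probability of the noisy exchange algorithm: the single
    auxiliary draw is replaced by an average of R importance ratios
    gamma(u_r|theta) / gamma(u_r|theta'), with u_r ~ f(. | theta')."""
    rng = np.random.default_rng() if rng is None else rng
    us = [simulate(theta_p, rng) for _ in range(R)]
    log_ratios = np.array([log_gamma(u, theta) - log_gamma(u, theta_p) for u in us])
    m = log_ratios.max()
    log_avg = m + np.log(np.mean(np.exp(log_ratios - m)))   # log of (1/R) sum of ratios
    return min(0.0, log_prior(theta_p) - log_prior(theta)
               + log_gamma(y, theta_p) - log_gamma(y, theta) + log_avg)
```

As a sanity check, proposing $\theta' = \theta$ makes every importance ratio equal to one, so the log acceptance probability is exactly zero regardless of R.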

56 Inexact approximations Noisy MCMC
MCMC involves simulating a Markov chain $(\theta_n)_{n \in \mathbb{N}}$ with transition kernel P such that π is invariant under P: $\pi P = \pi$. In some situations there is a natural kernel P, s.t. $\pi P = \pi$, but from which we cannot draw $\theta_{n+1} \sim P(\theta_n, \cdot)$ for a fixed $\theta_n$. A natural idea is to replace P by an approximation $\hat P$. Ideally $\hat P$ is close to P, but generally $\pi \hat P \neq \pi$. This leads to the obvious question: can we say something about how close the Markov chain with transition kernel $\hat P$ is to the one resulting from P? E.g., is it possible to upper bound $\|\delta_{\theta_0} \hat P^n - \pi\|$? It turns out that a useful answer is given by the study of the stability of Markov chains.

58 Inexact approximations Noisy MCMC
Theorem (Mitrophanov (2005), Corollary 3.1). Assume (H1): the Markov chain with transition kernel P is uniformly ergodic,
$$\sup_{\theta_0} \|\delta_{\theta_0} P^n - \pi\| \leq C \rho^n$$
for some $C < \infty$ and $\rho < 1$. Then, for any $n \in \mathbb{N}$ and any starting point $\theta_0$,
$$\|\delta_{\theta_0} P^n - \delta_{\theta_0} \hat P^n\| \leq \left(\lambda + \frac{C \rho^\lambda}{1 - \rho}\right) \|P - \hat P\|,$$
where $\lambda = \lceil \log(1/C) / \log(\rho) \rceil$.

59 Inexact approximations Noisy MCMC
Corollary. Assume that (H1) holds (the Markov chain with transition kernel P is uniformly ergodic), and (H2):
$$\mathbb{E}_{x' \sim F_{\theta'}} \left|\hat\alpha(\theta, \theta', x') - \alpha(\theta, \theta')\right| \leq \delta(\theta, \theta'). \quad (1)$$
Then, for any $n \in \mathbb{N}$ and any starting point $\theta_0$,
$$\|\delta_{\theta_0} P^n - \delta_{\theta_0} \hat P^n\| \leq \left(\lambda + \frac{C \rho^\lambda}{1 - \rho}\right) 2 \sup_\theta \int d\theta' \, q(\theta' \mid \theta) \, \delta(\theta, \theta'),$$
where $\lambda = \lceil \log(1/C) / \log(\rho) \rceil$.

60 Inexact approximations Noisy MCMC
Note: when the upper bound in (1) is itself bounded,
$$\mathbb{E}_{x' \sim F_{\theta'}} \left|\hat\alpha(\theta, \theta', x') - \alpha(\theta, \theta')\right| \leq \delta(\theta, \theta') \leq \delta < \infty,$$
it follows that
$$\|\delta_{\theta_0} P^n - \delta_{\theta_0} \hat P^n\| \leq \delta \left(\lambda + \frac{C \rho^\lambda}{1 - \rho}\right).$$
Obviously, we expect that $\hat\alpha$ is chosen in such a way that $\delta \ll 1$, and so in this case $\|\delta_{\theta_0} P^n - \delta_{\theta_0} \hat P^n\| \ll 1$ as a consequence.

61 Inexact approximations Convergence of noisy exchange
Lemma. Here we show that the noisy exchange algorithm falls into our theoretical framework: $\hat\alpha$ satisfies (H2) (Lemma 4.2) with
$$\delta(\theta, \theta') = \frac{1}{\sqrt{N}} \frac{q(\theta \mid \theta') \, \pi(\theta') \, \gamma(y \mid \theta')}{q(\theta' \mid \theta) \, \pi(\theta) \, \gamma(y \mid \theta)} \sqrt{\mathrm{Var}_{y' \sim f(\cdot \mid \theta')}\left(\frac{\gamma(y' \mid \theta)}{\gamma(y' \mid \theta')}\right)}.$$

62 Inexact approximations Convergence of noisy exchange
Moreover, assuming that the space Θ is bounded, we can show that
$$\delta(\theta, \theta') \leq \frac{c_h^2 \, c_\pi^2 \, K^4}{\sqrt{N}},$$
and therefore
$$\sup_{\theta_0 \in \Theta} \|\delta_{\theta_0} P^n - \delta_{\theta_0} \hat P^n\| \leq \frac{C'}{\sqrt{N}},$$
where $C' = C'(c_\pi, c_h, K)$ is explicitly known.

63 Inexact approximations Noisy Langevin algorithms
Langevin algorithms (e.g. Welling and Teh, 2011) use the update
$$\theta_{n+1} = \theta_n + \frac{\Sigma}{2} \nabla \log \pi(\theta_n \mid y) + \eta, \qquad \eta \sim \mathcal{N}(0, \Sigma).$$
In practice, it is often the case that $\nabla \log \pi(\theta_n \mid y)$ cannot be computed. Here again, a natural idea is to replace it by an approximation or an estimate $\hat\nabla_{y'} \log \pi(\theta_n \mid y)$. Noisy Langevin algorithms use
$$\theta_{n+1} = \theta_n + \frac{\Sigma}{2} \hat\nabla_{y'} \log \pi(\theta_n \mid y) + \eta, \qquad \eta \sim \mathcal{N}(0, \Sigma), \quad y' \sim F(\cdot \mid \theta_n).$$

64 Inexact approximations A noisy Langevin algorithm for Gibbs random fields
$$\log \pi(\theta \mid y) = \log \pi(\theta) + \log f(y \mid \theta) - \log \int \pi(t) f(y \mid t) \, dt.$$
Therefore
$$\nabla \log \pi(\theta \mid y) = \nabla \log \pi(\theta) + \nabla [\theta^T s(y)] - \nabla \log Z(\theta) = \nabla \log \pi(\theta) + s(y) - \mathbb{E}_{y' \sim f(\cdot \mid \theta)}[s(y')].$$
In practice, $\mathbb{E}_{y' \sim f(\cdot \mid \theta)}[s(y')]$ is unavailable. However, it is possible to estimate it via Monte Carlo. Let $y' = (y'_1, \ldots, y'_N)$ be N i.i.d. variables drawn from $f(\cdot \mid \theta)$; we define
$$\hat\nabla_{y'} \log \pi(\theta \mid y) = \nabla \log \pi(\theta) + s(y) - \frac{1}{N} \sum_{i=1}^{N} s(y'_i).$$
Here, we can also give theoretical guarantees for this algorithm.
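The noisy gradient and one (unadjusted) Langevin step can be sketched as follows, assuming a scalar parameter and scalar step size Σ for simplicity; the function names and the `simulate_stat` callable are ours.

```python
import numpy as np

def noisy_grad_log_post(theta, s_y, grad_log_prior, simulate_stat, N=50, rng=None):
    """Noisy gradient for a Gibbs random field f(y|theta) = exp(theta s(y))/Z(theta):
    grad log pi(theta|y) = grad log pi(theta) + s(y) - E[s(Y')], with the
    expectation replaced by an average over N draws Y'_i ~ f(.|theta)."""
    rng = np.random.default_rng() if rng is None else rng
    s_bar = np.mean([simulate_stat(theta, rng) for _ in range(N)], axis=0)
    return grad_log_prior(theta) + s_y - s_bar

def noisy_langevin_step(theta, s_y, grad_log_prior, simulate_stat, Sigma, rng):
    """One unadjusted noisy Langevin update: theta + (Sigma/2) grad-hat + eta."""
    g = noisy_grad_log_post(theta, s_y, grad_log_prior, simulate_stat, rng=rng)
    eta = rng.normal(0.0, np.sqrt(Sigma))   # eta ~ N(0, Sigma)
    return theta + 0.5 * Sigma * g + eta
```

For an exponential-family toy model with $s(y) = y$, the Monte Carlo term converges to $\mathbb{E}[s(Y')]$ at the usual $1/\sqrt{N}$ rate, which is exactly the $\delta(\theta,\theta') \propto 1/\sqrt{N}$ behaviour the theory describes.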

66 Inexact approximations Metropolis-adjusted Langevin exchange algorithm
Of course, the noisy Langevin diffusion can be used as a proposal and corrected within an exchange algorithm. At each iteration:
draw $y' \sim f(\cdot \mid \theta_n)$ and calculate $\hat\nabla_{y'} \log \pi(\theta_n \mid y) = \nabla \log \pi(\theta_n) + s(y) - \frac{1}{N} \sum_{i=1}^{N} s(y'_i)$;
propose $\theta' = \theta_n + \frac{\Sigma}{2} \hat\nabla_{y'} \log \pi(\theta_n \mid y) + \eta_n$, where the $\eta_n$ are i.i.d. $\mathcal{N}(0, \Sigma)$;
draw an auxiliary variable $u \sim f(\cdot \mid \theta')$ and accept with probability
$$\alpha(\theta_n, \theta') = \min\left\{1, \frac{\gamma(y \mid \theta') \, \pi(\theta') \, q(\theta_n \mid \theta') \, \gamma(u \mid \theta_n)}{\gamma(y \mid \theta_n) \, \pi(\theta_n) \, q(\theta' \mid \theta_n) \, \gamma(u \mid \theta')}\right\}.$$
This can also be extended to a noisy version (as we did for the exchange algorithm).

67 Inexact approximations Simulation study
20 datasets were simulated from a first-order Ising model defined on a lattice, with a single interaction parameter θ = 0.4. The normalising constant z(θ) can be calculated exactly for a fine grid of values $\{\theta_i : i = 1, \ldots, N\}$ (Friel and Rue, 2007), which can be used to estimate
$$\hat\pi(y) = \sum_{i=2}^{N} \frac{\theta_i - \theta_{i-1}}{2} \left( \frac{q_{\theta_i}(y)}{z(\theta_i)} \pi(\theta_i) + \frac{q_{\theta_{i-1}}(y)}{z(\theta_{i-1})} \pi(\theta_{i-1}) \right),$$
which in turn can be used to estimate the posterior density at each grid point:
$$\hat\pi(\theta_i \mid y) = \frac{q_{\theta_i}(y) \, \pi(\theta_i)}{z(\theta_i) \, \hat\pi(y)}, \qquad i = 1, \ldots, N.$$
Here we used a fine grid of 8,000 points in the interval [0, 0.8].
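The trapezoidal evidence estimate above, given precomputed grid ordinates, is a one-liner. A sketch with our own function name, assuming $z(\theta_i)$ has already been evaluated (as is possible for small Ising models):

```python
import numpy as np

def grid_evidence(theta_grid, log_q_y, log_z, log_prior):
    """Trapezoidal estimate of p(y) from ordinates
    h(theta_i) = q_theta_i(y) / z(theta_i) * pi(theta_i) on a fine grid."""
    h = np.exp(np.asarray(log_q_y) - np.asarray(log_z) + np.asarray(log_prior))
    dx = np.diff(np.asarray(theta_grid))
    return float(np.sum(0.5 * dx * (h[1:] + h[:-1])))   # sum of trapezium areas
```

On a conjugate Gaussian model, where the exact evidence is known, a grid of a few thousand points reproduces it to several decimal places.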

68 Inexact approximations Results
[Figure: box plots of the bias for the Exchange, Noisy Exchange, Noisy Langevin, MALA Exchange and Noisy MALA algorithms.]

69 Inexact approximations Conclusions
A new approach (MAVIS) to evidence estimation for Gibbs random fields: it makes use of an inexact approximation, and has some characteristics to recommend it over ABC and SL in general. We also examined in more detail the use of noisy MCMC algorithms: we have seen several examples of where this idea might be useful; an improved Monte Carlo variance may be more important than the biases that are introduced; and it is quite a general framework (e.g. SL can also be seen as a special case). Similar ideas can be used in SMC.

70 Inexact approximations Acknowledgements Noisy MCMC: Pierre Alquier, Nial Friel and Aiden Boland (UCD). Evidence estimation and synthetic likelihood: Nial Friel (UCD), Adam Johansen (Warwick), Melina Evdemon-Hogan and Ellen Rowing (Reading).


More information

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration

Bayes Factors, posterior predictives, short intro to RJMCMC. Thermodynamic Integration Bayes Factors, posterior predictives, short intro to RJMCMC Thermodynamic Integration Dave Campbell 2016 Bayesian Statistical Inference P(θ Y ) P(Y θ)π(θ) Once you have posterior samples you can compute

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Controlled sequential Monte Carlo

Controlled sequential Monte Carlo Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation

More information

Kernel Sequential Monte Carlo

Kernel Sequential Monte Carlo Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section

More information

Adaptive HMC via the Infinite Exponential Family

Adaptive HMC via the Infinite Exponential Family Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family

More information

Bayesian Indirect Inference using a Parametric Auxiliary Model

Bayesian Indirect Inference using a Parametric Auxiliary Model Bayesian Indirect Inference using a Parametric Auxiliary Model Dr Chris Drovandi Queensland University of Technology, Australia c.drovandi@qut.edu.au Collaborators: Tony Pettitt and Anthony Lee February

More information

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling

Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling 1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]

More information

Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods

Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods Jonas Hallgren 1 1 Department of Mathematics KTH Royal Institute of Technology Stockholm, Sweden BFS 2012 June

More information

Riemann Manifold Methods in Bayesian Statistics

Riemann Manifold Methods in Bayesian Statistics Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes

More information

MONTE CARLO METHODS. Hedibert Freitas Lopes

MONTE CARLO METHODS. Hedibert Freitas Lopes MONTE CARLO METHODS Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods

Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods Lampros Bouranis *, Nial Friel, Florian Maire School of Mathematics and Statistics & Insight Centre for Data

More information

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling

CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy

More information

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference

Bayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference 1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL

PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL Xuebin Zheng Supervisor: Associate Professor Josef Dick Co-Supervisor: Dr. David Gunawan School of Mathematics

More information

Zig-Zag Monte Carlo. Delft University of Technology. Joris Bierkens February 7, 2017

Zig-Zag Monte Carlo. Delft University of Technology. Joris Bierkens February 7, 2017 Zig-Zag Monte Carlo Delft University of Technology Joris Bierkens February 7, 2017 Joris Bierkens (TU Delft) Zig-Zag Monte Carlo February 7, 2017 1 / 33 Acknowledgements Collaborators Andrew Duncan Paul

More information

Markov Chain Monte Carlo Methods

Markov Chain Monte Carlo Methods Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)

More information

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model

Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced

More information

Monte Carlo Inference Methods

Monte Carlo Inference Methods Monte Carlo Inference Methods Iain Murray University of Edinburgh http://iainmurray.net Monte Carlo and Insomnia Enrico Fermi (1901 1954) took great delight in astonishing his colleagues with his remarkably

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Tutorial on ABC Algorithms

Tutorial on ABC Algorithms Tutorial on ABC Algorithms Dr Chris Drovandi Queensland University of Technology, Australia c.drovandi@qut.edu.au July 3, 2014 Notation Model parameter θ with prior π(θ) Likelihood is f(ý θ) with observed

More information

Approximate Inference using MCMC

Approximate Inference using MCMC Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov

More information

Lecture 7 and 8: Markov Chain Monte Carlo

Lecture 7 and 8: Markov Chain Monte Carlo Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani

More information

Pseudo-marginal MCMC methods for inference in latent variable models

Pseudo-marginal MCMC methods for inference in latent variable models Pseudo-marginal MCMC methods for inference in latent variable models Arnaud Doucet Department of Statistics, Oxford University Joint work with George Deligiannidis (Oxford) & Mike Pitt (Kings) MCQMC, 19/08/2016

More information

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters

Exercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Kernel Adaptive Metropolis-Hastings

Kernel Adaptive Metropolis-Hastings Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013

Eco517 Fall 2013 C. Sims MCMC. October 8, 2013 Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained

More information

An introduction to Sequential Monte Carlo

An introduction to Sequential Monte Carlo An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference

Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of

More information

Approximate Bayesian Computation: a simulation based approach to inference

Approximate Bayesian Computation: a simulation based approach to inference Approximate Bayesian Computation: a simulation based approach to inference Richard Wilkinson Simon Tavaré 2 Department of Probability and Statistics University of Sheffield 2 Department of Applied Mathematics

More information

SAMPLING ALGORITHMS. In general. Inference in Bayesian models

SAMPLING ALGORITHMS. In general. Inference in Bayesian models SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be

More information

Variational Scoring of Graphical Model Structures

Variational Scoring of Graphical Model Structures Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational

More information

Monte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants

Monte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants Monte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants Faming Liang Texas A& University Sooyoung Cheon Korea University Spatial Model Introduction

More information

arxiv: v1 [stat.me] 30 Sep 2009

arxiv: v1 [stat.me] 30 Sep 2009 Model choice versus model criticism arxiv:0909.5673v1 [stat.me] 30 Sep 2009 Christian P. Robert 1,2, Kerrie Mengersen 3, and Carla Chen 3 1 Université Paris Dauphine, 2 CREST-INSEE, Paris, France, and

More information

Markov chain Monte Carlo

Markov chain Monte Carlo 1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory

Pseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory Pseudo-arginal Metropolis-Hastings: a siple explanation and (partial) review of theory Chris Sherlock Motivation Iagine a stochastic process V which arises fro soe distribution with density p(v θ ). Iagine

More information

Bayesian estimation of complex networks and dynamic choice in the music industry

Bayesian estimation of complex networks and dynamic choice in the music industry Bayesian estimation of complex networks and dynamic choice in the music industry Stefano Nasini Víctor Martínez-de-Albéniz Dept. of Production, Technology and Operations Management, IESE Business School,

More information

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals

More information

The Poisson transform for unnormalised statistical models. Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB)

The Poisson transform for unnormalised statistical models. Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB) The Poisson transform for unnormalised statistical models Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB) Part I Unnormalised statistical models Unnormalised statistical models

More information

Bridge estimation of the probability density at a point. July 2000, revised September 2003

Bridge estimation of the probability density at a point. July 2000, revised September 2003 Bridge estimation of the probability density at a point Antonietta Mira Department of Economics University of Insubria Via Ravasi 2 21100 Varese, Italy antonietta.mira@uninsubria.it Geoff Nicholls Department

More information

Markov Chain Monte Carlo

Markov Chain Monte Carlo 1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior

More information

Normalising constants and maximum likelihood inference

Normalising constants and maximum likelihood inference Normalising constants and maximum likelihood inference Jakob G. Rasmussen Department of Mathematics Aalborg University Denmark March 9, 2011 1/14 Today Normalising constants Approximation of normalising

More information

Delayed Rejection Algorithm to Estimate Bayesian Social Networks

Delayed Rejection Algorithm to Estimate Bayesian Social Networks Dublin Institute of Technology ARROW@DIT Articles School of Mathematics 2014 Delayed Rejection Algorithm to Estimate Bayesian Social Networks Alberto Caimo Dublin Institute of Technology, alberto.caimo@dit.ie

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Learning the hyper-parameters. Luca Martino

Learning the hyper-parameters. Luca Martino Learning the hyper-parameters Luca Martino 2017 2017 1 / 28 Parameters and hyper-parameters 1. All the described methods depend on some choice of hyper-parameters... 2. For instance, do you recall λ (bandwidth

More information

On some properties of Markov chain Monte Carlo simulation methods based on the particle filter

On some properties of Markov chain Monte Carlo simulation methods based on the particle filter On some properties of Markov chain Monte Carlo simulation methods based on the particle filter Michael K. Pitt Economics Department University of Warwick m.pitt@warwick.ac.uk Ralph S. Silva School of Economics

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

The zig-zag and super-efficient sampling for Bayesian analysis of big data

The zig-zag and super-efficient sampling for Bayesian analysis of big data The zig-zag and super-efficient sampling for Bayesian analysis of big data LMS-CRiSM Summer School on Computational Statistics 15th July 2018 Gareth Roberts, University of Warwick Joint work with Joris

More information

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007

Sequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007 Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA)

Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA) Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA) arxiv:1611.01450v1 [stat.co] 4 Nov 2016 Aliaksandr Hubin Department of Mathematics, University of Oslo and Geir Storvik

More information

Bayesian inference for multivariate skew-normal and skew-t distributions

Bayesian inference for multivariate skew-normal and skew-t distributions Bayesian inference for multivariate skew-normal and skew-t distributions Brunero Liseo Sapienza Università di Roma Banff, May 2013 Outline Joint research with Antonio Parisi (Roma Tor Vergata) 1. Inferential

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet

Stat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof

More information

Markov Chain Monte Carlo Lecture 4

Markov Chain Monte Carlo Lecture 4 The local-trap problem refers to that in simulations of a complex system whose energy landscape is rugged, the sampler gets trapped in a local energy minimum indefinitely, rendering the simulation ineffective.

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

STAT 425: Introduction to Bayesian Analysis

STAT 425: Introduction to Bayesian Analysis STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Manifold Monte Carlo Methods

Manifold Monte Carlo Methods Manifold Monte Carlo Methods Mark Girolami Department of Statistical Science University College London Joint work with Ben Calderhead Research Section Ordinary Meeting The Royal Statistical Society October

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:

More information

An introduction to Approximate Bayesian Computation methods

An introduction to Approximate Bayesian Computation methods An introduction to Approximate Bayesian Computation methods M.E. Castellanos maria.castellanos@urjc.es (from several works with S. Cabras, E. Ruli and O. Ratmann) Valencia, January 28, 2015 Valencia Bayesian

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Nested Sampling. Brendon J. Brewer. brewer/ Department of Statistics The University of Auckland

Nested Sampling. Brendon J. Brewer.   brewer/ Department of Statistics The University of Auckland Department of Statistics The University of Auckland https://www.stat.auckland.ac.nz/ brewer/ is a Monte Carlo method (not necessarily MCMC) that was introduced by John Skilling in 2004. It is very popular

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1 Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J. Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox fox@physics.otago.ac.nz Richard A. Norton, J. Andrés Christen Topics... Backstory (?) Sampling in linear-gaussian hierarchical

More information

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis

Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9

Metropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9 Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods

More information

Improving power posterior estimation of statistical evidence

Improving power posterior estimation of statistical evidence Improving power posterior estimation of statistical evidence Nial Friel, Merrilee Hurn and Jason Wyse Department of Mathematical Sciences, University of Bath, UK 10 June 2013 Bayesian Model Choice Possible

More information

Introduction to Markov Chain Monte Carlo & Gibbs Sampling

Introduction to Markov Chain Monte Carlo & Gibbs Sampling Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu

More information

ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors. RicardoS.Ehlers

ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors. RicardoS.Ehlers ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors RicardoS.Ehlers Laboratório de Estatística e Geoinformação- UFPR http://leg.ufpr.br/ ehlers ehlers@leg.ufpr.br II Workshop on Statistical

More information

Fully Bayesian Analysis of Calibration Uncertainty In High Energy Spectral Analysis

Fully Bayesian Analysis of Calibration Uncertainty In High Energy Spectral Analysis In High Energy Spectral Analysis Department of Statistics, UCI February 26, 2013 Model Building Principle Component Analysis Three Inferencial Models Simulation Quasar Analysis Doubly-intractable Distribution

More information

arxiv: v5 [stat.co] 10 Apr 2018

arxiv: v5 [stat.co] 10 Apr 2018 THE BLOCK-POISSON ESTIMATOR FOR OPTIMALLY TUNED EXACT SUBSAMPLING MCMC MATIAS QUIROZ 1,2, MINH-NGOC TRAN 3, MATTIAS VILLANI 4, ROBERT KOHN 1 AND KHUE-DUNG DANG 1 arxiv:1603.08232v5 [stat.co] 10 Apr 2018

More information

arxiv: v1 [stat.co] 1 Jun 2015

arxiv: v1 [stat.co] 1 Jun 2015 arxiv:1506.00570v1 [stat.co] 1 Jun 2015 Towards automatic calibration of the number of state particles within the SMC 2 algorithm N. Chopin J. Ridgway M. Gerber O. Papaspiliopoulos CREST-ENSAE, Malakoff,

More information

MARKOV CHAIN MONTE CARLO

MARKOV CHAIN MONTE CARLO MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with

More information

CS281A/Stat241A Lecture 22

CS281A/Stat241A Lecture 22 CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution

More information

GAUSSIAN PROCESS REGRESSION

GAUSSIAN PROCESS REGRESSION GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The

More information