Inexact approximations for doubly and triply intractable problems
- Cuthbert Welch
1 Inexact approximations for doubly and triply intractable problems March 27th, 2014
2 Markov random fields Interacting objects Markov random fields (MRFs) are used for modelling (often large numbers of) interacting objects, usually with symmetric interactions. Used widely in statistics, physics and computer science, e.g. image analysis, ferromagnetism, geostatistics, point processes and social networks.
3 Markov random fields Image analysis The log expression of 72 genes on a particular chromosome over 46 hours (Friel et al. 2009).
4 Markov random fields Pairwise Markov random fields
5 Markov random fields Intractable normalising constants Pairwise MRFs correspond to the factorisation
f(y | θ) ∝ γ(y | θ) = ∏_{(i,j) ∈ Nei(y)} φ(y_i, y_j | θ).
We also need to specify the normalising constant
Z(θ) = ∫_y ∏_{(i,j) ∈ Nei(y)} φ(y_i, y_j | θ) dy.
In general we are interested in models that take the form f(y | θ) = γ(y | θ) / Z(θ). Gibbs random fields: f(y | θ) = exp(θ^T S(y)) / Z(θ).
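As a concrete sketch of the objects on this slide, the following toy code (an illustrative 1-d Ising-style model of my own construction, not from the talk) builds γ(y | θ) = exp(θ S(y)) and computes Z(θ) by brute-force enumeration, which is only feasible for a handful of sites; in realistic models this sum is exactly the intractable quantity.

```python
import itertools
import math

def suff_stat(y):
    # S(y): number of neighbouring pairs (i, i+1) that agree
    return sum(1 for a, b in zip(y, y[1:]) if a == b)

def gamma_unnorm(y, theta):
    # unnormalised density gamma(y | theta) = exp(theta * S(y))
    return math.exp(theta * suff_stat(y))

def normalising_constant(n, theta):
    # Z(theta) by summing gamma over all 2^n configurations; only
    # feasible for tiny n, which is why Z(theta) is intractable in practice
    return sum(gamma_unnorm(y, theta)
               for y in itertools.product([-1, 1], repeat=n))

n, theta = 4, 0.4
Z = normalising_constant(n, theta)
y = (1, 1, 1, -1)
density = gamma_unnorm(y, theta) / Z  # f(y | theta) = gamma(y | theta) / Z(theta)
```

Doubling the lattice side in a 2-d version squares the number of terms in Z, which is the source of the "doubly intractable" label later in the talk.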
6 A doubly intractable problem Doubly intractable Suppose we want to estimate parameters θ after observing Y = y. Use Bayesian inference to find π(θ | y) ∝ f(y | θ) p(θ). Could use MCMC, but the acceptance probability in MH is
min{1, [q(θ | θ') p(θ') γ(y | θ')] / [q(θ' | θ) p(θ) γ(y | θ)] × Z(θ) / Z(θ')},
and the ratio Z(θ) / Z(θ') is itself intractable: hence "doubly intractable".
8 A doubly intractable problem ABC-MCMC Approximate an intractable likelihood at θ with
(1/R) Σ_{r=1}^R π_ε(S(x_r) | S(y)),
where the x_r ~ f(· | θ) are R simulations from f (originally in Ratmann et al. (2009)). Often R = 1 and π_ε(· | S(y)) = U(·; (S(y) − ε, S(y) + ε)). Essentially a nonparametric kernel estimator of the conditional distribution of the statistics given θ, based on simulations from f. ABC-MCMC is an MCMC algorithm that targets this approximate posterior.
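The ABC likelihood estimate above can be sketched as follows; the simulator (100 coin flips) and summary (the sample mean) are illustrative stand-ins of my own, not models from the talk.

```python
import random

def abc_likelihood(theta, s_obs, simulate, summary, eps, R, rng):
    # (1/R) * sum_r pi_eps(S(x_r) | S(y)), with a uniform kernel
    # on (S(y) - eps, S(y) + eps) and x_r ~ f(. | theta)
    hits = 0
    for _ in range(R):
        x = simulate(theta, rng)
        if abs(summary(x) - s_obs) < eps:
            hits += 1
    # the uniform kernel has density 1/(2*eps) inside the window, 0 outside
    return hits / (R * 2 * eps)

# toy simulator: n coin flips with success probability theta; summary = mean
def simulate(theta, rng):
    return [1 if rng.random() < theta else 0 for _ in range(100)]

def summary(x):
    return sum(x) / len(x)

rng = random.Random(1)
est = abc_likelihood(0.3, s_obs=0.3, simulate=simulate,
                     summary=summary, eps=0.05, R=200, rng=rng)
```

Shrinking eps makes the approximation more accurate but the estimate noisier (fewer hits), which is the tolerance trade-off discussed later.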
10 A doubly intractable problem ABC on ERGMs [Figure: true posterior vs the ABC approximation]
11 A doubly intractable problem Synthetic likelihood An alternative approximation proposed in Wood (2010). Again take R simulations from f, x_r ~ f(· | θ), and take the summary statistics of each. But instead use a multivariate normal approximation to the distribution of the summary statistics given θ:
L(S(y) | θ) = N(S(y); µ_θ, Σ_θ),
where
µ_θ = (1/R) Σ_{r=1}^R S(x_r),  Σ_θ = s s^T / (R − 1),
with s = (S(x_1) − µ_θ, ..., S(x_R) − µ_θ).
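A minimal sketch of the synthetic-likelihood evaluation above, assuming the simulated summaries have already been collected into an (R, d) array; the Gaussian test data below are an illustrative assumption.

```python
import numpy as np

def synthetic_log_likelihood(s_obs, sims):
    # log N(s_obs; mu_theta, Sigma_theta) from an (R, d) array of
    # simulated summary statistics
    R, d = sims.shape
    mu = sims.mean(axis=0)
    centred = sims - mu                    # the vector s in the slides
    Sigma = centred.T @ centred / (R - 1)  # Sigma_theta = s s^T / (R - 1)
    diff = s_obs - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(0)
sims = rng.normal(loc=[0.0, 1.0], scale=0.5, size=(500, 2))
ll = synthetic_log_likelihood(np.array([0.0, 1.0]), sims)
```

Observed summaries near the centre of the simulated cloud score higher than distant ones, which is all the approximation uses: no tolerance ε, only the normality assumption.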
12 A doubly intractable problem The single auxiliary variable (SAV) method Møller et al. (2006) augment the target distribution with an extra variable u and use
π(θ, u | y) ∝ q_u(u | θ, y) f(y | θ) p(θ),
where q_u is some (normalised) arbitrary distribution and u is on the same space as y. As the MH proposal in (θ, u)-space they use (θ', u') ~ f(u' | θ') q(θ' | θ). This gives an acceptance probability of
min{1, [q(θ | θ') p(θ') γ(y | θ') q_u(u' | θ', y) γ(u | θ)] / [q(θ' | θ) p(θ) γ(y | θ) q_u(u | θ, y) γ(u' | θ')]}.
13 A doubly intractable problem Exact approximations Note that q_u(u' | θ', y) / γ(u' | θ') is an unbiased importance sampling estimator of 1/Z(θ'), so the algorithm still targets the correct distribution! First seen in the pseudo-marginal methods of Beaumont (2003) and Andrieu and Roberts (2009). Relies on being able to simulate exactly from f(· | θ'), which is usually not possible or computationally expensive. Girolami et al. (2013) introduce an approach that does not require exact simulation ("Russian Roulette").
16 A doubly intractable problem The exchange algorithm Murray et al. (2006) propose instead to use γ(u | θ) / γ(u | θ') as an importance sampling estimator of Z(θ) / Z(θ'). This gives an acceptance probability of
min{1, [q(θ | θ') p(θ') γ(y | θ') γ(u | θ)] / [q(θ' | θ) p(θ) γ(y | θ) γ(u | θ')]}.
An unbiased estimator of the acceptance probability rather than the target, so no longer fits into the exact-approximation framework; however, this method still has the correct target: something of a special case. Simpler, and often more efficient, than SAV.
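One iteration of the exchange algorithm can be sketched on a toy model where exact simulation is possible: y ~ Poisson(θ) with γ(y | θ) = θ^y / y!, treating Z(θ) = e^θ as if unknown. The Exp(1) prior, proposal scale and data here are my illustrative choices, not the talk's.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

def log_gamma_unnorm(y, theta):
    # log gamma(y | theta) for gamma(y | theta) = theta^y / y!
    return y * math.log(theta) - math.lgamma(y + 1)

def exchange_step(theta, y, rng, prop_sd=0.5):
    theta_prop = theta + prop_sd * rng.normal()
    if theta_prop <= 0:
        return theta  # outside prior support (0, inf): reject immediately
    u = rng.poisson(theta_prop)  # u ~ f(. | theta'): an exact draw
    # log of p(th')gamma(y|th')gamma(u|th) / (p(th)gamma(y|th)gamma(u|th'))
    # with an Exp(1) prior and a symmetric random-walk proposal q
    log_ratio = (-(theta_prop - theta)
                 + log_gamma_unnorm(y, theta_prop) - log_gamma_unnorm(y, theta)
                 + log_gamma_unnorm(u, theta) - log_gamma_unnorm(u, theta_prop))
    return theta_prop if np.log(rng.uniform()) < log_ratio else theta

y_obs = 4
chain = [2.0]
for _ in range(2000):
    chain.append(exchange_step(chain[-1], y_obs, rng))
```

Note the key move: no Z appears anywhere, because γ(u | θ)/γ(u | θ') replaces the intractable Z(θ)/Z(θ') ratio. On this conjugate toy the true posterior is Gamma(5, 2), so the chain should concentrate around 2.5.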
17 A triply intractable problem Estimating the marginal likelihood The marginal likelihood (also known as the evidence) is
p(y) = ∫_θ p(θ) f(y | θ) dθ.
Used in Bayesian model comparison, p(M | y) ∝ p(M) p(y | M), most commonly seen in the Bayes factor p(y | M_1) / p(y | M_2) for comparing models. All commonly used methods require f(y | θ) to be tractable in θ, and the evidence usually can't be estimated from MCMC output: a triply intractable problem (Friel, 2013).
18 A triply intractable problem Chib's method (via population exchange) Friel (2013) details an approach that uses Chib's method. For any θ̃:
p(y) = f(y | θ̃) π(θ̃) / π(θ̃ | y) = γ(y | θ̃) π(θ̃) / [Z(θ̃) π(θ̃ | y)].
A population variant of the exchange algorithm is used to simulate points from π(θ | y). This approach gives an estimate of Z(θ) for each θ drawn from π(θ | y). Then Chib's method is used, averaging the identity above over a number of high probability draws from π(θ | y), using: the terms in the numerator directly; the estimate of Z(θ̃) from the population exchange algorithm; a kernel density estimate of π(θ̃ | y). Relies on θ being low dimensional.
21 A triply intractable problem Using importance sampling (IS) Importance sampling returns a weighted sample {(θ^(p), w^(p)): 1 ≤ p ≤ P} from π(θ | y).
For p = 1 : P
  simulate θ^(p) ~ q(·);
  weight w^(p) = p(θ^(p)) f(y | θ^(p)) / q(θ^(p)).
Then p̂(y) = (1/P) Σ_{p=1}^P w^(p).
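The basic IS evidence estimator above, checked on a conjugate toy model of my own choosing (y ~ N(θ, 1), θ ~ N(0, 1)) where the truth p(y) = N(y; 0, 2) is available in closed form; the proposal q = N(0, 2) is likewise an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
y = 0.7
P = 200_000

def norm_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

theta = rng.normal(0.0, np.sqrt(2.0), size=P)  # theta^(p) ~ q(.)
# w^(p) = p(theta^(p)) f(y | theta^(p)) / q(theta^(p))
w = norm_pdf(theta, 0.0, 1.0) * norm_pdf(y, theta, 1.0) / norm_pdf(theta, 0.0, 2.0)
p_hat = w.mean()  # (1/P) sum_p w^(p)
p_true = norm_pdf(y, 0.0, 2.0)
```

The same template is reused below: only the likelihood factor in the weight changes (ABC kernel, synthetic likelihood, or the SAV construction).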
22 A triply intractable problem Using ABC-IS Didelot, Everitt, Johansen and Lawson (2011) investigate the use of the ABC approximation when using IS for marginal likelihoods. The weights are
w^(p) = p(θ^(p)) (1/R) Σ_{r=1}^R π_ε(S(x_r^(p)) | S(y)) / q(θ^(p)),
where {x_r^(p)}_{r=1}^R ~ f(· | θ^(p)). This method estimates p(S(y)), which is not in general equal to p(y). Didelot et al. (2011), Grelaud et al. (2009), Robert et al. (2011) and Marin et al. (2014) discuss the choice of summary statistics.
23 A triply intractable problem Exponential family models Didelot et al. (2011): when comparing two exponential family models, if S_1(y) is sufficient for the parameters in model 1 and S_2(y) is sufficient for the parameters in model 2, then using the vector S(y) = (S_1(y), S_2(y)) for both models gives
p(y | M_1) / p(y | M_2) = p(S(y) | M_1) / p(S(y) | M_2).
Marin et al. (2014) has much more general guidance.
24 A triply intractable problem Synthetic likelihood IS We could also use the SL approximation within IS. The weight update is then
w^(p) = p(θ^(p)) N(S(y); µ_θ, Σ_θ) / q(θ^(p)),
where µ_θ, Σ_θ are based on {x_r^(p)}_{r=1}^R ~ f(· | θ^(p)). Does not require choosing ε, but relies on a normality assumption.
25 A triply intractable problem Exact methods? Importance sampling:
p(y) = ∫_θ [f(y | θ) p(θ) / q(θ)] q(θ) dθ
     ≈ (1/P) Σ_{p=1}^P f(y | θ^(p)) p(θ^(p)) / q(θ^(p))
     = (1/P) Σ_{p=1}^P [γ(y | θ^(p)) p(θ^(p)) / q(θ^(p))] × 1/Z(θ^(p)).
Intractable...
27 A triply intractable problem SAV importance sampling Consider the SAV target π(θ, u | y) ∝ q_u(u | θ, y) f(y | θ) p(θ), noting that it has the same marginal likelihood as π(θ | y). Suppose we do importance sampling on this SAV target, and choose the proposal to be q(θ, u) = f(u | θ) q(θ). We obtain
p̂(y) = (1/P) Σ_{p=1}^P [q_u(u^(p) | θ^(p), y) γ(y | θ^(p)) p(θ^(p)) Z(θ^(p))] / [γ(u^(p) | θ^(p)) q(θ^(p)) Z(θ^(p))]
     = (1/P) Σ_{p=1}^P [γ(y | θ^(p)) p(θ^(p)) / q(θ^(p))] × [q_u(u^(p) | θ^(p), y) / γ(u^(p) | θ^(p))].
29 A triply intractable problem Exact approximations revisited Using unbiased weight estimates within importance sampling: (IS)² (Tran et al., 2013); random weight particle filters (Fearnhead et al., 2010); (SMC)² (Chopin et al., 2011). For each θ, we could use multiple u variables and use the estimate
1/Ẑ(θ) = (1/M) Σ_{m=1}^M q_u(u^(m) | θ, y) / γ(u^(m) | θ).
For u the proposal is pre-determined, but we need to choose q_u(u | θ, y). Møller et al. (2006): one possible choice is q_u(u | θ, y) = γ(u | θ̃) / Z(θ̃), where θ̃ is an ML estimate (or some other appropriate estimate) of θ.
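The unbiasedness of this 1/Z estimator can be checked numerically on the Poisson toy model (my illustrative choice, not the talk's): with γ(u | t) = t^u / u! and Z(t) = e^t, taking q_u(u | θ, y) = γ(u | θ̃)/Z(θ̃) gives a ratio (θ̃/θ)^u whose average, scaled by 1/Z(θ̃), should recover 1/Z(θ) = e^(−θ).

```python
import numpy as np

rng = np.random.default_rng(1)
theta, theta_tilde, M = 2.0, 1.8, 200_000

# u^(m) ~ f(. | theta) = Poisson(theta): exact draws
u = rng.poisson(theta, size=M)
# gamma(u | theta_tilde) / gamma(u | theta) = (theta_tilde / theta)^u
ratio = (theta_tilde / theta) ** u
# 1/Z_hat(theta) = (1/Z(theta_tilde)) * (1/M) sum_m ratio_m
inv_Z_hat = np.exp(-theta_tilde) * ratio.mean()
inv_Z_true = np.exp(-theta)
```

The variance of this estimator grows quickly as θ̃ moves away from θ, which is why the annealed (MAVIS) version on the next slide bridges between γ(· | θ̃) and γ(· | θ) in small steps.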
32 A triply intractable problem SAVIS / MAVIS Using the suggested q_u gives the following importance sampling estimate of 1/Z(θ):
1/Ẑ(θ) = [1/Z(θ̃)] (1/M) Σ_{m=1}^M γ(u^(m) | θ̃) / γ(u^(m) | θ).
Or, using annealed importance sampling (Neal, 2001) with the sequence of targets
f_k(· | θ, θ̃, y) ∝ γ_k(· | θ, θ̃) = γ(· | θ)^((K+1−k)/(K+1)) γ(· | θ̃)^(k/(K+1)),
we obtain
1/Ẑ(θ) = [1/Z(θ̃)] (1/M) Σ_{m=1}^M ∏_{k=0}^K γ_{k+1}(u_k^(m) | θ, θ̃, y) / γ_k(u_k^(m) | θ, θ̃, y).
34 A triply intractable problem Toy example: Poisson vs geometric Consider i.i.d. observations {y_i}_{i=1}^n of a discrete random variable that takes values in ℕ. We find the Bayes factor for the models:
1. Y | λ ~ Poisson(λ), λ ~ Exp(1):
   f_1({y_i}_{i=1}^n | λ) = ∏_i λ^{y_i} exp(−λ) / y_i! = exp(−nλ) λ^{Σ_i y_i} / ∏_i y_i!;
2. Y | p ~ Geometric(p), p ~ Unif(0, 1):
   f_2({y_i}_{i=1}^n | p) = ∏_i p (1 − p)^{y_i} = p^n (1 − p)^{Σ_i y_i}.
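Both toy models have closed-form evidences, which is presumably why they serve as ground truth for the box plots that follow. With S = Σ_i y_i: the Poisson/Exp(1) evidence is Γ(S+1) / [(n+1)^(S+1) ∏_i y_i!] and the Geometric/Unif(0,1) evidence is the Beta function B(n+1, S+1). A sketch (the dataset below is an arbitrary illustration):

```python
import math

def log_evidence_poisson(y):
    # integrate exp(-lambda) * exp(-n*lambda) * lambda^S / prod(y_i!)
    # over lambda in (0, inf): a Gamma integral
    n, S = len(y), sum(y)
    return (math.lgamma(S + 1) - (S + 1) * math.log(n + 1)
            - sum(math.lgamma(yi + 1) for yi in y))

def log_evidence_geometric(y):
    # integrate p^n (1 - p)^S over p in (0, 1): log Beta(n + 1, S + 1)
    n, S = len(y), sum(y)
    return math.lgamma(n + 1) + math.lgamma(S + 1) - math.lgamma(n + S + 2)

y = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]
log_bf = log_evidence_poisson(y) - log_evidence_geometric(y)
```

Note that S is sufficient for both models, so per slide 23 the ABC Bayes factor with S(y) = Σ_i y_i targets exactly this ratio.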
35 A triply intractable problem Results: box plots
36 A triply intractable problem Results: ABC-IS
37 A triply intractable problem Results: SL-IS
38 A triply intractable problem Results: MAVIS
39 A triply intractable problem Application to social networks Compare the evidence for two alternative exponential random graph models p(y | θ) ∝ exp(θ^T S(y)): in model 1, S(y) = number of edges; in model 2, S(y) = (number of edges, number of two-stars), so now θ is 2-d. Use prior p(θ) = N(0, 25I), as in Friel (2013).
40 A triply intractable problem Results: social network Friel (2013) compares the evidence for model 1 with that for model 2. Using 1000 importance points (with 100 simulations from the likelihood for each point): ABC with ε = 0.1 gives p̂(y | M_1) / p̂(y | M_2) ≈ 4; ε = 0.05 gives p̂(y | M_1) / p̂(y | M_2) ≈ 20, but has only 5 points with non-zero weight! Synthetic likelihood obtains p̂(y | M_1) / p̂(y | M_2) ≈ 40. MAVIS gives p̂(y | M_1) / p̂(y | M_2) ≈ 41.
45 A triply intractable problem Comparison of methods ABC vs MAVIS: both require the simulation of auxiliary variables, but in ABC/SL the use of summary statistics dramatically reduces the dimension of the space; on the other hand, MAVIS only requires the auxiliary variable to look like a good simulation from f(· | θ), not (the different requirement) that it is a good match to y. Plus the standard drawbacks of ABC remain: the choice of tolerance ε and of S(·); not able to estimate the evidence, only Bayes factors. SL vs ABC: SL fails when the Gaussian assumption is not appropriate, but it is surprisingly robust and there is no need to choose an ε.
48 Inexact approximations An inexact approximation MAVIS is exact only if: exact sampling from f(· | θ) is possible (this also applies to ABC and synthetic likelihood); and 1/Z(θ̃) is known. In practice: use an internal MCMC to simulate from f(· | θ); estimate 1/Z(θ̃) offline in advance of running the IS. Does the use of an inexact approximation matter? Everitt (2012) shows that the use of an internal MCMC within SAV-MCMC and ABC-MCMC does not result in large errors (adapted from the MCWM proof in Andrieu and Roberts (2009)).
52 Inexact approximations Returning to MCMC In the previous section we started to examine the use of importance samplers with estimated weights: in practice, the small bias that is introduced is not as important as the Monte Carlo variance; empirically, a similar observation applies to SMC samplers. Returning to MCMC, we might wonder about the performance of algorithms with estimated acceptance probabilities: "Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels", Alquier, Friel, Everitt and Boland (2014). Is the MH method with acceptance probability α(θ, θ') close to the method using α̂(θ, θ', x'), with x' ~ F(· | θ, θ')?
54 Inexact approximations Motivation: noisy exchange algorithm Use γ(u | θ) / γ(u | θ'), with u ~ f(· | θ'), as an importance sampling estimator of Z(θ) / Z(θ'), giving an acceptance probability of
min{1, [q(θ | θ') p(θ') γ(y | θ') γ(u | θ)] / [q(θ' | θ) p(θ) γ(y | θ) γ(u | θ')]}.
Could this be improved by simulating R importance points u_r, to give
min{1, [q(θ | θ') p(θ') γ(y | θ')] / [q(θ' | θ) p(θ) γ(y | θ)] × (1/R) Σ_{r=1}^R γ(u_r | θ) / γ(u_r | θ')}?
However, this no longer gives an exact algorithm: R = 1, exact; 1 < R < ∞, inexact; R = ∞, exact.
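The averaged auxiliary-ratio term in the noisy acceptance probability can be checked directly on the Poisson toy model from earlier (an illustrative choice of mine): with γ(u | t) = t^u / u!, the ratio is (θ/θ')^u, and its expectation under u ~ f(· | θ') is Z(θ)/Z(θ') = e^(θ − θ').

```python
import numpy as np

rng = np.random.default_rng(7)
theta, theta_prop, R = 2.0, 2.3, 100_000

# u_r ~ f(. | theta'): exact Poisson draws at the proposed parameter
u = rng.poisson(theta_prop, size=R)
# (1/R) sum_r gamma(u_r | theta) / gamma(u_r | theta') = (1/R) sum_r (theta/theta')^u_r
is_estimate = np.mean((theta / theta_prop) ** u)
target = np.exp(theta - theta_prop)  # the true ratio Z(theta) / Z(theta')
```

The estimator is unbiased for any R, but because it sits inside a min{1, ·} the resulting chain is exact only at R = 1 (a clever cancellation) and in the R → ∞ limit, as the slide notes.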
56 Inexact approximations Noisy MCMC MCMC involves simulating a Markov chain (θ_n)_{n∈ℕ} with transition kernel P such that π is invariant under P: πP = π. In some situations there is a natural kernel P, such that πP = π, but for which we cannot draw θ_{n+1} ~ P(θ_n, ·) for a fixed θ_n. A natural idea is to replace P by an approximation P̂. Ideally, P̂ is close to P, but generally πP̂ ≠ π. This leads to the obvious question: can we say something about how close the Markov chain with transition kernel P̂ is to that resulting from P? E.g., is it possible to upper bound ‖δ_{θ_0} P̂^n − π‖? It turns out that a useful answer is given by the study of the stability of Markov chains.
58 Inexact approximations Noisy MCMC Theorem (Mitrophanov (2005), Corollary 3.1). If (H1) the Markov chain with transition kernel P is uniformly ergodic:
sup_{θ_0} ‖δ_{θ_0} P^n − π‖ ≤ C ρ^n
for some C < ∞ and ρ < 1, then we have, for any n ∈ ℕ and any starting point θ_0,
‖δ_{θ_0} P^n − δ_{θ_0} P̂^n‖ ≤ (λ + C ρ^λ / (1 − ρ)) ‖P − P̂‖,
where λ = log(1/C) / log(ρ).
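The bound is easy to evaluate numerically; the constants C, ρ and the kernel distance below are made-up numbers for illustration only, since in practice they come from the ergodicity analysis of the specific chain.

```python
import math

def perturbation_bound(C, rho, kernel_dist):
    # (lambda + C * rho^lambda / (1 - rho)) * ||P - P_hat||,
    # with lambda = log(1/C) / log(rho) as on the slide
    lam = math.log(1.0 / C) / math.log(rho)
    return (lam + C * rho ** lam / (1.0 - rho)) * kernel_dist

bound = perturbation_bound(C=2.0, rho=0.9, kernel_dist=0.01)
```

The structure of the bound is the useful part: the constant in front depends only on the ergodicity of the exact chain, so driving ‖P − P̂‖ to zero (e.g. by increasing the number of auxiliary simulations) drives the chains together at the same rate.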
59 Inexact approximations Noisy MCMC Corollary. Assume that (H1) holds (the Markov chain with transition kernel P is uniformly ergodic) and
(H2) E_{x' ~ F_{θ'}} |α̂(θ, θ', x') − α(θ, θ')| ≤ δ(θ, θ').   (1)
Then we have, for any n ∈ ℕ and any starting point θ_0,
‖δ_{θ_0} P^n − δ_{θ_0} P̂^n‖ ≤ (λ + C ρ^λ / (1 − ρ)) × 2 sup_θ ∫ dθ' q(θ' | θ) δ(θ, θ'),
where λ = log(1/C) / log(ρ).
60 Inexact approximations Noisy MCMC Note: when the upper bound in (1) is bounded,
E_{x' ~ F_{θ'}} |α̂(θ, θ', x') − α(θ, θ')| ≤ δ(θ, θ') ≤ δ < ∞,
then it follows that
‖δ_{θ_0} P^n − δ_{θ_0} P̂^n‖ ≤ δ (λ + C ρ^λ / (1 − ρ)).
Obviously, we expect that α̂ is chosen in such a way that δ ≪ 1, and so in this case ‖δ_{θ_0} P^n − δ_{θ_0} P̂^n‖ ≪ 1 as a consequence.
61 Inexact approximations Convergence of noisy exchange Lemma. Here we show that the noisy exchange algorithm falls into our theoretical framework: α̂ satisfies (H2) in Lemma 4.2 with
δ(θ, θ') = (1/√N) × [q(θ | θ') π(θ') γ(y | θ')] / [q(θ' | θ) π(θ) γ(y | θ)] × √(Var_{y' ~ f(· | θ')} [γ(y' | θ) / γ(y' | θ')]).
62 Inexact approximations Convergence of noisy exchange Moreover, assuming that the space Θ is bounded, we can show that
δ(θ, θ') ≤ c_h² c_π² K⁴ / √N,
and therefore
sup_{θ_0 ∈ Θ} ‖δ_{θ_0} P^n − δ_{θ_0} P̂^n‖ ≤ C' / √N,
where C' = C'(c_π, c_h, K) is explicitly known.
63 Inexact approximations Noisy Langevin algorithms Langevin algorithms (e.g. Welling and Teh, 2011) use the update
θ_{n+1} = θ_n + (Σ/2) ∇ log π(θ_n | y) + η,  η ~ N(0, Σ).
In practice, it is often the case that ∇ log π(θ_n | y) cannot be computed. Here again, a natural idea is to replace ∇ log π(θ_n | y) by an approximation or an estimate ∇̂_{y'} log π(θ_n | y). Noisy Langevin algorithms use
θ_{n+1} = θ_n + (Σ/2) ∇̂_{y'} log π(θ_n | y) + η,  η ~ N(0, Σ),  with y' ~ F(· | θ_n).
64 Inexact approximations A noisy Langevin algorithm for Gibbs random fields
log π(θ | y) = log π(θ) + log γ(y | θ) − log Z(θ) − log ∫ π(t) f(y | t) dt.
Therefore
∇ log π(θ | y) = ∇ log π(θ) + ∇ log f(y | θ)
              = ∇ log π(θ) + ∇[θ^T s(y)] − ∇ log Z(θ)
              = ∇ log π(θ) + s(y) − E_{y' ~ f(· | θ)}[s(y')].
In practice, E_{y' ~ f(· | θ)}[s(y')] is unavailable. However, it is possible to estimate it via Monte Carlo. Let y' = (y'_1, ..., y'_N) be N i.i.d. variables drawn from f(· | θ); we define
∇̂_{y'} log π(θ | y) = ∇ log π(θ) + s(y) − (1/N) Σ_{i=1}^N s(y'_i).
Here, we can also give theoretical guarantees for this algorithm.
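The gradient estimate can be checked on an exponential-family model where E[s(y')] is known exactly; here I use a Poisson in natural parameterisation as an illustrative stand-in (f(y | θ) ∝ exp(θ y)/y!, so E[s(y')] = e^θ), with an assumed N(0, 10²) prior.

```python
import numpy as np

rng = np.random.default_rng(2)

def grad_log_post_hat(theta, y_obs, N, rng):
    # grad log pi(theta) + s(y) - (1/N) sum_i s(y'_i),  y'_i ~ f(. | theta)
    grad_log_prior = -theta / 100.0          # N(0, 10^2) prior, illustrative
    sims = rng.poisson(np.exp(theta), size=N)  # exact draws from f(. | theta)
    return grad_log_prior + y_obs - sims.mean()

g = grad_log_post_hat(theta=1.0, y_obs=3, N=50_000, rng=rng)
```

Only the mean of the sufficient statistic under the model is needed, so in a real Gibbs random field the draws would come from an internal MCMC rather than an exact sampler, which is exactly the inexactness being analysed.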
66 Inexact approximations Metropolis-adjusted Langevin exchange algorithm Of course, the noisy Langevin diffusion can be used as a proposal and corrected within an exchange algorithm. At each iteration:
draw y' = (y'_1, ..., y'_N) ~ f(· | θ_n);
calculate ∇̂_{y'} log π(θ_n | y) = ∇ log π(θ_n) + s(y) − (1/N) Σ_{i=1}^N s(y'_i);
propose θ' = θ_n + (Σ/2) ∇̂_{y'} log π(θ_n | y) + η_n, where the η_n are i.i.d. N(0, Σ);
draw u ~ f(· | θ') and accept with probability
min{1, [γ(y | θ') π(θ') q(θ_n | θ') γ(u | θ_n)] / [γ(y | θ_n) π(θ_n) q(θ' | θ_n) γ(u | θ')]}.
Can also be extended to a noisy version (as we did for the exchange algorithm).
67 Inexact approximations Simulation study 20 datasets were simulated from a first-order Ising model defined on a lattice, with a single interaction parameter θ = 0.4. The normalising constant z(θ) can be calculated exactly for a fine grid of {θ_i : i = 1, ..., N} values (Friel and Rue, 2007), which can be used to estimate
p̂(y) = Σ_{i=2}^N [(θ_i − θ_{i−1}) / 2] [q_{θ_i}(y) π(θ_i) / z(θ_i) + q_{θ_{i−1}}(y) π(θ_{i−1}) / z(θ_{i−1})],
which in turn can be used to estimate the posterior density at each grid point:
π̂(θ_i | y) = q_{θ_i}(y) π(θ_i) / [z(θ_i) p̂(y)],  i = 1, ..., N.
Here we used a fine grid of 8,000 points in the interval [0, 0.8].
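The grid-based evidence estimate above is the trapezoidal rule applied to the integrand q_θ(y) π(θ) / z(θ) over the θ grid; a generic sketch, with exp(−θ) as an illustrative stand-in integrand so the answer can be checked:

```python
import numpy as np

def trapezoid_evidence(theta_grid, integrand_vals):
    # sum_i (theta_i - theta_{i-1})/2 * (h(theta_i) + h(theta_{i-1}))
    diffs = np.diff(theta_grid)
    return np.sum(diffs * (integrand_vals[1:] + integrand_vals[:-1]) / 2.0)

grid = np.linspace(0.0, 0.8, 8000)
vals = np.exp(-grid)            # stand-in for q_theta(y) pi(theta) / z(theta)
est = trapezoid_evidence(grid, vals)
exact = 1.0 - np.exp(-0.8)      # the true integral of exp(-theta) on [0, 0.8]
```

With 8,000 grid points on a smooth integrand the quadrature error is negligible, so the resulting p̂(y) serves as a gold standard against which the noisy algorithms' bias can be measured.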
68 Inexact approximations Results [Figure: box plots of the bias for the Exchange, Noisy Exchange, Noisy Langevin, MALA Exchange and Noisy MALA algorithms]
69 Inexact approximations Conclusions A new approach (MAVIS) to evidence estimation for Gibbs random fields: makes use of an inexact approximation; has some characteristics to recommend it over ABC and SL in general. Examined in more detail the use of noisy MCMC algorithms: seen several examples of where this idea might be useful; an improved Monte Carlo variance may be more important than the biases that are introduced; quite a general framework (e.g. SL can also be seen as a special case). Similar ideas can be used in SMC.
70 Inexact approximations Acknowledgements Noisy MCMC: Pierre Alquier, Nial Friel and Aiden Boland (UCD). Evidence estimation and synthetic likelihood: Nial Friel (UCD), Adam Johansen (Warwick), Melina Evdemon-Hogan and Ellen Rowing (Reading).
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More informationKernel Sequential Monte Carlo
Kernel Sequential Monte Carlo Ingmar Schuster (Paris Dauphine) Heiko Strathmann (University College London) Brooks Paige (Oxford) Dino Sejdinovic (Oxford) * equal contribution April 25, 2016 1 / 37 Section
More informationAdaptive HMC via the Infinite Exponential Family
Adaptive HMC via the Infinite Exponential Family Arthur Gretton Gatsby Unit, CSML, University College London RegML, 2017 Arthur Gretton (Gatsby Unit, UCL) Adaptive HMC via the Infinite Exponential Family
More informationBayesian Indirect Inference using a Parametric Auxiliary Model
Bayesian Indirect Inference using a Parametric Auxiliary Model Dr Chris Drovandi Queensland University of Technology, Australia c.drovandi@qut.edu.au Collaborators: Tony Pettitt and Anthony Lee February
More informationStatistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling
1 / 27 Statistical Machine Learning Lecture 8: Markov Chain Monte Carlo Sampling Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 27 Monte Carlo Integration The big question : Evaluate E p(z) [f(z)]
More informationCalibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods
Calibration of Stochastic Volatility Models using Particle Markov Chain Monte Carlo Methods Jonas Hallgren 1 1 Department of Mathematics KTH Royal Institute of Technology Stockholm, Sweden BFS 2012 June
More informationRiemann Manifold Methods in Bayesian Statistics
Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes
More informationMONTE CARLO METHODS. Hedibert Freitas Lopes
MONTE CARLO METHODS Hedibert Freitas Lopes The University of Chicago Booth School of Business 5807 South Woodlawn Avenue, Chicago, IL 60637 http://faculty.chicagobooth.edu/hedibert.lopes hlopes@chicagobooth.edu
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationBayesian model selection for exponential random graph models via adjusted pseudolikelihoods
Bayesian model selection for exponential random graph models via adjusted pseudolikelihoods Lampros Bouranis *, Nial Friel, Florian Maire School of Mathematics and Statistics & Insight Centre for Data
More informationCS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling
CS242: Probabilistic Graphical Models Lecture 7B: Markov Chain Monte Carlo & Gibbs Sampling Professor Erik Sudderth Brown University Computer Science October 27, 2016 Some figures and materials courtesy
More informationBayesian Estimation of DSGE Models 1 Chapter 3: A Crash Course in Bayesian Inference
1 The views expressed in this paper are those of the authors and do not necessarily reflect the views of the Federal Reserve Board of Governors or the Federal Reserve System. Bayesian Estimation of DSGE
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationPSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL
PSEUDO-MARGINAL METROPOLIS-HASTINGS APPROACH AND ITS APPLICATION TO BAYESIAN COPULA MODEL Xuebin Zheng Supervisor: Associate Professor Josef Dick Co-Supervisor: Dr. David Gunawan School of Mathematics
More informationZig-Zag Monte Carlo. Delft University of Technology. Joris Bierkens February 7, 2017
Zig-Zag Monte Carlo Delft University of Technology Joris Bierkens February 7, 2017 Joris Bierkens (TU Delft) Zig-Zag Monte Carlo February 7, 2017 1 / 33 Acknowledgements Collaborators Andrew Duncan Paul
More informationMarkov Chain Monte Carlo Methods
Markov Chain Monte Carlo Methods John Geweke University of Iowa, USA 2005 Institute on Computational Economics University of Chicago - Argonne National Laboaratories July 22, 2005 The problem p (θ, ω I)
More informationHastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model
UNIVERSITY OF TEXAS AT SAN ANTONIO Hastings-within-Gibbs Algorithm: Introduction and Application on Hierarchical Model Liang Jing April 2010 1 1 ABSTRACT In this paper, common MCMC algorithms are introduced
More informationMonte Carlo Inference Methods
Monte Carlo Inference Methods Iain Murray University of Edinburgh http://iainmurray.net Monte Carlo and Insomnia Enrico Fermi (1901 1954) took great delight in astonishing his colleagues with his remarkably
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationTutorial on ABC Algorithms
Tutorial on ABC Algorithms Dr Chris Drovandi Queensland University of Technology, Australia c.drovandi@qut.edu.au July 3, 2014 Notation Model parameter θ with prior π(θ) Likelihood is f(ý θ) with observed
More informationApproximate Inference using MCMC
Approximate Inference using MCMC 9.520 Class 22 Ruslan Salakhutdinov BCS and CSAIL, MIT 1 Plan 1. Introduction/Notation. 2. Examples of successful Bayesian models. 3. Basic Sampling Algorithms. 4. Markov
More informationLecture 7 and 8: Markov Chain Monte Carlo
Lecture 7 and 8: Markov Chain Monte Carlo 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/ Ghahramani
More informationPseudo-marginal MCMC methods for inference in latent variable models
Pseudo-marginal MCMC methods for inference in latent variable models Arnaud Doucet Department of Statistics, Oxford University Joint work with George Deligiannidis (Oxford) & Mike Pitt (Kings) MCQMC, 19/08/2016
More informationExercises Tutorial at ICASSP 2016 Learning Nonlinear Dynamical Models Using Particle Filters
Exercises Tutorial at ICASSP 216 Learning Nonlinear Dynamical Models Using Particle Filters Andreas Svensson, Johan Dahlin and Thomas B. Schön March 18, 216 Good luck! 1 [Bootstrap particle filter for
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning
More informationKernel Adaptive Metropolis-Hastings
Kernel Adaptive Metropolis-Hastings Arthur Gretton,?? Gatsby Unit, CSML, University College London NIPS, December 2015 Arthur Gretton (Gatsby Unit, UCL) Kernel Adaptive Metropolis-Hastings 12/12/2015 1
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationEco517 Fall 2013 C. Sims MCMC. October 8, 2013
Eco517 Fall 2013 C. Sims MCMC October 8, 2013 c 2013 by Christopher A. Sims. This document may be reproduced for educational and research purposes, so long as the copies contain this notice and are retained
More informationAn introduction to Sequential Monte Carlo
An introduction to Sequential Monte Carlo Thang Bui Jes Frellsen Department of Engineering University of Cambridge Research and Communication Club 6 February 2014 1 Sequential Monte Carlo (SMC) methods
More informationLECTURE 15 Markov chain Monte Carlo
LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte
More informationBayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference
Bayesian Inference for Discretely Sampled Diffusion Processes: A New MCMC Based Approach to Inference Osnat Stramer 1 and Matthew Bognar 1 Department of Statistics and Actuarial Science, University of
More informationApproximate Bayesian Computation: a simulation based approach to inference
Approximate Bayesian Computation: a simulation based approach to inference Richard Wilkinson Simon Tavaré 2 Department of Probability and Statistics University of Sheffield 2 Department of Applied Mathematics
More informationSAMPLING ALGORITHMS. In general. Inference in Bayesian models
SAMPLING ALGORITHMS SAMPLING ALGORITHMS In general A sampling algorithm is an algorithm that outputs samples x 1, x 2,... from a given distribution P or density p. Sampling algorithms can for example be
More informationVariational Scoring of Graphical Model Structures
Variational Scoring of Graphical Model Structures Matthew J. Beal Work with Zoubin Ghahramani & Carl Rasmussen, Toronto. 15th September 2003 Overview Bayesian model selection Approximations using Variational
More informationMonte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants
Monte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants Faming Liang Texas A& University Sooyoung Cheon Korea University Spatial Model Introduction
More informationarxiv: v1 [stat.me] 30 Sep 2009
Model choice versus model criticism arxiv:0909.5673v1 [stat.me] 30 Sep 2009 Christian P. Robert 1,2, Kerrie Mengersen 3, and Carla Chen 3 1 Université Paris Dauphine, 2 CREST-INSEE, Paris, France, and
More informationMarkov chain Monte Carlo
1 / 26 Markov chain Monte Carlo Timothy Hanson 1 and Alejandro Jara 2 1 Division of Biostatistics, University of Minnesota, USA 2 Department of Statistics, Universidad de Concepción, Chile IAP-Workshop
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationPseudo-marginal Metropolis-Hastings: a simple explanation and (partial) review of theory
Pseudo-arginal Metropolis-Hastings: a siple explanation and (partial) review of theory Chris Sherlock Motivation Iagine a stochastic process V which arises fro soe distribution with density p(v θ ). Iagine
More informationBayesian estimation of complex networks and dynamic choice in the music industry
Bayesian estimation of complex networks and dynamic choice in the music industry Stefano Nasini Víctor Martínez-de-Albéniz Dept. of Production, Technology and Operations Management, IESE Business School,
More information(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis
Summarizing a posterior Given the data and prior the posterior is determined Summarizing the posterior gives parameter estimates, intervals, and hypothesis tests Most of these computations are integrals
More informationThe Poisson transform for unnormalised statistical models. Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB)
The Poisson transform for unnormalised statistical models Nicolas Chopin (ENSAE) joint work with Simon Barthelmé (CNRS, Gipsa-LAB) Part I Unnormalised statistical models Unnormalised statistical models
More informationBridge estimation of the probability density at a point. July 2000, revised September 2003
Bridge estimation of the probability density at a point Antonietta Mira Department of Economics University of Insubria Via Ravasi 2 21100 Varese, Italy antonietta.mira@uninsubria.it Geoff Nicholls Department
More informationMarkov Chain Monte Carlo
1 Motivation 1.1 Bayesian Learning Markov Chain Monte Carlo Yale Chang In Bayesian learning, given data X, we make assumptions on the generative process of X by introducing hidden variables Z: p(z): prior
More informationNormalising constants and maximum likelihood inference
Normalising constants and maximum likelihood inference Jakob G. Rasmussen Department of Mathematics Aalborg University Denmark March 9, 2011 1/14 Today Normalising constants Approximation of normalising
More informationDelayed Rejection Algorithm to Estimate Bayesian Social Networks
Dublin Institute of Technology ARROW@DIT Articles School of Mathematics 2014 Delayed Rejection Algorithm to Estimate Bayesian Social Networks Alberto Caimo Dublin Institute of Technology, alberto.caimo@dit.ie
More informationPrinciples of Bayesian Inference
Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters
More informationLearning the hyper-parameters. Luca Martino
Learning the hyper-parameters Luca Martino 2017 2017 1 / 28 Parameters and hyper-parameters 1. All the described methods depend on some choice of hyper-parameters... 2. For instance, do you recall λ (bandwidth
More informationOn some properties of Markov chain Monte Carlo simulation methods based on the particle filter
On some properties of Markov chain Monte Carlo simulation methods based on the particle filter Michael K. Pitt Economics Department University of Warwick m.pitt@warwick.ac.uk Ralph S. Silva School of Economics
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationThe zig-zag and super-efficient sampling for Bayesian analysis of big data
The zig-zag and super-efficient sampling for Bayesian analysis of big data LMS-CRiSM Summer School on Computational Statistics 15th July 2018 Gareth Roberts, University of Warwick Joint work with Joris
More informationSequential Monte Carlo and Particle Filtering. Frank Wood Gatsby, November 2007
Sequential Monte Carlo and Particle Filtering Frank Wood Gatsby, November 2007 Importance Sampling Recall: Let s say that we want to compute some expectation (integral) E p [f] = p(x)f(x)dx and we remember
More informationBAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA
BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci
More informationEstimating the marginal likelihood with Integrated nested Laplace approximation (INLA)
Estimating the marginal likelihood with Integrated nested Laplace approximation (INLA) arxiv:1611.01450v1 [stat.co] 4 Nov 2016 Aliaksandr Hubin Department of Mathematics, University of Oslo and Geir Storvik
More informationBayesian inference for multivariate skew-normal and skew-t distributions
Bayesian inference for multivariate skew-normal and skew-t distributions Brunero Liseo Sapienza Università di Roma Banff, May 2013 Outline Joint research with Antonio Parisi (Roma Tor Vergata) 1. Inferential
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Lecture February Arnaud Doucet
Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 13-28 February 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Limitations of Gibbs sampling. Metropolis-Hastings algorithm. Proof
More informationMarkov Chain Monte Carlo Lecture 4
The local-trap problem refers to that in simulations of a complex system whose energy landscape is rugged, the sampler gets trapped in a local energy minimum indefinitely, rendering the simulation ineffective.
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 2) Fall 2017 1 / 19 Part 2: Markov chain Monte
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationManifold Monte Carlo Methods
Manifold Monte Carlo Methods Mark Girolami Department of Statistical Science University College London Joint work with Ben Calderhead Research Section Ordinary Meeting The Royal Statistical Society October
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Markov chain Monte Carlo (MCMC) Gibbs and Metropolis Hastings Slice sampling Practical details Iain Murray http://iainmurray.net/ Reminder Need to sample large, non-standard distributions:
More informationAn introduction to Approximate Bayesian Computation methods
An introduction to Approximate Bayesian Computation methods M.E. Castellanos maria.castellanos@urjc.es (from several works with S. Cabras, E. Ruli and O. Ratmann) Valencia, January 28, 2015 Valencia Bayesian
More informationBayesian model selection: methodology, computation and applications
Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationNested Sampling. Brendon J. Brewer. brewer/ Department of Statistics The University of Auckland
Department of Statistics The University of Auckland https://www.stat.auckland.ac.nz/ brewer/ is a Monte Carlo method (not necessarily MCMC) that was introduced by John Skilling in 2004. It is very popular
More informationPart 1: Expectation Propagation
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud
More informationParameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1
Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data
More informationMCMC Sampling for Bayesian Inference using L1-type Priors
MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling
More informationDeblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.
Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox fox@physics.otago.ac.nz Richard A. Norton, J. Andrés Christen Topics... Backstory (?) Sampling in linear-gaussian hierarchical
More informationGenerative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis
Generative Models and Stochastic Algorithms for Population Average Estimation and Image Analysis Stéphanie Allassonnière CIS, JHU July, 15th 28 Context : Computational Anatomy Context and motivations :
More informationLecture 8: Bayesian Estimation of Parameters in State Space Models
in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space
More informationMetropolis Hastings. Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601. Module 9
Metropolis Hastings Rebecca C. Steorts Bayesian Methods and Modern Statistics: STA 360/601 Module 9 1 The Metropolis-Hastings algorithm is a general term for a family of Markov chain simulation methods
More informationImproving power posterior estimation of statistical evidence
Improving power posterior estimation of statistical evidence Nial Friel, Merrilee Hurn and Jason Wyse Department of Mathematical Sciences, University of Bath, UK 10 June 2013 Bayesian Model Choice Possible
More informationIntroduction to Markov Chain Monte Carlo & Gibbs Sampling
Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu
More informationComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors. RicardoS.Ehlers
ComputationalToolsforComparing AsymmetricGARCHModelsviaBayes Factors RicardoS.Ehlers Laboratório de Estatística e Geoinformação- UFPR http://leg.ufpr.br/ ehlers ehlers@leg.ufpr.br II Workshop on Statistical
More informationFully Bayesian Analysis of Calibration Uncertainty In High Energy Spectral Analysis
In High Energy Spectral Analysis Department of Statistics, UCI February 26, 2013 Model Building Principle Component Analysis Three Inferencial Models Simulation Quasar Analysis Doubly-intractable Distribution
More informationarxiv: v5 [stat.co] 10 Apr 2018
THE BLOCK-POISSON ESTIMATOR FOR OPTIMALLY TUNED EXACT SUBSAMPLING MCMC MATIAS QUIROZ 1,2, MINH-NGOC TRAN 3, MATTIAS VILLANI 4, ROBERT KOHN 1 AND KHUE-DUNG DANG 1 arxiv:1603.08232v5 [stat.co] 10 Apr 2018
More informationarxiv: v1 [stat.co] 1 Jun 2015
arxiv:1506.00570v1 [stat.co] 1 Jun 2015 Towards automatic calibration of the number of state particles within the SMC 2 algorithm N. Chopin J. Ridgway M. Gerber O. Papaspiliopoulos CREST-ENSAE, Malakoff,
More informationMARKOV CHAIN MONTE CARLO
MARKOV CHAIN MONTE CARLO RYAN WANG Abstract. This paper gives a brief introduction to Markov Chain Monte Carlo methods, which offer a general framework for calculating difficult integrals. We start with
More informationCS281A/Stat241A Lecture 22
CS281A/Stat241A Lecture 22 p. 1/4 CS281A/Stat241A Lecture 22 Monte Carlo Methods Peter Bartlett CS281A/Stat241A Lecture 22 p. 2/4 Key ideas of this lecture Sampling in Bayesian methods: Predictive distribution
More informationGAUSSIAN PROCESS REGRESSION
GAUSSIAN PROCESS REGRESSION CSE 515T Spring 2015 1. BACKGROUND The kernel trick again... The Kernel Trick Consider again the linear regression model: y(x) = φ(x) w + ε, with prior p(w) = N (w; 0, Σ). The
More information