Monte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants


1 Monte Carlo Dynamically Weighted Importance Sampling for Spatial Models with Intractable Normalizing Constants
Faming Liang, Texas A&M University
Sooyoung Cheon, Korea University

2 Spatial Model Introduction
Spatial models, e.g., the autologistic model, the Potts model, and the autonormal model, have been used to model many scientific problems:
- Image analysis (Hurn et al., 2003)
- Disease mapping (Green and Richardson, 2002)
- Genetic analysis (Francois et al., 2006)
A major problem with these models is that the normalizing constant is intractable!

3 Spatial Model Introduction
The Problem
Suppose we have data $X$ generated from a statistical model with the likelihood function
$$f(x \mid \theta) = \frac{1}{Z(\theta)}\, p(x, \theta), \quad x \in \mathcal{X},\ \theta \in \Theta, \tag{1}$$
where $\theta$ is the parameter, and $Z(\theta)$ is the normalizing constant, which depends on $\theta$ and is not available in closed form. Let $\pi(\theta)$ denote the prior density of $\theta$. The posterior distribution of $\theta$ given $X = x$ is then given by
$$\pi(\theta \mid x) \propto \frac{1}{Z(\theta)}\, p(x, \theta)\, \pi(\theta). \tag{2}$$

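4 Spatial Model Introduction
Difficulty: The Metropolis-Hastings algorithm cannot be directly applied to simulate from $\pi(\theta \mid x)$, because the acceptance probability would involve the unknown ratio $Z(\theta)/Z(\theta')$, where $\theta'$ denotes the proposed value. The Metropolis-Hastings ratio is given by
$$r = \frac{Z(\theta)}{Z(\theta')} \cdot \frac{p(x, \theta')\, \pi(\theta')}{p(x, \theta)\, \pi(\theta)} \cdot \frac{T(\theta \mid \theta')}{T(\theta' \mid \theta)},$$
where $T(\theta' \mid \theta)$ denotes the proposal density of $\theta'$ given the current value $\theta$.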

5 Spatial Model Introduction
Existing approaches to the problem:
- Likelihood approximation-based methods:
  - Pseudo-likelihood (Besag, 1974)
  - MCMLE (Geyer and Thompson, 1992)
  - Stochastic approximation Monte Carlo (Liang, 2007; Liang et al., 2007)
- Auxiliary variable MCMC methods:
  - Møller et al.'s algorithm (2006)
  - Exchange algorithm (Murray et al., 2006)
  - Double MH algorithm (Liang, 2009)

6 Spatial Model Introduction
Algorithm Summary: In MCDWIS, the state space of the Markov chain is augmented to a population, a collection of weighted samples $(\theta, w) = \{\theta_1, w_1; \ldots; \theta_n, w_n\}$, where $n$ is called the population size, and $(\theta_i, w_i)$ is called an individual state of the population. Given the current population $(\theta_t, w_t)$, an iteration of the MCDWIS involves two steps:
1. Monte Carlo dynamic weighting (MCDW): Update each individual state of the current population by an MCDW transition.
2. Population control: Split or replicate the individual states with large weights and discard the individual states with small weights.
The MCDWIS removes the need for perfect sampling, and thus can be applied to many statistical models for which perfect sampling is unavailable or very expensive.

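7 Spatial Model Introduction
Figure 1: A diagram of the MCDWIS algorithm. [Diagram: $(\theta_t, w_t)$ → MCDW → $(\theta'_t, w'_t)$ → Population control → $(\theta_{t+1}, w_{t+1})$. In population control, states with weights above $W_{up,t}$ are enriched, states with weights below $W_{low,t}$ either survive or are pruned, and $N_{t+1} \in [N_{min}, N_{max}]$.]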

8 Dynamically Weighted Importance Sampling Theory
Let $g_t(\theta, w)$ denote the joint density of $(\theta, w)$, an individual state of $(\theta_t, w_t)$, and let $\psi(\theta)$ denote the target distribution. Dynamically weighted importance sampling differs from conventional importance sampling in that, for any given $\theta$, the weight $w$ is a random variable instead of a constant defined as the ratio of the true and trial densities at $\theta$.
Definition 0.1 The distribution $g_t(\theta, w)$ defined on $\Theta \times (0, \infty)$ is called correctly weighted with respect to $\psi(\theta)$ if the following conditions hold:
$$\int_0^\infty w\, g_t(\theta, w)\, dw = c_{t\theta}\, \psi(\theta), \tag{3}$$
$$\frac{\int_A c_{t\theta}\, \psi(\theta)\, d\theta}{\int_\Theta c_{t\theta}\, \psi(\theta)\, d\theta} = \int_A \psi(\theta)\, d\theta, \tag{4}$$
where $A$ is any Borel set, $A \subseteq \Theta$.
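As a quick sanity check of Definition 0.1, conventional importance sampling can be viewed as a special case. The block below is a sketch under an assumption not made on the slide: $\theta$ is drawn from a trial density $g(\theta)$ with $\psi$ normalized, and the weight is the deterministic ratio $w = \psi(\theta)/g(\theta)$, so the joint density puts a point mass at that ratio.

```latex
% Assumption: theta ~ g(theta), w = psi(theta)/g(theta) deterministically, so
% g_t(theta, w) = g(theta) * delta(w - psi(theta)/g(theta)).
\begin{align*}
\int_0^\infty w\, g_t(\theta, w)\, dw
  &= g(\theta) \cdot \frac{\psi(\theta)}{g(\theta)}
   = \psi(\theta)
  && \text{so (3) holds with } c_{t\theta} = 1, \\
\frac{\int_A c_{t\theta}\, \psi(\theta)\, d\theta}{\int_\Theta c_{t\theta}\, \psi(\theta)\, d\theta}
  &= \frac{\int_A \psi(\theta)\, d\theta}{1}
   = \int_A \psi(\theta)\, d\theta
  && \text{so (4) holds as well.}
\end{align*}
```

In dynamic weighting the weight is random given $\theta$, so only the conditional mean of $w$ is pinned down by (3), and $c_{t\theta}$ need not equal 1.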

9 Dynamically Weighted Importance Sampling Theory
Definition 0.2 If $g_t(\theta, w)$ is correctly weighted with respect to $\psi(\theta)$, and the samples $(\theta_{t,i}, w_{t,i})$ are simulated from $g_t(\theta, w)$ for $i = 1, 2, \ldots, n_t$, then $(\theta_t, w_t) = (\theta_{t,1}, w_{t,1}; \ldots; \theta_{t,n_t}, w_{t,n_t})$ is called a correctly weighted population with respect to $\psi(\theta)$.
Let $(\theta_t, w_t)$ be a correctly weighted population with respect to $\psi(\theta)$, and let $\theta'_1, \ldots, \theta'_m$ be the distinct states in $\theta_t$. Generate a random variable/vector $\vartheta$ such that
$$P\{\vartheta = \theta'_i\} = \frac{\sum_{j=1}^{n_t} w_{t,j}\, I(\theta_{t,j} = \theta'_i)}{\sum_{j=1}^{n_t} w_{t,j}}, \quad i = 1, 2, \ldots, m, \tag{5}$$
where $I(\cdot)$ is an indicator function.
Theorem 0.1 As the population size $n_t \to \infty$, the random variable $\vartheta$ generated in (5) converges in distribution to a random variable $\theta$ which is distributed with the pdf $\psi(\theta)$.
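The resampling rule in (5) can be sketched in a few lines of Python. This is an illustration only: the function names `resampling_probabilities` and `draw_state` are hypothetical, and states are assumed to be hashable values so that identical states can be grouped.

```python
import random
from collections import defaultdict


def resampling_probabilities(thetas, weights):
    """Probability of each distinct state as in (5): its share of total weight."""
    total = sum(weights)
    mass = defaultdict(float)
    for theta, w in zip(thetas, weights):
        mass[theta] += w  # accumulates w_j over all j with theta_j equal to this state
    return {theta: m / total for theta, m in mass.items()}


def draw_state(thetas, weights, rng):
    """Draw one state; sampling an index proportionally to w_j is equivalent to (5)."""
    return rng.choices(thetas, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    thetas, weights = [0.1, 0.2, 0.1], [1.0, 2.0, 3.0]
    print(resampling_probabilities(thetas, weights))
    print(draw_state(thetas, weights, rng))
```

Grouping by distinct state is only needed to report the probabilities; for drawing, weighting the individual entries gives the same distribution.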

10 Dynamically Weighted Importance Sampling Theory
Let $(\theta_1, w_1), \ldots, (\theta_N, w_N)$ be a series of correctly weighted populations generated by a DWIS algorithm with respect to $\psi(\theta)$. Then the quantity $\mu = E_\psi \rho(\theta)$, assuming existence, can be estimated by
$$\hat{\mu} = \frac{\sum_{t=1}^{N} \sum_{i=1}^{n_t} w_{t,i}\, \rho(\theta_{t,i})}{\sum_{t=1}^{N} \sum_{i=1}^{n_t} w_{t,i}}, \tag{6}$$
which is consistent and asymptotically normally distributed.
Definition 0.3 A transition rule for a population $(\theta, w)$ is said to be invariant with respect to the dynamic importance weights (IDIW) if the joint density of $(\theta, w)$ remains correctly weighted whenever the initial joint density is correctly weighted.

11 MCDWIS Methodology
Monte Carlo Dynamic Weighting Sampler
1. Draw $\theta'$ from some proposal distribution $T(\theta' \mid \theta)$.
2. Simulate auxiliary samples $y_1, \ldots, y_m$ from $f(y \mid \theta')$ using an MCMC algorithm, say, the MH algorithm. Estimate the normalizing constant ratio $R_t(\theta, \theta') = Z(\theta)/Z(\theta')$ by
$$\hat{R}_t(\theta, \theta') = \frac{1}{m} \sum_{i=1}^{m} \frac{p(y_i, \theta)}{p(y_i, \theta')}, \tag{7}$$
which is also known as the importance sampling (IS) estimator of $R_t(\theta, \theta')$.
3. Calculate the Monte Carlo dynamic weighting ratio
$$r_d = r_d(\theta, \theta', w) = w\, \hat{R}_t(\theta, \theta')\, \frac{p(x, \theta')}{p(x, \theta)} \cdot \frac{T(\theta \mid \theta')}{T(\theta' \mid \theta)}.$$

12 MCDWIS Methodology
4. Choose $\beta_t = \beta_t(\theta_t, w_t) \ge 0$ and draw $U \sim \mathrm{Unif}(0, 1)$. Update $(\theta, w)$ as $(\theta'', w')$, where
$$(\theta'', w') = \begin{cases} (\theta', r_d/a), & \text{if } U \le a, \\ (\theta, w/(1 - a)), & \text{otherwise,} \end{cases}$$
and $a = r_d/(r_d + \beta_t)$; $\beta_t$ is a function of $(\theta_t, w_t)$, but remains constant for each individual state of the same population.
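Steps 1 to 4 of the sampler can be sketched as follows. This is a minimal illustration, not the authors' code: all function names are hypothetical, `proposal_density(a, b)` is assumed to return $T(a \mid b)$, and the toy model $p(y, \theta) = e^{-\theta y}$ on $y > 0$ is an assumption chosen because $f(y \mid \theta)$ is then an exponential distribution that can be sampled exactly, standing in for the MCMC run of step 2.

```python
import math
import random


def is_ratio_estimate(theta, theta_new, aux_samples, p):
    """IS estimate of Z(theta)/Z(theta_new) as in (7); aux_samples ~ f(y | theta_new)."""
    ratios = [p(y, theta) / p(y, theta_new) for y in aux_samples]
    return sum(ratios) / len(ratios)


def mcdw_step(theta, w, x, p, propose, proposal_density, sample_aux,
              beta, m, rng):
    """One MCDW transition for a single weighted state (theta, w), with w > 0."""
    # Step 1: draw theta' from the proposal T(. | theta).
    theta_new = propose(theta, rng)
    # Step 2: auxiliary samples from f(y | theta') and the ratio estimate (7).
    ys = sample_aux(theta_new, m, rng)
    r_hat = is_ratio_estimate(theta, theta_new, ys, p)
    # Step 3: dynamic weighting ratio r_d.
    r_d = (w * r_hat * p(x, theta_new) / p(x, theta)
           * proposal_density(theta, theta_new)
           / proposal_density(theta_new, theta))
    # Step 4: accept with probability a = r_d / (r_d + beta).
    a = r_d / (r_d + beta)
    if rng.random() <= a:
        return theta_new, r_d / a
    return theta, w / (1.0 - a)


# Toy model (assumption): p(y, theta) = exp(-theta * y) on y > 0, so that
# f(y | theta) is Exponential(theta) and Z(theta) = 1 / theta.
def p_exponential(y, theta):
    return math.exp(-theta * y)


def sample_exponential(theta, m, rng):
    return [rng.expovariate(theta) for _ in range(m)]
```

With `beta = 0` the move is always accepted and the new weight is `r_d`; with `beta = 1` the weight grows to `r_d + w` on acceptance and to `w * (r_d + 1)` on rejection, which is why the switching between the two values matters for weight control.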

13 MCDWIS Methodology
Theorem 0.2 The Monte Carlo dynamic weighting sampler is IDIW; that is, if the joint distribution $g_t(\theta, w)$ for $(\theta_t, w_t)$ is correctly weighted with respect to $\pi(\theta \mid x)$, then after one Monte Carlo dynamic weighting step, the new joint density $g_{t+1}(\theta', w')$ for $(\theta_{t+1}, w_{t+1})$ is also correctly weighted with respect to $\pi(\theta \mid x)$.
Remarks:
- $\hat{R}_t(\theta, \theta')$ is an unbiased estimator of $R_t(\theta, \theta')$.
- To avoid an extremely large weight caused by a nearly zero divisor, both $\Theta$ and $\mathcal{X}$ are assumed to be compact. Then there exists a constant $r_0$ such that for any pair $(\theta, \theta') \in \Theta \times \Theta$,
$$r_0 \le \hat{R}_t(\theta, \theta')\, \frac{p(x, \theta')}{p(x, \theta)} \cdot \frac{T(\theta \mid \theta')}{T(\theta' \mid \theta)} \le \frac{1}{r_0}. \tag{8}$$
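The unbiasedness remark follows from a one-line calculation. The sketch below assumes that each auxiliary sample $y_i$ is an exact draw from $f(y \mid \theta') = p(y, \theta')/Z(\theta')$; with samples produced by an MCMC run the identity holds only once the chain has reached its stationary distribution.

```latex
% Assumption: y ~ f(y | theta') exactly.
\begin{align*}
E\!\left[\frac{p(y, \theta)}{p(y, \theta')}\right]
  &= \int_{\mathcal{X}} \frac{p(y, \theta)}{p(y, \theta')} \cdot
     \frac{p(y, \theta')}{Z(\theta')}\, dy \\
  &= \frac{1}{Z(\theta')} \int_{\mathcal{X}} p(y, \theta)\, dy
   = \frac{Z(\theta)}{Z(\theta')} = R_t(\theta, \theta').
\end{align*}
```

Averaging $m$ such terms, as in the IS estimator of the ratio, keeps the mean unchanged, so the estimator is unbiased for any $m \ge 1$.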

14 MCDWIS Methodology
A Population Control Scheme
Let $(\theta_{t,i}, w_{t,i})$ be the $i$th individual state of the population, let $n_t$ and $n'_t$ denote the current and new population sizes, let $W_{low,t}$ and $W_{up,t}$ denote the lower and upper weight control bounds, let $n_{min}$ and $n_{max}$ denote the minimum and maximum population sizes allowed by the user, and let $n_{low}$ and $n_{up}$ denote the lower and upper reference bounds of the population size.
1. (Initialization) Initialize the parameters $W_{low,t}$ and $W_{up,t}$ by $W_{low,t} = \sum_{i=1}^{n_t} w_{t,i}/n_{up}$ and $W_{up,t} = \sum_{i=1}^{n_t} w_{t,i}/n_{low}$. Set $n'_t = 0$ and $\lambda > 1$. Do steps 2-4 for $i = 1, 2, \ldots, n_t$.
2. (Pruned) If $w_{t,i} < W_{low,t}$, prune the state with probability $q = 1 - w_{t,i}/W_{low,t}$. If it is pruned, drop $(\theta_{t,i}, w_{t,i})$ from $(\theta_t, w_t)$; otherwise, update $(\theta_{t,i}, w_{t,i})$ as $(\theta_{t,i}, W_{low,t})$ and set $n'_t = n'_t + 1$.

15 MCDWIS Methodology
3. (Enriched) If $w_{t,i} > W_{up,t}$, set $d = [w_{t,i}/W_{up,t} + 1]$ and $w'_{t,i} = w_{t,i}/d$, replace $(\theta_{t,i}, w_{t,i})$ by $d$ identical states $(\theta_{t,i}, w'_{t,i})$, and set $n'_t = n'_t + d$, where $[z]$ denotes the integer part of $z$.
4. (Unchanged) If $W_{low,t} \le w_{t,i} \le W_{up,t}$, keep $(\theta_{t,i}, w_{t,i})$ unchanged, and set $n'_t = n'_t + 1$.
5. (Checking) If $n'_t > n_{max}$, set $W_{low,t} \leftarrow \lambda W_{low,t}$, $W_{up,t} \leftarrow \lambda W_{up,t}$ and $n'_t = 0$, and do steps 2-4 again for $i = 1, 2, \ldots, n_t$. If $n'_t < n_{min}$, set $W_{low,t} \leftarrow W_{low,t}/\lambda$, $W_{up,t} \leftarrow W_{up,t}/\lambda$ and $n'_t = 0$, and do steps 2-4 again for $i = 1, 2, \ldots, n_t$. Otherwise, stop.
In this scheme, $\lambda$ is required to be greater than 1, and $n_{low}$, $n_{up}$, $n_{min}$ and $n_{max}$ are required to satisfy the constraint $n_{min} < n_{low} < n_{up} < n_{max}$. With this scheme, the adaptive pruned-enriched population control scheme (APEPCS), the population size is strictly controlled to the range $[n_{min}, n_{max}]$, and the weights are adjusted to the range $[W_{low,t}, W_{up,t}]$. Therefore, the APEPCS avoids the possible overflow or extinction of a population in simulations.
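The five steps of the population control scheme can be sketched in Python as below. This is an illustrative reading of the steps, not the authors' implementation: the function name `population_control`, the default `lam=2.0`, and the representation of a population as a list of `(theta, w)` tuples are all assumptions.

```python
import random


def population_control(population, n_low, n_up, n_min, n_max, lam=2.0, rng=None):
    """Prune, enrich, or keep each weighted state; returns (new_pop, w_low, w_up)."""
    if not population:
        raise ValueError("population must be non-empty")
    rng = rng or random.Random()
    # Step 1: initialize the weight control bounds from the total weight.
    total = sum(w for _, w in population)
    w_low, w_up = total / n_up, total / n_low
    while True:
        new_pop = []
        for theta, w in population:
            if w < w_low:
                # Step 2: prune with probability q = 1 - w / w_low;
                # a surviving state has its weight raised to w_low.
                if rng.random() >= 1.0 - w / w_low:
                    new_pop.append((theta, w_low))
            elif w > w_up:
                # Step 3: split into d identical copies of weight w / d.
                d = int(w / w_up + 1)
                new_pop.extend([(theta, w / d)] * d)
            else:
                # Step 4: weight already within [w_low, w_up].
                new_pop.append((theta, w))
        # Step 5: rescale the bounds and redo steps 2-4 if the size is out of range.
        if len(new_pop) > n_max:
            w_low, w_up = w_low * lam, w_up * lam
        elif len(new_pop) < n_min:
            w_low, w_up = w_low / lam, w_up / lam
        else:
            return new_pop, w_low, w_up
```

Each pass restarts from the original population, matching the instruction to redo steps 2-4 for all $i$. Pruning preserves the weight in expectation, since a state survives with probability $w/W_{low,t}$ and then carries weight $W_{low,t}$.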

16 MCDWIS Methodology
Theorem 0.3 The APEPCS is IDIW; that is, if the joint distribution $g_t(\theta, w)$ for $(\theta_t, w_t)$ is correctly weighted with respect to $\pi(\theta \mid x)$, then after one run of the scheme, the new joint distribution $g_{t+1}(\theta', w')$ for $(\theta_{t+1}, w_{t+1})$ is also correctly weighted with respect to $\pi(\theta \mid x)$.

17 MCDWIS Methodology
A Monte Carlo Dynamically Weighted Importance Sampler
Let $W_c$ denote a dynamic weighting move switching parameter, which switches the value of $\beta_t$ between 0 and 1 depending on the value of $W_{up,t}$.
- (Move type setting) If $W_{up,t} \le W_c$, then set $\beta_t = 1$. Otherwise, set $\beta_t = 0$.
- (MCDW) Apply the Monte Carlo dynamic weighting move to the population $(\theta_t, w_t)$. The new population is denoted by $(\theta'_{t+1}, w'_{t+1})$.
- (Population Control) Apply APEPCS to $(\theta'_{t+1}, w'_{t+1})$. The new population is denoted by $(\theta_{t+1}, w_{t+1})$.
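One iteration of the sampler can be sketched as a small driver that wires the three steps together. The names here are hypothetical: `mcdw_move(theta, w, beta)` stands for a single-state MCDW transition and `control(population)` for a population control routine returning the new population and weight bounds. The comparison is assumed to be $W_{up,t} \le W_c$, which matches the weight behavior of the move: with $\beta_t = 1$ weights can only grow, so that setting is used while the weights are still small.

```python
def mcdwis_iteration(population, w_up, w_c, mcdw_move, control):
    """One MCDWIS iteration: choose beta, move every state, then control the population."""
    # Move type setting (assumed comparison: W_up <= W_c selects beta = 1).
    beta = 1.0 if w_up <= w_c else 0.0
    # MCDW: beta is the same for every individual state of the population.
    moved = [mcdw_move(theta, w, beta) for theta, w in population]
    # Population control: returns (new_population, w_low, w_up).
    return control(moved)
```

The returned upper bound would be passed back in as `w_up` on the next iteration, so the move type adapts to the current weight scale.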

18 MCDWIS Methodology
Let $(\theta_1, w_1), \ldots, (\theta_N, w_N)$ denote a series of populations generated by MCDWIS. Then, according to (6), the quantity $\mu = E_\pi \rho(\theta)$ can be estimated by
$$\hat{\mu} = \frac{\sum_{t=N_0+1}^{N} \sum_{i=1}^{n_t} w_{t,i}\, \rho(\theta_{t,i})}{\sum_{t=N_0+1}^{N} \sum_{i=1}^{n_t} w_{t,i}}, \tag{9}$$
where $N_0$ denotes the number of burn-in iterations.
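The weighted estimator with burn-in can be sketched as follows. The function name `dwis_estimate` and the data layout (a list of populations, each a list of `(theta, w)` tuples) are assumptions made for illustration.

```python
def dwis_estimate(populations, rho, burn_in=0):
    """Weighted average of rho(theta) over all populations after the burn-in, as in (9)."""
    numerator = 0.0
    denominator = 0.0
    for population in populations[burn_in:]:  # skips populations 1, ..., N_0
        for theta, w in population:
            numerator += w * rho(theta)
            denominator += w
    return numerator / denominator


if __name__ == "__main__":
    pops = [[(1.0, 1.0), (3.0, 3.0)], [(5.0, 4.0)]]
    print(dwis_estimate(pops, rho=lambda th: th))             # all populations
    print(dwis_estimate(pops, rho=lambda th: th, burn_in=1))  # first one discarded
```

Setting `burn_in=0` recovers the estimator without burn-in. Populations of different sizes are pooled by weight, not averaged per iteration.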

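19 MCDWIS Methodology
Weight Behavior Analysis
Lemma 0.1 Let $f(x \mid \theta) = p(x, \theta)/Z(\theta)$ denote the likelihood function of $x$, let $\pi(\theta)$ denote the prior distribution of $\theta$, and let $T(\cdot \mid \cdot)$ denote a proposal distribution of $\theta$. Define $p(\theta, \theta' \mid x) = p(x, \theta)\, \pi(\theta)\, T(\theta' \mid \theta)$, and $r(\theta, \theta') = \hat{R}(\theta, \theta')\, p(\theta', \theta \mid x)/p(\theta, \theta' \mid x)$ to be a Monte Carlo MH ratio, where $\hat{R}(\theta, \theta')$ denotes an unbiased estimator of $Z(\theta)/Z(\theta')$. Then $e_0 = E \log r(\theta, \theta') \le 0$, where the expectation is taken with respect to the joint density $\varphi(\hat{R})\, p(\theta, \theta' \mid x)/Z(\theta)$.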

20 MCDWIS Methodology
The weight process of MCDWIS can be characterized by the following process:
$$Z_t = \begin{cases} Z_{t-1} + \log r(\theta_{t-1}, \theta_t) - \log(d_t), & \text{if } Z_{t-1} > 0, \\ 0, & \text{if } Z_{t-1} < 0. \end{cases} \tag{10}$$
Theorem 0.4 Under mild conditions, the MCDWIS almost surely has finite moments of any order.

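21 Spatial Autologistic Models Numerical Results
Let $x = \{x_i : i \in D\}$ denote the observed binary data, where $x_i$ is called a spin and $D$ is the set of indices of the spins. The likelihood function of the autologistic model is
$$f(x \mid \theta) = \frac{1}{Z(\theta)} \exp\Big\{ \theta_a \sum_{i \in D} x_i + \frac{\theta_b}{2} \sum_{i \in D} x_i \Big( \sum_{j \in n(i)} x_j \Big) \Big\}, \quad (\theta_a, \theta_b) \in \Theta, \tag{11}$$
where $\theta = (\theta_a, \theta_b)$, $n(i)$ denotes the set of neighbors of spin $i$, the parameter $\theta_a$ determines the overall proportion of $x_i = +1$, the parameter $\theta_b$ determines the intensity of interaction between $x_i$ and its neighbors, and $Z(\theta)$ is the intractable normalizing constant defined by
$$Z(\theta) = \sum_{\text{for all possible } x} \exp\Big\{ \theta_a \sum_{j \in D} x_j + \frac{\theta_b}{2} \sum_{i \in D} x_i \Big( \sum_{j \in n(i)} x_j \Big) \Big\}.$$
The prior is specified by $(\theta_a, \theta_b) \in \Theta = [-1, 1] \times [0, 1]$.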
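To make the intractability concrete, the unnormalized density in (11) and a brute-force evaluation of $Z(\theta)$ can be sketched as below. Several choices here are assumptions not stated on the slide: spins take values in $\{-1, +1\}$, the data sit on a rectangular lattice, $n(i)$ is the set of up, down, left and right neighbors with a free boundary, and the function names are hypothetical.

```python
import math
from itertools import product


def log_p(x, theta_a, theta_b):
    """Log of the unnormalized autologistic density for a lattice x of +/-1 spins."""
    rows, cols = len(x), len(x[0])
    field = sum(sum(row) for row in x)
    interaction = 0
    for i in range(rows):
        for j in range(cols):
            neighbours = 0
            if i > 0:
                neighbours += x[i - 1][j]
            if i < rows - 1:
                neighbours += x[i + 1][j]
            if j > 0:
                neighbours += x[i][j - 1]
            if j < cols - 1:
                neighbours += x[i][j + 1]
            interaction += x[i][j] * neighbours
    # The factor 1/2 compensates for each neighbouring pair being counted twice.
    return theta_a * field + 0.5 * theta_b * interaction


def brute_force_z(rows, cols, theta_a, theta_b):
    """Z(theta) by enumerating all 2**(rows*cols) configurations; tiny lattices only."""
    total = 0.0
    for spins in product((-1, 1), repeat=rows * cols):
        x = [list(spins[r * cols:(r + 1) * cols]) for r in range(rows)]
        total += math.exp(log_p(x, theta_a, theta_b))
    return total
```

The enumeration has $2^{|D|}$ terms, so it is already out of reach for a lattice with a few dozen spins, which is why the ratio $Z(\theta)/Z(\theta')$ has to be estimated by simulation.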

22 Spatial Autologistic Models Numerical Results
Estimates comparison:
- MCDWIS estimate: $(\hat{\theta}_a, \hat{\theta}_b)$ = ( , ) with the standard error ( , ).
- Exchange algorithm: $(\hat{\theta}_a, \hat{\theta}_b)$ = ( , ) with the standard error ( , ).
- Contour Monte Carlo: ( , ) (Liang, 2007)
- Stochastic Approximation Monte Carlo: ( , ) (Liang et al., 2007)
- Monte Carlo MLE: (-0.304, 0.117) (Sherman et al., 2006)

23 Spatial Autologistic Models Numerical Results
Figure 2: US cancer mortality data. (a) True observations: the mortality map of liver and gallbladder cancers (including bile ducts) for white males during the decade. Black squares denote counties of high cancer mortality rate, and white squares denote counties of low cancer mortality rate. (b) Fitted mortality rate: fitted cancer mortality rates by the autologistic model, with the parameters replaced by their approximate Bayesian estimates. The cancer mortality rate of each county is represented by the gray level of the corresponding square.

24 Spatial Autologistic Models Numerical Results
Figure 3: Simulation results of the MCDWIS for the U.S. Cancer Mortality example: (a) time plot of population size; (b) time plot of $\beta_t$; and (c) time plot of $\log(W_{up,t})$. The dotted line in plot (c) shows the value of $\log(W_c)$. [Panels (a) to (c) plot population size, theta, and log(wup) against iteration.]

25 Spatial Autologistic Models Numerical Results
Columns: $(\theta_a, \theta_b)$; MCDWIS ($\theta_a$, $\theta_b$, T); Exchange algorithm ($\theta_a$, $\theta_b$, T); MPLE ($\theta_a$, $\theta_b$). Entries are listed row by row in the order in which they appear.
(0, 0.1): (.0025) (.0019) 5.8 (.0024) (.0018) 1.2 (.0024) (.0019)
(0, 0.2): (.0021) (.0019) 5.8 (.0020) (.0019) 2.8 (.0022) (.0022)
(0, 0.3): (.0013) (.0017) 5.8 (.0014) (.0017) 7.9 (.0016) (.0022)
(0, 0.4): (.0012) (.0020) 5.8 (.0005) (.0012) (.0012) (.0020)
(0.1, 0.1): (.0025) (.0023) 5.8 (.0025) (.0022) 1.1 (.0025) (.0023)
(0.3, 0.3): (.0105) (.0045) 5.8 (.0097) (.0043) 3.5 (.0102) (.0046)
(0.5, 0.5): (.0347) (.0122) 5.8 (.0393) (.0123)

26 MCDWIS Discussion
- Unlike other auxiliary variable MCMC algorithms, MCDWIS removes the need for perfect sampling, and thus can be applied to a wide range of problems for which perfect sampling is unavailable or very expensive.
- The MCDWIS allows for the use of Monte Carlo estimates in MCMC simulations, while still leaving the target distribution invariant under the criterion of dynamically weighted importance sampling.
- The MCDWIS can potentially be applied to Bayesian inference for missing data problems, which often involve simulating from a posterior distribution with intractable integrals.
