Randomized Quasi-Monte Carlo for MCMC

Radu Craiu, Department of Statistics, University of Toronto (fisher.utstat.toronto.edu/craiu/)
Christiane Lemieux, Department of Statistics, University of Waterloo
Third Workshop on Monte Carlo Methods, Harvard, May 2007

Outline
1. Local Search/Optimization Algorithms: MTM
2. Multiple-Try Metropolis and variations
   - Multi-distributed MTM
   - Multiple-Correlated-Try Metropolis
3. Randomized Quasi-Monte Carlo
   - Antithetic or Stratified Sampling?
   - QMC and Randomized QMC: Discussion and Examples
   - Latin Hypercube Sampling
   - Korobov Rule
   - Beyond uniformity
4. Examples
5. Bonus Algorithm: Multi-Annealed Metropolis

Local Search and Optimization Algorithms (LSOA)
- Search locally a region of the state space to sample from.
- Adapt locally to improve the efficiency of the algorithm.
- Examples of LSOA: Multiple-Try Metropolis (Liu, Liang and Wong, JASA 2000), Hit-and-Run (Chen and Schmeiser, JCGS 1993), Adaptive Direction Sampling (Gilks, Roberts and George, Stat. 1994), Delayed Rejection (Green and Mira, Biometrika 2001), Directional Metropolis (Eidsvik and Tjelmeland, Statist. & Comput. 2006).
- Idea: produce a more structured search using stratified sampling.
- Today's topic: discussion of RQMC implemented within LSOA.

Original MTM (Liu, Liang and Wong, JASA 2000)
Suppose $T$ is a proposal distribution such that $T(x \mid y) > 0 \Leftrightarrow T(y \mid x) > 0$, and $\lambda(x, y)$ is a symmetric function.
(i) Draw $K$ independent trial proposals $y_1, \dots, y_K$ from $T(\cdot \mid x_t)$. Sample one of them, $y$, with probability $p_i \propto w(y_i \mid x_t) = \pi(y_i)\, T(x_t \mid y_i)\, \lambda(x_t, y_i)$.
(ii) Generate $x_1^*, \dots, x_{K-1}^* \sim T(\cdot \mid y)$ and put $x_K^* = x_t$.
(iii) Accept $y$ with probability $\min\left\{1, \frac{\sum_{i=1}^{K} w(y_i \mid x_t)}{\sum_{i=1}^{K} w(x_i^* \mid y)}\right\}$.
Do we better explore the sample space with $K$ proposals? As long as we take advantage of the flexibility provided by the MTM.
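
A minimal sketch of one MTM transition as described above, assuming a symmetric Gaussian random-walk proposal $T(\cdot \mid x) = N(x, \sigma^2 I)$ and $\lambda \equiv 1$, so that $w(z \mid c) = \pi(z)\,T(c \mid z)$; the function names (`mtm_step`, `log_pi`) are illustrative, not from the talk.

```python
import numpy as np

def mtm_step(x, log_pi, sigma=1.0, K=5, rng=None):
    """One Multiple-Try Metropolis step with T(.|x) = N(x, sigma^2 I), lambda = 1."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x, dtype=float))

    def log_w(points, centre):
        # log w(z|centre) = log pi(z) + log T(centre|z), dropping constant terms
        sq = np.sum((points - centre) ** 2, axis=1)
        return np.array([log_pi(z) for z in points]) - sq / (2.0 * sigma ** 2)

    def logsumexp(a):
        m = a.max()
        return m + np.log(np.exp(a - m).sum())

    # (i) K independent trials from T(.|x); select one with prob proportional to w
    ys = x + sigma * rng.standard_normal((K, x.size))
    lw_y = log_w(ys, x)
    p = np.exp(lw_y - lw_y.max())
    y = ys[rng.choice(K, p=p / p.sum())]

    # (ii) reference points x*_1..x*_{K-1} ~ T(.|y), and x*_K = x_t
    xs = np.vstack([y + sigma * rng.standard_normal((K - 1, x.size)), x])

    # (iii) accept y with probability min{1, sum_i w(y_i|x) / sum_i w(x*_i|y)}
    log_ratio = logsumexp(lw_y) - logsumexp(log_w(xs, y))
    return y if np.log(rng.uniform()) < log_ratio else x
```

For instance, iterating `x = mtm_step(x, lambda z: -0.5 * np.sum(z ** 2), K=5)` targets a standard normal.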

Multi-distributed MTM (MD-MTM)
- The proposals do not have to be identically distributed: $y_j \sim T_j(\cdot \mid x)$ for $1 \le j \le K$.
- If $y = y_{j_0}$ is selected, then put $x_{j_0}^* = x_t$ and sample $x_j^* \sim T_j(\cdot \mid y)$ for $j \ne j_0$. This may accommodate different sampling regimes.
- Example: $\pi(x) = 0.3\,N((-20,-20)^T, \Sigma_1) + 0.4\,N((0,0)^T, \Sigma_2) + 0.3\,N((20,20)^T, \Sigma_3)$, where $\Sigma_1 = \mathrm{diag}(0.5, 0.5)$, $\Sigma_2 = \mathrm{diag}(2, 2)$ and $\Sigma_3 = \mathrm{diag}(0.4, 0.4)$.
- Use MD-MTM for a random walk Metropolis with Gaussian proposals and variances $\sigma \in \{5, 50, 100, 150, 200\}$. Compare with MTM for a random walk Metropolis constructed with only one of the $\sigma$'s.
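
For the mixture example, a small helper giving the log target (up to an additive constant); note that the first mode is written as $(-20,-20)^T$, following the reconstruction above.

```python
import numpy as np

def log_mixture(x):
    """Log-density, up to an additive constant, of the three-component
    Gaussian mixture target in the MD-MTM example."""
    means = np.array([[-20.0, -20.0], [0.0, 0.0], [20.0, 20.0]])
    variances = np.array([0.5, 2.0, 0.4])          # Sigma_i = diag(v_i, v_i)
    weights = np.array([0.3, 0.4, 0.3])
    sq = np.sum((np.asarray(x, dtype=float) - means) ** 2, axis=1)
    log_comp = np.log(weights) - np.log(2 * np.pi * variances) - sq / (2 * variances)
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())
```

This plugs directly into the `mtm_step` sketch above, e.g. `x = mtm_step(x, log_mixture, sigma=50.0, K=5)` for one of the single-scale comparisons.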

MD-MTM
[Figure: sampler output for the mixture example, with panels labelled by the proposal scale sigma and the corresponding acceptance rates; numerical values not preserved in the transcription.]

Multiple-Correlated-Try Metropolis (MCTM)
- The proposals do not have to be independent.
- MCTM (I): Suppose we sample $K$ trial proposals $y_1, \dots, y_K$ from $T(y_1, \dots, y_K \mid x_t)$ where $\int T(y_1, \dots, y_K \mid x_t)\, dy_{[-i]} = T(y_i \mid x_t)$ for $1 \le i \le K$. The algorithm proceeds as in the independent case with one exception.
- MCTM (II): Draw $(x_1^*, \dots, x_{K-1}^*)$ from the conditional transition kernel $T(x_1^*, \dots, x_{K-1}^* \mid y, x_K^* = x_t)$ and let $x_K^* = x_t$.
- We have freedom in choosing $T$ as long as we can perform the conditional sampling step.
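
A sketch of the two MCTM sampling steps for an exchangeable Gaussian kernel with per-coordinate correlation $\rho = -1/(K-1)$ (the antithetic choice discussed on the following slides); at this extreme $\rho$ the conditional covariance is only positive semi-definite, so its square root is taken by eigendecomposition. Function names are illustrative.

```python
import numpy as np

def antithetic_trials(x, sigma, K, rng=None):
    """MCTM (I): K trial proposals, each marginally N(x, diag(sigma^2)), with
    pairwise correlation rho = -1/(K-1) in every coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), x.shape)
    z = rng.standard_normal((K, x.size))
    # centring iid normals gives exactly correlation -1/(K-1) after rescaling
    return x + sigma * (z - z.mean(axis=0)) * np.sqrt(K / (K - 1.0))

def conditional_antithetic(y, x_t, sigma, K, rng=None):
    """MCTM (II): draw (x*_1,...,x*_{K-1}) from the same kernel centred at y,
    conditionally on the K-th component being fixed at x_t."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.atleast_1d(np.asarray(y, dtype=float))
    x_t = np.atleast_1d(np.asarray(x_t, dtype=float))
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), y.shape)
    rho = -1.0 / (K - 1)
    R = (1 - rho) * np.eye(K) + rho * np.ones((K, K))      # exchangeable correlation
    Raa, Rab = R[:-1, :-1], R[:-1, -1:]
    cond = Raa - Rab @ Rab.T                               # conditional corr., R_bb = 1
    w, V = np.linalg.eigh(cond)
    S = V * np.sqrt(np.clip(w, 0.0, None))                 # PSD matrix square root
    z = rng.standard_normal((K - 1, y.size))
    mean = y + rho * (x_t - y)                             # Gaussian conditioning
    return mean + sigma * (S @ z)
```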

Antithetic or Stratified Sampling?
- $Y_j = (Y_{j1}, \dots, Y_{jd}) \in \mathbb{R}^d$ for $1 \le j \le K$.
- Say $K = 2$, $\mathrm{Var}(Y) = \Sigma = \mathrm{diag}(\sigma_1^2, \dots, \sigma_d^2)$ and $\mathrm{Cov}(Y_{1h}, Y_{2h}) = \sigma_h^2 \rho_h$. Then
$$E\big[\|Y_1 - Y_2\|^2\big] = \sum_{h=1}^{d} E\big[(Y_{1h} - Y_{2h})^2\big] = 2\sum_{h=1}^{d} \sigma_h^2 (1 - \rho_h).$$
- One possible choice is $\rho_h = -1/(K-1)$ (the covariance matrix needs to be positive definite!). The optimal choice of $\rho_h$ depends on $\sigma_h$, $1 \le h \le d$.
- The conditional sampling needed for MCTM is straightforward in the case of Gaussian proposals.
- If $Y_j = \psi(x_t, U_j)$ with $\psi$ increasing in $U_j$, then we need to use $U$'s which are negatively associated (NA; Craiu and Meng, Ann. Statist. 2005).
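
A quick Monte Carlo check of the displayed identity for $K = 2$, comparing an independent pair with a perfectly antithetic pair ($\rho_h = -1$); the pairs are centred at zero for simplicity and the $\sigma_h$ values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.array([1.0, 2.0, 0.5])              # illustrative sigma_h values
n, d = 200_000, sigma.size

z = rng.standard_normal((n, d))
pairs = {
    ("independent", 0.0): (sigma * z, sigma * rng.standard_normal((n, d))),
    ("antithetic", -1.0): (sigma * z, -sigma * z),
}
for (label, rho), (y1, y2) in pairs.items():
    mc = np.mean(np.sum((y1 - y2) ** 2, axis=1))       # Monte Carlo E||Y1 - Y2||^2
    formula = 2 * np.sum(sigma ** 2 * (1 - rho))       # 2 * sum_h sigma_h^2 (1 - rho_h)
    print(f"{label:12s}  MC estimate {mc:8.3f}   formula {formula:8.3f}")
```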

Quasi-Monte Carlo (QMC)
- QMC is a de-randomized MC: the samples are constructed deterministically.
- QMC: highly uniform and stratified sampling in the unit hypercube.
- Interest: high uniformity, a.k.a. low discrepancy.
- For $v \in (0,1)^d$, let $B(v) = \{u \in (0,1)^d : 0 < u_j \le v_j,\ 1 \le j \le d\}$.
- For a point set $S_n$ define $\alpha(S_n, v) = \mathrm{card}\{S_n \cap B(v)\}$.
- Star discrepancy: $D_n^*(S_n) = \sup_{v \in (0,1)^d} \big| v_1 \cdots v_d - \alpha(S_n, v)/n \big|$.

Randomized QMC (RQMC)
- Efficiency can be connected to discrepancy.
- Koksma-Hlawka theorem: with $\mu = \int_{[0,1]^d} f(u)\, du$,
$$\Big| \mu - \frac{1}{n} \sum_{u \in S_n} f(u) \Big| \le D_n^*(S_n)\, V(f).$$
- Both $V(f)$ and $D_n^*$ are difficult to compute in general.
- Randomized versions of QMC (RQMC) algorithms are used to establish error estimates via Monte Carlo simulation (Art Owen, Stanford; Christiane Lemieux, Waterloo; Pierre L'Ecuyer, Montréal).
- The randomization is performed so that it preserves the low discrepancy of the point set, and it makes QMC point sets suitable for MC since the marginals are uniform.
- RQMC and MCMC: Owen and Tribble (PNAS, 2005); Lemieux and Sidorsky (Math. & Comput. Model., 2006).
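
A generic sketch of the randomization idea: apply $m$ independent uniform shifts modulo 1 to a low-discrepancy point set and use the replicate averages to get an error estimate by plain Monte Carlo. The function name and the choice $m = 25$ are illustrative.

```python
import numpy as np

def rqmc_estimate(f, points, m=25, rng=None):
    """Randomly shifted QMC: each shifted point is uniform on (0,1)^d, so the
    m replicate averages are i.i.d. unbiased estimates of mu = int f(u) du."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    reps = np.empty(m)
    for r in range(m):
        shifted = (points + rng.uniform(size=points.shape[1])) % 1.0
        reps[r] = np.mean([f(u) for u in shifted])
    return reps.mean(), reps.std(ddof=1) / np.sqrt(m)
```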

Latin Hypercube Sampling (LHS)
- Simultaneous stratification of the univariate marginals of $U \sim \mathrm{Uniform}(0,1)^d$.
- Construction. For $1 \le i \le K$, $1 \le j \le d$:
  i) $V_{ij} \sim \mathrm{Uniform}(0,1)$, i.i.d.;
  ii) $\{\tau_j,\ 1 \le j \le d\}$ are independent permutations of $\{0, \dots, K-1\}$;
$$U_{ij} = \frac{\tau_j(i-1) + V_{ij}}{K}.$$
- $U^{(i)}$ is the $i$th row of the matrix $U$.
- Given $U^{(i)} = (U_{i1}, \dots, U_{id}) \in (0,1)^d$, $1 \le i \le K$, we can use:
  a) for $1 \le j \le d$ fixed: $\{U_{1j}, \dots, U_{Kj}\}$ as a set of NA Uniform(0,1) random variables;
  b) $(U^{(1)}, \dots, U^{(K)})$ as a stratified sample of $(0,1)^d$.
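
A direct transcription of the LHS construction above; `latin_hypercube` is an illustrative name.

```python
import numpy as np

def latin_hypercube(K, d, rng=None):
    """K x d Latin hypercube sample: U_ij = (tau_j(i-1) + V_ij) / K with
    independent column permutations tau_j of {0, ..., K-1} and V_ij ~ U(0,1)."""
    rng = np.random.default_rng() if rng is None else rng
    V = rng.uniform(size=(K, d))
    tau = np.column_stack([rng.permutation(K) for _ in range(d)])
    return (tau + V) / K
```

Each column contains exactly one value in every interval $[m/K, (m+1)/K)$, which is what makes $U_{1j}, \dots, U_{Kj}$ negatively associated (use a)), while the rows form the stratified sample of use b).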

LHS and Gaussian sampling: K = 64, d = 2
[Figure: two panels of bivariate normal draws, "Independent" vs. "LHS", with axes x and y.]

LHS in MTM
- $V_i \sim U(0,1)$, $U_i = \frac{i + V_i}{K}$, $Y_i = F_{X_t}^{-1}(U_i)$ for $0 \le i \le K-1$ (no permutation is needed!).
- If $Y = Y_{i_0}$ is selected: intuitively, since $Y$ has been generated using a uniform from the interval $\big(\frac{i_0}{K}, \frac{i_0+1}{K}\big)$, we want to sample the reference points $x^*$ using all the $K-1$ uniforms which are NOT in $\big(\frac{i_0}{K}, \frac{i_0+1}{K}\big)$.
  i) Put $j_0 = [K F_y(X_t)]$ (where $[\cdot]$ is the integer part) and construct a random permutation $\tau$ such that $\tau(j_0) = i_0$.
  ii) Put $x^*_{j_0} = X_t$, and for $i \ne j_0$ set $U^*_i = \frac{\tau(i) + W_i}{K}$ and $x^*_i = F_y^{-1}(U^*_i)$, with $W_i \sim \mathrm{Uniform}(0,1)$ independent of the $V$'s.
- One can get away without permuting at all!
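
A sketch of the two sampling steps above for a one-dimensional Gaussian proposal, writing $F_x$ for the CDF of $N(x, \sigma^2)$; only the sampling mechanics are shown (selection and acceptance are as in the MTM/MCTM slides), and the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def stratified_trials(x, sigma, K, rng=None):
    """Forward step: Y_i = F_x^{-1}((i + V_i) / K), one uniform per stratum,
    no permutation needed."""
    rng = np.random.default_rng() if rng is None else rng
    U = (np.arange(K) + rng.uniform(size=K)) / K
    return x + sigma * norm.ppf(U)

def stratified_references(y, x_t, sigma, K, i0, rng=None):
    """Backward step: put x_t in the stratum j0 it occupies under F_y, force a
    random permutation tau to satisfy tau(j0) = i0, and fill the remaining
    K-1 strata (none of which is i0) with fresh uniforms W_i."""
    rng = np.random.default_rng() if rng is None else rng
    j0 = min(int(K * norm.cdf((x_t - y) / sigma)), K - 1)
    tau = rng.permutation(K)
    swap = int(np.where(tau == i0)[0][0])
    tau[swap], tau[j0] = tau[j0], i0                 # now tau[j0] == i0
    xs = np.empty(K)
    xs[j0] = x_t
    others = np.array([i for i in range(K) if i != j0])
    W = rng.uniform(size=K - 1)
    xs[others] = y + sigma * norm.ppf((tau[others] + W) / K)
    return xs
```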

Korobov Rules
- K points in $[0,1]^d$ are generated using a generator $a \in \{1, \dots, K-1\}$:
$$S_K = \Big\{ \tfrac{i}{K}\,(1,\ a,\ a^2 \bmod K,\ \dots,\ a^{d-1} \bmod K) \bmod 1 \ :\ 0 \le i \le K-1 \Big\}.$$
- Bad choices: $a$ and $K$ not relatively prime, or $a = 1$.
- The point set is fully projection regular, i.e. any lower-dimensional projection contains K distinct points.
- To randomize, add a vector $U_K \sim \mathrm{Uniform}(0,1)^d$ modulo 1, i.e. $S_K^R = (S_K + U_K) \bmod 1$.
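
A sketch of the Korobov construction with the random shift; `korobov` is an illustrative name.

```python
import numpy as np

def korobov(K, d, a, rng=None, randomize=True):
    """Korobov rule: points (i/K) * (1, a, a^2 mod K, ..., a^{d-1} mod K) mod 1
    for i = 0..K-1, optionally randomized by a single uniform shift modulo 1."""
    rng = np.random.default_rng() if rng is None else rng
    gen = np.array([pow(a, j, K) for j in range(d)])   # (1, a, a^2, ...) mod K
    pts = (np.outer(np.arange(K), gen) / K) % 1.0
    if randomize:
        pts = (pts + rng.uniform(size=d)) % 1.0
    return pts
```

For instance, `korobov(128, 2, 47)` gives the $N = 128$, $a = 47$ point set of the next slide, and its output can be fed to the `rqmc_estimate` sketch above.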

Example: Korobov rule and Gaussian sampling
[Figure: two panels of bivariate normal draws, "Independent" vs. "Randomized Korobov", with N = 128, a = 47; axes x and y.]

Going beyond uniformity
- Sometimes we want to "favor" regions in the hypercube, but uniformity is lost once we transform the QMC point set.
- Let $g : [0,1] \to [0,1]$ be a bijection and $\tilde U = g(U) = (g(U_1), \dots, g(U_d))$.
- If $Y \sim T(\cdot \mid X)$ is generated as $Y = \psi(X, U)$, then $\tilde Y = \psi(X, \tilde U)$ satisfies $\tilde Y \sim \tilde T(\cdot \mid X)$.
- If $\lambda(X, Y) = T(Y \mid X)/\tilde T(Y \mid X)$ is symmetric, i.e. $\lambda(X, Y) = \lambda(Y, X)$, then MCTM can be implemented without computing $\tilde T$.
- Example:
$$g(u) = \frac{\sin[\pi(u - 0.5)] + 1}{2}.$$
  i) $g$ is bijective;
  ii) $g(x) = x$ for $x \in \{0, 1/2, 1\}$, and $g(u) + g(1-u) = 1$ for $u \in (0,1)$.
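
The transformation from the example, coded directly; the checks mirror properties i) and ii). The transformed uniform $g(U)$ has the arcsine density $1/(\pi\sqrt{\tilde u(1-\tilde u)})$, which is consistent with the extra factor in the reconstructed $\tilde T$ on the next slide.

```python
import numpy as np

def g(u):
    """The bijection of [0,1] from the example: g(u) = (sin(pi (u - 0.5)) + 1) / 2."""
    return (np.sin(np.pi * (np.asarray(u, dtype=float) - 0.5)) + 1.0) / 2.0

# property checks: g fixes 0, 1/2 and 1, and g(u) + g(1 - u) = 1 on (0, 1)
u = np.linspace(0.0, 1.0, 101)
assert np.allclose(g(np.array([0.0, 0.5, 1.0])), [0.0, 0.5, 1.0])
assert np.allclose(g(u) + g(1.0 - u), 1.0)
```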

Example
[Figure: two panels of points, "Randomized Korobov" vs. "Transformed Korobov"; axes x and y.]

Gaussian Proposals
Suppose $T(y \mid x)$ is a multivariate Gaussian with no correlation. Then
$$T(y \mid x) = \prod_{j=1}^{d} \frac{e^{-(y_j - x_j)^2/2\sigma_j^2}}{\sqrt{2\pi\sigma_j^2}},
\qquad
\tilde T(y \mid x) = \prod_{j=1}^{d} \frac{e^{-(y_j - x_j)^2/2\sigma_j^2}}{\sqrt{2\pi\sigma_j^2}}\;
\pi^{-1}\Big[\Phi\Big(\tfrac{y_j - x_j}{\sigma_j}\Big)\Big(1 - \Phi\Big(\tfrac{y_j - x_j}{\sigma_j}\Big)\Big)\Big]^{-1/2},$$
and
$$\lambda(y, x) = \prod_{j=1}^{d}
\frac{1}{\pi^{-1}\Big[\Phi\big(\tfrac{x_j - y_j}{\sigma_j}\big)\Big(1 - \Phi\big(\tfrac{x_j - y_j}{\sigma_j}\big)\Big)\Big]^{-1/2}}
= \lambda(x, y).$$
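
The densities above coded as written, with $\lambda(y, x) = T(x \mid y)/\tilde T(x \mid y)$; the placement of the tilde in the ratio is part of the reconstruction, so treat this as a sketch rather than the talk's exact implementation.

```python
import numpy as np
from scipy.stats import norm

def log_T(y, x, sigma):
    """log T(y|x): independent Gaussian proposal with scales sigma_j."""
    z = (np.asarray(y, dtype=float) - np.asarray(x, dtype=float)) / sigma
    return float(np.sum(norm.logpdf(z) - np.log(sigma)))

def log_T_tilde(y, x, sigma):
    """log T~(y|x): density of y_j = x_j + sigma_j * Phi^{-1}(g(U_j)), U_j ~ U(0,1)."""
    z = (np.asarray(y, dtype=float) - np.asarray(x, dtype=float)) / sigma
    p = norm.cdf(z)
    return log_T(y, x, sigma) - float(np.sum(np.log(np.pi) + 0.5 * np.log(p * (1 - p))))

def lam(y, x, sigma):
    """lambda(y, x) = T(x|y) / T~(x|y), symmetric in its arguments."""
    z = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) / sigma
    p = norm.cdf(z)
    return float(np.prod(np.pi * np.sqrt(p * (1 - p))))
```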

Bimodal density
Use MCTM combined with random-ray Monte Carlo to sample from (Gelman and Meng, Amer. Statist. 1991)
$$\pi(x_1, x_2) \propto \exp\{-(9x_1^2 x_2^2 + x_1^2 + x_2^2 - 8x_1 - 8x_2)/2\}.$$
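
The reconstructed target coded for use with the earlier MTM sketch; the coefficients are as read off the garbled slide, so they are part of the reconstruction.

```python
def log_pi_gm(x):
    """Log of the unnormalised Gelman-Meng bimodal density given above."""
    x1, x2 = x
    return -(9.0 * x1 ** 2 * x2 ** 2 + x1 ** 2 + x2 ** 2 - 8.0 * x1 - 8.0 * x2) / 2.0
```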

Efficiency Comparison
- At each iteration a random direction $v$ is generated.
- Along direction $v$ we sample proposals $y_1, \dots, y_k$ around the current state $x_t$ using $y_i = x_t + r_i\, v$ with $r_i \sim \mathrm{Uniform}(-\sigma, \sigma)$.
- Table: values of the MSE reduction factor $R = \mathrm{MSE}_{\mathrm{anti}}/\mathrm{MSE}_{\mathrm{ind}}$ across combinations of $\sigma$ and $k$ (numerical entries not preserved in the transcription).
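
The random-ray proposal described above, as a small helper; `random_ray_trials` is an illustrative name.

```python
import numpy as np

def random_ray_trials(x, sigma, K, rng=None):
    """K proposals along one random direction v: y_i = x_t + r_i * v with
    r_i ~ Uniform(-sigma, sigma)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x, dtype=float))
    v = rng.standard_normal(x.size)
    v /= np.linalg.norm(v)                 # uniformly distributed direction
    r = rng.uniform(-sigma, sigma, size=K)
    return x + np.outer(r, v)
```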

Lupus Data
- Lupus data (van Dyk and Meng, JCGS 2001): 55 patients, 2 covariates; logistic model.
- The posterior density is proportional to
$$\pi(\beta \mid x, y) \propto \prod_{j=0}^{2} e^{-0.5\,\beta_j^2/\sigma_0^2}\ \prod_{i=1}^{55} \left[\frac{\exp(x_i^T\beta)}{1 + \exp(x_i^T\beta)}\right]^{y_i} \left[\frac{1}{1 + \exp(x_i^T\beta)}\right]^{1 - y_i},$$
  with independent Gaussian priors on the $\beta_j$'s (the prior variance $\sigma_0^2$ is not recoverable from the transcription).
- Sample using MCTM with antithetically coupled proposals, or stratified proposals via a randomized Korobov point set shifted using the transformation $g$.

Efficiency comparison for Lupus Data
- Table: values of $R$ for $\beta_1$ and $p_{25} = 1_{\{\beta_1 > 25\}}$ in the logit example, under antithetic and QMC proposals, across combinations of $k$ and $\sigma$ (numerical entries not preserved in the transcription).
- Table: computation times for $k = 8$ across sample sizes, comparing MTM, MCTM (A) and MCTM (S) (numerical entries not preserved in the transcription).

Multi-Annealed Metropolis (MAM)
- Inspired by Neal's (Stat. & Comput., 1996) tempered transition method.
- Uses a series of distributions $\{\pi_T\}_{T \in \{1 = T_0, T_1, \dots, T_M\}}$ constructed between the distribution of interest, $\pi$, and $\pi_{T_M}$.
- Relies on a "path" that crosses all the intermediate distributions.
- If $1 = T_0 < T_1 < T_2 = T$ and $Q_{T_i}$ is a proposal distribution for $\pi_{T_i} \propto \pi^{1/T_i}$, suppose that $Q_{T_i}(X \mid Y) = Q_{T_i}(Y \mid X)$ (e.g., random walk Metropolis).

MAM - The algorithm
Path across the levels: $X_t$ (at $\Pi_0$) $\to X_1$ (at $\Pi_1$) $\to X_2$ (at $\Pi_2$) $\to Y_2$ (at $\Pi_1$) $\to Y_1$ (at $\Pi_0$).
Given the chain is in state $X$:
i) Sample $X_1 \sim Q_1(\cdot \mid X)$, $X_2 \sim Q_2(\cdot \mid X_1)$, $Y_2 \sim Q_2(\cdot \mid X_2)$, $Y_1 \sim Q_1(\cdot \mid Y_2)$.
ii) Accept $Y_1$ with probability
$$\min\left\{1,\ \frac{\pi_0(Y_1)}{\pi_0(X)} \cdot \frac{\pi_2(X_1)\,\pi_1(Y_2)}{\pi_1(X_1)\,\pi_2(Y_2)}\right\}.$$
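
A sketch of one MAM transition following the reconstructed acceptance ratio above, with two intermediate tempered targets $\pi_{T_i} \propto \pi^{1/T_i}$ and symmetric Gaussian random-walk proposals $Q_i$; the default temperature ladder and proposal scales are placeholders, not the values used in the talk.

```python
import numpy as np

def mam_step(x, log_pi, temps=(1.0, 5.0, 20.0), scales=(3.0, 8.0), rng=None):
    """One Multi-Annealed Metropolis step: path X -> X1 -> X2 -> Y2 -> Y1 with
    Q_1 used at temperature T_1 and Q_2 at T_2, accepted with the ratio
    pi_0(Y1)/pi_0(X) * pi_2(X1) pi_1(Y2) / (pi_1(X1) pi_2(Y2))."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x, dtype=float))
    T0, T1, T2 = temps
    s1, s2 = scales

    def log_pi_T(z, T):                    # log pi_T(z) = log pi(z) / T + const
        return log_pi(z) / T

    # forward path up and down the temperature ladder
    x1 = x + s1 * rng.standard_normal(x.size)
    x2 = x1 + s2 * rng.standard_normal(x.size)
    y2 = x2 + s2 * rng.standard_normal(x.size)
    y1 = y2 + s1 * rng.standard_normal(x.size)

    log_r = (log_pi_T(y1, T0) - log_pi_T(x, T0)
             + log_pi_T(x1, T2) + log_pi_T(y2, T1)
             - log_pi_T(x1, T1) - log_pi_T(y2, T2))
    return y1 if np.log(rng.uniform()) < log_r else x
```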

Example
[Figure: MAM(4) runs with T = (2, 8, 10, 15, 20, 30) and sigma = (2, 3, 4, 5, 8, 15).]

Multiple-try extension of the MAM
- Two parallel paths are generated across the levels: $(X_1, X_2, Y_2, Y_1)$ and $(X_1', X_2', Y_2', Y_1')$, with $X_t$, $Y_1$ and $Y_1'$ at $\Pi_0$, the $X_1$'s and $Y_2$'s at $\Pi_1$, and the $X_2$'s at $\Pi_2$.
- MTM can be implemented by generating multiple paths and compensating with the same number of reverse paths. We have freedom in choosing the "weight of a path".
- MAM pros: increases the acceptance rate more than 10-fold; does not require running the parallel chains all the time. Cons: requires symmetric proposals.

Negative Association for Multiple-Try MAM
- Suppose that $X_i \sim Q_i(\cdot \mid X_{i-1})$ is generated as $X_i = \psi_i(X_{i-1}, U_i)$ with $U_i \sim \mathrm{Uniform}[0,1]^d$ and $\psi_i$ monotone in each component of $U_i$.
- Then $Y_1 = \psi_1(\psi_2(\psi_2(\psi_1(X, U_1), U_2), V_2), V_1)$ and $Y_1' = \psi_1(\psi_2(\psi_2(\psi_1(X, U_1'), U_2'), V_2'), V_1')$.
- The paths do not need to be generated independently: if $(U_i, U_i')$ and $(V_i, V_i')$ are negatively associated, then $\mathrm{Corr}(Y_1, Y_1') < 0$, so the average distance between the paths is larger than under independence.
