Adaptive Monte Carlo Methods for Numerical Integration

Adaptive Monte Carlo Methods for Numerical Integration
Mark Huber (Department of Mathematical Sciences, Claremont McKenna College) and Sarah Schott (Department of Mathematics, Duke University)
8 March, 2011

Integration

Numerical integration
The central problem: approximate
$$I = \int_{\Omega} f(\vec{x})\, d\vec{x}, \qquad f(\vec{x}) \ge 0 \text{ for all } \vec{x} \in \Omega \subseteq \mathbb{R}^n.$$

Applications
Statistical physics: Ising model, Potts model, Gibbs distributions. Dimension = number of interacting particles.
Statistics: frequentist p-values, Bayesian posterior normalizing constants. Dimension = number of data points.
Combinatorial optimization: counting matchings in a graph, counting independent sets. Dimension = number of edges or nodes in the graph.

1-D: easy
Fortunately, in 1-D numerical integration is easy: use rectangles, the trapezoidal rule, Simpson's rule, et cetera.

Higher dimensions, not so easy
In 1-D, k intervals; in 2-D, k² squares.

Running time exponential in number of dimensions
In d dimensions you need k^d evaluations: exponential growth in d, called "the curse of dimensionality."
Monte Carlo methods use randomness to approximate the integral. They converge slowly (error ∝ 1/√(number of evaluations)), but the error is independent of the number of dimensions!
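To make the contrast concrete, here is a minimal sketch (my own illustration, not from the talk) of plain Monte Carlo integration over the unit cube in d dimensions: a tensor-product grid would need k^d evaluations, while the Monte Carlo error shrinks like 1/√n regardless of d. The integrand and sample size are arbitrary choices for the example.

```python
import math
import random

def mc_integrate(f, d, n, rng=random.Random(0)):
    """Estimate the integral of f over [0,1]^d using n uniform samples."""
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        total += f(x)
    return total / n

# Example integrand: a product of cosines, whose exact integral over [0,1]^d is sin(1)^d.
d = 10
f = lambda x: math.prod(math.cos(xi) for xi in x)
print(mc_integrate(f, d, n=100_000), math.sin(1.0) ** d)  # estimate vs. exact value
```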

Many ways to go from samples to integrals
Some Monte Carlo methods: (sequential) importance sampling, bridge sampling, path sampling, nested sampling, the harmonic mean estimator.
Problem: these methods have unknown variance! SIS and HME variance can easily be infinite, and the estimated variance itself has unknown variance.

Our result
We have a new method for approximating
$$I = \int_{\Omega} f(\vec{x})\, d\vec{x}, \qquad f(\vec{x}) \ge 0 \text{ for all } \vec{x} \in \Omega,$$
with an estimate $\hat{I}$ such that
$$P\!\left(\frac{1}{1+\epsilon} \le \frac{\hat{I}}{I} \le 1+\epsilon\right) > 1 - \delta$$
in time $O((\log I)^2\, \epsilon^{-2} \ln \delta^{-1})$.

Classic Monte Carlo Methods

Abstract problem
Create the measure of a set A using an integral:
$$\mu(A) = \int_A f(\vec{x})\, d\vec{x}.$$

Acceptance/Rejection
Finding the measure of a set. Green area = B, black area = A.
$$\mu(B) = \mu(A) \cdot \frac{\mu(B)}{\mu(A)}.$$

Acceptance/Rejection, a.k.a. "shoot at it randomly."
Best estimate: µ̂(B) = (7/2)·µ(A), where A is the black rectangle inside the region.

Analysis of the Accept/Reject algorithm
Fire n times at the box; say you hit A a total of H times.
Estimate: µ̂(B) = µ(A)·n/H.
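As an illustration of the estimator µ̂(B) = µ(A)·n/H, here is a small sketch (my own toy example, not the talk's dartboard): B is the unit disk, A is a rectangle of known area inside it, and we sample uniformly from B and count how often we land in A.

```python
import math
import random

rng = random.Random(1)

def sample_from_B():
    """Uniform point in the unit disk (the 'big' set B)."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            return x, y

def in_A(x, y):
    """A = the rectangle [-0.5, 0.5] x [-0.25, 0.25], measure 0.5, contained in B."""
    return -0.5 <= x <= 0.5 and -0.25 <= y <= 0.25

mu_A = 0.5
n = 200_000
H = sum(in_A(*sample_from_B()) for _ in range(n))
mu_B_hat = mu_A * n / H          # estimate of mu(B); the exact value is pi
print(mu_B_hat, math.pi)
```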

How well does it work?
True answer: 3.0933...
After 10 iterations: 5.04
After 10³ iterations: 2.8656
After 10⁵ iterations: 3.0870
After 10⁷ iterations: 3.0935
About a factor of 100 more samples per extra digit.

Bounding error
To get relative error below ε with probability at least 1 − δ, analyze the tails of the binomial distribution to show that
$$n \ge \frac{2(1+\epsilon)}{p\,\epsilon^2} \ln\frac{1}{\delta}, \qquad p = \frac{\mu(A)}{\mu(B)}$$
suffices. The (1/ε²)·ln(1/δ) factor is often called Monte Carlo error; it is the best you can do in general. So concentrate on improving the 1/p part.
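For a sense of scale, a quick numerical check (my own sketch) of the sample-size bound shows how the 1/p factor dominates:

```python
import math

def ar_samples_needed(p, eps, delta):
    """n >= 2(1 + eps) / (p * eps^2) * ln(1/delta) for plain acceptance/rejection."""
    return math.ceil(2 * (1 + eps) / (p * eps ** 2) * math.log(1 / delta))

# With p = 1e-6, a 10% relative error, and 95% confidence:
print(ar_samples_needed(p=1e-6, eps=0.1, delta=0.05))   # roughly 6.6e8 samples
```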

When p small, runtime large
Usually p is exponentially small in the dimension of the problem.

Running times
Acceptance/Rejection: 2·(1/p)
Product estimator [1]: [192·log(1/p)]²
New method, TPA: 2·[log(1/p)]²

TPA

TPA
Idea: the product estimator... plus an idea from nested sampling.
Result: a product estimator with a random cooling schedule, whose output can be analyzed exactly (like A/R).

The Tootsie Pop Algorithm
What is a Tootsie Pop? A hard candy lollipop with a Tootsie Roll (chewy chocolate) at the center. In 1970, Mr. Owl was asked the question: how many licks does it take to get to the center of a Tootsie Pop?

List of ingredients of TPA
(a) A measure space (Ω, F, µ).
(b) Two measurable sets: the center B' and the shell B, with B' ⊂ B.
(c) A family of sets {A(β)} where (1) β' < β implies A(β') ⊂ A(β), and (2) µ(A(β)) is continuous in β.
(d) Two special values β_B' and β_B with A(β_B') = B' and A(β_B) = B.

Example of nested sets
A(β) = all points within distance β of the center.

Idea behind TPA
1. β ← β_B
2. Repeat
3.   Draw X ~ µ(A(β))
4.   β ← inf{β' : X ∈ A(β')}
5. Until β ≤ β_B'
Example run (the β value after each step): step 1: 1.72, step 2: 0.92, step 3: 0.09.
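Here is a minimal sketch of one TPA run on a toy example of my own (nested disks in the plane under area measure, not one of the talk's examples). With A(β) the disk of radius β, drawing X uniformly from A(β) and setting β to |X| implements steps 3-4 exactly, since for a disk |X| = β·√U with U ~ Unif([0,1]).

```python
import math
import random

rng = random.Random(2)

def tpa_run(beta_shell=2.0, beta_center=0.1):
    """One TPA run; returns the number of steps taken before crossing into the center."""
    beta, steps = beta_shell, 0
    while True:
        beta *= math.sqrt(rng.random())   # draw X ~ Unif(A(beta)) and set beta <- |X|
        if beta <= beta_center:
            return steps
        steps += 1

# The returned count is Poisson with mean ln(Z(beta_shell)/Z(beta_center))
# = 2 * ln(2.0 / 0.1), since Z(beta) = pi * beta^2 here.
runs = [tpa_run() for _ in range(10_000)]
print(sum(runs) / len(runs), 2 * math.log(2.0 / 0.1))
```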

How much is shaved off at each step?
Notation: Z(β) := µ(A(β)).
Lemma. Say X ~ µ(A(β)) and β' = min{β'' : X ∈ A(β'')}. Then Z(β')/Z(β) ~ Unif([0, 1]).
Proof by picture: let b satisfy Z(b)/Z(β) = 1/3. Then P(Z(β')/Z(β) ≤ 1/3) = P(X ∈ A(b)) = 1/3.

Each step removes on average 1/2 the measure
Continue until you reach the center. Original measure: µ(B) = Z(β_B). After k steps the measure is Z(β_k) = Z(β_B)·r_1·r_2⋯r_k, where the r_i are iid Unif([0, 1]). Recall β_B' is the index of the center. Let ℓ be the number of steps until we hit the center:
ℓ := min{k : Z(β_B)·r_1⋯r_k < Z(β_B')} − 1.
Question: what is the distribution of ℓ?

Logarithms
Recall that if U ~ Unif([0, 1]), then −ln U ~ Exp(1). Since Z(β_k) = Z(β_B)·r_1 r_2⋯r_k with the r_i iid Unif([0, 1]), consider the points
P_k = −ln( Z(β_k) / Z(β_B) ) = e_1 + e_2 + ⋯ + e_k, where the e_i are iid Exp(1).

The Poisson point process
(Figure: the points ln Z(β_B) > ln Z(β_1) > ln Z(β_2) > ln Z(β_3) > ⋯ > ln Z(β_B') marked on a line.)
Better to work in ln Z(β_i) space. Recall: U ~ Unif([0, 1]) implies −ln U ~ Exp(1). In log space, each step moves down by an Exp(1) amount. Result: a Poisson point process from ln Z(β_B) down to ln Z(β_B').

The result
Output of TPA: ℓ ~ Pois(ln(Z(β_B)/Z(β_B'))).
Output of A/R: H ~ Bin(n, Z(β_B')/Z(β_B)).

Repeating the Poisson point process
Suppose you run the Poisson point process twice: the result is also a Poisson point process, of rate 2 instead of rate 1. Now run it k times: the result is a Poisson point process of rate k, so the total count is Pois(k·ln(Z(β_B)/Z(β_B'))). Divide by k: the result is close to ln[Z(β_B)/Z(β_B')]. Exponentiate: the result is close to Z(β_B)/Z(β_B'). Chernoff's bound can be used to choose k large enough.
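Continuing the toy disk example above, a sketch of how k runs are combined (it reuses `tpa_run` from the previous snippet; the small bias from exponentiating a sample mean is ignored here):

```python
k = 2000
total = sum(tpa_run() for _ in range(k))   # ~ Pois(k * ln(Z(beta_shell)/Z(beta_center)))
log_ratio_hat = total / k                  # close to ln(Z(beta_shell)/Z(beta_center))
ratio_hat = math.exp(log_ratio_hat)        # close to Z(beta_shell)/Z(beta_center)
print(ratio_hat, (2.0 / 0.1) ** 2)         # true ratio is 400
```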

Bounding the tails
Theorem. Let p = Z(β_B')/Z(β_B). For p < exp(−1) and ε < 0.3, after
$$k = \left[2 \ln\frac{1}{p}\right]\left(\frac{3}{\epsilon} + \frac{1}{\epsilon^2}\right)\ln\frac{1}{2\delta}$$
runs, each of which uses on average ln(1/p) samples, the output p̂ satisfies P((1 + ε)^{-1} ≤ p̂/p ≤ 1 + ε) > 1 − δ.

Bonus: approximate for all parameters simultaneously
You can cut the Poisson point process at any point: the right half is still a Poisson point process. This yields an omnithermal approximation: approximate Z(β)/Z(β_B') for all β ∈ [β_B', β_B] at the same time. The number of runs needed is still the same:
$$k = \left[2 \ln\frac{1}{p}\right]\left(\frac{3}{\epsilon} + \frac{1}{\epsilon^2}\right)\ln\frac{1}{2\delta}.$$

Proof idea: Poisson process
Let N(t) be a rate-r Poisson process; then N(t) − rt is a right-continuous martingale. The omnithermal approximation being valid means this martingale did not drift too far away from 0.

Examples

Example 1: mixture of Gaussian spikes
A multimodal toy example. Prior uniform over a cube; likelihood a mixture of two normals, with a small spike centered at (0, 0, ..., 0) and a large spike centered at (0.2, 0.2, ..., 0.2):
$$\theta \sim \text{Unif}([-1/2, 1/2]^d),$$
$$L(\theta) = 100 \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,u} \exp\!\left(-\frac{(\theta_i - 0.2)^2}{2u^2}\right) + \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,v} \exp\!\left(-\frac{\theta_i^2}{2v^2}\right).$$
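For concreteness, here is the likelihood as I read the slide's formula (a sketch; the normalization and the placement of the weight 100 follow my reconstruction above):

```python
import math

def spike_likelihood(theta, u=0.01, v=0.02):
    """L(theta) = 100 * prod_i N(theta_i; 0.2, u^2) + prod_i N(theta_i; 0, v^2)."""
    def normal_pdf(x, mu, sigma):
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    spike_at_02 = math.prod(normal_pdf(t, 0.2, u) for t in theta)
    spike_at_00 = math.prod(normal_pdf(t, 0.0, v) for t in theta)
    return 100.0 * spike_at_02 + spike_at_00

# The quantity TPA targets is the integral of this likelihood under the
# Unif([-1/2, 1/2]^d) prior.
print(spike_likelihood([0.2] * 20), spike_likelihood([0.0] * 20))
```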

Parameter truncation
Create the family of sets by limiting the distance to the center of the small spike.

Running time results
Problem: d = 20, u = 0.01, v = 0.02. True value: ln(1/p) = 115.0993. Algorithm (10⁵ runs): ln(1/p) ≈ 115.10321.
(Figure: histogram of the number of steps per run for Example 1, roughly 80 to 160 steps.)

Example 2: beta-binomial model
A hierarchical model. Data set: free-throw numbers for 429 NBA players, 2008-09 season. Example data point: Kobe Bryant made 483 out of 564. Model: the number made by player i is Bin(n_i, p_i); the n_i are known and p_i ~ Beta(a, b). Hyperparameters a and b: a ~ 1 + Exp(1), b ~ 1 + Exp(1).
(Figure: kernel density estimate of the percentage of free throws made; N = 429, bandwidth = 0.02967.)
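A sketch of the corresponding integrand in (a, b) after integrating each p_i out analytically (my own code; the `made`/`attempts` arrays are hypothetical stand-ins for the NBA data). Its integral over a, b > 1 is the integrated likelihood that TPA targets.

```python
import math

def log_unnormalized_posterior(a, b, made, attempts):
    """log[ prior(a) * prior(b) * prod_i BetaBinomial(made_i | attempts_i, a, b) ]."""
    if a <= 1.0 or b <= 1.0:
        return -math.inf
    lp = -(a - 1.0) - (b - 1.0)                     # a ~ 1 + Exp(1), b ~ 1 + Exp(1)
    for x, n in zip(made, attempts):
        lp += (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + math.lgamma(a + x) + math.lgamma(b + n - x) - math.lgamma(a + b + n)
               - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    return lp

# e.g. with the Kobe Bryant data point quoted on the slide:
print(log_unnormalized_posterior(5.0, 2.0, made=[483], attempts=[564]))
```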

Again use parameter truncation
Goal: find the integrated likelihood. Use β to limit the distance from the mode. The problem is 2-D and unimodal, so sampling is easy. True value (via numerical integration): 1577.250. After 10⁵ runs: 1577.256.

Example 3: Ising model
Besag [1974] modeled soil plots as good (green) or bad (red). Here h(x) = # of adjacent like-colored plots (the configuration shown has h(x) = 13), π(x) = exp(2βh(x))/Z(β), and the parameter β is the inverse temperature.

Integrated likelihood for Ising
The parameter space is one dimensional:
$$Z = \int_0^{\infty} p_\beta(b)\, \frac{\exp(2 b h(x))}{Z(b)}\, db,$$
which is easy to do numerically if you know Z(β) over (0, ∞).
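Once an omnithermal run has produced ln Z(β) on a grid, the remaining integral is one-dimensional quadrature. A sketch under assumptions of my own: `betas`, `log_Z`, and `p_beta` below are hypothetical placeholders for the TPA output and the prior on β.

```python
import math

def integrated_likelihood(h_x, betas, log_Z, p_beta):
    """Trapezoid rule for the integral of p_beta(b) * exp(2*b*h(x)) / Z(b) over a beta grid."""
    vals = [p_beta(b) * math.exp(2 * b * h_x - lz) for b, lz in zip(betas, log_Z)]
    total = 0.0
    for i in range(len(betas) - 1):
        total += 0.5 * (vals[i] + vals[i + 1]) * (betas[i + 1] - betas[i])
    return total

# Usage with made-up inputs:
betas = [i * 0.01 for i in range(201)]                 # grid on [0, 2]
log_Z = [100.0 * b for b in betas]                     # placeholder for the TPA estimate of ln Z
print(integrated_likelihood(h_x=13, betas=betas, log_Z=log_Z, p_beta=lambda b: math.exp(-b)))
```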

Use omnithermal approximation
(Figure: ln Z(β) versus β over [0, 2] from one run of TPA.)

Use omnithermal approximation
(Figure: ln Z(β) versus β over [0, 2] from sixteen runs of TPA.)

Connection to MCMC
Several sampling methods use temperatures: simulated annealing, simulated tempering. TPA is easy to run for these problems, and it can speed up the chain by giving a well-balanced cooling schedule.

Future directions
Improvement for Gibbs distributions. A Gibbs distribution: π({x}) = exp(−βh(x))/Z(β). The algorithm's running time can be improved for Gibbs distributions to O((ln(1/p))·ε^{-2}·ln δ^{-1}).

Conclusions
Randomized adaptive cooling schedules.
Guaranteed performance bound for MC integration (no variance estimate or unknown derivatives appear).
Speed: 2·[ln(1/p)]², much better than previous methods.
Speed: O(ln(1/p)), even better for Gibbs distributions.
Future directions: extend the ln(1/p) method to non-Gibbs distributions; remove extra ln(n) factors.

References
[1] M. Jerrum, L. Valiant, and V. Vazirani, Random generation of combinatorial structures from a uniform distribution, Theoret. Comput. Sci., 43, 169-188, 1986.
M. Huber and S. Schott, Using TPA for Bayesian inference, Bayesian Statistics 9.
J. Skilling, Nested sampling for general Bayesian computation, Bayesian Anal., 1(4), 833-860, 2006.
D. Štefankovič, S. Vempala, and E. Vigoda, Adaptive Simulated Annealing: A Near-Optimal Connection between Sampling and Counting, J. of the ACM, 56(3), 1-36, 2009.