On Markov Chain Monte Carlo


MCMC 0 On Markov Chain Monte Carlo Yevgeniy Kovchegov Oregon State University

MCMC 1 Metropolis-Hastings algorithm. Goal: simulating an Ω-valued random variable distributed according to a given probability distribution π(z), given the complex nature of a large discrete space Ω. MCMC: generate a Markov chain {X_t} over Ω whose distribution µ_t(z) = P(X_t = z) converges rapidly to its unique stationary distribution π(z). Metropolis-Hastings algorithm: Consider a connected neighborhood network with points in Ω. Suppose we know the ratio π(z')/π(z) for any two neighbor points z and z' on the network. For z and z' connected by an edge of the network, set the transition probability to p(z, z') = (1/M) min{1, π(z')/π(z)} for M large enough.

MCMC 2 Metropolis-Hastings algorithm. Consider a connected neighborhood network with points in Ω. Suppose we know the ratio π(z')/π(z) for any two neighbor points z and z' on the network. For z and z' connected by an edge of the network, set the transition probability to p(z, z') = (1/M) min{1, π(z')/π(z)} for M large enough. Specifically, M can be any number greater than the maximal degree in the neighborhood network. Let p(z, z) absorb the rest of the probability, i.e. p(z, z) = 1 − Σ_{z': z'∼z} p(z, z').
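The transition rule above can be sketched in code. This is an illustrative Python fragment, not from the slides; the names `neighbors` (a dict mapping each state to its list of network neighbors), `weight` (any function proportional to π), and `M` are my own. Proposing a uniform neighbor with probability deg(z)/M and then accepting with min{1, π(z')/π(z)} realizes exactly p(z, z') = (1/M) min{1, π(z')/π(z)}, with the self-loop absorbing the remainder.

```python
import random

def metropolis_step(z, neighbors, weight, M, rng):
    """One step of the Metropolis chain on a neighborhood network.

    p(z, z') = (1/M) * min(1, pi(z')/pi(z)) for each neighbor z',
    p(z, z) absorbs the remaining probability (self-loop).
    M must exceed the maximal degree of the network.
    """
    nbrs = neighbors[z]
    # With probability deg(z)/M, propose a uniformly random neighbor...
    if rng.random() < len(nbrs) / M:
        z_new = rng.choice(nbrs)
        # ...and accept it with probability min(1, pi(z')/pi(z)).
        if rng.random() < min(1.0, weight(z_new) / weight(z)):
            return z_new
    return z  # otherwise stay put
```

Note that only the ratio π(z')/π(z) ever enters, so the normalizing constant of π is never needed; by detailed balance, π(z) p(z, z') = (1/M) min{π(z), π(z')} is symmetric, so π is stationary.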

MCMC 3 Knapsack problem. The knapsack problem is a problem in combinatorial optimization: given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is at most a given limit and the total value is as large as possible. The knapsack problem is NP-complete. Source: Wikipedia.org

MCMC 4 Knapsack problem. Given m items of various weights w_j and values v_j, and a knapsack with a weight limit R. Assuming the volume and shape do not matter, find the most valuable subset of items that can be carried in the knapsack. Mathematically: we need z = (z_1, ..., z_m) in Ω = { z ∈ {0,1}^m : Σ_{j=1}^m w_j z_j ≤ R } maximizing U(z) = Σ_{j=1}^m v_j z_j. Source: Wikipedia.org

MCMC 5 Knapsack problem. Find z = (z_1, ..., z_m) in Ω = { z ∈ {0,1}^m : Σ_{j=1}^m w_j z_j ≤ R } maximizing U(z) = Σ_{j=1}^m v_j z_j. MCMC approach: assign the weight π(z) = (1/Z_β) exp{β U(z)} to each z ∈ Ω, with β = 1/T, where Z_β = Σ_{z∈Ω} exp{β U(z)} is called the partition function. Next, for each z ∈ Ω consider a clique C_z of neighbor points in Ω. Consider a Markov chain over Ω that jumps from z to a neighbor z' ∈ C_z with probability p(z, z') = (1/M) min{1, π(z')/π(z)} for M large enough.

MCMC 6 Knapsack problem. Assign the weight π(z) = (1/Z_β) exp{β U(z)} to each z ∈ Ω, with β = 1/T, where Z_β = Σ_{z∈Ω} exp{β U(z)} is called the partition function. Next, for each z ∈ Ω consider a clique C_z of neighbor points in Ω. Consider a Markov chain over Ω that jumps from z to a neighbor z' ∈ C_z with probability p(z, z') = (1/M) min{1, π(z')/π(z)} for M large enough. Observe that π(z')/π(z) = exp{β (U(z') − U(z))} = exp{β v·(z' − z)}, where v = (v_1, ..., v_m) is the vector of values.
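A minimal sketch of this chain for the knapsack problem (illustrative; the function name and the choice to track the best value seen so far are my own, not the slides'). Neighbors of z are single-item flips; a proposal is rejected if it violates the weight limit or fails the Metropolis test exp{β (U(z') − U(z))}:

```python
import math
import random

def knapsack_mcmc(weights, values, R, beta, steps, seed=0):
    """Metropolis chain on feasible knapsack fills, pi(z) ∝ exp(beta*U(z)).

    Neighbors of z: single-coordinate flips that keep total weight <= R.
    Returns the best (z, U(z)) visited along the run.
    """
    rng = random.Random(seed)
    m = len(weights)
    z = [0] * m                    # the empty knapsack is always feasible
    best, best_val = z[:], 0
    for _ in range(steps):
        j = rng.randrange(m)       # propose flipping item j
        z[j] ^= 1
        feasible = sum(w * s for w, s in zip(weights, z)) <= R
        dU = values[j] if z[j] else -values[j]   # beta * v·(z' - z) exponent
        if not (feasible and (dU >= 0 or rng.random() < math.exp(beta * dU))):
            z[j] ^= 1              # reject: undo the flip
        val = sum(v * s for v, s in zip(values, z))
        if val > best_val:
            best_val, best = val, z[:]
    return best, best_val
```

Uphill flips (dU ≥ 0) are always accepted; downhill flips with probability exp{β dU}, so larger β concentrates π on near-optimal fills.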

MCMC 7 Knapsack and other optimization problems. Issues: (i) Running time? Analyzing mixing times is challenging in MCMC for real-life optimization problems such as the knapsack problem. With few exceptions, no firm foundation exists and no performance guarantees are available. (ii) Optimal T? T is usually chosen by empirical observation, trial and error, or some heuristic. Often a simulated annealing approach is used.

MCMC 8 Simulated annealing. Usually, we assign π(z) = (1/Z_β) exp{β U(z)} to each z ∈ Ω, with β = 1/T, and set p(z, z') = (1/M) min{1, π(z')/π(z)}. Idea: what if we let the temperature T change with time t, i.e. T = T(t)? When T is large, the Markov chain is more diffusive; as T gets smaller, the value X_t stabilizes around the maxima. The method was devised independently by S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi in 1983, and by V. Černý in 1985. The name comes from annealing in metallurgy, a technique involving heating and controlled cooling.
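The time-dependent-temperature idea can be sketched as follows (an illustration with my own names; the geometric cooling schedule in the usage example is one common heuristic choice, not prescribed by the slides):

```python
import math
import random

def simulated_annealing(U, neighbors, z0, schedule, steps, seed=0):
    """Metropolis dynamics with time-dependent temperature T(t) = schedule(t).

    A proposed neighbor z' is accepted with probability
    min(1, exp((U(z') - U(z)) / T(t))), i.e. beta(t) = 1/T(t) grows as T cools.
    Returns the best (z, U(z)) visited.
    """
    rng = random.Random(seed)
    z = z0
    best, best_val = z, U(z)
    for t in range(steps):
        T = schedule(t)
        z_new = rng.choice(neighbors(z))
        dU = U(z_new) - U(z)
        if dU >= 0 or rng.random() < math.exp(dU / T):
            z = z_new              # accept: always uphill, sometimes downhill
        if U(z) > best_val:
            best, best_val = z, U(z)
    return best, best_val

# Usage: maximize U(z) = -(z - 7)^2 on {0, ..., 20} with a geometric cooling
# schedule, floored so T(t) never reaches zero.
U = lambda z: -(z - 7) ** 2
nbrs = lambda z: [max(z - 1, 0), min(z + 1, 20)]
schedule = lambda t: max(0.05, 2.0 * 0.995 ** t)
```

Early on (large T) nearly all moves are accepted and the walk is diffusive; late in the run (small T) downhill moves are rarely accepted and X_t freezes near a maximum, exactly the behavior described above.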

MCMC 9 Gibbs Sampling: Ising Model. Every vertex v of G = (V, E) is assigned a spin σ(v) ∈ {−1, +1}. The probability of a configuration σ ∈ {−1, +1}^V is π(σ) = e^{−βH(σ)} / Z(β), where β = 1/T.

MCMC 10 Gibbs Sampling: Ising Model. Every vertex v of G = (V, E) is assigned a spin σ(v) ∈ {−1, +1}. The probability of a configuration σ ∈ {−1, +1}^V is π(σ) = e^{−βH(σ)} / Z(β), where β = 1/T. [Figure: a grid graph with each vertex labeled +1 or −1.]

MCMC 11 Gibbs Sampling: Ising Model. For σ ∈ {−1, +1}^V, the Hamiltonian is H(σ) = −(1/2) Σ_{u,v: u∼v} σ(u)σ(v) = −Σ_{edges e=[u,v]} σ(u)σ(v), and the probability of a configuration σ ∈ {−1, +1}^V is π(σ) = e^{−βH(σ)} / Z(β), where β = 1/T and Z(β) = Σ_{σ∈{−1,+1}^V} e^{−βH(σ)} is the normalizing factor.

MCMC 12 Ising Model: local Hamiltonian. H(σ) = −(1/2) Σ_{u,v: u∼v} σ(u)σ(v) = −Σ_{edges e=[u,v]} σ(u)σ(v). The local Hamiltonian is H_local(σ, v) = −Σ_{u: u∼v} σ(u)σ(v). Observe: the conditional probability for σ(v) is given by H_local(σ, v), since H(σ) = H_local(σ, v) − Σ_{e=[u_1,u_2]: u_1,u_2 ≠ v} σ(u_1)σ(u_2).

MCMC 13 Gibbs Sampling: Ising Model via Glauber dynamics. [Figure: the grid spin configuration with the spin σ(v) at one vertex highlighted.] Observe: the conditional probability for σ(v) is given by H_local(σ, v), since H(σ) = H_local(σ, v) − Σ_{e=[u_1,u_2]: u_1,u_2 ≠ v} σ(u_1)σ(u_2).

MCMC 14 Gibbs Sampling: Ising Model via Glauber dynamics. [Figure: the grid configuration with the spin at v erased and marked "?".] Randomly pick v ∈ G and erase the spin σ(v). Choose σ_+ or σ_−: Prob(σ → σ_+) = e^{−βH(σ_+)} / (e^{−βH(σ_−)} + e^{−βH(σ_+)}) = e^{−βH_local(σ_+,v)} / (e^{−βH_local(σ_−,v)} + e^{−βH_local(σ_+,v)}), which in the pictured example equals e^{2β} / (e^{−2β} + e^{2β}).
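A hedged sketch of one such heat-bath update (my code, assuming the convention H(σ) = −Σ_{edges} σ(u)σ(v) used above, so that Prob(σ(v) = +1) ∝ e^{βS} where S is the sum of the neighboring spins; `graph` is an adjacency dict and `sigma` a dict of spins):

```python
import math
import random

def glauber_step(sigma, graph, beta, rng):
    """One Glauber (heat-bath) update for the Ising model.

    Pick a uniform vertex v, erase sigma(v), and resample it from its
    conditional distribution, which depends only on H_local(sigma, v):
    P(sigma(v) = +1) = e^{beta*S} / (e^{beta*S} + e^{-beta*S}),
    with S = sum of the spins of v's neighbors.
    """
    v = rng.choice(sorted(graph))             # uniform random vertex
    S = sum(sigma[u] for u in graph[v])       # local field at v
    p_plus = math.exp(beta * S) / (math.exp(beta * S) + math.exp(-beta * S))
    sigma[v] = 1 if rng.random() < p_plus else -1
```

Because the update resamples σ(v) from its exact conditional law under π, each step leaves π stationary, which is the defining property of Gibbs sampling.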

MCMC 15 Glauber dynamics: Rapid mixing. Glauber dynamics is a random walk on the state space S (here {−1, +1}^V) such that the desired π is stationary with respect to the dynamics. At high temperatures (i.e. β = 1/T small enough) it takes O(n log n) iterations to get ε-close to π. Here |V| = n. Need: max_{v∈V} deg(v) · tanh(β) < 1. Thus Glauber dynamics is a fast way to generate π. It is an important example of Gibbs sampling.

MCMC 16 Close enough distribution and mixing time. What does ε-close to π mean? Start with σ_0: [Figure: an initial configuration with all +1 spins in the top half of the grid and all −1 spins in the bottom half.] If P^t(σ) is the probability distribution after t iterations, the total variation distance is ||P^t − π||_TV = (1/2) Σ_{σ∈{−1,+1}^V} |P^t(σ) − π(σ)| ≤ ε.

MCMC 17 Close enough distribution and mixing time. Total variation distance: ||µ − ν||_TV := (1/2) Σ_{x∈S} |µ(x) − ν(x)| = sup_{A⊂S} |µ(A) − ν(A)|. Mixing time: t_mix(ε) := inf{t : ||P^t − π||_TV ≤ ε for all σ_0}. At high temperature, t_mix(ε) = O(n log n).
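The two equivalent forms of the definition are easy to check numerically. A small illustrative helper (my own, representing distributions as dicts from state to probability):

```python
def tv_distance(mu, nu):
    """Total variation distance between two discrete distributions.

    Computes half the L1 distance (1/2) * sum_x |mu(x) - nu(x)|, which
    equals the supremum of |mu(A) - nu(A)| over events A.
    """
    support = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in support)
```

For instance, two distributions with disjoint supports are at distance 1, and identical distributions at distance 0, matching the sup-over-events form.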

MCMC 18 Coupling Method. S - sample space; {p(i, j)}_{i,j∈S} - transition probabilities. Construct a process (X_t, Y_t) on S × S such that X_t is a {p(i, j)}-Markov chain, Y_t is a {p(i, j)}-Markov chain, and once X_t = Y_t, let X_{t+1} = Y_{t+1}, X_{t+2} = Y_{t+2}, ...

MCMC 19 Coupling Method.

MCMC 20 Coupling Method. Coupling time: T_coupling = min{t : X_t = Y_t}. Successful coupling: Prob(T_coupling < ∞) = 1.

MCMC 21 Mixing times via coupling. Let T_{i,j} be the coupling time for X_0 = i and Y_0 = j. Then, given the coupled process (X_t, Y_t), ||P_{X_t} − P_{Y_t}||_TV ≤ P[T_{i,j} > t] ≤ E[T_{i,j}] / t, the last step by Markov's inequality. Now, if we let Y_0 ∼ π, then for any X_0 ∈ S, ||P_{X_t} − π||_TV = ||P_{X_t} − P_{Y_t}||_TV ≤ (max_{i,j∈S} E[T_{i,j}]) / t ≤ ε whenever t ≥ (max_{i,j∈S} E[T_{i,j}]) / ε.

MCMC 22 Mixing times via coupling. ||P_{X_t} − π||_TV ≤ ε whenever t ≥ (max_{i,j∈S} E[T_{i,j}]) / ε. Thus t_mix(ε) = inf{t : ||P_{X_t} − π||_TV ≤ ε} ≤ (max_{i,j∈S} E[T_{i,j}]) / ε. So O(t_mix) ≤ O(T_coupling). Thus constructing a coupled process that minimizes E[T_coupling] gives an effective upper bound on the mixing time.
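As an illustration of this bound (my example, not from the slides): for the walk on the hypercube {0,1}^n that at each step refreshes a uniformly chosen coordinate with a uniform random bit, couple two copies by using the same coordinate and the same bit in both. The chains then agree exactly once every coordinate has been refreshed at least once, so T_coupling is the coupon collector time of the next slides, giving t_mix = O(n log n / ε) via the bound above.

```python
import random

def hypercube_coupling_time(n, rng):
    """Coupling time for two walks on {0,1}^n started at opposite corners.

    Each step picks a coordinate i and a bit b uniformly and sets
    X[i] = Y[i] = b in BOTH chains; marginally each chain is the
    random-refresh walk, and the chains merge once every coordinate
    has been refreshed (the coupon collector time).
    """
    X, Y = [0] * n, [1] * n    # worst-case pair of starting states
    t = 0
    while X != Y:
        i = rng.randrange(n)
        b = rng.randint(0, 1)
        X[i] = Y[i] = b
        t += 1
    return t
```

Averaging over runs recovers E[T_coupling] ≈ n(1 + 1/2 + ... + 1/n) = n log n + O(n).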

MCMC 23 Coupon collector. n types of coupons: 1, 2, ..., n. Collecting coupons: one coupon per unit of time, each coupon type equally likely. Goal: to collect a coupon of each type. Question: how much time will it take?

MCMC 24 Coupon collector. c1 c2 c1 c3 c3 c2 c4 c5 ..., with τ_k the time the k-th distinct type is collected: τ_1 = 1 ≤ τ_2 ≤ τ_3 ≤ τ_4 ≤ τ_5 ≤ ... Here τ_1 = 1, E[τ_2 − τ_1] = n/(n−1), E[τ_3 − τ_2] = n/(n−2), ..., E[τ_n − τ_{n−1}] = n. Hence E[τ_n] = n (1 + 1/2 + 1/3 + ... + 1/n) = n log n + O(n).
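The closed form above is easy to evaluate exactly; a small illustrative helper (my own) computes E[τ_n] = n·H_n and lets one check the n log n + O(n) asymptotics directly:

```python
import math

def coupon_expectation(n):
    """Exact E[tau_n] = n * (1 + 1/2 + ... + 1/n) for the coupon collector.

    Sums the expected waiting times n/(n-k) for the (k+1)-st new type,
    which telescopes to n times the n-th harmonic number H_n.
    """
    return n * sum(1.0 / k for k in range(1, n + 1))
```

For example, n = 2 gives E[τ_2] = 2(1 + 1/2) = 3, and since H_n − log n tends to Euler's constant γ ≈ 0.577, the gap E[τ_n] − n log n stays O(n).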