Convex Optimization CMU-10725


Simulated Annealing
Barnabás Póczos & Ryan Tibshirani

Markov Chains
(portrait: Andrey Markov)

Markov Chains
Markov chain: a sequence of random variables X_1, X_2, ... with the Markov property
  P(X_{t+1} = x | X_1, ..., X_t) = P(X_{t+1} = x | X_t).
Homogeneous Markov chain: the transition probability P(X_{t+1} = j | X_t = i) does not depend on t.

Markov Chains
Assume that the state space is finite: {1, ..., s}.
1-step state transition matrix: T_{ij} = P(X_{t+1} = j | X_t = i).
Lemma: The state transition matrix is stochastic: each row sums to 1.
t-step state transition matrix: T_{ij}(t) = P(X_{t+τ} = j | X_τ = i).
Lemma: T(t) = T^t.
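
The two lemmas are easy to check numerically; the sketch below is my own illustration (the 3-state matrix T is an arbitrary example, not from the slides):

```python
import numpy as np

# An arbitrary 3-state stochastic matrix (rows sum to 1).
T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

print(T.sum(axis=1))        # each row sums to 1, so T is stochastic

# The t-step transition matrix is the t-th matrix power of T.
t = 5
T_t = np.linalg.matrix_power(T, t)
print(T_t.sum(axis=1))      # matrix powers of a stochastic matrix stay stochastic
```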

Markov Chains: Example
A Markov chain with three states (s = 3). (figure: transition matrix and transition graph)

Markov Chains
If the probability (row) vector of the initial state is μ(x_1), it follows that μ(x_2) = μ(x_1) T, and after several iterations (multiplications by T) the distribution μ(x_t) = μ(x_1) T^{t-1} approaches a stationary distribution, no matter what the initial distribution μ(x_1) was. The chain has forgotten its past.

Markov Chains
Our goal is to find conditions under which the Markov chain forgets its past, so that, independently of its starting distribution, the state distributions converge to a stationary distribution.
Definition [stationary distribution, invariant distribution]: a distribution π with π T = π.
This is a necessary condition for the Markov chain to have a limiting distribution.
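
A quick numerical sketch of this "forgetting" behavior (my own example, reusing the arbitrary 3-state matrix from above): two different initial distributions are driven to the same π by repeated multiplication with T.

```python
import numpy as np

T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

mu1 = np.array([1.0, 0.0, 0.0])   # start deterministically in state 1
mu2 = np.array([0.0, 0.0, 1.0])   # start deterministically in state 3

for t in range(50):
    mu1 = mu1 @ T
    mu2 = mu2 @ T

print(mu1)              # both starting points converge to the same pi
print(mu2)
print(mu1 @ T - mu1)    # pi T = pi, up to numerical error
```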

Markov Chains
Theorem: For any starting distribution, the chain converges to the unique invariant distribution p(x), as long as
1. T is a stochastic transition matrix,
2. T is irreducible,
3. T is aperiodic.

Limit Theorem of Markov Chains
More formally: if the Markov chain is irreducible and aperiodic, then there is a unique distribution π such that, for all states i and j,
  lim_{t→∞} (T^t)_{ij} = π(j).

Markov Chains
Definition [irreducibility]: For each pair of states (i, j), there is a positive probability that the process, starting in state i, will ever enter state j. Equivalently: the matrix T cannot be reduced to separate smaller matrices; the transition graph is connected; it is possible to get to any state from any state.

Markov Chains
Definition [aperiodicity]: The chain cannot get trapped in cycles.
Definition: A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of state i is
  k = gcd{ n : P(X_n = i | X_0 = i) > 0 }
(where "gcd" is the greatest common divisor). For example, suppose it is possible to return to the state in {6, 8, 10, 12, ...} time steps. Then k = 2.

Markov Chains
Definition: If k = 1, then the state is said to be aperiodic: returns to state i can occur at irregular times. In other words, state i is aperiodic if there exists an n such that for all n' ≥ n,
  P(X_{n'} = i | X_0 = i) > 0.
Definition: A Markov chain is aperiodic if every state is aperiodic.

Markov Chains
Theorem: In an irreducible Markov chain, a single aperiodic state implies that all states are aperiodic.
Corollary: An irreducible chain is aperiodic if for some n ≥ 0 and some state j,
  (T^n)_{jj} > 0 and (T^{n+1})_{jj} > 0.
Example of a periodic Markov chain:
  T = [0 1; 1 0]
In this case both states have period 2, and π = (1/2, 1/2) is invariant. However, if we start the chain from (1, 0) or (0, 1), then the chain gets trapped in a cycle, alternating between these two distributions; it doesn't forget its past. Periodic Markov chains don't forget their past.
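
The oscillation is easy to see numerically; this small sketch (my own illustration) iterates μ T^t for the two-state periodic chain:

```python
import numpy as np

# The two-state periodic chain from the example above.
T = np.array([[0.0, 1.0],
              [1.0, 0.0]])

mu = np.array([1.0, 0.0])      # start in state 1 with probability 1
for t in range(5):
    mu = mu @ T
    print(t + 1, mu)           # alternates (0,1), (1,0), (0,1), ...

# The invariant distribution pi = (1/2, 1/2) satisfies pi T = pi,
# but mu T^t never converges to it from a deterministic start.
```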

Reversible Markov Chains: Detailed Balance
Definition [reversibility / detailed balance condition]:
  π(i) T_{ij} = π(j) T_{ji} for all states i, j.
Theorem: The detailed balance condition is a sufficient, but not necessary, condition to ensure that a particular π is the desired invariant distribution of the Markov chain.
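
As an illustrative sketch (with a hand-built reversible chain, not from the slides), one can verify that detailed balance holds entry-wise and that it implies invariance:

```python
import numpy as np

# A 3-state chain constructed so that detailed balance holds for this pi.
pi = np.array([0.2, 0.3, 0.5])
T = np.array([[0.40, 0.30, 0.30],
              [0.20, 0.50, 0.30],
              [0.12, 0.18, 0.70]])

# Detailed balance: pi[i] * T[i, j] == pi[j] * T[j, i] for all i, j.
flow = pi[:, None] * T
print(np.allclose(flow, flow.T))   # True

# Detailed balance implies invariance: pi T = pi.
print(np.allclose(pi @ T, pi))     # True
```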

How Fast Can Markov Chains Forget the Past?
MCMC samplers are irreducible and aperiodic Markov chains that have the target distribution as their invariant distribution; typically they are constructed so that the detailed balance condition is satisfied. It is also important to design samplers that converge quickly.

Spectral Properties
Theorem: π is a left eigenvector of the matrix T with eigenvalue 1: π T = π. The Perron-Frobenius theorem from linear algebra tells us that the remaining eigenvalues have absolute value less than 1. The second largest eigenvalue magnitude therefore determines the rate of convergence of the chain, and should be as small as possible.
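
A short sketch (my own, continuing the reversible example above) of how one might read off π and the convergence rate from the spectrum of T with NumPy:

```python
import numpy as np

T = np.array([[0.40, 0.30, 0.30],
              [0.20, 0.50, 0.30],
              [0.12, 0.18, 0.70]])

# Left eigenvectors of T are right eigenvectors of T transposed.
vals, vecs = np.linalg.eig(T.T)
order = np.argsort(-np.abs(vals))
vals, vecs = vals[order], vecs[:, order]

pi = np.real(vecs[:, 0])
pi /= pi.sum()                        # normalize the eigenvalue-1 eigenvector
print("stationary pi:", pi)           # ~ (0.2, 0.3, 0.5)

# |second largest eigenvalue| governs the geometric rate of convergence.
print("convergence rate ~", abs(vals[1]))
```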

The Hastings-Metropolis Algorithm

The Hastings-Metropolis Algorithm
Our goal: generate samples from the following discrete distribution:
  π(j) = b(j) / B,  j = 1, ..., m,  where B = Σ_j b(j).
We don't know the normalizing constant B! The main idea is to construct a time-reversible Markov chain with (π_1, ..., π_m) as its limit distribution. Later we will discuss what to do when the distribution is continuous.

The Hastings-Metropolis Algorithm
Let {1, 2, ..., m} be the state space of a Markov chain that we can simulate, with proposal transition probabilities q(i, j). From the current state X_n = i:
1. Propose a new state j with probability q(i, j).
2. Accept it with probability
   α(i, j) = min( 1, [b(j) q(j, i)] / [b(i) q(i, j)] );
   otherwise stay at i.
Note that the unknown constant B cancels in the ratio. No rejection: we use all of X_1, X_2, ..., X_n.
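
A minimal sketch of the discrete sampler described above; the weight vector b and the symmetric random-walk proposal are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

b = np.array([1.0, 4.0, 2.0, 8.0, 5.0])   # unnormalized target weights b(j)
m = len(b)

def propose(i):
    # Symmetric random walk on {0, ..., m-1} (wraps around), so
    # q(i, j) = q(j, i) and the Hastings ratio reduces to b(j) / b(i).
    return (i + rng.choice([-1, 1])) % m

x, samples = 0, []
for n in range(100_000):
    y = propose(x)
    if rng.random() < min(1.0, b[y] / b[x]):   # acceptance probability
        x = y
    samples.append(x)                          # no rejection: keep every X_n

freq = np.bincount(samples, minlength=m) / len(samples)
print(freq)           # empirical frequencies
print(b / b.sum())    # target pi; the sampler never needed B
```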

Example for Large State Spaces
Let {1, 2, ..., m} be the state space of a Markov chain that we can simulate. Example: a d-dimensional grid. There are at most 2d possible moves at each grid point (linear in d), but the state space is exponentially large in the dimension d.

The Hastings-Metropolis Algorithm
Theorem: The resulting chain satisfies detailed balance with respect to π:
  π(i) p(i, j) = π(j) p(j, i) for all i, j,
where p(i, j) = q(i, j) α(i, j) for i ≠ j.
Proof: π(i) q(i, j) α(i, j) = min( π(i) q(i, j), π(j) q(j, i) ), which is symmetric in i and j.

The Hastings-Metropolis Algorithm
Observation: The acceptance ratio involves only b(i) and b(j), so the normalizing constant B is never needed.
Proof: α(i, j) = min( 1, [π(j) q(j, i)] / [π(i) q(i, j)] ) = min( 1, [b(j) q(j, i)] / [b(i) q(i, j)] ).
Corollary: π is an invariant distribution of the chain.

The Hastings-Metropolis Algorithm
Theorem: If, in addition, the chain is irreducible and aperiodic, then from any starting state the distribution of X_n converges to π.
Note: irreducibility and aperiodicity depend on the choice of the proposal distribution q.

The Hastings-Metropolis Algorithm
It is not rejection sampling: we use all the samples!

Continuous Distributions
The same algorithm can be used for continuous distributions as well. In this case, the state space is continuous.

Experiment with HM
An application with a continuous distribution. Bimodal target:
  p(x) ∝ 0.3 exp(−0.2 x²) + 0.7 exp(−0.2 (x − 10)²)
Proposal: q(x | x^(i)) = N(x^(i), 100); 5000 iterations.
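
A sketch reproducing this experiment, assuming N(x^(i), 100) denotes variance 100 (standard deviation 10):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_unnorm(x):
    # Unnormalized bimodal target from the slide.
    return 0.3 * np.exp(-0.2 * x**2) + 0.7 * np.exp(-0.2 * (x - 10)**2)

x = 0.0
samples = []
for i in range(5000):
    y = x + rng.normal(0.0, 10.0)   # N(x, 100): variance 100 (assumed), std 10
    # Symmetric proposal, so the Hastings ratio reduces to p(y) / p(x).
    if rng.random() < min(1.0, p_unnorm(y) / p_unnorm(x)):
        x = y
    samples.append(x)

samples = np.array(samples)
print("fraction near the mode at 10:", np.mean(samples > 5))   # roughly 0.7
```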

A Good Proposal Distribution Is Important
(figure)

HM on Combinatorial Sets
Goal: generate uniformly distributed samples from the set of permutations x = (x_1, ..., x_n) satisfying Σ_j j·x_j > a.
Let n = 3 and a = 12:
{1,2,3}: 1+4+9 = 14 > 12 ✓
{1,3,2}: 1+6+6 = 13 > 12 ✓
{2,3,1}: 2+6+3 = 11 ✗
{2,1,3}: 2+2+9 = 13 > 12 ✓
{3,1,2}: 3+2+6 = 11 ✗
{3,2,1}: 3+4+3 = 10 ✗

HM on Combinatorial Sets
To define a simple Markov chain on permutations, we need the concept of neighboring elements.
Definition: Two permutations are neighbors if one results from interchanging two of the positions of the other:
(1,2,3,4) and (1,2,4,3) are neighbors.
(1,2,3,4) and (1,3,4,2) are not neighbors.

HM on Combinatorial Sets
With the uniform target (each allowed permutation has the same weight b(x) = 1) and a proposal that picks a neighbor uniformly at random, q(x, y) = 1/|N(x)|, the acceptance probability becomes
  α(x, y) = min( 1, |N(x)| / |N(y)| ),
where N(x) is the set of neighbors of x inside the constraint set. The limit distribution of the resulting chain is uniform. That is what we wanted!
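
A sketch of this combinatorial sampler for n = 3, a = 12, using the transposition neighborhoods and the acceptance ratio above (the helper names are my own):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, a = 3, 12

def score(x):
    # sum_j j * x_j with 1-based positions
    return sum((j + 1) * xj for j, xj in enumerate(x))

def neighbors(x):
    # All transpositions of x that still satisfy the constraint.
    out = []
    for i, j in combinations(range(n), 2):
        y = list(x)
        y[i], y[j] = y[j], y[i]
        if score(y) > a:
            out.append(tuple(y))
    return out

x = (1, 2, 3)                   # a valid starting permutation (score 14)
counts = {}
for step in range(200_000):
    nbrs = neighbors(x)
    y = nbrs[rng.integers(len(nbrs))]      # uniform proposal over N(x)
    if rng.random() < min(1.0, len(nbrs) / len(neighbors(y))):
        x = y
    counts[x] = counts.get(x, 0) + 1

# The three valid permutations should each appear with frequency ~ 1/3.
print({k: v / 200_000 for k, v in counts.items()})
```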

Gibbs Sampling: The Problem
Suppose that we can generate samples from the full conditional distributions p(x_i | x_1, ..., x_{i−1}, x_{i+1}, ..., x_n). Our goal is to generate samples from the joint distribution p(x_1, ..., x_n).

Gibbs Sampling: Pseudo Code
Initialize x = (x_1, ..., x_n).
Repeat:
  for i = 1, ..., n:
    sample x_i from p(x_i | x_1, ..., x_{i−1}, x_{i+1}, ..., x_n)
  record the current x as a sample.
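
A concrete sketch of this sweep; the bivariate Gaussian target, where both full conditionals are known in closed form, is my own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8            # correlation of a standard bivariate Gaussian target

# Full conditionals of N(0, [[1, rho], [rho, 1]]):
#   x1 | x2 ~ N(rho * x2, 1 - rho^2),  x2 | x1 ~ N(rho * x1, 1 - rho^2)
sd = np.sqrt(1 - rho**2)

x1, x2 = 0.0, 0.0
samples = []
for t in range(50_000):
    x1 = rng.normal(rho * x2, sd)    # sample x1 from p(x1 | x2)
    x2 = rng.normal(rho * x1, sd)    # sample x2 from p(x2 | x1)
    samples.append((x1, x2))

samples = np.array(samples)
print(np.corrcoef(samples.T))        # empirical correlation ~ 0.8
```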

Gibbs Sampling: Theory
Let x = (x_1, ..., x_n) and let x_{−i} = (x_1, ..., x_{i−1}, x_{i+1}, ..., x_n). Consider the proposal that picks a coordinate i uniformly at random (probability 1/n) and resamples it from its full conditional:
  q(x, y) = (1/n) p(y_i | x_{−i})  if y agrees with x outside coordinate i, and 0 otherwise.
Observation: By construction, this HM sampler samples from p(x_1, ..., x_n).

Gibbs Sampling is a Special HM
Theorem: Gibbs sampling is a special case of Hastings-Metropolis in which every proposal is accepted, i.e. α(x, y) = 1.
Proof: By definition, with y_{−i} = x_{−i},
  α(x, y) = min( 1, [p(y) q(y, x)] / [p(x) q(x, y)] )
          = min( 1, [p(y_i | x_{−i}) p(x_{−i}) p(x_i | x_{−i})] / [p(x_i | x_{−i}) p(x_{−i}) p(y_i | x_{−i})] )
          = min(1, 1) = 1.

Gibbs Sampling in Practice

Simulated Annealing

Simulated Annealing
Goal: find x* = argmax_{x ∈ X} V(x), the global maximizer of an objective V over a (finite but possibly huge) set X.

Simulated Annealing
Theorem: Define the distribution
  P_λ(x) = exp(λ V(x)) / Σ_{y ∈ X} exp(λ V(y)),
and let M = {x : V(x) = max_y V(y)} be the set of global maximizers. Then
  lim_{λ→∞} P_λ(x) = 1/|M| if x ∈ M, and 0 otherwise.
Proof: Divide the numerator and the denominator by exp(λ V(x*)) for some x* ∈ M; every term with V(y) < V(x*) vanishes as λ → ∞.
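
The concentration of P_λ on the maximizers is easy to see numerically; a tiny sketch with a made-up V that has two global maximizers:

```python
import numpy as np

V = np.array([1.0, 3.0, 2.0, 3.0])      # two global maximizers, so |M| = 2

for lam in [0.1, 1.0, 10.0, 100.0]:
    w = np.exp(lam * (V - V.max()))      # subtract the max for numerical stability
    print(lam, w / w.sum())              # tends to (0, 0.5, 0, 0.5) as lambda grows
```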

Simulated Annealing: Main Idea
Let λ be big. Generate a Markov chain with limit distribution P_λ(x). In the long run, the Markov chain will jump among the maximum points of V(x). As before, introduce a neighborhood relation on the vectors of X.

Simulated Annealing
Proposal: the uniform distribution over the neighbors of the current state, q(x, y) = 1/|N(x)| for y ∈ N(x). Use Hastings-Metropolis sampling with target P_λ(x).

Simulated Annealing: Pseudo Code
From the current state x, propose a neighbor y uniformly at random. With probability
  α = min( 1, exp(λ (V(y) − V(x))) · |N(x)| / |N(y)| )
accept the new state; with probability (1 − α) don't accept and stay at x.
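
A sketch of the full procedure on a toy 1-D lattice; the objective V, the neighborhood, and the geometric λ schedule are my own illustrative choices (Temperature = 1/λ, so increasing λ means cooling):

```python
import numpy as np

rng = np.random.default_rng(0)

def V(x):
    # Toy objective on the integers 0..99 with several local maxima (my example).
    return np.sin(0.3 * x) + 0.01 * x

def neighbors(x):
    # 1-D lattice: left/right neighbors, clipped to {0, ..., 99}.
    return [n for n in (x - 1, x + 1) if 0 <= n <= 99]

x, lam = 0, 0.1
for step in range(20_000):
    lam *= 1.0005                      # slowly increase lambda (lower the temperature)
    nx = neighbors(x)
    y = nx[rng.integers(len(nx))]      # uniform proposal over neighbors
    # Work in log-space to avoid overflow when lambda gets large.
    log_alpha = lam * (V(y) - V(x)) + np.log(len(nx)) - np.log(len(neighbors(y)))
    if np.log(rng.random()) < log_alpha:
        x = y                          # uphill moves (V(y) >= V(x)) are always accepted

print(x, V(x))   # should end near a high (often the global) maximizer of V
```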

Simulated Annealing: Special Case
If V(y) ≥ V(x) (and the neighborhood sizes are equal, |N(x)| = |N(y)|), then α = 1: we accept the new state with probability 1, since we increased V.

Simulated Annealing: Problems

Simulated Annealing
Temperature = 1/λ: a large λ corresponds to a low temperature.
