Technion - Department of Electrical Engineering

Monte Carlo Methods for Computation and Optimization (048715)

Lecture Notes

Prof. Nahum Shimkin

Spring 2015

PREFACE

These lecture notes are intended for a first, graduate-level course on Monte Carlo simulation methods. The required prerequisites are merely an elementary course on probability and statistics, and a course on stochastic processes (emphasizing Markov chains).

There exist many excellent textbooks on this subject, with differing emphases. Listed below are the main ones that were consulted in preparing these notes.

(1) R. Rubinstein and D. Kroese, Simulation and the Monte Carlo Method, Wiley, 2008.
(2) C. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd ed., Springer, 2005.
(3) S. Asmussen and P. Glynn, Stochastic Simulation, Springer, 2007.
(4) J. Liu, Monte Carlo Strategies in Scientific Computing, Springer, 2008.

Additional specific references will be given as needed.

1 Introduction

In their most basic form, Monte Carlo schemes compute the expected value of random variables of interest by averaging random draws from an appropriate probability distribution. This simple idea gives rise to a wide variety of methods and algorithms, which are needed to facilitate this computation and render it efficient. A notable feature of Monte Carlo methods is their wide applicability, making them the method of choice for many complex problems where other (analytical or numerical) methods fail. Essentially, whenever we can simulate a random variable or process, we can calculate probabilities and expectations by averaging over these simulations.

1.1 Historical Background

A first systematic use of Monte Carlo methods can be traced to the development of atomic weapons in Los Alamos during the years 1944-45, using the early digital computers that were developed at that time. Notable names related to these initial efforts include Stanislaw Ulam, John von Neumann, Nicholas Metropolis, and Enrico Fermi.

In the early 1950s, statistical physicists introduced Markov-chain based methods for the simulation of simple fluids. These methods were gradually extended to simulate more complex physical systems, such as spin glass models and many others. In the 1980s, statisticians and computer scientists developed Monte Carlo methods for a wide variety of problems, such as combinatorial optimization, statistical inference, likelihood computation, and Bayesian modeling.

Today, Monte Carlo methods find extensive use in a wide range of scientific disciplines, including physics, chemistry, computer science, economics and finance, operations research, and statistics. We will not attempt to cover the application areas in detail, but rather focus on the basic methods, complemented by some illustrative examples of applications in selected areas.

1.2 Example: Approximating π

Buffon's needle: An early experiment known as Buffon's needle (dating back to 1777) serves to illustrate the basic idea and potential of elementary Monte Carlo methods.

In this experiment, one draws a grid of parallel lines with spacing D on a flat surface, and repeatedly throws a needle of length l onto that grid. Under ideal conditions, the probability that the needle will intersect at least one of the drawn lines is easily computed to be 2l/(πD) = a. Thus, if we let p̂_N denote the proportion of intersections out of N throws, by the law of large numbers we get that

    lim_{N→∞} p̂_N = 2l/(πD) = a,

from which we can approximate π ≈ 2l/(D p̂_N).

To get an idea of the required number of samples for a given accuracy, note that E(p̂_N) = a, and Var(p̂_N) = (1/N) Var(p̂_1) = a(1−a)/N (since p̂_1 is a Bernoulli random variable with parameter a). Therefore, by Chebyshev's inequality,

    P{ |p̂_N − a| ≥ ϵ } ≤ Var(p̂_N)/ϵ² = a(1−a)/(Nϵ²).

Thus, to get an accuracy of ϵ = 10⁻³ with 99% confidence, we need a(1−a)/(Nϵ²) ≤ 0.01, or N ≥ 10⁸ a(1−a). Obviously, an improvement in efficiency will be helpful here.

Circle in a square: A related experiment is the following. Draw a circle of diameter 1 inside a square of side 1. Now repeatedly throw a small ball into the square, and let p̂_N denote the proportion of throws that land inside the circle. Under ideal conditions, the probability of landing inside the circle is just π/4, and we can approximate π ≈ 4 p̂_N for N large.

The last experiment is easily simulated: at each stage, draw two independent random variables X and Y that are uniformly distributed on [0, 1], and count the fraction of times in which X² + Y² ≤ 1. This fraction has the same distribution as p̂_N in our circle-in-a-square experiment.

Question: Repeat the last experiment with a square of side 10. How does this compare to the above? Which option is better?
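To make the circle-in-a-square experiment concrete, here is a minimal simulation sketch in Python; the function name, sample sizes, and random seed are illustrative choices, not part of the original notes.

```python
import numpy as np

def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Crude Monte Carlo estimate of pi via the circle-in-a-square experiment."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n_samples)
    y = rng.uniform(0.0, 1.0, size=n_samples)
    # Fraction of points falling inside the quarter circle of radius 1.
    p_hat = np.mean(x**2 + y**2 <= 1.0)
    return 4.0 * p_hat

if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        print(n, estimate_pi(n))
```

Running the sketch for increasing N shows the error shrinking roughly like 1/√N, consistent with the variance calculation above.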

1.3 The Basic Framework

The basic problem in stochastic simulation is to estimate the expected value of a function h of some random variable X:

    ℓ = E(h(X)).

The most basic approach (called crude Monte Carlo) is to use a sequence of i.i.d. samples of X, namely X_i ~ f_X, and compute the average:

    ℓ̂ = (1/n) Σ_{i=1}^{n} h(X_i).

Note that probabilities of events can be computed as the expected values of their indicator functions:

    P(X ∈ A) = E(1{X ∈ A}).

Basic issues that arise in this context include:

1. Efficient sampling: How to sample efficiently from a required probability distribution f_X (which may be complicated, multivariate, and implicitly specified).

2. Variance reduction: How to modify the basic scheme to reduce the variance of the estimate (so that a smaller number of samples will suffice for a given accuracy).

The central methods that address the first point include rejection sampling and, most importantly, Markov Chain Monte Carlo (MCMC). Methods that address the second issue include importance sampling, along with many other, more specific schemes.
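As a small illustration of the crude Monte Carlo estimator, the following sketch estimates a tail probability P(X > 2) for a standard normal X as the expectation of an indicator function; the threshold and sample size are arbitrary choices for the example.

```python
import math
import numpy as np

def crude_mc_tail_prob(n_samples: int, threshold: float = 2.0, seed: int = 0) -> float:
    """Estimate P(X > threshold) for X ~ N(0, 1) by averaging indicator samples."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    return float(np.mean(x > threshold))

if __name__ == "__main__":
    est = crude_mc_tail_prob(10**6)
    exact = 0.5 * math.erfc(2.0 / math.sqrt(2.0))  # exact Gaussian tail probability
    print(f"estimate = {est:.5f}, exact = {exact:.5f}")
```

For much smaller tail probabilities the relative error of this crude estimator deteriorates, which is precisely the motivation for the variance-reduction methods mentioned above.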

1.4 Some Application Examples

1.4.1 Statistical Physics

The state probabilities of physical systems are often described by the Boltzmann distribution (or Gibbs distribution), of the form

    π(x) = (1/Z) e^{−U(x)/kT},

where x ∈ X is the system configuration or state, U(x) its potential energy, T the temperature, and k the Boltzmann constant. The normalization constant Z = Z(T) is referred to as the partition function.

A canonical example is the Ising model, which describes the interaction of magnetic dipoles in a magnetic material. Here x = (x_i)_{i∈L}, where L is a two- or three-dimensional discrete domain, and x_i ∈ {−1, 1} is the random state (orientation) of the dipole at position i. The potential energy is

    U(x) = −(J/2) Σ_{i∈L} Σ_{j∈N_i} x_i x_j + Σ_{i∈L} h_i x_i.

Here N_i are the neighbors of i, J is the interaction strength, and h_i the external magnetic field. Macroscopic quantities of interest include expected values of various functions of the state. For example, Ū = E U(x) is the (mean) potential energy, m = E(Σ_i x_i)/|L| is the mean magnetization per dipole, etc. These averages can be computed using Monte Carlo sampling from π(x).

1.4.2 Statistical Inference

Suppose we wish to estimate the value of a parameter vector θ based on measurements x. In a Bayesian setting, the posterior distribution of θ is given by

    π(θ|x) = f(x|θ)π(θ) / ∫ f(x|θ)π(θ) dθ.

Therefore

    θ̂(x) = E(θ|x) = ∫ θ π(θ|x) dθ,
    cov(θ|x) = cov(θ − θ̂(x) | x) = ∫ (θ − θ̂(x))(θ − θ̂(x))ᵀ π(θ|x) dθ.

These computations, which involve the expected values of certain functions, can be carried out using Monte Carlo sampling methods. Obviously, we need to be able to sample from the conditional distribution π(θ|x).
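As a toy illustration of these posterior computations, the sketch below uses a Bernoulli likelihood with a Beta prior, for which the posterior is a Beta distribution that is easy to sample from; the posterior mean and variance are then approximated by Monte Carlo averages. The prior parameters and data are made up for the example.

```python
import numpy as np

# Illustrative data: 10 Bernoulli trials with 7 successes; Beta(1, 1) prior on theta.
successes, trials = 7, 10
a_post, b_post = 1 + successes, 1 + (trials - successes)   # Beta posterior parameters

rng = np.random.default_rng(0)
theta_samples = rng.beta(a_post, b_post, size=100_000)     # draws from pi(theta | x)

post_mean = theta_samples.mean()   # Monte Carlo estimate of E(theta | x)
post_var = theta_samples.var()     # Monte Carlo estimate of var(theta | x)

print(f"posterior mean ~ {post_mean:.4f} (exact {a_post / (a_post + b_post):.4f})")
print(f"posterior var  ~ {post_var:.5f}")
```

In this conjugate example the posterior is of course available in closed form; the point of Monte Carlo methods is that the same averaging works whenever π(θ|x) can be sampled, for instance via MCMC, even when it has no tractable form.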

1.4.3 Queueing Networks

Networks of queues are a common model for various communication, computer, and service systems. Here, customers (jobs, packets) arrive at the system, queue up, and are served at one or more service stations.

Queueing networks generally belong to the class of discrete-event stochastic systems, and can be simulated using the simulation methods and tools available for such systems. Based on a set of primitive random variables, such as the arrival and service times of the customers, the system can be simulated forward in time, and the quantities of interest (such as queue sizes and delays) can be computed. Monte Carlo simulations are an important tool for performance analysis and parameter optimization in such systems. Quantities of interest may include:

- The expected delay of a job from arrival to service completion.
- The fraction of time that a certain buffer is full.
- The expected number of rejected calls.
- The expected time until some buffer becomes full, starting from an empty system.

Note that the first three quantities are steady-state ones, while the last is a single-shot quantity. Accordingly, the first three may be computed from a single, long simulation run of the system, while the last requires repeated trials starting from an empty system. We note that events such as "buffer full" may be rare events, which require excessive running time to encounter and estimate directly. In these cases, specific methods are required to handle rare-event simulation.
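For a single-server queue, the first quantity above (the expected waiting time) can be estimated from one long run using the Lindley recursion. The sketch below does this for an M/M/1 queue with made-up arrival and service rates, and compares the result with the standard formula λ/(μ(μ − λ)) for the steady-state mean waiting time in queue; the function name, rates, and run length are illustrative.

```python
import numpy as np

def mm1_mean_wait(lam: float, mu: float, n_customers: int, seed: int = 0) -> float:
    """Estimate the mean waiting time in an M/M/1 queue via the Lindley recursion."""
    rng = np.random.default_rng(seed)
    inter_arrivals = rng.exponential(1.0 / lam, size=n_customers)  # A_n
    services = rng.exponential(1.0 / mu, size=n_customers)         # S_n
    waits = np.empty(n_customers)
    waits[0] = 0.0
    for n in range(1, n_customers):
        # Lindley recursion: W_{n+1} = max(0, W_n + S_n - A_{n+1})
        waits[n] = max(0.0, waits[n - 1] + services[n - 1] - inter_arrivals[n])
    burn_in = n_customers // 10        # discard the initial transient
    return float(waits[burn_in:].mean())

if __name__ == "__main__":
    lam, mu = 0.8, 1.0                 # illustrative arrival and service rates
    est = mm1_mean_wait(lam, mu, n_customers=200_000)
    exact = lam / (mu * (mu - lam))    # steady-state mean waiting time in queue
    print(f"simulated mean wait ~ {est:.3f}, exact = {exact:.3f}")
```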

1.4.4 Sampling Web Content

Various statistics about the content and usage of the Web are of interest. Due to its extent, it is obviously not possible to go over its entire content, and sampling methods are required. Consider, for example, the specific problem of sampling random pages from the corpus of documents indexed by a search engine, using only the search engine's public interface. Simple-minded sampling methods suffer from a bias in favor of long documents. More elaborate use of Monte Carlo sampling can overcome this problem.

1.4.5 Combinatorial Optimization

Suppose we wish to minimize a function c(x), x ∈ X, where X is a large discrete set. This problem can be posed as a stochastic simulation problem by defining the probability distribution

    π(x) = (1/Z) exp(−c(x)/T).

Evidently, for small T, this distribution is concentrated on points x with small cost. Thus, drawing samples from π will give low-cost x's with high probability. Sampling from π is often carried out using MCMC methods. This is closely related to the Simulated Annealing algorithm for global optimization, which is in fact a special case.
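To see the temperature effect concretely, the following sketch evaluates a small, made-up discrete cost function and shows how π(x) ∝ exp(−c(x)/T) concentrates on the low-cost points as T decreases; no MCMC is needed here since the set is small enough to normalize exactly.

```python
import numpy as np

# A small illustrative cost function on X = {0, 1, ..., 9}.
costs = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 0.5, 3.5])

for T in (5.0, 1.0, 0.1):
    weights = np.exp(-costs / T)
    pi = weights / weights.sum()      # exact normalization (the partition function Z)
    best = int(np.argmin(costs))
    print(f"T = {T:4.1f}: P(minimizer x = {best}) = {pi[best]:.3f}")
```

For large X the exact normalization is infeasible, which is exactly where MCMC sampling from π (and simulated annealing, which lowers T along the run) comes in.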