Markov Chain Monte Carlo Methods for Stochastic Optimization

John R. Birge, The University of Chicago Booth School of Business
Joint work with Nicholas Polson, Chicago Booth
U Florida, Nov 2013

Themes

- Dynamic stochastic optimization models face the curse of dimensionality.
- Randomization can overcome some dimensionality effects.
- With sufficient symmetry, the results can be quite effective.
- Keys are conditioning properly on outcomes, appropriate re-sampling and split sampling, and uses of Gaussian mixtures.

Outline

- Basic problem
- Problems of estimation
- Particle filters and sequential methods for simulation
- Optimization by slice sampling
- Dynamic Gaussian problems
- Extensions to Gaussian mixtures

General Problem

Find a set of controls u_t, t = 0, ..., T-1, to solve

    min E[ Σ_{t=0}^{T-1} f_t(x_{t+1}, u_t) ]
    over u_t ∈ U_t ∩ A_t, with x_{t+1} = g_t(x_t, u_t),

where A_t is Σ_t-measurable (nonanticipative), U_t represents the possibility of additional constraints on the controls, g_t represents the state dynamics, and f_t represents the cost of the current action and next-state transition.

Challenges:
- Rapid expansion of states in the dimension of x_t in each period.
- Exponential growth of trees over time.

First Problem: What is x_t?

Example: a hidden Markov model.
- We begin with inventory x_0 (which may be uncertain).
- We make observations y_1, ..., y_T based on reported sales and new inputs.
- What is the distribution of the actual inventory x_t at any time t?

Bayesian Interpretation

- Suppose a distribution on x_0: f(x_0).
- Assume some transition density: p(x_t | x_{t-1}).
- Observation density: q(y_t | x_t).
- Goal: find P(x_t | y_{1:t}) (or P(x_t | y_{1:T})).
- Idea: use Bayes' rule to find the conditional probability,
      P(x_t | y_{1:t}) = P(x_t, y_{1:t}) / P(y_{1:t}),
  where the joint posterior over the path factors as
      P(x_{0:t} | y_{1:t}) ∝ f(x_0) Π_{s=1}^{t} p(x_s | x_{s-1}) q(y_s | x_s).
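For reference (added for completeness; this recursion is implied rather than written on the slide): the filtering distribution satisfies a standard two-step prediction/update recursion in the densities p and q defined above.

```latex
% Prediction: propagate the previous filter through the transition density p.
P(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, P(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}

% Update: condition on the new observation via Bayes' rule with q.
P(x_t \mid y_{1:t}) = \frac{q(y_t \mid x_t)\, P(x_t \mid y_{1:t-1})}{\int q(y_t \mid x)\, P(x \mid y_{1:t-1})\, dx}
```

The particle methods on the following slides approximate exactly this recursion with a finite, weighted sample.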

Particle Methods

Idea: keep a limited number (N) of particles in each period.
- Propagate forward to obtain new points.
- Re-sample (re-weight) backwards to get consistent weights.
- Re-balance if weights are low.

References: Gordon, Salmond, Smith (1993); Doucet (2008).

General Processes

If the observation and transition functions q and p are quite general, then P(x_t | y_{1:t}) may not be available analytically (e.g., no sales observed if stocked out, a maximum limit, bulky demand).

Alternatives:
- Markov chain Monte Carlo (MCMC)
- Particle methods

Sequential Importance Resampling (SIR) Method

1. Propagate. Draw x_{t+1,j}, j = 1, ..., N, from f(x_{t+1,j} | x_{t,j}).
2. Re-weight. Update the weights to

       ŵ_{t+1,j} = w_{t,j} g(y_{t+1} | x_{t+1,j}) / Σ_{i=1}^{N} w_{t,i} g(y_{t+1} | x_{t+1,i}),  j = 1, ..., N.

3. Re-sample. If a re-sampling condition is satisfied, draw N new samples x'_{t+1,j}, j = 1, ..., N, from {x_{t+1,j}} with probability distribution given by {ŵ_{t+1,j}}; let x_{t+1,j} = x'_{t+1,j} and ŵ_{t+1,j} = 1/N for j = 1, ..., N.
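A minimal Python sketch of one SIR step under this notation. The transition sampler `propagate` (standing in for f) and the observation likelihood `obs_lik` (standing in for g) are hypothetical user-supplied callables, and the effective-sample-size trigger mirrors the re-sampling condition used in the Rosenbrock test later in the talk.

```python
import numpy as np

def sir_step(particles, weights, y_next, propagate, obs_lik, rng, ess_frac=0.5):
    """One SIR step: propagate, re-weight, and conditionally re-sample."""
    N = len(particles)
    # 1. Propagate: draw x_{t+1,j} from the transition f(. | x_{t,j}).
    particles = np.array([propagate(x, rng) for x in particles])
    # 2. Re-weight: multiply by the observation likelihood g(y_{t+1} | x_{t+1,j}).
    w = weights * np.array([obs_lik(y_next, x) for x in particles])
    w /= w.sum()
    # 3. Re-sample when the effective sample size 1/sum(w^2) drops below ess_frac*N.
    if 1.0 / np.sum(w**2) < ess_frac * N:
        idx = rng.choice(N, size=N, p=w)
        particles, w = particles[idx], np.full(N, 1.0 / N)
    return particles, w
```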

Tractable Case: LQG

Linear, Quadratic, Gaussian (LQG) model. Implications: all of the distributions are normal, and analytical formulas can be derived (the Kalman filter).

Model:
    y_t = H x_t + v_t,  v ~ N(0, R);  x_t = F x_{t-1} + u_t,  u ~ N(0, Q).

Iterations (predict, then update):
    x_t^- = F x_{t-1};  P^- = F P F^T + Q;
    w_t = y_t - H x_t^-;  S = H P^- H^T + R;  K = P^- H^T S^{-1};
    x_t = x_t^- + K w_t;  P = (I - K H) P^-.
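A generic sketch of these iterations in Python (standard Kalman equations with NumPy; matrix shapes are assumed conformable, and nothing here is specific to the inventory example):

```python
import numpy as np

def kalman_step(x, P, y, F, H, Q, R):
    """One Kalman iteration: predict with the dynamics, then update with y."""
    # Predict: push the estimate and covariance through the dynamics.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: innovation w_t, innovation covariance S, gain K.
    w = y - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ w
    P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_new, P_new
```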

Kalman Prediction

[Figure: inventory over 10 periods; actual (*) vs. observed (+).]

Kalman estimate: x_t ~ N(x̂_t, P_t). Here: x_10 ~ N(13.8, 0.62).

Comparison to MCMC/Particle Filter

- Idea: construct a sample distribution that fits the data using only information about one-step transitions.
- Goal: maintain a constant number of samples across time instead of an explosively growing tree.
- Can it work? Why?

Method Structure

[Diagram: weighted particles (w_t^j, x_t^j), j = 1, ..., N, are propagated forward, re-weighted to (w_{t+1}^j, x_{t+1}^j) using the new observation y_{t+1}, and then re-sampled.]

Cumulative Results (N = 50)

[Figure: cumulative particle-filter estimates compared to the Kalman filter.]

Particle Frequency (N = 50)

[Figure: particle frequencies (+) against the Kalman estimate (*).]

Other Uses of Particles

- Particle filtering: finding x_t from y_{1:t} = (y_1, ..., y_t).
- Particle smoothing: finding x_t from y_{1:T} = (y_1, ..., y_T).
- Particle learning (Polson et al.): find parameters θ_t at the same time to explain the model.
- Simulation

Extensions to Optimization

Problem: find x to solve  min_x f(x) subject to g(x) ≤ 0.

Pincus (1968): it is enough to sample from
    π_κ(x) ∝ exp{-κ f(x)};
with constraints (e.g., Geman/Geman (1984)):
    π_{κ,λ}(x) ∝ exp{-κ (f(x) + λ_κ g(x))}.

Extension: augment with latent variables (λ, ω); then we can write
    π_{κ,λ}(x) ∝ exp{-κ (f(x) + λ_κ g(x))} = e^{ax} E_ω[e^{-(x²/2) ω}] e^{bx} E_λ[e^{-(x²/2) λ}],
where the parameters (a, b) depend on (κ, λ_κ) and the functional forms of (f(x), g(x)).

Basic Optimization via Simulation

Pincus result:
- Model: min_x F(x).
- Create a distribution P_λ(x) ∝ exp(-λ F(x)).
- Then: lim_{λ→∞} E_{P_λ}[x] = x*.
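A small numerical illustration of this concentration effect (my own demo, not from the talk; the tilted double-well F and the grid are assumptions chosen for clarity): the mean under P_λ drifts from a spread-out average toward the global minimizer as λ grows.

```python
import numpy as np

def pincus_mean(F, xs, lam):
    """Mean of P_lambda(x) proportional to exp(-lam*F(x)), on a grid xs."""
    v = -lam * F(xs)
    w = np.exp(v - v.max())          # numerically stabilized weights
    return np.dot(w / w.sum(), xs)   # self-normalized weighted mean

# Double well with a slight tilt: global minimum near x = -1.03.
F = lambda x: (x**2 - 1.0)**2 + 0.2 * x
xs = np.linspace(-3.0, 3.0, 6001)
for lam in (0.5, 5.0, 50.0, 500.0):
    print(lam, pincus_mean(F, xs, lam))  # mean approaches the global minimizer
```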

Markov Chain Simulation

- From x_t to x_{t+1}: sample over exp(-λ F(μ x_t + (1-μ) x_{t+1})).
- Choose directions to obtain mixing in the region.
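One plausible reading of this move as code: pick a random unit direction and sample the step along it from a discretized version of exp(-λF). This is a hedged sketch under that reading (the step grid, the direction distribution, and the function names are my assumptions, not the talk's exact scheme):

```python
import numpy as np

def line_sample_step(x, F, lam, rng, steps=np.linspace(-2.0, 2.0, 401)):
    """One move: draw a random unit direction, then draw a step along the line
    with probability proportional to exp(-lam * F) on a discrete grid."""
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)                     # random unit direction
    cand = x + steps[:, None] * d              # candidate points on the line
    v = -lam * np.array([F(c) for c in cand])
    w = np.exp(v - v.max())                    # stabilized weights
    return cand[rng.choice(len(w), p=w / w.sum())]
```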

Example

Rosenbrock function:
    φ(x = (x_1, x_2)) = (1 - x_1)² + 100 (x_2 - x_1²)²,
which has a global minimum at x = (1, 1).

Specification for the chain:
    x_t = u  if v ≤ e^{-κ(φ(u) - φ(x_{t-1}))};  x_t = x_{t-1} otherwise,
where u ~ U([-4, 4]²), a uniform distribution over [-4, 4] × [-4, 4], and v ~ U([0, 1]), a standard uniform draw.

Note: this corresponds to the Metropolis-Hastings method of Markov chain Monte Carlo using the distribution proportional to e^{-κ φ(x)} for the acceptance criterion.
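This chain translates directly into a short Python sketch (an independence Metropolis-Hastings sampler; the seed is an illustrative choice, while κ and T follow the test setup on the next slide):

```python
import numpy as np

def rosenbrock(x):
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def mh_rosenbrock(kappa=10.0, T=5000, seed=0):
    """Independence MH chain targeting the density proportional to exp(-kappa*phi)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-4.0, 4.0, size=2)      # start anywhere in [-4, 4]^2
    chain = np.empty((T, 2))
    for t in range(T):
        u = rng.uniform(-4.0, 4.0, size=2)  # uniform proposal on [-4, 4]^2
        # Accept with probability min(1, exp(-kappa*(phi(u) - phi(x)))),
        # evaluated on the log scale to avoid overflow.
        if np.log(rng.uniform()) <= -kappa * (rosenbrock(u) - rosenbrock(x)):
            x = u
        chain[t] = x
    return chain

chain = mh_rosenbrock()
print(chain[-1000:].mean(axis=0))  # concentrates near the minimizer (1, 1)
```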

Test for Rosenbrock

- T = 5000 iterations
- N = 500 replications
- Re-sample when: effective sample size < 250
- κ = 2, 10, 100

Rosenbrock Graphs

[Figure: sampled points for κ = 2, 10, 100.]

Other Examples: Rastrigin

Himmelblau

Shubert

Combine Particles and Simulation

- Suppose N particles are used.
- Consider a 2-stage problem: f(x, ξ) = c(x) + Q(x, y, ξ).
- Distribution on x, y_i, ξ:
      P(x, y_1, ..., y_N, ξ) ∝ Π_{i=1}^{N} (c(x) + Q(x, y_i, ξ_i))
- Result: E_P(x) → x*

Implementation

Propagation forward on the y's:
    q(ξ_j | x_1) ∝ {c(x) + Q(ξ_j, y_j, x)} p(ξ_j),
    where y_j = argmin_y {c(x) + Q(ξ_j, y, x)};

    p(x | ξ_1, y_1, ..., ξ_N, y_N) ∝ Π_{j=1}^{N} {c(x) + Q(ξ_j, y_j, x)} p(ξ_j).

The particles concentrate the distribution on x* as in the deterministic optimization.

Example

From Birge/Louveaux, Chapter 1 (the farmer's problem).

Dynamic Models

General idea:
- Create a distribution over decision variables and random variables.
- Use MCMC to sample; with a sufficient number of particles or a large enough annealing parameter, the marginal mean concentrates on the optimum.

Extension for dynamics:
- Keep the number of sample particles constant.
- Convergence with a fixed number of particles per stage?

Example: Linear-Quadratic Gaussian

All distributions are available in closed form:
    x_t = A x_{t-1} + B_t u_{t-1} + v_t,
with all normal distributions and a risk-sensitive objective:
    -log E[exp(-(θ/2)(x_2^T Q x_2 + u_1^T R u_1))],
where u_1 is the control in the first period.

Note: equivalent to a robust formulation. Now, all quantities can be found in closed form.

Results for LQG

- Mean of u: ū_1 = L_1(θ) x_1, where L_1 is the result from the Kalman filter.
- Variance of u: V(u_1) = ε(N), where ε(N) → 0 as N → ∞.
- Also, ū_1(θ) → u_1* as θ → 0, where u_1* is the two-stage risk-neutral solution.

Extensions to T Stages

Can derive S_t^θ recursively to show that
    p(x_t | x^{t-1}, u^{t-1}) = K exp( -(1/2) [ Σ_{s=1}^{t-1} (x_s^T Q_s x_s + u_s^T R_s u_s) + x_t^T S_t^θ x_t + (x_t - A x_{t-1} - B u_{t-1})^T V_t^{-1} (x_t - A x_{t-1} - B u_{t-1}) ] ).

For particle methods, j = 1, ..., N: each x_{t,j} is drawn consistently from this distribution.

General Method Structure

[Diagram: weighted particles with controls (w_t^j, x_t^j, u_t^j), j = 1, ..., N, are propagated to x_{t+1}^j and then re-sampled with new weights and controls.]

General Result

If the propagation step is unbiased, then E(û_1) → u_1*.

The unbiased result is possible in LQG from the recursive structure; it is harder to obtain in more general systems (without additional future sampling).

Two-Stage: Compare to Direct MC

[Figure: two panels comparing to direct Monte Carlo, under low risk aversion and high risk aversion.]

Three-Stage Example

Generalizing Objectives

Direct: risk-sensitive exponential utility exp(-θ Q(x, y, ξ)), with Q quadratic.

Objective: e^{-θ|y - d|} = E_λ[exp(-(θ²/(2λ_j))(y_j - d)²)], where λ ~ Exp(2).

Also, use: E[e^{-θ|y - d|}] ≈ 1 - θ E|y - d| + O(θ²).

Further Generalizations

- Use objective approximations that preserve normal forms.
- Extend LQG-like results to this general framework.
- Find other frameworks where unbiased future samples are available.
- Obtain sensitivity results on the bias from imperfect future sampling.
- Use optimization to pick ML instead.

Conclusions

- Techniques from MCMC can be used effectively for optimization to reduce dimensionality effects.
- Results work well in situations with sufficient symmetry.
- Keys are to effectively use re-sampling, split (and slice) sampling, and latent variables.

Other Questions to Consider

- How can learning and parameter estimation be combined with optimization?
- What is the range of problems that can be approached in this manner?
- What are the best uses of optimization technology as part of this process?

Thank you! Questions?