Particle Filtering Approaches for Dynamic Stochastic Optimization

John R. Birge, The University of Chicago Booth School of Business
Joint work with Nicholas Polson, Chicago Booth
I-Sim Workshop, Purdue University, July 2015

Themes
- Dynamic stochastic optimization models face the curse of dimensionality.
- Randomization can overcome some dimensionality effects.
- With sufficient symmetry, the results can be quite effective.
- The keys are conditioning properly on outcomes, appropriate re-sampling and split sampling, and the use of Gaussian mixtures.

Outline
- Basic problem
- Problems of estimation
- Particle filters and sequential methods for simulation
- Optimization by slice sampling
- Dynamic Gaussian problems
- Extensions to Gaussian mixtures

General Problem
Find a set of controls $u_t$, $t = 0, \dots, T$, to solve
$$\min_{u_t \in \mathcal{U}_t \cap \mathcal{A}_t,\ x_{t+1} = g_t(x_t, u_t)} \ E\Big[\sum_{t=0}^{T} f_t(x_{t+1}, u_t)\Big],$$
where $\mathcal{A}_t$ is $\mathcal{F}_t$-measurable (nonanticipative), $\mathcal{U}_t$ represents the possibility of additional constraints on the controls, $g_t$ represents the state dynamics, and $f_t$ represents the cost of the current action and the next state transition.
Challenges:
- Rapid expansion of states in the dimension of $x_t$ in each period
- Exponential growth of scenario trees over time

First Problem: What is $x_t$?
Example (hidden Markov model):
- We begin with inventory $x_0$ (which may be uncertain).
- We make observations $y_1, \dots, y_T$ based on reported sales and new inputs.
- What is the distribution of the actual inventory $x_t$ at any time $t$?

Bayesian Interpretation
- Suppose a distribution on $x_0$: $f(x_0)$.
- Assume some transition density: $p(x_t \mid x_{t-1})$.
- Observation density: $q(y_t \mid x_t)$.
- Goal: find $P(x_t \mid y_1, \dots, y_t)$ (or $P(x_t \mid y_1, \dots, y_T)$ for smoothing).
- Idea: use Bayes' rule to find the conditional probability:
$$P(x_t \mid y_{1:t}) = \frac{P(x_t, y_{1:t})}{P(y_{1:t})} \propto \prod_{s \le t} q(y_s \mid x_s)\, p(x_s \mid x_{s-1})\, f(x_0),$$
where the right-hand side is the joint density of the path $x_{0:t}$, and the filtering marginal for $x_t$ follows by integrating out $x_{0:t-1}$.
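For reference, this computation has the standard recursive prediction/update form that the particle methods below approximate (the recursion is implicit in the slide rather than stated on it):
$$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}, \qquad p(x_t \mid y_{1:t}) \propto q(y_t \mid x_t)\, p(x_t \mid y_{1:t-1}).$$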

Particle Methods
Idea:
- Keep a limited number ($N$) of particles in each period.
- Propagate forward to obtain new points.
- Re-sample (re-weight) backwards to get consistent weights.
- Re-balance if the weights become low.
References: Gordon, Salmond, and Smith (1993); Doucet (2008)

General Processes
If the observation and transition functions $q$ and $p$ are quite general, then $P(x_t \mid y_{1:t})$ may not be available analytically (e.g., no sales observed when stocked out, a maximum limit, bulky demand).
Alternatives:
- Markov chain Monte Carlo (MCMC)
- Particle methods

Sequential Importance Resampling (SIR) Method
1. Propagate. Draw $x_{t+1,j}$, $j = 1, \dots, N$, from $f(x_{t+1,j} \mid x_{t,j})$.
2. Re-weight. Update the weights $\hat{w}_{t,j}$, $j = 1, \dots, N$, to
$$\hat{w}_{t+1,j} = \frac{\hat{w}_{t,j}\, g(y_{t+1} \mid x_{t+1,j})}{\sum_{j'=1}^{N} \hat{w}_{t,j'}\, g(y_{t+1} \mid x_{t+1,j'})}$$
for $j = 1, \dots, N$.
3. Re-sample. If a re-sampling condition is satisfied, draw $N$ new samples $x'_{t+1,j}$, $j = 1, \dots, N$, from $\{x_{t+1,j}\}_{j=1}^{N}$ with the probability distribution given by $\{\hat{w}_{t+1,j}\}_{j=1}^{N}$; then let $x_{t+1,j} = x'_{t+1,j}$ and $\hat{w}_{t+1,j} = 1/N$ for $j = 1, \dots, N$.
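A minimal sketch of this SIR step in Python (NumPy), assuming user-supplied functions `propagate` (a sampler for the transition $f$) and `obs_lik` (the observation likelihood $g$); the effective-sample-size trigger is a common choice for the re-sampling condition and is an assumption here, since the slide leaves the condition unspecified.

```python
import numpy as np

def sir_step(particles, weights, y_next, propagate, obs_lik, rng,
             ess_threshold=0.5):
    """One SIR step: propagate, re-weight, and (conditionally) re-sample.

    particles : (N, d) array of current particles x_{t,j}
    weights   : (N,) normalized weights w_{t,j}
    y_next    : next observation y_{t+1}
    propagate : (particles, rng) -> (N, d) draws from f(x_{t+1} | x_t)
    obs_lik   : (y, particles) -> (N,) values of g(y | x)
    """
    N = len(weights)
    # 1. Propagate each particle through the state transition.
    particles = propagate(particles, rng)
    # 2. Re-weight by the observation likelihood and normalize.
    weights = weights * obs_lik(y_next, particles)
    weights = weights / weights.sum()
    # 3. Re-sample when the effective sample size falls below a threshold
    #    (assumed trigger; the slide leaves the condition unspecified).
    ess = 1.0 / np.sum(weights ** 2)
    if ess < ess_threshold * N:
        idx = rng.choice(N, size=N, p=weights)
        particles = particles[idx]
        weights = np.full(N, 1.0 / N)
    return particles, weights
```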

Tractable Case: LQG
Linear, Quadratic, Gaussian (LQG) Model
Implications:
- All of the distributions are normal.
- Analytical formulas can be derived (the Kalman filter):
$$y_t = H x_t + v_t, \quad x_t = F x_{t-1} + u_t, \quad v_t \sim N(0, R), \quad u_t \sim N(0, Q).$$
Iterations (predict, then update):
$$x_t^- = F x_{t-1}, \qquad P^- = F P F^T + Q,$$
$$w_t = y_t - H x_t^-, \qquad S = H P^- H^T + R, \qquad K = P^- H^T S^{-1},$$
$$x_t = x_t^- + K w_t, \qquad P = (I - K H) P^-.$$
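A compact sketch of these iterations in Python (NumPy); the predict/update split follows the slide's formulas, with `x_prior`/`P_prior` standing in for the predicted quantities $x_t^-$ and $P^-$.

```python
import numpy as np

def kalman_step(x, P, y, F, H, Q, R):
    """One Kalman filter iteration for y_t = H x_t + v_t,
    x_t = F x_{t-1} + u_t, with v ~ N(0, R) and u ~ N(0, Q)."""
    # Predict: propagate the state estimate and covariance.
    x_prior = F @ x
    P_prior = F @ P @ F.T + Q
    # Update: fold in the new observation y.
    w = y - H @ x_prior                   # innovation
    S = H @ P_prior @ H.T + R             # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_post = x_prior + K @ w
    P_post = (np.eye(len(x)) - K @ H) @ P_prior
    return x_post, P_post
```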

Kalman Prediction
[Figure: inventory over 10 periods; actual (*) vs. observed (+).]
Kalman estimate: $x_t \mid y_{1:t} \sim N(\hat{x}_t, P_t)$; here, $x_{10} \sim N(13.8, 0.62)$.

Comparison to MCMC/Particle Filter
- Idea: construct a sample distribution that fits the data using only information about one-step transitions.
- Goal: maintain a constant number of samples across time instead of an explosively growing tree.
- Can it work? Why?

Method Structure
[Diagram: weighted particles $(w_{t,j}, x_t^j)$, $j = 1, \dots, N$, are propagated forward, re-weighted against the new observation $y_{t+1}$, and re-sampled to give the next set $(w_{t+1,j}, x_{t+1}^j)$.]

Cumulative Results, N = 50
[Figure: cumulative particle-filter results compared to the Kalman filter.]

Particle Frequency, N = 50
[Figure: particle frequencies (+) against the Kalman estimate (*).]

Other Uses of Particles
- Particle filtering: finding $x_t$ from $y_{1:t} = (y_1, \dots, y_t)$
- Particle smoothing: finding $x_t$ from $y_{1:T} = (y_1, \dots, y_T)$
- Particle learning (Polson et al.): finding the parameters $\theta_t$ at the same time to explain the model
- Simulation

Extensions to Optimization
Problem: find $x$ to $\min_x f(x)$ subject to $g(x) \le 0$.
Pincus (1968): it is enough to sample from
$$\pi_\lambda(x) \propto \exp\{-\lambda f(x)\};$$
with constraints (e.g., Geman/Geman (1985)):
$$\pi_{\lambda,\gamma}(x) \propto \exp\{-\lambda(f(x) + \gamma g(x))\}.$$
Extension: augment with latent variables $(\omega, \tilde{\omega})$; then we can write
$$\pi_{\lambda,\gamma}(x) \propto \exp\{-\lambda(f(x) + \gamma g(x))\} = e^{ax}\, E_\omega\{e^{-x^2/(2\omega)}\}\, e^{bx}\, E_{\tilde{\omega}}\{e^{-x^2/(2\tilde{\omega})}\},$$
where the parameters $(a, b)$ depend on $(\lambda, \gamma)$ and the functional forms of $(f(x), g(x))$.

Basic Optimization via Simulation
Pincus's result:
- Model: $\min_x F(x)$.
- Create a distribution $P_\lambda(x) \propto \exp(-\lambda F(x))$.
- Then $\lim_{\lambda \to \infty} E_{P_\lambda}(x) = x^*$.

Markov Chain Simulation
- Move from $x_t$ to $x_{t+1}$.
- Sample over $\exp(-\lambda F(\mu x_t + (1 - \mu) x_{t+1}))$.
- Choose directions to obtain mixing in the region.

Example
Rosenbrock function:
$$\phi(x = (x_1, x_2)) = (1 - x_1)^2 + 100(x_2 - x_1^2)^2,$$
which has a global minimum at $x^* = (1, 1)$.
Specification of the chain:
$$x_t = \begin{cases} u & \text{if } v \le e^{-\lambda(\phi(u) - \phi(x_{t-1}))}, \\ x_{t-1} & \text{otherwise}, \end{cases}$$
where $u \sim U([-4, 4]^2)$, a uniform distribution over $[-4, 4] \times [-4, 4]$, and $v \sim U([0, 1])$, a standard uniform draw.
Note: this corresponds to the Metropolis-Hastings method of Markov chain Monte Carlo, using the distribution proportional to $e^{-\lambda \phi(x)}$ for the acceptance criterion.
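A direct transcription of this chain in Python (NumPy); `lam` is the annealing parameter $\lambda$ from the slide, and the iteration count and reported tail average are illustrative choices, not from the slides.

```python
import numpy as np

def rosenbrock(x):
    """phi(x) = (1 - x1)^2 + 100 (x2 - x1^2)^2, minimized at (1, 1)."""
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

def mh_minimize(lam, T=5000, rng=None):
    """Independence Metropolis-Hastings targeting exp(-lam * phi(x))
    with uniform proposals on [-4, 4]^2, as specified on the slide."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(-4, 4, size=2)       # initial state
    chain = np.empty((T, 2))
    for t in range(T):
        u = rng.uniform(-4, 4, size=2)   # uniform proposal on [-4, 4]^2
        v = rng.uniform()                # standard uniform draw
        # Accept if v <= exp(-lam * (phi(u) - phi(x))).
        if v <= np.exp(-lam * (rosenbrock(u) - rosenbrock(x))):
            x = u
        chain[t] = x
    return chain

# As lam grows, the chain concentrates near the minimizer (1, 1).
samples = mh_minimize(lam=10.0)
print(samples[-500:].mean(axis=0))
```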

Test for Rosenbrock
- $T = 5000$ iterations
- $N = 500$ replications
- Re-sample when the effective sample size $< 250$
- $\lambda = 2, 10, 100$

Rosenbrock Graphs
[Figure: results for $\lambda = 2, 10, 100$.]

Other Examples: Rastrigin
[Figure: results for the Rastrigin function.]

Himmelblau
[Figure: results for the Himmelblau function.]

Shubert
[Figure: results for the Shubert function.]

Combine Particles and Simulation
- Suppose $N$ particles are used.
- Consider a two-stage problem: $f(x, \xi) = c(x) + Q(x, y, \xi)$.
- Distribution on $x$, $y_i$, $\xi$:
$$P(x, y_1, \dots, y_N, \xi) \propto \prod_{i=1}^{N} \exp\{-\lambda(c(x) + Q(x, y_i, \xi_i))\}.$$
- Result: $E_P(x) \to x^*$.

Implementation
Propagation forward on the $y$'s:
$$q(\xi_j \mid x_1) \propto \exp\{-\lambda(c(x) + Q(\xi_j, y_j, x))\}\, p(\xi_j),$$
where $y_j = \arg\min_y (c(x) + Q(\xi_j, y, x))$, and
$$p(x \mid \xi_1, y_1, \dots, \xi_N, y_N) \propto \prod_{j=1}^{N} \exp\{-\lambda(c(x) + Q(\xi_j, y_j, x))\}\, p(\xi_j).$$
The particles concentrate the distribution on $x^*$ as in the deterministic optimization.
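A rough Python sketch of this scheme, under several stated assumptions: `c` and `Q` are user-supplied cost functions with scalar first-stage decision $x$ and scalar recourse $y$, the inner $\arg\min$ over $y$ is solved numerically, and $x$ is sampled by the same independence Metropolis-Hastings step used for the Rosenbrock example. The slides do not fix any of these choices, so this is illustrative rather than the authors' exact implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def two_stage_logdensity(x, xis, c, Q, lam):
    """log of prod_j exp(-lam (c(x) + Q(xi_j, y_j, x))) with
    y_j = argmin_y (c(x) + Q(xi_j, y, x)); the p(xi_j) factors do not
    depend on x and drop out when sampling x."""
    total = 0.0
    for xi in xis:
        res = minimize_scalar(lambda y: Q(xi, y, x))  # inner argmin over y
        total += c(x) + res.fun
    return -lam * total

def sample_first_stage(xis, c, Q, lam, T=2000, lo=-4.0, hi=4.0, rng=None):
    """Independence MH on x targeting the particle-product density."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(lo, hi)
    logp = two_stage_logdensity(x, xis, c, Q, lam)
    draws = np.empty(T)
    for t in range(T):
        u = rng.uniform(lo, hi)
        logp_u = two_stage_logdensity(u, xis, c, Q, lam)
        if np.log(rng.uniform()) <= logp_u - logp:
            x, logp = u, logp_u
        draws[t] = x
    return draws  # concentrates near x* as lam and N grow
```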

Example
From Birge/Louveaux, Chapter 1 (the farmer's problem):
[Figure: results for the farmer example.]

Dynamic Models
General idea:
- Create a distribution over the decision variables and random variables.
- Use MCMC to sample; with a sufficient number of particles or a large enough annealing parameter, the marginal mean concentrates on the optimum.
Extension for dynamics:
- Keep the number of sample particles constant per stage.
- Convergence with a fixed number of particles per stage?

Example: Linear-Quadratic Gaussian
- All distributions are available in closed form:
$$x_t = A x_{t-1} + B_t u_{t-1} + v_t.$$
- All normal distributions, with the risk-sensitive objective
$$-\log E\big(\exp\big(-(\theta/2)(x_2^T Q x_2 + u_1^T R u_1)\big)\big),$$
where $u_1$ is the control in the first period.
- Note: equivalent to a robust formulation.
- Now, everything can be found in closed form.

Results for LQG
- Mean of $u$: $m_{u_1} = L_1(\theta)\, x_1$, where $L_1$ is the result from the Kalman filter.
- Variance of $u$: $V_{u_1} \propto \epsilon(N)$, where $\epsilon(N) \to 0$ as $N \to \infty$.
- $m_{u_1}(\theta) \to u_1^*$ as $\theta \to 0$, where $u_1^*$ is the two-stage risk-neutral solution.

Extensions to T Stages
Can derive $S_t^\theta$ recursively to show that
$$p(x_t \mid x_{t-1}, u_{t-1}) = K \exp\Big(-\tfrac{1}{2}\Big(\sum_{s=1}^{t-1}(x_s^T Q_s x_s + u_s^T R_s u_s) + x_t^T S_t^\theta x_t + (x_t - A x_{t-1} - B u_{t-1})^T V^{-1} (x_t - A x_{t-1} - B u_{t-1})\Big)\Big).$$
For particle methods, $j = 1, \dots, N$: each $x_t^j$ is drawn consistently from this distribution.

General Method Structure
[Diagram: weighted particle-control triples $(w_{t,j}, x_t^j, u_t^j)$, $j = 1, \dots, N$, are propagated forward to $x_{t+1}^j$ and re-sampled, as in the filtering case.]

General Result
- If the propagation step is unbiased, then $E(u_1) \to u_1^*$.
- The unbiased result is possible in LQG from the recursive structure.
- It is harder to obtain in more general systems (without additional future sampling).

Two-Stage: Comparison to Direct Monte Carlo
[Figure: two panels, low risk aversion and high risk aversion.]

Three-Stage Example
[Figure: results for a three-stage example.]

Generalizing Objectives
- Direct: risk-sensitive exponential utility $\exp(-\theta\, Q(x, y, \xi))$, with $Q$ quadratic.
- Objective: $e^{-\theta |y - d|} = E_\omega\big[\exp\big(-(\theta^2 / (2\omega))(y - d)^2\big)\big]$, where $\omega \sim \text{Exp}(2)$.
- Also use: $E[e^{-\theta |y - d|}] \approx 1 - \theta |y - d| + O(\theta^2)$.

Further Generalizations
- Use objective approximations that preserve normal forms.
- Extend LQG-like results to this general framework.
- Find other frameworks where unbiased future samples are available.
- Obtain sensitivity results on the bias from imperfect future sampling.
- Use optimization to pick the ML (maximum-likelihood) point instead.

Conclusions
- Techniques from MCMC can be used effectively in optimization to reduce dimensionality effects.
- The results work well in situations with sufficient symmetry.
- The keys are effective use of re-sampling, split (and slice) sampling, and latent variables.

Other Questions to Consider
- How can learning and parameter estimation be combined with optimization?
- What is the range of problems that can be approached in this manner?
- What are the best uses of optimization technology as part of this process?

Thank you! Questions?