FDSA and SPSA in Simulation-Based Optimization

- Stochastic approximation provides an ideal framework for carrying out simulation-based optimization
- Rigorous means for handling the noisy loss information inherent in Monte Carlo simulation: y(θ) = Q(θ, V) = L(θ) + noise
- Most other optimization methods (GAs, nonlinear programming, etc.) apply only on an ad hoc basis
- FDSA, or some variant of it, remains the method of choice for the majority of practitioners (Fu and Hu, 1997)
- No need to know the inner workings of the simulation, as in gradient-based methods such as IPA, LR/SF, etc.
- FDSA- and SPSA-type methods are much easier to use than gradient-based methods since they only require simulation inputs/outputs

14-1
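
Below is a minimal SPSA sketch in Python (illustrative only; not taken from ISSO or these slides). It treats the simulation as a black box that returns noisy loss measurements y(θ) = L(θ) + noise, which is all the algorithm needs; the function names and gain constants (a, c, α = 0.602, γ = 0.101) are common defaults from the stochastic approximation literature, not values prescribed here.

```python
import numpy as np

def spsa_minimize(y, theta0, n_iter=1000, a=0.1, c=0.1, alpha=0.602, gamma=0.101,
                  rng=None):
    """Minimize a noisy loss y(theta) using basic SPSA (two measurements per iteration)."""
    rng = rng or np.random.default_rng(0)
    theta = np.asarray(theta0, dtype=float)
    for k in range(n_iter):
        a_k = a / (k + 1) ** alpha                         # step-size gain
        c_k = c / (k + 1) ** gamma                         # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=theta.size)   # Bernoulli +/- 1 perturbation
        y_plus = y(theta + c_k * delta)                    # one noisy simulation run
        y_minus = y(theta - c_k * delta)                   # second noisy simulation run
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)   # simultaneous perturbation gradient estimate
        theta = theta - a_k * g_hat
    return theta

# Toy "simulation": quadratic loss plus Monte Carlo noise
rng = np.random.default_rng(1)
y = lambda th: float(np.sum(th ** 2)) + rng.normal(scale=0.1)
print(spsa_minimize(y, theta0=np.ones(10)))
```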

Common Random Numbers

- Common random numbers (CRNs) provide a way of improving simulation-based optimization by reusing the Monte-Carlo-generated random variables
- CRNs are based on the famous formula for two random variables X, Y:
     var(X - Y) = var(X) + var(Y) - 2 cov(X, Y)
- Maximizing the covariance minimizes the variance of the difference
- The aim of CRNs is to reduce the variability of the gradient estimate
- Improves convergence of the algorithm

14-2
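
A quick numerical check of this identity (a hedged sketch; the exponential-transform example is invented purely for illustration): driving both outputs with the same uniform stream makes cov(X, Y) large and positive, so var(X - Y) shrinks relative to using independent streams.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.random(100_000)                 # common random numbers (one uniform stream)
u_indep = rng.random(100_000)           # independent uniform stream

x = -np.log(1.0 - u)                    # Exp(1) draws via inverse CDF
y_crn = -0.9 * np.log(1.0 - u)          # related output driven by the SAME uniforms
y_ind = -0.9 * np.log(1.0 - u_indep)    # same output driven by independent uniforms

print("var(X - Y) with CRN:         ", np.var(x - y_crn))
print("var(X - Y) with independence:", np.var(x - y_ind))
# Identity check: var(X) + var(Y) - 2 cov(X, Y) matches var(X - Y)
print("identity check:              ",
      np.var(x) + np.var(y_crn) - 2.0 * np.cov(x, y_crn, bias=True)[0, 1])
```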

CRNs (cont'd)

- For SPSA, the gradient variability is largely driven by the numerator y(θ̂_k + c_k Δ_k) - y(θ̂_k - c_k Δ_k)
- Two effects contribute to variability:
    (i) difference due to the perturbations ±c_k Δ_k (desirable)
    (ii) difference due to noise effects in the measurements (undesirable)
- CRNs are useful for reducing the undesirable variability in (ii)
- Using CRNs maximizes the covariance between the two y(·) values in the numerator
- This minimizes the variance of the difference

14-3
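
Spelling out the decomposition behind these bullets (the noise symbols ε_k^± are introduced here for illustration; they do not appear on the slide):

```latex
% Write y(\hat{\theta}_k \pm c_k\Delta_k) = L(\hat{\theta}_k \pm c_k\Delta_k) + \varepsilon_k^{\pm}.
% Then the numerator of the SPSA gradient estimate splits into the two effects:
\begin{align*}
y(\hat{\theta}_k + c_k\Delta_k) - y(\hat{\theta}_k - c_k\Delta_k)
  &= \underbrace{L(\hat{\theta}_k + c_k\Delta_k) - L(\hat{\theta}_k - c_k\Delta_k)}_{\text{(i) desirable: carries gradient information}}
   + \underbrace{\varepsilon_k^{+} - \varepsilon_k^{-}}_{\text{(ii) undesirable noise}}, \\
\operatorname{var}\bigl(\varepsilon_k^{+} - \varepsilon_k^{-}\bigr)
  &= \operatorname{var}(\varepsilon_k^{+}) + \operatorname{var}(\varepsilon_k^{-})
   - 2\,\operatorname{cov}(\varepsilon_k^{+}, \varepsilon_k^{-}),
\end{align*}
% so maximizing the covariance of the two noise terms (via CRNs) minimizes the
% contribution of (ii) to the variability of the gradient estimate.
```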

CRNs (cont'd)

- In simulation (vs. most real systems) some form of CRNs is often feasible
- The essence of CRN is to use the same random numbers in both y(θ̂_k + c_k Δ_k) and y(θ̂_k - c_k Δ_k)
- Achieved by using the same random number seed for both simulations and synchronizing the random numbers
- Optimal rate of convergence of the iterate to θ* (à la k^(-β/2)) is k^(-1/2) (Kleinman et al., 1999); this rate is the same as for stochastic gradient-based methods
- This rate is an improvement on the optimal non-CRN rate of k^(-1/3)
- Unfortunately, pure CRN may not be feasible in large-scale simulations due to violation of the synchronization requirement
    - e.g., if θ represents service rates in a queuing system, the difference between θ̂_k + c_k Δ_k and θ̂_k - c_k Δ_k may allow additional (stochastic) arrivals to be serviced in one case

14-4
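
A sketch of how this seeding might look in code (all names, including run_simulation, are hypothetical stand-ins for a real simulation): under CRN both the "+" and "-" runs receive the same seed, so the Monte Carlo noise is common to the two measurements and largely cancels in the difference.

```python
import numpy as np

def run_simulation(theta, seed):
    """Stand-in black-box simulation: noisy measurement of L(theta) = ||theta||^2."""
    rng = np.random.default_rng(seed)
    scale = 1.0 + 0.1 * float(np.linalg.norm(theta))        # noise level depends mildly on theta
    v = scale * rng.normal(size=1000)                        # Monte Carlo "events"
    return float(np.linalg.norm(theta) ** 2 + np.mean(v))   # noisy loss output

def spsa_gradient(theta, c, delta, seed, crn=True):
    seed_minus = seed if crn else seed + 10_000              # CRN: reuse seed; non-CRN: different seed
    y_plus = run_simulation(theta + c * delta, seed)
    y_minus = run_simulation(theta - c * delta, seed_minus)
    return (y_plus - y_minus) / (2.0 * c * delta)

# Variability of the gradient estimate over repeated simulation runs (fixed delta)
rng = np.random.default_rng(0)
theta, delta = np.ones(10), rng.choice([-1.0, 1.0], size=10)
g_crn = [spsa_gradient(theta, 0.1, delta, seed=s, crn=True) for s in range(200)]
g_non = [spsa_gradient(theta, 0.1, delta, seed=s, crn=False) for s in range(200)]
print("CRN variance:    ", float(np.var(g_crn, axis=0).mean()))
print("non-CRN variance:", float(np.var(g_non, axis=0).mean()))
```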

Numerical Illustration (Example 14.8 in ISSO)

- Simulation using exponentially distributed random variables and a loss function with p = dim(θ) = 10
- Goal is to compare CRN and non-CRN
- θ* is the minimizing value of L(θ)
- Table below shows the improved accuracy of the solution under CRNs; the plot on the next slide compares rates of convergence

    Total iterations n    ||θ̂_n - θ*||, CRNs    ||θ̂_n - θ*||, Non-CRNs
          1000                 0.02195                0.04103
         10,000                0.00658                0.01845
        100,000                0.00207                0.00819

14-5

Rates of Convergence for CRN and Non-CRN (Example 14.9 in ISSO)

[Plot: mean values of n^(β/2) ||θ̂_n - θ*|| versus n (log scale, n = 100 to 100,000), comparing Non-CRN with β = 1, CRN with β = 1, and Non-CRN with β = 2/3]

14-6

Partial CRNs

- By using the same random number seed for y(θ̂_k + c_k Δ_k) and y(θ̂_k - c_k Δ_k), it is possible to achieve a partial CRN
- Some of the events in the two simulations will be synchronized due to the common seed
- Synchronization is likely to break down during the course of the simulation, especially for small k when c_k is relatively large
- Asymptotic analysis produces a convergence rate identical to pure CRN, since synchronization occurs as c_k → 0
- Also require a new seed for the simulations at each iteration (common to both y(·) values) to ensure convergence to min_θ L(θ) = min_θ E[Q(θ, V)], as in the seeding sketch below
- In partial CRN, the practical finite-sample rate of convergence for SPSA tends to be lower than in the pure CRN setting

14-7
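
A seeding-scheme sketch for partial CRN (illustrative; the names and gain sequences are assumptions, and run_simulation stands in for the real model): at each iteration a single fresh seed is drawn and passed to both simulation runs, so the two measurements share their random-number stream while successive iterations still see new randomness.

```python
import numpy as np

def spsa_step_partial_crn(theta, k, run_simulation, master_rng,
                          a=0.1, c=0.1, alpha=0.602, gamma=0.101):
    """One SPSA iteration with partial CRN via a per-iteration shared seed."""
    a_k = a / (k + 1) ** alpha
    c_k = c / (k + 1) ** gamma
    delta = master_rng.choice([-1.0, 1.0], size=theta.size)
    seed_k = int(master_rng.integers(2**32))                 # new seed at every iteration...
    y_plus = run_simulation(theta + c_k * delta, seed_k)     # ...shared by the "+" run
    y_minus = run_simulation(theta - c_k * delta, seed_k)    # ...and the "-" run
    g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)
    return theta - a_k * g_hat
```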

Numerical Example: Partial CRNs (Kleinman et al., 1999; see p. 398 of ISSO)

- A simulation using exponentially distributed random variables was conducted in Kleinman et al. (1999) for p = 10
- The simulation was designed so that it is possible to implement pure CRN (not available in most practical simulations)
- Purpose is to evaluate the relative performance of non-CRN, partial CRN, and pure CRN

14-8

Numerical Example (cont'd)

- Numerical results for 100 replications of SPSA and FDSA (the number of y(·) measurements in SPSA and FDSA is equal, with total iterations of 10,000 and 1000, respectively):

                                Non-CRN    Partial CRN    Pure CRN
    SPSA  ||θ̂_10000 - θ*||      0.0190       0.0071        0.0065
    FDSA  ||θ̂_1000 - θ*||       0.0410       0.0110        0.0064

- Partial CRN offers significant improvement over non-CRN, and SPSA outperforms FDSA (except in the idealized pure CRN case)

14-9