Quantifying Stochastic Model Errors via Robust Optimization


Quantifying Stochastic Model Errors via Robust Optimization
IPAM Workshop on Uncertainty Quantification for Multiscale Stochastic Systems and Applications, Jan 19, 2016
Henry Lam, Industrial & Operations Engineering, University of Michigan
Acknowledgement: National Science Foundation grants CMMI-1436247 and CMMI-1400391

Overview
Focus: discrete-event stochastic systems in operations research
Motivation: model risk / model error / input uncertainty arises when performance analysis is based on misspecified or uncertain input model assumptions
Goal: performance analysis that is robust against model error
Scope: stochastic input models; nonparametric uncertainty

Basic Question
Single-server queue: i.i.d. Exp(0.8) interarrival times, i.i.d. Exp(1) service times
Target performance measure: probability of a long waiting time for the 50th arriving customer, i.e. P(W_50 > 2)
With insufficient or unreliable historical data:
What if the service time distribution is not Exp(1)?
What is a reliable estimate of the performance measure, without committing to a (non-)parametric model for the service time distribution?
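As a baseline reference point, P(W_50 > 2) under the nominal model can be estimated by plain Monte Carlo using the Lindley recursion W_{n+1} = max(W_n + S_n − A_{n+1}, 0). A minimal sketch (function name, defaults, and replication counts are mine, not from the talk):

```python
import random

def waiting_time_prob(n_cust=50, arr_rate=0.8, svc_rate=1.0,
                      threshold=2.0, n_rep=10000, seed=1):
    """Estimate P(W_n > threshold) for customer n in a FIFO single-server
    queue started empty, via the Lindley recursion
    W_{k+1} = max(W_k + S_k - A_{k+1}, 0)."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_rep):
        w = 0.0  # W_1 = 0: first customer finds an empty system
        for _ in range(n_cust - 1):
            s = rng.expovariate(svc_rate)   # service time S_k
            a = rng.expovariate(arr_rate)   # next interarrival time A_{k+1}
            w = max(w + s - a, 0.0)
        count += (w > threshold)
    return count / n_rep
```

For reference, the steady-state value for this M/M/1 queue is ρ e^{−(μ−λ)t} = 0.8 e^{−0.4} ≈ 0.54; the transient value at customer 50 (started empty) is somewhat lower.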

Question 2: Uncertain Dependency
Same single-server queue (i.i.d. Exp(0.8) interarrival times, i.i.d. Exp(1) service times) and target performance measure P(W_50 > 2)
What if the interarrival time sequence is not i.i.d.?
Can we obtain an estimate without restricting to any parametric class of time series models for the interarrival times?

Question 3: Input Reconstruction
Single-server queue: i.i.d. Exp(0.8) interarrival times, i.i.d. service times with unknown distribution
What information can we extract about the service time distribution if we only have data on the waiting times?

Robust Stochastic Modeling
Model uncertainty arises from modeling limitations, presence of risk, lack of direct data, etc.
Robust perspective:
Element 1: represent partial, nonparametric information on the input model via constraints, forming an uncertainty set (e.g. Ben-Tal et al. 09, Bertsimas et al. 11)
Element 2: solve an optimization problem that computes the worst-case performance measure subject to the uncertainty

Basic Example
Baseline input model: X_t ~ P_0, X = (X_1, X_2, ..., X_T)
Performance measure: E[h(X)]
E.g. single-server queue: X = sequence of service times, P_0 = Exp(1), h(X) = I(waiting time of 20th customer > 2)
Baseline analysis: compute E_0[h(X_1, ..., X_T)] under X_t i.i.d. ~ P_0
What if the distribution of X_t is potentially misspecified? Postulate that the true service time distribution P_f lies in a neighborhood of the baseline model P_0, as measured by a nonparametric distance D of η units:

maximize E_f[h(X_1, ..., X_T)]
subject to X_t i.i.d. ~ P_f, D(P_f || P_0) ≤ η

Decision variable: P_f
D(P_f || P_0) can be:
Kullback-Leibler (KL) divergence = ∫ log(dP_f/dP_0) dP_f
φ-divergence = ∫ φ(dP_f/dP_0) dP_0
replaced/enriched by moment constraints, e.g. two-sided bounds μ₋ ≤ E_f[X_t] ≤ μ₊
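When the KL constraint is placed directly on the law of the whole vector X (a relaxation of the i.i.d.-input formulation above), the worst case has a well-known one-dimensional dual: sup over D(P_f || P_0) ≤ η of E_f[h(X)] equals min over α > 0 of αη + α log E_0[e^{h(X)/α}]. A hedged sketch that estimates this dual bound from baseline samples of h; the grid search over α is illustrative, not the talk's method:

```python
import math

def kl_worst_case_bound(h_samples, eta):
    """Dual bound  min_{a>0}  a*eta + a*log E_0[exp(h/a)]  for
    sup_{D(Pf||P0)<=eta} E_f[h(X)], estimated from i.i.d. samples
    h(X_i) drawn under the baseline P_0; crude geometric grid over a."""
    n = len(h_samples)
    m = max(h_samples)            # shift for log-sum-exp stability
    best = float('inf')
    a = 1e-3
    while a < 1e3:
        lse = m / a + math.log(
            sum(math.exp((h - m) / a) for h in h_samples) / n)
        best = min(best, a * eta + a * lse)
        a *= 1.1
    return best
```

At η = 0 the bound collapses to the baseline mean E_0[h]; it increases with η and never exceeds the essential supremum of h.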

Example 2: Uncertain Dependency
Baseline input model: X_t ~ P_0, X = (X_1, X_2, ..., X_T)
Performance measure: E[h(X)]
E.g. single-server queue: X = sequence of interarrival times, P_0 = Exp(0.8), h(X) = I(waiting time of 20th customer > 2)
What if the i.i.d. assumption for (X_1, ..., X_T) is potentially misspecified? Output the worst-case performance measure among all 1-lag dependent processes within η units of nonparametric distance from the i.i.d. baseline process P_0 = P_0 ⊗ P_0 ⊗ ⋯, keeping the marginals fixed:

maximize E_f[h(X_1, ..., X_T)]
subject to (X_1, ..., X_T) ~ P_f, D(P_f || P_0) ≤ η,
P_f has the same marginal as P_0,
P_f in a class of stationary 1-lag dependent processes

Decision variable: P_f
D(P_f || P_0) can be:
mutual information of the bivariate distribution of any two consecutive states under P_f
Pearson's φ²-coefficient of the bivariate distribution of any two consecutive states under P_f

Past Literature (area: examples of works)
Economics: Hansen & Sargent 01, 08
Finance: Glasserman & Xu 13, Lim et al. 12
Control theory / MDP / Markov perturbation analysis: Petersen et al. 00, Nilim & El Ghaoui 05, Iyengar 05, Jain et al. 10, Heidergott et al. 10
(Distributionally) robust optimization: Delage & Ye 10, Goh & Sim 10, Ben-Tal et al. 13, Wiesemann et al. 14, Xin et al. 15
Robust Monte Carlo: Glasserman & Xu 14, Glasserman & Yang 15
Decision analysis / moment methods: Smith 93, 95, Birge & Wets 87, Bertsimas & Popescu 05, Popescu 05
Typical focus in the literature: tractable convex optimization formulations
Challenges in stochastic modeling contexts: non-standard constraints/objectives; high or infinite dimensionality; stochastic/simulation-based evaluation

A Sensitivity Approach
Worst-case formulation for the Basic Example, with D(·) = KL divergence:

maximize E_f[h(X)]
subject to X_t i.i.d. ~ P_f, D(P_f || P_0) ≤ η

Relaxed worst-case formulation: drop the i.i.d. structure and constrain the joint law of X directly,

maximize E_f[h(X)]
subject to X ~ P_f, D(P_f || P_0) ≤ Tη

(the budget Tη matches the KL divergence of an i.i.d. joint law, which is T times the marginal KL). This relaxation is tractable but too conservative.

A Sensitivity Approach
Worst-case formulation:

maximize E_f[h(X)]
subject to X_t i.i.d. ~ P_f, D(P_f || P_0) ≤ η

Let η → 0. Under mild regularity conditions,

max E_f[h(X)] = E_0[h(X)] + ξ_1(P_0, h) √η + ξ_2(P_0, h) η + ⋯

where ξ_1(P_0, h), ξ_2(P_0, h), ... are computationally tractable objects that depend only on the baseline model P_0

Numerical Illustration
Performance measure: E[h(X)] = P(W_20 > 2); baseline model P_0 for the service time: Exp(1)
Worst-case formulations:
maximize / minimize E_f[h(X)] subject to X_t i.i.d. ~ P_f, D(P_f || P_0) ≤ η
Asymptotic approximations:
max E_f[h(X)] ≈ E_0[h(X)] + ξ_1(P_0, h) √η
min E_f[h(X)] ≈ E_0[h(X)] − ξ_1(P_0, h) √η

[Figure: worst-case lower and upper bounds vs. η, together with the baseline performance measure; true values under alternative service time distributions — Exp with increasing or decreasing rate, Gamma with increasing shape parameter or constant mean, mixtures of Exp with or without constant mean — fall within the bounds]
[Figure: the relaxed bounds from the conventional approach are markedly wider]

Form of Expansion Coefficients
Influence function (Hampel 74): the first-order effect on a statistical functional of perturbing the input probability distribution. For a functional Z(·) acting on P, the influence function G(x) is defined so that

Z(P_f) = Z(P_0) + E_f[G(X)] + ⋯

For example, if Z(P_f) = E_f[h(X_1, ..., X_T)] with X_t i.i.d. ~ P_f, then

G(x) = Σ_{t=1}^T E_0[h(X_1, ..., X_T) | X_t = x] − T E_0[h(X_1, ..., X_T)]

Form of Expansion Coefficients

maximize E_f[h(X)]
subject to X_t i.i.d. ~ P_f, D(P_f || P_0) ≤ η

Theorem (L. 13): For KL divergence D(·), assuming |h(X)| ≤ Σ_{i=1}^T Λ_i(X_i) where each Λ_i(X_i) satisfies E_0[e^{θ Λ_i(X_i)}] < ∞ for θ in a neighborhood of zero, and Var_0(G(X)) > 0, then

ξ_1(P_0, h) = √(2 Var_0 G(X))
ξ_2(P_0, h) = (1 / Var_0 G(X)) ((1/3) κ_3(G(X)) + ν)

where κ_3 is the third-order cumulant,

ν = E_0[(G_2(X, Y) − E_0[G_2(X, Y)])(G(X) − E_0[G(X)])(G(Y) − E_0[G(Y)])]

and G_2(X, Y) is a second-order influence function, with X, Y i.i.d. ~ P_0
Simulation strategy: nested simulation, or nonparametric bootstrap
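To make the theorem concrete, ξ_1 can be estimated exactly as stated: nested simulation of the conditional expectations defining G, then √(2·Var G). A sketch in a toy case where the answer is known in closed form (h = sum of the inputs, for which G(x) = T(x − E_0 X) and hence ξ_1 = √2 · T · σ; function names and sample sizes are mine):

```python
import math
import random

def xi1_estimate(h, sample, T, n_outer=500, n_inner=200, seed=0):
    """Nested-simulation estimate of xi_1 = sqrt(2 Var_0 G(X)), where
    G(x) = sum_t E_0[h | X_t = x] - T E_0[h].  For a symmetric h this
    reduces to G(x) = T * (E_0[h | X_1 = x] - E_0[h])."""
    rng = random.Random(seed)
    # unconditional mean E_0[h]
    base = sum(h([sample(rng) for _ in range(T)])
               for _ in range(4 * n_inner)) / (4 * n_inner)
    gs = []
    for _ in range(n_outer):
        x = sample(rng)  # outer draw: the conditioning value
        cond = sum(h([x] + [sample(rng) for _ in range(T - 1)])
                   for _ in range(n_inner)) / n_inner
        gs.append(T * (cond - base))
    m = sum(gs) / len(gs)
    var = sum((g - m) ** 2 for g in gs) / (len(gs) - 1)
    return math.sqrt(2 * var)
```

With Exp(1) inputs and T = 5 the analytic value is √2 · 5 ≈ 7.07; the nested estimate is slightly biased upward by the inner-loop noise.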

Uncertain Dependency

maximize E_f[h(X_1, ..., X_T)]
subject to (X_1, ..., X_T) ~ P_f, D(P_f || P_0) ≤ η,
P_f has the same marginal as P_0,
P_f in a class of stationary 1-lag dependent processes

max E_f[h(X)] = E_0[h(X)] + ξ_1(P_0, h) √η + ξ_2(P_0, h) η + ⋯

Theorem (L. 13): Define the bivariate influence function

G(x, y) = Σ_{t=2}^T E_0[h(X_1, ..., X_T) | X_{t−1} = x, X_t = y]

and its interaction effect

R(x, y) = G(x, y) − E_0[G(X, Y) | X = x] − E_0[G(X, Y) | Y = y] + E_0[G(X, Y)]

For mutual information D(·), assuming h(X) is bounded and Var_0(R(X, Y)) > 0, then

ξ_1(P_0, h) = √(2 Var_0 R(X, Y))

where X, Y i.i.d. ~ P_0
Simulation strategy: nested simulation + two-way analysis of variance (ANOVA)
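A toy sanity check of the formula (setting aside the boundedness assumption on h for the sake of a closed form): with T = 2 and h(X_1, X_2) = X_1 X_2 we get G(x, y) = xy and R(x, y) = (x − μ)(y − μ), so ξ_1 = √(2 σ⁴); for Exp(1) marginals (μ = σ² = 1) this is √2 ≈ 1.414. A plain Monte Carlo sketch (all names are mine):

```python
import math
import random

def xi1_dependency(n=100000, seed=0):
    """Monte Carlo check of xi_1 = sqrt(2 Var_0 R(X, Y)) in the toy case
    T = 2, h(X_1, X_2) = X_1 * X_2:  G(x, y) = x*y and the interaction
    effect is R(x, y) = (x - mu)(y - mu).  With Exp(1) marginals the
    analytic value is sqrt(2)."""
    rng = random.Random(seed)
    mu = 1.0  # E_0[X] for Exp(1)
    rs = [(rng.expovariate(1.0) - mu) * (rng.expovariate(1.0) - mu)
          for _ in range(n)]
    m = sum(rs) / n
    var = sum((r - m) ** 2 for r in rs) / (n - 1)
    return math.sqrt(2 * var)
```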

Numerical Illustration [figure omitted]

General Formulations and Simulation Optimization
Beyond asymptotic approximation? Other types of constraints/objectives?
Example 3: Input Reconstruction — can we infer the service time distribution from waiting time data?
Moment matching between the simulation output and the real output data:

Find P_f such that E_f[φ_j(h(X_1, ..., X_T))] = μ_j for j = 1, ..., m, with X_t i.i.d. ~ P_f

φ_j, j = 1, ..., m: moment functions, e.g. I(· ≤ q_j)
h(X_1, ..., X_T): waiting time
μ_j: empirical moments

To select among the feasible distributions, maximize a regularizer:

maximize R(P_f), e.g. R(P_f) = entropy of P_f
subject to E_f[φ_j(h(X_1, ..., X_T))] = μ_j for j = 1, ..., m, X_t i.i.d. ~ P_f

Sequential Least-Square Minimization
The above is an optimization problem with nonconvex stochastic constraints:

maximize R(P_f)
subject to E_f[φ_j(h(X))] = μ_j for j = 1, ..., m, X_t i.i.d. ~ P_f

Reformulate as a sequence of optimizations with stochastic nonconvex objectives but deterministic convex constraints: find the maximum η such that the optimal value of

minimize Z(P_f) = Σ_{j=1}^m (E_f[φ_j(h(X))] − μ_j)²
subject to R(P_f) ≥ η

is 0. The P_f attained at this η gives the optimal solution.
Similar in spirit to the quadratic penalty method (Bertsekas 95)

Simulation Optimization over Probability Distributions
Operate on a discretized space: represent P_f as (p_1, ..., p_k) on fixed support points (z_1, ..., z_k)
Constrained stochastic approximation (SA) to find a local optimum
Gradient estimation: perturb P_f to a mixture with a point mass, (1 − ε) P_f + ε e_i, to obtain a finite-dimensional influence function — a nonparametric version of the likelihood ratio / score function method (Glynn 90, Reiman & Weiss 89, L'Ecuyer 90)
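A sketch of this mixture-perturbation gradient on a discretized distribution: for Z(P) = E_P[h(X_1, ..., X_T)] with i.i.d. inputs on support points z, the derivative of Z along the mixture (1 − ε)P + ε e_i equals the discretized influence function G(z_i) = Σ_t E_P[h | X_t = z_i] − T E_P[h], and the likelihood-ratio/score-function identity G(z_i) = E_P[h(X)(N_i/p_i − T)], with N_i the number of inputs equal to z_i, lets one estimate all k components from a single batch of simulations. All names below are mine:

```python
import random

def influence_lr(h, p, z, T, n_rep=40000, seed=0):
    """Score-function (likelihood-ratio) estimate of the discretized
    influence function G(z_i) = sum_t E_p[h | X_t = z_i] - T E_p[h]
    for i.i.d. inputs on support z with pmf p, using the identity
    G(z_i) = E_p[ h(X) * (N_i / p_i - T) ],  N_i = #{t : X_t = z_i}."""
    rng = random.Random(seed)
    k = len(p)
    g = [0.0] * k
    for _ in range(n_rep):
        idx = rng.choices(range(k), weights=p, k=T)  # i.i.d. support indices
        hval = h([z[i] for i in idx])
        counts = [0] * k
        for i in idx:
            counts[i] += 1
        for i in range(k):
            g[i] += hval * (counts[i] / p[i] - T)
    return [gi / n_rep for gi in g]
```

For h = sum of the inputs the analytic value is G(z_i) = T(z_i − μ), which gives a quick correctness check.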

Iteration on Mirror Descent Stochastic Approximation
Each iteration solves a linearization of the objective, penalized by the KL divergence (the entropic descent algorithm; e.g., Nemirovski et al. 09, Beck & Teboulle 03, Nemirovski & Yudin 83). Given a current solution P_n, solve

P_{n+1} = argmin_{P_f} ∇Z(P_n)ᵀ(P_f − P_n) + (1/γ_n) D(P_f || P_n)
subject to R(P_f) ≥ η

which has a closed-form solution of the form

P_{n+1} ∝ P_0^{β_n/(1+β_n)} P_n^{1/(1+β_n)} e^{−γ_n ∇Z(P_n)/(1+β_n)}

where ∇Z(P_n) is the estimated gradient, γ_n is the step size, and β_n solves a one-dimensional root-finding problem.
Convergence rate O(1/n) in expected KL divergence when γ_n = θ/n for large enough θ and standard conditions hold (Goeva, L. & Zhang 14, L. & Qian 16)
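Dropping the constraint R(P_f) ≥ η for simplicity (so β_n plays no role), the entropic-descent update reduces to the exponentiated-gradient step p_i ← p_i e^{−γ ∇_i Z} followed by renormalization. A toy sketch applying it to the least-square objective Z(p) = Σ_j (φ_j · p − μ_j)², with deterministic rather than simulated moments so the run is reproducible; step size and iteration count are my choices:

```python
import math

def entropic_descent(phis, mus, n_iter=5000, gamma=0.2):
    """Minimize Z(p) = sum_j (phi_j . p - mu_j)^2 over the probability
    simplex with exponentiated-gradient (entropic descent) updates:
    p_i <- p_i * exp(-gamma * grad_i), then renormalize.
    phis: list of moment functions evaluated on the support grid."""
    k = len(phis[0])
    p = [1.0 / k] * k  # start from the uniform distribution
    for _ in range(n_iter):
        resid = [sum(phi[i] * p[i] for i in range(k)) - mu
                 for phi, mu in zip(phis, mus)]
        grad = [2 * sum(r * phi[i] for r, phi in zip(resid, phis))
                for i in range(k)]
        w = [pi * math.exp(-gamma * g) for pi, g in zip(p, grad)]
        s = sum(w)
        p = [wi / s for wi in w]
    return p
```

On support z = (0, 1, 2) with first and second moments 0.7 and 1.1, the feasible distribution is unique, (0.5, 0.3, 0.2), and the iterates converge to it.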

Numerical Illustration
System: M/G/1 queue with known arrival rate but unknown service time distribution
Output observations: queue length over time
Specifications: match 9 quantiles of the output variable "time-average queue length in a busy period"; discretization grid of 50 support points; # data = 100,000

[Figure: optimal Z(P_f) plotted against η; true vs. reconstructed distribution; results when the number of matched quantiles is increased from 9 to 14 and when # data is reduced from 10^5 to 200]
[Figure: reconstruction of a monotone distribution and of a bimodal distribution]

Discussion
Motivation: model misspecification/uncertainty
Main approach: sensitivity/asymptotic analysis of robust optimization programs over input stochastic models; simulation optimization
Future work: other forms of stochastic/system uncertainty; relation to Bayesian methodology