Sample Average Approximation (SAA) for Stochastic Programs, with an eye towards computational SAA. Dave Morton, Industrial Engineering & Management Sciences, Northwestern University

Outline
- SAA
- Results for Monte Carlo estimators: no optimization
- What results should we want for SAA?
- Results for SAA: 1. Bias, 2. Consistency, 3. Central limit theorem (CLT)
- SAA Algorithm: a basic algorithm; a sequential algorithm
- Multi-Stage Problems
- What We Didn't Discuss

Stochastic Programming Models

$z^* = \min_{x \in X} \mathbb{E} f(x, \xi)$

Such problems arise in statistics, simulation, and mathematical programming. Our focus: mathematical programming with $X$ deterministic. We'll assume:
(A1) $X \neq \emptyset$ and compact
(A2) $\mathbb{E} f(\cdot, \xi)$ is lower semicontinuous
(A3) $\mathbb{E} \sup_{x \in X} f^2(x, \xi) < \infty$

$\xi$ is a random vector, and its distribution $P_\xi$ does not depend on $x$. We can evaluate $f(x, \xi(\omega))$ for a fixed $x$ and realization $\xi(\omega)$. The choice of $f$ determines the problem class.
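To make the setup concrete, here is a minimal sketch (my illustration, not from the talk) of one choice of $f$: a newsvendor, where $x$ is an order quantity, $\xi$ is random demand, and the cost, price, and demand distribution are hypothetical placeholders.

```python
# Minimal sketch of one concrete f(x, xi): a newsvendor loss we minimize.
import numpy as np

rng = np.random.default_rng(0)

def f(x, xi, cost=1.0, price=2.0):
    """Pay cost*x up front, earn price*min(x, xi); we minimize the net loss."""
    return cost * x - price * np.minimum(x, xi)

# Evaluate f(x, xi(omega)) for a fixed x and one sampled realization:
xi = rng.exponential(scale=10.0)   # hypothetical demand distribution
print(f(8.0, xi))
```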

Sample Average Approximation

True or population problem (SP): $z^* = \min_{x \in X} \mathbb{E} f(x, \xi)$. Denote its optimal solution $x^*$.

SAA problem (SP$_n$): $z_n^* = \min_{x \in X} \underbrace{\tfrac{1}{n} \sum_{j=1}^n f(x, \xi^j)}_{\bar f_n(x)}$

Here $\xi^1, \xi^2, \ldots, \xi^n$ are iid as $\xi$, or sampled another way. Denote the optimal solution $x_n^*$. View $z_n^*$ as an estimator of $z^*$ and $x_n^*$ as an estimator of $x^*$.

Want names? External sampling method, sample-path optimization, stochastic counterpart, retrospective optimization, non-recursive method, and sample average approximation.

Let's start in a simpler setting, momentarily putting aside optimization...

Monte Carlo Sampling

Suppressing the (fixed) decision $x$: let $z = \mathbb{E} f(\xi)$, $\sigma^2 = \mathrm{var}\, f(\xi) < \infty$, and let $\xi^1, \xi^2, \ldots, \xi^n$ be iid as $\xi$. Let $\bar z_n = \frac{1}{n} \sum_{i=1}^n f(\xi^i)$ be the sample-mean estimator of $z$.

FACT 1. $\mathbb{E} \bar z_n = z$: $\bar z_n$ is an unbiased estimator of $z$.
FACT 2. $\bar z_n \to z$, wp1 (strong LLN): $\bar z_n$ is a strongly consistent estimator of $z$.
FACT 3. $\sqrt{n}(\bar z_n - z) \Rightarrow N(0, \sigma^2)$ (CLT): the rate of convergence is $1/\sqrt{n}$, and the scaled difference is normally distributed.
FACTS 4, 5, ... law of the iterated logarithm, concentration inequalities, ...
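A minimal sketch of FACTS 1-3 in simulation, assuming $f(\xi) = \xi^2$ with $\xi \sim N(0,1)$, so that $z = 1$ and $\sigma^2 = 2$ (my example, not from the talk).

```python
# Minimal sketch illustrating FACTS 1-3 for the sample-mean estimator.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 2_000
xi = rng.standard_normal((reps, n))
z_bar = (xi**2).mean(axis=1)             # reps independent copies of z_bar_n

print(z_bar.mean())                      # FACT 1: approx 1 (unbiased)
print(abs(z_bar[0] - 1.0))               # FACT 2: small for large n (LLN)
scaled = np.sqrt(n) * (z_bar - 1.0)      # FACT 3: approx N(0, 2)
print(scaled.mean(), scaled.var())       # approx 0 and 2
```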

Do such results carry over to SAA?

SAA

Population problem (SP): $z^* = \min_{x \in X} \mathbb{E} f(x, \xi)$, with optimal solution $x^*$.
SAA problem (SP$_n$): $z_n^* = \min_{x \in X} \bar f_n(x)$, with optimal solution $x_n^*$.

View $z_n^*$ as an estimator of $z^*$ and $x_n^*$ as an estimator of $x^*$.
What can we say about $z_n^*$ and $x_n^*$ as $n \to \infty$?
What should we want to say about $z_n^*$ and $x_n^*$ as $n \to \infty$?

SAA: Possible Goals

1. $x_n^* \to x^*$, wp1, and $\sqrt{n}(x_n^* - x^*) \Rightarrow N(0, \Sigma)$
2. $z_n^* \to z^*$, wp1, and $\sqrt{n}(z_n^* - z^*) \Rightarrow N(0, \sigma^2)$
3. $\mathbb{E} f(x_n^*, \xi) \to z^*$, wp1
4. $\lim_{n \to \infty} P(\mathbb{E} f(x_n^*, \xi) - z^* \le \varepsilon_n) \ge 1 - \alpha$, where $\varepsilon_n \downarrow 0$

(These goals aren't true in general; i.e., they may be impossible goals.)

Modeling Issues:
- If (SP$_n$) is for maximum-likelihood estimation, then goal 1 could be appropriate.
- If (SP) is to price a financial option, then goal 2 could be appropriate.
- When (SP) is a decision-making model, 1 may be more than we need and 2 is of secondary interest. Goals 3 and 4 arguably suffice.

Technical Issues:
- In general, we shouldn't expect $\{x_n^*\}_{n=1}^\infty$ to converge when (SP) has multiple optimal solutions. In this case, we want: limit points of $\{x_n^*\}_{n=1}^\infty$ solve (SP).
- If we achieve the limit-points result, $X$ is compact, and $\mathbb{E} f(\cdot, \xi)$ is continuous, then we obtain goal 3.
- The limiting distributions may not be normal.

1. Bias 2. Consistency 3. CLT

SAA: Example

$z^* = \min_{-1 \le x \le 1} [\mathbb{E} f(x, \xi) = \mathbb{E}[\xi x]]$, where $\xi \sim N(0,1)$. Every feasible solution $x \in [-1, 1]$ is optimal, and $z^* = 0$.

$z_n^* = \min_{-1 \le x \le 1} \Big( \frac{1}{n} \sum_{j=1}^n \xi^j \Big) x = -|\bar \xi_n|$, with $x_n^* = \pm 1$ and $\bar \xi_n \sim N(0, 1/n)$.

Observations:
1. $\mathbb{E} z_n^* \le z^*\ \forall n$ (negative bias)
2. $\mathbb{E} z_n^* \le \mathbb{E} z_{n+1}^*\ \forall n$ (monotonically shrinking bias)
3. $z_n^* \to z^*$, wp1 (strongly consistent)
4. $\sqrt{n}(z_n^* - z^*) = -|N(0,1)|$ (non-normal errors)
5. $b(z_n^*) \equiv \mathbb{E} z_n^* - z^* = -a/\sqrt{n}$ ($O(n^{-1/2})$ bias), here with $a = \mathbb{E}|N(0,1)| = \sqrt{2/\pi}$

So, optimization changes the nature of sample-mean estimators.

Note: What if $x \in [-1, 1]$ is replaced by $x \in \mathbb{R}$? SAA fails, spectacularly.
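A minimal sketch checking the example numerically: since $z_n^* = -|\bar\xi_n|$ with $\bar\xi_n \sim N(0, 1/n)$, the bias should be $\mathbb{E} z_n^* = -\sqrt{2/(\pi n)}$, negative, monotonically shrinking, and $O(n^{-1/2})$.

```python
# Minimal sketch of the example's negative, O(1/sqrt(n)) bias.
import numpy as np

rng = np.random.default_rng(2)
reps = 200_000
for n in (10, 100, 1000):
    xi_bar = rng.standard_normal(reps) / np.sqrt(n)  # xi_bar_n ~ N(0, 1/n)
    z_n = -np.abs(xi_bar)                            # optimal value of (SP_n)
    print(n, z_n.mean(), -np.sqrt(2 / (np.pi * n)))  # estimate vs. theory
```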

1. Bias 2. Consistency 3. CLT

1. Bias

All you need to know: $\min_{x \in X} [f(x) + g(x)] \ge \min_{x \in X} f(x) + \min_{x \in X} g(x)$

SAA: Bias

Theorem. Assume (A1), (A2), and $\mathbb{E} \bar f_n(x) = \mathbb{E} f(x, \xi)$, $\forall x \in X$. Then $\mathbb{E} z_n^* \le z^*$. If, in addition, $\xi^1, \xi^2, \ldots, \xi^n$ are iid, then $\mathbb{E} z_n^* \le \mathbb{E} z_{n+1}^*$.

Notes:
- The first result does not require iid realizations, just an unbiased estimator.
- The hypothesis can be relaxed to: $\mathbb{E} \bar f_n(x) \le \mathbb{E} f(x, \xi)$, $\forall x \in X$.
- The iid hypothesis can be relaxed to: $\xi^1, \xi^2, \ldots, \xi^n$ are exchangeable random variables.

Proof of Bias Result

$\mathbb{E} \frac{1}{n} \sum_{j=1}^n f(x, \xi^j) = \mathbb{E} f(x, \xi)$, $\forall x \in X$, so

$\min_{x \in X} \mathbb{E} \frac{1}{n} \sum_{j=1}^n f(x, \xi^j) = \min_{x \in X} \mathbb{E} f(x, \xi) = z^*$,

and so we obtain

$\mathbb{E} z_n^* = \mathbb{E} \min_{x \in X} \frac{1}{n} \sum_{j=1}^n f(x, \xi^j) \le \min_{x \in X} \mathbb{E} f(x, \xi) = z^*$.

Aside: the simple case $n = 1$ reads $\mathbb{E} \min_{x \in X} f(x, \xi) \le \min_{x \in X} \mathbb{E} f(x, \xi)$.
Interpretation: we'll do better if we wait and see $\xi$'s realization before choosing $x$.

Next, we show the bias decreases monotonically: $\mathbb{E} z_n^* \le \mathbb{E} z_{n+1}^*$. Intuition...

Proof of Bias Monotonicity Result

$\mathbb{E} z_{n+1}^* = \mathbb{E} \min_{x \in X} \frac{1}{n+1} \sum_{i=1}^{n+1} f(x, \xi^i) = \mathbb{E} \min_{x \in X} \frac{1}{n+1} \sum_{i=1}^{n+1} \Big[ \frac{1}{n} \sum_{j=1,\, j \neq i}^{n+1} f(x, \xi^j) \Big]$
$\ge \mathbb{E} \frac{1}{n+1} \sum_{i=1}^{n+1} \min_{x \in X} \frac{1}{n} \sum_{j=1,\, j \neq i}^{n+1} f(x, \xi^j) = \frac{1}{n+1} \sum_{i=1}^{n+1} \mathbb{E} \min_{x \in X} \frac{1}{n} \sum_{j=1,\, j \neq i}^{n+1} f(x, \xi^j) = \mathbb{E} z_n^*$

1. Bias. 2. Consistency: $z_n^*$ and $x_n^*$. 3. CLT.

2. Consistency of $z_n^*$

All you need to know: $\mathbb{E} f(x^*, \xi) \le \mathbb{E} f(x_n^*, \xi)$ and $\bar f_n(x_n^*) \le \bar f_n(x^*)$

SAA: Consistency of $z_n^*$

Theorem. Assume (A1), (A2), and the USLLN: $\lim_{n \to \infty} \sup_{x \in X} |\bar f_n(x) - \mathbb{E} f(x, \xi)| = 0$, wp1. Then $z_n^* \to z^*$, wp1.

Notes:
- Does not assume $\xi^1, \xi^2, \ldots, \xi^n$ are iid; instead, assumes the uniform strong law of large numbers (USLLN).
- Important to realize: $\lim_{n \to \infty} \sup_{x \in X} |\bar f_n(x) - \mathbb{E} f(x, \xi)| = 0$, wp1, implies $\lim_{n \to \infty} |\bar f_n(x) - \mathbb{E} f(x, \xi)| = 0$, wp1, $\forall x \in X$. But the converse is false. Think of our example: $\bar f_n(x) = \bar \xi_n x$ and $X = \mathbb{R}$.
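A minimal sketch of why the converse fails on $X = \mathbb{R}$: for any fixed $n$, $\sup_{|x| \le M} |\bar\xi_n x| = M |\bar\xi_n|$ grows without bound in $M$, even though $|\bar\xi_n x| \to 0$ for each fixed $x$.

```python
# Minimal sketch: pointwise LLN holds, but no uniform convergence on X = R.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
xi_bar = rng.standard_normal(n).mean()    # small, but nonzero wp1
for M in (1.0, 1e3, 1e6):
    print(M, M * abs(xi_bar))             # sup over [-M, M] grows with M
```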

Proof of consistency of $z_n^*$

$|z_n^* - z^*| = |\bar f_n(x_n^*) - \mathbb{E} f(x^*, \xi)|$
$= \max \{ \bar f_n(x_n^*) - \mathbb{E} f(x^*, \xi),\ \mathbb{E} f(x^*, \xi) - \bar f_n(x_n^*) \}$
$\le \max \{ \bar f_n(x^*) - \mathbb{E} f(x^*, \xi),\ \mathbb{E} f(x_n^*, \xi) - \bar f_n(x_n^*) \}$
$\le \max \{ |\bar f_n(x^*) - \mathbb{E} f(x^*, \xi)|,\ |\bar f_n(x_n^*) - \mathbb{E} f(x_n^*, \xi)| \}$
$\le \sup_{x \in X} |\bar f_n(x) - \mathbb{E} f(x, \xi)|$

Taking $n \to \infty$ completes the proof.

2. Consistency of $x_n^*$

All you need to know: If $g$ is continuous and $\lim_{k \to \infty} x_k = \hat x$, then $\lim_{k \to \infty} g(x_k) = g(\hat x)$.

SAA: Consistency of $x_n^*$

Theorem. Assume (A1), (A2), $\mathbb{E} f(\cdot, \xi)$ is continuous, and the USLLN: $\lim_{n \to \infty} \sup_{x \in X} |\bar f_n(x) - \mathbb{E} f(x, \xi)| = 0$, wp1. Then every limit point of $\{x_n^*\}$ solves (SP), wp1.

Notes:
- Assumes the USLLN rather than assuming $\xi^1, \xi^2, \ldots, \xi^n$ are iid.
- And, assumes continuity of $\mathbb{E} f(\cdot, \xi)$.
- The result doesn't say: $\lim_{n \to \infty} x_n^* = x^*$, wp1. Why not?

Proof of consistency of $x_n^*$

Let $\hat x$ be a limit point of $\{x_n^*\}_{n=1}^\infty$ and let $n \in \mathcal{N}$ index a convergent subsequence. (Such a limit point exists, and $\hat x \in X$, because $X$ is compact.)

By the USLLN, $\lim_{n \to \infty,\, n \in \mathcal{N}} \underbrace{\bar f_n(x_n^*)}_{z_n^*} = z^*$, wp1, and

$|\bar f_n(x_n^*) - \mathbb{E} f(\hat x, \xi)| = |\bar f_n(x_n^*) - \mathbb{E} f(x_n^*, \xi) + \mathbb{E} f(x_n^*, \xi) - \mathbb{E} f(\hat x, \xi)|$
$\le |\bar f_n(x_n^*) - \mathbb{E} f(x_n^*, \xi)| + |\mathbb{E} f(x_n^*, \xi) - \mathbb{E} f(\hat x, \xi)|$

Taking $n \to \infty$ for $n \in \mathcal{N}$: the first term goes to zero by the USLLN, and the second goes to zero by continuity of $\mathbb{E} f(\cdot, \xi)$. Thus, $\mathbb{E} f(\hat x, \xi) = z^*$.

1. Bias. 2. Consistency: $z_n^*$ and $x_n^*$. 3. CLT.

But first: when does the USLLN hold? And what about a stochastic MIP, where continuity doesn't make sense?

Sufficient Conditions for the USLLN

Fact.² Assume $X$ is compact and assume:
- $f(\cdot, \xi)$ is continuous, wp1, on $X$
- $\exists g(\xi)$ satisfying $\sup_{x \in X} |f(x, \xi)| \le g(\xi)$, wp1, and $\mathbb{E} g(\xi) < \infty$
- $\xi^1, \xi^2, \ldots, \xi^n$ are iid as $\xi$.
Then, the USLLN holds.

² Facts are theorems that we won't prove.

Sufficient Conditions for the USLLN

Fact. Let $X$ be compact and convex and assume:
- $f(\cdot, \xi)$ is convex and continuous, wp1, on $X$
- the LLN holds pointwise: $\lim_{n \to \infty} |\bar f_n(x) - \mathbb{E} f(x, \xi)| = 0$, wp1, $\forall x \in X$.
Then, the USLLN holds.

SAA: Consistency of $z_n^*$ and $x_n^*$ under Finite $X$

Fact. Assume $X$ is finite, and assume $\lim_{n \to \infty} |\bar f_n(x) - \mathbb{E} f(x, \xi)| = 0$, wp1, $\forall x \in X$. Then the USLLN holds, $z_n^* \to z^*$, and every limit point of $\{x_n^*\}$ solves (SP), wp1.

Notes:
- $\mathbb{E} f(\cdot, \xi)$ need not be continuous (continuity would be unnatural since the domain $X$ is finite).
- Assumes the pointwise LLN rather than the USLLN: the pointwise LLN plus $X$ finite implies $\lim_{n \to \infty} \sup_{x \in X} |\bar f_n(x) - \mathbb{E} f(x, \xi)| = 0$, wp1.
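A minimal sketch of consistency with finite $X$ (my example, not from the talk): $f(x, \xi) = (x - \xi)^2$ with $\xi \sim N(3, 1)$ and $X = \{0, 1, \ldots, 10\}$, so the true solution is $x^* = 3$.

```python
# Minimal sketch: SAA over a finite feasible set settles on x* = 3 as n grows.
import numpy as np

rng = np.random.default_rng(4)
X = np.arange(11)
for n in (10, 100, 10_000):
    xi = rng.normal(3.0, 1.0, size=n)
    f_bar = ((X[:, None] - xi[None, :])**2).mean(axis=1)  # f_bar_n(x), x in X
    print(n, X[np.argmin(f_bar)])                          # SAA solution x*_n
```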

SAA: Consistency of $z_n^*$ and $x_n^*$ under LSC $f(\cdot, \xi)$

Fact. Assume:
- $\xi^1, \xi^2, \ldots, \xi^n$ are iid as $\xi$
- $f(\cdot, \xi)$ is lower semicontinuous on $X$, $\forall \xi$
- $\exists g(\xi)$ satisfying $\inf_{x \in X} f(x, \xi) \ge g(\xi)$, wp1, where $\mathbb{E} |g(\xi)| < \infty$.
Then, $z_n^* \to z^*$, wp1, and every limit point of $\{x_n^*\}$ solves (SP), wp1.

Notes:
- The proof relies on epi-convergence of $\bar f_n(x)$ to $\mathbb{E} f(x, \xi)$.
- Epi-convergence provides theory for approximation in optimization beyond SAA.
- For $\bar f_n(x)$ convex and continuous on compact, convex $X$: epi-convergence $\iff$ USLLN. But epi-convergence provides a more general framework in the non-convex setting.
- Epi-convergence can be viewed as precisely the relaxation of uniform convergence that yields the desired convergence results.

[Slide: first page of Peter Kall, "Approximation to Optimization Problems: An Elementary Review," Mathematics of Operations Research 11(1), February 1986, which shows in an elementary way how closely the arguments in the epi-convergence approach are related to those of the classical theory of convergence of functions.]

1. Bias. 2. Consistency: $z_n^*$ and $x_n^*$. 3. CLT.

3. One-sided CLT for $z_n^*$

All you need to know: the CLT for iid random variables and $\bar f_n(x_n^*) \le \bar f_n(x)$, $\forall x \in X$.

SAA: Towards a CLT for $z_n^*$

We have conditions under which $z_n^* - z^*$ shrinks to zero. Is $\sqrt{n}$ the correct scaling factor, so that $\sqrt{n}(z_n^* - z^*)$ converges to something nontrivial?

Notation:
$\bar f_n(x) = \frac{1}{n} \sum_{j=1}^n f(x, \xi^j)$
$\sigma^2(x) = \mathrm{var}[f(x, \xi)]$
$s_n^2(x) = \frac{1}{n-1} \sum_{j=1}^n \big[ f(x, \xi^j) - \bar f_n(x) \big]^2$
$X^*$ is the set of optimal solutions to (SP)
$z_\alpha$ satisfies $P(N(0,1) \le z_\alpha) = 1 - \alpha$

SAA: Towards a CLT for $z_n^*$

$z_n^* = \bar f_n(x_n^*) \le \bar f_n(x)$, wp1, $\forall x \in X$, and so

$\frac{z_n^* - z^*}{\sigma(x)/\sqrt{n}} \le \frac{\bar f_n(x) - z^*}{\sigma(x)/\sqrt{n}}$, wp1.

Let $x^* \in X^* \subseteq X$. Then,

$P\Big( \frac{z_n^* - z^*}{\sigma(x^*)/\sqrt{n}} \le z_\alpha \Big) \ge P\Big( \frac{\bar f_n(x^*) - z^*}{\sigma(x^*)/\sqrt{n}} \le z_\alpha \Big)$.

By the CLT for iid random variables,

$\lim_{n \to \infty} P\Big( \frac{\bar f_n(x^*) - z^*}{\sigma(x^*)/\sqrt{n}} \le z_\alpha \Big) = 1 - \alpha$.

Thus...

SAA: One-sided CLT for $z_n^*$

Theorem. Assume a pointwise CLT: $\lim_{n \to \infty} P\Big( \frac{\bar f_n(x) - \mathbb{E} f(x, \xi)}{\sigma(x)/\sqrt{n}} \le u \Big) = P(N(0,1) \le u)$, $\forall x \in X$. Let $x^* \in X^*$. Then,

$\liminf_{n \to \infty} P\Big( \frac{z_n^* - z^*}{\sigma(x^*)/\sqrt{n}} \le z_\alpha \Big) \ge 1 - \alpha$.

Notes:
- (A3) and $\xi^1, \xi^2, \ldots, \xi^n$ iid as $\xi$ suffice for the pointwise CLT. Other possibilities, too.
- For sufficiently large $n$, we infer that $P\{ z_n^* - z_\alpha \sigma(x^*)/\sqrt{n} \le z^* \} \gtrsim 1 - \alpha$.
- Of course, we don't know $\sigma(x^*)$, and so this is practically useless. But...

SAA: Towards (a better) CLT for $z_n^*$

$z_n^* = \bar f_n(x_n^*) \le \bar f_n(x)$, wp1, $\forall x \in X$, and so

$\frac{z_n^* - z^*}{s_n(x_n^*)/\sqrt{n}} \le \frac{\bar f_n(x) - z^*}{s_n(x_n^*)/\sqrt{n}}$, wp1.

Let $x^* = x_{\min} \in \arg\min_{x \in X^*} \sigma^2(x)$. Then,

$P\Big( \frac{z_n^* - z^*}{s_n(x_n^*)/\sqrt{n}} \le z_\alpha \Big) \ge P\Big( \frac{\bar f_n(x_{\min}) - z^*}{s_n(x_n^*)/\sqrt{n}} \le z_\alpha \Big) = P\Big( \frac{\bar f_n(x_{\min}) - z^*}{\sigma(x_{\min})/\sqrt{n}} \le z_\alpha \Big[ \frac{s_n(x_n^*)}{\sigma(x_{\min})} \Big] \Big)$.

If $z_\alpha > 0$ and $\liminf_{n \to \infty} s_n(x_n^*) \ge \inf_{x \in X^*} \sigma(x)$, then

$\liminf_{n \to \infty} P\Big( \frac{z_n^* - z^*}{s_n(x_n^*)/\sqrt{n}} \le z_\alpha \Big) \ge 1 - \alpha$.

SAA: One-sided CLT for $z_n^*$

Theorem. Assume:
- (A1)-(A3)
- $\xi^1, \xi^2, \ldots, \xi^n$ are iid as $\xi$
- $\inf_{x \in X^*} \sigma^2(x) \le \liminf_{n \to \infty} s_n^2(x_n^*) \le \limsup_{n \to \infty} s_n^2(x_n^*) \le \sup_{x \in X^*} \sigma^2(x)$, wp1.
Then, given $0 < \alpha < 1$,

$\liminf_{n \to \infty} P\Big( \frac{z_n^* - z^*}{s_n(x_n^*)/\sqrt{n}} \le z_\alpha \Big) \ge 1 - \alpha$.

Notes:
- Could have assumed a pointwise CLT.
- For sufficiently large $n$, we infer that $P\{ z_n^* - z_\alpha s_n(x_n^*)/\sqrt{n} \le z^* \} \gtrsim 1 - \alpha$; that is, $z_n^* - z_\alpha s_n(x_n^*)/\sqrt{n}$ serves as an asymptotic lower confidence bound on $z^*$ (see the sketch below).
- How does this relate to the bias result: $\mathbb{E} z_n^* \le z^*$?
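A minimal sketch of that lower confidence bound, $z_n^* - z_\alpha s_n(x_n^*)/\sqrt{n}$, using the earlier toy example $f(x, \xi) = \xi x$ on $X = [-1, 1]$, where $z^* = 0$.

```python
# Minimal sketch of the one-sided lower confidence bound on z*.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, alpha = 1_000, 0.05
xi = rng.standard_normal(n)
x_n = -np.sign(xi.mean())            # SAA solution x*_n (here +/-1)
z_n = xi.mean() * x_n                # SAA value z*_n = -|xi_bar_n|
s_n = np.std(xi * x_n, ddof=1)       # s_n(x*_n)
lower = z_n - norm.ppf(1 - alpha) * s_n / np.sqrt(n)
print(lower, lower <= 0.0)           # the bound should sit below z* = 0
```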

Bias. Consistency: $z_n^*$ and $x_n^*$. CLT for $z_n^*$. Two-sided CLT for $z_n^*$?

Two-sided CLT for $z_n^*$

Fact. Assume:
- (A1)-(A3)
- $\xi^1, \xi^2, \ldots, \xi^n$ are iid as $\xi$
- $|f(x_1, \xi) - f(x_2, \xi)| \le g(\xi) \|x_1 - x_2\|$, $\forall x_1, x_2 \in X$, where $\mathbb{E} g^2(\xi) < \infty$.
If (SP) has a unique optimal solution $x^*$, then: $\sqrt{n}(z_n^* - z^*) \Rightarrow N(0, \sigma^2(x^*))$.

Note: But, there are frequently multiple optimal solutions...

Two-sided CLT for $z_n^*$

Fact. Assume:
- (A1)-(A3)
- $\xi^1, \xi^2, \ldots, \xi^n$ are iid as $\xi$
- $|f(x_1, \xi) - f(x_2, \xi)| \le g(\xi) \|x_1 - x_2\|$, $\forall x_1, x_2 \in X$, where $\mathbb{E} g^2(\xi) < \infty$.
Then, $\sqrt{n}(z_n^* - z^*) \Rightarrow \inf_{x \in X^*} N(0, \sigma^2(x))$.

Notes:
- What is $\inf_{x \in X^*} N(0, \sigma^2(x))$? We have $\sqrt{n}(\bar f_n(x) - \mathbb{E} f(x, \xi)) \Rightarrow N(0, \sigma^2(x))$, and $\{N(0, \sigma^2(x))\}_{x \in X^*}$ is a family of correlated normal random variables.
- Recall the example: $\inf_{x \in X^*} N(0, \sigma^2(x)) = -|N(0,1)|$.
- How does $\inf_{x \in X^*} N(0, \sigma^2(x))$ relate to the bias result: $\mathbb{E} z_n^* \le z^*$?

Bias. Consistency: $z_n^*$ and $x_n^*$. CLT for $z_n^*$. 3. CLT for $x_n^*$.

SAA: CLT for $x_n^*$

Fact. Assume:
- (A1)-(A3)
- $f(\cdot, \xi)$ is convex and twice continuously differentiable
- $X = \{x : Ax \le b\}$
- (SP) has a unique optimal solution $x^*$
- $(x_1 - x_2)^\top H (x_1 - x_2) > 0$, $\forall x_1, x_2 \in X$, $x_1 \neq x_2$, where $H = \mathbb{E} \nabla_x^2 f(x^*, \xi)$
- $\nabla_x f(x, \xi)$ satisfies $\|\nabla_x f(x_1, \xi) - \nabla_x f(x_2, \xi)\| \le g(\xi) \|x_1 - x_2\|$, $\forall x_1, x_2 \in X$, where $\mathbb{E} g^2(\xi) < \infty$ for some real-valued function $g$.
Then, $\sqrt{n}(x_n^* - x^*) \Rightarrow u^*$, where $u^*$ solves the random QP:

$\min_u\ \frac{1}{2} u^\top H u + c^\top u$
s.t. $A_i u \le 0$, $i \in \{i : A_i x^* = b_i\}$
$u^\top \mathbb{E} \nabla_x f(x^*, \xi) = 0$

and $c$ is multivariate normal with mean $0$ and covariance matrix $\Sigma$, where $\Sigma_{ij} = \mathrm{cov}\big( \frac{\partial f(x^*, \xi)}{\partial x_i}, \frac{\partial f(x^*, \xi)}{\partial x_j} \big)$.

Bias: $z_n^*$. Consistency: $z_n^*$ and $x_n^*$. CLT: $z_n^*$ and $x_n^*$.

SAA: Revisiting Possible Goals

1. $x_n^* \to x^*$, wp1, and $\sqrt{n}(x_n^* - x^*) \Rightarrow u^*$, where $u^*$ solves a random QP
2. $z_n^* \to z^*$, wp1, and $\sqrt{n}(z_n^* - z^*) \Rightarrow \inf_{x \in X^*} N(0, \sigma^2(x))$
3. $\mathbb{E} f(x_n^*, \xi) \to z^*$, wp1
4. $\lim_{n \to \infty} P(\mathbb{E} f(x_n^*, \xi) - z^* \le \varepsilon_n) \ge 1 - \alpha$, where $\varepsilon_n \downarrow 0$

We now have conditions under which variants of 1-3 hold. Let's next start by aiming for a more modest version of 4: given $\hat x \in X$ and $\alpha$, find a (random) CI width $\varepsilon$ with $P(\mathbb{E} f(\hat x, \xi) - z^* \le \varepsilon) \ge 1 - \alpha$.

An SAA Algorithm

Assessing Solution Quality: Towards an SAA Algorithm

$z^* = \min_{x \in X} \mathbb{E} f(x, \xi)$

Goal: Given $\hat x \in X$ and $\alpha$, find a (random) CI width $\varepsilon$ with $P(\mathbb{E} f(\hat x, \xi) - z^* \le \varepsilon) \ge 1 - \alpha$.

Using the bias result,

$\mathbb{E} \underbrace{\Big[ \frac{1}{n} \sum_{j=1}^n f(\hat x, \xi^j) - \min_{x \in X} \frac{1}{n} \sum_{j=1}^n f(x, \xi^j) \Big]}_{G_n(\hat x)} \ge \mathbb{E} f(\hat x, \xi) - z^*$

Remarks:
- Anticipate $\mathrm{var}\, G_n(\hat x) \ll \mathrm{var}\big[ \frac{1}{n} \sum_{j=1}^n f(\hat x, \xi^j) \big] + \mathrm{var}\, z_n^*$
- $G_n(\hat x) \ge 0$, but not asymptotically normal (what to do?)
- Not much of an algorithm if the solution, $\hat x$, comes as input!

An SAA Algorithm

Input: CI level $1 - \alpha$, sample sizes $n_x$ and $n$, replication size $n_g$
Output: Solution $x^*_{n_x}$ and approximate $(1-\alpha)$-level CI on $\mathbb{E} f(x^*_{n_x}, \xi) - z^*$

0. Sample iid observations $\xi^1, \xi^2, \ldots, \xi^{n_x}$, and solve (SP$_{n_x}$) to obtain $x^*_{n_x}$.
1. For $k = 1, 2, \ldots, n_g$:
   1.1. Sample iid observations $\xi^{k1}, \xi^{k2}, \ldots, \xi^{kn}$ from the distribution of $\xi$.
   1.2. Solve (SP$_n$) using $\xi^{k1}, \xi^{k2}, \ldots, \xi^{kn}$ to obtain $x^k_n$.
   1.3. Calculate $G^k_n(x^*_{n_x}) = \frac{1}{n} \sum_{j=1}^n f(x^*_{n_x}, \xi^{kj}) - \frac{1}{n} \sum_{j=1}^n f(x^k_n, \xi^{kj})$.
2. Calculate the gap estimate and sample variance:
   $\bar G_n(n_g) = \frac{1}{n_g} \sum_{k=1}^{n_g} G^k_n(x^*_{n_x})$ and $s_G^2(n_g) = \frac{1}{n_g - 1} \sum_{k=1}^{n_g} \big( G^k_n(x^*_{n_x}) - \bar G_n(n_g) \big)^2$
3. Let $\varepsilon_g = t_{n_g - 1, \alpha}\, s_G(n_g) / \sqrt{n_g}$, and output $x^*_{n_x}$ and the one-sided CI $[0,\ \bar G_n(n_g) + \varepsilon_g]$.
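A minimal sketch of the whole algorithm on the toy example $f(x, \xi) = \xi x$, $X = [-1, 1]$, where (SP$_n$) is solvable in closed form; the sample sizes are illustrative, and a real application would replace solve_saa with an optimization solver.

```python
# Minimal sketch of the SAA gap-estimation algorithm on the toy example.
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(6)
alpha, n_x, n, n_g = 0.05, 2_000, 1_000, 15

def solve_saa(xi):
    """Closed-form SAA solution/value for min_{|x|<=1} xi_bar * x."""
    x = -np.sign(xi.mean()) if xi.mean() != 0 else 1.0
    return x, xi.mean() * x

# Step 0: candidate solution from an independent sample of size n_x
x_hat, _ = solve_saa(rng.standard_normal(n_x))

# Step 1: n_g independent gap estimates
G = np.empty(n_g)
for k in range(n_g):
    xi = rng.standard_normal(n)
    x_k, z_k = solve_saa(xi)
    G[k] = (xi * x_hat).mean() - z_k      # G^k_n(x_hat) >= 0

# Steps 2-3: gap estimate, sample variance, one-sided CI
G_bar, s_G = G.mean(), G.std(ddof=1)
eps_g = student_t.ppf(1 - alpha, n_g - 1) * s_G / np.sqrt(n_g)
print(f"x_hat = {x_hat}, CI on optimality gap: [0, {G_bar + eps_g:.4f}]")
```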

An SAA Algorithm

Input: CI level $1 - \alpha$, sample sizes $n_x$ and $n$, replication size $n_g$

- Fix $\alpha = 0.05$ and $n_g = 15$ (say).
- Choose $n_x$ and $n$ based on what is computationally reasonable, with $n_x > n$, perhaps $n_x \gg n$.
- Then: for fixed $n$ and $n_x$, we can justify the algorithm as $n_g \to \infty$; for fixed $n_g$, we can justify the algorithm as $n \to \infty$. Can even use $n_g = 1$, albeit with a different variance estimator.

An SAA Algorithm

Output: Solution $x^*_{n_x}$ and approximate $(1-\alpha)$-level CI on $\mathbb{E} f(x^*_{n_x}, \xi) - z^*$

- $x^*_{n_x}$ is the decision we will make.
- The confidence interval is on $x^*_{n_x}$'s optimality gap, $\mathbb{E} f(x^*_{n_x}, \xi) - z^*$.
- Here, $\mathbb{E} f(x^*_{n_x}, \xi) = \mathbb{E}_\xi [f(x^*_{n_x}, \xi) \mid x^*_{n_x}]$.
- So, this is a posterior assessment, given the decision we will make.

An SAA Algorithm

0. Sample iid observations $\xi^1, \xi^2, \ldots, \xi^{n_x}$, and solve (SP$_{n_x}$) to obtain $x^*_{n_x}$.

- $\xi^1, \xi^2, \ldots, \xi^{n_x}$ need not be iid.
- Agnostic to the algorithm used to solve (SP$_{n_x}$).

An SAA Algorithm

1. For $k = 1, 2, \ldots, n_g$:
   1.1. Sample iid observations $\xi^{k1}, \xi^{k2}, \ldots, \xi^{kn}$ from the distribution of $\xi$.
   1.2. Solve (SP$_n$) using $\xi^{k1}, \xi^{k2}, \ldots, \xi^{kn}$ to obtain $x^k_n$.
   1.3. Calculate $G^k_n(x^*_{n_x}) = \frac{1}{n} \sum_{j=1}^n f(x^*_{n_x}, \xi^{kj}) - \frac{1}{n} \sum_{j=1}^n f(x^k_n, \xi^{kj})$.

- $\xi^{k1}, \xi^{k2}, \ldots, \xi^{kn}$ need not be iid, but should satisfy $\mathbb{E} \bar f_n(x) = \mathbb{E} f(x, \xi)$ (could use Latin hypercube sampling or randomized quasi-Monte Carlo sampling; see the sketch below).
- $(\xi^{k1}, \xi^{k2}, \ldots, \xi^{kn})$, $k = 1, 2, \ldots, n_g$, should be iid.
- Agnostic to the algorithm used to solve (SP$_n$).
- Can solve a relaxation of (SP$_n$) if a lower bound is used in the second term of 1.3 (recall the $\mathbb{E} \bar f_n(x) \le \mathbb{E} f(x, \xi)$ relaxation in the bias result).
- Can also use independent samples and different sample sizes, $n_u$ and $n_\ell$, for the upper- and lower-bound estimators in step 1.3.
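As an aside on step 1.1, here is a minimal sketch of Latin hypercube sampling for a scalar $\xi \sim N(0,1)$ via scipy.stats.qmc (available in SciPy 1.7+); each draw is still marginally $N(0,1)$, so $\mathbb{E} \bar f_n(x) = \mathbb{E} f(x, \xi)$ is preserved while variance is typically reduced.

```python
# Minimal sketch: Latin hypercube draws of xi ~ N(0,1) for step 1.1.
import numpy as np
from scipy.stats import norm, qmc

n, d = 1_000, 1                       # sample size, dimension of xi
sampler = qmc.LatinHypercube(d=d, seed=7)
u = sampler.random(n=n)               # stratified uniforms on (0,1)^d
xi = norm.ppf(u)                      # transform to N(0,1) via inverse CDF
print(xi.mean(), xi.std())            # sample mean is much tighter than iid
```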

An SAA Algorithm

2. Calculate the gap estimate and sample variance:
   $\bar G_n(n_g) = \frac{1}{n_g} \sum_{k=1}^{n_g} G^k_n(x^*_{n_x})$ and $s_G^2(n_g) = \frac{1}{n_g - 1} \sum_{k=1}^{n_g} \big( G^k_n(x^*_{n_x}) - \bar G_n(n_g) \big)^2$
3. Let $\varepsilon_g = t_{n_g - 1, \alpha}\, s_G(n_g) / \sqrt{n_g}$, and output $x^*_{n_x}$ and the one-sided CI $[0,\ \bar G_n(n_g) + \varepsilon_g]$.

- Standard calculation of the sample mean and sample variance.
- Standard calculation of a one-sided confidence interval for a nonnegative parameter. Again, here the parameter is $\mathbb{E} f(x^*_{n_x}, \xi) - z^*$.
- The SAA Algorithm tends to be conservative, i.e., exhibit over-coverage. Why?

SAA Algorithm Applied to a Few Two-Stage SLPs

Problem                                  DB      WRPM    20TERM   SSN
$n_x$ in (SP$_{n_x}$) for $x^*_{n_x}$    50      50      50       2000
Optimality gap: $n$                      25      25      25       1000
$n_g$                                    30      30      30       30
95% CI width                             0.2%    0.08%   0.5%     8%
Var. red.                                4300    480     1300     17

Variance reduction is with respect to an algorithm that estimates the upper and lower bounds defining $G$ with independent, rather than common, random number streams.

SAA Algorithm: Network Capacity Expansion Model ($z^* \approx 8.3$) (Higle & Sen)

[Figures: for a seven-arc network (nodes A-E, arcs 1-7), plots of the gap estimate and sampling error (as % of $z^*$), and of the upper and lower bounds, versus $n = n_x$ from 10 to 10,000, shown at successively finer vertical scales.]

SAA Algorithm: Network Capacity Expansion Model ($z^* \approx 8.3$) (Higle & Sen)

[Figure: gap estimate versus $n$ on a log-log scale.]

If $\mathbb{E} G_n(x_n^*) = a\, n^{-p}$, then $\log[\mathbb{E} G_n(x_n^*)] = \log[a] - p \log[n]$. From these four points, $p \approx 0.74$, with $R^2 = 0.9998$.
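A minimal sketch of that log-log fit; the (n, gap) pairs below are hypothetical placeholders, not the values behind the figure.

```python
# Minimal sketch of fitting E G_n(x*_n) = a * n^(-p) on a log-log scale.
import numpy as np

n = np.array([10.0, 100.0, 1000.0, 10000.0])
gap = np.array([0.5, 0.09, 0.016, 0.003])        # hypothetical gap estimates
slope, intercept = np.polyfit(np.log(n), np.log(gap), 1)
p, a = -slope, np.exp(intercept)
print(p, a)                                       # estimated rate and constant
```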

SAA Algorithm: Network Capacity Expansion Model ($z^* \approx 8.3$) (Higle & Sen)

[Figure: the same seven-arc network.] Enforce symmetry constraints: $x_1 = x_6$, $x_2 = x_7$, $x_3 = x_5$.

SAA Algorithm: Network Capacity Expansion Model ($z^* \approx 8.3$) (Higle & Sen)

[Figures: side-by-side versions of the preceding gap/sampling-error and upper/lower-bound plots, without extra constraints (left) and with symmetry constraints (right).]

SAA Algorithm: Network Capacity Expansion Model ($z^* \approx 8.3$) (Higle & Sen)

[Figures: log-log gap plots, without extra constraints (left) and with symmetry constraints (right).]

Fitting $\mathbb{E} G_n(x_n^*) = a\, n^{-p}$: without extra constraints, $p \approx 0.74$ ($R^2 = 0.999$); with symmetry constraints, $p \approx 0.61$ ($R^2 = 0.986$). The rate is worse, but the constant $a$ is better.

If you are happy with your results from the SAA Algorithm, then stop now!

Why Are You Unhappy?

1. Computational effort to solve $n_g = 15$ instances of (SP$_n$) is prohibitive;
2. Bias of $z_n^*$ is large;
3. Sampling error, $\varepsilon_g$, is large; or,
4. Solution $x^*_{n_x}$ is far from optimal to (SP).

Remedy 1: Single-replication procedure: $n_g = 1$.
Remedy 2: LHS, randomized QMC, adaptive jackknife estimator.
Remedy 3: CRNs reduce variance. Other ideas help: LHS and randomized QMC.
Remedy 4: A sequential SAA algorithm.

A Sequential SAA Algorithm

A Sequential SAA Algorithm

Step 1: Generate a candidate solution.
Step 2: Check the stopping criterion. If satisfied, stop. Else, go to Step 1.

Instead of a single candidate solution $\hat x = x^*_{n_x} \in X$, we have $\{\hat x_k\}$ with each $\hat x_k \in X$. The stopping criterion is rooted in the above procedure (with $n_g = 1$):

$G_k \equiv G_{n_k}(\hat x_k) = \frac{1}{n_k} \sum_{j=1}^{n_k} \big( f(\hat x_k, \xi^j) - f(x^*_{n_k}, \xi^j) \big)$

and

$s_k^2 \equiv s_{n_k}^2(\hat x_k) = \frac{1}{n_k - 1} \sum_{j=1}^{n_k} \Big[ \big( f(\hat x_k, \xi^j) - f(x^*_{n_k}, \xi^j) \big) - \big( \bar f_{n_k}(\hat x_k) - \bar f_{n_k}(x^*_{n_k}) \big) \Big]^2$

A Sequential SAA Algorithm

Stopping criterion: $T = \inf_{k \ge 1} \{ k : G_k \le h' s_k \}$  (1)
Sample-size criterion: $n_k \ge \Big( \frac{1}{h - h'} \Big)^2 \big( c_{q,\alpha} + 2q \ln^2 k \big)$  (2)

Fact. Consider the sequential sampling procedure in which the sample size is increased according to (2), and the procedure stops at iteration $T$ according to (1). Then, under some regularity assumptions (including uniform integrability of a moment generating function),

$\liminf_{h \downarrow h'} P\big( \mathbb{E} f(\hat x_T, \xi) - z^* \le h\, s_T \big) \ge 1 - \alpha$.
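A loose sketch of the sequential loop on the toy example, with hypothetical parameters $h > h'$, $q$, and $c_{q,\alpha}$; the actual procedure's sample reuse and candidate-generation rules follow Bayraksan and Morton (2011) and are simplified away here.

```python
# A loose sketch only: hypothetical parameters, simplified candidate generation.
import numpy as np

rng = np.random.default_rng(8)
h, h_prime, q, c_q_alpha = 0.4, 0.2, 0.5, 2.0   # all hypothetical values

k = 0
while True:
    k += 1
    # Sample-size schedule (2), with the hypothetical constants above:
    n_k = int(np.ceil((c_q_alpha + 2 * q * np.log(k)**2) / (h - h_prime)**2))
    xi1 = rng.standard_normal(n_k)
    x_hat = -np.sign(xi1.mean())                 # Step 1: candidate solution
    xi2 = rng.standard_normal(n_k)               # fresh sample for the gap
    x_star = -np.sign(xi2.mean())                # SAA solution on this sample
    diff = xi2 * x_hat - xi2 * x_star            # f(x_hat, .) - f(x*_{n_k}, .)
    G_k, s_k = diff.mean(), diff.std(ddof=1)
    if G_k <= h_prime * s_k:                     # stopping rule (1)
        break
print(k, n_k, G_k)
```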

A Word (well, pictures) About Multi-Stage Stochastic Programming

What Does "Solution" Mean? In the multistage setting, assessing solution quality means assessing policy quality.

One Family of Algorithms & SAA

Assume interstage independence, or dependence with special structure. Stochastic dual dynamic programming (SDDP): [Figures: (a) Forward Pass, (b) Backward Pass.]

Small Sampling of Things We Didn't Talk About

- Non-iid sampling (well, we did a bit)
- Bias and variance reduction techniques (some brief allusion)
- Multi-stage SAA (in any detail)
- Large-deviation results, concentration-inequality results, finite-sample guarantees; more generally, results with coefficients that are difficult to estimate
- SAA for expected-value constraints, including chance constraints
- SAA for other models, such as those with equilibrium constraints
- Results that exploit more specific special structure of $f$, $\xi$, and/or $X$
- Results that study the interaction between an optimization algorithm and SAA: stochastic approximation, stochastic gradient descent, stochastic mirror descent, stochastic cutting-plane methods, stochastic dual dynamic programming, ...
- Statistical testing of optimality conditions
- Results for risk measures not expressed as expected (dis)utility
- Decision-dependent probability distributions
- Distributionally robust, data-driven variants of SAA

Summary: SAA

- SAA
- Results for Monte Carlo estimators: no optimization
- What results should we want for SAA?
- Results for SAA: 1. Bias, 2. Consistency, 3. CLT
- SAA Algorithm: a basic algorithm; a sequential algorithm
- Multi-Stage Problems
- What We Didn't Discuss

Small Sampling of References

Lagrange, Bernoulli, Euler, Laplace, Gauss, Edgeworth, Hotelling, Fisher, ... (leading to maximum likelihood)
- H. Robbins and S. Monro, A stochastic approximation method, Annals of Mathematical Statistics 22, 400-407, 1951.
- G. Dantzig and A. Madansky, On the solution of two-stage linear programs under uncertainty, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1961.

Overviews and Tutorials
- A. Shapiro, A. Ruszczyński, and D. Dentcheva, Lectures on Stochastic Programming: Modeling and Theory (Chapter 5, Statistical Inference), 2014.
- A. Shapiro, Monte Carlo sampling methods, in A. Ruszczyński and A. Shapiro (eds.), Stochastic Programming, Handbooks in Operations Research and Management Science, 2003.
- S. Kim, R. Pasupathy, and S. Henderson, A guide to sample-average approximation, in Handbook of Simulation Optimization, M. Fu (ed.), 2015.
- T. Homem-de-Mello and G. Bayraksan, Monte Carlo sampling-based methods for stochastic optimization, Surveys in Operations Research and Management Science 19, 56-85, 2014.
- G. Bayraksan and D.P. Morton, Assessing solution quality in stochastic programs via sampling, Tutorials in Operations Research, M.R. Oskoorouchi (ed.), 102-122, INFORMS, 2009.

Further References
- G. Bayraksan and D.P. Morton, Assessing solution quality in stochastic programs, Mathematical Programming 108, 495-514, 2006.
- G. Bayraksan and D.P. Morton, A sequential sampling procedure for stochastic programming, Operations Research 59, 898-913, 2011.
- J. Dupačová and R. Wets, Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems, The Annals of Statistics 16, 1517-1549, 1988.
- M. Freimer, J. Linderoth, and D. Thomas, The impact of sampling methods on bias and variance in stochastic linear programs, Computational Optimization and Applications 51, 51-75, 2012.
- P. Glynn and G. Infanger, Simulation-based confidence bounds for two-stage stochastic programs, Mathematical Programming 138, 15-42, 2013.
- J. Higle and S. Sen, Stochastic decomposition: an algorithm for two-stage linear programs with recourse, Mathematics of Operations Research 16, 650-669, 1991.

Small Sampling of References (continued)

- J. Higle and S. Sen, Duality and statistical tests of optimality for two stage stochastic programs, Mathematical Programming 75, 257-275, 1996.
- T. Homem-de-Mello, On rates of convergence for stochastic optimization problems under non-iid sampling, SIAM Journal on Optimization 19, 524-551, 2008.
- G. Infanger, Monte Carlo (importance) sampling within a Benders decomposition algorithm for stochastic linear programs, Annals of Operations Research 39, 41-67, 1991.
- A. King and R. Rockafellar, Asymptotic theory for solutions in statistical estimation and stochastic programming, Mathematics of Operations Research 18, 148-162, 1993.
- A. King and R. Wets, Epiconsistency of convex stochastic programs, Stochastics 34, 83-92, 1991.
- A. Kleywegt, A. Shapiro, and T. Homem-de-Mello, The sample average approximation method for stochastic discrete optimization, SIAM Journal on Optimization 12, 479-502, 2001.
- V. Kozmik and D.P. Morton, Evaluating policies in risk-averse multi-stage stochastic programming, Mathematical Programming 152, 275-300, 2015.
- J. Luedtke and S. Ahmed, A sample approximation approach for optimization with probabilistic constraints, SIAM Journal on Optimization 19, 674-699, 2008.
- J. Linderoth, A. Shapiro, and S. Wright, The empirical behavior of sampling methods for stochastic programming, Annals of Operations Research 142, 215-241, 2001.
- W. Mak, D. Morton, and R. Wood, Monte Carlo bounding techniques for determining solution quality in stochastic programs, Operations Research Letters 24, 47-56, 1999.
- B. Pagnoncelli, S. Ahmed, and A. Shapiro, Sample average approximation method for chance constrained programming: theory and applications, Journal of Optimization Theory and Applications 142, 399-416, 2009.
- R. Pasupathy, On choosing parameters in retrospective-approximation algorithms for stochastic root finding and simulation optimization, Operations Research 58, 889-901, 2010.
- J. Royset and R. Szechtman, Optimal budget allocation for sample average approximation, Operations Research 61, 762-776, 2013.

Sorry for All the Acronyms (SAA)

- CI: Confidence Interval
- CLT: Central Limit Theorem
- CRN: Common Random Numbers
- DB: Donohue-Birge test instance
- iid: independent and identically distributed
- iidrvs: iid random variables
- LHS: Latin Hypercube Sampling
- LLN: Law of Large Numbers
- LSC: Lower Semi-Continuous
- MIP: Mixed Integer Program
- QMC: Quasi-Monte Carlo
- QP: Quadratic Program
- SAA: Sample Average Approximation
- SDDP: Stochastic Dual Dynamic Programming
- SLP: Stochastic Linear Program
- SSN: SONET Switched Network test instance (or, Suvrajeet Sen's Network)
- SONET: Synchronous Optical Networking
- USLLN: Uniform Strong LLN
- wp1: with probability one
- WRPM: West-coast Regional Planning Model
- 20TERM: 20-TERMinal test instance