Lecture 1. Stochastic Optimization: Introduction. January 8, 2018


Optimization

Optimization is concerned with the minimization/maximization of mathematical functions, often subject to constraints.

Euler (1707-1783): "Nothing at all takes place in the universe in which some rule of the maximum or minimum does not apply."

Optimization is an important tool in the analysis, design, control, and simulation of physical, economic, chemical, and biological systems.

Workflow: model, apply algorithm, check solution.

Unconstrained optimization

(Unconstrained)   $\min_{x \in \mathbb{R}^n} f(x)$

Here "unconstrained" means the feasible set $X$ is all of $\mathbb{R}^n$.

Example: $f(x) = x^3 - 3x^2$.

Important application: data fitting and regression.

Unconstrained optimization: An example

Given a data set $\{(y_i, x_{i1}, \ldots, x_{ip})\}_{i=1}^{n}$ ($n$ records, with dependent variable $y_i$ and independent variables $x_{i1}, \ldots, x_{ip}$), the linear regression model assumes that the relationship between the dependent variable $y$ and the independent variables is linear. This relation is captured as follows:

$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i, \quad i = 1, \ldots, n,$

where $\epsilon_i$ denotes a random error term. More compactly, we may state this as $y = X\beta + \epsilon$, where $y \triangleq (y_1, \ldots, y_n)^T$ and $X$ is the matrix whose rows are $x_1^T, \ldots, x_n^T$.

Then the least-squares estimator $\hat{\beta}$ is defined as follows:

$\hat{\beta} = \operatorname*{argmin}_{\beta} \|X\beta - y\|^2.$
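As an illustration (not part of the lecture), the following Python sketch builds a synthetic data set and computes the least-squares estimator both with numpy's built-in solver and via the normal equations; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n records, p regressors, plus an intercept column.
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # design matrix
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + 0.1 * rng.normal(size=n)                 # y = X beta + eps

# Least-squares estimator: beta_hat = argmin_beta ||X beta - y||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Equivalent closed form via the normal equations (X^T X) beta = X^T y.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)
print(beta_normal)
```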

Convex optimization

(Convex)   $\min f(x)$ subject to $x \in X$,

where $X$ is a convex set and $f$ is a convex function.

Definition 1 (Convexity of sets and functions). A set $X \subseteq \mathbb{R}^n$ is a convex set if $x_1, x_2 \in X$ implies $(\lambda x_1 + (1-\lambda) x_2) \in X$ for all $\lambda \in [0,1]$. A function $f$ is said to be convex if

$f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2), \quad \forall x_1, x_2,\ \lambda \in [0,1].$

A function $f$ is said to be strictly convex if

$f(\lambda x_1 + (1-\lambda) x_2) < \lambda f(x_1) + (1-\lambda) f(x_2), \quad \forall x_1 \neq x_2,\ \lambda \in (0,1).$

A function $f$ is said to be strongly convex with parameter $\mu > 0$ if

$f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2) - \tfrac{1}{2}\,\mu\,\lambda(1-\lambda)\,\|x_1 - x_2\|^2, \quad \forall x_1, x_2,\ \lambda \in [0,1].$

Note that in the above definitions $f$ does not need to be differentiable.

Definition 2 (Convexity of differentiable functions). Consider a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$. The function $f$ is convex if

$f(x_2) \ge f(x_1) + \nabla_x f(x_1)^T (x_2 - x_1), \quad \forall x_1, x_2 \in \mathbb{R}^n.$

A differentiable function $f$ is strongly convex with parameter $\mu$ if

$(\nabla_x f(x_1) - \nabla_x f(x_2))^T (x_1 - x_2) \ge \mu \|x_1 - x_2\|^2, \quad \forall x_1, x_2 \in \mathbb{R}^n$

(a numerical check of this inequality for a quadratic follows the examples below).

Any local solution of (Convex) is a global solution.

Examples of convex sets:
1. Linear constraints: $X \triangleq \{x : Ax = b,\ x \ge 0\}$.
2. Convex quadratic constraints: $X \triangleq \{x : \sum_{i=1}^{N} (x_i - a_i)^2 \le b\}$.

Examples of convex functions:
1. $f(x) = e^x$.
2. $f(x) = \tfrac{1}{2} x^T Q x + c^T x$, where $Q \succeq 0$.

Applications: controller design, constrained least-squares, etc.
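The quadratic example above lends itself to a quick numerical check (my own sketch, not from the slides): for $f(x) = \tfrac{1}{2} x^T Q x + c^T x$ with $Q \succ 0$, the gradient-monotonicity inequality holds with $\mu$ equal to the smallest eigenvalue of $Q$.

```python
import numpy as np

rng = np.random.default_rng(1)

# A positive definite Q makes f(x) = 0.5 x^T Q x + c^T x strongly convex.
A = rng.normal(size=(4, 4))
Q = A.T @ A + 0.5 * np.eye(4)
c = rng.normal(size=4)
mu = np.linalg.eigvalsh(Q).min()   # strong-convexity parameter

grad = lambda x: Q @ x + c         # gradient of the quadratic

# Check (grad f(x1) - grad f(x2))^T (x1 - x2) >= mu * ||x1 - x2||^2 on random pairs.
for _ in range(1000):
    x1, x2 = rng.normal(size=4), rng.normal(size=4)
    lhs = (grad(x1) - grad(x2)) @ (x1 - x2)
    rhs = mu * np.linalg.norm(x1 - x2) ** 2
    assert lhs >= rhs - 1e-9

print("gradient-monotonicity check passed with mu =", mu)
```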

Nonlinear program

(NLP)   $\min_{x \in X} f(x)$

Here $f : X \to \mathbb{R}$ is a possibly nonconvex objective function, $x \in \mathbb{R}^n$ collects the decision variables, and $X \subseteq \mathbb{R}^n$ is a possibly nonconvex set.

Applications: nonlinear regression, process control in chemical engineering, etc.

Discrete optimization

(Discrete)   $\min_{x \in \mathbb{R}^n} f(x)$ subject to $x \in Z$.

$Z$ is a finite set, implying that $x$ can take on only discrete values, e.g., $x \in \{0, 1\}$. Sometimes $x_1 \in \mathbb{R}$ and $x_2 \in \{0, 1\}$; the resulting problem is called a mixed-integer problem.

Applications: facility location problems, unit commitment problems.

Convex optimization: relevance in this course

Stochastic optimization captures a broad class of problems, including convex, nonconvex (time permitting), and discrete optimization problems (the last not considered here). In this course, we focus on the following:

1. Convex stochastic optimization problems (including stochastic programs with recourse)
2. Monotone stochastic variational inequality problems (these subsume stochastic convex optimization and capture stochastic Nash games, stochastic contact problems, and stochastic traffic equilibrium problems)
3. Robust optimization problems
4. Applications: statistical learning problems

Convexity is crucial and will be leveraged extensively during the course!

Problems complicated by uncertainty

In the aforementioned (deterministic) problems, parameters are known with certainty. Specifically, given a function $f(x; \xi)$, we consider two possibilities:

1. $\xi$ is a random variable. Our focus is then on solving

(Stoch-Opt)   $\min_{x \in X} \mathbb{E}[f(x, \xi)]$

2. The distribution of $\xi$ is unavailable and instead we have that $\xi \in U$ (where $U$ is an uncertainty set). A problem of interest is then

(Robust-Opt)   $\min_{x \in X} \max_{\xi \in U} f(x, \xi)$

We motivate this line of questioning by considering the classical newsvendor problem.

A short detour: Probability Spaces

Throughout this course, we will utilize the notion of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. This mathematical construct captures processes (either real or synthetic) that are characterized by randomness. The space is constructed for a particular such process, and on every occasion this process is examined, both the set of outcomes and the associated probabilities are the same.

The sample space $\Omega$ is a nonempty set that denotes the set of outcomes; a single outcome represents one execution of the experiment. The $\sigma$-algebra $\mathcal{F}$ denotes the set of events, where each event is a set containing zero or more outcomes.

The assignment of probabilities to the events is captured by $\mathbb{P}$. Once the space $(\Omega, \mathcal{F}, \mathbb{P})$ is established, nature selects an outcome $\omega$ from $\Omega$. As a consequence, all events that contain $\omega$ as one of their outcomes are said to have occurred. If nature selects outcomes infinitely often, then the relative frequency of occurrence of a particular event corresponds with the value specified by the probability measure $\mathbb{P}$.

Properties of $\mathcal{F}$:
1. $\Omega \in \mathcal{F}$.
2. $\mathcal{F}$ is closed under complementation: $A \in \mathcal{F} \implies (\Omega \setminus A) \in \mathcal{F}$.
3. $\mathcal{F}$ is closed under countable unions: $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$ implies that $\left( \bigcup_{i=1}^{\infty} A_i \right) \in \mathcal{F}$.

Properties of $\mathbb{P}$: the probability measure $\mathbb{P} : \mathcal{F} \to [0, 1]$ is such that
1. $\mathbb{P}$ is countably additive: if $\{A_i\}_{i=1}^{\infty} \subseteq \mathcal{F}$ denotes a countable collection of pairwise disjoint sets ($A_i \cap A_j = \emptyset$ for $i \neq j$), then $\mathbb{P}\left( \bigcup_{i=1}^{\infty} A_i \right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i)$.
2. The measure of the sample space is one, i.e., $\mathbb{P}(\Omega) = 1$.

A short detour: Probability Spaces II

Example 1 (single coin toss). $\Omega \triangleq \{H, T\}$. The $\sigma$-algebra $\mathcal{F}$ contains $2^2 = 4$ events:

$\mathcal{F} \triangleq \{\emptyset, \{H\}, \{T\}, \{H, T\}\}.$

Furthermore, $\mathbb{P}(\emptyset) = 0$, $\mathbb{P}(\{H\}) = 0.5$, $\mathbb{P}(\{T\}) = 0.5$, and $\mathbb{P}(\{H, T\}) = 1$.

Example 2 (double coin toss). $\Omega \triangleq \{HH, HT, TH, TT\}$.

The $\sigma$-algebra $\mathcal{F}$ contains $2^4 = 16$ events:

$\mathcal{F} \triangleq \{\emptyset, \{HH\}, \{HT\}, \{TH\}, \{TT\}, \{HH,HT\}, \{HH,TH\}, \{HH,TT\}, \{HT,TH\}, \{HT,TT\}, \{TH,TT\},$
$\{HH,HT,TH\}, \{HH,HT,TT\}, \{HH,TH,TT\}, \{HT,TH,TT\}, \{HH,HT,TH,TT\}\}.$

Furthermore, for a fair coin, $\mathbb{P}(\emptyset) = 0$, $\mathbb{P}(A_1) = 0.25$, $\mathbb{P}(A_2) = 0.5$, $\mathbb{P}(A_3) = 0.75$, and $\mathbb{P}(\{HH, HT, TH, TT\}) = 1$, where $A_1$ is any one-outcome event (e.g., $\{HH\}$), $A_2$ is any two-outcome event (e.g., $\{HH, TT\}$), and $A_3$ is any three-outcome event (e.g., $\{HH, HT, TH\}$).
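As a small illustration (my own, not from the slides), the following Python snippet enumerates this $\sigma$-algebra as the power set of $\Omega$ and assigns each event its probability under the uniform (fair-coin) measure.

```python
from itertools import combinations

omega = ["HH", "HT", "TH", "TT"]          # sample space for two fair coin tosses

# The sigma-algebra here is the full power set of omega: 2^4 = 16 events.
events = [set(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

# Uniform probability measure: P(A) = |A| / |Omega|.
for A in events:
    print(sorted(A), len(A) / len(omega))

print("number of events:", len(events))   # 16
```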

Random variables

Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, a random variable is a measurable function on the sample space. Specifically, $X$ is a random variable defined as $X : \Omega \to E$, where $E$ is a measurable space. Consequently,

$\mathbb{P}(X \in S) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in S\}).$

Example (coin tossing). Define $X(\omega)$ as follows:

$X(\omega) = \begin{cases} 100, & \omega = H, \\ -100, & \omega = T. \end{cases}$

Example: The Newsvendor Problem

Suppose a company has to decide its order quantity $x$, given a demand $d$. The cost is given by

$f(x, d) \triangleq c x + \underbrace{b [d - x]_+}_{\text{back-order cost}} + \underbrace{h [x - d]_+}_{\text{holding cost}},$

where $b$ is the back-order penalty and $h$ is the holding cost. In such an instance, the firm will solve the problem

$\min_{x \ge 0} f(x, d).$

The Newsvendor Problem (continued)

More specifically, suppose demand is a random variable, defined as $d_\omega \triangleq d(\omega)$, where $d : \Omega \to \mathbb{R}_+$ and $\Omega$ is the sample space. Furthermore, suppose $(\Omega, \mathcal{F}, \mathbb{P})$ denotes the associated probability space, where $\mathbb{P}$ denotes the probability distribution. Then the (random) cost associated with demand $d_\omega$ is given by

$f(x; \omega) \triangleq c x + \underbrace{b [d_\omega - x]_+}_{\text{back-order cost}} + \underbrace{h [x - d_\omega]_+}_{\text{holding cost}}.$

We assume for the present that $\mathbb{P}$ is known; the firm may then minimize its expected cost, given by

$\min_{x \ge 0} \mathbb{E}[f(x; \omega)],$

where $\mathbb{E}[\cdot]$ denotes the expectation with respect to $\mathbb{P}$.
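To make this concrete, here is a minimal Monte-Carlo sketch (with hypothetical cost parameters and a demand distribution of my choosing) that estimates the expected cost $\mathbb{E}[f(x;\omega)]$ by sample averaging and minimizes it over a grid of order quantities.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical cost parameters: unit order cost, back-order penalty, holding cost.
c, b, h = 1.0, 4.0, 1.5
demand = rng.exponential(scale=50.0, size=100_000)   # sampled demand scenarios

def expected_cost(x, d):
    """Monte Carlo estimate of E[ c*x + b*[d - x]_+ + h*[x - d]_+ ]."""
    return np.mean(c * x + b * np.maximum(d - x, 0.0) + h * np.maximum(x - d, 0.0))

# Crude minimization: evaluate the sample-average cost on a grid of order quantities.
grid = np.linspace(0.0, 200.0, 401)
costs = [expected_cost(x, demand) for x in grid]
x_best = grid[int(np.argmin(costs))]
print("approximate optimal order quantity:", x_best)
```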

The Newsvendor Problem

This is an instance of a two-stage problem with recourse:
First-stage decision: the order quantity $x$.
Second-stage ($\omega$-specific) recourse decisions: $y_\omega = [d_\omega - x]_+$ and $z_\omega = [x - d_\omega]_+$.

Recourse decisions can be taken upon revelation of the uncertainty; first-stage decisions have to be taken prior to this revelation.

A Scenario-based Approach

In practice, analytical solutions of this problem are complicated by the presence of an expectation (an integral). One avenue, a scenario-based approach, requires obtaining $K$ samples from $\Omega$, denoted by $d(\omega_1), \ldots, d(\omega_K)$ or $d_1, \ldots, d_K$, with associated probabilities $p_1, \ldots, p_K$. The recourse-based problem is then given by

minimize $\sum_{k=1}^{K} p_k f(x; \omega_k)$ subject to $x \ge 0$.

Note that

$f(x; \omega) = c x + b [d_\omega - x]_+ + h [x - d_\omega]_+ = \max\left( (c - b) x + b d_\omega,\ (c + h) x - h d_\omega \right).$

The scenario problem can therefore be written as the linear program

minimize $\sum_{k=1}^{K} p_k v_k$ over $(x, v_1, \ldots, v_K)$
subject to $v_k \ge (c - b) x + b d_k$, $k = 1, \ldots, K$,
           $v_k \ge (c + h) x - h d_k$, $k = 1, \ldots, K$,
           $x \ge 0$.

This is a linear program with one possible challenge: as $K$ grows, it becomes increasingly difficult to solve directly.
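For concreteness, the scenario LP above can be assembled and solved with an off-the-shelf solver; the sketch below uses scipy.optimize.linprog with hypothetical parameters and equally likely sampled scenarios (my illustration, not prescribed by the lecture).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)

# Hypothetical parameters and K equally likely demand scenarios.
c, b, h = 1.0, 4.0, 1.5
K = 200
d = rng.exponential(scale=50.0, size=K)
p = np.full(K, 1.0 / K)

# Decision vector: [x, v_1, ..., v_K]; objective: sum_k p_k * v_k.
obj = np.concatenate(([0.0], p))

# v_k >= (c - b) x + b d_k   <=>   (c - b) x - v_k <= -b d_k
A1 = np.hstack([(c - b) * np.ones((K, 1)), -np.eye(K)])
b1 = -b * d
# v_k >= (c + h) x - h d_k   <=>   (c + h) x - v_k <= h d_k
A2 = np.hstack([(c + h) * np.ones((K, 1)), -np.eye(K)])
b2 = h * d

bounds = [(0, None)] + [(None, None)] * K   # x >= 0, v_k free
res = linprog(obj, A_ub=np.vstack([A1, A2]), b_ub=np.concatenate([b1, b2]),
              bounds=bounds, method="highs")
print("scenario-LP optimal order quantity:", res.x[0])
```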

A two-stage linear program

Consider the newsvendor problem again. It can be written as follows:

minimize $c x + \mathbb{E}[Q(x; \omega)]$ subject to $x \ge 0$,

where $Q(x; \omega)$ is the optimal value of the following recourse problem:

$Q(x; \omega) \triangleq \min \{\, b y_\omega + h z_\omega \ :\ y_\omega \ge d_\omega - x,\ z_\omega \ge x - d_\omega,\ y_\omega, z_\omega \ge 0 \,\}.$

The problem $Q(x; \omega)$ represents the cost of responding to the uncertainty captured by realization $\omega$, given the first-stage decision $x$. This motivates a canonical form for the two-stage stochastic linear program:

minimize $c^T x + \mathbb{E}[Q(x; \xi)]$
subject to $A x = b$, $x \ge 0$,

where $Q(x; \xi)$ is the optimal value of the following second-stage recourse problem:

$Q(x; \xi) \triangleq \min \{\, q^T y_\xi \ :\ T x + W y_\xi = h,\ y_\xi \ge 0 \,\},$

and $\xi := (q, T, W, h)$ represents the data of the second-stage problem. We define $\mathcal{Q}(x)$, the expected cost of recourse, as follows: $\mathcal{Q}(x) \triangleq \mathbb{E}[Q(x; \xi)]$.
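A minimal sketch (again with hypothetical data) of evaluating the expected recourse cost $\mathcal{Q}(x)$ for the newsvendor instance: the second-stage LP is solved once per sampled scenario and the optimal values are averaged.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
b_cost, h_cost = 4.0, 1.5
demands = rng.exponential(scale=50.0, size=500)     # sampled scenarios

def recourse_value(x, d):
    """Q(x; omega): min b*y + h*z  s.t.  y >= d - x, z >= x - d, y, z >= 0."""
    # Rewritten for linprog's A_ub @ [y, z] <= b_ub form: -y <= x - d, -z <= d - x.
    res = linprog(c=[b_cost, h_cost],
                  A_ub=[[-1.0, 0.0], [0.0, -1.0]],
                  b_ub=[x - d, d - x],
                  bounds=[(0, None), (0, None)], method="highs")
    return res.fun

def expected_recourse(x, demands):
    return np.mean([recourse_value(x, d) for d in demands])

print("estimated expected recourse cost at x = 40:", expected_recourse(40.0, demands))
```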

A general model for stochastic optimization

A general model for stochastic optimization problems is given by the following. Given a random variable $\xi : \Omega \to \mathbb{R}^d$ and a function $f : X \times \mathbb{R}^d \to \mathbb{R}$, the stochastic optimization problem requires an $x$ that solves

(Stoch-Opt)   minimize $\mathbb{E}[f(x, \xi)]$ subject to $x \in X$.

This formulation includes the case $f(x, \xi) = c^T x + Q(x, \xi)$ as a special case.

Analysis of two-stage stochastic programming

1. Properties of $Q(x; \xi)$ (polyhedral, convex, etc.)
2. The expected recourse cost $\mathcal{Q}(x)$: discrete distributions; general distributions (convexity, continuity, Lipschitz continuity, etc.)
3. Optimality conditions
4. Extensions to convex regimes
5. Nonanticipativity
6. Value of perfect information

Decomposition methods for two-stage stochastic programming

1. Cutting-plane methods
2. Extensions to convex nonlinear regimes
3. Dual decomposition methods

Monte-Carlo sampling methods for convex stochastic optimization

1. Stochastic decomposition schemes for two-stage stochastic linear programs with general distributions
2. Sample-average approximation methods: consistency of estimators; convergence rates
3. Stochastic approximation methods: almost-sure convergence of iterates; non-asymptotic rates of convergence (a minimal sketch for the newsvendor problem follows this list)
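As referenced in item 3, the following sketch applies a projected stochastic subgradient (stochastic approximation) scheme to the newsvendor problem; the step-size rule, cost parameters, and demand distribution are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
c, b, h = 1.0, 4.0, 1.5

def sample_demand():
    return rng.exponential(scale=50.0)

# Projected stochastic approximation (stochastic subgradient) for
#   min_{x >= 0} E[ c*x + b*[d - x]_+ + h*[x - d]_+ ].
x = 0.0
for k in range(1, 20001):
    d = sample_demand()
    # A subgradient of f(x; omega) with respect to x:
    g = c - b * (d > x) + h * (x > d)
    x = max(x - (10.0 / k) * g, 0.0)        # diminishing step, projection onto x >= 0

print("stochastic approximation estimate of the optimal order quantity:", x)
```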

Robust optimization problems

Stochastic optimization relies on the availability of a distribution function. In many settings, this is not available; instead, we have access to a set for the uncertain parameters. In such instances, one avenue lies in solving a robust optimization problem.

Consider a linear optimization problem:

$\min_x \{\, c^T x \ :\ A x \ge b,\ x \ge 0 \,\}.$

The uncertain linear optimization problem is the family of problems

$\left\{ \min_x \{\, c^T x \ :\ A x \ge b,\ x \ge 0 \,\} \right\}_{(c, A, b) \in U},$

where $U$ denotes the uncertainty set associated with the data. The robust counterpart of this problem is given by

$\min_x \left\{ \hat{c}(x) \triangleq \sup_{(c, A, b) \in U} c^T x \ :\ A x \ge b,\ x \ge 0,\ \forall (c, A, b) \in U \right\}.$

This is effectively a problem in which the robust value of the objective is minimized over all robust feasible solutions; a robust feasible solution is an $x \ge 0$ such that $A x \ge b$ for all $(A, b) \in U$. It can be seen that the feasibility requirements lead to a semi-infinite optimization problem; in other words, there is an infinite number of constraints of the form $A x \ge b$, one for every $(A, b) \in U$. In addition, the objective is of a min-max form, leading to a challenging optimization problem.

Under some conditions on the uncertainty set, the robust optimization problem can be recast as a convex optimization problem and is deemed tractable. The first part of our study of robust optimization will analyze the development of tractable robust counterparts for a diverse set of uncertainty sets. In the second part of this topic, we will examine how chance constraints and their ambiguous variants can be captured via tractable problems.
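As a small concrete instance (entirely my own construction), consider row-wise interval uncertainty in $A$ only, with $c$ and $b$ known. Since $x \ge 0$, the worst case of each constraint $a_i^T x \ge b_i$ over the box is attained at the entrywise lower bound of $A$, so the robust counterpart collapses to a single deterministic LP.

```python
import numpy as np
from scipy.optimize import linprog

# Nominal data (hypothetical): min c^T x  s.t.  A x >= b, x >= 0.
c = np.array([2.0, 3.0])
A_nom = np.array([[1.0, 2.0],
                  [3.0, 1.0]])
b = np.array([4.0, 5.0])

# Row-wise interval uncertainty: each entry of A may drop by up to 10%.
A_lo = 0.9 * A_nom

# Since x >= 0, the worst case of a_i^T x over the box is attained at A_lo,
# so the robust counterpart is the deterministic LP: min c^T x s.t. A_lo x >= b.
res_nominal = linprog(c, A_ub=-A_nom, b_ub=-b, bounds=[(0, None)] * 2, method="highs")
res_robust = linprog(c, A_ub=-A_lo, b_ub=-b, bounds=[(0, None)] * 2, method="highs")

print("nominal optimal value:", res_nominal.fun)
print("robust  optimal value:", res_robust.fun)   # at least as large as the nominal value
```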

Stochastic variational inequality problems

Consider the convex optimization problem

(Opt)   $\min_{x \in X} f(x),$

where $f : X \to \mathbb{R}$ is a continuously differentiable function and $X$ is a closed and convex set. Then $x$ is a solution to (Opt) if and only if $x$ is a solution to a variational inequality problem, denoted by VI$(X, \nabla_x f)$. It may be recalled that VI$(X, F)$ requires an $x \in X$ such that

$(y - x)^T F(x) \ge 0, \quad \forall y \in X.$

Consider the stochastic generalization of (Opt) given by

(SOpt)   $\min_{x \in X} \mathbb{E}[f(x, \xi)],$

where $f : X \times \mathbb{R}^d \to \mathbb{R}$ is a convex function in $x$ and $\mathbb{E}[\cdot]$ denotes the expectation with respect to a probability distribution $\mathbb{P}$. The necessary and sufficient conditions of optimality of this problem are given by VI$(X, F)$, where $F(x) \triangleq \mathbb{E}[\nabla_x f(x, \xi)]$.

Variational inequality problems can capture the equilibrium conditions of optimization problems and convex Nash games. Additionally, they emerge in modeling a variety of problems, including traffic equilibrium problems, contact problems (in structural design), the pricing of American options, etc. Unfortunately, approaches for stochastic convex optimization cannot be directly expected to work on variational inequality problems.

Instead, we extend stochastic approximation schemes to accommodate monotone stochastic variational inequality problems. Recall that a map $F$ is monotone over $X$ if, for all $x, y \in X$, we have

$(y - x)^T (F(y) - F(x)) \ge 0.$
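To preview what such a scheme looks like, here is a minimal projection-based stochastic approximation sketch (my illustration, not an algorithm from the lecture) for a stochastic VI with a strongly monotone affine map $F(x) = Mx + b$ and $X = \mathbb{R}^n_+$; noisy evaluations of $F$ stand in for sampled data.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3

# Strongly monotone affine map F(x) = M x + b, with M positive definite.
B = rng.normal(size=(n, n))
M = B.T @ B + 0.5 * np.eye(n)
b = rng.normal(size=n)

def sampled_map(x):
    """Noisy evaluation of F(x): unbiased since the added noise has zero mean."""
    noise = 0.1 * rng.normal(size=(n, n))
    return (M + noise) @ x + b

project = lambda x: np.maximum(x, 0.0)       # Euclidean projection onto X = R^n_+

# Projected stochastic approximation: x_{k+1} = Pi_X(x_k - gamma_k * F(x_k; xi_k)).
x = np.zeros(n)
for k in range(1, 50001):
    gamma = 1.0 / (10.0 + k)                 # diminishing step-size sequence
    x = project(x - gamma * sampled_map(x))

# At a solution of VI(R^n_+, F), x and F(x) are complementary: min(x, F(x)) = 0.
print("approximate solution:", x)
print("complementarity residual:", np.abs(np.minimum(x, M @ x + b)).max())
```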