Lecture 1 Stochastic Optimization: Introduction January 8, 2018
Optimization

Concerned with minimization/maximization of mathematical functions, often subject to constraints.

Euler (1707-1783): "Nothing at all takes place in the universe in which some rule of the maximum or minimum does not apply."

An important tool in the analysis/design/control/simulation of physical, economic, chemical, and biological systems.

Workflow: model $\to$ apply algorithm $\to$ check solution.
Unconstrained optimization

\[
\text{(Unconstrained)} \qquad \min_{x \in \mathbb{R}^n} f(x),
\]
where the feasible set is all of $\mathbb{R}^n$ (i.e., $X = \mathbb{R}^n$).

Example: $f(x) = x^3 - 3x^2$.

Important application: data fitting and regression.
Unconstrained optimization: An example

Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^n$ ($n$ records, with dependent variable $y_i$ and independent variables $x_{i1}, \ldots, x_{ip}$), the linear regression model assumes that the relationship between the dependent variable $y$ and the independent variables is linear. This relation is captured as follows:
\[
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i, \qquad i = 1, \ldots, n,
\]
where $\epsilon_i$ denotes a random variable. More compactly, we may state this as
\[
y = X\beta + \epsilon, \qquad \text{where } y \triangleq \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X \triangleq \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix}.
\]
Then the least-squares estimator $\hat{\beta}$ is defined as follows:
\[
\hat{\beta} = \operatorname*{argmin}_{\beta} \; \|X\beta - y\|^2.
\]
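As a minimal numerical sketch (not part of the lecture), the least-squares estimator above can be computed with NumPy on synthetic data; all data, dimensions, and parameter values below are illustrative.

```python
import numpy as np

# Hypothetical data: n = 50 records, p = 2 regressors plus an intercept column.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # first column carries beta_0
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + 0.01 * rng.normal(size=n)               # small noise eps_i

# Least-squares estimator: beta_hat = argmin_beta ||X beta - y||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With small noise, `beta_hat` recovers the coefficients used to generate the data.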
Convex optimization

\[
\text{(Convex)} \qquad \min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to } x \in X,
\]
where $X$ is a convex set and $f$ is a convex function.

Definition 1 (Convexity of sets and functions). A set $X \subseteq \mathbb{R}^n$ is a convex set if $x_1, x_2 \in X$ implies $(\lambda x_1 + (1-\lambda) x_2) \in X$ for all $\lambda \in [0,1]$. A function $f$ is said to be convex if
\[
f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2), \qquad \forall \lambda \in [0,1].
\]
A function $f$ is said to be strictly convex if, for all $x_1 \neq x_2$,
\[
f(\lambda x_1 + (1-\lambda) x_2) < \lambda f(x_1) + (1-\lambda) f(x_2), \qquad \forall \lambda \in (0,1).
\]
A function $f$ is said to be strongly convex with parameter $\mu$ if
\[
f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2) - \tfrac{1}{2}\mu\lambda(1-\lambda)\|x_1 - x_2\|^2, \qquad \forall \lambda \in [0,1].
\]
Note that in the above definitions $f$ does not need to be differentiable.

Definition 2 (Convexity of differentiable functions). Consider a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$. The function $f$ is said to be convex if
\[
f(x_2) \ge f(x_1) + \nabla_x f(x_1)^T (x_2 - x_1), \qquad \forall x_1, x_2 \in \mathbb{R}^n.
\]
A function $f$ is said to be strongly convex with parameter $\mu$ if
\[
(\nabla_x f(x_1) - \nabla_x f(x_2))^T (x_1 - x_2) \ge \mu \|x_1 - x_2\|^2, \qquad \forall x_1, x_2 \in \mathbb{R}^n.
\]
Any local solution of (Convex) is a global solution.

Examples of convex sets:
1. Linear constraints: $X \triangleq \{x : Ax = b, \, x \ge 0\}$
2. Convex quadratic constraints: $X \triangleq \{x : \sum_{i=1}^{N} (x_i - a_i)^2 \le b\}$.

Examples of convex functions:
1. $f(x) = e^x$.
2. $f(x) = \tfrac{1}{2} x^T Q x + c^T x$, where $Q \succeq 0$.

Applications: controller design, constrained least-squares, etc.
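The definitions above can be checked numerically. The sketch below (illustrative, not from the lecture) verifies the strong-convexity inequality for the quadratic $f(x) = \tfrac{1}{2} x^T Q x + c^T x$ with $Q$ positive definite, for which $f$ is strongly convex with parameter $\mu = \lambda_{\min}(Q)$; the matrices and sample counts are arbitrary choices.

```python
import numpy as np

# For f(x) = (1/2) x^T Q x + c^T x with Q positive definite, f is strongly
# convex with mu = lambda_min(Q). Check the inequality at random points.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, -2.0])
f = lambda x: 0.5 * x @ Q @ x + c @ x
mu = np.linalg.eigvalsh(Q).min()

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    x1, x2 = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform()
    lhs = f(lam * x1 + (1 - lam) * x2)
    rhs = (lam * f(x1) + (1 - lam) * f(x2)
           - 0.5 * mu * lam * (1 - lam) * (x1 - x2) @ (x1 - x2))
    ok = ok and lhs <= rhs + 1e-9   # inequality must hold up to rounding
```

For quadratics the inequality is exact, since $f(\lambda x_1 + (1-\lambda)x_2) = \lambda f(x_1) + (1-\lambda) f(x_2) - \tfrac{1}{2}\lambda(1-\lambda)(x_1-x_2)^T Q (x_1-x_2)$ and $(x_1-x_2)^T Q (x_1-x_2) \ge \mu \|x_1-x_2\|^2$.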
Nonlinear program

\[
\text{(NLP)} \qquad \min_{x \in X} f(x),
\]
where $f : X \to \mathbb{R}$ is a possibly nonconvex objective function, $x \in \mathbb{R}^n$ are the decision variables, and $X \subseteq \mathbb{R}^n$ is a possibly nonconvex set.

Applications: nonlinear regression, process control in chemical engineering, etc.
Discrete optimization

\[
\text{(Discrete)} \qquad \min_{x} f(x) \quad \text{subject to } x \in Z,
\]
where $Z$ is a finite set, implying that $x$ can only take on discrete values, e.g., $x \in \{0, 1\}$. Sometimes $x_1 \in \mathbb{R}$ and $x_2 \in \{0, 1\}$; the resulting problem is called a mixed-integer problem.

Applications: facility location problems, unit commitment problems.
Convex optimization relevance in this course

Stochastic optimization captures a broad class of problems, including convex, nonconvex (time permitting), and discrete optimization problems (not considered here). In this course, we focus on the following:

1. Convex stochastic optimization problems (including stochastic programs with recourse)
2. Monotone stochastic variational inequality problems (these subsume stochastic convex optimization and capture stochastic Nash games, stochastic contact problems, and stochastic traffic equilibrium problems)
3. Robust optimization problems

Applications: statistical learning problems.

Convexity is crucial and will be leveraged extensively during the course!
Problems complicated by uncertainty

In the aforementioned (deterministic) problems, parameters are known with certainty. Specifically, given a function $f(x; \xi)$, we consider two possibilities:

$\xi$ is a random variable. Our focus is then on solving the following:
\[
\text{(Stoch-Opt)} \qquad \min_{x \in X} \mathbb{E}[f(x, \xi)].
\]
$\xi$ is unavailable and instead we have that $\xi \in U$ (where $U$ is an uncertainty set). A problem of interest is then:
\[
\text{(Robust-Opt)} \qquad \min_{x \in X} \max_{\xi \in U} f(x, \xi).
\]
We motivate this line of questioning by considering the classical newsvendor problem.
A short detour: Probability Spaces

Throughout this course, we will utilize the notion of a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. This mathematical construct captures processes (either real or synthetic) that are characterized by randomness. The space is constructed for a particular such process, and on every occasion this process is examined, both the set of outcomes and the associated probabilities are the same.

The sample space $\Omega$ is a nonempty set that denotes the set of outcomes; each outcome represents a single execution of the experiment.

The $\sigma$-algebra $\mathcal{F}$ denotes the set of events, where each event is a set containing zero or more outcomes.
The assignment of probabilities to the events is captured by $\mathbb{P}$. Once the space $(\Omega, \mathcal{F}, \mathbb{P})$ is established, nature selects an outcome $\omega$ from $\Omega$. As a consequence, all events that contain $\omega$ as one of their outcomes are said to have occurred. If nature selects outcomes infinitely often, then the relative frequency of occurrence of a particular event corresponds with the value specified by the probability measure $\mathbb{P}$.
Properties of $\mathcal{F}$:
1. $\Omega \in \mathcal{F}$.
2. $\mathcal{F}$ is closed under complementation: $A \in \mathcal{F} \implies (\Omega \setminus A) \in \mathcal{F}$.
3. $\mathcal{F}$ is closed under countable unions: $A_i \in \mathcal{F}$ for $i = 1, 2, \ldots$ implies that $\left(\bigcup_{i=1}^{\infty} A_i\right) \in \mathcal{F}$.

Properties of $\mathbb{P}$: The probability measure $\mathbb{P} : \mathcal{F} \to [0,1]$ is such that
1. $\mathbb{P}$ is countably additive: if $\{A_i\}_{i=1}^{\infty} \subseteq \mathcal{F}$ denotes a countable collection of pairwise disjoint sets ($A_i \cap A_j = \emptyset$ for $i \neq j$), then $\mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i)$.
2. The measure of the sample space is one, i.e., $\mathbb{P}(\Omega) = 1$.
A short detour: Probability Spaces II

Example 1. Single coin toss: $\Omega \triangleq \{H, T\}$. The $\sigma$-algebra $\mathcal{F}$ contains $2^2 = 4$ events:
\[
\mathcal{F} \triangleq \{\emptyset, \{H\}, \{T\}, \{H, T\}\}.
\]
Furthermore, $\mathbb{P}(\emptyset) = 0$, $\mathbb{P}(\{H\}) = 0.5$, $\mathbb{P}(\{T\}) = 0.5$, and $\mathbb{P}(\{H, T\}) = 1$.

Example 2. Double coin toss: $\Omega \triangleq \{HH, HT, TH, TT\}$.
The $\sigma$-algebra $\mathcal{F}$ contains $2^4 = 16$ events:
\[
\begin{aligned}
\mathcal{F} \triangleq \{&\emptyset, \{HH\}, \{HT\}, \{TH\}, \{TT\}, \{HH, HT\}, \{HH, TH\}, \{HH, TT\}, \{HT, TH\}, \{HT, TT\}, \{TH, TT\},\\
&\{HH, HT, TH\}, \{HH, HT, TT\}, \{HH, TH, TT\}, \{HT, TH, TT\}, \{HH, HT, TH, TT\}\}.
\end{aligned}
\]
Furthermore, for a fair coin, $\mathbb{P}(\emptyset) = 0$, $\mathbb{P}(A_1) = 0.25$ for every one-outcome event $A_1$, $\mathbb{P}(A_2) = 0.5$ for every two-outcome event $A_2$, $\mathbb{P}(A_3) = 0.75$ for every three-outcome event $A_3$, and $\mathbb{P}(\Omega) = 1$.
Random variables

Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, a random variable is a measurable function on the sample space. Specifically, $X$ is a random variable defined as $X : \Omega \to E$, where $E$ is a measurable space. Consequently,
\[
\mathbb{P}(X \in S) = \mathbb{P}(\{\omega \in \Omega \mid X(\omega) \in S\}).
\]
Example (coin tossing): define $X(\omega)$ as follows:
\[
X(\omega) = \begin{cases} 100, & \omega = H, \\ -100, & \omega = T. \end{cases}
\]
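The coin-toss random variable above can be simulated; the sketch below (an illustrative addition, assuming a fair coin) estimates $\mathbb{E}[X] = 0$ by Monte-Carlo sampling.

```python
import numpy as np

# Simulate the coin-toss random variable X(H) = 100, X(T) = -100
# under a fair coin; the sample mean estimates E[X] = 0.
rng = np.random.default_rng(0)
outcomes = rng.choice(["H", "T"], size=100_000)
X = np.where(outcomes == "H", 100.0, -100.0)
estimate = X.mean()   # should be close to E[X] = 0
```

The standard error here is $100/\sqrt{100{,}000} \approx 0.32$, so the estimate lands near zero.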
Example: The Newsvendor Problem

Suppose a company has to decide its order quantity $x$, given a demand $d$. The cost is given by
\[
f(x, d) \triangleq cx + \underbrace{b[d - x]_+}_{\text{back-order cost}} + \underbrace{h[x - d]_+}_{\text{holding cost}},
\]
where $b$ is the back-order penalty and $h$ is the holding cost. In such an instance, the firm will solve the problem:
\[
\min_{x \ge 0} f(x, d).
\]
The Newsvendor Problem

More specifically, suppose demand is a random variable, defined as $d_\omega \triangleq d(\omega)$, where $d : \Omega \to \mathbb{R}_+$ is a random variable and $\Omega$ is the sample space. Furthermore, suppose $(\Omega, \mathcal{F}, \mathbb{P})$ denotes the associated probability space, where $\mathbb{P}$ denotes the probability distribution. Then the (random) cost associated with demand $d_\omega$ is given by
\[
f(x; \omega) \triangleq cx + \underbrace{b[d_\omega - x]_+}_{\text{back-order cost}} + \underbrace{h[x - d_\omega]_+}_{\text{holding cost}}.
\]
We assume for the present that $\mathbb{P}$ is known; then, the firm may minimize its expected (averaged) cost, given by
\[
\min_{x \ge 0} \mathbb{E}[f(x; \omega)],
\]
where $\mathbb{E}[\cdot]$ is the expectation with respect to $\mathbb{P}$.
The Newsvendor Problem

This is an instance of a two-stage problem with recourse:
1. First-stage decision: order quantity $x$.
2. Second-stage, $\omega$-specific recourse decisions: $y_\omega = [d_\omega - x]_+$ and $z_\omega = [x - d_\omega]_+$.

Recourse decisions can be taken upon revelation of uncertainty; first-stage decisions have to be taken prior to this revelation.
A Scenario-based Approach

In practice, analytical solutions of this problem are complicated by the presence of an expectation (integral). One avenue, a scenario-based approach, requires obtaining $K$ samples from $\Omega$, denoted by $d(\omega_1), \ldots, d(\omega_K)$ or $d_1, \ldots, d_K$. The recourse-based problem is then given by
\[
\min_{x \ge 0} \; \sum_{k=1}^{K} p_k f(x; \omega_k).
\]
Note that
\[
f(x; \omega) = cx + b[d_\omega - x]_+ + h[x - d_\omega]_+ = \max\big((c - b)x + b d_\omega, \; (c + h)x - h d_\omega\big).
\]
Consequently, the scenario-based problem can be written as
\[
\begin{aligned}
\min_{x, v_1, \ldots, v_K} \quad & \sum_{k=1}^{K} p_k v_k \\
\text{subject to} \quad & v_k \ge (c - b)x + b d_k, \qquad k = 1, \ldots, K, \\
& v_k \ge (c + h)x - h d_k, \qquad k = 1, \ldots, K, \\
& x \ge 0.
\end{aligned}
\]
This is a linear program with one possible challenge: as $K$ grows, it becomes increasingly difficult to solve directly.
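The scenario LP above can be assembled directly with `scipy.optimize.linprog`; the cost parameters and demand samples below are illustrative choices, not data from the lecture.

```python
import numpy as np
from scipy.optimize import linprog

# Scenario LP for the newsvendor problem; variables are (x, v_1, ..., v_K).
c_ord, b_pen, h_hold = 1.0, 3.0, 0.5   # order cost c, back-order penalty b, holding cost h
d = np.array([10.0, 20.0, 30.0])       # demand samples d_k
p = np.full(len(d), 1.0 / len(d))      # scenario probabilities p_k
K = len(d)

obj = np.concatenate(([0.0], p))       # minimize sum_k p_k v_k
A_ub = np.zeros((2 * K, K + 1))
b_ub = np.zeros(2 * K)
for k in range(K):
    # v_k >= (c - b) x + b d_k   <=>   (c - b) x - v_k <= -b d_k
    A_ub[2 * k, 0], A_ub[2 * k, k + 1], b_ub[2 * k] = c_ord - b_pen, -1.0, -b_pen * d[k]
    # v_k >= (c + h) x - h d_k   <=>   (c + h) x - v_k <= h d_k
    A_ub[2 * k + 1, 0], A_ub[2 * k + 1, k + 1], b_ub[2 * k + 1] = c_ord + h_hold, -1.0, h_hold * d[k]

bounds = [(0, None)] + [(None, None)] * K   # x >= 0, the v_k are free
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_star = res.x[0]                           # optimal order quantity
```

For these parameters the critical fractile $(b - c)/(b + h) = 2/3.5 \approx 0.57$ points at the middle demand sample, so the LP orders $x^\star = 20$.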
A two-stage linear program

Consider the newsvendor problem again. It can be written as follows:
\[
\min_{x \ge 0} \; cx + \mathbb{E}[Q(x; \omega)],
\]
where $Q(x; \omega)$ is the optimal value of the following recourse problem:
\[
\begin{aligned}
Q(x; \omega) \triangleq \min_{y_\omega, z_\omega} \quad & b y_\omega + h z_\omega \\
\text{subject to} \quad & y_\omega \ge d_\omega - x, \\
& z_\omega \ge x - d_\omega, \\
& y_\omega, z_\omega \ge 0.
\end{aligned}
\]
The problem $Q(x; \omega)$ represents the cost of responding to the uncertainty captured by realization $\omega$, given the first-stage decision $x$. This motivates a canonical form for the two-stage stochastic linear program:
\[
\begin{aligned}
\min_{x} \quad & c^T x + \mathbb{E}[Q(x; \xi)] \\
\text{subject to} \quad & Ax = b, \\
& x \ge 0,
\end{aligned}
\]
where $Q(x; \xi)$ is the optimal value of the following second-stage recourse problem:
\[
\begin{aligned}
Q(x; \xi) \triangleq \min_{y_\xi} \quad & q^T y_\xi \\
\text{subject to} \quad & Tx + W y_\xi = h, \\
& y_\xi \ge 0,
\end{aligned}
\]
and $\xi := (q, T, W, h)$ represents the data of the second-stage problem. We define $\mathcal{Q}(x)$, the expected cost of recourse, as follows:
\[
\mathcal{Q}(x) \triangleq \mathbb{E}[Q(x; \xi)].
\]
A general model for stochastic optimization

A general model for stochastic optimization problems is given by the following. Given a random variable $\xi : \Omega \to \mathbb{R}^d$ and a function $f : X \times \mathbb{R}^d \to \mathbb{R}$, the stochastic optimization problem requires an $x$ that solves
\[
\text{(Stoch-opt)} \qquad \min_{x \in X} \; \mathbb{E}[f(x, \xi)].
\]
This model includes the case where $f(x, \xi) = c^T x + Q(x, \xi)$ as a special case.
Analysis of two-stage stochastic programming

1. Properties of $Q(x; \xi)$ (polyhedrality, convexity, etc.)
2. Expected recourse cost $\mathcal{Q}(x)$:
   - Discrete distributions
   - General distributions (convexity, continuity, Lipschitz continuity, etc.)
3. Optimality conditions
4. Extensions to convex regimes
5. Nonanticipativity
6. Value of perfect information
Decomposition methods for two-stage stochastic programming

1. Cutting-plane methods
2. Extensions to convex nonlinear regimes
3. Dual decomposition methods
Monte-Carlo sampling methods for convex stochastic optimization

1. Stochastic decomposition schemes for two-stage stochastic linear programs with general distributions
2. Sample-average approximation methods
   - Consistency of estimators
   - Convergence rates
3. Stochastic approximation methods
   - Almost-sure convergence of iterates
   - Non-asymptotic rates of convergence
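As a small preview of stochastic approximation (a sketch under illustrative assumptions, not the course's formal algorithm statement), consider $\min_x \mathbb{E}[(x - \xi)^2]$ with $\xi \sim N(1, 1)$, whose solution is $x^\star = \mathbb{E}[\xi] = 1$. A projected-gradient-free SA iteration with steps $\gamma_k = 1/(2k)$ reduces to a running sample mean:

```python
import numpy as np

# Stochastic approximation for min_x E[(x - xi)^2], xi ~ N(1, 1).
# The stochastic gradient of (x - xi)^2 is 2 (x - xi); with step sizes
# gamma_k = 1/(2k), the iterate is exactly the running sample mean of xi.
rng = np.random.default_rng(0)
x = 0.0
for k in range(1, 100_001):
    xi = rng.normal(loc=1.0, scale=1.0)   # one sample of xi per iteration
    grad = 2.0 * (x - xi)                 # stochastic gradient sample
    x -= grad / (2.0 * k)                 # gamma_k = 1/(2k)
```

After $10^5$ iterations the iterate sits within a few hundredths of $x^\star = 1$.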
Robust optimization problems

Stochastic optimization relies on the availability of a distribution. In many settings, this is not available; instead, we have access to a set for the uncertain parameter. In such instances, one avenue lies in solving a robust optimization problem. Consider a linear optimization problem:
\[
\min_{x} \{ c^T x : Ax \le b, \; x \ge 0 \}.
\]
The uncertain linear optimization problem is given by the family
\[
\Big\{ \min_{x} \{ c^T x : Ax \le b, \; x \ge 0 \} \Big\}_{(c, b, A) \in U},
\]
where $U$ denotes the uncertainty set associated with the data. The robust counterpart of this problem is given by
\[
\min_{x} \Big\{ \hat{c}(x) \triangleq \sup_{(c, b, A) \in U} c^T x \; : \; Ax \le b, \; x \ge 0, \; \forall (c, b, A) \in U \Big\}.
\]
This is effectively a problem in which the robust value of the objective is minimized over all robust feasible solutions; a robust feasible solution is defined as an $x$ such that $Ax \le b$ and $x \ge 0$ for all $(A, b) \in U$. Note that the feasibility requirements lead to a semi-infinite optimization problem; in other words, there is an infinite number of constraints of the form $Ax \le b$, one for every $(A, b) \in U$. In addition, the objective is of a min-max form, leading to a challenging optimization problem.

Under some conditions on the uncertainty set, the robust optimization problem can be recast as a convex optimization problem and is deemed tractable. The first part of our study of robust optimization will analyze the development of tractable robust counterparts for a diverse set of uncertainty sets. In the second part of this topic, we will examine how chance constraints and their ambiguous variants can be captured via a tractable problem.
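A minimal illustration of a tractable robust counterpart (an added sketch with illustrative data, assuming box uncertainty on the cost vector only): when $c \in [c_0 - \delta, c_0 + \delta]$ and $x \ge 0$, we have $\sup_c c^T x = (c_0 + \delta)^T x$, so the robust counterpart is again an ordinary LP with the worst-case cost vector.

```python
import numpy as np
from scipy.optimize import linprog

# Box uncertainty on c only: c in [c0 - delta, c0 + delta]. For x >= 0,
# sup_c c^T x = (c0 + delta)^T x, so the robust counterpart is an LP.
c0 = np.array([1.0, 2.0])
delta = np.array([1.5, 0.1])             # asymmetric uncertainty: c0[0] is very uncertain
A_eq = np.array([[1.0, 1.0]])            # feasible set: x1 + x2 = 1, x >= 0
b_eq = np.array([1.0])
bounds = [(0, None)] * 2

nominal = linprog(c0, A_eq=A_eq, b_eq=b_eq, bounds=bounds)          # ignores uncertainty
robust = linprog(c0 + delta, A_eq=A_eq, b_eq=b_eq, bounds=bounds)   # worst-case cost vector
```

The nominal problem picks $x = (1, 0)$ (cheapest on average), while the robust counterpart picks $x = (0, 1)$, avoiding the highly uncertain first coordinate.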
Stochastic variational inequality problems

Consider the convex optimization problem given by
\[
\text{(Opt)} \qquad \min_{x \in X} f(x),
\]
where $f : X \to \mathbb{R}$ is a continuously differentiable function and $X$ is a closed and convex set. Then $x$ is a solution to (Opt) if and only if $x$ is a solution to a variational inequality problem, denoted by VI$(X, \nabla_x f)$. It may be recalled that VI$(X, F)$ requires an $x \in X$ such that
\[
(y - x)^T F(x) \ge 0, \qquad \forall y \in X.
\]
Consider the stochastic generalization of (Opt) given by
\[
\text{(SOpt)} \qquad \min_{x \in X} \mathbb{E}[f(x, \xi)],
\]
where $f : X \times \mathbb{R}^d \to \mathbb{R}$ is a convex function and $\mathbb{E}[\cdot]$ denotes the expectation with respect to a probability distribution $\mathbb{P}$. The necessary and sufficient conditions of optimality of this problem are given by VI$(X, F)$, where $F(x) \triangleq \mathbb{E}[\nabla_x f(x, \xi)]$.

Variational inequality problems can capture the equilibrium conditions of optimization problems and convex Nash games. Additionally, they emerge in modeling a variety of problems, including traffic equilibrium problems, contact problems (in structural design), the pricing of American options, etc. Unfortunately, approaches for stochastic convex optimization cannot be directly expected to work on variational inequality problems.
Instead, we extend stochastic approximation schemes to accommodate monotone stochastic variational inequality problems. Recall that a map $F$ is monotone over $X$ if for all $x, y \in X$, we have that
\[
(y - x)^T (F(y) - F(x)) \ge 0.
\]
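A minimal sketch (illustrative data, not from the lecture) of the classical projection method for a strongly monotone VI: iterate $x_{k+1} = \Pi_X(x_k - \gamma F(x_k))$, which converges to the solution of VI$(X, F)$ for a sufficiently small step $\gamma$.

```python
import numpy as np

# Projection method for VI(X, F) with the strongly monotone affine map
# F(x) = A x + b (A symmetric positive definite) on the box X = [0, 1]^2.
A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3, so F is strongly monotone
b = np.array([-2.0, -2.0])
F = lambda x: A @ x + b
proj = lambda x: np.clip(x, 0.0, 1.0)    # Euclidean projection onto the box

x = np.zeros(2)
gamma = 0.2                              # step size small enough for contraction
for _ in range(200):
    x = proj(x - gamma * F(x))
```

Here $F(x^\star) = 0$ at $x^\star = (2/3, 2/3)$, which lies inside the box, so $x^\star$ solves VI$(X, F)$ and the iterates contract to it geometrically.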