Robust and Stochastic Optimization Notes
Kevin Kircher, Cornell MAE

These are partial notes from ECE 6990, Robust and Stochastic Optimization, as taught by Prof. Eilyan Bitar at Cornell University in the fall of 2015. They cover three approaches to convex optimization with uncertain input data: robust convex programming, where a solution must be feasible for all possible realizations of the uncertain parameters; chance-constrained programming, where a solution must be feasible with high probability under the uncertain parameter distribution; and sampled convex programming, where a solution must be feasible for a number of independent samples of the uncertain parameters. These notes assume knowledge of convex optimization at the level of Boyd and Vandenberghe [1], with which we attempt to maintain consistent notation. Some course material, such as affine policies for multistage stochastic programming, has been omitted. If you find any errors (which are the fault of the scribe, not the lecturer), feel free to let Kevin know.

Contents

1 Three paradigms for optimization under uncertainty
  1.1 Robust convex programming
  1.2 Chance-constrained programming
  1.3 Sampled convex programming

2 Sample bounds for sampled convex programming
  2.1 A loose, general bound
  2.2 A tight bound for structured problems
  2.3 Comparison

3 Convex, conservative approximations to chance-constrained programs
  3.1 Generating functions
  3.2 Enlarging the inner approximation
  3.3 Joint chance constraints
  3.4 Bernstein approximations for affinely perturbed constraints

4 Tractable robust convex programs
  4.1 Robust linear programs

5 Affine policies for stochastic linear programs
  5.1 Primal affine approximation
  5.2 Dual affine approximation
  5.3 Example: a two-stage stochastic linear program

Chapter 1
Three paradigms for optimization under uncertainty

We consider the uncertain convex program

    minimize    cᵀx
    subject to  f(x, δ) ≤ 0
                x ∈ X, δ ∈ Δ,                (UCP)

where x ∈ Rⁿ is the optimization variable, δ ∈ R^d is an uncertain parameter, f : Rⁿ × R^d → R is convex in x, X ⊆ Rⁿ is closed and convex, and Δ ⊆ R^d is the set of all possible values of the uncertain parameter. This problem is ill-posed in its current form, since we have no information about δ other than that it belongs to Δ. We also have not specified for which δ the constraint f(x, δ) ≤ 0 must hold, e.g., for all δ ∈ Δ, for randomly selected δ with some probability, or for particular realizations of δ. Each of these approaches gives rise to a well-posed problem closely related to UCP.

Although it may seem restrictive to require a linear objective function and a single inequality constraint coupling x and δ, in fact UCP is general. To see this, consider the problem

    minimize    f_0(y, δ)
    subject to  f_i(y, δ) ≤ 0, i = 1, …, m
                y ∈ Y, δ ∈ Δ,                (1.1)

where y ∈ R^{n−1}, Y is closed and convex, and f_0, …, f_m are convex in y. The epigraph form of this problem is

    minimize    t
    subject to  f_0(y, δ) − t ≤ 0
                f_i(y, δ) ≤ 0, i = 1, …, m
                y ∈ Y, δ ∈ Δ.

This problem fits into the UCP framework with x = (y, t), X = Y × R, and

    f(x, δ) = max{ f_0(y, δ) − t, f_1(y, δ), …, f_m(y, δ) }.

Example (optimal control). A special case of UCP is the open-loop optimal control problem

    minimize    f_0(z_0, …, z_T, u_0, …, u_{T−1}, w_0, …, w_{T−1})
    subject to  z_{t+1} = A_t z_t + B_t u_t + C_t w_t
                u_t ∈ U_t
                z_{t+1} ∈ Z_{t+1},                (1.2)

where the constraints hold for t = 0, …, T−1 and T is the time horizon. At each time t, z_t ∈ R^{n_z} is the system state, u_t ∈ R^{n_u} is the control action, and w_t ∈ R^{n_w} is the disturbance. The initial state z_0 and the disturbances w_0, …, w_{T−1} are uncertain. The optimization variables are the controls u_0, …, u_{T−1} and the states z_1, …, z_T.

To see that problem (1.2) is a special case of UCP, we can write each equality constraint z_{t+1} = A_t z_t + B_t u_t + C_t w_t as the two inequality constraints

    z_{t+1} − (A_t z_t + B_t u_t + C_t w_t) ≤ 0  and  −z_{t+1} + A_t z_t + B_t u_t + C_t w_t ≤ 0.

With this transformation, problem (1.2) is a special case of problem (1.1) with y = (u_0, …, u_{T−1}, z_1, …, z_T), Y = U_0 × ⋯ × U_{T−1} × Z_1 × ⋯ × Z_T, δ = (z_0, w_0, …, w_{T−1}), and appropriately defined constraint functions f_i, i = 1, …, m. Since problem (1.1) reduces to UCP, so does the optimal control problem (1.2). The resulting UCP instance has n = T(n_u + n_z) + 1 optimization variables and d = n_z + T n_w uncertain parameters.

1.1 Robust convex programming

A robust convex program requires that the uncertain constraint f(x, δ) ≤ 0 in UCP hold for all possible δ:

    minimize    cᵀx
    subject to  f(x, δ) ≤ 0 for all δ ∈ Δ
                x ∈ X.                (RCP)

When one exists, we denote a solution of RCP by x*. RCP is a semi-infinite problem: the variable x is finite-dimensional, but the constraint must hold for all δ ∈ Δ, which in general imposes infinitely many constraints. The infinite constraint set makes RCP intractable in general. RCP is also conservative, since low-probability realizations of δ may significantly increase the optimal value. A goal of these notes is to derive convex, finite-dimensional approximations of RCP.

Example (minimax optimization). The minimax problem

    minimize    sup{ f(x, δ) : δ ∈ Δ }
    subject to  x ∈ X,

where f is convex in x and X is closed and convex, can be written as an RCP. To see this, consider the epigraph form

    minimize    t
    subject to  sup{ f(x, δ) : δ ∈ Δ } ≤ t
                x ∈ X.

The first constraint is equivalent to f(x, δ) ≤ t for all δ ∈ Δ, so we have an RCP.

Example (semidefinite programming). The standard form of a semidefinite program is

    minimize    cᵀx
    subject to  F(x) = F_0 + x_1 F_1 + ⋯ + x_n F_n ⪰ 0
                Ax = b,                (SDP)

where x ∈ Rⁿ, F_0, …, F_n ∈ Sⁿ, and A ∈ R^{m×n}. The constraint F(x) ⪰ 0 holds if and only if δᵀF(x)δ ≥ 0 for all δ with ‖δ‖_2 = 1. An equivalent problem, therefore, is the RCP

    minimize    cᵀx
    subject to  δᵀ(F_0 + x_1 F_1 + ⋯ + x_n F_n)δ ≥ 0 for all δ ∈ Δ
                Ax = b,

where Δ = {δ : ‖δ‖_2 = 1}. We will see that this problem can be solved approximately using linear programming. This gives a method for handling very large semidefinite programs, for which no efficient solver currently exists.

1.2 Chance-constrained programming

A chance-constrained program assumes a distribution on δ and bounds the probability of constraint violation by ε ∈ [0, 1]:

    minimize    cᵀx
    subject to  P{f(x, δ) ≤ 0} ≥ 1 − ε
                x ∈ X.                (CCP_ε)

When one exists, we denote a solution of CCP_ε by x*_ε. CCP_ε is equivalent to RCP for ε = 0, but is less conservative than RCP even for small ε > 0. For general constraint functions f and distributions on δ, CCP_ε is nonconvex. A goal of these notes is to derive convex, finite-dimensional inner approximations of CCP_ε.

Example (affine constraint, Gaussian parameter). In the special case of affine f, Gaussian δ, and ε ≤ 1/2, CCP_ε is convex. To see this, let f(x, δ) = δᵀx + b and δ ∼ N(µ, Σ). Then δᵀx + b ∼ N(µᵀx + b, xᵀΣx), and

    P{f(x, δ) ≤ 0} ≥ 1 − ε  ⟺  µᵀx + b + Φ⁻¹(1 − ε) ‖Cx‖_2 ≤ 0,

where CᵀC = Σ and Φ is the cumulative distribution function of a standard normal random variable. The constant Φ⁻¹(1 − ε) is nonnegative for ε ≤ 1/2, so this is a (convex) second-order cone constraint.

This example demonstrates that chance constraints can promote a deterministic problem to a more general, and likely harder to solve, problem class. For instance, if X is a polyhedron, then the chance constraint promotes a linear program to a second-order cone program.
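As a concrete illustration, the reformulated chance constraint can be passed directly to a conic solver. The sketch below uses cvxpy; all the data (c, µ, Σ, b, and a polyhedral X) are made-up placeholders, not from the notes, and ε ≤ 1/2 is assumed so the constraint is convex.

```python
import numpy as np
import cvxpy as cp
from scipy.stats import norm

n, eps = 3, 0.05
rng = np.random.default_rng(0)
c = rng.standard_normal(n)
mu = rng.standard_normal(n)              # E delta
G = rng.standard_normal((n, n))
Sigma = G @ G.T + np.eye(n)              # Cov delta
C = np.linalg.cholesky(Sigma).T          # C^T C = Sigma
b = 1.0
A = rng.standard_normal((5, n))          # X = {x : A x <= 1}

x = cp.Variable(n)
# P{delta^T x + b <= 0} >= 1 - eps  <=>  mu^T x + b + Phi^{-1}(1 - eps) ||C x||_2 <= 0
chance = mu @ x + b + norm.ppf(1 - eps) * cp.norm(C @ x, 2) <= 0
prob = cp.Problem(cp.Minimize(c @ x), [chance, A @ x <= 1])
prob.solve()
print(prob.status, prob.value)           # may be infeasible for unlucky random data
```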

1.3 Sampled convex programming

A sampled (or scenario, or random) convex program requires that the constraint be met for each of N independent, identically distributed samples δ_1, …, δ_N from the distribution of δ:

    minimize    cᵀx
    subject to  f(x, δ_i) ≤ 0, i = 1, …, N                (SCP_N)
                x ∈ X.

When one exists, we denote a solution of SCP_N by x*_N. SCP_N is a convex program in the same problem class as the nominal problem. Unlike CCP_ε, SCP_N requires no assumptions on the distribution of δ, only the ability to sample independently from it. SCP_N is also less conservative than RCP, and can accommodate unbounded uncertainty sets. A downside of SCP_N is that in general, its solutions are feasible for some δ ∈ Δ, but not all. Only as N → ∞ do the SCP_N solutions become feasible for RCP.
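A minimal sketch of SCP_N for the same affine uncertain constraint, again with illustrative data; the only requirement on δ is that we can draw i.i.d. samples of it.

```python
import numpy as np
import cvxpy as cp

n, N = 3, 500
rng = np.random.default_rng(1)
c = rng.standard_normal(n)
mu, Sigma = rng.standard_normal(n), np.eye(n)
b = 1.0
deltas = rng.multivariate_normal(mu, Sigma, size=N)   # N i.i.d. samples of delta

x = cp.Variable(n)
constraints = [deltas @ x + b <= 0]       # f(x, delta_i) <= 0 for every sample
constraints += [cp.norm(x, 2) <= 10]      # a simple closed, convex X
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print(prob.status, prob.value)
```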

Chapter 2
Sample bounds for sampled convex programming

How many samples must we take to guarantee that the SCP_N solutions are feasible for CCP_ε with high probability? More precisely, for what value of N can we certify that

    P{ P{f(x*_N, δ) ≤ 0} ≥ 1 − ε } ≥ 1 − β

for given ε, β ∈ [0, 1]? (The outer probability is over the samples δ_1, …, δ_N; the inner probability is over a new, independent realization of δ.) This chapter provides two lower bounds on the number of samples. The first bound is general but loose. The second bound is tight but requires some assumptions on the underlying UCP.

It is worth noting that the results in this chapter only concern feasibility. The optimality gap between the SCP_N and CCP_ε solutions is an active area of current research.

2.1 A loose, general bound

We begin by defining the violation probability of x ∈ X by

    V(x) = P{f(x, δ) > 0}.

A point x is feasible for CCP_ε if V(x) ≤ ε, so we seek a lower bound on N that guarantees P{V(x*_N) > ε} ≤ β. One bound follows immediately from Markov's inequality,

    P{V(x*_N) > ε} ≤ (1/ε) E V(x*_N).

In 2005, Calafiore and Campi [2] showed that

    E V(x*_N) ≤ n/(N + 1).                (2.1)

Plugging this into the Markov bound gives our main result:

    N ≥ n/(εβ) − 1  ⟹  P{V(x*_N) > ε} ≤ β.                (2.2)

We now show that inequality (2.1) holds. This requires a result from convex analysis involving the problem

    minimize    cᵀx
    subject to  x ∈ X_i, i = 1, …, m,                (2.3)

where x ∈ Rⁿ and the X_i are closed and convex. Let x* denote the solution to problem (2.3), and x*_{−k} the solution to the same problem with the kth constraint deleted. We call the constraint x ∈ X_k a support constraint if cᵀx*_{−k} < cᵀx*. The result, which we state without proof, is that problem (2.3) has at most n support constraints.

The expectation in (2.1) is taken over the samples δ^N = {δ_1, …, δ_N}. By the definition of V, we have

    E_{δ^N} V(x*_N) = E_{δ^N} P_δ{ f(x*_N, δ) > 0 } = E_{δ^N} E_δ[ I_{++}(f(x*_N, δ)) ],

where I_{++} is the indicator function of R_{++},

    I_{++}(z) = 0 if z ∉ R_{++},  1 if z ∈ R_{++}.

Now augment the sample set with one more independent sample δ_{N+1}, and let δ^{N+1} = δ^N ∪ {δ_{N+1}}. For k = 1, …, N+1, let δ^{−k} = δ^{N+1} \ {δ_k}, and denote by x*_{−k} the solution of the sampled convex program whose constraints are f(x, δ_i) ≤ 0 for the N samples δ_i ∈ δ^{−k}. Since the samples are independent and identically distributed, applying iterated expectation gives, for each k,

    E_{δ^N} V(x*_N) = E_{δ^{−k}} E[ I_{++}(f(x*_{−k}, δ_k)) | δ^{−k} ] = E_{δ^{N+1}} v_k,

where v_k = I_{++}(f(x*_{−k}, δ_k)). The variable v_k equals one only if the kth constraint is a support constraint of the sampled program built from all N+1 samples, so our support constraint result gives Σ_{k=1}^{N+1} v_k ≤ n with probability one. Averaging the equation E_{δ^N} V(x*_N) = E_{δ^{N+1}} v_k over k = 1, …, N+1 and using the linearity of expectation, we have

    E V(x*_N) = (1/(N+1)) Σ_{k=1}^{N+1} E_{δ^{N+1}} v_k = (1/(N+1)) E_{δ^{N+1}}[ Σ_{k=1}^{N+1} v_k ] ≤ n/(N+1).

Table 2.1: samples required for violation probability ε with confidence 1 − β, for n = 101 and β = 10⁻⁴.

    ε                  0.1     0.05    0.025   0.01    0.005   0.0025  0.001
    loose bound (2.2)  10⁷     2×10⁷   4×10⁷   10⁸     2×10⁸   4×10⁸   10⁹
    tight bound (2.5)  2204    4408    8817    22042   44084   88168   220421

2.2 A tight bound for structured problems

In this section, we present a tight bound on the required number of samples for a special class of sampled convex programs. This bound is highly desirable, since the required number of samples n/(εβ) − 1 from §2.1 is huge even for small problems and moderate risk parameters, as the following example shows.

Example (optimal control). We consider an instance of the optimal control problem (1.2) with a single-input, single-state system (n_u = n_z = 1) and a time horizon of T = 50 time steps. As discussed in Chapter 1, this problem can be written as an uncertain convex program with n = T(n_u + n_z) + 1 = 101 optimization variables. If the corresponding CCP_ε instance is intractable, we can generate an approximate solution x*_N by solving the corresponding SCP_N instance. If we require that x*_N be feasible for CCP_ε with 99.99% probability (β = 10⁻⁴), the bound from §2.1 suggests taking n/(ε × 10⁻⁴) − 1 samples. The first row of Table 2.1 shows this number for several values of ε. The second row shows the number of required samples with the tightened bound developed in this section.

In 2008, Campi and Garatti [3] showed that if for each N, the SCP_N derived from UCP has (i) a unique solution, and (ii) a feasible set with nonempty interior, then

    P{V(x*_N) > ε} ≤ Σ_{i=0}^{n−1} (N choose i) ε^i (1 − ε)^{N−i}.                (2.4)

Moreover, if UCP is fully supported, meaning that for all N ≥ n, SCP_N has exactly n support constraints, then the bound holds with equality. A proof of this result can be found in [3].

The right-hand side of inequality (2.4) is equal to P{X ≤ n − 1}, where X ∼ Bin(N, ε). The smallest N for which the right-hand side is no larger than β is upper bounded by

    inf{ N : Σ_{i=0}^{n−1} (N choose i) ε^i (1 − ε)^{N−i} ≤ β } ≤ (2/ε)( ln(1/β) + n ).

This gives our second result: if for all N, the SCP_N derived from UCP has a unique solution and a feasible set with nonempty interior, then

    N ≥ (2/ε)( ln(1/β) + n )  ⟹  P{V(x*_N) > ε} ≤ β.                (2.5)

2.3 Comparison

The bounds (2.2) and (2.5) both grow linearly with n and 1/ε, but depend on β as O(1/β) and O(ln(1/β)), respectively. This difference is substantial: for β = 10⁻⁴, for example, 1/β = 10⁴, but ln(1/β) ≈ 9.2. The logarithmic dependence of (2.5) on 1/β means that under mild assumptions and for reasonably small N, we can guarantee that the SCP_N solution is feasible for CCP_ε with high probability. By imposing further structure on f and the distribution of δ, the ε-dependence can also be tightened to O(ln(1/ε)).
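The sketch below, assuming scipy is available, reproduces the two bound rows of Table 2.1 and also finds, by bisection, the exact smallest N for which the binomial tail in (2.4) drops below β.

```python
import numpy as np
from scipy.stats import binom

n, beta = 101, 1e-4
for eps in [0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001]:
    N_loose = int(np.ceil(n / (eps * beta) - 1))                 # bound (2.2)
    N_tight = int(np.ceil((2 / eps) * (np.log(1 / beta) + n)))   # bound (2.5)
    # exact: smallest N with P{Bin(N, eps) <= n - 1} <= beta; the tail is
    # decreasing in N, and bound (2.5) guarantees it holds at N_tight
    lo, hi = n, N_tight
    while lo < hi:
        mid = (lo + hi) // 2
        if binom.cdf(n - 1, mid, eps) <= beta:
            hi = mid
        else:
            lo = mid + 1
    print(f"eps={eps}: loose {N_loose:.2e}, tight {N_tight}, exact {lo}")
```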

Chapter 3
Convex, conservative approximations to chance-constrained programs

Recall that the general form of the chance-constrained program is

    minimize    cᵀx
    subject to  P{f(x, δ) ≤ 0} ≥ 1 − ε
                x ∈ X.                (CCP_ε)

The chance constraint makes CCP_ε nonconvex in general. In this chapter, following §2 of Nemirovski and Shapiro's 2006 paper [4], we develop methods to construct convex subsets of the CCP_ε feasible region that lead to tractable optimization problems. This technique is conservative: solutions to the approximate problems are feasible but generally suboptimal for CCP_ε.

3.1 Generating functions

We begin by considering the set of variables that are feasible for the chance constraint,

    X_ε = {x : P{f(x, δ) ≤ 0} ≥ 1 − ε}.

The set X_ε is convex only in special cases, e.g., when f(x, δ) = aᵀx − δ and the distribution of δ is logconcave, or when f(x, δ) = δᵀx + b and δ is normal. In the general, nonconvex case, we seek a function g : Rⁿ → R such that {x : g(x) ≤ ε} is a convex subset of X_ε. This holds if

1. g is convex, and
2. for all x, P{f(x, δ) > 0} ≤ g(x).

Since P{f(x, δ) > 0} ≤ E I_+(f(x, δ)), condition 2 holds whenever

    E I_+(f(x, δ)) ≤ g(x),

where I_+ is the indicator function of R_+,

    I_+(z) = 0 if z < 0,  1 if z ≥ 0.

We can guarantee that E I_+(f(x, δ)) ≤ g(x) by ensuring that for all x and for all δ ∈ Δ,

    I_+(f(x, δ)) ≤ g(x).                (3.1)

To see this, consider the case where δ is a discrete random variable that takes value δ_i ∈ R^d with probability p_i for i = 1, …, r. In this case, if inequality (3.1) holds, then

    E I_+(f(x, δ)) = Σ_{i=1}^r p_i I_+(f(x, δ_i)) ≤ Σ_{i=1}^r p_i g(x) = g(x),

since p ⪰ 0 and 1ᵀp = 1. A similar argument can be made when δ is a continuous or mixed random variable.

To summarize, we seek a convex function that upper bounds I_+(f(x, δ)) for all x and δ. This motivates defining a family of generating functions ψ : R → R, which (i) are convex, nonnegative, and nondecreasing, and (ii) satisfy ψ(z) > ψ(0) = 1 for all z > 0. Because a convex nondecreasing function of a convex function is convex, the function x ↦ ψ(f(x, δ)) is convex in x for any generating function ψ.

Figure 3.1: the three generating functions e^{αz}, (1/α²)(z + α)²_+, and (1/α)(z + α)_+ (for α > 0) are all convex, nondecreasing, and upper bound the indicator function I_+(z) on R.

Because ψ is nonnegative, nondecreasing, and satisfies condition (ii), we also have ψ(z) ≥ I_+(z) for all z ∈ R, with equality at z = 0 (see Figure 3.1). For any generating function ψ, therefore, {x : E ψ(f(x, δ)) ≤ ε} is a convex subset of X_ε, and X ∩ {x : E ψ(f(x, δ)) ≤ ε} is a convex subset of the CCP_ε feasible region. We can generate an approximate solution to CCP_ε by solving the problem

    minimize    cᵀx
    subject to  E ψ(f(x, δ)) ≤ ε
                x ∈ X.

3.2 Enlarging the inner approximation

In §3.1 we developed a conservative approximation of CCP_ε. So far, we have said nothing about how good or bad this approximation may be. In this section, we improve the approximation by (loosely speaking) expanding our convex subset of the CCP_ε feasible region. To do this, we introduce a parameter t > 0 and replace f(x, δ) by f(x, δ)/t. The argument in §3.1 carries through exactly under this replacement, because an equivalent description of the set of variables that are feasible for the chance constraint is

    X_ε = { x : P{ (1/t) f(x, δ) ≤ 0 } ≥ 1 − ε }.

For any t > 0, therefore, {x : E ψ(f(x, δ)/t) ≤ ε} is a convex subset of X_ε. In the discussion following result (2.4) of [4], Nemirovski and Shapiro strengthen this statement to

    { x : inf_{t>0} [ t E ψ(f(x, δ)/t) − tε ] ≤ 0 } ⊆ X_ε.

Using this fact, a conservative approximation to CCP_ε is

    minimize    cᵀx
    subject to  inf_{t>0} { t E ψ(f(x, δ)/t) − tε } ≤ 0
                x ∈ X.                (3.2)

For problem (3.2) to be convex, the function

    x ↦ inf_{t>0} { t E ψ(f(x, δ)/t) − tε }

must be convex. Because partial minimization preserves convexity and the sum of a linear and a convex function is convex, this is true as long as (x, t) ↦ t E ψ(f(x, δ)/t) is a convex function. But (z, t) ↦ t ψ(z/t) is the perspective of the convex function ψ, so it is jointly convex and nondecreasing in z (see §3.2.6 of [1]); composing it with the function x ↦ f(x, δ), which is convex for each δ, and then taking the expectation over δ preserves convexity. Therefore, problem (3.2) is convex.

While convex, problem (3.2) may still be intractable: the expectation integral may be costly or impossible to compute for some distributions on δ, and the infimum may have a closed-form solution only for certain choices of ψ, as demonstrated in the following two examples.

Example (square-positive generating function). For the generating function ψ(z) = (1/α²)(z + α)²_+ (the squared positive part), we can derive a sufficient condition for the constraint

    inf_{t>0} { t E ψ(f(x, δ)/t) − tε } ≤ 0                (3.3)

in problem (3.2) to hold. We have

    t E ψ(f(x, δ)/t) = (t/α²) E[ (f(x, δ)/t + α)²_+ ]
                     = (1/(α² t)) E[ (f(x, δ) + αt)²_+ ]
                     ≤ (1/(α² t)) E[ (f(x, δ) + αt)² ],

where the second line follows from the fact that t > 0, and the third from the fact that z²_+ ≤ z² for all z. Expanding the quadratic term and simplifying, constraint (3.3) holds whenever

    inf_{t>0} { (1 − ε)t + (1/(α² t)) E[f(x, δ)²] + (2/α) E f(x, δ) } ≤ 0.                (3.4)

Applying the first-order optimality condition to the term inside the infimum (a convex function of t), we have

    t* = √( E[f(x, δ)²] / (α²(1 − ε)) ).

Substituting t* and simplifying, constraint (3.4) is equivalent to

    (2/α) [ √( (1 − ε) E[f(x, δ)²] ) + E f(x, δ) ] ≤ 0,                (3.5)

and since α > 0, the factor 2/α can be dropped. Importantly, this result depends only on the first two moments of f(x, δ). For example, suppose f(x, δ) = δᵀx, E δ = µ, and E δδᵀ = Σ ∈ Sⁿ_{++}. Then E f(x, δ) = µᵀx and E[f(x, δ)²] = xᵀΣx, so inequality (3.5) reduces to the second-order cone constraint

    √(1 − ε) ‖Σ^{1/2} x‖_2 + µᵀx ≤ 0.

This gives the following conservative approximation of CCP_ε:

    minimize    cᵀx
    subject to  √(1 − ε) ‖Σ^{1/2} x‖_2 + µᵀx ≤ 0
                x ∈ X.

Assuming X is described by a system of linear equalities and linear, convex quadratic, or second-order cone inequalities, this is a (tractable) second-order cone program. The only required distributional information is the first two moments of δ.
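A minimal sketch of this moment-based second-order cone approximation in cvxpy; the moments (µ, Σ) and the box used for X are made-up placeholders.

```python
import numpy as np
import cvxpy as cp

n, eps = 4, 0.05
rng = np.random.default_rng(2)
c = rng.standard_normal(n)
mu = rng.standard_normal(n)             # first moment E delta
G = rng.standard_normal((n, n))
Sigma = G @ G.T + np.eye(n)             # second moment E delta delta^T
C = np.linalg.cholesky(Sigma).T         # C^T C = Sigma, so ||C x||_2 = sqrt(x^T Sigma x)

x = cp.Variable(n)
soc = np.sqrt(1 - eps) * cp.norm(C @ x, 2) + mu @ x <= 0
prob = cp.Problem(cp.Minimize(c @ x), [soc, cp.norm(x, "inf") <= 1])
prob.solve()
print(prob.status, prob.value)
```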

Example (hinge generating function). With the generating function ψ(z) = (1/α)(z + α)_+, it is less clear how to ensure satisfaction of the constraint (3.3) in problem (3.2). We have

    t E ψ(f(x, δ)/t) = (t/α) E[ (f(x, δ)/t + α)_+ ]
                     = (1/α) E[ (f(x, δ) + αt)_+ ]
                     ≤ (1/α) E |f(x, δ) + αt|.

Thus, constraint (3.3) holds whenever

    inf_{t>0} { −tε + (1/α) E |f(x, δ) + αt| } ≤ 0.

While convex, the term inside the infimum is not differentiable, so we cannot apply the first-order optimality condition.

3.3 Joint chance constraints

We now extend the results in §3.2 to the case of multiple chance constraints. While this case can technically be reduced to the case of a single chance constraint by taking the pointwise maximum of the constraint functions, doing so may destroy useful structure. We consider the problem

    minimize    cᵀx
    subject to  P{f(x, δ) ⪯ 0} ≥ 1 − ε
                x ∈ X,

where f : Rⁿ × R^d → R^m is convex in x. Writing f(x, δ) = (f_1(x, δ), …, f_m(x, δ)), a point x is feasible for the chance constraint if

    P{ f_i(x, δ) ≤ 0 for all i = 1, …, m } ≥ 1 − ε

or, equivalently, if

    P{ f_i(x, δ) > 0 for some i = 1, …, m } ≤ ε.

But by the union bound,

    P{ f_i(x, δ) > 0 for some i } ≤ Σ_{i=1}^m P{f_i(x, δ) > 0},

so we can construct a conservative approximation of the chance constraint by bounding the right-hand probability. One way to accomplish this is to require P{f_i(x, δ) > 0} ≤ ε_i for i = 1, …, m, for some ε_1, …, ε_m ≥ 0 satisfying Σ_{i=1}^m ε_i ≤ ε. This gives the following conservative approximation of our jointly chance-constrained problem:

    minimize    cᵀx
    subject to  P{f_i(x, δ) ≤ 0} ≥ 1 − ε_i, i = 1, …, m
                x ∈ X.

Applying the results of §§3.1-3.2 to each chance constraint gives the following (again conservative) approximation:

    minimize    cᵀx
    subject to  inf_{t_i>0} { t_i E ψ_i(f_i(x, δ)/t_i) − t_i ε_i } ≤ 0, i = 1, …, m
                x ∈ X.

Remark. Specifying the generating functions ψ_1, …, ψ_m and bounds ε_1, …, ε_m are important modeling steps that can strongly influence the tractability and quality of this approximation scheme.

3.4 Bernstein approximations for affinely perturbed constraints

We now consider a special case of CCP_ε with f(x, δ) affine in δ:

    f(x, δ) = f_0(x) + Σ_{i=1}^d δ_i f_i(x).

Here the known, deterministic functions f_0, …, f_d : Rⁿ → R are convex on X. For i = 1, …, d, we assume

A1. the component δ_i of δ is independent of δ_j for all j ≠ i,
A2. the support Δ_i of δ_i is a compact set,
A3. the moment generating function M_i(t) = E e^{tδ_i} of δ_i is finite-valued for all t ∈ R, and
A4. if Δ_i contains a negative number, then f_i is affine on X.

Under these assumptions, the problem

    minimize    cᵀx
    subject to  inf_{t>0} { f_0(x) + t Σ_{i=1}^d ln M_i(f_i(x)/t) − t ln ε } ≤ 0
                x ∈ X

is a convex, conservative approximation to CCP_ε. One way to show this is to set ψ(z) = e^z and apply the results of §3.2. It can also be shown more directly, as follows. For any t > 0, the chance constraint P{f(x, δ) ≤ 0} ≥ 1 − ε can be written as

    P{ (1/t)( f_0(x) + Σ_{i=1}^d δ_i f_i(x) ) > 0 } ≤ ε

or, equivalently, as

    P{ exp( (1/t)( f_0(x) + Σ_{i=1}^d δ_i f_i(x) ) ) > 1 } ≤ ε.

By Markov's inequality,

    P{ exp( (1/t)( f_0(x) + Σ_i δ_i f_i(x) ) ) > 1 } ≤ E exp( (1/t)( f_0(x) + Σ_i δ_i f_i(x) ) )
                                                    = e^{f_0(x)/t} E Π_{i=1}^d e^{δ_i f_i(x)/t}
                                                    = e^{f_0(x)/t} Π_{i=1}^d M_i(f_i(x)/t),

where the last line follows from the mutual independence of δ_1, …, δ_d and the definition of M_i. Therefore, the chance constraint holds if there exists a t > 0 such that

    e^{f_0(x)/t} Π_{i=1}^d M_i(f_i(x)/t) ≤ ε

or, equivalently,

    f_0(x) + t Σ_{i=1}^d ln M_i(f_i(x)/t) ≤ t ln ε.

By a similar argument to the one mentioned in §3.2, we can sharpen this result to say that the chance constraint holds if

    inf_{t>0} { f_0(x) + t Σ_{i=1}^d ln M_i(f_i(x)/t) − t ln ε } ≤ 0.

The convexity of this constraint can be established using an argument similar to that in §3.2. It involves the perspective function and the convexity and monotonicity of logarithmic moment generating functions. Summarizing, a convex, conservative approximation to CCP_ε with f(x, δ) affine in δ is

    minimize    cᵀx
    subject to  inf_{t>0} { f_0(x) + t Σ_{i=1}^d ln M_i(f_i(x)/t) − t ln ε } ≤ 0
                x ∈ X.                (3.6)

The results in this section extend naturally to joint chance constraints.
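To make the Bernstein constraint concrete, the sketch below numerically evaluates the left-hand side of (3.6) at a fixed x for a hypothetical instance (not from the notes): affine f_0, f_i(x) = x_i, and independent δ_i uniform on [−1, 1], whose log-moment generating function is ln(sinh(s)/s). The infimum over t > 0 is computed with a scalar solver; a value ≤ 0 certifies that x satisfies the chance constraint.

```python
import numpy as np
from scipy.optimize import minimize_scalar

eps = 0.05
f0 = lambda x: x.sum() - 4.0        # f_0(x), affine for simplicity (an assumption)
f = lambda x, i: x[i]               # f_i(x) = x_i, so f(x, d) = f0(x) + d^T x

def log_mgf_uniform(s):
    # log E exp(s * delta_i) for delta_i ~ Uniform[-1, 1]: log(sinh(s)/s), even in s
    a = abs(s)
    if a < 1e-8:
        return a**2 / 6.0           # Taylor expansion near zero
    if a > 30.0:
        return a - np.log(2.0 * a)  # asymptotic form, avoids overflow in sinh
    return np.log(np.sinh(a) / a)

def bernstein(x):
    def obj(log_t):                 # optimize over t > 0 via t = exp(log_t)
        t = np.exp(log_t)
        return (f0(x) + t * sum(log_mgf_uniform(f(x, i) / t)
                                for i in range(len(x))) - t * np.log(eps))
    return minimize_scalar(obj, bounds=(-5, 5), method="bounded").fun

x = np.full(3, 0.2)
print(bernstein(x))   # <= 0 certifies P{f(x, delta) <= 0} >= 1 - eps
```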

Chapter 4
Tractable robust convex programs

In this chapter, we consider the following special case of the general robust convex program:

    minimize    cᵀx
    subject to  Ax ⪯_K b for all (A, b) ∈ Δ.                (4.1)

Here the generalized inequality Ax ⪯_K b means that b − Ax ∈ K, where K ⊆ R^m is a proper cone (i.e., K is a closed, convex cone with nonempty interior that contains no line). See §2.4 of [1] for more on cones and generalized inequalities. When the proper cone K is specified, problem (4.1) can be reduced to the canonical RCP form of §1.1 by writing the generalized inequality as a system of scalar convex inequalities and constraining the pointwise maximum of their left-hand sides. In order to derive tractable robust convex programs, however, it is useful to retain the conic structure of problem (4.1).

In these notes, we consider only the case K = R^m_+, so that problem (4.1) is a robust linear program. We enumerate the uncertainty sets that lead to tractable problems. The approach generalizes, with some modifications, to robust second-order cone programs and robust semidefinite programs (see chapters 6 and 8 of [5]).

4.1 Robust linear programs

In this section, we consider the case K = R^m_+, which gives the robust linear program

    minimize    cᵀx
    subject to  δ_iᵀx ≤ b_i for all δ_i ∈ Δ_i, i = 1, …, m,                (RLP)

where b_1, …, b_m ∈ R are known, deterministic constants. (The results in this section can be extended to the case of uncertain b_i.) We are interested in forms of the uncertainty sets Δ_i ⊆ Rⁿ that generate tractable optimization problems. Loosely speaking, this requires that the maximization problem generated by each robust constraint,

    maximize    δ_iᵀx
    subject to  δ_i ∈ Δ_i,

have a tractable dual. This holds in particular if each Δ_i is defined by a system of linear generalized inequalities,

    Δ_i = {δ_i : C_i δ_i ⪯_{K_i} d_i},

for some proper cone K_i and known, deterministic C_i ∈ R^{l_i×n} and d_i ∈ R^{l_i}.

have a tractable dual. This holds in particular if each i is defined by a set of linear generalized inequalities: i = {δ i C i δ i Ki d i }, for some proper cone K i and known, deterministic C i R l i n and d i R l i. 4.1.1 Polyhedral uncertainty sets A simple, solvable case of RLP involves polyhedral uncertainty sets: i = {δ i C i δ i d i }. (When applied to vectors, the symbol without subscript denotes generalized inequality with respect to the nonnegative orthant in other words, componentwise inequality.) The corresponding robust constraint is sup { δ T i x Ci δ i d i } bi. An x R n is feasible for this constraint if the optimal value of the following linear program in δ i is no larger than b i : maximize x T δ i subject to C i δ i d i. If i is nonempty, then this problem is feasible, so strong duality implies that its optimal value equals the optimal value of the dual linear program minimize subject to d T i λ i Ci T λ i = x λ i 0, with variable λ i R l i. Therefore, x is feasible for the i th robust constraint if there exists a λ i 0 satisfying C T i λ i = x and d T i λ i b i. We can write RLP with polyhedral uncertainty sets as subject to d T i λ i b i, i = 1,..., m Ci T λ i = d i, i = 1,..., m λ i 0, i = 1,..., m. (4.2) This problem remains a linear program, but the variable (x, λ 1,..., λ m ) has dimension n + l, where l = m l i. The robust constraint also introduces m inequality constraints, l equality constraints, and l nonnegativity constraints. 19

Figure 4.1: the ellipsoid Δ_i = {δ⁰_i + D_i u : ‖u‖_2 ≤ ρ_i} in R². The matrix (ρ_i D_i)² has eigenvalues λ_1 and λ_2 and eigenvectors v_1 and v_2.

4.1.2 Ellipsoidal uncertainty sets

Another solvable case of RLP arises when the uncertainty sets are ellipsoids:

    Δ_i = {δ⁰_i + D_i u : ‖u‖_2 ≤ ρ_i}.

Here the ellipsoid center is δ⁰_i. Its size and shape are determined by the known, deterministic scalar ρ_i > 0 and matrix D_i. Without loss of generality, we can assume D_i ∈ Sⁿ_+. The ellipsoid Δ_i in R² is illustrated in Figure 4.1. The corresponding robust constraint is sup{δ_iᵀx : δ_i ∈ Δ_i} ≤ b_i or, equivalently,

    xᵀδ⁰_i + sup{ xᵀD_i u : ‖u‖_2 ≤ ρ_i } ≤ b_i.

An x ∈ Rⁿ is feasible for this constraint if and only if the optimal value of the following second-order cone program in u is no larger than b_i − xᵀδ⁰_i:

    maximize    xᵀD_i u
    subject to  ‖u‖_2 ≤ ρ_i.

This problem maximizes a linear function over the Euclidean ball of radius ρ_i. The solution is the longest feasible vector in the direction of D_iᵀx, which is u* = (ρ_i/‖D_iᵀx‖_2) D_iᵀx. The optimal value is ρ_i ‖D_iᵀx‖_2, so x is feasible for the ith robust constraint if and only if ρ_i ‖D_iᵀx‖_2 ≤ b_i − xᵀδ⁰_i. We can therefore write RLP with ellipsoidal uncertainty sets as

    minimize    cᵀx
    subject to  (δ⁰_i)ᵀx + ρ_i ‖D_iᵀx‖_2 ≤ b_i, i = 1, …, m.                (4.3)

Unlike the polyhedral case, this problem has the same dimension as the original RLP. However, the ellipsoidal uncertainty sets promote the robust linear program to a second-order cone program, which is generally more costly to solve.
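The reformulation (4.3) is equally direct to implement. The sketch below builds the SOCP form in cvxpy for illustrative centers δ⁰_i, shape matrices D_i, and radii ρ_i.

```python
import numpy as np
import cvxpy as cp

n, m = 3, 2
rng = np.random.default_rng(3)
c = rng.standard_normal(n)
delta0 = [rng.standard_normal(n) for _ in range(m)]     # ellipsoid centers
D = [np.diag(rng.uniform(0.1, 1.0, n)) for _ in range(m)]  # shape matrices
rho = [0.5, 1.0]
b = [5.0, 5.0]

x = cp.Variable(n)
# (delta0_i)^T x + rho_i ||D_i^T x||_2 <= b_i, i = 1, ..., m
constraints = [delta0[i] @ x + rho[i] * cp.norm(D[i].T @ x, 2) <= b[i]
               for i in range(m)]
constraints.append(cp.norm(x, "inf") <= 3)              # the set X
prob = cp.Problem(cp.Minimize(c @ x), constraints)
prob.solve()
print(prob.status, prob.value)
```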

4.1.3 Generalization

The argument of §§4.1.1-4.1.2 can be generalized to accommodate uncertainty sets of the form

    Δ_i = {δ_i : C_i δ_i ⪯_{K_i} d_i},

where K_i is a proper cone and C_i ∈ R^{l_i×n} and d_i ∈ R^{l_i} are known. To see this, we note that x is feasible for the constraint δ_iᵀx ≤ b_i for all δ_i ∈ Δ_i if and only if the optimal value of the problem

    maximize    xᵀδ_i
    subject to  C_i δ_i ⪯_{K_i} d_i

is no larger than b_i. Assuming strong duality holds, an equivalent condition is that the optimal value of the dual

    minimize    d_iᵀλ_i
    subject to  C_iᵀλ_i = x
                λ_i ⪰_{K_i*} 0

is no larger than b_i. (Here K* = {y : xᵀy ≥ 0 for all x ∈ K} is the dual cone of K.) This holds if and only if there exists a λ_i ∈ K_i* satisfying d_iᵀλ_i ≤ b_i and C_iᵀλ_i = x. Therefore, the robust LP with uncertainty sets defined by general conic inequalities is equivalent to

    minimize    cᵀx
    subject to  d_iᵀλ_i ≤ b_i, i = 1, …, m
                C_iᵀλ_i = x, i = 1, …, m                (4.4)
                λ_i ⪰_{K_i*} 0, i = 1, …, m.

Here the optimization variable (x, λ_1, …, λ_m) has dimension n + l, where l = Σ_{i=1}^m l_i. The m robust constraints have been reduced to m scalar inequality constraints, m vector equality constraints, and m conic constraints.

It is easy to see that problem (4.4) reduces to problem (4.2) when K_i = R^{l_i}_+, i = 1, …, m, since the nonnegative orthant is self-dual. Problem (4.4) also reduces to problem (4.3) when each Δ_i is an ellipsoid represented conically using the second-order cone. To see this, recall that x satisfies the ith constraint of (4.3) if and only if

    ‖D_iᵀx‖_2 ≤ (1/ρ_i)( −(δ⁰_i)ᵀx + b_i ),

i.e., if and only if

    ( D_iᵀx, (1/ρ_i)( −(δ⁰_i)ᵀx + b_i ) ) ∈ SO^{n+1},

where SO^{n+1} is the second-order cone in R^{n+1}.

Chapter 5
Affine policies for stochastic linear programs

In this chapter, we consider the following situation. Given some matrices A, B, and C, we must decide a function x (also called a policy or decision rule). At some point in the future, an uncertain parameter δ will be revealed. At that time, we will incur a cost δᵀCᵀx(δ). Our goal is to minimize the expected value of this cost, subject to the constraint that our policy x must satisfy Ax(δ) ⪯ Bδ almost surely. Formally, we consider the single-stage stochastic linear program

    minimize    E[δᵀCᵀx(δ)]
    subject to  Ax(δ) + s(δ) = Bδ
                s(δ) ⪰ 0            } P-a.s.                (SP)

The optimization variables are x ∈ L²_{k,n} and s ∈ L²_{k,m}, where L²_{r,s} denotes the space of Borel-measurable, square-integrable functions from R^r to R^s. The notation P-a.s. means almost surely under the probability measure P. The matrices A ∈ R^{m×n}, B ∈ R^{m×k}, and C ∈ R^{n×k} are known exactly. The uncertain parameter δ ∈ R^k has support

    Δ = {δ : Wδ ⪰ h},

where

    W = [ e_1ᵀ ; −e_1ᵀ ; Ŵ ] ∈ R^{l×k},    h = [ 1 ; −1 ; 0 ] ∈ R^l,

for some Ŵ ∈ R^{(l−2)×k} (rows separated by semicolons). In other words, any δ ∈ Δ satisfies δ_1 = 1 and Ŵδ ⪰ 0.

Problem SP is intractable in general. In fact, even if δ is uniformly distributed on the unit hypercube, problem SP is known to be #P-hard (meaning, loosely speaking, that it is at least as hard as any problem in NP). Therefore, in §5.1, we will derive a tractable approximation of problem SP by restricting attention to policies that are affine in δ. In §5.2, we will employ similar methods to derive a bound on the suboptimality of our approximate solutions.

To facilitate these approximations, we assume

A1. Δ is nonempty and bounded, and
A2. Δ spans R^k.

Assumption A1 will allow us to apply linear programming strong duality in §5.1. Assumption A2 guarantees that the moment matrix M = E δδᵀ is positive definite (hence invertible). To see this, first note that M is positive semidefinite: for any v, vᵀMv = E vᵀδδᵀv = E (vᵀδ)² ≥ 0. To see that M is also positive definite, suppose that v ≠ 0. Then vᵀMv = 0 only if vᵀδ = 0 for all δ ∈ Δ. But Δ spans R^k by assumption A2, so this can hold only if v = 0, a contradiction. Thus, vᵀMv > 0 for all v ≠ 0.

5.1 Primal affine approximation

In this section, we develop a method for solving problem SP approximately. We restrict attention to linear policies

    x(δ) = Xδ  and  s(δ) = Sδ,

for some X ∈ R^{n×k} and S ∈ R^{m×k}. Since δ_1 = 1 with probability one, a linear policy in δ is actually affine in the truly uncertain parameters δ_2, …, δ_k; hence the title of this section.

With the restriction to linear policies, we can simplify the objective function of problem SP and derive tractable reformulations of its constraints. The objective function becomes

    E[δᵀCᵀx(δ)] = E[δᵀCᵀXδ] = E tr(δᵀCᵀXδ) = E tr(δδᵀCᵀX) = tr(MCᵀX).

Since linear functions are continuous, the equality constraint

    Ax(δ) + s(δ) = Bδ  P-a.s.                (5.1)

is equivalent to

    AXδ + Sδ = Bδ for all δ ∈ Δ  ⟺  (AX + S − B)δ = 0 for all δ ∈ Δ.

But Δ spans R^k by assumption A2, so this holds if and only if

    AX + S = B.                (5.2)

This is a finite-dimensional reformulation of the original equality constraint (5.1), with x(·) and s(·) restricted to be linear functions. Similarly, linearity (hence continuity) of s implies that the inequality constraint

    s(δ) ⪰ 0  P-a.s.                (5.3)

is equivalent to

    s_iᵀδ ≥ 0 for all δ ∈ Δ, i = 1, …, m,

where S = [s_1 ⋯ s_m]ᵀ. But by the definition of Δ, the constraint s_iᵀδ ≥ 0 for all δ ∈ Δ holds if and only if the optimal value of the linear program

    minimize    s_iᵀδ
    subject to  Wδ ⪰ h

is nonnegative. The dual of this linear program is

    maximize    hᵀλ_i
    subject to  Wᵀλ_i = s_i
                λ_i ⪰ 0.

Since Δ is nonempty and bounded by assumption A1, strong duality implies that s_iᵀδ ≥ 0 for all δ ∈ Δ if and only if there exists a λ_i ∈ R^l_+ such that Wᵀλ_i = s_i and hᵀλ_i ≥ 0. Applying this result for each i = 1, …, m, the constraint (5.3) holds if and only if there exist λ_1, …, λ_m ∈ R^l such that

    λ_iᵀW = s_iᵀ, i = 1, …, m
    λ_iᵀh ≥ 0, i = 1, …, m
    λ_i ⪰ 0, i = 1, …, m.

This system can be written more compactly by introducing a matrix Λ = [λ_1 ⋯ λ_m]ᵀ. With this notation, the constraint (5.3) holds if and only if there exists a Λ ∈ R^{m×l} such that

    ΛW = S,  Λh ⪰ 0,  Λ_{ij} ≥ 0 for i = 1, …, m, j = 1, …, l.

Collecting our objective function and constraint reformulations, we have the following primal linear approximation of problem SP:

    minimize    tr(MCᵀX)
    subject to  AX + ΛW = B
                Λh ⪰ 0                (SP_u)
                Λ_{ij} ≥ 0, i = 1, …, m, j = 1, …, l.

This is a finite-dimensional, deterministic linear program with optimization variables X ∈ R^{n×k} and Λ ∈ R^{m×l}. Because we have reduced the search space from square-integrable functions to linear functions, the optimal value of problem SP_u is an upper bound on the optimal value of problem SP.
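A minimal sketch of SP_u as a deterministic LP, on a hypothetical toy instance (not from the notes) with one constraint and δ = (1, δ_2), δ_2 ∼ Uniform[0, 1]. The instance is chosen so that the optimal policy x(δ) = δ_2 is itself linear, so SP_u should recover the exact optimal value −5/6.

```python
import numpy as np
import cvxpy as cp

n, m, k, l = 1, 1, 2, 4
A = np.array([[1.0]])
B = np.array([[0.0, 1.0]])                   # constraint: x(delta) <= delta_2
Cmat = np.array([[-1.0, -1.0]])              # cost: -(1 + delta_2) * x(delta)
W = np.array([[1, 0], [-1, 0], [0, 1], [1, -1]], dtype=float)
h = np.array([1.0, -1.0, 0.0, 0.0])          # delta_1 = 1, 0 <= delta_2 <= 1
M = np.array([[1, 1 / 2], [1 / 2, 1 / 3]])   # moment matrix E[delta delta^T]

X = cp.Variable((n, k))
Lam = cp.Variable((m, l), nonneg=True)       # Lambda_ij >= 0
prob = cp.Problem(cp.Minimize(cp.trace(M @ Cmat.T @ X)),
                  [A @ X + Lam @ W == B, Lam @ h >= 0])
prob.solve()
print(prob.value)   # expect -5/6
```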

5.2 Dual affine approximation

In §5.1, we derived a primal approximation to the single-stage stochastic program SP. The derivation involved constructing an inner approximation of the SP feasible region by restricting the search space from square-integrable functions to linear ones. The resulting approximation provides an upper bound on the optimal value of SP. Because the primal linear approximation is tractable, it also provides a means of generating solutions to SP that, while suboptimal, might be quite good.

In this section, we develop a means of quantifying how good the primal linear approximate solutions are. This involves constructing a lower bound on the SP optimal value by dualizing its equality constraints and imposing another linearity restriction. Once we have derived a lower bound, we can compare it to the upper bound from the primal linear approximation. The difference between the upper and lower bounds is itself an upper bound on the suboptimality of the primal linear approximate solutions.

We begin by dualizing the equality constraints in problem SP, giving the equivalent reformulation

    minimize    E[δᵀCᵀx(δ)] + sup_{y ∈ L²_{k,m}} E[ y(δ)ᵀ(Ax(δ) + s(δ) − Bδ) ]
    subject to  s(δ) ⪰ 0  P-a.s.                (5.4)

Here the equivalence follows from the fact that the supremum over y yields zero whenever Ax(δ) + s(δ) = Bδ with probability one, and +∞ otherwise. We seek a tractable problem whose optimal value lower bounds the optimal value of problem (5.4). To that end, we note that

    sup_{y ∈ L²_{k,m}} E[ y(δ)ᵀ(Ax(δ) + s(δ) − Bδ) ] ≥ sup_{Y ∈ R^{m×k}} E[ δᵀYᵀ(Ax(δ) + s(δ) − Bδ) ].                (5.5)

In other words, the optimal value of problem (5.4) cannot increase when the dual search space is restricted from square-integrable functions y to linear functions y(δ) = Yδ. The right-hand side of inequality (5.5) can be simplified:

    E[ δᵀYᵀ(Ax(δ) + s(δ) − Bδ) ] = E tr( δᵀYᵀ(Ax(δ) + s(δ) − Bδ) ) = tr( Yᵀ E[(Ax(δ) + s(δ) − Bδ)δᵀ] ).

Therefore, the optimal value of problem SP cannot be less than the optimal value of the problem

    minimize    E[δᵀCᵀx(δ)] + sup_{Y ∈ R^{m×k}} tr( Yᵀ E[(Ax(δ) + s(δ) − Bδ)δᵀ] )
    subject to  s(δ) ⪰ 0  P-a.s.                (5.6)

The supremum in problem (5.6) yields zero whenever E[(Ax(δ) + s(δ) − Bδ)δᵀ] = 0 and yields +∞ otherwise. Therefore, we can replace the supremum with the constraint

    E[(Ax(δ) + s(δ) − Bδ)δᵀ] = 0  ⟺  A E[x(δ)δᵀ] + E[s(δ)δᵀ] = BM,

where the equivalence follows from the linearity of expectation and the definition of the moment matrix M. Recall that x : R^k → Rⁿ and s : R^k → R^m are general square-integrable functions, so we have no clean expressions for the terms E[x(δ)δᵀ] and E[s(δ)δᵀ]. However, let us change variables to matrices X ∈ R^{n×k} and S ∈ R^{m×k}, defined such that

    XM = E[x(δ)δᵀ],    SM = E[s(δ)δᵀ].

We have shown that M is positive definite (hence invertible), so the matrices X and S are uniquely determined by the functions x and s. With this change of variables, the constraint A E[x(δ)δᵀ] + E[s(δ)δᵀ] = BM is equivalent to (AX + S − B)M = 0. Since M is invertible, this holds if and only if

    AX + S = B.

Given XM = E[x(δ)δᵀ], we also have that

    E[δᵀCᵀx(δ)] = tr( Cᵀ E[x(δ)δᵀ] ) = tr(CᵀXM) = tr(MCᵀX).

Therefore, an equivalent reformulation of problem (5.6) is

    minimize    tr(MCᵀX)
    subject to  AX + S = B
                ∃ x ∈ L²_{k,n} such that XM = E[x(δ)δᵀ]                (5.7)
                ∃ s ∈ L²_{k,m} such that SM = E[s(δ)δᵀ] and s(δ) ⪰ 0 P-a.s.

Problem (5.7) is still intractable, due to the second and third constraints. However, the second constraint is redundant and can be deleted: given any X ∈ R^{n×k}, we can produce an x ∈ L²_{k,n} with XM = E[x(δ)δᵀ] by setting x(δ) = Xδ. The third constraint can be replaced by (W − he_1ᵀ)MSᵀ ≥ 0, interpreted componentwise; a proof of this fact can be found in §2.4 of [6]. Thus, the optimal value of the problem

    minimize    tr(MCᵀX)
    subject to  AX + S = B                (SP_l)
                ((W − he_1ᵀ)MSᵀ)_{ij} ≥ 0, i = 1, …, l, j = 1, …, m

lower bounds the optimal value of problem SP. Like the primal linear approximation SP_u, the dual linear approximation SP_l is a finite-dimensional linear program, with optimization variables X ∈ R^{n×k} and S ∈ R^{m×k}.
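A matching sketch of SP_l for the same toy instance used in the SP_u sketch of §5.1. Its optimal value lower bounds the SP optimal value, and here it should coincide with the SP_u value of −5/6, certifying that the affine policy is exactly optimal for this instance.

```python
import numpy as np
import cvxpy as cp

n, m, k, l = 1, 1, 2, 4
A = np.array([[1.0]])
B = np.array([[0.0, 1.0]])
Cmat = np.array([[-1.0, -1.0]])
W = np.array([[1, 0], [-1, 0], [0, 1], [1, -1]], dtype=float)
h = np.array([1.0, -1.0, 0.0, 0.0])
M = np.array([[1, 1 / 2], [1 / 2, 1 / 3]])
e1 = np.array([1.0, 0.0])

X = cp.Variable((n, k))
S = cp.Variable((m, k))
cone = (W - np.outer(h, e1)) @ M @ S.T >= 0   # replaces s(delta) >= 0 P-a.s.
prob = cp.Problem(cp.Minimize(cp.trace(M @ Cmat.T @ X)),
                  [A @ X + S == B, cone])
prob.solve()
print(prob.value)   # expect -5/6
```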

The primal and dual linear approximations give the following method for solving problem SP approximately and bounding the suboptimality gap.

1. Compute a solution (X_u, Λ_u) of the primal linear approximation SP_u.

2. Compute a solution (X_l, S_l) of the dual linear approximation SP_l.

3. Observe that

    tr(MCᵀX_l) ≤ E[δᵀCᵀx*(δ)] ≤ tr(MCᵀX_u),

where x* ∈ L²_{k,n} is a solution of problem SP. The suboptimality of the affine policy x(δ) = X_u δ is therefore at most tr( MCᵀ(X_u − X_l) ).

5.3 Example: a two-stage stochastic linear program

We consider the problem

    minimize    cᵀx + E[δᵀDᵀy(δ)]
    subject to  A_x x + A_y y(δ) + s(δ) = Bδ
                s(δ) ⪰ 0            } P-a.s.                (5.9)

Here we must decide both the vector x ∈ R^{n_x} and the policy y ∈ L²_{k,n_y}. At some point in the future, we will realize δ and incur the cost cᵀx + δᵀDᵀy(δ). Our goal is to choose x and y to minimize the expected cost, subject to the constraint that A_x x + A_y y(δ) ⪯ Bδ almost surely. The problem data are A_x ∈ R^{m×n_x}, A_y ∈ R^{m×n_y}, B ∈ R^{m×k}, c ∈ R^{n_x}, and D ∈ R^{n_y×k}, as well as Ŵ ∈ R^{(l−2)×k}, which defines the support of δ, and the moment matrix M ∈ S^k_{++}.

The primal affine approximation of this problem is

    minimize    cᵀx + tr(MDᵀY)
    subject to  A_x x e_1ᵀ + A_y Y + ΛW = B
                Λh ⪰ 0                (5.10)
                Λ_{ij} ≥ 0, i = 1, …, m, j = 1, …, l,

with variables x ∈ R^{n_x}, Y ∈ R^{n_y×k}, and Λ ∈ R^{m×l}. The dual affine approximation is

    minimize    cᵀx + tr(MDᵀY)
    subject to  A_x x e_1ᵀ + A_y Y + S = B                (5.11)
                ((W − he_1ᵀ)MSᵀ)_{ij} ≥ 0, i = 1, …, l, j = 1, …, m,

with variables x ∈ R^{n_x}, Y ∈ R^{n_y×k}, and S ∈ R^{m×k}.

We will compare the affine policies to a simple approximation method called sample average approximation. In this method, we generate independent samples δ_1, …, δ_N from the distribution of δ, then solve the problem

    minimize    cᵀx + (1/N) Σ_{i=1}^N δ_iᵀDᵀy_i
    subject to  A_x x + A_y y_i ⪯ Bδ_i, i = 1, …, N.                (5.12)

Here the variables are x ∈ R^{n_x} and y_1, …, y_N ∈ R^{n_y}. With probability one, the true realization of δ will not equal any of the samples δ_i, so implementing the sample average approximation policy requires interpolation.

Figure 5.1: feasible region (black) and cost vector (c, Dδ) = (1, −(1 + δ_2)) (red) for δ_2 = 0 (left) and δ_2 = 1/2 (right). Different values of δ_2 give different optimal vertices.

For large N, the optimal value of the sample average approximation approaches the optimal value of the true problem (5.9). The sample average approximation policy is much more costly to compute than the affine policies, however: problem (5.12) has n_x + N n_y variables and mN constraints, compared to n_x + k n_y + ml variables and l + m + lm constraints in problem (5.10), and n_x + k n_y + km variables and lm + m constraints in problem (5.11).

5.3.1 Problem instance

We consider a simple problem instance with k = 2 and δ uniformly distributed on

    Δ = {δ : δ_1 = 1, 0 ≤ δ_2 ≤ 1},

so that l = 4 and

    Ŵ = [0 1; 1 −1],    M = [1 1/2; 1/2 1/3].

The dimensions are n_x = n_y = 1, with constraint matrices

    A_x = [−1; 0; −2],    A_y = [0; 1; 1],    B = [0 0; 2 0; 1 1],

giving the feasible region

    { (x, y) ∈ R × L²_{2,1} : x ≥ 0, y(δ) ≤ min{2, 1ᵀδ + 2x} P-a.s. }.

We consider costs c = 1 and D = −1ᵀ. Figure 5.1 shows the problem geometry for two fixed values of the uncertain parameter δ_2. Figure 5.2 shows the optimal values of the approximation problems. Figure 5.3 shows a histogram of the costs incurred by each policy over 100,000 Monte Carlo simulations.
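A sketch of the sample average approximation (5.12) for this instance, assuming cvxpy. For large N its optimal value should approach that of (5.9) and land between the primal and dual affine bounds, as in Figure 5.2.

```python
import numpy as np
import cvxpy as cp

Ax = np.array([[-1.0], [0.0], [-2.0]])   # constraint matrices from Section 5.3.1
Ay = np.array([[0.0], [1.0], [1.0]])
B = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])
c = 1.0
D = -np.ones((1, 2))                     # so delta^T D^T = -(1 + delta_2)

N = 2000
rng = np.random.default_rng(0)
deltas = np.column_stack([np.ones(N), rng.uniform(0.0, 1.0, N)])  # delta_1 = 1

x = cp.Variable()
y = cp.Variable(N)                       # y_i = y(delta_i), one recourse per sample
coeff = (deltas @ D.T).flatten()         # cost coefficients -(1 + delta_2i)
rhs = deltas @ B.T                       # row i is (B delta_i)^T
constraints = [Ax[j, 0] * x + Ay[j, 0] * y <= rhs[:, j] for j in range(3)]
prob = cp.Problem(cp.Minimize(c * x + coeff @ y / N), constraints)
prob.solve()
print(prob.value)                        # should approach the SP optimal value
```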

Figure 5.2: for large N, the optimal value of the sample average approximation problem (5.12) (red) approaches the optimal value of the true problem (5.9), which lies between the primal affine upper bound and the dual affine lower bound.

Figure 5.3: cost histogram of the three policies over 100,000 Monte Carlo runs. In the sample average policy, we compute y(δ) by linearly interpolating the solutions y_1, …, y_N corresponding to the samples δ_1, …, δ_N.

Bibliography

[1] Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[2] Calafiore, Giuseppe, and Marco C. Campi. "Uncertain convex programs: randomized solutions and confidence levels." Mathematical Programming 102.1 (2005): 25-46.

[3] Campi, Marco C., and Simone Garatti. "The exact feasibility of randomized solutions of uncertain convex programs." SIAM Journal on Optimization 19.3 (2008): 1211-1230.

[4] Nemirovski, Arkadi, and Alexander Shapiro. "Convex approximations of chance constrained programs." SIAM Journal on Optimization 17.4 (2006): 969-996.

[5] Ben-Tal, Aharon, Laurent El Ghaoui, and Arkadi Nemirovski. Robust Optimization. Princeton University Press, 2009.

[6] Kuhn, Daniel, Wolfram Wiesemann, and Angelos Georghiou. "Primal and dual linear decision rules in stochastic and robust optimization." Mathematical Programming 130.1 (2011): 177-209.