A Two-Stage Moment Robust Optimization Model and its Solution Using Decomposition


Sanjay Mehrotra and He Zhang

July 23, 2013

Abstract. Moment robust optimization models formulate a stochastic problem whose uncertain probability distribution of parameters is described by its moments. In this paper we study a two-stage stochastic convex programming model that uses moments to define the probability ambiguity sets for the objective function coefficients in both stages. A decomposition based algorithm is given, and we show that this two-stage model can be solved to any precision in polynomial time. We consider a special case where the probability ambiguity sets are described by exact information on the first two moments and the convex functions are piecewise linear utility functions. A two-stage stochastic semidefinite programming formulation of this problem is given, and we provide computational results on its performance using a portfolio optimization application. Results show that the two-stage modeling is effective when the forecasting models have predictive power.

(Research partially supported by NSF grant CMMI-1100868 and ONR grants N00014-09-10518 and N00014210051. Both authors are with the Dept. of Industrial Engineering & Management Sciences, Northwestern University, Evanston, IL 60201.)

1 Introduction

Moment robust optimization models specify information on the distribution of the uncertain parameters using moments of the probability distribution of these parameters; the probability distribution itself is not known. Scarf [16] proposed such a model for the newsvendor problem, where the given information is the mean and variance of the distribution. Recently, different forms of distributional ambiguity sets have been considered. Bertsimas et al. [2] studied a piecewise linear utility model with exact knowledge of the first two moments. Two cases are considered in [2]: (i) the uncertain coefficients are in the objective function, and (ii) the uncertain coefficients are in the right-hand side of the constraints. For the first case, Bertsimas et al. [2] give an equivalent semidefinite programming (SDP) formulation. When the uncertain coefficients are in the right-hand side of the constraints, they show that the robust model is NP-complete. Since the uncertain parameters appear only in the second stage problem, their model can be considered as a single stage moment robust optimization problem.

Delage and Ye [5] also considered such a single stage moment robust convex optimization program, with the ambiguity set defined by a confidence region for both the first and second moments. They show that this problem can be formulated as a semi-infinite convex optimization problem, which can then be solved by the ellipsoid method in polynomial time. Alternatives to using moments to specify the distributional uncertainty have also been proposed. Pflug and Wozabal [14] analyzed the portfolio selection problem with the ambiguity set defined by a confidence region around a reference probability measure using the Kantorovich distance. Shapiro and Ahmed [17] analyzed a class of convex optimization problems with the ambiguity set defined by general moment constraints and bounds over the probability measures. Mehrotra and Zhang [13] give conic reformulations of the ambiguity models in [5, 14, 17] for the distributionally robust least-squares problem. In this paper we consider a two-stage moment robust stochastic convex optimization problem given as:

\min_{x \in X} \; G(x) := f(x) + \mathcal{G}(x), \quad (1.1)

\mathcal{G}(x) := \sum_{k=1}^{K} \pi_k G_k(x), \quad (1.2)

G_k(x) := \min_{w_k \in W_k(x)} g_k(w_k). \quad (1.3)

The objective functions f(x) and g_k(w_k) are defined as:

f(x) := \max_{P \in \mathcal{P}_1} E_P[\rho_1(x, \tilde{p})], \quad (1.4)

g_k(w_k) := \max_{P \in \mathcal{P}_{2,k}} E_P[\rho_2(w_k, \tilde{q})], \quad (1.5)

where \rho_1(\cdot) and \rho_2(\cdot) are two general functions and the expectations in (1.4) and (1.5) are taken with respect to the random vectors \tilde{p} and \tilde{q}. S_1 and S_2 are the first and second stage sample spaces, and \mathcal{P}_{2,k} may depend on k. Let \mathcal{M}_1 and \mathcal{M}_2 be the sets of measures defined on S_1 and S_2 with the Borel \sigma-algebra. The probability ambiguity sets \mathcal{P}_1 and \mathcal{P}_{2,k} are defined as:

\mathcal{P}_1 := \{P : P \in \mathcal{M}_1, \; E_P[1] = 1, \; (E_P[\tilde{p}] - \mu_1)^T \Sigma_1^{-1} (E_P[\tilde{p}] - \mu_1) \le \alpha_1, \; E_P[(\tilde{p} - \mu_1)(\tilde{p} - \mu_1)^T] \preceq \beta_1 \Sigma_1\}, \quad (1.6)

\mathcal{P}_{2,k} := \{P : P \in \mathcal{M}_2, \; E_P[1] = 1, \; (E_P[\tilde{q}] - \mu_{2,k})^T \Sigma_{2,k}^{-1} (E_P[\tilde{q}] - \mu_{2,k}) \le \alpha_{2,k}, \; E_P[(\tilde{q} - \mu_{2,k})(\tilde{q} - \mu_{2,k})^T] \preceq \beta_{2,k} \Sigma_{2,k}\}, \quad (1.7)

where \mu_1, \Sigma_1, \mu_{2,k}, \Sigma_{2,k} are the moment parameters used to define the ambiguity sets \mathcal{P}_1 and \mathcal{P}_{2,k}. This definition is the same as the one used by Delage and Ye [5]. The feasible set X is a convex compact set and the second stage feasible set W_k(x) is defined as:

W_k(x) := \{w_k : W_k w_k = h_k - T_k x, \; w_k \in \mathcal{W}\}, \quad (1.8)

where \mathcal{W} is a nonempty convex compact set in R^{n_2}, x \in R^{n_1} and w_k \in R^{n_2}. The situation leading to the modeling framework (1.1)-(1.8) is one where the current parameters of a decision problem are uncertain, and this uncertainty propagates through a known stochastic process,

leading to future uncertain parameters. For example, the estimated current returns of stocks are ambiguous, and the portfolio needs to be rebalanced at a future step where the returns are also ambiguous. In (1.1)-(1.3) we have moment robust problems in both stages, and the parameter estimates of the moments for the second stage problem (1.3) are uncertain when the decision maker optimizes the first-stage problem (1.1). Note that although the parameters W_k, T_k and h_k are random in this model, in view of the NP-completeness results for the single stage case [2], in the current paper we do not define an ambiguity set for these parameters. Only the parameters of the convex objective functions of the first and second stage problems are defined over an ambiguity set.

In this paper we develop a decomposition based algorithm to solve (1.1)-(1.8). We show that (1.1)-(1.8) can be solved in polynomial time under suitable, yet general, assumptions. For the single stage moment robust problem in [5], a key requirement is that a semi-infinite constraint can be verified, or an oracle gives a cut, in polynomial time. Since the constraint can only be verified to \epsilon-precision in our case, we need further development to prove the polynomial solvability of (1.1)-(1.8). The \epsilon-precision only allows an \epsilon-precision verification of constraint feasibility, and we can only generate an approximate cut to separate infeasible points. Both facts suggest that we need to use approximate separation oracles to prove polynomial solvability. We also study a two-stage moment robust portfolio optimization model, and empirical results suggest that the two-stage modeling framework is effective when we have forecasting power.

This paper is organized as follows. In Section 2 we give additional notation, the necessary definitions, and the assumptions for (1.1)-(1.8). In Section 3 we develop an equivalent formulation of the two-stage framework (1.1)-(1.3). In Section 4 we present some results for convex optimization problems needed to calculate \epsilon-sub- and supergradients. Section 5 gives the analysis of an ellipsoidal decomposition algorithm for (1.1)-(1.8). In Section 6 a two-stage moment robust portfolio optimization model is considered. We use data to study the effectiveness of the two-stage model. We compare our two-stage model with two other models, and experimental results suggest that our two-stage model has better performance when we have forecasting power.

2 Definitions and Assumptions

In this section we summarize several definitions and assumptions used in the rest of this paper. Throughout, we use the phrase polynomial time to refer to computations performed using a number of elementary operations that is polynomial in the problem dimension, the size of the input data, and \log(1/\epsilon), using exact arithmetic. Here \epsilon is the desired precision for the optimization problem (1.1)-(1.8).

Definition 1 Let us consider variables x with \dim(x) = n. For any set C \subseteq R^n and a positive real number \epsilon, the set B_x(C, \epsilon) is defined as: B_x(C, \epsilon) := \{x \in R^n : \|x - y\| \le \epsilon \text{ for some } y \in C\}. B_x(C, \epsilon) is the \epsilon-ball covering of C. In particular, when C is a singleton, i.e. C = \{\bar{x}\}, we set B_x(C, \epsilon) = B_x(\bar{x}, \epsilon), which is the \epsilon-ball around \bar{x}.

Definition 2 Let us consider an optimization problem (P): \min_{x \in C} h(x), where C \subseteq R^n is a full dimensional closed convex set and h(x) is a convex function of x. We say that (P) is solved to \epsilon-precision if we can find a feasible \bar{x} \in C such that h(\bar{x}) \le h(x) + \epsilon for all x \in C.

Definition 3 For a convex function f(x) defined on R^n, d \in R^n is an \epsilon-subgradient at x if for all z \in R^n, f(z) \ge f(x) + d^T(z - x) - \epsilon. For a concave function f(x) defined on R^n, d \in R^n is an \epsilon-supergradient at x if for all z \in R^n, f(z) \le f(x) + d^T(z - x) + \epsilon.

Definition 4 Let us consider the convex optimization problem

\min f(x) \quad (2.1)
\quad \text{s.t.} \; g(x) \le 0, \; h(x) = 0, \; x \in X,

where f : R^n \to R, g : R^n \to R^m is convex component-wise, h : R^n \to R^l is affine, and X is a nonempty convex compact set. For any \mu \in R^m_+, \lambda \in R^l, we define the Lagrangian dual as:

\theta(\mu, \lambda) := \min_{x \in X} \{f(x) + \mu^T g(x) + \lambda^T h(x)\}. \quad (2.2)

Let \gamma be the optimal objective value of (2.1). We call \bar{\mu} \in R^m_+ and \bar{\lambda} \in R^l \epsilon-optimal Lagrange multipliers of the constraints g(x) \le 0 and h(x) = 0 if 0 \le \gamma - \theta(\bar{\mu}, \bar{\lambda}) < \epsilon. Note that because of weak duality \gamma - \theta(\mu, \lambda) \ge 0 for any \mu \in R^m_+ and \lambda \in R^l.

For the two-stage moment robust problem (1.1)-(1.3), we make the following assumptions:

Assumption 1 For \alpha_1 \ge 0, \beta_1 \ge 1, and \Sigma_1 \succ 0, \rho_1(x, p) is P-integrable for all P \in \mathcal{P}_1.

Assumption 2 The sample space S_1 \subseteq R^{m_1} is convex and compact (closed and bounded), and it is equipped with an oracle that for any \bar{p} \in R^{m_1} can either confirm that \bar{p} \in S_1 or provide a hyperplane that separates \bar{p} from S_1 in polynomial time.

Assumption 3 The set X \subseteq R^{n_1} is convex, compact, and full dimensional (closed and bounded with nonempty interior). There exist an x_0 \in X and r_0^1, R_0^1 \in R_+ such that B_x(x_0, r_0^1) \subseteq X \subseteq B_x(0, R_0^1). In the context of the ellipsoid method we assume that x_0, r_0^1 and R_0^1 are known. X is equipped with an oracle that for any \bar{x} \in R^{n_1} can either confirm that \bar{x} \in X or provide a vector d \in R^{n_1} with \|d\| \ge 1 such that d^T \bar{x} < d^T x for x \in X, in polynomial time.

Assumption 4 The function \rho_1(x, p) is concave in p. In addition, given a pair (x, p), it is assumed that in polynomial time one can: 1. evaluate the value of \rho_1(x, p); 2. find a supergradient of \rho_1(x, p) in p.

Assumption 5 The function \rho_1(x, p) is convex in x. In addition, it is assumed that one can find in polynomial time a subgradient of \rho_1(x, p) in x.

Assumption 6 For any k \in \{1, \dots, K\}, \alpha_{2,k} \ge 0, \beta_{2,k} \ge 1, and \Sigma_{2,k} \succ 0, \rho_2(w, q) is P-integrable for all P \in \mathcal{P}_{2,k}.

Assumption 7 The sample space S_2 \subseteq R^{m_2} is convex and compact (closed and bounded), and it is equipped with an oracle that for any \bar{q} \in R^{m_2} can either confirm that \bar{q} \in S_2 or provide a hyperplane that separates \bar{q} from S_2 in polynomial time.

Assumption 8 The set \mathcal{W} \subseteq R^{n_2} is convex, compact, and full dimensional. There exist known w_0 \in \mathcal{W} and r_0^2, R_0^2 \in R_+ such that B_w(w_0, r_0^2) \subseteq \mathcal{W} \subseteq B_w(0, R_0^2). \mathcal{W} is equipped with an oracle that for any \bar{w} \in R^{n_2} can either confirm that \bar{w} \in \mathcal{W} or provide a hyperplane, i.e., a vector d \in R^{n_2} with \|d\| \ge 1, that separates \bar{w} from \mathcal{W} in polynomial time.

Assumption 9 For any k \in \{1, \dots, K\} and x \in X, W_k(x) := \{w_k : W_k w_k = h_k - T_k x, \; w_k \in \mathcal{W}\} \ne \emptyset.

Assumption 9 implies that W_k(x) is nonempty and compact for any k \in \{1, \dots, K\}. This is a standard assumption in stochastic programming. It may be ensured by using an artificial variable for each scenario.

Assumption 10 The function \rho_2(w, q) is concave in q. In addition, given a pair (w, q), we assume that in polynomial time we can: 1. evaluate the value of \rho_2(w, q); 2. find a supergradient of \rho_2(w, q) in q.

Assumption 11 The function \rho_2(w, q) is convex in w. In addition, we assume that we can find in polynomial time a subgradient of \rho_2(w, q) in w.

Assumptions 1-2 guarantee that the first stage ambiguity set (1.6) is well defined. Assumption 3 requires that the first stage feasible region is compact, and that a separating hyperplane of sufficient norm can be generated for any infeasible point (an illustrative sketch of such a separation oracle is given below). Assumptions 4-5 make sure that the objective function is convex/concave as needed and that its sub- and supergradients can be computed efficiently. Assumptions 1-5 are similar to the assumptions in [5]. Assumptions 6-11 are analogous to Assumptions 1-5; these assumptions ensure that the second stage problem is a well-defined convex program for each scenario.
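As a concrete illustration of the oracle access assumed above (this sketch is not part of the paper's formal development), the following Python function implements a separation oracle in the sense of Assumption 3 for the simple box set X = {x : lower <= x <= upper}; the function name and the box structure are assumptions made purely for illustration.

```python
import numpy as np

def box_separation_oracle(x_bar, lower, upper):
    """Oracle in the sense of Assumption 3 for X = {x : lower <= x <= upper}.

    Returns (True, None) if x_bar lies in X; otherwise returns (False, d) with
    ||d|| >= 1 and d^T x_bar < d^T x for every x in X (a separating hyperplane).
    """
    x_bar = np.asarray(x_bar, dtype=float)
    lo_viol = lower - x_bar          # positive where x_bar is below the lower bound
    up_viol = x_bar - upper          # positive where x_bar is above the upper bound
    if (lo_viol <= 0).all() and (up_viol <= 0).all():
        return True, None
    d = np.zeros_like(x_bar)
    i = int(np.argmax(np.maximum(lo_viol, up_viol)))
    # Unit coordinate vector pointing from the violated bound toward the box.
    d[i] = 1.0 if lo_viol[i] > up_viol[i] else -1.0
    return False, d
```

For example, box_separation_oracle(np.array([1.5, 0.2]), np.zeros(2), np.ones(2)) reports infeasibility and returns the cut direction d = (-1, 0), which satisfies d^T x_bar < d^T x for every x in the box.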

3 Equivalent Formulation of the Two-Stage Moment Robust Problem

In this section we give an equivalent formulation of the two-stage moment robust problem (1.1)-(1.3). The next theorem gives such a reformulation for a single stage problem.

Theorem 1 (Delage and Ye 2010 [5]). Let v := (Y, y, y^0, t) and define:

c(v) := y^0 + t, \quad (3.1)

\mathcal{V}(h, x) := \{v : y^0 \ge h(x, q) - q^T Y q - q^T y \;\; \forall q \in S, \quad (3.2)
\qquad t \ge (\beta \Sigma_0 + \mu_0 \mu_0^T) \bullet Y + \mu_0^T y + \sqrt{\alpha}\, \|\Sigma_0^{1/2}(y + 2 Y \mu_0)\|, \;\; Y \succeq 0\}.

Consider the moment robust problem with probability ambiguity set \mathcal{P} defined as

\min_{x \in X} \left( \max_{P \in \mathcal{P}} E_P[h(x, \tilde{q})] \right), \quad (3.3)

\mathcal{P} := \{P : P \in \mathcal{M}, \; E_P[1] = 1, \; (E_P[\tilde{q}] - \mu_0)^T \Sigma_0^{-1} (E_P[\tilde{q}] - \mu_0) \le \alpha, \; E_P[(\tilde{q} - \mu_0)(\tilde{q} - \mu_0)^T] \preceq \beta \Sigma_0\}. \quad (3.4)

Consider the following assumptions:

(i) \alpha \ge 0, \beta \ge 1, \Sigma_0 \succ 0, and h(x, q) is P-integrable for all P \in \mathcal{P}.

(ii) The sample space S \subseteq R^m is convex and compact, and it is equipped with an oracle that for any \bar{q} \in R^m can either confirm that \bar{q} \in S or provide a hyperplane that separates \bar{q} from S in polynomial time.

(iii) The feasible region X is convex and compact, and it is equipped with an oracle that for any \bar{x} \in R^n can either confirm that \bar{x} \in X or provide a hyperplane that separates \bar{x} from X in polynomial time.

(iv) The function h(x, q) is concave in q. In addition, given a pair (x, q), it is assumed that in polynomial time one can: 1. evaluate the value of h(x, q); 2. find a supergradient of h(x, q) in q.

(v) The function h(x, q) is convex in x. In addition, it is assumed that one can find in polynomial time a subgradient of h(x, q) in x.

If assumption (i) is satisfied, then for any given x \in X, the optimal value of the inner problem \max_{P \in \mathcal{P}} E_P[h(x, \tilde{q})] in (3.3) is equal to the optimal value c(v^*) of the problem:

\min_{v \in \mathcal{V}(h, x)} c(v). \quad (3.5)

If assumptions (i)-(v) are satisfied, then (3.3) is equivalent to the following problem:

\min_{v \in \mathcal{V}(h, x), \; x \in X} c(v). \quad (3.6)

Problem (3.6) is well defined, and can be solved by the ellipsoid method to any precision \epsilon in polynomial time.

Applying Theorem 1 to (1.4) and (1.5), we obtain an equivalent two-stage semi-infinite programming formulation of (1.1)-(1.8), as stated in the following theorem.

Theorem 2 Suppose that Assumptions 1-11 are satisfied. Then the two-stage moment robust problem (1.1)-(1.8) is equivalent to

\min_{x \in X, \; v_1 \in \mathcal{V}_1(\rho_1, x)} c(v_1) + \mathcal{G}(x), \quad (3.7)

\mathcal{G}(x) := \sum_{k=1}^{K} \pi_k G_k(x), \quad (3.8)

G_k(x) := \min_{w_k \in W_k(x), \; v_{2,k} \in \mathcal{V}_{2,k}(\rho_2, w_k)} c(v_{2,k}), \quad (3.9)

where:

v_1 := (Y_1, y_1, y_1^0, t_1), \quad (3.10)

c(v_1) := y_1^0 + t_1, \quad (3.11)

\mathcal{V}_1(\rho_1, x) := \{v_1 : y_1^0 \ge \rho_1(x, p) - p^T Y_1 p - p^T y_1 \;\; \forall p \in S_1, \quad (3.12)
\qquad t_1 \ge (\beta_1 \Sigma_1 + \mu_1 \mu_1^T) \bullet Y_1 + \mu_1^T y_1 + \sqrt{\alpha_1}\, \|\Sigma_1^{1/2}(y_1 + 2 Y_1 \mu_1)\|, \;\; Y_1 \succeq 0\},

and for any k \in \{1, \dots, K\},

v_{2,k} := (Y_{2,k}, y_{2,k}, y_{2,k}^0, t_{2,k}), \quad (3.13)

c(v_{2,k}) := y_{2,k}^0 + t_{2,k}, \quad (3.14)

\mathcal{V}_{2,k}(\rho_2, w_k) := \{v_{2,k} : y_{2,k}^0 \ge \rho_2(w_k, q) - q^T Y_{2,k} q - q^T y_{2,k} \;\; \forall q \in S_2, \quad (3.15)
\qquad t_{2,k} \ge (\beta_{2,k} \Sigma_{2,k} + \mu_{2,k} \mu_{2,k}^T) \bullet Y_{2,k} + \mu_{2,k}^T y_{2,k} + \sqrt{\alpha_{2,k}}\, \|\Sigma_{2,k}^{1/2}(y_{2,k} + 2 Y_{2,k} \mu_{2,k})\|, \;\; Y_{2,k} \succeq 0\}.

Problem (3.7)-(3.9) is a two-stage stochastic program in which both stages are semi-infinite programming problems. We will develop a decomposition algorithm to prove the polynomial solvability of (3.7)-(3.9) in Section 5.

4 Subgradient Calculation and Polynomial Solvability

In this section we summarize some known results on convex optimization. We also present a basic result for computing an \epsilon-subgradient for a general convex optimization problem. The following theorem from Grötschel, Lovász and Schrijver [7] shows that a convex optimization problem and the separation problem of a convex set are polynomially equivalent.

Theorem 3 (Grötschel et al. [7, Theorem 3.1]). Consider a convex optimization problem of the form \min_{z \in Z} c^T z with a linear objective function and a convex closed feasible set Z \subseteq R^n. Assume that there are known constants a_0, r and R such that B_z(a_0, r) \subseteq Z \subseteq B_z(0, R). Assume that we have an oracle such that, given a vector \bar{z} and a number \delta > 0, we can conclude one of the following:

1. \bar{z} passes the test of the oracle, i.e., the oracle ensures that \bar{z} \in B_z(Z, \delta), in polynomial time.

2. \bar{z} does not pass the test of the oracle, and the oracle generates a vector d \in R^n with \|d\| \ge 1 such that d^T \bar{z} \le d^T z + \delta for every z \in Z, in polynomial time.

Then, given an \epsilon > 0, we can find a vector \bar{y} satisfying the oracle with some \delta \le \epsilon such that c^T \bar{y} - \epsilon \le c^T z for z \in Z, in polynomial time, by using the ellipsoid method.

The following proposition tells us that the average of N \epsilon-subgradients is still an \epsilon-subgradient of the average of the N convex functions.

Proposition 1 (Hiriart-Urruty and Lemarechal [8, Theorem 3.1.1]). If h 1 (x),..., h N (x) are N convex functions w.r.t. x and h ɛ 1( x),..., h ɛ N ( x) are ɛ-subgraduents of these N convex function at x, then for any π 1,..., π N R + with N i=1 π i = 1, N i=1 π i h ɛ i( x) is a ɛ-subgradient of N i=1 π ih i (x) at x. The following lemma shows that the ɛ-optimal solutions of the Lagrangian may be used to obtain an ɛ-supergradient of the Lagrangian w.r.t. the Lagrange multipliers. Lemma 1 Consider the convex programg problem defined in (2.1), where f : R n R, g : R n R m is convex, h : R n R l is affine, and X is a nonempty convex compact set. Assume that {x : g(x) < 0} {x : h(x) = 0} X =. For any given µ R m +, λ R l and ɛ > 0, if x is an ɛ-optimal solution of θ( µ, λ), i.e, ɛ < θ( µ, λ) (f( x) + µ T g( x) + λ T h( x) < 0, then (g( x); h( x)) is an ɛ-supergradient of θ(, ) at ( µ, λ), where θ(, ) is the Lagrangian dual defined in (2.2). Proof It is well known that θ(µ, λ) is a concave function w.r.t. (µ, λ) [3, Sec. 5.1.2]. Since f, g are convex, h is affine and X is nonempty and compact, we know that θ(µ, λ) is finite for (µ; λ) R m+l. Therefore, θ(µ, λ) = x X {f(x) + µt g(x) + λ T h(x)} f( x) + µ T g( x) + λ T h( x) = f( x) + [(µ; λ) ( µ; λ)] T (g( x); h( x)) + ( µ; λ) T (g( x); h( x)) θ( µ, λ) + [(µ; λ) ( µ; λ)] T (g( x); h( x)) + ɛ The following strong duality theorem is from Bazaraa, Sherali and Shetty [1, Theorem 6.2.4]. Theorem 4 Consider the convex optimization problem (2.1). Let X be a nonempty convex set in R n. Let f : R n R, g : R n R m be convex, and let h : R n R l be affine; that is, h is of the form h(x) = Ax b. Suppose that the Slater constraint qualification holds, i.e, there exists an ˆx X such that g(ˆx) < 0, h(ˆx) = 0, and 0 int(h(x )), where h(x ) = {h(x) : x X } and int( ) is the interior of a set. Then, inf{f(x) : g(x) 0, h(x) = 0, x X } = sup{θ(µ, λ) : µ 0}. Furthermore, if the inf is finite, then sup{θ(µ, λ) : µ 0} is achieved at ( µ; λ) with µ 0. If the inf is achieved at x, then µ T g( x) = 0. The following theorem states that the ɛ-optimal Lagrangian multipliers can be used as an ɛ-subgradient of the perturbed convex optimization problem (4.1) at the origin. Theorem 5 Consider the convex optimization problem (2.1). Let X be a nonempty convex set in R n, let f : R n R, g : R n R m be convex, and h : R n R l be affine. Assume that the Slater constaint qualification as in Theorem 4 holds. Assume that µ R m +, λ R l are 8

ɛ-optimal Lagrangian multipliers for the constraints g(x) 0 and h(x) = 0. Now consider the perturbed problem: π(u, v) := f(x) (4.1) s.t. g(x) u h(x) = v x X, with u R m and v R l. Then, π(u, v) is convex and ( µ; λ) is an ɛ-subgradient of π(u, v) at (0, 0) w.r.t. (u; v). Proof The convexity of π(u, v) is known from [3, Sec. 5.6.1]. For any µ R m +, λ R l, the Lagrangian function of the original problem π(0, 0) is written as L(x, µ, λ) = f(x) + µ T g(x) + λ T h(x). Slater constraint qualification conditions ensure that the strong duality holds. For given µ R m +, λ R l, consider the dual problem: θ(µ, λ) := inf x X L(x, µ, λ). For any u R m, v R l, let Y(u, v) := {x : x X, g(x) u, h(x) = v}. If Y(u, v), since θ(µ, λ) = inf x X {f(x) + µ T g(x) + λ T h(x)}, for any x Y(u, v), we can have: θ( µ, λ) = inf x X {f(x) + µ T g(x) + λ T h(x)} f( x) + µ T g( x) + λ T h( x) f( x) + µ T u + λ T v. The first inequality follows from the definition of inf and the second inequality follows from µ R m +. Since x is an arbitrary point in Y(u, v), we can take the infimum of the right-hand side over the set Y(u, v) to get: θ( µ, λ) π(u, v) + µ T u + λ T v. Since 0 < π(0, 0) θ( µ, λ) < ɛ, we know that: π(0, 0) ɛ θ( µ, λ) π(u, v) + µ T u + λ T v. (4.2) Inequality (4.2) holds when Y(u, v) = because π(u, v) = in this case. Therefore, we can conclude that ( µ; λ) is an ɛ-subgradient of π(u, v) at (0, 0) with respect to (u; v). 5 A Decomposition Algorithm for a General Two-Stage Moment Robust Optimization Model In Section 3 we presented an equivalent formulation (3.7)-(3.9) of the two-stage moment robust program (1.1)-(1.3) as a semi-infinite program. In this section we propose a decomposition algorithm to show that this equivalent formulation can be solved to any precision 9

in polynomial time. A key ingredient of the decomposition algorithm is the construction of an ɛ-subgradient of the function G k (x) defined in (3.9). Theorem 5 will be useful for this purpose. First, observe that we can apply Theorem 1 to the second stage problem (3.9) to solve them in polynomial time. This is stated in the following corollary. Corollary 1 For k {1,..., K}, let Assumptions 6-10 be satisfied. The second stage problem G k (x) defined as (3.9) can be solved to any precision ɛ in polynomial time. Proof Given k {1,..., K}, consider the moment robust problem: max E[ρ 2 (w k, q)]. w k W k (x) P P 2,k Assumption 8 guarantees that for k {1,..., K} and x X, the set W k (x) is nonempty and bounded. This verifies condition (iii) in Theorem 1. Assumption 6, 7, 9, 10 verify conditions (i), (ii), (iv), (v). For any given k {1,..., K}, define the Lagrangian function L k (λ, x) of (3.9) as: L k (λ, x) := w k,y 2,k,y 2,k,y 0 2,k,t 2,k y 0 2,k + t 2,k + λ T (W k w k h k + T k x) s.t. y2,k 0 ρ 2 (w k, q) q T Y 2,k q q T y 2,k, q S 2 t 2,k (β 2,k Σ 2,k + µ 2,k µ T 2,k) Y 2,k + µ T 2,ky 2,k + Σ α 1 2 2,k 2,k (y 2,k + 2Y 2,k µ 2,k ) Y 2,k 0, w k W. (5.1a) (5.1b) (5.1c) (5.1d) We now give a summary of the remaining analysis in this section. In Proposition 2, we show that for any given λ, the Lagrangian problem L k (λ, x) can be evaluated to any precision polynomially. In Proposition 3, the ɛ approximate solution of L k (λ, x) can be used to generate an ɛ-subgradient to prove the polynomial time solvability of the second stage problem for each given first stage solution x, as stated in Lemma 2 and 3. Based on this result, we further prove the polynomial solvability of the two-stage semi-infinite problem (3.7)-(3.9) in Theorem 6. The following proposition states that (5.1) can be solved to any precision in polynomial time. Proposition 2 Let Assumptions 6-10 be satisfied. Then, for λ R l 2, L k (λ, x) can be evaluated and a solution (w k, v 2,k ) can be found to any precision ɛ in polynomial time. Proof Given k {1,..., K}, x X, and λ R l 2, consider the optimization problem: max E P [φ k (x, λ, w k, q)], (5.2) w k W P P 2,k where φ k (x, λ, w k, q) := ρ 2 (w k, q) + λ T (W k w k h k + T k x). Since λ T (W k w k h k + T k x) is linear w.r.t. w k, Assumptions 10 and 11 are also satisfied for φ k (x, λ, w ξ, q). Therefore, 10

by using Theorem 1, (5.2) is equivalent to w k,y 2,k,y 2,k,y 0 2,k,t 2,k y 0 2,k + t 2,k (5.3) s.t. y 0 2,k φ k (x, λ, w k, q) q T Y 2,k q q T y 2,k, q S 2 t 2,k (β 2,k Σ 2,k + µ 2,k µ T 2,k) Y 2,k + µ T 2,ky 2,k + Σ α 1 2 2,k 2,k (y 2,k + 2Y 2,k µ 2,k ) Y 2,k 0, w k W, and (5.3) is polynomially solvable. By substituting for y 2,k, t 2,k in the objective, it is easy to see that (5.3) is equivalent to (5.1), which implies that (5.1) can be solved to any precision ɛ in polynomial time. The following proposition shows that the ɛ-optimal solution of the Lagrangian problem (5.1) gives an ɛ-supergradient of L k (λ, x) w.r.t. λ. Proposition 3 For any k {1,..., K}, x X, λ R l 2, let ( w k, v 2,k ) := ( w k, Ȳ2,k, ȳ 2,k, ȳ 0 2,k, t 2,k ) be an ɛ-optimal solution of the Lagrangian problem defined in (5.1) for λ = λ. Then, W k w k h k T k x is an ɛ-supergradient of L k (λ, x) w.r.t. λ at λ. Proof Since (5.1) is the Lagrangian dual problem of (3.9), we apply Lemma 1 and the result follows. From Assumption 8, for each x X and k {1,..., K}, G k (x) := sup {L k (λ, x)}. (5.4) λ R l 2 We make the following additional assumption on the knowledge of the bounds on the optimal Lagrange multipliers and the optimum value of the Lagrangian. Note that the existence of these bounds is implied by Assumption 8. Assumption 12 There exists known constants R λ > 1, s and s such that, for x X and k {1,..., K}, the optimal objective value of (5.4) is in [s, s] and the interestion of the optimal solution of (5.4) and B λ (0, R λ ) is nonempty. The following lemma guarantees the polynomial time solvability of (5.4). Lemma 2 Suppose that Assumption 12 is satisfied. Given x X and k {1,..., K}, we can find λ R l 2, so that L k ( λ, x) G k (x) ɛ in polynomial time. Proof Given δ > 0, denote ˆδ = δ 2 and consider the problem Θ k (x) := s s.t. s L k (λ, x) s s s λ B λ (0, R λ ). (5.5a) (5.5b) (5.5c) (5.5d) Since L k (λ, x) is a concave function w.r.t. λ, the feasible region of (5.5) is convex. From Assumption 12, (5.5) is equivalent to (5.4). Let C k (x) := {(s; λ) : s L k (λ, x), s s s, λ B λ (0, R λ )} be the feasible region of (5.5). For (ŝ; ˆλ) / B s,λ (C k (x), δ), either: 11

(i) ŝ δ > L k (ˆλ, x) or (ii) (ŝ, ˆλ) violates at least one of (5.5c) and (5.5d). If (ŝ, ˆλ) satisfies none of (i) and (ii), i.e., ŝ δ L k (ˆλ, x) and (5.5c) and (5.5d) are satisfied, then obviously, (ŝ, ˆλ) B s,λ (C k (x), δ). Let Lˆδ k (ˆλ, x) be the ˆδ-precision value of L k (ˆλ, x) evaluated from Proposition 2 in polynomial time. L δ k (ˆλ, x) < ŝ because L k (ˆλ, x) ˆδ Lˆδ k (ˆλ, x) L k (ˆλ, x) + ˆδ. For condition (ii), it is trivial to check the exact feasibility of (ŝ, ˆλ) in polynomail time. Therefore, we get an oracle as: (1) Lˆδ k (ˆλ, x) < ŝ; (2) ŝ < s or ŝ > s; (3) ˆλ / B λ (0, R λ ). Any (ŝ, ˆλ) / B s,λ (C k (x), δ) will satisfy at least one of (1)-(3). Consequently, if (ŝ, ˆλ) passes the oracle (1)-(3), (ŝ, ˆλ) B s,λ (C k (x), δ). Now, we prove that for a given (ŝ, ˆλ) not passing the oracle (1)-(3), we can generate a δ-cut, i.e, we can find a vector (d s ; d λ ) such that (d s ; d λ ) T (ŝ, ˆλ) < (d s ; d λ ) T (s, λ) + δ for (s, λ) C k (x). For a given (ŝ, ˆλ), if (1) is satisfied, then according to Proposition 3, we can generate a ˆδ-supergradient of L k (λ, x) at ˆλ. Let g denote this ˆδ-supergradient. Since L k (λ, x) L k (ˆλ, x) + g T (λ ˆλ) + ˆδ for λ R l 2, inequality s L k (ˆλ, x) + g T (λ ˆλ) + ˆδ is valid for λ R l 2. We combine this inequality with Lˆδ k (ˆλ, x) < ŝ and L k (ˆλ, x) ˆδ Lˆδ k (ˆλ, x), and get a valid inequality: It implies that: s ŝ + g T (λ ˆλ) + 2ˆδ for λ R l 2. ( 1; g) T (ŝ; ˆλ) ( 1; g) T (s; λ) + δ for (s; λ) C k (x). Obviously, ( 1; g) 1. If (2) is satisfied, either ŝ < s or ŝ > s. The valid separating inequality for the first case is: (1; 0) T (ŝ; ˆλ) < (1; 0) T (s; λ) and for the second case is: ( 1; 0) T (ŝ; ˆλ) < ( 1; 0] T (s; λ) for (s; λ) C k (x). If (3) is satisfied, a valid separating inequality is: (0; γˆλ) T (ŝ; ˆλ) < (0; γˆλ) T (ŝ; λ) for (s; λ) C k (x), where γ > 0 is a constant that ensures (0, γˆλ) 1. Let ˆɛ = 2ɛ and 3 s be an optimal solution of max (s,λ) Ck (x) s. From Theorem 3 we have a (ŝ; ˆλ) B s,λ (C k (x), ɛ) satisfying the oracle (1)-(3) with some δ ˆɛ in polynomial time in log( 1 ɛ ), such that ŝ s ˆɛ. According to the definition of C k (x), we know that s is the optimal objective value of (5.4). On the other hand, since (ŝ; ˆλ) satisfies the oracle (1)-(3) with some δ < ˆɛ, we have that s ˆɛ ŝ Lˆδ k (ˆλ, x) L k (ˆλ, x) + ˆδ. Therefore, s < L k (ˆλ, x) + 3 2ˆɛ = L k(ˆλ, x) + ɛ. We conclude that (5.4) can be solved to any precision ɛ in polynomial time. 12
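Lemma 2 solves the concave maximization (5.4) with the ellipsoid method in order to obtain the polynomial-time guarantee. Purely for intuition, the following Python sketch shows a much simpler projected supergradient ascent on the multiplier that uses the ε-supergradient W_k w̄_k − h_k + T_k x from Proposition 3; the callback solve_lagrangian (which returns an approximate value of L_k(λ, x) together with this residual) is an assumed interface, and the diminishing step-size rule is a generic choice rather than the scheme analyzed in the paper.

```python
import numpy as np

def supergradient_ascent_on_lambda(solve_lagrangian, lam0, R_lambda, n_iters=200):
    """Heuristic ascent for max_lambda L_k(lambda, x) (cf. Lemma 2).

    solve_lagrangian(lam) -> (value, residual), where residual equals
        W_k @ w_bar - h_k + T_k @ x, an eps-supergradient of L_k(., x)
        at lam by Proposition 3.
    Iterates are kept inside the ball B(0, R_lambda) of Assumption 12.
    """
    lam, best_lam, best_val = np.array(lam0, dtype=float), np.array(lam0, dtype=float), -np.inf
    for t in range(1, n_iters + 1):
        val, residual = solve_lagrangian(lam)
        if val > best_val:
            best_val, best_lam = val, lam.copy()
        lam = lam + residual / np.sqrt(t)        # diminishing step along the supergradient
        norm = np.linalg.norm(lam)
        if norm > R_lambda:                      # project back onto the multiplier ball
            lam *= R_lambda / norm
    return best_lam, best_val
```

In the paper's decomposition, a near-optimal multiplier obtained this way is what Lemma 3 then turns into an ε-subgradient of G_k(x) for the first-stage problem.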

According to Proposition 2, for x X and k {1,..., K}, the second stage problem G k (x) can be solved to any precision in polynomial time. The next lemma claims that the recourse function G(x) can be evaluated to any precision in polynomial time. Lemma 3 Suppose that Assumption 12 is satisfied. Let x X, and the recourse function G(x) be defined in (1.2). Then, (i) G(x) can be evaluated to any precision ɛ; (ii) an ɛ- subgradient of G(x) can be obtained in time polynomial in log( 1 ) and K. ɛ Proof For a given ɛ > 0, from Lemma 2, G k (x) can be obtained to ˆɛ = ɛ -precision in 2 polynomial time. Let s kˆɛ, λˆɛ k be an ˆɛ-optimal solution of (5.4). Since ˆɛ < G k(x) sˆɛ k < ˆɛ, sˆɛ k + ˆɛ Lk(λˆɛ k, x) sˆɛ k ˆɛ, we see that ɛ < G k(x) L (λˆɛ k k, x) < ɛ. We can apply Theorem 5 to conclude that T T k λˆɛ k is an ɛ-subgradient of G k(x) at x. Therefore, with G k (x) := K k=1 π kg k (x), we have: ɛ < G(x) K π sˆɛ k k < ɛ, which implies that G(x) can be evaluated to ɛ-precision in polynomial time. From Theorem 1, K k=1 π it T k λˆɛ k is an ɛ-subgradient of G( ) at x. The following Lemma shows that the feasibility of the semi-infinite constraint (5.6b) can be verified to any precision in polynomial time. Lemma 4 (Delage and Ye 2010). Assume the support set S R m is convex and compact, and it is equipped with an oracle that for any ξ R m can either confirm that ξ S or provide a hyperplane that separates ξ from S in time polynomial in m. Let function h(x, ξ) be concave in ξ in time polynomial in m. Then, for any x, Y 0, and y, one can find a solution ξ that is ɛ-optimal with respect to the problem k=1 max t,ξ t s.t. t h(x, ξ) ξ T Yξ ξ T y ξ S (5.6a) (5.6b) (5.6c) in time polynomial in log( 1 ) and the dimension of ξ. ɛ The next theorem shows the solvability of the two-stage problem (3.7)-(3.9) with recourse function G(x). Theorem 6 Let Assumptions 1-12 be satisfied. Problem (3.7)-(3.9) can be solved to any precision ɛ in time polynomial in log( 1 ) and the size of the problem (3.7)-(3.9). ɛ Proof We want to apply Theorem 3 to show the polynomial solvability. The proof is divided into 5 steps. The first step is to verify that the feasible region is convex. Secondly, we prove the existence of an optimal solution of problem (3.7)-(3.9). In step 3 and 4, we establish the weak feasibility and weak separation oracles. We then verify the polynomial solvability of (3.7)-(3.9) by applying Theorem 3 in step 5. 13

Step 1. Verification of the Convexity of the Feasible Region Let z := (x, Y 1, y 1, y 0 1, t 1, τ) and rewrite (3.7) as: y1 0 + t 1 + τ z (5.7a) s.t. τ G(x) (5.7b) y1 0 ρ 1 (x, p) p T Y 1 p p T y 1, p S 1 t 1 (β 1 Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + Σ α 1 2 1 1 (y 1 + 2Y 1 µ 1 ) Y 1 0 x X. (5.7c) (5.7d) (5.7e) (5.7f) From Proposition 2, for k {1,..., K}, G k (x) can be evaluated to any precision in polynomial time. Since G k (x) is a convex function w.r.t. x, G(x) is a convex function w.r.t. x. It implies that (5.7b) is a convex contraint. According to Assumption 5, (5.7e) and (5.7c) are convex, (5.7d) is a second order cone constraint, which is convex. Constraints (5.7e) and (5.7f) are obviously convex. Therefore, the feasible region of (5.7) is convex. Step 2. Existence of an Optimal Solution of (3.7)-(3.9). Let x X and z := ( x, Ȳ1, ȳ 1, ȳ 0 1, t 1, τ) be defined as: τ = G( x), Ȳ = I, ȳ = 0, ȳ 0 1 = sup p S1 {ρ 1 ( x, p) p T Ȳ 1 p}, t 1 = trace(β 1 Σ 1 + µ 1 µ T 1 ) + 2 α 1 Σ 1 2 1 µ 1. Then z is a feasible solution of (5.7). Note that ȳ 0 1 exists because S 1 is compact. Therefore, the feasible region of (5.7) is nonempty. On the other hand, from Theorem 1 and Assumptions 1-5, the set of optimal solutions of the problem: y1 0 + t 1 (5.8) x,y 1,y 1,y1 0,t 1 s.t. y1 0 ρ 1 (x, p) p T Y 1 p p T y 1, p S 1, t 1 (β 1 Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + Σ α 1 2 1 1 (y 1 + 2Y 1 µ 1 ), Y 1 0, x X, is nonempty. Let us assume that the optimal objective value of (5.8) equals γ 1. From Lemma 2, the recourse function G(x) is finite for x X. Since X is compact, γ 2 := x X G(x) is finite. Since the optimal objective value of (5.7) is no less than γ 1 + γ 2, we conclude that (5.7) has a finite objective value and the set of optimal solutions is nonempty. Step 3. Establishment of the Weak Feasibility Oracle According to Assumption 3, we know that B x (x 0, r 1 0) X B x (0, R 1 0). Let x = x 0. Y 1 = I and y 1 = 0. Let 0 < r 0 < r 1 0 be a constant such that Y 1 + S 0 for S F r 0, where 14

F is the Frobenius norm of matrices. Let y 0 = sup {ρ 1 1 (x, p) p T Y 1 p p T y 1 } + r 0 Y 1 I F r 0, y 1 r 0, x x 0 r 0,p S 1 t 1 = sup {(β 1 Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + α 1 Σ 1 2 1 (y 1 + 2Y 1 µ 1 ) } + r 0 Y 1 I F r 0, y 1 <r 0 τ = sup G(x) + r 0. x B x(x 0,r 0 ) z := (x, Y 1, y 1, τ, y 0 1, t 1). Then, the feasible region of problem (5.7) contains the ball B z (z, r 0 ). On the other hand, let R 0 > r 0 + z be a constant such that the intersection of the set of optimal solutions of problem (5.7) and the ball B z (0, R 0 ) is nonempty. Consequently, solving (5.7) is equivalent to (5.7) with the additional constraint z B z (0, R 0 ). Let the feasible region of problem (5.7) with this additional constraint be C. From the above discussion, we have B z (z, r 0 ) C B z (0, R 0 ). Given δ > 0, let ˆδ = δ 2. Let ẑ := (ˆx, Ŷ1, ŷ 1, ˆτ, ŷ 0 1, ˆt 1 ) / B z (C, δ). The point ẑ satisfies at least one of the following three conditions. (i) ˆτ + ˆδ < G(ˆx); (ii) ŷ 0 1 + ˆδ < sup p S1 {ρ 1 (ˆx, p) p T Ŷ 1 p p T ŷ 1 }; (iii) ẑ at least violates one of (5.7d)-(5.7f). If none of the conditions (i)-(iii) is satisfied, then ẑ B z (C, δ). If (i) is satisfied, since G(x) can be evaluated to precision ɛ in polynomial time for x X, we have that ˆτ < Gˆδ(ˆx), 2 where Gˆδ(ˆx) is the ˆδ-optimal objective value of G(ˆx) according to Proposition 2. Assume that (ii) is satisfied. According to Lemma 4, sup p S1 {ρ 1 (ˆx, p) p T Ŷ 1 p p T ŷ 1 } can be found to ˆδ precision polynomially. Let ˆp be the corresponding ˆδ optimal solution. Then condition (ii) implies that ŷ1 0 < ρ 1 (ˆx, ˆp) ˆp T Ŷ 1ˆp ˆp T ŷ 1. We have an oracle system as: ˆτ < Gˆδ(ˆx); (5.9) ŷ1 0 < ρ 1 (ˆx, ˆp) ˆp T Ŷ 1ˆp ˆp T ŷ 1 ; (5.10) ˆt 1 < (β 1 Σ 1 + µ 1 µ T 1 ) Ŷ1 + µ T 1 ŷ1 + Σ α 1 2 1 1 (ŷ 1 + 2Ŷ1µ 1 ) ; (5.11) Ŷ 1 0; (5.12) ˆx / X. (5.13) We need to show that the system (5.9)-(5.13) can be verified in polynomial time. Condition (5.9) can be verified in polynomial time by using Lemma 3. Condition (5.10) can be verified in polynomial time by using Lemma 4. Condition (5.11) can be verified by verifing the feasibility of (5.7d). Condition (5.12) can be verified using matrix factorization in O(m 3 1) arithmetic operations. Condition (5.13) can be verified in polynomial time according to Assumption 3. In addition from the above discussion, if ẑ does not satisfy any of (5.9)-(5.13), then ẑ B z (C, δ). 15

Step 4. Establishment of the Weak Separation Oracle In Step 3, we showed that the oracle system (5.9)-(5.13) can find any ẑ / B z (C, δ) and can be verified in polynomial time. If any of (5.9)-(5.13) are satisfied, we will prove that we can generate an inequality which separates ẑ from the feasibility region C with δ-tolerence, i.e. satisfy the condition 2 described in Theorem 3. Assume that a point ẑ is given. If (5.9) is satisfied, from Lemma 3 we can obtain a ˆδ-subgradient of G(x) at ˆx. Let us denote this ˆδ-subgradient as g. Since G(x) G(ˆx) + g T (x ˆx) ˆδ for x X, we generate a valid inequality: τ G(ˆx) + g T (x ˆx) ˆδ. Combining with Gˆδ(ˆx) ˆδ G(ˆx) Gˆδ(ˆx) + ˆδ and Gˆδ(ˆx) > ˆτ, we get the separating hyperplane (5.14): ( g; 0; 0; 1; 0; 0) T vec(z) + δ ( g; 0; 0; 1; 0; 0] T vec(ẑ) (5.14) for z C, where vec(z) := (x; vec(y 1 ); y 1 ; y 0 1; t 1 ; τ) and vec( ) vectorizes a matrix. It is clear that: (g; 0; 0; 1; 0; 0) 1. If (5.10) is satisfied, assume ˆp be the ˆδ-optimal solution of problem (5.6) w.r.t. ẑ. Then, we can generate the separating hyperplane: ( x ρ 1 (ˆx, ˆp); vec(ˆpˆp T ); ˆp; 0; 1; 0) T vec(z) ( x ρ 1 (ˆx, ˆp); vec(ˆpˆp T ); ˆp; 0; 1; 0) T vec(ẑ), (5.15) for z C, where x ρ 1 (x, p) is a subgradient of ρ 1 (x, p) w.r.t. x (Assumption 4). Note that: ( x ρ 1 (ˆx, ˆp); vec(ˆpˆpˆδ T ); ˆp; 0; 1; 0) 1. If (5.11) is satisfied, a valid inequality can be generated as: (β 1 Σ 1 + µ 1 µ T 1 + Ĝ) Y 1 + (µ 1 + ĝ) T y 1 t 1 ĝ T ŷ 1 + Ĝ Ŷ1 g(ŷ1, ŷ 1 ), where ĝ = y1 g(ŷ1, ŷ 1 ), Ĝ = mat( Y 1 g(ŷ1, ŷ 1 )), g(y 1, y 1 ) := Σ 1 2 1 (y 1 + 2Y 1 µ 1 ), and Y1 g(y 1, y 1 ); y1 g(y 1, y 1 ) are the gradients of g(y 1, y 1 ) in Y 1 and y 1 respectively. Note that mat( ) puts a vector to a symmetric square matrix. It implies that: (0; vec(β 1 Σ 1 + µ 1 µ T 1 + Ĝ); (µ 1 + ĝ); 0; 1) T vec(ẑ) < (0; vec(β 1 Σ 1 + µ 1 µ T 1 + Ĝ); (µ 1 + ĝ); 0; 1) T vec(z) for z C. Obviously, (0; vec(β 1 Σ 1 + µ 1 µ T 1 + Ĝ); (µ 1 + ĝ); 0; 1) 1. If (5.12) is satisfied, a separating hyperplane can be generated based on the eigenvector corresponding to the lowest eigenvalue. The vector to represent this nonzero separating hyperplane can be scaled to satisfy the requirement 1. If (5.13) is satisfied, a separating hyperplane with 1 can be generated in polynoial time according to Assumption 3. Step 5. Verification of the Polynomial Solvability According to the analysis in steps 1-4, we can apply Lemma 3 to conclude that for a given ɛ > 0 and ˆɛ = ɛ, we can find a ẑ := (ˆx, Ŷ1, ŷ 3 1, ŷ1, 0 ˆt 1, ˆτ) in time polynomial in log( 1 ), such ɛ 16

that ẑ satisfies the oracle (5.9)-(5.13) with some δ < ε̂ and ŷ_1^0 + t̂_1 + τ̂ < η + ε̂, where η is the true optimal value of problem (3.7). Since ẑ satisfies the oracle (5.9)-(5.13) with δ < ε̂, the perturbed point (x̂, Ŷ_1, ŷ_1, ŷ_1^0 + ε̂, t̂_1, τ̂ + ε̂) belongs to C and its objective value is no greater than η + 3ε̂ = η + ε. In summary, we have shown that problem (3.7)-(3.9) can be solved to any precision in polynomial time.

Remark: Delage and Ye [5] use Lemma 4 to verify the feasibility of the constraint (5.6b). For an infeasible point, a separating hyperplane can be generated by using the optimal solution of (5.6). The assumption they make is that an exact solution of (5.6) can be found. For the two-stage moment robust model, Lemma 4 can only claim that the problem is solved to ε-precision. In the proof of Theorem 6, we show that it is enough to use the δ̂-optimal solution to verify the feasibility of (5.6) and to generate a separating hyperplane.

6 A Two-Stage Moment Robust Portfolio Optimization Model

In this section we analyze a two-stage moment robust model with piecewise linear objectives. In Section 6.1 we show that when (1) the probability ambiguity sets \mathcal{P}_1 and \mathcal{P}_{2,k} are described by (1.6)-(1.7) or by exact moment information, and (2) the supports S_1 and S_2 are the full spaces R^{n_1}, R^{n_2} or are described by ellipsoids, the two-stage moment robust problem can be reformulated as a semidefinite program. Note that the single stage moment robust model with piecewise linear objective and probability ambiguity set described by exact moments is discussed by Bertsimas et al. [2]. In Section 6.2 we study a two-stage moment robust portfolio optimization application with practical data. Our numerical results suggest that the two-stage modeling is effective when we have forecasting power.

6.1 A Two-Stage Moment Robust Optimization Model with Piecewise Linear Objectives

Consider the two-stage moment robust optimization model:

\min_{x \in X} \max_{P \in \mathcal{P}_1} E_P[U(\tilde{p}^T x)] + \mathcal{G}(x), \quad (6.1)

\mathcal{G}(x) := \sum_{k=1}^{K} \pi_k G_k(x), \quad (6.2)

G_k(x) := \min_{w_k \in W_k(x)} \max_{P \in \mathcal{P}_{2,k}} E_P[U(\tilde{q}^T w_k)], \quad (6.3)

where X and W_k(x) are described by linear, second-order cone and semidefinite constraints, and the ambiguity sets \mathcal{P}_1 and \mathcal{P}_{2,k} are defined in (1.6)-(1.7). The utility function U(\cdot) is piecewise linear convex and defined as: U(z) = \max_{i=1,\dots,M} \{c_i z + d_i\}, where c_i, d_i, i = 1, \dots, M, are given. Let Assumption 1 and Assumption 6 be satisfied. By applying Theorem

1, problem (6.1)-(6.3) is equivalent to: where x,y 1,y 1,y 0 1,t 1 y 0 1 + t 1 + G(x), (6.4) s.t. y1 0 c i p T x + d i p T Y 1 p p T y 1, p S 1, i = 1,..., M, t 1 (β 1 Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + Σ α 1 2 1 1 (y 1 + 2Y 1 µ 1 ), Y 1 0, x X, G(x) := G k (x) := K π k G k (x), (6.5) k=1 w k,y 2,k,y 2,k,y 0 2,k,t 2,k y 0 2,k + t 2,k, (6.6) s.t. y 0 2,k c i q T w k + d i q T Y 2,k q q T y 2,k, q S 2, i = 1,..., M, t 2,k (β 2 Σ 2,k + µ 2,k µ T 2,k) Y 2,k + µ T 2,ky 2,k, + Σ α 1 2 2 2,k (y 2,k + 2Y 2,k µ 2,k ), Y 2,k 0, w k W k (x). The piecewise utility function U can be understood as a convex utility on the linear objective p T x and q T k w k. Now we discuss three subcases: (i) Assume that S 1 and S 2 are convex and compact; (ii)s 1 = R n 1 and S 2 = R n 2, where n 1 = dim(x) and n 2 = dim(w k ); (iii) S 1 and S 2 are described by some ellipsoids, i.e. S 1 = {p : (p p 0 ) T Q 1 (p p 0 ) 1}, S 2 = {q : (q q 0 ) T Q 2 (q q 0 ) 1}, where p 0, Q 1, q 0 and Q 2 are given and matrices Q 1 and Q 2 have at least one strictly positive eigenvalue. For case (i), problem (6.1)-(6.3) will be a special case of the general model analyzed in Section 5. For cases (ii) and (iii), we give two-stage semidefinite reformulations of (6.4)-(6.6). The next lemma from Delage and Ye [5] provide an equivalent reformulation of the semi-infinite constraints in (6.4)-(6.6). Lemma 5 (Delage and Ye 2012 [5]). The semi-infinite constraints ξ T Yξ + ξ T y + y 0 c i ξ T x + d i, ξ S, i = 1,..., M, (6.7) can be reformulated as the following semidefinite constraints. (1) If S = R n, we can reformulate (6.7) as: ( 1 Y (y c ) 2 ix) 1 (y c 2 ix) T 0, i = 1,..., M. (6.8) y 0 d i (2) If S = {ξ : (ξ ξ 0 ) T Θ(ξ ξ 0 ) 1}, we can reformulate (6.7) as: ( 1 Y (y c ) ( ) 2 ix) Θ Θθ0 1 (y c 2 ix) T τ y 0 d i i θ T 0 Θ θ T, τ 0 Θθ 0 1 i 0, i = 1,..., M, (6.9) where θ 0 is given and Θ 0 has at least one strictly positive eigenvalue. 18
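Lemma 5 is the step that turns the semi-infinite constraints (6.7) into linear matrix inequalities. The following Python sketch, written with the CVXPY modeling package (our choice of tool, not the paper's; the paper's experiments use SeDuMi), builds the constraints (6.8) for the full-support case S = R^n; the function and variable names are illustrative assumptions.

```python
import cvxpy as cp

def lmi_constraints_full_support(Y, y, y0, x, c, d):
    """Build constraints (6.8): for each utility piece i, the block matrix
    [[Y, (y - c_i*x)/2], [((y - c_i*x)/2)^T, y0 - d_i]] must be PSD (case S = R^n)."""
    n = Y.shape[0]
    constraints = []
    for c_i, d_i in zip(c, d):
        col = cp.reshape(y - c_i * x, (n, 1)) / 2
        block = cp.bmat([[Y, col],
                         [col.T, cp.reshape(y0 - d_i, (1, 1))]])
        constraints.append(block >> 0)   # block is symmetric by construction
    return constraints
```

Here Y, y, y0, x would be CVXPY variables of the appropriate shapes (for example Y = cp.Variable((n, n), symmetric=True)), and c, d are the lists of coefficients of the utility pieces.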

By the following theorem applying Lemma 5 to (6.4)-(6.6), we can reformulate (6.1)-(6.3) as a two-stage semidefinite program. Theorem 7 Let Assumption 1 and Assumption 6 be satisfied. If (i) S 1 = R n 1 and S 2 = R n 2 ; or (ii) S 1 = {p : (p p 0 ) T Q 1 (p p 0 ) 1}, S 2 = {q : (q q 0 ) T Q 2 (q q 0 ) 1}, where p 0, Q 1, q 0 and Q 2 are given and matrices Q 1 and Q 2 have at least one strictly positive eigenvalue, then the two-stage moment robust problem (6.1)-(6.3) is equivalent to the two-stage semidefinite programg problem: where where x,y 1,y 1,y 0 1,t 1 y 0 1 + t 1 + G(x), (6.10) ( y 1 c i x ) Y1 2 s.t. (y 1 c i x) T τ y 0 i B 1, for i = 1,..., M, 2 1 d i t 1 (β 1 Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + Σ α 1 2 1 1 (y 1 + 2Y 1 µ 1 ), G(x) := Y 1 0, x X, G k (x) := s.t. K π k G k (x), (6.11) k=1 w k,y 2,k,y 2,k,y 0 2,k,t 2,k y 0 2,k + t 2,k, (6.12) ( ) y 2,k c i w k Y2,k 2 (y 2,k w k ) T y 0 2 2,k d τ i B 2 for i = 1,..., M, i t 2,k (β 2 Σ 2,k + µ 2,k µ T 2,k) Y 2,k + µ T 2,ky 2,k + Σ α 1 2 2 2,k (y 2,k + 2Y 2,k µ 2,k ), Y 2,k 0, w k X, (1) B 1 = B 2 = 0 if S 1 = R n 1, S 2 = R n 2 ; ( ) ( ) Q1 Q (2) B 1 = 1 p 0 Q2 Q p T 0 Q 1 p T, B 0 Q 1 p 0 1 2 = 2 q 0 q T 0 Q 2 q T, if S 0 Q 2 q 0 1 1 = {p : (p p 0 ) T Q 1 (p p 0 ) 1}, S 2 = {q : (q q 0 ) T Q 2 (q q 0 ) 1}. Equations (6.10)-(6.12) are standard two-stage linear semidefinite programg problem. Two-stage linear semidefinite programs can be solved using interior decomposition methods in [11, 12, 10]. The performance will be demonstrated in our context in Section 6.2. Another interesting case of model (6.1)-(6.6) is to assume that the ambiguity sets P 1 and P 2,k are described by the exact mean vector and covariance matrix, i.e. P 1 and P 2,k are given as: P 1 :={P : P M 1, E P [1] = 1, E P [ p] = µ 1, E P [ p p T ] = Σ 1 + µ 1 µ T 1, p S 1 }, (6.13) P 2,k :={P : P M 2, E P [1] = 1, E P [ q] = µ 2,k, E P [ q q T ] = Σ 2,k + µ 2,k µ T 2,k, q S 2 }. (6.14) 19

We continue to assume that Σ 1, Σ 2,k 0. We now analyze the two-stage moment robust model with piecewise linear objective and ambiguity sets P 1 and P 2,k defined in (6.13)- (6.14), and focus on the cases (i) S 1 = R n 1 and S 2 = R n 2, where n 1 = dim(x) and n 2 = dim(w k ); (ii) S 1 and S 2 are described by some ellipsoids, i.e. S 1 = {p : (p p 0 ) T Q 1 (p p 0 ) 1}, S 2 = {q : (q q 0 ) T Q 2 (q q 0 ) 1}, where p 0, Q 1, q 0 and Q 2 are given and matrices Q 1 and Q 2 have at least one strictly positive eigenvalue. We provide an equivalent two-stage semidefinite reformulation in the following theorem. Theorem 8 Assume Σ 1, Σ 2,k 0 for k. Consider the cases (i) S 1 = R n 1 and S 2 = R n 2 ; or (ii) S 1 = {p : (p p 0 ) T Q 1 (p p 0 ) 1}, S 2 = {q : (q q 0 ) T Q 2 (q q 0 ) 1}, where p 0, Q 1, q 0 and Q 2 are given and matrices Q 1 and Q 2 have at least one strictly positive eigenvalue. Then, the two-stage moment robust problem (6.1)-(6.3) is equivalent to the two-stage semidefinite programg problem: where x,y 1,y 1,y 0 1 s.t. (Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + y 0 1 + G(x), (6.15) ( y 1 c i x ) Y1 2 (y 1 c i x) T y 0 2 1 d i x X, G(x) := G k (x) := s.t. τ i B 1, for i = 1,..., M, K π k G k (x), (6.16) k=1 w k,y 2,k,y 2,k,y 0 2,k ( ) y 2,k c i w k Y2,k 2 (y 2,k w k ) T y 0 2 2,k d i w k W k (x), where B 1 and B 2 are defined in Theorem 7. (Σ 2,k + µ 2,k µ T 2,k) Y 2,k + µ T 2,ky 2,k + y 0 2,k, (6.17) τ i B 2 for i = 1,..., M, Proof Given x X, the dual of the inner problem (6.18) can be written as: max E P [U( p T x)] (6.18) P P 1 Y 1,y 1,y1 0 (Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + y1, 0 (6.19) s.t. p T Y 1 p + p T y 1 + y1 0 U(q T x) for q S 1. (6.20) Since U(z) = max i=1,...,m {c i z + d i }, constraint (6.20) is equivalent to: p T Y 1 p + p T y 1 + y 0 1 c i q T x + d i for q S 1, i = 1,..., M. (6.21) 20

Applying Theorem 5 to (6.21), we know that the inner problem (6.18) is equivalent to the semidefinite programg problem Y 1,y 1,y 0 1 s.t. (Σ 1 + µ 1 µ T 1 ) Y 1 + µ T 1 y 1 + y 0 1, (6.22) ( y 1 c i x ) Y1 2 (y 1 c i x) T y 0 2 1 d i τ i B 1, for i = 1,..., M. (6.23) Similarly, we can prove that for each given x X, k = 1,..., M, w k W k (x), the inner problem is equivalent to the semidefinite programg problem Y 2,k,y 2,k,y 0 2,k s.t. max E P [U( q T w k )] (6.24) P P 2,k (Σ 2,k + µ 2,k µ T 2,k) Y 2,k + µ T 2,ky 2,k + y 0 2,k, (6.25) ( ) y 2,k c i w k Y2,k 2 (y 2,k c i w k ) T y 0 2 2,k d i τ i B 2, for i = 1,..., M. (6.26) We can get the desired equivalent two-stage semidefinite programg reformulation (6.10)- (6.12) by combining the equivalent reformulations (6.22)-(6.23) and (6.25)-(6.26) with the outer problem. 6.2 A Two-Stage Portfolio Optimization Problem In this section we study a two-stage portfolio optimization problem based on model (6.1)- (6.3). The problem is described as follows. An investor needs to plan a portfolio of n assets for two periods. In the first period, the first two moments, µ 1 and Σ 1 + µ 1 µ T 1 are the known information of the probability distribution of the return vector p = (p 1,..., p n ) T. The probability distribution of the return vector p can be any probability measure in the ambiguity set P 1 defined in (6.13). The magnitude of the variation between the investment strategy x and initial strategy x 0 should not excede δ, i.e. x x 0 δ, to maintain investment stability and reduce trading costs. The investor reinvests in the n assets at the beginning of the second period. Similarly, the deviation between the investment strategy of this period and the strategy x of the last period should not exceed δ. The probability distribution of the return vector of the second period depends on some scenario ξ drawn from distribution D. Assume that the sample space of distribution D consists of K scenarios ξ 1,..., ξ K. Let the probability distribution of the return vector q = (q 1,..., q n ) T be in the ambiguity set P 2,k defined in (6.13). We assume that the investor is risk-averse and use a piecewise linear concave utility function as: u(y) = i {1,...,M} a T i y + b i. The total utility of the investor is the sum of utilities from both stages. We consider two options for the choices of S 1 and S 2 : either S 1 = S 2 = R n or S 1 = {p : (p p 0 ) T Q 1 (p p 0 ) 1}, S 2 = {q : (q q 0 ) T Q 2 (q q 0 ) 1}, where matrices Q 1 and Q 2 have at least one strictly 21

positive eigenvalue. Therefore, we model this investor s problem as: max E P [max a x P P 1 i i p T x b i ] + G(x), (6.27) s.t. x x 0 δ, e T x = 1, x l x x u, K G(x) := π k G k (x), (6.28) k=1 G k (x) := w k max E P [max a P P 2,k i i q T w ξ b i ], (6.29) s.t. x x 0 δ, e T w k = 1, w k,l w k w k,u, where e is the n-dimensional vector with 1 in each entry, and x l, x u, w k,l, w k,u are bounds on the investments. A direct application of Theorem 8 results in the following theorem. Theorem 9 Assume Σ 1, Σ 2,k 0 for k = 1,..., M. The two stage portfolio optimization problem (6.27)-(6.29) is equivalent to the two-stage stochastic semidefinite programg problem: where (Σ 1 + µ 1 µ T x,y 1,y 1,y1 0 1 ) Y 1 + µ T 1 y 1 + y1 0 + G(x), (6.30) ( 1 Y s.t. 1 (y ) 2 1 + a i x) 1 (y 2 1 + a i x) T y1 0 τ + b i B 1, i = 1,..., M, i x x 0 δ, e T x = 1, x l x x u, τ i 0, i = 1,..., M, K G(x) = π k G k (x), (6.31) k=1 G k (x) := (Σ 2,k + µ 2,k µ T 2,k) Y 2,k + µ T 2,ky 2,k + y2,k, 0 (6.32) w k,y 2,k,y 2,k,y2,k 0 ( 1 Y s.t. 2,k (y ) 2 2,k + a i w k ) 1 (y 2 2,k + a i w k ) T y2,i 0 η + b i,k B 2, i = 1,..., M, i w k x δ, e T w k = 1, w k,l w k w k,u, η i,k 0, i = 1,..., M, (1) B 1 = B 2 = 0 if S = R n ; ( ) ( ) Q1 Q (2) B 1 = 1 p 0 Q2 Q p T 0 Q 1 p T, B 0 Q 1 p 0 1 2 = 2 q 0 q T 0 Q 2 q T, if S 0 Q 2 q 0 1 1 = {p : (p p 0 ) T Q 1 (p p 0 ) 1}, S 2 = {q : (q q 0 ) T Q 2 (q q 0 ) 1}. 22
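To make Theorem 9 concrete, the sketch below assembles the first-stage part of (6.30) in CVXPY for the unbounded-support case S_1 = R^n (so B_1 = 0 and the multipliers τ_i drop out), taking the turnover constraint with the Euclidean norm. The recourse term 𝒢(x) is replaced by a single linear lower-bounding cut so that the example is self-contained; this cut, the function name, and the argument names are assumptions of this illustration, not the decomposition scheme of Section 5.

```python
import cvxpy as cp
import numpy as np

def first_stage_sdp(mu1, Sigma1, a, b, x0, delta, x_l, x_u, cut_g, cut_c):
    """First-stage problem (6.30) with S_1 = R^n (B_1 = 0).

    a, b   : coefficients of the concave utility u(y) = min_i {a_i*y + b_i}
    cut_g, cut_c : one linear cut  G(x) >= cut_g^T x + cut_c  standing in for
                   the recourse function (purely illustrative).
    """
    n = len(mu1)
    x = cp.Variable(n)
    Y1 = cp.Variable((n, n), symmetric=True)
    y1 = cp.Variable(n)
    y10 = cp.Variable()
    theta = cp.Variable()                       # epigraph variable for the recourse term

    cons = [Y1 >> 0,
            cp.norm(x - x0) <= delta, cp.sum(x) == 1,
            x >= x_l, x <= x_u,
            theta >= cut_g @ x + cut_c]
    for a_i, b_i in zip(a, b):                  # one LMI per utility piece
        col = cp.reshape(y1 + a_i * x, (n, 1)) / 2
        cons.append(cp.bmat([[Y1, col],
                             [col.T, cp.reshape(y10 + b_i, (1, 1))]]) >> 0)

    obj = cp.trace((Sigma1 + np.outer(mu1, mu1)) @ Y1) + mu1 @ y1 + y10 + theta
    prob = cp.Problem(cp.Minimize(obj), cons)
    prob.solve()
    return x.value, prob.value
```

prob.solve() relies on whatever SDP-capable solver is available to CVXPY (e.g., the default SCS); in the paper the corresponding two-stage conic programs are solved with SeDuMi.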

6.3 Computational Study

In this section the two-stage model (6.27)-(6.29) is numerically studied with practical data. We first choose a multivariate AR-GARCH model to forecast the second stage mean vector \mu_{2,k} and covariance matrix \Sigma_{2,k} from the first stage moments \mu_1 and \Sigma_1. We then compare the performance of our two-stage moment robust model with two other models, i.e., (1) a stochastic programming model and (2) a single stage moment robust model. Our empirical results suggest that our two-stage moment robust modeling framework performs better when we have predictive power.

6.3.1 Establishing Parameters of the Two-Stage Portfolio Optimization Model

In the numerical example, the vectors p and q in (6.27)-(6.29) are return vectors in periods t and t+1. The return process r_t is described by a multivariate AR-GARCH model as follows:

r_t = \phi_0 + \phi_1 r_{t-1} + \epsilon_t, \quad \epsilon_t \sim N(0, Q_t), \quad (6.33)

Q_t = CC^T + A^T \epsilon_{t-1} \epsilon_{t-1}^T A + B^T Q_{t-1} B. \quad (6.34)

The expected return r_t is predicted by using a multivariate AR model and the covariance is predicted by a multivariate BEKK GARCH model. The data set is a historical data set of 3 assets over a 7-year horizon (2006-2013), obtained from the Yahoo! Finance website. The 3 assets are: AAR Corp., Boeing Corp. and Lockheed Martin. The basic calibration strategy is to use least squares to fit the multivariate AR model, obtain the residuals \epsilon, and then use \epsilon as an input for the BEKK GARCH model [9]. The return model is used as follows. At the beginning of a certain day t, we estimate the expected return r_t and covariance matrix Q_t of the current day by using the data of the last 30 days. We then use the AR-GARCH model (6.33)-(6.34) to forecast Q_{t+1}, so that r_{t+1} follows the distribution N(\phi_0 + \phi_1 r_t, Q_{t+1}). We generate samples of r_{t+1} using an n-dimensional Sobol sequence [4, 6]: we first generate a set of K n-dimensional Sobol points (S_1, \dots, S_K), and then set \epsilon_i = Q_{t+1}^{1/2} \Phi^{-1}(S_i), i = 1, \dots, K, where \Phi is the cumulative normal distribution function. Finally, we generate K samples of r_{t+1}, i.e., r_{t+1,1}, \dots, r_{t+1,K}, by using (6.35):

r_{t+1,i} = \phi_0 + \phi_1 r_t + \epsilon_i, \quad i = 1, \dots, K. \quad (6.35)

At the beginning of day t, we optimize the investment strategy x by considering the forecasts of day t+1. We use the data from t-750 to t-1 (about 3 years) to calibrate the model (6.33)-(6.34), and we re-calibrate the model at the beginning of each day. The two-stage linear conic programming model (6.30)-(6.32) is solved by using SeDuMi [18]. In particular, the parameters are chosen as follows: S_1 = S_2 = R^n, \delta = 1, x_l = w_{k,l} = -e and x_u = w_{k,u} = e for all k. Note that the lower bounds x_l and w_{k,l} are negative since the investor is allowed to take a short position. We start from 2009 and solve the problem for each day in 2009-2012.
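The scenario-generation procedure just described (forecast Q_{t+1}, draw K Sobol points, map them through the componentwise normal inverse CDF, and form the return scenarios via (6.35)) can be sketched in Python as follows; the calibrated parameters phi0, phi1 and the forecast covariance Q_next are assumed to come from the AR/BEKK calibration step, which is not reproduced here.

```python
import numpy as np
from scipy.stats import norm, qmc

def generate_return_scenarios(phi0, phi1, r_t, Q_next, K, seed=0):
    """Generate K second-stage return scenarios r_{t+1,i} as in (6.35):
        r_{t+1,i} = phi0 + phi1 * r_t + eps_i,   eps_i = Q_{t+1}^{1/2} Phi^{-1}(S_i),
    where S_1, ..., S_K are points of an n-dimensional Sobol sequence."""
    n = len(r_t)
    sobol = qmc.Sobol(d=n, scramble=True, seed=seed)
    S = sobol.random(K)                        # K x n points in (0, 1)^n
    S = np.clip(S, 1e-6, 1 - 1e-6)             # guard against infinite quantiles
    Z = norm.ppf(S)                            # componentwise Phi^{-1}
    L = np.linalg.cholesky(Q_next)             # a square root of Q_{t+1}
    eps = Z @ L.T                              # K x n correlated shocks
    drift = phi0 + (phi1 @ r_t if np.ndim(phi1) == 2 else phi1 * r_t)
    return drift + eps                         # K x n matrix of scenarios
```

Each row of the returned matrix is one sampled second-period return scenario r_{t+1,k}, which feeds the second-stage moment information used in the experiments.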