OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY


OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY

By

BAHA M. ALZALG

A dissertation submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

WASHINGTON STATE UNIVERSITY
Department of Mathematics

DECEMBER 2011

To the Faculty of Washington State University:

The members of the Committee appointed to examine the dissertation of BAHA M. ALZALG find it satisfactory and recommend that it be accepted.

K. A. Ariyawansa, Professor, Chair
Robert Mifflin, Professor
David S. Watkins, Professor

For all the people

ACKNOWLEDGEMENTS

My greatest appreciation and my most sincere thank you go to my advisor, Professor Ari Ariyawansa, for his guidance, advice, and help during the preparation of this dissertation. I am also grateful for his offer for me to work as his research assistant and for giving me the opportunity to write papers and attend several conferences and workshops in North America and overseas. I also wish to express my appreciation and gratitude to Professor Robert Mifflin and Professor David S. Watkins for taking the time to serve as committee members and for the various ways in which they helped me during all stages of my doctoral studies. I want to thank all faculty, staff, and graduate students in the Department of Mathematics at Washington State University. I would especially like to thank, from the faculty, Associate Professor Bala Krishnamoorthy; from the staff, Kris Johnson; and from the students, Pietro Paparella, for their kind help. Finally, no words can express my gratitude to my parents and my grandmother for their love and prayers. I also owe special gratitude to my brothers and sisters, and to some relatives in Jordan, for their support and encouragement.

OPTIMIZATION OVER SYMMETRIC CONES UNDER UNCERTAINTY

Abstract

by Baha M. Alzalg, Ph.D.
Washington State University
December 2011

Chair: Professor K. A. Ariyawansa

We introduce and study two-stage stochastic symmetric programs (SSPs) with recourse to handle uncertainty in the data defining (deterministic) symmetric programs, in which a linear function is minimized over the intersection of an affine set and a symmetric cone. We present a logarithmic barrier decomposition-based interior point algorithm for solving these problems and prove its polynomial complexity. Our convergence analysis proceeds by showing that the log barrier associated with the recourse function of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family on the first-stage solutions. Since our analysis applies to all symmetric cones, this algorithm extends Zhao's results [48] for two-stage stochastic linear programs, and Mehrotra and Özevin's results [5] for two-stage stochastic semidefinite programs (SSDPs). We also present another class of polynomial-time decomposition algorithms for SSPs based on the volumetric barrier. While this extends the work of Ariyawansa and Zhu [10] for SSDPs, our analysis exploits the special algebraic structure associated with the symmetric cone, which is not utilized in [10]. As a consequence, we are able to significantly simplify the proofs of central results. We then describe four applications leading to the SSP problem where, in particular, the underlying symmetric cones are second-order cones and rotated quadratic cones.

Contents

Acknowledgements
Abstract

1 Introduction and Background
  1.1 Introduction
  1.2 What is a symmetric cone?
  1.3 Symmetric cones and Euclidean Jordan algebras

2 Stochastic Symmetric Optimization Problems
  2.1 The stochastic symmetric optimization problem
    2.1.1 Definition of an SSP in primal standard form
    2.1.2 Definition of an SSP in dual standard form
  2.2 Problems that can be cast as SSPs

3 A Class of Polynomial Logarithmic Barrier Decomposition Algorithms for Stochastic Symmetric Programming
  3.1 The log barrier problem for SSPs
    3.1.1 Formulation and assumptions
    3.1.2 Computation of $\nabla_x \eta(\mu, x)$ and $\nabla_{xx}^2 \eta(\mu, x)$
  3.2 Self-concordance properties of the log-barrier recourse
    3.2.1 Self-concordance of the recourse function
    3.2.2 Parameters of the self-concordant family
  3.3 A class of logarithmic barrier algorithms for solving SSPs
  3.4 Complexity analysis
    3.4.1 Complexity for short-step algorithm
    3.4.2 Complexity for long-step algorithm

4 A Class of Polynomial Volumetric Barrier Decomposition Algorithms for Stochastic Symmetric Programming
  4.1 The volumetric barrier problem for SSPs
    4.1.1 Formulation and assumptions
    4.1.2 The volumetric barrier problem for SSPs
    4.1.3 Computation of $\nabla_x \eta(\mu, x)$ and $\nabla_{xx}^2 \eta(\mu, x)$
  4.2 Self-concordance properties of the volumetric barrier recourse
    4.2.1 Self-concordance of $\eta(\mu, \cdot)$
    4.2.2 Parameters of the self-concordant family
  4.3 A class of volumetric barrier algorithms for solving SSPs
  4.4 Complexity analysis
    4.4.1 Complexity for short-step algorithm
    4.4.2 Complexity for long-step algorithm

5 Some Applications
  5.1 Two applications of SSOCPs
    5.1.1 Stochastic Euclidean facility location problem
    5.1.2 Portfolio optimization with loss risk constraints
  5.2 Two applications of SRQCPs
    5.2.1 Optimal covering random ellipsoid problem
    5.2.2 Structural optimization

6 Related Open Problems: Multi-Order Cone Programming Problems
  6.1 Multi-order cone programming problems
  6.2 Duality
  6.3 Multi-order cone programming problems over integers
  6.4 Multi-order cone programming problems under uncertainty
  6.5 An application
    6.5.1 CERFLPs: An MOCP model
    6.5.2 DERFLPs: A 0-1MOCP model
    6.5.3 ERFLPs with integrality constraints: An MIMOCP model
    6.5.4 Stochastic CERFLPs: An SMOCP model

7 Conclusion

List of Abbreviations

CERFLP  continuous Euclidean-rectilinear facility location problem
CFLP  continuous facility location problem
DERFLP  discrete Euclidean-rectilinear facility location problem
DFLP  discrete facility location problem
DLP  deterministic linear programming
DRQCP  deterministic rotated quadratic cone programming
DSDP  deterministic semidefinite programming
DSOCP  deterministic second-order cone programming
DSP  deterministic symmetric programming
EFLP  Euclidean facility location problem
ERFLP  Euclidean-rectilinear facility location problem
ESFLP  Euclidean single facility location problem
FLP  facility location problem
KKT  Karush-Kuhn-Tucker conditions
MFLP  multiple facility location problem
MIMOCP  mixed integer multi-order cone programming
MOCP  (deterministic) multi-order cone programming
0-1MOCP  0-1 multi-order cone programming
POCP  $p$th-order cone programming
RFLP  rectilinear facility location problem
SFLP  stochastic facility location problem
SLP  stochastic linear programming
SMOCP  stochastic multi-order cone programming
SRQCP  stochastic rotated quadratic cone programming
SSDP  stochastic semidefinite programming

SSOCP  stochastic second-order cone programming
SSP  stochastic symmetric programming

$\text{Arw}(x)$  the arrow-shaped matrix associated with the vector $x$
$\text{Aut}(K)$  the automorphism group of a cone $K$
$\text{diag}(\cdot)$  the operator that maps its argument to a block diagonal matrix
$GL(n, \mathbb{R})$  the general linear group of degree $n$ over $\mathbb{R}$
$\text{int}(K)$  the interior of a cone $K$
$E^n$  the $n$-dimensional real vector space whose elements are indexed from 0
$E^n_+$  the second-order cone of dimension $n$
$\hat{E}^n_+$  the rotated quadratic cone of dimension $n$
$H^n$  the space of complex Hermitian matrices of order $n$
$H^n_+$  the space of complex Hermitian semidefinite matrices of order $n$
$K_J$  the cone of squares of a Euclidean Jordan algebra $J$
$Q^n_p$  the $p$th-order cone of dimension $n$
$QH^n$  the space of quaternion Hermitian matrices of order $n$
$QH^n_+$  the space of quaternion Hermitian semidefinite matrices of order $n$
$S^n$  the space of real symmetric matrices of order $n$
$S^n_+$  the space of real symmetric positive semidefinite matrices of order $n$
$\mathbb{R}^n$  the space of real vectors of dimension $n$
$\mathbb{R}^n_+$  the nonnegative orthant of $\mathbb{R}^n$
$\|x\|_F$  the Frobenius norm of an element $x$
$\|x\|$  the Euclidean norm of a vector $x$
$\|x\|_p$  the $p$-norm of a vector $x$
$L(x)$  the linear representation of an element $x$
$P(x)$  the quadratic representation of an element $x$

$X \succeq 0$  the matrix $X$ is positive semidefinite
$X \succ 0$  the matrix $X$ is positive definite
$x \succeq 0$  the vector $x$ lies in a second-order cone of appropriate dimension
$x \succ 0$  the vector $x$ lies in the interior of a second-order cone of appropriate dimension
$x \succeq^N 0$  the vector $x$ lies in the Cartesian product of $N$ second-order cones with appropriate dimensions
$x \,\hat{\succeq}\, 0$  the vector $x$ lies in a rotated quadratic cone of appropriate dimension
$x \,\hat{\succeq}^N 0$  the vector $x$ lies in the Cartesian product of $N$ rotated quadratic cones with appropriate dimensions
$x \succeq_{K_J} 0$  $x$ is an element of a symmetric cone $K_J$
$x \succ_{K_J} 0$  $x$ is an element of the interior of a symmetric cone $K_J$
$x \succeq_p 0$  the vector $x$ lies in a $p$th-order cone of appropriate dimension
$x \succeq^N_p 0$  the vector $x$ lies in the Cartesian product of $N$ $p$th-order cones with appropriate dimensions
$x \succeq_{p_1, p_2, \ldots, p_N} 0$  the vector $x$ lies in the Cartesian product of $N$ cones of orders $p_1, p_2, \ldots, p_N$ and with appropriate dimensions
$x \circ y$  the Jordan multiplication of elements $x$ and $y$ of a Jordan algebra
$x \bullet y$  the inner product $\text{trace}(x \circ y)$ of elements $x$ and $y$ of a Euclidean Jordan algebra

Chapter 1

Introduction and Background

1.1 Introduction

The purpose of this dissertation is to introduce two-stage stochastic symmetric programs (SSPs)¹ with recourse and to study this problem in the dual standard form:

$$\begin{array}{ll} \max & c^T x + E[Q(x, \omega)] \\ \text{s.t.} & Ax + \xi = b \\ & \xi \in K_1, \end{array} \qquad (1.1.1)$$

where $x$ and $\xi$ are the first-stage decision variables, and $Q(x, \omega)$ is the optimal value of the problem

$$\begin{array}{ll} \max & d(\omega)^T y \\ \text{s.t.} & W(\omega) y + \zeta = h(\omega) - T(\omega) x \\ & \zeta \in K_2, \end{array} \qquad (1.1.2)$$

¹ Following tradition in the optimization literature, we use the term stochastic symmetric program to mean the generic form of a problem, and the term stochastic symmetric programming to mean the field of activities based on that problem. While both are denoted by the acronym SSP, the plural of the first usage is denoted by the acronym SSPs. The acronyms DLP, DRQCP, DSDP, DSOCP, DSP, MIMOCP, MOCP, 0-1MOCP, SLP, SMOCP, SRQCP, SSDP, and SSOCP are defined and used in accordance with this custom.

where $y$ and $\zeta$ are the second-stage variables, $E[Q(x, \omega)] := \int_\Omega Q(x, \omega) \, P(d\omega)$, the matrix $A$ and the vectors $b$ and $c$ are deterministic data, and the matrices $W(\omega)$ and $T(\omega)$ and the vectors $h(\omega)$ and $d(\omega)$ are random data whose realizations depend on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$. The cones $K_1$ and $K_2$ are symmetric cones (i.e., closed, convex, pointed, self-dual cones whose automorphism groups act transitively on their interiors) in $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$, respectively. (Here, $n_1$ and $n_2$ are positive integers.)

The birth of symmetric programming (also known as symmetric cone programming [36]) as a subfield of convex optimization can be dated back to 2003. The main motivation for this generalization was its ability to handle many important applications that cannot be covered by linear programming. In symmetric programs, we minimize a linear function over the intersection of an affine set and a so-called symmetric cone. In particular, if the symmetric cone is the nonnegative orthant, the result is linear programming; if it is the second-order cone, the result is second-order cone programming; and if it is the semidefinite cone (the cone of all real symmetric positive semidefinite matrices), the result is semidefinite programming. Thus symmetric programming is a generalization of linear programming that includes second-order cone programming and semidefinite programming as special cases. We shall refer to such problems as deterministic linear programs (DLPs), deterministic second-order cone programs (DSOCPs), deterministic semidefinite programs (DSDPs), and deterministic symmetric programs (DSPs), because the data defining applications leading to such problems are assumed to be known with certainty. DSPs are related to many application areas including finance, geometry, robust linear programming, matrix optimization, norm minimization problems, and relaxations in combinatorial optimization.
We refer the reader to the survey papers by Todd [38] and Vandenberghe and Boyd [41], which discuss, in particular, DSDP and its applications, and the survey papers of Alizadeh and Goldfarb [1] and Lobo et al. [3], which discuss, in particular, DSOCPs with a number of applications in many areas

including a variety of engineering applications.

Deterministic optimization problems are formulated to find optimal decisions in problems whose data are known with certainty. In some applications, however, we cannot specify the model entirely because it depends on information that is not available at the time of formulation and will only be determined at some point in the future. Stochastic programs have been studied since the 1950s to handle problems that involve uncertainty in data; see [17, 3, 33] and the references contained therein. In particular, two-stage stochastic linear programs (SLPs) have been established to formulate many applications (see [15] for example) of linear programming with uncertain data. There are efficient algorithms (both interior and noninterior point) for solving SLPs. The class of SSP problems may be viewed as an extension of DSPs (by allowing uncertainty in data) on the one hand, and as an extension of SLPs (where $K_1$ and $K_2$ are both nonnegative orthants) or, more generally, of stochastic semidefinite programs (SSDPs) with recourse [9, 5] (where $K_1$ and $K_2$ are both semidefinite cones) on the other hand.

Interior point methods [30] are considered to be one of the most successful classes of algorithms for solving deterministic (linear and nonlinear) convex optimization problems. This provides motivation to investigate whether decomposition-based interior point algorithms can be developed for stochastic programming. Zhao [48] derived a decomposition algorithm for SLPs based on a logarithmic barrier and proved its polynomial complexity. Mehrotra and Özevin [5] proved important results that extend the work of Zhao [48] to the case of SSDPs, including the derivation of a polynomial logarithmic barrier decomposition algorithm for this class of problems that extends Zhao's algorithm for SLPs. An alternative to the logarithmic barrier is the volumetric barrier of Vaidya [40] (see also [5, 6, 7]).
It has been observed [8] that certain cutting plane algorithms [1] for SLPs based on the volumetric barrier perform better in practice than those based on the logarithmic barrier. Recently, Ariyawansa and Zhu [10] have derived a class of decomposition algorithms for SSDPs based on a volumetric barrier, analogous to the work of Mehrotra and

Özevin [5], by utilizing the work of Anstreicher [7] for DSDP. Concerning algorithms for DSPs, Schmieta and Alizadeh [35, 36] and Rangarajan [34] have extended interior point algorithms for DSDP to DSP. Concerning algorithms for SSPs, we know of no interior point algorithms for solving them that exploit the special structure of the symmetric cone as is done in [35, 36, 34] for DSP. The question that naturally arises is whether interior point methods can be derived for solving SSPs and, if the answer is affirmative, whether we can prove the polynomial complexity of the resulting algorithms. Our particular concern in this dissertation is to extend decomposition-based interior point methods for SSDPs to stochastic optimization problems over all symmetric cones, based on both logarithmic and volumetric barriers, and to prove the polynomial complexity of the resulting algorithms. In fact, there is a unifying theory [18] based on Euclidean Jordan algebras that connects all symmetric cones. This theory is very important for a sound understanding of the Jordan algebraic characterization of symmetric cones and, hence, a correct understanding of the very close equivalence between Problem (1.1.1)-(1.1.2) and our constructive definition of an SSP, which will be presented in Chapter 2.

Let us now indicate briefly how to solve Problem (1.1.1)-(1.1.2). We assume that the event space $\Omega$ is discrete and finite. In practice, the case of interest is when the random data have a finite number of realizations, because the inputs of the modeling process require that SSPs be solved for a finite number of scenarios. The general scheme of the decomposition-based interior point algorithms is as follows.
Given an SSP with a finite number of scenarios, we can explicitly formulate the problem as a large-scale DSP; we then use primal-dual interior point methods to solve the resulting large-scale formulation directly, which can be accomplished successfully by utilizing the special structure of the underlying symmetric cones.

This dissertation is organized as follows. In the remaining part of this chapter, we outline a minimal foundation of the theory of Euclidean Jordan algebras. Based on Euclidean Jordan algebras, in Chapter 2 we explicitly introduce the constructive definition of the SSP problem in both the primal and dual standard forms and then introduce six general classes of optimization problems that can be cast as SSPs. The focus of Chapter 3 is on extending the work of Mehrotra and Özevin [5] to the case of SSPs by deriving a class of logarithmic barrier decomposition algorithms for SSPs and establishing the polynomial complexity of the resulting algorithms. In Chapter 4, we extend the work of Ariyawansa and Zhu [10] to the case of SSPs by deriving a class of volumetric barrier decomposition algorithms for SSPs and proving polynomial complexity of certain members of the class of algorithms. Chapter 5 is devoted to describing four applications leading to two important special cases of SSPs. More specifically, we describe the stochastic Euclidean facility location problem and the portfolio optimization problem with loss risk constraints as two applications of SSPs in which the symmetric cones $K_1$ and $K_2$ are both second-order cones, and then describe the optimal covering random ellipsoid problem and an application in structural optimization as two applications of SSPs in which the symmetric cones $K_1$ and $K_2$ are both rotated quadratic cones (see Chapter 2 for definitions). The material in Chapters 3-5 is independent, but it depends essentially on the material of Chapters 1 and 2. So, after reading Chapter 2, one can proceed directly to Chapter 3 and/or Chapter 4 for theoretical results (algorithms and complexity analysis), and/or to Chapter 5 for application models (see Figure 1.1). In Chapter 6, we propose the so-called multi-order cone programming problem as a new conic optimization problem. This problem is beyond the scope of this dissertation because it is posed over non-symmetric cones, so we leave it as an unsolved open problem. We indicate weak and strong duality relations for this optimization problem and describe an application of it.
We conclude this dissertation in Chapter 7 by summarizing its contributions and indicating some possible future research directions.
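The finite-scenario scheme described earlier in this section can be made concrete in the simplest special case, where $K_1$ and $K_2$ are nonnegative orthants (i.e., an SLP): with finitely many scenarios, the problem collapses into one large deterministic LP with one block of second-stage variables per scenario. Below is a minimal numeric sketch of this extensive form, assuming NumPy and SciPy are available; all sizes and data are illustrative and not taken from the dissertation.

```python
import numpy as np
from scipy.optimize import linprog

# Extensive form of a toy two-stage stochastic LP (K1, K2 = nonnegative
# orthants): maximize c^T x + sum_k p_k d_k^T y_k subject to
#   A x + xi = b,  T_k x + W_k y_k + zeta_k = h_k,  xi, zeta_k >= 0,
# i.e. A x <= b and T_k x + W_k y_k <= h_k after eliminating the slacks.
rng = np.random.default_rng(1)
n1, n2, K = 3, 2, 4                      # illustrative sizes
c = np.ones(n1)
A, b = rng.random((2, n1)), np.ones(2)
scen = [dict(W=rng.random((2, n2)), T=rng.random((2, n1)),
             h=2 * np.ones(2), d=np.ones(n2)) for _ in range(K)]
p = np.full(K, 1.0 / K)                  # equally likely scenarios

# stacked variable (x, y_1, ..., y_K); linprog minimizes, so negate
obj = -np.concatenate([c] + [pk * s["d"] for pk, s in zip(p, scen)])
A_ub = np.zeros((2 + 2 * K, n1 + K * n2))
b_ub = np.zeros(2 + 2 * K)
A_ub[:2, :n1], b_ub[:2] = A, b
for k, s in enumerate(scen):
    r, c0 = 2 + 2 * k, n1 + k * n2
    A_ub[r:r + 2, :n1] = s["T"]          # first-stage columns, scenario rows
    A_ub[r:r + 2, c0:c0 + n2] = s["W"]   # this scenario's recourse block
    b_ub[r:r + 2] = s["h"]
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=(0, 10.0))  # box keeps toy bounded
assert res.success
x_opt = res.x[:n1]                       # first-stage decision
```

The block-angular sparsity visible in `A_ub` (first-stage columns coupling otherwise independent scenario blocks) is exactly what the decomposition algorithms of Chapters 3 and 4 exploit instead of solving the stacked problem directly.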

Figure 1.1: The material has been organized into seven chapters. After reading Chapter 2, the reader can proceed directly to Chapter 3, Chapter 4, and/or Chapter 5.

We gave a concise definition of symmetric cones immediately after Problem (1.1.1)-(1.1.2). In §1.2, we write down this definition more explicitly, and in §1.3 we review the theory of Euclidean Jordan algebras that connects all symmetric cones. The text of Faraut and Korányi [18] covers the foundations of this theory. In Chapter 2, the reader will see how general and abstract the problem is. In order to make our presentation concrete, we describe two interesting examples of symmetric cones in some detail throughout this chapter: the second-order cone and the cone of real symmetric positive semidefinite matrices. Our presentation in §1.2 and §1.3 mostly follows that of [18] and [36]; in particular, most examples in §1.3 are taken from [36].

We now introduce some notation that will be used in the sequel. We use $\mathbb{R}$ to denote the field of real numbers. If $A \subseteq \mathbb{R}^k$ and $B \subseteq \mathbb{R}^l$, then the Cartesian product of $A$ and $B$ is $A \times B := \{(x; y) : x \in A \text{ and } y \in B\}$. We use $\mathbb{R}^{m \times n}$ to denote the vector space of real $m \times n$ matrices. The matrices $0_n, I_n \in \mathbb{R}^{n \times n}$ denote, respectively, the zero and the identity matrices of order $n$ (we write $0_n$ and $I_n$ as $0$ and $I$ when $n$ is known from the context). All vectors we use are column vectors, with superscript $T$ indicating transposition. We use "," for adjoining vectors and matrices in a row, and ";" for adjoining them in a column. So, for example, if $x$, $y$, and $z$ are vectors, we have:

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = (x^T, y^T, z^T)^T = (x; y; z).$$

For each vector $x \in \mathbb{R}^n$ whose first entry is indexed with 0, we write $\bar{x}$ for the subvector consisting of entries 1 through $n-1$; therefore $x = (x_0; \bar{x}) \in \mathbb{R} \times \mathbb{R}^{n-1}$. We let $E^n$ denote the $n$-dimensional real vector space $\mathbb{R} \times \mathbb{R}^{n-1}$ whose elements $x$ are indexed from 0, and denote the space of real symmetric matrices of order $n$ by $S^n$.

1.2 What is a symmetric cone?

This section and the next are elementary. We start with some basic definitions, finally leading us to the definition of a symmetric cone.

Let $V$ be a finite-dimensional Euclidean vector space over $\mathbb{R}$ with inner product $\langle \cdot, \cdot \rangle$. A subset $S$ of $V$ is said to be convex if it is closed with respect to convex combinations of finite subsets of $S$; i.e., for any $\lambda \in (0, 1)$, $x, y \in S$ implies that $\lambda x + (1 - \lambda) y \in S$. A subset $K \subseteq V$ is said to be a cone if it is closed under scalar multiplication by positive real numbers, i.e., if for any $\lambda > 0$, $x \in K$ implies that $\lambda x \in K$. A convex cone is a cone that is also a convex set. A cone is said to be closed if it is closed with respect to taking limits, solid if it has a nonempty interior, pointed if it contains no lines (alternatively, if it does not contain two opposite nonzero vectors, so that the origin is an extreme point), and regular if it is a closed, convex, pointed, solid cone. We denote by $GL(n, \mathbb{R})$ the general linear group of degree $n$ over $\mathbb{R}$ (i.e., the set of $n \times n$ invertible matrices with entries from $\mathbb{R}$, together with the operation of ordinary matrix multiplication). For a regular cone $K \subseteq V$, we denote by $\text{int}(K)$ the interior of $K$, and by $\text{Aut}(K)$ the automorphism group of $K$, i.e., $\text{Aut}(K) := \{\varphi \in GL(n, \mathbb{R}) : \varphi(K) = K\}$.

Definition. Let $V$ be a finite-dimensional real Euclidean space. A regular cone $K \subseteq V$

is said to be homogeneous if for each $u, v \in \text{int}(K)$, there exists an invertible linear map $\varphi : V \to V$ such that

1. $\varphi(K) = K$, i.e., $\varphi$ is an automorphism of $K$, and
2. $\varphi(u) = v$.

In other words, $\text{Aut}(K)$ acts transitively on the interior of $K$. A regular cone $K \subseteq V$ is said to be self-dual if it coincides with its dual cone $K^*$, i.e.,

$$K = K^* := \{s \in V : \langle x, s \rangle \geq 0, \ \forall x \in K\},$$

and symmetric if it is both homogeneous and self-dual.

Almost all conic optimization problems in real-world applications are associated with symmetric cones, such as the nonnegative orthant, the second-order cone (see below for definitions), and the cone of positive semidefinite matrices over the real or complex numbers. Except for Chapter 6, which proposes an unsolved open optimization problem over a non-symmetric cone, our focus in this dissertation is on cones that are symmetric.

The material in the rest of this section is not needed later, because the theory of Jordan algebras will lead us independently to the same results. However, we include this material for the sake of clarity of the examples and for completeness.

Example 1. The second-order cone $E^n_+$. We show that the second-order cone (also known as the quadratic, Lorentz, or ice cream cone) of dimension $n$,

$$E^n_+ := \{\xi \in E^n : \xi_0 \geq \|\bar{\xi}\|\},$$

with the standard inner product $\langle \xi, \zeta \rangle := \xi^T \zeta$, is a symmetric cone.
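Before the proof, the two defining inequalities can be sanity-checked numerically. The sketch below (assuming NumPy; the helper names are ours) samples points of $E^5_+$ and checks membership together with the nonnegativity of pairwise inner products implied by self-duality.

```python
import numpy as np

rng = np.random.default_rng(2)

def in_soc(xi, tol=1e-12):
    # membership in E^n_+ : xi_0 >= ||xi_bar||
    return xi[0] >= np.linalg.norm(xi[1:]) - tol

def random_soc_point(n):
    # a point with xi_0 = ||xi_bar|| + slack, hence in the cone
    bar = rng.standard_normal(n - 1)
    return np.concatenate(([np.linalg.norm(bar) + rng.random()], bar))

# self-duality sanity check: <xi, zeta> >= 0 for all xi, zeta in the cone
for _ in range(1000):
    xi, zeta = random_soc_point(5), random_soc_point(5)
    assert in_soc(xi) and in_soc(zeta)
    assert xi @ zeta >= -1e-9
```

The inner-product inequality follows from Cauchy-Schwarz: $\xi_0 \zeta_0 + \bar{\xi}^T \bar{\zeta} \geq \|\bar{\xi}\| \|\bar{\zeta}\| - \|\bar{\xi}\| \|\bar{\zeta}\| = 0$.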

In Lemma 6.2.1, we prove that the dual cone of the $p$th-order cone is the $q$th-order cone, i.e.,

$$\{\xi \in E^n : \xi_0 \geq \|\bar{\xi}\|_p\}^* = \{\xi \in E^n : \xi_0 \geq \|\bar{\xi}\|_q\},$$

where $1 \leq p \leq \infty$ and $q$ is conjugate to $p$ in the sense that $1/p + 1/q = 1$. Picking $p = 2$ (hence $q = 2$) implies that $(E^n_+)^* = E^n_+$. This demonstrates the self-duality of $E^n_+$. So, to prove that the cone $E^n_+$ is symmetric, we only need to show that it is homogeneous. The proof follows that of Example 1 in [18, Chapter I]. First, notice that $E^n_+$ can be redefined as

$$E^n_+ := \{\xi \in E^n : \xi^T J \xi \geq 0, \ \xi_0 \geq 0\}, \quad \text{where } J := \begin{pmatrix} 1 & 0^T \\ 0 & -I_{n-1} \end{pmatrix}.$$

Notice also that each element of the group $G := \{A \in \mathbb{R}^{n \times n} : A^T J A = J\}$ maps $E^n_+$ onto itself (because, for every $A \in G$, we have that $(A\xi)^T J (A\xi) = \xi^T J \xi$), and so does the direct product $H := \mathbb{R}_+ \times G$. It now remains to show that the group $H$ acts transitively on the interior of $E^n_+$. To do so, it is enough to show that, for any $x \in \text{int}(E^n_+)$, there exists an element of $H$ that maps $e$ to $x$, where $e := (1; 0) \in E^n$. Note that we may write $x$ as $x = \lambda y$ with $\lambda = \sqrt{x^T J x}$ and $y \in E^n$. Moreover, there exists a reflector matrix $Q$ such that $\bar{y} = Q(0; \ldots; 0; r)$, with $r = \|\bar{y}\|$. We then have

$$y_0^2 - r^2 = y_0^2 - \|\bar{y}\|^2 = y^T J y = \frac{1}{\lambda^2} \, x^T J x = 1.$$

Therefore, there exists $t \geq 0$ such that $y_0 = \cosh t$ and $r = \sinh t$.
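This construction is completely explicit, so it can be verified numerically, anticipating the matrices $\hat{Q}$ and $H_t$ defined next. A sketch assuming NumPy (the function names are ours): it builds $\lambda$, a Householder reflector $Q$, and the hyperbolic rotation $H_t$ from an interior point $x$, and checks that the resulting group element maps $e$ to $x$ and that both factors satisfy $A^T J A = J$.

```python
import numpy as np

def householder_to_last(v):
    # symmetric orthogonal Q with Q @ (0, ..., 0, ||v||) == v
    n, r = len(v), np.linalg.norm(v)
    e_last = np.zeros(n); e_last[-1] = 1.0
    w = v - r * e_last
    if np.linalg.norm(w) < 1e-12:
        return np.eye(n)
    w /= np.linalg.norm(w)
    return np.eye(n) - 2.0 * np.outer(w, w)

def boost_to(x):
    # returns lam, Qhat, H_t, J with x = lam * Qhat @ H_t @ e, e = (1, 0, ..., 0)
    n = len(x)
    J = np.diag([1.0] + [-1.0] * (n - 1))
    lam = np.sqrt(x @ J @ x)             # x interior => x^T J x > 0
    y = x / lam                          # then y^T J y = 1, so y_0 = cosh(t)
    t = np.arccosh(y[0])
    H = np.eye(n)
    H[0, 0] = H[-1, -1] = np.cosh(t)
    H[0, -1] = H[-1, 0] = np.sinh(t)
    Q = householder_to_last(y[1:])       # maps (0, ..., 0, r) to y_bar
    Qhat = np.block([[np.ones((1, 1)), np.zeros((1, n - 1))],
                     [np.zeros((n - 1, 1)), Q]])
    return lam, Qhat, H, J

x = np.array([3.0, 1.0, -1.0, 0.5])      # x_0 > ||x_bar||: an interior point
lam, Qhat, H, J = boost_to(x)
e = np.zeros(4); e[0] = 1.0
assert np.allclose(lam * Qhat @ H @ e, x)   # the group element maps e to x
assert np.allclose(Qhat.T @ J @ Qhat, J)    # Qhat lies in G
assert np.allclose(H.T @ J @ H, J)          # H_t lies in G
```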

We define

$$\hat{Q} := \begin{pmatrix} 1 & 0^T \\ 0 & Q \end{pmatrix} \quad \text{and} \quad H_t := \begin{pmatrix} \cosh t & 0^T & \sinh t \\ 0 & I_{n-2} & 0 \\ \sinh t & 0^T & \cosh t \end{pmatrix}.$$

Observing that $\hat{Q}, H_t \in G$ yields that $\hat{Q} H_t \in G$, and therefore $\lambda \hat{Q} H_t \in H$. The result follows by noting that $x = \lambda \hat{Q} H_t e$.

Example 2. The cone of real symmetric positive semidefinite matrices $S^n_+$. The interior of the cone of real symmetric positive semidefinite matrices of order $n$, $S^n_+$, consists of the positive definite matrices; the boundary of this cone consists of the singular positive semidefinite matrices (those having at least one zero eigenvalue); and the apex of the cone is the zero matrix, the positive semidefinite matrix whose eigenvalues are all zero. We now show that $S^n_+$, with the Frobenius inner product $\langle U, V \rangle := \text{trace}(UV)$, is a symmetric cone.

We first verify that $S^n_+$ is self-dual, i.e.,

$$S^n_+ = (S^n_+)^* = \{U \in S^n : \text{trace}(UV) \geq 0 \text{ for all } V \in S^n_+\}.$$

We can easily prove that $S^n_+ \subseteq (S^n_+)^*$ by assuming that $X \in S^n_+$ and observing that, for any $Y \in S^n_+$, we have

$$\text{trace}(XY) = \text{trace}(X^{1/2} Y X^{1/2}) \geq 0.$$

Here we used the positive semidefiniteness of $X^{1/2} Y X^{1/2}$ to obtain the inequality. Indeed, any $U \in S^n$ can be written as $U = Q \Lambda Q^T$, where $Q \in \mathbb{R}^{n \times n}$ is an orthogonal matrix and $\Lambda \in \mathbb{R}^{n \times n}$ is a diagonal matrix whose diagonal entries are the eigenvalues of $U$ (see, for example, Watkins [43, Theorem 5.4.20]). Therefore, if $U \in S^n_+$ we have that

$$\text{trace}(U) = \text{trace}(Q \Lambda Q^T) = \text{trace}(\Lambda Q Q^T) = \text{trace}(\Lambda) = \sum_{i=1}^n \lambda_i \geq 0.$$
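The inclusion $S^n_+ \subseteq (S^n_+)^*$ argued above can be checked numerically. A sketch assuming NumPy (the square-root helper is ours), sampling random PSD pairs and confirming both the cyclic-trace identity and the nonnegativity it delivers:

```python
import numpy as np

rng = np.random.default_rng(0)

def psd_sqrt(X):
    # symmetric square root via the spectral decomposition X = Q diag(w) Q^T
    w, Q = np.linalg.eigh(X)
    return Q @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ Q.T

for _ in range(100):
    A = rng.standard_normal((5, 5)); X = A @ A.T      # random PSD matrices
    B = rng.standard_normal((5, 5)); Y = B @ B.T
    R = psd_sqrt(X)
    # trace(XY) = trace(X^{1/2} Y X^{1/2}) >= 0: the PSD trace argument above
    assert np.isclose(np.trace(X @ Y), np.trace(R @ Y @ R))
    assert np.trace(X @ Y) >= -1e-9
```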

To prove that $(S^n_+)^* \subseteq S^n_+$, assume that $X \notin S^n_+$; then there exists a nonzero vector $y \in \mathbb{R}^n$ such that

$$\text{trace}(X \, y y^T) = y^T X y < 0,$$

which shows that $X \notin (S^n_+)^*$, since $y y^T \in S^n_+$. So, to prove that the cone $S^n_+$ is symmetric, it remains to show that it is homogeneous. The proof follows exactly that of Example 2 in [18, Chapter I]. For $P \in GL(n, \mathbb{R})$, we define a linear map $\varphi_P : S^n \to S^n$ by $\varphi_P(X) = P X P^T$. Note that $\varphi_P$ maps $S^n_+$ into itself. By the Cholesky decomposition theorem, if $X \in \text{int}(S^n_+)$ then $X$ can be decomposed into a product

$$X = L L^T = \varphi_L(I_n),$$

where $L \in GL(n, \mathbb{R})$, which establishes the result.

1.3 Symmetric cones and Euclidean Jordan algebras

In this section we review some of the basic definitions and theory of Euclidean Jordan algebras that are necessary for our subsequent development. We will also see how Euclidean Jordan algebras can be used to obtain symmetric cones.

Let $J$ be a finite-dimensional vector space over $\mathbb{R}$. A map $\circ : J \times J \to J$ is called bilinear if for all $x, y, z \in J$ and for all $\alpha, \beta \in \mathbb{R}$, we have that

$$(\alpha x + \beta y) \circ z = \alpha (x \circ z) + \beta (y \circ z) \quad \text{and} \quad x \circ (\alpha y + \beta z) = \alpha (x \circ y) + \beta (x \circ z).$$

Definition. A finite-dimensional vector space $J$ over $\mathbb{R}$ is called an algebra over $\mathbb{R}$ if such a bilinear map $\circ : J \times J \to J$ exists. Let $x$ be an element in an algebra $J$; then for $n \geq 2$ we define $x^n$ recursively by $x^n := x \circ x^{n-1}$.

Definition. Let $J$ be a finite-dimensional $\mathbb{R}$-algebra with a bilinear product $\circ : J \times J \to J$. Then $(J, \circ)$ is called a Jordan algebra if for all $x, y \in J$:

1. $x \circ y = y \circ x$ (commutativity);
2. $x^2 \circ (x \circ y) = x \circ (x^2 \circ y)$ (Jordan's axiom).

The product $x \circ y$ between two elements $x$ and $y$ of a Jordan algebra $(J, \circ)$ is called the Jordan multiplication of $x$ and $y$. A Jordan algebra $(J, \circ)$ has an identity element if there exists a (necessarily unique) element $e \in J$ such that $x \circ e = e \circ x = x$ for all $x \in J$. A Jordan algebra $(J, \circ)$ is not necessarily associative; that is, $x \circ (y \circ z) = (x \circ y) \circ z$ may not hold in general. However, it is power associative, i.e., $x^p \circ x^q = x^{p+q}$ for all integers $p, q \geq 1$.

Example 3. It is easy to see that the space $\mathbb{R}^{n \times n}$ of $n \times n$ real matrices with the Jordan multiplication $X \circ Y := (XY + YX)/2$ forms a Jordan algebra with identity $I_n$.

Example 4. It can be verified that the space $E^n$ with the Jordan multiplication $x \circ y := (x^T y; \, x_0 \bar{y} + y_0 \bar{x})$ forms a Jordan algebra with identity $(1; 0) \in E^n$.

Definition. A Jordan algebra $J$ is called Euclidean if there exists an inner product $\langle \cdot, \cdot \rangle$ on $(J, \circ)$ such that for all $x, y, z \in J$:

1. $\langle x, x \rangle > 0$ for $x \neq 0$ (positive definiteness);
2. $\langle x, y \rangle = \langle y, x \rangle$ (symmetry);
3. $\langle x, y \circ z \rangle = \langle x \circ y, z \rangle$ (associativity).

That is, $J$ admits a positive definite, symmetric, quadratic form which is also associative. In the sequel, we consider only Euclidean Jordan algebras with identity. We simply denote the Euclidean Jordan algebra $(J, \circ)$ by $J$.

Example 5. The space $\mathbb{R}^{n \times n}$ is not a Euclidean Jordan algebra. However, under the operation defined in Example 3, the subspace $S^n$ is a Jordan subalgebra of $\mathbb{R}^{n \times n}$ and,

indeed, is a Euclidean Jordan algebra with the inner product $\langle X, Y \rangle = \text{trace}(X \circ Y) = \text{trace}(XY)$. While both the symmetry and the associativity can be easily proved using the fact that $\text{trace}(XY) = \text{trace}(YX)$, the positive definiteness follows immediately by observing that $\text{trace}(X^2) > 0$ for $X \neq 0$.

Example 6. It is easy to verify that the space $E^n$ (with $\circ$ defined as in Example 4) is a Euclidean Jordan algebra with the inner product $\langle x, y \rangle = x^T y$.

Many of the results below also hold in the general setting of Jordan algebras, but here we focus entirely on Euclidean Jordan algebras with identity, as that generality is not needed for our subsequent development. Since a Euclidean Jordan algebra $J$ is power associative, we can define the concepts of rank, minimal and characteristic polynomials, eigenvalues, trace, and determinant for it.

Definition. Let $J$ be a Euclidean Jordan algebra. Then

1. for $x \in J$, $\deg(x) := \min \{r > 0 : \{e, x, x^2, \ldots, x^r\} \text{ is linearly dependent}\}$ is called the degree of $x$;
2. $\text{rank}(J) := \max_{x \in J} \deg(x)$ is called the rank of $J$.

Let $x$ be an element of degree $d$ in a Euclidean Jordan algebra $J$. We define $\mathbb{R}[x]$ as the set of all polynomials in $x$ over $\mathbb{R}$:

$$\mathbb{R}[x] := \left\{ \sum_{i=0}^{\infty} a_i x^i : a_i \in \mathbb{R}, \ a_i = 0 \text{ for all but a finite number of } i \right\}.$$

Since $\{e, x, x^2, \ldots, x^d\}$ is linearly dependent, there exists a nonzero monic polynomial $q \in \mathbb{R}[x]$ of degree $d$ such that $q(x) = 0$. In other words, there are real numbers $a_1(x), a_2(x), \ldots, a_d(x)$, not all zero, such that

$$q(x) := x^d - a_1(x) x^{d-1} + a_2(x) x^{d-2} - \cdots + (-1)^d a_d(x) e = 0.$$

Clearly $q$ is of minimal degree among those polynomials in $\mathbb{R}[x]$ which have the above properties, so we call it (or, alternatively, the polynomial $p(\lambda) := \lambda^d - a_1(x) \lambda^{d-1} + a_2(x) \lambda^{d-2} - \cdots + (-1)^d a_d(x)$) the minimal polynomial of $x$. Note that the minimal polynomial of an element $x \in J$ is unique, because otherwise, as it is monic, we would have a contradiction to the minimality of its degree. An element $x \in J$ is called regular if $\deg(x) = \text{rank}(J)$. If $x$ is a regular element of a Euclidean Jordan algebra, then we define its characteristic polynomial to be equal to its minimal polynomial. We have the following proposition.

Proposition ([18, Proposition II.2.1]). Let $J$ be an algebra with rank $r$. The set of regular elements is open and dense in $J$. There exist polynomials $a_1, a_2, \ldots, a_r$ such that the minimal polynomial of every regular element $x$ is given by

$$p(\lambda) = \lambda^r - a_1(x) \lambda^{r-1} + a_2(x) \lambda^{r-2} - \cdots + (-1)^r a_r(x).$$

The polynomials $a_1, a_2, \ldots, a_r$ are unique, and $a_i$ is homogeneous of degree $i$.

The polynomial $p(\lambda)$ is called the characteristic polynomial of the regular element $x$. Since the set of regular elements is dense in $J$, by continuity we can extend characteristic polynomials to all $x$ in $J$. So, the minimal polynomial coincides with the characteristic polynomial for regular elements, and divides the characteristic polynomial of non-regular elements.

Definition. Let $x$ be an element in a rank-$r$ algebra $J$; then its eigenvalues are the roots $\lambda_1, \lambda_2, \ldots, \lambda_r$ of its characteristic polynomial

$$p(\lambda) = \lambda^r - a_1(x) \lambda^{r-1} + a_2(x) \lambda^{r-2} - \cdots + (-1)^r a_r(x).$$

Whereas the minimal polynomial has only simple roots, it is possible, in the case of non-regular elements, that the characteristic polynomial has multiple roots. Indeed, the characteristic and minimal polynomials have the same set of roots, except for their multiplicities. In fact, we can define the degree of $x$ to be the number of distinct eigenvalues of $x$.
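The divisibility relation just stated can be observed concretely in $S^3$: a symmetric matrix with a repeated eigenvalue is non-regular, its minimal polynomial has degree 2 while its characteristic polynomial has degree 3, and the two share the same roots. A sketch, assuming NumPy:

```python
import numpy as np

# A non-regular element of S^3: repeated eigenvalue, so deg(X) = 2 < rank(S^3) = 3
X = np.diag([1.0, 1.0, 2.0])
eigs = np.linalg.eigvalsh(X)

char = np.poly(eigs)                     # characteristic polynomial, a double root at 1
distinct = np.unique(np.round(eigs, 9))  # minimal polynomial uses each root once
m_of_X = np.eye(3)
for lam in distinct:
    m_of_X = m_of_X @ (X - lam * np.eye(3))
assert np.allclose(m_of_X, 0)            # q(X) = 0 with deg q = 2

# same roots, different multiplicities
roots = np.round(np.real(np.roots(char)), 6)
assert set(roots) == set(np.round(distinct, 6))
```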

Definition. Let x be an element in a rank-r algebra J, and let λ_1, λ_2, ..., λ_r be the roots of its characteristic polynomial p(λ) = λ^r − a_1(x) λ^{r−1} + a_2(x) λ^{r−2} + ⋯ + (−1)^r a_r(x). Then 1. trace(x) := λ_1 + λ_2 + ⋯ + λ_r = a_1(x) is the trace of x in J; 2. det(x) := λ_1 λ_2 ⋯ λ_r = a_r(x) is the determinant of x in J. Example 7. All these concepts (characteristic polynomials, eigenvalues, trace, determinant, etc.) coincide with the corresponding concepts used in S^n. Observe that rank(S^n) = n because deg(X) is the number of distinct eigenvalues of X, which is, indeed, at most n. Example 8. We can easily verify the following quadratic identity for x ∈ E^n: x^2 − 2x_0 x + (x_0^2 − ‖x̄‖^2) e = 0. Thus, we can define the polynomial p(λ) := λ^2 − 2x_0 λ + (x_0^2 − ‖x̄‖^2) as the characteristic polynomial of x in E^n, and its two roots, λ_{1,2} = x_0 ± ‖x̄‖, are the eigenvalues of x. We also have trace(x) = 2x_0 and det(x) = x_0^2 − ‖x̄‖^2. Observe also that λ_1 = λ_2 if and only if x̄ = 0, and therefore x is a multiple of the identity. Thus, every x ∈ E^n \ {αe : α ∈ R} has degree 2. This implies that rank(E^n) = 2, which is, unlike the rank of S^n, independent of the dimension of the underlying vector space. For an element x ∈ J, let L(x) : J → J be the linear map defined by L(x)y := x ∘ y, for all y ∈ J. Note that L(x)e = x and L(x)x = x^2. Note also that the operators L(x) and L(x^2) commute, because, by Jordan's axiom, x ∘ (x^2 ∘ y) = x^2 ∘ (x ∘ y). Example 9. An equivalent way of dealing with symmetric matrices is dealing with vectors obtained from the vectorization of symmetric matrices. The operator vec : S^n → R^{n^2} creates a column vector from a matrix X by stacking its column vectors below one another.
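The formulas of Example 8 are easy to check numerically. The sketch below (assuming numpy; the helper name `soc_eigenvalues` is ours, not from the text) computes the two eigenvalues λ_{1,2} = x_0 ± ‖x̄‖ of an element of E^n and confirms the trace and determinant formulas:

```python
import numpy as np

def soc_eigenvalues(x):
    """Eigenvalues of x = (x0; xbar) in E^n: lambda_{1,2} = x0 +/- ||xbar||."""
    x0, xbar = x[0], x[1:]
    r = np.linalg.norm(xbar)
    return x0 + r, x0 - r

x = np.array([3.0, 1.0, 2.0, 2.0])   # x0 = 3, ||xbar|| = 3
l1, l2 = soc_eigenvalues(x)
print(l1, l2)        # 6.0 0.0  (x lies on the boundary of the cone)
print(l1 + l2)       # trace(x) = 2*x0 = 6.0
print(l1 * l2)       # det(x) = x0^2 - ||xbar||^2 = 0.0
```

Since λ_2 = 0 here, this particular x is positive semidefinite but not positive definite.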

Note that vec(X ∘ Y) = vec((XY + YX)/2) = (1/2)(vec(XYI) + vec(IYX)) = (1/2)(I ⊗ X + X ⊗ I) vec(Y) =: L(X) vec(Y), where we used the fact that vec(ABC) = (C^T ⊗ A) vec(B) to obtain the last equality (here, the operator ⊗ : R^{m×n} × R^{k×l} → R^{mk×nl} is the Kronecker product, which maps the pair of matrices (A, B) into the matrix A ⊗ B whose (i, j) block is a_{ij} B for i = 1, 2, ..., m and j = 1, 2, ..., n). This gives the explicit formula L(X) = (1/2)(I ⊗ X + X ⊗ I) of the operator L for S^n. Example 10. The explicit formula of the operator L for E^n can be immediately given by considering the following multiplication: x ∘ y = (x^T y; x_0 ȳ + y_0 x̄) = [x_0 x̄^T; x̄ x_0 I] y =: Arw(x) y. Here Arw(x) ∈ S^n is the arrow-shaped matrix associated with the vector x ∈ E^n, so L(x) = Arw(x). For x, y ∈ J, let Q_{x,y} : J → J be the quadratic operator defined by Q_{x,y} := L(x)L(y) + L(y)L(x) − L(x ∘ y). Therefore, in addition to L(x), we can define another linear map Q_x : J → J associated with x that is called the quadratic representation and defined by Q_x := 2L(x)^2 − L(x^2) = Q_{x,x}.
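Both explicit formulas for L can be verified numerically. The sketch below (assuming numpy; helper names are ours) checks vec(X ∘ Y) = (1/2)(I ⊗ X + X ⊗ I) vec(Y) in S^n and x ∘ y = Arw(x) y in E^n:

```python
import numpy as np

def jordan_sym(X, Y):
    return 0.5 * (X @ Y + Y @ X)               # X o Y in S^n

def L_vec(X):
    """L(X) = (1/2)(I kron X + X kron I) acting on vec'd matrices."""
    I = np.eye(X.shape[0])
    return 0.5 * (np.kron(I, X) + np.kron(X, I))

def arw(x):
    """Arrow-shaped matrix Arw(x) = [x0, xbar^T; xbar, x0 I]."""
    n = len(x)
    A = x[0] * np.eye(n)
    A[0, 1:] = x[1:]
    A[1:, 0] = x[1:]
    return A

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3)); X = (X + X.T) / 2
Y = rng.standard_normal((3, 3)); Y = (Y + Y.T) / 2
# vec stacks columns, i.e. column-major (Fortran) order
lhs = jordan_sym(X, Y).flatten(order="F")
rhs = L_vec(X) @ Y.flatten(order="F")
print(np.allclose(lhs, rhs))                   # True

x = np.array([2.0, 1.0, -1.0])
y = np.array([1.0, 3.0, 0.0])
xy = np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))
print(np.allclose(arw(x) @ y, xy))             # True
```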

Example 11. Continuing Example 9, we have Q_{X,Z} vec(Y) = L(X)L(Z) vec(Y) + L(Z)L(X) vec(Y) − L(X ∘ Z) vec(Y) = vec(X ∘ (Z ∘ Y)) + vec(Z ∘ (X ∘ Y)) − vec((X ∘ Z) ∘ Y) = vec((XZY + XYZ + ZYX + YZX)/4) + vec((ZXY + ZYX + XYZ + YXZ)/4) − vec((XZY + ZXY + YXZ + YZX)/4) = (1/2)(vec(XYZ) + vec(ZYX)) = (1/2)(Z ⊗ X + X ⊗ Z) vec(Y). This shows that Q_{X,Z} = (1/2)(Z ⊗ X + X ⊗ Z) and, in particular, Q_X = X ⊗ X. Example 12. Continuing Example 10, we can easily verify that Q_x := 2 Arw^2(x) − Arw(x^2) = [‖x‖^2 2x_0 x̄^T; 2x_0 x̄ det(x) I + 2 x̄ x̄^T]. Notice that Q_e = L(e) = I, trace(e) = r, and det(e) = 1 (since all the eigenvalues of e are equal to one). Notice also that the linear operator L(x) is symmetric with respect to ⟨·,·⟩ because, by the associativity of the inner product ⟨·,·⟩, we have ⟨L(x)y, z⟩ = ⟨x ∘ y, z⟩ = ⟨x, y ∘ z⟩ = ⟨y, L(x)z⟩. This implies that Q_x is also symmetric with respect to ⟨·,·⟩. It is also easy to see that L(y + z) = L(y) + L(z), and consequently Q_{x, y+z} = Q_{x,y} + Q_{x,z}. As the operator Q_X in Example 11 plays an important role in the development of interior point methods for DSDP (SSDP), and the operator Q_x in Example 12 plays an important role in the development of interior point methods for DSOCP (SSOCP), we would expect that the operator Q_x will play a similar role in the development of interior point methods for DSP (SSP). A spectral decomposition is a decomposition of x into idempotents together with the eigenvalues. Recall that two elements c_1, c_2 ∈ J are said to be orthogonal if c_1 ∘ c_2 = 0.
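The block formula of Example 12 can be confirmed against the definition Q_x = 2 Arw^2(x) − Arw(x^2). A quick check, assuming numpy (helper names are ours):

```python
import numpy as np

def arw(x):
    n = len(x)
    A = x[0] * np.eye(n)
    A[0, 1:] = x[1:]
    A[1:, 0] = x[1:]
    return A

def soc_square(x):
    """x o x = (||x||^2; 2 x0 xbar) in E^n."""
    return np.concatenate(([x @ x], 2 * x[0] * x[1:]))

def Q_explicit(x):
    """Block formula: Q_x = [||x||^2, 2 x0 xbar^T; 2 x0 xbar, det(x) I + 2 xbar xbar^T]."""
    x0, xbar = x[0], x[1:]
    n = len(x)
    det = x0**2 - xbar @ xbar
    Q = np.empty((n, n))
    Q[0, 0] = x @ x
    Q[0, 1:] = 2 * x0 * xbar
    Q[1:, 0] = 2 * x0 * xbar
    Q[1:, 1:] = det * np.eye(n - 1) + 2 * np.outer(xbar, xbar)
    return Q

x = np.array([2.0, 1.0, -1.0])
Q1 = 2 * arw(x) @ arw(x) - arw(soc_square(x))
print(np.allclose(Q1, Q_explicit(x)))              # True
e = np.array([1.0, 0.0, 0.0])
print(np.allclose(Q1 @ e, soc_square(x)))          # True: Q_x e = x^2
```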

A set of elements of J is orthogonal if all its elements are mutually orthogonal to each other. An element c ∈ J is said to be an idempotent if c^2 = c. An idempotent is primitive if it is nonzero and cannot be written as a sum of two (necessarily orthogonal) nonzero idempotents. Definition. Let J be a Euclidean Jordan algebra. Then a subset {c_1, c_2, ..., c_r} of J is called: 1. a complete system of orthogonal idempotents if it is an orthogonal set of idempotents with c_1 + c_2 + ⋯ + c_r = e; 2. a Jordan frame if it is a complete system of orthogonal primitive idempotents. Example 13. Let {q_1, q_2, ..., q_n} be an orthonormal subset of R^n (all its vectors are mutually orthogonal and of unit norm (length)). Then the set {q_1 q_1^T, q_2 q_2^T, ..., q_n q_n^T} is a Jordan frame in S^n. In fact, by the orthonormality, we have (q_i q_i^T) ∘ (q_j q_j^T) = 0_n if i ≠ j, and q_i q_i^T if i = j, and Σ_{i=1}^n q_i q_i^T = I_n. Example 14. Let x be a vector in E^n with x̄ ≠ 0. It is easy to see that the set { (1/2)(1; x̄/‖x̄‖), (1/2)(1; −x̄/‖x̄‖) } is a Jordan frame in E^n. Theorem (Spectral decomposition (I), [18]). Let J be a Euclidean Jordan algebra with rank r. Then for x ∈ J there exist real numbers λ_1, λ_2, ..., λ_r and a Jordan frame c_1, c_2, ..., c_r such that x = λ_1 c_1 + λ_2 c_2 + ⋯ + λ_r c_r, and λ_1, λ_2, ..., λ_r are the eigenvalues of x. It is immediately seen that the eigenvalues of elements of Euclidean Jordan algebras are always real, which is not the case for non-Euclidean Jordan algebras.

Example 15. It is known that for any X ∈ S^n there exist an orthogonal matrix Q ∈ R^{n×n} and a diagonal matrix Λ ∈ R^{n×n} such that X = QΛQ^T. In fact, the diagonal entries λ_1, λ_2, ..., λ_n of Λ and the columns q_1, q_2, ..., q_n of Q can be used to rewrite X equivalently as X = λ_1 q_1 q_1^T + λ_2 q_2 q_2^T + ⋯ + λ_n q_n q_n^T, with C_i := q_i q_i^T, which, in view of Example 13, gives the spectral decomposition (I) of X in S^n. Example 16. Using Example 14, the spectral decomposition (I) of x in E^n can be obtained by considering the following identity: x = (x_0 + ‖x̄‖) · (1/2)(1; x̄/‖x̄‖) + (x_0 − ‖x̄‖) · (1/2)(1; −x̄/‖x̄‖), with λ_1 := x_0 + ‖x̄‖, λ_2 := x_0 − ‖x̄‖, c_1 := (1/2)(1; x̄/‖x̄‖), and c_2 := (1/2)(1; −x̄/‖x̄‖). Theorem (Spectral decomposition (II), [18]). Let J be a Euclidean Jordan algebra. Then for x ∈ J there exist unique real numbers λ_1, λ_2, ..., λ_k, all distinct, and a unique complete system of orthogonal idempotents c_1, c_2, ..., c_k such that x = λ_1 c_1 + λ_2 c_2 + ⋯ + λ_k c_k. Continuing Examples 15 and 16, we have the following two examples [36]. Example 17. To write the spectral decomposition (II) of X in S^n, let λ_1 > λ_2 > ⋯ > λ_k be the distinct eigenvalues of X such that, for each i = 1, 2, ..., k, the eigenvalue λ_i has multiplicity m_i and orthonormal eigenvectors q_{i1}, q_{i2}, ..., q_{i m_i}. Then X can be written as X = λ_1 Σ_{j=1}^{m_1} q_{1j} q_{1j}^T + λ_2 Σ_{j=1}^{m_2} q_{2j} q_{2j}^T + ⋯ + λ_k Σ_{j=1}^{m_k} q_{kj} q_{kj}^T, with C_i := Σ_{j=1}^{m_i} q_{ij} q_{ij}^T, where the set {C_1, C_2, ..., C_k} is an orthogonal system of idempotents and Σ_{i=1}^k C_i = I_n. Notice that for each eigenvalue λ_i, the matrix C_i is uniquely determined, even though the

corresponding eigenvectors q_{i1}, q_{i2}, ..., q_{i m_i} may not be unique. Example 18. By Example 16, the eigenvalues of x ∈ E^n are λ_{1,2} = x_0 ± ‖x̄‖, and therefore, as mentioned earlier, only multiples of the identity have multiple eigenvalues. In fact, if x = αe for some α ∈ R, then its spectral decomposition (II) is simply αe (here {e} is the singleton system of orthogonal idempotents). Definition. Let J be a Euclidean Jordan algebra. We say that x ∈ J is positive semidefinite (positive definite) if all its eigenvalues are nonnegative (positive). Proposition ([18, Proposition III.2.2]). If the elements x and y are positive definite, then the element Q_x y is so. Definition. We say that two elements x and y of a Euclidean Jordan algebra are simultaneously decomposed if there is a Jordan frame {c_1, c_2, ..., c_r} such that x = Σ_{i=1}^r λ_i c_i and y = Σ_{i=1}^r μ_i c_i. For x ∈ J, it is now possible to rewrite the definition of x^2 as x^2 := λ_1^2 c_1 + λ_2^2 c_2 + ⋯ + λ_k^2 c_k = x ∘ x. We also have the following definition. Definition. Let x be an element of a Euclidean Jordan algebra J with a spectral decomposition x := λ_1 c_1 + λ_2 c_2 + ⋯ + λ_k c_k. Then 1. the square root of x is x^{1/2} := λ_1^{1/2} c_1 + λ_2^{1/2} c_2 + ⋯ + λ_k^{1/2} c_k, whenever all λ_i ≥ 0, and undefined otherwise; 2. the inverse of x is x^{−1} := λ_1^{−1} c_1 + λ_2^{−1} c_2 + ⋯ + λ_k^{−1} c_k, whenever all λ_i ≠ 0, and undefined otherwise.
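The Jordan frame of Example 16 and the definitions of x^{1/2} and x^{−1} can be exercised numerically. A sketch assuming numpy (helper names `soc_mul` and `soc_frame` are ours):

```python
import numpy as np

def soc_mul(x, y):
    """Jordan product on E^n: x o y = (x^T y; x0*ybar + y0*xbar)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def soc_frame(x):
    """Eigenvalues and Jordan frame of x in E^n (assumes xbar != 0)."""
    x0, xbar = x[0], x[1:]
    r = np.linalg.norm(xbar)
    c1 = 0.5 * np.concatenate(([1.0],  xbar / r))
    c2 = 0.5 * np.concatenate(([1.0], -xbar / r))
    return (x0 + r, c1), (x0 - r, c2)

x = np.array([3.0, 1.0, 2.0, 1.0])        # x0 = 3 > ||xbar|| = sqrt(6), so x is PD
(l1, c1), (l2, c2) = soc_frame(x)
e = np.array([1.0, 0.0, 0.0, 0.0])

print(np.allclose(l1 * c1 + l2 * c2, x))  # True: spectral decomposition (I)
print(np.allclose(soc_mul(c1, c1), c1))   # True: c1 is idempotent
print(np.allclose(soc_mul(c1, c2), 0*x))  # True: c1 o c2 = 0
print(np.allclose(c1 + c2, e))            # True: c1 + c2 = e

s = np.sqrt(l1) * c1 + np.sqrt(l2) * c2   # x^{1/2}
xinv = c1 / l1 + c2 / l2                  # x^{-1}
print(np.allclose(soc_mul(s, s), x))      # True: (x^{1/2})^2 = x
print(np.allclose(soc_mul(xinv, x), e))   # True: x^{-1} o x = e
```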

More generally, if f is any real-valued continuous function, then it is also possible to extend the above definition to define f(x) as f(x) := f(λ_1) c_1 + f(λ_2) c_2 + ⋯ + f(λ_k) c_k. Observe that x^{−1} ∘ x = e. We call x invertible if x^{−1} is defined, and non-invertible or singular otherwise. Note that every positive definite element is invertible and its inverse is also positive definite. Remark 1. The equality x ∘ y = e may not imply that y = x^{−1}, as can be seen in the following equality in S^2 (cf. [18, Chapter II]; the two factors here do not operator commute): [1 0; 0 −1] ∘ [1 1; 1 −1] = [1 0; 0 1], where the first factor is X = X^{−1}, the second is Y ≠ X^{−1}, and the right-hand side is I. We define the differential operator D_x : J → J by D_x f(x) := (d f(λ_1)/dλ_1) c_1 + (d f(λ_2)/dλ_2) c_2 + ⋯ + (d f(λ_k)/dλ_k) c_k. The Jacobian matrix ∇_x f(x) is defined so that (∇_x f(x))^T y = (D_x f(x)) • y for all y ∈ J. For n ≥ 2, we define D_x^n f(x) recursively by D_x^n f(x) := D_x(D_x^{n−1} f(x)). For instance, D_x^2 x = D_x(D_x x) = D_x e = 0, and D_x x^{−1} = −x^{−2}, provided that x is invertible. More generally, if y is a function of x and they are both simultaneously decomposed, then D_x f(y) := (D_y f(y)) ∘ (D_x y). For example, D_x y^{−1} = −y^{−2} ∘ D_x y, provided that y is invertible. Note that, if y and z are functions of x and they are all simultaneously decomposed, then D_x(y ± z) = D_x y ± D_x z, and D_x(y ∘ z) = (D_x y) ∘ z + y ∘ (D_x z). This differential operator is interesting and will play an important role when computing partial derivatives.
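The counterexample of Remark 1 takes two lines to verify (assuming numpy; these particular matrices are one concrete instance of the phenomenon):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, -1.0]])          # X = X^{-1}, since X^2 = I
Y = np.array([[1.0, 1.0],
              [1.0, -1.0]])

XoY = (X @ Y + Y @ X) / 2            # Jordan product in S^2
print(np.allclose(XoY, np.eye(2)))   # True:  X o Y = I ...
print(np.allclose(Y, np.linalg.inv(X)))  # False: ... yet Y != X^{-1}
```

The failure is possible precisely because X and Y do not operator commute; with operator commutativity, x ∘ y = e does force y = x^{−1}, as noted later in this section.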

There is a one-to-one correspondence between Euclidean Jordan algebras and symmetric cones. Definition. If J is a Euclidean Jordan algebra, then its cone of squares is the set K_J := {x^2 : x ∈ J}. Such a one-to-one correspondence between (cones of squares of) Euclidean Jordan algebras and symmetric cones is given by the following fundamental result, which says that a cone is symmetric if and only if it is the cone of squares of some Euclidean Jordan algebra. Theorem (Jordan algebraic characterization of symmetric cones, [18]). A regular cone K is symmetric iff K = K_J for some Euclidean Jordan algebra J. The above result implies that an element is positive semidefinite if and only if it belongs to the cone of squares, and it is positive definite if and only if it belongs to the interior of the cone of squares. In other words, an element x in a Euclidean Jordan algebra J is positive semidefinite if and only if x ∈ K_J, and is positive definite if and only if x ∈ int(K_J), where int(K_J) denotes the interior of the cone K_J. The following notations will be used throughout the dissertation. For a Euclidean Jordan algebra J, we write x ⪰_{K_J} 0 and x ≻_{K_J} 0 to mean that x ∈ K_J and x ∈ int(K_J), respectively. We also write x ⪰_{K_J} y (or y ⪯_{K_J} x) and x ≻_{K_J} y (or y ≺_{K_J} x) to mean that x − y ⪰_{K_J} 0 and x − y ≻_{K_J} 0, respectively. Example 19. The cone K_{S^n} is, indeed, S^n_+, the cone of real symmetric positive semidefinite matrices of order n. This can be seen in view of the fact that a symmetric matrix is the square of another symmetric matrix if and only if it is a positive semidefinite matrix. To prove this fact, suppose that X is positive semidefinite; then it has nonnegative eigenvalues

λ_1, λ_2, ..., λ_n and, by the spectral theorem for real symmetric matrices, there exist an orthogonal matrix Q ∈ R^{n×n} and a diagonal matrix Λ := diag(λ_1; λ_2; ...; λ_n) ∈ R^{n×n} such that X = QΛQ^T. By letting Y := QΛ^{1/2}Q^T ∈ S^n, where Λ^{1/2} := diag(√λ_1; √λ_2; ...; √λ_n), it follows that X = QΛQ^T = QΛ^{1/2}Λ^{1/2}Q^T = (QΛ^{1/2}Q^T)(QΛ^{1/2}Q^T) = Y^2. To prove the other direction, let us assume that X = Y^2 for some Y ∈ S^n. It is clear that X ∈ S^n. To show that X is positive semidefinite, let (λ, v) be an eigenpair of Y; then λ ∈ R (every real symmetric matrix has real eigenvalues). Furthermore, we have that Xv = Y^2 v = Y(λv) = λ^2 v. Thus, (λ^2, v) is an eigenpair of X, which means that X has only nonnegative eigenvalues, and therefore this completes the proof. Example 20. The cone of squares of E^n is the second-order cone of dimension n: E^n_+ := {ξ ∈ E^n : ξ_0 ≥ ‖ξ̄‖}. To see this, recall that the cone of squares (with respect to ∘ defined in Example 4) is K_{E^n} = {ζ^2 : ζ ∈ E^n} = {(‖ζ‖^2; 2ζ_0 ζ̄) : ζ ∈ E^n}. Thus, any x ∈ K_{E^n} can be written as x = (‖y‖^2; 2y_0 ȳ) for some y ∈ E^n. It follows that x_0 = ‖y‖^2 = y_0^2 + ‖ȳ‖^2 ≥ 2|y_0| ‖ȳ‖ = ‖x̄‖, where the inequality follows by observing that (|y_0| − ‖ȳ‖)^2 ≥ 0. This means that x ∈ E^n_+ and hence K_{E^n} ⊆ E^n_+. The proof of the other direction can be found in 4 of [1]. We indicate by the operator diag(·) the map that takes its argument to a block diagonal matrix; for example, if x ∈ R^n, then diag(x) is the n×n diagonal matrix with the entries of x on the diagonal.
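The forward inclusion of Example 20 can be probed numerically: squaring any element of E^n lands inside the second-order cone. A sketch assuming numpy (helper name ours):

```python
import numpy as np

def soc_mul(z, w):
    """Jordan product on E^n."""
    return np.concatenate(([z @ w], z[0] * w[1:] + w[0] * z[1:]))

rng = np.random.default_rng(1)
z = rng.standard_normal(5)       # arbitrary element of E^5
x = soc_mul(z, z)                # x = z o z = (||z||^2; 2 z0 zbar)
# x0 = z0^2 + ||zbar||^2 >= 2|z0| ||zbar|| = ||xbar||, so x is in E^5_+
print(x[0] >= np.linalg.norm(x[1:]))   # True
```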

Theorem ([18, Theorem III.1.5]). Let J be a Jordan algebra over R with the identity element e. The following properties are equivalent: 1. J is a Euclidean Jordan algebra. 2. The symmetric bilinear form trace(x ∘ y) is positive definite. A direct consequence of the above theorem is that if J is a Euclidean Jordan algebra, then x • y := trace(x ∘ y) is an inner product. In the sequel, we define the inner product as x • y := ⟨x, y⟩ = trace(x ∘ y) and we call it the Frobenius inner product. It is easy to see that, for x, y, z ∈ J, x • e = trace(x), x • y = y • x, (x + y) • z = x • z + y • z, x • (y + z) = x • y + x • z, and (x ∘ y) • z = x • (y ∘ z). For x ∈ J, the Frobenius norm (denoted by ‖·‖_F, or simply ‖·‖) is defined as ‖x‖ := √(x • x). We can also define various norms on J as functions of eigenvalues. For example, the definition of the Frobenius norm can be rewritten as ‖x‖ := √(λ_1^2 + λ_2^2 + ⋯ + λ_k^2) = √trace(x^2) = √(x • x). The Cauchy–Schwarz inequality holds for the Frobenius inner product, i.e., for x, y ∈ J, |x • y| ≤ ‖x‖ ‖y‖ (see for example [34]). Note that ‖e‖ = √(e • e) = √r. Let x ∈ J and t ∈ R. For a function g := g(x, t) from J × R into R, we will use g′ for the partial derivative of g with respect to t, and ∇_x g, ∇^2_{xx} g, ∇^3_{xxx} g to denote the gradient, Hessian, and the third-order partial derivative of g with respect to x. For a function y := y(x, t) from J × R into J, we will also use y′ for the partial derivative of y with respect to t, D_x y to denote the first partial derivative of y with respect to x, and ∇_x y to denote the Jacobian matrix of y (with respect to x). Let h_1, h_2, ..., h_k ∈ J. For a function f from J into R, we write ∇^k_{x...x} f(x)[h_1, h_2, ..., h_k] := (∂^k/∂t_1 ⋯ ∂t_k) f(x + t_1 h_1 + ⋯ + t_k h_k) |_{t_1 = ⋯ = t_k = 0} to denote the value of the k-th differential of f taken at x along the directions h_1, h_2, ..., h_k.

We now present some handy tools that will help with our computations. Lemma. Let J be a Euclidean Jordan algebra with identity e, and x, y, z ∈ J. Then 1. {ln det(e + tx)}′|_{t=0} = trace(x) and, more generally, {ln det(y + tx)}′|_{t=0} = y^{−1} • x, provided that det(e + tx) and det(y + tx) are positive. 2. {trace((e + tx)^{−1})}′|_{t=0} = −trace(x) and, more generally, {trace((e + tx)^{−1} ∘ y)}′|_{t=0} = −x • y, provided that e + tx is invertible. 3. ∇_x ln det x = x^{−1}, provided that det x is positive (so x is invertible). More generally, if y is a function of x, then ∇_x ln det y = (∇_x y)^T y^{−1}, provided that det y is positive. 4. ∇_x x^{−1} = −Q_x^{−1}, provided that x is invertible, and hence ∇^2_{xx} ln det x = ∇_x x^{−1} = −Q_x^{−1}. More generally, if y is a function of x, then ∇_x y^{−1} = −Q_y^{−1} ∇_x y, provided that y is invertible. 5. ∇_x Q_x[y] = 2Q_{x,y}. 6. If x and y are functions of t, where t ∈ R, then (x ∘ y)′ = x′ ∘ y + x ∘ y′; in other words, (L(x)y)′ = L(x′)y + L(x)y′. Therefore, (Q_x)′ = 2Q_{x,x′} and (Q_{x,y})′ = Q_{x′,y} + Q_{x,y′}; in particular, (x^2)′ = 2x ∘ x′. 7. ∇_x trace(x) = e = D_x x and, more generally, if y is a function of x and they are both simultaneously decomposed, then ∇_x trace(y) = (∇_x y)^T e = D_x y. Hence, if y and z are functions of x and they are all simultaneously decomposed, then ∇_x (y • z) = (∇_x y)^T z + (∇_x z)^T y.
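Item 3 of the lemma is easy to sanity-check in the concrete algebra S^n, where det is the ordinary determinant: the directional derivative of ln det at X along H should equal X^{−1} • H = trace(X^{−1}H). A finite-difference check, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
X = A @ A.T + 3 * np.eye(3)          # positive definite, so det X > 0
H = rng.standard_normal((3, 3)); H = (H + H.T) / 2   # symmetric direction

t = 1e-6
fd = (np.log(np.linalg.det(X + t * H))
      - np.log(np.linalg.det(X - t * H))) / (2 * t)  # central difference
exact = np.trace(np.linalg.inv(X) @ H)               # <X^{-1}, H>
print(abs(fd - exact) < 1e-5)                        # True
```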

The proofs of most of these statements are straightforward. We only indicate that item 1 follows from the facts that {det(e + tx)}′|_{t=0} = trace(x) and that {det(y + tx)}′|_{t=0} = det(y)(y^{−1} • x) [18, Proposition III.4.2], the first statement in item 2 follows from spectral decomposition (I), the first statement in item 3 is taken from [18, Proposition III.4.2], the first statement in item 4 is taken from [18, Proposition II.3.3], item 5 is taken from 3 of [18, Chapter II], the first statement in item 6 is taken from 4 of [18, Chapter II], and the first statement in item 7 is obtained by using item 3 and the observation that det(exp(x)) = exp(trace(x)). Definition. We say two elements x, y of a Euclidean Jordan algebra J operator commute if L(x)L(y) = L(y)L(x). In other words, x and y operator commute if for all z ∈ J we have that x ∘ (y ∘ z) = y ∘ (x ∘ z). We remind the reader of the fact that two matrices X, Y ∈ S^n commute if and only if XY is symmetric, if and only if X and Y can be simultaneously diagonalized, i.e., they share a common system of orthonormal eigenvectors (the same Q). This fact is generalized in the following theorem, which can also be applied to multiple-block elements. Theorem ([36, Theorem 27]). Two elements of a Euclidean Jordan algebra operator commute if and only if they are simultaneously decomposed. Note that if two operator-commutative elements x and y are invertible, then so is x ∘ y. Moreover, (x ∘ y)^{−1} = x^{−1} ∘ y^{−1}, and det(x ∘ y) = det(x) det(y) (see also [18, Proposition II.2.2]). In Remark 1 we mentioned that the equality x ∘ y = e may not imply that y = x^{−1}. However, the equality x ∘ y = e does imply that y = x^{−1} when the elements x and y operator commute. Lemma (Properties of Q_x). Let x and y be elements of a Euclidean Jordan algebra with rank r and dimension n, with x invertible, and let k be an integer. Then 1. det(Q_x y) = det^2(x) det(y).

2. Q_x x^{−1} = x and Q_x e = x^2. 3. Q_{Q_y x} = Q_y Q_x Q_y. 4. Q_{x^k} = Q_x^k. The first three items of the preceding lemma are taken from [18, Chapters II and III] and the last one is taken from [36]. We use "," for adjoining elements of a Euclidean Jordan algebra J in a row, and use ";" for adjoining them in a column. Thus, if J is a Euclidean Jordan algebra, and x_i ∈ J for i = 1, 2, ..., m, we have that (x_1; x_2; ...; x_m) denotes the column whose blocks are x_1, x_2, ..., x_m, an element of J × J × ⋯ × J (m times), and we write (x_1; x_2; ...; x_m)^T := (x_1, x_2, ..., x_m) := [x_1 x_2 ⋯ x_m]. As we mentioned earlier, we also use the superscript T to indicate transposition of column vectors in R^n. We end this section with the following lemma, which is essentially a part of Lemma 1 in [36]. Lemma. Let x ∈ J have a spectral decomposition x = λ_1 c_1 + λ_2 c_2 + ⋯ + λ_r c_r. Then the following statements hold. 1. The matrices L(x) and Q_x commute and thus share a common system of eigenvectors. 2. Every eigenvalue of L(x) has the form (1/2)(λ_i + λ_j) for some i, j ≤ r. In particular, x ⪰_{K_J} 0 (x ≻_{K_J} 0) if and only if L(x) ⪰ 0 (L(x) ≻ 0). The eigenvalues of x, the λ_i's, are amongst the eigenvalues of L(x).
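The eigenvalue description in item 2 of the last lemma can be checked in S^n, where L(X) = (1/2)(I ⊗ X + X ⊗ I) acts on vectorized matrices: its n^2 eigenvalues are exactly the averages (λ_i + λ_j)/2 of pairs of eigenvalues of X. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
X = (A + A.T) / 2                                   # random element of S^3
I = np.eye(3)
LX = 0.5 * (np.kron(I, X) + np.kron(X, I))          # L(X) on R^{9}

lam = np.linalg.eigvalsh(X)
pairs = sorted((lam[i] + lam[j]) / 2 for i in range(3) for j in range(3))
print(np.allclose(sorted(np.linalg.eigvalsh(LX)), pairs))   # True
```

In particular, every λ_i appears among the pairs (taking i = j), and L(X) is positive (semi)definite exactly when X is.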

Chapter 2

Stochastic Symmetric Optimization Problems

In this chapter, we use the Jordan algebraic characterization of symmetric cones to define the SSP problem in both the primal and dual standard forms. We then see how this problem can include some general optimization problems as special cases.

2.1 The stochastic symmetric optimization problem

We define a problem based on the DSP problem analogous to the way the SLP problem is defined based on the DLP problem. To do so, we first introduce the definition of a DSP. Let J be a Euclidean Jordan algebra with dimension n and rank r. The DSP problem and its dual [36] are

(P_1) min c • x s.t. a_i • x = b_i, i = 1, 2, ..., m; x ⪰_{K_J} 0;
(D_1) max b^T y s.t. Σ_{i=1}^m y_i a_i ⪯_{K_J} c; y ∈ R^m,

where c, a_i ∈ J for i = 1, 2, ..., m, b ∈ R^m, x is the primal variable, y is the dual variable, and, as mentioned earlier, K_J is the cone of squares of J. The pair (P_1, D_1) can be rewritten in the following compact form:

(P_2) min c • x s.t. Ax = b; x ⪰_{K_J} 0;
(D_2) max b^T y s.t. A^T y ⪯_{K_J} c; y ∈ R^m,

where A := (a_1; a_2; ...; a_m) is a linear operator that maps J into R^m and A^T is a linear operator that maps R^m into J such that x • A^T y = (Ax)^T y. In fact, we can prove weak and strong duality properties for the pair (P_2, D_2) as justification for referring to them as a primal-dual pair; see, for example, [30]. In the rest of this section, we assume that m_1, m_2, n_1, n_2, r_1, and r_2 are positive integers, and J_1 and J_2 are Euclidean Jordan algebras with identities e_1 and e_2, dimensions n_1 and n_2, and ranks r_1 and r_2, respectively.

2.1.1 Definition of an SSP in primal standard form

We define the primal form of the SSP based on the primal form of the DSP. Given a_i, t_j ∈ J_1 and w_j ∈ J_2 for i = 1, 2, ..., m_1 and j = 1, 2, ..., m_2, let A := (a_1; a_2; ...; a_{m_1}) be a linear operator that maps x to the m_1-dimensional vector whose i-th component is a_i • x, b ∈ R^{m_1}, c ∈ J_1; let T := (t_1; t_2; ...; t_{m_2}) be a linear operator that maps x to the m_2-dimensional vector whose j-th component is t_j • x, W := (w_1; w_2; ...; w_{m_2}) be a linear operator that maps y to the m_2-dimensional vector whose j-th component is w_j • y, h ∈ R^{m_2}, and d ∈ J_2. We also assume that A, b and c are deterministic data, and T, W, h and d are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, an SSP with recourse in

primal standard form is

min c • x + E[Q(x, ω)] s.t. Ax = b; x ⪰_{K_{J_1}} 0, (2.1.1)

where Q(x, ω) is the minimum value of the problem

min d(ω) • y s.t. W(ω)y = h(ω) − T(ω)x; y ⪰_{K_{J_2}} 0, (2.1.2)

where x is the first-stage decision variable, y is the second-stage variable, and E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

2.1.2 Definition of an SSP in dual standard form

In many applications we find that an SSP problem defined based on the dual standard form (D_2) is more useful. In this part we define the dual form of the SSP based on the dual form of the DSP. Given a_i ∈ J_1 and t_i, w_j ∈ J_2 for i = 1, 2, ..., m_1 and j = 1, 2, ..., m_2, let A := (a_1, a_2, ..., a_{m_1}), b ∈ J_1, c ∈ R^{m_1}, T := (t_1, t_2, ..., t_{m_1}), W := (w_1, w_2, ..., w_{m_2}), h ∈ J_2, and d ∈ R^{m_2}. We also assume that A, b and c are deterministic data, and T, W, h and d are random data whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, an SSP with recourse in dual standard form is

max c^T x + E[Q(x, ω)] s.t. Ax ⪯_{K_{J_1}} b, (2.1.3)

where Q(x, ω) is the maximum value of the problem

max d(ω)^T y s.t. W(ω)y ⪯_{K_{J_2}} h(ω) − T(ω)x, (2.1.4)

where x ∈ R^{m_1} is the first-stage decision variable, y ∈ R^{m_2} is the second-stage variable, and E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω). In fact, it is also possible to define SSPs in mixed forms, where the first stage is based on the primal problem (P_2) while the second stage is based on the dual problem (D_2), and vice versa.

2.2 Problems that can be cast as SSPs

It is interesting to mention that almost all conic optimization problems in real-world applications are associated with symmetric cones, and that, as an illustration of the modeling power of conic programming, all deterministic convex programming problems can be formulated as deterministic conic programs (see [9]). As a consequence, by considering the stochastic counterpart of this result, it is straightforward to show that all stochastic convex programming problems can be formulated as stochastic conic programs. In this section we will see how SSPs can include some general optimization problems as special cases. We start with two-stage stochastic linear programs with recourse.

Problem 1. Stochastic linear programs: It is clear that the space of real numbers R with Jordan multiplication x ∘ y := xy and inner product x • y := xy forms a Euclidean Jordan algebra. Since a real number is the square of another real number if and only if it is nonnegative, the cone of squares of R is indeed R_+, the set of all nonnegative real numbers. This verifies that R_+ is a symmetric cone. The cone R^p_+ of nonnegative orthants of R^p is also symmetric because it is just

the Cartesian product of the symmetric cones R_+ × R_+ × ⋯ × R_+ (p times). Table 2.1 summarizes the Euclidean Jordan algebraic structure of the nonnegative orthant cone R^p_+.

Table 2.1: The Euclidean Jordan algebraic structure of the nonnegative orthant cone.
Euclidean Jordan algebra: R^p = {(x_1; x_2; ...; x_p) : x_i ∈ R, i = 1, 2, ..., p}
Symmetric cone: R^p_+ = {x ∈ R^p : x_i ≥ 0, i = 1, 2, ..., p}
Conic inequality: x ⪰_{R^p_+} 0 ⟺ x ≥ 0; x ≻_{R^p_+} 0 ⟺ x > 0
Jordan multiplication: x ∘ y = (x_1 y_1; x_2 y_2; ...; x_p y_p) = diag(x) y, so L(x) = diag(x)
Inner product: x • y = x_1 y_1 + x_2 y_2 + ⋯ + x_p y_p = x^T y
Identity element: e = (1; 1; ...; 1)
Spectral decomposition: x = x_1 (1; 0; ...; 0) + x_2 (0; 1; ...; 0) + ⋯ + x_p (0; 0; ...; 1), with λ_i = x_i and c_i the i-th unit vector
Cone rank: rank(R^p_+) = p
Expression for trace: trace(x) = x_1 + x_2 + ⋯ + x_p
Expression for determinant: det(x) = x_1 x_2 ⋯ x_p
Expression for Frobenius norm: ‖x‖ = √(x_1^2 + x_2^2 + ⋯ + x_p^2) = ‖x‖_2
Expression for inverse: x^{−1} = (1/x_1; 1/x_2; ...; 1/x_p) (if x_i ≠ 0 for all i)
Expression for square root: x^{1/2} = (√x_1; √x_2; ...; √x_p) (if x_i ≥ 0 for all i)
Log barrier function: −ln det(x) = −Σ_{i=1}^p ln x_i if x_i > 0 for all i

When J_1 = R^{n_1} and J_2 = R^{n_2}, the spaces of vectors of dimensions n_1 and n_2, respectively, with Jordan multiplication x ∘ y := diag(x)y and the standard inner product x • y := x^T y, then K_{J_1} = R^{n_1}_+ and K_{J_2} = R^{n_2}_+, the cones of nonnegative orthants of R^{n_1} and R^{n_2}, respectively, and hence the SSP problem (2.1.1, 2.1.2) becomes the SLP problem:

min c^T x + E[Q(x, ω)] s.t. Ax = b; x ≥ 0,

where Q(x, ω) is the minimum value of the problem

min d(ω)^T y s.t. W(ω)y = h(ω) − T(ω)x; y ≥ 0.

Two-stage SLPs with recourse have many practical applications; see, for example, [15].

Problem 2. Stochastic second-order cone programs: When J_1 = E^{n_1} and J_2 = E^{n_2} with Jordan multiplication x ∘ y := (x^T y; x_0 ȳ + y_0 x̄) and inner product x • y := x^T y, then K_{J_1} = E^{n_1}_+ and K_{J_2} = E^{n_2}_+, the second-order cones of dimensions n_1 and n_2, respectively (see Example 20), and hence we obtain the SSOCP with recourse. Table 2.2 (footnote 1) summarizes the Euclidean Jordan algebraic structure of the second-order cone in both single- and multiple-block forms. To introduce the SSOCP problem, we first introduce some notation that will be used throughout this part and in Chapter 5. For simplicity's sake, we write the single-block second-order cone inequality x ⪰_{E^p_+} 0 as x ⪰ 0 (to mean that x ∈ E^p_+) when p is known from the context, and the multiple-block second-order cone inequality x ⪰_{E^{p_1}_+ × E^{p_2}_+ × ⋯ × E^{p_N}_+} 0 as x ⪰_N 0 (to mean that x ∈ E^{p_1}_+ × E^{p_2}_+ × ⋯ × E^{p_N}_+) when p_1, p_2, ..., p_N are known from the context. It is immediately seen that, for every vector x ∈ R^p where p = Σ_{i=1}^N p_i, x ⪰_N 0 if and only if x is partitioned conformally as x = (x_1; x_2; ...; x_N) and x_i ⪰ 0 for i = 1, 2, ..., N. We also write x ⪰ y (x ⪰_N y) or y ⪯ x (y ⪯_N x) to mean that x − y ⪰ 0 (x − y ⪰_N 0). We are now ready to introduce the definition of an SSOCP with recourse. Let N_1, N_2 ≥ 1 be integers. For i = 1, 2, ..., N_1 and j = 1, 2, ..., N_2, let m_1, m_2, n_1, n_2, n_{1i}, n_{2j} be positive integers such that n_1 = Σ_{i=1}^{N_1} n_{1i} and n_2 = Σ_{j=1}^{N_2} n_{2j}. An SSOCP with recourse in primal standard form is defined based on deterministic data A ∈ R^{m_1 × n_1}, b ∈

(footnote 1) The direct sum of two square matrices A and B is the block diagonal matrix A ⊕ B := [A 0; 0 B].

Table 2.2: The Euclidean Jordan algebraic structure of the second-order cone in single- and multiple-block forms.
Euclidean Jordan algebra J: E^p = {x = (x_0; x̄) ∈ R × R^{p−1}} | E^{p_1} × ⋯ × E^{p_N}
Symmetric cone K_J: E^p_+ = {x ∈ E^p : x_0 ≥ ‖x̄‖} | E^{p_1}_+ × ⋯ × E^{p_N}_+
x ⪰_{K_J} 0 (x ≻_{K_J} 0): x ⪰ 0 (x ≻ 0) | x ⪰_N 0 (x ≻_N 0)
Jordan product x ∘ y: (x^T y; x_0 ȳ + y_0 x̄) = Arw(x) y | (x_1 ∘ y_1; ...; x_N ∘ y_N)
Inner product x • y: x^T y | x_1^T y_1 + ⋯ + x_N^T y_N
The identity e: (1; 0) | (e_1; ...; e_N)
The matrix L(x): Arw(x) | Arw(x_1) ⊕ ⋯ ⊕ Arw(x_N)
Spectral decomposition: x = (x_0 + ‖x̄‖) · (1/2)(1; x̄/‖x̄‖) + (x_0 − ‖x̄‖) · (1/2)(1; −x̄/‖x̄‖), i.e., x = λ_1 c_1 + λ_2 c_2 | follows from the decomposition of each block x_i, 1 ≤ i ≤ N
rank(K_J): 2 | 2N
Expression for trace(x): λ_1 + λ_2 = 2x_0 | Σ_{i=1}^N trace(x_i)
Expression for det(x): λ_1 λ_2 = x_0^2 − ‖x̄‖^2 | Π_{i=1}^N det(x_i)
Frobenius norm ‖x‖_F: √(λ_1^2 + λ_2^2) | √(Σ_{i=1}^N ‖x_i‖_F^2)
Inverse x^{−1}: λ_1^{−1} c_1 + λ_2^{−1} c_2 = Jx/det(x), with J := diag(1; −I) (if det(x) ≠ 0; o/w x is singular) | (x_1^{−1}; ...; x_N^{−1}) (if x_i^{−1} exists for all i; o/w x is singular)
Log barrier function: −ln(x_0^2 − ‖x̄‖^2) | −Σ_{i=1}^N ln det(x_i)

R^{m_1} and c ∈ R^{n_1}, and random data T ∈ R^{m_2 × n_1}, W ∈ R^{m_2 × n_2}, h ∈ R^{m_2} and d ∈ R^{n_2} whose realizations depend on an underlying outcome ω in an event space Ω with a known probability function P. Given this data, the two-stage SSOCP in the primal standard form is the problem

min c^T x + E[Q(x, ω)] s.t. Ax = b; x ⪰_{N_1} 0, (2.2.1)

where x ∈ R^{n_1} is the first-stage decision variable and Q(x, ω) is the minimum value of the problem

min d(ω)^T y s.t. W(ω)y = h(ω) − T(ω)x; y ⪰_{N_2} 0, (2.2.2)

where y ∈ R^{n_2} is the second-stage variable and E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).

Note that if N_1 = n_1 and N_2 = n_2, then SSOCP (2.2.1, 2.2.2) reduces to SLP. In fact, if N_1 = n_1, then n_{1i} = 1 for each i = 1, 2, ..., N_1, and so x_i ∈ E^1_+ := {t ∈ R : t ≥ 0} for each i = 1, 2, ..., N_1. Thus, the constraint x ⪰_{N_1} 0 means the same as x ≥ 0, i.e., x lies in the nonnegative orthant of R^{n_1}. The same situation occurs for y ⪰_{N_2} 0. Thus SLP is a special case of SSOCP (2.2.1, 2.2.2). Stochastic quadratic programs (SQPs) are also a special case of SSOCPs. To demonstrate this, recall that a two-stage SQP with recourse is defined based on deterministic data C ∈ S^{n_1}_+, c ∈ R^{n_1}, A ∈ R^{m_1 × n_1} and b ∈ R^{m_1}, and random data H ∈ S^{n_2}_+, d ∈ R^{n_2}, T ∈ R^{m_2 × n_1}, W ∈ R^{m_2 × n_2}, and h ∈ R^{m_2} whose realizations depend on an underlying outcome in an event space Ω with a known probability function P. Given this data, an SQP with recourse is

min q_1(x, ω) = x^T C x + c^T x + E[Q(x, ω)] s.t. Ax = b; x ≥ 0, (2.2.3)

where x ∈ R^{n_1} is the first-stage decision variable and Q(x, ω) is the minimum value of the problem

min q_2(y, ω) = y^T H(ω) y + d(ω)^T y s.t. W(ω)y = h(ω) − T(ω)x; y ≥ 0, (2.2.4)

where y ∈ R^{n_2} is the second-stage decision variable and E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω).
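The reduction of the SQP objective to a norm rests on completing the square: x^T C x + c^T x = ‖C^{1/2} x + (1/2)C^{−1/2} c‖^2 − (1/4) c^T C^{−1} c for positive definite C. A numerical check of this identity, assuming numpy (the `psd_sqrt` helper is ours):

```python
import numpy as np

def psd_sqrt(C):
    """Symmetric square root C^{1/2} of a positive definite matrix C."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
C = A @ A.T + np.eye(3)              # symmetric positive definite
c = rng.standard_normal(3)
x = rng.standard_normal(3)

Ch = psd_sqrt(C)
u = Ch @ x + 0.5 * np.linalg.inv(Ch) @ c        # the shifted variable
lhs = x @ C @ x + c @ x                         # quadratic objective term
rhs = u @ u - 0.25 * c @ np.linalg.inv(C) @ c   # norm form minus a constant
print(abs(lhs - rhs) < 1e-9)                    # True
```

Since the subtracted term is a constant in x, minimizing the quadratic and minimizing the norm of the shifted variable pick out the same x, which is what the second-order cone reformulation below exploits.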

Observe that the objective function of (2.2.3) can be written as (see [1]) q_1(x, ω) = ‖ū‖^2 + E[Q(x, ω)] − (1/4) c^T C^{−1} c, where ū = C^{1/2} x + (1/2) C^{−1/2} c. Similarly, the objective function of (2.2.4) can be written as q_2(y, ω) = ‖v̄‖^2 − (1/4) d(ω)^T H(ω)^{−1} d(ω), where v̄ = H(ω)^{1/2} y + (1/2) H(ω)^{−1/2} d(ω). Thus, problem (2.2.3, 2.2.4) can be transformed into the SSOCP:

min u_0 + E[Q(x, ω)] s.t. ū − C^{1/2} x = (1/2) C^{−1/2} c; Ax = b; u ⪰ 0, x ≥ 0, (2.2.5)

where Q(x, ω) is the minimum value of the problem

min v_0 s.t. v̄ − H(ω)^{1/2} y = (1/2) H(ω)^{−1/2} d(ω); W(ω)y = h(ω) − T(ω)x; v ⪰ 0, y ≥ 0, (2.2.6)

where E[Q(x, ω)] := ∫_Ω Q(x, ω) P(dω). Note that the SQP problem (2.2.3, 2.2.4) and the SSOCP problem (2.2.5, 2.2.6) will have the same minimizing solutions, but their optimal objective values are equal only up to constants. More precisely, the difference between the optimal objective values of (2.2.4, 2.2.6) would be (1/4) d(ω)^T H(ω)^{−1} d(ω). Similarly, the optimal objective values of (2.2.3,

2.2.4) and (2.2.5, 2.2.6) will differ by (1/4) c^T C^{−1} c + (1/4) ∫_Ω (d(ω)^T H(ω)^{−1} d(ω)) P(dω). In 5.1 of Chapter 5, we describe two applications that illustrate the applicability of two-stage SSOCPs with recourse.

Problem 3. Stochastic rotated quadratic cone programs: For each vector x ∈ R^n indexed from 0, we write x̂ for the sub-vector consisting of entries 2 through n − 1; therefore x = (x_0; x_1; x̂) ∈ R × R × R^{n−2}. We let Ê^n denote the n-dimensional real vector space R × R × R^{n−2} whose elements x are indexed from 0. The rotated quadratic cone [1] of dimension n is defined by Ê^n_+ := {x = (x_0; x_1; x̂) ∈ Ê^n : x_0 x_1 ≥ ‖x̂‖^2, x_0 ≥ 0, x_1 ≥ 0}. The constraint that x satisfy the relation x_0 x_1 ≥ ‖x̂‖^2 is called a hyperbolic constraint. It is clear that the cones E^n_+ and Ê^n_+ have the same Euclidean Jordan algebraic structure. In fact, the latter is obtained by rotating the former through an angle of forty-five degrees in the x_0 x_1-plane [1]. More specifically, by writing x ⪰̂ 0 (x ⪰̂_N 0) to mean that x ∈ Ê^p_+ (x ∈ Ê^{p_1}_+ × Ê^{p_2}_+ × ⋯ × Ê^{p_N}_+), one can easily see that the hyperbolic constraint (x_0; x_1; x̂) ⪰̂ 0 is equivalent to the second-order cone constraint (x_0 + x_1; x_0 − x_1; 2x̂) ⪰ 0. In fact, (x_0 + x_1; x_0 − x_1; 2x̂) ⪰ 0 ⟺ x_0 + x_1 ≥ ‖(x_0 − x_1; 2x̂)‖ ⟺ (x_0 + x_1)^2 ≥ (x_0 − x_1)^2 + 4‖x̂‖^2 ⟺ 4 x_0 x_1 ≥ 4‖x̂‖^2 ⟺ x_0 x_1 ≥ ‖x̂‖^2. So, if we are given the same setting as in Problem (2.2.1, 2.2.2), the two-stage stochastic rotated quadratic cone program (SRQCP) in the primal standard form is the problem

min  cᵀx + E[Q(x, ω)]
s.t. Ax = b
     x ⪰̂_{N₁} 0,

where x ∈ R^{n₁} is the first-stage decision variable and Q(x, ω) is the minimum value of the problem

min  d(ω)ᵀy
s.t. W(ω)y = h(ω) − T(ω)x
     y ⪰̂_{N₂} 0.

In §5.2 of Chapter 5, we describe two applications that illustrate the applicability of two-stage SRQCPs with recourse.

Problem 4. Stochastic semidefinite programs: When J₁ = S^{n₁} and J₂ = S^{n₂}, the spaces of real symmetric matrices of orders n₁ and n₂, respectively, with Jordan multiplication X ∘ Y := ½(XY + YX) and inner product X • Y := trace(XY), we have K_{J₁} = S^{n₁}₊ and K_{J₂} = S^{n₂}₊, the cones of real symmetric positive semidefinite matrices of orders n₁ and n₂, respectively (see Example 19). We simply write the linear matrix inequality X ⪰_{S^p₊} 0_p as X ⪰ 0 (to mean that X ∈ S^p₊). Hence the SSP problem (2.1.1, 2.1.2) becomes the SSDP problem (see also [9, Subsection 2.1]):

min  C • X + E[Q(X, ω)]
s.t. AX = b
     X ⪰ 0,     (2.2.7)

where Q(X, ω) is the minimum value of the problem

min  D(ω) • Y
s.t. W(ω)Y = h(ω) − T(ω)X
     Y ⪰ 0,     (2.2.8)

where X ∈ S^{n₁} is the first-stage decision variable, Y ∈ S^{n₂} is the second-stage variable, and E[Q(X, ω)] := ∫_Ω Q(X, ω) P(dω). Here, clearly, C ∈ S^{n₁}, D ∈ S^{n₂}, and the linear operator A : S^{n₁} → R^{m₁} is defined by AX := (A₁ • X; A₂ • X; … ; A_{m₁} • X), where Aᵢ ∈ S^{n₁} for i = 1, 2, …, m₁. The linear operators W : S^{n₂} → R^{m₂} and T : S^{n₁} → R^{m₂} are defined in a similar manner. Problem (2.2.7, 2.2.8) can also be written in a vectorized matrix form (see [5, Section 1]). See Table 2.3 for a summary of the Euclidean Jordan algebraic structure of the positive semidefinite cone in both matrix and vectorized matrix forms.

                           | matrix form                          | vectorized matrix form
Euclidean Jordan algebra J | S^p = {X ∈ R^{p×p} : Xᵀ = X}         | vec(S^p) = {vec(X) ∈ R^{p²} : Xᵀ = X}
Symmetric cone K_J         | S^p₊ = {X ∈ S^p : X ⪰ 0}             | vec(S^p₊) = {vec(X) ∈ R^{p²} : X ⪰ 0}
Jordan product x ∘ y       | X ∘ Y = ½(XY + YX)                   | vec(X) ∘ vec(Y) = ½ vec(XY + YX)
Inner product x • y        | X • Y = trace(XY)                    | vec(X) • vec(Y) = vec(X)ᵀ vec(Y)
The identity e             | I_p                                  | vec(I_p)
The matrix L(x)            | ½(I ⊗ X + X ⊗ I)                     | ½(I ⊗ X + X ⊗ I)
The matrix Q_x             | X ⊗ X                                | X ⊗ X
Spectral decomposition     | X = λ₁ q₁q₁ᵀ + ⋯ + λ_p q_p q_pᵀ      | vec(X) = λ₁ vec(q₁q₁ᵀ) + ⋯ + λ_p vec(q_p q_pᵀ)
rank(K_J)                  | p                                    | p

Table 2.3: The Euclidean Jordan algebraic structure of the positive semidefinite cone in matrix and vectorized matrix forms.

Some applications leading to two-stage SSDPs with recourse can be found in [49].

Problem 5. Stochastic programming over complex Hermitian positive semidefinite matrices: Recall that a square matrix with complex entries is called Hermitian if it is equal to its

own conjugate transpose, and that the eigenvalues of a (positive semidefinite) Hermitian matrix are always (nonnegative) real numbers. Let H^p denote the space of complex Hermitian matrices of order p. Equipped with the Jordan multiplication X ∘ Y := (XY + YX)/2, the space H^p forms a Euclidean Jordan algebra. By following an argument similar to that in Example 19 (but using the spectral theorem for complex Hermitian matrices instead of the spectral theorem for real symmetric matrices; see [43, Theorem ]), we can show that the cone of squares of the space of complex Hermitian matrices of order p is the cone of complex Hermitian positive semidefinite matrices of order p, denoted by H^p₊. The Euclidean Jordan algebraic structure of the cone of complex Hermitian positive semidefinite matrices is analogous to that of the cone of real symmetric positive semidefinite matrices.

Another subclass of stochastic symmetric programming problems is obtained when J₁ = H^{n₁} and J₂ = H^{n₂}, so that K_{J₁} = H^{n₁}₊ and K_{J₂} = H^{n₂}₊, the cones of complex Hermitian positive semidefinite matrices of orders n₁ and n₂, respectively. In fact, it is impossible to work with the field of complex numbers directly, as it is not an ordered field. However, this can be overcome by defining a transformation that maps complex Hermitian matrices to real symmetric matrices; see, for example, [35, 18].

Problem 6. Stochastic programming over quaternion Hermitian positive semidefinite matrices: We first recall some notions concerning quaternions and quaternion matrices. The elements of the skew field of real quaternions have the form x = x₀ + x₁i + x₂j + x₃k, where x₀, x₁, x₂, x₃ ∈ R and i, j, and k are abstract symbols satisfying i² = j² = k² = ijk = −1. The conjugate of x is x̄ = x₀ − x₁i − x₂j − x₃k. If A is a p × p quaternion matrix and A* is its conjugate transpose, then A is called Hermitian if A* = A.
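The defining relations of the quaternion units can be checked with their standard 2 × 2 complex matrix representation; the representation and the code below are an illustrative addition, not part of the dissertation.

```python
import numpy as np

# Standard 2x2 complex matrix representation of the quaternion units.
one = np.eye(2, dtype=complex)
i = np.array([[1j, 0], [0, -1j]])
j = np.array([[0, 1], [-1, 0]], dtype=complex)
k = i @ j  # k = ij

# The defining relations i^2 = j^2 = k^2 = ijk = -1.
for unit in (i, j, k):
    assert np.allclose(unit @ unit, -one)
assert np.allclose(i @ j @ k, -one)

# Multiplication is noncommutative: ji = -k.
assert np.allclose(j @ i, -k)
```

Since the representation is faithful, any identity verified for these matrices holds for the abstract quaternion units as well.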
Note that if A is Hermitian, then w*Aw is a real number for any p-dimensional column vector w with quaternion components. The matrix A is called positive semidefinite if it is Hermitian

and w*Aw ≥ 0 for any p-dimensional column vector w with quaternion components. A (positive semidefinite) quaternion Hermitian matrix has (nonnegative) real eigenvalues. Let QH^p denote the space of quaternion Hermitian matrices of order p. Equipped with the Jordan multiplication X ∘ Y := (XY + YX)/2, the space QH^p forms a Euclidean Jordan algebra. It has been shown [46] that the spectral theorem also holds for quaternion matrices (i.e., any quaternion Hermitian matrix is conjugate to a real diagonal matrix). So, by following an argument similar to that in Example 19, we can show that the cone of squares of the space of quaternion Hermitian matrices of order p is the cone of quaternion Hermitian positive semidefinite matrices of order p, denoted by QH^p₊. The Euclidean Jordan algebraic structure of the cone of quaternion Hermitian positive semidefinite matrices is analogous to that of the cone of real symmetric positive semidefinite matrices and that of the cone of complex Hermitian positive semidefinite matrices.

The last class of problems is obtained when J₁ = QH^{n₁} and J₂ = QH^{n₂} with X ∘ Y := (XY + YX)/2. Hence K_{J₁} = QH^{n₁}₊ and K_{J₂} = QH^{n₂}₊, the cones of quaternion Hermitian positive semidefinite matrices of orders n₁ and n₂, respectively. As in Problem 5, there is a difficulty in dealing with quaternion entries, but it can be overcome by defining a transformation that maps Hermitian matrices with quaternion entries to symmetric matrices; see, for example, [35, 18].

Of course, it is also possible to consider a mixed problem, such as the one obtained by selecting J₁ = R^{n₁} and J₂ = E^{n₂}, or by selecting J₁ = E^{n₁} and J₂ = S^{n₂}, etc.

In many applications we find that the models lead to SSPs in dual standard form. The focus of this dissertation is on the SSP in dual standard form. In the next two chapters, we focus on interior point algorithms for solving the SSP problem (2.1.3, 2.1.4).
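The transformation mentioned in Problems 5 and 6 can be illustrated in the complex case: a complex Hermitian matrix H = A + iB, with A symmetric and B antisymmetric, maps to the real symmetric matrix [[A, −B], [B, A]], which doubles each eigenvalue and preserves positive semidefiniteness. The sketch below, with sizes and names of our choosing, checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
# Random complex Hermitian H = A + iB: A symmetric, B antisymmetric.
M = rng.standard_normal((p, p))
A = (M + M.T) / 2
N = rng.standard_normal((p, p))
B = (N - N.T) / 2
H = A + 1j * B

# Real embedding of H: a real symmetric matrix of order 2p.
S = np.block([[A, -B], [B, A]])
assert np.allclose(S, S.T)

# Each eigenvalue of H appears exactly twice in the embedding.
eh = np.sort(np.linalg.eigvalsh(H))
es = np.sort(np.linalg.eigvalsh(S))
assert np.allclose(es, np.sort(np.concatenate([eh, eh])))

# H is positive semidefinite exactly when the embedding is.
assert (eh.min() >= 0) == (es.min() >= 0)
```

The same idea, applied twice, reduces quaternion Hermitian matrices to real symmetric ones, which is why both cones can be treated through the symmetric-cone machinery.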

Chapter 3

A Class of Polynomial Logarithmic Barrier Decomposition Algorithms for Stochastic Symmetric Programming

In this chapter, we turn our attention to the study of logarithmic barrier decomposition-based interior point methods for the general SSP problem. Since our algorithm applies to all symmetric cones, this work extends Zhao's work [48], which focuses on the SLP problem (Problem 1), and Mehrotra and Özevin's work [5], which focuses on the SSDP problem (Problem 4). More specifically, we use logarithmic barrier interior point methods to present a Benders decomposition-based algorithm for solving the SSP problem (2.1.3, 2.1.4) and then prove its polynomial complexity. Our convergence analysis proceeds by showing that the log barrier associated with the recourse function of SSPs behaves as a strongly self-concordant barrier and forms a self-concordant family on the first-stage solutions. Our procedure closely follows that of [5] (which essentially follows the procedure of [48]), but our setting is much more general. The results of this

chapter have been submitted for publication [3].

3.1 The log barrier problem for SSPs

We begin by presenting the extensive formulation of the SSP problem (2.1.3, 2.1.4) and the log barrier [30] problems associated with it.

3.1.1 Formulation and assumptions

We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let {(T^(k), W^(k), h^(k), d^(k)) : k = 1, 2, …, K} be the set of possible values of the random variables (T(ω), W(ω), h(ω), d(ω)), and let p_k := P[(T(ω), W(ω), h(ω), d(ω)) = (T^(k), W^(k), h^(k), d^(k))] be the associated probability for k = 1, 2, …, K. Then Problem (2.1.3, 2.1.4) becomes

max  cᵀx + Σ_{k=1}^K p_k Q^(k)(x)
s.t. Ax ⪯_{K_J1} b,     (3.1.1)

where, for k = 1, 2, …, K, Q^(k)(x) is the maximum value of the problem

max  d^(k)ᵀ y^(k)
s.t. W^(k) y^(k) ⪯_{K_J2} h^(k) − T^(k)x,     (3.1.2)

where x ∈ R^{m₁} is the first-stage decision variable and y^(k) ∈ R^{m₂} is the second-stage variable for k = 1, 2, …, K. Now, for convenience we redefine d^(k) as d^(k) := p_k d^(k) for

k = 1, 2, …, K, and rewrite Problem (3.1.1, 3.1.2) as

max  cᵀx + Σ_{k=1}^K Q^(k)(x)
s.t. Ax + s = b     (3.1.3)
     s ⪰_{K_J1} 0,

where, for k = 1, 2, …, K, Q^(k)(x) is the maximum value of the problem

max  d^(k)ᵀ y^(k)
s.t. W^(k) y^(k) + s^(k) = h^(k) − T^(k)x     (3.1.4)
     s^(k) ⪰_{K_J2} 0.

Let ν^(k) be the second-stage dual multiplier. The dual of Problem (3.1.4) is

min  (h^(k) − T^(k)x) • ν^(k)
s.t. W^(k)ᵀ ν^(k) = d^(k)     (3.1.5)
     ν^(k) ⪰_{K_J2} 0.

The log barrier problem associated with Problem (3.1.3, 3.1.4) is

max  η(µ, x) := cᵀx + Σ_{k=1}^K ρ^(k)(µ, x) + µ ln det s
s.t. Ax + s = b     (3.1.6)
     s ⪰_{K_J1} 0,

where, for k = 1, 2, …, K, ρ^(k)(µ, x) is the maximum value of the problem

max  d^(k)ᵀ y^(k) + µ ln det s^(k)
s.t. W^(k) y^(k) + s^(k) = h^(k) − T^(k)x     (3.1.7)
     s^(k) ⪰_{K_J2} 0.

(Here µ > 0 is a barrier parameter.) If for some k Problem (3.1.7) is infeasible, then we define Σ_{k=1}^K ρ^(k)(µ, x) := −∞. The log barrier problem associated with Problem (3.1.5) is the problem

min  (h^(k) − T^(k)x) • ν^(k) − µ ln det ν^(k)
s.t. W^(k)ᵀ ν^(k) = d^(k)     (3.1.8)
     ν^(k) ⪰_{K_J2} 0,

which is the Lagrangian dual of (3.1.7). Because Problems (3.1.7) and (3.1.8) are, respectively, concave and convex, (y^(k), s^(k)) and ν^(k) are optimal solutions of (3.1.7) and (3.1.8), respectively, if and only if they satisfy the following optimality conditions:

s^(k) ∘ ν^(k) = µe₂,
W^(k) y^(k) + s^(k) = h^(k) − T^(k)x,
W^(k)ᵀ ν^(k) = d^(k),     (3.1.9)
s^(k) ⪰_{K_J2} 0,  ν^(k) ⪰_{K_J2} 0.

The elements s^(k) and ν^(k) may not operator commute, so the equality s^(k) = µ ν^(k)⁻¹ may not hold (see Remark 1 in Chapter 1). In fact, we need to scale the optimality conditions (3.1.9) so that the scaled elements are simultaneously decomposed. Let p ≻_{K_J2} 0. From now on, with respect to p, we define s̃^(k) := Q_{p⁻¹} s^(k), ν̃^(k) := Q_p ν^(k), h̃^(k) := Q_{p⁻¹} h^(k), W̃^(k) := Q_{p⁻¹} W^(k), and T̃^(k) := Q_{p⁻¹} T^(k). Recall that Q_p Q_{p⁻¹} = Q_{p⁻¹} Q_p = I. We have the following lemma and proposition.

Lemma 3.1.1 (Lemma 8, [36]). Let p be an invertible element in J₂. Then s ∘ ν = µe if and only if s̃ ∘ ν̃ = µe.

Proposition 3.1.2. (y, s, ν) satisfies the optimality conditions (3.1.9) if and only if

(y, s̃, ν̃) satisfies the relaxed optimality conditions:

s̃^(k) ∘ ν̃^(k) = µe₂,
W̃^(k) y^(k) + s̃^(k) = h̃^(k) − T̃^(k)x,
W̃^(k)ᵀ ν̃^(k) = d^(k),     (3.1.10)
s̃^(k) ⪰_{K_J2} 0,  ν̃^(k) ⪰_{K_J2} 0.

Proof. The proof follows from Lemma 3.1.1, Proposition 1.3.2, and the fact that Q_p(K_J2) = K_J2 and, likewise, Q_p(int(K_J2)) = int(K_J2), because K_J2 is symmetric.

To our knowledge, this is the first time this effective way of scaling is used for stochastic programming; it was originally proposed by Monteiro [8] and Zhang [47] for DSDP, and afterwards generalized by Schmieta and Alizadeh [36] for DSP. With this change of variables, Problem (3.1.6, 3.1.7) becomes

max  η(µ, x) := cᵀx + Σ_{k=1}^K ρ^(k)(µ, x) + µ ln det s
s.t. Ax + s = b     (3.1.11)
     s ⪰_{K_J1} 0,

where, for k = 1, 2, …, K, ρ^(k)(µ, x) is the maximum value of the problem

max  d^(k)ᵀ y^(k) + µ ln det s̃^(k)
s.t. W̃^(k) y^(k) + s̃^(k) = h̃^(k) − T̃^(k)x     (3.1.12)
     s̃^(k) ⪰_{K_J2} 0,

and Problem (3.1.8) becomes

min  (h̃^(k) − T̃^(k)x) • ν̃^(k) − µ ln det ν̃^(k)
s.t. W̃^(k)ᵀ ν̃^(k) = d^(k)     (3.1.13)
     ν̃^(k) ⪰_{K_J2} 0.

Note that Problems (3.1.7) and (3.1.12) have the same maximizer, but their optimal objective values agree only up to a constant. More precisely, the difference between their optimal objective values is 2µ ln det p (see item 1 of Lemma 1.3.2). Similarly, Problems (3.1.8) and (3.1.13) have the same minimizer, but their optimal objective values differ by the same quantity (2µ ln det p).

The SSP (3.1.11, 3.1.12) can be equivalently written as a DSP:

max  cᵀx + µ ln det s + Σ_{k=1}^K (d^(k)ᵀ y^(k) + µ ln det s̃^(k))
s.t. Ax + s = b
     W̃^(k) y^(k) + s̃^(k) = h̃^(k) − T̃^(k)x,  k = 1, 2, …, K     (3.1.14)
     s ⪰_{K_J1} 0,  s̃^(k) ⪰_{K_J2} 0,  k = 1, 2, …, K,

where the objective equals η(µ, x) and the k-th summand equals ρ^(k)(µ, x).

In Subsection 3.1.2, we compute ∇x η(µ, x) and ∇xx η(µ, x) so that we can determine the Newton direction, defined by ∆x := −{∇xx η(µ, x)}⁻¹ ∇x η(µ, x), for our algorithms. We shall see that each choice of p leads to a different search direction (see Algorithm 1). As we mentioned earlier, we are interested in the class of p for which the scaled elements are simultaneously decomposed. In view of Theorem 1.3.5, it is enough to choose p so that s̃^(k) and ν̃^(k) operator commute. That is, we restrict our attention to

the following set of scalings:

C(s^(k), ν^(k)) := {p ≻_{K_J2} 0 : s̃^(k) and ν̃^(k) operator commute}.

We introduce the following definition [1].

Definition 3.1.3. The set of directions ∆x arising from those p ∈ C(s^(k), ν^(k)) is called the commutative class of directions, and a direction in this class is called a commutative direction.

It is clear that p = e may not be in C(s^(k), ν^(k)). The following choices of p show that the set C(s^(k), ν^(k)) is not empty.

We may choose p = s^(k)^{1/2} and get s̃^(k) = e. In fact, by using Lemma 1.3.2, it can be seen that

s̃^(k) = Q_{p⁻¹} s^(k) = Q_{s^(k)^{−1/2}} Q_{s^(k)^{1/2}} e = (Q_{s^(k)^{1/2}})⁻¹ Q_{s^(k)^{1/2}} e = e.

We may choose p = ν^(k)^{−1/2} and get ν̃^(k) = e. To see this, note that by using Lemma 1.3.2 we have

ν̃^(k) = Q_p ν^(k) = Q_{ν^(k)^{−1/2}} Q_{ν^(k)^{1/2}} e = (Q_{ν^(k)^{1/2}})⁻¹ Q_{ν^(k)^{1/2}} e = e.

The above two choices of directions are well known among commutative classes. These two choices form a class of Newton directions derived by Helmberg et al. [20], Monteiro [8],

60 and Kojima et al [], and referred to as the HRVW/KSH/M directions. It is interesting to mention that the popular search direction due to Nesterov-Todd (NT) is also in a commutative class, because in this case one chooses p in such a way that ν (k) = s (k). More precisely, in the NT direction we choose p = ( ν (k)1/ ( ν (k)1/ s (k) ) 1/ ) 1/ = ( s (k) 1/ ( s (k)1/ ν (k) ) 1/ ) 1/, and, using Lemma 1.3., we get p ν (k) = p ν (k) = = = = = ν (k)1/ ( ν (k)1/ s (k) ) 1/ 1 ν (k) ν (k) 1/ ( ν (k)1/ s (k) ) 1/ ν (k) ( ν (k) 1/ ( ν (k)1/ ) 1/ s (k) ν (k) 1/) ν (k) ( ν (k) 1/ ( ν (k)1/ ) ) 1/ s (k) ( ν (k) 1/ ν (k)1/) s (k) e = s (k). Thus, ν (k) = pν (k) = p 1 s (k) = s (k). We proceed by making some assumptions. First we define F 1 := { x : Ax + s = b, s KJ1 0 } ; F (k) (x) := { y (k) : W (k) y (k) + s (k) = h (k) T (k) x, s (k) KJ 0 } for k = 1,,..., K; F (k) := { x: F (k) (x) } for k = 1,,..., K; F := K k=1 F (k) ; F 0 := F 1 F ; 49

F̄ := {(x, s, γ; y^(1), s^(1), ν^(1), …, y^(K), s^(K), ν^(K)) : Ax + s = b, s ≻_{K_J1} 0; W^(k) y^(k) + s^(k) = h^(k) − T^(k)x, s^(k) ≻_{K_J2} 0, W^(k)ᵀ ν^(k) = d^(k), ν^(k) ≻_{K_J2} 0, k = 1, 2, …, K; Aᵀγ + Σ_{k=1}^K T^(k)ᵀ ν^(k) = c}.

Here γ is the first-stage dual multiplier. Now we make two assumptions.

Assumption 3.1.1. The matrices A and W^(k), for all k, have full column rank.

Assumption 3.1.2. The set F̄ is nonempty.

Assumption 3.1.1 is for convenience. Assumption 3.1.2 guarantees strong duality for the first- and second-stage SSPs. In other words, it requires that Problem (3.1.14) and its dual have strictly feasible solutions. This implies that problems ( ) have unique solutions. Note that, for a given µ > 0, Σ_{k=1}^K ρ^(k)(µ, x) > −∞ if and only if x ∈ F₂. Hence, the feasible region for (3.1.11) is described implicitly by F⁰.

Throughout this chapter, we denote the optimal solution of the first-stage problem (3.1.11) by x(µ), and the solutions of the optimality conditions (3.1.10) by (y^(k)(µ, x), s̃^(k)(µ, x), ν̃^(k)(µ, x)). The following proposition establishes the relationship between the optimal solutions of Problems (3.1.11, 3.1.12) and those of Problem (3.1.14).

Proposition 3.1.4. Let µ > 0 be fixed. Then (x(µ), s(µ); y^(1)(µ), s̃^(1)(µ); …; y^(K)(µ), s̃^(K)(µ)) is the optimal solution of (3.1.14) if and only if (x(µ), s(µ)) is the optimal solution of (3.1.11) and (y^(1)(µ), s̃^(1)(µ); …; y^(K)(µ), s̃^(K)(µ)) are the optimal solutions of (3.1.12) for the given µ and x = x(µ).

3.1.2 Computation of ∇x η(µ, x) and ∇xx η(µ, x)

In order to compute ∇x η(µ, x) and ∇xx η(µ, x), we need to determine the derivative of ρ^(k)(µ, x) with respect to x. Let (y^(k), ν̃^(k), s̃^(k)) := (y^(k)(µ, x), ν̃^(k)(µ, x), s̃^(k)(µ, x)). We

first note that from (3.1.10) we have

(h̃^(k) − T̃^(k)x) • ν̃^(k) = (W̃^(k) y^(k) + s̃^(k)) • ν̃^(k) = y^(k)ᵀ(W̃^(k)ᵀ ν̃^(k)) + s̃^(k) • ν̃^(k) = y^(k)ᵀ d^(k) + r₂µ,

where in the second equality we used the observation that

W̃^(k) y^(k) • ν̃^(k) = Σ_{i=1}^{m₂} (y_i^(k) w̃_i^(k)) • ν̃^(k) = Σ_{i=1}^{m₂} y_i^(k) (w̃_i^(k) • ν̃^(k)) = y^(k)ᵀ(W̃^(k)ᵀ ν̃^(k))

(here w̃_i^(k) ∈ J₂ is the i-th column of W̃^(k)), and in the last equality we used that trace(e₂) = rank(J₂) = r₂. This implies that

(h̃^(k) − T̃^(k)x) • ν̃^(k) − µ ln det ν̃^(k) = ρ^(k)(µ, x) − µ ln det s̃^(k) + r₂µ − µ ln det ν̃^(k)
 = ρ^(k)(µ, x) + r₂µ − µ ln det(s̃^(k) ∘ ν̃^(k))
 = ρ^(k)(µ, x) + r₂µ(1 − ln µ).

Thus,

ρ^(k)(µ, x) = (h̃^(k) − T̃^(k)x) • ν̃^(k) − µ ln det ν̃^(k) − r₂µ(1 − ln µ).     (3.1.15)

Differentiating (3.1.15) and using the optimality conditions (3.1.10), we obtain

∇x ρ^(k)(µ, x) = ∇x((h̃^(k) − T̃^(k)x) • ν̃^(k)) − µ ∇x ln det ν̃^(k)
 = (∇x(h̃^(k) − T̃^(k)x))ᵀ ν̃^(k) + (∇x ν̃^(k))ᵀ(h̃^(k) − T̃^(k)x) − µ (∇x ν̃^(k))ᵀ ν̃^(k)⁻¹
 = −T̃^(k)ᵀ ν̃^(k) + (∇x ν̃^(k))ᵀ(W̃^(k) y^(k) + s̃^(k)) − µ (∇x ν̃^(k))ᵀ ν̃^(k)⁻¹
 = −T̃^(k)ᵀ ν̃^(k) + (∇x ν̃^(k))ᵀ(W̃^(k) y^(k)) + (∇x ν̃^(k))ᵀ(s̃^(k) − µ ν̃^(k)⁻¹)
 = −T̃^(k)ᵀ ν̃^(k) + (∇x ν̃^(k))ᵀ(W̃^(k) y^(k)).

From (3.1.10) we have that w̃_i^(k) • ν̃^(k) = d_i^(k), where d_i^(k) is the i-th entry of d^(k). This

implies that

(∇x ν̃^(k))ᵀ(W̃^(k) y^(k)) = (∇x ν̃^(k))ᵀ Σ_{i=1}^{m₂} y_i^(k) w̃_i^(k) = Σ_{i=1}^{m₂} y_i^(k) (∇x ν̃^(k))ᵀ w̃_i^(k) = Σ_{i=1}^{m₂} y_i^(k) ∇x(ν̃^(k) • w̃_i^(k)) = 0.

Thus, ∇x ρ^(k)(µ, x) = −T̃^(k)ᵀ ν̃^(k) and ∇xx ρ^(k)(µ, x) = −T̃^(k)ᵀ ∇x ν̃^(k). Therefore, we also need to determine the derivative of ν̃^(k) with respect to x. Differentiating (3.1.10) with respect to x, we get the system

∇x ν̃^(k) = −µ Q_{s̃^(k)}⁻¹ ∇x s̃^(k),
W̃^(k) ∇x y^(k) + ∇x s̃^(k) = −T̃^(k),     (3.1.16)
W̃^(k)ᵀ ∇x ν̃^(k) = 0.

Solving the system (3.1.16), we obtain

∇x s̃^(k) = −Q_{s̃^(k)^{1/2}} P^(k) Q_{s̃^(k)^{−1/2}} T̃^(k),
∇x ν̃^(k) = µ Q_{s̃^(k)^{−1/2}} P^(k) Q_{s̃^(k)^{−1/2}} T̃^(k),     (3.1.17)
∇x y^(k) = −R^(k)⁻¹ W̃^(k)ᵀ Q_{s̃^(k)}⁻¹ T̃^(k),

where

R^(k) := R^(k)(µ, x) = W̃^(k)ᵀ Q_{s̃^(k)}⁻¹ W̃^(k),
P^(k) := P^(k)(µ, x) = I − Q_{s̃^(k)^{−1/2}} W̃^(k) R^(k)⁻¹ W̃^(k)ᵀ Q_{s̃^(k)^{−1/2}}.     (3.1.18)

Observe that, by differentiating Ax + s = b with respect to x, we get ∇x s = −A.

We then have

∇x η(µ, x) = c + Σ_{k=1}^K ∇x ρ^(k)(µ, x) + µ (∇x s)ᵀ s⁻¹ = c − Σ_{k=1}^K T̃^(k)ᵀ ν̃^(k) − µ Aᵀ s⁻¹,     (3.1.19)

and

∇xx η(µ, x) = −Σ_{k=1}^K T̃^(k)ᵀ ∇x ν̃^(k) + µ Aᵀ Q_s⁻¹ ∇x s = −µ Σ_{k=1}^K T̃^(k)ᵀ Q_{s̃^(k)^{−1/2}} P^(k) Q_{s̃^(k)^{−1/2}} T̃^(k) − µ Aᵀ Q_s⁻¹ A.     (3.1.20)

3.2 Self-concordance properties of the log barrier recourse

The notion of self-concordant functions introduced by Nesterov and Nemirovskii [30] allows us to develop polynomial-time path-following interior point methods for solving SSPs. In this section, we prove that the recourse function with log barrier is a strongly self-concordant function and leads to a strongly self-concordant family with appropriate parameters.

3.2.1 Self-concordance of the recourse function

This subsection is devoted to showing that η(µ, ·) is a µ-strongly self-concordant barrier on F⁰. First, we have the following definition.

Definition 3.2.1 (Nesterov and Nemirovskii [30, Definition 2.1.1]). Let E be a finite-dimensional real vector space, G be an open nonempty convex subset of E, and let

f be a C³, convex mapping from G to R. Then f is called α-self-concordant on G with the parameter α > 0 if, for every x ∈ G and h ∈ E, the following inequality holds:

|∇³xxx f(x)[h, h, h]| ≤ 2α^{−1/2} (∇²xx f(x)[h, h])^{3/2}.     (3.2.1)

An α-self-concordant function f on G is called strongly α-self-concordant if f tends to infinity along any sequence approaching a boundary point of G.

We note that in the above definition the set G is assumed to be open; however, relative openness is sufficient for applying the definition. See also [30, Item A, Page 57].

The proof of the strong self-concordance of the function η(µ, ·) relies on two lemmas that we state and prove below. We point out that the proof of the first lemma is given in [30, Proposition 5.4.5] for s ∈ S^{n₁}₊₊ := int(S^{n₁}₊), i.e., for s lying in the interior of the cone of real symmetric positive semidefinite matrices of order n₁. We now generalize this proof to the case where s lies in the interior of an arbitrary symmetric cone of dimension n₁, where n₁, as mentioned before, is any positive integer.

Lemma 3.2.1. For any fixed µ > 0, the function f(s) := −µ ln det s is a µ-strongly self-concordant barrier on K_J1.

Proof. Let {sᵢ}ᵢ₌₁^∞ be any sequence in K_J1. It is clear that f(sᵢ) tends to infinity when sᵢ approaches a point on the boundary of K_J1. It remains to show that (3.2.1) is satisfied by f(s) on int(K_J1). Let s ≻_{K_J1} 0 and h ∈ J₁. Then there exists an element u ∈ J₁ such that s = u², and therefore, by Lemmas 1.3.1 and 1.3.2, we have

∇s f(s)[h] = −µ {(d/dt) ln det(s + th)}|_{t=0} = −µ {(d/dt) ln det(u² + th)}|_{t=0}
 = −µ {(d/dt) ln det(Q_u(e₁ + t Q_u⁻¹ h))}|_{t=0} = −µ {(d/dt) [ln det(u²) + ln det(e₁ + t Q_u⁻¹ h)]}|_{t=0}

= −µ trace(Q_u⁻¹ h) = −µ (s⁻¹ • h),

∇²ss f(s)[h, h] = ∇s(∇s f(s)[h]) • h = −µ (∇s(s⁻¹ • h)) • h = µ (Q_s⁻¹ h) • h = µ {(Q_{s^{−1/2}} h) • (Q_{s^{−1/2}} h)},

∇³sss f(s)[h, h, h] = ∇s(∇²ss f(s)[h, h]) • h = −2µ {(Q_{s^{−1/2}} h) • ((Q_{s^{−1/2}} h) ∘ (Q_{s^{−1/2}} h))}.

Let ĥ := Q_{s^{−1/2}} h ∈ J₁, and let λ₁, λ₂, …, λ_p be its eigenvalues. The result is established by observing that

|∇³sss f(s)[h, h, h]| = 2µ |trace(ĥ³)| = 2µ |Σ_{i=1}^p λᵢ³| ≤ 2µ (Σ_{i=1}^p λᵢ²)^{3/2} = 2µ^{−1/2} (µ trace(ĥ²))^{3/2} = 2µ^{−1/2} (∇²ss f(s)[h, h])^{3/2}.

Lemma 3.2.2. For any fixed µ > 0, ρ^(k)(µ, x) is a µ-strongly self-concordant barrier on F₂^(k), k = 1, 2, …, K.

Proof. Let {xᵢ}ᵢ₌₁^∞ be any sequence in F₂^(k). It is clear that ρ^(k)(µ, xᵢ) tends to −∞ when xᵢ approaches a point on the boundary of F₂^(k). It remains to show that (3.2.1) is satisfied by ρ^(k)(µ, x) on F₂^(k). For any µ > 0, x ∈ F₂^(k), and d ∈ R^{m₁}, we define the univariate function

Ψ^(k)(t) := −∇xx ρ^(k)(µ, x + td)[d, d].

67 Note that Ψ (k) (0) = xx ρ (k) (µ, x)[d, d] and Ψ (k) (0) = 3 xxx ρ (k) (µ, x)[d, d, d]. So, to prove the lemma, it is enough to show that Ψ (k) (0) µ Ψ (k) (0) 3/. Let ( ν (k) (t), s (k) (t), P (k) (t), R (k) (t)) := ( ν (k) (µ, x+td), s (k) (µ, x+td), P (k) (µ, x+td), R (k) (µ, x+ td)). We also define u (k) (t) := µ P (k) (t) s (k) 1/ (t) T (k) (t)d. By the notations introduced in 4, we have ( ν (k), s (k) ) = ( ν (k) (0), s (k) (0)). We also let (P (k), R (k), u (k) ) := (P (k) (0), R (k) (0), u (k) (0)). Notice that, by the definition of P (k), we have P (k) = P (k). Using (3.1.17) and (3.1.18), we get Ψ (k) (0) = xx ρ (k) (µ, x)[d, d] = ( T (k)t x ν (k) )[d, d] ( s ) = µ T (k) (t) T (k) 1/ P (k) s (t) (k) 1/ T (k) [d, d] ( s s ) = µ d T T (k) T (k) 1/ (t) P (t) (k) (k) 1/ T (k) d = u (k) u (k) = u (k). Hence Ψ (k) (0) = u (k) u (k). So, in order to bound Ψ (k) (0), we need to compute the derivative of u (k) with respect to t. Using (3.1.18), we have u (k) = µ {P (k) = { µ s (k) 1/ (t) s (k) 1/ (t) } T (k) d s (k) 1/ (t) W (k) R (k) 1 W (k) T s (k) 1/ (t) } T (k) d 56

68 = { µ s (k) 1/ (t) s (t) W (k) 1/ (k) R (k) 1 W (k) T s (k) 1/ (t) s (t) (k) 1/ W (k) (R (k) 1 ) W (k) T s (k) 1/ (t) s (t) (k) 1/ W (k) R (k) 1 W (k) T( s ) } (k) 1/ (t) T (k) d = { µ s (k) 1/ (t) s (t) W (k) 1/ (k) R (k) 1 W (k) T s (k) 1/ (t) + s (t) (k) 1/ W (k) R (k) 1 (R (k) ) R (k) 1 W (k) T s (k) 1/ (t) s (t) (k) 1/ W (k) R (k) 1 W (k) T( s (k) 1/ (t) s (k) 1/ (t) )} + s (k) 1/ (t) s (k) 1/ (t) T (k) d = { µ s (k) 1/ (t) s (t) W (k) 1/ (k) R (k) 1 W (k) T s (k) 1/ (t) + s (t) (k) 1/ W (k) R (k) 1 W (k) T( s (k) 1/ (t) s (k) 1/ (t) + s (k) 1/ (t) s (t) ) W (k) 1/ (k) R (k) 1 W (k) T s (k) 1/ (t) s (t) (k) 1/ W (k) R (k) 1 W (k) T( s (k) 1/ (t) s (k) 1/ (t) )} + s (k) 1/ (t) s (k) 1/ (t) T (k) d = { ( µ s (k) 1/ (t) I W (k) R (k) 1 W (k) T s ) (k) 1/ (t) s (t) (k) 1/ W (k) R (k) 1 W (k) T( s (k) 1/ (t) s (k) 1/ (t) )( + s (k) 1/ (t) s (k) 1/ (t) W (k) R (k) 1 W (k) T s )} (k) 1/ (t) + I T (k) d = { µ s (k) 1/ (t) s (t) (k) 1/ W (k) R (k) 1 W (k) T( s (k) 1/ (t) s (k) 1/ (t) )}( + s (k) 1/ (t) s (k) 1/ (t) I W (k) R (k) 1 W (k) T s ) (k) 1/ (t) T (k) d = { µ s (k) 1/ (t) s (t) (k) 1/ W (k) R (k) 1 W (k) T( s (k) 1/ (t) s (k) 1/ (t) )}( 1P ) + s (k) 1/ (t) s (k) 1/ (t) s (t) (k) 1/ (k) s (k) 1/ (t) T (k) d { = s (k) 1/ (t) s (t) (k) 1/ W (k) R (k) 1 W (k) T( s (k) 1/ (t) s (k) 1/ (t) )} 1u + s (k) 1/ (t) s (k) 1/ (t) s (t) (k) 1/ (k). Notice that, for any ξ R m, we have u (k) s (k) 1/ (t) W (k) ξ = = (P (k) s (k) 1/ (t) ( s (k) 1/ (t) ) T (k) d s (t) (k) 1/ W (k) ξ ) T (k) d P (k) s (t) (k) 1/ W (k) ξ 57

69 This implies that = ( ) ( s (k) 1/ (t) T (k) d s (t) (k) 1/ W (k) ξ = 0. Ψ (k) (0) = u (k) u (k) = u (k) s (t) (k) 1/ W (k) R (k) 1 (k) W T s (t) W (k) 1/ (k) } {{ } = R (k) ) ξ ( 1u ) s (k) 1/ (t) s (t) (k) 1/ (k). (3..) By (3..), (3.1.17) and (3.1.18) and using norm inequalities, we get Ψ (k) (0) = µ 1/ u (k) s (k) 1/ (t) s (k)1/ (t) u (k) ( ) = µ 1/ u (k) s (k) 1/ (t) s (k)1/ (t) + s (k)1/ (t) s (k) 1/ (t) u (k) ( u = µ 1/ (k) s (k)1/ (t) s (k) 1/ (t) s (k) 1/ (t) ) + s (k) 1/ (t) s (k) 1/ (t) s (k)1/ (t) u (k) ( u ) = µ 1/ (k) s (k)1/ (t) s (k) 1/ (t) s (k)1/ (t) u (k) u = µ 1/ (k) s (k)1/ (t) s (k) 1 (t) s (k)1/ (t) u (k) u = µ 1/ (k) ν (t) ν (k) 1/ (k) (t) ν (k) 1/ (t) u (k) u = µ 1/ (k) ν (t) ν (k) 1/ (k) (t), ν (k) (t) ν (k) 1/ (t) u (k) u = µ 1/ (k) ν (t) ν ν (k) 1/ (k) (t), {ν (k) (µ, x + td)} (k) 1/ t=0 (t) u (k) u = µ 1/ (k) ν (t) ν (k) 1/ (k) (t), x ν (k) [d] ν (k) 1/ (t) u (k) u = µ 1/ (k) e, ν (k) 1/ (t) x ν (k) [d] ν (k) 1/ (t) u (k) u = µ 1/ (k) ν (k) 1/ (t) x ν (k) [d] ν (k) 1/ (t) u (k) µ 1/ u (k) ν (k) 1/ (t) x ν (k) [d] ν (k) 1/ (t) u (k) µ 1/ u (k) ν (k) 1/ (t) x ν (k) [d] ν (k) 1/ (t) = µ 1/ u (k) ν (k) 1/ (t) x ν (k) d = µ 1 u (k) s (k)1/ (t) x ν (k) d 58

≤ 2µ^{−1/2} ‖u^(k)‖³ = 2µ^{−1/2} Ψ^(k)(0)^{3/2}.

The lemma is established.

Theorem 3.2.1. For any fixed µ > 0, η(µ, x) is a µ-strongly self-concordant barrier on F⁰.

Proof. It is trivial to see that the linear function cᵀx is a µ-strongly self-concordant barrier on F₁ (indeed, both sides of (3.2.1) are identically zero). By Lemma 3.2.1, the function −µ ln det s is also a µ-strongly self-concordant barrier on F₁. By Lemma 3.2.2 and [30, Proposition 2.1.1(ii)], we conclude that Σ_{k=1}^K ρ^(k)(µ, x) is a µ-strongly self-concordant barrier on F₂. The theorem is then established by [30, Proposition 2.1.1(ii)].

3.2.2 Parameters of the self-concordant family

We have shown that the recourse function η(µ, ·) is a strongly self-concordant function, and as such it enjoys many nice features. In this subsection, we show that the family of functions {η(µ, ·) : µ > 0} is a strongly self-concordant family with appropriate parameters. We first introduce the definition of self-concordant families.

Definition 3.2.2 (Nesterov and Nemirovskii [30, Definition 3.1.1]). Let R₊₊ be the set of all positive real numbers, let G be an open nonempty convex subset of Rⁿ, and for µ ∈ R₊₊ let f_µ : R₊₊ × G → R be a family of functions indexed by µ. Let α₁(µ), α₂(µ), α₃(µ), α₄(µ), and α₅(µ) : R₊₊ → R₊₊ be continuously differentiable functions of µ. Then the family of functions {f_µ}_{µ∈R₊₊} is called strongly self-concordant with the parameters α₁, α₂, α₃, α₄, α₅ if the following conditions hold:

(i) f_µ is continuous on R₊₊ × G and, for fixed µ ∈ R₊₊, f_µ is convex on G; f_µ has three partial derivatives with respect to x on G, which are continuous on R₊₊ × G and continuously differentiable with respect to µ on R₊₊.

(ii) For any µ ∈ R₊₊, the function f_µ is strongly α₁(µ)-self-concordant.

(iii) For any (µ, x) ∈ R₊₊ × G and any h ∈ Rⁿ,

|∂_µ{∇x f_µ(µ, x)[h]} − {∂_µ ln α₃(µ)} ∇x f_µ(µ, x)[h]| ≤ α₄(µ) α₁(µ)^{−1/2} (∇xx f_µ(µ, x)[h, h])^{1/2},
|∂_µ{∇xx f_µ(µ, x)[h, h]} − {∂_µ ln α₂(µ)} ∇xx f_µ(µ, x)[h, h]| ≤ 2α₅(µ) ∇xx f_µ(µ, x)[h, h].

In this section, we need to compute ∂_µ{∇x η(µ, x)}, and in order to do so we need to determine the partial derivative of ν̃^(k)(µ, x) with respect to µ. Let (ẏ^(k), ν̇^(k), ṡ^(k)) denote the partial derivatives of (y^(k)(µ, x), ν̃^(k)(µ, x), s̃^(k)(µ, x)) with respect to µ. Differentiating (3.1.10) with respect to µ, we get the system

ṡ^(k) ∘ ν̃^(k) + s̃^(k) ∘ ν̇^(k) = e₂,
W̃^(k) ẏ^(k) + ṡ^(k) = 0,     (3.2.3)
W̃^(k)ᵀ ν̇^(k) = 0.

Solving the system (3.2.3), we obtain

ṡ^(k) = −W̃^(k) R^(k)⁻¹ W̃^(k)ᵀ s̃^(k)⁻¹,
ν̇^(k) = Q_{s̃^(k)^{−1/2}} P^(k) e₂,     (3.2.4)
ẏ^(k) = R^(k)⁻¹ W̃^(k)ᵀ s̃^(k)⁻¹.

The proof of the strong self-concordance of the family {η(µ, ·) : µ > 0} depends on the following two lemmas.

Lemma 3.2.3. For any µ > 0, x ∈ F⁰, and h ∈ R^{m₁}, the following inequality holds:

{∂_µ ∇x η(µ, x)ᵀ[h]}² ≤ ((r₁ + Kr₂)/µ) (−∇xx η(µ, x)[h, h]).

Proof. By differentiating (3.1.19) with respect to µ and applying (3.2.4), we obtain

∂_µ{∇x η(µ, x)} = −Σ_{k=1}^K T̃^(k)ᵀ Q_{s̃^(k)^{−1/2}} P^(k) e₂ − Aᵀ s⁻¹ = −(1/√µ) Bε,

where B ∈ R^{m₁×(Kn₂+n₁)} is defined by

B := [√µ T̃^(1)ᵀ Q_{s̃^(1)^{−1/2}} P^(1), …, √µ T̃^(K)ᵀ Q_{s̃^(K)^{−1/2}} P^(K), √µ Aᵀ Q_{s^{−1/2}}]

and

ε := (e₂; … ; e₂; e₁) ∈ J₂ × ⋯ × J₂ × J₁  (K copies of e₂).

Notice that, in view of (3.1.20), we have

BBᵀ = µ Σ_{k=1}^K T̃^(k)ᵀ Q_{s̃^(k)^{−1/2}} P^(k) Q_{s̃^(k)^{−1/2}} T̃^(k) + µ Aᵀ Q_s⁻¹ A = −∇xx η(µ, x).

This gives

{∂_µ ∇x η(µ, x)}ᵀ {−∇xx η(µ, x)}⁻¹ {∂_µ ∇x η(µ, x)} = (1/µ) εᵀ Bᵀ[BBᵀ]⁻¹B ε ≤ (1/µ) εᵀε = (1/µ)(e₁ • e₁ + K e₂ • e₂).

Recall that e₁ • e₁ = rank(J₁) = r₁ and e₂ • e₂ = rank(J₂) = r₂. It follows that

{∂_µ ∇x η(µ, x)}ᵀ {−∇xx η(µ, x)}⁻¹ {∂_µ ∇x η(µ, x)} ≤ (r₁ + Kr₂)/µ.     (3.2.5)

We then have

{∂_µ ∇x η(µ, x)ᵀ[h]}² ≤ ({∂_µ ∇x η}ᵀ{−∇xx η}⁻¹{∂_µ ∇x η}) (−∇xx η(µ, x)[h, h]) ≤ ((r₁ + Kr₂)/µ)(−∇xx η(µ, x)[h, h]),

as desired.
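The step leading to (3.2.5) only uses the fact that Bᵀ[BBᵀ]⁻¹B is an orthogonal projection, so that εᵀBᵀ[BBᵀ]⁻¹Bε ≤ εᵀε. A quick numerical illustration of that fact with random data (not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 12                 # B must have full row rank (m <= n)
B = rng.standard_normal((m, n))
eps = rng.standard_normal(n)

# P = B^T (B B^T)^{-1} B is the orthogonal projector onto the row space of B.
P = B.T @ np.linalg.solve(B @ B.T, B)
assert np.allclose(P @ P, P)          # idempotent
assert np.allclose(P, P.T)            # symmetric

# Projections are contractions, hence eps^T P eps <= eps^T eps.
assert eps @ P @ eps <= eps @ eps + 1e-12
```

Since a projection can only shrink a vector, the quadratic form on the left of (3.2.5) is bounded by ‖ε‖²/µ, independently of the problem data.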

73 Lemma For any µ > 0, x F 0 and h R m 1, the following inequality holds: { xx η(µ, x)[h, h]} r µ xx η(µ, x)[h, h]. Proof. Let ( ν (k), s (k), P (k), R (k) ) := ( ν (k) (µ, x), s (k) (µ, x), P (k) (µ, x), R (k) (µ, x)). We fix h R m 1 and define u (k) := µ P (k) s (k) 1/ T (k) h. Now by following some similar steps as in the proof of Lemma 3.., and using (3..), (3.1.18), (3..3), (3.1.10), and (3..4), we get (u (k) u (k) ) = u (k) u (k) ( 1u ) = u (k) s (k) 1/ (t) s (t) (k) 1/ (k) = u (k) s (k)1/ (t) s (k) 1 (t) s (k)1/ (t) u (k) = u (k) ν (t) ν (k) 1/ (k) (t) ν (k) 1/ (t) u (k) = u (k) ν (t) ν (k) 1/ (k) (t), ν (k) (t) ν (k) 1/ (t) = u (k) e, ν (k) 1/ (t) ν (k) (t) ν (k) 1/ (t) u (k) = u (k) ν (k) 1/ (t) ν (k) (t) ν (k) 1/ (t) u (k) = u (k) ν (k) 1 (t) ν (k) (t) u (k) = µ 1 u (k) s (k) (t) ν (k) (t) u (k) u (k) = µ 1 u (k) e ν (k) (t) s (k) (t) u (k) = µ 1 u (k) e µ s (k) 1 (t) s (k) (t) u (k) µ 1 u (k) e µ s (k) 1/ (t) s (k) (t) = µ 1 u (k) e µ s (t) (k) 1/ W (k) R (k) 1 W (k) T s (k) 1 = µ 1 u (k) e µ (e P (k) e ) µ 1 u (k) e, where the last inequality is obtained by observing that e P (k) e KJ 0, which can be 6

immediately seen by noting that 0 ⪯ P^(k) ⪯ I. Recall that e₂ • e₂ = rank(J₂) = r₂. This gives us

|∂_µ(u^(k) • u^(k))| ≤ (2r₂/µ) u^(k) • u^(k).

From (3.1.20), we have that

−∇xx η(µ, x)[h, h] = Σ_{k=1}^K u^(k) • u^(k) + µ (Aᵀ Q_s⁻¹ A)[h, h].

Therefore, for any h ∈ R^{m₁}, we have

|∂_µ{−∇xx η(µ, x)[h, h]}| ≤ Σ_{k=1}^K |∂_µ(u^(k) • u^(k))| + (Aᵀ Q_s⁻¹ A)[h, h]
 ≤ (2r₂/µ) Σ_{k=1}^K u^(k) • u^(k) + (Aᵀ Q_s⁻¹ A)[h, h]
 ≤ (2r₂/µ)(−∇xx η(µ, x)[h, h]).

This completes the proof.

Theorem 3.2.2. The family {η(µ, ·) : µ > 0} is a strongly self-concordant family with the following parameters:

α₁(µ) = µ,  α₂(µ) = α₃(µ) = 1,  α₄(µ) = √(r₁ + Kr₂),  α₅(µ) = r₂/µ.

Proof. It is clear that condition (i) of Definition 3.2.2 holds. Theorem 3.2.1 shows that condition (ii) is satisfied, and Lemmas 3.2.3 and 3.2.4 show that condition (iii) is satisfied.
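The µ-self-concordance inequality established in Lemma 3.2.1 can be spot-checked numerically in the semidefinite case (Problem 4), where f(S) = −µ ln det S and the directional derivatives along a symmetric direction H have the classical closed forms used below; the sizes and data are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu = 5, 0.7
M = rng.standard_normal((n, n))
S = M @ M.T + n * np.eye(n)   # symmetric positive definite point
G = rng.standard_normal((n, n))
H = (G + G.T) / 2             # symmetric direction

# For f(S) = -mu * ln det S and phi(t) = f(S + tH):
#   phi''(0)  =  mu * trace((S^{-1}H)^2)
#   phi'''(0) = -2 mu * trace((S^{-1}H)^3)
X = np.linalg.solve(S, H)     # S^{-1} H
d2 = mu * np.trace(X @ X)
d3 = -2 * mu * np.trace(X @ X @ X)

# mu-self-concordance: |phi'''(0)| <= 2 mu^{-1/2} phi''(0)^{3/2}.
assert d2 > 0
assert abs(d3) <= 2 * mu ** -0.5 * d2 ** 1.5 + 1e-9
```

The check mirrors the eigenvalue argument of the proof: with λᵢ the eigenvalues of S^{−1/2}HS^{−1/2}, it reduces to |Σλᵢ³| ≤ (Σλᵢ²)^{3/2}.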

3.3 A class of logarithmic barrier algorithms for solving SSPs

In §3.2 we established that the parametric functions η(µ, ·) constitute a strongly self-concordant family. Therefore, it is straightforward to develop primal path-following interior point algorithms for solving the SSP (3.1.3, 3.1.4). In this section we introduce a class of log barrier algorithms for solving this problem. This class is stated formally in Algorithm 1.

Algorithm 1 The Decomposition Algorithm for Solving SSP
Require: ε > 0, γ ∈ (0, 1), θ > 0, β > 0, x⁰ ∈ F⁰, and µ⁰ > 0.
  x := x⁰, µ := µ⁰
  while µ > ε do
    for k = 1, 2, …, K do
      solve (3.1.9) to obtain (y^(k), s^(k), ν^(k))
      choose a scaling element p ∈ C(s^(k), ν^(k)) and compute (s̃^(k), ν̃^(k))
    end for
    compute ∆x := −{∇xx η(µ, x)}⁻¹ ∇x η(µ, x) using (3.1.19) and (3.1.20)
    compute δ(µ, x) := √(−(1/µ) ∆xᵀ ∇xx η(µ, x) ∆x) using (3.1.20)
    while δ > β do
      x := x + θ∆x
      for k = 1, 2, …, K do
        solve (3.1.9) to obtain (y^(k), s^(k), ν^(k))
        choose a scaling element p ∈ C(s^(k), ν^(k)) and compute (s̃^(k), ν̃^(k))
      end for
      compute ∆x := −{∇xx η(µ, x)}⁻¹ ∇x η(µ, x) using (3.1.19) and (3.1.20)
      compute δ(µ, x) := √(−(1/µ) ∆xᵀ ∇xx η(µ, x) ∆x) using (3.1.20)
    end while
    µ := γµ
    apply the inverse scaling to (s̃^(k), ν̃^(k))
  end while

Our algorithm is initialized with a starting point x⁰ ∈ F⁰ and a starting value µ⁰ > 0 for the barrier parameter µ, and is indexed by a parameter γ ∈ (0, 1). We use δ as a measure of the proximity of the current point x to the central path, and β as a threshold

for that measure. If the current x is too far away from the central path in the sense that δ > β, Newton's method is applied to find a point close to the central path. Then the value of µ is reduced by a factor γ and the whole process is repeated until the value of µ is within the tolerance ɛ. By tracing the central path as µ approaches zero, a strictly feasible ɛ-optimal solution to (3.1.11) will be generated.

3.4 Complexity analysis

In this section we present the complexity analysis for two variants of the algorithm: short-step algorithms and long-step algorithms, which are controlled by the input value of γ in Algorithm 1. As mentioned in [5], the first part of the following proposition follows directly from the definition of self-concordance and is due to [30, Theorem 2.1.1]. The second part follows from the first part and is given in [48] without proof.

Proposition 3.4.1. For any µ > 0 and x ∈ F⁰, denote Δx := −{∇²_xx η(µ, x)}⁻¹ ∇_x η(µ, x) and δ := √((1/µ) Δxᵀ ∇²_xx η(µ, x) Δx). Then for δ < 1, τ ∈ [0, 1] and any h, h₁, h₂ ∈ ℝ^{m₁} we have

(i) (1 − τδ)² hᵀ ∇²_xx η(µ, x) h ≤ hᵀ ∇²_xx η(µ, x + τΔx) h ≤ (1 − τδ)⁻² hᵀ ∇²_xx η(µ, x) h,

(ii) |h₁ᵀ (∇²_xx η(µ, x + τΔx) − ∇²_xx η(µ, x)) h₂| ≤ [(1 − τδ)⁻² − 1] √(h₁ᵀ ∇²_xx η(µ, x) h₁) √(h₂ᵀ ∇²_xx η(µ, x) h₂).

The following lemma is essentially Theorem 2.2.3 of [30] and describes the behavior of Newton's method as applied to η(µ, ·).

Lemma 3.4.1. For any µ > 0 and x ∈ F⁰, let Δx be the Newton direction defined by Δx := −{∇²_xx η(µ, x)}⁻¹ ∇_x η(µ, x); let δ := δ(µ, x) = √((1/µ) Δxᵀ ∇²_xx η(µ, x) Δx), x⁺ = x + Δx, let Δx⁺ be the Newton direction calculated at x⁺, and δ(µ, x⁺) := √((1/µ) (Δx⁺)ᵀ ∇²_xx η(µ, x⁺) Δx⁺). Then the following relations hold:

(i) If δ < 2 − √3, then δ(µ, x⁺) ≤ (δ/(1 − δ))².

(ii) If δ ≥ 2 − √3, then η(µ, x) − η(µ, x + θΔx) ≥ µ(δ − ln(1 + δ)), where θ = (1 + δ)⁻¹.

3.4.1 Complexity for the short-step algorithm

In the short-step version of the algorithm, we decrease the barrier parameter by a factor γ := 1 − σ/√(r₁ + Kr₂), with σ ≤ 0.1, in each iteration. The kth iteration of the short-step algorithm is performed as follows: at the beginning of the iteration, we have µ^(k−1) and x^(k−1) on hand and x^(k−1) is close to the central path, i.e., δ(µ^(k−1), x^(k−1)) ≤ β. After the barrier parameter µ is reduced from µ^(k−1) to µ^k := γµ^(k−1), we have that δ(µ^k, x^(k−1)) ≤ 2β. Then a full Newton step with size θ = 1 is taken to produce a new point x^k with δ(µ^k, x^k) ≤ β.

We now show that, in this class of algorithms, only one Newton step is sufficient for recentering after updating the parameter µ. For the purpose of proving this result, we present the following proposition, which is a restatement of [30, Theorem 3.1.1].

Proposition 3.4.2. Let χ_κ(η; µ, µ⁺) denote the quantity associated with the self-concordant family {η(µ, ·) : µ > 0} in [30, Theorem 3.1.1]. Assume that δ(µ, x) ≤ κ/2 and that µ⁺ := γµ satisfies χ_κ(η; µ, µ⁺) ≤ 1 − 2δ(µ, x)/κ. Then δ(µ⁺, x) < κ.

Lemma 3.4.2. Let µ⁺ = γµ, where γ = 1 − σ/√(r₁ + Kr₂) and σ ≤ 0.1, and let β = (2 − √3)/2. If δ(µ, x) ≤ β, then δ(µ⁺, x) ≤ 2β.

Proof. Let κ := 2β = 2 − √3. Since δ(µ, x) ≤ κ/2, one can verify that for σ ≤ 0.1, µ⁺ satisfies

χ_κ(η; µ, µ⁺) ≤ 1 − 2δ(µ, x)/κ ≤ 1.

By Proposition 3.4.2, we have δ(µ⁺, x) ≤ κ = 2β.
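The guaranteed decrease of a damped Newton step in part (ii) of Lemma 3.4.1 is easy to check numerically on a toy µ-self-concordant function η(µ, x) = cᵀx − µ Σᵢ ln xᵢ. The data below are arbitrary; this is a sanity check of the inequality, not part of the analysis.

```python
import numpy as np

def eta(c, mu, x):
    """Toy mu-self-concordant function: c^T x - mu * sum(log x)."""
    return c @ x - mu * np.log(x).sum()

c, mu = np.array([1.0, 3.0]), 1.0
x = np.array([2.0, 2.0])                 # a point far from the central path

g = c - mu / x                           # gradient of eta at x
H = np.diag(mu / x**2)                   # Hessian of eta at x
dx = -np.linalg.solve(H, g)              # Newton direction
delta = np.sqrt(dx @ H @ dx / mu)        # proximity measure delta(mu, x)

theta = 1.0 / (1.0 + delta)              # damped step size of Lemma 3.4.1(ii)
decrease = eta(c, mu, x) - eta(c, mu, x + theta * dx)
guaranteed = mu * (delta - np.log(1.0 + delta))
```

Here δ = √26 ≈ 5.1, the damped step remains strictly feasible, and the observed decrease exceeds the guaranteed µ(δ − ln(1 + δ)).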

By Lemma 3.4.1(i) and Lemma 3.4.2, we conclude that we can reduce the parameter µ by the factor γ := 1 − σ/√(r₁ + Kr₂), σ ≤ 0.1, at each iteration, and that only one Newton step is sufficient to restore proximity to the central path. So we have the following complexity result for short-step algorithms.

Theorem 3.4.1. Consider Algorithm 1 and let µ⁰ be the initial barrier parameter, ɛ > 0 the stopping criterion, and β = (2 − √3)/2. If the starting point x⁰ is sufficiently close to the central path, i.e., δ(µ⁰, x⁰) ≤ β, then the short-step algorithm reduces the barrier parameter µ at a linear rate and terminates in at most O(√(r₁ + Kr₂) ln(µ⁰/ɛ)) iterations.

3.4.2 Complexity for the long-step algorithm

In the long-step version of the algorithm, the barrier parameter is decreased by an arbitrary constant factor γ ∈ (0, 1). This has the potential for a larger decrease in the objective function value; however, several damped Newton steps might be needed to restore proximity to the central path. The kth iteration of the long-step algorithm is performed as follows: at the beginning of the iteration we have a point x^(k−1) which is sufficiently close to x(µ^(k−1)), where x(µ^(k−1)) is the solution to (3.1.11) for µ := µ^(k−1). The barrier parameter is reduced from µ^(k−1) to µ^k := γµ^(k−1), where γ ∈ (0, 1), and then a search is started to find a point x^k that is sufficiently close to x(µ^k). The long-step algorithm generates a finite sequence of N points in F⁰, and we finally take x^k to be the last point of this sequence. We want to determine an upper bound on N, the number of Newton iterations needed to find the point x^k.

For µ > 0 and x ∈ F⁰, we define the function φ(µ, x) := η(µ, x(µ)) − η(µ, x), which represents the difference between the objective value η(µ^k, x^(k)) at the end of the kth iteration and the minimum objective value η(µ^k, x(µ^(k−1))) at the beginning of the kth iteration. Then

79 our task is to find an upper bound on φ(µ +, x). To do so, we first give upper bounds on φ(µ, x) and φ (µ, x) respectively. We have the following lemma. Lemma Let µ > 0 and x F 0, we denote x := x(µ) x and define δ := δ(µ, x) = 1 µ x T xx η(µ, x) x. For any µ > 0 and x F 0, if δ < 1, then the following inequalities hold: ( ) δ φ(µ, x) µ + ln(1 δ), (3.4.1) 1 δ φ (µ, x) r 1 + Kr ln(1 δ). (3.4.) Proof. φ(µ, x) := η(µ, x(µ)) η(µ, x) := 1 0 x η(µ, x + τ x) T xdτ. Since x(µ) is the optimal solution, we have x η(µ, x(µ)) = 0 (3.4.3) Hence, with the aid of Proposition 3.4.1(i), we get 1 τ φ(µ, x) = x T xx η(µ, x + α x) x dαdτ x T xx η(µ, x) x dαdτ 0 τ (1 αˆδ) 1 1 µ δ = 0 τ (1 α δ) dαdτ ( ) δ = µ + ln(1 δ), 1 δ which establishes (3.4.1). Now, for any µ > 0, by applying the chain rule, using (3.4.3), and applying the 68

80 Mean-Value Theorem, we get φ (µ, x) = η (µ, x(µ)) η (µ, x) + x η(µ, x(µ)) T x (µ) = η (µ, x(µ)) η (µ, x) = x η(µ, x + ϖ x) T x, (3.4.4) for some ϖ (0, 1). Hence φ (µ, x) = x η (µ, x + τ x) T x dτ x T xx η(µ, x + τ x) x x η (µ, x + τ x) T [ xx η(µ, x + τ x)] 1 x η (µ, x + τ x) dτ. Then, by using (3..5) and Proposition 3.4.1(i), we obtain 1 x T xx η(µ, x) x φ r 1 + Kr (µ, x) 0 1 τ δ µ 1 δ µ r 1 + Kr = 0 1 τ δ dτ µ = 1 δ r 1 + Kr dτ 1 τ δ = r 1 + Kr ln(1 δ), 0 dτ which establishes (3.4.). Lemma Let µ > 0 and x F 0 be such that δ < 1, where δ is as defined in Lemma Let µ + := γµ with γ (0, 1). Then η(µ +, x(µ + )) η(µ +, x) O(r 1 + Kr )µ +. Proof. Differentiating (3.4.4) with respect to µ, we get φ (µ, x) = η (µ, x(µ)) η (µ, x) + x η (µ, x(µ)) T x (µ). (3.4.5) 69

81 We will work on the right-hand side of (3.4.5) by verifying that the second term η (µ, x) is nonnegative and then bounding the first and the last term. From the definition of η(x, µ), we have that η (µ, x) = K ρ k (µ, x) with respect to µ and using (3..4) and (3.1.10) gives k=1 ρ k (µ, x). Differentiating ρ k (µ, x) = d(k)t y (k) + ln det s (k) + µ s (k) 1 s (k) = ln det s (k) + d (k)t ( R (k) 1 W (k) T s (k) 1 ) + µ s (k) 1 ( W (k) R (k) 1 W (k) T s (k) 1 ) = ln det s (k) + d (k)t ( R (k) 1 W (k) T s (k) 1 ) + ν (k) ( W (k) R (k) 1 W (k) T s (k) 1 ) = ln det s (k) + ( d (k) + W (k)t ν (k) ) T (R (k) 1 W (k) T s (k) 1 ) = ln det s (k). Observe that (I P (k) ) = I P (k) (remember P (k) = P (k) ). Therefore, by differentiating ρ k (µ, x) with respect to µ and using (3..4) and (3.1.18), we obtain ρ k (µ, x) = s(k) 1 s (k) = s (k) 1 ( W (k) R (k) 1 W (k) T s (k) 1 ) (I = s (k) 1 ( s ) s ) (k)1/ P (k) (k)1/ s (k) 1 = ( I P (k)) s (k)1/ s (k) 1 0. Thus, for µ > 0 and x F 0, η (µ, x) 0, and hence η(µ, x) is a convex function of µ. In addition, using (3.1.18), we also have ( ρ k (µ, x) = s(k) 1 s ) (k) s (k) 1 s (k)1/ P s (k) (k)1/ s (k) 1 s (k)1/ P s (k) (k)1/ = s (k) 1 s (k) s (k) 1 s (k) 1 = s (k) 1 s (k) s (k) 1 P (k) s (k)1/ s (k) 1 s (k) 1 s (k) 1 s (k) s (k) 1 = s (k) 1 s (k) 70

82 = µ 1 ν (k) s (k) = µ 1 trace(e ) = r µ. Hence By differentiating (3.4.3) with respect to µ, we get η (µ, x(µ)) Kr µ. (3.4.6) x η (µ, x(µ)) + xx η(µ, x(µ))x (µ) = 0, or equivalently x (µ) = { xx η(µ, x(µ))} 1 x η (µ, x(µ)). Hence, by using (3..5), we have x η (µ, x(µ)) T x (µ) = x η (µ, x(µ)) T { xx η(µ, x(µ))} 1 x η (µ, x(µ)) 1 µ (r 1 + Kr ). (3.4.7) By combining (3.4.6) and (3.4.7), and using the fact that η (µ, x) 0, we obtain φ (µ, x(µ)) r 1 + Kr. (3.4.8) µ Applying the Mean-Value Theorem and using Lemma and (3.4.8) gives µ + τ φ(µ +, x) = φ(µ, x) + φ (µ, x)(µ + µ) + φ (υ, x) dυdτ ( ) µ µ δ µ + ln(1 δ) r 1 + Kr ln(1 1 δ δ) (µ µ + ) +(r 1 + Kr ) µ + τ µ µ υ 1 dυdτ 71

83 ( ) δ = µ + ln(1 δ) r 1 + Kr ln(1 1 δ δ) (µ µ + ) +(r 1 + Kr ) (µ µ + ln τ ( ) µ δ µ + ln(1 δ) r 1 + Kr ln(1 1 δ δ) (µ µ + ) +(r 1 + Kr ) (µ µ + ) lnγ 1 (recall γ 1 = µ + /µ τ/µ). Since δ and γ are constants, the lemma is established. Notice that the previous lemma requires δ < 1. However, evaluating δ explicitly may not be possible. In the next lemma we will see that δ is actually proportional to δ, which can be evaluated. Lemma For any µ > 0 and x F 0, let x := { xx η(µ, x)} 1 x η(µ, x) and x := x x(µ). We denote δ := δ(µ, x) = 1µ xt xx η(µ, x) x and δ := δ(µ, x) = 1 µ x T xx η(µ, x) x. If δ < 1/6, then 3 δ δ δ. Proof. Let H := xx η(µ, x) and g := x η(µ, x), and denote ḡ := g + H x. From (3.4.3), we have x = H 1 g. This gives us x = x + H 1 ḡ. We then have δ = 1 µ ( x + H 1 ḡ) T xx η(µ, x)( x + H 1 ḡ) 1 µ xt xx η(µ, x) x 1 µ (H 1 ḡ) T xx η(µ, x) (H }{{} 1 ḡ) H δ + 1 µ ḡt H 1 ḡ, (3.4.9) where we used the triangle inequality to obtain the last inequality. Note that, by (3.4.3), 7

84 we have x η(µ, x x) = 0. Applying the Mean-Value Theorem gives h T ḡ = h T (H x + g) = h T ( xx η(µ, x) x + x η(µ, x)) = h T ( xx η(µ, x) x ( x η(µ, x x) x η(µ, x))) = h T ( xx η(µ, x) xx η(µ, x (1 ϖ) x)) x, for some ϖ (0, 1). Now in view of Proposition 3.4.1(ii) we have 1 h T ḡ = h T ( xx η(µ, x) xx η(µ, x (1 τ) x)) x dτ 0 x T H x 1 h T Hh ((1 (1 τ) δ) 1)dτ ( ) 0 δ = 1 δ x T H x h T Hh ( µ ) δ ht = 1 δ Hh. that It can be verified that ḡ T H 1 ḡ = max{h T Hh h T ḡ : h R m 1 }. It then follows ḡ T H 1 ḡ max { h T Hh + ( ) m1} µ δ ht Hh : h R 1 δ = µ δ 4. (3.4.10) (1 δ) From (3.4.9) and (3.4.10), we obtain δ δ+ δ, or equivalently, δ (1+δ) δ+δ 0. 1 δ Therefore, the condition δ 1/6 implies that 1 δ 7 δ + 1 0, which in turn gives that δ 1/3 = δ. From (3.4.9), by exchanging positions of x and x and following the above steps, we get δ δ + δ 1 δ δ, or equivalently, δ δ 1 δ = 3 δ. Thus, the condition δ < 1/6 implies that 3 δ δ δ. This completes the proof. 73

Combining Lemmas 3.4.1(ii), 3.4.4, and 3.4.5, we have the following complexity result for long-step algorithms.

Theorem 3.4.2. Consider Algorithm 1 and let µ⁰ be the initial barrier parameter, ɛ > 0 the stopping criterion, and β = 1/6. If the starting point x⁰ is sufficiently close to the central path, i.e., δ(µ⁰, x⁰) ≤ β, then the long-step algorithm reduces the barrier parameter µ at a linear rate and terminates in at most O((r₁ + Kr₂) ln(µ⁰/ɛ)) iterations.
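The trade-off between the two variants is easy to tabulate: the short-step rule γ = 1 − σ/√(r₁ + Kr₂) gives ln(1/γ) ≈ σ/√(r₁ + Kr₂) and hence many cheap outer iterations, while a constant long-step γ needs only O(ln(µ⁰/ɛ)) outer iterations at the price of several damped Newton steps per iteration. A small numeric sketch, in which the ranks r₁, r₂ and scenario count K are hypothetical values chosen only for illustration:

```python
import math

def outer_iterations(gamma, mu0, eps):
    """Smallest N with mu0 * gamma**N <= eps."""
    return math.ceil(math.log(mu0 / eps) / math.log(1.0 / gamma))

r1, r2, K, mu0, eps, sigma = 5, 3, 100, 1.0, 1e-8, 0.1

gamma_short = 1.0 - sigma / math.sqrt(r1 + K * r2)   # short-step factor
gamma_long = 0.5                                     # any constant in (0, 1)

n_short = outer_iterations(gamma_short, mu0, eps)
n_long = outer_iterations(gamma_long, mu0, eps)
```

For these values the short-step variant needs a few thousand outer iterations, each with a single full Newton step, while the long-step variant needs only a couple of dozen outer iterations, each followed by several damped Newton re-centering steps.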

Chapter 4

A Class of Polynomial Volumetric Barrier Decomposition Algorithms for Stochastic Symmetric Programming

Ariyawansa and Zhu [10] have derived a class of polynomial-time decomposition-based algorithms for solving the SSDP problem (Problem 4) based on a volumetric barrier, analogous to the work of Mehrotra and Özevin [5], by utilizing the work of Anstreicher [7] for DSDP. In this chapter, we extend Ariyawansa and Zhu's work [10] to the case of SSPs by deriving a class of volumetric barrier decomposition algorithms for the general SSP problem and establishing polynomial complexity of certain members of the class of algorithms. The results of this chapter have been submitted for publication [4].

4.1 The volumetric barrier problem for SSPs

In this section we formulate an appropriate volumetric barrier function for the SSP problem (with finite event space Ω) and obtain expressions for the derivatives required in the rest of the chapter. Our procedure closely follows that in 3 of [10], although our setting is much more general.

4.1.1 Formulation and assumptions

We now examine (2.1.3, 2.1.4) when the event space Ω is finite. Let {(T^(k), W^(k), h^(k), d^(k)) : k = 1, 2, ..., K} be the set of the possible values of the random variables (T(ω), W(ω), h(ω), d(ω)) and let p_k := P((T(ω), W(ω), h(ω), d(ω)) = (T^(k), W^(k), h^(k), d^(k))) be the associated probability for k = 1, 2, ..., K. Then Problem (2.1.3, 2.1.4) becomes

max cᵀx + Σ_{k=1}^{K} p_k Q^(k)(x)
s.t. Ax ≼_{K_{J₁}} b,    (4.1.1)

where, for k = 1, 2, ..., K, Q^(k)(x) is the maximum of the problem

max d^(k)ᵀ y^(k)
s.t. W^(k) y^(k) ≼_{K_{J₂}} h^(k) − T^(k) x,    (4.1.2)

where x ∈ ℝ^{m₁} is the first-stage decision variable, and y^(k) ∈ ℝ^{m₂} is the second-stage variable for k = 1, 2, ..., K.

We notice that the constraints in (4.1.1, 4.1.2) are negative symmetric, while the common practice in the DSP literature is to use positive symmetric constraints. So for convenience we redefine d^(k) as d^(k) := −p_k d^(k) for k = 1, 2, ..., K, and rewrite Problem (4.1.1,

4.1.2) as

min cᵀx + Σ_{k=1}^{K} Q^(k)(x)
s.t. Ax − b ≽_{K_{J₁}} 0,    (4.1.3)

where, for k = 1, 2, ..., K, Q^(k)(x) is the minimum of the problem

min d^(k)ᵀ y^(k)
s.t. W^(k) y^(k) + T^(k) x − h^(k) ≽_{K_{J₂}} 0.    (4.1.4)

In the rest of this chapter our attention will be on Problem (4.1.3, 4.1.4), and from now on when we use the acronym SSP we mean Problem (4.1.3, 4.1.4).

4.1.2 The volumetric barrier problem for SSPs

In this section we formulate a volumetric barrier for SSPs and obtain expressions for the derivatives required in our subsequent development. In order to define the volumetric barrier problem for the SSP (4.1.3, 4.1.4), we need some assumptions. First we define

F₁ := { x : s₁(x) := Ax − b ≻_{K_{J₁}} 0 };
F₂^(k)(x) := { y^(k) : s₂^(k)(x, y^(k)) := W^(k) y^(k) + T^(k) x − h^(k) ≻_{K_{J₂}} 0 } for k = 1, 2, ..., K;
F₂ := { x : F₂^(k)(x) ≠ ∅, k = 1, 2, ..., K };
F⁰ := F₁ ∩ F₂.

Now we make the following assumptions.

Assumption 4.1.1. The matrix A and every matrix T^(k) have full column rank.

Assumption 4.1.2. The set F⁰ is nonempty.

Assumption 4.1.3. For each x ∈ F⁰ and for k = 1, 2, ..., K, Problem (4.1.4) has a nonempty isolated compact set of minimizers.
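Folding the scenario probabilities p_k into the second-stage cost vectors, as in the redefinition of d^(k) above, turns the probability-weighted recourse Σ_k p_k Q^(k)(x) into the plain sum appearing in (4.1.3): scaling a scenario's cost vector by p_k scales its optimal value by p_k without changing the minimizer. A toy numeric illustration, with hypothetical probabilities and recourse values:

```python
import numpy as np

# hypothetical scenario data: probabilities p_k and recourse optimal values
p = np.array([0.2, 0.5, 0.3])            # scenario probabilities (sum to 1)
Q = np.array([4.0, 1.0, 6.0])            # Q^(k)(x) at some fixed first-stage x

expected = p @ Q                         # sum_k p_k Q^(k)(x)

# scaling each scenario's cost vector by p_k scales its optimal value by
# p_k (the argmin is unchanged), so the weighted sum becomes a plain sum
Q_folded = p * Q
plain = Q_folded.sum()
```

Both expressions evaluate to the same expected recourse, which is why the scenario weights can disappear from the master objective.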

Assumption 4.1.1 is for convenience. Under Assumption 4.1.2, the set F₁ is nonempty. The logarithmic barrier [30] for F₁ is the function l₁ : F₁ → ℝ defined by

l₁(x) := −ln det(s₁(x)),  x ∈ F₁,

and the volumetric barrier [30, 40] for F₁ is the function v₁ : F₁ → ℝ defined by

v₁(x) := ½ ln det(∇²_xx l₁(x)),  x ∈ F₁.

Also under Assumption 4.1.2, F₂ is nonempty and for x ∈ F₂, F₂^(k)(x) is nonempty for k = 1, 2, ..., K. The logarithmic barrier [30] for F₂^(k)(x) is the function l₂^(k) : F₂ × F₂^(k)(x) → ℝ defined by

l₂^(k)(x, y^(k)) := −ln det(s₂^(k)(x, y^(k))),  y^(k) ∈ F₂^(k)(x), x ∈ F₂,

and the volumetric barrier [30, 40] for F₂^(k)(x) is the function v₂^(k) : F₂ × F₂^(k)(x) → ℝ defined by

v₂^(k)(x, y^(k)) := ½ ln det(∇²_{y^(k)y^(k)} l₂^(k)(x, y^(k))),  y^(k) ∈ F₂^(k)(x), x ∈ F₂.

Now, we define the volumetric barrier problem for the SSP (4.1.3, 4.1.4) as

min η(µ, x) := cᵀx + Σ_{k=1}^{K} ρ_k(µ, x) + µc₁ v₁(x),    (4.1.5)

where for k = 1, 2, ..., K and x ∈ F⁰, ρ_k(µ, x) is the minimum of the problem

min d^(k)ᵀ y^(k) + µc₂ v₂^(k)(x, y^(k)).    (4.1.6)
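In the special case where the symmetric cone is the nonnegative orthant, det is the product of the entries, so the log barrier is l₁(x) = −Σᵢ ln sᵢ with s = Ax − b, its Hessian is AᵀS⁻²A with S = diag(s), and the volumetric barrier is v₁(x) = ½ ln det(AᵀS⁻²A). A minimal NumPy sketch of this special case, with hypothetical data A, b, x:

```python
import numpy as np

def log_barrier(A, b, x):
    """l1(x) = -sum_i log s_i, with s = A x - b > 0 (orthant case)."""
    s = A @ x - b
    assert np.all(s > 0), "x must be strictly feasible"
    return -np.log(s).sum()

def volumetric_barrier(A, b, x):
    """v1(x) = 0.5 * log det(A^T S^{-2} A), where A^T S^{-2} A is the
    Hessian of the log barrier l1 in the orthant case."""
    s = A @ x - b
    H = A.T @ np.diag(1.0 / s**2) @ A
    sign, logdet = np.linalg.slogdet(H)   # numerically stable log-determinant
    assert sign > 0
    return 0.5 * logdet

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.zeros(3)
x = np.array([1.0, 2.0])
lb = log_barrier(A, b, x)
vb = volumetric_barrier(A, b, x)
```

For this data s = (1, 2, 3), so l₁(x) = −ln 6 and ∇²l₁(x) has determinant 7/18.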

90 Here c 1 := 5 n 1 and c := 450n 3 are constants, and µ > 0 is the barrier parameter. Now, we will show that (4.1.6) has a unique minimizer for each x F 0 and for k = 1,,..., K. For this purpose, we present the following theorem. Theorem (Fiacco and Mccormick [19, Theorem 8]). Consider the inequality constrained problem min f(x) s.t. g i (x) 0, i = 1,,..., m, (4.1.7) where the functions f, g 1,..., g m : R n R are continuous. Let I be a scalar-valued function of x with the following two properties: I(x) is continuous in the region R 0 := {x : g i (x) > 0, i = 1,,..., m}, which is assumed to be nonempty; if {x k } is any infinite sequence of points in R 0 converging to x B such that g i (x B ) = 0 for at least one i, then lim k I(x k ) = +. Let τ be a scalar-valued function of the single variable s with the following two properties: if s 1 > s > 0, then τ(s 1 ) > τ(s ) > 0; if {s k } is an infinite sequence of points such that lim k s k = 0, then lim k τ(s k ) = 0. Let U : R 0 R + R be defined by U(x, s) := f(x)+τ(s)i(x). If (4.1.7) has a nonempty, isolated compact set of local minimizers and {s k } is a strictly decreasing infinite sequence, then the unconstrained local minimizers of U(, s k ) exist for s k small. Lemma If Assumptions 4.1. and hold, then for each x F 0 and k = 1,,..., K, the Problem (4.1.6) has a unique minimizer for µ small. Proof. For any x F 0, v (k) (x, y (k) ) is defined on the nonempty set F (k) (x). By Theorem 1.3.1, there exist real numbers λ (k) 1, λ (k),, λ (k) r and a Jordan frame c (k) 1, c (k),, c (k) r such that s (k) (x, y (k) ) = λ (k) 1 c (k) 1 + λ (k) c (k) + + λ (k) r c (k) r, and λ (k) 1, λ (k),, λ (k) r eigenvalues of s (k) (x, y (k) ). Moreover, λ (k) j for j = 1,,..., r. Then λ (k) j are the can be viewed as a function of y (k) F (k) (x) is continuous for j = 1,,..., r and hence the constraint s (k) (x, y (k) ) KJ 0 can be replaced by the constraints: λ (k) j (y (k) ) > 0, j = 1,,..., r. 
So (4.1.4) can be rewritten in the form of (4.1.7). Therefore, by Theorem 4.1.1, local 79

91 minimizers of (4.1.6) exist for each x F 0 and k = 1,,... K for µ small. The uniqueness of the minimizer follows from the fact that v (k) is strictly convex. In view of Lemma 4.1.1, Problem (4.1.5) is well-defined, and its feasible set is F Computation of x η(µ, x) and xxη(µ, x) In order to compute the derivatives of η we need to determine the derivatives of ρ k, k = 1,,..., K, which in turn require the derivatives of v (k) and l (k) for k = 1,,..., K. We will drop the superscript (k) when it does not lead to confusion. Note that x s 1 (x) = A, x s (x, y) = T, and y s (x, y) = W. Hence x l 1 (x) = ( x s 1 ) T s 1 1 = A T s 1 1, and y l (x, y) = ( y s ) T s 1 = W T s 1. This implies that and xxl 1 (x) = A T s 1 1 x s 1 = A T s 1 1 A, H := yyl (x, y) = W T s 1 x s = W T s 1 W. We need the matrix calculus result for our computation. Proposition Let X R n n be nonsingular. Then X ij X 1 = X 1 e i e T j X 1, for i, j = 1,,..., n. Here, e i is the i th vector in the standard basis for R n. 80

92 To compute the first partial derivatives of v (x, y), we start by observing that s 1 = x k ij = ij = ij = ij s 1 s ij s ij s 1 x k s ij x k s ij s 1 e i e T j s 1 s, s 1 e i e T j s 1 s, t k ij s x k ij (4.1.8) = s 1 s, t k s 1. Then, for i = 1,,..., m 1, we have v (x, y) = 1 ln det H x i x i = 1 H 1 H x i = 1 H 1 (W T s 1 W ) x i = 1 H 1 W T ( x i s 1 ) W = H 1 W T s 1 s, t i s 1 W = W T H 1 W s 1 s, t i s 1 = W T H 1 W s 1/ s 1/ s 1/ s, t i s 1/ By defining P := s 1/ W T H 1 W s 1/ which acts as the orthogonal projection onto the range of v (x, y) = W T H 1 W s 1 x i for i = 1,,..., m 1. s, t i s 1, s 1/ W, we get = P s 1/ s, t i s 1/, (4.1.9) 81

93 Similarly, we have v (x, y) = W T H 1 W s 1 y i s, w i s 1 = P s 1/ s, w i s 1/, for i = 1,,..., m. To compute the second partial derivatives of v (x, y), we start by observing that H 1 = x k ij = ij = ij = ij H 1 H ij H ij x k (H 1 e i e T j H 1 ) [ W T s 1 x k [ (H 1 e i e T j H 1 ) ( W T s 1 x k W ] ij ) W ] [ (H 1 e i e T j H 1 ) W T s 1 s, t k s 1 W ij ] ij (4.1.10) = H 1 W T s 1 s, t k s 1 W H 1, and, using (4.1.8) and Lemma 1.3., that ( s 1 x j s, t i s 1 ( ) = s 1 x j + s 1 = s 1 s 1 = s 1 + s 1 ) s, t i s 1 s, t i ( s, t j s 1 s 1 x j s, t i s 1 s, t j s 1 By combining (4.1.9), (4.1.10) and (4.1.11), we get ( + s 1 s, t i x j ) s, t i s 1 s, t j s 1 s, t i s 1 ( t j, t i s, t i s 1 + s 1 s, t j ) s 1 ) s 1 tj, t i s 1. (4.1.11) xxv (y, x) = x x v (x, y) = Q xx + R xx T xx, (4.1.1) 8

94 where Q xx i,j = (W H 1 W T ) s 1 Ri,j xx = (W H 1 W T ) s 1 Ti,j xx = (W H 1 W T ) s 1 s, t j s 1 ( s, t i s 1 s, t j s 1 s, t i s 1, s, t j t j, t i ) s 1 W H 1 W s 1, s, t i s 1. By following similar steps as above, we have that yyv (y, x) = y y v (x, y) = Q yy + R yy T yy, where Q yy i,j = (W H 1 W T ) s 1 R yy i,j = (W H 1 W T ) s 1 T yy i,j = (W H 1 W T ) s 1 s, w j s 1 ( s, w i s 1 s, w j s 1 s, w i s 1, s, w j w j, w i ) s 1 W H 1 W s 1 s, w i s 1, ; where xyv (x, y) = y x v (x, y) = Q xy + R xy T xy, Q xy i,j = (W H 1 W T ) s 1 R xy i,j = (W H 1 W T ) s 1 T xy i,j = (W H 1 W T ) s 1 s, w j s 1 ( s, t i s 1 s, w j s 1 s, t i s 1, ) s 1 s, w j w j, t i W H 1 W s 1 s, t i s 1, ; and yxv (y, x) = x y v (x, y) = Q yx + R yx T yx, where Q yy i,j = (W H 1 W T ) s 1 s, t j s 1 s, w i s 1, 83

95 R yy i,j = (W H 1 W T ) s 1 T yx i,j = (W H 1 W T ) s 1 ( s, w i s 1 s, t j s 1 Now, define ϕ k : R + F 0 F (k) (x) R by ) s 1 s, t j t j, w i, s, w i s 1. W H 1 W s 1 ϕ k (µ, x, y) := d T y + µc v (x, y). By (4.1.6) we then have and ρ k (µ, x) = min ϕ k (µ, x, y) y F (k) (x) ρ k (µ, x) = ϕ k (µ, x, y) y=ȳ = ϕ k (µ, x, ȳ), where ȳ is the minimizer of (4.1.6). Observe that ȳ is a function of x and is defined by y ϕ k (µ, x, y) y=ȳ = 0. (4.1.13) Note that, by (4.1.13), we have ȳv (x, ȳ) = y v (x, y) y=ȳ = 1 µc d. This implies that ȳxv (x, ȳ) = x ȳv (x, ȳ) = 0 and 3 ȳxxv (x, ȳ) = xx ȳv (x, ȳ) = 0. Now we are ready to calculate the first and second order derivatives of ρ k with respect to x. We have x ρ k (µ, x) = [ x ϕ k (µ, x, y) + y ϕ k (µ, x, y) x y] y=ȳ = x ϕ k (µ, x, y) y=ȳ + y ϕ k (µ, x, y) y=ȳ x y y=ȳ = x ϕ k (µ, x, y) y=ȳ = µc x v (x, y) y=ȳ 84

96 = µc x v (x, ȳ), xxρ k (µ, x) = µ x { x v (x, ȳ)} = µc { xxv (x, ȳ) + ȳxv (x, ȳ) [ x y] y=ȳ } = µc xxv (x, ȳ), 3 xxxρ k (µ, x) = µ x { xxv (x, ȳ} = µc { 3 xxxv (x, ȳ) + 3 ȳxxv (x, ȳ) [ x y] y=ȳ } = µc 3 xxxv (x, ȳ). In summary we have x ρ k (µ, x) = µc x v (k) (x, ȳ (k) ), xxρ k (µ, x) = µc xxv (k) (x, ȳ (k) ), 3 xxxρ k (µ, x) = µc 3 xxxv (k) (x, ȳ (k) ), (4.1.14) and x η(µ, x) = c + µc 1 x v 1 (x) + xxη(µ, x) = µc 1 xxv 1 (x) + K k=1 K k=1 µc x v (k) (x, ȳ (k) ), µc xxv (k) (x, ȳ (k) ), (4.1.15) where x v (k) (x, ȳ (k) ), xxv (k) (x, ȳ (k) ), and 3 xxxv (k) (x, ȳ (k) ) are calculated in (4.1.9), (4.1.1), and (4..4) respectively. 4. Self-concordance properties of the volumetric barrier recourse In this section we prove that the recourse function with volumetric barrier is a strongly self-concordant function leading to a strongly self-concordant family with appropriate 85

97 parameters. Establishing this allows us to develop volumetric barrier polynomial time path following interior point methods for solving SSPs. We need the following proposition for proving some results. Proposition Let A, B, C R n n. Then 1. A, B 0 implies that A B 0;. if A 0 and B C, then A B A C Self-Concordance of η(µ, ) This subsection is devoted to show that η(µ, ) is a strongly self-concordant barrier on F 0 (see Definition 3..1). Throughout this subsection, with respect to h R m 1, we define m 1 b := b(h) := h i t i and b := i=1 Our proof relies on the following lemmas. s 1/ s, b s 1/ e. Lemma Let (x, y) be such that s (x, y) KJ 0. Then we have 0 Q xx xxv (x, y). (4..1) Proof. Let h R m 1, h 0. We have h T Q xx h = i,j Q xx ij h i h j = (W H 1 W T ) i,j = (W H 1 W T ) s 1 = (W H 1 W T ) s 1 = (W H 1 W T ) s 1 ( s 1 ( i,j s, t j s 1 s, h j t j s 1 [ s, h j t j j s, b s 1 s, t i s 1 ] s 1 s, b s 1 hi h j ) ) s 1 s, h i t i [ s, h i t i i ] s 1 86

98 = ( s 1/ W H 1 W T s 1/ ) ( s 1/ s, b s 1/ ) Similarly we have = P b. ( h T R xx h = (W H 1 W T ) s 1 s, b s 1 ( ) = s 1/ W H 1 W T s 1/ ( ( ) s, b = P b, s 1/ s 1/ s, b s 1 s 1/ b s 1 s 1/ ) b s 1 ) and h T T xx h = (W H 1 W T ) s 1 s, b s 1 W H 1 W s 1 s, b s 1 = P s, b P s, b s 1/ = P b P b. s 1/ s 1/ s 1/ Using Proposition 4..1 and observing that P 0 and b 0, we conclude that Q xx 0. Since P is a projection, we have that I P 0 and therefore b P b b = 1 ( b + b). (4..) This implies that P b P b 1 P ( b + b), which is exactly h T T xx h 1 ht (Q xx + R xx )h. Since h is arbitrary, we have shown that T xx 1 (Qxx + R xx ), which together with Q xx 0 establishes the result. 87

99 Lemma 4... For any h R m 1, and (x, y) be such that s (x, y) KJ 0. Then b r 3/ (h T Q xx h) 1/. (4..3) Proof. By Theorem 1.3.1, there exist real numbers λ 1, λ,, λ r and a Jordan frame c 1, c,, c r such that b = λ 1 c 1 +λ c + +λ r c r, and λ 1, λ,, λ r are the eigenvalues of b. Without loss of generality (scaling h as needed, and re-ordering indices), we may assume that 1 = λ 1 λ... λ r. In view of Lemma 1.3.3, the matrix b has a full set of orthonormal eigenvectors c ij with corresponding eigenvalues (1/)(λ i + λ j), for 1 i j r. It follows that [ h T Q xx h = 1 r ] P (λ i + λ j) c ij c T ij = 1 i,j=1 r i,j=1 (λ i + λ j) c T ijp c ij. Recall that P is a projection onto an m -dimensional space. So, we can write P as m P = u l u T l, l=1 where u 1, u,..., u m are the orthonormal eigenvectors of P corresponding to the nonzero eigenvalues of P. Consider u k for some k, we have u k = r i,j=1 for some constants α ij, for i, j = 1,,..., r, and 1 = u k = r i,j=1 α ij c ij α ij c ij, r i,j=1 α ij c ij = r i,j=1 α ij. 88

100 This means that there exist i k, j k such that α ik j k 1. r Thus ( m ) h T Q xx h = 1 (λ i + λ j)c T ij u l u T l i,j l=1 = 1 m (λ i + λ j) c T iju l u T l c ij i,j l=1 = 1 m (λ i + λ j) u T l c ij i,j l=1 1 (λ i + λ j) u T k c ij i,j = 1 (λ i + λ j) α ik j k i,j 1 (λ i + λ j) 1 r 4 i,j 1 (λ r 4 i + λ j) j 1 λ r 4 i j = 1 b r 4 j = 1 b r. 3 c ij The result is established. We will next compute the third partial derivative of v (x, y) with respect to x. To start, let (x, y) be such that s (x, y) KJ 0, and h R n. We have ( ) h T Q xx h = W H 1 W T s 1 x i x i + (W H 1 W T ) ( s 1 x j s, b s 1 s, b s 1 s, b s 1 s, b ) s 1 s, b ) s 1 = W H 1 W T s 1 s, t i s 1 W H 1 W T ( s 1 s, b s 1 + (W H 1 W T ) ( s 1 x i s, b s 1 ), 89

101 where ( s 1 x i s, b s 1 s, b s 1 ) = s 1 + s, b s 1 + s, b s 1 + s 1 ( s, t i s 1 s, t i s 1 s, b s 1 s, b s, b s 1 ( t i, b s 1 + s, b s 1 s, b ) s 1 ti, b. s, b s, t i ) s 1 We conclude that the first directional derivative of h T Q xx h with respect to x, in the direction h, is given by x h T Q xx h [h] = m 1 h i i=1 h T Q xx h x i = P b P b 3P b3 P b, b. By following similar steps as above, we obtain x h T R xx h [h] = P b P b 4P b, b, x h T T xx h [h] = 4P b P b P b 4P b P b P b P b. By combining the previous results, we get 3 xxxv (x, y) [h, h, h] = 1P b P b 6P b3 6P b, b +6P b P b 8P b P b P b. (4..4) We need the following lemma which bounds 3 xxxv (x, y) [h, h, h]. Lemma For any h R m 1 and (x, y) be such that s (x, y) KJ 0. Then 3 xxxv (x, y) [h, h, h] 30 b h T Q xx h. (4..5) 90

102 Proof. Note that b b 1 = ( b3 + b, b ). Then (4..4) can be rewritten as 3 xxxv (x, y) [h, h, h] = P b P ( 1 b + 6 b 8 b P b ) 1P b b. From (4..) we have (4..6) 1 b + 6 b 8 b P b 8 b + b. By observing that that b b b, we obtain 6 b 1 b + 6 b 8 b P b 18 b. (4..7) Let λ 1, λ,..., λ r be the eigenvalues of B. Then, for i, j = 1,,..., r, the eigenvalues of b are of the form (1/)(λi + λ j ) (see Lemma 1.3.3). We then have b I b b I and hence b P P b P b P. (4..8) Using (4..7), (4..8), and the face that b 0, we get P b P ( 1 b + 6 b 8 b P b ) 18 b P b. (4..9) In addition, since the elements b and b have the same eigenvectors, the matrices b 91

103 and b also have the same eigenvectors. This implies that b b b b b b. Thus we have that P b b b P b. (4..10) The result follows from (4..6), (4..9) and (4..10). We can now state the proof of Theorem Theorem For any fixed µ > 0, ρ k (µ, ) is µ-self-concordant on F 0, for k = 1,,..., K. Proof. By combining the results of (4..1), (4..3) and (4..5), we get 3 xxxv (x, ȳ) [h, h, h] 30 n 3/ (h T xxv (x, ȳ)h) 3/, which combined with (4.1.14) implies that 3 xxxρ k (y) [h, h, h] 30 µc r 3/ ( xxv (x, ȳ) [h, h]) 3/ = µ 1/ (c µ xxv (x, y) [h, h]) 3/ = µ 1/ ( xxρ k (x) [h, h]) 3/. The theorem is established. Corollary For any fixed µ > 0, η(µ, ) is a µ-self-concordant function on F 0. Proof. It is easy to verify that µc 1 v 1 is µ-self-concordant on F 1. The corollary follows from [30, Proposition.1.1]. 9

104 4.. Parameters of the self-concordant family In this subsection, we show that the family of functions {η(µ, ) : µ > 0} is a strongly self-concordant family with appropriate parameters (see definition 3..). The proof of self-concordancy of the family {η(µ, ) : µ > 0} relies on the following two lemmas. Lemma For any µ > 0 and x F 0, the following inequality holds: { xxη(µ, x)} [h, h] 1 µ xxη(µ, x)[h, h], h R m 1. Proof. Differentiating xxη(µ, x) in (4.1.15) with respect to µ, we obtain { xxη(µ, x)} = xxv 1 (x) + = xxv 1 (x) + K k=1 K k=1 { xxv (k) (x, ȳ (k) ) + µ 3 xxȳv (k) (x, ȳ (k) ) (ȳ (k) ) } xxv (k) (x, ȳ (k) ) = 1 µ xxη(µ, x). The result immediately follows by observing that xxη(µ, x) 0, and therefore, for any h R n, we have that 1 µ xxη(µ, x)[h, h] 0. For fixed (x, ȳ) with s (x, ȳ) KJ 0, let t i = s 1/ s, t i s 1/ e and w j = s 1/ s, w j s 1/ e, for i = 1,,..., m 1 and j = 1,,..., m. We can apply a Gram-Schmidt procedure to { w i } and obtain { u i } with u i = 1 for all i and u i u j = 0, i j. Then the linear span of { u i, i = 1,,..., m } is equal to the span of { w i, i = 1,,..., m }. Let Ū = [ū 1; ū ;... ; ū m ] R n m and û = m k=1 ūi. It follows that P = UU T. We 93

105 then have v (x, ȳ) x i = P t i = UU T t i and = trace(u t i U) m = ū k ( t i ū k ) k=1 m = t i ū k k=1 m ū k = t i k=1 = t i û, Q xx i,j = P t i t j = trace(u t i t j U) m = (ū k ( t i ( t j ū k ))) k=1 m = t i (ū k t j ) k=1 (( m ) ) = t i t j k=1 = t i (û t j ). ū k (4..11) (4..1) Lemma Let (x, ȳ) be such that s (x, ȳ) KJ 0. Then x v (x, ȳ) T { xxv (x, ȳ)} 1 x v (x, ȳ) m. (4..13) Proof. Let T = [ t 1 ; t ;... ; t m1 ] R n m 1. From (4..1) we have that Q xx i,j = t i (û t j ) = t i û t j, 94

106 and hence Q xx = T T û T. From (4..11), we also have x v (x, ȳ) T = T T û. Thus x v (x, ȳ) T (Q xx ) 1 x v (x, ȳ) = û T ( T T û T ) 1 T Tû = û 1/ û 1/ T ( T T û T ) 1 T T û 1/ û 1/ û 1/ û 1/ = trace(û) = m, where the last equality follows from the fact that û = m k=1 u k, and that trace(u k ) = u k u k = 1 for each k. In addition, Q xx xxv (x, ȳ) implies { xxv (x, ȳ)} 1 (Q xx ) 1. This completes the proof. Lemma For any µ > 0 and x F 0, we have x η (µ, x) T [h] (m 1 c 1 + m c )(1 + K) µ xxη(µ, x)[h, h], h R m 1. Proof. Differentiating x η(µ, x) in (4.1.15) with respect to µ, we obtain x η (µ, x) = c 1 x v 1 (x) + = c 1 x v 1 (x) + K k=1 K k=1 c 1 { x v (k) (x, ȳ (k) ) + µ xȳv (k) (x, ȳ (k) ) (ȳ (k) ) } c x v (k) (x, ȳ (k) ). 95

107 In Lemma 4..13, we have shown that x v (k) (x, ȳ (k) ) T { xxv (k) (x, ȳ (k) )} 1 x v (k) (x, ȳ (k) ) m, which is equivalent to x v (k) (x, ȳ (k) )[h] m xxv (k) (x, ȳ (k) )[h, h], h R m. (4..14) Similarly, we can show that x v 1 (x) T { xxv 1 (x)} 1 x v 1 (x) m 1, which is equivalent to x v 1 (x)[h] m 1 xxv 1 (x)[h, h], h R m 1. (4..15) Then, using (4..14) and (4..15), we have that for all h R m 1 ( ) x η K (µ, x)[h] = c 1 x v 1 (x) + c x v (k) (x, ȳ (k) ) [h] k=1 K c 1 x v 1 (x)[h] + c x v (k) (x, ȳ (k) )[h] k=1 m 1 c 1 xxv 1 (x)[h, h] + K k=1 m c xxv (k) (x, ȳ (k) )[h, h] K (m 1 c 1 )c 1 xxv 1 (x)[h, h] + k=1 ( (m1 c 1 + m c )(1 + K) c 1 xxv 1 (x)[h, h] + = (m 1 c 1 + m c )(1 + K) µ xxη(µ, x)[h, h]. (m c )c xxv (k) (x, ȳ (k) )[h, h] K k=1 c xxv (k) (x, ȳ (k) )[h, h] ) 96

108 The result is established. Theorem 4... The family {η(µ, ) : µ > 0} is a strongly self-concordant family with the following parameters α 1 (µ) = µ, α (µ) = α 3 (µ) = 1, α 4 (µ) = (1 + K)(m1 c 1 + m c ), α 5 (µ) = 1 µ µ. Proof. It is direct to see that condition (i) of Definition 3.. holds. Corollary 4..1 shows that condition (ii) is satisfied and Lemmas 4..4 and 4..6 show that condition (iii) is satisfied. 4.3 A class of volumetric barrier algorithms for solving SSPs In 5 we have established that the parametric functions η(µ, ) is a strongly self-concordant family. Therefore, it becomes straightforward to develop primal path following interior point algorithms for solving SSP (4.1.3, 4.1.4). In this section we introduce a class of volumetric barrier algorithms for solving this problem. This class is stated formally in Algorithm. Our algorithm is initialized with a starting point x 0 F 0 and a starting value µ 0 > 0 for the barrier parameter µ, and is indexed by a parameter γ (0, 1). We use δ as a measure of the proximity of the current point x to the central path, and β as a threshold for that measure. If the current x is too far away from the central path in the sense that δ > β, Newton s method is applied to find a point close to the central path. Then the value of µ is reduced by a factor γ and the whole precess is repeated until the value of µ is within the tolerance ɛ. By tracing the central path as µ approaches zero, a strictly 97

Algorithm 2 Volumetric Barrier Algorithm for Solving SSP (4.1.3, 4.1.4)
Require: \(\epsilon > 0\), \(\gamma \in (0,1)\), \(\theta > 0\), \(\beta > 0\), \(x^0 \in \mathcal{F}^0\) and \(\mu^0 > 0\).
  \(x := x^0\), \(\mu := \mu^0\)
  while \(\mu \ge \epsilon\) do
    for \(k = 1, 2, \ldots, K\) do
      solve (4.1.6) to obtain \(\bar y^{(k)}\)
    end for
    compute \(\Delta x := -\{\nabla_{xx}^2\eta(\mu,x)\}^{-1}\nabla_x\eta(\mu,x)\) using (4.1.15)
    compute \(\delta(\mu,x) := \sqrt{\tfrac{1}{\mu}\,\Delta x^T \nabla_{xx}^2\eta(\mu,x)\,\Delta x}\) using (4.1.15)
    while \(\delta > \beta\) do
      \(x := x + \theta\Delta x\)
      for \(k = 1, 2, \ldots, K\) do
        solve (4.1.6) to obtain \(\bar y^{(k)}\)
      end for
      compute \(\Delta x := -\{\nabla_{xx}^2\eta(\mu,x)\}^{-1}\nabla_x\eta(\mu,x)\) using (4.1.15)
      compute \(\delta(\mu,x) := \sqrt{\tfrac{1}{\mu}\,\Delta x^T \nabla_{xx}^2\eta(\mu,x)\,\Delta x}\) using (4.1.15)
    end while
    \(\mu := \gamma\mu\)
  end while

feasible \(\epsilon\)-solution to (4.1.3, 4.1.4) will be generated.

4.4 Complexity analysis

Theorems 4.4.1 and 4.4.2 present the complexity analysis for two variants of the algorithm: short-step algorithms and long-step algorithms, which differ in the manner in which \(\gamma\) is selected in Algorithm 2. In the short-step version of the algorithm, we decrease the barrier parameter in each iteration by a factor
\[
\gamma := 1 - \sigma/\sqrt{(1+K)(m_1 c_1 + m_2 c_2)}, \qquad \sigma \le 0.1.
\]
The \(k\)th iteration of the short-step algorithm is performed as follows: at the beginning of the iteration, we have \(\mu^{(k-1)}\) and \(x^{(k-1)}\) on hand, and \(x^{(k-1)}\) is close to the central path, i.e., \(\delta(\mu^{(k-1)}, x^{(k-1)}) \le \beta\). After the barrier parameter \(\mu\) is reduced from \(\mu^{(k-1)}\) to \(\mu^{(k)} := \gamma\mu^{(k-1)}\), we have that \(\delta(\mu^{(k)}, x^{(k-1)}) \le 2\beta\). Then a full Newton step with size \(\theta = 1\) is taken to produce a new

point \(x^{(k)}\) with \(\delta(\mu^{(k)}, x^{(k)}) \le \beta\). We now show that, in this class of algorithms, only one Newton step is sufficient for recentering after updating the parameter \(\mu\). We have the following theorem.

Theorem 4.4.1. Consider Algorithm 2 and let \(\mu^0\) be the initial barrier parameter, \(\epsilon > 0\) the stopping criterion, and \(\beta = (2-\sqrt{3})/2\). If the starting point \(x^0\) is sufficiently close to the central path, i.e., \(\delta(\mu^0, x^0) \le \beta\), then the short-step algorithm reduces the barrier parameter \(\mu\) at a linear rate and terminates with at most \(O\big(\sqrt{(1+K)(m_1 c_1 + m_2 c_2)}\,\ln(\mu^0/\epsilon)\big)\) iterations.

In the long-step version of the algorithm, the barrier parameter is decreased by an arbitrary constant factor \(\gamma \in (0,1)\). This allows a potentially larger decrease in the objective function value; however, several damped Newton steps might be needed to restore proximity to the central path. The \(k\)th iteration of the long-step algorithm is performed as follows: at the beginning of the iteration we have a point \(x^{(k-1)}\), which is sufficiently close to \(x(\mu^{(k-1)})\), where \(x(\mu^{(k-1)})\) is the solution to (4.1.5) for \(\mu := \mu^{(k-1)}\). The barrier parameter is reduced from \(\mu^{(k-1)}\) to \(\mu^{(k)} := \gamma\mu^{(k-1)}\), where \(\gamma \in (0,1)\), and a search is then started to find a point \(x^{(k)}\) that is sufficiently close to \(x(\mu^{(k)})\). The long-step algorithm generates a finite sequence consisting of \(N\) points in \(\mathcal{F}^0\), and we finally take \(x^{(k)}\) to be equal to the last point of this sequence. We want to determine an upper bound on \(N\), the number of Newton iterations that are needed to find the point \(x^{(k)}\). We have the following theorem.

Theorem 4.4.2. Consider Algorithm 2 and let \(\mu^0\) be the initial barrier parameter, \(\epsilon > 0\) the stopping criterion, and \(\beta = 1/6\). If the starting point \(x^0\) is sufficiently close to the central path, i.e., \(\delta(\mu^0, x^0) \le \beta\), then the long-step algorithm reduces the barrier parameter \(\mu\) at a linear rate and terminates with at most \(O\big((1+K)(m_1 c_1 + m_2 c_2)\,\ln(\mu^0/\epsilon)\big)\) iterations.

The proofs of Theorems 4.4.1 and 4.4.2 are similar to the proofs of Theorems 3.4.1

and 3.4.2, and are given in the subsections below. For convenience, we restate Proposition 3.4.1(i) and Lemma 3.4.2.

Proposition 4.4.1. For any \(\mu > 0\) and \(x \in \mathcal{F}^0\), we denote \(\Delta x := -\{\nabla_{xx}^2\eta(\mu,x)\}^{-1}\nabla_x\eta(\mu,x)\) and \(\delta := \sqrt{\tfrac{1}{\mu}\nabla_{xx}^2\eta(\mu,x)[\Delta x, \Delta x]}\). Then for \(\delta < 1\), \(\tau \in [0,1]\) and any \(h \in \mathbb{R}^{m_1}\) we have
\[
\nabla_{xx}^2\eta(\mu, x+\tau\Delta x)[h,h] \le (1-\tau\delta)^{-2}\, \nabla_{xx}^2\eta(\mu,x)[h,h].
\]

Lemma 4.4.1. For any \(\mu > 0\) and \(x \in \mathcal{F}^0\), let \(\Delta x\) be the Newton direction defined by \(\Delta x := -\{\nabla_{xx}^2\eta(\mu,x)\}^{-1}\nabla_x\eta(\mu,x)\); \(\delta := \delta(\mu,x) = \sqrt{\tfrac{1}{\mu}\nabla_{xx}^2\eta(\mu,x)[\Delta x,\Delta x]}\); \(x^+ = x + \Delta x\); \(\Delta x^+\) be the Newton direction calculated at \(x^+\); and \(\delta(\mu,x^+) := \sqrt{\tfrac{1}{\mu}\nabla_{xx}^2\eta(\mu,x^+)[\Delta x^+,\Delta x^+]}\). Then the following relations hold:
(i) If \(\delta < 2-\sqrt{3}\), then \(\delta(\mu,x^+) \le \big(\delta/(1-\delta)\big)^2\).
(ii) If \(\delta \ge 2-\sqrt{3}\), then \(\eta(\mu,x) - \eta(\mu, x+\theta\Delta x) \ge \mu\big(\delta - \ln(1+\delta)\big)\), where \(\theta = 1/(1+\delta)\).

4.4.1 Complexity for short-step algorithm

For the purpose of proving Theorem 4.4.1, we present the following proposition, which is a restatement of [30, Theorem 3.1.1].

Proposition 4.4.2. Let \(\delta(\mu,x) < \kappa\) and \(\mu^+ := \gamma\mu\), and let
\[
\chi_\kappa(\eta;\mu,\mu^+) := \Big(1 + \frac{\sqrt{(1+K)(m_1c_1+m_2c_2)}}{\kappa}\Big)\ln\gamma^{-1}.
\]
Assume that
\[
\chi_\kappa(\eta;\mu,\mu^+) \le 1 - \frac{\delta(\mu,x)}{\kappa}.
\]
Then \(\delta(\mu^+,x) < \kappa\).

Lemma 4.4.2. Let \(\mu^+ = \gamma\mu\), where \(\gamma = 1 - \sigma/\sqrt{(1+K)(m_1c_1+m_2c_2)}\) and \(\sigma \le 0.1\), and let \(\beta = (2-\sqrt{3})/2\). If \(\delta(\mu,x) \le \beta\), then \(\delta(\mu^+,x) \le 2\beta\).

Proof. Let \(\kappa := 2\beta = 2-\sqrt{3}\). Since \(\delta(\mu,x) \le \kappa/2\), one can verify that for \(\sigma \le 0.1\), \(\mu^+\) satisfies
\[
\chi_\kappa(\eta;\mu,\mu^+) \le \frac{1}{2} \le 1 - \frac{\delta(\mu,x)}{\kappa}.
\]
By Proposition 4.4.2, we have \(\delta(\mu^+,x) \le \kappa = 2\beta\).

By Lemmas 4.4.1(i) and 4.4.2, we conclude that we can reduce the parameter \(\mu\) by the factor \(\gamma := 1 - \sigma/\sqrt{(1+K)(m_1c_1+m_2c_2)}\), \(\sigma \le 0.1\), at each iteration, and that only one Newton step is sufficient to restore proximity to the central path. Hence, Theorem 4.4.1 follows.

4.4.2 Complexity for long-step algorithm

For \(x \in \mathcal{F}^0\) and \(\mu > 0\), we define the function
\[
\phi(\mu,x) := \eta(\mu,x) - \eta(\mu,x(\mu)),
\]
which represents the difference between the current objective value and the minimum objective value of \(\eta(\mu,\cdot)\); in the \(k\)th iteration it measures the gap between \(\eta(\mu^{(k)}, x^{(k-1)})\) at the beginning of the iteration and the target value \(\eta(\mu^{(k)}, x(\mu^{(k)}))\). The task is then to find an upper bound on \(\phi(\mu^+,x)\). To do so, we first give upper bounds on \(\phi(\mu,x)\) and \(\phi'(\mu,x)\), respectively, where the prime denotes differentiation with respect to \(\mu\). We have the following lemma.

Lemma 4.4.3. For \(\mu > 0\) and \(x \in \mathcal{F}^0\), we denote \(\overline{\Delta x} := x - x(\mu)\) and define \(\bar\delta := \bar\delta(\mu,x) = \sqrt{\tfrac{1}{\mu}\nabla_{xx}^2\eta(\mu,x)[\overline{\Delta x},\overline{\Delta x}]}\). For any \(\mu > 0\) and \(x \in \mathcal{F}^0\), if \(\bar\delta < 1\), then the following inequalities hold:
\[
\phi(\mu,x) \le \mu\Big(\frac{\bar\delta}{1-\bar\delta} + \ln(1-\bar\delta)\Big), \tag{4.4.1}
\]
\[
|\phi'(\mu,x)| \le -\sqrt{(1+K)(m_1c_1+m_2c_2)}\,\ln(1-\bar\delta). \tag{4.4.2}
\]

Proof. We have
\[
\phi(\mu,x) := \eta(\mu,x) - \eta(\mu,x(\mu)) = \int_0^1 \nabla_x\eta\big(\mu,\, x-(1-\tau)\overline{\Delta x}\big)[\overline{\Delta x}]\,d\tau.
\]
Since \(x(\mu)\) is the optimal solution, we have
\[
\nabla_x\eta(\mu,x(\mu)) = 0. \tag{4.4.3}
\]
Hence, with the aid of Proposition 4.4.1, we get
\[
\begin{aligned}
\phi(\mu,x) &= \int_0^1\!\!\int_0^1 \tau\, \nabla_{xx}^2\eta\big(\mu,\, x(\mu)+\alpha\tau\overline{\Delta x}\big)[\overline{\Delta x},\overline{\Delta x}]\,d\alpha\,d\tau \\
&\le \int_0^1\!\!\int_0^1 \frac{\tau\, \nabla_{xx}^2\eta(\mu,x)[\overline{\Delta x},\overline{\Delta x}]}{(1-\bar\delta+\alpha\tau\bar\delta)^2}\,d\alpha\,d\tau
= \mu\bar\delta^2 \int_0^1\!\!\int_0^1 \frac{\tau}{(1-\bar\delta+\alpha\tau\bar\delta)^2}\,d\alpha\,d\tau \\
&= \mu\Big(\frac{\bar\delta}{1-\bar\delta} + \ln(1-\bar\delta)\Big),
\end{aligned}
\]
which establishes (4.4.1); here we used the observation \(x(\mu)+\alpha\tau\overline{\Delta x} = x-(1-\alpha\tau)\overline{\Delta x}\). Now, for any \(\mu > 0\), by applying the chain rule, using (4.4.3), and applying the Mean-Value Theorem, we get
\[
\begin{aligned}
\phi'(\mu,x) &= \eta'(\mu,x) - \eta'(\mu,x(\mu)) - \nabla_x\eta(\mu,x(\mu))^T x'(\mu) \\
&= \eta'(\mu,x) - \eta'(\mu,x(\mu)) \tag{4.4.4} \\
&= \nabla_x\eta'\big(\mu,\, x(\mu)+\varpi\overline{\Delta x}\big)^T \overline{\Delta x}, \quad \text{for some } \varpi \in (0,1).
\end{aligned}
\]
Hence
\[
|\phi'(\mu,x)| = \Big|\int_0^1 \big\{\nabla_x\eta'\big(\mu,\, x(\mu)+\tau\overline{\Delta x}\big)\big\}[\overline{\Delta x}]\,d\tau\Big|
\]

\[
|\phi'(\mu,x)| \le \int_0^1 \sqrt{\nabla_{xx}^2\eta\big(\mu, x(\mu)+\tau\overline{\Delta x}\big)[\overline{\Delta x},\overline{\Delta x}]}\; \sqrt{\{\nabla_x\eta'\}^T \{\nabla_{xx}^2\eta\}^{-1} \{\nabla_x\eta'\}\big(\mu, x(\mu)+\tau\overline{\Delta x}\big)}\;d\tau.
\]
In view of Lemma 4.2.6 we have the following estimate:
\[
\{\nabla_x\eta'(\mu,x)\}^T \{\nabla_{xx}^2\eta(\mu,x)\}^{-1} \{\nabla_x\eta'(\mu,x)\} \le \frac{(1+K)(m_1c_1+m_2c_2)}{\mu}. \tag{4.4.5}
\]
Then, by using (4.4.5), Proposition 4.4.1, and the observation \(x(\mu)+\tau\overline{\Delta x} = x-(1-\tau)\overline{\Delta x}\), we obtain
\[
\begin{aligned}
|\phi'(\mu,x)| &\le \int_0^1 \frac{\sqrt{\nabla_{xx}^2\eta(\mu,x)[\overline{\Delta x},\overline{\Delta x}]}}{1-\bar\delta+\tau\bar\delta}\, \sqrt{\frac{(1+K)(m_1c_1+m_2c_2)}{\mu}}\;d\tau \\
&= \sqrt{(1+K)(m_1c_1+m_2c_2)} \int_0^1 \frac{\bar\delta}{1-\bar\delta+\tau\bar\delta}\,d\tau
= -\sqrt{(1+K)(m_1c_1+m_2c_2)}\,\ln(1-\bar\delta),
\end{aligned}
\]
which establishes (4.4.2).

Lemma 4.4.4. Let \(\mu > 0\) and \(x \in \mathcal{F}^0\) be such that \(\bar\delta < 1\), where \(\bar\delta\) is as defined in Lemma 4.4.3. Let \(\mu^+ := \gamma\mu\) with \(\gamma \in (0,1)\). Then
\[
\eta(\mu^+,x) - \eta(\mu^+,x(\mu^+)) \le O(1)\,(1+K)(m_1c_1+m_2c_2)\,\mu^+.
\]

Proof. Differentiating (4.4.4) with respect to \(\mu\), we get
\[
\phi''(\mu,x) = \eta''(\mu,x) - \eta''(\mu,x(\mu)) - \{\nabla_x\eta'(\mu,x(\mu))\}^T x'(\mu). \tag{4.4.6}
\]
We will work on the right-hand side of (4.4.6) by bounding the second and the last terms. Observe that \(\eta''(\mu,x(\mu)) = \sum_{k=1}^{K} \rho''^{(k)}(\mu,x)\). Differentiating \(\rho^{(k)}(\mu,x)\) with respect to \(\mu\) we obtain \(\rho'^{(k)}(\mu,x) = c_2 v_2(x,\tilde y)\). Now, differentiating \(\rho'^{(k)}(\mu,x)\) with respect to \(\mu\) we obtain \(\rho''^{(k)}(\mu,x) = c_2 \nabla_{\tilde y} v_2(x,\tilde y)^T \tilde y'\). Then, differentiating (4.1.13) yields
\[
\tilde y' = -\frac{1}{\mu}\{\nabla_{\tilde y\tilde y}^2 v_2(x,\tilde y)\}^{-1}\nabla_{\tilde y} v_2(x,\tilde y).
\]
Therefore we have
\[
\rho''^{(k)}(\mu,x) = -\frac{1}{\mu}\, c_2\, \nabla_{\bar y} v_2(x,\bar y)^T \{\nabla_{\bar y\bar y}^2 v_2(x,\bar y)\}^{-1} \nabla_{\bar y} v_2(x,\bar y).
\]
It can be shown (see also [7, Theorem 4.4]) that \(|\rho''^{(k)}(\mu,x)| \le \tfrac{1}{\mu} c_2 m_2\). Thus
\[
|\eta''(\mu,x(\mu))| \le \sum_{k=1}^{K} |\rho''^{(k)}(\mu,x)| \le \frac{m_2 c_2 K}{\mu}. \tag{4.4.7}
\]
By differentiating (4.4.3) with respect to \(\mu\), we get
\[
\nabla_x\eta'(\mu,x(\mu)) + \nabla_{xx}^2\eta(\mu,x(\mu))\,x'(\mu) = 0,
\]
or equivalently, \(x'(\mu) = -\{\nabla_{xx}^2\eta(\mu,x(\mu))\}^{-1}\nabla_x\eta'(\mu,x(\mu))\). Hence, by using (4.4.5), we have
\[
\big|\{\nabla_x\eta'(\mu,x(\mu))\}^T x'(\mu)\big| = \big|\{\nabla_x\eta'(\mu,x(\mu))\}^T \{\nabla_{xx}^2\eta(\mu,x(\mu))\}^{-1} \nabla_x\eta'(\mu,x(\mu))\big| \le \frac{(1+K)(m_1c_1+m_2c_2)}{\mu}. \tag{4.4.8}
\]
Observe that \(\eta(\mu,x)\) is concave in \(\mu\); it follows that \(\eta''(\mu,x) \le 0\). Combining this with (4.4.7) and (4.4.8), we obtain

\[
\phi''(\mu,x) \le \frac{2(1+K)(m_1c_1+m_2c_2)}{\mu}. \tag{4.4.9}
\]
Applying the Mean-Value Theorem and using Lemma 4.4.3 and (4.4.9), we get
\[
\begin{aligned}
\phi(\mu^+,x) &= \phi(\mu,x) + \phi'(\mu,x)(\mu^+-\mu) + \int_{\mu^+}^{\mu}\!\!\int_{\tau}^{\mu} \phi''(\upsilon,x)\,d\upsilon\,d\tau \\
&\le \mu\Big(\frac{\bar\delta}{1-\bar\delta} + \ln(1-\bar\delta)\Big) - \sqrt{(1+K)(m_1c_1+m_2c_2)}\,\ln(1-\bar\delta)\,(\mu-\mu^+) \\
&\qquad + 2(1+K)(m_1c_1+m_2c_2)\int_{\mu^+}^{\mu}\!\!\int_{\tau}^{\mu} \upsilon^{-1}\,d\upsilon\,d\tau \\
&\le \mu\Big(\frac{\bar\delta}{1-\bar\delta} + \ln(1-\bar\delta)\Big) - \sqrt{(1+K)(m_1c_1+m_2c_2)}\,\ln(1-\bar\delta)\,(\mu-\mu^+) \\
&\qquad + 2(1+K)(m_1c_1+m_2c_2)\,(\mu-\mu^+)\ln\gamma^{-1}
\end{aligned}
\]
(recall that \(\gamma^{-1} = \mu/\mu^+ \ge \mu/\tau\) for \(\tau \in [\mu^+,\mu]\)). Since \(\bar\delta\) and \(\gamma\) are constants and \(\mu = \mu^+/\gamma\), the right-hand side is \(O(1)(1+K)(m_1c_1+m_2c_2)\,\mu^+\), and the lemma is established.

Note that the previous lemma requires \(\bar\delta < 1\). However, evaluating \(\bar\delta\) explicitly may not be possible. In the next lemma we will see that \(\bar\delta\) is actually proportional to \(\delta\), which can be evaluated.

Lemma 4.4.5. For any \(\mu > 0\) and \(x \in \mathcal{F}^0\), let \(\Delta x := -\{\nabla_{xx}^2\eta(\mu,x)\}^{-1}\nabla_x\eta(\mu,x)\) and \(\overline{\Delta x} := x - x(\mu)\). We denote \(\delta := \delta(\mu,x) = \sqrt{\tfrac{1}{\mu}\nabla_{xx}^2\eta(\mu,x)[\Delta x,\Delta x]}\) and \(\bar\delta := \bar\delta(\mu,x) = \sqrt{\tfrac{1}{\mu}\nabla_{xx}^2\eta(\mu,x)[\overline{\Delta x},\overline{\Delta x}]}\). If \(\delta < 1/6\), then \(\tfrac{2}{3}\delta \le \bar\delta \le 2\delta\).

Proof. See the proof of the analogous lemma in Section 3.4.

Theorem 4.4.2 follows by combining Lemmas 4.4.1(ii), 4.4.4, and 4.4.5.
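The short-step/long-step tradeoff analyzed above — many small \(\mu\)-reductions with one full Newton recentering step each, versus few aggressive reductions paid for with damped Newton steps — can be mimicked on a one-dimensional toy problem. The sketch below is illustrative only: it uses the scalar problem \(\min\, cx\) subject to \(x \ge 0\) with barrier function \(cx - \mu\ln x\) (central path \(x(\mu) = \mu/c\), barrier parameter \(\nu = 1\)), and the problem data, tolerance, and step-size switch are all made up; it is not the dissertation's SSP setting, but the outer/inner loop structure mirrors the path-following scheme described above, including the damped step \(\theta = 1/(1+\delta)\) from Lemma 4.4.1(ii).

```python
import math

THRESH = 2 - math.sqrt(3)   # switch point between full and damped Newton steps

def newton_recenter(x, mu, c, beta, max_steps=200):
    """Recenter x for the barrier f(x) = c*x - mu*ln(x), x > 0.
    The proximity measure sqrt(f''(x) * dx**2 / mu) simplifies to |c*x/mu - 1|."""
    steps = 0
    while True:
        delta = abs(c * x / mu - 1.0)
        if delta <= beta or steps >= max_steps:
            return x, steps
        dx = -(c - mu / x) * x * x / mu                    # Newton direction
        theta = 1.0 if delta < THRESH else 1.0 / (1.0 + delta)  # damp far out
        x += theta * dx
        steps += 1

def path_follow(gamma, beta, c=2.0, mu0=1.0, eps=1e-8):
    """Outer loop: shrink mu by gamma, then recenter with Newton's method."""
    x, mu = mu0 / c, mu0          # start exactly on the central path x(mu) = mu/c
    outer = newton_total = 0
    while mu >= eps:
        mu *= gamma
        x, steps = newton_recenter(x, mu, c, beta)
        newton_total += steps
        outer += 1
    return outer, newton_total

# Short step: gamma = 1 - sigma/sqrt(nu) with sigma = 0.1 and nu = 1 here.
outer_s, newton_s = path_follow(1 - 0.1 / math.sqrt(1.0), (2 - math.sqrt(3)) / 2)
# Long step: an aggressive fixed gamma; recentering needs damped steps.
outer_l, newton_l = path_follow(0.1, 1.0 / 6.0)
print(outer_s, newton_s, outer_l, newton_l)
```

On this toy instance the short-step variant performs far more outer iterations than the long-step variant, while each long-step reduction requires at least one (damped) recentering step — the qualitative behavior the two complexity bounds describe.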

Chapter 5

Some Applications

In this chapter we turn our attention to four applications of SSOCPs and SRQCPs. Namely, we describe the stochastic Euclidean facility location problem and the portfolio optimization problem with loss risk constraints as two applications of SSOCPs, and we then describe the optimal covering random ellipsoid problem and an application in structural optimization as two applications of SRQCPs. We also refer the reader to a paper by Maggioni et al. [4], which describes another important application of SRQCPs in mobile ad hoc networks. The results of this chapter have been submitted for publication [].

5.1 Two applications of SSOCPs

Each of the following two subsections describes an application of SSOCPs.

5.1.1 Stochastic Euclidean facility location problem

In facility location problems (FLPs) we are interested in choosing a location to build a new facility, or locations to build multiple new facilities, so that an appropriate measure of distance from the new facilities to existing facilities is minimized. FLPs arise in locating

airports, regional campuses, wireless communication towers, etc. The following are two ways of classifying FLPs (see also [39]):

We can classify FLPs based on the number of new facilities: if we add only one new facility, we get a problem known as a single facility location problem (SFLP), while if we add multiple new facilities instead of only one, we get a more general problem known as a multiple facility location problem (MFLP).

Another way of classification is based on the distance measure used in the model between the facilities. If we use the Euclidean distance, these problems are called Euclidean facility location problems (EFLPs); if we use the rectilinear distance, they are called rectilinear facility location problems (RFLPs).

In the (deterministic) Euclidean single facility location problem (ESFLP), we are given \(r\) existing facilities represented by the fixed points \(a_1, a_2, \ldots, a_r\) in \(\mathbb{R}^n\), and we plan to place a new facility represented by \(x\) so that we minimize the weighted sum of the distances between \(x\) and each of the points \(a_1, a_2, \ldots, a_r\). This leads us to the problem \(\min \sum_{i=1}^{r} w_i \|x - a_i\|\) or, alternatively, to the problem (see for example [4])
\[
\begin{array}{ll}
\min & \sum_{i=1}^{r} w_i t_i \\
\text{s.t.} & (t_1;\, x-a_1;\, \ldots;\, t_r;\, x-a_r) \succeq 0,
\end{array}
\]
where the cone constraint denotes membership in a product of \(r\) second-order cones, each of dimension \(n+1\), and \(w_i\) is the weight associated with the \(i\)th existing facility and the new facility for \(i = 1, 2, \ldots, r\). The resulting model is a DSOCP model, so there is no stochasticity in it. But what if we assume that some of the fixed existing facilities are random? Where could such randomness be found in the real world?

Picture (A) in Figure 5.1 shows the locations of multi-national military facilities led by troops from the United States and the United Kingdom in the Middle East region as

Figure 5.1: The multi-national military facility locations before and after the start of the Iraq war. Pictures taken from

of December 31, 2002, i.e., before the beginning of the Iraq war (it is known that the war began on March 20, 2003). This picture shows the multi-national military facilities, including the navy facility locations and the air force facility locations, but not the army facility locations. Assume that the random existing facilities are the Iraqi military facilities, whose locations are unknown at the time this picture was taken, i.e., before the beginning of the war. Assume also that the new facilities are the multi-national army facilities, whose locations have to be determined at the time this picture was taken, again before the beginning of the war. Then it is reasonable to look at a problem of this kind as a stochastic ESFLP. Picture (B) in Figure 5.1 shows the locations of all multi-national military facilities, including the army facility locations, as of December 31, 2003, i.e., after the start of the Iraq war.

So, in some applications, the locations of existing facilities cannot be fully specified, because the locations of some of them depend on information not available at the time when the decision needs to be made but only available at a later point in time. In general, to be precise, only the latest information about the random facilities is used. This may require increasing or decreasing the number of new facilities after

the latest information about the random facilities becomes available. In this case, we are interested in stochastic facility location problems (abbreviated as stochastic FLPs). When the locations of all old facilities are fully specified, FLPs are called deterministic facility location problems (abbreviated as deterministic FLPs). In this section we consider (both single and multiple) stochastic Euclidean facility location problems, and in the next chapter we describe four different models of FLPs, two of which can be viewed as generalizations of the models presented in this section.

Stochastic ESFLPs

Let \(a_1, a_2, \ldots, a_{r_1}\) be fixed points in \(\mathbb{R}^n\) representing the coordinates of \(r_1\) existing fixed facilities, and let \(\tilde a_1(\omega), \tilde a_2(\omega), \ldots, \tilde a_{r_2}(\omega)\) be random points in \(\mathbb{R}^n\) representing the coordinates of \(r_2\) random facilities whose realizations depend on an underlying outcome \(\omega\) in an event space \(\Omega\) with a known probability function \(P\). Suppose that at present we do not know the realizations of the \(r_2\) random facilities, and that at some point in time in the future the realizations of these \(r_2\) random facilities become known. Our goal is to locate a new facility \(x\) that minimizes the weighted sum of the Euclidean distances between the new facility and each of the existing fixed facilities, and also minimizes the expected weighted sum of the distances between the new facility and the realization of each of the random facilities. Note that this decision needs to be made before the realizations of the \(r_2\) random facilities become available. This leads us to the following SSOCP model:
\[
\begin{array}{ll}
\min & \sum_{i=1}^{r_1} w_i t_i + E[Q(x,\omega)] \\
\text{s.t.} & (t_1;\, x-a_1;\, \ldots;\, t_{r_1};\, x-a_{r_1}) \succeq 0,
\end{array}
\]
where \(Q(x,\omega)\) is the minimum value of the problem

\[
\begin{array}{ll}
\min & \sum_{j=1}^{r_2} \tilde w_j \tilde t_j \\
\text{s.t.} & (\tilde t_1;\, x-\tilde a_1(\omega);\, \ldots;\, \tilde t_{r_2};\, x-\tilde a_{r_2}(\omega)) \succeq 0,
\end{array}
\]
and
\[
E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega),
\]
where \(w_i\) is the weight associated with the \(i\)th existing facility and the new facility for \(i = 1, 2, \ldots, r_1\), and \(\tilde w_j(\omega)\) is the weight associated with the \(j\)th random existing facility and the new facility for \(j = 1, 2, \ldots, r_2\).

Stochastic EMFLPs

Assume that we need to add \(m\) new facilities, namely \(x_1, x_2, \ldots, x_m \in \mathbb{R}^n\), instead of adding only one. The latest information about the random facilities may call for increasing or decreasing the number of new facilities; for simplicity, let us assume that the number of new facilities is known and fixed in advance. Then we have two cases, depending on whether or not there is interaction among the new facilities in the underlying model.

If there is no interaction between the new facilities, we are concerned only with minimizing the weighted sums of the distances between each of the new facilities on one hand and each of the fixed facilities and the realization of each of the random facilities on the other hand. In other words, we solve the following SSOCP model:
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r_1} w_{ij} t_{ij} + E[Q(x_1; \ldots; x_m, \omega)] \\
\text{s.t.} & (t_{1j};\, x_j-a_1;\, \ldots;\, t_{r_1 j};\, x_j-a_{r_1}) \succeq 0, \quad j = 1, 2, \ldots, m,
\end{array}
\]
where \(Q(x_1; \ldots; x_m, \omega)\) is the minimum value of the problem

\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r_2} \tilde w_{ij}(\omega)\, \tilde t_{ij} \\
\text{s.t.} & (\tilde t_{1j};\, x_j-\tilde a_1(\omega);\, \ldots;\, \tilde t_{r_2 j};\, x_j-\tilde a_{r_2}(\omega)) \succeq 0, \quad j = 1, 2, \ldots, m,
\end{array}
\]
and
\[
E[Q(x_1; \ldots; x_m, \omega)] := \int_\Omega Q(x_1; \ldots; x_m, \omega)\,P(d\omega),
\]
where \(w_{ij}\) is the weight associated with the \(i\)th existing facility and the \(j\)th new facility for \(j = 1, 2, \ldots, m\) and \(i = 1, 2, \ldots, r_1\), and \(\tilde w_{ij}(\omega)\) is the weight associated with the \(i\)th random existing facility and the \(j\)th new facility for \(j = 1, 2, \ldots, m\) and \(i = 1, 2, \ldots, r_2\).

If interaction exists among the new facilities, then, in addition to the above requirements, we need to minimize the sum of the Euclidean distances between each pair of the new facilities. In this case, we are interested in a model of the form
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m}\sum_{i=1}^{r_1} w_{ij} t_{ij} + \sum_{j=1}^{m}\sum_{j'=1}^{j-1} \hat w_{jj'} \hat t_{jj'} + E[Q(x_1; \ldots; x_m, \omega)] \\
\text{s.t.} & (t_{1j};\, x_j-a_1;\, \ldots;\, t_{r_1 j};\, x_j-a_{r_1}) \succeq 0, \quad j = 1, 2, \ldots, m, \\
& (\hat t_{j(j+1)};\, x_j-x_{j+1};\, \ldots;\, \hat t_{jm};\, x_j-x_m) \succeq 0, \quad j = 1, 2, \ldots, m-1,
\end{array}
\]
where \(Q(x_1; \ldots; x_m, \omega)\) is the minimum value of the problem
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m}\sum_{i=1}^{r_2} \tilde w_{ij}(\omega)\, \tilde t_{ij} + \sum_{j=1}^{m}\sum_{j'=1}^{j-1} \hat w_{jj'} \hat t_{jj'} \\
\text{s.t.} & (\tilde t_{1j};\, x_j-\tilde a_1(\omega);\, \ldots;\, \tilde t_{r_2 j};\, x_j-\tilde a_{r_2}(\omega)) \succeq 0, \quad j = 1, 2, \ldots, m, \\
& (\hat t_{j(j+1)};\, x_j-x_{j+1};\, \ldots;\, \hat t_{jm};\, x_j-x_m) \succeq 0, \quad j = 1, 2, \ldots, m-1,
\end{array}
\]
and
\[
E[Q(x_1; \ldots; x_m, \omega)] := \int_\Omega Q(x_1; \ldots; x_m, \omega)\,P(d\omega),
\]

where \(\hat w_{jj'}\) is the weight associated with the new facilities \(j\) and \(j'\), for \(j' = 1, 2, \ldots, j-1\) and \(j = 1, 2, \ldots, m\).

5.1.2 Portfolio optimization with loss risk constraints

The application in this subsection is a well-known problem from portfolio optimization. We consider the problem of maximizing the expected return subject to loss risk constraints. The same problem over one period was cited as an application of DSOCP (see Lobo et al. [3]). Some extensions of this problem will also be described.

The problem

Consider a portfolio problem with \(n\) assets or stocks over two periods. We start by letting \(x_i\) denote the amount of asset \(i\) held at the beginning of (and throughout) the first period, and letting \(p_i\) denote the price change of asset \(i\) over this period. So the vector \(p \in \mathbb{R}^n\) is the price-change vector over the first period. For simplicity, we let \(p\) be Gaussian with known mean \(\bar p\) and covariance \(\Sigma\), so the return over this period is the (scalar) Gaussian random variable \(r = p^T x\) with mean \(\bar r = \bar p^T x\) and variance \(\sigma^2 = x^T \Sigma x\), where \(x = (x_1; x_2; \ldots; x_n)\).

Let \(y_i\) denote the amount of asset \(i\) held at the beginning of (and throughout) the second period, and let \(\tilde q_i(\omega)\) denote the random price change of asset \(i\) over this period, whose realization depends on an underlying outcome \(\omega\) in an event space \(\Omega\) with known probability function \(P\). Similarly, we take the price-change vector \(\tilde q(\omega) \in \mathbb{R}^n\) over the second period to be Gaussian with mean \(\bar q(\omega)\) and covariance \(\tilde\Sigma(\omega)\), so the return over this period is the (scalar) Gaussian random variable \(\tilde r(\omega) = \tilde q(\omega)^T y\) with mean \(\bar r(\omega) = \bar q(\omega)^T y\) and variance \(\tilde\sigma^2(\omega) = y^T \tilde\Sigma(\omega) y\), where \(y = (y_1; y_2; \ldots; y_n)\). Suppose that in the first period we do not know the realization of the Gaussian vector \(\tilde q(\omega) \in \mathbb{R}^n\), and that at some point in time in the future (after the first period) the realization of this Gaussian vector becomes known. As pointed out in [3], the choices of portfolio variables \(x\) and \(y\) involve the

(classical, Markowitz) tradeoff between the mean and the variance of the random return. The optimization variables are the portfolio vectors \(x \in \mathbb{R}^n\) (of the first stage) and \(y \in \mathbb{R}^n\) (of the second stage). For these portfolio vectors, we make the simplest assumptions: \(x \ge 0\), \(y \ge 0\) (i.e., no short positions [3]) and \(\sum_{i=1}^n x_i = \sum_{i=1}^n y_i = 1\) (i.e., unit total budget [3]).

Let \(\alpha\) and \(\tilde\alpha\) be given unwanted return levels over the first and second periods, respectively, and let \(\beta\) and \(\tilde\beta\) be given maximum probabilities over the first and second periods, respectively. Assuming the above data are given, our goal is to determine the amount of asset \(i\) (which is \(x_i\) over the first period and \(y_i\) over the second period), i.e., to determine \(x\) and \(y\), such that the expected returns over these two periods are maximized subject to the following two types of loss risk constraints: the constraint \(P(r \le \alpha) \le \beta\) must be satisfied over the first period, and the constraint \(P(\tilde r(\omega) \le \tilde\alpha) \le \tilde\beta\) must be satisfied over the second period. This determination needs to be made before the realizations become available.

Formulation of the model

As noted in [3], the constraint \(P(r \le \alpha) \le \beta\) is equivalent to the second-order cone constraint
\[
\big(\bar r - \alpha;\; \Phi^{-1}(\beta)\,\Sigma^{1/2} x\big) \succeq 0,
\]
provided \(\beta \le 1/2\) (i.e., \(\Phi^{-1}(\beta) \le 0\)), where
\[
\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt
\]
is the cumulative distribution function of a zero-mean unit-variance Gaussian random variable. To prove this (see also [3]), notice that the constraint \(P(r \le \alpha) \le \beta\)

can be written as
\[
P\Big(\frac{r-\bar r}{\sigma} \le \frac{\alpha-\bar r}{\sigma}\Big) \le \beta.
\]
Since \((r-\bar r)/\sigma\) is a zero-mean unit-variance Gaussian random variable, the probability above is simply \(\Phi((\alpha-\bar r)/\sigma)\); thus the constraint \(P(r \le \alpha) \le \beta\) can be expressed as \(\Phi((\alpha-\bar r)/\sigma) \le \beta\), or \((\alpha-\bar r)/\sigma \le \Phi^{-1}(\beta)\), or equivalently \(\bar r + \Phi^{-1}(\beta)\,\sigma \ge \alpha\). Since \(\sigma^2 = x^T \Sigma x = (\Sigma^{1/2}x)^T(\Sigma^{1/2}x) = \|\Sigma^{1/2}x\|^2\), the constraint \(P(r \le \alpha) \le \beta\) is equivalent to the second-order cone inequality \(\bar r + \Phi^{-1}(\beta)\,\|\Sigma^{1/2}x\| \ge \alpha\), or equivalently to the second-order cone constraint
\[
\big(\bar r - \alpha;\; \Phi^{-1}(\beta)\,\Sigma^{1/2} x\big) \succeq 0.
\]
Similarly, provided \(\tilde\beta \le 1/2\), the constraint \(P(\tilde r(\omega) \le \tilde\alpha) \le \tilde\beta\) is equivalent to the second-order cone constraint
\[
\big(\bar r(\omega) - \tilde\alpha;\; \Phi^{-1}(\tilde\beta)\,\tilde\Sigma(\omega)^{1/2} y\big) \succeq 0.
\]
Our goal is to determine the amount of asset \(i\) (which is \(x_i\) over the first period and \(y_i\) over the second period), i.e., to determine \(x\) and \(y\), such that the expected returns over these two periods are maximized. This problem can be cast as a two-stage SSOCP as follows:
\[
\begin{array}{ll}
\max & \bar p^T x + E[Q(x,\omega)] \\
\text{s.t.} & \big(\bar p^T x - \alpha;\; \Phi^{-1}(\beta)\,\Sigma^{1/2} x\big) \succeq 0 \\
& \mathbf{1}^T x = 1, \quad x \ge 0,
\end{array} \tag{5.1.1}
\]
where \(Q(x,\omega)\) is the maximum value of the problem

\[
\begin{array}{ll}
\max & \bar q(\omega)^T y \\
\text{s.t.} & \big(\bar q(\omega)^T y - \tilde\alpha;\; \Phi^{-1}(\tilde\beta)\,\tilde\Sigma(\omega)^{1/2} y\big) \succeq 0 \\
& \mathbf{1}^T y = 1, \quad y \ge 0,
\end{array} \tag{5.1.2}
\]
and
\[
E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).
\]
Note that this formulation is different from the one suggested by Mehrotra and Özevin [7], which minimizes the variance (a two-stage extension of Markowitz's mean-variance model). Here we formulate a model with a linear objective function, leading to another approach for solving this problem.

Extensions

The simple problem described above has many extensions [3]. One of these extensions is imposing several loss risk constraints, i.e., requiring the constraints
\[
P(r \le \alpha_i) \le \beta_i, \quad i = 1, 2, \ldots, k_1 \quad (\text{where } \beta_i \le 1/2 \text{ for } i = 1, 2, \ldots, k_1),
\]
or equivalently
\[
\big(\bar r - \alpha_i;\; \Phi^{-1}(\beta_i)\,\Sigma^{1/2} x\big) \succeq 0, \quad i = 1, 2, \ldots, k_1,
\]
to be satisfied over the first period, and the constraints
\[
P(\tilde r(\omega) \le \tilde\alpha_j) \le \tilde\beta_j, \quad j = 1, 2, \ldots, k_2 \quad (\text{where } \tilde\beta_j \le 1/2 \text{ for } j = 1, 2, \ldots, k_2),
\]
or equivalently
\[
\big(\bar r(\omega) - \tilde\alpha_j;\; \Phi^{-1}(\tilde\beta_j)\,\tilde\Sigma(\omega)^{1/2} y\big) \succeq 0, \quad j = 1, 2, \ldots, k_2,
\]
to be satisfied over the second period. So our problem becomes

\[
\begin{array}{ll}
\max & \bar p^T x + E[Q(x,\omega)] \\
\text{s.t.} & \big(\bar r - \alpha_i;\; \Phi^{-1}(\beta_i)\,\Sigma^{1/2} x\big) \succeq 0, \quad i = 1, 2, \ldots, k_1 \\
& \mathbf{1}^T x = 1, \quad x \ge 0,
\end{array}
\]
where \(Q(x,\omega)\) is the maximum value of the problem
\[
\begin{array}{ll}
\max & \bar q(\omega)^T y \\
\text{s.t.} & \big(\bar r(\omega) - \tilde\alpha_j;\; \Phi^{-1}(\tilde\beta_j)\,\tilde\Sigma(\omega)^{1/2} y\big) \succeq 0, \quad j = 1, 2, \ldots, k_2 \\
& \mathbf{1}^T y = 1, \quad y \ge 0,
\end{array}
\]
and
\[
E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).
\]
As another extension, observe that the statistical models \((\bar p; \Sigma)\) for the price changes during the first period and \((\bar q(\omega), \tilde\Sigma(\omega))\) for the price changes during the second period are both uncertain (regardless of the fact that the latter depends on \(\omega\) while the former does not), and one of the limitations of the model in (5.1.1, 5.1.2) is its need to estimate these statistical models. In [11], Bawa et al. indicated that using estimates of unknown expected returns and unknown covariance matrices leads to an estimation risk in portfolio choice. To handle this uncertainty, as in [11], we take those expected returns and covariance matrices to belong to bounded intervals; i.e., we consider the statistical models \((\bar p; \Sigma)\) and \((\bar q(\omega), \tilde\Sigma(\omega))\) that belong to the bounded sets
\[
\aleph := \{(\bar p; \Sigma) : \bar p_l \le \bar p \le \bar p_u,\; \Sigma_l \le \Sigma \le \Sigma_u\}
\]

and
\[
\Re := \{(\bar q(\omega); \tilde\Sigma(\omega)) : \bar q_l(\omega) \le \bar q(\omega) \le \bar q_u(\omega),\; \tilde\Sigma_l(\omega) \le \tilde\Sigma(\omega) \le \tilde\Sigma_u(\omega)\},
\]
respectively, where \(\bar p_l, \bar p_u, \bar q_l(\omega), \bar q_u(\omega), \Sigma_l, \Sigma_u, \tilde\Sigma_l(\omega)\) and \(\tilde\Sigma_u(\omega)\) are the extreme values of the intervals mentioned above. We then take a worst-case realization of the statistical models \((\bar p; \Sigma)\) and \((\bar q(\omega), \tilde\Sigma(\omega))\) by maximizing the minimum of the expected returns over all \((\bar p; \Sigma) \in \aleph\) and \((\bar q(\omega), \tilde\Sigma(\omega)) \in \Re\).

Let us consider here only a discrete version of this development. Suppose we have \(N_1\) different possible scenarios over the first period, each of which is modeled by a simple Gaussian model for the price-change vector \(p_k\) with mean \(\bar p_k\) and covariance \(\Sigma_k\). Similarly, we also have \(N_2\) different possible scenarios over the second period, each of which is modeled by a simple Gaussian model for the price change \(\tilde q_l(\omega)\), with mean \(\bar q_l(\omega)\) and covariance \(\tilde\Sigma_l(\omega)\). We can then take a worst-case realization of this information by maximizing the minimum of the expected returns for these different scenarios, subject to a constraint on the loss risk for each scenario, to get the following SSOCP model:
\[
\begin{array}{ll}
\max & \bar p^T x + E[Q(x,\omega)] \\
\text{s.t.} & \big(\bar p_i^T x - \alpha;\; \Phi^{-1}(\beta)\,\Sigma_i^{1/2} x\big) \succeq 0, \quad i = 1, 2, \ldots, N_1 \\
& \mathbf{1}^T x = 1, \quad x \ge 0,
\end{array}
\]
where \(Q(x,\omega)\) is the maximum value of the problem
\[
\begin{array}{ll}
\max & \bar q(\omega)^T y \\
\text{s.t.} & \big(\bar q_j(\omega)^T y - \tilde\alpha;\; \Phi^{-1}(\tilde\beta)\,\tilde\Sigma_j(\omega)^{1/2} y\big) \succeq 0, \quad j = 1, 2, \ldots, N_2 \\
& \mathbf{1}^T y = 1, \quad y \ge 0,
\end{array}
\]
and
\[
E[Q(x,\omega)] := \int_\Omega Q(x,\omega)\,P(d\omega).
\]
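The loss-risk equivalence used throughout this subsection, \(P(r \le \alpha) \le \beta \iff \bar r + \Phi^{-1}(\beta)\|\Sigma^{1/2}x\| \ge \alpha\), is easy to sanity-check numerically. The sketch below is illustrative only: the two-asset mean vector, covariance matrix, candidate portfolio, and the levels \(\alpha, \beta\) are made up, and only the Python standard library (`statistics.NormalDist` for \(\Phi\) and \(\Phi^{-1}\)) is used.

```python
from statistics import NormalDist
import math

# Hypothetical two-asset data: mean price changes and covariance matrix.
p_bar = [0.10, 0.05]
Sigma = [[0.04, 0.01],
         [0.01, 0.02]]
x = [0.6, 0.4]                     # candidate portfolio: sums to 1, x >= 0

r_bar = sum(pi * xi for pi, xi in zip(p_bar, x))          # mean return p_bar^T x
var = sum(x[i] * Sigma[i][j] * x[j] for i in range(2) for j in range(2))
sigma = math.sqrt(var)             # std deviation; equals ||Sigma^{1/2} x||

alpha, beta = -0.05, 0.05          # unwanted return level, max loss probability

# Exact loss probability under the Gaussian model: Phi((alpha - r_bar)/sigma).
loss_prob = NormalDist(mu=r_bar, sigma=sigma).cdf(alpha)

# Second-order cone side of the equivalence:
#   r_bar + Phi^{-1}(beta) * sigma >= alpha  (Phi^{-1}(beta) <= 0 since beta <= 1/2).
soc_holds = r_bar + NormalDist().inv_cdf(beta) * sigma >= alpha

# The probabilistic test and the conic test must agree.
print(loss_prob <= beta, soc_holds)
```

For this particular (made-up) portfolio both tests report that the chance constraint is violated, which agrees with the derivation above; a portfolio satisfying one formulation satisfies the other.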

5.2 Two applications of SRQCPs

The stochastic versions of the two problems in this section have been described and formulated by Ariyawansa and Zhu [49] as SSDP models, but it has been found recently [1, 3, 4] that on numerical grounds solving DSOCPs (SSOCPs) or DRQCPs¹ (SRQCPs) by treating them as DSDPs (SSDPs) is inefficient, and that DSOCPs (SSOCPs) or DRQCPs (SRQCPs) can be solved more efficiently by exploiting their structure. In fact, in [1, 3] we see that many problems first formulated as DSDPs in [38, 41, 30] have been reformulated as DSOCPs or DRQCPs mainly for this reason. In view of this fact, in this section we reformulate SRQCP models (as these problems should be solved as such) for the minimum-volume covering random ellipsoid problem and the structural optimization problem.

5.2.1 Optimal covering random ellipsoid problem

The problem

The problem description in this part is taken from Ariyawansa and Zhu [9, Subsection 3.2]. Suppose we have \(n_1\) fixed ellipsoids
\[
\mathcal{E}_i := \{x \in \mathbb{R}^n : x^T H_i x + 2 g_i^T x + v_i \le 0\} \subseteq \mathbb{R}^n, \quad i = 1, 2, \ldots, n_1,
\]
where \(H_i \in \mathcal{S}_+^n\), \(g_i \in \mathbb{R}^n\) and \(v_i \in \mathbb{R}\) are deterministic data for \(i = 1, 2, \ldots, n_1\). Suppose we also have \(n_2\) random ellipsoids
\[
\tilde{\mathcal{E}}_i(\omega) := \{x \in \mathbb{R}^n : x^T \tilde H_i(\omega) x + 2\tilde g_i(\omega)^T x + \tilde v_i(\omega) \le 0\} \subseteq \mathbb{R}^n, \quad i = 1, 2, \ldots, n_2,
\]
where, for \(i = 1, 2, \ldots, n_2\), \(\tilde H_i(\omega) \in \mathcal{S}_+^n\), \(\tilde g_i(\omega) \in \mathbb{R}^n\) and \(\tilde v_i(\omega) \in \mathbb{R}\) are random data whose realizations depend on an underlying outcome \(\omega\) in an event space \(\Omega\) with a known

¹The acronym DRQCP stands for deterministic rotated quadratic cone program.

probability function \(P\). We assume that, at the present time, we do not know the realizations of the \(n_2\) random ellipsoids, and that at some point in the future the realizations of these \(n_2\) ellipsoids become known. Given this, we need to determine a ball that contains all \(n_1\) fixed ellipsoids and the realizations of the \(n_2\) random ellipsoids. This decision needs to be made before the realizations of the random ellipsoids become available. Consequently, when the realizations of the random ellipsoids do become available, the ball that has already been determined may or may not contain all the realized random ellipsoids. In order to guarantee that the modified ball contains all (fixed and realizations of random) ellipsoids, we assume that at that stage we are allowed to change the radius of the ball (but not its center), if necessary.

We consider the same assumptions as in [49]. We assume that the cost of choosing the ball has three components: the cost of the center, which is proportional to the Euclidean distance to the center from the origin; the cost of the initial radius, which is proportional to the square of the radius; and the cost of changing the radius. The center and the radius of the initial ball are determined so that the expected total cost is minimized.

In [49], Ariyawansa and Zhu describe the following concrete version of this generic application: Let \(n := 2\). The fixed ellipsoids contain targets that need to be destroyed, and the random ellipsoids contain targets that also need to be destroyed but are moving. Fighter aircraft take off from the origin with a planned disk of coverage that contains the fixed ellipsoids. In order to be accurate, only the latest information about the moving targets is used. This may require increasing the radius of the planned disk of coverage after the latest information about the random ellipsoids becomes available, which may occur after the initially planned fighter aircraft have taken off. This increase, dependent on

the initial disk of coverage and the specific information about the moving targets, may result in an additional cost. The initial disk of coverage needs to be determined so that the expected total cost is minimized.

Our first goal is to determine \(\bar x \in \mathbb{R}^n\) and \(\gamma \in \mathbb{R}\) such that the ball \(\mathcal{B}\) defined by
\[
\mathcal{B} := \{x \in \mathbb{R}^n : x^T x - 2\bar x^T x + \gamma \le 0\}
\]
contains the fixed ellipsoids \(\mathcal{E}_i\) for \(i = 1, 2, \ldots, n_1\). As we mentioned, this determination needs to be made before the realizations of the random ellipsoids become available. When the realizations of the random ellipsoids become available, if necessary, we need to determine \(\tilde\gamma\) so that the new ball
\[
\tilde{\mathcal{B}} := \{x \in \mathbb{R}^n : x^T x - 2\bar x^T x + \tilde\gamma \le 0\}
\]
contains all the realizations of the random ellipsoids \(\tilde{\mathcal{E}}_i(\omega)\) for \(i = 1, 2, \ldots, n_2\). Notice that the center of the ball \(\mathcal{B}\) is \(\bar x\), its radius is \(r := \sqrt{\bar x^T \bar x - \gamma}\), and the distance from the origin to its center is \(\sqrt{\bar x^T \bar x}\). Notice also that the new ball \(\tilde{\mathcal{B}}\) has the same center \(\bar x\) as \(\mathcal{B}\) but a larger radius \(\tilde r := \sqrt{\bar x^T \bar x - \tilde\gamma}\).

Formulation of the model

We introduce the constraints \(d_1 \ge \sqrt{\bar x^T \bar x}\) and \(d_2 \ge r^2 = \bar x^T \bar x - \gamma\). That is, \(d_1\) is an upper bound on the distance between the center of the ball \(\mathcal{B}\) and the origin, \(\sqrt{\bar x^T \bar x}\), and \(d_2\) is an upper bound on the square of the radius of the ball \(\mathcal{B}\). In order to proceed, we feel that it is necessary for the reader to recall the following fact:

Fact 1 (Sun and Freund [37]). Suppose that we are given two ellipsoids \(\mathcal{E}_i \subseteq \mathbb{R}^n\), \(i = 1, 2\), defined by \(\mathcal{E}_i := \{x \in \mathbb{R}^n : x^T H_i x + 2 g_i^T x + v_i \le 0\}\), where \(H_i \in \mathcal{S}_+^n\), \(g_i \in \mathbb{R}^n\) and \(v_i \in \mathbb{R}\)

for \(i = 1, 2\). Then \(\mathcal{E}_1\) contains \(\mathcal{E}_2\) if and only if there exists \(\tau \ge 0\) such that the linear matrix inequality
\[
\begin{pmatrix} H_1 & g_1 \\ g_1^T & v_1 \end{pmatrix} \preceq \tau \begin{pmatrix} H_2 & g_2 \\ g_2^T & v_2 \end{pmatrix}
\]
holds.

In view of Fact 1 and the requirement that the ball \(\mathcal{B}\) contain the fixed ellipsoids \(\mathcal{E}_i\) for \(i = 1, 2, \ldots, n_1\), and that \(\tilde{\mathcal{B}}\) contain the realizations of the random ellipsoids \(\tilde{\mathcal{E}}_i(\omega)\) for \(i = 1, 2, \ldots, n_2\), we accordingly add the following constraints:
\[
\begin{pmatrix} I & -\bar x \\ -\bar x^T & \gamma \end{pmatrix} \preceq \tau_i \begin{pmatrix} H_i & g_i \\ g_i^T & v_i \end{pmatrix}, \quad i = 1, 2, \ldots, n_1,
\]
and
\[
\begin{pmatrix} I & -\bar x \\ -\bar x^T & \tilde\gamma \end{pmatrix} \preceq \delta_i \begin{pmatrix} \tilde H_i(\omega) & \tilde g_i(\omega) \\ \tilde g_i(\omega)^T & \tilde v_i(\omega) \end{pmatrix}, \quad i = 1, 2, \ldots, n_2,
\]
or equivalently \(M_i \succeq 0\), \(i = 1, 2, \ldots, n_1\), and \(\tilde M_i(\omega) \succeq 0\), \(i = 1, 2, \ldots, n_2\), where for each \(i = 1, 2, \ldots, n_1\),
\[
M_i := \begin{pmatrix} \tau_i H_i - I & \tau_i g_i + \bar x \\ \tau_i g_i^T + \bar x^T & \tau_i v_i - \gamma \end{pmatrix},
\]
and for each \(i = 1, 2, \ldots, n_2\),
\[
\tilde M_i(\omega) := \begin{pmatrix} \delta_i \tilde H_i(\omega) - I & \delta_i \tilde g_i(\omega) + \bar x \\ \delta_i \tilde g_i(\omega)^T + \bar x^T & \delta_i \tilde v_i(\omega) - \tilde\gamma \end{pmatrix}.
\]
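The containment test of Fact 1 and the sign conventions in the matrices \(M_i\) can be sanity-checked on a small instance. The sketch below is illustrative only: the 2-D axis-aligned ellipse and the two balls are made up, the ball is centered at the origin (\(\bar x = 0\), \(g = 0\)) so that \(M_i\) is diagonal and positive semidefiniteness reduces to simple sign conditions on \(\tau\); the general case requires a genuine semidefiniteness test of the LMI.

```python
import math

def covering_tau_interval(h1, h2, v, gamma):
    """For the axis-aligned ellipse x^T diag(h1, h2) x + v <= 0 and the ball
    x^T x + gamma <= 0 (centered at the origin), the matrix M is diagonal:
        diag(tau*h1 - 1, tau*h2 - 1, tau*v - gamma),
    so M >= 0 reduces to an interval condition on tau."""
    lo = max(1.0 / h1, 1.0 / h2)              # tau * min(h1, h2) >= 1
    hi = gamma / v if v < 0 else math.inf     # tau*v - gamma >= 0 with v < 0
    return lo, hi

# Ellipse x^2/4 + y^2 <= 1 (semi-axes 2 and 1): H = diag(1/4, 1), v = -1.
h1, h2, v = 0.25, 1.0, -1.0

# Ball of radius 3: gamma = -9. Fact 1 predicts containment iff [lo, hi] is nonempty.
lo, hi = covering_tau_interval(h1, h2, v, -9.0)
contained_by_fact1 = lo <= hi                  # interval is [4, 9] here

# Independent geometric check: all ellipse boundary points lie inside the ball.
pts = [(2.0 * math.cos(t), math.sin(t))
       for t in (2.0 * math.pi * k / 360 for k in range(360))]
contained_geometrically = all(px * px + py * py <= 9.0 + 1e-12 for px, py in pts)

# Ball of radius 1.5: gamma = -2.25 gives the empty interval [4, 2.25], and the
# ellipse point (2, 0) indeed sticks out of this smaller ball.
lo2, hi2 = covering_tau_interval(h1, h2, v, -2.25)
print(contained_by_fact1, contained_geometrically, lo2 <= hi2)
```

Both tests agree: the radius-3 ball covers the ellipse (any \(\tau \in [4, 9]\) certifies it), while the radius-1.5 ball does not, matching the empty \(\tau\)-interval.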

Since we are looking to minimize \(d_2\), where \(d_2\) is an upper bound on the square of the radius of the ball \(\mathcal{B}\), we can write the constraint \(d_2 \ge \bar x^T \bar x - \gamma\) as \(d_2 = \bar x^T \bar x - \gamma\). The matrix \(M_i\) can then be written as
\[
M_i = \begin{pmatrix} \tau_i H_i - I & \tau_i g_i + \bar x \\ \tau_i g_i^T + \bar x^T & \tau_i v_i + d_2 - \bar x^T \bar x \end{pmatrix}.
\]
Now, let \(H_i := \Xi_i \Lambda_i \Xi_i^T\) be the spectral decomposition of \(H_i\), where \(\Lambda_i := \operatorname{diag}(\lambda_{i1}; \ldots; \lambda_{in})\), and let \(u_i := \Xi_i^T(\tau_i g_i + \bar x)\). Then, for each \(i = 1, 2, \ldots, n_1\), we have
\[
\bar M_i := \begin{pmatrix} \Xi_i^T & 0 \\ 0^T & 1 \end{pmatrix} M_i \begin{pmatrix} \Xi_i & 0 \\ 0^T & 1 \end{pmatrix} = \begin{pmatrix} \tau_i \Lambda_i - I & u_i \\ u_i^T & \tau_i v_i + d_2 - \bar x^T \bar x \end{pmatrix}.
\]
Consequently, \(M_i \succeq 0\) if and only if \(\bar M_i \succeq 0\), for each \(i = 1, 2, \ldots, n_1\). Our formulation of the problem as an SSOCP now depends on the following lemma (see also [3]):

Lemma 5.2.1. For each \(i = 1, 2, \ldots, n_1\), \(\bar M_i \succeq 0\) if and only if \(\tau_i \lambda_{\min}(H_i) \ge 1\) and \(\bar x^T \bar x \le d_2 + \tau_i v_i - \mathbf{1}^T s_i\), where \(s_i = (s_{i1}; \ldots; s_{in})\), \(s_{ij} = u_{ij}^2/(\tau_i \lambda_{ij} - 1)\) for all \(j\) such that \(\tau_i \lambda_{ij} > 1\), and \(s_{ij} = 0\) otherwise.

Proof. For each \(i = 1, 2, \ldots, n_1\), it is known that \(\bar M_i \succeq 0\) if and only if every principal minor of \(\bar M_i\) is nonnegative. Since
\[
\det(\tau_i \Lambda_i - I) = \prod_{j=1}^{n}(\tau_i \lambda_{ij} - 1),
\]
it follows that \(\bar M_i \succeq 0\) if and only if \(\prod_{j=1}^{s}(\tau_i \lambda_{ij} - 1) \ge 0\) for all \(s = 1, \ldots, n\) and \(\det(\bar M_i) \ge 0\). Thus, \(\bar M_i \succeq 0\) if and only if \(\tau_i \lambda_{ij} \ge 1\) for all \(j = 1, \ldots, n\) and \(\det(\bar M_i) \ge 0\).

Notice that
\[
\det(\bar M_i) = \left(\prod_{j=1}^{n}(\tau_i\lambda_{ij} - 1)\right)(\tau_i v_i + d_2 - \bar x^T \bar x) - \sum_{j=1}^{n} u_{ij}^2 \prod_{l \ne j}(\tau_i\lambda_{il} - 1).
\]
This means that the inequality $\det(\bar M_i) \ge 0$ can hold only if $u_{ij} = 0$ for each $j$ such that $\tau_i\lambda_{ij} = 1$. Hence, $\det(\bar M_i) \ge 0$ if and only if
\[
(\tau_i v_i + d_2 - \bar x^T \bar x) - \mathbf 1^T s_i = (\tau_i v_i + d_2 - \bar x^T \bar x) - \sum_{\tau_i\lambda_{ij} > 1} \frac{u_{ij}^2}{\tau_i\lambda_{ij} - 1} \ge 0.
\]
Therefore, $\bar M_i \succeq 0$ if and only if $\tau_i\lambda_{\min}(H_i) \ge 1$ and $d_2 - \bar x^T \bar x + \tau_i v_i \ge \mathbf 1^T s_i$.

So far, we have shown that each constraint $M_i \succeq 0$ can be replaced by the constraints $\tau_i\lambda_{\min}(H_i) \ge 1$, $\bar x^T \bar x \le \sigma_1$ and $\sigma_1 \le d_2 + \tau_i v_i - \mathbf 1^T s_i$. Similarly, for each $i = 1, 2, \ldots, n_2$, by letting $\tilde H_i = \tilde\Xi_i \tilde\Lambda_i \tilde\Xi_i^T$ (the spectral decomposition of $\tilde H_i$), $\tilde u_i := \tilde\Xi_i^T(\delta_i \tilde g_i + \bar x)$, and $\tilde s_i := (\tilde s_{i1}; \tilde s_{i2}; \ldots; \tilde s_{in})$, where $\tilde s_{ij} := \tilde u_{ij}^2/(\delta_i\tilde\lambda_{ij} - 1)$ for all $j$ such that $\delta_i\tilde\lambda_{ij} > 1$ and $\tilde s_{ij} := 0$ otherwise, we can show that each constraint $\tilde M_i(\omega) \succeq 0$ can be replaced by the constraints $\delta_i\tilde\lambda_{\min}(\tilde H_i(\omega)) \ge 1$, $\bar x^T \bar x \le \sigma_2(\omega)$, and $\sigma_2(\omega) \le d_2 + \delta_i \tilde v_i(\omega) - \mathbf 1^T \tilde s_i(\omega)$.

Since we are minimizing $d_1$ and $d_2$, for all $j = 1, 2, \ldots, n$ we can relax the definitions of $s_{ij}$ and $\tilde s_{ij}$ by replacing them with
\[
u_{ij}^2 \le s_{ij}(\tau_i\lambda_{ij} - 1) \quad \text{for all } i = 1, 2, \ldots, n_1,
\]
and
\[
\tilde u_{ij}^2 \le \tilde s_{ij}(\delta_i\tilde\lambda_{ij} - 1) \quad \text{for all } i = 1, 2, \ldots, n_2,
\]
respectively. When the realizations of the random ellipsoids become available, if necessary, we determine $\tilde\gamma$ so that the new ball $\tilde B := \{x \in \mathbb R^n : x^T x - 2\bar x^T x + \tilde\gamma \le 0\}$ contains all the realizations of the random ellipsoids. This new ball $\tilde B$ has the same center $\bar x$ as $B$ but a larger radius, $\tilde r := \sqrt{\bar x^T \bar x - \tilde\gamma}$. We note that
\[
\tilde r^2 - r^2 = (\bar x^T \bar x - \tilde\gamma) - (\bar x^T \bar x - \gamma) = \gamma - \tilde\gamma,
\]

and thus we introduce the constraint $0 \le \gamma - \tilde\gamma \le z$, where $z$ is an upper bound on $\tilde r^2 - r^2$.

Let $\bar c > 0$ denote the cost per unit of the Euclidean distance between the center of the ball $B$ and the origin; let $\alpha > 0$ be the cost per unit of the square of the radius of $B$; and let $\beta > 0$ be the cost per unit increase of the square of the radius if such an increase becomes necessary after the realizations of the random ellipsoids are available. We now define the decision variables $x := (d_1; d_2; \bar x; \gamma; \tau)$ and $y := (z; \tilde\gamma; \delta)$. Then, by introducing the unit cost vectors $c := (\bar c; \alpha; 0; 0; 0)$ and $q := (\beta; 0; 0)$, and combining all of the above, we get the following SQRCP model:
\[
\begin{array}{ll}
\min & c^T x + \mathrm E[Q(x, \omega)] \\
\text{s.t.} & u_i = \Xi_i^T(\tau_i g_i + \bar x), \quad i = 1, 2, \ldots, n_1 \\
& u_{ij}^2 \le s_{ij}(\tau_i\lambda_{ij} - 1), \quad i = 1, 2, \ldots, n_1, \; j = 1, 2, \ldots, n \\
& \bar x^T \bar x \le \sigma_1 \\
& \sigma_1 \le d_2 + \tau_i v_i - \mathbf 1^T s_i, \quad i = 1, 2, \ldots, n_1 \\
& \tau_i \ge 1/\lambda_{\min}(H_i), \quad i = 1, 2, \ldots, n_1 \\
& \bar x^T \bar x \le d_1^2,
\end{array}
\]
where $Q(x, \omega)$ is the minimum value of the problem

\[
\begin{array}{ll}
\min & q^T y \\
\text{s.t.} & \tilde u_i(\omega) = \tilde\Xi_i(\omega)^T(\delta_i \tilde g_i(\omega) + \bar x), \quad i = 1, 2, \ldots, n_2 \\
& \tilde u_{ij}(\omega)^2 \le \tilde s_{ij}(\omega)(\delta_i\tilde\lambda_{ij}(\omega) - 1), \quad i = 1, 2, \ldots, n_2, \; j = 1, 2, \ldots, n \\
& \bar x^T \bar x \le \sigma_2(\omega) \\
& \sigma_2(\omega) \le d_2 + \delta_i \tilde v_i(\omega) - \mathbf 1^T \tilde s_i(\omega), \quad i = 1, 2, \ldots, n_2 \\
& \delta_i \ge 1/\tilde\lambda_{\min}(\tilde H_i(\omega)), \quad i = 1, 2, \ldots, n_2 \\
& 0 \le \gamma - \tilde\gamma \le z,
\end{array}
\]
and $\mathrm E[Q(x, \omega)] := \int_\Omega Q(x, \omega)\, P(d\omega)$.

5.2.2 Structural optimization

Ben-Tal and Bendsøe in [1] and Nemirovski [13] consider the following problem from structural optimization. A structure of $k$ linear elastic bars connects a set of $p$ nodes. They assume the geometry (topology and lengths of bars) and the material are fixed. The goal is to size the bars, i.e., to determine appropriate cross-sectional areas of the bars. For $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, p$, we define the following decision variables and parameters:

$f_j$ := the external force applied on the $j$th node,
$d_j$ := the (small) displacement of the $j$th node resulting from the load force $f_j$,
$x_i$ := the cross-sectional area of the $i$th bar,
$\underline x_i$ := the lower bound on the cross-sectional area of the $i$th bar,
$\bar x_i$ := the upper bound on the cross-sectional area of the $i$th bar,
$l_i$ := the length of the $i$th bar,

$v$ := the maximum allowed volume of the bars of the structure,
$G(x) := \sum_{i=1}^k x_i G_i$ := the stiffness matrix, where the matrices $G_i \in \mathcal S^p$, $i = 1, 2, \ldots, k$, depend only on fixed parameters (such as the lengths of the bars and the material).

In the simplest version of the problem they consider one fixed set of externally applied nodal forces $f_j$, $j = 1, 2, \ldots, p$. Given this, the elastic stored energy within the structure is given by $\varepsilon = f^T d$, which is a measure of the inverse of the stiffness of the structure. In view of the definition of the stiffness matrix $G(x)$, we also have the following linear relationship between $f$ and $d$:
\[
f = G(x)\, d.
\]
The objective is to find the stiffest truss by minimizing $\varepsilon$ subject to the inequality $l^T x \le v$ as a constraint on the total volume (or, equivalently, weight), and the constraint $\underline x \le x \le \bar x$ as upper and lower bounds on the cross-sectional areas. For simplicity, we assume that $\underline x > 0$ and $G(x) \succ 0$ for all $x > 0$. In this case we can express the elastic stored energy in terms of the inverse of the stiffness matrix and the externally applied nodal forces as $\varepsilon = f^T G(x)^{-1} f$. In summary, they consider the problem
\[
\begin{array}{ll}
\min & f^T G(x)^{-1} f \\
\text{s.t.} & \underline x \le x \le \bar x \\
& l^T x \le v,
\end{array}
\]

which is equivalent to
\[
\begin{array}{ll}
\min & s \\
\text{s.t.} & f^T G(x)^{-1} f \le s \\
& \underline x \le x \le \bar x \\
& l^T x \le v.
\end{array} \tag{5.2.1}
\]
The first inequality constraint in (5.2.1) is just a fractional quadratic inequality constraint, and it can be formulated as a hyperbolic inequality. In [1] (see also [3]), the authors demonstrate that this inequality is satisfied if and only if there exist $u_i \in \mathbb R^{r_i}$ and $t_i \in \mathbb R$ with $t_i > 0$, $i = 1, 2, \ldots, k$, such that
\[
\sum_{i=1}^k D_i^T u_i = f, \qquad u_i^T u_i \le x_i t_i \ \text{ for } i = 1, 2, \ldots, k, \qquad \text{and} \qquad \mathbf 1^T t \le s,
\]
where $r_i = \mathrm{rank}(G_i)$ and the matrix $D_i \in \mathbb R^{r_i \times p}$ is the Cholesky factor of $G_i$ (i.e., $D_i^T D_i = G_i$) for each $i = 1, 2, \ldots, k$. Using this result, problem (5.2.1) becomes the problem
\[
\begin{array}{ll}
\min & s \\
\text{s.t.} & \sum_{i=1}^k D_i^T u_i = f \\
& u_i^T u_i \le x_i t_i, \quad i = 1, 2, \ldots, k \\
& \mathbf 1^T t \le s \\
& \underline x \le x \le \bar x \\
& l^T x \le v,
\end{array} \tag{5.2.2}
\]
which includes only linear and hyperbolic constraints.

The model (5.2.2) rests on the simple assumption that the external forces applied to the nodes are fixed. As more complicated versions, they also consider multiple loading scenarios. In [49], Ariyawansa and Zhu consider the case in which the external forces applied to the nodes are random variables with known distribution functions, and they formulate an SSDP model for this problem. In this subsection, we formulate an

SRQCP model for this problem under the assumption that some of the external forces applied to the nodes are fixed and the rest are random variables with known distribution functions. Let us denote the external force applied on the $j$th node by $\bar f_j$ if it is fixed and by $\tilde f_j(\omega)$ if it is random. Due to changes in the environmental conditions (such as wind speed and temperature), allowing some of the external forces to be random brings the model much closer to reality. Without loss of generality, let us assume that $f(\omega) = (\bar f; \tilde f(\omega))$, where $\tilde f(\omega)$ depends on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$. Accordingly, the displacements of the nodes resulting from the random forces and the elastic stored energy within the structure are also random. Then we have the following relations:
\[
\bar\varepsilon = \bar f^T \bar d, \qquad \bar f = G(x)\,\bar d, \qquad \tilde\varepsilon(\omega) = \tilde f(\omega)^T \tilde d(\omega), \qquad \tilde f(\omega) = G(x)\,\tilde d(\omega).
\]
Suppose we design a structure for a customer. The structure will be installed in an open environment. From past experience, the customer can provide us with sufficient information to model the external forces that will be applied to the nodes of the structure as random variables with known distribution functions. Given this information, we can formulate a model of this problem that guarantees that our structure will continue to function and stand up against the worst environmental conditions.

In summary, we solve the following SRQCP problem:
\[
\begin{array}{ll}
\min & \bar s + \mathrm E[Q(x, \omega)] \\
\text{s.t.} & \sum_{i=1}^k D_i^T \bar u_i = \bar f \\
& \bar u_i^T \bar u_i \le x_i \bar t_i, \quad i = 1, 2, \ldots, k \\
& \mathbf 1^T \bar t \le \bar s \\
& \underline x \le x \le \bar x \\
& l^T x \le v,
\end{array}
\]
where $Q(x, \omega)$ is the minimum value of the problem
\[
\begin{array}{ll}
\min & \tilde s \\
\text{s.t.} & \sum_{i=1}^k D_i^T \tilde u_i(\omega) = \tilde f(\omega) \\
& \tilde u_i(\omega)^T \tilde u_i(\omega) \le x_i \tilde t_i, \quad i = 1, 2, \ldots, k \\
& \mathbf 1^T \tilde t \le \tilde s,
\end{array}
\]
and $\mathrm E[Q(x, \omega)] := \int_\Omega Q(x, \omega)\, P(d\omega)$.
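The hyperbolic reformulation above hinges on the factors $D_i$ with $D_i^T D_i = G_i$. A quick numeric check (illustrative data) confirms that the choice $u_i = x_i D_i G(x)^{-1} f$ is feasible for the equality constraint and makes $\sum_i u_i^T u_i / x_i$ equal the elastic energy $f^T G(x)^{-1} f$, so the reformulation loses nothing:

```python
import numpy as np

# Numeric check (illustrative data) that the hyperbolic reformulation is
# tight: with u_i := x_i * D_i G(x)^{-1} f we recover sum_i D_i^T u_i = f and
# sum_i u_i^T u_i / x_i = f^T G(x)^{-1} f, the elastic stored energy.
rng = np.random.default_rng(1)
k, p = 4, 3                                       # k bars, p nodes
D = [rng.normal(size=(p, p)) for _ in range(k)]   # factors with D_i^T D_i = G_i
G = [Di.T @ Di for Di in D]
x = rng.uniform(0.5, 2.0, size=k)                 # cross-sectional areas > 0
f = rng.normal(size=p)                            # nodal forces

Gx = sum(xi * Gi for xi, Gi in zip(x, G))         # stiffness matrix G(x)
w = np.linalg.solve(Gx, f)                        # w = G(x)^{-1} f
u = [xi * Di @ w for xi, Di in zip(x, D)]

force_balance = np.allclose(sum(Di.T @ ui for Di, ui in zip(D, u)), f)
energy_match = np.isclose(sum(ui @ ui / xi for ui, xi in zip(u, x)), f @ w)
print(force_balance, energy_match)
```

Here $\sum_i D_i^T u_i = \sum_i x_i G_i G(x)^{-1} f = f$ by construction, and $\sum_i u_i^T u_i / x_i = f^T G(x)^{-1} f$, which is why the epigraph variable $s$ can track the energy exactly.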

Chapter 6

Related Open Problems: Multi-Order Cone Programming Problems

In this chapter we introduce a new class of convex optimization problems that can be viewed as an extension of second-order cone programs. We present primal and dual forms of multi-order cone programs (MOCPs), in which we minimize a linear function over a Cartesian product of $p$th-order cones (we allow different values of $p$ for different cones in the product). We then indicate weak and strong duality relations for the problem. We also introduce mixed integer multi-order cone programs (MIMOCPs) to handle MOCPs with integer-valued variables, and two-stage stochastic multi-order cone programs (SMOCPs) with recourse to handle uncertainty in the data defining (deterministic) MOCPs. We demonstrate how decision making problems associated with facility location problems lead to MOCP, MIMOCP, and SMOCP models. It would be interesting to investigate other application settings leading to (deterministic, stochastic and mixed integer) MOCPs. The development of algorithms for such multi-order cone programs, which in turn would benefit from a duality theory, is equally interesting and remains one of the important open problems for future

research. We begin by introducing some notation used in the sequel. Given $p \ge 1$, the $p$th-order cone of dimension $n$ is defined as
\[
\mathcal Q_p^n := \{x = (x_0; \bar x) \in \mathbb R \times \mathbb R^{n-1} : x_0 \ge \|\bar x\|_p\},
\]
where $\|\cdot\|_p$ denotes the $p$-norm. The cone $\mathcal Q_p^n$ is regular (see, for example, [45]). As special cases, when $p = 2$ we obtain $\mathcal Q_2^n = \mathcal E_+^n$, the second-order cone of dimension $n$, and when $p = 1$ or $\infty$, $\mathcal Q_p^n$ is a polyhedral cone. We write $x \succeq_p^n 0$ to mean that $x \in \mathcal Q_p^n$. Given $1 \le p_i \le \infty$ for $i = 1, 2, \ldots, r$, we write $x \succeq_{p_1, p_2, \ldots, p_r}^{n_1, n_2, \ldots, n_r} 0$ to mean that $x \in \mathcal Q_{p_1}^{n_1} \times \mathcal Q_{p_2}^{n_2} \times \cdots \times \mathcal Q_{p_r}^{n_r}$. It is immediately seen that, for every vector $x \in \mathbb R^n$ with $n = \sum_{i=1}^r n_i$, we have $x \succeq_{p_1, p_2, \ldots, p_r}^{n_1, n_2, \ldots, n_r} 0$ if and only if $x$ is partitioned conformally as $x = (x_1; x_2; \ldots; x_r)$ and $x_i \succeq_{p_i}^{n_i} 0$ for $i = 1, 2, \ldots, r$. For simplicity, we write:

$\mathcal Q_p^n$ as $\mathcal Q_p$ and $x \succeq_p^n 0$ as $x \succeq_p 0$ when $n$ is known from the context;

$\mathcal Q_{p_1}^{n_1} \times \mathcal Q_{p_2}^{n_2} \times \cdots \times \mathcal Q_{p_r}^{n_r}$ as $\mathcal Q_{p_1, p_2, \ldots, p_r}$ and $x \succeq_{p_1, p_2, \ldots, p_r}^{n_1, n_2, \ldots, n_r} 0$ as $x \succeq_{p_1, p_2, \ldots, p_r} 0$ when $n_1, n_2, \ldots, n_r$ are known from the context;

$x \succeq_{\underbrace{p, p, \ldots, p}_{r \text{ times}}} 0$ as $x \succeq_p^r 0$.

The set of all interior points of $\mathcal Q_p^n$ is denoted by
\[
\mathrm{int}(\mathcal Q_p^n) := \{x = (x_0; \bar x) \in \mathbb R \times \mathbb R^{n-1} : x_0 > \|\bar x\|_p\}.
\]
We write $x \succ_{p_1, p_2, \ldots, p_r} 0$ to mean that $x \in \mathrm{int}(\mathcal Q_{p_1, p_2, \ldots, p_r}) := \mathrm{int}(\mathcal Q_{p_1}) \times \mathrm{int}(\mathcal Q_{p_2}) \times \cdots \times \mathrm{int}(\mathcal Q_{p_r})$.

6.1 Multi-order cone programming problems

Let $r \ge 1$ be an integer, and let $p_1, p_2, \ldots, p_r$ be such that $1 \le p_i \le \infty$ for $i = 1, 2, \ldots, r$. Let $m, n, n_1, n_2, \ldots, n_r$ be positive integers such that $n = \sum_{i=1}^r n_i$. Then we define an

MOCP in primal standard form as
\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & A x = b \qquad\qquad (P) \\
& x \succeq_{p_1, p_2, \ldots, p_r} 0,
\end{array}
\]
where $A \in \mathbb R^{m \times n}$, $b \in \mathbb R^m$ and $c \in \mathbb R^n$ constitute the given data, and $x \in \mathbb R^n$ is the primal decision variable. We define an MOCP in dual standard form as
\[
\begin{array}{ll}
\max & b^T y \\
\text{s.t.} & A^T y + z = c \qquad (D) \\
& z \succeq_{q_1, q_2, \ldots, q_r} 0,
\end{array}
\]
where $y \in \mathbb R^m$ and $z \in \mathbb R^n$ are the dual decision variables, and $q_1, q_2, \ldots, q_r$ are such that $1 \le q_i \le \infty$ for $i = 1, 2, \ldots, r$. If (P) and (D) are defined by the same data, and $q_i$ is conjugate to $p_i$ in the sense that $1/p_i + 1/q_i = 1$ for $i = 1, 2, \ldots, r$, then we can prove relations between (P) and (D) (see Section 6.2) that justify referring to (D) as the dual of (P) and vice versa.

A $p$th-order cone programming (POCP) problem in primal standard form is
\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & A x = b \\
& x \succeq_p^r 0,
\end{array} \tag{6.1.1}
\]
where $m, n, n_1, n_2, \ldots, n_r$ are positive integers such that $n = \sum_{i=1}^r n_i$, $p \in [1, \infty]$, $A \in \mathbb R^{m \times n}$, $b \in \mathbb R^m$ and $c \in \mathbb R^n$ constitute the given data, and $x \in \mathbb R^n$ is the primal variable. According to (D), the dual problem associated with POCP (6.1.1) is

\[
\begin{array}{ll}
\max & b^T y \\
\text{s.t.} & A^T y + z = c \\
& z \succeq_q^r 0,
\end{array} \tag{6.1.2}
\]
where $y \in \mathbb R^m$ and $z \in \mathbb R^n$ are the dual variables and $q$ is conjugate to $p$. Clearly, second-order cone programs are the special case of POCPs that occurs when $p = 2$ in (6.1.1) (hence $q = 2$ in (6.1.2)).

Example 1. Norm minimization problems: In [1], Alizadeh and Goldfarb presented second-order cone programming formulations of three norm minimization problems in which the norm is the Euclidean norm. Here we indicate extensions of these three problems that use arbitrary $p$-norms, leading to MOCPs. Let $v_i = A_i x + b_i \in \mathbb R^{n_i - 1}$ for $i = 1, 2, \ldots, r$. The following norm minimization problems can be cast as MOCPs:

1. Minimization of the sum of norms: The problem $\min \sum_{i=1}^r \|v_i\|_{p_i}$ can be formulated as
\[
\begin{array}{ll}
\min & \sum_{i=1}^r t_i \\
\text{s.t.} & A_i x + b_i = v_i, \quad i = 1, 2, \ldots, r \\
& (t_1; v_1; t_2; v_2; \ldots; t_r; v_r) \succeq_{p_1, p_2, \ldots, p_r} 0.
\end{array}
\]

2. Minimization of the maximum of norms: The problem $\min \max_{1 \le i \le r} \|v_i\|_{p_i}$ can be expressed as the MOCP problem
\[
\begin{array}{ll}
\min & t \\
\text{s.t.} & A_i x + b_i = v_i, \quad i = 1, 2, \ldots, r \\
& (t; v_1; t; v_2; \ldots; t; v_r) \succeq_{p_1, p_2, \ldots, p_r} 0.
\end{array}
\]

3. Minimization of the sum of the $k$ largest norms: More generally, the problem of minimizing the sum of the $k$ largest norms can also be cast as an MOCP. Let the

norms $\|v_{[1]}\|_{p_{[1]}}, \|v_{[2]}\|_{p_{[2]}}, \ldots, \|v_{[r]}\|_{p_{[r]}}$ be the norms $\|v_1\|_{p_1}, \|v_2\|_{p_2}, \ldots, \|v_r\|_{p_r}$ sorted in nonincreasing order. Then the problem $\min \sum_{i=1}^k \|v_{[i]}\|_{p_{[i]}}$ can be formulated (see also [1] and [3] and the related references contained therein) as the MOCP problem
\[
\begin{array}{ll}
\min & \sum_{i=1}^r s_i + kt \\
\text{s.t.} & A_i x + b_i = v_i, \quad i = 1, 2, \ldots, r \\
& (s_1 + t; v_1; s_2 + t; v_2; \ldots; s_r + t; v_r) \succeq_{p_1, p_2, \ldots, p_r} 0 \\
& s_i \ge 0, \quad i = 1, 2, \ldots, r.
\end{array}
\]

6.2 Duality

Since MOCPs are a class of convex optimization problems, we can develop a duality theory for them. Here we indicate weak and strong duality for the pair (P, D) as justification for referring to them as a primal-dual pair. It was shown by Faraut and Korányi in [18, Chapter I] that the second-order cone is self-dual. We now prove the more general result that the dual of the $p$th-order cone of dimension $n$ is the $q$th-order cone of dimension $n$, where $q$ is the conjugate of $p$.

Lemma 6.2.1. $\mathcal Q_p^* = \mathcal Q_q$, where $1 \le p \le \infty$ and $q$ is the conjugate of $p$. More generally, $\mathcal Q_{p_1, p_2, \ldots, p_r}^* = \mathcal Q_{q_1, q_2, \ldots, q_r}$, where $1 \le p_i \le \infty$ and $q_i$ is the conjugate of $p_i$ for $i = 1, 2, \ldots, r$.

Proof. The proof of the second part trivially follows from the first part and the fact that $(K_1 \times K_2 \times \cdots \times K_r)^* = K_1^* \times K_2^* \times \cdots \times K_r^*$. To prove the first part, we first prove that $\mathcal Q_q \subseteq \mathcal Q_p^*$. Let $x = (x_0; \bar x) \in \mathcal Q_q$; we show that $x \in \mathcal Q_p^*$ by verifying that $x^T y \ge 0$ for any $y \in \mathcal Q_p$. So let $y = (y_0; \bar y) \in \mathcal Q_p$. Then
\[
x^T y = x_0 y_0 + \bar x^T \bar y \ge \|\bar x\|_q \|\bar y\|_p + \bar x^T \bar y \ge -\bar x^T \bar y + \bar x^T \bar y \ge 0,
\]
where the first inequality follows from the fact that $x \in \mathcal Q_q$ and $y \in \mathcal Q_p$, and the second one from Hölder's inequality. Now we show $\mathcal Q_p^* \subseteq \mathcal Q_q$. Let $y = (y_0; \bar y) \in \mathcal Q_p^*$; we show that $y \in \mathcal Q_q$ by verifying that $y_0 \ge \|\bar y\|_q$. This is trivial if

$\bar y = 0$ or $p = \infty$. If $\bar y \ne 0$ and $1 \le p < \infty$, let $u := (|\bar y_1|^{q/p}\operatorname{sgn}(\bar y_1); |\bar y_2|^{q/p}\operatorname{sgn}(\bar y_2); \ldots; |\bar y_{n-1}|^{q/p}\operatorname{sgn}(\bar y_{n-1}))$ and consider $x := (\|u\|_p; -u) \in \mathcal Q_p$. Then, by using Hölder's inequality, where the equality is attained, we obtain
\[
0 \le x^T y = \|u\|_p\, y_0 - u^T \bar y = \|u\|_p\, y_0 - \|u\|_p \|\bar y\|_q = \|u\|_p (y_0 - \|\bar y\|_q).
\]
This gives that $y_0 \ge \|\bar y\|_q$.

From this lemma we deduce that the $p$th-order cone is reflexive, i.e., $\mathcal Q_p^{**} = \mathcal Q_p$, and, more generally, that $\mathcal Q_{p_1, p_2, \ldots, p_r}$ is also reflexive. On the basis of this fact, it is natural to infer that the dual of the dual is the primal.

In view of the above lemma, problem (D) can be derived from (P) through the usual Lagrangian approach. The Lagrangian function for (P) is
\[
L(x, \lambda, \nu) = c^T x - \lambda^T(A x - b) - \nu^T x.
\]
The dual objective is
\[
q(\lambda, \nu) := \inf_x L(x, \lambda, \nu) = \inf_x\, (c - A^T\lambda - \nu)^T x + \lambda^T b.
\]
In fact, we may refer to the constraint $x \succeq_{p_1, p_2, \ldots, p_r} 0$ as the nonnegativity of $x$, but with respect to the multi-order cone $\mathcal Q_{p_1, p_2, \ldots, p_r}$. Note that the Lagrange multiplier $\nu$ corresponding to the inequality constraint $x \succeq_{p_1, p_2, \ldots, p_r} 0$ is restricted to be nonnegative with respect to the dual of $\mathcal Q_{p_1, p_2, \ldots, p_r}$ (i.e., $\nu \succeq_{q_1, q_2, \ldots, q_r} 0$), whereas the Lagrange multiplier $\lambda$ corresponding to the equality constraint $Ax - b = 0$ is unrestricted. Hence the dual problem is obtained by maximizing $q(\lambda, \nu)$ subject to $\nu \succeq_{q_1, q_2, \ldots, q_r} 0$. If $c - A^T\lambda - \nu \ne 0$, the infimum is clearly $-\infty$, so we can exclude the multipliers for which $c - A^T\lambda - \nu \ne 0$. When $c - A^T\lambda - \nu = 0$, the dual objective function is simply $\lambda^T b$.
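The lemma lends itself to two quick numeric spot checks (illustrative sampling, not a proof): sampled points of $\mathcal Q_q$ pair nonnegatively with sampled points of $\mathcal Q_p$, and the vector $u$ constructed in the proof attains equality in Hölder's inequality.

```python
import numpy as np

# (i) sampled points of Q_q pair nonnegatively with sampled points of Q_p;
# (ii) u_j = |y_j|^(q/p) sgn(y_j), as in the proof, attains Hölder equality.
rng = np.random.default_rng(2)
p = 3.0
q = p / (p - 1.0)                     # conjugate exponent: 1/p + 1/q = 1

def sample(exponent):
    xbar = rng.normal(size=4)
    x0 = np.linalg.norm(xbar, ord=exponent) + rng.uniform(0.0, 1.0)
    return np.concatenate(([x0], xbar))   # a point strictly inside the cone

pairing_ok = all(sample(q) @ sample(p) >= -1e-12 for _ in range(1000))

ybar = rng.normal(size=4)
u = np.abs(ybar) ** (q / p) * np.sign(ybar)
equality_ok = np.isclose(u @ ybar,
                         np.linalg.norm(u, ord=p) * np.linalg.norm(ybar, ord=q))
print(pairing_ok, equality_ok)
```

The equality check is exactly why the construction in the proof forces $y_0 \ge \|\bar y\|_q$: any slack in Hölder's inequality would weaken the conclusion.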

\[
\begin{array}{l|l|l|l}
 & \text{Primal (Minimum)} & \text{Dual (Maximum)} & \\
\hline
\text{Constraints} & \text{vector: } \succeq_{p_1, p_2, \ldots, p_r} & \text{vector: } \succeq_{q_1, q_2, \ldots, q_r} & \text{Variables} \\
 & \text{vector: } \preceq_{p_1, p_2, \ldots, p_r} & \text{vector: } \preceq_{q_1, q_2, \ldots, q_r} & \\
 & \text{vector or scalar: } \ge & \text{vector or scalar: } \ge 0 & \\
 & \text{vector or scalar: } \le & \text{vector or scalar: } \le 0 & \\
 & \text{vector or scalar: } = & \text{vector or scalar: free} & \\
\hline
\text{Variables} & \text{vector: } \succeq_{p_1, p_2, \ldots, p_r} & \text{vector: } \preceq_{q_1, q_2, \ldots, q_r} & \text{Constraints} \\
 & \text{vector: } \preceq_{p_1, p_2, \ldots, p_r} & \text{vector: } \succeq_{q_1, q_2, \ldots, q_r} & \\
 & \text{vector or scalar: } \ge 0 & \text{vector or scalar: } \le & \\
 & \text{vector or scalar: } \le 0 & \text{vector or scalar: } \ge & \\
 & \text{vector or scalar: free} & \text{vector or scalar: } = & \\
\end{array}
\]
Table 6.1: Correspondence rules between primal and dual MOCPs.

Hence, we can write the dual problem as follows:
\[
\begin{array}{ll}
\max & b^T \lambda \\
\text{s.t.} & A^T \lambda + \nu = c \\
& \nu \succeq_{q_1, q_2, \ldots, q_r} 0.
\end{array} \tag{6.2.1}
\]
Replacing $\lambda$ by $y$ and $\nu$ by $z$ in (6.2.1), we get (D). In general, MOCPs can be written in a variety of forms different from the standard form (P, D). The situation for MOCPs is similar to that for linear programs: any MOCP problem can be written in the standard form. However, if we consider MOCPs in other forms, then it is more convenient to apply the duality rules directly. Table 6.1 is a summary of these rules. This table is a generalization of a similar table in [14, Section 4.2]. For instance, using this table, the dual of (P) is the problem
\[
\begin{array}{ll}
\max & b^T \lambda \\
\text{s.t.} & A^T \lambda \preceq_{q_1, q_2, \ldots, q_r} c,
\end{array}
\]
which is equivalent to problem (6.2.1), where $A^T\lambda \preceq_{q_1, q_2, \ldots, q_r} c$ means that $c - A^T\lambda \succeq_{q_1, q_2, \ldots, q_r} 0$.
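Every conic inequality in the table reduces to a membership test of the form $x_0 \ge \|\bar x\|_p$ on each block. A direct transcription of the definition of $\mathcal Q_p^n$ makes this concrete (illustrative values):

```python
import numpy as np

# Membership test for the p-th order cone Q_p^n = {(x0; xbar): x0 >= ||xbar||_p},
# transcribed directly from the definition.
def in_pth_order_cone(x, p):
    x0, xbar = x[0], np.asarray(x[1:], dtype=float)
    return bool(x0 >= np.linalg.norm(xbar, ord=p))

a = in_pth_order_cone([5.0, 3.0, 4.0], 2)       # 5 >= ||(3,4)||_2 = 5
b = in_pth_order_cone([5.0, 3.0, 4.0], 1)       # 5 >= |3| + |4| = 7 fails
c = in_pth_order_cone([5.0, 3.0, 4.0], np.inf)  # 5 >= max(3,4) = 4
print(a, b, c)
```

The same vector can thus lie in $\mathcal Q_2$ and $\mathcal Q_\infty$ but not in $\mathcal Q_1$, which is why the table must track the exponent of each block separately.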

Using Lemma 6.2.1, we can prove the following weak duality property.

Theorem 6.2.1. (Weak duality) If $x$ is any primal feasible solution of (P) and $(y, z)$ is any feasible solution of (D), then the duality gap $c^T x - b^T y = x^T z \ge 0$.

Proof. Note that
\[
c^T x - b^T y = (A^T y + z)^T x - b^T y = y^T A x + z^T x - y^T b = y^T(Ax - b) + z^T x = x^T z.
\]
Since $x \in \mathcal Q_{p_1, p_2, \ldots, p_r}$ and $z \in \mathcal Q_{q_1, q_2, \ldots, q_r} = \mathcal Q_{p_1, p_2, \ldots, p_r}^*$, we conclude that $x^T z \ge 0$.

We now give conditions for strong duality to hold. We say that problem (P) is strictly feasible if there exists a primal feasible point $\hat x$ such that $\hat x \succ_{p_1, p_2, \ldots, p_r} 0$. In the remaining part of this section, we assume that the $m$ rows of the matrix $A$ are linearly independent. Using the Karush-Kuhn-Tucker (KKT) conditions, we state and prove the following strong duality result.

Theorem 6.2.2. (Strong duality I) Consider the primal-dual pair (P, D). If (P) is strictly feasible and solvable with a solution $x^*$, then (D) is solvable and the optimal values of (P) and (D) are equal.

Proof. By the assumptions of the theorem, $x^*$ is an optimal solution of (P) at which we can apply the KKT conditions. This implies that there are Lagrange multiplier vectors $\lambda^*$ and $\nu^*$ such that $(x^*, \lambda^*, \nu^*)$ satisfies
\[
A x^* = b, \quad x^* \succeq_{p_1, p_2, \ldots, p_r} 0, \quad A^T \lambda^* + \nu^* = c, \quad \nu^* \succeq_{q_1, q_2, \ldots, q_r} 0, \quad {x_i^*}^T \nu_i^* = 0 \ \text{ for } i = 1, 2, \ldots, r.
\]
This implies that $(\lambda^*, \nu^*)$ is feasible for the dual problem (D). Let $(y, z)$ be any feasible solution of (D); then we have
\[
b^T y \le c^T x^* = {x^*}^T \nu^* + b^T \lambda^* = b^T \lambda^*,
\]
where we used weak duality to obtain the inequality and complementary slackness to obtain the last equality. Thus, $(\lambda^*, \nu^*)$ is an optimal solution of (D) and $c^T x^* = b^T \lambda^*$, as desired.
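Weak duality can be illustrated numerically: building a primal-feasible $x$ and a dual-feasible $(y, z)$ from shared data (second-order cone blocks, illustrative sizes), the gap $c^T x - b^T y$ equals $x^T z$ and is nonnegative, exactly as in the proof above.

```python
import numpy as np

# Numeric illustration of weak duality for a single second-order cone block.
rng = np.random.default_rng(3)
m, n = 2, 5
A = rng.normal(size=(m, n))

xbar = rng.normal(size=n - 1)                 # x strictly inside Q_2^n
x = np.concatenate(([np.linalg.norm(xbar) + 1.0], xbar))
b = A @ x                                     # choose b so that Ax = b holds

y = rng.normal(size=m)
zbar = rng.normal(size=n - 1)                 # z in the dual cone (Q_2 again)
z = np.concatenate(([np.linalg.norm(zbar) + 1.0], zbar))
c = A.T @ y + z                               # choose c so that A^T y + z = c

gap = c @ x - b @ y
print(bool(np.isclose(gap, x @ z)), bool(gap >= 0.0))
```

The identity $c^T x - b^T y = x^T z$ holds by construction for any such pair; nonnegativity follows because both points lie in mutually dual cones.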

Note that the result in Theorem 6.2.1 is symmetric between (P) and (D). The following strong duality result can also be obtained by applying the duality relations of [30, Theorem 4.2.1] to our problem formulation.

Theorem 6.2.3. (Strong duality II) Consider the primal-dual pair (P, D). If both (P) and (D) have strictly feasible solutions, then they both have optimal solutions $x^*$ and $(y^*, z^*)$, respectively, and $p^* := c^T x^* = d^* := b^T y^*$ (i.e., ${x^*}^T z^* = 0$ (complementary slackness)).

From the above results, we get the following corollary.

Corollary. (Optimality conditions) Assume that both (P) and (D) are strictly feasible. Then $(x, y, z) \in \mathbb R^{n+m+n}$ is a pair of optimal solutions if and only if
\[
A x = b, \quad x \succeq_{p_1, p_2, \ldots, p_r} 0, \quad A^T y + z = c, \quad z \succeq_{q_1, q_2, \ldots, q_r} 0, \quad x_i^T z_i = 0 \ \text{ for } i = 1, 2, \ldots, r.
\]

6.3 Multi-order cone programming problems over integers

In this section we introduce two important related problems that result when decision variables in an MOCP can take only integer values. Consider the MOCP problem (P). If we add the requirement that a subset of the variables attain 0-1 values, then we are interested in an optimization problem of the form

\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & Ax = b \\
& x \succeq_{p_1, p_2, \ldots, p_r} 0 \\
& x_k \in \{0, 1\}, \quad k \in \Gamma,
\end{array}
\]
where $\Gamma \subseteq \{1, 2, \ldots, n\}$ indexes the components of the decision variable $x \in \mathbb R^n$ that are restricted to 0-1 values. This class of optimization problems may be termed 0-1 multi-order cone programs (0-1MOCPs).

A more general and interesting problem arises when some variables in an MOCP can take only integer values. If we are given the same data $A$, $b$, and $c$ as in (P), then we are interested in the problem of the form
\[
\begin{array}{ll}
\min & c^T x \\
\text{s.t.} & Ax = b \\
& x \succeq_{p_1, p_2, \ldots, p_r} 0 \\
& x_k \in [\alpha_k, \beta_k] \cap \mathbb Z, \quad k \in \Gamma,
\end{array} \tag{6.3.1}
\]
where $\Gamma \subseteq \{1, 2, \ldots, n\}$, and the decision variable $x \in \mathbb R^n$ has some of its components $x_k$ ($k \in \Gamma$) restricted to integer values bounded by $\alpha_k, \beta_k \in \mathbb R$. This class of optimization problems may be termed mixed integer multi-order cone programs (MIMOCPs).

6.4 Multi-order cone programming problems under uncertainty

In this section we define two-stage stochastic multi-order cone programs (SMOCPs) with recourse to handle uncertainty in the data defining (deterministic) MOCPs. Let $r_1, r_2 \ge 1$ be integers. For $i = 1, 2, \ldots, r_1$ and $j = 1, 2, \ldots, r_2$, let $p_{1i}, p_{2j} \in [1, \infty]$ and let $m_1, m_2, n_1, n_2, n_{1i},$

$n_{2j}$ be positive integers such that $n_1 = \sum_{i=1}^{r_1} n_{1i}$ and $n_2 = \sum_{j=1}^{r_2} n_{2j}$. An SMOCP with recourse in primal standard form is defined based on deterministic data $A \in \mathbb R^{m_1 \times n_1}$, $b \in \mathbb R^{m_1}$ and $c \in \mathbb R^{n_1}$, and random data $T \in \mathbb R^{m_2 \times n_1}$, $W \in \mathbb R^{m_2 \times n_2}$, $h \in \mathbb R^{m_2}$ and $d \in \mathbb R^{n_2}$, whose realizations depend on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$. Given this data, an SMOCP with recourse in primal standard form is
\[
\begin{array}{ll}
\min & c^T x + \mathrm E[Q(x, \omega)] \\
\text{s.t.} & Ax = b \\
& x \succeq_{p_{11}, p_{12}, \ldots, p_{1r_1}} 0,
\end{array}
\]
where $x \in \mathbb R^{n_1}$ is the first-stage decision variable and $Q(x, \omega)$ is the minimum value of the problem
\[
\begin{array}{ll}
\min & d(\omega)^T y \\
\text{s.t.} & W(\omega)\, y = h(\omega) - T(\omega)\, x \\
& y \succeq_{p_{21}, p_{22}, \ldots, p_{2r_2}} 0,
\end{array}
\]
where $y \in \mathbb R^{n_2}$ is the second-stage variable, and $\mathrm E[Q(x, \omega)] := \int_\Omega Q(x, \omega)\, P(d\omega)$.

Two-stage stochastic $p$th-order (respectively, second-order) cone programs with recourse are the special case of SMOCPs obtained by setting $p_{1i} = p_{2j} = p \ge 1$ (respectively, $p_{1i} = p_{2j} = 2$) for all $i = 1, 2, \ldots, r_1$ and $j = 1, 2, \ldots, r_2$.

6.5 An application

Our application is four versions of the FLP (see Subsection 5.1.1). For these four versions we present problem descriptions leading to an MOCP model, a 0-1MOCP model, an MIMOCP

model, and an SMOCP model. As we mentioned in Subsection 5.1.1, FLPs can be classified based on the distance measure used in the model between the facilities. If we use the Euclidean distance, then these problems are called Euclidean facility location problems (EFLPs); if we use the rectilinear distance (also known as the $L_1$ distance, city block distance, or Manhattan distance), then these problems are called rectilinear facility location problems (RFLPs). Furthermore, in some applications we use both the Euclidean and the rectilinear distances (based on the relationships between the pairs of facilities) as the distance measures in the model, obtaining a mix of EFLPs and RFLPs that we refer to as Euclidean-rectilinear facility location problems (ERFLPs).

Another way of classifying this problem is based on where the new facilities may be placed in the solution space. When the new facilities can be placed anywhere in the solution space, the problem is called a continuous facility location problem (CFLP); but often the decision maker needs the new facilities to be placed at specific locations (called nodes) and not just anywhere in the solution space. In this case the problem is called a discrete facility location problem (DFLP).

Each of the next subsections is devoted to a version of ERFLPs. More specifically, we consider (deterministic) continuous Euclidean-rectilinear facility location problems (CERFLPs), which lead to an MOCP model; discrete Euclidean-rectilinear facility location problems (DERFLPs), which lead to a 0-1MOCP model; ERFLPs with integrality constraints, which lead to an MIMOCP model; and stochastic continuous Euclidean-rectilinear facility location problems (stochastic CERFLPs), which lead to an SMOCP model.
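A toy single-facility instance makes the mixed Euclidean/rectilinear objective concrete (illustrative data; a brute-force grid search stands in for a cone solver): one existing facility at the origin is weighted with the 2-norm and another at $(4, 0)$ with the 1-norm.

```python
import numpy as np

# Minimize w1*||x - a1||_2 + w2*||x - a2||_1 over a grid of candidate sites.
a = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]   # existing facilities
w = [1.0, 1.0]
p = [2, 1]                                          # mixed norms

def cost(x):
    return sum(wi * np.linalg.norm(x - ai, ord=pi)
               for wi, ai, pi in zip(w, a, p))

grid = [np.array([xv, yv])
        for xv in np.linspace(-1.0, 5.0, 121)
        for yv in np.linspace(-1.0, 1.0, 41)]
best = min(cost(x) for x in grid)
print(round(best, 6))   # every point on the segment joining the sites is optimal
```

For this instance the optimal value is 4.0, attained anywhere on the segment between the two facilities; the MOCP formulations below recover the same optimum through epigraph variables and cone constraints instead of enumeration.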

6.5.1 CERFLPs: An MOCP model

In single ERFLPs, we are interested in choosing a location for a new facility among existing facilities so that this location minimizes the sum of weighted (either Euclidean or rectilinear) distances to all existing facilities. Assume that we are given $r + s$ existing facilities represented by the fixed points $a_1, a_2, \ldots, a_r, a_{r+1}, a_{r+2}, \ldots, a_{r+s}$ in $\mathbb R^n$, and that we plan to place a new facility, represented by $x$, so as to minimize the weighted sum of the Euclidean distances between $x$ and each of the points $a_1, a_2, \ldots, a_r$ plus the weighted sum of the rectilinear distances between $x$ and each of the points $a_{r+1}, a_{r+2}, \ldots, a_{r+s}$. This leads us to the problem
\[
\min\ \sum_{i=1}^{r} w_i \|x - a_i\|_2 + \sum_{i=r+1}^{r+s} w_i \|x - a_i\|_1,
\]
or, alternatively, to the problem
\[
\begin{array}{ll}
\min & \sum_{i=1}^{r+s} w_i t_i \\
\text{s.t.} & (t_1; x - a_1; \ldots; t_r; x - a_r) \succeq_2^r 0 \\
& (t_{r+1}; x - a_{r+1}; \ldots; t_{r+s}; x - a_{r+s}) \succeq_1^s 0,
\end{array}
\]
where $w_i$ is the weight associated with the $i$th existing facility and the new facility for $i = 1, 2, \ldots, r + s$.

In multiple ERFLPs we add $m$ new facilities, namely $x_1, x_2, \ldots, x_m \in \mathbb R^n$, instead of adding only one. We have two cases, depending on whether or not there is interaction among the new facilities in the underlying model. If there is no interaction between the new facilities, we are concerned only with minimizing the weighted sums of the distances between each of the new facilities and each of the fixed facilities. In other words, we solve the following MOCP model:

\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r+s} w_{ij} t_{ij} \\
\text{s.t.} & (t_{1j}; x_j - a_1; \ldots; t_{rj}; x_j - a_r) \succeq_2^r 0, \quad j = 1, 2, \ldots, m \\
& (t_{(r+1)j}; x_j - a_{r+1}; \ldots; t_{(r+s)j}; x_j - a_{r+s}) \succeq_1^s 0, \quad j = 1, 2, \ldots, m,
\end{array} \tag{6.5.1}
\]
where $w_{ij}$ is the weight associated with the $i$th existing facility and the $j$th new facility for $j = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, r + s$.

If interaction exists among the new facilities then, in addition to the above requirements, we need to minimize the sum of the (either Euclidean or rectilinear) distances between each pair of the new facilities. Let $1 \le l \le m$ and assume that we are required to minimize the weighted sum of the Euclidean distances between each pair of the new facilities $x_1, x_2, \ldots, x_l$ and the weighted sum of the rectilinear distances between each pair of the new facilities $x_{l+1}, x_{l+2}, \ldots, x_m$. In this case, we are interested in a model of the form
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r+s} w_{ij} t_{ij} + \sum_{j=2}^{m} \sum_{j'=1}^{j-1} \hat w_{jj'} \hat t_{jj'} \\
\text{s.t.} & (t_{1j}; x_j - a_1; \ldots; t_{rj}; x_j - a_r) \succeq_2^r 0, \quad j = 1, 2, \ldots, m \\
& (t_{(r+1)j}; x_j - a_{r+1}; \ldots; t_{(r+s)j}; x_j - a_{r+s}) \succeq_1^s 0, \quad j = 1, 2, \ldots, m \\
& (\hat t_{j(j+1)}; x_j - x_{j+1}; \ldots; \hat t_{jl}; x_j - x_l) \succeq_2^{(l-j)} 0, \quad j = 1, 2, \ldots, l - 1 \\
& (\hat t_{j(j+1)}; x_j - x_{j+1}; \ldots; \hat t_{jm}; x_j - x_m) \succeq_1^{(m-j)} 0, \quad j = l + 1, \ldots, m - 1,
\end{array} \tag{6.5.2}
\]
where $\hat w_{jj'}$ is the weight associated with the new facilities $j$ and $j'$ for $j' = 1, 2, \ldots, j - 1$ and $j = 2, 3, \ldots, m$.

6.5.2 DERFLPs: A 0-1MOCP model

We consider the discrete version of the problem by assuming that the new facilities $x_1, x_2, \ldots, x_m$ need to be placed at specific locations and not at arbitrary points in 2- or 3-

(or higher) dimensional space. Let the points $v_1, v_2, \ldots, v_k \in \mathbb R^n$ represent these specific locations, where $k \ge m$. So, we add the constraint
\[
x_i \in \{v_1, v_2, \ldots, v_k\} \quad \text{for } i = 1, 2, \ldots, m.
\]
Clearly, for $i = 1, 2, \ldots, m$, the above constraint can be replaced by the following linear and binary constraints:
\[
x_i = v_1 y_{i1} + v_2 y_{i2} + \cdots + v_k y_{ik}, \qquad y_{i1} + y_{i2} + \cdots + y_{ik} = 1, \qquad y_i = (y_{i1}; y_{i2}; \ldots; y_{ik}) \in \{0, 1\}^k.
\]
We also assume that we cannot place more than one facility at each location. Consequently, we add the following constraints:
\[
(1; y_{1l}; y_{2l}; \ldots; y_{ml}) \succeq_1 0, \quad l = 1, 2, \ldots, k.
\]
If there is no interaction between the new facilities, then the MOCP model (6.5.1) becomes the following 0-1MOCP model:
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r+s} w_{ij} t_{ij} \\
\text{s.t.} & (t_{1j}; x_j - a_1; \ldots; t_{rj}; x_j - a_r) \succeq_2^r 0, \quad j = 1, 2, \ldots, m \\
& (t_{(r+1)j}; x_j - a_{r+1}; \ldots; t_{(r+s)j}; x_j - a_{r+s}) \succeq_1^s 0, \quad j = 1, 2, \ldots, m \\
& x_i = v_1 y_{i1} + v_2 y_{i2} + \cdots + v_k y_{ik}, \quad i = 1, 2, \ldots, m \\
& (1; y_{1l}; y_{2l}; \ldots; y_{ml}) \succeq_1 0, \quad l = 1, 2, \ldots, k \\
& \mathbf 1^T y_i = 1, \quad y_i \in \{0, 1\}^k, \quad i = 1, 2, \ldots, m.
\end{array}
\]
If interaction exists among the new facilities, then the MOCP model (6.5.2) becomes the

following 0-1MOCP model:
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r+s} w_{ij} t_{ij} + \sum_{j=2}^{m} \sum_{j'=1}^{j-1} \hat w_{jj'} \hat t_{jj'} \\
\text{s.t.} & (t_{1j}; x_j - a_1; \ldots; t_{rj}; x_j - a_r) \succeq_2^r 0, \quad j = 1, 2, \ldots, m \\
& (t_{(r+1)j}; x_j - a_{r+1}; \ldots; t_{(r+s)j}; x_j - a_{r+s}) \succeq_1^s 0, \quad j = 1, 2, \ldots, m \\
& (\hat t_{j(j+1)}; x_j - x_{j+1}; \ldots; \hat t_{jl}; x_j - x_l) \succeq_2^{(l-j)} 0, \quad j = 1, 2, \ldots, l - 1 \\
& (\hat t_{j(j+1)}; x_j - x_{j+1}; \ldots; \hat t_{jm}; x_j - x_m) \succeq_1^{(m-j)} 0, \quad j = l + 1, \ldots, m - 1 \\
& x_i = v_1 y_{i1} + v_2 y_{i2} + \cdots + v_k y_{ik}, \quad i = 1, 2, \ldots, m \\
& (1; y_{1l}; y_{2l}; \ldots; y_{ml}) \succeq_1 0, \quad l = 1, 2, \ldots, k \\
& \mathbf 1^T y_i = 1, \quad y_i \in \{0, 1\}^k, \quad i = 1, 2, \ldots, m.
\end{array}
\]
For $l = 1, 2, \ldots, k$, let $z_l = 1$ if the location $v_l$ is chosen, and $z_l = 0$ otherwise. Then we can go further and consider more assumptions. Let $k_1, k_2, k_3, k_4 \in [1, k]$ be integers such that $k_1 \le k_2$ and $k_3 \le k_4$:

If we must choose at most $k_1$ of the locations $v_1, v_2, \ldots, v_{k_2}$, then we impose the constraints
\[
(k_1; z_1; z_2; \ldots; z_{k_2}) \succeq_1 0 \quad \text{and} \quad z \in \{0, 1\}^k.
\]

If we must choose at most $k_1$ of the locations $v_1, v_2, \ldots, v_{k_2}$, or at most $k_3$ of the locations $v_1, v_2, \ldots, v_{k_4}$, then we impose the constraints
\[
(k_1 f; z_1; z_2; \ldots; z_{k_2}) \succeq_1 0, \quad (k_3(1 - f); z_1; z_2; \ldots; z_{k_4}) \succeq_1 0, \quad z \in \{0, 1\}^k, \quad f \in \{0, 1\}.
\]

6.5.3 ERFLPs with integrality constraints: An MIMOCP model

In some problems we may need the locations to have integer-valued coordinates. In most cities, streets are laid out on a grid, so that the city is subdivided into small numbered blocks that are square or rectangular. In this case, the decision maker usually needs the new

facility to be placed at the corners of the city blocks. Thus, for each $i \in \{1, 2, \ldots, m\}$, let us assume that the variable $x_i$ lies in the hyperrectangle $\Xi_i^n := \{x_i : \zeta_i \le x_i \le \eta_i,\ \zeta_i \in \mathbb R^n,\ \eta_i \in \mathbb R^n\}$ and has to be integer-valued, i.e., $x_i$ must lie in the grid $\Xi_i^n \cap \mathbb Z^n$. Thus, if there is no interaction between the new facilities, then instead of solving the MOCP model (6.5.1), we solve the following MIMOCP model:
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r+s} w_{ij} t_{ij} \\
\text{s.t.} & (t_{1j}; x_j - a_1; \ldots; t_{rj}; x_j - a_r) \succeq_2^r 0, \quad j = 1, 2, \ldots, m \\
& (t_{(r+1)j}; x_j - a_{r+1}; \ldots; t_{(r+s)j}; x_j - a_{r+s}) \succeq_1^s 0, \quad j = 1, 2, \ldots, m \\
& x_k \in \Xi_k^n \cap \mathbb Z^n, \quad \forall k.
\end{array}
\]
If interaction exists among the new facilities, then instead of solving the MOCP model (6.5.2), we solve the following MIMOCP model:
\[
\begin{array}{ll}
\min & \sum_{j=1}^{m} \sum_{i=1}^{r+s} w_{ij} t_{ij} + \sum_{j=2}^{m} \sum_{j'=1}^{j-1} \hat w_{jj'} \hat t_{jj'} \\
\text{s.t.} & (t_{1j}; x_j - a_1; \ldots; t_{rj}; x_j - a_r) \succeq_2^r 0, \quad j = 1, 2, \ldots, m \\
& (t_{(r+1)j}; x_j - a_{r+1}; \ldots; t_{(r+s)j}; x_j - a_{r+s}) \succeq_1^s 0, \quad j = 1, 2, \ldots, m \\
& (\hat t_{j(j+1)}; x_j - x_{j+1}; \ldots; \hat t_{jl}; x_j - x_l) \succeq_2^{(l-j)} 0, \quad j = 1, 2, \ldots, l - 1 \\
& (\hat t_{j(j+1)}; x_j - x_{j+1}; \ldots; \hat t_{jm}; x_j - x_m) \succeq_1^{(m-j)} 0, \quad j = l + 1, \ldots, m - 1 \\
& x_k \in \Xi_k^n \cap \mathbb Z^n, \quad \forall k.
\end{array}
\]

6.5.4 Stochastic CERFLPs: An SMOCP model

Before we describe the stochastic version of this generic application, we indicate a more concrete version of it. Assume that we have a new, growing city with many suburbs, and we want to build a hospital for treating the residents of this city. Some people live in the city at the present time. As the city expands, many houses in new suburbs will need to be built, and the locations of these suburbs will become known in the future in different parts of the city. Our goal is to find the best location for this hospital so that it can serve the

current suburbs and the new ones. This location must be determined at the current time, before information about the locations of the new suburbs becomes available. For those houses that are close enough to the location of the hospital, we use the rectilinear distance as the distance measure between them and the hospital, while for the new suburbs that will be located far away from the location of the hospital, we use the Euclidean distance as the distance measure. See Figure 6.1.

Figure 6.1: A more concrete version of the stochastic CERFLP: a new, growing city with many houses expected to be built in different possible parts of the city.

Generally speaking, let $a_1, a_2, \ldots, a_{r_1}, a_{r_1+1}, a_{r_1+2}, \ldots, a_{r_1+s_1}$ be fixed points in $\mathbb R^n$ representing the coordinates of $r_1 + s_1$ existing fixed facilities, and let $\tilde a_1(\omega), \tilde a_2(\omega), \ldots, \tilde a_{r_2}(\omega), \tilde a_{r_2+1}(\omega), \tilde a_{r_2+2}(\omega), \ldots, \tilde a_{r_2+s_2}(\omega)$ be random points in $\mathbb R^n$ representing the coordinates of $r_2 + s_2$ random facilities whose realizations depend on an underlying outcome $\omega$ in an event space $\Omega$ with a known probability function $P$.

Suppose that at present we do not know the realizations of the $r_2 + s_2$ random facilities, and that at some point in the future the realizations of these $r_2 + s_2$ random facilities


More information

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian FE661 - Statistical Methods for Financial Engineering 2. Linear algebra Jitkomut Songsiri matrices and vectors linear equations range and nullspace of matrices function of vectors, gradient and Hessian

More information

Notes on Eigenvalues, Singular Values and QR

Notes on Eigenvalues, Singular Values and QR Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square

More information

Linear Algebra. Workbook

Linear Algebra. Workbook Linear Algebra Workbook Paul Yiu Department of Mathematics Florida Atlantic University Last Update: November 21 Student: Fall 2011 Checklist Name: A B C D E F F G H I J 1 2 3 4 5 6 7 8 9 10 xxx xxx xxx

More information

Chapter 2. Vectors and Vector Spaces

Chapter 2. Vectors and Vector Spaces 2.2. Cartesian Coordinates and Geometrical Properties of Vectors 1 Chapter 2. Vectors and Vector Spaces Section 2.2. Cartesian Coordinates and Geometrical Properties of Vectors Note. There is a natural

More information