Decomposition Algorithms for Stochastic Programming on a Computational Grid


Optimization Technical Report 02-07, September 2002
Computer Sciences Department, University of Wisconsin-Madison

Jeff Linderoth, Stephen Wright

Decomposition Algorithms for Stochastic Programming on a Computational Grid

September 10, 2002

Abstract. We describe algorithms for two-stage stochastic linear programming with recourse and their implementation on a grid computing platform. In particular, we examine serial and asynchronous versions of the L-shaped method and a trust-region method. The parallel platform of choice is the dynamic, heterogeneous, opportunistic platform provided by the Condor system. The algorithms are of master-worker type (with the workers being used to solve second-stage problems), and the MW runtime support library (which supports master-worker computations) is key to the implementation. Computational results are presented on large sample-average approximations of problems from the literature.

1. Introduction

Consider the two-stage stochastic linear programming problem with fixed recourse, defined as follows:

  $\min \; c^T x + \sum_{i=1}^N p_i q_i^T y_i$, subject to    (1a)
  $Ax = b, \; x \ge 0$,    (1b)
  $W y_i = h_i - T_i x, \; y_i \ge 0, \; i = 1, 2, \ldots, N$.    (1c)

The $N$ scenarios are represented by $\omega_1, \omega_2, \ldots, \omega_N$, with probabilities $p_i$ and data objects $(q_i, T_i, h_i)$ for each $i = 1, 2, \ldots, N$. The unknowns are first-stage variables $x \in \mathbb{R}^{n_1}$ and second-stage variables $y_i \in \mathbb{R}^{n_2}$, $i = 1, 2, \ldots, N$. This formulation is sometimes known as the deterministic equivalent because it lists the unknowns for all scenarios explicitly and poses the problem as a (structured) linear program.
An alternative formulation is obtained by defining the $i$th second-stage problem as a linear program (LP) parametrized by the first-stage variables $x$, that is,

  $Q_i(x) \stackrel{def}{=} \min_{y_i} \; q_i^T y_i$ subject to $W y_i = h_i - T_i x, \; y_i \ge 0$,    (2)

Jeff Linderoth: Industrial and Systems Engineering Department, Lehigh University, 200 West Packer Avenue, Bethlehem, PA 18015; jtl3@lehigh.edu
Stephen Wright: Computer Sciences Department, University of Wisconsin, 1210 W. Dayton Street, Madison, WI 53706; swright@cs.wisc.edu
Mathematics Subject Classification (1991): 90C15, 65K05, 68W10
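For intuition, evaluating $Q_i(x)$ in (2) amounts to one LP solve, whose equality-constraint duals $\pi_i$ will supply subgradient information in the next section. The following is a minimal sketch, not the paper's implementation: it assumes SciPy's HiGHS interface, where `res.eqlin.marginals` carries the equality duals, and the helper name is ours. Under the standard convention, the subgradient of $Q_i$ with respect to $x$ is $-T_i^T \pi_i$.

```python
import numpy as np
from scipy.optimize import linprog

def solve_second_stage(q, W, h, T, x):
    """Solve Q_i(x) = min q^T y s.t. W y = h - T x, y >= 0 (problem (2)),
    returning (Q_i(x), a subgradient of Q_i at x)."""
    rhs = h - T @ x
    res = linprog(q, A_eq=W, b_eq=rhs, bounds=[(0, None)] * len(q),
                  method="highs")
    if res.status != 0:
        raise RuntimeError("second-stage LP not solved: " + res.message)
    pi = res.eqlin.marginals        # duals of the equality constraints
    # chain rule: d Q_i / dx = (d Q_i / d rhs) (d rhs / dx) = -T^T pi
    return res.fun, -T.T @ pi
```

On a one-variable instance with $Q_i(x) = h - x$ (for $h - x \ge 0$), the routine returns the value and slope $-1$, matching the derivative of $h - x$.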

so that $Q_i(\cdot)$ is a piecewise linear convex function. The objective in (1a) is then

  $Q(x) \stackrel{def}{=} c^T x + \sum_{i=1}^N p_i Q_i(x)$,    (3)

and we can restate (1) as

  $\min_x \; Q(x)$, subject to $Ax = b, \; x \ge 0$.    (4)

We can derive subgradient information for $Q_i(x)$ by considering dual solutions of (2). If $\pi_i$ is the Lagrange multiplier vector for the equality constraint in (2), it is easy to show that

  $-T_i^T \pi_i \in \partial Q_i(x)$,    (5)

where $\partial Q_i$ denotes the subdifferential of $Q_i$. By Rockafellar [19, Theorem 23.8], using polyhedrality of each $Q_i$, we have from (3) that

  $\partial Q(x) = c + \sum_{i=1}^N p_i \, \partial Q_i(x)$,    (6)

for each $x$ that lies in the domain of every $Q_i$, $i = 1, 2, \ldots, N$. Let $S$ denote the solution set for (4). Since (4) is a convex program, $S$ is closed and convex. If $S$ is nonempty, the projection operator $P(\cdot)$ onto $S$ is well defined.

Subgradient information can be used by algorithms in different ways. Cutting-plane methods use this information to construct convex estimates of $Q(x)$, and obtain each iterate by minimizing this estimate, as in the L-shaped methods described in Section 2. This approach can be stabilized by the use of a quadratic regularization term (Ruszczyński [20,21], Kiwiel [15]) or by the explicit use of a trust region, as in the $l_\infty$ trust-region approach described in Section 3. Alternatively, when an upper bound on the optimal value $Q^*$ is available, one can derive each new iterate from an approximate analytic center of an approximate epigraph of $Q$. The latter approach has been explored by Bahn et al. [1] and applied to a large stochastic programming problem by Frangière, Gondzio, and Vial [9].

Parallel implementation of these approaches is obvious in principle. Because evaluation of $Q_i(x)$ and elements of its subdifferential can be carried out independently for each $i = 1, 2, \ldots, N$, we can partition the scenarios $i = 1, 2, \ldots, N$ into clusters of scenarios and define a computational task to be the solution of all the second-stage LPs (2) in a number of clusters.
Each such task could be assigned to an available worker processor. Bunching techniques (see Birge and Louveaux [5, Section 5.4]) can be used to exploit the similarity between different scenarios within a chunk. To avoid inefficiency, each task should contain enough scenarios that its computational requirements significantly exceed the time required to send the data to the worker processor and to return the results.
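The partitioning just described can be sketched in a few lines; the split below is one simple (hypothetical) choice, since the paper does not prescribe a particular assignment of scenarios to clusters or of clusters to tasks:

```python
def make_clusters(num_scenarios, num_clusters):
    """Deal scenario indices 0..N-1 into roughly equal clusters N_j."""
    clusters = [[] for _ in range(num_clusters)]
    for i in range(num_scenarios):
        clusters[i % num_clusters].append(i)
    return clusters

def make_tasks(num_clusters, clusters_per_task):
    """Bundle consecutive clusters into computational tasks T_r, so that
    each task is large enough to amortize master-worker communication."""
    return [list(range(r, min(r + clusters_per_task, num_clusters)))
            for r in range(0, num_clusters, clusters_per_task)]
```

Tuning `clusters_per_task` is how one keeps the ratio of compute time to communication cost high, as the text recommends.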

In this paper, we describe implementations of decomposition algorithms for stochastic programming on a dynamic, heterogeneous computational grid made up of workstations, PCs, and supercomputer nodes. Specifically, we use the environment provided by the Condor system [16] running the MW runtime library (Goux et al. [13,12]), a software layer that significantly simplifies the process of implementing parallel algorithms.

For the dimensions of problems and the size of the computational grids considered in this paper, evaluation of the functions $Q_i(x)$ and their subgradients at a single $x$ sometimes is insufficient to make effective use of the available processors. Moreover, synchronous algorithms (those that depend for efficiency on all tasks completing in a timely fashion) run the risk of poor performance in a parallel environment in which failure or suspension of worker processors during computation is not infrequent. We are led therefore to asynchronous approaches that consider different points $x$ simultaneously. Asynchronous variants of the L-shaped and $l_\infty$ trust-region methods are described in Sections 2.2 and 4, respectively.

Other parallel algorithms for stochastic programming have been described by Birge et al. [3,4], Birge and Qi [6], Ruszczyński [21], and Frangière, Gondzio, and Vial [9]. In [3], the focus is on multistage problems in which the scenario tree is decomposed into subtrees, which are processed independently and in parallel on worker processors. Dual solutions from each subtree are used to construct a model of the first-stage objective (using an L-shaped approach like that described in Section 2), which is periodically solved by a master process to obtain a new first-stage iterate. Birge and Qi [6] describe an interior-point method for two-stage problems, in which the linear algebra operations are implemented in parallel by exploiting the structure of the two-stage problem.
However, this approach involves significant data movement and does not scale particularly well. In [9], the second-stage problems (2) are solved concurrently and inexactly by using an interior-point code. The master process maintains an upper bound on the optimal objective, and this information is used along with the approximate subgradients to construct an approximate truncated epigraph of the function. The analytic center of this epigraph is computed periodically to obtain a new iterate. The numerical results in [9] report solution of a two-stage stochastic linear program with 2.6 million variables and 1.2 million constraints in three hours on a cluster of 10 Linux PCs.

The approach that is perhaps closest to the ones we describe in this paper is that of Ruszczyński [21]. When applied to two-stage problems (1), this algorithm consists of processes that solve each second-stage problem (2) at the latest available value of $x$ to generate cuts, and a master process that solves a cutting-plane problem with the latest available cuts and a quadratic regularization term to generate new iterates $x$. The master process and second-stage processes can execute in parallel and share information asynchronously. This approach is essentially an asynchronous parallel version of the serial bundle-trust-region approaches described by Ruszczyński [20], Kiwiel [15], and Hiriart-Urruty and Lemaréchal [14, Chapter XV]. Algorithm ATR described in Section 4 likewise is an asynchronous parallel version of the bundle-trust-region method TR

of Section 3, although the asynchronicity in Algorithm ATR is more structured than that considered in [21]. In addition, $l_\infty$ trust regions take the place of quadratic regularization terms in both Algorithms TR and ATR. We discuss the relationships between all these methods further in later sections.

The remainder of this paper is structured as follows. Section 2 describes an L-shaped method and an asynchronous variant. Algorithm TR, a bundle-trust-region method with $l_\infty$ trust regions, is described and analyzed in Section 3, while its asynchronous cousin Algorithm ATR is described and analyzed in Section 4. Section 5 discusses computational grids and implementations of the algorithms on these platforms. Finally, computational results are given in Section 6.

2. L-shaped methods

We describe briefly a well-known variant of the L-shaped method for solving (4), together with an asynchronous variant.

2.1. The multicut L-shaped method

The L-shaped method of Van Slyke and Wets [25] for solving (4) proceeds by finding subgradients of partial sums of the terms that make up $Q$ in (3), together with linear inequalities that define the domain of $Q$. We sketch the approach here, and refer to Birge and Louveaux [5, Chapter 5] for a more complete description.

Suppose that the second-stage scenarios $1, 2, \ldots, N$ are partitioned into $C$ clusters denoted by $N_1, N_2, \ldots, N_C$. Let $Q_{[j]}$ represent the partial sum from (3) corresponding to the cluster $N_j$; that is,

  $Q_{[j]}(x) = \sum_{i \in N_j} p_i Q_i(x)$.    (7)

The algorithm maintains a model function $m^k_{[j]}$, which is a piecewise linear lower bound on $Q_{[j]}$ for each $j$. We define this function at iteration $k$ by

  $m^k_{[j]}(x) = \inf \{ \theta_j \; : \; \theta_j e \ge F^k_{[j]} x + f^k_{[j]} \}$,    (8)

where $e = (1, 1, \ldots, 1)^T$ and $F^k_{[j]}$ is a matrix whose rows are subgradients of $Q_{[j]}$ at previous iterates of the algorithm. The constraints in (8) are called optimality cuts.
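To make the cutting-plane machinery concrete, here is a toy serial sketch in the spirit of the method, not the paper's code: a single-cut simplification (one cluster), hypothetical two-scenario data with $Q(x) = \sum_i p_i |h_i - x|$ (which has complete recourse), SciPy's HiGHS solver, and artificial bounds on $x$ and $\theta$ to keep the first master solve bounded.

```python
import numpy as np
from scipy.optimize import linprog

# Toy data (ours): Q_i(x) = min{y1 + y2 : y1 - y2 = h_i - x, y >= 0} = |h_i - x|
P = [0.7, 0.3]
H = [1.0, 3.0]

def Q_and_subgrad(x):
    """Evaluate Q(x) = sum_i p_i |h_i - x| and one subgradient via LP duals."""
    val, g = 0.0, 0.0
    for p, h in zip(P, H):
        res = linprog([1.0, 1.0], A_eq=[[1.0, -1.0]], b_eq=[h - x],
                      bounds=[(0, None)] * 2, method="highs")
        val += p * res.fun
        g += p * (-res.eqlin.marginals[0])   # chain rule through h - x
    return val, g

def l_shaped(x0=0.0, tol=1e-6, max_iter=50):
    """Serial cutting-plane loop (single-cut simplification, c = 0)."""
    cuts = []                                # each cut: theta >= g*x + f
    Qx, g = Q_and_subgrad(x0)
    Qmin = Qx
    cuts.append((g, Qx - g * x0))
    for _ in range(max_iter):
        # master: min theta subject to all cuts, artificial box on (x, theta)
        A_ub = [[gc, -1.0] for gc, _ in cuts]
        b_ub = [-fc for _, fc in cuts]
        res = linprog([0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0.0, 10.0), (-1e6, None)], method="highs")
        x, model_val = res.x[0], res.fun
        if Qmin - model_val <= tol * (1.0 + abs(Qmin)):
            return x, Qmin                   # relative-gap stopping test
        Qx, g = Q_and_subgrad(x)
        Qmin = min(Qmin, Qx)
        cuts.append((g, Qx - g * x))
    return x, Qmin
```

On this instance the loop closes the gap in a handful of iterations, illustrating both the early large steps and the monotone improvement of the model.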
A subgradient $g_j$ of $Q_{[j]}$ is obtained from the dual solutions $\pi_i$ of (2) for each $i \in N_j$ as follows:

  $g_j = -\sum_{i \in N_j} p_i T_i^T \pi_i$;    (9)

see (5) and (6). An optimality cut is not added to the model $m^k_{[j]}$ if the model function takes on the same value as $Q_{[j]}$ at iteration $k$. Cuts may also be deleted

in the manner described below. The algorithm also maintains a collection of feasibility cuts of the form

  $D^k x \ge d^k$,    (10)

which have the effect of excluding values of $x$ for which some of the second-stage linear programs (2) are infeasible. By Farkas's theorem (see Mangasarian [17, p. 31]), if the constraints in (2) are infeasible, there exists $\pi_i$ with the following properties:

  $W^T \pi_i \le 0, \quad [h_i - T_i x]^T \pi_i > 0$.

(In fact, such a $\pi_i$ can be obtained from the dual simplex method for the feasibility problem for (2).) To exclude this $x$ from further consideration, we simply add the inequality

  $[h_i - T_i x]^T \pi_i \le 0$

to the constraint set (10). The $k$th iterate $x^k$ of the multicut L-shaped method is obtained by solving the following approximation to (4):

  $\min_x \; m^k(x)$, subject to $D^k x \ge d^k, \; Ax = b, \; x \ge 0$,    (11)

where

  $m^k(x) \stackrel{def}{=} c^T x + \sum_{j=1}^C m^k_{[j]}(x)$.    (12)

In practice, we substitute from (8) to obtain the following linear program:

  $\min_{x, \theta_1, \ldots, \theta_C} \; c^T x + \sum_{j=1}^C \theta_j$, subject to    (13a)
  $\theta_j e \ge F^k_{[j]} x + f^k_{[j]}, \; j = 1, 2, \ldots, C$,    (13b)
  $D^k x \ge d^k$,    (13c)
  $Ax = b, \; x \ge 0$.    (13d)

We make the following assumption for the remainder of the paper.

Assumption 1.
(i) The problem has complete recourse; that is, the feasible set of (2) is nonempty for all $i = 1, 2, \ldots, N$ and all $x$, so that the domain of $Q(x)$ in (3) is $\mathbb{R}^{n_1}$.
(ii) The solution set $S$ is nonempty.

Under this assumption, the feasibility cuts (10), (13c) are not present. Our algorithms and their analysis can be generalized to handle situations in which Assumption 1 does not hold, but for the sake of simplifying our analysis, we avoid discussing this more general case here. Under Assumption 1, we can specify the L-shaped algorithm formally as follows:

Algorithm LS
  choose tolerance $\epsilon_{tol}$;
  choose starting point $x^0$;
  define initial model $m^0$ to be a piecewise linear underestimate of $Q(x)$
    such that $m^0(x^0) = Q(x^0)$ and $m^0$ is bounded below;
  $Q_{min} \leftarrow Q(x^0)$;
  for $k = 0, 1, 2, \ldots$
    obtain $x^{k+1}$ by solving (11);
    if $Q_{min} - m^k(x^{k+1}) \le \epsilon_{tol} (1 + |Q_{min}|)$
      STOP;
    evaluate function and subgradient information at $x^{k+1}$;
    $Q_{min} \leftarrow \min(Q_{min}, Q(x^{k+1}))$;
    obtain $m^{k+1}$ by adding optimality cuts to $m^k$;
  end (for)

2.2. Asynchronous parallel variant of the L-shaped method

The L-shaped approach lends itself naturally to implementation in a master-worker framework. The problem (13) is solved by the master process, while solution of each cluster $N_j$ of second-stage problems, and generation of the associated cuts, can be carried out by the worker processes running in parallel. This approach can be adapted for an asynchronous, unreliable environment in which the results from some second-stage clusters are not returned in a timely fashion. Rather than having all the worker processors sit idle while waiting for the tardy results, we can proceed without them, re-solving the master by using the additional cuts that were generated by the other second-stage clusters.

We denote the model function simply by $m$ for the asynchronous algorithm, rather than appending a subscript. Whenever the time comes to generate a new iterate, the current model is used. In practice, we would expect the algorithm to give different results each time it is executed, because of the unpredictable speed and order in which the functions are evaluated and subgradients generated. Because of Assumption 1, we can write the subproblem as

  $\min_x \; m(x)$, subject to $Ax = b, \; x \ge 0$.    (14)

Algorithm ALS, the asynchronous variant of the L-shaped method that we describe here, is made up of four key operations, three of which execute on the master processor and one of which runs on the workers. These operations are as follows:

partial evaluate.
Worker routine for evaluating $Q_{[j]}(x)$ defined by (7) for a given $x$ and one or more of the clusters $j = 1, 2, \ldots, C$, in the process generating a subgradient $g_j$ of each $Q_{[j]}(x)$. This task runs on a worker processor and returns its results to the master by activating the routine act on completed task on the master processor.

evaluate. Master routine that places tasks of the type partial evaluate for a given $x$ into the task pool for distribution to the worker processors as

they become available. The completion of all these tasks leads to evaluation of $Q(x)$.

initialize. Master routine that performs initial bookkeeping, culminating in a call to evaluate for the initial point $x^0$.

act on completed task. Master routine, activated whenever the results become available from a partial evaluate task. It updates the model and increments a counter to keep track of the number of tasks that have been evaluated at each candidate point. When appropriate, it solves the master problem with the latest model to obtain a new candidate iterate and then calls evaluate.

In our implementation of both this algorithm and its more sophisticated cousin Algorithm ATR of Section 4, a single task consists of the evaluation of one or more clusters $N_j$. We may bundle, say, 2 or 4 clusters into a single computational task, to make the task large enough to justify the master's effort in packing its data and unpacking its results, and to maintain the ratio of compute time to communication cost at a high level. We use $T$ to denote the number of computational tasks, and $T_r$, $r = 1, 2, \ldots, T$, to denote a partitioning of $\{1, 2, \ldots, C\}$, so that task $r$ consists of evaluation of the clusters $j \in T_r$.

The implementation depends on a synchronicity parameter $\sigma$, which is the proportion of tasks that must be evaluated at a point to trigger the generation of a new candidate iterate. Typical values of $\sigma$ are in the range 0.25 to 0.9. A logical variable $speceval_k$ keeps track of whether $x^k$ has yet triggered a new candidate. Initially, $speceval_k$ is set to false; it is set to true when the proportion of evaluated clusters passes the threshold $\sigma$. We now specify all the methods making up Algorithm ALS.

ALS: partial evaluate($x^q$, $q$, $r$)
  Given $x^q$, index $q$, and task number $r$, evaluate $Q_{[j]}(x^q)$ from (7) for all $j \in T_r$,
    together with partial subgradients $g_j$ from (9);
  Activate act on completed task($x^q$, $q$, $r$) on the master processor.
ALS: evaluate($x^q$, $q$)
  for $r = 1, 2, \ldots, T$ (possibly concurrently)
    partial evaluate($x^q$, $q$, $r$);
  end (for)

ALS: initialize
  determine the number of clusters $C$ and number of tasks $T$, and the partitions
    $N_1, N_2, \ldots, N_C$ and $T_1, T_2, \ldots, T_T$;
  choose tolerance $\epsilon_{tol}$;
  choose starting point $x^0$;
  choose threshold $\sigma \in (0, 1]$;
  $Q_{min} \leftarrow \infty$; $k \leftarrow 0$; $speceval_0 \leftarrow$ false; $t_0 \leftarrow 0$;
  evaluate($x^0$, 0).
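The counter-and-threshold bookkeeping of these routines can be mimicked in a few lines. The following is a hypothetical skeleton (ours, not the MW implementation): threads stand in for Condor workers, no model update or LP solve is performed, and the point is only that each candidate $x^q$ triggers at most one new iterate once the fraction $\sigma$ of its $T$ tasks has returned.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ALSBookkeeping:
    """Sketch of the master-side counters t_q and speceval_q; a real master
    would also add cuts to the model m and re-solve (14) on each trigger."""
    def __init__(self, num_tasks, sigma):
        self.T, self.sigma = num_tasks, sigma
        self.lock = threading.Lock()
        self.t = {}           # q -> completed-task count for candidate x^q
        self.speceval = {}    # q -> whether x^q already triggered a candidate
        self.triggers = []    # order in which new candidates were triggered

    def act_on_completed_task(self, q, r):
        with self.lock:       # master actions are serialized
            self.t[q] = self.t.get(q, 0) + 1
            if (self.t[q] >= self.sigma * self.T
                    and not self.speceval.get(q, False)):
                self.speceval[q] = True
                self.triggers.append(q)   # here: solve (14), call evaluate

def evaluate(master, q):
    """Launch all T partial evaluations of x^q concurrently (no real work)."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for r in range(master.T):
            pool.submit(master.act_on_completed_task, q, r)
```

The `speceval` guard is what keeps late-arriving tasks for an already-processed point from spawning duplicate candidates.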

ALS: act on completed task($x^q$, $q$, $r$)
  $t_q \leftarrow t_q + 1$;
  for each $j \in T_r$
    add $Q_{[j]}(x^q)$ and cut $g_j$ to the model $m$;
  if $t_q = T$
    $Q_{min} \leftarrow \min(Q_{min}, Q(x^q))$;
  else if $t_q \ge \sigma T$ and not $speceval_q$
    $speceval_q \leftarrow$ true;
    $k \leftarrow k + 1$;
    solve current model problem (14) to obtain $x^{k+1}$;
    if $Q_{min} - m(x^{k+1}) \le \epsilon_{tol} (1 + |Q_{min}|)$
      STOP;
    evaluate($x^{k+1}$, $k+1$);
  end (if)

We present results for Algorithm ALS in Section 6. While the algorithm is able to use a large number of worker processors on our opportunistic platform, it suffers from the usual drawbacks of the L-shaped method, namely, that cuts, once generated, must be retained for the remainder of the computation to ensure convergence, and that large steps are typically taken on early iterations before a sufficiently good model approximation to $Q(x)$ is created, making it impossible to exploit prior knowledge about the location of the solution.

3. A bundle-trust-region method

Trust-region approaches can be implemented by making only minor modifications to implementations of the L-shaped method, and they possess several practical advantages along with stronger convergence properties. The trust-region methods we describe here are related to the regularized decomposition method of Ruszczyński [20] and the bundle-trust-region approaches of Kiwiel [15] and Hiriart-Urruty and Lemaréchal [14, Chapter XV]. The main differences are that we use box-shaped trust regions yielding linear programming subproblems (rather than quadratic programs) and that our methods manipulate the size of the trust region directly rather than indirectly via a regularization parameter. We discuss these differences further in Section 3.3.

When requesting a subgradient of $Q$ at some point $x$, our algorithms do not require particular (e.g., extreme) elements of the subdifferential to be supplied. Nor do they require the subdifferential $\partial Q(x)$ to be representable as a convex combination of a finite number of vectors.
In this respect, our algorithms contrast with that of Ruszczyński [20], for instance, which exploits the piecewise-linear nature of the objectives $Q_i$ in (2). Because of our weaker conditions on the subgradient information, we cannot prove a finite termination result of the type presented in [20, Section 3]. However, these conditions potentially allow our algorithms to be extended to a more general class of convex nondifferentiable functions.

3.1. A method based on $l_\infty$ trust regions

A key difference between the trust-region approach of this section and the L-shaped method of the preceding section is that we impose an $l_\infty$-norm bound on the size of the step. It is implemented by simply adding bound constraints to the linear programming subproblem (13) as follows:

  $-\Delta e \le x - x^k \le \Delta e$,    (15)

where $e = (1, 1, \ldots, 1)^T$, $\Delta$ is the trust-region radius, and $x^k$ is the current iterate. During the $k$th iteration, it may be necessary to solve several problems with trust regions of the form (15), with different model functions $m$ and possibly different values of $\Delta$, before a satisfactory new iterate $x^{k+1}$ is identified. We refer to $x^k$ and $x^{k+1}$ as major iterates, and the points $x^{k,l}$, $l = 0, 1, 2, \ldots$, obtained by minimizing the current model function subject to the constraints and trust-region bounds of the form (15), as minor iterates.

Another key difference between the trust-region approach and the L-shaped approach is that a minor iterate $x^{k,l}$ is accepted as the new major iterate $x^{k+1}$ only if it yields a substantial reduction in the objective function $Q$ over the previous iterate $x^k$, in a sense to be defined below. A further important difference is that one can delete optimality cuts from the model functions, between minor and major iterations, without compromising the convergence properties of the algorithm.

To specify the method, we need to augment the notation established in the previous section. We define $m_{k,l}(x)$ to be the model function after $l$ minor iterations have been performed at iteration $k$, and $\Delta_{k,l} > 0$ to be the trust-region radius at the same stage. Under Assumption 1, there are no feasibility cuts, so the problem to be solved to obtain the minor iterate $x^{k,l}$ is as follows:

  $\min_x \; m_{k,l}(x)$ subject to $Ax = b, \; x \ge 0, \; \|x - x^k\|_\infty \le \Delta_{k,l}$    (16)

(cf. (11)).
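Because (15) is just a box, an LP solver absorbs the trust region as ordinary variable bounds at essentially no extra cost. A small helper (ours, assuming the nonnegativity constraint $x \ge 0$ of (16)) makes the point:

```python
def tr_bounds(x_k, delta):
    """Combine x >= 0 with the l_inf trust region (15), giving per-variable
    bounds max(0, x_k[i] - delta) <= x_i <= x_k[i] + delta."""
    return [(max(0.0, xi - delta), xi + delta) for xi in x_k]
```

The resulting list can be passed directly as the `bounds` argument of an LP interface such as SciPy's `linprog`, so the trust-region subproblem (16) has the same structure and cost as the unrestricted master (13).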
By expanding this problem in a similar fashion to (13), we obtain

  $\min_{x, \theta_1, \ldots, \theta_C} \; c^T x + \sum_{j=1}^C \theta_j$, subject to    (17a)
  $\theta_j e \ge F^{k,l}_{[j]} x + f^{k,l}_{[j]}, \; j = 1, 2, \ldots, C$,    (17b)
  $Ax = b, \; x \ge 0$,    (17c)
  $-\Delta_{k,l} e \le x - x^k \le \Delta_{k,l} e$.    (17d)

We assume the initial model $m_{k,0}$ at major iteration $k$ to satisfy the following two properties:

  $m_{k,0}(x^k) = Q(x^k)$,    (18a)
  $m_{k,0}$ is a convex, piecewise linear underestimate of $Q$.    (18b)

Denoting the solution of the subproblem (17) by $x^{k,l}$, we accept this point as the new iterate $x^{k+1}$ if the decrease in the actual objective $Q$ (see (4)) is at

least some fraction of the decrease predicted by the model function $m_{k,l}$. That is, for some constant $\xi \in (0, 1/2)$, the acceptance test is

  $Q(x^{k,l}) \le Q(x^k) - \xi \left( Q(x^k) - m_{k,l}(x^{k,l}) \right)$.    (19)

(A typical value for $\xi$ is $10^{-4}$.) If the test (19) fails to hold, we obtain a new model function $m_{k,l+1}$ by adding and possibly deleting cuts from $m_{k,l}(x)$. This process aims to refine the model function, so that it eventually generates a new major iteration, while economizing on storage by allowing deletion of subgradients that no longer seem helpful. Addition and deletion of cuts are implemented by adding and deleting rows from $F^{k,l}_{[j]}$ and $f^{k,l}_{[j]}$, to obtain $F^{k,l+1}_{[j]}$ and $f^{k,l+1}_{[j]}$, for $j = 1, 2, \ldots, C$. Given some parameter $\eta \in (\xi, 1)$, we obtain $m_{k,l+1}$ from $m_{k,l}$ by means of the following procedure:

Procedure Model-Update ($k$, $l$)
  for each optimality cut
    possible delete $\leftarrow$ true;
    if the cut was generated at $x^k$
      possible delete $\leftarrow$ false;
    else if the cut is active at the solution of (17) with positive Lagrange multiplier
      possible delete $\leftarrow$ false;
    else if the cut was generated at an earlier minor iteration $\bar{l} = 0, 1, \ldots, l-1$ such that

      $Q(x^k) - m_{k,l}(x^{k,l}) > \eta \left[ Q(x^k) - m_{k,\bar{l}}(x^{k,\bar{l}}) \right]$    (20)

      possible delete $\leftarrow$ false;
    end (if)
    if possible delete
      possibly delete the cut;
  end (for each)
  add optimality cuts obtained from each of the component functions $Q_{[j]}$ at $x^{k,l}$.

In our implementation, we delete the cut if possible delete is true at the final conditional statement and, in addition, the cut has not been active during the last 100 solutions of (17). More details are given in Section 6.2.

Because we retain all cuts generated at $x^k$ during the course of major iteration $k$, the following extension of (18a) holds:

  $m_{k,l}(x^k) = Q(x^k), \quad l = 0, 1, 2, \ldots$.    (21)

Since we add only subgradient information, the following generalization of (18b) also holds uniformly:

  $m_{k,l}$ is a convex, piecewise linear underestimate of $Q$, for $l = 0, 1, 2, \ldots$.    (22)
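The acceptance logic translates compactly into code. The following hedged sketch (function and parameter names are ours) implements test (19) together with the radius-increase rule that the paper states later as (25)-(26): on an accepted step that achieves at least half the predicted decrease and lands on the trust-region boundary, the radius is doubled up to the cap $\Delta_{hi}$.

```python
def accept_and_update_radius(Q_k, Q_kl, model_kl, step_norm, delta, delta_hi,
                             xi=1e-4):
    """Acceptance test (19); on a strong boundary-hitting step, double the
    radius up to delta_hi in the manner of rules (25)-(26)."""
    pred = Q_k - model_kl                   # predicted decrease (>= 0)
    accepted = Q_kl <= Q_k - xi * pred      # test (19)
    if accepted and Q_kl <= Q_k - 0.5 * pred and step_norm == delta:
        delta = min(delta_hi, 2.0 * delta)  # rule (26)
    return accepted, delta
```

Here `step_norm` is the $l_\infty$ distance $\|x^k - x^{k,l}\|_\infty$, so the boundary condition of (25) is the equality `step_norm == delta`.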

We may also decrease the trust-region radius $\Delta_{k,l}$ between minor iterations (that is, choose $\Delta_{k,l+1} < \Delta_{k,l}$) when the test (19) fails to hold. We do so if the match between model and objective appears to be particularly poor, adapting the procedure of Kiwiel [15, p. 109] for increasing the coefficient of the quadratic penalty term in his regularized bundle method. If $Q(x^{k,l})$ exceeds $Q(x^k)$ by more than an estimate of the quantity

  $\max_{\|x - x^k\| \le 1} \left[ Q(x^k) - Q(x) \right]$,    (23)

we conclude that the upside variation of the function $Q$ deviates too much from its downside variation, and we reduce the trust-region radius $\Delta_{k,l+1}$ so as to bring these quantities more nearly into line. Our estimate of (23) is simply

  $\frac{1}{\min(1, \Delta_{k,l})} \left[ Q(x^k) - m_{k,l}(x^{k,l}) \right]$,

that is, an extrapolation of the model reduction on the current trust region to a trust region of radius 1. Our complete strategy for reducing $\Delta$ is therefore as follows. (The counter is initialized to zero at the start of each major iteration.)

Procedure Reduce-$\Delta$
  evaluate

    $\rho = \min(1, \Delta_{k,l}) \, \frac{Q(x^{k,l}) - Q(x^k)}{Q(x^k) - m_{k,l}(x^{k,l})}$;    (24)

  if $\rho > 0$
    counter $\leftarrow$ counter + 1;
  if $\rho > 3$ or (counter $\ge 3$ and $\rho \in (1, 3]$)
    set $\Delta_{k,l+1} = \frac{1}{\min(\rho, 4)} \Delta_{k,l}$;
    reset counter $\leftarrow 0$;

If the test (19) is passed, so that we have $x^{k+1} = x^{k,l}$, we have a great deal of flexibility in defining the new model function $m_{k+1,0}$. We require only that the properties (18) are satisfied, with $k+1$ replacing $k$. Hence, we are free to delete much of the optimality cut information accumulated at iteration $k$ (and previous iterates). In practice, of course, it is wise to delete only those cuts that have been inactive for a substantial number of iterations; otherwise we run the risk that many new function and subgradient evaluations will be required to restore useful model information that was deleted prematurely.
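Procedure Reduce-$\Delta$ translates almost line for line; the sketch below (names ours) takes the current function values and state and returns the updated radius and counter:

```python
def reduce_delta(Q_k, Q_kl, model_kl, delta, counter):
    """One application of Procedure Reduce-Delta: compute rho as in (24),
    bump the counter on any increase in Q, and shrink the radius by a
    factor min(rho, 4) when the upside variation is badly out of line."""
    rho = min(1.0, delta) * (Q_kl - Q_k) / (Q_k - model_kl)
    if rho > 0:
        counter += 1
    if rho > 3 or (counter >= 3 and 1 < rho <= 3):
        delta = delta / min(rho, 4.0)
        counter = 0
    return delta, counter
```

Capping the shrink factor at 4 is what guarantees, in Lemma 3 below, that the radius never falls more than a factor of 4 past the last radius at which $\rho > 1$ occurred.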
If the step to the new major iterate $x^{k+1}$ shows a particularly close match between the true function $Q$ and the model function $m_{k,l}$ at the last minor iteration of iteration $k$, we consider increasing the trust-region radius. Specifically, if

  $Q(x^{k,l}) \le Q(x^k) - 0.5 \left( Q(x^k) - m_{k,l}(x^{k,l}) \right), \quad \|x^k - x^{k,l}\|_\infty = \Delta_{k,l}$,    (25)

then we set

  $\Delta_{k+1,0} = \min( \Delta_{hi}, 2 \Delta_{k,l} )$,    (26)

where $\Delta_{hi}$ is a prespecified upper bound on the radius.

Before specifying the algorithm formally, we define the convergence test. Given a parameter $\epsilon_{tol} > 0$, we terminate if

  $Q(x^k) - m_{k,l}(x^{k,l}) \le \epsilon_{tol} \left( 1 + |Q(x^k)| \right)$.    (27)

Algorithm TR
  choose $\xi \in (0, 1/2)$, cut deletion parameter $\eta \in (\xi, 1)$, maximum trust region $\Delta_{hi}$,
    tolerance $\epsilon_{tol}$;
  choose starting point $x^0$;
  define initial model $m_{0,0}$ with the properties (18) (for $k = 0$);
  choose $\Delta_{0,0} \in [1, \Delta_{hi}]$;
  for $k = 0, 1, 2, \ldots$
    finishedminoriteration $\leftarrow$ false;
    $l \leftarrow 0$; counter $\leftarrow 0$;
    repeat
      solve (16) to obtain $x^{k,l}$;
      if (27) is satisfied
        STOP with approximate solution $x^k$;
      evaluate function and subgradient at $x^{k,l}$;
      if (19) is satisfied
        set $x^{k+1} = x^{k,l}$;
        obtain $m_{k+1,0}$ by possibly deleting cuts from $m_{k,l}$, but retaining the
          properties (18) (with $k+1$ replacing $k$);
        choose $\Delta_{k+1,0} \in [\Delta_{k,l}, \Delta_{hi}]$ according to (25), (26);
        finishedminoriteration $\leftarrow$ true;
      else
        obtain $m_{k,l+1}$ from $m_{k,l}$ via Procedure Model-Update ($k$, $l$);
        obtain $\Delta_{k,l+1}$ via Procedure Reduce-$\Delta$;
        $l \leftarrow l + 1$;
    until finishedminoriteration
  end (for)

3.2. Analysis of the trust-region method

We now describe the convergence properties of Algorithm TR. We show that for $\epsilon_{tol} = 0$, the algorithm either terminates at a solution or generates a sequence of major iterates $x^k$ that approaches the solution set $S$ (Theorem 2). Given some starting point $x^0$ satisfying the constraints $Ax^0 = b$, $x^0 \ge 0$, and setting $Q_0 = Q(x^0)$, we define the following quantities that are useful in describing and analyzing the algorithm:

  $L(Q_0) = \{ x \; : \; Ax = b, \; x \ge 0, \; Q(x) \le Q_0 \}$,    (28)
  $L(Q_0; \Delta) = \{ x \; : \; Ax = b, \; x \ge 0, \; \|x - y\|_\infty \le \Delta$ for some $y \in L(Q_0) \}$,    (29)
  $\beta = \sup \{ \|g\|_1 \; : \; g \in \partial Q(x)$, for some $x \in L(Q_0; \Delta_{hi}) \}$.    (30)

Using Assumption 1, we can easily show that $\beta < \infty$. Note that

  $Q(x) - Q^* \le g^T (x - P(x))$, for all $x \in L(Q_0; \Delta_{hi})$, all $g \in \partial Q(x)$,

so that

  $Q(x) - Q^* \le \|g\|_1 \, \|x - P(x)\|_\infty \le \beta \, \|x - P(x)\|_\infty$.    (31)

We start by showing that the optimal objective value for (16) cannot decrease from one minor iteration to the next.

Lemma 1. Suppose that $x^{k,l}$ does not satisfy the acceptance test (19). Then we have $m_{k,l}(x^{k,l}) \le m_{k,l+1}(x^{k,l+1})$.

Proof. In obtaining $m_{k,l+1}$ from $m_{k,l}$ in Model-Update, we do not allow deletion of cuts that were active at the solution $x^{k,l}$ of (17). Using $\bar{F}^{k,l}_{[j]}$ and $\bar{f}^{k,l}_{[j]}$ to denote the active rows in $F^{k,l}_{[j]}$ and $f^{k,l}_{[j]}$, we have that $x^{k,l}$ is also the solution of the following linear program (in which the inactive cuts are not present):

  $\min_{x, \theta_1, \ldots, \theta_C} \; c^T x + \sum_{j=1}^C \theta_j$, subject to    (32a)
  $\theta_j e \ge \bar{F}^{k,l}_{[j]} x + \bar{f}^{k,l}_{[j]}, \; j = 1, 2, \ldots, C$,    (32b)
  $Ax = b, \; x \ge 0$,    (32c)
  $-\Delta_{k,l} e \le x - x^k \le \Delta_{k,l} e$.    (32d)

The subproblem to be solved for $x^{k,l+1}$ differs from (32) in two ways. First, additional rows may be added to $\bar{F}^{k,l}_{[j]}$ and $\bar{f}^{k,l}_{[j]}$, consisting of function values and subgradients obtained at $x^{k,l}$ and also inactive cuts carried over from the previous (17). Second, the trust-region radius $\Delta_{k,l+1}$ may be smaller than $\Delta_{k,l}$. Hence, the feasible region of the problem to be solved for $x^{k,l+1}$ is a subset of the feasible region for (32), so the optimal objective value cannot be smaller.

Next we have a result about the amount of reduction in the model function $m_{k,l}$.

Lemma 2. For all $k = 0, 1, 2, \ldots$ and $l = 0, 1, 2, \ldots$, we have that

  $m_{k,l}(x^k) - m_{k,l}(x^{k,l}) = Q(x^k) - m_{k,l}(x^{k,l})$
    $\ge \min\left( \frac{\Delta_{k,l}}{\|x^k - P(x^k)\|_\infty}, 1 \right) \left[ Q(x^k) - Q^* \right]$.    (33)

Proof. The first equality follows immediately from (21). To prove (33), consider the following subproblem in the scalar $\tau$:

  $\min_{\tau \in [0,1]} m_{k,l}\left( x^k + \tau [P(x^k) - x^k] \right)$ subject to $\tau \|P(x^k) - x^k\|_\infty \le \Delta_{k,l}$.    (34)

Denoting the solution of this problem by $\tau_{k,l}$, we have by comparison with (16) that

  $m_{k,l}(x^{k,l}) \le m_{k,l}\left( x^k + \tau_{k,l} [P(x^k) - x^k] \right)$.    (35)

If $\tau = 1$ is feasible in (34), we have from (35) and (22) that

  $m_{k,l}(x^{k,l}) \le m_{k,l}\left( x^k + \tau_{k,l} [P(x^k) - x^k] \right) \le m_{k,l}\left( x^k + [P(x^k) - x^k] \right)$
    $= m_{k,l}(P(x^k)) \le Q(P(x^k)) = Q^*$.

Hence, we have from (21) that $m_{k,l}(x^k) - m_{k,l}(x^{k,l}) \ge Q(x^k) - Q^*$, so that (33) holds in this case.

When $\tau = 1$ is infeasible for (34), consider setting $\tau = \Delta_{k,l} / \|x^k - P(x^k)\|_\infty$ (which is certainly feasible for (34)). We have from (35), the definition of $\tau_{k,l}$, (22), and convexity of $Q$ that

  $m_{k,l}(x^{k,l}) \le m_{k,l}\left( x^k + \frac{\Delta_{k,l}}{\|P(x^k) - x^k\|_\infty} [P(x^k) - x^k] \right)$
    $\le Q\left( x^k + \frac{\Delta_{k,l}}{\|P(x^k) - x^k\|_\infty} [P(x^k) - x^k] \right)$
    $\le Q(x^k) + \frac{\Delta_{k,l}}{\|P(x^k) - x^k\|_\infty} \left( Q^* - Q(x^k) \right)$.

Therefore, using (21), we have

  $m_{k,l}(x^k) - m_{k,l}(x^{k,l}) \ge \frac{\Delta_{k,l}}{\|P(x^k) - x^k\|_\infty} \left[ Q(x^k) - Q^* \right]$,

verifying (33) in this case as well.

Our next result finds a lower bound on the trust-region radii $\Delta_{k,l}$. For purposes of this result we define a quantity $E_k$ to measure the closest approach to the solution set for all iterates up to and including $x^k$, that is,

  $E_k \stackrel{def}{=} \min_{\bar{k} = 0, 1, \ldots, k} \|x^{\bar{k}} - P(x^{\bar{k}})\|_\infty$.    (36)

Note that $E_k$ decreases monotonically with $k$. We also define $F_k$ as follows:

  $F_k \stackrel{def}{=} \min_{\bar{k} = 0, 1, \ldots, k, \; x^{\bar{k}} \notin S} \frac{Q(x^{\bar{k}}) - Q^*}{\|x^{\bar{k}} - P(x^{\bar{k}})\|_\infty}$,    (37)

with the convention that $F_k = 0$ if $x^{\bar{k}} \in S$ for any $\bar{k} \le k$. By monotonicity of $\{Q(x^k)\}$, we have $F_k > 0$ whenever $x^k \notin S$. Note also from (31) and the fact that $x^{\bar{k}} \in L(Q_0; \Delta_{hi})$ that

  $F_k \le \beta, \quad k = 0, 1, 2, \ldots$.    (38)

Lemma 3. For all trust regions $\Delta_{k,l}$ used in the course of Algorithm TR, we have

  $\Delta_{k,l} \ge (1/4) \min( E_k, F_k / \beta )$,    (39)

for $\beta$ defined in (30).

Proof. Suppose for contradiction that there are indices $k$ and $l$ such that $\Delta_{k,l} < (1/4) \min( E_k, F_k / \beta )$. Since the trust region can be reduced by at most a factor of 4 by Procedure Reduce-$\Delta$, there must be an earlier trust-region radius $\Delta_{\bar{k},\bar{l}}$ (with $\bar{k} \le k$) such that

  $\Delta_{\bar{k},\bar{l}} < \min( E_k, F_k / \beta )$,    (40)

and $\rho > 1$ in (24), that is,

  $Q(x^{\bar{k},\bar{l}}) - Q(x^{\bar{k}}) > \frac{1}{\min(1, \Delta_{\bar{k},\bar{l}})} \left( Q(x^{\bar{k}}) - m_{\bar{k},\bar{l}}(x^{\bar{k},\bar{l}}) \right) = \frac{1}{\Delta_{\bar{k},\bar{l}}} \left( Q(x^{\bar{k}}) - m_{\bar{k},\bar{l}}(x^{\bar{k},\bar{l}}) \right)$,    (41)

where we used (38) in (40) to deduce that $\Delta_{\bar{k},\bar{l}} < 1$. By applying Lemma 2, and using (40) again, we have

  $Q(x^{\bar{k}}) - m_{\bar{k},\bar{l}}(x^{\bar{k},\bar{l}}) \ge \min\left( \frac{\Delta_{\bar{k},\bar{l}}}{\|x^{\bar{k}} - P(x^{\bar{k}})\|_\infty}, 1 \right) \left[ Q(x^{\bar{k}}) - Q^* \right] = \frac{\Delta_{\bar{k},\bar{l}}}{\|x^{\bar{k}} - P(x^{\bar{k}})\|_\infty} \left[ Q(x^{\bar{k}}) - Q^* \right]$,    (42)

where the last equality follows from $\|x^{\bar{k}} - P(x^{\bar{k}})\|_\infty \ge E_{\bar{k}} \ge E_k > \Delta_{\bar{k},\bar{l}}$. By combining (42) with (41), we have that

  $Q(x^{\bar{k},\bar{l}}) - Q(x^{\bar{k}}) > \frac{Q(x^{\bar{k}}) - Q^*}{\|x^{\bar{k}} - P(x^{\bar{k}})\|_\infty} \ge F_{\bar{k}} \ge F_k$.    (43)

By using standard properties of subgradients, we have

  $Q(x^{\bar{k},\bar{l}}) - Q(x^{\bar{k}}) \le g_{\bar{l}}^T (x^{\bar{k},\bar{l}} - x^{\bar{k}}) \le \|g_{\bar{l}}\|_1 \|x^{\bar{k},\bar{l}} - x^{\bar{k}}\|_\infty \le \|g_{\bar{l}}\|_1 \Delta_{\bar{k},\bar{l}}$, for all $g_{\bar{l}} \in \partial Q(x^{\bar{k},\bar{l}})$.    (44)

By combining this expression with (43), and using (40) again, we obtain that

  $\|g_{\bar{l}}\|_1 \ge \frac{1}{\Delta_{\bar{k},\bar{l}}} \left[ Q(x^{\bar{k},\bar{l}}) - Q(x^{\bar{k}}) \right] \ge \frac{1}{\Delta_{\bar{k},\bar{l}}} F_k > \beta$.

However, since $x^{\bar{k},\bar{l}} \in L(Q_0; \Delta_{hi})$, we have from (30) that $\|g_{\bar{l}}\|_1 \le \beta$, giving a contradiction.

Finite termination of the inner iterations is proved in the following two results. Recall that the parameters $\xi$ and $\eta$ are defined in (19) and (20), respectively.

Lemma 4. Let $\epsilon_{\mathrm{tol}} = 0$ in Algorithm TR, and let $\xi$ and $\eta$ be the constants from (19) and (20), respectively. Let $l_1$ be any index such that $x_{k,l_1}$ fails to satisfy the test (19). Then either the sequence of inner iterations eventually yields a point $x_{k,l_2}$ satisfying the acceptance test (19), or there is an index $l_2 > l_1$ such that

$Q(x_k) - m_{k,l_2}(x_{k,l_2}) \le \eta\,[Q(x_k) - m_{k,l_1}(x_{k,l_1})]$.  (45)

Proof. Suppose for contradiction that none of the minor iterations following $l_1$ satisfies either (19) or the criterion (45); that is,

$Q(x_k) - m_{k,q}(x_{k,q}) > \eta\,[Q(x_k) - m_{k,l_1}(x_{k,l_1})]$, for all $q > l_1$.  (46)

It follows from this bound, together with Lemma 1 and Procedure Model-Update, that none of the cuts generated at minor iterations $q \ge l_1$ is deleted. We assume in the remainder of the proof that $q$ and $l$ are generic minor iteration indices that satisfy $q > l \ge l_1$.

Because the function and subgradient information from minor iterations $x_{k,l}$, $l = l_1, l_1+1, \dots$, is retained throughout the major iteration $k$, we have

$m_{k,q}(x_{k,l}) = Q(x_{k,l})$.  (47)

By definition of the subgradient, we have

$m_{k,q}(x) - m_{k,q}(x_{k,l}) \ge g^T(x - x_{k,l})$, for all $g \in \partial m_{k,q}(x_{k,l})$.  (48)

Therefore, from (22) and (47), it follows that

$Q(x) - Q(x_{k,l}) \ge g^T(x - x_{k,l})$, for all $g \in \partial m_{k,q}(x_{k,l})$,

so that

$\partial m_{k,q}(x_{k,l}) \subset \partial Q(x_{k,l})$.  (49)

Since $Q(x_k) < Q(x_0) = Q_0$, we have from (28) that $x_k \in \mathcal{L}(Q_0)$. Therefore, from the definition (29) and the fact that $\|x_{k,l} - x_k\|_\infty \le \Delta_{k,l} \le \Delta_{\mathrm{hi}}$, we have that $x_{k,l} \in \mathcal{L}(Q_0; \Delta_{\mathrm{hi}})$. It follows from (30) and (49) that

$\|g\|_1 \le \beta$, for all $g \in \partial m_{k,q}(x_{k,l})$.  (50)

Since $x_{k,l}$ is rejected by the test (19), we have from (47) and Lemma 1 that the following inequalities hold:

$m_{k,q}(x_{k,l}) = Q(x_{k,l}) \ge Q(x_k) - \xi\,[Q(x_k) - m_{k,l}(x_{k,l})] \ge Q(x_k) - \xi\,[Q(x_k) - m_{k,l_1}(x_{k,l_1})]$.

By rearranging this expression, we obtain

$Q(x_k) - m_{k,q}(x_{k,l}) \le \xi\,[Q(x_k) - m_{k,l_1}(x_{k,l_1})]$.  (51)

Recalling that $\eta \in (\xi, 1)$, we consider the following neighborhood of $x_{k,l}$:

$\|x - x_{k,l}\|_\infty \le \frac{\eta - \xi}{\beta}\,[Q(x_k) - m_{k,l_1}(x_{k,l_1})] \stackrel{\mathrm{def}}{=} \zeta > 0$.  (52)

Using this bound together with (48) and (50), we obtain

$m_{k,q}(x_{k,l}) - m_{k,q}(x) \le g^T(x_{k,l} - x) \le \beta\,\|x_{k,l} - x\|_\infty \le (\eta - \xi)\,[Q(x_k) - m_{k,l_1}(x_{k,l_1})]$.

By combining this bound with (51), we find that the following bound is satisfied for all $x$ in the neighborhood (52):

$Q(x_k) - m_{k,q}(x) = [Q(x_k) - m_{k,q}(x_{k,l})] + [m_{k,q}(x_{k,l}) - m_{k,q}(x)] \le \eta\,[Q(x_k) - m_{k,l_1}(x_{k,l_1})]$.

It follows from this bound, in conjunction with (46), that $x_{k,q}$ (the solution of the trust-region problem with model function $m_{k,q}$) cannot lie in the neighborhood (52). Therefore, we have

$\|x_{k,q} - x_{k,l}\|_\infty > \zeta$.  (53)

But since $\|x_{k,l} - x_k\|_\infty \le \Delta_{k,l} \le \Delta_{\mathrm{hi}}$ for all $l \ge l_1$, it is impossible for an infinite sequence $\{x_{k,l}\}_{l \ge l_1}$ to satisfy (53). We conclude that (45) must hold for some $l_2 > l_1$, as claimed. ∎

We now show that the minor iteration sequence terminates at a point $x_{k,l}$ satisfying the acceptance test, provided that $x_k$ is not a solution.

Theorem 1. Suppose that $\epsilon_{\mathrm{tol}} = 0$.
(i) If $x_k \notin S$, there is an $l \ge 0$ such that $x_{k,l}$ satisfies (19).
(ii) If $x_k \in S$, then either Algorithm TR terminates (and verifies that $x_k \in S$), or $Q(x_k) - m_{k,l}(x_{k,l}) \to 0$.

Proof. Suppose for the moment that the inner iteration sequence is infinite, that is, the test (19) always fails. By applying Lemma 4 recursively, we can identify a sequence of indices $0 < l_1 < l_2 < \dots$ such that

$Q(x_k) - m_{k,l_j}(x_{k,l_j}) \le \eta\,[Q(x_k) - m_{k,l_{j-1}}(x_{k,l_{j-1}})] \le \eta^2\,[Q(x_k) - m_{k,l_{j-2}}(x_{k,l_{j-2}})] \le \dots \le \eta^j\,[Q(x_k) - m_{k,0}(x_{k,0})]$.  (54)

When $x_k \notin S$, we have from Lemma 3 that

$\Delta_{k,l} \ge (1/4)\min(E_k,\, F_k/\beta) \stackrel{\mathrm{def}}{=} \Delta_{\mathrm{lo}} > 0$, for all $l = 0, 1, 2, \dots$,

so the right-hand side of (33) is uniformly positive (independently of $l$). However, (54) indicates that we can make $Q(x_k) - m_{k,l_j}(x_{k,l_j})$ arbitrarily small by choosing $j$ sufficiently large, contradicting (33).

For the case of $x_k \in S$, there are two possibilities. If the inner iteration sequence terminates finitely at some $x_{k,l}$, we must have $Q(x_k) - m_{k,l}(x_{k,l}) = 0$. Hence, from (22), we have

$Q(x) \ge m_{k,l}(x) \ge Q(x_k) = Q^*$, for all feasible $x$ with $\|x - x_k\|_\infty \le \Delta_{k,l}$.

Therefore, termination under these circumstances yields a guarantee that $x_k \in S$. When the algorithm does not terminate, it follows from (54) that $Q(x_k) - m_{k,l}(x_{k,l}) \to 0$. By applying Lemma 1, we verify that the convergence is monotonic. ∎

We now prove convergence of Algorithm TR to $S$.

Theorem 2. Suppose that $\epsilon_{\mathrm{tol}} = 0$. The sequence of major iterates $\{x_k\}$ is either finite, terminating at some $x_k \in S$, or infinite with the property that $\|x_k - P(x_k)\|_\infty \to 0$.

Proof. If the claim does not hold, there are two possibilities. The first is that the sequence of major iterations terminates finitely at some $x_k \notin S$. However, Theorem 1 ensures that the minor iteration sequence will terminate at some new major iterate $x_{k+1}$ under these circumstances, so we can rule out this possibility. The second possibility is that the sequence $\{x_k\}$ is infinite but there is some $\epsilon > 0$ and an infinite subsequence of indices $\{k_j\}_{j=1,2,\dots}$ such that

$\|x_{k_j} - P(x_{k_j})\|_\infty \ge \epsilon, \quad j = 1, 2, \dots$.

Since the sequence $\{Q(x_{k_j})\}_{j=1,2,\dots}$ is infinite, decreasing, and bounded below, it converges to some value $\bar Q > Q^*$. Moreover, since the entire sequence $\{Q(x_k)\}$ is monotone decreasing, it follows that

$Q(x_k) - Q^* \ge \bar Q - Q^* > 0, \quad k = 0, 1, 2, \dots$.

Hence, by boundedness of the subgradients (see (30)), and using the definitions (36) and (37), we can identify a constant $\bar\epsilon > 0$ such that $E_k \ge \bar\epsilon$ and $F_k \ge \bar\epsilon$ for all $k$. Therefore, by Lemma 2, we have

$Q(x_k) - m_{k,l}(x_{k,l}) \ge \min(\Delta_{k,l}/\bar\epsilon,\, 1)\,[\bar Q - Q^*], \quad k = 0, 1, 2, \dots$.  (55)

For each major iteration index $k$, let $l(k)$ be the minor iteration index that passes the acceptance test (19). By combining (19) with (55), we have that

$Q(x_k) - Q(x_{k+1}) \ge \xi \min(\Delta_{k,l(k)}/\bar\epsilon,\, 1)\,[\bar Q - Q^*]$.

Since $Q(x_k) - Q(x_{k+1}) \to 0$, we deduce that $\lim_k \Delta_{k,l(k)} = 0$. However, since $E_k$ and $F_k$ are bounded away from 0, we have from Lemma 3 that $\Delta_{k,l}$ is bounded away from 0, giving a contradiction. We conclude that the second possibility (an infinite sequence $\{x_k\}$ not converging to $S$) cannot occur either, so the proof is complete. ∎

It is easy to show that the algorithm terminates finitely when $\epsilon_{\mathrm{tol}} > 0$. The argument in the proof of Theorem 1 shows that either the test (27) is satisfied at some minor iteration, or the algorithm identifies a new major iterate. Since the amount of reduction at each major iteration is at least $\xi \epsilon_{\mathrm{tol}}$ (from (19)), and since we assume that a solution set exists, the number of major iterations must be finite.

Discussion

If a 2-norm trust region is used in place of the $\infty$-norm trust region of (16), it is well known that the solution of the subproblem

$\min_x m_{k,l}(x)$ subject to $Ax = b$, $x \ge 0$, $\|x - x_k\|_2 \le \Delta_k$

is identical to the solution of

$\min_x m_{k,l}(x) + \lambda \|x - x_k\|_2^2$ subject to $Ax = b$, $x \ge 0$,  (56)

for some $\lambda \ge 0$. We can transform (56) to a quadratic program in the same fashion as the transformation of (16) to (17). The regularized or proximal bundle approaches described in Kiwiel [15], Hiriart-Urruty and Lemaréchal [14, Chapter XV], and Ruszczyński [20,21] work with the formulation (56). They manipulate the parameter $\lambda$ directly rather than adjusting the trust-region radius, more in the spirit of the Levenberg-Marquardt method for least-squares problems than of a true trust-region method.

We chose to devise and analyze an algorithm based on the $\infty$-norm trust region for two reasons. First, the linear-programming trust-region subproblems (17) can be solved by high-quality linear programming software, making the algorithm much easier to implement than the specialized quadratic programming solver required for (56). Although it is well known that the 2-norm trust region often yields a better search direction than the $\infty$-norm trust region when the objective is smooth, it is not clear whether the same property holds for the function $Q$, which is piecewise linear with a great many pieces.
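The practical appeal of the $\infty$-norm choice is that the trust region adds only bound constraints to the cutting-plane LP. As a minimal illustration (our own sketch, not the paper's implementation; one dimension, and omitting the constraints $Ax = b$, $x \ge 0$), the subproblem $\min m_{k,l}(x)$ subject to $|x - x_k| \le \Delta$ can be solved by enumerating the trust-region endpoints and the kinks of the piecewise-linear model:

```python
# Sketch: 1-D l_inf trust-region subproblem over a cutting-plane model.
# The model m(x) = max_l (g_l * x + f_l) is piecewise linear, so on the
# interval [xk - delta, xk + delta] its minimizer lies at an endpoint or
# at an intersection of two cuts; enumeration replaces an LP solver here.

def model(cuts, x):
    """Cutting-plane model m(x) = max over cuts (g*x + f)."""
    return max(g * x + f for g, f in cuts)

def trust_region_subproblem(cuts, xk, delta):
    """Minimize the model over the box [xk - delta, xk + delta]."""
    lo, hi = xk - delta, xk + delta
    candidates = [lo, hi]
    # Intersections of pairs of cuts are the only interior kinks of m.
    for i, (g1, f1) in enumerate(cuts):
        for g2, f2 in cuts[i + 1:]:
            if g1 != g2:
                x = (f2 - f1) / (g1 - g2)
                if lo < x < hi:
                    candidates.append(x)
    return min(candidates, key=lambda x: model(cuts, x))

# Two cuts describing Q(x) = |x| (subgradients taken at x = -1 and x = +1).
cuts = [(-1.0, 0.0), (1.0, 0.0)]
x_new = trust_region_subproblem(cuts, xk=2.0, delta=1.0)
print(x_new)  # the box [1, 3] keeps the step at the boundary, x = 1.0
```

In higher dimensions the same box constraints simply become variable bounds in the LP (17), which is the ease-of-implementation point made above.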
Our second reason was that the convergence analysis of the $\infty$-norm algorithm differs markedly from that of the regularized methods presented in [15,14,20,21], making this project interesting from the theoretical point of view as well as computationally.

Finally, we note that aggregation of cuts, which is a feature of the regularized methods mentioned above and which is useful in limiting storage requirements, can also be performed to some extent in Algorithm TR. In Procedure Model-Update, we still need to retain the cuts generated at $x_k$ and at the earlier minor iterations $l$ satisfying (20). However, the cuts active at the solution of the subproblem (17) can be aggregated into $C$ cuts, one for each index $j = 1, 2, \dots, C$. To describe the aggregation, we use the alternative form (32) of the subproblem (17), from which the inactive cuts have been removed. Denoting the Lagrange multiplier vectors for the constraints (32b) by $\lambda_j$, $j = 1, 2, \dots, C$, we have from the optimality conditions for (32b) that $\lambda_j \ge 0$ and $e^T \lambda_j = 1$,

$j = 1, 2, \dots, C$. Moreover, if we replace the constraints (32b) by the $C$ aggregated constraints

$\theta_j \ge \lambda_j^T F_{[j]}\, x + \lambda_j^T f_{[j]}, \quad j = 1, 2, \dots, C$,  (57)

then the solution of (32) and its optimal objective value are unchanged. Hence, in Procedure Model-Update, we can delete the "else if" clause concerning the constraints active in (17), and insert the addition of the cuts (57) at the end of the procedure.

4. An Asynchronous Bundle-Trust-Region Method

In this section we present an asynchronous, parallel version of the trust-region algorithm of the preceding section and analyze its convergence properties.

4.1 Algorithm ATR

We now define a variant of the method of Section 3 that allows the partial sums $Q_{[j]}$, $j = 1, 2, \dots, C$ (see (7)) and their associated cuts to be evaluated simultaneously for different values of $x$. We generate candidate iterates by solving trust-region subproblems centered on an incumbent iterate, which (after a startup phase) is, roughly speaking, the best point among those visited by the algorithm whose function value $Q(x)$ is fully known. By performing evaluations of $Q$ at different points concurrently, we relax the strict synchronicity requirement of Algorithm TR, which must evaluate $Q(x_k)$ fully before generating the next candidate $x_{k+1}$. The resulting approach, which we call Algorithm ATR (for "asynchronous TR"), is more suitable for implementation on computational grids of the type we consider here. Besides the obvious increase in parallelism that comes with evaluating several points at once, there is no longer a risk of the entire computation being held up by the slow evaluation of one of the partial sums $Q_{[j]}$ on a recalcitrant worker.
Algorithm ATR has theoretical properties similar to those of Algorithm TR, since the mechanisms for accepting a point as the new incumbent, adjusting the size of the trust region, and adding and deleting cuts are all similar to the corresponding mechanisms in Algorithm TR.

Algorithm ATR maintains a basket $\mathcal{B}$ of at most $K$ points for which the value of $Q$ and the associated subgradient information is partially known. When the evaluation of $Q(x_q)$ is completed for a particular point $x_q$ in the basket, $x_q$ is installed as the new incumbent if (i) its objective value is smaller than that of the current incumbent $x_I$; and (ii) it passes a trust-region acceptance test like (19), with the incumbent at the time $x_q$ was generated playing the role of the previous major iterate in Algorithm TR. Whether $x_q$ becomes the incumbent or not, it is removed from the basket.

When a vacancy arises in the basket, we may generate a new point by solving a trust-region subproblem similar to (16), centering the trust region at the

current incumbent $x_I$. During the startup phase, while the basket is being populated, we wait until the evaluation of some other point in the basket has reached a certain level of completion (that is, until a proportion $\sigma \in (0, 1]$ of the partial sums (7) and their subgradients have been evaluated) before generating a new point. We use a logical variable speceval$_q$ to indicate when the evaluation of $x_q$ passes the specified threshold and to ensure that $x_q$ does not trigger the evaluation of more than one new iterate. (Both $\sigma$ and speceval$_q$ play a similar role in Algorithm ALS.) After the startup phase is complete (that is, after the basket has been filled), vacancies arise only when the evaluation of an iterate $x_q$ is completed.

We use $m(\cdot)$ (without subscripts) to denote the model function for $Q(\cdot)$. When generating a new iterate, we use whatever cuts are stored at the time to define $m$. When solved around the incumbent $x_I$ with trust-region radius $\Delta$, the subproblem is as follows:

trsub($x_I$, $\Delta$):  $\min_x m(x)$ subject to $Ax = b$, $x \ge 0$, $\|x - x_I\|_\infty \le \Delta$.  (58)

We refer to $x_I$ as the parent incumbent of the solution of (58).

In the following description, we use $k$ to index the successive points $x_k$ explored by the algorithm, $I$ to denote the index of the incumbent, and $\mathcal{B}$ to denote the basket. As in the description of ALS, we use $T_1, T_2, \dots, T_T$ to denote a partition of $\{1, 2, \dots, C\}$ such that the $r$th computational task consists of the clusters $j \in T_r$ (that is, evaluation of the partial sums $Q_{[j]}$, $j \in T_r$, and their subgradients). We use $t_k$ to count the number of tasks for the evaluation of $Q(x_k)$ that have been completed so far.

Given a starting guess $x_0$, we initialize the algorithm by setting the dummy point $x_{-1}$ to $x_0$, setting the incumbent index $I$ to $-1$, and setting the initial incumbent value $Q_I = Q_{-1}$ to $+\infty$. The iterate at which the first evaluation is completed becomes the first serious incumbent.
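To make the two-part incumbent-update rule described above concrete, here is a small sketch (function and parameter names are ours, not taken from the paper's code): a completed iterate becomes the incumbent only if it beats the current incumbent value and also achieves at least a fraction $\xi$ of the decrease its model predicted relative to its parent incumbent.

```python
# Hedged sketch of ATR's incumbent acceptance test.

def accept_as_incumbent(Q_q, Q_I, Q_parent, m_q, xi=0.25, parent_is_dummy=False):
    """Return True if a fully evaluated iterate x_q should become incumbent.

    Q_q      -- true objective value Q(x_q), now fully evaluated
    Q_I      -- objective value of the current incumbent
    Q_parent -- objective value of x_q's parent incumbent Q_{I_q}
    m_q      -- optimal model value when x_q was generated
    """
    if Q_q >= Q_I:
        return False            # condition (i) fails: no improvement
    if parent_is_dummy:
        return True             # startup: any fully evaluated point wins
    predicted = Q_parent - m_q  # decrease the model promised
    achieved = Q_parent - Q_q   # decrease actually obtained
    return achieved >= xi * predicted  # acceptance test like (19)

print(accept_as_incumbent(Q_q=4.0, Q_I=5.0, Q_parent=5.0, m_q=1.0))  # True
print(accept_as_incumbent(Q_q=4.9, Q_I=5.0, Q_parent=5.0, m_q=1.0))  # False
```

In the second call the point improves on the incumbent (4.9 < 5.0) but realizes only 0.1 of the 4.0 decrease the model predicted, so it is discarded from the basket without becoming incumbent.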
We now outline some other notation used in specifying Algorithm ATR:

$Q_I$: The objective value of the incumbent $x_I$, except in the case $I = -1$, in which case $Q_{-1} = +\infty$.

$I_q$: The index of the parent incumbent of $x_q$, that is, the incumbent index $I$ at the time $x_q$ was generated from (58). Hence $Q_{I_q} = Q(x_{I_q})$ (except when $I_q = -1$; see the previous item).

$\Delta_q$: The value of the trust-region radius used when solving for $x_q$.

$\Delta_{\mathrm{curr}}$: The current value of the trust-region radius. When it comes time to solve (58) to obtain a new iterate $x_q$, we set $\Delta_q \leftarrow \Delta_{\mathrm{curr}}$.

$m_q$: The optimal value of the objective function $m$ in the subproblem trsub($x_{I_q}$, $\Delta_q$) (58).

Our strategy for maintaining the model closely follows that of Algorithm TR. Whenever the incumbent changes, we have a fairly free hand in deleting the cuts that define $m$, just as we do after accepting a new major iterate in Algorithm TR. If the incumbent does not change for a long sequence of iterations (corresponding to a long sequence of minor iterations in Algorithm TR), we can still delete

stale cuts that represent information in $m$ that has likely been superseded (as quantified by a parameter $\eta \in [0, 1)$). The following version of Procedure Model-Update, which applies to Algorithm ATR, takes as an argument the index $k$ of the latest iterate generated by the algorithm. It is called after the evaluation of $Q$ at an earlier iterate $x_q$ has just been completed, but $x_q$ does not meet the conditions needed to become the new incumbent.

Procedure Model-Update ($k$)
  for each optimality cut defining $m$
    possible_delete $\leftarrow$ true;
    if the cut was generated at the parent incumbent $I_k$ of $k$
      possible_delete $\leftarrow$ false;
    else if the cut was active at the solution $x_k$ of trsub($x_{I_k}$, $\Delta_k$)
      possible_delete $\leftarrow$ false;
    else if the cut was generated at an earlier iteration $l$ such that $I_l = I_k$ and
        $Q_{I_k} - m_k > \eta\,[Q_{I_k} - m_l]$  (59)
      possible_delete $\leftarrow$ false;
    end (if)
    if possible_delete
      possibly delete the cut;
  end (for each)

Our strategy for adjusting the trust region $\Delta_{\mathrm{curr}}$ also follows that of Algorithm TR. The differences arise from the fact that, between the time an iterate $x_q$ is generated and the time its function value $Q(x_q)$ becomes known, other adjustments of $\Delta_{\mathrm{curr}}$ may have occurred, as the evaluation of intervening iterates is completed. The version of Procedure Reduce-$\Delta$ for Algorithm ATR is as follows.

Procedure Reduce-$\Delta$ ($q$)
  if $I_q = -1$
    return;
  evaluate
    $\rho = \min(1, \Delta_q)\, \dfrac{Q(x_q) - Q_{I_q}}{Q_{I_q} - m_q}$;  (60)
  if $\rho > 0$
    counter $\leftarrow$ counter + 1;
  if $\rho > 3$ or (counter $\ge 3$ and $\rho \in (1, 3]$)
    set $\Delta_q^+ \leftarrow \Delta_q / \min(\rho, 4)$;
    set $\Delta_{\mathrm{curr}} \leftarrow \min(\Delta_{\mathrm{curr}}, \Delta_q^+)$;
    reset counter $\leftarrow 0$;
  return.

The protocol for increasing the trust region after a successful step is based on (25) and (26). If, on completion of the evaluation of $Q(x_q)$, the iterate $x_q$ becomes the new incumbent, then we test the following condition:

$Q(x_q) - Q_{I_q} \le -0.5\,(Q_{I_q} - m_q)$ and $\|x_q - x_{I_q}\|_\infty = \Delta_q$.  (61)

If this condition is satisfied, we set

$\Delta_{\mathrm{curr}} \leftarrow \max\bigl(\Delta_{\mathrm{curr}}, \min(\Delta_{\mathrm{hi}}, 2\Delta_q)\bigr)$.  (62)

The convergence test is also similar to the test (27) used for Algorithm TR. We terminate if, on generation of a new iterate $x_k$, we find that

$Q_I - m_k \le \epsilon_{\mathrm{tol}}(1 + |Q_I|)$.  (63)

We now specify the four key routines of Algorithm ATR, which serve a similar function to the four main routines of Algorithm ALS. The routine partial_evaluate defines a single task that executes on worker processors, while the other three routines execute on the master processor.

ATR: partial_evaluate($x_q$, $q$, $r$)
  Given $x_q$, index $q$, and task index $r$, evaluate $Q_{[j]}(x_q)$ from (7) for each $j \in T_r$, together with the partial subgradients $g_j$ from (9);
  Activate act_on_completed_task($x_q$, $q$, $r$) on the master processor.

ATR: evaluate($x_q$, $q$)
  for $r = 1, 2, \dots, T$ (possibly concurrently)
    partial_evaluate($x_q$, $q$, $r$);
  end (for)

ATR: initialization($x_0$)
  determine the number of clusters $C$ and the number of tasks $T$, and the partitions $N_1, N_2, \dots, N_C$ and $T_1, T_2, \dots, T_T$;
  choose $\xi \in (0, 1/2)$ and the trust-region upper bound $\Delta_{\mathrm{hi}} > 0$;
  choose the synchronicity parameter $\sigma \in (0, 1]$;
  choose the maximum basket size $K > 0$;
  choose $\Delta_{\mathrm{curr}} \in (0, \Delta_{\mathrm{hi}}]$; counter $\leftarrow 0$;
  $\mathcal{B} \leftarrow \emptyset$; $I \leftarrow -1$; $x_{-1} \leftarrow x_0$; $Q_{-1} \leftarrow +\infty$; $I_0 \leftarrow -1$; $k \leftarrow 0$;
  speceval$_0$ $\leftarrow$ false; $t_0 \leftarrow 0$;
  evaluate($x_0$, $0$).

ATR: act_on_completed_task($x_q$, $q$, $r$)
  $t_q \leftarrow t_q + 1$;
  for each $j \in T_r$, add $Q_{[j]}(x_q)$ and the cut $g_j$ to the model $m$;
  basketfill $\leftarrow$ false; basketupdate $\leftarrow$ false;
  if $t_q = T$
    (* evaluation of $Q(x_q)$ is complete *)
    if $Q(x_q) < Q_I$ and ($I_q = -1$ or $Q(x_q) \le Q_{I_q} - \xi(Q_{I_q} - m_q)$)
      (* make $x_q$ the new incumbent *)
      $I \leftarrow q$; $Q_I \leftarrow Q(x_I)$;
      possibly increase $\Delta_{\mathrm{curr}}$ according to (61) and (62);
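The paper's implementation uses the MW library on a Condor grid, not Python; purely as an illustration of the control flow, the following sketch mimics the partial_evaluate / act_on_completed_task pattern with a thread pool. A toy quadratic stands in for the second-stage partial sums, and all names here are ours.

```python
# Sketch: one task per cluster of scenarios; the master acts on each task
# as it completes (as_completed), so a slow task never blocks the others.
from concurrent.futures import ThreadPoolExecutor, as_completed

def partial_evaluate(x, cluster):
    # Stand-in for solving the second-stage LPs in one cluster; here each
    # "scenario" s contributes (x - s)^2 so the example is self-contained.
    return sum((x - s) ** 2 for s in cluster)

def evaluate_async(x, clusters):
    """Evaluate Q(x) as the sum of partial sums, one task per cluster."""
    total, done = 0.0, 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(partial_evaluate, x, c) for c in clusters]
        for fut in as_completed(futures):  # act_on_completed_task
            total += fut.result()          # add this cluster's contribution
            done += 1                      # t_q <- t_q + 1
    return total, done                     # done == T: Q(x) is complete

clusters = [[0.0, 1.0], [2.0], [3.0, 4.0]]  # a partition of 5 "scenarios"
Q, T_done = evaluate_async(1.0, clusters)
print(Q, T_done)  # 1.0 + 1.0 + 13.0 = 15.0, across 3 completed tasks
```

In ATR the master would additionally add the returned cuts to the model and, once `done == T`, run the incumbent test and trust-region update; here we only total the partial sums to show the asynchronous completion handling.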


More information

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624

More information

Outline. Relaxation. Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING. 1. Lagrangian Relaxation. Lecture 12 Single Machine Models, Column Generation

Outline. Relaxation. Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING. 1. Lagrangian Relaxation. Lecture 12 Single Machine Models, Column Generation Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING 1. Lagrangian Relaxation Lecture 12 Single Machine Models, Column Generation 2. Dantzig-Wolfe Decomposition Dantzig-Wolfe Decomposition Delayed Column

More information

Computer Sciences Department

Computer Sciences Department Computer Sciences Department Solving Large Steiner Triple Covering Problems Jim Ostrowski Jeff Linderoth Fabrizio Rossi Stefano Smriglio Technical Report #1663 September 2009 Solving Large Steiner Triple

More information

Implications of the Constant Rank Constraint Qualification

Implications of the Constant Rank Constraint Qualification Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory

Preprint ANL/MCS-P , Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory Preprint ANL/MCS-P1015-1202, Dec 2002 (Revised Nov 2003, Mar 2004) Mathematics and Computer Science Division Argonne National Laboratory A GLOBALLY CONVERGENT LINEARLY CONSTRAINED LAGRANGIAN METHOD FOR

More information

Scenario decomposition of risk-averse two stage stochastic programming problems

Scenario decomposition of risk-averse two stage stochastic programming problems R u t c o r Research R e p o r t Scenario decomposition of risk-averse two stage stochastic programming problems Ricardo A Collado a Dávid Papp b Andrzej Ruszczyński c RRR 2-2012, January 2012 RUTCOR Rutgers

More information

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION Optimization Technical Report 02-09, October 2002, UW-Madison Computer Sciences Department. E. Michael Gertz 1 Philip E. Gill 2 A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION 7 October

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding

CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding CS264: Beyond Worst-Case Analysis Lecture #11: LP Decoding Tim Roughgarden October 29, 2014 1 Preamble This lecture covers our final subtopic within the exact and approximate recovery part of the course.

More information

Solving Dual Problems

Solving Dual Problems Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem

More information

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A

More information

Generation and Representation of Piecewise Polyhedral Value Functions

Generation and Representation of Piecewise Polyhedral Value Functions Generation and Representation of Piecewise Polyhedral Value Functions Ted Ralphs 1 Joint work with Menal Güzelsoy 2 and Anahita Hassanzadeh 1 1 COR@L Lab, Department of Industrial and Systems Engineering,

More information

An Enhanced Spatial Branch-and-Bound Method in Global Optimization with Nonconvex Constraints

An Enhanced Spatial Branch-and-Bound Method in Global Optimization with Nonconvex Constraints An Enhanced Spatial Branch-and-Bound Method in Global Optimization with Nonconvex Constraints Oliver Stein Peter Kirst # Paul Steuermann March 22, 2013 Abstract We discuss some difficulties in determining

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Regularized optimization techniques for multistage stochastic programming

Regularized optimization techniques for multistage stochastic programming Regularized optimization techniques for multistage stochastic programming Felipe Beltrán 1, Welington de Oliveira 2, Guilherme Fredo 1, Erlon Finardi 1 1 UFSC/LabPlan Universidade Federal de Santa Catarina

More information

Scenario Grouping and Decomposition Algorithms for Chance-constrained Programs

Scenario Grouping and Decomposition Algorithms for Chance-constrained Programs Scenario Grouping and Decomposition Algorithms for Chance-constrained Programs Siqian Shen Dept. of Industrial and Operations Engineering University of Michigan Joint work with Yan Deng (UMich, Google)

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Lecture 3. 1 Terminology. 2 Non-Deterministic Space Complexity. Notes on Complexity Theory: Fall 2005 Last updated: September, 2005.

Lecture 3. 1 Terminology. 2 Non-Deterministic Space Complexity. Notes on Complexity Theory: Fall 2005 Last updated: September, 2005. Notes on Complexity Theory: Fall 2005 Last updated: September, 2005 Jonathan Katz Lecture 3 1 Terminology For any complexity class C, we define the class coc as follows: coc def = { L L C }. One class

More information

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

A Parametric Simplex Algorithm for Linear Vector Optimization Problems A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear

More information

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017 Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training

More information

Disconnecting Networks via Node Deletions

Disconnecting Networks via Node Deletions 1 / 27 Disconnecting Networks via Node Deletions Exact Interdiction Models and Algorithms Siqian Shen 1 J. Cole Smith 2 R. Goli 2 1 IOE, University of Michigan 2 ISE, University of Florida 2012 INFORMS

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Analytic Center Cutting-Plane Method

Analytic Center Cutting-Plane Method Analytic Center Cutting-Plane Method S. Boyd, L. Vandenberghe, and J. Skaf April 14, 2011 Contents 1 Analytic center cutting-plane method 2 2 Computing the analytic center 3 3 Pruning constraints 5 4 Lower

More information

Proximal and First-Order Methods for Convex Optimization

Proximal and First-Order Methods for Convex Optimization Proximal and First-Order Methods for Convex Optimization John C Duchi Yoram Singer January, 03 Abstract We describe the proximal method for minimization of convex functions We review classical results,

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

The Simplex Method: An Example

The Simplex Method: An Example The Simplex Method: An Example Our first step is to introduce one more new variable, which we denote by z. The variable z is define to be equal to 4x 1 +3x 2. Doing this will allow us to have a unified

More information

Optimization Methods in Logic

Optimization Methods in Logic Optimization Methods in Logic John Hooker Carnegie Mellon University February 2003, Revised December 2008 1 Numerical Semantics for Logic Optimization can make at least two contributions to boolean logic.

More information

On a class of minimax stochastic programs

On a class of minimax stochastic programs On a class of minimax stochastic programs Alexander Shapiro and Shabbir Ahmed School of Industrial & Systems Engineering Georgia Institute of Technology 765 Ferst Drive, Atlanta, GA 30332. August 29, 2003

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

Problem List MATH 5143 Fall, 2013

Problem List MATH 5143 Fall, 2013 Problem List MATH 5143 Fall, 2013 On any problem you may use the result of any previous problem (even if you were not able to do it) and any information given in class up to the moment the problem was

More information

Can Li a, Ignacio E. Grossmann a,

Can Li a, Ignacio E. Grossmann a, A generalized Benders decomposition-based branch and cut algorithm for two-stage stochastic programs with nonconvex constraints and mixed-binary first and second stage variables Can Li a, Ignacio E. Grossmann

More information

A PARALLEL INTERIOR POINT DECOMPOSITION ALGORITHM FOR BLOCK-ANGULAR SEMIDEFINITE PROGRAMS IN POLYNOMIAL OPTIMIZATION

A PARALLEL INTERIOR POINT DECOMPOSITION ALGORITHM FOR BLOCK-ANGULAR SEMIDEFINITE PROGRAMS IN POLYNOMIAL OPTIMIZATION A PARALLEL INTERIOR POINT DECOMPOSITION ALGORITHM FOR BLOCK-ANGULAR SEMIDEFINITE PROGRAMS IN POLYNOMIAL OPTIMIZATION Kartik K. Sivaramakrishnan Department of Mathematics North Carolina State University

More information

Solving large Semidefinite Programs - Part 1 and 2

Solving large Semidefinite Programs - Part 1 and 2 Solving large Semidefinite Programs - Part 1 and 2 Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Singapore workshop 2006 p.1/34 Overview Limits of Interior

More information

On the Relative Strength of Split, Triangle and Quadrilateral Cuts

On the Relative Strength of Split, Triangle and Quadrilateral Cuts On the Relative Strength of Split, Triangle and Quadrilateral Cuts Amitabh Basu Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA 53 abasu@andrew.cmu.edu Pierre Bonami LIF, Faculté

More information

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes

Optimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation Viveck R. Cadambe EE Department, Pennsylvania State University, University Park, PA, USA viveck@engr.psu.edu Nancy Lynch

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees

A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Noname manuscript No. (will be inserted by the editor) A Quasi-Newton Algorithm for Nonconvex, Nonsmooth Optimization with Global Convergence Guarantees Frank E. Curtis Xiaocun Que May 26, 2014 Abstract

More information

Date: July 5, Contents

Date: July 5, Contents 2 Lagrange Multipliers Date: July 5, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 14 2.3. Informative Lagrange Multipliers...........

More information

Benders Decomposition Methods for Structured Optimization, including Stochastic Optimization

Benders Decomposition Methods for Structured Optimization, including Stochastic Optimization Benders Decomposition Methods for Structured Optimization, including Stochastic Optimization Robert M. Freund April 29, 2004 c 2004 Massachusetts Institute of echnology. 1 1 Block Ladder Structure We consider

More information

Constraint Identification and Algorithm Stabilization for Degenerate Nonlinear Programs

Constraint Identification and Algorithm Stabilization for Degenerate Nonlinear Programs Preprint ANL/MCS-P865-1200, Dec. 2000 (Revised Nov. 2001) Mathematics and Computer Science Division Argonne National Laboratory Stephen J. Wright Constraint Identification and Algorithm Stabilization for

More information

INEXACT CUTS IN BENDERS' DECOMPOSITION GOLBON ZAKERI, ANDREW B. PHILPOTT AND DAVID M. RYAN y Abstract. Benders' decomposition is a well-known techniqu

INEXACT CUTS IN BENDERS' DECOMPOSITION GOLBON ZAKERI, ANDREW B. PHILPOTT AND DAVID M. RYAN y Abstract. Benders' decomposition is a well-known techniqu INEXACT CUTS IN BENDERS' DECOMPOSITION GOLBON ZAKERI, ANDREW B. PHILPOTT AND DAVID M. RYAN y Abstract. Benders' decomposition is a well-known technique for solving large linear programs with a special

More information

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems

On the Power of Robust Solutions in Two-Stage Stochastic and Adaptive Optimization Problems MATHEMATICS OF OPERATIONS RESEARCH Vol. xx, No. x, Xxxxxxx 00x, pp. xxx xxx ISSN 0364-765X EISSN 156-5471 0x xx0x 0xxx informs DOI 10.187/moor.xxxx.xxxx c 00x INFORMS On the Power of Robust Solutions in

More information

Stochastic Decomposition

Stochastic Decomposition IE 495 Lecture 18 Stochastic Decomposition Prof. Jeff Linderoth March 26, 2003 March 19, 2003 Stochastic Programming Lecture 17 Slide 1 Outline Review Monte Carlo Methods Interior Sampling Methods Stochastic

More information

CMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017

CMSC 451: Lecture 7 Greedy Algorithms for Scheduling Tuesday, Sep 19, 2017 CMSC CMSC : Lecture Greedy Algorithms for Scheduling Tuesday, Sep 9, 0 Reading: Sects.. and. of KT. (Not covered in DPV.) Interval Scheduling: We continue our discussion of greedy algorithms with a number

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

Decomposition Algorithms for Two-Stage Distributionally Robust Mixed Binary Programs

Decomposition Algorithms for Two-Stage Distributionally Robust Mixed Binary Programs Decomposition Algorithms for Two-Stage Distributionally Robust Mixed Binary Programs Manish Bansal Grado Department of Industrial and Systems Engineering, Virginia Tech Email: bansal@vt.edu Kuo-Ling Huang

More information

arxiv: v1 [cs.cc] 5 Dec 2018

arxiv: v1 [cs.cc] 5 Dec 2018 Consistency for 0 1 Programming Danial Davarnia 1 and J. N. Hooker 2 1 Iowa state University davarnia@iastate.edu 2 Carnegie Mellon University jh38@andrew.cmu.edu arxiv:1812.02215v1 [cs.cc] 5 Dec 2018

More information

Recoverable Robustness in Scheduling Problems

Recoverable Robustness in Scheduling Problems Master Thesis Computing Science Recoverable Robustness in Scheduling Problems Author: J.M.J. Stoef (3470997) J.M.J.Stoef@uu.nl Supervisors: dr. J.A. Hoogeveen J.A.Hoogeveen@uu.nl dr. ir. J.M. van den Akker

More information