A Filter Active-Set Algorithm for Ball/Sphere Constrained Optimization Problem


A Filter Active-Set Algorithm for Ball/Sphere Constrained Optimization Problem

Chungen Shen, Lei-Hong Zhang, Wei Hong Yang

September 6, 2014

Abstract. In this paper, we propose a filter active-set algorithm for the minimization problem over a product of multiple ball/sphere constraints. By making effective use of the special structure of the ball/sphere constraints, a new limited memory BFGS (L-BFGS) scheme is presented. The new L-BFGS implementation takes advantage of the sparse structure of the Jacobian of the constraints and generates curvature information of the minimization problem. At each iteration, only two or three reduced linear systems need to be solved for the search direction. A filter technique, combined with a backtracking line search strategy, ensures global convergence, and local superlinear convergence can also be established under mild conditions. The algorithm is applied to two specific applications, the nearest correlation matrix with factor structure and the maximal correlation problem. Our numerical experiments indicate that the proposed algorithm is competitive with some recently custom-designed methods for each individual application.

Keywords. SQP, active set, filter, L-BFGS, ball/sphere constraints, the nearest correlation matrix with factor structure, the maximal correlation problem

AMS subject classification. 65K05, 90C30

1 Introduction

In this paper, we consider a class of optimization problems of minimizing an (at least) twice continuously differentiable function (possibly nonconvex) f(x) : R^n → R over a product of multiple ball/sphere constraints. Upon rescaling the balls/spheres, we cast, without loss of generality, this class of minimization problems in the following form:

(BCOP)  min_{x ∈ R^n} f(x)
        s.t. c_i(x) := ‖x_[i]‖^2 − 1 = 0, i ∈ E,
             c_i(x) := ‖x_[i]‖^2 − 1 ≤ 0, i ∈ I,

where E = {1, 2, ..., m_1}, I = {m_1 + 1, m_1 + 2, ..., m}, x_[i] ∈ R^{p_i}, x = (x_[1]^T, x_[2]^T, ..., x_[m]^T)^T, and n = Σ_{i=1}^m p_i.
Here, we introduce the notation x_[i] ∈ R^{p_i} to represent the ith sub-vector of x ∈ R^n, and formulate the product of multiple ball/sphere constraints as a set of equality and inequality constraints. To simplify the subsequent presentation, we call the above program the ball/sphere constrained optimization problem (BCOP).

This research is supported by the National Natural Science Foundation of China (Nos. , , and ). Department of Applied Mathematics, Shanghai Finance University, Shanghai 201209, China. School of Mathematics, Shanghai University of Finance and Economics, Shanghai 200433, China. Department of Mathematics, Fudan University, Shanghai 200433, China.
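As an illustration of the block structure just introduced, the following sketch (not from the paper; block sizes and the test point are made-up example values) evaluates the constraint functions c_i(x) = ‖x_[i]‖^2 − 1 from a stacked vector x:

```python
import numpy as np

def block_slices(p):
    """Index slices for the sub-vectors x_[1], ..., x_[m] given sizes p_1, ..., p_m."""
    offsets = np.cumsum([0] + list(p))
    return [slice(offsets[i], offsets[i + 1]) for i in range(len(p))]

def constraints(x, p):
    """c(x) = (||x_[1]||^2 - 1, ..., ||x_[m]||^2 - 1)."""
    return np.array([x[s] @ x[s] - 1.0 for s in block_slices(p)])

p = [2, 3, 2]                       # example block sizes p_1, p_2, p_3
x = np.concatenate([[3.0, 4.0],     # ||x_[1]||^2 = 25: constraint violated
                    [1.0, 0.0, 0.0],  # exactly on the sphere
                    [0.5, 0.5]])      # strictly inside the ball
c = constraints(x, p)               # -> [24.0, 0.0, -0.5]
```

A block i belongs to E (sphere) or I (ball) only in how c_i(x) is treated, as an equality or an inequality; the evaluation itself is identical.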

The reason we are interested in BCOP is twofold: on the one hand, many practical applications arising recently from, for example, correlation matrix approximation with factor structure [3, ], factor models of asset returns [9], collateralized debt obligations [, 10], multivariate time series [5] and the maximal correlation problem [7, 43, 44] can be recast in this form; on the other hand, general algorithms for nonlinearly constrained optimization may not be efficient, as they generally do not take much advantage of the special structure of BCOP. Therefore, a custom-made algorithm for BCOP can provide a uniform and much more efficient tool for these applications. Relying upon the framework of the sequential quadratic programming (SQP) method, e.g., [4, 16, 17, 18, 4, 7, 35, 36], and making heavy use of the special structure of BCOP, we will refine the SQP method into a custom-made implementation. It is known that SQP is one of the most widely used methods for general nonlinearly constrained optimization. In particular, it generates steps by solving quadratic programming subproblems (QPs). The traditional SQP method (see, e.g., [16]) takes a certain penalty function as the merit function to determine whether a trial step is accepted. One known problem with this procedure is that a suitable penalty parameter is difficult to set. To get around that trouble, Fletcher and Leyffer [13] introduced the filter technique to globalize the SQP method, which turns out to be very efficient and effective, and is proved to be globally convergent [1, 14]. The filter technique was later applied to various problems and combined with other methods; examples include Ulbrich et al. [37], Karas et al. [1], Ribeiro et al. [3], and Wächter and Biegler [38, 39, 40]. Unfortunately, when directly applied to BCOP, the classical SQP method based on QP subproblems encounters numerical difficulties as m and p_i get large.
For instance, in the problem of the nearest correlation matrix with p (= p_i, i = 1, 2, ..., m) factors [3, ], to be discussed in Section 5 (see (5.68)), solving the corresponding QP subproblem is both time-consuming and memory-demanding as m and p increase. It is nearly intractable for dimensions of, say, m = 500, p = 50. As indicated in [3], both the Newton method and the classical SQP method fail to solve BCOP when m and p are large. The spectral projected gradient method (SPGM) was thus proposed in [3] to alleviate this heavy computational burden; it uses less memory and lower computational cost at each iteration. The numerical results in [3] show that SPGM is efficient for many medium-scale test instances, but the number of iterations may vary drastically from instance to instance, and it can perform worse when p is close to m than in other situations. Fortunately, the standard SQP method can be improved considerably for BCOP by exploiting the special structure of the constraints. One remarkable feature is that the Jacobian matrix ∇c(x) is sparse and structured, which can be exploited to reduce the computational cost and memory requirements at each iteration. To do so, we employ the active-set technique [4, 41] to estimate the active set of inequalities associated with the minimizer and then, similar to QP-free methods [6, 15, 9, 30, 34, 41, 4], transform the QP subproblem into relevant linear system(s). As m and p get large, the size of the resulting linear system naturally becomes large too, but the limited memory BFGS (L-BFGS) method [3] plus the duality technique [36] can be employed effectively, which dramatically reduces the computational cost and memory requirements for the associated linear systems. By counting the detailed computational complexity of this procedure, we will see that a large number of flops is saved at each iteration.
On the other hand, fast local convergence is preserved thanks to the SQP framework and the L-BFGS technique, and global convergence is guaranteed with the aid of the filter technique. We apply this implementation to two specific practical applications, the correlation approximation problem [3, ] and the maximal correlation problem [7], in Section 5; our numerical experiments demonstrate that the proposed method is robust and efficient, and is competitive with some recently custom-designed methods for each individual application, including SPGM, the block relaxation method [3] and the majorization method [3] for the correlation approximation problem, and the Riemannian trust-region method [44] for the maximal correlation problem.

The rest of this paper is organized as follows. In the first part of Section 2, we reformulate the QP subproblem into a relevant linear system by duality, and then introduce the L-BFGS technique to alleviate

the computational burden of solving these linear systems; the detailed implementation exploiting the sparsity of the Jacobian matrix ∇c(x) is stated; we then discuss the filter technique used to globalize the SQP method; the overall algorithm is presented in the last part of Section 2. In Sections 3 and 4, we establish the global convergence and the local convergence rate of the proposed algorithm, respectively. Numerical experiments on the two specific applications are carried out in Section 5, where we report our numerical experience by comparing the performance of our algorithm with others. Concluding remarks are drawn in Section 6.

A few words about notation. We denote the feasible region of BCOP by Ω := {x : c_i(x) = 0, i ∈ E; c_i(x) ≤ 0, i ∈ I}. For the constraint functions c_i(x), i = 1, 2, ..., m, we let c(x) = (c_1(x), ..., c_m(x))^T : R^n → R^m and ∇c(x) = (∇c_1(x), ..., ∇c_m(x)) ∈ R^{n×m}; for a particular index subset J = {i_1, i_2, ..., i_j} of {1, 2, ..., m}, we denote by |J| the cardinality of J and write c_J(x) = (c_{i_1}(x), ..., c_{i_j}(x))^T : R^n → R^j and ∇c_J(x) = (∇c_{i_1}(x), ..., ∇c_{i_j}(x)) ∈ R^{n×j}; the definitions of c_E(x) and c_I(x) then follow naturally. Finally, suppose {η_k} and {ν_k} are two vanishing sequences, where η_k, ν_k ∈ R, k ∈ N; we write η_k = O(ν_k) if there exists a scalar c > 0 such that |η_k| ≤ c|ν_k| for all k sufficiently large, η_k = o(ν_k) if lim_{k→+∞} η_k/ν_k = 0, and η_k = Θ(ν_k) if both ν_k = O(η_k) and η_k = O(ν_k) hold.

2 Algorithm

2.1 The working set

We begin with the first-order optimality (KKT) conditions, which can be written as

∇_x L(x, λ) = ∇f(x) + ∇c(x)λ = 0,   (2.1)
λ_i c_i(x) = 0, i ∈ I,   (2.2)
c_i(x) ≤ 0, λ_i ≥ 0, i ∈ I,   (2.3)
c_i(x) = 0, i ∈ E,   (2.4)

where L(x, λ) := f(x) + c(x)^T λ is the Lagrangian function and λ ∈ R^m is the Lagrange multiplier. As our method is based on the active-set approach, we next state the strategy for identifying the active set.
To this end, similar to [11, 19, 8], we first introduce the function φ : R^{n+m} → R,

φ(x, λ) = ‖Ψ(x, λ)‖^{1/2},

where Ψ : R^{n+m} → R^{n+m} is defined by

Ψ(x, λ) = ( ∇_x L(x, λ) ; c_E(x) ; min{−c_I(x), λ_I} ).

Thus the set

A_I(x, λ) = { i ∈ I : c_i(x) ≥ −min{φ(x, λ), 10^{−6}} }   (2.5)

provides an estimate of the active set I(x*) = {i ∈ I : c_i(x*) = 0} of inequality constraints, where (x*, λ*) is the KKT pair at the minimizer of BCOP. Indeed, when (x, λ) is sufficiently close to (x*, λ*), the estimate A_I(x, λ) is accurate, provided both the Mangasarian-Fromovitz constraint qualification (MFCQ) and the second-order sufficient condition (SOSC) hold at (x*, λ*) (see [8, Theorem 2.2]). Now, supposing the current iterate (x^k, λ^k) is an approximation to (x*, λ*), we define

A_k := A_I(x^k, λ^k) ∪ E   (2.6)

as our working set, which includes all equality constraints, the nearly active inequality constraints, and the violated inequality constraints. This choice of the working set is similar to [15, 41, 4] and is based on the following observations: it is reasonable to include i ∈ I whenever c_i(x^k) is close to zero (say |c_i(x^k)| ≤ 10^{−6}); as for equality constraints and the violated inequality constraints (say c_i(x^k) > 10^{−6}), we include them in the working set in the hope of reducing the violation. After identifying the working set A_k, a QP subproblem can be formulated which, by the QP-free technique [6, 15, 9, 30, 34, 41, 4], can alternatively be solved via a relevant linear system (details on the linear systems are discussed in the next subsection). The solution of the resulting linear system yields the search direction and generates curvature information of BCOP at (x^k, λ^k). One issue related to the linear system is its consistency, which is equivalent to the linear independence of the gradients of the constraints in the working set A_k. Due to the structure of BCOP, we prove in Lemma 2.1 that ∇c_{A_k}(x^k) has full column rank as long as x^k is confined to the set

Ω_p := { x : ‖x_[i]‖ ≥ 0.5 for all i ∈ E }.
Based on this fact, our choice of the working set A_k does not invoke any complicated procedures such as those in [34, 41, 4], where the working set I_k has to be determined by calculating the rank of ∇c_{I_k}(x^k), or the determinant of ∇c_{I_k}(x^k)^T ∇c_{I_k}(x^k), for each trial estimate I_k until ∇c_{I_k}(x^k) has full column rank.

Lemma 2.1. If x^k ∈ Ω_p, then the vectors ∇c_i(x^k), i ∈ A_k, are linearly independent, where A_k is defined in (2.5)-(2.6).

Proof. Since x^k ∈ Ω_p, we have ‖x^k_[i]‖ ≥ 0.5 for all i ∈ E, and therefore x^k_[i] ≠ 0 for all i ∈ E. For i ∈ A_k \ E, c_i(x^k) ≥ −min{φ(x^k, λ^k), 10^{−6}}, i.e., ‖x^k_[i]‖^2 ≥ 1 − 10^{−6}, and therefore x^k_[i] ≠ 0. Suppose that there exist scalars l_i ∈ R, i ∈ A_k, such that Σ_{i ∈ A_k} l_i ∇c_i(x^k) = 0. Note that the ith block of Σ_{i ∈ A_k} l_i ∇c_i(x^k) is 2 l_i x^k_[i] for i ∈ A_k and zero otherwise. Because x^k_[i] ≠ 0 for all i ∈ A_k, we have l_i = 0 for all i ∈ A_k, which implies that ∇c_i(x^k), i ∈ A_k, are linearly independent.

Analogously, we have the following lemma.

Lemma 2.2. Let the subsequence {x^{k_l}} of {x^k} with {x^k} ⊂ Ω_p converge to x*, and let A_{k_l} ≡ A for all sufficiently large l. Then ∇c_A(x*) has full column rank.

Proof. Since x^{k_l} ∈ Ω_p and x^{k_l} → x*, we have ‖x*_[i]‖ ≥ 0.5, and therefore x*_[i] ≠ 0, for all i ∈ E. For i ∈ A \ E, c_i(x^{k_l}) ≥ −10^{−6}, and hence c_i(x*) ≥ −10^{−6} as k_l → ∞. By the definition of c(x), we then also have x*_[i] ≠ 0 for all i ∈ A ∩ I. Analogous to the proof of Lemma 2.1, ∇c_i(x*), i ∈ A, are linearly independent, as was to be shown.
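The selection rule (2.5)-(2.6) can be sketched as follows. This is an illustrative toy example, not the paper's code: the constraint values and the value of the identification function φ(x^k, λ^k) are made-up inputs, and 0-based indices are used for E and I.

```python
import numpy as np

def working_set(c, E, I, phi):
    """A_k = {i in I : c_i(x) >= -min(phi, 1e-6)} united with E, cf. (2.5)-(2.6)."""
    tol = min(phi, 1e-6)
    active_I = [i for i in I if c[i] >= -tol]
    return sorted(set(E) | set(active_I))

# m = 5 constraints: E = {0, 1} (spheres), I = {2, 3, 4} (balls).
# c[2] is nearly active, c[3] is violated, c[4] is safely inactive.
c = np.array([0.0, 0.0, -3e-7, 0.2, -0.5])
A_k = working_set(c, E=[0, 1], I=[2, 3, 4], phi=1e-3)   # -> [0, 1, 2, 3]
```

Note that no rank computation is needed here: by Lemma 2.1, full column rank of the working-set Jacobian comes for free once x^k ∈ Ω_p.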

2.2 The QP subproblem and its reformulation

In this and the next subsection, we discuss how to compute the search direction at x^k. After the working set A_k is determined, the search direction d^k and its associated Lagrange multiplier λ^k can be determined by solving (possibly two or three, with different perturbation vectors w^k ∈ R^{m̄}, where m̄ = |A_k|) equality constrained QP subproblem(s) of the form

min_{d ∈ R^n} (1/2) d^T B_k d + ∇f(x^k)^T d   s.t.  ∇c_{A_k}(x^k)^T d = w^k,   (2.7)

where B_k ∈ R^{n×n} is a symmetric positive definite approximation of the Hessian of the Lagrangian L(x^k, λ^k). We point out that B_k can be updated by the BFGS formula [7]. The strategy of choosing different perturbations w^k is similar to [4, 41]; they correspond to two types of search directions d^k, designed for global convergence and for locally superlinear convergence, respectively. To simplify the subsequent presentation, we distinguish these two cases by a boolean variable FAST, i.e., FAST=FALSE or FAST=TRUE, respectively. Details of the choice of w^k are deferred to Algorithm 3 and Remark 2.2; we next discuss an efficient procedure for computing the solution d^k of (2.7). It is evident that the equality constrained quadratic program (2.7) is equivalent to the linear system

B_k d + ∇c_{A_k}(x^k) λ = −∇f(x^k),   ∇c_{A_k}(x^k)^T d = w^k.   (2.8)

However, as n gets large, solving the linear system (2.8) can be expensive. In addition, without effectively exploiting the underlying sparse structure, the associated coefficient matrix could occupy too much memory. To resolve these numerical difficulties, we make use of the duality technique and solve the dual problem of (2.7),

max_{λ ∈ R^{m̄}}  −(1/2) λ^T W_k λ + b_k^T λ.   (2.9)

Note that (2.9) is an unconstrained optimization problem of relatively small size m̄, where

W_k = ∇c_{A_k}(x^k)^T B_k^{−1} ∇c_{A_k}(x^k),   (2.10)
b_k = −(w^k + ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k)).   (2.11)
Note that B_k is positive definite and therefore strong duality holds, which implies that the search direction d^k and the estimate λ^k of the associated Lagrange multiplier can be obtained from (2.9) instead of (2.7). In particular, observing that W_k ∈ R^{m̄×m̄} and m̄ ≤ m is much smaller than n, solving the KKT condition of (2.9) or, equivalently, the much smaller linear system

W_k λ = b_k   (2.12)

is inexpensive. Once λ^k is obtained from (2.12), substituting it into the first equation of (2.8) yields

d^k = −B_k^{−1}(∇f(x^k) + ∇c_{A_k}(x^k) λ^k).   (2.13)

The above procedure resolves most of the numerical difficulties. The last issue is how to calculate W_k efficiently; the idea is to adopt the L-BFGS technique, which is the topic of the next subsection.
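The chain (2.10)-(2.13) can be checked numerically on a small dense instance. The sketch below uses random stand-ins for B_k, ∇c_{A_k}(x^k), ∇f(x^k) and w^k (in the actual algorithm B_k^{-1} is never formed; it is applied via L-BFGS), and verifies that the recovered d solves both equations of the KKT system (2.8):

```python
import numpy as np

rng = np.random.default_rng(0)
n, mbar = 6, 2
A = rng.standard_normal((n, mbar))            # stand-in for grad c_{A_k}(x^k)
B = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = B @ B.T                                   # symmetric positive definite B_k
g = rng.standard_normal(n)                    # stand-in for grad f(x^k)
w = rng.standard_normal(mbar)                 # perturbation vector w^k

Binv = np.linalg.inv(B)
W = A.T @ Binv @ A                            # (2.10)
b = -(w + A.T @ Binv @ g)                     # (2.11)
lam = np.linalg.solve(W, b)                   # (2.12): the m-bar sized system
d = -Binv @ (g + A @ lam)                     # (2.13)

# d and lam satisfy the full KKT system (2.8) of the QP subproblem (2.7):
assert np.allclose(B @ d + A @ lam, -g)
assert np.allclose(A.T @ d, w)
```

The point of the reformulation is visible in the sizes: the only linear solve is m̄ × m̄, not (n + m̄) × (n + m̄).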

2.3 Computing the search direction based on the L-BFGS formula

The limited memory BFGS method [7, Chapter 9] is one of the most effective and widely used methods in the field of large-scale unconstrained optimization. Its main advantage is that the L-BFGS approach does not require calculating or storing a full Hessian matrix, which might be too expensive for large-scale problems. For BCOP, we have pointed out that the matrix W_k = ∇c_{A_k}(x^k)^T B_k^{−1} ∇c_{A_k}(x^k) in (2.10) needs to be computed. Note that ∇c_{A_k}(x^k) is large but sparse and structured, and if we adopt the L-BFGS formula to update the inverse of the Hessian approximation B_k, much storage space and computational cost can be saved. To describe the detailed procedure, let

S_k = [s_{k−l}, ..., s_{k−1}],   Y_k = [y_{k−l}, ..., y_{k−1}],

where s_i = x^{i+1} − x^i and y_i = ∇_x L(x^{i+1}, λ^i) − ∇_x L(x^i, λ^i), i = k−l, ..., k−1. One may notice that the solution λ^i of (2.12) lies in R^{m̄} rather than in R^m, so plugging λ^i into ∇_x L(x^i, λ^i) directly is inappropriate. Nevertheless, we can augment λ^i by setting λ^i_j = 0 for j ∈ I \ A_i. With this augmentation scheme, in what follows we use λ^i to denote the estimated multiplier in R^m as long as no confusion is caused. By the compact L-BFGS formula, the matrix B_k resulting from l updates to the basic matrix B_k^0 = ν_k I is given by

B_k = ν_k I − [ν_k S_k, Y_k] [ ν_k S_k^T S_k, L_k ; L_k^T, −D_k ]^{−1} [ν_k S_k^T ; Y_k^T],

where L_k, D_k ∈ R^{l×l} are defined by

(L_k)_{i,j} = (s_{k−l−1+i})^T (y_{k−l−1+j}) if i > j, and 0 otherwise,
D_k = diag(s_{k−l}^T y_{k−l}, ..., s_{k−1}^T y_{k−1}),

and ν_k = (y_{k−1}^T y_{k−1})/(s_{k−1}^T y_{k−1}). To ensure the positive definiteness of B_{k+1}, we adopt the so-called damped BFGS technique to modify y_k so that s_k^T y_k is sufficiently positive. Let y_k ← θ_k y_k + (1 − θ_k) B_k s_k, where the scalar θ_k is defined as

θ_k = 1, if s_k^T y_k ≥ 0.02 s_k^T B_k s_k,
θ_k = (0.98 s_k^T B_k s_k)/(s_k^T B_k s_k − s_k^T y_k), if s_k^T y_k < 0.02 s_k^T B_k s_k.

We then use s_k and the modified y_k to update S_{k+1} and Y_{k+1}, respectively.
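The damping rule above can be sketched in a few lines. This is an illustrative example with made-up B_k, s_k, y_k; the threshold 0.02 and factor 0.98 follow the formula for θ_k as reconstructed above:

```python
import numpy as np

def damp(s, y, B, c1=0.02):
    """Blend y with B s so that s^T y >= c1 * s^T B s (damped BFGS)."""
    sBs = s @ B @ s
    sy = s @ y
    if sy >= c1 * sBs:
        theta = 1.0
    else:
        theta = ((1.0 - c1) * sBs) / (sBs - sy)
    return theta * y + (1.0 - theta) * (B @ s)

B = np.diag([2.0, 1.0])
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])        # s^T y = -1 < 0: curvature must be repaired
y_mod = damp(s, y, B)            # modified y_k used to update Y_{k+1}
assert s @ y_mod >= 0.02 * (s @ B @ s) - 1e-12
```

With these values the blend lands exactly on the threshold: s^T y_mod = 0.02 s^T B s, the smallest curvature the rule tolerates.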
Let H_k denote the inverse of B_k; then the update formula for H_k is

H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T,   (2.14)

where ρ_k = 1/(y_k^T s_k) and V_k = I − ρ_k y_k s_k^T. Using the information (S_k and Y_k) of the last l iterations and choosing δ_k I with δ_k = 1/ν_k as the initial approximation H_k^0, we obtain by repeatedly applying (2.14) that H_k = H_k^f + H_k^s, where

H_k^f = δ_k (V_{k−1}^T ··· V_{k−l}^T)(V_{k−l} ··· V_{k−1})

and

H_k^s = ρ_{k−l} (V_{k−1}^T ··· V_{k−l+1}^T) s_{k−l} s_{k−l}^T (V_{k−l+1} ··· V_{k−1})
      + ρ_{k−l+1} (V_{k−1}^T ··· V_{k−l+2}^T) s_{k−l+1} s_{k−l+1}^T (V_{k−l+2} ··· V_{k−1})
      + ··· + ρ_{k−1} s_{k−1} s_{k−1}^T.

For simplicity, we denote ∇c_{A_k}(x^k) by A_k. It then follows from (2.10) that

W_k = A_k^T H_k A_k = A_k^T H_k^f A_k + A_k^T H_k^s A_k.   (2.15)

Since the matrix A_k is sparse (no more than n nonzero elements) and V_k is structured, we can carry out the matrix-chain multiplications for A_k^T H_k^f A_k and A_k^T H_k^s A_k rather efficiently, by transforming the right-hand side of (2.15). In particular, it is straightforward that

(V_{k−l} ··· V_{k−1}) A_k = A_k − ρ_{k−1} y_{k−1} s_{k−1}^T A_k − ··· − ρ_{k−l+1} y_{k−l+1} s_{k−l+1}^T (V_{k−l+2} ··· V_{k−1}) A_k − ρ_{k−l} y_{k−l} s_{k−l}^T (V_{k−l+1} ··· V_{k−1}) A_k.

Let q_i = ρ_i s_i^T (V_{i+1} ··· V_{k−1}) A_k for i = k−l, ..., k−2 and q_{k−1} = ρ_{k−1} s_{k−1}^T A_k. It then follows that

A_k^T H_k^f A_k = δ_k (A_k^T − Σ_{i=k−l}^{k−1} q_i^T y_i^T)(A_k − Σ_{i=k−l}^{k−1} y_i q_i)
              = δ_k A_k^T A_k + Σ_{i=k−l}^{k−1} Σ_{j=k−l}^{k−1} δ_k (y_i^T y_j) q_i^T q_j − Σ_{i=k−l}^{k−1} δ_k (q_i^T y_i^T A_k + A_k^T y_i q_i).   (2.16)

Using q_i, the last term in (2.15) can be rewritten as

A_k^T H_k^s A_k = Σ_{i=k−l}^{k−1} q_i^T q_i / ρ_i.   (2.17)

Consequently, based on (2.16) and (2.17), the whole procedure for computing W_k = A_k^T H_k A_k can be summarized by the pseudo-code in Algorithm 1. We remark that lines 2-13 compute W_k^s = A_k^T H_k^s A_k, lines 15-25 compute W_k^f = A_k^T H_k^f A_k, and line 26 finally forms W_k.

Remark 2.1. We finally count the computational complexity of computing W_k in Algorithm 1. For this purpose, we assume p_i = p for i = 1, 2, ..., m, only for simplicity. First, it requires at most (because m̄ ≤ m) (2l^2 + 2l + 2)mp + 2lm^2 + O(m) flops to compute W_k^s = A_k^T H_k^s A_k (lines 2-13), and at most ((3/2)l^2 + (7/2)l + 3)mp + ((3/2)l^2 + (7/2)l)m^2 + O(m) flops for W_k^f = A_k^T H_k^f A_k (lines 15-25). Note that mp = n, and this implies that for l ≪ n, the computation of W_k requires at most O(m^2 + mp) = O(m^2 + n) flops. As for b_k and d^k, the main computational effort is the matrix-vector product H_k z.
Applying [7, Algorithm 9.1], it is easy to see that 6lmp = 6ln flops are required to compute H_k z; therefore, computing b_k in (2.11) and d^k in (2.13) needs at most 12lmp + 6mp = (12l + 6)n flops.

2.4 The NLP Filter

Suppose we have the search direction d^k; then the step size α_k is the next important ingredient, which determines the iterate x^{k+1} := x^k + α_k d^k.
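The product H_k z referred to above can be formed without ever storing H_k, via the standard L-BFGS two-loop recursion, at O(ln) cost. The sketch below is illustrative (random (s_i, y_i) pairs generated from an SPD "Hessian" stand-in so that s_i^T y_i > 0 is guaranteed) and verifies the recursion against H_k built explicitly from the update formula (2.14):

```python
import numpy as np

def two_loop(z, S, Y, delta):
    """Return H_k z for H_k built from pairs (S[i], Y[i]), oldest first, H_k^0 = delta*I."""
    rho = [1.0 / (y @ s) for s, y in zip(S, Y)]
    alpha = [0.0] * len(S)
    q = z.copy()
    for i in reversed(range(len(S))):       # newest pair first
        alpha[i] = rho[i] * (S[i] @ q)
        q -= alpha[i] * Y[i]
    r = delta * q
    for i in range(len(S)):                 # oldest pair first
        beta = rho[i] * (Y[i] @ r)
        r += (alpha[i] - beta) * S[i]
    return r

rng = np.random.default_rng(1)
n, l, delta = 5, 3, 0.7
R = rng.standard_normal((n, n))
M = np.eye(n) + 0.1 * R @ R.T               # SPD, so s^T y = s^T M s > 0 below
S = [rng.standard_normal(n) for _ in range(l)]
Y = [M @ s for s in S]
z = rng.standard_normal(n)

# Reference: apply H_{i+1} = V_i^T H_i V_i + rho_i s_i s_i^T, as in (2.14).
H = delta * np.eye(n)
for s, y in zip(S, Y):
    rho = 1.0 / (y @ s)
    V = np.eye(n) - rho * np.outer(y, s)
    H = V.T @ H @ V + rho * np.outer(s, s)
assert np.allclose(two_loop(z, S, Y, delta), H @ z)
```

Applying this product m̄ times, once per column of A_k, is exactly what Algorithm 1 avoids by reorganizing the computation around the row vectors q_i.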

Algorithm 1: Procedure for computing W_k based on the L-BFGS formula
Data: S_k, Y_k, A_k, δ_k
Result: W_k
1   % Compute W_k^s = A_k^T H_k^s A_k
2   for i = k−l, ..., k−1 do
3       ρ_i = 1/(y_i^T s_i);
4   end
5   W_k^s = 0;
6   for i = k−1, ..., k−l do
7       u = s_i^T;
8       for j = i, ..., k−2 do
9           u = u − ρ_{j+1}(u y_{j+1}) s_{j+1}^T;
10      end
11      q_i = ρ_i u A_k;   % q_i = ρ_i s_i^T (V_{i+1} ··· V_{k−1}) A_k for i = k−l, ..., k−2, and q_{k−1} = ρ_{k−1} s_{k−1}^T A_k
12      W_k^s = W_k^s + q_i^T (q_i/ρ_i);
13  end
14  % Compute W_k^f = A_k^T H_k^f A_k
15  W_k^f = δ_k A_k^T A_k;
16  for i = k−l+1, ..., k−1 do
17      for j = k−l+1, ..., i do
18          β = δ_k (y_i^T y_j);
19          W_k^f = W_k^f + (β q_i)^T q_j;
20          if j < i then
21              W_k^f = W_k^f + q_j^T (β q_i);
22          end
23      end
24      W_k^f = W_k^f − (δ_k q_i^T)(y_i^T A_k) − (A_k^T y_i)(δ_k q_i);
25  end
26  W_k = W_k^f + W_k^s;

In choosing α_k, we use the filter method and a backtracking line search procedure. In particular, we generate a decreasing sequence of trial values α_k ∈ (α_min^k, 1] until our preset acceptance criterion is fulfilled or the feasibility restoration phase (Section 2.5) is called. Here, α_min^k ≥ 0 is a lower bound on α_k, and we give an explicit formula for α_min^k in the next subsection. Let x̂ := x^k + α̂ d^k, α̂ ∈ (α_min^k, 1], denote a trial point. Using

h(x) = ‖ ( c_E(x) ; max{c_I(x), 0} ) ‖

as a measure of infeasibility at the point x, we now give the relevant definitions for the filter. The first one, Definition 2.1, is a variant of [14, (2.6)].

Definition 2.1. For given β ∈ (0, 1) and γ ∈ (0, 1), a trial point x̂ (or, equivalently, the pair (h(x̂), f(x̂))) is

acceptable to x^l (or, equivalently, the pair (h(x^l), f(x^l))) if

h(x̂) ≤ β h(x^l)   (2.18)

or

f(x̂) ≤ f(x^l) − γ min{h(x̂), h(x̂)^2}.   (2.19)

In the original paper of Fletcher and Leyffer [13], a pair (h(x̂), f(x̂)) is said to dominate (h(x^l), f(x^l)) if both (2.18) and (2.19) hold with β = 1 and γ = 0, and a filter is defined as a list of pairs (h(x^l), f(x^l)) such that no pair dominates any other in the filter [13, Definition 2]. The condition (2.19) is a variant of [14, (2.6)], where f(x̂) ≤ f(x^l) − γ h(x̂). Note that (2.19) is equivalent to: f(x̂) ≤ f(x^l) − γ h(x̂)^2 if h(x̂) ≤ 1, and f(x̂) ≤ f(x^l) − γ h(x̂) otherwise. The reason for introducing this modified condition on h(x̂) is that we prefer to accept the trial point x̂, for the purpose of convergence, whenever the violation of feasibility is not severe, i.e., h(x̂) < 1. Similar to the original definition of the filter in [13] and based on Definition 2.1, we define our filter, denoted by F_k at iteration k, as a set of pairs (h(x^l), f(x^l)) such that any pair in the filter is acceptable to all previous pairs in F_k in the sense of Definition 2.1. Initially, with k = 0, the filter F_k begins with the pair (χ, −∞), where χ > 0 is imposed on h(x̂) as an upper bound to control the constraint violation [13]. At the start of iteration k, the current pair (h(x^k), f(x^k)) ∉ F_k but must be acceptable to it, while at the end of iteration k the pair (h(x^k), f(x^k)) may or may not be added to F_k, depending on our acceptance rule to be discussed in Remark 2.3. Once (h(x^k), f(x^k)) is added to F_k, we remove all pairs in the current filter F_k that are worse than (h(x^k), f(x^k)) with respect to both the objective function value and the constraint violation; the detailed procedure for updating the filter F_k is described in Algorithm 3 and Remark 2.3.

Definition 2.2.
A trial point x̂ (or a pair (h(x̂), f(x̂))) is acceptable to the filter F_k if x̂ (or the pair (h(x̂), f(x̂))) is acceptable to x^l in the sense of Definition 2.1 for all l ∈ F̄_k := { l : (h(x^l), f(x^l)) ∈ F_k }.

The trial point x̂ is to be accepted as the next iterate if it is acceptable both to x^k (by Definition 2.1) and to the filter F_k (by Definition 2.2). Nevertheless, such an acceptance rule for the trial point x̂ may cause the following situation: we always accept points that satisfy (2.18) alone, but never (2.19). This would result in an iterative sequence converging to a feasible, but non-optimal, point. To avoid this situation, we impose additional conditions on x̂:

Case 1 When FAST=FALSE or α̂ < 1: if

−α̂ ∇f(x^k)^T d^k > δ h(x^k)^2,   (2.20)

then accepting x̂ as the next iterate x^{k+1} requires

f(x̂) ≤ f(x^k) + α̂ η ∇f(x^k)^T d^k;   (2.21)

Case 2 When FAST=TRUE and α̂ = 1: if

−∇f(x^k)^T d^k > δ h(x^k)^2 and h(x^k) ≤ ζ_1 ‖d^k‖^{ζ_2},   (2.22)

then accepting x̂ as the next iterate x^{k+1} requires

f(x̂) ≤ f(x^k) − η min{−∇f(x^k)^T d^k, ξ ‖d^k‖^{ζ_2}},   (2.23)

where ζ_1 > 0, ζ_2 ∈ (2, 3), ξ > 0, η ∈ (0, 1/2), and δ > 0 is chosen to satisfy δ ≥ γ/η. Note that Case 1 and Case 2 are mutually exclusive. The motivation for these conditions comes from [33, Section 2]. The switching conditions for Case 1 and Case 2 and the sufficient reduction conditions (2.21) and

(2.23) are useful for global convergence and for fast local convergence as well: if (2.20) in Case 1 is satisfied, then the direction d^k is a descent direction for f(x), and imposing the reduction condition (2.21) on f(x) is thereby helpful for global convergence; if (2.22) in Case 2 is satisfied, implying that d^k is a search direction for fast local convergence, the full step (i.e., α̂ = 1) is expected so that fast local convergence can be achieved. Note that condition (2.23) is more relaxed than (2.21), as we prefer to accept the full step. Finally, we can state our rule for accepting the trial point x̂ as the next iterate.

Acceptance Rule: A trial point x̂ is accepted as the next iterate x^{k+1} if it is acceptable to F_k ∪ {(h(x^k), f(x^k))} and one of the following two conditions holds:
(i) either (2.20) and (2.21) for Case 1, or (2.22) and (2.23) for Case 2, are satisfied;
(ii) (2.20) for Case 1 or (2.22) for Case 2 is not satisfied.

If the trial point x̂ does not satisfy x̂ ∈ Ω_p or the Acceptance Rule, we shrink α̂ until the trial point is accepted or α̂ ≤ α_min^k. Once the latter occurs, the feasibility restoration phase is called, which is discussed in the next subsection.

2.5 Feasibility Restoration Phase

Motivated by [38], we define the lower bound α_min^k for α̂ by

α_min^k = min{ 1 − β, γh(x^k)/(−∇f(x^k)^T d^k), δh(x^k)^2/(−∇f(x^k)^T d^k) }, if ∇f(x^k)^T d^k < 0,
α_min^k = α_φ, otherwise,   (2.24)

where α_φ is a positive scalar. If, by shrinking α̂, we cannot find a step size α̂ ∈ (α_min^k, 1] such that the trial point x̂ is accepted by the Acceptance Rule, we turn to the feasibility restoration phase. Note that when the iteration enters the restoration phase, x^k is infeasible: if x^k were feasible, then h(x^k) = 0 and there must be some α̂ ∈ (α_min^k, 1] such that x̂ is accepted (see Lemma 3.9). Based on these facts, in the restoration phase we project x^k onto Ω to obtain the next iterate x^{k+1} = P_Ω(x^k).
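The acceptability tests of Definitions 2.1-2.2 reduce to a few comparisons per stored pair. The following sketch is illustrative only (β, γ and the sample pairs are made-up values, and the filter is held as a plain list of (h, f) pairs):

```python
def acceptable_to_pair(h_t, f_t, h_l, f_l, beta=0.9, gamma=0.01):
    """(2.18) or (2.19): h(xhat) <= beta*h(x_l), or
    f(xhat) <= f(x_l) - gamma*min(h(xhat), h(xhat)**2)."""
    return h_t <= beta * h_l or f_t <= f_l - gamma * min(h_t, h_t ** 2)

def acceptable_to_filter(h_t, f_t, filt, beta=0.9, gamma=0.01):
    """Definition 2.2: acceptable to every pair currently stored in F_k."""
    return all(acceptable_to_pair(h_t, f_t, h, f, beta, gamma) for h, f in filt)

filt = [(1.0, 5.0), (0.1, 7.0)]
assert acceptable_to_filter(0.05, 6.0, filt)        # large feasibility gain
assert not acceptable_to_filter(0.5, 7.5, filt)     # blocked by (0.1, 7.0)
```

The switching conditions (2.20)/(2.22) and the sufficient reductions (2.21)/(2.23) are then layered on top of this basic test, exactly as in the Acceptance Rule.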
Since the feasible set Ω has special structure, projecting x^k onto Ω (Algorithm 2) is easy and costs only at most 3n flops.

Algorithm 2: P_Ω(x^k): projection of x^k onto Ω
1  Given x^k;
2  for i = 1, ..., m do
3      if (i ≤ m_1 and ‖x^k_[i]‖ ≠ 1) or (i > m_1 and ‖x^k_[i]‖ > 1) then
4          x^k_[i] ← x^k_[i] / ‖x^k_[i]‖;
5      end
6  end
7  return x^k;
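Algorithm 2 amounts to a blockwise normalization. A minimal sketch (illustrative block sizes and test point; it assumes the sphere blocks are nonzero, which holds for iterates in Ω_p):

```python
import numpy as np

def project(x, p, m1):
    """Blockwise projection onto Omega: normalize sphere blocks (i <= m1)
    whenever their norm differs from 1; rescale ball blocks only if outside."""
    x = x.copy()
    off = 0
    for i, pi in enumerate(p):
        blk = x[off:off + pi]
        nrm = np.linalg.norm(blk)
        if (i < m1 and not np.isclose(nrm, 1.0)) or (i >= m1 and nrm > 1.0):
            x[off:off + pi] = blk / nrm
        off += pi
    return x

p, m1 = [2, 2], 1                         # one sphere block, one ball block
x = np.array([3.0, 4.0, 0.3, 0.4])
z = project(x, p, m1)                     # -> [0.6, 0.8, 0.3, 0.4]
```

Each block costs one norm and at most one rescaling, which is the source of the 3n flop count stated above.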

2.6 The Statement of the Algorithm

We now state the overall algorithm.

Algorithm 3: Filter active-set method (FilterASM)
1   Given x^0 ∈ Ω_p, χ > h(x^0), ν ∈ (2, 3), β ∈ (0, 1), γ ∈ (0, 1), η ∈ (0, 1/2), δ ≥ γ/η, ξ > 0, α_φ ∈ (0, 1/2), ζ_1 > 0, ζ_2 ∈ (2, 3), r ∈ (0, 1). Initialize F_0 with the pair (χ, −∞);
2   for k = 0, 1, 2, ..., MAXIT do
3       Determine the working set A_k;
4       Compute λ^{k,0} by (2.12) with w^k = −c_{A_k}(x^k), and d^{k,0} by (2.13) with λ^k = λ^{k,0};
5       if d^{k,0} = 0 and λ^{k,0}_i ≥ 0 (∀ i ∈ A_k ∩ I), stop;   % Termination condition
6       if
            λ^{k,0}_i ≥ 0, ∀ i ∈ A_k ∩ I,   (2.25)
        then
7           Set FAST=TRUE, d^k = d^{k,0}, λ^k = λ^{k,0}, and w^k = −c_{A_k}(x^k) − c_{A_k}(x^k + d^k) − ‖d^{k,0}‖^ν e;
8       else
9           Set FAST=FALSE and w^k = 0;
10      end
11      Compute λ^{k,1} by (2.12) with w^k, and d^{k,1} by (2.13) with λ^k = λ^{k,1};
12      if FAST=TRUE then
13          Set d̂^k = 0 if ‖d^{k,1} − d^{k,0}‖ > ‖d^{k,0}‖, and d̂^k = d^{k,1} − d^{k,0} otherwise;
14      else
15          Set
                [u_{A_k}]_i = min{c_{j_i}(x^k), 0} + λ^{k,1}_{j_i}, if λ^{k,1}_{j_i} < 0 (j_i ∈ A_k ∩ I),
                [u_{A_k}]_i = c_{j_i}(x^k), otherwise,
            where A_k = {j_1, ..., j_{|A_k|}};
16          Compute λ^{k,2} by (2.12) with w^k = u_{A_k}, and d^{k,2} by (2.13) with λ^k = λ^{k,2};
17          Set d^k = d^{k,2}, λ^k = λ^{k,1};
18      end
19      if FAST=FALSE, or x^k + d^k + d̂^k does not satisfy the Acceptance Rule, or x^k + d^k + d̂^k ∉ Ω_p then
20          Find α_k > α_min^k, the first number of the sequence {1, r, r^2, ...} such that x̂ = x^k + α_k d^k satisfies the Acceptance Rule and x̂ ∈ Ω_p;
21      else
22          Set x̂ = x^k + d^k + d̂^k and α_k = 1;
23      end
24      if the above α_k (i.e., α_k > α_min^k) does not exist then
25          Go to the feasibility restoration phase to get x^{k+1} = P_Ω(x^k), and add (h(x^k), f(x^k)) to F_k;
26      else
27          if (2.20) for Case 1 or (2.22) for Case 2 does not hold then add (h(x^k), f(x^k)) to F_k;
28          Set x^{k+1} = x̂, s_k = x^{k+1} − x^k, y_k = ∇_x L(x^{k+1}, λ^k) − ∇_x L(x^k, λ^k), and update S_k, Y_k to S_{k+1}, Y_{k+1};
29      end
30  end

Remark 2.2. In Algorithm 3, lines 3-18 state the procedure for computing the search direction d^k and the multiplier estimate λ^k, together with some related quantities (namely d^{k,0}, λ^{k,0}, d^{k,1}, λ^{k,1}, d^{k,2}, λ^{k,2}, etc.)
related to d^k and λ^k, while lines 19-23 describe the procedure for the step size α_k. In computing the search direction between lines 3 and 18, there are two different cases:

(i) FAST=TRUE. The pair (d^k, λ^k) = (d^{k,0}, λ^{k,0}) solves

B_k d^{k,0} + ∇c_{A_k}(x^k) λ^{k,0} = −∇f(x^k),   ∇c_{A_k}(x^k)^T d^{k,0} = −c_{A_k}(x^k),   (2.26)

which is a quasi-Newton equation for the KKT system (2.1)-(2.4) on the working set A_k. To achieve fast local convergence and to overcome the Maratos effect, we adopt the second-order correction technique. In particular, we compute the second-order correction step by setting d̂^k = d^{k,1} − d^{k,0}, where d^{k,1} is obtained from

B_k d^{k,1} + ∇c_{A_k}(x^k) λ^{k,1} = −∇f(x^k),   ∇c_{A_k}(x^k)^T d^{k,1} = −(c_{A_k}(x^k + d^k) + c_{A_k}(x^k) + ‖d^{k,0}‖^ν e).   (2.27)

Here, e = (1, 1, ..., 1)^T with appropriate dimension. Then we check whether x̂ = x^k + d^k + d̂^k satisfies the Acceptance Rule. If it fails, this second-order correction step d̂^k is discarded, and the backtracking technique is invoked to find a step size α_k such that x^k + α_k d^k is accepted.

(ii) FAST=FALSE. The search direction d^k = d^{k,2} is computed by solving

B_k d^{k,2} + ∇c_{A_k}(x^k) λ^{k,2} = −∇f(x^k),   ∇c_{A_k}(x^k)^T d^{k,2} = u_{A_k},   (2.28)

where u_{A_k} (line 15) uses the information of λ^{k,1} from the system

B_k d^{k,1} + ∇c_{A_k}(x^k) λ^{k,1} = −∇f(x^k),   ∇c_{A_k}(x^k)^T d^{k,1} = 0.   (2.29)

We explain the above two linear systems as follows: the solution d^{k,1} of (2.29) lies in the null space of ∇c_{A_k}(x^k)^T and aims at improving f(x) rather than h(x); because d^{k,1} may be close to zero with a negative multiplier λ^{k,1}, a slightly perturbed version (2.28) of (2.29) is solved instead, yielding a new direction d^{k,2}, which aims at improving h(x) and prevents the unwelcome effect caused by a negative multiplier. In all, d^k in this case contributes to the global convergence.

Remark 2.3. The filter F_k is updated either in line 25 or in line 27. In other words, the pair (h(x^k), f(x^k)) is added to F_k, and all other pairs in F_k dominated by (h(x^k), f(x^k)) are removed, if (2.20) for Case 1 or (2.22) for Case 2 is not fulfilled or the restoration phase is invoked.

Remark 2.4.
For convenience in analyzing the convergence, we borrow the terminology of Fletcher, Leyffer and Toint [14]: we call an iterate an f-type iterate if x^{k+1} = x^k + α_k d^k (or x^{k+1} = x^k + d^k + d̂^k) is accepted according to (i) of the Acceptance Rule; otherwise, we call it an h-type iterate, which means that x^{k+1} is accepted according to (ii) of the Acceptance Rule or is recovered from the feasibility restoration phase.

3 Global convergence

In this section we show the global convergence of Algorithm 3 under the following two assumptions:

(A1) The objective function f(x) is twice continuously differentiable;
(A2) The matrix B_k is bounded and uniformly positive definite for all k; that is, there exists a scalar τ > 0 such that (1/τ)‖d‖^2 ≤ d^T B_k d ≤ τ‖d‖^2 holds for any d ∈ R^n and any k.

We begin with the boundedness of the iterates.

Lemma 3.1. The sequence {x^k} generated by Algorithm 3 is bounded.

Proof. Since all iterates generated by Algorithm 3 satisfy the upper bound condition $h(x^k)\le\chi$ (because $F_0=\{(\chi,-\infty)\}$), combining with the definition of $h(x)$ directly leads to the boundedness of $\{x^k\}$.

Theorem 3.2. Suppose that Assumption (A1) holds. Let $\{x^{k_l}\}$ be an infinite subsequence of $\{x^k\}$ on which $(h(x^{k_l}), f(x^{k_l}))$ is added into the filter. Then $\lim_{k_l\to\infty} h(x^{k_l}) = 0$.

Proof. From Assumption (A1) and Lemma 3.1, we know that $\{f(x^{k_l})\}$ is bounded from below. Applying [33, Lemma 3.1] yields the assertion.

Theorem 3.2 implies that all accumulation points of the subsequence $\{x^{k_l}\}$ on which $(h(x^{k_l}), f(x^{k_l}))$ is added into the filter are feasible points of BCOP.

Lemma 3.3. Suppose that Assumptions (A1)-(A2) hold. If FAST=TRUE, then the sequence $\{(d^{k,0},\lambda^{k,0})\}$ is bounded; if FAST=FALSE, then both sequences $\{(d^{k,1},\lambda^{k,1})\}$ and $\{(d^{k,2},\lambda^{k,2})\}$ are bounded.

Proof. From Algorithm 3, $\lambda^{k,0} = W_k^{-1} b_k$ with $b_k = c_{A_k}(x^k) - \nabla c_{A_k}(x^k)^T B_k^{-1}\nabla f(x^k)$ in the case of FAST=TRUE, where $W_k = \nabla c_{A_k}(x^k)^T B_k^{-1}\nabla c_{A_k}(x^k)$ is uniformly positive definite for all $k$ due to Lemmas 2.2 and 3.1 and Assumption (A2). Again using Lemma 3.1 and Assumption (A2), $b_k$ is bounded and therefore $\lambda^{k,0}$ is bounded too, which together with the boundedness of $B_k^{-1}$, $x^k$ and $\lambda^{k,0}$ implies that $d^k$ in (2.13) is bounded for all $k$. Analogously, in the case of FAST=FALSE, $W_k$ and its inverse are bounded for all $k$. Lemma 3.1 and Assumption (A2) ensure the boundedness of $\nabla c_{A_k}(x^k)^T B_k^{-1}\nabla f(x^k)$. Since $\lambda^{k,1} = -W_k^{-1}\nabla c_{A_k}(x^k)^T B_k^{-1}\nabla f(x^k)$ and $d^{k,1} = -B_k^{-1}\big(\nabla f(x^k) + \nabla c_{A_k}(x^k)\lambda^{k,1}\big)$, it follows that both $\lambda^{k,1}$ and $d^{k,1}$ are bounded for all $k$. In view of the definition of $u_{A_k}$ (see line 15 of Algorithm 3) and the boundedness of $\{x^k\}$, $u_{A_k}$ is bounded too, which implies the boundedness of $\lambda^{k,2} = W_k^{-1}\big(u_{A_k} - \nabla c_{A_k}(x^k)^T B_k^{-1}\nabla f(x^k)\big)$. Consequently, $d^{k,2}$ in (2.13) with $\lambda^{k,2}$ is bounded for all $k$.

Remark 3.1.
Based on the previous lemmas, for the convenience of further reference, we assume $\|d^{k,j}\|\le M_d$, $j=0,1,2$, and $\|\lambda^{k,j}\|\le M_\lambda$, $j=0,1,2$, for all $k$, where $M_d>0$ and $M_\lambda>0$ are two constants.

Lemma 3.4. Under Assumptions (A1)-(A2), the following two statements are true.

(i) If FAST=TRUE and $d^k=0$, then $x^k$ is a KKT point of BCOP.

(ii) If FAST=FALSE, $h(x^k)=0$ and $\nabla f(x^k)^T d^k = 0$, then $x^k$ is a KKT point of BCOP.

Proof. (i) Since $\lambda^{k,0}$ is from (2.12) with $b_k = c_{A_k}(x^k) - \nabla c_{A_k}(x^k)^T B_k^{-1}\nabla f(x^k)$, rearranging (2.12) leads to
$$c_{A_k}(x^k) = W_k\lambda^{k,0} + \nabla c_{A_k}(x^k)^T B_k^{-1}\nabla f(x^k),$$
which, using (2.13) and the definition of $W_k$, gives
$$c_{A_k}(x^k) = \nabla c_{A_k}(x^k)^T B_k^{-1}\big(\nabla c_{A_k}(x^k)\lambda^{k,0} + \nabla f(x^k)\big) = -\nabla c_{A_k}(x^k)^T d^{k,0}.$$
Putting $d^{k,0}=d^k=0$ into the above equation yields $c_{A_k}(x^k)=0$; now combining with the definition of $A_k$ implies that $x^k$ is feasible, that is, $c_E(x^k)=0$ and $c_I(x^k)\le 0$. From Assumption (A2) and (2.13), $d^{k,0}=0$ leads to $\nabla c_{A_k}(x^k)\lambda^{k,0} + \nabla f(x^k) = 0$, which shows the dual feasibility at $x^k$. In addition, the nonnegativity of $\lambda^{k,0}$ is guaranteed by the mechanism of Algorithm 3 (in the case of FAST=TRUE). Thus, $x^k$ satisfies a variant of the KKT conditions (2.1)-(2.4) and therefore is a KKT point.

(ii) By Algorithm 3, if FAST=FALSE, then
$$\lambda^{k,1} = -W_k^{-1}\nabla c_{A_k}(x^k)^T B_k^{-1}\nabla f(x^k), \tag{3.30}$$
$$d^{k,1} = -B_k^{-1}\big(\nabla f(x^k) + \nabla c_{A_k}(x^k)\lambda^{k,1}\big), \tag{3.31}$$
$$\lambda^{k,2} = W_k^{-1} u_{A_k} + \lambda^{k,1}, \tag{3.32}$$
$$d^{k,2} = d^{k,1} - B_k^{-1}\nabla c_{A_k}(x^k) W_k^{-1} u_{A_k}. \tag{3.33}$$

From (3.33) and (3.30), we have that
$$\nabla f(x^k)^T d^{k,2} = \nabla f(x^k)^T d^{k,1} - \nabla f(x^k)^T B_k^{-1}\nabla c_{A_k}(x^k) W_k^{-1} u_{A_k} = \nabla f(x^k)^T d^{k,1} + (\lambda^{k,1})^T u_{A_k}. \tag{3.34}$$
By premultiplying the first equation of (2.9) by $(d^{k,1})^T$ and using the second equation of (2.9), we get $\nabla f(x^k)^T d^{k,1} = -(d^{k,1})^T B_k d^{k,1}$. Substituting it into (3.34) yields
$$\nabla f(x^k)^T d^{k,2} = -(d^{k,1})^T B_k d^{k,1} + (\lambda^{k,1})^T u_{A_k}. \tag{3.35}$$
According to the hypothesis of (ii) of this lemma, $c_E(x^k)=0$, $c_I(x^k)\le 0$ and $\nabla f(x^k)^T d^{k,2}=0$. Combining with the definition of $u_{A_k}$, the second term on the right-hand side of (3.35) can be rewritten, and then
$$0 = -(d^{k,1})^T B_k d^{k,1} - \sum_{\lambda^{k,1}_i<0,\, i\in A_k\cap I}\Big[(\lambda^{k,1}_i)^2 + \max\{-\lambda^{k,1}_i c_i(x^k), 0\}\Big] + \sum_{\lambda^{k,1}_i\ge 0,\, i\in A_k\cap I} \lambda^{k,1}_i c_i(x^k).$$
It is easy to see that the first two terms (excluding the sign) on the right-hand side are non-negative and the last term is non-positive, which implies that all terms on the right-hand side must be zero. In particular, the first term $(d^{k,1})^T B_k d^{k,1} = 0$ implies the stationarity condition $\nabla c_{A_k}(x^k)\lambda^{k,1} + \nabla f(x^k) = 0$ due to Assumption (A2) and (3.31); the second term $\sum_{\lambda^{k,1}_i<0,\, i\in A_k\cap I}\big[(\lambda^{k,1}_i)^2 + \max\{-\lambda^{k,1}_i c_i(x^k), 0\}\big] = 0$ implies $\lambda^{k,1}\ge 0$; and the third term $\sum_{\lambda^{k,1}_i\ge 0,\, i\in A_k\cap I}\lambda^{k,1}_i c_i(x^k) = 0$ implies $\lambda^{k,1}_i c_i(x^k)=0$, $i\in A_k\cap I$, which gives the complementarity condition. Thus, $x^k$ is a KKT point of BCOP.

Remark 3.2. Since $B_k$ is uniformly positive definite and uniformly bounded, by Lemma 2.2, the conclusion of Lemma 3.4 can be extended to its limit form: (i) if FAST=TRUE and $d^{k_l}\to 0$, then any limit point $x^*$ of $\{x^{k_l}\}$ is a KKT point of BCOP, where $\{k_l\}$ is an infinite subsequence of $\{k\}$; (ii) if FAST=FALSE, $h(x^{k_l})\to 0$ and $\nabla f(x^{k_l})^T d^{k_l}\to 0$, then any limit point $x^*$ of $\{x^{k_l}\}$ is a KKT point of BCOP, where $\{k_l\}$ is an infinite subsequence of $\{k\}$.

We next establish a series of lemmas concerning the f-type iterates.

Lemma 3.5.
Suppose that Assumptions (A1)-(A2) hold. Then there exist scalars $M_h, M_f > 0$ and $\alpha_u^k\in(0,1]$ such that
$$h(x^k+\alpha d^k) \le (1-\alpha)h(x^k) + M_h\alpha^2\|d^k\|^2 \tag{3.36}$$
holds for all $\alpha\in(0,\alpha_u^k]$, and
$$f(x^k+\alpha d^k) \le f(x^k) + \alpha\nabla f(x^k)^T d^k + M_f\alpha^2\|d^k\|^2 \tag{3.37}$$
holds for all $\alpha\in(0,1]$, where $d^k$ is generated by Algorithm 3.

Proof. If FAST=TRUE, $(d^k,\lambda^k)=(d^{k,0},\lambda^{k,0})$ solves (2.6), implying that
$$c_{A_k}(x^k) + \nabla c_{A_k}(x^k)^T d^k = 0, \tag{3.38}$$

and if FAST=FALSE, $(d^k,\lambda^k)=(d^{k,2},\lambda^{k,2})$ solves (2.8), which together with the definition of $u_{A_k}$ yields
$$c_i(x^k) + \nabla c_i(x^k)^T d^k = c_i(x^k) - u_i \begin{cases} = 0, & i\in E, \\ \le 0, & i\in A_k\cap I. \end{cases} \tag{3.39}$$
Since $c_i(x)$, $i\in A_k$, are quadratic functions, it follows that for $i\in A_k$
$$c_i(x^k+\alpha d^k) = c_i(x^k) + \alpha\nabla c_i(x^k)^T d^k + \frac{\alpha^2}{2}(d^k)^T Q_i d^k,$$
where $Q_i$ is the Hessian of $c_i(x)$. As a result, for either FAST=TRUE or FAST=FALSE, using (3.38) and (3.39) we have
$$c_i(x^k+\alpha d^k) = (1-\alpha)c_i(x^k) + \frac{\alpha^2}{2}(d^k)^T Q_i d^k, \quad i\in E,$$
$$c_i(x^k+\alpha d^k) \le (1-\alpha)c_i(x^k) + \frac{\alpha^2}{2}(d^k)^T Q_i d^k, \quad i\in A_k\cap I.$$
Therefore, it is straightforward to get that for all $i\in E$ and for all $i\in A_k\cap I$, respectively,
$$|c_i(x^k+\alpha d^k)| \le (1-\alpha)|c_i(x^k)| + M_h\alpha^2\|d^k\|^2, \tag{3.40}$$
$$\max\{0, c_i(x^k+\alpha d^k)\} \le (1-\alpha)\max\{0, c_i(x^k)\} + M_h\alpha^2\|d^k\|^2, \tag{3.41}$$
where $M_h>0$ is a scalar satisfying $\|Q_i\|\le 2M_h$ for all $i\in A_k$. On the other hand, for $i\in I\setminus A_k$, $c_i(x^k)<0$ due to the definition of $A_k$; by the continuity of $c_i(x)$, there exists a scalar $\alpha_u^k\in(0,1]$ such that $c_i(x^k+\alpha d^k)<0$ for all $i\in I\setminus A_k$ and all $\alpha\in(0,\alpha_u^k]$. Consequently, in view of the definition of $h(x)$,
$$h(x^k) = \left\|\begin{pmatrix} c_E(x^k) \\ \max\{c_{A_k}(x^k),0\} \end{pmatrix}\right\| \quad\text{and}\quad h(x^k+\alpha d^k) = \left\|\begin{pmatrix} c_E(x^k+\alpha d^k) \\ \max\{c_{A_k}(x^k+\alpha d^k),0\} \end{pmatrix}\right\|, \quad \alpha\in(0,\alpha_u^k],$$
which together with (3.40) and (3.41) gives (3.36). As for (3.37), it readily follows from Taylor's theorem that
$$f(x^k+\alpha d^k) - f(x^k) - \alpha\nabla f(x^k)^T d^k = \frac{\alpha^2}{2}(d^k)^T\nabla^2 f(\xi^k)d^k, \tag{3.42}$$
where $\xi^k\in\mathbb{R}^n$ lies on the line segment from $x^k$ to $x^k+\alpha d^k$. Since $x^k$ and $d^k$ are bounded for all $k$, and the objective function $f(x)$ is twice continuously differentiable, there exists a scalar $M_f>0$ such that $\|\nabla^2 f(\xi^k)\|\le 2M_f$ for all $\xi^k$, and thus using (3.42) gives (3.37).

We remark that $\alpha_u^k$ in Lemma 3.5 depends on $x^k$; however, with some additional conditions, $\alpha_u^k$ in (3.36) can be reduced to a constant, as shown in the following corollary.

Corollary 3.6. Suppose that Assumptions (A1)-(A2) hold.
Let $\{x^{k_l}\}$ converge to a non-optimal point $x^*$ and suppose $A_{k_l}$ keeps unchanged for all $k_l$. Then there exist scalars $M_h>0$ and $\alpha_u\in(0,1]$ such that
$$h(x^{k_l}+\alpha d^{k_l}) \le (1-\alpha)h(x^{k_l}) + M_h\alpha^2\|d^{k_l}\|^2 \tag{3.43}$$
holds for all $\alpha\in(0,\alpha_u]$, where $d^{k_l}$ is generated by Algorithm 3.
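For the ball/sphere constraints of BCOP, $c_i(x)=\|x_{[i]}\|^2-1$ has the constant Hessian $Q_i=2I$, so the quadratic expansion behind (3.43) is exact rather than an estimate: whenever the linearization satisfies $\nabla c_i(x)^T d = -c_i(x)$, one gets $c_i(x+\alpha d) = (1-\alpha)c_i(x) + \alpha^2\|d\|^2$. A quick numeric check of this identity on a single ball constraint (the iterate $x$ below is a hypothetical example, with $d$ chosen along $x$ so that the linearized constraint holds):

```python
import math

def c(v):
    """Ball constraint c(x) = ||x||^2 - 1."""
    return sum(vi * vi for vi in v) - 1.0

x = [1.2, 0.4]                       # hypothetical infeasible iterate
s = sum(xi * xi for xi in x)         # ||x||^2
# Pick d along x so that grad c(x)^T d = 2 x^T d = -c(x).
t = -c(x) / (2.0 * s)
d = [t * xi for xi in x]

for alpha in (0.25, 0.5, 1.0):
    xa = [xi + alpha * di for xi, di in zip(x, d)]
    lhs = c(xa)
    rhs = (1 - alpha) * c(x) + alpha**2 * sum(di * di for di in d)
    assert math.isclose(lhs, rhs, abs_tol=1e-12)
print("quadratic expansion is exact for ball constraints")
```

Because the expansion is exact, the constant $M_h$ in (3.43) can be taken as $1$ for this constraint class, which is what makes the ball/sphere structure convenient in the line-search analysis.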

Proof. According to the hypothesis of this corollary, $A_{k_l}\equiv A$ for all $k_l$, where $A$ is a fixed index set independent of $k_l$. Recalling the definition of $A$ (i.e., $A_{k_l}$) and $x^{k_l}\to x^*$, we obtain that $c_i(x^*)<0$ for all $i\in I\setminus A$, and by the continuity of $c_i(x)$, there exists an open ball $B(x^*; r)$ of radius $r>0$ centered at $x^*$ such that for any $y\in B(x^*;r)$, $c_i(y)<0$, $i\in I\setminus A$. Again using $x^{k_l}\to x^*$ and $\|d^{k_l}\|\le M_d$ due to Remark 3.1, there exist a scalar $\bar\alpha>0$ and an integer $\bar k_l>0$ such that $c_i(x^{k_l}+\alpha d^{k_l})<0$, $i\in I\setminus A$, for all $\alpha\in(0,\bar\alpha]$ and all $k_l\ge\bar k_l$. Thus for all $\alpha\in(0,\bar\alpha]$ and $k_l\ge\bar k_l$,
$$h(x^{k_l}) = \left\|\begin{pmatrix} c_E(x^{k_l}) \\ \max\{c_A(x^{k_l}),0\}\end{pmatrix}\right\| \quad\text{and}\quad h(x^{k_l}+\alpha d^{k_l}) = \left\|\begin{pmatrix} c_E(x^{k_l}+\alpha d^{k_l}) \\ \max\{c_A(x^{k_l}+\alpha d^{k_l}),0\}\end{pmatrix}\right\|.$$
Following the proof of Lemma 3.5, for all $i\in E$ and for all $i\in A\cap I$, respectively,
$$|c_i(x^{k_l}+\alpha d^{k_l})| \le (1-\alpha)|c_i(x^{k_l})| + M_h\alpha^2\|d^{k_l}\|^2,$$
$$\max\{0, c_i(x^{k_l}+\alpha d^{k_l})\} \le (1-\alpha)\max\{0, c_i(x^{k_l})\} + M_h\alpha^2\|d^{k_l}\|^2,$$
and therefore (3.43) holds for all $\alpha\in(0,\bar\alpha]$ and $k_l\ge\bar k_l$. On the other hand, for those iterations with $k_l<\bar k_l$, it follows from Lemma 3.5 that (3.43) holds for all $\alpha\in(0,\alpha_u^{k_l}]$. Define $\alpha_u := \min\big\{\bar\alpha,\ \min_{k_l<\bar k_l}\alpha_u^{k_l}\big\}$. We therefore conclude that (3.43) holds for all $\alpha\in(0,\alpha_u]$, which completes the proof.

Define the quantity
$$\Upsilon_k := \begin{cases} \|d^{k,0}\|, & \text{FAST=TRUE}, \\ h(x^k) + |\nabla f(x^k)^T d^{k,2}|, & \text{FAST=FALSE}, \end{cases}$$
which is actually another first-order optimality measure due to Lemma 3.4. The proofs of the following lemmas and theorem are related to the optimality measure $\Upsilon_k$. In particular, the next lemma reveals that the search direction $d^k$ generated by Algorithm 3 is a descent direction for the objective function if a point is nearly feasible but non-optimal.

Lemma 3.7. Suppose that Assumptions (A1)-(A2) hold. Let $\{x^{k_l}\}$ be a subsequence of $\{x^k\}$ for which $\Upsilon_{k_l}\ge\epsilon$ with a constant $\epsilon>0$. Then there exist two scalars $\epsilon_1>0$ and $\epsilon_2>0$ such that the following statement is true:
$$h(x^{k_l})\le\epsilon_1 \;\Longrightarrow\; \nabla f(x^{k_l})^T d^{k_l} \le -\epsilon_2.$$

Proof. We first consider the case FAST=TRUE. In this situation, $\Upsilon_{k_l} = \|d^{k_l,0}\|\ge\epsilon$, and $(d^{k_l},\lambda^{k_l}) = (d^{k_l,0},\lambda^{k_l,0})$ solves (2.6).
Premultiplying the first equation of (2.6) by $(d^{k_l,0})^T$, we have that
$$\nabla f(x^{k_l})^T d^{k_l} = -(d^{k_l,0})^T B_{k_l} d^{k_l,0} - (d^{k_l,0})^T\nabla c_{A_{k_l}}(x^{k_l})\lambda^{k_l,0},$$
while premultiplying the second equation of (2.6) by $(\lambda^{k_l,0})^T$ and substituting it into the above equation yields
$$\nabla f(x^{k_l})^T d^{k_l} = -(d^{k_l,0})^T B_{k_l} d^{k_l,0} + \sum_{i\in A_{k_l}}\lambda^{k_l,0}_i c_i(x^{k_l}). \tag{3.44}$$
Due to FAST=TRUE, we have $\lambda^{k_l,0}\ge 0$, and using Remark 3.1 gives $\|\lambda^{k_l,0}\|\le M_\lambda$. It is straightforward that
$$\sum_{i\in A_{k_l}}\lambda^{k_l,0}_i c_i(x^{k_l}) \le m\, h(x^{k_l})\,\|\lambda^{k_l,0}\| \le m M_\lambda h(x^{k_l}),$$
which together with (3.44), Assumption (A2) and $\|d^{k_l,0}\|\ge\epsilon$ gives
$$\nabla f(x^{k_l})^T d^{k_l} \le -\frac{\epsilon^2}{\tau} + m M_\lambda h(x^{k_l}).$$

Let $\epsilon_1 := \frac{\epsilon^2}{2m\tau M_\lambda}$. If $h(x^{k_l})\le\epsilon_1$, we then obtain that $\nabla f(x^{k_l})^T d^{k_l}\le -\epsilon_2$, where $\epsilon_2 := \frac{\epsilon^2}{2\tau}$.

Next, we show the assertion for the case FAST=FALSE. In this situation, $d^{k_l}=d^{k_l,2}$ and $\Upsilon_{k_l} = h(x^{k_l}) + |\nabla f(x^{k_l})^T d^{k_l,2}|\ge\epsilon$. If $h(x^{k_l})\le\frac{\epsilon}{2}$, then
$$|\nabla f(x^{k_l})^T d^{k_l}| = |\nabla f(x^{k_l})^T d^{k_l,2}|\ge\frac{\epsilon}{2}. \tag{3.45}$$
From (3.35) and the definition of $u_{A_k}$,
$$\nabla f(x^{k_l})^T d^{k_l,2} = -(d^{k_l,1})^T B_{k_l} d^{k_l,1} - \sum_{\lambda^{k_l,1}_i<0,\, i\in A_{k_l}\cap I}\Big[(\lambda^{k_l,1}_i)^2 + \max\{-\lambda^{k_l,1}_i c_i(x^{k_l}),0\}\Big] + \sum_{\lambda^{k_l,1}_i\ge 0,\, i\in A_{k_l}\cap I}\lambda^{k_l,1}_i c_i(x^{k_l}) + \sum_{i\in E}\lambda^{k_l,1}_i c_i(x^{k_l}) \le m\, h(x^{k_l})\,\|\lambda^{k_l,1}\| \le m M_\lambda h(x^{k_l}), \tag{3.46}$$
where the last inequality follows from Remark 3.1. Let $\epsilon_1 := \min\big\{\frac{\epsilon}{2}, \frac{\epsilon}{3mM_\lambda}\big\}$ and $\epsilon_2 := \frac{\epsilon}{2}$. If $h(x^{k_l})\le\epsilon_1$, then $mM_\lambda h(x^{k_l})\le\frac{\epsilon}{3}$, which combined with (3.46) and (3.45) yields $\nabla f(x^{k_l})^T d^{k_l}\le-\epsilon_2$.

Lemma 3.8. Suppose that Assumptions (A1)-(A2) hold. If $h(x^{k_l})>0$ and $\nabla f(x^{k_l})^T d^{k_l}\le-\epsilon_2$ ($\epsilon_2$ is from Lemma 3.7), then the trial point $x^{k_l}(\alpha) := x^{k_l}+\alpha d^{k_l}$ is acceptable to the $k_l$th filter for all $\alpha\le\bar\alpha^{k_l}$, where $\bar\alpha^{k_l} = \min\{q_1 h(x^{k_l}), q_2, \alpha_u^{k_l}\}$, $q_1 = \frac{1}{M_h M_d^2}$ and $q_2 = \frac{\epsilon_2}{M_f M_d^2}$.

Proof. The mechanism of Algorithm 3 (lines 19-23) ensures that $(h(x^{k_l}), f(x^{k_l}))$ is acceptable to the $k_l$th filter. We now show that $x^{k_l}(\alpha)$ is no worse than $x^{k_l}$ for all sufficiently small $\alpha>0$ in both feasibility and the objective function, implying that $x^{k_l}(\alpha)$ is acceptable to the $k_l$th filter. Since $\|d^{k_l}\|\le M_d$ due to Remark 3.1, it follows from (3.36) in Lemma 3.5 that for $\alpha\in(0,\alpha_u^{k_l}]$
$$h(x^{k_l}(\alpha)) \le h(x^{k_l}) - \alpha h(x^{k_l}) + \alpha^2 M_h M_d^2,$$
which turns out to be $h(x^{k_l}(\alpha))\le h(x^{k_l})$ if $0\le\alpha\le\min\{q_1 h(x^{k_l}), \alpha_u^{k_l}\}$ with $q_1 := \frac{1}{M_h M_d^2}$. Similarly, using (3.37) in Lemma 3.5 and the boundedness of $d^{k_l}$, we have that
$$f(x^{k_l}(\alpha)) \le f(x^{k_l}) + \alpha\nabla f(x^{k_l})^T d^{k_l} + \alpha^2 M_f M_d^2,$$
which together with the assumption $\nabla f(x^{k_l})^T d^{k_l}\le-\epsilon_2$ yields
$$f(x^{k_l}(\alpha)) \le f(x^{k_l}) - \alpha\epsilon_2 + \alpha^2 M_f M_d^2.$$
Define $q_2 := \frac{\epsilon_2}{M_f M_d^2}$. If $0\le\alpha\le q_2$, then $f(x^{k_l}(\alpha))\le f(x^{k_l})$.
Therefore, the trial point $x^{k_l}+\alpha d^{k_l}$ is acceptable to the $k_l$th filter for all $\alpha\le\bar\alpha^{k_l} := \min\{q_1 h(x^{k_l}), \alpha_u^{k_l}, q_2\}$.

With the help of Lemma 3.8, the following two lemmas show that there always exists some acceptable step size $\alpha$ such that $x^k+\alpha d^k$ is accepted as an f-type iteration point under certain conditions.
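The lemmas that follow argue that backtracking over the trial steps $1, r, r^2, \dots$ always terminates at an accepted step, and that the first accepted step $\alpha_0$ is bounded below by $r$ times the acceptance threshold $\bar\alpha$ whenever every $\alpha\le\bar\alpha$ is acceptable. A minimal sketch of that mechanism with a generic acceptance predicate (the predicate, the ratio `r`, and the cutoff `alpha_min` below are hypothetical placeholders for Algorithm 3's Acceptance Rule and line-search parameters):

```python
def backtrack(accept, r=0.5, alpha_min=1e-12):
    """Return the first alpha in 1, r, r^2, ... that is accepted,
    or None once alpha falls below alpha_min."""
    alpha = 1.0
    while alpha > alpha_min:
        if accept(alpha):
            return alpha
        alpha *= r
    return None  # in the full algorithm: switch to feasibility restoration

# If acceptance is exactly "alpha <= abar", the accepted step satisfies
# alpha0 >= r * abar, since the previous (rejected) trial exceeded abar.
abar = 0.3
alpha0 = backtrack(lambda a: a <= abar)
print(alpha0)  # → 0.25
assert alpha0 >= 0.5 * abar
```

This lower bound on the accepted step is the mechanism that later lets the analysis compare $\alpha_0^{k_l}$ against $\alpha_{\min}^{k_l}$ and conclude that an f-type step is actually taken.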

Lemma 3.9. Suppose that Assumptions (A1)-(A2) hold. If $x^{k_l}$ is feasible but not optimal, then either $x^{k_l}+d^{k_l}+\hat d^{k_l}$ is an f-type iteration point or there exists $\alpha_0^{k_l}>\alpha_{\min}^{k_l}$ such that $x^{k_l}+\alpha_0^{k_l}d^{k_l}$ is an f-type iteration point.

Proof. The conclusion follows immediately if $x^{k_l}+d^{k_l}+\hat d^{k_l}$ is an f-type iteration point. Otherwise, we need to prove that $x^{k_l}+\alpha_0^{k_l}d^{k_l}$ is an f-type iteration point for some $\alpha_0^{k_l}>\alpha_{\min}^{k_l}$. Since $x^{k_l}$ is feasible but not optimal, we must have that $h(x^{k_l})=0$ and $\Upsilon_{k_l}\ge\epsilon$ with some scalar $\epsilon>0$. By the mechanism of Algorithm 3 (line 27) and Lemma 3.7, the condition (2.20) is always true if $h(x^l)=0$, and therefore only pairs with $h(x^l)>0$ can be added into the $k_l$th filter. Let
$$\pi^{k_l} := \min\{h(x^l) : (h(x^l), f(x^l))\in F_{k_l}\}.$$
According to Lemma 3.5 and $\|d^{k_l}\|\le M_d$,
$$h(x^{k_l}+\alpha d^{k_l}) \le \alpha^2 M_h M_d^2 \tag{3.47}$$
holds for all $\alpha\in(0,\alpha_u^{k_l}]$. If $0\le\alpha\le\min\big\{\alpha_u^{k_l}, \frac{\beta\pi^{k_l}}{M_h M_d^2}\big\}$, then $h(x^{k_l}+\alpha d^{k_l})\le\beta\pi^{k_l}$, which implies that $x^{k_l}+\alpha d^{k_l}$ is acceptable to the $k_l$th filter. Since $x^{k_l}$ is feasible, it follows from the definition of $\Omega_p$ that $x^{k_l}$ is in the interior of $\Omega_p$, which together with the boundedness of $d^{k_l}$ shows $x^{k_l}+\alpha d^{k_l}\in\Omega_p$ for all $\alpha$ in some subinterval of $(0,1]$, and therefore we can assume, without loss of generality, that $x^{k_l}+\alpha d^{k_l}\in\Omega_p$ for all $\alpha\in(0,\alpha_u^{k_l}]$. By Lemma 3.7, $h(x^{k_l})=0$ implies
$$\nabla f(x^{k_l})^T d^{k_l} \le -\epsilon_2, \tag{3.48}$$
which means that the switching condition for Case 1 and Case 2 holds trivially no matter whether $\alpha<1$ or $\alpha=1$. It follows from (3.37) in Lemma 3.5 and the boundedness of $d^{k_l}$ that
$$f(x^{k_l}+\alpha d^{k_l}) \le f(x^{k_l}) + \alpha\eta\nabla f(x^{k_l})^T d^{k_l} - \alpha(1-\eta)\epsilon_2 + \alpha^2 M_f M_d^2.$$
Thus, the sufficient reduction condition (2.21) holds if $0\le\alpha\le\frac{(1-\eta)\epsilon_2}{M_f M_d^2}$. When $0\le\alpha\le\min\big\{\alpha_u^{k_l}, \frac{\eta\epsilon_2}{\gamma M_h M_d^2}\big\}$, it is true from (3.47) that $\gamma h(x^{k_l}+\alpha d^{k_l})\le\alpha\eta\epsilon_2$. Combining with (2.21) and (3.48) yields
$$f(x^{k_l}+\alpha d^{k_l}) \le f(x^{k_l}) - \alpha\eta\epsilon_2 \le f(x^{k_l}) - \gamma h(x^{k_l}+\alpha d^{k_l}),$$
i.e., $x^{k_l}+\alpha d^{k_l}$ is acceptable to $x^{k_l}$.
From (2.24) and the above proof, we have $\alpha_{\min}^{k_l}=0$, and we can choose any $\alpha\in(0,\bar\alpha^{k_l}]$ as $\alpha_0^{k_l}$ such that $x^{k_l}+\alpha_0^{k_l}d^{k_l}$ is an f-type iteration point, where
$$\bar\alpha^{k_l} := \min\Big\{\alpha_u^{k_l},\ \frac{\beta\pi^{k_l}}{M_h M_d^2},\ \frac{(1-\eta)\epsilon_2}{M_f M_d^2},\ \frac{\eta\epsilon_2}{\gamma M_h M_d^2}\Big\}.$$

Lemma 3.10. Suppose that Assumptions (A1)-(A2) hold. Let $\{x^{k_l}\}$ be an infinite subsequence of $\{x^k\}$ on which $(h(x^{k_l}), f(x^{k_l}))$ is added into the filter, and assume that $\{x^{k_l}\}$ converges to $x^*$ and $A_{k_l}$ keeps unchanged for all $k_l$. If $x^*$ is not a KKT point, then for all sufficiently large $k_l$, either $x^{k_l}+d^{k_l}+\hat d^{k_l}$ is an f-type iteration point or there exists $\alpha_0^{k_l}>\alpha_{\min}^{k_l}$ such that $x^{k_l}+\alpha_0^{k_l}d^{k_l}$ is an f-type iteration point.

Proof. If $x^{k_l}+d^{k_l}+\hat d^{k_l}$ is an f-type iteration point, the conclusion follows immediately. It thus suffices to prove the assertion for $x^{k_l}+\alpha_0^{k_l}d^{k_l}$. Since $x^*$ is not a KKT point, it follows from Remark 3.2 that there exists a scalar

$\epsilon>0$ such that $\Upsilon_{k_l}\ge\epsilon$ for all sufficiently large $k_l$. In the case of $h(x^{k_l})=0$, the conclusion follows from Lemma 3.9. We now consider the remaining iterations $k_l$ with $h(x^{k_l})>0$. As $\Upsilon_{k_l}\ge\epsilon$, if $h(x^{k_l})\le\epsilon_1$, then by Lemma 3.7,
$$\nabla f(x^{k_l})^T d^{k_l}\le-\epsilon_2. \tag{3.49}$$
If $h(x^{k_l})<\epsilon_1$ and $\alpha\le\min\{q_1 h(x^{k_l}), \alpha_u^{k_l}, q_2\}$, it follows from Lemma 3.8 that $x^{k_l}+\alpha d^{k_l}$ is acceptable to the $k_l$th filter. Since $\{x^{k_l}\}$ converges to $x^*$ and $A_{k_l}$ keeps unchanged for all $k_l$, Corollary 3.6 implies that $\alpha_u^{k_l}$ is independent of $k_l$, and we thereby drop the superscript $k_l$ in $\alpha_u^{k_l}$ for the simplicity of the following proof. Analogous to the proof of Lemma 3.9, if $0\le\alpha\le\frac{(1-\eta)\epsilon_2}{M_f M_d^2}$, the sufficient reduction condition (2.21) is fulfilled. Again using Corollary 3.6 and the boundedness of $d^{k_l}$, for $0\le\alpha\le\alpha_u$,
$$h(x^{k_l}+\alpha d^{k_l}) \le h(x^{k_l}) - \alpha h(x^{k_l}) + \alpha^2 M_h M_d^2,$$
and therefore $h(x^{k_l}+\alpha d^{k_l})\le h(x^{k_l})$ if $0\le\alpha\le\min\{q_1 h(x^{k_l}), \alpha_u\}$, where $q_1$ is defined as in Lemma 3.8. On the other hand, if (2.20) is true, it follows from (2.21) that
$$f(x^{k_l}+\alpha d^{k_l}) \le f(x^{k_l}) + \alpha\eta\nabla f(x^{k_l})^T d^{k_l} \le f(x^{k_l}) - \eta\delta h^2(x^{k_l}) \le f(x^{k_l}) - \gamma h^2(x^{k_l}),$$
where the last inequality follows from $\delta\ge\gamma/\eta$. Hence, $x^{k_l}+\alpha d^{k_l}$ is acceptable to $x^{k_l}$. Since $h(x^{k_l})\to 0$ due to Theorem 3.2, according to the definition of $\Omega_p$, $x^{k_l}$ is in the interior of $\Omega_p$ for all sufficiently large $k_l$, which together with the boundedness of $d^{k_l}$ implies $x^{k_l}+\alpha d^{k_l}\in\Omega_p$ for all $\alpha$ in some subinterval of $(0,1]$ and all sufficiently large $k_l$; we assume, without loss of generality, that $x^{k_l}+\alpha d^{k_l}\in\Omega_p$ for all $\alpha\in(0,\alpha_u]$ and all sufficiently large $k_l$. Therefore, we have now shown that for all sufficiently large $k_l$, $x^{k_l}+\alpha d^{k_l}$ is acceptable to $x^{k_l}$ and to the $k_l$th filter, $x^{k_l}+\alpha d^{k_l}\in\Omega_p$, and the sufficient reduction condition (2.21) holds if (2.20) is satisfied, provided that
$$0\le\alpha\le\bar\alpha^{k_l} \quad\text{and}\quad h(x^{k_l})\le\min\Big\{1,\ \epsilon_1,\ \frac{rq_1\epsilon_2}{\delta}\Big\}, \quad\text{where}\quad \bar\alpha^{k_l} := \min\Big\{q_1 h(x^{k_l}),\ q_2,\ \alpha_u,\ \frac{(1-\eta)\epsilon_2}{M_f M_d^2}\Big\}, \tag{3.50}$$
and $r$ is from line 20 in Algorithm 3. Let $\alpha_0^{k_l}$ denote the first trial step size in the sequence $\{1, r, r^2, \dots\}$ that satisfies $\alpha\le\bar\alpha^{k_l}$.
In view of Theorem 3.2, $h(x^{k_l})$ tends to zero as $k_l\to\infty$, and therefore $\bar\alpha^{k_l} = q_1 h(x^{k_l})$ and (3.50) is satisfied for all sufficiently large $k_l$. It is evident that
$$\alpha_0^{k_l} \ge r\bar\alpha^{k_l} = rq_1 h(x^{k_l}) \tag{3.51}$$
for all sufficiently large $k_l$. Using (3.49) and (3.51) we have
$$-\alpha_0^{k_l}\nabla f(x^{k_l})^T d^{k_l} \ge rq_1\epsilon_2 h(x^{k_l}),$$
which together with (3.50) implies that the switching condition (2.20) for Case 1 is satisfied. Lastly, we show $\alpha_0^{k_l}>\alpha_{\min}^{k_l}$. Noting the definition (2.24) of $\alpha_{\min}^{k_l}$, and using (3.49) and Theorem 3.2, we know
$$\alpha_{\min}^{k_l} = \frac{\delta h^2(x^{k_l})}{-\nabla f(x^{k_l})^T d^{k_l}}$$


More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

More information

A New Penalty-SQP Method

A New Penalty-SQP Method Background and Motivation Illustration of Numerical Results Final Remarks Frank E. Curtis Informs Annual Meeting, October 2008 Background and Motivation Illustration of Numerical Results Final Remarks

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

MATH 4211/6211 Optimization Basics of Optimization Problems

MATH 4211/6211 Optimization Basics of Optimization Problems MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION

A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION A SHIFTED PRIMAL-DUAL PENALTY-BARRIER METHOD FOR NONLINEAR OPTIMIZATION Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-19-3 March

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

REGULARIZED SEQUENTIAL QUADRATIC PROGRAMMING METHODS

REGULARIZED SEQUENTIAL QUADRATIC PROGRAMMING METHODS REGULARIZED SEQUENTIAL QUADRATIC PROGRAMMING METHODS Philip E. Gill Daniel P. Robinson UCSD Department of Mathematics Technical Report NA-11-02 October 2011 Abstract We present the formulation and analysis

More information

Some new facts about sequential quadratic programming methods employing second derivatives

Some new facts about sequential quadratic programming methods employing second derivatives To appear in Optimization Methods and Software Vol. 00, No. 00, Month 20XX, 1 24 Some new facts about sequential quadratic programming methods employing second derivatives A.F. Izmailov a and M.V. Solodov

More information

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION

A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION Optimization Technical Report 02-09, October 2002, UW-Madison Computer Sciences Department. E. Michael Gertz 1 Philip E. Gill 2 A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION 7 October

More information

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30

Optimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained

More information

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen

Numerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen

More information

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA

Survey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA Survey of NLP Algorithms L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA NLP Algorithms - Outline Problem and Goals KKT Conditions and Variable Classification Handling

More information

5 Quasi-Newton Methods

5 Quasi-Newton Methods Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints

A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints Journal of Computational and Applied Mathematics 161 (003) 1 5 www.elsevier.com/locate/cam A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality

More information

Lecture 15: SQP methods for equality constrained optimization

Lecture 15: SQP methods for equality constrained optimization Lecture 15: SQP methods for equality constrained optimization Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 15: SQP methods for equality constrained

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

HYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION. Yueling Loh

HYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION. Yueling Loh HYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION by Yueling Loh A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore,

More information

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties

A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical

More information

Constrained Nonlinear Optimization Algorithms

Constrained Nonlinear Optimization Algorithms Department of Industrial Engineering and Management Sciences Northwestern University waechter@iems.northwestern.edu Institute for Mathematics and its Applications University of Minnesota August 4, 2016

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

POWER SYSTEMS in general are currently operating

POWER SYSTEMS in general are currently operating TO APPEAR IN IEEE TRANSACTIONS ON POWER SYSTEMS 1 Robust Optimal Power Flow Solution Using Trust Region and Interior-Point Methods Andréa A. Sousa, Geraldo L. Torres, Member IEEE, Claudio A. Cañizares,

More information

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION

A SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION A SHIFTED RIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OTIMIZATION hilip E. Gill Vyacheslav Kungurtsev Daniel. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-18-1 February 1, 2018

More information

Nonlinear Programming

Nonlinear Programming Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week

More information

Multidisciplinary System Design Optimization (MSDO)

Multidisciplinary System Design Optimization (MSDO) Multidisciplinary System Design Optimization (MSDO) Numerical Optimization II Lecture 8 Karen Willcox 1 Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Today s Topics Sequential

More information

MODIFYING SQP FOR DEGENERATE PROBLEMS

MODIFYING SQP FOR DEGENERATE PROBLEMS PREPRINT ANL/MCS-P699-1097, OCTOBER, 1997, (REVISED JUNE, 2000; MARCH, 2002), MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY MODIFYING SQP FOR DEGENERATE PROBLEMS STEPHEN J. WRIGHT

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints

An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints Klaus Schittkowski Department of Computer Science, University of Bayreuth 95440 Bayreuth, Germany e-mail:

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

1. Introduction. We consider the general smooth constrained optimization problem:

1. Introduction. We consider the general smooth constrained optimization problem: OPTIMIZATION TECHNICAL REPORT 02-05, AUGUST 2002, COMPUTER SCIENCES DEPT, UNIV. OF WISCONSIN TEXAS-WISCONSIN MODELING AND CONTROL CONSORTIUM REPORT TWMCC-2002-01 REVISED SEPTEMBER 2003. A FEASIBLE TRUST-REGION

More information

A STABILIZED SQP METHOD: GLOBAL CONVERGENCE

A STABILIZED SQP METHOD: GLOBAL CONVERGENCE A STABILIZED SQP METHOD: GLOBAL CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-13-4 Revised July 18, 2014, June 23,

More information

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)

Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL) Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x

More information

IBM Research Report. Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence

IBM Research Report. Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence RC23036 (W0304-181) April 21, 2003 Computer Science IBM Research Report Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence Andreas Wächter, Lorenz T. Biegler IBM Research

More information

Recent Adaptive Methods for Nonlinear Optimization

Recent Adaptive Methods for Nonlinear Optimization Recent Adaptive Methods for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke (U. of Washington), Richard H. Byrd (U. of Colorado), Nicholas I. M. Gould

More information

Unconstrained optimization

Unconstrained optimization Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout

More information

Constrained Optimization and Lagrangian Duality

Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

A globally convergent Levenberg Marquardt method for equality-constrained optimization

A globally convergent Levenberg Marquardt method for equality-constrained optimization Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov

More information

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

Lectures 9 and 10: Constrained optimization problems and their optimality conditions Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

Stochastic Optimization Algorithms Beyond SG

Stochastic Optimization Algorithms Beyond SG Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Numerical Optimization

Numerical Optimization Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,

More information

Constrained Optimization Theory

Constrained Optimization Theory Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August

More information

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS

LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,

More information

On the complexity of an Inexact Restoration method for constrained optimization

On the complexity of an Inexact Restoration method for constrained optimization On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization

More information