A Filter Active-Set Algorithm for Ball/Sphere Constrained Optimization Problem
Chungen Shen, Lei-Hong Zhang, Wei Hong Yang

September 6, 2014

Abstract. In this paper, we propose a filter active-set algorithm for the minimization problem over a product of multiple ball/sphere constraints. By making effective use of the special structure of the ball/sphere constraints, a new limited memory BFGS (L-BFGS) scheme is presented. The new L-BFGS implementation takes advantage of the sparse structure of the Jacobian of the constraints and generates curvature information of the minimization problem. At each iteration, only two or three reduced linear systems need to be solved for the search direction. The filter technique, combined with a backtracking line search strategy, ensures global convergence, and local superlinear convergence can also be established under mild conditions. The algorithm is applied to two specific applications: the nearest correlation matrix with factor structure and the maximal correlation problem. Our numerical experiments indicate that the proposed algorithm is competitive with some recently custom-designed methods for each individual application.

Keywords. SQP, active set, filter, L-BFGS, ball/sphere constraints, the nearest correlation matrix with factor structure, the maximal correlation problem

AMS subject classification. 65K05, 90C30

1 Introduction

In this paper, we consider a class of optimization problems of minimizing an (at least) twice continuously differentiable function (possibly nonconvex) f(x) : Rⁿ → R over a product of multiple ball/sphere constraints. Upon rescaling the balls/spheres, we cast such minimization problems, without loss of generality, in the following form:

  (BCOP)  min_{x∈Rⁿ} f(x)
          s.t. c_i(x) := ‖x_[i]‖² − 1 = 0, i ∈ E,
               c_i(x) := ‖x_[i]‖² − 1 ≤ 0, i ∈ I,

where E = {1, 2, ..., m₁}, I = {m₁+1, m₁+2, ..., m}, x_[i] ∈ R^{p_i}, x = (x_[1]ᵀ, x_[2]ᵀ, ..., x_[m]ᵀ)ᵀ, and n = Σ_{i=1}^m p_i.
Here, we introduce the notation x_[i] ∈ R^{p_i} to represent the ith sub-vector of x ∈ Rⁿ, and formulate the product of multiple ball/sphere constraints as a set of equality and inequality constraints. To simplify the subsequent presentation, we call the above program the ball/sphere constrained optimization problem (BCOP).

This research is supported by the National Natural Science Foundation of China (Nos. , , and ).

Department of Applied Mathematics, Shanghai Finance University, Shanghai 201209, China.
School of Mathematics, Shanghai University of Finance and Economics, Shanghai 200433, China.
Department of Mathematics, Fudan University, Shanghai 200433, China.
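The block structure of the constraints in (BCOP) is simple enough to state in a few lines of code. The following sketch (Python with NumPy; the function name and the block-size argument p are our own illustrative choices, not from the paper) evaluates c(x) and the Jacobian ∇c(x), whose ith column carries 2x_[i] in block i and zeros elsewhere:

```python
import numpy as np

def bcop_constraints(x, p):
    """Evaluate c_i(x) = ||x_[i]||^2 - 1 for each block, together with the
    Jacobian grad c(x) in R^{n x m}: column i holds 2*x_[i] in the rows of
    block i and zeros elsewhere.  p = (p_1, ..., p_m) lists the block sizes."""
    off = np.concatenate(([0], np.cumsum(p)))   # block offsets, off[-1] = n
    m = len(p)
    c = np.empty(m)
    J = np.zeros((off[-1], m))                  # dense here; sparse in practice
    for i in range(m):
        xi = x[off[i]:off[i + 1]]
        c[i] = xi @ xi - 1.0
        J[off[i]:off[i + 1], i] = 2.0 * xi
    return c, J
```

Since each column of ∇c(x) is supported on a single block, the whole matrix has at most n nonzero entries; this is the sparsity exploited throughout Section 2.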
The reason that we are interested in BCOP is twofold: on the one hand, many practical applications arising recently from, for example, correlation matrix approximation with factor structure [3, ], factor models of asset returns [9], collateralized debt obligations [, 10], multivariate time series [5] and the maximal correlation problem [7, 43, 44] can be recast in this form; on the other hand, general algorithms for nonlinearly constrained optimization may not be efficient, as they generally do not take much advantage of the special structure of BCOP. Therefore, a custom-made algorithm for BCOP can provide a uniform and much more efficient tool for these applications. Relying upon the framework of the sequential quadratic programming (SQP) method, e.g., [4, 16, 17, 18, 4, 7, 35, 36], and making heavy use of the special structure of BCOP, we refine the SQP method into a custom-made implementation.

It is known that SQP is one of the most widely used methods for general nonlinearly constrained optimization. In particular, it generates steps by solving quadratic subproblems (QPs). The traditional SQP method (see, e.g., [16]) takes a certain penalty function as the merit function to determine whether a trial step is accepted. One known problem in this procedure is that a suitable penalty parameter is difficult to set. To get around this trouble, Fletcher and Leyffer [13] introduced the filter technique to globalize the SQP method, which turns out to be very efficient and effective, and is proved to be globally convergent [1, 14]. The filter technique was later applied to various problems and combined with other methods; examples include Ulbrich et al. [37], Karas et al. [1], Ribeiro et al. [3], Wächter and Biegler [38, 39, 40], etc. Unfortunately, when directly applied to solve BCOP, the classical SQP method based on QP subproblems encounters numerical difficulties as m and p_i get large.
For instance, in the problem of the nearest correlation matrix with p = p_i (i = 1, 2, ..., m) factor structure [3, ] to be discussed in Section 5 (see (5.68)), solving the corresponding QP subproblem is both time-consuming and memory-demanding as m and p increase. It is nearly intractable with dimensions, say, m = 500, p = 50. As indicated in [3], both the Newton method and the classical SQP method fail to solve BCOP when m and p are large. The spectral projected gradient method (SPGM) is thus proposed in [3] to alleviate this heavy computational burden, as it uses less memory and lower computational cost at each iteration. The numerical results of [3] show that SPGM is efficient for many medium-scale test instances, but the number of iterations can vary drastically from instance to instance, and SPGM can perform worse when p is close to m than in other situations.

Fortunately, the standard SQP method can be improved substantially for BCOP by exploiting the special structure of the constraints. One remarkable feature is that the Jacobian matrix ∇c(x) is sparse and structured, which can be utilized to reduce the computational cost and memory requirements at each iteration. To do so, we employ the active-set technique [4, 41] to estimate the active set of inequalities associated with the minimizer and then, similar to QP-free methods [6, 15, 9, 30, 34, 41, 4], transform the QP subproblem into relevant linear system(s). As m and p get large, the size of the resulting linear system naturally becomes large too, but the limited memory BFGS (L-BFGS) [3] plus the duality technique [36] can be employed effectively, which dramatically reduces the computational costs and memory requirements for the associated linear systems. By counting the detailed computational complexity of this procedure, we will see that a large number of flops is saved at each iteration.
On the other hand, fast local convergence can be preserved thanks to the SQP framework and the L-BFGS technique, and global convergence is also guaranteed with the aid of the filter technique. We apply this implementation to two specific practical applications, the correlation approximation problem [3, ] and the maximal correlation problem [7], in Section 5; our numerical experiments demonstrate that the proposed method is robust and efficient, and is competitive with some recently custom-designed methods for each individual application, including SPGM, the block relaxation method [3] and the majorization method [3] for the correlation approximation problem, and the Riemannian trust-region method [44] for the maximal correlation problem.

The rest of this paper is organized as follows. In the first part of Section 2, we reformulate the QP subproblem into a relevant linear system by duality, and then introduce the L-BFGS technique to alleviate
the computational burden in solving these linear systems; the detailed implementation exploiting the sparsity of the Jacobian ∇c(x) is stated; we then discuss the filter technique used to globalize the SQP method; the overall algorithm is presented in the last part of Section 2. In Sections 3 and 4, we establish the global convergence and the local convergence rate of the proposed algorithm, respectively. The numerical experiments on the two specific applications are carried out in Section 5, where we report our numerical experiences by comparing the performance of our algorithm with others. Concluding remarks are finally drawn in Section 6.

A few words on notation. We denote the feasible region of BCOP by Ω := {x | c_i(x) = 0, i ∈ E; c_i(x) ≤ 0, i ∈ I}. For the constraint functions c_i(x), i = 1, 2, ..., m, we let c(x) = (c_1(x), ..., c_m(x))ᵀ : Rⁿ → R^m and ∇c(x) = (∇c_1(x), ..., ∇c_m(x)) ∈ R^{n×m}; for a particular index subset J = {i_1, i_2, ..., i_j} of {1, 2, ..., m}, we denote by |J|_c the cardinality of J and write c_J(x) = (c_{i_1}(x), ..., c_{i_j}(x))ᵀ : Rⁿ → R^j and ∇c_J(x) = (∇c_{i_1}(x), ..., ∇c_{i_j}(x)) ∈ R^{n×j}; the definitions of c_E(x) and c_I(x) follow naturally. Finally, suppose {η_k} and {ν_k} are two vanishing sequences, where η_k, ν_k ∈ R, k ∈ N; we write η_k = O(ν_k) if there exists a scalar c > 0 such that |η_k| ≤ c|ν_k| for all k sufficiently large, η_k = o(ν_k) if lim_{k→+∞} η_k/ν_k = 0, and η_k = Θ(ν_k) if both ν_k = O(η_k) and η_k = O(ν_k) hold.

2 Algorithm

2.1 The working set

We begin with the first-order optimality (KKT) conditions, which can be written as

  ∇_x L(x, λ) = ∇f(x) + ∇c(x)λ = 0,  (2.1)
  λ_i c_i(x) = 0, i ∈ I,  (2.2)
  c_i(x) ≤ 0, λ_i ≥ 0, i ∈ I,  (2.3)
  c_i(x) = 0, i ∈ E,  (2.4)

where L(x, λ) := f(x) + c(x)ᵀλ is the Lagrangian function and λ ∈ R^m is the Lagrange multiplier. As our method is based on the active-set approach, we next state the strategy used to identify the active set.
To this end, similar to [11, 19, 8], we first introduce the function φ : R^{n+m} → R,

  φ(x, λ) = ‖Ψ(x, λ)‖,

where Ψ : R^{n+m} → R^{n+m} is defined by

  Ψ(x, λ) = ( ∇_x L(x, λ) ; c_E(x) ; min{−c_I(x), λ_I} ).
Thus the set

  A_I(x, λ) = { i ∈ I | c_i(x) ≥ −min{φ(x, λ), 10⁻⁶} }  (2.5)

provides an estimate of the active set I(x*) = {i | c_i(x*) = 0, i ∈ I} of inequality constraints, where (x*, λ*) is the KKT point at the minimizer of BCOP. When (x, λ) is sufficiently close to (x*, λ*), the estimate A_I(x, λ) is accurate, provided that both the Mangasarian-Fromovitz constraint qualification (MFCQ) and the second-order sufficient condition (SOSC) hold at (x*, λ*) (see [8, Theorem 2.2]). Now, supposing the current iterate (x^k, λ^k) is an approximation to (x*, λ*), we define

  A_k := A_I(x^k, λ^k) ∪ E  (2.6)

as our working set, which includes all equality constraints, the nearly active inequality constraints, and the violated inequality constraints. This choice of the working set is similar to [15, 41, 4] and is based on the following observations: it is reasonable to include i ∈ I whenever c_i(x^k) is close to zero (say c_i(x^k) ≥ −10⁻⁶); as for equality constraints and violated inequality constraints, we include them in the working set in the hope of reducing the violation. After identifying the working set A_k, a QP subproblem can be formulated which, by the QP-free technique [6, 15, 9, 30, 34, 41, 4], can alternatively be solved via relevant linear system(s) (details on the linear systems are discussed in the next subsection). The solution of the resulting linear system yields the search direction and generates curvature information of BCOP at (x^k, λ^k). One issue related to the linear system is its consistency, which is equivalent to the linear independence of the gradients of the constraints in the working set A_k. Due to the structure of BCOP, we prove in Lemma 2.1 that ∇c_{A_k}(x^k) has full column rank as long as x^k is confined to the set Ω_p := {x | ‖x_[i]‖ ≥ 0.5 for all i ∈ E}.
Based on this fact, our choice of the working set A_k does not invoke any complicated procedure like those in [34, 41, 4], where the working set I_k must be determined by calculating the rank of ∇c_{I_k}(x^k) or the determinant of ∇c_{I_k}(x^k)ᵀ∇c_{I_k}(x^k) for each trial estimate I_k until ∇c_{I_k}(x^k) has full column rank.

Lemma 2.1. If x^k ∈ Ω_p, then the vectors ∇c_i(x^k), i ∈ A_k, are linearly independent, where A_k is defined in (2.5)-(2.6).

Proof. Since x^k ∈ Ω_p, it follows that ‖x^k_[i]‖ ≥ 0.5 for all i ∈ E, and therefore x^k_[i] ≠ 0 for all i ∈ E. For i ∈ A_k \ E, c_i(x^k) ≥ −10⁻⁶, and therefore x^k_[i] ≠ 0. Suppose that there exist scalars l_i ∈ R, i ∈ A_k, such that Σ_{i∈A_k} l_i ∇c_i(x^k) = 0. Note that the block of Σ_{i∈A_k} l_i ∇c_i(x^k) corresponding to the ith constraint is 2 l_i x^k_[i]. Because x^k_[i] ≠ 0 for all i ∈ A_k, we have l_i = 0 for all i ∈ A_k, which implies that ∇c_i(x^k), i ∈ A_k, are linearly independent.

Analogously, we have the following lemma.

Lemma 2.2. Let the subsequence {x^{k_l}} of {x^k} with {x^k} ⊂ Ω_p converge to x*, and let A_{k_l} ≡ A for all sufficiently large l. Then ∇c_A(x*) has full column rank.

Proof. Since x^{k_l} ∈ Ω_p and x^{k_l} → x*, we have ‖x*_[i]‖ ≥ 0.5 for all i ∈ E, and therefore x*_[i] ≠ 0 for all i ∈ E. For i ∈ A \ E, c_i(x^{k_l}) ≥ −10⁻⁶, and hence c_i(x*) ≥ −10⁻⁶ as k_l → ∞. By the definition of c(x), we then also have x*_[i] ≠ 0 for all i ∈ A ∩ I. Analogously to the proof of Lemma 2.1, ∇c_i(x*), i ∈ A, are linearly independent, as was to be shown.
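As a small illustration of (2.5)-(2.6), the working-set estimate can be coded directly from the residual function φ. This is a minimal sketch under our own naming conventions (the helper and its argument layout are not from the paper), taking the pieces of Ψ(x, λ) as inputs:

```python
import numpy as np

def working_set(c, lam, gradL, m1, tol=1e-6):
    """Return A_k = A_I(x, lam) union E of (2.5)-(2.6), as a sorted index list.
    c: all m constraint values at x; lam: multiplier estimate in R^m;
    gradL: grad_x L(x, lam); m1: number of equality constraints."""
    cE, cI, lamI = c[:m1], c[m1:], lam[m1:]
    # phi(x, lam) = || (grad_x L, c_E, min{-c_I, lam_I}) ||
    phi = np.sqrt(gradL @ gradL + cE @ cE
                  + np.sum(np.minimum(-cI, lamI) ** 2))
    thresh = min(phi, tol)
    A_I = [i for i in range(m1, len(c)) if c[i] >= -thresh]
    return list(range(m1)) + A_I      # equalities first, then near-active ones
```

Note that no rank computation is needed: by Lemma 2.1, the gradients indexed by the returned set are automatically linearly independent for x ∈ Ω_p.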
2.2 The QP subproblem and its reformulation

In this and the next subsections, we discuss how to compute the search direction at x^k. After the working set A_k is determined, the search direction d^k and its associated Lagrange multiplier λ^k can be determined by solving (probably two or three, with different perturbation vectors w_k ∈ R^m̄, where m̄ = |A_k|_c) equality-constrained QP subproblem(s) of the form

  min_{d∈Rⁿ} (1/2) dᵀB_k d + ∇f(x^k)ᵀd  s.t. ∇c_{A_k}(x^k)ᵀd = w_k,  (2.7)

where B_k ∈ R^{n×n} is symmetric positive definite and approximates the Hessian of the Lagrangian L(x^k, λ^k). We point out that B_k can be updated by the BFGS formula [7]. The strategy of choosing different perturbations w_k is similar to [4, 41]; they correspond to two types of search directions d^k, designed respectively for global convergence and for local superlinear convergence. To simplify the subsequent presentation, we distinguish these two cases by a boolean variable FAST, i.e., FAST=FALSE or FAST=TRUE, respectively. Details of the choice of w_k are postponed to Algorithm 3 and Remark 2.2; we next discuss an efficient procedure for computing the solution d^k of (2.7). It is evident that the equality-constrained quadratic program (2.7) is equivalent to the linear system

  B_k d + ∇c_{A_k}(x^k)λ = −∇f(x^k),
  ∇c_{A_k}(x^k)ᵀd = w_k.  (2.8)

However, as n gets large, solving the linear system (2.8) can be expensive. In addition, without effectively exploiting the underlying sparse structure, the associated coefficient matrix could occupy too much memory. To resolve these numerical difficulties, we make use of the duality technique and solve the dual problem of (2.7),

  max_{λ∈R^m̄} −(1/2) λᵀW_k λ + b_kᵀλ.  (2.9)

Note that (2.9) is an unconstrained optimization problem of relatively small size m̄, where

  W_k = ∇c_{A_k}(x^k)ᵀ B_k⁻¹ ∇c_{A_k}(x^k),  (2.10)
  b_k = −w_k − ∇c_{A_k}(x^k)ᵀ B_k⁻¹ ∇f(x^k).  (2.11)
Note that B_k is positive definite and therefore strong duality holds, which implies that the search direction d^k and the estimate λ^k of the associated Lagrange multiplier can be obtained from (2.9) instead of (2.7). In particular, observing that W_k ∈ R^{m̄×m̄} and m̄ ≤ m is much smaller than n, solving the KKT condition of (2.9) or, equivalently, solving the much smaller linear system

  W_k λ = b_k  (2.12)

is inexpensive. Once λ^k is obtained from (2.12), substituting it into the first equation of (2.8) yields

  d^k = −B_k⁻¹( ∇f(x^k) + ∇c_{A_k}(x^k)λ^k ).  (2.13)

The above procedure resolves most numerical difficulties. The last issue is how to calculate W_k efficiently. The idea is to adopt the L-BFGS technique, which is the topic of the next subsection.
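For moderate sizes, the chain (2.10)-(2.13) can be checked with dense linear algebra. The sketch below is our own illustrative code (in the algorithm proper, B_k⁻¹ is never formed but applied via L-BFGS as in the next subsection): it forms W_k and b_k, solves (2.12), and recovers d^k by (2.13):

```python
import numpy as np

def qp_via_dual(B, A, g, w):
    """Solve min 1/2 d'Bd + g'd  s.t.  A'd = w  through the dual (2.9):
    W = A' B^{-1} A and b = -w - A' B^{-1} g as in (2.10)-(2.11),
    solve W lam = b (eq. (2.12)), then d = -B^{-1}(g + A lam) (eq. (2.13))."""
    Binv_g = np.linalg.solve(B, g)      # B^{-1} grad f
    Binv_A = np.linalg.solve(B, A)      # B^{-1} grad c_{A_k}
    W = A.T @ Binv_A
    b = -w - A.T @ Binv_g
    lam = np.linalg.solve(W, b)
    d = -(Binv_g + Binv_A @ lam)
    return d, lam
```

One can verify that the returned pair satisfies the primal-dual system (2.8): B d + ∇c λ = −∇f and ∇cᵀ d = w.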
2.3 Computing the search direction based on the L-BFGS formula

The limited memory BFGS method [7, Chapter 9] is one of the most effective and widely used methods in the field of large-scale unconstrained optimization. Its main advantage is that the L-BFGS approach does not require calculating or storing a full Hessian matrix, which might be too expensive for large-scale problems. For BCOP, we have pointed out that the matrix W_k = ∇c_{A_k}(x^k)ᵀB_k⁻¹∇c_{A_k}(x^k) in (2.10) needs to be computed. Note that ∇c_{A_k}(x^k) is large but sparse and structured, and if we adopt the L-BFGS formula to update the inverse of the Hessian approximation B_k, much storage space and computational cost can be saved. To describe the detailed procedure, let

  S_k = [s_{k−l}, ..., s_{k−1}],  Y_k = [y_{k−l}, ..., y_{k−1}],

where s_i = x^{i+1} − x^i and y_i = ∇_x L(x^{i+1}, λ^i) − ∇_x L(x^i, λ^i), i = k−l, ..., k−1. One may notice that the solution λ^i of (2.12) lies in R^m̄ rather than in R^m, so plugging λ^i into ∇_x L(x^i, λ^i) is, strictly speaking, inappropriate. Nevertheless, we can augment λ^i by setting λ^i_j = 0 for j ∈ I \ A_i. With this augmentation, in what follows we use λ^i to denote the estimated multiplier in R^m whenever no confusion is caused. By the L-BFGS formula, the matrix B_k resulting from l updates to the basic matrix B_k^0 = ν_k I is given by

  B_k = ν_k I − [ ν_k S_k  Y_k ] [ ν_k S_kᵀS_k  L_k ; L_kᵀ  −D_k ]⁻¹ [ ν_k S_kᵀ ; Y_kᵀ ],

where L_k, D_k ∈ R^{l×l} are defined by

  (L_k)_{i,j} = { s_{k−l−1+i}ᵀ y_{k−l−1+j}, if i > j; 0, otherwise },
  D_k = diag(s_{k−l}ᵀy_{k−l}, ..., s_{k−1}ᵀy_{k−1}),

and ν_k = (y_{k−1}ᵀy_{k−1})/(s_{k−1}ᵀy_{k−1}). To ensure the positive definiteness of B_{k+1}, we adopt the so-called damped BFGS technique to modify y_k so that s_kᵀy_k is sufficiently positive. Let y_k ← θ_k y_k + (1 − θ_k)B_k s_k, where the scalar θ_k is defined as

  θ_k = { 1, if s_kᵀy_k ≥ 0.02 s_kᵀB_k s_k;
          (0.98 s_kᵀB_k s_k)/(s_kᵀB_k s_k − s_kᵀy_k), if s_kᵀy_k < 0.02 s_kᵀB_k s_k. }

We then use s_k and the modified y_k to update S_{k+1} and Y_{k+1}, respectively.
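The damping rule above is Powell's modification; a direct transcription (a sketch, with our own function name) reads:

```python
import numpy as np

def damped_y(s, y, B):
    """Damped BFGS correction: return theta*y + (1-theta)*B@s so that the
    modified pair satisfies s'y >= 0.02 * s'Bs, keeping the BFGS update
    positive definite (thresholds 0.02 / 0.98 as in the text)."""
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy >= 0.02 * sBs:
        return y                          # no damping needed, theta = 1
    theta = 0.98 * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * Bs
```

By construction, whenever damping is active the modified y satisfies s_kᵀy_k = 0.02 s_kᵀB_k s_k exactly, so the curvature condition never degenerates.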
Let H_k denote the inverse of B_k; the update formula for H_k is

  H_{k+1} = V_kᵀ H_k V_k + ρ_k s_k s_kᵀ,  (2.14)

where ρ_k = 1/(y_kᵀs_k) and V_k = I − ρ_k y_k s_kᵀ. Using the information (S_k and Y_k) of the last l iterations and choosing δ_k I with δ_k = 1/ν_k as the initial approximation H_k^0, we obtain by repeatedly applying (2.14) that H_k = H_k^f + H_k^s, where

  H_k^f = δ_k (V_{k−1}ᵀ ··· V_{k−l}ᵀ)(V_{k−l} ··· V_{k−1})

and

  H_k^s = ρ_{k−l} (V_{k−1}ᵀ ··· V_{k−l+1}ᵀ) s_{k−l} s_{k−l}ᵀ (V_{k−l+1} ··· V_{k−1})
        + ρ_{k−l+1} (V_{k−1}ᵀ ··· V_{k−l+2}ᵀ) s_{k−l+1} s_{k−l+1}ᵀ (V_{k−l+2} ··· V_{k−1})
        + ··· + ρ_{k−1} s_{k−1} s_{k−1}ᵀ.
For simplicity, we denote ∇c_{A_k}(x^k) by A_k (the context distinguishes the matrix from the index set). It then follows from (2.10) that

  W_k = A_kᵀ H_k A_k = A_kᵀ H_k^f A_k + A_kᵀ H_k^s A_k.  (2.15)

Since the matrix A_k is sparse (no more than n nonzero elements) and V_k is structured, we can carry out the matrix-chain multiplications for A_kᵀH_k^fA_k and A_kᵀH_k^sA_k rather efficiently by transforming the right-hand side of (2.15). In particular, it is straightforward that

  (V_{k−l} ··· V_{k−1}) A_k = A_k − ρ_{k−1} y_{k−1} s_{k−1}ᵀ A_k − ··· − ρ_{k−l+1} y_{k−l+1} s_{k−l+1}ᵀ (V_{k−l+2} ··· V_{k−1}) A_k − ρ_{k−l} y_{k−l} s_{k−l}ᵀ (V_{k−l+1} ··· V_{k−1}) A_k.

Let q_i = ρ_i s_iᵀ (V_{i+1} ··· V_{k−1}) A_k for i = k−l, ..., k−2 and q_{k−1} = ρ_{k−1} s_{k−1}ᵀ A_k. It then follows that

  A_kᵀ H_k^f A_k = δ_k A_kᵀA_k + Σ_{i=k−l}^{k−1} Σ_{j=k−l}^{k−1} δ_k (y_iᵀy_j) q_iᵀq_j − Σ_{i=k−l}^{k−1} δ_k ( q_iᵀ y_iᵀ A_k + A_kᵀ y_i q_i ).  (2.16)

Using the q_i, the last term in (2.15) can be rewritten as

  A_kᵀ H_k^s A_k = Σ_{i=k−l}^{k−1} q_iᵀ q_i / ρ_i.  (2.17)

Consequently, based on (2.16) and (2.17), the whole procedure for computing W_k = A_kᵀH_kA_k is summarized by the pseudo-code in Algorithm 1. The procedure between lines 2-13 computes W_k^s = A_kᵀH_k^sA_k, lines 15-25 compute W_k^f = A_kᵀH_k^fA_k, and line 26 finally forms W_k.

Remark 2.1. We finally estimate the computational complexity of computing W_k in Algorithm 1, assuming for simplicity that p_i = p for i = 1, 2, ..., m. Computing W_k^s = A_kᵀH_k^sA_k (lines 2-13) costs O(l²m̄p + lm̄²) flops, and computing W_k^f = A_kᵀH_k^fA_k (lines 15-25) costs O(l²m̄p + l²m̄²) flops (recall m̄ ≤ m). Note that mp = n, and this implies that for l ≪ n, the computation of W_k requires at most O(m̄² + n) flops. As for b_k and d^k, the main computational effort is to compute the matrix-vector product H_k z.
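The product H_k z itself is computed by the standard two-loop recursion (cf. [7, Algorithm 9.1]); a compact sketch (our own code, with the initialization H_k^0 = δ_k I) is:

```python
import numpy as np

def lbfgs_Hz(z, S, Y, delta):
    """Two-loop recursion for H_k @ z, where H_k is the L-BFGS inverse
    Hessian built from stored pairs (s_i, y_i), oldest first, H_k^0 = delta*I."""
    q = z.astype(float).copy()
    rhos = [1.0 / (y @ s) for s, y in zip(S, Y)]
    alphas = []
    for s, y, rho in zip(reversed(S), reversed(Y), reversed(rhos)):
        a = rho * (s @ q)
        q -= a * y
        alphas.append(a)                 # stored newest pair first
    r = delta * q
    for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return r
```

For a single stored pair this reproduces (2.14) applied once to H^0 = δI, which is a convenient unit check.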
Applying [7, Algorithm 9.1], it is easy to see that 6lmp = 6ln flops are required for computing H_k z, and therefore the computation of b_k in (2.11) and of d^k in (2.13) needs at most 12lmp + 6mp = (12l + 6)n flops.

2.4 The NLP filter

Given the search direction d^k, the step size α_k is the next important ingredient, determining the iterate x^{k+1} := x^k + α_k d^k.
Algorithm 1: Procedure for computing W_k based on the L-BFGS formula

Data: S_k, Y_k, A_k, δ_k
Result: W_k
1  % Compute W_k^s = A_kᵀH_k^sA_k
2  for i = k−l, ..., k−1 do
3    ρ_i = 1/(y_iᵀs_i);
4  end
5  W_k^s = 0;
6  for i = k−1, ..., k−l do
7    u = s_iᵀ;
8    for j = i, ..., k−2 do
9      u = u − ρ_{j+1}(u y_{j+1}) s_{j+1}ᵀ;
10   end
11   q_i = ρ_i u A_k;  % q_i = ρ_i s_iᵀ(V_{i+1} ··· V_{k−1})A_k for i = k−l, ..., k−2; q_{k−1} = ρ_{k−1} s_{k−1}ᵀ A_k
12   W_k^s = W_k^s + q_iᵀ(q_i/ρ_i);
13 end
14 % Compute W_k^f = A_kᵀH_k^fA_k
15 W_k^f = δ_k A_kᵀA_k;
16 for i = k−l, ..., k−1 do
17   for j = k−l, ..., i do
18     β = δ_k (y_iᵀy_j);
19     W_k^f = W_k^f + (βq_i)ᵀq_j;
20     if j < i then
21       W_k^f = W_k^f + q_jᵀ(βq_i);
22     end
23   end
24   W_k^f = W_k^f − (δ_k q_iᵀ)(y_iᵀA_k) − (A_kᵀy_i)(δ_k q_i);
25 end
26 W_k = W_k^f + W_k^s;

In choosing α_k, we use the filter method and a backtracking line search procedure. In particular, we generate a decreasing sequence of trial values α_k ∈ (α_min^k, 1] until a preset acceptance criterion is fulfilled or the feasibility restoration phase (Section 2.5) is called. Here, α_min^k ≥ 0 is a lower bound on α_k, for which an explicit formula is given in the next subsection. Let x̂ := x^k + α̂d^k, α̂ ∈ (α_min^k, 1], denote a trial point. Using

  h(x) = ‖( c_E(x) ; max{c_I(x), 0} )‖

as a measure of infeasibility at the point x, we now give the relevant definitions for the filter. The first one, Definition 2.1, is a variant of [14, (2.6)].

Definition 2.1. For given β ∈ (0, 1) and γ ∈ (0, 1), a trial point x̂ (or, equivalently, the pair (h(x̂), f(x̂))) is
acceptable to x^l (or, equivalently, to the pair (h(x^l), f(x^l))), if

  h(x̂) ≤ βh(x^l)  (2.18)

or

  f(x̂) ≤ f(x^l) − γ min{h(x̂), h(x̂)²}.  (2.19)

In the original paper of Fletcher and Leyffer [13], a pair (h(x̂), f(x̂)) is said to dominate (h(x^l), f(x^l)) if both (2.18) and (2.19) hold with β = 1 and γ = 0, and a filter is defined as a list of pairs (h(x^l), f(x^l)) such that no pair dominates any other in this filter [13, Definition 2]. The condition (2.19) is a variant of [14, (2.6)], where f(x̂) ≤ f(x^l) − γh(x̂). Note that (2.19) is equivalent to: f(x̂) ≤ f(x^l) − γh(x̂)² if h(x̂) ≤ 1, and f(x̂) ≤ f(x^l) − γh(x̂) otherwise. The reason for introducing this modified condition on h(x̂) is that we prefer to accept the trial point x̂, for the purpose of convergence, whenever the violation of feasibility is not severe, i.e., h(x̂) < 1.

Similar to the original definition of the filter in [13] and based on Definition 2.1, we define our filter, denoted by F_k at iteration k, as a set of pairs (h(x^l), f(x^l)) such that any pair in the filter is acceptable to all previous pairs in F_k in the sense of Definition 2.1. Initially, with k = 0, the filter F_k begins with the pair (χ, −∞), where χ > 0 is imposed on h(x̂) as an upper bound to control the constraint violation [13]. At the start of iteration k, the current pair (h(x^k), f(x^k)) ∉ F_k but must be acceptable to it, while at the end of iteration k the pair (h(x^k), f(x^k)) may or may not be added to F_k, depending on our acceptance rule to be discussed in Remark 2.3. Once (h(x^k), f(x^k)) is added to F_k, we remove all pairs in the current filter F_k that are worse than (h(x^k), f(x^k)) with respect to both the objective function value and the constraint violation; the detailed procedure for updating the filter F_k is described in Algorithm 3 and Remark 2.3.

Definition 2.2.
A trial point x̂ (or a pair (h(x̂), f(x̂))) is acceptable to the filter F_k if x̂ (or the pair (h(x̂), f(x̂))) is acceptable to x^l in the sense of Definition 2.1 for all l ∈ F̄_k := {l | (h(x^l), f(x^l)) ∈ F_k}.

The trial point x̂ is to be accepted as the next iterate if it is acceptable both to x^k (by Definition 2.1) and to the filter F_k (by Definition 2.2). Nevertheless, this acceptance rule for the trial x̂ may cause the following situation: we always accept points that satisfy (2.18) alone, but not (2.19). This would result in an iterative sequence converging to a feasible but non-optimal point. To avoid this situation, we impose additional conditions on x̂:

Case 1. When FAST=FALSE or α̂ < 1: if

  −α̂∇f(x^k)ᵀd^k > δh²(x^k),  (2.20)

then accepting x̂ as the next iterate x^{k+1} requires

  f(x̂) ≤ f(x^k) + α̂η∇f(x^k)ᵀd^k;  (2.21)

Case 2. When FAST=TRUE and α̂ = 1: if

  −∇f(x^k)ᵀd^k > δh²(x^k) and h(x^k) ≤ ζ₁‖d^k‖^{ζ₂},  (2.22)

then accepting x̂ as the next iterate x^{k+1} requires

  f(x̂) ≤ f(x^k) − η min{ −∇f(x^k)ᵀd^k, ξ‖d^k‖^{ζ₂} },  (2.23)

where ζ₁ > 0, ζ₂ ∈ (2, 3), ξ > 0, η ∈ (0, 1/2), and δ > 0 is chosen to satisfy δ ≥ γ/η. Note that Case 1 and Case 2 are mutually exclusive. The motivation for these conditions comes from [33, Section 2]. The switching conditions for Case 1 and Case 2 and the sufficient reduction conditions (2.21) and
(2.23) are useful for global convergence and for fast local convergence as well: if (2.20) for Case 1 is satisfied, then the direction d^k is a descent direction for f(x), and thereby imposing the reduction condition (2.21) on f(x) is helpful for global convergence; if (2.22) for Case 2 is satisfied, implying that d^k is a search direction for fast local convergence, the full step (i.e., α̂ = 1) is expected so that fast local convergence can be achieved. Note that condition (2.23) is more relaxed than (2.21), as we prefer to accept the full step. Finally, we are able to state our rule for accepting the trial point x̂ as the next iterate.

Acceptance Rule: A trial point x̂ is accepted as the next iterate x^{k+1} if it is acceptable to F_k ∪ {(h(x^k), f(x^k))}, and one of the following two conditions holds:

(i) either (2.20) and (2.21) for Case 1, or (2.22) and (2.23) for Case 2, are satisfied;

(ii) (2.20) for Case 1 or (2.22) for Case 2 is not satisfied.

If the trial point x̂ does not satisfy x̂ ∈ Ω_p or the Acceptance Rule, we shrink α̂ until the trial point is accepted or α̂ ≤ α_min^k. Once the latter occurs, the feasibility restoration phase is called, which is discussed in the next subsection.

2.5 Feasibility restoration phase

Motivated by [38], we define the lower bound α_min^k of α̂ by

  α_min^k = { α_φ · min{ 1 − β, γh(x^k)/(−∇f(x^k)ᵀd^k), δh²(x^k)/(−∇f(x^k)ᵀd^k) }, if ∇f(x^k)ᵀd^k < 0;
              α_φ, otherwise, }  (2.24)

where α_φ is a positive scalar. If, by shrinking α̂, we cannot find a step size α̂ ∈ (α_min^k, 1] such that the trial point x̂ is accepted by the Acceptance Rule, we turn to the feasibility restoration phase. Note that when the iteration enters the restoration phase, x^k is infeasible; indeed, if x^k is feasible, then h(x^k) = 0 and there must be some α̂ ∈ (α_min^k, 1] such that x̂ is accepted (see Lemma 3.9). Based on these facts, in the restoration phase we project x^k onto Ω to get the next iterate x^{k+1} = P_Ω(x^k).
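The acceptability tests of Definitions 2.1 and 2.2 reduce to a few comparisons. The sketch below (our own helper names; the default parameter values are illustrative, not from the paper) mirrors (2.18)-(2.19) and the filter test:

```python
def acceptable_to_pair(h_hat, f_hat, h_l, f_l, beta=0.99, gamma=1e-5):
    """Definition 2.1: (2.18) sufficient infeasibility reduction, or
    (2.19) sufficient objective reduction with the min{h, h^2} margin."""
    return (h_hat <= beta * h_l or
            f_hat <= f_l - gamma * min(h_hat, h_hat ** 2))

def acceptable_to_filter(h_hat, f_hat, filter_pairs, beta=0.99, gamma=1e-5):
    """Definition 2.2: acceptable to every pair currently stored in F_k."""
    return all(acceptable_to_pair(h_hat, f_hat, h, f, beta, gamma)
               for h, f in filter_pairs)
```

With the initial filter {(χ, −∞)}, the objective branch of (2.19) can never fire against that entry, so any trial with h(x̂) > βχ is rejected; this is exactly the upper-bound role played by χ.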
Since the feasible set Ω has special structure, projecting x^k onto Ω (Algorithm 2) is easy and costs at most 3n flops.

Algorithm 2: P_Ω(x^k): projection of x^k onto Ω

1 Given x^k;
2 for i = 1, ..., m do
3   if (i ≤ m₁ & ‖x^k_[i]‖ ≠ 1) or (i > m₁ & ‖x^k_[i]‖ > 1) then
4     x^k_[i] ← x^k_[i]/‖x^k_[i]‖;
5   end
6 end
7 return x^k;
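Algorithm 2 is a handful of block normalizations; a runnable version (our own code, using a tolerance to test ‖x_[i]‖ ≠ 1 in floating point, and assuming nonzero sphere blocks, which holds for x ∈ Ω_p) is:

```python
import numpy as np

def project_onto_Omega(x, p, m1):
    """Algorithm 2: project x onto Omega.  Sphere blocks (first m1 blocks,
    i < m1 in 0-based indexing) are normalized whenever their norm differs
    from 1; ball blocks are normalized only when their norm exceeds 1.
    Costs O(n) flops.  Assumes sphere blocks are nonzero (true on Omega_p)."""
    x = np.array(x, dtype=float)
    off = np.concatenate(([0], np.cumsum(p)))
    for i in range(len(p)):
        xi = x[off[i]:off[i + 1]]
        nrm = np.linalg.norm(xi)
        if (i < m1 and abs(nrm - 1.0) > 1e-15) or (i >= m1 and nrm > 1.0):
            x[off[i]:off[i + 1]] = xi / nrm
    return x
```

The per-block cost is one norm and at most one scaling, which is where the 3n flop count in the text comes from.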
2.6 The statement of the algorithm

We now state the overall algorithm.

Algorithm 3: Filter active-set method (FilterASM)

1  Given x⁰ ∈ Ω_p, χ > h(x⁰), ν ∈ (2, 3), β ∈ (0, 1), γ ∈ (0, 1), η ∈ (0, 1/2), δ ≥ γ/η, ξ > 0, α_φ ∈ (0, 1/2), ζ₁ > 0, ζ₂ ∈ (2, 3), r ∈ (0, 1). Initialize F₀ with the pair (χ, −∞);
2  for k = 0, 1, 2, ..., maxit do
3    Determine the working set A_k;
4    Compute λ^{k,0} by (2.12) with w_k = −c_{A_k}(x^k), and d^{k,0} by (2.13) with λ^k = λ^{k,0};
5    if d^{k,0} = 0 and λ^{k,0}_i ≥ 0 (∀ i ∈ A_k ∩ I), stop; % Termination condition
6    if
       λ^{k,0}_i ≥ 0, ∀ i ∈ A_k ∩ I,  (2.25)
     then
7      Set FAST=TRUE, d^k = d^{k,0}, λ^k = λ^{k,0}, and w_k = −c_{A_k}(x^k) − c_{A_k}(x^k + d^k) − ‖d^{k,0}‖^ν e;
8    else
9      Set FAST=FALSE, and w_k = 0;
10   end
11   Compute λ^{k,1} by (2.12) with w_k, and d^{k,1} by (2.13) with λ^k = λ^{k,1};
12   if FAST=TRUE then
13     Set d̂^k = { 0, if ‖d^{k,1} − d^{k,0}‖ > ‖d^{k,0}‖; d^{k,1} − d^{k,0}, otherwise };
14   else
15     Set [u_{A_k}]_i = { min{c_{j_i}(x^k), 0} + λ^{k,1}_{j_i}, if λ^{k,1}_{j_i} < 0 (j_i ∈ A_k ∩ I); −c_{j_i}(x^k), otherwise }, where A_k = {j₁, ..., j_{|A_k|_c}};
16     Compute λ^{k,2} by (2.12) with w_k = u_{A_k}, and d^{k,2} by (2.13) with λ^k = λ^{k,2};
17     Set d^k = d^{k,2}, λ^k = λ^{k,1};
18   end
19   if FAST=FALSE, or x^k + d^k + d̂^k does not satisfy the Acceptance Rule, or x^k + d^k + d̂^k ∉ Ω_p then
20     Find α^k > α_min^k, the first number of the sequence {1, r, r², ...} such that x̂ = x^k + α^k d^k satisfies the Acceptance Rule and x̂ ∈ Ω_p;
21   else
22     Set x̂ = x^k + d^k + d̂^k and α^k = 1;
23   end
24   if such α^k (i.e., α^k > α_min^k) does not exist then
25     Go to the feasibility restoration phase to get x^{k+1} = P_Ω(x^k), and add (h(x^k), f(x^k)) to F_k;
26   else
27     if (2.20) for Case 1 or (2.22) for Case 2 does not hold, then add (h(x^k), f(x^k)) to F_k;
28     Set x^{k+1} = x̂, s_k = x^{k+1} − x^k, y_k = ∇_x L(x^{k+1}, λ^k) − ∇_x L(x^k, λ^k), and update S_k, Y_k to S_{k+1}, Y_{k+1}.
29   end
30 end

Remark 2.2. In Algorithm 3, lines 3-18 state the procedure for computing the search direction d^k and the Lagrange multiplier estimate λ^k, together with some related quantities (d^{k,0}, λ^{k,0}, d^{k,1}, λ^{k,1}, d^{k,2}, λ^{k,2}, etc.)
related to d^k and λ^k, while lines 19-23 describe the procedure for choosing the step size α^k. In computing the search direction between lines 3 and 18, there are two different cases:
(i) FAST=TRUE. The pair (d^k, λ^k) = (d^{k,0}, λ^{k,0}) solves

  B_k d^{k,0} + ∇c_{A_k}(x^k)λ^{k,0} = −∇f(x^k),
  ∇c_{A_k}(x^k)ᵀd^{k,0} = −c_{A_k}(x^k),  (2.26)

which is a quasi-Newton equation of the KKT system (2.1)-(2.4) restricted to the working set A_k. To achieve fast local convergence and to overcome the Maratos effect, we adopt the second-order correction technique. In particular, we compute the second-order correction step by setting d̂^k = d^{k,1} − d^{k,0}, where d^{k,1} is obtained from

  B_k d^{k,1} + ∇c_{A_k}(x^k)λ^{k,1} = −∇f(x^k),
  ∇c_{A_k}(x^k)ᵀd^{k,1} = −( c_{A_k}(x^k + d^k) + c_{A_k}(x^k) + ‖d^{k,0}‖^ν e ).  (2.27)

Here, e = (1, 1, ..., 1)ᵀ with appropriate dimension. Then we check whether x̂ = x^k + d^k + d̂^k satisfies the Acceptance Rule. If it fails, the second-order correction step d̂^k is discarded, and the backtracking technique is invoked to find a step size α^k such that x^k + α^k d^k is accepted.

(ii) FAST=FALSE. The search direction d^k = d^{k,2} is computed by solving

  B_k d^{k,2} + ∇c_{A_k}(x^k)λ^{k,2} = −∇f(x^k),
  ∇c_{A_k}(x^k)ᵀd^{k,2} = u_{A_k},  (2.28)

where u_{A_k} (line 15) uses the information of λ^{k,1} from the system

  B_k d^{k,1} + ∇c_{A_k}(x^k)λ^{k,1} = −∇f(x^k),
  ∇c_{A_k}(x^k)ᵀd^{k,1} = 0.  (2.29)

We explain these two linear systems as follows: the solution d^{k,1} of (2.29) lies in the null space of ∇c_{A_k}(x^k)ᵀ and targets improving f(x) rather than h(x); because d^{k,1} may be close to zero with a negative multiplier λ^{k,1}, a slightly perturbed system (2.28) of (2.29) is solved instead and yields a new direction d^{k,2}, which aims at improving h(x) and prevents the unwelcome effect caused by a negative multiplier. In all, d^k in this case contributes to the global convergence.

Remark 2.3. The filter F_k is updated either in line 25 or in line 27. In other words, the pair (h(x^k), f(x^k)) is added to F_k, and all other pairs in F_k dominated by (h(x^k), f(x^k)) are removed, if (2.20) for Case 1 or (2.22) for Case 2 is not fulfilled, or if the restoration phase is invoked.

Remark 2.4.
For the convenience of the convergence analysis, we borrow the terminology of Fletcher, Leyffer and Toint [14]: we call an iterate an f-type iterate if x^{k+1} = x^k + α^k d^k (or x^{k+1} = x^k + d^k + d̂^k) is accepted according to (i) of the Acceptance Rule; otherwise, we call it an h-type iterate, meaning that x^{k+1} is accepted according to (ii) of the Acceptance Rule or is recovered from the feasibility restoration phase.

3 Global convergence

In this section we show the global convergence of Algorithm 3 under the following two assumptions:

(A1) The objective function f(x) is twice continuously differentiable;

(A2) The matrix B_k is bounded and uniformly positive definite for all k; that is, there exists a scalar τ > 0 such that (1/τ)‖d‖² ≤ dᵀB_k d ≤ τ‖d‖² holds for any d ∈ Rⁿ and any k.

We begin with the boundedness of the iterates.

Lemma 3.1. The sequence {x^k} generated by Algorithm 3 is bounded.
Proof. Since all iterates from Algorithm 3 satisfy the upper bound condition h(x^k) ≤ χ because F_0 = {(χ, −∞)}, combining with the definition of h(x) directly leads to the boundedness of {x^k}.

Theorem 3.2. Suppose that Assumption (A1) holds. Let {x^{k_l}} be an infinite subsequence of {x^k} on which (h(x^{k_l}), f(x^{k_l})) is added into the filter. Then lim_{k_l→∞} h(x^{k_l}) = 0.

Proof. From Assumption (A1) and Lemma 3.1, we know that {f(x^{k_l})} is bounded from below. Applying [33, Lemma 3.1] yields the assertion.

Theorem 3.2 implies that all accumulation points of {x^{k_l}}, on which (h(x^{k_l}), f(x^{k_l})) is added into the filter, are feasible points for BCOP.

Lemma 3.3. Suppose that Assumptions (A1)-(A2) hold. If FAST=TRUE, then the sequence {(d^{k,0}, λ^{k,0})} is bounded; if FAST=FALSE, then both sequences {(d^{k,1}, λ^{k,1})} and {(d^{k,2}, λ^{k,2})} are bounded.

Proof. From Algorithm 3, λ^{k,0} = −W_k^{−1} b_k with b_k = −c_{A_k}(x^k) + ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k) in the case of FAST=TRUE, where W_k = ∇c_{A_k}(x^k)^T B_k^{−1} ∇c_{A_k}(x^k) is uniformly positive definite for all k due to Lemmas 2.2, 3.1 and Assumption (A2). Again using Lemma 3.1 and Assumption (A2), b_k is bounded and therefore λ^{k,0} is bounded too, which together with the boundedness of B_k^{−1}, x^k and λ^{k,0} implies that d^k in (2.13) is bounded for all k. Analogously, in the case of FAST=FALSE, W_k and its inverse are bounded for all k. Lemma 3.1 and Assumption (A2) ensure the boundedness of ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k). Since λ^{k,1} = −W_k^{−1} ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k) and d^{k,1} = −B_k^{−1}(∇f(x^k) + ∇c_{A_k}(x^k) λ^{k,1}), it follows that both λ^{k,1} and d^{k,1} are bounded for all k. In view of the definition of u_{A_k} (see line 15 of Algorithm 3) and the boundedness of {x^k}, u_{A_k} is bounded too, which implies the boundedness of λ^{k,2} = −W_k^{−1}(u_{A_k} + ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k)). Consequently, d^{k,2} in (2.13) with λ^{k,2} is bounded for all k.

Remark 3.1.
Based on the previous lemmas, for the convenience of further reference, we assume ‖d^{k,j}‖ ≤ M_d, j = 0, 1, 2, and ‖λ^{k,j}‖ ≤ M_λ, j = 0, 1, 2, for all k, where M_d > 0 and M_λ > 0 are two constants.

Lemma 3.4. Under Assumptions (A1)-(A2), the following two statements are true.

(i) If FAST=TRUE and d^k = 0, then x^k is a KKT point of BCOP.

(ii) If FAST=FALSE, h(x^k) = 0 and ∇f(x^k)^T d^k = 0, then x^k is a KKT point of BCOP.

Proof. (i) Since λ^{k,0} is from (2.12) with b_k = −c_{A_k}(x^k) + ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k), rearranging (2.12) leads to

  c_{A_k}(x^k) = W_k λ^{k,0} + ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k),

which, using (2.13) and the definition of W_k, gives

  c_{A_k}(x^k) = ∇c_{A_k}(x^k)^T B_k^{−1} (∇c_{A_k}(x^k) λ^{k,0} + ∇f(x^k)) = −∇c_{A_k}(x^k)^T d^{k,0}.

Putting d^{k,0} = d^k = 0 into the above equation yields c_{A_k}(x^k) = 0; now combining with the definition of A_k implies that x^k is feasible, that is, c_E(x^k) = 0 and c_I(x^k) ≤ 0. From Assumption (A2) and (2.13), d^{k,0} = 0 leads to ∇c_{A_k}(x^k) λ^{k,0} + ∇f(x^k) = 0, which shows the dual feasibility at x^k. In addition, the nonnegativeness of λ^{k,0} is guaranteed by the mechanism of Algorithm 3 (in the case of FAST=TRUE). Thus, x^k satisfies a variant of the KKT conditions (2.1)-(2.4) and therefore is a KKT point.

(ii) By Algorithm 3, if FAST=FALSE, then

  λ^{k,1} = −W_k^{−1} ∇c_{A_k}(x^k)^T B_k^{−1} ∇f(x^k),   (3.30)
  d^{k,1} = −B_k^{−1}(∇f(x^k) + ∇c_{A_k}(x^k) λ^{k,1}),   (3.31)
  λ^{k,2} = −W_k^{−1} u_{A_k} + λ^{k,1},   (3.32)
  d^{k,2} = d^{k,1} + B_k^{−1} ∇c_{A_k}(x^k) W_k^{−1} u_{A_k}.   (3.33)
From (3.33) and (3.30), we have that

  ∇f(x^k)^T d^{k,2} = ∇f(x^k)^T d^{k,1} + ∇f(x^k)^T B_k^{−1} ∇c_{A_k}(x^k) W_k^{−1} u_{A_k} = ∇f(x^k)^T d^{k,1} − (λ^{k,1})^T u_{A_k}.   (3.34)

By premultiplying the first equation of (2.9) by (d^{k,1})^T and using the second equation of (2.9), we get ∇f(x^k)^T d^{k,1} = −(d^{k,1})^T B_k d^{k,1}. Substituting it into (3.34) yields

  ∇f(x^k)^T d^{k,2} = −(d^{k,1})^T B_k d^{k,1} − (λ^{k,1})^T u_{A_k}.   (3.35)

According to the hypothesis of (ii) of this lemma, c_E(x^k) = 0, c_I(x^k) ≤ 0 and ∇f(x^k)^T d^{k,2} = 0. Combining with the definition of u_{A_k}, the second term in the right-hand side of (3.35) can be changed to

  (λ^{k,1})^T u_{A_k} = Σ_{λ^{k,1}_i<0, i∈A_k∩I} [(λ^{k,1}_i)² + max{λ^{k,1}_i c_i(x^k), 0}] − Σ_{λ^{k,1}_i≥0, i∈A_k∩I} λ^{k,1}_i c_i(x^k),

and then

  0 = −(d^{k,1})^T B_k d^{k,1} − Σ_{λ^{k,1}_i<0, i∈A_k∩I} [(λ^{k,1}_i)² + max{λ^{k,1}_i c_i(x^k), 0}] + Σ_{λ^{k,1}_i≥0, i∈A_k∩I} λ^{k,1}_i c_i(x^k).

It is easy to see that the first two terms (excluding the sign) in the right-hand side are non-negative and the last term is non-positive, which implies that all terms in the right-hand side must be zero. In particular, the first term (d^{k,1})^T B_k d^{k,1} = 0 implies the primal optimality condition ∇c_{A_k}(x^k) λ^{k,1} + ∇f(x^k) = 0 due to Assumption (A2) and (3.31); the second term Σ_{λ^{k,1}_i<0, i∈A_k∩I} [(λ^{k,1}_i)² + max{λ^{k,1}_i c_i(x^k), 0}] = 0 implies λ^{k,1} ≥ 0; and the third term Σ_{λ^{k,1}_i≥0, i∈A_k∩I} λ^{k,1}_i c_i(x^k) = 0 implies λ^{k,1}_i c_i(x^k) = 0, i ∈ A_k ∩ I, which gives the complementarity condition. Thus, x^k is a KKT point of BCOP.

Remark 3.2. Since B_k is uniformly positive definite and uniformly bounded, by Lemma 2.2, the conclusion of Lemma 3.4 can be extended to its limit form: (i) if FAST=TRUE and d^{k_l} → 0, then any limit point x* of {x^{k_l}} is a KKT point of BCOP, where {k_l} is an infinite subsequence of {k}; (ii) if FAST=FALSE, h(x^{k_l}) → 0 and ∇f(x^{k_l})^T d^{k_l} → 0, then any limit point x* of {x^{k_l}} is a KKT point of BCOP, where {k_l} is an infinite subsequence of {k}.

We next establish a series of lemmas concerning the f-type iterates.

Lemma 3.5.
Suppose that Assumptions (A1)-(A2) hold. Then there exist scalars M_h, M_f > 0 and α_u^k ∈ (0, 1] such that

  h(x^k + αd^k) ≤ (1 − α)h(x^k) + M_h α²‖d^k‖²   (3.36)

holds for all α ∈ (0, α_u^k], and

  f(x^k + αd^k) ≤ f(x^k) + α∇f(x^k)^T d^k + M_f α²‖d^k‖²   (3.37)

holds for all α ∈ (0, 1], where d^k is generated by Algorithm 3.

Proof. If FAST=TRUE, (d^k, λ^k) = (d^{k,0}, λ^{k,0}) solves (2.6), implying that

  c_{A_k}(x^k) + ∇c_{A_k}(x^k)^T d^k = 0,   (3.38)
and if FAST=FALSE, (d^k, λ^k) = (d^{k,2}, λ^{k,2}) solves (2.8), which together with the definition of u_{A_k} yields

  c_i(x^k) + ∇c_i(x^k)^T d^k = c_i(x^k) + u_i  { = 0, i ∈ E;  ≤ 0, i ∈ A_k ∩ I }.   (3.39)

Since c_i(x), i ∈ A_k, are quadratic functions, it follows that for i ∈ A_k

  c_i(x^k + αd^k) = c_i(x^k) + α∇c_i(x^k)^T d^k + (α²/2)(d^k)^T Q_i d^k,

where Q_i is the Hessian of c_i(x). As a result, for either FAST=TRUE or FAST=FALSE, using (3.38) and (3.39) we have

  c_i(x^k + αd^k) = (1 − α)c_i(x^k) + (α²/2)(d^k)^T Q_i d^k, i ∈ E,
  c_i(x^k + αd^k) ≤ (1 − α)c_i(x^k) + (α²/2)(d^k)^T Q_i d^k, i ∈ A_k ∩ I.

Therefore, it is straightforward to get that for all i ∈ E

  |c_i(x^k + αd^k)| ≤ (1 − α)|c_i(x^k)| + M_h α²‖d^k‖²,   (3.40)

and for all i ∈ A_k ∩ I

  max{0, c_i(x^k + αd^k)} ≤ (1 − α) max{0, c_i(x^k)} + M_h α²‖d^k‖²,   (3.41)

where M_h > 0 is a scalar satisfying ‖Q_i‖ ≤ 2M_h for all i ∈ A_k. On the other hand, for i ∈ I\A_k, c_i(x^k) < 0 due to the definition of A_k; by the continuity of c_i(x), there exists a scalar α_u^k ∈ (0, 1] such that c_i(x^k + αd^k) < 0 for all i ∈ I\A_k and all α ∈ (0, α_u^k]. Consequently, in view of the definition of h(x),

  h(x^k) = ‖( c_E(x^k); max{c_{A_k}(x^k), 0} )‖  and  h(x^k + αd^k) = ‖( c_E(x^k + αd^k); max{c_{A_k}(x^k + αd^k), 0} )‖,  α ∈ (0, α_u^k],

which together with (3.40) and (3.41) gives (3.36). As for (3.37), it readily follows from Taylor's theorem that

  f(x^k + αd^k) − f(x^k) − α∇f(x^k)^T d^k = (α²/2)(d^k)^T ∇²f(ξ^k) d^k,   (3.42)

where ξ^k ∈ R^n lies in the line segment from x^k to x^k + d^k. Since x^k and d^k are bounded for all k, and the objective function f(x) is twice continuously differentiable, there exists a scalar M_f > 0 such that ‖∇²f(ξ^k)‖ ≤ 2M_f for all ξ^k, and thus using (3.42) gives (3.37).

We remark that α_u^k in Lemma 3.5 is related to x^k; however, with some additional conditions, α_u^k in the conclusion (3.36) can be reduced to a constant, which is shown in the following corollary.

Corollary 3.6. Suppose that Assumptions (A1)-(A2) hold.
Let {x^{k_l}} converge to a non-optimal point x* and let A_{k_l} keep unchanged for all k_l. Then there exist scalars M_h > 0 and α_u ∈ (0, 1] such that

  h(x^{k_l} + αd^{k_l}) ≤ (1 − α)h(x^{k_l}) + M_h α²‖d^{k_l}‖²   (3.43)

holds for all α ∈ (0, α_u], where d^{k_l} is generated by Algorithm 3.
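Because each ball/sphere constraint is a quadratic with Hessian Q_i = 2I on its block, the expansion behind the feasibility bounds above is exact, not just an estimate. The following small numeric check (illustrative data only, not from the paper) verifies that for c(x) = ‖x‖² − 1 and a direction d satisfying the linearization c(x) + ∇c(x)^T d = 0, one has c(x + αd) = (1 − α)c(x) + α²‖d‖² exactly, i.e. the bound of form (3.43) holds with M_h = 1 for this constraint:

```python
import numpy as np

# Exact quadratic expansion for a ball constraint c(x) = ||x||^2 - 1:
# its Hessian is Q = 2I, so (1/2) d^T Q d = ||d||^2 and the expansion
# c(x + a*d) = (1 - a)*c(x) + a^2*||d||^2 holds exactly whenever
# grad_c(x)^T d = -c(x).  Data below are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
c = lambda y: float(y @ y) - 1.0
grad_c = 2.0 * x

# project a random d onto the affine set {d : grad_c^T d = -c(x)}
d = rng.normal(size=4)
d -= (grad_c @ d + c(x)) / (grad_c @ grad_c) * grad_c

for alpha in (0.1, 0.5, 1.0):
    lhs = c(x + alpha * d)
    rhs = (1.0 - alpha) * c(x) + alpha**2 * float(d @ d)
    assert abs(lhs - rhs) < 1e-12
```

This exactness is what makes the constant M_h in (3.36) and (3.43) available uniformly over the ball/sphere constraints.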
Proof. According to the hypothesis of this corollary, A_{k_l} ≡ A for all k_l, where A is a finite index set independent of k_l. Recalling the definition of A (i.e., A_{k_l}) and x^{k_l} → x*, we obtain that c_i(x*) < 0 for all i ∈ I\A, and by continuity of c_i(x) there exists an open ball B(x*; r) of radius r > 0 centered at x* such that for any y ∈ B(x*; r), c_i(y) < 0, i ∈ I\A. Again using x^{k_l} → x* and ‖d^{k_l}‖ ≤ M_d due to Remark 3.1, there exist a scalar ᾱ > 0 and an integer k̄_l > 0 such that c_i(x^{k_l} + αd^{k_l}) < 0, i ∈ I\A, for all α ∈ (0, ᾱ] and all k_l ≥ k̄_l. Thus for all α ∈ (0, ᾱ] and k_l ≥ k̄_l,

  h(x^{k_l}) = ‖( c_E(x^{k_l}); max{c_A(x^{k_l}), 0} )‖  and  h(x^{k_l} + αd^{k_l}) = ‖( c_E(x^{k_l} + αd^{k_l}); max{c_A(x^{k_l} + αd^{k_l}), 0} )‖.

Following the proof of Lemma 3.5, for all i ∈ E and for all i ∈ A ∩ I,

  |c_i(x^{k_l} + αd^{k_l})| ≤ (1 − α)|c_i(x^{k_l})| + M_h α²‖d^{k_l}‖²,
  max{0, c_i(x^{k_l} + αd^{k_l})} ≤ (1 − α) max{0, c_i(x^{k_l})} + M_h α²‖d^{k_l}‖²,

and therefore (3.43) holds for all α ∈ (0, ᾱ] and k_l ≥ k̄_l. On the other hand, for those iterations with k_l < k̄_l, it follows from Lemma 3.5 that (3.43) holds for all α ∈ (0, α_u^{k_l}]. Define α_u := min{α_u^{k_1}, ..., α_u^{k̄_l−1}, ᾱ}. We therefore conclude that (3.43) holds for all α ∈ (0, α_u], which completes the proof.

Define the quantity

  Υ_k := ‖d^{k,0}‖ if FAST=TRUE;   Υ_k := h(x^k) + |∇f(x^k)^T d^{k,2}| if FAST=FALSE,

which is actually another first-order optimality measure due to Lemma 3.4. The proofs of the following lemmas and theorem are related to the optimality measure Υ_k. In particular, the next lemma reveals that the search direction d^k generated by Algorithm 3 is descent for the objective function if a point is nearly feasible but non-optimal.

Lemma 3.7. Suppose that Assumptions (A1)-(A2) hold. Let {x^{k_l}} be a subsequence of {x^k} for which Υ_{k_l} ≥ ɛ with a constant ɛ > 0. Then there exist two scalars ɛ₁ > 0 and ɛ₂ > 0 such that the following statement is true: if h(x^{k_l}) ≤ ɛ₁, then ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ₂.

Proof. We first consider the case FAST=TRUE. In this situation, Υ_{k_l} = ‖d^{k_l,0}‖ ≥ ɛ, and (d^{k_l}, λ^{k_l}) = (d^{k_l,0}, λ^{k_l,0}) solves (2.6).
Premultiplying the first equation of (2.6) by (d^{k_l,0})^T, we have that

  ∇f(x^{k_l})^T d^{k_l} = −(d^{k_l,0})^T B_{k_l} d^{k_l,0} − (d^{k_l,0})^T ∇c_{A_{k_l}}(x^{k_l}) λ^{k_l,0},

while premultiplying the second equation of (2.6) by (λ^{k_l,0})^T and substituting it into the above equation yields

  ∇f(x^{k_l})^T d^{k_l} = −(d^{k_l,0})^T B_{k_l} d^{k_l,0} + Σ_{i∈A_{k_l}} λ^{k_l,0}_i c_i(x^{k_l}).   (3.44)

Due to FAST=TRUE, we have λ^{k_l,0} ≥ 0, and using Remark 3.1 gives ‖λ^{k_l,0}‖ ≤ M_λ. It is straightforward that

  Σ_{i∈A_{k_l}} λ^{k_l,0}_i c_i(x^{k_l}) ≤ m h(x^{k_l}) ‖λ^{k_l,0}‖ ≤ mM_λ h(x^{k_l}),

which together with (3.44), Assumption (A2) and ‖d^{k_l,0}‖ ≥ ɛ gives

  ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ²/τ + mM_λ h(x^{k_l}).
Let ɛ₁ := ɛ²/(2mτM_λ). If h(x^{k_l}) ≤ ɛ₁, we then obtain that ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ₂, where ɛ₂ := ɛ²/(2τ).

Next, we show the assertion for the case FAST=FALSE. In this situation, d^{k_l} = d^{k_l,2} and Υ_{k_l} = h(x^{k_l}) + |∇f(x^{k_l})^T d^{k_l,2}| ≥ ɛ. If h(x^{k_l}) ≤ ɛ/2, then

  |∇f(x^{k_l})^T d^{k_l}| = |∇f(x^{k_l})^T d^{k_l,2}| ≥ ɛ/2.   (3.45)

From (3.35) and the definition of u_{A_k},

  ∇f(x^{k_l})^T d^{k_l,2} = −(d^{k_l,1})^T B_{k_l} d^{k_l,1} − Σ_{λ^{k_l,1}_i<0, i∈A_{k_l}∩I} [(λ^{k_l,1}_i)² + max{λ^{k_l,1}_i c_i(x^{k_l}), 0}] + Σ_{λ^{k_l,1}_i≥0, i∈A_{k_l}∩I} λ^{k_l,1}_i c_i(x^{k_l}) + Σ_{i∈E} λ^{k_l,1}_i c_i(x^{k_l}) ≤ m h(x^{k_l}) ‖λ^{k_l,1}‖ ≤ mM_λ h(x^{k_l}),   (3.46)

where the last inequality follows from Remark 3.1. Let ɛ₁ := min{ɛ/2, ɛ/(3mM_λ)} and ɛ₂ := ɛ/2. If h(x^{k_l}) ≤ ɛ₁, then mM_λ h(x^{k_l}) ≤ ɛ/3, which combining with (3.46) and (3.45) yields ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ₂.

Lemma 3.8. Suppose that Assumptions (A1)-(A2) hold. If h(x^{k_l}) > 0 and ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ₂ (ɛ₂ is from Lemma 3.7), then x^{k_l} + αd^{k_l} is acceptable to the k_l-th filter for all α ≤ ᾱ^{k_l}, where ᾱ^{k_l} = min{q₁h(x^{k_l}), q₂, α_u^{k_l}}, q₁ = 1/(M_h M_d²) and q₂ = ɛ₂/(M_f M_d²).

Proof. The mechanism of Algorithm 3 (lines 19-23) ensures that (h(x^{k_l}), f(x^{k_l})) is acceptable to the k_l-th filter. We now show that x^{k_l} + αd^{k_l} is no worse than x^{k_l} for all sufficiently small α > 0 in both feasibility and the objective function, implying that x^{k_l} + αd^{k_l} is acceptable to the k_l-th filter. Since ‖d^{k_l}‖ ≤ M_d due to Remark 3.1, it follows from (3.36) in Lemma 3.5 that for α ∈ (0, α_u^{k_l}]

  h(x^{k_l} + αd^{k_l}) ≤ h(x^{k_l}) − αh(x^{k_l}) + α²M_h M_d²,

which turns out to be h(x^{k_l} + αd^{k_l}) ≤ h(x^{k_l}) if 0 ≤ α ≤ min{q₁h(x^{k_l}), α_u^{k_l}} with q₁ := 1/(M_h M_d²). Similarly, using (3.37) in Lemma 3.5 and the boundedness of d^{k_l}, we have that

  f(x^{k_l} + αd^{k_l}) ≤ f(x^{k_l}) + α∇f(x^{k_l})^T d^{k_l} + α²M_f M_d²,

which together with the assumption ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ₂ yields

  f(x^{k_l} + αd^{k_l}) ≤ f(x^{k_l}) − αɛ₂ + α²M_f M_d².

Define q₂ := ɛ₂/(M_f M_d²). If 0 ≤ α ≤ q₂, then f(x^{k_l} + αd^{k_l}) ≤ f(x^{k_l}).
Therefore, x^{k_l} + αd^{k_l} is acceptable to the k_l-th filter for all α ≤ ᾱ^{k_l} := min{q₁h(x^{k_l}), α_u^{k_l}, q₂}.

With the help of Lemma 3.8, the following two lemmas show that there always exists some acceptable step size α such that x^k + αd^k is accepted as an f-type iteration point under certain conditions.
Lemma 3.9. Suppose that Assumptions (A1)-(A2) hold. If x^{k_l} is feasible but not optimal, then either x^{k_l} + d^{k_l} + d̂^{k_l} is an f-type iteration point or there exists α_0^{k_l} > α_min^{k_l} such that x^{k_l} + α_0^{k_l}d^{k_l} is an f-type iteration point.

Proof. The conclusion follows immediately if x^{k_l} + d^{k_l} + d̂^{k_l} is an f-type iteration point. Otherwise, we need to prove that x^{k_l} + α_0^{k_l}d^{k_l} is an f-type iteration point for some α_0^{k_l} > α_min^{k_l}. Since x^{k_l} is feasible but not optimal, we must have that h(x^{k_l}) = 0 and Υ_{k_l} ≥ ɛ with some scalar ɛ > 0. By the mechanism of Algorithm 3 (line 7) and Lemma 3.7, the condition (2.20) is always true if h(x^l) = 0, and therefore only pairs with h(x^l) > 0 can be added into the k_l-th filter. Let

  π^{k_l} := min{h(x^l) : (h(x^l), f(x^l)) ∈ F_{k_l}}.

According to Lemma 3.5 and ‖d^{k_l}‖ ≤ M_d,

  h(x^{k_l} + αd^{k_l}) ≤ α²M_h M_d²   (3.47)

holds for all α ∈ (0, α_u^{k_l}]. If 0 ≤ α ≤ min{α_u^{k_l}, √(βπ^{k_l}/(M_h M_d²))}, then h(x^{k_l} + αd^{k_l}) ≤ βπ^{k_l}, which implies that x^{k_l} + αd^{k_l} is acceptable to the k_l-th filter. Since x^{k_l} is feasible, it follows from the definition of Ω_p that x^{k_l} is in the interior of Ω_p, which together with the boundedness of d^{k_l} shows x^{k_l} + αd^{k_l} ∈ Ω_p for all α in some subinterval of (0, 1]; therefore, we can assume without loss of generality that x^{k_l} + αd^{k_l} ∈ Ω_p for all α ∈ (0, α_u^{k_l}]. By Lemma 3.7, h(x^{k_l}) = 0 implies

  ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ₂,   (3.48)

which means that the switching condition for Case 1 and Case 2 holds trivially no matter whether α < 1 or α = 1. It follows from (3.37) in Lemma 3.5 and the boundedness of d^{k_l} that

  f(x^{k_l} + αd^{k_l}) − f(x^{k_l}) − αη∇f(x^{k_l})^T d^{k_l} ≤ −α(1 − η)ɛ₂ + α²M_f M_d².

Thus, the sufficient reduction condition (2.21) holds if 0 ≤ α ≤ (1 − η)ɛ₂/(M_f M_d²). When 0 ≤ α ≤ min{α_u^{k_l}, ηɛ₂/(γM_h M_d²)}, it is true from (3.47) that h(x^{k_l} + αd^{k_l}) ≤ αηɛ₂/γ. Combining with (2.21) and (3.48) yields f(x^{k_l} + αd^{k_l}) ≤ f(x^{k_l}) − γh(x^{k_l} + αd^{k_l}), i.e., x^{k_l} + αd^{k_l} is acceptable to x^{k_l}.
From (2.24) and the above proof, we have α_min^{k_l} = 0, and we can choose any α in (0, ᾱ^{k_l}] as α_0^{k_l} such that x^{k_l} + α_0^{k_l}d^{k_l} is an f-type iteration point, where

  ᾱ^{k_l} := min{ α_u^{k_l}, √(βπ^{k_l}/(M_h M_d²)), (1 − η)ɛ₂/(M_f M_d²), ηɛ₂/(γM_h M_d²) }.

Lemma 3.10. Suppose that Assumptions (A1)-(A2) hold. Let {x^{k_l}} be an infinite subsequence of {x^k} on which (h(x^{k_l}), f(x^{k_l})) is added into the filter, and assume that {x^{k_l}} converges to x* and A_{k_l} keeps unchanged for all k_l. If x* is not a KKT point, then for all sufficiently large k_l, either x^{k_l} + d^{k_l} + d̂^{k_l} is an f-type iteration point or there exists α_0^{k_l} > α_min^{k_l} such that x^{k_l} + α_0^{k_l}d^{k_l} is an f-type iteration point.

Proof. If x^{k_l} + d^{k_l} + d̂^{k_l} is an f-type iteration point, the conclusion follows immediately. It suffices to prove the assertion for x^{k_l} + α_0^{k_l}d^{k_l}. Since x* is not a KKT point, it follows from Remark 3.2 that there exists a scalar
ɛ > 0 such that Υ_{k_l} ≥ ɛ for all sufficiently large k_l. In the case of h(x^{k_l}) = 0, the conclusion follows from Lemma 3.9. We now consider the remaining iterations k_l with h(x^{k_l}) > 0. As Υ_{k_l} ≥ ɛ, if h(x^{k_l}) ≤ ɛ₁, then by Lemma 3.7,

  ∇f(x^{k_l})^T d^{k_l} ≤ −ɛ₂.   (3.49)

If h(x^{k_l}) < ɛ₁ and α ≤ min{q₁h(x^{k_l}), α_u^{k_l}, q₂}, it follows from Lemma 3.8 that x^{k_l} + αd^{k_l} is acceptable to the k_l-th filter. Since {x^{k_l}} converges to x* and A_{k_l} keeps unchanged for all k_l, Corollary 3.6 implies that α_u^{k_l} is independent of k_l, and we thereby drop the superscript k_l in α_u^{k_l} for the simplicity of the following proof. Analogous to the proof of Lemma 3.9, if 0 ≤ α ≤ (1 − η)ɛ₂/(M_f M_d²), the sufficient reduction condition (2.21) is fulfilled. Again using Corollary 3.6 and the boundedness of d^{k_l}, for 0 ≤ α ≤ α_u,

  h(x^{k_l} + αd^{k_l}) ≤ h(x^{k_l}) − αh(x^{k_l}) + α²M_h M_d²,

and therefore h(x^{k_l} + αd^{k_l}) ≤ h(x^{k_l}) if 0 ≤ α ≤ min{q₁h(x^{k_l}), α_u}, where q₁ is defined as in Lemma 3.8. On the other hand, if (2.20) is true, it follows from (2.21) that

  f(x^{k_l} + αd^{k_l}) ≤ f(x^{k_l}) + αη∇f(x^{k_l})^T d^{k_l} ≤ f(x^{k_l}) − ηδh²(x^{k_l}) ≤ f(x^{k_l}) − γh²(x^{k_l}),

where the last inequality follows from δ ≥ γ/η. Hence, x^{k_l} + αd^{k_l} is acceptable to x^{k_l}. Since h(x^{k_l}) → 0 due to Theorem 3.2, according to the definition of Ω_p, x^{k_l} is in the interior of Ω_p for all sufficiently large k_l, which together with the boundedness of d^{k_l} implies x^{k_l} + αd^{k_l} ∈ Ω_p for all α in some subinterval of (0, 1] and all sufficiently large k_l; we assume without loss of generality that x^{k_l} + αd^{k_l} ∈ Ω_p for all α ∈ (0, α_u] for all sufficiently large k_l. Therefore, we have now shown that for all sufficiently large k_l, x^{k_l} + αd^{k_l} is acceptable to x^{k_l} and the k_l-th filter, x^{k_l} + αd^{k_l} ∈ Ω_p, and the sufficient reduction condition (2.21) holds if (2.20) is satisfied, provided 0 ≤ α ≤ ᾱ^{k_l}, where

  ᾱ^{k_l} := min{ q₁h(x^{k_l}), q₂, α_u, (1 − η)ɛ₂/(M_f M_d²) },

and

  h(x^{k_l}) ≤ min{ 1, ɛ₁, rq₁ɛ₂/δ },   (3.50)

and r is from line 20 in Algorithm 3. Let α_0^{k_l} denote the first trial step size in the sequence {1, r, r², ...} that satisfies α ≤ ᾱ^{k_l}.
In view of Theorem 3.2, h(x^{k_l}) tends to zero as k_l → ∞, and therefore ᾱ^{k_l} = q₁h(x^{k_l}) and (3.50) is satisfied for all sufficiently large k_l. It is evident that

  α_0^{k_l} ≥ rᾱ^{k_l} = rq₁h(x^{k_l})   (3.51)

for all sufficiently large k_l. Using (3.49) and (3.51) we have

  −α_0^{k_l}∇f(x^{k_l})^T d^{k_l} ≥ rq₁ɛ₂h(x^{k_l}),

which together with (3.50) implies that the switching condition (2.20) for Case 1 is satisfied. Lastly, we show α_0^{k_l} > α_min^{k_l}. Noting the definition (2.24) of α_min^{k_l}, and using (3.49) and Theorem 3.2, we know

  α_min^{k_l} = δh²(x^{k_l}) / (−∇f(x^{k_l})^T d^{k_l})
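The f-type direction system (2.6) that drives this analysis can be solved, as in the proof of Lemma 3.3, through the reduced system with W_k = ∇c_{A_k}^T B_k^{−1} ∇c_{A_k}. The sketch below uses plain dense algebra and is only illustrative: the paper's actual implementation instead exploits the L-BFGS representation of B_k and the sparse block Jacobian of the ball/sphere constraints (one gradient 2x_[i] per active block).

```python
import numpy as np

# Illustrative dense solve of the quasi-Newton KKT system (2.6):
#   B d + A lam = -grad_f,   A^T d = -c,
# via the reduced ("Schur") system  W lam = c - A^T B^{-1} grad_f,
# where W = A^T B^{-1} A and A collects the active constraint gradients.
def kkt_direction(B, A, grad_f, c):
    Binv_g = np.linalg.solve(B, grad_f)
    Binv_A = np.linalg.solve(B, A)
    W = A.T @ Binv_A                       # small m x m reduced system
    lam = np.linalg.solve(W, c - A.T @ Binv_g)
    d = -Binv_g - Binv_A @ lam
    return d, lam

# Tiny check on one active ball constraint c(x) = ||x||^2 - 1 at x = (2, 0):
B = np.eye(2)
x = np.array([2.0, 0.0])
A = (2 * x).reshape(2, 1)                  # grad c = 2x, one column
c = np.array([x @ x - 1.0])
g = np.array([1.0, 1.0])
d, lam = kkt_direction(B, A, g, c)
assert np.allclose(B @ d + A @ lam, -g)    # stationarity equation of (2.6)
assert np.allclose(A.T @ d, -c)            # linearized constraint of (2.6)
```

Since W_k has the dimension of the working set (at most the number m of balls/spheres), only small reduced systems need to be solved per iteration, which is the "two or three reduced linear systems" cost mentioned in the abstract.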
REGULARIZED SEQUENTIAL QUADRATIC PROGRAMMING METHODS Philip E. Gill Daniel P. Robinson UCSD Department of Mathematics Technical Report NA-11-02 October 2011 Abstract We present the formulation and analysis
More informationSome new facts about sequential quadratic programming methods employing second derivatives
To appear in Optimization Methods and Software Vol. 00, No. 00, Month 20XX, 1 24 Some new facts about sequential quadratic programming methods employing second derivatives A.F. Izmailov a and M.V. Solodov
More informationA PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION
Optimization Technical Report 02-09, October 2002, UW-Madison Computer Sciences Department. E. Michael Gertz 1 Philip E. Gill 2 A PRIMAL-DUAL TRUST REGION ALGORITHM FOR NONLINEAR OPTIMIZATION 7 October
More informationOptimization. Escuela de Ingeniería Informática de Oviedo. (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30
Optimization Escuela de Ingeniería Informática de Oviedo (Dpto. de Matemáticas-UniOvi) Numerical Computation Optimization 1 / 30 Unconstrained optimization Outline 1 Unconstrained optimization 2 Constrained
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationSurvey of NLP Algorithms. L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA
Survey of NLP Algorithms L. T. Biegler Chemical Engineering Department Carnegie Mellon University Pittsburgh, PA NLP Algorithms - Outline Problem and Goals KKT Conditions and Variable Classification Handling
More information5 Quasi-Newton Methods
Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More information4TE3/6TE3. Algorithms for. Continuous Optimization
4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca
More informationA new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints
Journal of Computational and Applied Mathematics 161 (003) 1 5 www.elsevier.com/locate/cam A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality
More informationLecture 15: SQP methods for equality constrained optimization
Lecture 15: SQP methods for equality constrained optimization Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 15: SQP methods for equality constrained
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More informationHYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION. Yueling Loh
HYBRID FILTER METHODS FOR NONLINEAR OPTIMIZATION by Yueling Loh A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore,
More informationA Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties
A Primal-Dual Interior-Point Method for Nonlinear Programming with Strong Global and Local Convergence Properties André L. Tits Andreas Wächter Sasan Bahtiari Thomas J. Urban Craig T. Lawrence ISR Technical
More informationConstrained Nonlinear Optimization Algorithms
Department of Industrial Engineering and Management Sciences Northwestern University waechter@iems.northwestern.edu Institute for Mathematics and its Applications University of Minnesota August 4, 2016
More informationSpectral gradient projection method for solving nonlinear monotone equations
Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department
More informationPOWER SYSTEMS in general are currently operating
TO APPEAR IN IEEE TRANSACTIONS ON POWER SYSTEMS 1 Robust Optimal Power Flow Solution Using Trust Region and Interior-Point Methods Andréa A. Sousa, Geraldo L. Torres, Member IEEE, Claudio A. Cañizares,
More informationA SHIFTED PRIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OPTIMIZATION
A SHIFTED RIMAL-DUAL INTERIOR METHOD FOR NONLINEAR OTIMIZATION hilip E. Gill Vyacheslav Kungurtsev Daniel. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-18-1 February 1, 2018
More informationNonlinear Programming
Nonlinear Programming Kees Roos e-mail: C.Roos@ewi.tudelft.nl URL: http://www.isa.ewi.tudelft.nl/ roos LNMB Course De Uithof, Utrecht February 6 - May 8, A.D. 2006 Optimization Group 1 Outline for week
More informationMultidisciplinary System Design Optimization (MSDO)
Multidisciplinary System Design Optimization (MSDO) Numerical Optimization II Lecture 8 Karen Willcox 1 Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Today s Topics Sequential
More informationMODIFYING SQP FOR DEGENERATE PROBLEMS
PREPRINT ANL/MCS-P699-1097, OCTOBER, 1997, (REVISED JUNE, 2000; MARCH, 2002), MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY MODIFYING SQP FOR DEGENERATE PROBLEMS STEPHEN J. WRIGHT
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)
More informationLecture 15 Newton Method and Self-Concordance. October 23, 2008
Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications
More informationCS-E4830 Kernel Methods in Machine Learning
CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This
More informationCONSTRAINED NONLINEAR PROGRAMMING
149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach
More informationNewton s Method. Javier Peña Convex Optimization /36-725
Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and
More informationAn Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints
An Active Set Strategy for Solving Optimization Problems with up to 200,000,000 Nonlinear Constraints Klaus Schittkowski Department of Computer Science, University of Bayreuth 95440 Bayreuth, Germany e-mail:
More information1. Introduction. We analyze a trust region version of Newton s method for the optimization problem
SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John
More information1. Introduction. We consider the general smooth constrained optimization problem:
OPTIMIZATION TECHNICAL REPORT 02-05, AUGUST 2002, COMPUTER SCIENCES DEPT, UNIV. OF WISCONSIN TEXAS-WISCONSIN MODELING AND CONTROL CONSORTIUM REPORT TWMCC-2002-01 REVISED SEPTEMBER 2003. A FEASIBLE TRUST-REGION
More informationA STABILIZED SQP METHOD: GLOBAL CONVERGENCE
A STABILIZED SQP METHOD: GLOBAL CONVERGENCE Philip E. Gill Vyacheslav Kungurtsev Daniel P. Robinson UCSD Center for Computational Mathematics Technical Report CCoM-13-4 Revised July 18, 2014, June 23,
More informationPart 5: Penalty and augmented Lagrangian methods for equality constrained optimization. Nick Gould (RAL)
Part 5: Penalty and augmented Lagrangian methods for equality constrained optimization Nick Gould (RAL) x IR n f(x) subject to c(x) = Part C course on continuoue optimization CONSTRAINED MINIMIZATION x
More informationIBM Research Report. Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence
RC23036 (W0304-181) April 21, 2003 Computer Science IBM Research Report Line Search Filter Methods for Nonlinear Programming: Motivation and Global Convergence Andreas Wächter, Lorenz T. Biegler IBM Research
More informationRecent Adaptive Methods for Nonlinear Optimization
Recent Adaptive Methods for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke (U. of Washington), Richard H. Byrd (U. of Colorado), Nicholas I. M. Gould
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationYou should be able to...
Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set
More informationA globally convergent Levenberg Marquardt method for equality-constrained optimization
Computational Optimization and Applications manuscript No. (will be inserted by the editor) A globally convergent Levenberg Marquardt method for equality-constrained optimization A. F. Izmailov M. V. Solodov
More informationLectures 9 and 10: Constrained optimization problems and their optimality conditions
Lectures 9 and 10: Constrained optimization problems and their optimality conditions Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lectures 9 and 10: Constrained
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationOptimization and Root Finding. Kurt Hornik
Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding
More informationNumerical Optimization
Constrained Optimization Computer Science and Automation Indian Institute of Science Bangalore 560 012, India. NPTEL Course on Constrained Optimization Constrained Optimization Problem: min h j (x) 0,
More informationConstrained Optimization Theory
Constrained Optimization Theory Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. IMA, August 2016 Stephen Wright (UW-Madison) Constrained Optimization Theory IMA, August
More informationLIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS
LIMITED MEMORY BUNDLE METHOD FOR LARGE BOUND CONSTRAINED NONSMOOTH OPTIMIZATION: CONVERGENCE ANALYSIS Napsu Karmitsa 1 Marko M. Mäkelä 2 Department of Mathematics, University of Turku, FI-20014 Turku,
More informationOn the complexity of an Inexact Restoration method for constrained optimization
On the complexity of an Inexact Restoration method for constrained optimization L. F. Bueno J. M. Martínez September 18, 2018 Abstract Recent papers indicate that some algorithms for constrained optimization
More information