arxiv: v1 [math.oc] 10 Sep 2017

Size: px
Start display at page:

Download "arxiv: v1 [math.oc] 10 Sep 2017"

Transcription

1 A DC Programming Approach for Solving Multicast Network Design Problems via the Nesterov Smoothing Technique W. GEREMEW, 1 N. M. NAM, A. SEMENOV, 3 V. BOGINSKI, 4, and E. PASILIAO 5 arxiv: v1 [math.oc] 10 Sep 017 Abstract. This paper continues our effort initiated in [19] to study Multicast Communication Networks, modeled as bilevel hierarchical clustering problems, by using mathematical optimization techniques. Given a finite number of nodes, we consider two different models of multicast networks by identifying a certain number of nodes as cluster centers, and at the same time, locating a particular node that serves as a total center so as to minimize the total transportation cost through the network. The fact that the cluster centers and the total center have to be among the given nodes makes this problem a discrete optimization problem. Our approach is to reformulate the discrete problem as a continuous one and to apply Nesterov smoothing approximation technique on the Minkowski gauges that are used as distance measures. This approach enables us to propose two implementable DCA-based algorithms for solving the problems. Numerical results and practical applications are provided to illustrate our approach. Key words. DC programming, the Nesterov smoothing technique, hierarchical clustering, subgradient, Fenchel conjugate. AMS subject classifications. 49J5, 49J53, 90C31 1 Introduction The complexity of modern networks such as communication networks, broadcasting networks, and distribution networks requires multilevel connectivity. For instance, many department stores usually get their merchandise delivered to them by a delivery company. For efficiency purposes, the delivery company usually wants to identify a certain number of locations to serve as distribution centers for the delivery of supplies to the stores. At the same time, the company wants to identify a location as a main distribution center, also known as the total center, from which the other distribution centers receive their supplies. This is a typical description of a bilevel multicast communication network, which can also be seen as a multifacility location problem or as a bilevel hierarchical clustering problem. Borrowing some language from network optimization literature, these problems can be described mathematically as follows: Given m nodes a 1, a,..., a m in R n, the objective is to choose k cluster centroids a (1), a (),..., a (k) and a total center a (k+1) from the given nodes in such a way that the total transportation cost of the tree formed by connecting the cluster centers to the total center, and the remaining nodes to the nearest cluster centers is minimized. The fact that the centers and the total center have to be among the existing nodes makes the problem a discrete optimization problem, which can be shown to be NP hard. 1 School of General Studies, Stockton University, Galloway, NJ 0805, USA (wgeremew4@gmail.com). Geremew s research was partly supported by the AFRL Mathematical Modeling and Optimization Institute. Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, Portland, OR 9707, USA (mau.nam.nguyen@pdx.edu). Research of this author was partly supported by the National Science Foundation under grant # University of Jyväskylä, P.O.Box 35 FI University of Jyvaskyla, Finland (alexander.v.semenov@jyu.fi) 4 University of Central Florida, 1800 Pegasus Dr., Orlando, FL 3816 USA (vladimir.boginski@ucf.edu) 5 Air Force Research Laboratory Eglin AFB, FL, USA. (elpasiliao@gmail.com) 1

2 Many existing algorithms for solving bilevel hierarchical clustering problems are heuristics in nature, and do not optimize any well-defined objective function. The mathematical optimization approach for solving hierarchical clustering problems was initiated in the pioneering work from [6]. The authors introduced three models of hierarchical clustering based on the Euclidean norm and employed the derivative-free method developed in [5] to solve the problem in two dimensions. Replacing the Euclidean norm by the squared Euclidean norm, the authors in [3] used the DCA, a well-known algorithm for minimizing differences of convex functions introduced by Pham Dinh Tao (see [4, 7]), to solve the problem in high dimensions. In fact, the DCA provides an effective tool for solving the classical clustering problem and its variants; see [1,, 3, 6, 7, 8, 18, 19] and the references therein. In our recent work [19], we proposed a new method based on the Nesterov smoothing technique and the DCA to cope with the original models of hierarchical clustering introduced in [6]. The idea of using the Nesterov smoothing technique overcomes the drawback of the DCA stated in [3] as the DCA is not appropriate for these models. Our current paper continues the effort initiated in [3, 6] in which mathematical optimization techniques for solving optimization problems beyond convexity are used in multifacility location and clustering. In particular, this paper is the second part of our paper [19] as we propose other two bivelel hierarchical clustering models. Another novel component of the present paper compared to [19] is the possibility of considering problems with generalized distance generated by Minkowski gauges as well as the possibility to handle problems with constraints. In this paper, we propose two implementable algorithms based on a DC programming approach combined with the Nesterov smoothing technique to solve the resulting constrained minimization problems for both models. It is important to note that the DCA can only guarantees the convergence to a critical point, so to achieve better results we often run the algorithms multiple times with different starting points via suitable initialization techniques, such as running the k-means or a genetic algorithm to generate starting centers for the two proposed algorithms. The paper is organized as follows. In Section, we present the continuous optimization formulations of the two models using Minkowski gauges as distance measures. In section 3 we discuss some basic definitions and tools of optimization that are used throughout the paper. In Sections 4 and 5, we develop the two algorithms for the two proposed multicast communication networks. In Section 6 we present our numerical experiments and results performed on artificial datasets as well as real datasets. Problems Formulation In this section, we discuss two models of bilevel hierarchical clustering and provide the tools of optimization used throughout the paper. In order to reformulate the discrete optimization problem under consideration as a continuous optimization problem, we introduce k artificial centers which are not necessarily the existing nodes in designing the optimal multicast networks. Denote the k artificial cluster centers by x 1, x,..., x k and the distance measurement between the artificial center x l, l = 1,..., k, and the real node a i, i = 1,..., m,

3 by a generalized distance σ F (x l a i ), where σ F is the support function associated with a nonempty closed bounded convex set F containing the origin in its interior, i.e., σ F (x) := sup{ x, y y F }. Note that if F is the closed unit Euclidean ball in R n, then σ F (x) defines the Euclidean norm of x R n. In the case where F is the closed unit box of R n, i.e., F := {u = (u 1,..., u n ) R n 1 u i 1 for i = 1,..., n}, then σ F (x) defines the l 1 norm x 1 of x R n. In the first model, the m nodes are clustered around the k artificial centers by trying to minimize the minimum sum of the distances from each node to the k cluster centers. A node with the smallest such sum will serve as the total center. The total connection cost of the tree that needs to be minimized is given by ϕ 1 (x 1,..., x k ) := min σ F (x l a i ) + min σ F (x l a i ).,...,k,...,m On the other hand, in the second model the m nodes are clustered around k + 1 artificial centers by trying to minimize the minimum sum of the distances from each artificial center to the remaining k centers. Such a center will eventually be named as the total center. In this case, the total connection cost of the tree that needs to be minimized is given by ϕ (x 1,..., x k+1 ) := min σ F (x l a i ) + min,...,k+1,...,k+1 j=1 σ F (x l x j ). The main difference between Model I and Model II is the way in which the total center is selected. In addition, in Model II the total center also serves as a cluster center. The algorithms we will develop are expected to solve the continuous optimization models in a reasonable amount of time and give us approximate solutions to the original discrete optimization models. Note that each node a i is assigned to its closest center x l, but in both models the centers might not be real nodes. Therefore, for the continuous optimization model to solve (or approximate) the discrete model, we need to add a constraint that tries to minimize the difference between the artificial centers and the real centers, i.e., and φ 1 (x 1,..., x k ) := φ (x 1,..., x k+1 ) := min σ F (x l a i ) = 0,...,m min σ F (x l a i ) = 0.,...,m Note that we use the generalized distance generated by σ F in the constraints for convenience of presentation although it is possible to use different distances such as the Euclidean distance. Model I was originally proposed in [6] where the authors used the derivative-free discrete gradient method established in [5] to solve the resulting optimization problem, but this 3

4 method is not suitable for large-scale settings in high dimensions. It is also considered in [3] to solve a similar model where the squared Euclidean distance used as a similarity measure. Model II was considered in [8] without constraints, and the hyperbolic smoothing technique was used to solve the problem. 3 Basic Definitions and Tools of Optimization In this section, we present two main tools of optimization used to solve the bilevel hierarchical crusting problem: the DCA introduced by Pham Dinh Tao and the Nesterov smoothing technique. We consider throughout the paper DC programming: minimize f(x) := g(x) h(x), x R n, (3.1) where g : R n R and h: R n R are convex functions. The function f in (3.1) is called a DC function and g h is called a DC decomposition of f. Given a convex function g : R n R, the Fenchel conjugate of g is defined by g (y) := sup{ y, x g(x) x R n }. Note that g : R n (, + ] is also a convex function. In addition, x g (y) if and only if y g(x), where denotes the subdifferential operator in the sense of convex analysis; see, e.g., [13, 16, 5]. Let us present below the DCA introduced by Tao and An [4, 7] as applied to (3.1). Although the algorithm is used for nonconvex optimization problems, the convexity of the functions involved still plays a crucial role. Algorithm 1 The DCA 1: Input: x 0 R n, N N. : for k = 1,..., N do 3: Find y k h(x k 1 ) 4: Find x k g (y k ) 5: end for 6: Output: x N. Let us discuss below a convergence result of DC programming. A function h: R n R is called γ-convex (γ 0) if the function defined by k(x) := h(x) γ x, x R n, is convex. If there exists γ > 0 such that h is γ convex, then h is called strongly convex. We say that an element x R n is a critical point of the function f defined by (3.1) if g( x) h( x). Obviously, in the case where both g and h are differentiable, x is a critical point of f if and only if x satisfies the Fermat rule f( x) = 0. The theorem below provides a convergence result for the DCA. It can be derived directly from [7, Theorem 3.7]. 4

5 Theorem 3.1 Consider the function f defined by (3.1) and the sequence {x k } generated by the Algorithm 1. Then the following properties are valid: (i) If g is γ 1 -convex and h is γ -convex, then f(x k ) f(x k+1 ) γ 1 + γ x k+1 x k for all k N. (ii) The sequence {f(x k )} is monotone decreasing. (iii) If f is bounded from below, g is γ 1 -convex and h is γ -convex with γ 1 + γ > 0, and {x k } is bounded, then every subsequential limit of the sequence {x k } is a critical point of f. Let us present below a direct consequence of the Nesterov smoothing technique given in [1]. In the proposition below, d(x; Ω) denotes the Euclidean distance and P (x; Ω) denotes the Euclidean projection from a point x to a nonempty closed convex set Ω in R n. Proposition 3. Given any a R n and > 0, a Nesterov smoothing approximation of ϕ(x) := σ F (x a) has the representation Moreover, ϕ (x) = P ( x a ; F ) and where F := sup{ f f F }. ϕ (x) := 1 x a [ x a d( ; F )]. ϕ (x) ϕ(x) ϕ (x) + F, 4 Hierarchical Clustering via Continuous Optimization Techniques: Model I In this section, we present an approach of using continuous optimization techniques for hierarchical clustering. As mentioned earlier, our main tools are the DCA and the Nesterov smoothing technique. Recall that the first model under consideration is formulated as a constrained optimization problem: minimize subject to min σ F (x l a i ) +,...,k min,...,m σ F (x l a i ) min σ F (x l a i ) = 0, x 1,..., x k R n.,...,m After the centers x 1,..., x k have been found, a total center is selected from the existing nodes as follows: For each i = 1,..., m, we compute the sum k σ F (x l a i ). Then a total center c is a node a i that yields the smallest sum, i.e., { c := argmin σ F (x l a i ) } i = 1,..., m. 5

6 Now we convert the constrained optimization problem under consideration to an unconstrained optimization problem using the penalty method with a penalty parameter λ > 0: minimize min σ F (x l a i ) +,...,k x 1,..., x k R n. Proposition 4.1 The objective function f(x 1,..., x k ) := min σ F (x l a i ) +,...,k min,...,m min σ F (x l a i ) + λ,...,m min σ F (x l a i ),...,m σ F (x l a i ) + λ min σ F (x l a i ),...,m for x 1,..., x k R n and λ > 0 can be written as a difference of convex functions. Proof. First note that the minimum of m real numbers α i for i = 1,..., m has the representation: min α i = α i α i.,...,m Hence, we can represent f(x 1,..., x k ) as a function defined on (R n ) k as follows: f(x 1,..., x k ) = ( + λ) λ σ F (x l a i ) σ F (x l a i ) t=1,...,k l t This shows that f has a DC representation f = g 0 h 0, where and h 0 (x 1,..., x k ) := g 0 (x 1,..., x k ) := ( + λ) + t=1,...,k l t σ F (x l a i ) + λ σ F (x l a i ) are convex functions defined on (R n ) k. σ F (x l a i ) σ F (x l a i ). σ F (x l a i ) (4.1) σ F (x l a i ) Based on Proposition 3., we obtain a Nesterov s approximation of the generalized distance function ϕ(x) := σ F (x a) for x, a R n as follows ϕ (x) := x a [ ( )] x a d ; F. 6

7 As a result, the function g 0 defined in (4.1) has a smooth approximation given by g 0 (x 1,..., x k ) := ( + λ) x l a i ( + λ) [ ( x l a i )] d ; F. Thus, the function f has the following DC approximation convenient for applying the DCA: f (x 1,..., x k ) : = ( + λ) t=1,...,k l t x l a i σ F (x l a i ) λ σ F (x l a i ). ( + λ) s=1,...,m [ ( x l a i )] d ; F σ F (x l a i ) Instead of minimizing the function f, we minimize its DC approximation f (x 1,..., x k ) = g (x 1,..., x k ) h (x 1,..., x k ), x 1,..., x k R n. In this formulation, g and h are convex functions given by where g (x 1,..., x k ) := + λ x l a i, h (x 1,..., x k ) := h 1 (x 1,..., x k ) + h (x 1,..., x k ) + h 3 (x 1,..., x k ) + h 4 (x 1,..., x k ), h 1 (x 1,..., x k ) := h 3 (x 1,..., x k ) := λ ( + λ) [ ( x l a i )] d ; F, h (x 1,..., x k ) := σ F (x l a i ), h 4 (x 1,..., x k ) := The proposition below is a direct consequence of Proposition 3.. Proposition 4. Given any λ > 0 and > 0, the functions f and f satisfy ( f (x 1,..., x k ) f(x 1,..., x k ) f (x 1,..., x k ) + mk 1 + λ t=1,...,k l t σ F (x l a i ), σ F (x l a i ). ) F. for all x 1,..., x k R n. In what follows we will prove that each of the functions f and f admits an absolute minimum in (R n ) k. 7

8 Theorem 4.3 Given any λ > 0 and > 0, each of the functions f and f has an absolute minimum in (R n ) k. Proof. Let us show that for any γ R, the sublevel set L γ := {(x 1,..., x k ) f(x 1,..., x k ) γ} is bounded in (R n ) k. Since 0 int(f ), there exists r > 0 such that B(0; r) F. Consequently, r x = sup{ x, u u B(0; r)} sup{ x, u u F } = σ F (x) for all x R n. From the definition of the function f, we have {(x 1,..., x k ) (R n ) k f(x 1,..., x k ) γ} {(x 1,..., x k ) (R n ) k {(x 1,..., x k ) (R n ) k min,...,m min,...,m σ F (x l a i ) γ} x l a i γ r } m {(x 1,..., x k ) ϕ i (x 1,..., x k ) γ r }, where ϕ i (x 1,..., x k ) := k xl a i. Observe that for each i = 1,..., m, one has the inclusion {(x 1,..., x k ) ϕ i (x 1,..., x k ) γ r } {(x1,..., x k ) x l γ r + k ai }. Thus, L γ is a bounded set as it is contained in the union of a finite number of bounded sets in (R n ) k. As f is a continuous function, it has an absolute minimum in (R n ) k. Let γ := mk ( 1 + λ ) F. It follows from Proposition 4. that for any γ R, {(x 1,..., x k ) (R n ) k f (x 1,..., x k ) γ} {(x 1,..., x k ) (R n ) k f(x 1,..., x k ) γ +γ}. It follows that the sublevel set {(x 1,..., x k ) (R n ) k f (x 1,..., x k ) γ} is also bounded, and hence f has an absolute minimum in (R n ) k. To facilitate the gradient and subgradient calculations for the DCA, we will introduce a data matrix A and a variable matrix X. The data matrix A is formed by putting each a i, i = 1,..., m, in the i th row, i.e., a 11 a 1 a a 1n a A = 1 a a 3... a n..... a m1 a m a m3... a mn 8

9 Similarly, if x 1,..., x k are the k cluster centers, then the variable X is formed by putting each x l, l = 1,..., k, in the l th row, i.e., x 11 x 1 x x 1n x X = 1 x x 3... x n..... x k1 x k x k3... x kn With these notations, the decision variable X of the optimization problem belongs to R k n, the linear space of k n real matrices. Hence, we will assume that R k n is equipped with the inner product X, Y := trace(x T Y ). The Frobenius norm on R k n is defined by X F := X, X = k x l, x l = k x l. Let us start by computing the gradient of the first part of the DC decomposition, i.e., g (X) = + λ x l a i. Using the Frobenius norm, the function g can be written as g (X) = + λ = + λ = + λ x l a i [ x l x l, a i + a i ] [ m X F X, EA + k A ] F, where E is a k m matrix whose entries are all ones. Hence, g is differentiable and its gradient is given by g (X) = + λ [mx EA]. Our goal now is to find X g (Y), which can be accomplished by employing the relation This can equivalently be written as +λ X g (Y) if and only if Y g(x). [mx EA] = Y, and we solve for X as follows: ( + λ) [mx EA] = Y ( + λ)x = ( + λ)ea + Y X = ( + λ)ea + Y ( + λ)m 9

10 Next, we will demonstrate in more detail the techniques we used to compute a subgradient for the convex function 4 h = h 1 + h j. Since each function in this sum is convex, we will compute a subgradient of h applying the subdifferential sum rule (see, e.g., [16, Corollary.46]) and imum rule (see, e.g., [16, Proposition.54]) well known in convex analysis. We will begin our demonstration with h 1 given by h 1 (X) = ( + λ) j= [ ( x l a i )] d ; F. From its representation one can see that h 1 is differentiable. Thus, its gradient at X can be computed by computing the partial derivatives with respect to x 1,..., x k, i.e., h m 1 x l (X) = ( + λ) [ x l a i ( x l a i )] P ; F Hence, h 1 (X)) is a k n matrix H 1 whose l th row is h 1 x l (X)). for l = 1,..., k. (4.) Note that the convex functions h j for j =, 3, 4 are not differentiable in general. However, we can compute a subgradient for each function at X by applying the subdifferential sum rule and imum rule for convex functions. The following is an illustration of how one can compute subgradients of such functions using h as an example. For t = 1,..., k and i = 1,..., m, define γ ti (X) :=,l t σ F (x l a i ) = σ F (x l a i ) σ F (x t a i ) and γ i (X) := t=1,...,k γ ti(x). Thus, h can be represented as the sum of m convex functions as follows: h (X) = t=1,...,k,l t σ F (x l a i ) = γ i (X). Note that γ i is the imum of k convex functions γ ti for t = 1,..., k. Based on the subdifferential imum rule, for each i = 1,..., m, we will find a k n matrix H i γ i (X). Then, by the subdifferential sum rule H := m H i is a subgradient of h at X. To accomplish this goal, we first choose an index t {1,..., k} such that γ i (X) = γ t i(x) := k,l t σ F (x l a i ). The l th row wl i of the matrix H i for l t can be computed as described in Proposition 4.4 below, which follows from [16, Theorem.93]. The t row of the matrix H i is set to zero, as γ it is independent of x t. The procedures for computing a subgradient for h 3 and h 4 are very similar to the procedure we have illustrated. Proposition 4.4 Given a R n, the function ϕ(x) := σ F (x a) is convex with its subdifferential at x R n given by ϕ( x) = co F ( x), 10

11 where F ( x) := {q F x, q = σ F ( x)}. In particular, if F is the Euclidean closed unit ball in R n, then { x a x a if x a, ϕ( x) = B if x = a. At this point, we have demonstrated all the necessary steps in calculating the gradients and subgradients needed for our first DCA-based algorithm for solving the bilevel hierarchical clustering problem formulated in Model I. Algorithm Model I 1: Input: A, X 0, λ 0, 0, σ 1, σ, ε, N N. : while stopping criteria (λ,, ε) = false do 3: for k = 1,..., N do 4: Find Y k h (X k 1 ) 5: X k = (+λ)ea+y k (+λ)m 6: end for 7: update λ and 8: end while 9: Output: x N. Example 4.5 (l clustering with Algorithm ). In this example, we illustrate our method to study the problem of l clustering. The key point in Algorithm is the computation of Y h (X) for the case where F is the Euclidean closed unit ball B in R n. By the subdifferential sum rule, h (X) = h 1 (X) + h (X) + h 3 (X) + h 4 (X). Define x l a i if x l a i, x u li := l a i 0 otherwise. Now, we illustrate the way to find the gradient of h 1 and a subgradient of h i for i =, 3, 4 at X. The gradient of h 1 : The gradient Y 1 := h 1 (X) is the k n matrix whose l th row is h 1 (X) x l given in (4.). Note that in this case, the Euclidean projection P (z; F ) from z R n to F is given by { z z if z > 1, P (z; F ) := z otherwise. A subgradient of h : In this case, h (X) = t=1,...,k l t x l a i = 11 t=1,...,k ( k x l a i x t a i ).

12 For each i = 1,..., m, choose an index t(i) such that t=1,...,k ( k x l a i x t a i ) = x l a i x t(i) a i. Let us now form a k mn block matrix U = (u li ), where u li is considered as a row vector. We also use U i to denote the i th block column of the matrix U. Equivalently, U i is the k n matrix formed by placing the row vectors u li in its l th row for l = 1,..., k. Then a subgradient of h at X is given by Y := ( U i ) e t(i) u t(i)i, where e t(i) is the column vector of k components with 1 at the t(i) th position and 0 at other positions. A subgradient of h 3 : In this case, h 3 (X) = λ x l a i = λ ( m x l a i x l a t ). For each l = 1,..., k, we choose an index t(l) such that ( m x l a i x l a t ) = x l a i x l a t(l). Let V be the k n matrix whose l th row is m u li u lt(l). Then a subgradient of h 3 at X is given by Y 3 := λv. A subgradient of h 4 : In this case, h 4 (X) = = ( m x l a i x l a i Again, we choose an index t such that ( m ) x l a i x l a t = ) x l a t. x l a i x l a t. Let Z be the k n matrix whose l th row is m u li. Then a subgradient of h 4 is given by Y 4 := Z Z t, where Z t is the k n matrix whose l th row is u lt. 1

13 Example 4.6 (l 1 clustering with Algorithm ). In this example, we illustrate our method to study the problem of l 1 clustering. We will find a subgradient Y h (X) for the case where F is the closed unit box in R n given by F := {(u 1,..., u n ) R n 1 u i 1 for i = 1,..., n}. For t R, define 1 t > 0, sign(t) := 0 t = 0, 1 t < 0. Then we define sign(x) := (sign(x 1 ),..., sign(x n )) for x = (x 1,..., x n ) R n. Note that for the function p(x) := x 1, a subgradient of p at x R n is simply sign(x). Now, we illustrate the way to find the gradient of h 1 and a subgradient of h i for i =, 3, 4 at X. The gradient of h 1 : Similar to Example 4.5, the gradient of Y 1 := h 1 (X) is the k n matrix whose l th row is h 1 (X) given in (4.). Note that in this case, the Euclidean x l projection P (z; F ) from z R n to F is given by P (z; F ) := ( e, min(z, e)) componentwise, where e R n is the vector consisting of 1 in each component. A subgradient of h : In this case, h (X) = r=1,...,k l r x l a i 1 = r=1,...,k ( k x l a i 1 x r a i ) 1. For each i = 1,..., m, choose an index r(i) such that r=1,...,k ( k x l a i 1 x r a i ) 1 = x l a i 1 x r(i) a i 1. Now we form the k mn signed block matrix S = (s li ) given by s li = sign(x l a i ) as a row vector. We also use S i to denote the ith column block matrix of the signed matrix S. Then a subgradient of h at X is given by Y := ( S i ) e r(i) s r(i)i, where e r(i) is the column vector of k components with 1 at the r(i)th position and 0 at other positions. A subgradient of h 3 : In this case, h 3 (X) = λ σ F (x l a i ) = λ ( m x l a i 1 x l a t ) 1. 13

14 For each l = 1,..., k, we choose an index t(l) such that ( m x l a i 1 x l a t ) m 1 = x l a i 1 x l a t(l) 1. Let V be the k n matrix whose l th row is m s li s lt(l). Then a subgradient of h 3 at X is given by Y 3 := λv. A subgradient of h 4 : In this case, h 4 (X) = = ( m Again, we choose an index t such that x l a i 1 x l a i 1 ) x l a t 1. ( m x l a i 1 ) x l a t 1 = x l a i 1 x l a t 1. Let T be the k n matrix whose l th row is m s li. Then a subgradient of h 4 is given by Y 4 := T T t, where T t is the k n matrix whose l th row is s lt. 5 Hierarchical Clustering via Continuous Optimization Techniques: Model II In this section, we focus on developing nonconvex optimization techniques based on the DCA and the Nesterov smoothing technique for the second model. Similar to Model I, we will solve the following constrained optimization problem: minimize subject to min σ F (x l a i ) +,...,k+1 min,...,k+1 j=1 σ F (x l x j ) min σ F (x l a i ) = 0, x 1,..., x k+1 R n.,...,m The total center is determined by c := argmin { k+1 j=1 } σ F (x l x j ) l = 1,..., k

15 This constrained optimization problem can be solved by the following unconstrained optimization problem by the penalty method with a penalty parameter λ > 0: minimize min σ F (x l a i ) +,...,k+1 x 1,..., x k+1 R n. min,...,k+1 j=1 σ F (x l x j ) + λ min σ F (x l a i ),...,m With the Nesterov smoothing technique, the objective function has the following approximation that is convenient for implementing the DCA: (1 + λ) x l a i f (X) : = + x l x j λ (1 + λ) j=1 [ ( x l a i )] d ; F σ F (x l a i ) j=1 [ d r=1,...,k+1 ( x l x j σ F (x l a i ) l r ; F )] r=1,...,k+1 As in the previous section, we use a variable matrix X of size (k + 1) n to store the row vector x l in its l th row for l = 1,..., k + 1. Now we solve the following DC programming: minimize f (X) = g (X) h (X), X R (k+1) n, where g and h are convex functions by σ F (x l x j ). l r j=1 g (X) := g 1 (X) + g (X) (5.1) and h (X) := h 1 (X) + h (X) + h 3 (X) + h 4 (X) + h 5 (X), where their respective components are defined as follows: (1 + λ) x l a i g 1 (X) :=, g (X) = x l x j and h 1 (X) := h 3 (X) := h 5 (X) := (1 + λ) k+1 t=1,...,k+1 l t k+1 t=1,...,k+1 l t [ ( x l a i )] d ; F, h (X) := σ F (x l a i ), h 4 (X) := λ σ F (x l x j ). j=1 15 j=1 j=1 [ ( x l x j )] d ; F σ F (x l a i ),

16 Lemma 5.1 Let E be square matrix with size (k + 1) whose entries are all ones and let I be the identity matrix of size (k + 1). (i) Given any real numbers a and b with a 0 and a (k + 1)b, the matrix M := ai + be is invertible with M 1 = xi + ye, where x = 1 a and y = b a[a + b(k + 1)]. (ii) Let Ẽ := (k + 1)I E. Given any real numbers c and d with c 0 and c d(k + 1), the matrix N := ci + dẽ is invertible with where α = 1 c+d(k+1) and β = Proof. (i) Observe that d c[c + d(k + 1)]. N 1 = αi + βe, (ai + be)(xi + ye) = axi + (bx + ay)e + bye Thus, (ai + be)(xi + ye) = I if and only if = axi + (bx + ay)e + by(k + 1)E. ax = 1 and bx + [a + b(k + 1)]y = 0. Equivalently, x = 1 a and y = b a[a + b(k + 1)]. (ii) We have It remains to apply the result from (i). N = ci + dẽ = [c + d(k + 1)I] de. The proposition below provides a formula for computing g required for applying the DCA. Proposition 5. Given any λ > 0 and > 0, the Fenchel conjugate g of the function g defined in (5.1) is continuously differentiable with ( ) g(y) = (αi + βe) (1 + λ)ea + Y for Y R k n, where E is defined in Lemma 5.1 and α := 1 m(λ + 1) + (k + 1) and β := m(λ + 1)[m(λ + 1) + (k + 1)]. (5.) Proof. We have g 1 (X) = 1 + λ [mx EA], g (X) = [(k + 1)I E] X. 16

17 Recall that X g (Y) if and only if Y = g (X). The equation g (X) = Y can be written as 1 + λ [mx EA] + ẼX = Y (1 + λ) [mx EA] + ẼX = Y ( ) m(1 + λ)i + Ẽ X = (1 + λ)ea + Y. Solving this equation using Lemma 5.1(ii) yields ( ) X = (αi + βe) (1 + λ)ea + Y, (5.3) where α and β are given in (5.). It follows that g(y) is a singleton for every Y R k n, and so g is continuously differentiable and g(y) is given by the expression on the righthand side of (5.3); see [16, Theorem 3.3]. To implement the DCA, it remains to find a subgradient of h. From their representations, one can see that h 1 and h are differentiable. Their respective subgradients coincides with their gradients, that can be computed by the partial derivatives with respect to x 1,..., x k+1 given by h m 1 x l (X) = (1 + λ) [ x l a i ( x l a i )] P ; F Thus, h 1 (X)) is the (k + 1) n matrix H 1 whose l th row is h 1 (X). x l Similarly, k+1 h x l (X) = [ x l x j j=1 ( x l x j P )] ; F Hence, h (X) is the (k + 1) n matrix H 4 whose l th row is h x l (X). for l = 1,..., k + 1. (5.4) for l = 1,..., k + 1. (5.5) The procedures for computing a subgradient of h i for i = 3, 4, 5 are similar to those from the previous section. Therefore, we are ready to give a new DCA-based algorithm for the bilevel hierarchical clustering problem in Model II. Example 5.3 (l clustering with Algorithm 3). In this example, we consider the hierarchical clustering problem in Model II for the case where F is the Euclidean closed unit ball in R n. To implement Algorithm 3, it remains to find a subgradient Y h (X). Recall that h (X) = h 1 (X) + h (X) + h 3 (X) + h 4 (X) + h 5 (X) for X R (k+1) n. The functions h 1 and h are continuously differentiable. The gradients h 1 (X) and h (X) can be determined by their partial derivatives from (5.4) and (5.5), respectively. We can find subgradients Y 3 h 3 (X) and Y 4 h 4 (X) by the procedure developed in 17

18 Algorithm 3 Model II 1: Input: A, X 0, λ 0, 0, σ 1, σ, ε, N N. : while stopping criteria (λ,, ε) = false do 1 3: α := m(λ+1)+(k+1) 4: β := m(λ+1)[m(λ+1)+(k+1)] 5: for k = 1,..., N do 6: Find Y k h (X k 1 ) ) 7: X k = (αi + βe) ((1 + λ)ea + Y k 8: end for 9: update λ and 10: end while 11: Output: X N. Example 4.5. Now, we focus on finding a subgradient Y 5 h 5 (X). In this case, h 5 (X) := x l x j = x l x j x t x j. t=1,...,k+1 l t j=1 t=1,...,k+1 j=1 To find such a subgradient, we will apply the subdifferential sum rule and imum rule. Choose an index t such that x l x j x t x j = x l x j x t x j. t=1,...,k+1 j=1 j=1 j=1 j=1 j=1 Define x l x j if x l x j, x v lj := l x j 0 otherwise. Then Y 5 can be determined by the (k + 1) n matrix whose l th row is given by Y l := v lj v lt for l = 1,..., k + 1. j=1 By the procedure developed in Example 4.6 with the use of a signed matrix, we can similarly provide another example for hierarchical clustering for Model II in the case where F is the closed unit box in R n. The detail is left for the reader. 6 Numerical Experiments We conducted our numerical experiments on a MacBook Pro with. GHz Intel Core i7 Processor, 16 GB 1600 MHz DDR3 Memory. Even though the two continuous optimization formulations we consider are nonsmooth and nonconvex, the Nesterov smoothing technique allowed us to design two implementable DCA-based algorithms. 18

19 For the implementation of the algorithms, we wrote the codes in MATLAB. Since our algorithms are adaptations of the DCA, there is no guarantee that our algorithms converge to a global optimal solution. However, for the artificial test dataset we created to test the performance of Algorithm with 11 nodes, clearly identifiable cluster centers, and a total center (see Figure 1), the algorithm converges 100% of the time to a global optimal solution for all 55 different pairs of starting centers selected from the 11 points, i.e., ( 11 ) = 55. (a) Artificial Test Dataset for Model I (b) 100% convergence to a global optimal solution Figure 1: Performance of Algorithm. On the other hand, for the artificial test dataset we created to test the performance of Algorithm with 15 nodes, clearly identifiable cluster centers, and a total center (see Figure ), the algorithm converges to a global optimal solution 85% of the time, which means that for all 455 different starting centers selected from the 15 points, i.e., ( ) 15 3 = 455, the algorithm converges to a global optimal solution 85% of the time. (a) Artificial test dataset for Model II (b) 85% convergence to a global optimal solution Figure : Performance of Algorithm 3 on the Test Data Set. Further numerical experiments were performed on the dataset EIL76 (The 76 City Problem) taken from the Traveling Salesman Problem Library [4]. For instance, Figures 3(a) and 3(b) show optimal solutions for Model I and Model II, respectively, for three cluster centers and a total center. The optimal solutions were calculated by the brute-force search method in which we exhaustively generated all the four possible candidates, 3 cluster centers and 1 total center, and then computed the corresponding cost to take the minimum. In this 19

20 case, we have ( 76 3 ) = 70, 300 combinations for Model I and ( 76 4 ) = 1, 8, 975 combinations of cluster centers and a total center to check for Model II. For instance, the optimal value for Model I tested on EIL76 with 3 cluster centers and 1 total center is , while for Model II with 3 cluster centers and 1 total center, it is (a) Model I on EIL76 (b) Model II on EIL76 Figure 3: Optimal Solutions for Model I and Model II on EIL76. In the two MATLAB codes we wrote to implement the two algorithms, we updated the penalty parameter λ and the smoothing parameter in every iteration by the relations λ i+1 = σ 1 λ i, σ 1 > 1, and i+1 = σ i, σ (0, 1), respectively. The two parameters were updated until < For the choice of the starting centers, we used three different methods: Random. We used the datasample (a MATLAB built in function) to randomly select starting centers from the existing nodes without replacement. K-means clustering. We used the kmeans (a MATLAB built in function) to partition the nodes into k clusters first, and then we selected the k cluster centroid locations as starting centers. C++ implementation We implemented the model 1 and model algorithms in C++ and used uniform random numbers generator to generate starting centers. The code was developed using Armadillo library and run on a computer having 0 Intel(R) Xeon(R) CPU E5-640 cores and 50 GB RAM. 0

21 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Time1 Time Iter1 Iter k m n EIL EIL EIL EIL EIL EIL EIL EIL EIL EIL Table 1: Starting centers selected randomly, MATLAB code. 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Time1 Time Iter1 Iter k m n EIL EIL EIL EIL EIL EIL EIL EIL EIL EIL Table : Starting centers selected by the k-means, MATLAB code. 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Iter1 Iter Time1 Time k m n EIL EIL EIL EIL EIL EIL EIL EIL EIL EIL Table 3: Starting centers selected randomly, C++ code 1

22 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Iter1 Iter Time1 Time k m n 100C.56341e e C.1641e e C.55508e+06.55e C.983e e C.8579e e C.0867e e C.4936e e C 3.034e e C.33796e e C.37677e e Table 4: Starting centers selected randomly, C++ code 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Iter1 Iter Time1 Time k m n 10000RND e e RND.44543e e RND.36188e e RND.13395e e RND e e RND e e RND.450e e RND.38836e e RND e e RND.0534e e Table 5: Starting centers selected randomly, C++ code, u. randomly distributed points

23 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Iter1 Iter Time1 Time k m n 10000RND e e RND 5.331e e RND e e RND e e RND e e RND e e RND e e RND e e RND e e RND e e Table 6: Starting centers selected randomly, C++ code, u. randomly distributed points 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Iter1 Iter Time1 Time k m n 10000RND3D.83948e e RND3D.74404e e RND3D.9869e e RND3D e e RND3D e e RND3D.7841e e RND3D.94171e e RND3D e e RND3D.78719e e RND3D.8067e e Table 7: Starting centers selected randomly, C++ code, u. randomly distributed points, 3 dimensions 3

24 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Iter1 Iter Time1 Time k m n 1000RND6D e e RND6D e e RND6D e e RND6D 5.848e e RND6D 5.886e e RND6D e e RND6D e e RND6D e e RND6D e e RND6D e e Table 8: Starting centers selected randomly, C++ code, 1000 u. points in 6 dimensions randomly distributed 0 = 16, λ 0 = 0.01, σ 1 = 160, σ = 0.5 COST1 COST Iter1 Iter Time1 Time k m n RNDD 1.408e e RNDD e e RNDD e e RNDD e e RNDD e e RNDD e e RNDD e e RNDD e e RNDD e e RNDD e e Table 9: Starting centers selected randomly, C++ code, u. randomly distributed points in dimensions 7 Conclusion and Future Research In this study, we presented two DCA-based algorithms for solving two different bilevel hierarchical clustering problems where the similarity(dissimilarity) measure between two data points (nodes) is given by generalized distances. As special cases of generalized distances, we provided two detailed examples for the l 1 and l norms. We implemented the algorithms with MATLAB and C++ and tested them on different datasets of various sizes and dimensions. We expect that our method used in this paper for solving bilevel hierarchical clustering problems are applicable to solving other nonsmooth nonconvex optimization problems. 4

25 References [1] An, L.T.H., Belghiti, M.T., Tao, P.D.: A new efficient algorithm based on DC programming and DCA for clustering. J. Glob. Optim., 7, (007). [] An, L.T.H., Minh, L.H., Tao, P.D.: New and efficient DCA based algorithms for minimum sum-of-squares clustering, Pattern Recognition, 47, (014). [3] An, L.T.H., Minh, L.H.: Optimization based DC programming and DCA for hierarchical clustering. European J. Oper. Res. 183, (007). [4] An, L.T.H., Tao, P.D.: Convex analysis approach to D.C. programming: Theory, algorithms and applications. Acta Math. Vietnam., (1997). [5] Bagirov, A.: Derivative-free methods for unconstrained nonsmooth optimization and its numerical analysis. Investigacao Operacional. 19, (1999). [6] Bagirov, A., Jia, L., Ouveysi, I., Rubinov, A.M.: Optimization based clustering algorithms in Multicast group hierarchies, in: Proceedings of the Australian Telecommunications, Networks and Applications Conference (ATNAC), Melbourne Australia (published on CD, ISNB ) (003). [7] Bagirov, A., Taheri, S., Ugon, J.: Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems. Pattern Recognition. 53, 1 4 (016). [8] Barbosa, G. V., Villas-Boas, S. B., Xavier, A. E.: Solving the Two-level Clustering Problem by Hyperbolic Smoothing Approach, and Design of Multicast Networks, SELECTED PROCEED- INGS, WCTR RIO (013). [9] Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (011). [10] Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization, nd edition. Springer, New York (006). [11] Boţ, R.I.: Conjugate Duality in Convex Optimization. Springer, Berlin (010). [1] Hartman, P.: On functions representable as a difference of convex functions. Pacific J. Math. 9, (1959). [13] Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I, II. Springer, Berlin (1993). [14] Hiriart-Urruty, J.B.: Generalized differentiability, duality and optimization for problems dealing with differences of convex functions. Lecture Note in Economics and Math. Systems. 56, (1985). [15] Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation, I: Basic Theory, II: Applications. Springer, Berlin (006). [16] Mordukhovich, B.S., Nam, N.M.: An Easy Path to Convex Analysis and Applications. Morgan & Claypool Publishers, San Rafael, CA (014). [17] Nam, N. M., An, N. T., Rector, R. B., J. Sun, J.: Nonsmooth algorithms and Nesterov s smoothing technique for generalized Fermat-Torricelli problems. SIAM J. Optim. 4, (014). 5

26 [18] Nam, N.M., Rector, R.B., Giles, D.: Minimizing Differences of Convex Functions with Applications to Facility Location and Clustering. Journal of Optimization Theory and Applications. 173, (017). [19] Nam, N. M., Geremew, W., Reynolds, S., Tran, T: The Nesterov Smoothing Technique and Minimizing Differences of Convex Functions for Hierarchical Clustering, in press (017). [0] Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140, (013). [1] Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, (005). [] Nesterov, Y.: Introductory lectures on convex optimization. A basic course. Applied Optimization, 87. Kluwer Academic Publishers, Boston, MA (004). [3] Ordin, B., Bagirov, A.: A heuristic algorithm for solving the minimum sum-of-squares clustering problems. Journal of Global Optimization. 61, (015). [4] Reinelt, G.: TSPLIB: A Traveling Salesman Problem Library. ORSA Journal of Computing. 3, (1991). [5] Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton, NJ (1970). [6] Rockafellar, R.T.: Conjugate Duality and Optimization. SIAM, Philadelphia, PA (1974). [7] Tao, P.D., An, L.T.H.: A d.c. optimization algorithm for solving the trust-region subproblem, SIAM J. Optim. 8, (1998). 6

ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS

ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS ALGORITHMS FOR MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS AND APPLICATIONS Mau Nam Nguyen (joint work with D. Giles and R. B. Rector) Fariborz Maseeh Department of Mathematics and Statistics Portland State

More information

arxiv: v2 [math.oc] 4 Feb 2016

arxiv: v2 [math.oc] 4 Feb 2016 MINIMIZING DIFFERENCES OF CONVEX FUNCTIONS WITH APPLICATIONS TO FACILITY LOCATION AND CLUSTERING August 8, 2018 Nguyen Mau Nam 1, Daniel Giles 2, R. Blake Rector 3. arxiv:1511.07595v2 [math.oc] 4 Feb 2016

More information

Convex and Nonconvex Optimization Techniques for the Constrained Fermat-Torricelli Problem

Convex and Nonconvex Optimization Techniques for the Constrained Fermat-Torricelli Problem Portland State University PDXScholar University Honors Theses University Honors College 2016 Convex and Nonconvex Optimization Techniques for the Constrained Fermat-Torricelli Problem Nathan Lawrence Portland

More information

ON GENERALIZED-CONVEX CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION

ON GENERALIZED-CONVEX CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION ON GENERALIZED-CONVEX CONSTRAINED MULTI-OBJECTIVE OPTIMIZATION CHRISTIAN GÜNTHER AND CHRISTIANE TAMMER Abstract. In this paper, we consider multi-objective optimization problems involving not necessarily

More information

Generalized Differential Calculus and Applications to Optimization

Generalized Differential Calculus and Applications to Optimization Portland State University PDXScholar Dissertations and Theses Dissertations and Theses Spring 6-1-2017 Generalized Differential Calculus and Applications to Optimization R. Blake Rector Portland State

More information

Helly's Theorem and its Equivalences via Convex Analysis

Helly's Theorem and its Equivalences via Convex Analysis Portland State University PDXScholar University Honors Theses University Honors College 2014 Helly's Theorem and its Equivalences via Convex Analysis Adam Robinson Portland State University Let us know

More information

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties

More information

On a result of Pazy concerning the asymptotic behaviour of nonexpansive mappings

On a result of Pazy concerning the asymptotic behaviour of nonexpansive mappings On a result of Pazy concerning the asymptotic behaviour of nonexpansive mappings arxiv:1505.04129v1 [math.oc] 15 May 2015 Heinz H. Bauschke, Graeme R. Douglas, and Walaa M. Moursi May 15, 2015 Abstract

More information

1 Introduction and Problem Formulation

1 Introduction and Problem Formulation CHARACTERIZATIONS OF DIFFERENTIABILITY, SMOOTHING TECHNIQUES AND DC PROGRAMMING WITH APPLICATIONS TO IMAGE RECONSTRUCTIONS Nguyen Mau Nam 1, Daniel Giles 2, Le Thi Hoai An 3, Nguyen Thai An 4 Abstract.

More information

On the convexity of piecewise-defined functions

On the convexity of piecewise-defined functions On the convexity of piecewise-defined functions arxiv:1408.3771v1 [math.ca] 16 Aug 2014 Heinz H. Bauschke, Yves Lucet, and Hung M. Phan August 16, 2014 Abstract Functions that are piecewise defined are

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, 2018 BORIS S. MORDUKHOVICH 1 and NGUYEN MAU NAM 2 Dedicated to Franco Giannessi and Diethard Pallaschke with great respect Abstract. In

More information

Monotone operators and bigger conjugate functions

Monotone operators and bigger conjugate functions Monotone operators and bigger conjugate functions Heinz H. Bauschke, Jonathan M. Borwein, Xianfu Wang, and Liangjin Yao August 12, 2011 Abstract We study a question posed by Stephen Simons in his 2008

More information

A Proximal Method for Identifying Active Manifolds

A Proximal Method for Identifying Active Manifolds A Proximal Method for Identifying Active Manifolds W.L. Hare April 18, 2006 Abstract The minimization of an objective function over a constraint set can often be simplified if the active manifold of the

More information

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT

PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT Linear and Nonlinear Analysis Volume 1, Number 1, 2015, 1 PARALLEL SUBGRADIENT METHOD FOR NONSMOOTH CONVEX OPTIMIZATION WITH A SIMPLE CONSTRAINT KAZUHIRO HISHINUMA AND HIDEAKI IIDUKA Abstract. In this

More information

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L.

McMaster University. Advanced Optimization Laboratory. Title: A Proximal Method for Identifying Active Manifolds. Authors: Warren L. McMaster University Advanced Optimization Laboratory Title: A Proximal Method for Identifying Active Manifolds Authors: Warren L. Hare AdvOl-Report No. 2006/07 April 2006, Hamilton, Ontario, Canada A Proximal

More information

SOLVING A MINIMIZATION PROBLEM FOR A CLASS OF CONSTRAINED MAXIMUM EIGENVALUE FUNCTION

SOLVING A MINIMIZATION PROBLEM FOR A CLASS OF CONSTRAINED MAXIMUM EIGENVALUE FUNCTION International Journal of Pure and Applied Mathematics Volume 91 No. 3 2014, 291-303 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v91i3.2

More information

On the acceleration of the double smoothing technique for unconstrained convex optimization problems

On the acceleration of the double smoothing technique for unconstrained convex optimization problems On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

POLARS AND DUAL CONES

POLARS AND DUAL CONES POLARS AND DUAL CONES VERA ROSHCHINA Abstract. The goal of this note is to remind the basic definitions of convex sets and their polars. For more details see the classic references [1, 2] and [3] for polytopes.

More information

Relationships between upper exhausters and the basic subdifferential in variational analysis

Relationships between upper exhausters and the basic subdifferential in variational analysis J. Math. Anal. Appl. 334 (2007) 261 272 www.elsevier.com/locate/jmaa Relationships between upper exhausters and the basic subdifferential in variational analysis Vera Roshchina City University of Hong

More information

The Nearest Doubly Stochastic Matrix to a Real Matrix with the same First Moment

The Nearest Doubly Stochastic Matrix to a Real Matrix with the same First Moment he Nearest Doubly Stochastic Matrix to a Real Matrix with the same First Moment William Glunt 1, homas L. Hayden 2 and Robert Reams 2 1 Department of Mathematics and Computer Science, Austin Peay State

More information

Dedicated to Michel Théra in honor of his 70th birthday

Dedicated to Michel Théra in honor of his 70th birthday VARIATIONAL GEOMETRIC APPROACH TO GENERALIZED DIFFERENTIAL AND CONJUGATE CALCULI IN CONVEX ANALYSIS B. S. MORDUKHOVICH 1, N. M. NAM 2, R. B. RECTOR 3 and T. TRAN 4. Dedicated to Michel Théra in honor of

More information

Refined optimality conditions for differences of convex functions

Refined optimality conditions for differences of convex functions Noname manuscript No. (will be inserted by the editor) Refined optimality conditions for differences of convex functions Tuomo Valkonen the date of receipt and acceptance should be inserted later Abstract

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

FIXED POINTS IN THE FAMILY OF CONVEX REPRESENTATIONS OF A MAXIMAL MONOTONE OPERATOR

FIXED POINTS IN THE FAMILY OF CONVEX REPRESENTATIONS OF A MAXIMAL MONOTONE OPERATOR PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 0002-9939(XX)0000-0 FIXED POINTS IN THE FAMILY OF CONVEX REPRESENTATIONS OF A MAXIMAL MONOTONE OPERATOR B. F. SVAITER

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Step lengths in BFGS method for monotone gradients

Step lengths in BFGS method for monotone gradients Noname manuscript No. (will be inserted by the editor) Step lengths in BFGS method for monotone gradients Yunda Dong Received: date / Accepted: date Abstract In this paper, we consider how to directly

More information

A Projection Algorithm for the Quasiconvex Feasibility Problem 1

A Projection Algorithm for the Quasiconvex Feasibility Problem 1 International Mathematical Forum, 3, 2008, no. 28, 1375-1381 A Projection Algorithm for the Quasiconvex Feasibility Problem 1 Li Li and Yan Gao School of Management, University of Shanghai for Science

More information

On the order of the operators in the Douglas Rachford algorithm

On the order of the operators in the Douglas Rachford algorithm On the order of the operators in the Douglas Rachford algorithm Heinz H. Bauschke and Walaa M. Moursi June 11, 2015 Abstract The Douglas Rachford algorithm is a popular method for finding zeros of sums

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

On the relations between different duals assigned to composed optimization problems

On the relations between different duals assigned to composed optimization problems manuscript No. will be inserted by the editor) On the relations between different duals assigned to composed optimization problems Gert Wanka 1, Radu Ioan Boţ 2, Emese Vargyas 3 1 Faculty of Mathematics,

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE Journal of Applied Analysis Vol. 6, No. 1 (2000), pp. 139 148 A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE A. W. A. TAHA Received

More information

The sum of two maximal monotone operator is of type FPV

The sum of two maximal monotone operator is of type FPV CJMS. 5(1)(2016), 17-21 Caspian Journal of Mathematical Sciences (CJMS) University of Mazandaran, Iran http://cjms.journals.umz.ac.ir ISSN: 1735-0611 The sum of two maximal monotone operator is of type

More information

A Dual Condition for the Convex Subdifferential Sum Formula with Applications

A Dual Condition for the Convex Subdifferential Sum Formula with Applications Journal of Convex Analysis Volume 12 (2005), No. 2, 279 290 A Dual Condition for the Convex Subdifferential Sum Formula with Applications R. S. Burachik Engenharia de Sistemas e Computacao, COPPE-UFRJ

More information

Methods for a Class of Convex. Functions. Stephen M. Robinson WP April 1996

Methods for a Class of Convex. Functions. Stephen M. Robinson WP April 1996 Working Paper Linear Convergence of Epsilon-Subgradient Descent Methods for a Class of Convex Functions Stephen M. Robinson WP-96-041 April 1996 IIASA International Institute for Applied Systems Analysis

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.18

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.18 XVIII - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No18 Linearized alternating direction method with Gaussian back substitution for separable convex optimization

More information

On DC. optimization algorithms for solving minmax flow problems

On DC. optimization algorithms for solving minmax flow problems On DC. optimization algorithms for solving minmax flow problems Le Dung Muu Institute of Mathematics, Hanoi Tel.: +84 4 37563474 Fax: +84 4 37564303 Email: ldmuu@math.ac.vn Le Quang Thuy Faculty of Applied

More information

THE CYCLIC DOUGLAS RACHFORD METHOD FOR INCONSISTENT FEASIBILITY PROBLEMS

THE CYCLIC DOUGLAS RACHFORD METHOD FOR INCONSISTENT FEASIBILITY PROBLEMS THE CYCLIC DOUGLAS RACHFORD METHOD FOR INCONSISTENT FEASIBILITY PROBLEMS JONATHAN M. BORWEIN AND MATTHEW K. TAM Abstract. We analyse the behaviour of the newly introduced cyclic Douglas Rachford algorithm

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

arxiv: v1 [math.oc] 11 Oct 2012

arxiv: v1 [math.oc] 11 Oct 2012 SOLUTIONS CONSTRUCTIONS OF A GENERALIZED SYLVESTER PROBLEM AND A GENERALIZED FERMAT-TORRICELLI PROBLEM FOR EUCLIDEAN BALLS arxiv:110.314v1 [math.oc] 11 Oct 01 Nguyen Mau Nam, 1 Nguyen Hoang, and Nguyen

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

On the projection onto a finitely generated cone

On the projection onto a finitely generated cone Acta Cybernetica 00 (0000) 1 15. On the projection onto a finitely generated cone Miklós Ujvári Abstract In the paper we study the properties of the projection onto a finitely generated cone. We show for

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

Exact SDP Relaxations for Classes of Nonlinear Semidefinite Programming Problems

Exact SDP Relaxations for Classes of Nonlinear Semidefinite Programming Problems Exact SDP Relaxations for Classes of Nonlinear Semidefinite Programming Problems V. Jeyakumar and G. Li Revised Version:August 31, 2012 Abstract An exact semidefinite linear programming (SDP) relaxation

More information

A quasisecant method for minimizing nonsmooth functions

A quasisecant method for minimizing nonsmooth functions A quasisecant method for minimizing nonsmooth functions Adil M. Bagirov and Asef Nazari Ganjehlou Centre for Informatics and Applied Optimization, School of Information Technology and Mathematical Sciences,

More information

3.10 Lagrangian relaxation

3.10 Lagrangian relaxation 3.10 Lagrangian relaxation Consider a generic ILP problem min {c t x : Ax b, Dx d, x Z n } with integer coefficients. Suppose Dx d are the complicating constraints. Often the linear relaxation and the

More information

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1

The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 October 2003 The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization 1 by Asuman E. Ozdaglar and Dimitri P. Bertsekas 2 Abstract We consider optimization problems with equality,

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

Convergence Theorems for Bregman Strongly Nonexpansive Mappings in Reflexive Banach Spaces

Convergence Theorems for Bregman Strongly Nonexpansive Mappings in Reflexive Banach Spaces Filomat 28:7 (2014), 1525 1536 DOI 10.2298/FIL1407525Z Published by Faculty of Sciences and Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Convergence Theorems for

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

Epiconvergence and ε-subgradients of Convex Functions

Epiconvergence and ε-subgradients of Convex Functions Journal of Convex Analysis Volume 1 (1994), No.1, 87 100 Epiconvergence and ε-subgradients of Convex Functions Andrei Verona Department of Mathematics, California State University Los Angeles, CA 90032,

More information

Downloaded 09/27/13 to Redistribution subject to SIAM license or copyright; see

Downloaded 09/27/13 to Redistribution subject to SIAM license or copyright; see SIAM J. OPTIM. Vol. 23, No., pp. 256 267 c 203 Society for Industrial and Applied Mathematics TILT STABILITY, UNIFORM QUADRATIC GROWTH, AND STRONG METRIC REGULARITY OF THE SUBDIFFERENTIAL D. DRUSVYATSKIY

More information

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

D.C. Optimization Methods for Solving Minimum Maximal Network Flow Problem

D.C. Optimization Methods for Solving Minimum Maximal Network Flow Problem D.C. Optimization Methods for Solving Minimum Maximal Network Flow Problem Le Dung Muu (Hanoi Institute of Mathematics, Vietnam) (Jianming SHI) (Muroran Institute of Technology, Japan) Abstract We consider

More information

On a result of Pazy concerning the asymptotic behaviour of nonexpansive mappings

On a result of Pazy concerning the asymptotic behaviour of nonexpansive mappings On a result of Pazy concerning the asymptotic behaviour of nonexpansive mappings Heinz H. Bauschke, Graeme R. Douglas, and Walaa M. Moursi September 8, 2015 (final version) Abstract In 1971, Pazy presented

More information

A Dykstra-like algorithm for two monotone operators

A Dykstra-like algorithm for two monotone operators A Dykstra-like algorithm for two monotone operators Heinz H. Bauschke and Patrick L. Combettes Abstract Dykstra s algorithm employs the projectors onto two closed convex sets in a Hilbert space to construct

More information

A convergence result for an Outer Approximation Scheme

A convergence result for an Outer Approximation Scheme A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento

More information

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE Fixed Point Theory, Volume 6, No. 1, 2005, 59-69 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.htm WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE YASUNORI KIMURA Department

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

SOME STABILITY RESULTS FOR THE SEMI-AFFINE VARIATIONAL INEQUALITY PROBLEM. 1. Introduction

SOME STABILITY RESULTS FOR THE SEMI-AFFINE VARIATIONAL INEQUALITY PROBLEM. 1. Introduction ACTA MATHEMATICA VIETNAMICA 271 Volume 29, Number 3, 2004, pp. 271-280 SOME STABILITY RESULTS FOR THE SEMI-AFFINE VARIATIONAL INEQUALITY PROBLEM NGUYEN NANG TAM Abstract. This paper establishes two theorems

More information

arxiv: v4 [math.oc] 5 Jan 2016

arxiv: v4 [math.oc] 5 Jan 2016 Restarted SGD: Beating SGD without Smoothness and/or Strong Convexity arxiv:151.03107v4 [math.oc] 5 Jan 016 Tianbao Yang, Qihang Lin Department of Computer Science Department of Management Sciences The

More information

Extended Monotropic Programming and Duality 1

Extended Monotropic Programming and Duality 1 March 2006 (Revised February 2010) Report LIDS - 2692 Extended Monotropic Programming and Duality 1 by Dimitri P. Bertsekas 2 Abstract We consider the problem minimize f i (x i ) subject to x S, where

More information

Robust Duality in Parametric Convex Optimization

Robust Duality in Parametric Convex Optimization Robust Duality in Parametric Convex Optimization R.I. Boţ V. Jeyakumar G.Y. Li Revised Version: June 20, 2012 Abstract Modelling of convex optimization in the face of data uncertainty often gives rise

More information

Research Division. Computer and Automation Institute, Hungarian Academy of Sciences. H-1518 Budapest, P.O.Box 63. Ujvári, M. WP August, 2007

Research Division. Computer and Automation Institute, Hungarian Academy of Sciences. H-1518 Budapest, P.O.Box 63. Ujvári, M. WP August, 2007 Computer and Automation Institute, Hungarian Academy of Sciences Research Division H-1518 Budapest, P.O.Box 63. ON THE PROJECTION ONTO A FINITELY GENERATED CONE Ujvári, M. WP 2007-5 August, 2007 Laboratory

More information

Victoria Martín-Márquez

Victoria Martín-Márquez A NEW APPROACH FOR THE CONVEX FEASIBILITY PROBLEM VIA MONOTROPIC PROGRAMMING Victoria Martín-Márquez Dep. of Mathematical Analysis University of Seville Spain XIII Encuentro Red de Análisis Funcional y

More information

On a class of minimax stochastic programs

On a class of minimax stochastic programs On a class of minimax stochastic programs Alexander Shapiro and Shabbir Ahmed School of Industrial & Systems Engineering Georgia Institute of Technology 765 Ferst Drive, Atlanta, GA 30332. August 29, 2003

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems

On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems Radu Ioan Boţ Ernö Robert Csetnek August 5, 014 Abstract. In this paper we analyze the

More information

Global Optimality Conditions in Maximizing a Convex Quadratic Function under Convex Quadratic Constraints

Global Optimality Conditions in Maximizing a Convex Quadratic Function under Convex Quadratic Constraints Journal of Global Optimization 21: 445 455, 2001. 2001 Kluwer Academic Publishers. Printed in the Netherlands. 445 Global Optimality Conditions in Maximizing a Convex Quadratic Function under Convex Quadratic

More information

CONTINUOUS CONVEX SETS AND ZERO DUALITY GAP FOR CONVEX PROGRAMS

CONTINUOUS CONVEX SETS AND ZERO DUALITY GAP FOR CONVEX PROGRAMS CONTINUOUS CONVEX SETS AND ZERO DUALITY GAP FOR CONVEX PROGRAMS EMIL ERNST AND MICHEL VOLLE ABSTRACT. This article uses classical notions of convex analysis over euclidean spaces, like Gale & Klee s boundary

More information

An approach to constrained global optimization based on exact penalty functions

An approach to constrained global optimization based on exact penalty functions DOI 10.1007/s10898-010-9582-0 An approach to constrained global optimization based on exact penalty functions G. Di Pillo S. Lucidi F. Rinaldi Received: 22 June 2010 / Accepted: 29 June 2010 Springer Science+Business

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

Week 3: Faces of convex sets

Week 3: Faces of convex sets Week 3: Faces of convex sets Conic Optimisation MATH515 Semester 018 Vera Roshchina School of Mathematics and Statistics, UNSW August 9, 018 Contents 1. Faces of convex sets 1. Minkowski theorem 3 3. Minimal

More information

1 Introduction CONVEXIFYING THE SET OF MATRICES OF BOUNDED RANK. APPLICATIONS TO THE QUASICONVEXIFICATION AND CONVEXIFICATION OF THE RANK FUNCTION

1 Introduction CONVEXIFYING THE SET OF MATRICES OF BOUNDED RANK. APPLICATIONS TO THE QUASICONVEXIFICATION AND CONVEXIFICATION OF THE RANK FUNCTION CONVEXIFYING THE SET OF MATRICES OF BOUNDED RANK. APPLICATIONS TO THE QUASICONVEXIFICATION AND CONVEXIFICATION OF THE RANK FUNCTION Jean-Baptiste Hiriart-Urruty and Hai Yen Le Institut de mathématiques

More information

CONVEX ANALYSIS APPROACH TO D. C. PROGRAMMING: THEORY, ALGORITHMS AND APPLICATIONS

CONVEX ANALYSIS APPROACH TO D. C. PROGRAMMING: THEORY, ALGORITHMS AND APPLICATIONS ACTA MATHEMATICA VIETNAMICA Volume 22, Number 1, 1997, pp. 289 355 289 CONVEX ANALYSIS APPROACH TO D. C. PROGRAMMING: THEORY, ALGORITHMS AND APPLICATIONS PHAM DINH TAO AND LE THI HOAI AN Dedicated to Hoang

More information

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are

More information

Worst Case Complexity of Direct Search

Worst Case Complexity of Direct Search Worst Case Complexity of Direct Search L. N. Vicente May 3, 200 Abstract In this paper we prove that direct search of directional type shares the worst case complexity bound of steepest descent when sufficient

More information

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

A Power Penalty Method for a Bounded Nonlinear Complementarity Problem

A Power Penalty Method for a Bounded Nonlinear Complementarity Problem A Power Penalty Method for a Bounded Nonlinear Complementarity Problem Song Wang 1 and Xiaoqi Yang 2 1 Department of Mathematics & Statistics Curtin University, GPO Box U1987, Perth WA 6845, Australia

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

Brøndsted-Rockafellar property of subdifferentials of prox-bounded functions. Marc Lassonde Université des Antilles et de la Guyane

Brøndsted-Rockafellar property of subdifferentials of prox-bounded functions. Marc Lassonde Université des Antilles et de la Guyane Conference ADGO 2013 October 16, 2013 Brøndsted-Rockafellar property of subdifferentials of prox-bounded functions Marc Lassonde Université des Antilles et de la Guyane Playa Blanca, Tongoy, Chile SUBDIFFERENTIAL

More information

Subgradient Projectors: Extensions, Theory, and Characterizations

Subgradient Projectors: Extensions, Theory, and Characterizations Subgradient Projectors: Extensions, Theory, and Characterizations Heinz H. Bauschke, Caifang Wang, Xianfu Wang, and Jia Xu April 13, 2017 Abstract Subgradient projectors play an important role in optimization

More information

SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS

SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS APPLICATIONES MATHEMATICAE 22,3 (1994), pp. 419 426 S. G. BARTELS and D. PALLASCHKE (Karlsruhe) SOME REMARKS ON THE SPACE OF DIFFERENCES OF SUBLINEAR FUNCTIONS Abstract. Two properties concerning the space

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Weak sharp minima on Riemannian manifolds 1

Weak sharp minima on Riemannian manifolds 1 1 Chong Li Department of Mathematics Zhejiang University Hangzhou, 310027, P R China cli@zju.edu.cn April. 2010 Outline 1 2 Extensions of some results for optimization problems on Banach spaces 3 4 Some

More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

1. Introduction. We consider the following constrained optimization problem:

1. Introduction. We consider the following constrained optimization problem: SIAM J. OPTIM. Vol. 26, No. 3, pp. 1465 1492 c 2016 Society for Industrial and Applied Mathematics PENALTY METHODS FOR A CLASS OF NON-LIPSCHITZ OPTIMIZATION PROBLEMS XIAOJUN CHEN, ZHAOSONG LU, AND TING

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions

More information

On Total Convexity, Bregman Projections and Stability in Banach Spaces

On Total Convexity, Bregman Projections and Stability in Banach Spaces Journal of Convex Analysis Volume 11 (2004), No. 1, 1 16 On Total Convexity, Bregman Projections and Stability in Banach Spaces Elena Resmerita Department of Mathematics, University of Haifa, 31905 Haifa,

More information