F-Lipschitz Optimization with Wireless Sensor Networks Applications


1 F-Lipschitz Optimization with Wireless Sensor Networks Applications 1 Carlo Fischione Automatic Control and ACCESS Linnaeus Center Electrical Engineering KTH Royal Institute of Technology 10044, Stockholm, Sweden carlofi@kth.se Abstract Motivated by the need for fast computations in wireless sensor networks, the new F-Lipschitz optimization theory is introduced for a novel class of optimization problems. These problems are defined by simple qualifying properties specified in terms of increasing objective function and contractive constraints. It is shown that feasible F-Lipschitz problems have always a unique optimal solution that satisfies the constraints at equality. The solution is obtained quickly by asynchronous algorithms of certified convergence. F-Lipschitz optimization can be applied to both centralized and distributed optimization. Compared to traditional Lagrangian methods, which often converge linearly, the convergence time of centralized F-Lipschitz problems is at least superlinear. Distributed F-Lipschitz algorithms converge fast, as opposed to traditional Lagrangian decomposition and parallelization methods, which generally converge slowly and at the price of many message passings. In both cases, the computational complexity is much lower than traditional Lagrangian methods. Examples of application of the new optimization method are given for distributed estimation and radio power control in wireless sensor networks. The drawback of the F-Lipschitz optimization is that it might be difficult to check the qualifying properties. For more general optimization problems, it is suggested that it is convenient to have conditions ensuring that the solution satisfies the constraints at equality. Index Terms Wireless Sensor Networks; Convex and Non-convex Optimization; Distributed Optimization; Multi- Objective Optimization; Parallel and Distributed Computation; Interference Function Theory. The author is grateful to A. Speranzon, for the numerous and fruitful discussions on distributed estimation topics, which have lead to the development of F-Lipschitz optimization, to U. Jönsson, who contributed to the proof of the case (III.2d) (III.2f) and for carefully proofreading the paper, and to S. Boyd and M. Johansson for stimulating discussions on the topics of this paper. The author acknowledges the support of the Swedish Research Council and of the EU project FeedNetBack. A preliminary conference version of the paper has appeared to ACM/IEEE IPSN 11.

2 2 I. INTRODUCTION Wireless sensor networks, with their applications to smart grids, water distribution, vehicular networks, are networked systems in which decision variables must be quickly optimized by algorithms of light computational cost. Unfortunately, many traditional optimization methods are difficult to use in wireless sensor networks due to the complex operations or the high number of messages that have to be exchanged among nodes to compute the optimal solution [1]. Wireless sensor networks are characterized by small and cheap hardware platforms. Thus they have limited computational capabilities. It follows that there is the need of fast, simple, and robust to errors and noises computations to solve optimization problems, both in a centralized and in a distributed set-up [2] [5]. Parallel and distributed computation is the basis for distributed optimization [6]. This is a topic that has recently attracted much attention [3], [7] [11]. However, distributed optimization algorithms from the literature above are often based on decomposition and parallelization methods or on intermediate consensus iterations, where an iterate-and-broadcast procedure injects huge amounts of control messages over the network. The amount of these messages usually overcomes by orders of magnitude the number of source data messages. Moreover, decomposition and parallelization methods suffer from slow convergence [7]. While this may be fine for networks of processors or of wires, this is unacceptable for wireless networks, where transmitting information demands about one hundred times the energy needed to perform computations [3], and the convergence speed is further exacerbated by the presence of message losses. In this paper, we propose a new optimization approach. We define a new class of problems that is characterized by the availability of fast and simple algorithms for the computation of the optimal solution. We denote it F-Lipschitz to evidence that it is fast, hence F, and based on a Lipschitz property of the constraints. It is assumed that the objective function is increasing, whereas the constraints be transformed into contractive Lipschitz functions. To compute the optimal solution to centralized optimization problems, F-Lipschitz algorithms do not require Lagrangian methods, but superlinear iterations based on a solution of a system of equations given by the constraints, as illustrated in Fig. 1. We show that F-Lipschitz optimization solves much more efficiently problems traditionally solved by Lagrangian methods. In distributed optimization, we propose an algorithm that do not require Lagrangian decomposition and parallelization methods, but simple asynchronous iterative methods. We show that the computation of the optimal solution of an F-Lipschitz problem is robust to quantization errors and not sensitive

to perturbations of the constraints.

Fig. 1. [Flowchart: "Are the inequality constraints active at the optimum?" If yes, compute the solution by F-Lipschitz algorithms; if no, compute the solution by Lagrangian methods.] F-Lipschitz optimization defines a class of optimization problems for which all the constraints are satisfied at equality when computed at the optimal solution, including the inequality constraints. The solution to the set of equations given by the projected constraints is the optimal solution, which avoids using Lagrangian methods. These methods are computationally much more expensive, particularly for distributed optimization over wireless sensor networks. The challenging part of F-Lipschitz optimization is the availability of conditions ensuring that the constraints are active at the optimum without knowing the optimal solution in advance. For this reason, we restrict ourselves to problems in the form (III.1) presented in Section III. However, the method can be used for problems in the more general form (IV.1), as we show in Section IV.

The approach presented in this paper covers several problems of the general interference function theory [12]–[14]. Such a theory is based on the assumption that the constraints of the radio power optimization problem have a special form, commonly referred to as interference function. Fig. 2 illustrates the intersection between F-Lipschitz optimization and other classic optimization areas.

The remainder of the paper is organized as follows: In Section II, two wireless sensor network motivating examples are presented. The definition of F-Lipschitz optimization is given in Section III, along with properties and algorithms to compute the optimal solution. In Section IV, we show how to convert canonical optimization problems into the F-Lipschitz form. In Section V we show that the motivating examples of Section II are F-Lipschitz and we illustrate how they are solved. Finally, conclusions and future perspectives are given in Section VI.

A. Notation

We use the notation $\mathbb{R}^n_+$ to denote the set of strictly positive real vectors. By $(\cdot)^T$ we denote the transpose of a vector or of a matrix. By $|\cdot|$ we denote the absolute value of a real number. For $x \in \mathbb{R}^n$ we let $\|x\|_1 = \sum_{k=1}^{n} |x_k|$ and $\|x\|_\infty = \max_{k=1,\dots,n} |x_k|$.

For a matrix $A \in \mathbb{R}^{n\times n}$ we use the norm 1 and the norm $\infty$, defined as $\|A\|_1 = \max_{j} \sum_{i=1}^{n} |a_{ij}|$ and $\|A\|_\infty = \max_{i} \sum_{j=1}^{n} |a_{ij}|$. The spectral radius of a matrix is defined as $\rho(A) = \max\{|\lambda| : \lambda \in \operatorname{eig}(A)\}$, where $\operatorname{eig}(A)$ denotes the set of eigenvalues of $A$. By $a \leq b$ and $a \geq b$ we denote the element-wise inequalities between the vectors $a$ and $b$. A matrix is called positive, $A \geq 0$, if $a_{ij} \geq 0$ for all $i,j$. By $I$ and $\mathbf{1}$ we denote the identity matrix and the vector $(1,\dots,1)^T$, respectively, whose dimensions are clear from the context. Given the set $D = [x_{1,\min}, x_{1,\max}] \times [x_{2,\min}, x_{2,\max}] \times \cdots \times [x_{n,\min}, x_{n,\max}] \subset \mathbb{R}^n$, with $-\infty < x_{i,\min} < x_{i,\max} < \infty$ for $i = 1,\dots,n$, and the vector $x \in D$, we use the notation $[x_i]_D$ to denote the orthogonal projection, with respect to the Euclidean norm, of the $i$-th component of the vector $x$ onto the $i$-th interval of the closed set $D$, namely $[x_i]_D = x_i$ if $x_i \in [x_{i,\min}, x_{i,\max}]$, $[x_i]_D = x_{i,\min}$ if $x_i < x_{i,\min}$, and $[x_i]_D = x_{i,\max}$ if $x_i > x_{i,\max}$. $\nabla$ denotes the gradient operator. Given a scalar function $f(x) : \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x) = [\,\partial f(x)/\partial x_1, \dots, \partial f(x)/\partial x_n\,]^T$. Given a vector function $F(x) : \mathbb{R}^n \to \mathbb{R}^n$, we use the gradient matrix definition $\nabla F(x) = [\,\nabla F_1(x) \;\dots\; \nabla F_n(x)\,]$, which is the transpose of the Jacobian matrix. When we study multi-objective or vector optimization problems, we consider always Pareto optimal solutions. Therefore, we use the notation optimal solution to mean Pareto optimal solution (see below in Section III the definition). A point $x$ is feasible to an optimization problem if it satisfies all the constraints of the problem.

II. MOTIVATING EXAMPLES

In this section we describe some motivating examples where there is a need of fast optimization for wireless sensor networks. We argue that these problems cannot be solved efficiently with traditional approaches.
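Before turning to the examples, a brief aside on the projection $[x_i]_D$ introduced in the notation above, which all the iterative algorithms of Section III rely on. The following minimal Python sketch implements it for a box set $D$; the function name and array-based interface are my own choices, not part of the paper.

```python
import numpy as np

def project_onto_box(x, x_min, x_max):
    """Component-wise orthogonal projection of x onto
    D = [x_min[0], x_max[0]] x ... x [x_min[n-1], x_max[n-1]]."""
    return np.minimum(np.maximum(x, x_min), x_max)

# Example: the first coordinate violates its lower bound, the third its upper bound.
x = np.array([-0.2, 0.5, 1.7])
x_min = np.array([0.0, 0.0, 0.0])
x_max = np.array([1.0, 1.0, 1.0])
print(project_onto_box(x, x_min, x_max))  # -> [0.  0.5 1. ]
```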

Fig. 2. [Venn diagram with regions labeled Non-Convex Optimization, Convex Optimization, F-Lipschitz Optimization, Interference Function Optimization, and Geometric Programming.] F-Lipschitz optimization theory includes the interference function optimization of type-I [12] for smooth functions and, for example, part of convex optimization and geometric programming [15]. It follows that optimization problems previously solved by, e.g., convex solvers, that are F-Lipschitz can now be solved much more efficiently by the methods proposed in this paper.

A. Distributed Conditions for the Computation of a Norm

Consider a wireless sensor network of $n$ nodes that communicate through some channel with possible loss of messages. In the areas of distributed consensus, distributed estimation and data fusion (see, e.g., [16]–[19] and references therein), it is common that this network of nodes needs to compute a stable matrix $K = [k_i]$ with respect to some norm, where each row $k_i$, $i = 1,\dots,n$, of the matrix belongs to a node [20]. Nodes could impose that $\|k_i\|_{\max} < 1$, so that $\|K\|_{\max} < 1$ and the matrix is stable. However, it can be shown that this choice gives bad performance [18]. As a result, ensuring that a matrix has a bounded norm by local conditions is a challenging task. To overcome this problem, the following result was derived:

Proposition 2.1 ([18]): Let $K = [k_i] \in \mathbb{R}^{n\times n}$, where $k_i \in \mathbb{R}^{1\times n}$. Let $0 < \gamma_{\max} < 1$. Suppose there exists a vector $x = (x_1, x_2, \dots, x_n)^T \geq 0$ such that

$x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} \leq \gamma_{\max}$,   (II.1)

for all $i = 1,\dots,n$, where the set $\Theta_i$ is the collection of communicating nodes located at two-hop distance from node $i$, plus the neighbors of $i$. If $\|k_i\|_2^2 \leq x_i$, $i = 1,\dots,n$, then $\|K\|_2^2 \leq \gamma_{\max}$.

This proposition suggests that, given some thresholds $x_i > 0$ satisfying a set of non-linear inequalities, as long as the norms of the rows $k_i$ are not above the thresholds $x_i$, for $i = 1,\dots,n$, the matrix $K$ is stable. Obviously, the condition $\|k_i\|_2^2 \leq x_i$ leaves much freedom in choosing the single elements of the vector $k_i$, but the smaller $x_i$ the smaller these

elements will be. In order not to penalize any node, it is useful to solve the optimization problem

$\max_x \; \mathbf{1}^T x$   (II.2a)
s.t. $x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} \leq \gamma_{\max}$, $i = 1,\dots,n$,   (II.2b)
$x \in \mathbb{R}^n_+$.

Such a problem is a non-convex vector optimization problem. No closed form solution is available. One would have to show that strong duality holds for the associated scalarized problem and then use the Karush-Kuhn-Tucker (KKT) conditions [21]. In a distributed set-up, the solution might be computed via standard decomposition methods [6]. The possible presence of message losses may introduce large delays, since information associated with primal and dual variables must be transmitted among nodes, as is usually done by these Lagrangian methods. An alternative method is needed, which is the F-Lipschitz optimization that we introduce in Section III. It is then possible to find algorithms that compute the solution much faster than Lagrangian methods, both in a centralized and in a distributed set-up.
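To make the structure of problem (II.2) concrete, the following minimal Python sketch builds the sets $\Theta_i$ for a small random network and evaluates the left-hand side of constraint (II.2b), as reconstructed above, for a candidate threshold vector. The network model and the conservative uniform candidate $x_i = \gamma_{\max}/(1 + \max_i |\Theta_i|)$ are illustrative assumptions of mine, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma_max = 10, 0.9

# Random symmetric one-hop graph; Theta_i collects one- and two-hop neighbors of node i.
A = (rng.random((n, n)) < 0.3).astype(int)
A = np.triu(A, 1); A = A + A.T
reach = ((A + A @ A) > 0) & ~np.eye(n, dtype=bool)
Theta = [np.flatnonzero(reach[i]) for i in range(n)]

def g(x):
    """Constraint functions of (II.2b): g_i(x) <= 0 means node i's constraint holds."""
    return np.array([x[i] + np.sqrt(x[i]) * np.sqrt(x[Theta[i]]).sum() - gamma_max
                     for i in range(n)])

# A deliberately conservative uniform candidate: x_i = gamma_max / (1 + max_i |Theta_i|)
# satisfies (II.2b) with slack; the optimization in (II.2) pushes the thresholds further up.
x0 = np.full(n, gamma_max / (1 + max(len(T) for T in Theta)))
print("constraint values (all <= 0 means feasible):", np.round(g(x0), 3))
```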

B. Radio Power Allocation with Intermodulation Powers

In wireless systems, the problem of allocating the transmit radio powers is cast as an optimization problem. The radio power used to transmit signals must be minimized to reduce the nodes' energy consumption and the interference caused to other wireless transmissions. At the same time, the radio powers should be high enough to allow the receivers to detect the transmitted signals successfully. For illustrative purposes, we consider the basic problem of radio power transmission that is of interest for sensor networks. However, more advanced radio problems can be investigated, such as those listed in [12].

We consider a high data rate wireless sensor network of $n$ transmitter nodes, where node $i$ transmits at a radio power $p_i$, $i = 1,\dots,n$. Since the data rate is high, it is meaningful to perform radio power control to save energy and prolong the lifetime of the sensor nodes. Let $p \in \mathbb{R}^n$ be the vector that contains the radio powers. In the network, there are $n$ receiver nodes, where receiver node $i$ receives the power $G_{ii} p_i$ of the signal from transmitter $i$. $G_{ii}$ is the channel gain from transmitter node $i$ to receiver node $i$. Receiver node $i$ is also subject to interference from the signals of the other transmitters, which is $\sum_{k \neq i} G_{ik} p_k$, and to the thermal noise $\sigma_i$. The signal to interference plus noise ratio of the $i$-th transmitter-receiver pair is defined as [15]

$\mathrm{SINR}_i = \dfrac{G_{ii} p_i}{\sigma_i + \sum_{k \neq i} G_{ik} p_k + \sum_{k \neq i} M_{ik} p_i^2 p_k^2}$,

where $G_{ik}$ models the wireless channel gain between transmitter $k$ and receiver $i$, and $M_{ik}$ is an intermodulation term introduced by the amplifier of the receiver. Typically, these intermodulation terms are present when the amplifiers are built out of cheap and unreliable components, which may be typical for wireless sensor nodes. $M_{ik}$, $k \neq i$, assumes values smaller than $G_{ik}$, and can take on both positive and negative values. The power optimization problem is usually written in the following form

$\min_p \; p$   (II.3)
s.t. $\mathrm{SINR}_i \geq S_{\min}$, $i = 1,\dots,n$,
$p_{\min} \mathbf{1} \leq p \leq p_{\max} \mathbf{1}$,

where $S_{\min}$ is the minimum required SINR, which ensures that the signal of transmitter $i$ can be received with the desired quality. The box constraint on the radio powers is naturally due to the fact that transmitters have a minimum and a maximum power level that can be used. Notice that the solution to this problem must be achieved in a distributed fashion, namely every node must be able to compute its own radio power, because there is no time for node $i$ to send the wireless channel coefficients $G_{ij}$, which are measured at node $i$, to a central solver and wait back for the optimal power to use: in the meantime $G_{ij}$ would have changed due to the dynamics of the wireless channel and the optimal solution would be outdated [22].

Problem (II.3) is an unsolved optimization problem in communication theory. The interference function theory [12], [13], which is based on the monotonic and scalable properties of the interference, is the fundamental reference to solve these problems. However, it cannot be used here because the intermodulation coefficients can be negative, and this makes the interference term neither monotonic nor scalable. By the same argument, the geometric programming theory, which has also been widely employed to solve power allocation problems [23], cannot be used. Moreover, the problem is not convex. The classical approach would be to use parallelization and decomposition methods [6], provided that strong duality holds. Then, one would use iterative computation of the primal decision variables and dual variables until convergence is achieved. This makes it hard to compute the solution of problem (II.3), particularly when such a solution has to be computed by nodes

of reduced computational capability. An alternative theory is needed. This theory is developed in the following.

III. F-LIPSCHITZ OPTIMIZATION

In this section we give the definition of an F-Lipschitz optimization problem, we characterize the existence and uniqueness of the optimal solution, and we give algorithms to compute such a solution both in closed form, when possible, and numerically. We characterize several features of the new optimization, including sensitivity analysis and robustness to quantization.

Definition 3.1 (F-Lipschitz optimization): An F-Lipschitz optimization problem is defined as

$\max_x \; f_0(x)$   (III.1a)
s.t. $x_i \leq f_i(x)$, $i = 1,\dots,l$,   (III.1b)
$x_i = h_i(x)$, $i = l+1,\dots,n$,   (III.1c)
$x \in D$,   (III.1d)

where $D \subset \mathbb{R}^n$ is a non-empty, convex, and compact set, $l \leq n$, with objective and constraints being continuously differentiable functions such that

$f_0(x) : D \to \mathbb{R}^m$, $m \geq 1$,
$f_i(x) : D \to \mathbb{R}$, $i = 1,\dots,l$,
$h_i(x) : D \to \mathbb{R}$, $i = l+1,\dots,n$.

Let $f(x) = [f_1(x), f_2(x), \dots, f_l(x)]^T$, $h(x) = [h_{l+1}(x), h_{l+2}(x), \dots, h_n(x)]^T$, and let

$F(x) = [F_i(x)] = \begin{bmatrix} f(x) \\ h(x) \end{bmatrix}$,

with $F_i(x) : D \to \mathbb{R}$. The meaning of max in the optimization is in the Pareto optimal sense,

which we specify below in Definition 3.2. The following properties must be verified:

1.a $\nabla f_0(x) > 0$, i.e., $f_0(x)$ is strictly increasing,   (III.2a)
1.b $\|\nabla F(x)\| < 1$ for some norm $\|\cdot\|$,   (III.2b)

and either

2.a $\nabla_j F_i(x) \geq 0$ for all $i,j$,   (III.2c)

or

3.a $f_0(x) = c\mathbf{1}^T x$, $c \in \mathbb{R}_+$,   (III.2d)
3.b $\nabla_j F_i(x) \leq 0$ for all $i,j$,   (III.2e)
3.c $\|\nabla F(x)\|_\infty < 1$,   (III.2f)

or

4.a $f_0(x) \in \mathbb{R}$,   (III.2g)
4.b $\|\nabla F(x)\|_\infty < \dfrac{\delta}{\delta + \Delta}$,   (III.2h)
$\delta = \min_{i,\, x \in D} \nabla_i f_0(x)$,   (III.2i)
$\Delta = \max_{i,\, x \in D} \nabla_i f_0(x)$.   (III.2j)

We call properties (III.2a)–(III.2h) the qualifying properties of an F-Lipschitz optimization problem. Note that it is possible that $l = n$, in which case there are no equality constraints. It is also possible that $l = 0$, in which case there are no inequality constraints. The objective function and the constraints are allowed to be linear or nonlinear functions, as for instance concave, convex, monomial, posynomial, etc. For example, the functions $f_i(x)$ and $h_i(x)$ can be convex. This makes the constraints $x - F(x) \leq 0$ non-convex in general, and therefore difficult to solve. The objective function (III.1a) is allowed to be either a decomposable or a non-decomposable function of the decision variables. Note that the objective function is allowed to be a vector in $\mathbb{R}^m$. When $m = 1$ we have a scalar optimization. In general, the problem is a multi-objective optimization problem with $m$ criteria. Examples of F-Lipschitz objective functions are $f_0(x) = x$, with $x \in \mathbb{R}^n$, and $f_0(x) = c^T x$, $c \in \mathbb{R}^n$, $c > 0$.
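As an illustration of how the qualifying properties can be checked in practice, the following Python sketch samples the box $D$ and tests (III.2a), (III.2b) with the infinity norm, and (III.2c) for a small linear example. The specific $f_0$, $F$, and the sampling-based test are illustrative choices of mine, not a procedure prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Illustrative data (my own): F(x) = A x + b with A >= 0 and small row sums,
# f_0(x) = 1^T x, and D = [0, 1]^n.
A = rng.random((n, n)) * (0.8 / n)      # nonnegative entries, row sums < 0.8
b = rng.random(n) * 0.1
grad_f0 = lambda x: np.ones(n)          # gradient of the objective f_0
grad_F = lambda x: A                    # gradient matrix of F (constant for this example)

samples = rng.uniform(0.0, 1.0, size=(200, n))
ok_a = all((grad_f0(x) > 0).all() for x in samples)                    # (III.2a)
ok_b = all(np.abs(grad_F(x)).sum(axis=1).max() < 1 for x in samples)   # (III.2b), inf-norm
ok_c = all((grad_F(x) >= 0).all() for x in samples)                    # (III.2c)
print("(III.2a) increasing objective:   ", ok_a)
print("(III.2b) contractive constraints:", ok_b)
print("(III.2c) monotonic constraints:  ", ok_c)
```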

10 10 Fig. 3. An optimization problem in the form (III.1) associated to a wireless sensor network. Every node has associated a decision variable and a constraint, or a group of decision variables and associated constraints. The line between nodes i and j denotes that the two nodes are able to communicate directly, without the need of a routing protocol. Given that an F-Lipschitz problem can be a multi-objective problem, we recall the concept of Pareto optimal solutions. Consider the following set A = {x D : x i f i (x),i = 1,...,l, x i = h i (x), i = l + 1,...,n}, and let B R m be the image set of f 0 (x),i.e., f 0 (x) : A B. Then, we consider that the set B is partially ordered in a natural way, namely if x,y B then x y if x i y i i (e.g., R m + is the ordering cone). Definition 3.2 (Pareto Optimal): A vector x is called a Pareto optimal (or an Edgeworth- Pareto optimal) point if there is no x A such that f 0 (x) f 0 (x ), i.e., if f 0 (x ) is a maximal element of the set B with respect to the natural partial ordering defined by the cone R m + [24]. In practice, a Pareto optimal solution is a vector for which is impossible to improve one component without decreasing another component. The Pareto optimal solutions are derived by converting a vector optimization problem into a scalar one via scalarization of the objective function [24]. Problem (III.1) can be used in centralized setting and in distributed settings. In this last case, a network of nodes needs to compute the optimal solution. The decision variable x i is associated to a node i of the network for which we need to solve problem (III.1). Analogously, constraint i is associated to node i. In this paper, we are interested in the case in which every node needs primarily its own decision variable x i, as it is common in wireless sensor network applications. An illustrative example of such a network is given in Fig. 3. Finally, it can be shown that Problem (III.1) with the qualifying conditions (III.2) is a much more general case of the interference function optimization [12], [13], [25]. Such an optimization

is a well-known theory for the computation of the radio transmit powers and resource allocation in wireless communication, including wireless sensor networks. See our report [26] for the details. In the following subsection, we establish the existence of solutions to F-Lipschitz problems.

A. Existence and Uniqueness of Solutions

Here we give one of the core contributions of this paper. We show that for F-Lipschitz problems there is a unique optimal solution that is given by the (projected) system of constraints at equality. We have the following result:

Theorem 3.3: Let the F-Lipschitz optimization problem in (III.1) be feasible. Then, the problem admits a unique Pareto optimum $x^* \in D$ given by the solution of the following set of equations:

$x_i^* = f_i(x^*)$, $i = 1,\dots,l$,
$x_i^* = h_i(x^*)$, $i = l+1,\dots,n$.

Proof: A proof is presented in Appendix A.

Remark 3.4: It follows from the proof that assumption (III.2e) could be replaced by the assumption that $\nabla F(x)^2 \geq 0$.

B. Computation of the Optimal Solution

In subsection III-A we have proved that there is a unique optimal solution to F-Lipschitz optimization problems, which is achieved by solving the system of equations given by the projected constraints at equality. In this section, we show that the complexity of solving F-Lipschitz optimization problems is quite low compared to Lagrangian methods, which are the ones traditionally employed. If the set of equations given by the projected constraints at equality can be solved in closed form, then we have the optimal solution in closed form; otherwise we need numerical algorithms. These algorithms for solving systems of equations are well known, see [1], [27] as fundamental references. In the following, we summarize a low-computational-complexity technique for centralized optimization problems, namely for problems that can be solved by a central node. We then present a simple algorithm for distributed optimization problems, where there is no central computational unit and every node cooperates to compute the solution in a distributed fashion. For simplicity we focus on first-order techniques, though the convergence speed of numerical algorithms can be increased by heavy ball methods [27].

1) Centralized Computation: Since the functions $f(x)$ and $h(x)$ of problem (III.1) are differentiable within $D$, Newton's method can be applied, which consists of the iterations

$x(k+1) = \big[\, x(k) - \beta \,(I - \nabla F(x(k)))^{-1} (x(k) - F(x(k))) \,\big]_D$,   (III.3)

where $\beta$ is a positive scalar that can be chosen so that the previous mapping is contractive, with modulus as small as possible so as to have the fastest convergence. The iterations can be initialized by any $x(0) \in D$, and convergence is certified if the gradient matrix $I - \nabla F(x)$ is invertible:

Lemma 3.5: Consider the function $F(x) = [f(x)^T\; h(x)^T]^T$ of problem (III.1) and let qualifying property (III.2b) hold. Then the matrix $I - \nabla F(x)$ has full rank.

Proof: The simple proof is based on the fact that the matrix $\nabla F(x)$ is contractive. Its spectral radius is $\rho(\nabla F(x)) \leq \|\nabla F(x)\| < 1$ [28], thus the eigenvalues of $I - \nabla F(x)$ have strictly positive real part and the matrix is invertible.

It is well known that the convergence speed of Newton's algorithm is quite fast, namely superlinear. As Bertsekas writes, it is the most complex and also the fastest among the gradient methods [1]. It has, however, the drawback of requiring the computation of the inverse of a matrix at each step. To avoid this computational burden, several other algorithms have been developed, including gradient methods, conjugate direction methods, quasi-Newton methods, and non-derivative methods [1]. These methods are characterized by different convergence speeds and computational complexities. One can choose the most suitable according to the nature of the F-Lipschitz constraints. For example, if these constraints are quadratic, then gradient methods are good candidates. See [1] for rules and recommendations on which method to use for different kinds of functions.

2) Distributed Computation: There are many algorithms to compute the solution to a system of equations in a distributed set-up. See [6] as a fundamental reference. Here, we present a simple iterative and asynchronous distributed algorithm. Recall that in the distributed set-up we assume that there is a global optimization problem that needs to be solved by the wireless sensor network, where each decision variable is associated to a node and every node cooperatively communicates with other nodes to compute the optimal solution without central coordination. Such communication could be subject to delays and losses due to the underlying communication channel. We have the following result:

Proposition 3.6: Let $x(0)$ be an initial guess of the optimal solution to a feasible F-Lipschitz problem (III.1). Let $x^i(k) = [x_1(k - \tau_1^i(k)), x_2(k - \tau_2^i(k)), \dots, x_n(k - \tau_n^i(k))]$ be the

vector of decision variables available at node $i$ at time $k \in \mathbb{N}_+$, where $\tau_j^i(k)$ is the delay with which the decision variable of node $j$ reaches node $i$ at time $k$. Then, the following iterative algorithm converges to the optimal solution:

$x_i(k+1) = [f_i(x^i(k))]_D$, $i = 1,\dots,l$,   (III.4)
$x_i(k+1) = [h_i(x^i(k))]_D$, $i = l+1,\dots,n$,

where $k \in \mathbb{N}_+$ is an integer associated to the iterations.

Proof: From Theorem 3.3, the unique optimal solution to a feasible F-Lipschitz problem (III.1) is given by the constraints at equality. Since the right-hand side of the constraints is contractive by the qualifying properties (III.2), the convergence of algorithm (III.4) to the optimal solution is guaranteed as $k \to \infty$ by applying the asynchronous convergence theorem in [6], which concludes the proof.

According to the previous proposition, every node $i$ of the network updates its decision variable by the iterative algorithm (III.4), using the decision variables of other nodes that are available at that time. Notice that when $f_i(x)$ depends only on the decision variables of the neighboring nodes, the communication of these variables is fast and practical. In other situations, $f_i(x)$ can be given by an oracle locally at node $i$ without any direct communication of decision variables from the other nodes, as in the radio power optimization problems (see also, e.g., [14]). This is the case of the example we discuss in Section II-B.

Proposition 3.7: Consider the distributed algorithm for the synchronous computation of the optimal solution, i.e., $\tau_j^i(k)$ is constant for all $i,j$. Let $d = \max_{x,y \in D} \|x - y\|$. Let $\varepsilon$ be the desired precision with which the optimal solution $x^*$ must be known. Suppose $\varepsilon < d$. Then, an upper bound to the number of iterations needed for the algorithm to converge is $O(\bar{k})$, where

$\bar{k} = \dfrac{\ln \varepsilon - \ln d}{\ln \alpha}$,

where $\alpha$ is the Lipschitz constant of $F(x)$ for $x \in D$.

Proof: From the qualifying properties (III.2), the iterations of Algorithm (III.4) are contractive. It follows that $\|x(k) - x^*\| \leq \alpha^k \|x(0) - x^*\| \leq \alpha^k d$, whereby $\bar{k}$ follows immediately by recalling that $\alpha < 1$ and $\varepsilon < d$.

The previous proposition is useful in that it allows us to upper bound the number of iterations needed to compute the optimal solution with a desired accuracy. Notice that the upper bound does not,

14 14 depend on the number of variables, but on the Lipscitz constant of the contraction mapping. This result is remarkable, because for F-Lipschitz optimization the convergence speed of Algorithm (III.4) may be arbitrarily high regardless of the number of decision variables or nodes of the network. However, note that this is possible under the assumption that every nodes works in parallel at the same pace and that the delay with which variables are communicated is fixed. We now compare the convergence speed of F-Lipschitz algorithm to the one achieved by using the traditional decomposition methods based on Lagrange multipliers. To do that, we need to show that strong duality applies, as we see in the following subsection. Strong duality will be also very useful to study the sensitivity and stability of the optimal solution to perturbations of the constraints. C. Computational complexity The results of this section are useful to compare the convergence speed of the F-Lipschitz algorithms presented in subsection III-B to the traditional Lagrangian algorithms that one would use for non-linear optimization problems. In this section, we show that it is possible to solve F- Lipschitz optimization problems by Lagrangian iterative methods because strong duality applies. Moreover, strong duality is useful to characterize the sensitivity to the optimal solution to perturbations of the constraints. For analytical simplicity, we show that strong duality applies to the optimization problem (III.1) in the case when there are only inequality constraints. We assume for simplicity that the box constraints, x D, can be held implicit. Note, however, that the upper bound constraints x i x i,max poses no problem since they satisfy the invexity inequality used in the proof. Theorem 3.8: Consider the optimization problem (III.1) in the case when there are no equality constraints and suppose the problem feasible. Then strong duality applies and the KKT conditions are necessary and sufficient to compute the optimal solution of problem (III.1). Proof: See Appendix B. Given that we have strong duality, we can solve problem (III.1) by the KKT conditions: x i f i (x ) 0 i = 1,...,n λ i 0 i = 1,...,n λ i(x i f i (x )) = 0 i = 1,...,n x L(x,λ ) = 0,

15 15 where λ = [λ i ] R n is the vector of Lagrange multipliers and L(x,λ) is the Lagrange function associated to the scalarized version [21] of problem (III.1): n L(x,λ) = µ T f 0 (x) + λ i (x i f i (x)) where µ 0. Finding λ and x that solve these conditions is much more expensive than solving the system of equations given by the projected constraints at the equality as proposed in Subsection III-B. When the Lagrangian function gives closed form multipliers, one would have to solve two systems of n equations in n variables and recover the primal variables. Namely, 1) one would have to take the derivative of the lagrangian with respect to x and solve for x as function of λ the system of n equations x L(x,λ) = 0, thus achieving x (λ). Then, 2) one would have to plug in such a solution in L(x,λ), and solve a further system of n equations λ L(x (λ),λ) = 0, thus achieving λ. Finally, 3) one would have to recover the optimal solution by computing the function x (λ ). When the Lagrangian function does not give closed form multipliers, one would have to resort to numerical iterative Lagrangian methods. There are several such methods to compute the optimal solution to problem (III.1), such as barrier and interior point methods, penalty and augmented lagrangian methods, primal-dual interior point methods, etc., that we can use to solve an F-Lipschitz optimization problem. These methods differ for the convergence speed and computational complexity. Typically, at each iteration of these methods some further optimization problem and/or matrix inversions are required. To give an idea of convergence speed comparison with the F-Lipschitz algorithms described in Subsection III-B, we consider as reference a first order Lagrangian method for optimization problems with equality constraints. By using this method, the optimal solution to problem (III.1) is given by the iterations i=1 x(k + 1) = x(k) β x L(x(k),λ(k)) λ(k + 1) = λ(k) β λ L(x(λ(k)),λ(k)), (III.5) (III.6) where β is a positive scalar. It is clear that iterations (III.5) and (III.6) are computationally much more expensive than the algorithms in Subsection III-B because these iterations involve the computation of the gradient of the Lagrangian function and the update of the decision variables plus the Lagrangian multipliers. In a distributed set-up, the situation is even worse, because every node needs to send to all other nodes the Lagrange multipliers either directly or by incremental techniques [2], [7].
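To make the comparison concrete, here is a small Python sketch that solves a toy two-variable F-Lipschitz problem both with the projected fixed-point iteration (III.4) and with a generic first-order primal-dual gradient scheme in the spirit of (III.5)–(III.6). The toy problem, the step size, and the stopping rules are my own illustrative choices, and the primal-dual loop is only a stand-in for the Lagrangian methods discussed above, not the exact algorithm of any reference; depending on the step size it may need far more iterations than the fixed point, which is consistent with the slow convergence attributed to these methods in the text.

```python
import numpy as np

# Toy F-Lipschitz problem (my own illustrative instance, not from the paper):
#   max  log(1+x1) + log(1+x2)               (strictly increasing objective)
#   s.t. x1 <= 0.3*x2 + 0.5,  x2 <= 0.25*x1 + 0.4,   x in [0,1]^2
# The constraints have nonnegative gradients and are contractive, so the
# qualifying properties (III.2a)-(III.2c) hold and the optimum solves x = F(x).
F = lambda x: np.array([0.3 * x[1] + 0.5, 0.25 * x[0] + 0.4])
A = np.array([[1.0, -0.3], [-0.25, 1.0]])    # A x <= b  is equivalent to  x <= F(x)
b = np.array([0.5, 0.4])
clip = lambda x: np.clip(x, 0.0, 1.0)

# 1) Projected fixed-point iteration (III.4).
x = np.zeros(2)
for k_fp in range(1, 1000):
    x_new = clip(F(x))
    if np.max(np.abs(x_new - x)) < 1e-9:
        break
    x = x_new
x_star = x_new

# 2) Generic first-order primal-dual iteration in the spirit of (III.5)-(III.6).
beta, x_pd, lam = 0.1, np.zeros(2), np.zeros(2)
for k_pd in range(1, 20000):
    grad_x = 1.0 / (1.0 + x_pd) - A.T @ lam              # gradient of the Lagrangian in x
    x_pd = clip(x_pd + beta * grad_x)                     # primal gradient step
    lam = np.maximum(0.0, lam + beta * (A @ x_pd - b))    # dual step, projected onto lam >= 0
    if np.max(np.abs(x_pd - x_star)) < 1e-6:
        break

print("fixed point x* =", np.round(x_star, 6))
print("fixed-point iterations:", k_fp, "  primal-dual iterations:", k_pd)
```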

16 16 D. Sensitivity Analysis It is interesting to establish the sensitivity of the optimal solution to perturbations of the constraints. When an F-Lipschitz optimization problem is implemented on hardware platforms having limited computational capabilities, such as in wireless sensor networks, it may happen that the constraints are altered by errors, noises, and quantization. However, the optimal solution to F-Lipschitz problems is not sensitive to perturbations. We prove the result under the simplifying assumptions that 1) there are no equality constraints, 2) the box constraints are not active at the optimal solution. We have the following result: Claim 3.9: Consider the optimization problem (III.1) in the case when there are no equality constraints and suppose the problem is feasible. Then the unique global optimum is not sensitive to perturbations to the constraints. Proof: See Appendix C E. Robustness to Quantization Errors When the computation of the optimal solution is performed by resource constrained nodes that introduce quantization errors, such as wireless sensor networks, it is important to study the robustness of the computation of the optimal solution to these errors. The quantization error is modelled as a random variable q i (k), i = 1,...,n, having some distribution within the interval [ q max, q max ], where q max is the maximum quantization error. We do not assume any specific distribution, and we model the quantization process for the synchronous distributed algorithm as x i (k + 1) = f i ( x(k)) + q i (k) x i (k + 1) = h i ( x(k)) + q i (k) i = 1,...,l i = l + 1,...,n It follows that the decision variables at time k x i (k) s are affected by two quantization errors: the errors coming from the variables that concur in the computation of x i (k + 1) in f i ( x(k)), and the error affecting the computation of f i ( x(k)), which is q i (k). These errors may hinder the convergence of the algorithm. We would like to study the stability of iterations when there are these quantization errors. We have the following result: Proposition 3.10: Let x be the optimal solution to (III.1). Let α = max x D F(x) T. Let q max the maximum quantization error. Then the optimal solution computed by the syn-

17 17 chronous Algorithm (III.4) with quantization errors satisfies lim x(k) k x q max 1 α. Proof: Let x(0) the initial feasible vector used to compute the optimal solution. It holds that lim k x(k) = lim k F(x(k 1)) = x. We have x(1) = F(x(0)) + q(0) = x(1) + q(0). We focus on the upper bound of the proposition. In the following, we use that F(x) = D x x x max max x D D x 1, where x max = max i max x D and the inequality follows by the mean value theorem component-wise, with D x being a matrix constructed by the the Jacobian of F computed at some proper point that is different from element to element of F and depends on x. Then, x(2) = F( x(1)) + q(1) = F(x(1) + q(0)) + q(1) = F(x(1)) + D 0 q(0) + q(1), = F(x(1) + q(0)) + q(1) F(x(1)) + q max D q(1), where D is a matrix constructed from the maximum of the absolute value of the Jacobian of F with elements computed at proper different points. Then, x(3) = F( x(2)) + q(2) = F(F( x(1)) + q(1)) + q(2) = F(F( x(1))) + q max D1 + q(2) F(F(x(1) + q(0))) + q max D1 + q(2) F(F(x(1)) + D q(0) q(0)) + q max D1 + q(2) F(F(x(1))) + q max D q max D1 + q(2) x(2) + q max D q max D1 + q max 1, and it is straightforward to generalize the iterations to But then, k 1 x(k) x(k) + q max D j 1. lim k x(k) x q max j=0 j=0 D j 1 q max 1 α. The upper bound given by the proposition follows immediately. The same steps apply to derive the lower bound, which concludes the proof. From this proposition we see that if we would like to minimize the effect of the quantization error, than we should have α max as small as possible. This gives also a fast convergence speed of the iterations. Therefore, the faster the convergence speed, the lower the quantization error that

will affect the computation of the optimal solution. This is natural, since at each iteration there is a quantization error that keeps accumulating; reducing the number of iterations reduces the extent of such an accumulation. We would like to mention that the analysis developed in this section also applies to the traditional Lagrangian methods in (III.5) and (III.6).

IV. PROBLEMS IN CANONICAL FORM

We have defined F-Lipschitz optimization problems by the special form of problem (III.1). It is common in the optimization literature to have problems stated in the following form, which is often referred to as canonical form [21]:

$\min_x \; g_0(x)$   (IV.1a)
s.t. $g_i(x) \leq 0$, $i = 1,\dots,l$,   (IV.1b)
$p_i(x) = 0$, $i = l+1,\dots,n$,   (IV.1c)
$x \in D$,

where

$g_0(x) : D \to \mathbb{R}^m$, $m \leq n$,
$g_i(x) : D \to \mathbb{R}$, $i = 1,\dots,l$,
$p_i(x) : D \to \mathbb{R}$, $i = l+1,\dots,n$.

Problem (IV.1) can be converted into an F-Lipschitz-like problem (III.1) by the following transformations:

$\max_x \; f_0(x)$   (IV.2a)
s.t. $x_i \leq f_i(x)$, $i = 1,\dots,l$,   (IV.2b)
$x_i = h_i(x)$, $i = l+1,\dots,n$,   (IV.2c)
$x \in D$,

where

$f_0(x) = -g_0(x)$,   (IV.3)
$f_i(x) = x_i - \gamma_i g_i(x)$, $i = 1,\dots,l$,   (IV.4)
$h_i(x) = x_i - \mu_i p_i(x)$, $i = l+1,\dots,n$,   (IV.5)

with $\gamma_i > 0$, $i = 1,\dots,l$, and $\mu_i \in \mathbb{R}$, $\mu_i \neq 0$, $i = l+1,\dots,n$. We let $G(x) = [g_1(x),\dots,g_l(x),p_{l+1}(x),\dots,p_n(x)]^T$. Problem (IV.1) and problem (IV.2) have the same optimal solution because the constraints of problem (IV.2) hold if and only if the constraints of problem (IV.1) hold, since $\gamma_i > 0$ and $\mu_i \neq 0$ for all $i$. Clearly, problem (IV.2) is in general not an F-Lipschitz one. It is interesting to establish when problem (IV.2) is F-Lipschitz, namely when the qualifying properties (III.2a)–(III.2h) are satisfied for (IV.2). We have the following result:

Theorem 4.1: Consider the optimization problems (IV.1) and (IV.2). Suppose that, for all $x \in D$,

1.a $\nabla g_0(x) < 0$,   (IV.6a)
1.b $\nabla_i G_i(x) > 0$ for all $i$,   (IV.6b)

and either

2.a $\nabla_j G_i(x) \leq 0$ for all $j \neq i$,   (IV.6c)
2.b $\nabla_i G_i(x) > \sum_{j \neq i} |\nabla_i G_j(x)|$ for all $i$,   (IV.6d)

or

3.a $g_0(x) = -c\mathbf{1}^T x$, $c \in \mathbb{R}_+$,   (IV.6e)
3.b $\nabla_j G_i(x) \geq 0$ for all $j \neq i$,   (IV.6f)
3.c $\nabla_i G_i(x) > \sum_{j \neq i} |\nabla_i G_j(x)|$ for all $i$,   (IV.6g)

or

4.a $g_0(x) \in \mathbb{R}$,   (IV.6h)
4.b $\dfrac{\delta}{\delta + \Delta}\, \nabla_i G_i(x) > \sum_{j \neq i} |\nabla_i G_j(x)|$ for all $i$,   (IV.6i)

where $\delta$ and $\Delta$ are defined in Eqs. (III.2i) and (III.2j). Then, problem (IV.2) is F-Lipschitz.

Proof: We show that, under the conditions of the theorem, the F-Lipschitz properties are satisfied for the optimization problem (IV.2). Condition (IV.6a) implies that condition (III.2a) of an F-Lipschitz optimization problem is verified. Condition (IV.6a) allows the use of the transformations (IV.4) and (IV.5).

Condition (IV.6c) implies (III.2c), whereas condition (IV.6d) implies (III.2b), namely that there is some norm for which the F-Lipschitz problem associated to the canonical one has contractive constraints.

Conditions (IV.6e) and (IV.6f) imply conditions (III.2d) and (III.2e), whereas condition (IV.6g) gives (III.2f). Indeed, such a condition asks that

$\big| 1 - \gamma_i \nabla_i g_i(x) \big| + \sum_{j \neq i} \gamma_j \big| \nabla_i g_j(x) \big| < 1$.   (IV.7)

If we let $\gamma_i = \gamma > 0$ for all $i$, with $\gamma$ small enough so that $1 - \gamma \nabla_i g_i(x) \geq 0$, then the preceding inequality is satisfied if

$\gamma \Big( -\nabla_i g_i(x) + \sum_{j \neq i} \big| \nabla_i g_j(x) \big| \Big) < 0$,

which is ensured by condition (IV.6g).

Condition (IV.6h) implies (III.2g), whereas condition (IV.6i) implies (III.2h). We can see this by noting that such a condition asks

$\big| 1 - \gamma_i \nabla_i g_i(x) \big| + \sum_{j \neq i} \gamma_j \big| \nabla_i g_j(x) \big| < \dfrac{\delta}{\delta + \Delta}$.

If we let $\gamma_i = \gamma > 0$ for all $i$ and

$\gamma \leq \bar{\gamma} = \min_{i,\, x \in D} \dfrac{1}{\nabla_i g_i(x)}$,

then the preceding inequality is satisfied if

$\gamma \Big( \nabla_i g_i(x) - \sum_{j \neq i} \big| \nabla_i g_j(x) \big| \Big) > 1 - \dfrac{\delta}{\delta + \Delta}$,

which asks

$\gamma > \underline{\gamma} = \max_{i,\, x \in D} \dfrac{1 - \frac{\delta}{\delta + \Delta}}{\nabla_i g_i(x) - \sum_{j \neq i} \big| \nabla_i g_j(x) \big|}$.

If $\underline{\gamma} < \gamma \leq \bar{\gamma}$, then the function $F(x)$ built from $G(x)$ is contractive with respect to the infinity norm, and the existence of such a $\gamma$ is ensured by condition (IV.6i). Therefore, by the conditions of the theorem, problem (IV.2) satisfies the F-Lipschitz qualifying properties. This concludes the proof.

It is possible to use this theorem to show when convex optimization problems can be cast as F-Lipschitz and thus solved much more efficiently. By the same theorem, it is also possible to show that, for example, a class of geometric and signomial programming problems [15] can be solved by an F-Lipschitz approach.
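As a small illustration of how Theorem 4.1 can be used in practice, the following Python sketch checks conditions (IV.6b)–(IV.6d) by sampling the Jacobian of the canonical-form constraint functions $G(x)$ over the box $D$ (condition (IV.6a) is immediate here because the objective is decreasing by construction). The example problem, the finite-difference Jacobian, and the sampling test are illustrative assumptions of mine rather than a procedure given in the paper.

```python
import numpy as np

# Illustrative canonical-form instance (my own):
#   minimize  exp(-x1) + exp(-x2) + exp(-x3)      (decreasing => (IV.6a))
#   s.t.      G(x) <= 0, with x in D = [0, 2]^3.
def G(x):
    return np.array([x[0] - 0.4 * np.sqrt(x[1] + 1.0) - 0.2,
                     x[1] - 0.3 * x[2] - 0.1,
                     x[2] - 0.2 * x[0] - 0.3])

def jacobian(fun, x, eps=1e-6):
    """Forward-difference Jacobian J[i, j] = d fun_i / d x_j."""
    fx = fun(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy(); xp[j] += eps
        J[:, j] = (fun(xp) - fx) / eps
    return J

rng = np.random.default_rng(2)
ok_b = ok_c = ok_d = True
for x in rng.uniform(0.0, 2.0, size=(300, 3)):
    J = jacobian(G, x)
    off = J - np.diag(np.diag(J))
    ok_b &= (np.diag(J) > 0).all()                                  # (IV.6b): positive diagonal
    ok_c &= (off <= 1e-8).all()                                     # (IV.6c): nonpositive off-diagonal
    ok_d &= (np.diag(J) > np.abs(off).sum(axis=0) + 1e-8).all()     # (IV.6d): column dominance
print("(IV.6b):", ok_b, " (IV.6c):", ok_c, " (IV.6d):", ok_d)
```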

A. Convex Optimization

In this subsection, we illustrate by examples that some convex optimization problems can be converted into F-Lipschitz problems and solved by the quick methods described in Section III-B. Recall that the traditional Lagrangian methods to compute the optimal solution, namely the interior point methods for convex problems, require either more iterations or more tedious algebraic steps than the method offered by our F-Lipschitz optimization.

1) Example 1: Consider the following optimization problem:

$\min_{x,y} \; a e^{-x} + b e^{-y}$   (IV.8a)
s.t. $x - 0.5y - 1 \leq 0$,   (IV.8b)
$-0.5x + 2y \leq 0$,   (IV.8c)
$x \geq 0$, $y \geq 0$,

where $a > 0$, $b > 0$, and $x,y \in \mathbb{R}$. The problem is convex, since both the objective function and the constraints are convex. Slater's condition applies, so the problem is feasible and admits an optimal solution [1]. However, there is no need to use the KKT conditions to compute the solution. Theorem 4.1 applies, since the objective function of (IV.8) is decreasing. For the first constraint we have $\nabla_x (x - 0.5y - 1) = 1 > 0$ and $\nabla_y (x - 0.5y - 1) = -0.5 < 0$. For the second constraint, we have $\nabla_x (-0.5x + 2y) = -0.5 < 0$ and $\nabla_y (-0.5x + 2y) = 2 > 0$. It follows that conditions (IV.6b) and (IV.6c) hold. Moreover, $\nabla_x (x - 0.5y - 1) = 1 > |\nabla_x (-0.5x + 2y)| = 0.5$ and $\nabla_y (-0.5x + 2y) = 2 > |\nabla_y (x - 0.5y - 1)| = 0.5$, which means that the first alternative, condition (IV.6d), holds. Therefore, we conclude that Theorem 4.1 applies and the optimal solution to problem (IV.8) is given by the system of equations of the constraints at equality, namely

$x - 0.5y - 1 = 0$,
$-0.5x + 2y = 0$,

which gives $x^* = 8/7$ and $y^* = 2/7$. The same result could have been achieved by solving the equations given by the KKT conditions. This would have required many more algebraic passages than the ones we used by the F-Lipschitz optimization.
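A quick numerical confirmation of Example 1: the sketch below solves the two active constraints as a linear system and, as a cross-check, runs the projected fixed-point iteration obtained from the transformation $f_i(x) = x_i - \gamma_i g_i(x)$ of Section IV. The value $\gamma_i = 0.4$ and the box $[0, 10]^2$ are arbitrary illustrative choices of mine (any $\gamma$ small enough to keep the map contractive would do).

```python
import numpy as np

# Active constraints of Example 1:  x - 0.5*y - 1 = 0  and  -0.5*x + 2*y = 0.
A = np.array([[1.0, -0.5], [-0.5, 2.0]])
b = np.array([1.0, 0.0])
print("direct solve :", np.linalg.solve(A, b))          # -> [8/7, 2/7]

# Cross-check with the fixed-point form f(x) = x - gamma * g(x) of Section IV.
gamma = 0.4                                             # assumed step; keeps the map contractive
g = lambda z: A @ z - b                                 # canonical-form constraint functions
x = np.zeros(2)
for _ in range(200):
    x = np.clip(x - gamma * g(x), 0.0, 10.0)            # projected iteration (III.4)
print("fixed point  :", x, " vs exact:", np.array([8 / 7, 2 / 7]))
```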

22 22 2) Example 2: In this section we show that convex optimization problems may have a hidden F-Lipschitz nature. By doing a simple change of variables, we show that the original convex problem can be converted into an F-Lipschitz one. Consider the following convex optimization problem: min x,y,z ae x + be y + ce z (IV.9a) s.t. 2x 0.5y + z (IV.9b) x + 2y z x y + z (IV.9c) (IV.9d) x min x x max, y min y y max, z min z z max, where a > 0, b > 0, and x,y,z R and we assume that z max 0.5. Theorem 4.1 does not apply here, because, for instance z (x 0.5y + z + 3) > 0 (condition (IV.6c) does not hold). However, we can make the following variable substitution t = z 1, and we obtain max x,y,t ae x be y ce t (IV.10a) s.t. 2x 0.5y + t (IV.10b) 0.5x + 2y t x y + t (IV.10c) (IV.10d) x min x x max, y min y y max, 1/z max t 1/z min. Clearly, the optimal solution to problem (IV.10), x,y,t is the same of problem (IV.9), with z = 1/t. It is easy to see that conditions (IV.6) and condition (IV.6c) hold, therefore Theorem 4.1 applies and problem (IV.10) is an F-Lipschitz optimization problem, the solution of which is achieved by solving the system of equations given by all the constraints at the equality. B. Geometric Programming The Geometric programming theory is an area of mathematical optimization that has been rediscovered in the recent years [15]. It has found numerous application particularly in wireless networks and electronic circuit design. The essential feature of a geometric program is that it can be mechanically converted into a convex optimization problem, for which there are interior point methods to compute the optimal solution. However, we show that there is vast class of

23 23 geometric programming problems that can be solved much faster and cheaper by the F-Lipschitz optimization. By this result, there is no need for a geometric optimization to be transformed into a convex optimization problem and then solved by interior point methods. This has obvious advantages for the computation of the optimal solution. This has also the advantage that the computation of the optimal solution is easy to distribute, as it is required in may systems such as wireless sensor networks. A geometric optimization problem has the following form min x g 0 (x) (IV.11a) s.t. g i (x) 1 i = 1,...,l (IV.11b) p i (x) = 1 i = l + 1,...,m (IV.11c) x D x 0 (IV.11d) where the objective function (IV.11a) and the inequality constraints (IV.11b) are posynomials and the equality constraint are monomials [15]. More specifically, x i R, i, and the posynomial constraints are g i (x) : D R with g i (x) = K k=1 c ik x a i1k 1 x a i2k 2 x a imk m i = 1,...,l, whereas the monomial constraints are p i (x) : D R with p i (x) = c i x b i1 1 x b i2 2 x b im m i = l + 1,...,m, where c ik > 0, a ijk R, b ij R, i,j,k. Clearly, a geometric optimization problem can always be easily rewritten in the form (IV.1), so that we can check if it is an F-Lipschitz problem by using Theorem 4.1. However, given the special structure of a geometric problem, there are some sufficient easy-to-check conditions telling if it is F-Lipschitz, as we show in the following corollary that applies in some special case: Corollary 4.2: Consider the geometric optimization problem (IV.11). Let A = {x D g i (x) 1,i = 1,...,l,p i (x) = 1}. Problem (IV.11) is an F-Lipschitz problem if the following conditions simultaneously hold: 1) g 0 (x) 0 x A ; 2) a iik > 0 and b ii > 0 i and a ijk 0 b ij 0 for i j;

3) $\nabla_i G_i(x) > \sum_{j \neq i} |\nabla_i G_j(x)|$ for all $i$;
4) $D = [x_{1,\min}, x_{1,\max}] \times [x_{2,\min}, x_{2,\max}] \times \cdots \times [x_{n,\min}, x_{n,\max}]$, with $0 < x_{i,\min} < x_{i,\max} < \infty$ for all $i$.

Proof: The first condition of the corollary ensures that the problem can be converted into a maximization problem with an increasing objective function. The second condition ensures that the diagonal elements of the Jacobian of the vector function given by the left-hand side of the constraints are positive and the off-diagonal elements are always negative, since $x > 0$. The third condition of the corollary ensures that it is possible to transform the constraints into contractive functions. Thus, Theorem 4.1 holds and this concludes the proof.

Remark 4.3: Corollary 4.2 gives only sufficient conditions. If a geometric programming problem does not satisfy the conditions of Corollary 4.2, it is still possible that it is F-Lipschitz if the conditions of Theorem 4.1 are satisfied. The reason for the corollary being only sufficient is that only some of the conditions in Theorem 4.1 are covered by this corollary.

The previous corollary says that if 1) the objective function of a geometric programming problem is decreasing, 2) constraint $i$ has the variable $x_i$ with positive exponent, whereas the other variables of the constraint always have negative exponent, 3) the Jacobian is strictly diagonally dominant per row, and 4) there is a box condition on the decision variables, then the problem is F-Lipschitz.

V. EXAMPLES OF APPLICATIONS

In this section, we show that the wireless sensor network motivating examples of Section II are F-Lipschitz. We also show examples of convex problems that can be solved by F-Lipschitz optimization.

A. Computation of a Norm

Here we illustrate the F-Lipschitz optimization by solving the problem of the wireless sensor network example of Section II-A. Observe that from the constraints of problem (II.2) it follows immediately that $0 \leq x \leq \mathbf{1}$. Observe also that the qualifying conditions (IV.6e), (IV.6f) and (IV.6g) of Theorem 4.1 hold. In particular, we define $G_i(x) = x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} - \gamma_{\max}$, and we have $\nabla_i G_i(x) = 1 + \dfrac{1}{2\sqrt{x_i}} \sum_{j \in \Theta_i} \sqrt{x_j}$.

Fig. 4. [Plot of the decision variables $x$ versus the number of iterations.] Qualitative convergence behavior of the iterations (III.4) for the wireless sensor network optimization problem (II.2) of Section II-A with $n = 10$ nodes and $|\Theta_i| = 5$. The convergence is reached quickly, in this case in less than 6 iterations. The iterations are initialized with $1/n$.

Moreover, $\nabla_i G_j(x) = \dfrac{\sqrt{x_j}}{2\sqrt{x_i}}$ if $j \in \Theta_i$ and $\nabla_i G_j(x) = 0$ if $j \notin \Theta_i$. It follows that condition (IV.6g) holds. Therefore, Theorem 4.1 applies and problem (II.2) has the optimal solution satisfying all the constraints at equality. The solution can be computed by Algorithm (III.3) for a centralized solution, or by Algorithm (III.4) in a distributed set-up. In both cases, we have to transform the problem to the form (III.1) by the positive scalars $\gamma_i$:

$F_i(x) = x_i - \gamma_i \Big( x_i + \sqrt{x_i} \sum_{j \in \Theta_i} \sqrt{x_j} - \gamma_{\max} \Big)$, $i = 1,\dots,n$.   (V.1)

This function is contractive Lipschitz provided that one chooses

$\gamma_i \leq \min_{0 \leq x \leq \mathbf{1}} \dfrac{1}{\nabla_i g_i(x)}$,

see the proof of Theorem 4.1 for the details.

Now, we present some numerical examples. We consider a network composed of $n = 10$ nodes, with an average number of $|\Theta_i| = 5$ neighbors per node. An example of the distributed computation obtained by algorithm (III.4) with the functions $f_i(x)$ given by (V.1) is reported in Fig. 4. The algorithm converges quickly. Specifically, we assumed that convergence is reached when $|x_i(k+1) - x_i(k)| \leq \epsilon$, with $\epsilon$ being the desired accuracy. From Monte Carlo simulations, we observed that the algorithm converges in about 5–10 iterations on average by setting $\epsilon = 10^{-8}$, which is a quite small value compared to the order of magnitude of the optimal solution ($10^{-2}$). We solved problem (II.2) also by traditional Lagrangian methods, which use iterations similar to (III.5) and (III.6). The method is commonly implemented by the Matlab function fmincon. We noticed that convergence is achieved in approximately 30 iterations and with about twice the computations needed for algorithm (III.3). Therefore, the computational and communication efficiency of our new method is remarkable.
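A minimal Python sketch of the synchronous version of the distributed computation just described is given below: each node updates its own threshold with the map (V.1), using only the thresholds of the nodes in its set $\Theta_i$. The random network, the uniform step $\gamma_i = 0.1$, and the initialization at $1/n$ mirror the qualitative setup above, but the specific numbers are illustrative assumptions of mine rather than the exact simulation setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, gamma_max, gamma_i = 10, 0.9, 0.1           # gamma_i: assumed step, small enough for contraction

# Random symmetric one-hop graph; Theta_i = one- and two-hop neighbors (as in Prop. 2.1).
A = (rng.random((n, n)) < 0.35).astype(int)
A = np.triu(A, 1); A = A + A.T
reach = ((A + A @ A) > 0) & ~np.eye(n, dtype=bool)
Theta = [np.flatnonzero(reach[i]) for i in range(n)]

def F(x):
    """Synchronous application of the maps (V.1), one component per node."""
    g = np.array([x[i] + np.sqrt(x[i]) * np.sqrt(x[Theta[i]]).sum() - gamma_max
                  for i in range(n)])
    return x - gamma_i * g

x = np.full(n, 1.0 / n)                        # initialization used in Fig. 4
for k in range(1, 500):
    x_new = np.clip(F(x), 1e-9, 1.0)           # keep x in (0, 1], cf. 0 <= x <= 1 above
    if np.max(np.abs(x_new - x)) < 1e-8:
        break
    x = x_new
print("iterations:", k, "  thresholds x*:", np.round(x_new, 4))
```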

B. Radio Power Allocation

The power allocation problem of Section II-B is F-Lipschitz. We can see this by making the variable substitution $x_i = -p_i$. Then, problem (II.3) can be rewritten as

$\max_x \; x$   (V.2)
s.t. $G_{ii} x_i + S_{\min}\Big(\sigma_i - \sum_{k \neq i} G_{ik} x_k + \sum_{k \neq i} M_{ik} x_i^2 x_k^2\Big) \leq 0$, $i = 1,\dots,n$,
$-p_{\max}\mathbf{1} \leq x \leq -p_{\min}\mathbf{1}$.

Since $M_{ik}$ is smaller than $G_{ii}$ and $G_{ik}$ for all $i,k$, it is reasonable to assume that for $-p_{\max}\mathbf{1} \leq x \leq -p_{\min}\mathbf{1}$ the following conditions hold: $G_{ii} > 2 S_{\min} \sum_{k \neq i} M_{ik} x_i x_k^2$, $G_{ik} > 2 M_{ik} x_i^2 x_k$, and $G_{ii} + 2 S_{\min} \sum_{k \neq i} M_{ik} p_{\min}^3 \geq S_{\min} \sum_{k \neq i} G_{ik}$. This assumption gives $\nabla_i g_i(x) > |\nabla_j g_i(x)|$, $i \neq j$, for $-p_{\max}\mathbf{1} \leq x \leq -p_{\min}\mathbf{1}$. Therefore, Theorem 4.1 applies (see conditions (IV.6c) and (IV.6d)) and problem (V.2) has the optimal solution satisfying all the constraints at equality. The solution can be centrally computed by algorithm (III.3), or by algorithm (III.4) in a distributed set-up. In the latter case, we have to transform the problem into the form (III.1) by positive scalars $\gamma_i$, as shown in Section IV, thus achieving the equivalent constraints $f_i(x) = x_i - \gamma_i g_i(x)$, where

$g_i(x) = G_{ii} x_i + S_{\min}\Big(\sigma_i - \sum_{k \neq i} G_{ik} x_k + \sum_{k \neq i} M_{ik} x_i^2 x_k^2\Big)$ for all $i$.
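To illustrate the distributed computation for this example, the following Python sketch runs the fixed-point iteration in the original power variables: each node sets its power so that its SINR constraint (using the SINR definition of Section II-B) holds at equality, then projects onto $[p_{\min}, p_{\max}]$, which is what the F-Lipschitz solution with all constraints active amounts to. All numerical values (gains, noise, intermodulation terms, bounds) are made-up illustrative assumptions, chosen so that the interference and intermodulation terms are small; this is a sketch, not the simulation of the paper, and convergence for an arbitrary random instance is not guaranteed by the snippet itself.

```python
import numpy as np

rng = np.random.default_rng(4)
n, S_min, p_min, p_max = 4, 2.0, 0.01, 1.0     # all values are illustrative assumptions

G = rng.uniform(0.01, 0.05, size=(n, n))       # cross gains G_ik, k != i
np.fill_diagonal(G, 1.0)                       # direct gains G_ii
M = rng.uniform(-0.005, 0.005, size=(n, n))    # small intermodulation terms M_ik
np.fill_diagonal(M, 0.0)
sigma = np.full(n, 0.01)                       # thermal noise at each receiver

def update(p):
    """Each node i sets its power so that SINR_i = S_min, then projects onto
    [p_min, p_max]; this is the distributed iteration (III.4) in power variables."""
    p_next = np.empty(n)
    for i in range(n):
        interference = sigma[i] + G[i] @ p - G[i, i] * p[i] \
                       + (M[i] * p[i] ** 2) @ (p ** 2)
        p_next[i] = S_min * interference / G[i, i]
    return np.clip(p_next, p_min, p_max)

p = np.full(n, p_min)
for k in range(1, 200):
    p_new = update(p)
    if np.max(np.abs(p_new - p)) < 1e-10:
        break
    p = p_new
print("iterations:", k, "  powers:", np.round(p_new, 5))
```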


More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Uniqueness of Generalized Equilibrium for Box Constrained Problems and Applications

Uniqueness of Generalized Equilibrium for Box Constrained Problems and Applications Uniqueness of Generalized Equilibrium for Box Constrained Problems and Applications Alp Simsek Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Asuman E.

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Tutorial on Convex Optimization: Part II

Tutorial on Convex Optimization: Part II Tutorial on Convex Optimization: Part II Dr. Khaled Ardah Communications Research Laboratory TU Ilmenau Dec. 18, 2018 Outline Convex Optimization Review Lagrangian Duality Applications Optimal Power Allocation

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Distributed Optimization. Song Chong EE, KAIST

Distributed Optimization. Song Chong EE, KAIST Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links

More information

The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints.

The general programming problem is the nonlinear programming problem where a given function is maximized subject to a set of inequality constraints. 1 Optimization Mathematical programming refers to the basic mathematical problem of finding a maximum to a function, f, subject to some constraints. 1 In other words, the objective is to find a point,

More information

Mathematical Foundations -1- Constrained Optimization. Constrained Optimization. An intuitive approach 2. First Order Conditions (FOC) 7

Mathematical Foundations -1- Constrained Optimization. Constrained Optimization. An intuitive approach 2. First Order Conditions (FOC) 7 Mathematical Foundations -- Constrained Optimization Constrained Optimization An intuitive approach First Order Conditions (FOC) 7 Constraint qualifications 9 Formal statement of the FOC for a maximum

More information

Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1

Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Seminars on Mathematics for Economics and Finance Topic 5: Optimization Kuhn-Tucker conditions for problems with inequality constraints 1 Session: 15 Aug 2015 (Mon), 10:00am 1:00pm I. Optimization with

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms

Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Some Inexact Hybrid Proximal Augmented Lagrangian Algorithms Carlos Humes Jr. a, Benar F. Svaiter b, Paulo J. S. Silva a, a Dept. of Computer Science, University of São Paulo, Brazil Email: {humes,rsilva}@ime.usp.br

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 3 A. d Aspremont. Convex Optimization M2. 1/49 Duality A. d Aspremont. Convex Optimization M2. 2/49 DMs DM par email: dm.daspremont@gmail.com A. d Aspremont. Convex Optimization

More information

ECE Optimization for wireless networks Final. minimize f o (x) s.t. Ax = b,

ECE Optimization for wireless networks Final. minimize f o (x) s.t. Ax = b, ECE 788 - Optimization for wireless networks Final Please provide clear and complete answers. PART I: Questions - Q.. Discuss an iterative algorithm that converges to the solution of the problem minimize

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2)

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY Uplink Downlink Duality Via Minimax Duality. Wei Yu, Member, IEEE (1) (2) IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 2, FEBRUARY 2006 361 Uplink Downlink Duality Via Minimax Duality Wei Yu, Member, IEEE Abstract The sum capacity of a Gaussian vector broadcast channel

More information

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A

More information

Written Examination

Written Examination Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes

More information

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM

TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM TMA 4180 Optimeringsteori KARUSH-KUHN-TUCKER THEOREM H. E. Krogstad, IMF, Spring 2012 Karush-Kuhn-Tucker (KKT) Theorem is the most central theorem in constrained optimization, and since the proof is scattered

More information

Convex Optimization & Lagrange Duality

Convex Optimization & Lagrange Duality Convex Optimization & Lagrange Duality Chee Wei Tan CS 8292 : Advanced Topics in Convex Optimization and its Applications Fall 2010 Outline Convex optimization Optimality condition Lagrange duality KKT

More information

Multi-Robotic Systems

Multi-Robotic Systems CHAPTER 9 Multi-Robotic Systems The topic of multi-robotic systems is quite popular now. It is believed that such systems can have the following benefits: Improved performance ( winning by numbers ) Distributed

More information

Applied Lagrange Duality for Constrained Optimization

Applied Lagrange Duality for Constrained Optimization Applied Lagrange Duality for Constrained Optimization February 12, 2002 Overview The Practical Importance of Duality ffl Review of Convexity ffl A Separating Hyperplane Theorem ffl Definition of the Dual

More information

A projection algorithm for strictly monotone linear complementarity problems.

A projection algorithm for strictly monotone linear complementarity problems. A projection algorithm for strictly monotone linear complementarity problems. Erik Zawadzki Department of Computer Science epz@cs.cmu.edu Geoffrey J. Gordon Machine Learning Department ggordon@cs.cmu.edu

More information

Network Optimization: Notes and Exercises

Network Optimization: Notes and Exercises SPRING 2016 1 Network Optimization: Notes and Exercises Michael J. Neely University of Southern California http://www-bcf.usc.edu/ mjneely Abstract These notes provide a tutorial treatment of topics of

More information

EE/AA 578, Univ of Washington, Fall Duality

EE/AA 578, Univ of Washington, Fall Duality 7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Optimization and Root Finding. Kurt Hornik

Optimization and Root Finding. Kurt Hornik Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding

More information

Node-based Distributed Optimal Control of Wireless Networks

Node-based Distributed Optimal Control of Wireless Networks Node-based Distributed Optimal Control of Wireless Networks CISS March 2006 Edmund M. Yeh Department of Electrical Engineering Yale University Joint work with Yufang Xi Main Results Unified framework for

More information

Optimisation in Higher Dimensions

Optimisation in Higher Dimensions CHAPTER 6 Optimisation in Higher Dimensions Beyond optimisation in 1D, we will study two directions. First, the equivalent in nth dimension, x R n such that f(x ) f(x) for all x R n. Second, constrained

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Routing. Topics: 6.976/ESD.937 1

Routing. Topics: 6.976/ESD.937 1 Routing Topics: Definition Architecture for routing data plane algorithm Current routing algorithm control plane algorithm Optimal routing algorithm known algorithms and implementation issues new solution

More information

10-725/36-725: Convex Optimization Spring Lecture 21: April 6

10-725/36-725: Convex Optimization Spring Lecture 21: April 6 10-725/36-725: Conve Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 21: April 6 Scribes: Chiqun Zhang, Hanqi Cheng, Waleed Ammar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Distributed Optimization over Networks Gossip-Based Algorithms

Distributed Optimization over Networks Gossip-Based Algorithms Distributed Optimization over Networks Gossip-Based Algorithms Angelia Nedić angelia@illinois.edu ISE Department and Coordinated Science Laboratory University of Illinois at Urbana-Champaign Outline Random

More information

Nonlinear Optimization: What s important?

Nonlinear Optimization: What s important? Nonlinear Optimization: What s important? Julian Hall 10th May 2012 Convexity: convex problems A local minimizer is a global minimizer A solution of f (x) = 0 (stationary point) is a minimizer A global

More information

Interactive Interference Alignment

Interactive Interference Alignment Interactive Interference Alignment Quan Geng, Sreeram annan, and Pramod Viswanath Coordinated Science Laboratory and Dept. of ECE University of Illinois, Urbana-Champaign, IL 61801 Email: {geng5, kannan1,

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 5 Fall 2009

UC Berkeley Department of Electrical Engineering and Computer Science. EECS 227A Nonlinear and Convex Optimization. Solutions 5 Fall 2009 UC Berkeley Department of Electrical Engineering and Computer Science EECS 227A Nonlinear and Convex Optimization Solutions 5 Fall 2009 Reading: Boyd and Vandenberghe, Chapter 5 Solution 5.1 Note that

More information

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010 I.3. LMI DUALITY Didier HENRION henrion@laas.fr EECI Graduate School on Control Supélec - Spring 2010 Primal and dual For primal problem p = inf x g 0 (x) s.t. g i (x) 0 define Lagrangian L(x, z) = g 0

More information

Elements of Convex Optimization Theory

Elements of Convex Optimization Theory Elements of Convex Optimization Theory Costis Skiadas August 2015 This is a revised and extended version of Appendix A of Skiadas (2009), providing a self-contained overview of elements of convex optimization

More information

2.3 Linear Programming

2.3 Linear Programming 2.3 Linear Programming Linear Programming (LP) is the term used to define a wide range of optimization problems in which the objective function is linear in the unknown variables and the constraints are

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Date: July 5, Contents

Date: July 5, Contents 2 Lagrange Multipliers Date: July 5, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 14 2.3. Informative Lagrange Multipliers...........

More information

Gradient Descent. Dr. Xiaowei Huang

Gradient Descent. Dr. Xiaowei Huang Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

More information

Algorithms for Constrained Optimization

Algorithms for Constrained Optimization 1 / 42 Algorithms for Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University April 19, 2015 2 / 42 Outline 1. Convergence 2. Sequential quadratic

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

8 Barrier Methods for Constrained Optimization

8 Barrier Methods for Constrained Optimization IOE 519: NL, Winter 2012 c Marina A. Epelman 55 8 Barrier Methods for Constrained Optimization In this subsection, we will restrict our attention to instances of constrained problem () that have inequality

More information

Applications of Linear Programming

Applications of Linear Programming Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal

More information

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions

Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum Functions 260 Journal of Advances in Applied Mathematics, Vol. 1, No. 4, October 2016 https://dx.doi.org/10.22606/jaam.2016.14006 Newton-type Methods for Solving the Nonsmooth Equations with Finitely Many Maximum

More information

12. Interior-point methods

12. Interior-point methods 12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity

More information

Lecture: Duality.

Lecture: Duality. Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong

More information

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method

Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Primal-Dual Interior-Point Methods for Linear Programming based on Newton s Method Robert M. Freund March, 2004 2004 Massachusetts Institute of Technology. The Problem The logarithmic barrier approach

More information

Scientific Computing: Optimization

Scientific Computing: Optimization Scientific Computing: Optimization Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 Course MATH-GA.2043 or CSCI-GA.2112, Spring 2012 March 8th, 2011 A. Donev (Courant Institute) Lecture

More information

DLM: Decentralized Linearized Alternating Direction Method of Multipliers

DLM: Decentralized Linearized Alternating Direction Method of Multipliers 1 DLM: Decentralized Linearized Alternating Direction Method of Multipliers Qing Ling, Wei Shi, Gang Wu, and Alejandro Ribeiro Abstract This paper develops the Decentralized Linearized Alternating Direction

More information

Introduction to Optimization Techniques. Nonlinear Optimization in Function Spaces

Introduction to Optimization Techniques. Nonlinear Optimization in Function Spaces Introduction to Optimization Techniques Nonlinear Optimization in Function Spaces X : T : Gateaux and Fréchet Differentials Gateaux and Fréchet Differentials a vector space, Y : a normed space transformation

More information

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints Instructor: Prof. Kevin Ross Scribe: Nitish John October 18, 2011 1 The Basic Goal The main idea is to transform a given constrained

More information

Optimization. A first course on mathematics for economists

Optimization. A first course on mathematics for economists Optimization. A first course on mathematics for economists Xavier Martinez-Giralt Universitat Autònoma de Barcelona xavier.martinez.giralt@uab.eu II.3 Static optimization - Non-Linear programming OPT p.1/45

More information

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS

A DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624

More information

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization

An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization An Inexact Sequential Quadratic Optimization Method for Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with Travis Johnson, Northwestern University Daniel P. Robinson, Johns

More information

Optimality, Duality, Complementarity for Constrained Optimization

Optimality, Duality, Complementarity for Constrained Optimization Optimality, Duality, Complementarity for Constrained Optimization Stephen Wright University of Wisconsin-Madison May 2014 Wright (UW-Madison) Optimality, Duality, Complementarity May 2014 1 / 41 Linear

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

Outline. Roadmap for the NPP segment: 1 Preliminaries: role of convexity. 2 Existence of a solution

Outline. Roadmap for the NPP segment: 1 Preliminaries: role of convexity. 2 Existence of a solution Outline Roadmap for the NPP segment: 1 Preliminaries: role of convexity 2 Existence of a solution 3 Necessary conditions for a solution: inequality constraints 4 The constraint qualification 5 The Lagrangian

More information

Generalization to inequality constrained problem. Maximize

Generalization to inequality constrained problem. Maximize Lecture 11. 26 September 2006 Review of Lecture #10: Second order optimality conditions necessary condition, sufficient condition. If the necessary condition is violated the point cannot be a local minimum

More information

WHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????

WHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ???? DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0

More information

Lecture 18: Optimization Programming

Lecture 18: Optimization Programming Fall, 2016 Outline Unconstrained Optimization 1 Unconstrained Optimization 2 Equality-constrained Optimization Inequality-constrained Optimization Mixture-constrained Optimization 3 Quadratic Programming

More information

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008

Lecture 8 Plus properties, merit functions and gap functions. September 28, 2008 Lecture 8 Plus properties, merit functions and gap functions September 28, 2008 Outline Plus-properties and F-uniqueness Equation reformulations of VI/CPs Merit functions Gap merit functions FP-I book:

More information

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008.

CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS. W. Erwin Diewert January 31, 2008. 1 ECONOMICS 594: LECTURE NOTES CHAPTER 2: CONVEX SETS AND CONCAVE FUNCTIONS W. Erwin Diewert January 31, 2008. 1. Introduction Many economic problems have the following structure: (i) a linear function

More information

2.098/6.255/ Optimization Methods Practice True/False Questions

2.098/6.255/ Optimization Methods Practice True/False Questions 2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence

More information

Optimality Conditions

Optimality Conditions Chapter 2 Optimality Conditions 2.1 Global and Local Minima for Unconstrained Problems When a minimization problem does not have any constraints, the problem is to find the minimum of the objective function.

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Two hours. To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER. xx xxxx 2017 xx:xx xx.

Two hours. To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER. xx xxxx 2017 xx:xx xx. Two hours To be provided by Examinations Office: Mathematical Formula Tables. THE UNIVERSITY OF MANCHESTER CONVEX OPTIMIZATION - SOLUTIONS xx xxxx 27 xx:xx xx.xx Answer THREE of the FOUR questions. If

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION

GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION Chapter 4 GENERALIZED CONVEXITY AND OPTIMALITY CONDITIONS IN SCALAR AND VECTOR OPTIMIZATION Alberto Cambini Department of Statistics and Applied Mathematics University of Pisa, Via Cosmo Ridolfi 10 56124

More information

CS-E4830 Kernel Methods in Machine Learning

CS-E4830 Kernel Methods in Machine Learning CS-E4830 Kernel Methods in Machine Learning Lecture 3: Convex optimization and duality Juho Rousu 27. September, 2017 Juho Rousu 27. September, 2017 1 / 45 Convex optimization Convex optimisation This

More information

Spanning and Independence Properties of Finite Frames

Spanning and Independence Properties of Finite Frames Chapter 1 Spanning and Independence Properties of Finite Frames Peter G. Casazza and Darrin Speegle Abstract The fundamental notion of frame theory is redundancy. It is this property which makes frames

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

Computational Finance

Computational Finance Department of Mathematics at University of California, San Diego Computational Finance Optimization Techniques [Lecture 2] Michael Holst January 9, 2017 Contents 1 Optimization Techniques 3 1.1 Examples

More information

Characterization of Convex and Concave Resource Allocation Problems in Interference Coupled Wireless Systems

Characterization of Convex and Concave Resource Allocation Problems in Interference Coupled Wireless Systems 2382 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL 59, NO 5, MAY 2011 Characterization of Convex and Concave Resource Allocation Problems in Interference Coupled Wireless Systems Holger Boche, Fellow, IEEE,

More information