Numerical methods for approximating zeros of operators and for solving variational inequalities, with applications


Faculty of Mathematics and Computer Science, Babeş-Bolyai University

Erika Nagy

Numerical methods for approximating zeros of operators and for solving variational inequalities, with applications

Doctoral Thesis Summary

Supervisor: Prof. Dr. Gábor Kassay

Cluj-Napoca, 2013


Contents

1 Introduction
2 Background of the theory of monotone operators
  2.1 Operators
  2.2 Functions
3 The primal-dual splitting technique
  3.1 Monotone inclusion problem formulation
  3.2 The primal-dual splitting algorithm
  3.3 Applications to convex minimization problems
4 Numerical experiments
  4.1 Image processing
    4.1.1 Image deblurring
    4.1.2 Experimental setting
  4.2 Multifacility location problems
    4.2.1 Problem manipulation
    4.2.2 Computational experience
  4.3 Average consensus for colored networks
    4.3.1 Proposed algorithm
    4.3.2 Performance evaluation
  4.4 Support-vector machines classification
    4.4.1 The soft-margin hyperplane
    4.4.2 Experiments with digit recognition
5 Interdisciplinary application of our algorithm
  5.1 Optimization in biology
  5.2 Pattern recognition techniques
References

Chapter 1
Introduction

The existence of optimization methods can be traced back to the days of Newton, Lagrange, and Cauchy. The development of differential calculus methods for optimization was made possible by the contributions of Newton and Leibniz to calculus. The foundations of the calculus of variations, which deals with the minimization of functionals, were laid by Bernoulli, Euler, Lagrange and Weierstrass. The method of optimization for constrained problems, which involves the addition of unknown multipliers, became known by the name of its inventor, Lagrange. Cauchy made the first application of the steepest descent method to solve unconstrained optimization problems. By the middle of the twentieth century, high-speed digital computers made the implementation of complex optimization procedures possible and stimulated further research on newer methods. Spectacular advances followed, producing a massive literature on optimization techniques. This advancement also resulted in the emergence of several well-defined new areas in optimization theory [67].

In recent years several splitting algorithms [43] have emerged for solving monotone inclusion problems involving parallel sums and compositions with linear continuous operators, which are eventually reduced to finding the zeros of the sum of a maximal monotone operator and a cocoercive, or a monotone and Lipschitz continuous, operator. The latter problems were solved by employing, in an appropriate product space, forward-backward or forward-backward-forward algorithms, respectively, and gave rise to so-called primal-dual splitting methods (see [9, 16, 20, 23, 61] and the references therein).

Recently, one can remark the interest of researchers in solving systems of monotone inclusion problems [1, 5, 6, 22]. This is motivated by the fact that convex optimization problems arising, for instance, in areas like image processing [13], multifacility location problems [18, 31], average consensus in network coloring [46, 47] and support vector machines classification [27, 28] are to be solved with respect to multiple variables, very often linked in different manners, for instance by linear equations.

The present research is motivated by the investigations made in [1]. The authors propose there an algorithm for solving coupled monotone inclusion problems, where the variables are linked by some operators which jointly satisfy a cocoercivity property. Our aim is to overcome the necessity of having differentiability for some of the functions occurring in the objective of the convex optimization problems in [1]. To this end we consider first a more general system of monotone inclusions, for which the coupling operator satisfies a Lipschitz continuity property, along with its dual system of monotone inclusions in an extended sense of the Attouch-Théra duality (see [2]). The simultaneous solving of the primal and dual system of monotone inclusions is reduced to the problem of finding the zeros of the sum of a maximal monotone operator and a monotone and Lipschitz continuous operator in an appropriate product space. The latter problem is solved by a forward-backward-forward algorithm, a fact that allows us to provide both weak and strong convergence assertions for the resulting iterative scheme, which proves to have a highly parallelizable formulation.

The thesis is organized in five chapters and a bibliography. In the first chapter we state the initial problem that stimulated this research. H. Attouch, L. M. Briceño-Arias and P. L. Combettes proposed in [1] a parallel splitting method for solving systems of coupled monotone inclusions in Hilbert spaces, establishing its convergence under the assumption that solutions exist. The method can handle an arbitrary number of variables and nonlinear coupling schemes.

In the second chapter we give some necessary notations and preliminary results in order to facilitate the reading of the manuscript and to help the reader understand the following parts more easily. Notions like nonexpansive operators, parallel sum, cocoercivity, conjugation, Fenchel duality, subdifferentiability and maximal monotone operators are introduced together with some essential splitting algorithms like

the Douglas-Rachford algorithm, the forward-backward algorithm and Tseng's algorithm. These algorithms have been studied more extensively in [4].

Chapter three is devoted to the primal-dual splitting algorithm for solving the problem considered in this thesis and investigates its convergence behaviour. The operators arising in each of the inclusions of the system are processed separately in every iteration: the single-valued ones are evaluated explicitly, which corresponds to a forward step, while the set-valued ones are evaluated via their resolvents, which corresponds to a backward step. In addition, most of the steps in the iterative scheme can be executed simultaneously, making the method applicable to a variety of convex minimization problems. The solving of convex optimization problems with multiple variables is also presented in this chapter.

In Chapter four, the numerical performance and effectiveness of the proposed splitting algorithm are emphasized through several numerical experiments: image processing, where a noisy and blurry image is restored; multifacility location problems, where the objective is to minimize the energy path between existing facilities and multiple new facilities, which have to be located among the existing ones; average consensus on colored networks, which consists in calculating the average of the value of each node in a recursive and distributed way; and image classification via support vector machines, where the aim is to construct a decision function with the help of training data, which should assign every new data point correctly, with a low misclassification rate.

Finally, the last part of the thesis presents a collaboration with a research team in biology studying the presence of picoalgae in Transylvanian salt lakes. The aim was to adapt the proposed algorithm for image processing and pattern recognition for microscopic images and electrophoretograms. The development process included working with epifluorescence microscopic images of picoalgae, and the goal was to count picoalgae and distinguish them from the background noise.

Keywords: convex optimization, coupled systems, forward-backward-forward algorithm, Lipschitz continuity, monotone inclusion, operator splitting, pattern recognition

Chapter 2
Background of the theory of monotone operators

The mathematical notion of a Hilbert space was named after the German mathematician David Hilbert. The earliest Hilbert spaces were studied as infinite-dimensional function spaces in the first decade of the 20th century by David Hilbert, Frigyes Riesz and Erhard Schmidt.

Throughout this work R+ denotes the set of nonnegative real numbers, R++ the set of strictly positive real numbers and R̄ = R ∪ {±∞} the extended real line. We consider Hilbert spaces endowed with the scalar (or inner) product ⟨·, ·⟩ and the associated norm ‖·‖ = √⟨·, ·⟩. In order to avoid confusion, when needed, appropriate indices will be used for the inner product and the norm. The symbols ⇀ and → denote weak and strong convergence, respectively.

2.1 Operators

Let H be a real Hilbert space and let 2^H be the power set of H, i.e., the family of all subsets of H. The notation M : H → 2^H means that M is a set-valued operator, i.e., M maps every point x ∈ H to a set Mx ⊆ H. We denote by zer M = {x ∈ H : 0 ∈ Mx} the set of zeros of M and by gra M = {(x, u) ∈ H × H : u ∈ Mx} the graph of M. The domain and the range of M are dom M = {x ∈ H : Mx ≠ ∅} and ran M = M(H),

respectively. In addition, the closure of dom M is denoted by cl(dom M) and the closure of ran M is denoted by cl(ran M). The reversal of M is M̌ : x ↦ −M(−x). The inverse of M, denoted by M^{-1} : H → 2^H, is defined through its graph gra M^{-1} = {(u, x) ∈ H × H : (x, u) ∈ gra M}. Thus, for every (x, u) ∈ H × H, u ∈ Mx ⟺ x ∈ M^{-1}u. Moreover, dom M^{-1} = ran M and ran M^{-1} = dom M.

The sum and the parallel sum of two set-valued operators M, N : H → 2^H are defined as

M + N : H → 2^H, (M + N)(x) = M(x) + N(x), x ∈ H,   (2.1.1)

and

M □ N : H → 2^H, M □ N = (M^{-1} + N^{-1})^{-1}.   (2.1.2)

Let G be another real Hilbert space and consider the single-valued operator L : H → G (also called a mapping), which maps every point x in H to a point Lx in G. The norm of the linear continuous operator L is defined as ‖L‖ = sup{‖Lx‖ : x ∈ H, ‖x‖ ≤ 1}, while L* : G → H, defined by ⟨Lx, y⟩ = ⟨x, L*y⟩ for all (x, y) ∈ H × G, denotes the adjoint operator of L.

2.2 Functions

Let H be a real Hilbert space.

Definition. Consider the function f : H → R̄. The effective domain of f is dom f = {x ∈ H : f(x) < +∞}, the graph of f is gra f = {(x, ξ) ∈ H × R : f(x) = ξ},

the epigraph of f is epi f = {(x, ξ) ∈ H × R : f(x) ≤ ξ}, the lower level set of f at height ξ ∈ R is lev_{≤ξ} f = {x ∈ H : f(x) ≤ ξ}, and the strict lower level set of f at height ξ ∈ R is lev_{<ξ} f = {x ∈ H : f(x) < ξ}. The function f is called proper if dom f ≠ ∅ and f(x) > −∞ for all x ∈ H. Furthermore, the closures of dom f and epi f are denoted by cl(dom f) and cl(epi f), respectively. We denote by Γ(H) the set of proper, convex and lower semicontinuous functions f : H → R̄.

The indicator function δ_C : H → R̄ of a set C ⊆ H is defined by

δ_C(x) = 0, for x ∈ C; +∞, otherwise.   (2.2.1)

Note that the indicator function δ_C is lower semicontinuous if and only if C is closed.

Definition. Let f and g be two proper functions from H to R̄. The infimal convolution (or epi-sum) of f and g is defined by

f □ g : H → R̄, (f □ g)(x) = inf_{y ∈ H} {f(y) + g(x − y)}.

Definition. Let f : H → R̄. The conjugate function (or Fenchel conjugate) of f is f* : H → R̄, f*(u) = sup{⟨u, x⟩ − f(x) : x ∈ H} for all u ∈ H, and the biconjugate of f is f** = (f*)*. Note that, if f ∈ Γ(H), then f* ∈ Γ(H) as well.

The following version of Fenchel's duality theorem was obtained by R. T. Rockafellar [53].

Theorem. Let H be a Hilbert space and let f, g : H → R ∪ {+∞} be proper convex functions.

(i) If dom f ∩ dom g contains a point at which f or g is continuous, then (f + g)* = f* □ g* with exact infimal convolution.

(ii) Suppose that f and g belong to Γ(H). If dom f* ∩ dom g* contains a point at which f* or g* is continuous, then f □ g = (f* + g*)* with exact infimal convolution.

Definition. The primal problem associated with the sum of two proper functions f : H → R̄ and g : H → R̄ is

min_{x ∈ H} {f(x) + g(x)},

its dual problem is

min_{u ∈ H} {f*(u) + g*(−u)},

the primal optimal value is µ = inf(f + g)(H), the dual optimal value is µ* = inf(f* + ǧ*)(H), and the duality gap is

∆(f, g) = 0, if µ = −µ* ∈ R; µ + µ*, otherwise.   (2.2.2)
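Conjugation and the infimal convolution also translate directly into computable quantities. The following MATLAB sketch, which is our own illustration and not part of the thesis, verifies numerically the Moreau decomposition x = prox_{γf}(x) + γ prox_{γ^{-1}f*}(x/γ), linking a function and its Fenchel conjugate; the function f = λ‖·‖_1 and the parameter values are chosen only for demonstration purposes.

    % Numerical check of the Moreau decomposition for f = lambda*||.||_1:
    % prox_{gamma f} is soft thresholding, and f* is the indicator of the
    % ball ||u||_inf <= lambda, whose prox is the projection onto that ball.
    lambda = 0.5; gamma = 2.0;
    x = randn(5, 1);

    soft  = @(z, t) sign(z) .* max(abs(z) - t, 0);   % prox of t*||.||_1
    projB = @(u, t) max(min(u, t), -t);              % projection onto [-t, t]^n

    p   = soft(x, gamma*lambda);                     % prox_{gamma f}(x)
    d   = projB(x/gamma, lambda);                    % prox_{(1/gamma) f*}(x/gamma)
    err = norm(x - (p + gamma*d));                   % should be close to 0
    fprintf('Moreau decomposition residual: %.2e\n', err);

The same identity is what makes proximal steps for conjugate functions cheap whenever the primal proximal mapping is known in closed form.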

Chapter 3
The primal-dual splitting technique

The following results presented in the thesis are based on the article [10], which resulted from my cooperation with Professor Radu Ioan Boţ and Ernő Robert Csetnek.

3.1 Monotone inclusion problem formulation

The problem under consideration is a more general system of monotone inclusions than the one presented in [1], for which the coupling operator satisfies a Lipschitz continuity property, along with its dual system of monotone inclusions in an extended sense of the Attouch-Théra duality [2].

Problem 3.1.1. Let m ≥ 1 be a positive integer, let (H_i)_{1≤i≤m} be real Hilbert spaces and, for i = 1, ..., m, let B_i : H_1 × ... × H_m → H_i be a µ_i-Lipschitz continuous operator with µ_i ∈ R++, such that the operators B_i jointly satisfy the monotonicity property

∀(x_1, ..., x_m) ∈ H_1 × ... × H_m, ∀(y_1, ..., y_m) ∈ H_1 × ... × H_m:
Σ_{i=1}^m ⟨x_i − y_i, B_i(x_1, ..., x_m) − B_i(y_1, ..., y_m)⟩_{H_i} ≥ 0.   (3.1.1)

For every i = 1, ..., m, let G_i be a real Hilbert space, A_i : G_i → 2^{G_i} a maximal monotone operator, C_i : G_i → 2^{G_i} a monotone operator such that C_i^{-1} is ν_i-Lipschitz continuous with ν_i ∈ R+, and L_i : H_i → G_i a linear continuous operator. The problem is to solve

the system of coupled inclusions

find x_1 ∈ H_1, ..., x_m ∈ H_m such that
0 ∈ L_1*(A_1 □ C_1)(L_1 x_1) + B_1(x_1, ..., x_m)
⋮
0 ∈ L_m*(A_m □ C_m)(L_m x_m) + B_m(x_1, ..., x_m)   (3.1.2)

together with its dual system

find v_1 ∈ G_1, ..., v_m ∈ G_m such that there exist x_1 ∈ H_1, ..., x_m ∈ H_m with
0 = L_1* v_1 + B_1(x_1, ..., x_m), ..., 0 = L_m* v_m + B_m(x_1, ..., x_m)
and v_1 ∈ (A_1 □ C_1)(L_1 x_1), ..., v_m ∈ (A_m □ C_m)(L_m x_m).   (3.1.3)

We say that (x_1, ..., x_m, v_1, ..., v_m) ∈ H_1 × ... × H_m × G_1 × ... × G_m is a primal-dual solution to Problem 3.1.1 if

0 = L_i* v_i + B_i(x_1, ..., x_m) and v_i ∈ (A_i □ C_i)(L_i x_i), i = 1, ..., m.   (3.1.4)

If (x_1, ..., x_m, v_1, ..., v_m) ∈ H_1 × ... × H_m × G_1 × ... × G_m is a primal-dual solution to Problem 3.1.1, then (x_1, ..., x_m) is a solution to (3.1.2) and (v_1, ..., v_m) is a solution to (3.1.3). Notice also that

(x_1, ..., x_m) solves (3.1.2)
⟺ 0 ∈ L_i*(A_i □ C_i)(L_i x_i) + B_i(x_1, ..., x_m), i = 1, ..., m
⟺ there exist v_1 ∈ G_1, ..., v_m ∈ G_m such that 0 = L_i* v_i + B_i(x_1, ..., x_m) and v_i ∈ (A_i □ C_i)(L_i x_i), i = 1, ..., m.

Thus, if (x_1, ..., x_m) is a solution to (3.1.2), then there exists (v_1, ..., v_m) ∈ G_1 × ... × G_m such that (x_1, ..., x_m, v_1, ..., v_m) is a primal-dual solution to Problem 3.1.1 and, if (v_1, ..., v_m) ∈ G_1 × ... × G_m is a solution to (3.1.3), then there exists (x_1, ..., x_m) ∈ H_1 × ... × H_m such that (x_1, ..., x_m, v_1, ..., v_m) is a primal-dual solution to Problem 3.1.1.

3.2 The primal-dual splitting algorithm

The aim of this section is to provide an algorithm for solving Problem 3.1.1 and to furnish weak and strong convergence results for the sequences generated by it. The proposed iterative scheme has the property that each single-valued operator is processed explicitly, while each set-valued operator is evaluated via its resolvent. Absolutely summable sequences make the algorithm error-tolerant.

Algorithm 3.2.1. For every i = 1, ..., m let (a_{1,i,n})_{n≥0}, (b_{1,i,n})_{n≥0}, (c_{1,i,n})_{n≥0} be absolutely summable sequences in H_i and (a_{2,i,n})_{n≥0}, (b_{2,i,n})_{n≥0}, (c_{2,i,n})_{n≥0} absolutely summable sequences in G_i. Furthermore, set

β = max{ √(Σ_{i=1}^m µ_i^2), ν_1, ..., ν_m } + max_{i=1,...,m} ‖L_i‖,   (3.2.1)

let ε ∈ ]0, 1/(β + 1)[ and let (γ_n)_{n≥0} be a sequence in [ε, (1 − ε)/β]. For every i = 1, ..., m let the initial points x_{i,0} ∈ H_i and v_{i,0} ∈ G_i be chosen arbitrarily and set, for every n ≥ 0 and every i = 1, ..., m,

y_{i,n} = x_{i,n} − γ_n (L_i* v_{i,n} + B_i(x_{1,n}, ..., x_{m,n}) + a_{1,i,n})
w_{i,n} = v_{i,n} − γ_n (C_i^{-1} v_{i,n} − L_i x_{i,n} + a_{2,i,n})
p_{i,n} = y_{i,n} + b_{1,i,n}
r_{i,n} = J_{γ_n A_i^{-1}} w_{i,n} + b_{2,i,n}
q_{i,n} = p_{i,n} − γ_n (L_i* r_{i,n} + B_i(p_{1,n}, ..., p_{m,n}) + c_{1,i,n})
s_{i,n} = r_{i,n} − γ_n (C_i^{-1} r_{i,n} − L_i p_{i,n} + c_{2,i,n})
x_{i,n+1} = x_{i,n} − y_{i,n} + q_{i,n}
v_{i,n+1} = v_{i,n} − w_{i,n} + s_{i,n}.

The convergence of Algorithm 3.2.1 is established by showing that its iterative scheme can be reduced to the error-tolerant version of the forward-backward-forward algorithm of Tseng (see [59] for the error-free case) recently provided in [16].

Our algorithmic framework will hinge on the following splitting algorithm, which is of interest in its own right.

Theorem [16]. Let H be a real Hilbert space, let M : H → 2^H be maximal monotone, and let L : H → H be monotone. Suppose that zer(M + L) ≠ ∅ and that L is β-Lipschitz continuous for some β ∈ ]0, +∞[. Let (a_n)_{n∈N}, (b_n)_{n∈N} and (c_n)_{n∈N} be sequences in H such that

Σ_{n∈N} ‖a_n‖ < +∞, Σ_{n∈N} ‖b_n‖ < +∞ and Σ_{n∈N} ‖c_n‖ < +∞,

let x_0 ∈ H, let ε ∈ ]0, 1/(β + 1)[, let (γ_n)_{n∈N} be a sequence in [ε, (1 − ε)/β], and set, for every n ∈ N,

y_n = x_n − γ_n (Lx_n + a_n)
p_n = J_{γ_n M} y_n + b_n
q_n = p_n − γ_n (Lp_n + c_n)
x_{n+1} = x_n − y_n + q_n.

Then the following hold for some x ∈ zer(M + L):

(i) Σ_{n∈N} ‖x_n − p_n‖^2 < +∞ and Σ_{n∈N} ‖y_n − q_n‖^2 < +∞.

(ii) x_n ⇀ x and p_n ⇀ x.

(iii) Suppose that one of the following is satisfied: (a) M + L is demiregular at x; (b) M or L is uniformly monotone at x; (c) int zer(M + L) ≠ ∅. Then x_n → x and p_n → x.

For a more detailed insight into the problem of simultaneously solving a large class of composite monotone inclusions and their duals, by reducing it to the problem of finding a zero of the sum of a maximal monotone operator and a linear skew-adjoint operator, we recommend the work of Luis M. Briceño-Arias and Patrick L. Combettes [16].
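As a concrete illustration of the iteration in the theorem above, the following MATLAB sketch applies the error-free scheme (a_n = b_n = c_n = 0) to a small lasso-type problem, with M = ∂(λ‖·‖_1), whose resolvent is soft thresholding, and L the gradient of the smooth data-fidelity term (1/2)‖Ax − b‖^2. The problem, the data and the step size are our own choices for demonstration purposes and are not taken from the thesis.

    % Tseng's forward-backward-forward iteration for 0 in M(x) + L(x),
    % with M = subdifferential of lambda*||.||_1 and L(x) = A'*(A*x - b).
    rng(0);
    A = randn(40, 100);  b = randn(40, 1);  lambda = 0.1;

    L    = @(x) A' * (A*x - b);                           % monotone, beta-Lipschitz
    beta = norm(A)^2;                                     % Lipschitz constant of L
    J    = @(y, g) sign(y) .* max(abs(y) - g*lambda, 0);  % resolvent of gamma*M

    gamma = 0.9 / beta;                                   % constant step in ]0, 1/beta[
    x = zeros(100, 1);
    for n = 1:2000
        y = x - gamma * L(x);        % forward step
        p = J(y, gamma);             % backward step (resolvent)
        q = p - gamma * L(p);        % second forward step
        x = x - y + q;               % correction
    end
    fprintf('fixed-point residual: %.2e\n', norm(x - J(x - gamma*L(x), gamma)));

Algorithm 3.2.1 follows the same forward-backward-forward pattern, but blockwise in a product space, which is why most of its steps can be carried out in parallel.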

The following theorem establishes the convergence of Algorithm 3.2.1.

Theorem [10]. Suppose that Problem 3.1.1 has a primal-dual solution. For the sequences generated by Algorithm 3.2.1 the following statements are true:

(i) For every i ∈ {1, ..., m}, Σ_{n≥0} ‖x_{i,n} − p_{i,n}‖^2_{H_i} < +∞ and Σ_{n≥0} ‖v_{i,n} − r_{i,n}‖^2_{G_i} < +∞.

(ii) There exists a primal-dual solution (x_1, ..., x_m, v_1, ..., v_m) to Problem 3.1.1 such that:

(a) for every i ∈ {1, ..., m}, x_{i,n} ⇀ x_i, p_{i,n} ⇀ x_i, v_{i,n} ⇀ v_i and r_{i,n} ⇀ v_i as n → +∞;

(b) if C_i^{-1}, i = 1, ..., m, is uniformly monotone and there exists an increasing function φ_B : R+ → R+ ∪ {+∞} vanishing only at 0 and fulfilling

∀(x_1, ..., x_m), (y_1, ..., y_m) ∈ H_1 × ... × H_m:
Σ_{i=1}^m ⟨x_i − y_i, B_i(x_1, ..., x_m) − B_i(y_1, ..., y_m)⟩_{H_i} ≥ φ_B(‖(x_1, ..., x_m) − (y_1, ..., y_m)‖),   (3.2.2)

then, for every i ∈ {1, ..., m}, x_{i,n} → x_i, p_{i,n} → x_i, v_{i,n} → v_i and r_{i,n} → v_i as n → +∞.

3.3 Applications to convex minimization problems

In this section we turn our attention to the solving of convex minimization problems with multiple variables via the primal-dual algorithm presented and investigated in this thesis.

Problem. Let m ≥ 1 and p ≥ 1 be positive integers, let (H_i)_{1≤i≤m}, (H_i')_{1≤i≤m} and (G_j)_{1≤j≤p} be real Hilbert spaces, let f_i, h_i ∈ Γ(H_i') be such that h_i is ν_i^{-1}-strongly convex with ν_i ∈ R++, i = 1, ..., m, and let g_j ∈ Γ(G_j), j = 1, ..., p. Further, let K_i : H_i → H_i' and L_ji : H_i → G_j, i = 1, ..., m, j = 1, ..., p, be linear continuous operators. Consider the convex optimization problem

inf_{(x_1,...,x_m) ∈ H_1 × ... × H_m} { Σ_{i=1}^m (f_i □ h_i)(K_i x_i) + Σ_{j=1}^p g_j( Σ_{i=1}^m L_ji x_i ) }.   (3.3.1)

In what follows we show that, under an appropriate qualification condition, solving the convex optimization problem (3.3.1) can be reduced to solving a system of monotone inclusions of type (3.1.2). Let us define the proper, convex and lower semicontinuous function

f : H_1' × ... × H_m' → R̄, (y_1, ..., y_m) ↦ Σ_{i=1}^m (f_i □ h_i)(y_i),

and the linear continuous operator

K : H_1 × ... × H_m → H_1' × ... × H_m', (x_1, ..., x_m) ↦ (K_1 x_1, ..., K_m x_m),

having as adjoint

K* : H_1' × ... × H_m' → H_1 × ... × H_m, (y_1, ..., y_m) ↦ (K_1* y_1, ..., K_m* y_m).

Further, consider the linear continuous operators

L_j : H_1 × ... × H_m → G_j, (x_1, ..., x_m) ↦ Σ_{i=1}^m L_ji x_i, j = 1, ..., p,

having as adjoints

L_j* : G_j → H_1 × ... × H_m, y ↦ (L_j1* y, ..., L_jm* y), j = 1, ..., p,

respectively. We have

(x_1, ..., x_m) is an optimal solution to (3.3.1) ⟺ (0, ..., 0) ∈ ∂( f ∘ K + Σ_{j=1}^p g_j ∘ L_j )(x_1, ..., x_m).   (3.3.2)

In order to split the above subdifferential into a sum of subdifferentials, a so-called qualification condition must be fulfilled. In this context, we consider the following

interiority-type qualification conditions:

(QC_1)  there exists x_i' ∈ H_i such that K_i x_i' ∈ (dom f_i + dom h_i) and f_i □ h_i is continuous at K_i x_i', i = 1, ..., m, and Σ_{i=1}^m L_ji x_i' ∈ dom g_j and g_j is continuous at Σ_{i=1}^m L_ji x_i', j = 1, ..., p,

and

(QC_2)  (0, ..., 0) ∈ sqri( Π_{i=1}^m (dom f_i + dom h_i) × Π_{j=1}^p dom g_j − {(K_1 x_1, ..., K_m x_m, Σ_{i=1}^m L_1i x_i, ..., Σ_{i=1}^m L_pi x_i) : (x_1, ..., x_m) ∈ H_1 × ... × H_m} ).

We notice that (QC_1) ⇒ (QC_2), the implication being in general strict, and refer the reader to [4, 7, 8, 29, 58, 68, 69] and the references therein for other qualification conditions in convex optimization.

Remark. As already pointed out, for i = 1, ..., m, f_i □ h_i ∈ Γ(H_i'), hence it is continuous on int(dom f_i + dom h_i), provided this set is nonempty (see [29, 69]). For other results regarding the continuity of the infimal convolution of convex functions we invite the reader to consult [57].

Remark. In finite-dimensional spaces the qualification condition (QC_2) is equivalent to

(QC_2')  there exists x_i' ∈ H_i such that K_i x_i' ∈ ri dom f_i + ri dom h_i, i = 1, ..., m, and Σ_{i=1}^m L_ji x_i' ∈ ri dom g_j, j = 1, ..., p.

Assuming that one of the qualification conditions above is fulfilled, we have that

(x_1, ..., x_m) is an optimal solution to (3.3.1)
⟺ (0, ..., 0) ∈ K* ∂f( K(x_1, ..., x_m) ) + Σ_{j=1}^p L_j* ∂g_j( L_j(x_1, ..., x_m) )
⟺ (0, ..., 0) ∈ ( K_1* ∂(f_1 □ h_1)(K_1 x_1), ..., K_m* ∂(f_m □ h_m)(K_m x_m) ) + Σ_{j=1}^p L_j* ∂g_j( L_j(x_1, ..., x_m) ).   (3.3.3)

19 3.3 Applications to convex minimization problems 19 The strong convexity of the functions h i imply that dom h i = H i and so (f i h i ) = f i h i, i = 1,..., m. Thus, (3.3.3) is further equivalent to where ( ) (0,..., 0) K1( f 1 h 1 )(K 1 x 1 ),..., Km( f m h m )(K m x m ) + p L jv j, j=1 ( v j g j Lj (x 1,... x m ) ) ( m ) v j g j L ji x i i=1 m i=1 L ji x i g j (v j ), j = 1,..., p. Then (x 1,..., x m ) is an optimal solution to (3.3.1) if and only if (x 1,..., x m, v 1,..., v p ) is a solution to by taking 0 K 1( f 1 h 1 )(K 1 x 1 ) + p j=1 L j1v j. 0 K m( f m h m )(K m x m ) + p j=1 L jmv j 0 g 1(v 1 ) m i=1 L 1ix i. 0 g p(v p ) m i=1 L pix i. (3.3.4) One can see now that (3.3.4) is a system of coupled inclusions of type (3.1.2), A i = f i, C i = h i, L i = K i, i = 1,..., m, G A m+j = gj j, x = 0, C m+j (x) =,, otherwise L m+j = Id Gj, j = 1,..., p, and, for (x 1,..., x m, v 1,..., v p ) H 1... H m G 1... G p, as coupling operators B i (x 1,..., x m, v 1,..., v p ) = p L jiv j, i = 1,..., m, j=1

20 20 Chapter 3. The primal-dual splitting technique and B m+j (x 1,..., x m, v 1,..., v p ) = Define m L ji x i, j = 1,..., p. i=1 B(x 1,..., x m, v 1,..., v p ) = (B 1,..., B m+p )(x 1,..., x m, v 1,..., v p ) (3.3.5) ( p ) p m m = L j1v j,..., L jmv j, L 1i x i,..., L pi x i. j=1 j=1 i=1 i=1 (3.3.6) It follows that C 1 i = ( h i ) 1 = h i = { h i } is ν i -Lipschitz continuous for i = 1,..., m. On the other hand, C 1 m+j Lipschitz continuous. is the zero operator for j = 1,..., p, thus 0- Furthermore, the operators B i, i = 1,..., m + p are linear and Lipschitz continuous, having as Lipschitz constants p µ i = L ji 2, i = 1,..., m, and µ m+j = m L ji 2, j = 1,..., p, j=1 i=1 respectively. For every (x 1,..., x m, v 1,..., v p ), (y 1,..., y m, w 1,..., w p ) H 1... H m G 1... G p it holds m x i y i B i (x 1,..., x m, v 1,..., v p ) B i (y 1,..., y m, w 1,..., w p ) Hi + = i=1 p v j w j B m+j (x 1,..., x m, v 1,..., v p ) B m+j (y 1,..., y m, w 1,..., w p ) Gj j=1 m x i y i p p L jiv j L jiw j Hi p m m v j w j L ji x i L ji y i = 0, Gj i=1 j=1 j=1 j=1 i=1 i=1 thus (3.1.1) is fulfilled. This proves also that the linear continuous operator B is skew (i.e. B = B). Remark Due to the fact that the operator B is skew, it is not cocoercive, hence,

21 3.3 Applications to convex minimization problems 21 the approach presented in [1] cannot be applied in this context. On the other hand, in the light of the characterization given in (3.3.2), in order to determine an optimal solution of the optimization problem (3.3.1) (and an optimal solution of its Fenchel-type dual as well) one can use the primal-dual proximal splitting algorithms which have been recently introduced in [23, 61]. These approaches have the particularity to deal in an efficient way with sums of compositions of proper, convex and lower semicontinuous function with linear continuous operators, by evaluating separately each function via a backward step and each linear continuous operator (and its adjoint) via a forward step. However, the iterative scheme we propose in this section for solving (3.3.1) has the advantage of exploiting the separable structure of the problem. Let us also mention that the dual inclusion problem of (3.3.4) reads (see (3.1.3)) find w 1 H 1,..., w m H m, w m+1 G 1,..., w m+p G p such that x 1 H 1,..., x m H m, v 1 G 1,..., v p G p 0 = K 1w 1 + p j=1 L j1v j. 0 = K mw m + p j=1 L jmv j 0 = w m+1 m i=1 L 1ix i. 0 = w m+p m i=1 L pix i w 1 ( f 1 h 1 )(K 1 x 1 ). w m ( f m h m )(K m x m ) w m+1 g 1(v 1 ). w m+p g p(v p ). (3.3.7) Then (x 1,...x m, v 1,..., v p, w 1,..., w m, w m+1,..., w m+p ) is a primal-dual solution to (3.3.4) - (3.3.7), if w i ( f i h i )(K i x i ), w m+j gj (v j ), p m 0 = Ki w i + L jiv j and 0 = w m+j L ji x i, i = 1,..., m, j = 1,..., p. j=1 i=1

22 22 Chapter 3. The primal-dual splitting technique Provided that (x 1,...x m, v 1,..., v p, w 1,..., w m, w m+1,..., w m+p ) is a primal-dual solution to (3.3.4) - (3.3.7), it follows that (x 1,...x m ) is an optimal solution to (3.3.1) and (w 1,..., w m, v 1,..., v p ) is an optimal solution to its Fenchel-type dual problem sup (w 1,...,w m,w m+1,...,w m+p ) H 1... H m G 1... G p K i w i+ p j=1 L ji w m+j=0,i=1,...,m { m (fi (w i ) + h i (w i )) i=1 } p gj (w m+j ). j=1 (3.3.8) Algorithm gives rise to the following iterative scheme for solving (3.3.4) - (3.3.7). Algorithm For every i = 1,..., m and every j = 1,..., p let (a 1,i,n ) n 0, (b 1,i,n ) n 0, (c 1,i,n ) n 0, be absolutely summable sequences in H i, (a 2,i,n ) n 0, (b 2,i,n ) n 0, (c 2,i,n ) n 0 be absolutely summable sequences in H i and (a 1,m+j,n ) n 0, (a 2,m+j,n ) n 0, (b 1,m+j,n ) n 0, (b 2,m+j,n ) n 0, (c 1,m+j,n ) n 0 and (c 2,m+j,n ) n 0 be absolutely summable sequences in G j. Furthermore, set where β = max m+p µ 2 i, ν 1,..., ν m + max { K 1,..., K m, 1}, (3.3.9) i=1 p µ i = L ji 2, i = 1,..., m, and µ m+j = m L ji 2, j = 1,..., p, (3.3.10) j=1 let ε ]0, 1/(β + 1)[ and (γ n ) n 0 i=1 be a sequence in [ε, (1 ε)/β]. Let the initial points (x 1,1,0,..., x 1,m,0 ) H 1... H m, (x 2,1,0,..., x 2,m,0 ) H 1... H m and

23 3.3 Applications to convex minimization problems 23 (v 1,1,0,..., v 1,p,0 ), (v 2,1,0,..., v 2,p,0 ) G 1... G p be arbitrary chosen and set n 0 For i = 1,..., m ( y 1,i,n = x 1,i,n γ n Ki x 2,i,n + ) p j=1 L jiv 1,j,n + a 1,i,n y 2,i,n = x 2,i,n γ n ( h i x 2,i,n K i x 1,i,n + a 2,i,n ) p 1,i,n = y 1,i,n + b 1,i,n p 2,i,n = Prox γnf i y 2,i,n + b 2,i,n For j = 1,..., p w 1,j,n = v 1,j,n γ n (v 2,j,n m i=1 L jix 1,i,n + a 1,m+j,n ) w 2,j,n = v 2,j,n γ n ( v 1,j,n + a 2,m+j,n ) r 1,j,n = w 1,j,n + b 1,m+j,n r 2,j,n = Prox γngj w 2,j,n + b 2,m+j,n For i = 1,..., m ( q 1,i,n = p 1,i,n γ n Ki p 2,i,n + ) p j=1 L jir 1,j,n + c 1,i,n q 2,i,n = p 2,i,n γ n ( h i p 2,i,n K i p 1,i,n + c 2,i,n ) x 1,i,n+1 = x 1,i,n y 1,i,n + q 1,i,n x 2,i,n+1 = x 2,i,n y 2,i,n + q 2,i,n For j = 1,..., p s 1,j,n = r 1,j,n γ n (r 2,j,n m i=1 L jip 1,i,n + c 1,m+j,n ) s 2,j,n = r 2,j,n γ n ( r 1,j,n + c 2,m+j,n ) v 1,j,n+1 = v 1,j,n w 1,j,n + s 1,j,n v 2,j,n+1 = v 2,j,n w 2,j,n + s 2,j,n. The following convergence result for Algorithm is a consequence of Theorem Theorem [10] Suppose that the optimization problem (3.3.1) has an optimal solution and that one of the qualification conditions (QC i ), i = 1, 2, is fulfilled. For the sequences generated by Algorithm the following statements are true: (i) i {1,..., m} x 1,i,n p 1,i,n 2 H i < +, x 2,i,n p 2,i,n 2 H i < + and n 0 n 0 j {1,..., p} v 1,j,n r 1,j,n 2 G j < +, v 2,j,n r 2,j,n 2 G j < +. n 0 n 0 (ii) There exists an optimal solution (x 1,..., x m ) to (3.3.1) and an optimal solution

(w_1, ..., w_m, w_{m+1}, ..., w_{m+p}) to (3.3.8) such that, for every i ∈ {1, ..., m}, x_{1,i,n} ⇀ x_i, p_{1,i,n} ⇀ x_i, x_{2,i,n} ⇀ w_i and p_{2,i,n} ⇀ w_i, and, for every j ∈ {1, ..., p}, v_{1,j,n} ⇀ w_{m+j} and r_{1,j,n} ⇀ w_{m+j} as n → +∞.

Remark. Recently, in [22], another iterative scheme was proposed for solving systems of monotone inclusions which is also able to handle optimization problems of type (3.3.1) in the case when the functions g_j, j = 1, ..., p, are not necessarily differentiable. In contrast to our approach, which assumes that the variables are coupled by the single-valued operator B, in [22] the coupling is made by compositions of parallel sums of maximal monotone operators with linear continuous ones.

Chapter 4
Numerical experiments

The obtained theoretical results are highly applicable in various fields of mathematics. In this section we present four numerical experiments which emphasize the performance of the primal-dual algorithm and some of its variants for systems of coupled monotone inclusions, namely image processing, multifacility location problems, average consensus for colored networks and image classification via support-vector machines.

4.1 Image processing

The first numerical experiment solves a problem of image processing via the primal-dual splitting algorithm developed in this thesis. The aim is to estimate the unknown original image from the blurred and noisy image with the help of the algorithm.

4.1.1 Image deblurring

Let H and (H_i)_{1≤i≤m} be real Hilbert spaces, where m ≥ 1. For every i ∈ {1, ..., m}, let f_i ∈ Γ(H_i) and let K_i : H → H_i be continuous linear operators. Consider the initial convex optimization problem

inf_{x ∈ H} { Σ_{i=1}^m f_i(K_i x) }.   (4.1.1)

In the case of multiple variables, (4.1.1) can be reformulated as

inf_{x_1,...,x_{m+1}} { Σ_{i=1}^m f_i(K_i x_i) + Σ_{i=1}^m δ_{0}(x_i − x_{m+1}) }.   (4.1.2)

This is a special instance of the convex optimization problem (3.3.1) with (m, p) := (m + 1, m), f_{m+1} = 0, K_{m+1} = 0, h_i = δ_{0} for i ∈ {1, ..., m + 1}, G_j = H and g_j = δ_{0} for j ∈ {1, ..., m}, and

L_ji : H_i → G_j, L_ji = Id, if i = j; −Id, if i = m + 1; 0, otherwise.   (4.1.3)

Let (x_{1,1,0}, ..., x_{1,m+1,0}), (x_{2,1,0}, ..., x_{2,m+1,0}) ∈ H_1 × ... × H_{m+1} and (v_{1,1,0}, ..., v_{1,m,0}), (v_{2,1,0}, ..., v_{2,m,0}) ∈ H_1 × ... × H_m. In this case (3.3.9) yields

β = 2√m + max{ ‖K_1‖, ..., ‖K_{m+1}‖, 1 }.   (4.1.4)

Algorithm. Hence, the iterative scheme of the general primal-dual algorithm from Section 3.3 reads, for n ≥ 0:

for i = 1, ..., m:
  y_{1,i,n} = x_{1,i,n} − γ_n (K_i* x_{2,i,n} + v_{1,i,n})
  w_{1,i,n} = v_{1,i,n} − γ_n (v_{2,i,n} − x_{1,i,n} + x_{1,m+1,n})
  y_{2,i,n} = x_{2,i,n} + γ_n K_i x_{1,i,n}
  w_{2,i,n} = v_{2,i,n} + γ_n v_{1,i,n}
y_{1,m+1,n} = x_{1,m+1,n} + γ_n Σ_{i=1}^m v_{1,i,n}
for i = 1, ..., m:
  x_{1,i,n+1} = x_{1,i,n} − γ_n (K_i* prox_{γ_n f_i*}(y_{2,i,n}) + w_{1,i,n})
  v_{1,i,n+1} = v_{1,i,n} + γ_n (y_{1,i,n} − y_{1,m+1,n})
  x_{2,i,n+1} = x_{2,i,n} − y_{2,i,n} + prox_{γ_n f_i*}(y_{2,i,n}) + γ_n K_i y_{1,i,n}
  v_{2,i,n+1} = v_{2,i,n} − w_{2,i,n} + γ_n w_{1,i,n}
x_{1,m+1,n+1} = x_{1,m+1,n} + γ_n Σ_{i=1}^m w_{1,i,n}

Then (x_{1,1,n}, ..., x_{1,m+1,n}) converges to (x, ..., x).

Consider a matrix A ∈ R^{n×n} representing the blur operator and a given vector b ∈ R^n describing the blurred and noisy image. The goal is to estimate x ∈ R^n, the unknown original image, which fulfills Ax = b. The problem to be solved can be equivalently written as

inf_{x ∈ S} { f_1(x) + f_2(Ax) },   (4.1.5)

for f_1 : R^n → R̄, f_1(x) = λ‖x‖_1 + δ_S(x), and f_2 : R^n → R, f_2(y) = ‖y − b‖^2. Thus f_1 is proper, convex and lower semicontinuous with bounded domain, and f_2 is a 2-strongly convex function with full domain, differentiable everywhere and with Lipschitz continuous gradient having Lipschitz constant 2. Here λ > 0 represents the regularization parameter and S ⊆ R^n is an n-dimensional cube representing the range of the pixels. Since each pixel furnishes a greyscale value between 0 and 255, a natural choice for the convex set S would be the n-dimensional cube [0, 255]^n ⊆ R^n. In order to reduce the Lipschitz constant which appears in the developed approach, we scale the pictures that we use to exemplify the application, such that each of their pixels ranges in the interval [0, 1/10].
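Both functions admit closed-form proximal mappings, which is what makes the scheme above fully explicit. The following MATLAB sketch records one possible reading of these mappings, derived from the definitions of f_1 and f_2 given above; the variable names and the derivation are ours and are not taken from the thesis.

    % prox of gamma*f1, f1(x) = lambda*||x||_1 + delta_S(x), S = [0, smax]^n:
    % since x >= 0 on S, this is componentwise a shift by gamma*lambda
    % followed by projection onto [0, smax].
    smax    = 1/10;                                   % pixel range after scaling
    prox_f1 = @(z, gamma, lambda) min(max(z - gamma*lambda, 0), smax);

    % prox of gamma*f2, f2(y) = ||y - b||^2: the minimizer of
    % ||y - b||^2 + (1/(2*gamma))*||y - z||^2 is (z + 2*gamma*b)/(1 + 2*gamma).
    prox_f2 = @(z, gamma, b) (z + 2*gamma*b) ./ (1 + 2*gamma);

The proximal mappings of the conjugates f_1* and f_2*, which the iterative scheme actually calls, follow from these via the Moreau decomposition z = prox_{γf}(z) + γ prox_{γ^{-1}f*}(z/γ).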

4.1.2 Experimental setting

We concretely look at the Cameraman, Lena and Barbara test images. The Cameraman test image is part of the Image Processing Toolbox in Matlab and its vectorized and scaled image dimension equals n = 256 · 256 = 65536. By making use of the Matlab functions imfilter and fspecial, this image is blurred as follows:

1  H = fspecial('gaussian', 9, 4);           % Gaussian blur of size 9 times 9
2                                            % and standard deviation 4
3  B = imfilter(X, H, 'conv', 'symmetric');  % B = observed blurred image
4                                            % X = original image

In row 1 the function fspecial returns a rotationally symmetric Gaussian lowpass filter of size 9 × 9 with standard deviation 4. The entries of H are nonnegative and their sum adds up to 1. In row 3 the function imfilter convolves the filter H with the image X ∈ R^{256×256} and outputs the blurred image B ∈ R^{256×256}. The boundary option symmetric corresponds to reflexive boundary conditions. For more interesting applications concerning Matlab test images deblurred with different techniques, we refer the reader to [11-15, 19].

Thanks to the rotationally symmetric filter H, the linear operator A ∈ R^{n×n} given by the Matlab function imfilter is symmetric, too. After adding zero-mean white Gaussian noise with standard deviation 10^{-4}, we obtain the blurred and noisy image b ∈ R^n. In this particular case m := 2, K_1 = Id, K_2 = A, and (4.1.4) yields

β = 2√2 + max{ ‖A‖, 1 }.   (4.1.6)

By making use of the real spectral decomposition of A, one obtains ‖A‖ = 1. Hence β = 2√2 + 1.
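The operator norm can also be checked numerically. The short MATLAB sketch below estimates ‖A‖ by the power method, applying A only through imfilter with the filter H defined above; it is our own sanity check and not part of the thesis.

    % Estimate ||A|| by the power method, applying A through imfilter.
    % A is symmetric here, so the largest singular value coincides with
    % the spectral radius; the estimate should be close to 1.
    v = randn(256, 256);
    for it = 1:50
        v = imfilter(v, H, 'conv', 'symmetric');
        v = v / norm(v(:));
    end
    Av    = imfilter(v, H, 'conv', 'symmetric');
    normA = norm(Av(:))          % expected to be approximately 1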

29 4.1 Image processing 29 Algorithm Algorithm can be written as n 0 ( ) y 1,1,n = x 1,1,n γ n x2,1,n + v 1,1,n ( ) w 1,1,n = v 1,1,n γ n v2,1,n x 1,1,n + x 1,3,n y 2,1,n = x 2,1,n + γ n x 1,1,n w 2,1,n = v 2,1,n + γ n v 1,1,n ( ) y 1,2,n = x 1,2,n γ n A x 2,2,n + v 1,2,n ( ) w 1,2,n = v 1,2,n γ n v2,2,n x 1,2,n + x 1,3,n y 2,2,n = x 2,2,n + γ n Ax 1,2,n w 2,2,n = v 2,2,n + γ n v 1,2,n y 1,3,n = x 1,3,n + γ n (v 1,1,n + v 1,2,n ) ( x 1,1,n+1 = x 1,1,n γ n proxγnf y ) 1 2,1,n + w 1,1,n ( ) v 1,1,n+1 = v 1,1,n + γ n y1,1,n y 1,3,n x 2,1,n+1 = x 2,1,n y 2,1,n + prox γnf y 1 2,1,n + γ n y 1,1,n v 2,1,n+1 = v 2,1,n w 2,1,n + γ n w 1,1,n ( x 1,2,n+1 = x 1,2,n γ n A prox γnf y ) 2 2,2,n + w 1,2,n ( ) v 1,2,n+1 = v 1,2,n + γ n y1,2,n y 1,3,n x 2,2,n+1 = x 2,2,n y 2,2,n + prox γnf y 2 2,2,n + γ n Ay 1,2,n v 2,2,n+1 = v 2,2,n w 2,2,n + γ n w 1,2,n x 1,3,n+1 = x 1,3,n + γ n (w 1,1,n + w 1,2,n ). Then (x 1,1,n, x 1,2,n, x 1,3,n ) (x, x, x). Figure 4.1 shows the reconstructed images with 500 and 1000 iterations. Another test image which is very popular and is often used in image processing is the Lena test image. The picture undergoes the same blur as described in the previous case. Figure 4.2 shows the reconstructed images with 500 and 1500 iterations. A third example in image processing is the Barbara test image. We ran our algorithm on this test image too in order to demonstrate the applicability of the method on different images. Figure 4.3 shows the reconstructed images in the case

of 400 and 800 iterations.

Figure 4.1: (a) Blurred and noisy image; (b) 500 iterations; (c) 1000 iterations; (d) Original image. Figure (a) is the image obtained after multiplying with the blur operator and adding white Gaussian noise, (b) shows the averaged sequence generated by the algorithm after 500 iterations, (c) is the reconstructed image after 1000 iterations, and (d) shows the clean Cameraman test image.

A valuable tool for measuring the quality of these images is the so-called improvement in signal-to-noise ratio (ISNR), which is defined as

ISNR(k) = 10 log_10( ‖x − b‖^2 / ‖x − x_k‖^2 ),

where x, b and x_k denote the original, observed and estimated image at iteration k, respectively. Figure 4.4 shows the evolution of the ISNR values for the three images presented.

Figure 4.2: (a) Blurred and noisy image; (b) 500 iterations; (c) 1500 iterations; (d) Original image. Figure (a) is the image obtained after multiplying with the blur operator and adding white Gaussian noise, (b) shows the averaged sequence generated by the algorithm after 500 iterations, (c) is the reconstructed image after 1500 iterations, and (d) shows the clean Lena test image.
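Returning to the ISNR defined above, in MATLAB it is a one-liner; the sketch below is our own illustration, with hypothetical variable names, and assumes the images have been vectorized.

    % Improvement in signal-to-noise ratio at iteration k:
    % x - original image, b - blurred and noisy observation, xk - current estimate.
    isnr = @(x, b, xk) 10 * log10( norm(x(:) - b(:))^2 / norm(x(:) - xk(:))^2 );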

Figure 4.3: (a) Blurred and noisy image; (b) 400 iterations; (c) 800 iterations; (d) Original image. Figure (a) is the image obtained after multiplying with the blur operator and adding white Gaussian noise, (b) shows the averaged sequence generated by the algorithm after 400 iterations, (c) is the reconstructed image after 800 iterations, and (d) shows the clean Barbara test image.

Figure 4.4: Evolution of the improvement in signal-to-noise ratio (ISNR) for the Cameraman, Lena and Barbara test images.

4.2 Multifacility location problems

The second numerical experiment solves a multifacility location problem. It was as early as the 17th century that mathematicians, notably Fermat, were concerned with what are known as single facility location problems. However, it was not until the 20th century that normative approaches to solving symbolic models of these and related problems were addressed in the literature. Each of these solution techniques

concerned themselves with determining the location of a new facility, or new facilities, with respect to the location of existing facilities so as to minimize a cost function based on a weighted interfacility distance measure [17, 18, 49, 51, 52, 64-66]. The Euclidean distance problem for the case of single new facilities was addressed by Weiszfeld [63], Miehle [45], Kuhn and Kuenne [42], and Cooper [24-26], to name a few. However, it was not until the work of Kuhn [41] that the problem was considered completely solved. A computational procedure for minimizing the Euclidean multifacility problem was presented by Vergin and Rogers [60] in 1967; however, their techniques sometimes give suboptimal solutions. Two years later, Love [44] gave a scheme for solving this problem which makes use of convex programming and penalty function techniques. One advantage of this approach is that it considers the existence of various types of spatial constraints. In 1973, Eyster, White and Wierwille [31] presented the hyperboloid approximation procedure (HAP) for both rectilinear and Euclidean distance measures, which extended the technique employed in solving the single facility problem to the multifacility case [18]. If one studies a list of references of the work done in the past decade involving facility location problems, it becomes readily apparent that there exists a strong

interdisciplinary interest in this area within the fields of applied mathematics, operations research, civil engineering, architecture, management science, systems engineering, logistics, economics, regional science, transportation systems, urban design and industrial engineering, among others. As a result, the term facility has taken on a very broad connotation in order to suit applications in each of these areas. For example, a facility can be a hospital, an ambulance, an airport, a police cruiser, a manned space vehicle, a machine tool, a school, a student to be bused to a school in an urban environment, a remote computer terminal, a home appliance, a planned community, a warehouse, an office building, a pump in a pipeline, a sewage treatment plant, and so on [31, 33, 37]. Francis and Goldstein [34] provide a fairly recent bibliography of the facility location literature. One of the most complete classifications of these problems is provided in a book by Francis and White [35].

4.2.1 Problem manipulation

We proceed with the presentation of an efficient solution procedure for solving a multifacility location problem, which can be formulated mathematically as follows:

inf_{x_1, x_2} { Σ_{i=1}^k λ_i ‖x_1 − c_i‖ + Σ_{i=1}^k γ_i ‖x_2 − c_i‖ + α ‖x_1 − x_2‖ },   (4.2.1)

where we use the following notations:

x_j = vector location of a new facility j, 1 ≤ j ≤ 2;
c_i = vector location of an existing facility i, 1 ≤ i ≤ k;
λ_i = nonnegative weight between the new facility x_1 and an existing facility i, 1 ≤ i ≤ k;
γ_i = nonnegative weight between the new facility x_2 and an existing facility i, 1 ≤ i ≤ k;
α = nonnegative weight between the two new facilities x_1 and x_2.
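Each term in (4.2.1) is a nonnegative multiple of a Euclidean norm, possibly centered at a data point c, and such terms have a well-known closed-form proximal mapping (block soft thresholding). The following MATLAB sketch records it; the function name and arguments are our own, introduced only for illustration.

    % prox of g(x) = w*||x - c||_2 with parameter gamma > 0 and weight w >= 0:
    % the residual z - c is shrunk toward zero by gamma*w.
    function p = prox_weighted_norm(z, gamma, w, c)
        r  = z - c;
        nr = norm(r);
        if nr <= gamma*w
            p = c;                          % residual small enough: shrink to c
        else
            p = c + (1 - gamma*w/nr) * r;
        end
    end

If a scheme requires the proximal mapping of the conjugate of such a term instead, the Moreau decomposition z = prox_{γg}(z) + γ prox_{γ^{-1}g*}(z/γ) converts one into the other.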

35 4.2 Multifacility location problems 35 The discussed multifacility location problem is in fact a special instance of Problem with (m, p) := (2, 2k + 1), f i = 0, h i = δ {0}, K i = 0, where i = 1, 2, and λ j x c j 2, if j = 1,..., k; g j (x) = γ j x c j 2, if j = k + 1,..., 2k; α x, if j = 2k + 1, L j1 = Id, if j = 1,..., k; 0, if j = k + 1,..., 2k; Id, if j = 2k + 1, and L j2 = 0, if j = 1,..., k; Id, if j = k + 1,..., 2k; Id, if j = 2k + 1. Algorithm Hence, the iterative scheme in Algorithm reads For i = 1, 2 y ( 1,i,n = x 1,i,n γ n x2,i,n + 2k+1 ) j=1 L jiv 1,j,n y 2,i,n = x 2,i,n + γ n x 1,i,n For j = 1,... 2k + 1 ( ) w 1,j,n = v 1,j,n γ n v2,j,n L j1 x 1,1,n L j2 x 1,2,n w 2,j,n = v 2,j,n + γ n v 1,j,n n 0 r 2,j,n = prox γngj w 2,j,n For i = 1, 2 x ( 1,i,n+1 = x 1,i,n γ 2k+1 ) n j=1 L jiw 1,j,n x 2,i,n+1 = x 2,i,n y 2,i,n + γ n y 1,i,n For j = 1,..., 2k + 1 v ( ) 1,j,n+1 = v 1,i,n γ n r2,j,n L j1 y 1,1,n L j2 y 1,2,n ( ) v 2,j,n+1 = γ n w1,j,n v 1,j,n + r2,j,n. The algorithm implemented in the MatLab programming language works with the parameter β = 2 k We choose ε to be as small as possible from the interval ]0, 1/(β + 1)[ and the value of the parameter γ equals (1 ε)/β, for an optimal

solution.

4.2.2 Computational experience

As an illustration of Algorithm 4.2.1, consider a location problem involving two new facilities and five existing facilities. This example is also used for illustrating the use of the hyperboloid approximation procedure presented in [31]. Since distances between facilities are Euclidean, the facilities might be machines connected by straight-line conveyors, military bases connected by air travel, or industrial plants and warehouses where straight-line travel can be reasonably approximated. For the example, assume the existing facilities are located at the coordinate points P_1 = (0, 0), P_2 = (2, 4), P_3 = (6, 2), P_4 = (6, 10) and P_5 = (8, 8). Also, let (λ_1, ..., λ_5) = (4, 2, 3, 0, 0), (γ_1, ..., γ_5) = (0, 2, 1, 3, 2) and α = 2. With starting locations x_1^0 = (0, 0) and x_2^0 = (0, 0), the result of the algorithm is illustrated in Figure 4.5. Most current techniques used for solving location problems, including Algorithm 4.2.1, define the stopping criterion based on successive changes in the value of the objective function, which is more efficient than one based on successive changes in the locations of the new facilities (convergent solution). The projection technique used in the proposed method takes full advantage of the structure of the location problem and can easily be extended to accommodate problems involving other item movements as well.

37 4.2 Multifacility location problems 37 (a) Starting location (b) 2 iterations (c) 5 iterations (d) 10 iterations (e) 25 iterations (f) 50 iterations Figure 4.5: A multifacility location problem involving two new facilities and five existing facilities. The existing facilities are located at the coordinate points P 1 = (0, 0), P 2 = (2, 4), P 3 = (6, 2), P 4 = (6, 10) and P 5 = (8, 8). The non negative weight α between the two new facilities equals 2 and the weights between a new facility and an existing facility equals λ i = (4, 2, 3, 0, 0) and γ i = (0, 2, 1, 3, 2), i = 1,..., 5.

4.3 Average consensus for colored networks

The third numerical experiment that we consider concerns the problem of average consensus on colored networks. Given a network where each node possesses a measurement in the form of a real number, the average consensus problem consists in calculating the average of these measurements in a recursive and distributed way, allowing the nodes to communicate information only along the available edges in the network.

4.3.1 Proposed algorithm

Consider a connected network G = (V, E), where V represents the set of nodes and E represents the set of edges. Each edge is uniquely represented as a pair of nodes (i, j), where i < j. The nodes i and j can exchange their values if they can communicate directly, in other words, if (i, j) ∈ E. The set of neighbors of node k is denoted by N_k and its degree by D_k = |N_k|. We assume that each node is assigned a number, also called its color, and that no neighboring nodes have the same color. Let C denote the number of colors the network is colored with and C_i the set of the nodes that have the color i, i = 1, ..., C. Without affecting the generality of the problem we also assume that the first |C_1| nodes are in the set C_1, the next |C_2| nodes are in the set C_2, etc. Furthermore, we assume that a node coloring scheme is available. For more details concerning the mathematical modeling of the average consensus problem on colored networks we refer the reader to [46, 47].

Let P and E denote the number of nodes and edges in the network, respectively; hence Σ_{i=1}^C |C_i| = P. Denoting by θ_k the measurement assigned to node k, k = 1, ..., P, the problem we want to solve is

min_{x ∈ R} { Σ_{k=1}^P (1/2)(x − θ_k)^2 }.   (4.3.1)

The unique optimal solution to problem (4.3.1) is θ̄ = (1/P) Σ_{k=1}^P θ_k, namely the average of the measurements over the whole set of nodes in the network. The goal is to make this value available in each node in a distributed and recursive way. To this end,

we replicate copies of x throughout the entire network; more precisely, for k = 1, ..., P, node k will hold the k-th copy, denoted by x_k, which will be updated iteratively during the algorithm. At the end we have to guarantee that all the copies are equal, and we express this constraint by requiring that x_i = x_j for each (i, j) ∈ E. This gives rise to the following optimization problem:

min_{x = (x_1,...,x_P) ∈ R^P, x_i = x_j for all (i,j) ∈ E} { Σ_{k=1}^P (1/2)(x_k − θ_k)^2 }.   (4.3.2)

Let A ∈ R^{P×E} be the node-arc incidence matrix of the network, which is the matrix having each column associated to an edge in the following manner: the column associated to the edge (i, j) ∈ E has 1 at the i-th entry and −1 at the j-th entry, the remaining entries being equal to zero. Consequently, the constraints in (4.3.2) can be written with the help of the transpose of the node-arc incidence matrix as A^T x = 0. Taking into consideration the ordering of the nodes and the coloring scheme, we can write A^T x = A_1^T x^1 + ... + A_C^T x^C, where x^i ∈ R^{|C_i|}, i = 1, ..., C, collects the copies of the nodes in C_i, i.e., x = (x_1, ..., x_{|C_1|}, ..., x_{P−|C_C|+1}, ..., x_P), the first |C_1| components forming x^1 and the last |C_C| components forming x^C. Hence, the optimization problem (4.3.2) becomes

min_{x = (x^1,...,x^C), A_1^T x^1 + ... + A_C^T x^C = 0} { Σ_{i=1}^C f_i(x^i) },   (4.3.3)

where, for i = 1, ..., C, the function f_i : R^{|C_i|} → R is defined as f_i(x^i) = Σ_{l ∈ C_i} (1/2)(x_l − θ_l)^2. One can easily observe that problem (4.3.3) is a particular instance of the optimization problem (3.3.1) when taking m = C, p = 1, h_i = δ_{0}, K_i = Id, L_1i = A_i^T ∈ R^{E×|C_i|}, i = 1, ..., C, and g_1 = δ_{0}.

Algorithm. Using that ∇h_i* = 0, i = 1, ..., C, and Prox_{γ g_1}(x) = 0 for all γ > 0 and x ∈ R^E, the iterative scheme of the general algorithm from Section 3.3 reads, after some algebraic manipulations, in the

40 40 Chapter 4. Numerical experiments error-free case: n 0 For i = 1,..., C ( ) y 1,i,n = x 1,i,n γ n x2,i,n + A i v 1,1,n y 2,i,n = x 2,i,n + γ n x 1,i,n p 2,i,n = Prox γnf i y 2,i,n ( w 1,1,n = v 1,1,n γ n v2,1,n C ) i=1 AT i x 1,i,n For i = 1,..., C ( ) q 1,i,n = y 1,i,n γ n p2,i,n + A i w 1,1,n q 2,i,n = p 2,i,n + γ n y 1,i,n x 1,i,n+1 = x 1,i,n y 1,i,n + q 1,i,n x 2,i,n+1 = x 2,i,n y 2,i,n + q 2,i,n v 1,1,n+1 = v 1,1,n + γ n C i=1 AT i y 1,i,n v 2,1,n+1 = γ 2 n( C i=1 AT i x 1,i,n v 2,1,n ). Let us notice that for i = 1,..., C and γ > 0 it holds Prox γf i (x i ) = (1+γ) 1 (x i γθ i ), where θ i is the vector in R C i whose components are θ l with l C i Performance evaluation In order to compare the performances of our method with other existing algorithms in literature, we used the networks generated in [47] with the number of nodes ranging between 10 and The measurement θ k associated to each node was generated randomly and independently from a normal distribution with mean 10 and standard deviation 100. We worked with networks with 10, 50, 100, 200, 500, 700 and 1000 nodes and measured the performance of our algorithm from the point of view of the number of communication steps, which actually coincides with the number of iterations. As stopping criterion we considered x n 1 P θ 10 4, P θ where 1 P denotes the vector in R P having all entries equal to 1.
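The node-arc incidence matrix A used throughout this subsection is straightforward to assemble from an edge list. The following MATLAB sketch is our own illustration (the function and variable names are hypothetical); it follows the convention stated above, with +1 at the i-th entry and -1 at the j-th entry of the column associated with the edge (i, j), i < j.

    % Build the node-arc incidence matrix A of a network with P nodes from an
    % E-by-2 edge list 'edges', each row being an edge (i, j) with i < j.
    function A = incidence_matrix(P, edges)
        E = size(edges, 1);
        A = sparse(P, E);
        for e = 1:E
            A(edges(e, 1), e) =  1;
            A(edges(e, 2), e) = -1;
        end
    end

The consensus constraint x_i = x_j for every edge then reads A^T x = 0, which is exactly the coupling used in (4.3.2) and (4.3.3).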

Figure 4.6: Watts-Strogatz network, (n, p) = (2, 0.8). The chart shows the communication steps needed by the four algorithms for a Watts-Strogatz network with different numbers of nodes. ALG stands for the primal-dual algorithm proposed in this work.

Figure 4.7: Geometric network, d = 0.2. The chart shows the communication steps needed by the four algorithms for a geometric network with different numbers of nodes. ALG stands for the primal-dual algorithm proposed in the thesis.

Figure 4.8: Comparison of the four algorithms over six networks with 10 nodes. Here, ALG stands for the proposed primal-dual algorithm.

We present in Figure 4.6 and Figure 4.7 the communication steps needed when dealing with the Watts-Strogatz network with parameters (2, 0.8) and with the geometric network with distance parameter 0.2. The Watts-Strogatz network is created from a lattice where every node is connected to 2 nodes, and then the links are rewired with a probability of 0.8, while the geometric network works with nodes in a [0, 1]^2 square and connects the nodes whose Euclidean distance is less than the given parameter 0.2. As shown in Figure 4.6 and in Figure 4.7, our algorithm performed comparably to D-ADMM, presented in [47], but it performed better than the algorithms presented in [48] and [70].

In order to observe the behavior of our algorithm on different networks, we tested it on 6 models, shown in Table 4.1, with different numbers of nodes. The models used and the role of their parameters are explained in Table 4.2. Observing the needed communication steps, we can conclude that our algorithm is communication-efficient and it performs better than or similarly to the algorithms in [47], [48] and [70] (as exemplified in Figure 4.8).
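As an illustration, the following MATLAB sketch generates a geometric random network exactly as just described; the code is our own, and the function name and the edge-list output format (one row (i, j) with i < j per edge) are assumptions introduced for this sketch.

    % Geometric random network: drop P points uniformly in the unit square
    % and connect the nodes whose Euclidean distance is less than d.
    function edges = geometric_network(P, d)
        pts   = rand(P, 2);
        edges = zeros(0, 2);
        for i = 1:P-1
            for j = i+1:P
                if norm(pts(i, :) - pts(j, :)) < d
                    edges(end+1, :) = [i j];   %#ok<AGROW> acceptable for a small sketch
                end
            end
        end
    end

Such an edge list can be fed directly to an incidence-matrix construction like the one sketched in the previous subsection.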

Number  Model            Parameters
1       Erdős-Rényi
2       Watts-Strogatz   (2, 0.8)
3       Watts-Strogatz   (4, 0.6)
4       Barabasi-Albert
5       Geometric
6       Lattice

Table 4.1: Network parameters.

Erdős-Rényi [30] (parameter p): every pair of nodes {i, j} is connected or not with probability p.
Watts-Strogatz [62] (parameters (n, p)): first, it creates a lattice where every node is connected to n nodes; then, it rewires every link with probability p. If link {i, j} is to be rewired, it removes the link and connects node i or node j (chosen with equal probability) to another node in the network, chosen uniformly.
Barabasi-Albert [3]: it starts with one node. At each step, one node is added to the network by connecting it to two existing nodes; the probability of connecting it to node k is proportional to D_k.
Geometric [50] (parameter d): it drops P points, corresponding to the nodes of the network, randomly in a [0, 1]^2 square; then, it connects the nodes whose Euclidean distance is less than d.
Lattice: creates a lattice of dimensions m × n; m and n are chosen to make the lattice as square as possible.

Table 4.2: Network models.

4.4 Support-vector machines classification

The fourth numerical experiment we present in this section addresses the problem of classifying images via support vector machines. Given a set of training data a_i ∈ R^n, i = 1, ..., k, belonging to one of two given classes, denoted by -1 and +1, the aim is to construct from it a decision function, given in the form of a separating hyperplane, which should assign every new data point to one of the two classes with a low misclassification rate.

4.4.1 The soft-margin hyperplane

We construct the matrix A ∈ R^{k×n} such that each row corresponds to a data point a_i, i = 1, ..., k, and a vector d ∈ R^k such that, for i = 1, ..., k, its i-th entry is equal to −1 if a_i belongs to the class −1 and is equal to +1 otherwise. Consider the case where the training data cannot be separated without error. In this case one may want to separate the training set with a minimal number of errors. In order to cover the situation when the separation cannot be done exactly, we consider nonnegative slack variables ξ_i ≥ 0, i = 1, ..., k; thus the goal will be to find (s, r, ξ) ∈ R^n × R × R^k_+ as an optimal solution of the following optimization problem (also called the soft-margin support vector machines problem):

min_{(s,r,ξ) ∈ R^n × R × R^k_+, D(As + 1_k r) + ξ ≥ 1_k} { ‖s‖^2 + C‖ξ‖^2 },   (4.4.1)

where 1_k denotes the vector in R^k having all entries equal to 1, the inequality z ≥ 1_k for z ∈ R^k means z_i ≥ 1, i = 1, ..., k, D = Diag(d) is the diagonal matrix having the vector d as main diagonal and C is a trade-off parameter. Each new data point a ∈ R^n will be assigned to one of the two classes by means of the resulting decision function z(a) = s^T a + r, namely, a will be assigned to the class −1 if z(a) < 0, and to the class +1 otherwise. For more theoretical insights into support vector machines we refer the reader to [28, 32].

The soft-margin support vector machines problem (4.4.1) can be written as a special instance of the optimization problem (3.3.1) by taking m = 3, p = 1, f_1(·) = ‖·‖^2, f_2 = 0, f_3(·) = C‖·‖^2 + δ_{R^k_+}(·), h_i = δ_{0}, K_i = Id, i = 1, 2, 3, g_1 = δ_{{z ∈ R^k : z ≥ 1_k}}, L_11 = DA, L_12 = D1_k and L_13 = Id.
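Once the hyperplane parameters have been computed, applying the decision function is immediate. The MATLAB sketch below is our own illustration of the classification rule stated above; s and r denote the computed hyperplane parameters, and the test-set name in the commented example is hypothetical.

    % Decision rule of the soft-margin hyperplane: assign a to class -1 if
    % z(a) = s'*a + r < 0 and to class +1 otherwise.
    classify = @(a, s, r) 2 * ((s(:)' * a(:) + r) >= 0) - 1;

    % Example usage on a test matrix Atest with one data point per row:
    % labels = arrayfun(@(i) classify(Atest(i, :)', s, r), (1:size(Atest, 1))');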

45 4.4 Support-vector machines classification 45 Algorithm Thus, Algorithm gives rise in the error-free case to the following iterative scheme: n 0 For i = 1, 2, 3 ( ) y 1,i,n = x 1,i,n γ n x2,i,n + L T 1iv 1,1,n y 2,i,n = x 2,i,n + γ n x 1,i,n p 2,i,n = Prox γnf y i 2,i,n ( w 1,1,n = v 1,1,n γ n v2,1,n 3 i=1 L ) 1ix 1,i,n w 2,1,n = v 2,1,n + γ n v 1,1,n r 2,1,n = Prox γng1 w 2,1,n For i = 1, 2, 3 ( ) q 1,i,n = y 1,i,n γ n p2,i,n + L T 1iw 1,1,n q 2,i,n = p 2,i,n + γ n y 1,i,n x 1,i,n+1 = x 1,i,n y 1,i,n + q 1,i,n x 2,i,n+1 = x 2,i,n y 2,i,n + q 2,i,n s 1,1,n = w 1,1,n γ n (r 2,1,n 3 i=1 L 1ip 1,i,n ) s 2,1,n = r 2,1,n + γ n w 1,1,n v 1,1,n+1 = v 1,1,n w 1,1,n + s 1,1,n v 2,1,n+1 = v 2,1,n w 2,1,n + s 2,1,n We would also like to notice that for the proximal points needed in the algorithm one has for γ > 0 and (s, r, ξ, z) R n R R k R k the following exact formulae: Prox γf 1 (s) = (2 + γ) 1 2s, Prox γf 2 (r) = 0, Prox γf 3 (ξ) = ξ γp R k + ( (2C + γ) 1 ξ ) and Prox γg1 (z) = P {x R k :x 1 k }(z) Experiments with digit recognition We made use of a data set of training images and 2041 test images of size from the website The problem

consisted in determining a decision function based on a pool of handwritten digits showing either the number two or the number nine, labeled by −1 and +1, respectively (see Figure 4.9). We evaluated the quality of the decision function on a test data set by computing the percentage of misclassified images. Notice that we use only half of the available images from the training data set, in order to reduce the computational effort.

Figure 4.9: A sample of images belonging to the classes −1 and +1, respectively: (a) a sample of data for the number 2; (b) a sample of data for the number 9.

With respect to the considered data set, we denote by D = {(X_i, Y_i), i = 1, ..., 6000} ⊆ R^784 × {−1, +1} the set of available training data, consisting of 3000 images in the class −1 and 3000 images in the class +1. A sample from each class of images is shown in Figure 4.9. The images have been vectorized and normalized by dividing each of them by the quantity (Σ_{i=1}^{6000} ‖X_i‖^2)^{1/2}.

Table 4.3: Misclassification rate in percentage for different numbers of iterations, for both the training data and the test data (columns: number of iterations, training error, test error).

We stopped the primal-dual algorithm after different numbers of iterations and evaluated the performances of the resulting decision functions. In Table 4.3 we present the misclassification rate in percentage for the training and for the test data (the error for the training data is less than the one for the test data) and observe that the quality of the classification increases with the number of iterations. However, even for a low number of iterations the misclassification rate outperforms the ones reported in the

Let us also mention that the numerical results are given for the case C = 1. We also tested other choices of C; however, we did not observe a significant impact on the results.
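As an illustration of the preprocessing and evaluation steps described above, the following NumPy sketch normalizes the vectorized images by the quantity $\big(\sum_i \|X_i\|^2\big)^{1/2}$ and computes the misclassification rate of a decision function. The array names and the way the data are passed in are assumptions made for the example.

```python
import numpy as np

def normalize_images(X):
    """Scale every vectorized image by (sum_i ||X_i||^2)^{1/2}, as in the experiment above.

    X is assumed to be an array of shape (N, 784), one vectorized image per row.
    """
    scale = np.sqrt(np.sum(np.linalg.norm(X, axis=1) ** 2))
    return X / scale

def misclassification_rate(X, labels, s, r):
    """Percentage of images assigned to the wrong class by the decision function z(a) = s^T a + r."""
    z = X @ s + r
    predicted = np.where(z < 0, -1, 1)
    return 100.0 * np.mean(predicted != labels)
```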

Chapter 5

Interdisciplinary application of our algorithm

During my studies, I collaborated with a team of researchers in biology studying the biodiversity of picoalgae in Romanian salt lakes. We investigated the phytoplankton communities in several hypersaline lakes of the Transylvanian basin with the latest microscopy and molecular biological techniques [38-40]. The aim was to adapt our algorithm for image processing and pattern recognition to microscopic images and electrophoretograms. The development process included working with epifluorescence microscopy images of picoalgae in order to count the cells and to distinguish them from the background noise.

5.1 Optimization in biology

Most of the molecular techniques rely on electrophoresis, in which DNA or proteins are separated in a gel using an electric field. In our case, DNA fragments from different organisms collected from the saline lakes were migrated using agarose gel electrophoresis in order to separate them. Agarose is a polysaccharide polymer material, generally extracted from seaweed, and is frequently used in molecular biology, biochemistry and clinical chemistry for the separation of large molecules, especially a mixed population of DNA, in a matrix of agarose. After the patterns of the migrated DNA fragments are obtained (see Figure 5.1), they are analyzed and the different samples are compared to establish differences between the salt lakes. The patterns can also be used to establish differences within a lake between different water depths [38].

Figure 5.1: Denaturing gradient gel electrophoresis pattern of DNA samples from the salt lakes of the Transylvanian Basin. Columns represent different water samples and the horizontal bands within the columns represent different species.

Molecular processing such as cloning and sequencing can be expensive and time consuming. Our aim is to optimize our algorithm to find specific patterns, in order to make the process cheaper and faster. Patterns can be distinguished with a method called pattern recognition. This is in fact a classification, which attempts to assign each input value to one of a given set of classes. The procedure is a non-probabilistic binary linear classifier, very similar to the classification via support vector machines (see Section 4.4). In our case, epifluorescence microscopic images were taken in order to count the cyanobacterial cells in the water samples, as shown in Figure 5.2.

5.2 Pattern recognition techniques

Enumerating the algae content of a water sample typically involves manual counts through a microscope. Recent advances in pattern recognition software enable

these counts to be automated. The new techniques have the potential to improve the sensitivity of monitoring systems by providing much more information on the state of the water supply in near-real time, with more statistical significance than conventional microscopy. The availability of this information can yield significant cost savings by allowing system operators to proactively treat water supplies.

In general, different species of picoalgae can appear quite clearly different to the human eye, but they are not as easily distinguished mathematically. Automated pattern recognition in images has been a complex area of research within computer science for many years, because many distinctions we make effortlessly with our eyes and brain remain very difficult to accomplish computationally. Our work in this area has made good progress, with promising results. In the future, we propose to develop our optimization technique for pattern recognition further, in order to eliminate the unclassified particles and to obtain better classification results for microscopic images.

Figure 5.2: Epifluorescence microscopic images (magnification = 1000) of picocyanobacteria. The first image shows the picocyanobacteria under blue-violet excitation and the second image shows them under green excitation.
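To illustrate the kind of automated counting mentioned above, here is a minimal sketch that thresholds a grayscale epifluorescence image and counts connected bright regions. It assumes the image is already available as a NumPy array with intensities in [0, 1]; the threshold and minimum cell size are illustrative parameters, and this is not the classification algorithm developed in the thesis.

```python
import numpy as np
from scipy import ndimage

def count_bright_cells(image, threshold=0.5, min_pixels=5):
    """Rough cell count: threshold the fluorescence image and count connected components.

    image is assumed to be a 2-D array with intensities scaled to [0, 1].
    """
    binary = image > threshold               # keep only bright (fluorescing) pixels
    labels, _ = ndimage.label(binary)        # connected components = candidate cells
    sizes = np.bincount(labels.ravel())[1:]  # component sizes, background label 0 excluded
    return int(np.sum(sizes >= min_pixels))  # discard tiny components (background noise)
```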
