Thai Journal of Mathematics Volume 15 (2017) Number 1 : 7 http://thaijmath.in.cmu.ac.th ISSN 1686-0209

About Split Proximal Algorithms for the Q-Lasso

Abdellatif Moudafi
Aix Marseille Université, CNRS-L.S.I.S UMR 7296, France
e-mail : abdellatif.moudafi@univ-amu.fr

Abstract : Numerous problems in signal processing and imaging, statistical learning and data mining, or computer vision can be formulated as optimization problems which consist in minimizing a sum of convex functions, not necessarily differentiable, possibly composed with linear operators. Each function is typically either a data-fidelity term or a regularization term enforcing some properties on the solution; see for example [1, 2] and the references therein. In this note we are interested in the general form of the Q-Lasso introduced in [3], which generalizes the well-known lasso of Tibshirani [4]. Here Q is a closed convex subset of a Euclidean m-space, for some integer m, that can be interpreted as the set of errors within a given tolerance level when linear measurements are taken to recover a signal/image via the lasso. Only the unconstrained case was discussed in [3]; here we discuss some split proximal algorithms for solving the general case. It is worth mentioning that the lasso models a number of applied problems arising from machine learning and signal/image processing, due to the fact that it promotes the sparsity of a signal.

Keywords : Q-Lasso; basis pursuit; proximal algorithm; soft-thresholding; regularization; Douglas-Rachford algorithm; forward-backward iterations.

2010 Mathematics Subject Classification : 49J53; 65K10; 49M37; 90C25.

1 Introduction

The lasso of Tibshirani [4] is the minimization problem

\min_{x\in\mathbb{R}^n} \frac{1}{2}\|Ax-b\|_2^2 + \gamma\|x\|_1, \qquad (1.1)

Copyright © 2017 by the Mathematical Association of Thailand. All rights reserved.
where A is an m × n real matrix, b ∈ \mathbb{R}^m and γ > 0 is a tuning parameter. It is equivalent to the basis pursuit (BP) problem of Chen et al. [5]

\min_{x\in\mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad Ax = b. \qquad (1.2)

However, due to measurement errors, the constraint Ax = b is actually inexact; it turns out that problem (1.2) can be reformulated as

\min_{x\in\mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad \|Ax-b\|_p \le \varepsilon, \qquad (1.3)

where ε > 0 is the tolerance level of errors and p is often 1, 2 or ∞. It is noticed in [3] that if we let Q := B_\varepsilon(b), the closed ball in \mathbb{R}^m with center b and radius ε, then (1.3) can be rewritten as

\min_{x\in\mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad Ax \in Q. \qquad (1.4)

With Q a nonempty closed convex subset of \mathbb{R}^m and P_Q the projection from \mathbb{R}^m onto Q, and since the constraint Ax ∈ Q is equivalent to the condition Ax - P_Q(Ax) = 0, this leads to the following equivalent Lagrangian formulation

\min_{x\in\mathbb{R}^n} \frac{1}{2}\|(I-P_Q)Ax\|_2^2 + \gamma\|x\|_1, \qquad (1.5)

with γ > 0 a Lagrangian multiplier. A connection is also made in [3] with the so-called split feasibility problem [6], which is stated as finding x verifying

x \in C, \quad Ax \in Q, \qquad (1.6)

where C and Q are closed convex subsets of \mathbb{R}^n and \mathbb{R}^m, respectively. An equivalent minimization formulation of (1.6) is

\min_{x\in C} \frac{1}{2}\|(I-P_Q)Ax\|_2^2. \qquad (1.7)

Its l_1 regularization is given as

\min_{x\in C} \frac{1}{2}\|(I-P_Q)Ax\|_2^2 + \gamma\|x\|_1, \qquad (1.8)

where γ > 0 is a regularization parameter. The main difficulty in solving (1.8) stems from the fact that m and n are typically large, hence the denomination of large-scale optimization. It is not possible to manipulate, at every iteration of an algorithm, matrices of size n × n, like the Hessian of a function. So proximal algorithms, which only exploit first-order information about the functions, are often the only viable way to solve (1.8). In this note, we propose two proximal algorithms to solve problem (1.8) by full splitting; that is, at every iteration, the only operations involved are evaluations
of the gradient of the function \frac{1}{2}\|(I-P_Q)A(\cdot)\|_2^2, the proximal mapping of \gamma\|\cdot\|_1, A, or its transpose A^t. In [3], properties and iterative methods for (1.5) are investigated. Recall also that many authors have devoted their work to the unconstrained minimization problem \min_{x\in H} f_1(x)+f_2(x), where f_1, f_2 are two proper, convex, lower semicontinuous functions defined on a Hilbert space H and f_2 is differentiable with a β-Lipschitz continuous gradient for some β > 0. An effective method to solve it is the forward-backward algorithm which, from an initial value x_0, generates a sequence (x_k) by the iteration

x_{k+1} = (1-\lambda_k)x_k + \lambda_k\,\mathrm{prox}_{\gamma_k f_1}\big(x_k - \gamma_k \nabla f_2(x_k)\big), \qquad (1.9)

where γ_k > 0 is the algorithm step-size and 0 < λ_k ≤ 1 is a relaxation parameter. It is well known, see for instance [1], that if (γ_k) remains in a compact interval of ]0, 2/β[ and (λ_k) is bounded from below by a positive constant, then (x_k) converges weakly to a solution of \min_{x\in H} f_1(x)+f_2(x), provided that the set of solutions is nonempty. To relax the assumption on the differentiability of f_2, the Douglas-Rachford algorithm was introduced. It generates a sequence (y_k) as follows:

y_{k+1/2} = \mathrm{prox}_{\kappa f_2}\, y_k;
y_{k+1} = y_k + \tau_k\big(\mathrm{prox}_{\kappa f_1}(2y_{k+1/2} - y_k) - y_{k+1/2}\big), \qquad (1.10)

where κ > 0 and (τ_k) is a sequence of positive reals. It is well known that (y_k) converges weakly to a point y such that \mathrm{prox}_{\kappa f_2}\, y is a solution of the unconstrained minimization problem above, provided that τ_k ∈ ]0,2[ for all k ∈ \mathbb{N}, \sum_{k=0}^{\infty}\tau_k(2-\tau_k) = +\infty, and the set of solutions is nonempty. In this note we are interested in (1.8), which is more general and reduces to (1.5) when the set of constraints is the entire space \mathbb{R}^n. The involvement of the convex set C brings some technical difficulties which are overcome in what follows. Since the same properties for problem (1.8) may be obtained by mimicking the analysis developed in [3] for (1.5), we will focus our attention on the algorithmic aspect.
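For concreteness, the forward-backward iteration (1.9) applied to the Lagrangian form (1.5), with f_1 = \gamma\|\cdot\|_1 (whose proximal mapping is componentwise soft-thresholding) and \nabla f_2(x) = A^t(I-P_Q)Ax for Q = B_\varepsilon(b), can be sketched as follows. This is an illustrative NumPy sketch only, not the algorithm analyzed in this note: the function names, the fixed unit relaxation λ_k = 1, and the small random problem are our own choices.

```python
import numpy as np

def proj_ball(y, center, radius):
    """P_Q for Q = B_radius(center), the closed Euclidean ball."""
    d = y - center
    dist = np.linalg.norm(d)
    return y if dist <= radius else center + (radius / dist) * d

def soft_threshold(x, t):
    """prox of t*||.||_1, applied componentwise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward_lasso(A, b, eps, gamma, iters=500):
    """Iterate x_{k+1} = prox_{s*gamma*||.||_1}(x_k - s*grad f2(x_k)),
    with grad f2(x) = A^t (I - P_Q)(A x) and Q = B_eps(b)."""
    s = 1.0 / np.linalg.norm(A, 2) ** 2   # fixed step-size below 2/||A||^2
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        Ax = A @ x
        grad = A.T @ (Ax - proj_ball(Ax, b, eps))
        x = soft_threshold(x - s * grad, s * gamma)
    return x

# toy data: a sparse vector observed through a random matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17]] = [1.5, -2.0]
b = A @ x_true + 0.01 * rng.standard_normal(20)
x_hat = forward_backward_lasso(A, b, eps=0.05, gamma=0.1)
```

With ε = 0 the projection reduces to the identity at b and one recovers the classical lasso iteration; the varying step-sizes γ_k and relaxations λ_k of (1.9) are omitted for brevity.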
2 Split Proximal Algorithms

In this section, we introduce several proximal iterative methods for solving the Q-Lasso (1.8) in its general form. For the sake of our purpose we confine ourselves to the finite-dimensional setting, but our analysis remains valid in Hilbert spaces: just replace the transpose A^t of A by its adjoint operator A^* and convergence by weak convergence. To begin with, recall that the proximal mapping (in the sense of Moreau) of a proper, convex and lower semicontinuous function ϕ with parameter λ > 0 is defined by

\mathrm{prox}_{\lambda\varphi}(x) := \arg\min_{v\in\mathbb{R}^n}\Big\{\lambda\varphi(v) + \frac{1}{2}\|v-x\|^2\Big\}, \quad x\in\mathbb{R}^n,
and that it has a closed-form expression in some important cases. For example, if \varphi = \|\cdot\|_1, then for x \in \mathbb{R}^n

\mathrm{prox}_{\lambda\|\cdot\|_1}(x) = \big(\mathrm{prox}_{\lambda|\cdot|}(x_1), \ldots, \mathrm{prox}_{\lambda|\cdot|}(x_n)\big), \qquad (2.1)

where \mathrm{prox}_{\lambda|\cdot|}(x_k) = \mathrm{sgn}(x_k)\max\{|x_k|-\lambda, 0\} for k = 1, 2, \ldots, n. If \varphi(x) = \frac{1}{2}\|Ax-y\|^2, then

\mathrm{prox}_{\gamma\varphi}(x) = (I+\gamma A^tA)^{-1}(x+\gamma A^ty), \quad \text{with} \quad (I+\gamma A^tA)^{-1}A^t = A^t(I+\gamma AA^t)^{-1};

and if \varphi = i_C, the indicator function

i_C(x) = \begin{cases} 0 & \text{if } x\in C;\\ +\infty & \text{otherwise}, \end{cases}

we have \mathrm{prox}_{\gamma\varphi}(x) = \mathrm{Proj}_C(x) := \arg\min_{z\in C}\|x-z\|. Such a function is convenient to enforce hard constraints on the solution. Observe that the minimization problem (1.8) can be written as

\min_{x\in\mathbb{R}^n} \frac{1}{2}\|(I-P_Q)Ax\|_2^2 + \gamma\|x\|_1 + i_C(x). \qquad (2.2)

Remember also that the subdifferential is defined as

\partial\varphi(x) := \{u\in\mathbb{R}^n;\ \varphi(z) \ge \varphi(x) + \langle u, z-x\rangle \ \forall z\in\mathbb{R}^n\}.

It is easily seen that

\partial\Big(\frac{1}{2}\|Ax-y\|^2\Big) = \nabla\Big(\frac{1}{2}\|Ax-y\|^2\Big) = A^t(Ax-y),

that

\big(\partial\|\cdot\|_1(x)\big)_i = \begin{cases} \{\mathrm{sign}(x_i)\} & \text{if } x_i \ne 0;\\ [-1,1] & \text{if } x_i = 0, \end{cases}

and that \partial i_C is nothing but the normal cone to C. Very recently several split proximal algorithms were proposed to solve the problem

\min_{x\in\mathbb{R}^n} f(x) + g(x) + i_C(x), \qquad (2.3)

where f, g are two proper, convex and lower semicontinuous functions and C is a nonempty closed convex set. Based on the algorithms introduced in [5], we propose two schemes for solving the more general problem (2.2). Both of them are obtained by coupling the forward-backward algorithm and the Douglas-Rachford algorithm. We then mention other solutions, introduced in [7], founded on a combination of the Krasnoselskii-Mann iteration and the Douglas-Rachford algorithm, or on an alternating algorithm. Further solutions may be obtained by using the primal-dual algorithms proposed in [2].
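The closed forms above are easy to check numerically. The following sketch (illustrative only; the variable names and the small random problem are our own) compares \mathrm{prox}_{\lambda|\cdot|} against a brute-force minimization of its defining objective, and verifies the push-through identity behind \mathrm{prox}_{\gamma\varphi} for \varphi = \frac{1}{2}\|A\cdot{} - y\|^2.

```python
import numpy as np

# prox_{lam|.|}(x) = argmin_v { lam*|v| + 0.5*(v - x)^2 }: compare the
# soft-thresholding formula against a brute-force search on a fine grid.
lam, x = 0.7, 1.2
grid = np.linspace(-3.0, 3.0, 600001)
v_grid = grid[np.argmin(lam * np.abs(grid) + 0.5 * (grid - x) ** 2)]
v_formula = np.sign(x) * max(abs(x) - lam, 0.0)   # sgn(x) * max(|x| - lam, 0)

# prox_{gamma*phi}(x) for phi = 0.5*||A v - y||^2 solves the linear system
# (I + gamma A^t A) v = x + gamma A^t y.  The push-through identity
# (I + gamma A^t A)^{-1} A^t = A^t (I + gamma A A^t)^{-1} lets one invert
# in dimension min(m, n) instead of max(m, n).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
y = rng.standard_normal(4)
gamma = 0.5
lhs = np.linalg.solve(np.eye(6) + gamma * A.T @ A, A.T)   # size-6 solve
rhs = A.T @ np.linalg.inv(np.eye(4) + gamma * A @ A.T)    # size-4 inverse
```

The identity matters in practice because in the lasso setting one of m, n is much smaller than the other, so the cheaper of the two factorizations can be reused across iterations.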
2.1 Insertion of a Forward-Backward Step in the Douglas-Rachford Algorithm

To apply the Douglas-Rachford algorithm with g_1 = \gamma\|\cdot\|_1 and g_2 = \frac{1}{2}\|(I-P_Q)A(\cdot)\|_2^2 + i_C, we need to determine their proximal mappings. The main difficulty lies in the computation of the second one, namely \mathrm{prox}_{\frac{\kappa}{2}\|(I-P_Q)A(\cdot)\|_2^2 + i_C}. As in [5], we can use a forward-backward algorithm to achieve this goal. The resulting algorithm is

Algorithm 1:
1. Set \underline{\gamma} \in ]0, 2/(\kappa\|A\|^2)[, \underline{\lambda} \in ]0,1] and \kappa \in ]0,+\infty[. Choose (\tau_k)_{k\in\mathbb{N}} satisfying \tau_k \in ]0,2[ for all k \in \mathbb{N} and \sum_{k=0}^{\infty}\tau_k(2-\tau_k) = +\infty.
2. Set k = 0 and y_0 = y_{-1/2} \in C.
3. Set x_{k,0} = y_{k-1/2}.
4. For i = 0, \ldots, N_k:
   a) choose \gamma_{k,i} \in [\underline{\gamma}, 2/(\kappa\|A\|^2)[ and \lambda_{k,i} \in [\underline{\lambda}, 1];
   b) compute
      x_{k,i+1} = x_{k,i} + \lambda_{k,i}\Big(P_C\Big(\frac{x_{k,i} - \gamma_{k,i}\big(\kappa A^t(I-P_Q)Ax_{k,i} - y_k\big)}{1+\gamma_{k,i}}\Big) - x_{k,i}\Big).
5. Set y_{k+1/2} = x_{k,N_k}.
6. Set y_{k+1} = y_k + \tau_k\big(\mathrm{prox}_{\kappa\gamma\|\cdot\|_1}(2y_{k+1/2} - y_k) - y_{k+1/2}\big).
7. Increment k \leftarrow k+1 and go to 3.

By a judicious choice of N_k, the convergence of the sequence (y_k) to a point y such that \mathrm{prox}_{\frac{\kappa}{2}\|(I-P_Q)A(\cdot)\|_2^2 + i_C}(y) solves problem (2.2) follows directly by applying Proposition 4.1 of [5].

2.2 Insertion of a Douglas-Rachford Step in the Forward-Backward Algorithm

We consider f_1 = \gamma\|\cdot\|_1 + i_C and f_2 = \frac{1}{2}\|(I-P_Q)A(\cdot)\|_2^2. Since f_2 has a \|A\|^2-Lipschitz gradient, we can apply the forward-backward algorithm. This requires, however, the computation of \mathrm{prox}_{i_C + \gamma_k\gamma\|\cdot\|_1}, which can be performed with Douglas-Rachford iterations. The resulting algorithm is

Algorithm 2:
1. Choose (\gamma_k) and (\lambda_k) satisfying 0 < \inf_k \gamma_k \le \sup_k \gamma_k < 2/\|A\|^2 and 0 < \underline{\lambda} \le \lambda_k \le 1. Set \underline{\tau} \in ]0,2].
2. Set k = 0 and x_0 \in C.
3. Set \bar{x}_k = x_k - \gamma_k A^t(I-P_Q)Ax_k.
4. Set y_{k,0} = 2\,\mathrm{prox}_{\gamma_k\gamma\|\cdot\|_1}(\bar{x}_k) - \bar{x}_k.
5. For i = 0, 1, \ldots, M_k:
   a) compute y_{k,i+1/2} = P_C\Big(\frac{y_{k,i} + \bar{x}_k}{2}\Big);
   b) choose \tau_{k,i} \in [\underline{\tau}, 2];
   c) compute y_{k,i+1} = y_{k,i} + \tau_{k,i}\big(\mathrm{prox}_{\gamma_k\gamma\|\cdot\|_1}(2y_{k,i+1/2} - y_{k,i}) - y_{k,i+1/2}\big);
   d) if y_{k,i+1} = y_{k,i}, then go to 6.
6. Set x_{k+1} = x_k + \lambda_k(y_{k,i+1/2} - x_k).
7. Increment k \leftarrow k+1 and go to 3.

A direct application of Proposition 4.2 of [5] ensures the existence of positive integers (\bar{M}_k) such that if M_k \ge \bar{M}_k for all k \ge 0, then the sequence (x_k) converges weakly to a solution of problem (2.2).

Remark 2.1. Other split proximal algorithms may be designed by combining the fixed-point idea introduced in [8] for computing the proximal mapping of the composition of a convex function with a linear operator with the analysis developed in [7] for computing the proximal mapping of the sum of two convex functions. The primal-dual algorithms considered in [2] can also be used. Note that there are often several ways to assign the functions of (2.2) to the terms used in the generic problems investigated in [2, 8, 9].

Acknowledgements : I would like to thank my team I&M (Images et Modèles) and the head of LSIS at AMU, Prof. M. Ouladsine.

References

[1] P.L. Combettes, J.-C. Pesquet, A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery, IEEE J. Selected Topics Signal Process. 1 (4) (2007) 564-574.
[2] L. Condat, A generic proximal algorithm for convex optimization: application to total variation minimization, IEEE Signal Proc. Letters 21 (8) (2014) 985-989.
[3] M.A. Alghamdi, M. Ali Alghamdi, N. Shahzad, H.-K. Xu, Properties and iterative methods for the Q-Lasso, Abstract and Applied Analysis (2013), Article ID 250943, 8 pages.
[4] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society (Series B) 58 (1) (1996) 267-288.
[5] S.S. Chen, D.L. Donoho, M.A. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing 20 (1) (1998) 33-61.
[6] Y. Censor, T. Elfving, A multiprojection algorithm using Bregman projections in a product space, Numer. Algorithms 8 (1994) 221-239.
[7] A. Moudafi, Computing the resolvent of composite operators, CUBO A Mathematical Journal 16 (3) (2014) 87-96.
[8] Ch.A. Micchelli, L. Shen, Y. Xu, X. Zeng, Proximity algorithms for the L1/TV image denoising model, Adv. Comput. Math. 38 (2013) 401-426.
[9] C. Chaux, J.-C. Pesquet, N. Pustelnik, Nested iterative algorithms for convex constrained image recovery problems, SIAM Journal on Imaging Sciences 2 (2) (2009) 730-762.

(Received 8 October 2015)
(Accepted 5 September 2016)

Thai J. Math. Online @ http://thaijmath.in.cmu.ac.th