Contraction Methods for Convex Optimization and monotone variational inequalities No.12


XII - 1

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.12

Linearized alternating direction methods of multipliers for separable convex programming

Bingsheng He
Department of Mathematics, Nanjing University
hebma@nju.edu.cn

XII - 2

1  Structured constrained convex optimization

We consider the following structured constrained convex optimization problem

$$ \min \{\, \theta_1(x) + \theta_2(y) \mid Ax + By = b,\ x \in \mathcal{X},\ y \in \mathcal{Y} \,\} \tag{1.1} $$

where $\theta_1 : \mathbb{R}^{n_1} \to \mathbb{R}$ and $\theta_2 : \mathbb{R}^{n_2} \to \mathbb{R}$ are convex functions (but not necessarily smooth), $A \in \mathbb{R}^{m \times n_1}$, $B \in \mathbb{R}^{m \times n_2}$ and $b \in \mathbb{R}^m$, and $\mathcal{X} \subset \mathbb{R}^{n_1}$, $\mathcal{Y} \subset \mathbb{R}^{n_2}$ are given closed convex sets. We let $n = n_1 + n_2$.

Solving problem (1.1) amounts to finding $(x^*, y^*, \lambda^*) \in \Omega$ such that

$$ \begin{cases} \theta_1(x) - \theta_1(x^*) + (x - x^*)^T(-A^T\lambda^*) \ge 0, \\ \theta_2(y) - \theta_2(y^*) + (y - y^*)^T(-B^T\lambda^*) \ge 0, \\ (\lambda - \lambda^*)^T(Ax^* + By^* - b) \ge 0, \end{cases} \qquad \forall\, (x, y, \lambda) \in \Omega, \tag{1.2} $$

where $\Omega = \mathcal{X} \times \mathcal{Y} \times \mathbb{R}^m$.

XII - 3

By denoting

$$ u = \begin{pmatrix} x \\ y \end{pmatrix}, \quad w = \begin{pmatrix} x \\ y \\ \lambda \end{pmatrix}, \quad F(w) = \begin{pmatrix} -A^T\lambda \\ -B^T\lambda \\ Ax + By - b \end{pmatrix} \quad \text{and} \quad \theta(u) = \theta_1(x) + \theta_2(y), $$

the first-order optimality condition (1.2) can be written in the compact form

$$ w^* \in \Omega, \qquad \theta(u) - \theta(u^*) + (w - w^*)^T F(w^*) \ge 0, \quad \forall\, w \in \Omega. \tag{1.3} $$

Note that the mapping $F$ is monotone. We use $\Omega^*$ to denote the solution set of the variational inequality (1.3). For convenience we use the notations $v = (y, \lambda)$ and $\mathcal{V}^* = \{ (y^*, \lambda^*) \mid (x^*, y^*, \lambda^*) \in \Omega^* \}$.

XII - 4

The Alternating Direction Method (ADM) is a simple but powerful algorithm that is well suited to distributed convex optimization [1]. This approach also has the benefit that one algorithm can be flexible enough to solve many problems. Applying the ADM to the structured convex optimization problem (1.1), the iteration goes from $(y^k, \lambda^k)$ to $(y^{k+1}, \lambda^{k+1})$:

First, for given $(y^k, \lambda^k)$, $x^{k+1}$ is the solution of the following problem:

$$ x^{k+1} = \arg\min \Big\{\, \theta_1(x) + \tfrac{\beta}{2}\big\| Ax + By^k - b - \tfrac{1}{\beta}\lambda^k \big\|^2 \;\Big|\; x \in \mathcal{X} \,\Big\}. \tag{1.4a} $$

Then, using $\lambda^k$ and the obtained $x^{k+1}$, $y^{k+1}$ is the solution of the following problem:

$$ y^{k+1} = \arg\min \Big\{\, \theta_2(y) + \tfrac{\beta}{2}\big\| Ax^{k+1} + By - b - \tfrac{1}{\beta}\lambda^k \big\|^2 \;\Big|\; y \in \mathcal{Y} \,\Big\}. \tag{1.4b} $$

Finally,

$$ \lambda^{k+1} = \lambda^k - \beta\,(Ax^{k+1} + By^{k+1} - b). \tag{1.4c} $$
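
To make the iteration (1.4) concrete, here is a minimal numerical sketch (not from the slides), assuming the special case $\theta_1(x) = \tfrac12\|x-p\|^2$, $\theta_2(y) = \tfrac12\|y-q\|^2$ and $\mathcal{X} = \mathbb{R}^{n_1}$, $\mathcal{Y} = \mathbb{R}^{n_2}$, so that each subproblem reduces to a linear solve; $A$, $B$, $b$, $p$, $q$ and $\beta$ are illustrative placeholders.

```python
import numpy as np

def admm_quadratic(A, B, b, p, q, beta=1.0, iters=200):
    """ADM iteration (1.4) for the special case
    theta_1(x) = 0.5*||x - p||^2, theta_2(y) = 0.5*||y - q||^2,
    with X = R^{n_1}, Y = R^{n_2}, so each subproblem is a linear solve."""
    m, n1 = A.shape
    n2 = B.shape[1]
    x, y, lam = np.zeros(n1), np.zeros(n2), np.zeros(m)
    for _ in range(iters):
        # (1.4a): (I + beta*A^T A) x = p - beta*A^T(B y^k - b - lam^k/beta)
        x = np.linalg.solve(np.eye(n1) + beta * A.T @ A,
                            p - beta * A.T @ (B @ y - b - lam / beta))
        # (1.4b): (I + beta*B^T B) y = q - beta*B^T(A x^{k+1} - b - lam^k/beta)
        y = np.linalg.solve(np.eye(n2) + beta * B.T @ B,
                            q - beta * B.T @ (A @ x - b - lam / beta))
        # (1.4c): multiplier update
        lam = lam - beta * (A @ x + B @ y - b)
    return x, y, lam
```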

XII - 5

In some structured convex optimization problems (1.1), $B$ is a scalar matrix. However, the subproblem (1.4a) does not have a closed-form solution because of the general structure of the matrix $A$. In this case, we linearize the quadratic term of (1.4a),

$$ \tfrac{\beta}{2}\big\| Ax + By^k - b - \tfrac{1}{\beta}\lambda^k \big\|^2, $$

at $x^k$ and add a proximal term $\tfrac{r}{2}\|x - x^k\|^2$ to the objective function. In other words, instead of (1.4a), we solve the following $x$-subproblem:

$$ \min \Big\{\, \theta_1(x) + \beta x^T A^T\big(Ax^k + By^k - b - \tfrac{1}{\beta}\lambda^k\big) + \tfrac{r}{2}\|x - x^k\|^2 \;\Big|\; x \in \mathcal{X} \,\Big\}. $$

Based on linearizing the quadratic term of (1.4a), in this lecture we construct the linearized alternating direction method. We still assume that the problem

$$ \min \Big\{\, \theta_1(x) + \tfrac{r}{2}\|x - a\|^2 \;\Big|\; x \in \mathcal{X} \,\Big\} \tag{1.5} $$

has a closed-form solution.

XII - 6

2  Linearized Alternating Direction Method

In the Linearized ADM, $x$ is not an intermediate variable. The $k$-th iteration of the Linearized ADM is from $(x^k, y^k, \lambda^k)$ to $(x^{k+1}, y^{k+1}, \lambda^{k+1})$.

2.1  Linearized ADM

1. First, for given $(x^k, y^k, \lambda^k)$, $x^{k+1}$ is the solution of the following problem:

$$ x^{k+1} = \arg\min \Big\{\, \theta_1(x) + \beta x^T A^T\big(Ax^k + By^k - b - \tfrac{1}{\beta}\lambda^k\big) + \tfrac{r}{2}\|x - x^k\|^2 \;\Big|\; x \in \mathcal{X} \,\Big\}. \tag{2.1a} $$

2. Then, using $\lambda^k$ and the obtained $x^{k+1}$, $y^{k+1}$ is the solution of the following problem:

$$ y^{k+1} = \arg\min \Big\{\, \theta_2(y) + \tfrac{\beta}{2}\big\| Ax^{k+1} + By - b - \tfrac{1}{\beta}\lambda^k \big\|^2 \;\Big|\; y \in \mathcal{Y} \,\Big\}. \tag{2.1b} $$

3. Finally,

$$ \lambda^{k+1} = \lambda^k - \beta\,(Ax^{k+1} + By^{k+1} - b). \tag{2.1c} $$
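
Continuing the same illustrative quadratic example (an assumption, not part of the slides), one iteration of the linearized ADM (2.1) can be sketched as follows; the point is that the $x$-update no longer requires solving a linear system in $A$.

```python
import numpy as np

def linearized_adm_step(x, y, lam, A, B, b, p, q, beta, r):
    """One iteration of the linearized ADM (2.1) for the illustrative case
    theta_1(x) = 0.5*||x - p||^2 with X = R^{n_1}: the x-update (2.1a) becomes
    a closed-form proximal step, no linear system in A is solved."""
    # (2.1a): minimize 0.5*||x - p||^2 + beta*x^T A^T s + (r/2)*||x - x^k||^2
    s = A @ x + B @ y - b - lam / beta
    x_new = (p + r * x - beta * A.T @ s) / (1.0 + r)
    # (2.1b): y-subproblem, as in the standard ADM but with x_new
    n2 = B.shape[1]
    y_new = np.linalg.solve(np.eye(n2) + beta * B.T @ B,
                            q - beta * B.T @ (A @ x_new - b - lam / beta))
    # (2.1c): multiplier update
    lam_new = lam - beta * (A @ x_new + B @ y_new - b)
    return x_new, y_new, lam_new
```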

XII - 7

Requirements on the parameters $\beta$, $r$: for given $\beta > 0$, choose $r$ such that

$$ r I_{n_1} - \beta A^T A \succeq 0. \tag{2.2} $$

Analysis of the optimality conditions of the subproblems in (2.1). Note that $x^{k+1}$, the solution of (2.1a), satisfies

$$ x^{k+1} \in \mathcal{X}, \quad \theta_1(x) - \theta_1(x^{k+1}) + (x - x^{k+1})^T \big\{ -A^T\lambda^k + \beta A^T\big(Ax^k + By^k - b\big) + r(x^{k+1} - x^k) \big\} \ge 0, \quad \forall\, x \in \mathcal{X}. \tag{2.3a} $$

Similarly, the solution $y^{k+1}$ of (2.1b) satisfies

$$ y^{k+1} \in \mathcal{Y}, \quad \theta_2(y) - \theta_2(y^{k+1}) + (y - y^{k+1})^T \big\{ -B^T\lambda^k + \beta B^T\big(Ax^{k+1} + By^{k+1} - b\big) \big\} \ge 0, \quad \forall\, y \in \mathcal{Y}. \tag{2.3b} $$
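
The threshold in (2.2) can be computed directly; a minimal sketch ($A$ and $\beta$ are placeholders, and for very large $A$ a power iteration on $A^TA$ would be preferable):

```python
import numpy as np

def smallest_feasible_r(A, beta):
    """Condition (2.2), r*I - beta*A^T A >= 0, holds iff
    r >= beta * lambda_max(A^T A); return that threshold."""
    return beta * np.linalg.eigvalsh(A.T @ A).max()
```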

XII - 8

Substituting $\lambda^{k+1}$ (see (2.1c)) into (2.3) (eliminating $\lambda^k$), we get

$$ x^{k+1} \in \mathcal{X}, \quad \theta_1(x) - \theta_1(x^{k+1}) + (x - x^{k+1})^T \big\{ -A^T\lambda^{k+1} + \beta A^T B(y^k - y^{k+1}) + (rI_{n_1} - \beta A^T A)(x^{k+1} - x^k) \big\} \ge 0, \quad \forall\, x \in \mathcal{X}, \tag{2.4a} $$

and

$$ y^{k+1} \in \mathcal{Y}, \quad \theta_2(y) - \theta_2(y^{k+1}) + (y - y^{k+1})^T \big\{ -B^T\lambda^{k+1} \big\} \ge 0, \quad \forall\, y \in \mathcal{Y}. \tag{2.4b} $$

For convenience of analysis, we rewrite (2.4) in the following equivalent form:

$$ \theta(u) - \theta(u^{k+1}) + \begin{pmatrix} x - x^{k+1} \\ y - y^{k+1} \end{pmatrix}^T \left\{ \begin{pmatrix} -A^T\lambda^{k+1} \\ -B^T\lambda^{k+1} \end{pmatrix} + \beta \begin{pmatrix} A^T \\ B^T \end{pmatrix} B(y^k - y^{k+1}) + \begin{pmatrix} rI_{n_1} - \beta A^T A & 0 \\ 0 & \beta B^T B \end{pmatrix} \begin{pmatrix} x^{k+1} - x^k \\ y^{k+1} - y^k \end{pmatrix} \right\} \ge 0, \quad \forall\, (x, y) \in \mathcal{X} \times \mathcal{Y}. $$

XII - 9

Combining the last inequality with (2.1c), we have

$$ \theta(u) - \theta(u^{k+1}) + \begin{pmatrix} x - x^{k+1} \\ y - y^{k+1} \\ \lambda - \lambda^{k+1} \end{pmatrix}^T \left\{ \begin{pmatrix} -A^T\lambda^{k+1} \\ -B^T\lambda^{k+1} \\ Ax^{k+1} + By^{k+1} - b \end{pmatrix} + \beta \begin{pmatrix} A^T \\ B^T \\ 0 \end{pmatrix} B(y^k - y^{k+1}) + \begin{pmatrix} rI_{n_1} - \beta A^T A & 0 & 0 \\ 0 & \beta B^T B & 0 \\ 0 & 0 & \tfrac{1}{\beta} I_m \end{pmatrix} \begin{pmatrix} x^{k+1} - x^k \\ y^{k+1} - y^k \\ \lambda^{k+1} - \lambda^k \end{pmatrix} \right\} \ge 0 $$

for all $(x, y, \lambda) \in \mathcal{X} \times \mathcal{Y} \times \mathbb{R}^m$. The above inequality can be rewritten as

$$ \theta(u) - \theta(u^{k+1}) + (w - w^{k+1})^T F(w^{k+1}) + \beta \begin{pmatrix} x - x^{k+1} \\ y - y^{k+1} \\ \lambda - \lambda^{k+1} \end{pmatrix}^T \begin{pmatrix} A^T \\ B^T \\ 0 \end{pmatrix} B(y^k - y^{k+1}) \;\ge\; \begin{pmatrix} x - x^{k+1} \\ y - y^{k+1} \\ \lambda - \lambda^{k+1} \end{pmatrix}^T \begin{pmatrix} rI_{n_1} - \beta A^T A & 0 & 0 \\ 0 & \beta B^T B & 0 \\ 0 & 0 & \tfrac{1}{\beta} I_m \end{pmatrix} \begin{pmatrix} x^k - x^{k+1} \\ y^k - y^{k+1} \\ \lambda^k - \lambda^{k+1} \end{pmatrix}, \quad \forall\, w \in \Omega. \tag{2.5} $$

XII - 10

2.2  Convergence of the Linearized ADM

Based on the analysis in the last section, we have the following lemma.

Lemma 2.1. Let $w^{k+1} = (x^{k+1}, y^{k+1}, \lambda^{k+1}) \in \Omega$ be generated by (2.1) from the given $w^k = (x^k, y^k, \lambda^k)$. Then we have

$$ (w^{k+1} - w^*)^T G (w^k - w^{k+1}) \ge (w^{k+1} - w^*)^T \eta(y^k, y^{k+1}), \quad \forall\, w^* \in \Omega^*, \tag{2.6} $$

where

$$ \eta(y^k, y^{k+1}) = \beta \begin{pmatrix} A^T \\ B^T \\ 0 \end{pmatrix} B(y^k - y^{k+1}) \tag{2.7} $$

and

$$ G = \begin{pmatrix} rI_{n_1} - \beta A^T A & 0 & 0 \\ 0 & \beta B^T B & 0 \\ 0 & 0 & \tfrac{1}{\beta} I_m \end{pmatrix}. \tag{2.8} $$

XII - 11

Proof. Setting $w = w^*$ in (2.5), and using $G$ and $\eta(y^k, y^{k+1})$, we get

$$ (w^{k+1} - w^*)^T G (w^k - w^{k+1}) \ge (w^{k+1} - w^*)^T \eta(y^k, y^{k+1}) + \theta(u^{k+1}) - \theta(u^*) + (w^{k+1} - w^*)^T F(w^{k+1}). $$

Since $F$ is monotone, it follows that

$$ \theta(u^{k+1}) - \theta(u^*) + (w^{k+1} - w^*)^T F(w^{k+1}) \ge \theta(u^{k+1}) - \theta(u^*) + (w^{k+1} - w^*)^T F(w^*) \ge 0. $$

The last inequality is due to $w^{k+1} \in \Omega$ and the fact that $w^* \in \Omega^*$ is a solution of (1.3). The lemma is proved. $\Box$

By using $\eta(y^k, y^{k+1})$ (see (2.7)), $Ax^* + By^* = b$ and (2.1c), we have

$$ (w^{k+1} - w^*)^T \eta(y^k, y^{k+1}) = \big(B(y^k - y^{k+1})\big)^T \beta \big\{ (Ax^{k+1} + By^{k+1}) - (Ax^* + By^*) \big\} = \big(B(y^k - y^{k+1})\big)^T \beta \,(Ax^{k+1} + By^{k+1} - b) = (\lambda^k - \lambda^{k+1})^T B(y^k - y^{k+1}). \tag{2.9} $$

XII - 12

Substituting it into (2.6), we obtain

$$ (w^{k+1} - w^*)^T G (w^k - w^{k+1}) \ge (\lambda^k - \lambda^{k+1})^T B(y^k - y^{k+1}), \quad \forall\, w^* \in \Omega^*. \tag{2.10} $$

Lemma 2.2. Let $w^{k+1} = (x^{k+1}, y^{k+1}, \lambda^{k+1}) \in \Omega$ be generated by (2.1) from the given $w^k = (x^k, y^k, \lambda^k)$. Then we have

$$ (\lambda^k - \lambda^{k+1})^T B(y^k - y^{k+1}) \ge 0. \tag{2.11} $$

Proof. Since (2.4b) holds for both the $k$-th iteration and the previous iteration, we have

$$ \theta_2(y) - \theta_2(y^{k+1}) + (y - y^{k+1})^T \{-B^T\lambda^{k+1}\} \ge 0, \quad \forall\, y \in \mathcal{Y}, \tag{2.12} $$

and

$$ \theta_2(y) - \theta_2(y^k) + (y - y^k)^T \{-B^T\lambda^k\} \ge 0, \quad \forall\, y \in \mathcal{Y}. \tag{2.13} $$

XII - 13

Setting $y = y^k$ in (2.12) and $y = y^{k+1}$ in (2.13), respectively, and then adding the two resulting inequalities, we get

$$ (\lambda^k - \lambda^{k+1})^T B(y^k - y^{k+1}) \ge 0. $$

The assertion of this lemma is proved. $\Box$

Under the assumption (2.2), the matrix $G$ is positive semi-definite; if, in addition, $B$ has full column rank, then $G$ is positive definite. Even in the positive semi-definite case, we still use $\|w' - w\|_G$ to denote

$$ \|w' - w\|_G = \sqrt{(w' - w)^T G (w' - w)}. $$

Lemma 2.3. Let $w^{k+1} = (x^{k+1}, y^{k+1}, \lambda^{k+1}) \in \Omega$ be generated by (2.1) from the given $w^k = (x^k, y^k, \lambda^k)$. Then we have

$$ (w^{k+1} - w^*)^T G (w^k - w^{k+1}) \ge 0, \quad \forall\, w^* \in \Omega^*, \tag{2.14} $$

XII - 14

and consequently

$$ \|w^{k+1} - w^*\|_G^2 \le \|w^k - w^*\|_G^2 - \|w^k - w^{k+1}\|_G^2, \quad \forall\, w^* \in \Omega^*. \tag{2.15} $$

Proof. Assertion (2.14) follows from (2.10) and (2.11) directly. By using (2.14), we have

$$ \|w^k - w^*\|_G^2 = \|(w^{k+1} - w^*) + (w^k - w^{k+1})\|_G^2 = \|w^{k+1} - w^*\|_G^2 + 2 (w^{k+1} - w^*)^T G (w^k - w^{k+1}) + \|w^k - w^{k+1}\|_G^2 \ge \|w^{k+1} - w^*\|_G^2 + \|w^k - w^{k+1}\|_G^2, $$

and thus (2.15) is proved. $\Box$

The inequality (2.15) is essential for the convergence of the linearized alternating direction method. Note that $G$ is positive semi-definite and

$$ \|w^k - w^{k+1}\|_G^2 = 0 \iff G(w^k - w^{k+1}) = 0. $$
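
Since (2.15) measures the progress of the iteration in the $G$-(semi-)norm, a practical stopping rule is to iterate until $\|w^k - w^{k+1}\|_G$ is small. A small helper (a sketch, assuming the quantities of (2.8)) computes this quantity without forming $G$ explicitly:

```python
import numpy as np

def G_norm_sq(dx, dy, dlam, A, B, beta, r):
    """Squared (semi-)norm ||(dx, dy, dlam)||_G^2 with G from (2.8),
    computed without forming G:
        r*||dx||^2 - beta*||A dx||^2 + beta*||B dy||^2 + (1/beta)*||dlam||^2."""
    Adx, Bdy = A @ dx, B @ dy
    return r * dx @ dx - beta * Adx @ Adx + beta * Bdy @ Bdy + dlam @ dlam / beta
```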

XII - 15

The inequality (2.15) can be written as

$$ \|x^{k+1} - x^*\|_{(rI - \beta A^T A)}^2 + \beta \|B(y^{k+1} - y^*)\|^2 + \tfrac{1}{\beta}\|\lambda^{k+1} - \lambda^*\|^2 \;\le\; \|x^k - x^*\|_{(rI - \beta A^T A)}^2 + \beta \|B(y^k - y^*)\|^2 + \tfrac{1}{\beta}\|\lambda^k - \lambda^*\|^2 - \Big( \|x^k - x^{k+1}\|_{(rI - \beta A^T A)}^2 + \beta \|B(y^k - y^{k+1})\|^2 + \tfrac{1}{\beta}\|\lambda^k - \lambda^{k+1}\|^2 \Big). $$

It leads to

$$ \lim_{k\to\infty} x^k = x^*, \qquad \lim_{k\to\infty} By^k = By^* \qquad \text{and} \qquad \lim_{k\to\infty} \lambda^k = \lambda^*. $$

The linearized ADM is also known as the split inexact Uzawa method in the image processing literature [9, 10].

XII - 16

3  Self-Adaptive ADM-based Contraction Method

In the last section, we obtained $x^{k+1}$ by solving the following $x$-subproblem:

$$ \min \Big\{\, \theta_1(x) + \beta x^T A^T\big(Ax^k + By^k - b - \tfrac{1}{\beta}\lambda^k\big) + \tfrac{r}{2}\|x - x^k\|^2 \;\Big|\; x \in \mathcal{X} \,\Big\}, $$

and the parameter $r$ was required to satisfy

$$ rI_{n_1} - \beta A^T A \succeq 0 \iff r \ge \beta\,\lambda_{\max}(A^T A). $$

In some practical problems, a conservative estimate of $\lambda_{\max}(A^T A)$ leads to slow convergence. In this section, based on the linearized ADM, we consider self-adaptive contraction methods. Each iteration of the self-adaptive contraction methods consists of two steps: a prediction step and a correction step. From the given $w^k$, the prediction step produces a test vector $\tilde w^k$, and the correction step offers the new iterate $w^{k+1}$.

XII - 17

3.1  Prediction

1. First, for given $(x^k, y^k, \lambda^k)$, $\tilde x^k$ is the solution of the following problem:

$$ \tilde x^k = \arg\min \Big\{\, \theta_1(x) + \beta x^T A^T\big(Ax^k + By^k - b - \tfrac{1}{\beta}\lambda^k\big) + \tfrac{r}{2}\|x - x^k\|^2 \;\Big|\; x \in \mathcal{X} \,\Big\}. \tag{3.1a} $$

2. Then, using $\lambda^k$ and the obtained $\tilde x^k$, $\tilde y^k$ is the solution of the problem

$$ \tilde y^k = \arg\min \Big\{\, \theta_2(y) - (\lambda^k)^T(A\tilde x^k + By - b) + \tfrac{\beta}{2}\big\| A\tilde x^k + By - b \big\|^2 \;\Big|\; y \in \mathcal{Y} \,\Big\}. \tag{3.1b} $$

3. Finally,

$$ \tilde\lambda^k = \lambda^k - \beta\,(A\tilde x^k + B\tilde y^k - b). \tag{3.1c} $$

The subproblems in (3.1) are similar to those in (2.1). Instead of $(x^{k+1}, y^{k+1}, \lambda^{k+1})$ in (2.1), we denote the output of (3.1) by $(\tilde x^k, \tilde y^k, \tilde\lambda^k)$.

XII - 18

Requirements on the parameters $\beta$, $r$: for given $\beta > 0$, choose $r$ such that

$$ \beta \|A^T A (x^k - \tilde x^k)\| \le \nu r \|x^k - \tilde x^k\|, \qquad \nu \in (0, 1). \tag{3.2} $$

If $rI - \beta A^T A \succeq 0$, then (3.2) is satisfied; thus, (2.2) is sufficient for (3.2).

Analysis of the optimality conditions of the subproblems in (3.1). Because $\tilde w^k$ in (3.1) is obtained by substituting $w^{k+1}$ in (2.1) with $\tilde w^k$, similarly to (2.5) we get

$$ \theta(u) - \theta(\tilde u^k) + \begin{pmatrix} x - \tilde x^k \\ y - \tilde y^k \\ \lambda - \tilde\lambda^k \end{pmatrix}^T \left\{ \begin{pmatrix} -A^T\tilde\lambda^k \\ -B^T\tilde\lambda^k \\ A\tilde x^k + B\tilde y^k - b \end{pmatrix} + \beta \begin{pmatrix} A^T \\ B^T \\ 0 \end{pmatrix} B(y^k - \tilde y^k) + \begin{pmatrix} rI_{n_1} - \beta A^T A & 0 & 0 \\ 0 & \beta B^T B & 0 \\ 0 & 0 & \tfrac{1}{\beta} I_m \end{pmatrix} \begin{pmatrix} \tilde x^k - x^k \\ \tilde y^k - y^k \\ \tilde\lambda^k - \lambda^k \end{pmatrix} \right\} \ge 0, \quad \forall\, w \in \Omega. $$
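
Condition (3.2) can be monitored cheaply during the iterations. The check below follows (3.2) directly; the commented adaptive rule for enlarging $r$ when the check fails is only an assumed illustration (the slides do not prescribe a specific update rule), and `prediction_step` is a hypothetical routine implementing (3.1):

```python
import numpy as np

def condition_32_ratio(x, x_tilde, A, beta, r):
    """Ratio beta*||A^T A (x - x_tilde)|| / (r*||x - x_tilde||);
    condition (3.2) asks for this ratio to be <= nu < 1."""
    d = x - x_tilde
    nd = np.linalg.norm(d)
    if nd == 0.0:
        return 0.0
    return beta * np.linalg.norm(A.T @ (A @ d)) / (r * nd)

# An assumed (not prescribed by the slides) self-adaptive rule: if the ratio
# exceeds nu, enlarge r and redo the prediction step (3.1), e.g.
#   while condition_32_ratio(x, x_tilde, A, beta, r) > nu:
#       r *= 1.5
#       x_tilde, y_tilde, lam_tilde = prediction_step(x, y, lam, r)  # hypothetical
```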

XII - 19

The last variational inequality can be rewritten as

$$ \tilde w^k \in \Omega, \qquad \theta(u) - \theta(\tilde u^k) + (w - \tilde w^k)^T \big\{ F(\tilde w^k) + \eta(y^k, \tilde y^k) + HM(\tilde w^k - w^k) \big\} \ge 0, \quad \forall\, w \in \Omega, \tag{3.3} $$

where

$$ \eta(y^k, \tilde y^k) = \beta \begin{pmatrix} A^T \\ B^T \\ 0 \end{pmatrix} B(y^k - \tilde y^k), \tag{3.4} $$

$$ H = \begin{pmatrix} rI_{n_1} & 0 & 0 \\ 0 & \beta B^T B & 0 \\ 0 & 0 & \tfrac{1}{\beta} I_m \end{pmatrix}, \tag{3.5} $$

XII - 20

and

$$ M = \begin{pmatrix} I_{n_1} - \tfrac{\beta}{r} A^T A & 0 & 0 \\ 0 & I_{n_2} & 0 \\ 0 & 0 & I_m \end{pmatrix}. \tag{3.6} $$

Based on the above analysis, we have the following lemma.

Lemma 3.1. Let $\tilde w^k = (\tilde x^k, \tilde y^k, \tilde\lambda^k) \in \Omega$ be generated by (3.1) from the given $w^k = (x^k, y^k, \lambda^k)$. Then we have

$$ (\tilde w^k - w^*)^T H M (w^k - \tilde w^k) \ge (\tilde w^k - w^*)^T \eta(y^k, \tilde y^k), \quad \forall\, w^* \in \Omega^*. \tag{3.7} $$

Proof. Setting $w = w^*$ in (3.3), we obtain

$$ (\tilde w^k - w^*)^T H M (w^k - \tilde w^k) \ge (\tilde w^k - w^*)^T \eta(y^k, \tilde y^k) + \theta(\tilde u^k) - \theta(u^*) + (\tilde w^k - w^*)^T F(\tilde w^k). \tag{3.8} $$

XII - 21

Since $F$ is monotone and $\tilde w^k \in \Omega$, it follows that

$$ \theta(\tilde u^k) - \theta(u^*) + (\tilde w^k - w^*)^T F(\tilde w^k) \ge \theta(\tilde u^k) - \theta(u^*) + (\tilde w^k - w^*)^T F(w^*) \ge 0. $$

The last inequality is due to $\tilde w^k \in \Omega$ and the fact that $w^* \in \Omega^*$ is a solution of (1.3). Substituting it into the right-hand side of (3.8), the lemma is proved. $\Box$

In addition, because $Ax^* + By^* = b$ and $\beta(A\tilde x^k + B\tilde y^k - b) = \lambda^k - \tilde\lambda^k$, we have

$$ (\tilde w^k - w^*)^T \eta(y^k, \tilde y^k) = \big(B(y^k - \tilde y^k)\big)^T \beta \big\{ (A\tilde x^k + B\tilde y^k) - (Ax^* + By^*) \big\} = (\lambda^k - \tilde\lambda^k)^T B(y^k - \tilde y^k). \tag{3.9} $$

Lemma 3.2. Let $\tilde w^k = (\tilde x^k, \tilde y^k, \tilde\lambda^k) \in \Omega$ be generated by (3.1) from the given $w^k = (x^k, y^k, \lambda^k)$. Then we have

$$ (w^k - w^*)^T H M (w^k - \tilde w^k) \ge \varphi(w^k, \tilde w^k), \quad \forall\, w^* \in \Omega^*, \tag{3.10} $$

XII - 22

where

$$ \varphi(w^k, \tilde w^k) = (w^k - \tilde w^k)^T H M (w^k - \tilde w^k) + (\lambda^k - \tilde\lambda^k)^T B(y^k - \tilde y^k). \tag{3.11} $$

Proof. From (3.7) and (3.9) we have

$$ (\tilde w^k - w^*)^T H M (w^k - \tilde w^k) \ge (\lambda^k - \tilde\lambda^k)^T B(y^k - \tilde y^k). $$

Assertion (3.10) follows from the last inequality and the definition of $\varphi(w^k, \tilde w^k)$ directly. $\Box$

3.2  The Primary Contraction Method

The primary contraction method uses $M(w^k - \tilde w^k)$ as the search direction with unit step length. In other words, the new iterate is given by

$$ w^{k+1} = w^k - M(w^k - \tilde w^k). \tag{3.12} $$

XII - 23

According to (3.6), it can be written as

$$ \begin{pmatrix} x^{k+1} \\ y^{k+1} \\ \lambda^{k+1} \end{pmatrix} = \begin{pmatrix} \tilde x^k + \tfrac{\beta}{r} A^T A (x^k - \tilde x^k) \\ \tilde y^k \\ \tilde\lambda^k \end{pmatrix}. \tag{3.13} $$

In the primary contraction method, only the $x$-part of the corrector differs from the predictor. In the method of Section 2, we need $rI \succeq \beta A^T A$; with the method in this section, we only need $r$ to satisfy the condition (3.2). In practical computation, we may try using the average of the eigenvalues of $\beta A^T A$.

Using (3.10), we have

$$ \|w^k - w^*\|_H^2 - \|w^{k+1} - w^*\|_H^2 = \|w^k - w^*\|_H^2 - \|(w^k - w^*) - M(w^k - \tilde w^k)\|_H^2 = 2 (w^k - w^*)^T H M (w^k - \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2 \ge 2 \varphi(w^k, \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2. \tag{3.14} $$
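
The correction step (3.13) is a one-line update of the $x$-part; a minimal sketch:

```python
import numpy as np

def primary_correction(x, x_tilde, y_tilde, lam_tilde, A, beta, r):
    """Correction step (3.13): only the x-part of the predictor is modified,
    y^{k+1} and lambda^{k+1} are taken directly from the predictor."""
    x_new = x_tilde + (beta / r) * (A.T @ (A @ (x - x_tilde)))
    return x_new, y_tilde, lam_tilde
```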

XII - 24

Because $(\tilde y^k, \tilde\lambda^k) = (y^{k+1}, \lambda^{k+1})$, the inequality (2.11) still holds, and thus

$$ (\lambda^k - \tilde\lambda^k)^T B(y^k - \tilde y^k) \ge 0. $$

Therefore, it follows from (3.14), (3.11) and the last inequality that

$$ \|w^k - w^*\|_H^2 - \|w^{k+1} - w^*\|_H^2 \ge 2 (w^k - \tilde w^k)^T H M (w^k - \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2. \tag{3.15} $$

Lemma 3.3. Under the condition (3.2), we have

$$ 2 (w^k - \tilde w^k)^T H M (w^k - \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2 \ge (1 - \nu^2)\, r \|x^k - \tilde x^k\|^2 + \beta \|B(y^k - \tilde y^k)\|^2 + \tfrac{1}{\beta} \|\lambda^k - \tilde\lambda^k\|^2. \tag{3.16} $$

Proof. First, we have

$$ 2 (w^k - \tilde w^k)^T H M (w^k - \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2 = (w^k - \tilde w^k)^T \big( M^T H + H M - M^T H M \big) (w^k - \tilde w^k). $$

XII - 25

By using the structure of the matrices $H$ and $M$ (see (3.5) and (3.6)), we obtain

$$ M^T H + H M - M^T H M = H - (I - M)^T H (I - M) = \begin{pmatrix} rI_{n_1} - r\big(\tfrac{\beta}{r} A^T A\big)^2 & 0 & 0 \\ 0 & \beta B^T B & 0 \\ 0 & 0 & \tfrac{1}{\beta} I_m \end{pmatrix}. $$

Therefore,

$$ 2 (w^k - \tilde w^k)^T H M (w^k - \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2 = \|w^k - \tilde w^k\|_H^2 - r\,\tfrac{\beta^2}{r^2} \|A^T A (x^k - \tilde x^k)\|^2. \tag{3.17} $$

Under the condition (3.2), we have

$$ \tfrac{\beta^2}{r^2} \|A^T A (x^k - \tilde x^k)\|^2 \le \nu^2 \|x^k - \tilde x^k\|^2. $$

Substituting this into (3.17), the assertion of the lemma is proved. $\Box$

Theorem 3.1. Let $\tilde w^k = (\tilde x^k, \tilde y^k, \tilde\lambda^k) \in \Omega$ be generated by (3.1) from the given

XII - 26

$w^k = (x^k, y^k, \lambda^k)$, and let the new iterate $w^{k+1}$ be given by (3.12). Then the sequence $\{w^k = (x^k, y^k, \lambda^k)\}$ generated by the primary contraction method satisfies

$$ \|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - (1 - \nu^2)\, \|w^k - \tilde w^k\|_H^2. \tag{3.18} $$

Proof. From (3.15) and (3.16) we obtain

$$ \|w^k - w^*\|_H^2 - \|w^{k+1} - w^*\|_H^2 \ge (1 - \nu^2)\, r \|x^k - \tilde x^k\|^2 + \beta \|B(y^k - \tilde y^k)\|^2 + \tfrac{1}{\beta} \|\lambda^k - \tilde\lambda^k\|^2 \ge (1 - \nu^2)\, \|w^k - \tilde w^k\|_H^2. $$

The assertion of this theorem is proved. $\Box$

Theorem 3.1 is essential for the convergence of the primary contraction method.

XII - 27

3.3  The General Contraction Method

For given $w^k$, we use

$$ w(\alpha) = w^k - \alpha M(w^k - \tilde w^k) \tag{3.19} $$

to define the $\alpha$-dependent new iterate. For any $w^* \in \Omega^*$, we define

$$ \vartheta(\alpha) := \|w^k - w^*\|_H^2 - \|w(\alpha) - w^*\|_H^2 \tag{3.20} $$

and

$$ q(\alpha) = 2\alpha\, \varphi(w^k, \tilde w^k) - \alpha^2 \|M(w^k - \tilde w^k)\|_H^2, \tag{3.21} $$

where $\varphi(w^k, \tilde w^k)$ is defined in (3.11).

Theorem 3.2. Let $w(\alpha)$ be defined by (3.19). For any $w^* \in \Omega^*$ and $\alpha \ge 0$, we have

$$ \vartheta(\alpha) \ge q(\alpha), \tag{3.22} $$

where $\vartheta(\alpha)$ and $q(\alpha)$ are defined in (3.20) and (3.21), respectively.

XII - 28

Proof. It follows from (3.19) and (3.20) that

$$ \vartheta(\alpha) = \|w^k - w^*\|_H^2 - \|(w^k - w^*) - \alpha M(w^k - \tilde w^k)\|_H^2 = 2\alpha\, (w^k - w^*)^T H M (w^k - \tilde w^k) - \alpha^2 \|M(w^k - \tilde w^k)\|_H^2. $$

By using (3.10) and the definition of $q(\alpha)$, the theorem is proved. $\Box$

Note that $q(\alpha)$ in (3.21) is a quadratic function of $\alpha$ and it reaches its maximum at

$$ \alpha_k^* = \frac{\varphi(w^k, \tilde w^k)}{\|M(w^k - \tilde w^k)\|_H^2}. \tag{3.23} $$

In practical computation, we use

$$ w^{k+1} = w^k - \gamma \alpha_k^* M(w^k - \tilde w^k) \tag{3.24} $$

to update the new iterate, where $\gamma \in [1, 2)$ is a relaxation factor. By using (3.20) and (3.22), we have

$$ \|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - q(\gamma \alpha_k^*). \tag{3.25} $$
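
A sketch of the correction step (3.24): it computes $\varphi(w^k, \tilde w^k)$ from (3.11), the $H$-norm of $M(w^k - \tilde w^k)$ with $M$ from (3.6) and $H$ from (3.5), and then the step length (3.23). All inputs are assumed to come from the prediction step (3.1).

```python
import numpy as np

def general_correction(x, y, lam, x_t, y_t, lam_t, A, B, beta, r, gamma=1.8):
    """Correction step (3.24): compute the optimal step length (3.23) and set
    w^{k+1} = w^k - gamma*alpha_k*M(w^k - wtilde^k), with gamma in [1, 2)."""
    dx, dy, dlam = x - x_t, y - y_t, lam - lam_t
    Bdy = B @ dy
    AtAdx = A.T @ (A @ dx)
    # phi(w^k, wtilde^k) from (3.11), with HM = diag(rI - beta*A^T A, beta*B^T B, I/beta)
    phi = dx @ (r * dx - beta * AtAdx) + beta * Bdy @ Bdy + dlam @ dlam / beta + dlam @ Bdy
    # ||M(w^k - wtilde^k)||_H^2 with M from (3.6) and H from (3.5)
    Mdx = dx - (beta / r) * AtAdx
    Md_H2 = r * Mdx @ Mdx + beta * Bdy @ Bdy + dlam @ dlam / beta
    if Md_H2 == 0.0:          # predictor equals the current point: already a solution
        return x, y, lam
    alpha = phi / Md_H2       # optimal step length (3.23)
    return x - gamma * alpha * Mdx, y - gamma * alpha * dy, lam - gamma * alpha * dlam
```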

XII - 29

Note that

$$ q(\gamma \alpha_k^*) = 2\gamma \alpha_k^*\, \varphi(w^k, \tilde w^k) - (\gamma \alpha_k^*)^2 \|M(w^k - \tilde w^k)\|_H^2. \tag{3.26} $$

Using (3.23) and (3.24), we obtain

$$ q(\gamma \alpha_k^*) = \gamma(2 - \gamma)(\alpha_k^*)^2 \|M(w^k - \tilde w^k)\|_H^2 = \frac{2 - \gamma}{\gamma}\, \|w^k - w^{k+1}\|_H^2, $$

and consequently it follows from (3.25) that

$$ \|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - \frac{2 - \gamma}{\gamma}\, \|w^k - w^{k+1}\|_H^2, \quad \forall\, w^* \in \Omega^*. \tag{3.27} $$

On the other hand, it follows from (3.23) and (3.26) that

$$ q(\gamma \alpha_k^*) = \gamma(2 - \gamma)\, \alpha_k^*\, \varphi(w^k, \tilde w^k). \tag{3.28} $$

XII - 30

By using (3.11) and (3.16), we obtain

$$ \begin{aligned} 2\varphi(w^k, \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2 &= 2 (w^k - \tilde w^k)^T H M (w^k - \tilde w^k) - \|M(w^k - \tilde w^k)\|_H^2 + 2 (\lambda^k - \tilde\lambda^k)^T B(y^k - \tilde y^k) \\ &\ge (1 - \nu^2)\, r \|x^k - \tilde x^k\|^2 + \beta \|B(y^k - \tilde y^k)\|^2 + \tfrac{1}{\beta} \|\lambda^k - \tilde\lambda^k\|^2 + 2 (\lambda^k - \tilde\lambda^k)^T B(y^k - \tilde y^k) \\ &= (1 - \nu^2)\, r \|x^k - \tilde x^k\|^2 + \beta \big\| B(y^k - \tilde y^k) + \tfrac{1}{\beta}(\lambda^k - \tilde\lambda^k) \big\|^2. \end{aligned} $$

Thus, we have $2\varphi(w^k, \tilde w^k) \ge \|M(w^k - \tilde w^k)\|_H^2$ and consequently $\alpha_k^* \ge \tfrac{1}{2}$.

XII - 31

In addition, we have

$$ \begin{aligned} \varphi(w^k, \tilde w^k) &= (w^k - \tilde w^k)^T H M (w^k - \tilde w^k) + (\lambda^k - \tilde\lambda^k)^T B(y^k - \tilde y^k) \\ &= \begin{pmatrix} x^k - \tilde x^k \\ y^k - \tilde y^k \\ \lambda^k - \tilde\lambda^k \end{pmatrix}^T \begin{pmatrix} rI_{n_1} - \beta A^T A & 0 & 0 \\ 0 & \beta B^T B & \tfrac{1}{2} B^T \\ 0 & \tfrac{1}{2} B & \tfrac{1}{\beta} I_m \end{pmatrix} \begin{pmatrix} x^k - \tilde x^k \\ y^k - \tilde y^k \\ \lambda^k - \tilde\lambda^k \end{pmatrix} \\ &\ge (x^k - \tilde x^k)^T \big( r (x^k - \tilde x^k) - \beta A^T A (x^k - \tilde x^k) \big) + \frac{1}{2} \begin{pmatrix} y^k - \tilde y^k \\ \lambda^k - \tilde\lambda^k \end{pmatrix}^T \begin{pmatrix} \beta B^T B & 0 \\ 0 & \tfrac{1}{\beta} I_m \end{pmatrix} \begin{pmatrix} y^k - \tilde y^k \\ \lambda^k - \tilde\lambda^k \end{pmatrix}. \end{aligned} $$

Under the condition (3.2), it follows from the last inequality that

$$ \varphi(w^k, \tilde w^k) \ge \min\big\{ (1 - \nu),\ \tfrac{1}{2} \big\}\, \|w^k - \tilde w^k\|_H^2 \ge \frac{1 - \nu}{2}\, \|w^k - \tilde w^k\|_H^2. \tag{3.29} $$

XII - 32

By using (3.25), (3.28), (3.29) and $\alpha_k^* \ge \tfrac{1}{2}$, we obtain the following theorem for the general contraction method.

Theorem 3.3. The sequence $\{w^k = (x^k, y^k, \lambda^k)\}$ generated by the general contraction method satisfies

$$ \|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - \frac{\gamma(2 - \gamma)(1 - \nu)}{4}\, \|w^k - \tilde w^k\|_H^2, \quad \forall\, w^* \in \Omega^*. \tag{3.30} $$

The inequality (3.30) in Theorem 3.3 is essential for the convergence of the general contraction method. Both inequalities (3.18) and (3.30) can be written as

$$ \|w^{k+1} - w^*\|_H^2 \le \|w^k - w^*\|_H^2 - c_0\, \|w^k - \tilde w^k\|_H^2, \quad \forall\, w^* \in \Omega^*, $$

XII - 33

where $c_0 > 0$ is a constant. Therefore, we have

$$ r\|x^{k+1} - x^*\|^2 + \beta \|B(y^{k+1} - y^*)\|^2 + \tfrac{1}{\beta}\|\lambda^{k+1} - \lambda^*\|^2 \;\le\; r\|x^k - x^*\|^2 + \beta \|B(y^k - y^*)\|^2 + \tfrac{1}{\beta}\|\lambda^k - \lambda^*\|^2 - c_0 \Big( r\|x^k - \tilde x^k\|^2 + \beta \|B(y^k - \tilde y^k)\|^2 + \tfrac{1}{\beta}\|\lambda^k - \tilde\lambda^k\|^2 \Big). $$

It leads to

$$ \lim_{k\to\infty} \Big( r\|x^k - \tilde x^k\|^2 + \beta \|B(y^k - \tilde y^k)\|^2 + \tfrac{1}{\beta}\|\lambda^k - \tilde\lambda^k\|^2 \Big) = 0 $$

and

$$ \lim_{k\to\infty} x^k = x^*, \qquad \lim_{k\to\infty} By^k = By^* \qquad \text{and} \qquad \lim_{k\to\infty} \lambda^k = \lambda^*. $$

XII - 34

4  Applications to $\ell_1$-norm problems

An important $\ell_1$-norm problem in the area of machine learning is $\ell_1$-regularized linear regression, also called the lasso [8]. It involves solving

$$ \min\; \tau \|x\|_1 + \tfrac{1}{2}\|Ax - b\|_2^2, \tag{4.1} $$

where $\tau > 0$ is a scalar regularization parameter that is usually chosen by cross-validation. In typical applications, there are many more features than training examples, and the goal is to find a parsimonious model for the data. The problem (4.1) can be reformulated as

$$ \min \big\{\, \tau \|x\|_1 + \tfrac{1}{2}\|y\|_2^2 \;\big|\; Ax - y = b \,\big\}, \tag{4.2} $$

which is of the form (1.1). Applying the alternating direction method (1.4) to the problem (4.2), the $x$-subproblem is

$$ x^{k+1} = \arg\min \Big\{\, \tau \|x\|_1 + \tfrac{\beta}{2}\big\| (Ax - y^k - b) - \tfrac{1}{\beta}\lambda^k \big\|_2^2 \,\Big\}, $$

XII - 35

and its solution does not have a closed form.

Applying the linearized alternating direction method (2.1) to the problem (4.2), the $x$-subproblem (2.1a) is

$$ x^{k+1} = \arg\min \Big\{\, \tau \|x\|_1 + \tfrac{r}{2}\big\| x - \big[\, x^k + \tfrac{1}{r} A^T\lambda^k - \tfrac{\beta}{r} A^T(Ax^k - y^k - b) \,\big] \big\|^2 \,\Big\}. \tag{4.3} $$

This problem is of the form (1.5) and its solution has the following closed form:

$$ x^{k+1} = a - P_{B_{\tau/r}}[a], \qquad \text{where } a = x^k + \tfrac{1}{r} A^T\lambda^k - \tfrac{\beta}{r} A^T(Ax^k - y^k - b) $$

and $B_{\tau/r} = \{ \xi \in \mathbb{R}^n \mid -(\tau/r)e \le \xi \le (\tau/r)e \}$.

By using the linearized alternating direction method in Section 2, for given $\beta > 0$, we need $r \ge \beta\,\lambda_{\max}(A^T A)$. By using the self-adaptive ADM-based contraction method in Section 3, we only need $r$ to satisfy

$$ \beta \|A^T A (x^k - \tilde x^k)\| \le \nu r \|x^k - \tilde x^k\|, \qquad \nu \in (0, 1). $$

Because $A$ is a generic matrix, this condition can be satisfied even if $r$ is much less than $\beta\,\lambda_{\max}(A^T A)$.
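
A minimal sketch of the resulting linearized-ADM iteration for the lasso reformulation (4.2), written for the constraint $Ax - y = b$ (so $B = -I$ and $\theta_2(y) = \tfrac12\|y\|^2$); the $x$-update is the soft-thresholding (shrinkage) form of (4.3):

```python
import numpy as np

def shrink(a, kappa):
    """Soft-thresholding: argmin_x kappa*||x||_1 + 0.5*||x - a||^2,
    i.e. a - P_{B_kappa}[a] with B_kappa the l_inf ball of radius kappa."""
    return np.sign(a) * np.maximum(np.abs(a) - kappa, 0.0)

def linearized_adm_lasso_step(x, y, lam, A, b, tau, beta, r):
    """One linearized-ADM iteration (2.1) for the lasso reformulation (4.2),
    with constraint A x - y = b (so B = -I, theta_2(y) = 0.5*||y||^2)."""
    # x-subproblem (4.3): prox of (tau/r)*||.||_1 at the point a
    a = x + (A.T @ lam) / r - (beta / r) * (A.T @ (A @ x - y - b))
    x_new = shrink(a, tau / r)
    # y-subproblem: min 0.5*||y||^2 + (beta/2)*||A x_new - y - b - lam/beta||^2
    y_new = beta * (A @ x_new - b - lam / beta) / (1.0 + beta)
    # multiplier update with B = -I
    lam_new = lam - beta * (A @ x_new - y_new - b)
    return x_new, y_new, lam_new
```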

XII - 36

References

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
[2] D. Gabay, Applications of the method of multipliers to variational inequalities, in Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, M. Fortin and R. Glowinski, eds., North-Holland, Amsterdam, The Netherlands, 1983.
[3] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximations, Computers and Mathematics with Applications, 2:17-40, 1976.
[4] R. Glowinski, Numerical Methods for Nonlinear Variational Problems, Springer-Verlag, New York, Berlin, Heidelberg, Tokyo, 1984.
[5] B.S. He, A class of projection and contraction methods for monotone variational inequalities, Applied Mathematics and Optimization, 35:69-76, 1997.
[6] B.S. He, Inexact implicit methods for monotone general variational inequalities, Mathematical Programming, Series A, 86, 1999.
[7] B.S. He, L.-Z. Liao, D.R. Han and H. Yang, A new inexact alternating directions method for monotone variational inequalities, Mathematical Programming, 92, 2002.
[8] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society, Series B, 58, 1996.
[9] X.Q. Zhang, M. Burger, X. Bresson and S. Osher, Bregmanized nonlocal regularization for deconvolution and sparse reconstruction, SIAM Journal on Imaging Sciences, 3, 2010.
[10] X.Q. Zhang, M. Burger and S. Osher, A unified primal-dual algorithm framework based on Bregman iteration, Journal of Scientific Computing, 46:20-46, 2010.
