Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.18


XVIII - 1

Contraction Methods for Convex Optimization and Monotone Variational Inequalities, No. 18

Linearized alternating direction method with Gaussian back substitution for separable convex optimization

Bingsheng He
Department of Mathematics, Nanjing University
hebma@nju.edu.cn

The content of this lecture is based on the publication [3].

XVIII - 2

1  Introduction

In this lecture, we consider the general case of linearly constrained separable convex programming with m ≥ 3:

    min { Σ_{i=1}^m θ_i(x_i) | Σ_{i=1}^m A_i x_i = b; x_i ∈ X_i, i = 1, ..., m },   (1.1)

where θ_i : R^{n_i} → R (i = 1, ..., m) are closed proper convex functions (not necessarily smooth), X_i ⊂ R^{n_i} (i = 1, ..., m) are closed convex sets, A_i ∈ R^{l×n_i} (i = 1, ..., m) are given matrices, and b ∈ R^l is a given vector. Throughout, we assume that the solution set of (1.1) is nonempty.

In fact, even for the special case of (1.1) with m = 3, the convergence of the extended ADM is still open. In the last lecture, we provided a novel approach towards the extension of ADM for the problem (1.1). More specifically, we showed that if a new iterate is generated by correcting the output of the ADM with a Gaussian back substitution procedure, then the

XVIII - 3

sequence of iterates is convergent to a solution of (1.1). The resulting method is called the ADM with Gaussian back substitution (ADM-GbS). Alternatively, the ADM-GbS can be regarded as a prediction-correction type method whose predictor is generated by the ADM procedure and whose correction is completed by a Gaussian back substitution procedure. The main task of each iteration in ADM-GbS is to solve the subproblems

    min { θ_i(x_i) + (β/2) ‖A_i x_i − b_i‖^2 | x_i ∈ X_i },   i = 1, ..., m.   (1.2)

Thus, ADM-GbS is implementable only when the subproblems (1.2) have closed-form solutions. Again, each iteration of the method proposed in this lecture consists of two steps: prediction and correction. In order to implement the prediction step, we only assume that the x_i-subproblem

    min { θ_i(x_i) + (r_i/2) ‖x_i − a_i‖^2 | x_i ∈ X_i },   i = 1, ..., m,   (1.3)

has a closed-form solution. We derive the first-order optimality condition of (1.1) and thus characterize (1.1) by a variational inequality (VI). As we will show, the VI characterization is convenient for the convergence analysis to be conducted.
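
To make the standing assumption concrete, here is a minimal sketch (not part of the slides) of one case in which the x_i-subproblem (1.3) has a closed-form solution: the illustrative choices θ_i = τ‖·‖_1 and X_i = R^{n_i}, for which the minimizer is componentwise soft-thresholding.

```python
import numpy as np

def prox_l1(a, tau, r):
    """Closed-form minimizer of  tau*||x||_1 + (r/2)*||x - a||^2  over x in R^n:
    componentwise soft-thresholding with threshold tau/r."""
    return np.sign(a) * np.maximum(np.abs(a) - tau / r, 0.0)

# example call (illustrative data only)
a = np.array([1.5, -0.2, 0.05, -3.0])
print(prox_l1(a, tau=0.5, r=1.0))
```

Any other θ_i with an inexpensive proximal operator (e.g. the nuclear norm via a singular value decomposition) would serve the same purpose; the method itself does not depend on this particular choice.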

XVIII - 4

By attaching a Lagrange multiplier vector λ ∈ R^l to the linear constraint, the Lagrange function of (1.1) is

    L(x_1, x_2, ..., x_m, λ) = Σ_{i=1}^m θ_i(x_i) − λ^T ( Σ_{i=1}^m A_i x_i − b ),   (1.4)

which is defined on W := X_1 × X_2 × ... × X_m × R^l. Let (x_1*, x_2*, ..., x_m*, λ*) be a saddle point of the Lagrange function (1.4). Then we have

    L_{λ ∈ R^l}(x_1*, x_2*, ..., x_m*, λ) ≤ L(x_1*, x_2*, ..., x_m*, λ*) ≤ L_{x_i ∈ X_i (i=1,...,m)}(x_1, x_2, ..., x_m, λ*).

For i ∈ {1, 2, ..., m}, we denote by ∂θ_i(x_i) the subdifferential of the convex function θ_i(x_i) and by f_i(x_i) ∈ ∂θ_i(x_i) a given subgradient of θ_i(x_i). It is evident that finding a saddle point of L(x_1, x_2, ..., x_m, λ) is equivalent to finding

XVIII - 5

w* = (x_1*, x_2*, ..., x_m*, λ*) ∈ W such that

    (x_1 − x_1*)^T { f_1(x_1*) − A_1^T λ* } ≥ 0,
        ⋮
    (x_m − x_m*)^T { f_m(x_m*) − A_m^T λ* } ≥ 0,
    (λ − λ*)^T ( Σ_{i=1}^m A_i x_i* − b ) ≥ 0,
                                                         (1.5)

for all w = (x_1, x_2, ..., x_m, λ) ∈ W. More compactly, (1.5) can be written as

    (w − w*)^T F(w*) ≥ 0,   ∀ w ∈ W,   (1.6a)

where

    w = (x_1; ...; x_m; λ)   and   F(w) = ( f_1(x_1) − A_1^T λ; ...; f_m(x_m) − A_m^T λ; Σ_{i=1}^m A_i x_i − b ).   (1.6b)

Note that the operator F(w) defined in (1.6b) is monotone, due to the fact that the θ_i's are all convex functions. In addition, the solution set of (1.6), denoted by W*, is also nonempty.
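
The monotonicity of F can also be checked numerically on a smooth instance. The sketch below uses illustrative choices only (m = 3, θ_i(x_i) = ½‖x_i − c_i‖^2 so that f_i(x_i) = x_i − c_i, random A_i and b): it assembles F(w) as in (1.6b) and verifies (F(w) − F(w'))^T (w − w') ≥ 0 at random points.

```python
import numpy as np

rng = np.random.default_rng(0)
m, l, n = 3, 4, 5                       # hypothetical sizes: 3 blocks, A_i in R^{4x5}
A = [rng.standard_normal((l, n)) for _ in range(m)]
c = [rng.standard_normal(n) for _ in range(m)]
b = rng.standard_normal(l)

def F(w):
    """VI mapping (1.6b) for the smooth choice theta_i(x_i) = 0.5*||x_i - c_i||^2,
    so that f_i(x_i) = x_i - c_i."""
    xs, lam = w[:m], w[m]
    rows = [xs[i] - c[i] - A[i].T @ lam for i in range(m)]
    rows.append(sum(A[i] @ xs[i] for i in range(m)) - b)
    return np.concatenate(rows)

def pack(w):
    return np.concatenate(w[:m] + [w[m]])

# monotonicity check: (F(w) - F(w'))^T (w - w') >= 0 for random w, w'
for _ in range(5):
    w1 = [rng.standard_normal(n) for _ in range(m)] + [rng.standard_normal(l)]
    w2 = [rng.standard_normal(n) for _ in range(m)] + [rng.standard_normal(l)]
    gap = (F(w1) - F(w2)) @ (pack(w1) - pack(w2))
    print(gap >= -1e-12)   # True: F is monotone
```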

XVIII - 6

2  Linearized ADM with Gaussian back substitution

2.1  Linearized ADM prediction step

Step 1. ADM step (prediction step). Obtain w̃^k = (x̃_1^k, x̃_2^k, ..., x̃_m^k, λ̃^k) in the forward (alternating) order by the following ADM procedure:

    x̃_1^k = arg min { θ_1(x_1) + q_1^T A_1 x_1 + (r_1/2) ‖x_1 − x_1^k‖^2 | x_1 ∈ X_1 },
        ⋮
    x̃_i^k = arg min { θ_i(x_i) + q_i^T A_i x_i + (r_i/2) ‖x_i − x_i^k‖^2 | x_i ∈ X_i },
        ⋮
    x̃_m^k = arg min { θ_m(x_m) + q_m^T A_m x_m + (r_m/2) ‖x_m − x_m^k‖^2 | x_m ∈ X_m },
    λ̃^k = λ^k − β ( Σ_{j=1}^m A_j x̃_j^k − b ),
                                                                                   (2.1)

where

    q_i = β ( Σ_{j=1}^{i−1} A_j x̃_j^k + Σ_{j=i}^m A_j x_j^k − b ) − λ^k.
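
A minimal sketch of one prediction sweep (2.1), under the illustrative assumptions X_i = R^{n_i} and θ_i = τ_i‖·‖_1 (so that each subproblem is solved by soft-thresholding); the list-of-arrays storage for the block variables is a hypothetical layout, not prescribed by the slides.

```python
import numpy as np

def soft(a, t):
    # prox of t*||.||_1: componentwise soft-thresholding
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def prediction_step(x, lam, A, b, beta, r, tau):
    """One forward sweep of the linearized ADM prediction (2.1), assuming
    X_i = R^{n_i} and theta_i = tau_i*||.||_1 (illustrative choices only)."""
    m = len(A)
    x_t = [xi.copy() for xi in x]          # x_t[i] will hold the predictor \tilde x_i^k
    for i in range(m):
        # residual uses already-updated blocks j < i and old blocks j >= i
        res = sum(A[j] @ x_t[j] for j in range(i)) \
            + sum(A[j] @ x[j] for j in range(i, m)) - b
        q_i = beta * res - lam
        # argmin theta_i(x_i) + q_i^T A_i x_i + (r_i/2)||x_i - x_i^k||^2
        x_t[i] = soft(x[i] - A[i].T @ q_i / r[i], tau[i] / r[i])
    lam_t = lam - beta * (sum(A[j] @ x_t[j] for j in range(m)) - b)
    return x_t, lam_t
```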

XVIII - 7

The prediction step is implementable due to the assumption (1.3) of this lecture and the identity

    arg min { θ_i(x_i) + q_i^T A_i x_i + (r_i/2) ‖x_i − x_i^k‖^2 | x_i ∈ X_i }
      = arg min { θ_i(x_i) + (r_i/2) ‖x_i − ( x_i^k − (1/r_i) A_i^T q_i )‖^2 | x_i ∈ X_i }.

Assumption: r_i (i = 1, ..., m) is chosen such that the condition

    r_i ‖x_i^k − x̃_i^k‖^2 ≥ β ‖A_i (x_i^k − x̃_i^k)‖^2   (2.2)

is satisfied in each iteration.

In the case that A_i = I_{n_i}, we take r_i = β, and the condition (2.2) is satisfied. Note that in this case we have

    arg min_{x_i ∈ X_i} { θ_i(x_i) + [ β ( Σ_{j=1}^{i−1} A_j x̃_j^k + Σ_{j=i}^m A_j x_j^k − b ) − λ^k ]^T A_i x_i + (β/2) ‖x_i − x_i^k‖^2 }
      = arg min_{x_i ∈ X_i} { θ_i(x_i) + (β/2) ‖ Σ_{j=1}^{i−1} A_j x̃_j^k + A_i x_i + Σ_{j=i+1}^m A_j x_j^k − b − (1/β) λ^k ‖^2 },

that is, for A_i = I_{n_i} and r_i = β the linearized subproblem coincides with the original ADM subproblem.
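
Condition (2.2) is iterate-dependent. A simple iterate-independent sufficient choice, not stated on the slide but immediate from ‖A_i d‖ ≤ ‖A_i‖_2 ‖d‖, is r_i ≥ β ‖A_i‖_2^2 (spectral norm); then r_i ‖d‖^2 ≥ β ‖A_i d‖^2 for every d, so (2.2) holds at every iteration. A sketch:

```python
import numpy as np

def choose_r(A_blocks, beta, safety=1.0):
    """Iterate-independent choice r_i = safety * beta * ||A_i||_2^2, which
    guarantees condition (2.2) for every direction d."""
    return [safety * beta * np.linalg.norm(Ai, 2) ** 2 for Ai in A_blocks]

# quick numerical check of (2.2) on random directions (illustrative sizes)
rng = np.random.default_rng(1)
A = [rng.standard_normal((4, 6)) for _ in range(3)]
beta = 0.7
r = choose_r(A, beta)
for Ai, ri in zip(A, r):
    d = rng.standard_normal(6)
    assert ri * (d @ d) >= beta * np.linalg.norm(Ai @ d) ** 2 - 1e-10
```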

XVIII - 8

2.2  Correction by the Gaussian back substitution

To present the Gaussian back substitution procedure, we define the matrices

    M = [ r_1 I_{n_1}      0            ...        0              0
          βA_2^T A_1     r_2 I_{n_2}    ...        0              0
             ⋮               ⋮           ⋱         ⋮              ⋮
          βA_m^T A_1      ...     βA_m^T A_{m−1}   r_m I_{n_m}    0
          0                0            ...        0         (1/β) I_l ]   (2.3)

and

    H = diag( r_1 I_{n_1}, r_2 I_{n_2}, ..., r_m I_{n_m}, (1/β) I_l ).   (2.4)

Note that for β > 0 and r_i > 0, the matrix M defined in (2.3) is a non-singular

XVIII - 9

lower-triangular block matrix. In addition, according to (2.3) and (2.4), we have

    H^{−1} M^T =
      [ I_{n_1}   (β/r_1) A_1^T A_2   ...    (β/r_1) A_1^T A_m             0
        0          I_{n_2}            ...    (β/r_2) A_2^T A_m             0
        ⋮             ⋮                ⋱           ⋮                       ⋮
        0          0       ...   I_{n_{m−1}}  (β/r_{m−1}) A_{m−1}^T A_m    0
        0          0       ...        0        I_{n_m}                     0
        0          0       ...        0        0                          I_l ],   (2.5)

which is an upper-triangular block matrix whose diagonal blocks are identity matrices. The Gaussian back substitution procedure to be proposed is based on the matrix H^{−1} M^T defined in (2.5).
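
The block structure of (2.3)–(2.5) can be checked directly. A small numpy sketch (hypothetical sizes) assembling M and H and verifying that H^{−1}M^T is unit upper triangular:

```python
import numpy as np
from scipy.linalg import block_diag

def build_M_H(A, r, beta):
    """Assemble the block matrices M (2.3) and H (2.4) for given A_1,...,A_m."""
    m = len(A)
    l = A[0].shape[0]
    n = [Ai.shape[1] for Ai in A]
    blocks = [[np.zeros((n[i], n[j])) for j in range(m)] + [np.zeros((n[i], l))]
              for i in range(m)]
    for i in range(m):
        blocks[i][i] = r[i] * np.eye(n[i])
        for j in range(i):
            blocks[i][j] = beta * A[i].T @ A[j]
    blocks.append([np.zeros((l, n[j])) for j in range(m)] + [np.eye(l) / beta])
    M = np.block(blocks)
    H = block_diag(*([ri * np.eye(ni) for ri, ni in zip(r, n)] + [np.eye(l) / beta]))
    return M, H

rng = np.random.default_rng(2)
A = [rng.standard_normal((3, 2)) for _ in range(3)]
M, H = build_M_H(A, r=[2.0, 2.0, 2.0], beta=1.0)
U = np.linalg.solve(H, M.T)                    # H^{-1} M^T, cf. (2.5)
print(np.allclose(np.tril(U, -1), 0))          # strictly lower part vanishes
print(np.allclose(np.diag(U), 1.0))            # unit diagonal
```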

XVIII - 10

Step 2. Gaussian back substitution step (correction step). Correct the ADM output w̃^k in the backward order by the following Gaussian back substitution procedure and generate the new iterate w^{k+1}:

    H^{−1} M^T ( w^{k+1} − w^k ) = α ( w̃^k − w^k ).   (2.6)

Recall that the matrix H^{−1} M^T defined in (2.5) is an upper-triangular block matrix. The Gaussian back substitution step (2.6) is thus very easy to execute. In fact, as we mentioned, after the predictor is generated by the linearized ADM scheme (2.1) in the forward (alternating) order, the proposed Gaussian back substitution step corrects the predictor in the backward order. Since the Gaussian back substitution step is easy to perform, the computation of each iteration of the ADM with Gaussian back substitution is dominated by the ADM procedure (2.1).

To show the main idea with clearer notation, we restrict our theoretical discussion to the case with fixed β > 0. The Gaussian back substitution step (2.6) can be rewritten as

    w^{k+1} = w^k − α M^{−T} H ( w^k − w̃^k ).   (2.7)

As we will show, −M^{−T} H (w^k − w̃^k) is a descent direction of the distance function
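
Because H^{−1}M^T is unit upper triangular by blocks, (2.6) can be solved by a backward sweep over the blocks (λ and x_m have identity rows, then x_{m−1} down to x_1 use the already-computed blocks). A minimal sketch, with the same hypothetical list-of-arrays layout as in the prediction sketch:

```python
import numpy as np

def back_substitution(x, lam, x_t, lam_t, A, r, beta, alpha):
    """Gaussian back substitution (2.6): solve H^{-1}M^T (w^{k+1}-w^k) = alpha*(wt^k - w^k)
    block by block in the backward order."""
    m = len(A)
    d_lam = alpha * (lam_t - lam)                 # lambda row of (2.5) is the identity
    d_x = [None] * m
    d_x[m - 1] = alpha * (x_t[m - 1] - x[m - 1])  # row m has no coupling to later blocks
    for i in range(m - 2, -1, -1):
        coupling = sum(A[j] @ d_x[j] for j in range(i + 1, m))
        d_x[i] = alpha * (x_t[i] - x[i]) - (beta / r[i]) * (A[i].T @ coupling)
    x_new = [x[i] + d_x[i] for i in range(m)]
    lam_new = lam + d_lam
    return x_new, lam_new
```

Note that only matrix-vector products with A_j and A_i^T are needed, which is why the cost per iteration is dominated by the prediction step (2.1).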

XVIII - 11

(1/2) ‖w − w*‖_G^2 with G = M H^{−1} M^T at the point w = w^k, for any w* ∈ W*. In this sense, the proposed linearized ADM with Gaussian back substitution can also be regarded as an ADM-based contraction method in which the output of the linearized ADM scheme (2.1) contributes a descent direction of the distance function. Thus, the constant α in (2.6) plays the role of a step size along this descent direction. In fact, we can choose the step size dynamically based on some techniques in the literature (e.g. [4]), and the Gaussian back substitution procedure with the constant α can be modified accordingly into the following variant with a dynamical step size:

    H^{−1} M^T ( w^{k+1} − w^k ) = γ α_k* ( w̃^k − w^k ),   (2.8)

where

    α_k* = ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ) / ( 2 ‖w^k − w̃^k‖_H^2 ),   (2.9)

XVIII - 12

    Q = [ βA_1^T A_1   βA_1^T A_2   ...   βA_1^T A_m   A_1^T
          βA_2^T A_1   βA_2^T A_2   ...   βA_2^T A_m   A_2^T
             ⋮             ⋮         ⋱        ⋮          ⋮
          βA_m^T A_1   βA_m^T A_2   ...   βA_m^T A_m   A_m^T
          A_1          A_2          ...   A_m        (1/β) I_l ],   (2.10)

and γ ∈ (0, 2). Indeed, for any β > 0, the symmetric matrix Q is positive semi-definite. Then, for given w^k and the w̃^k obtained by the ADM procedure (2.1), we have

    ‖w^k − w̃^k‖_H^2 = Σ_{i=1}^m r_i ‖x_i^k − x̃_i^k‖^2 + (1/β) ‖λ^k − λ̃^k‖^2

and

    ‖w^k − w̃^k‖_Q^2 = β ‖ Σ_{i=1}^m A_i (x_i^k − x̃_i^k) + (1/β) (λ^k − λ̃^k) ‖^2,

where the norm ‖w‖_H (‖w‖_Q, respectively) is defined by ‖w‖_H^2 = w^T H w (‖w‖_Q^2 = w^T Q w, respectively). Note that the step size α_k* defined in (2.9) satisfies α_k* ≥ 1/2.
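
A sketch of the step-size computation (2.9), using the two explicit norm formulas displayed above (same hypothetical list-of-arrays layout as in the earlier sketches):

```python
import numpy as np

def step_size(x, lam, x_t, lam_t, A, r, beta):
    """Dynamic step size alpha_k^* of (2.9), via the explicit formulas for
    ||w^k - wt^k||_H^2 and ||w^k - wt^k||_Q^2 given on this slide."""
    m = len(A)
    dx = [x[i] - x_t[i] for i in range(m)]
    dlam = lam - lam_t
    norm_H = sum(r[i] * (dx[i] @ dx[i]) for i in range(m)) + (dlam @ dlam) / beta
    s = sum(A[i] @ dx[i] for i in range(m)) + dlam / beta
    norm_Q = beta * (s @ s)
    return (norm_H + norm_Q) / (2.0 * norm_H)   # always >= 1/2
```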

XVIII - 13

3  Convergence of the Linearized ADM-GbS

In this section, we prove the convergence of the proposed linearized ADM with Gaussian back substitution for solving (1.1). Our proof follows the analytic framework of contractive type methods. Accordingly, we divide this section into three subsections.

3.1  Verification of the descent directions

In this subsection, we mainly show that −M^{−T} H (w^k − w̃^k), generated from (w^k − w̃^k), is a descent direction of the function (1/2) ‖w − w*‖_G^2 at the point w = w^k whenever w^k ≠ w̃^k, where w̃^k is generated by the ADM scheme (2.1), w* ∈ W* and G is a positive definite matrix.

Lemma 3.1. Let w̃^k = (x̃_1^k, ..., x̃_m^k, λ̃^k) be generated by the linearized ADM step (2.1) from the given vector w^k = (x_1^k, ..., x_m^k, λ^k). Then we have w̃^k ∈ W and

    (w − w̃^k)^T { d_2(w^k, w̃^k) − d_1(w^k, w̃^k) } ≥ 0,   ∀ w ∈ W,   (3.1)

XVIII - 14

where

    d_1(w^k, w̃^k) =
      [ r_1 I_{n_1}      0            ...        0              0
        βA_2^T A_1     r_2 I_{n_2}    ...        0              0
           ⋮               ⋮           ⋱         ⋮              ⋮
        βA_m^T A_1      ...     βA_m^T A_{m−1}   r_m I_{n_m}    0
        0                0            ...        0         (1/β) I_l ]
      ( x_1^k − x̃_1^k; x_2^k − x̃_2^k; ...; x_m^k − x̃_m^k; λ^k − λ̃^k ),   (3.2)

    d_2(w^k, w̃^k) = F(w̃^k) + β ( A_1^T; A_2^T; ...; A_m^T; 0 ) ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ).   (3.3)

XVIII - 15

Proof. Since x̃_i^k is the solution of the i-th subproblem in (2.1), for i = 1, 2, ..., m, according to the optimality condition, we have x̃_i^k ∈ X_i and

    (x_i − x̃_i^k)^T { f_i(x̃_i^k) − A_i^T [ λ^k − β ( Σ_{j=1}^{i−1} A_j x̃_j^k + Σ_{j=i}^m A_j x_j^k − b ) ] + r_i ( x̃_i^k − x_i^k ) } ≥ 0,   ∀ x_i ∈ X_i.   (3.4)

By using the fact

    λ̃^k = λ^k − β ( Σ_{j=1}^m A_j x̃_j^k − b ),

the inequality (3.4) can be written as x̃_i^k ∈ X_i and

    (x_i − x̃_i^k)^T { f_i(x̃_i^k) − A_i^T λ̃^k + β A_i^T ( Σ_{j=i}^m A_j (x_j^k − x̃_j^k) ) + r_i ( x̃_i^k − x_i^k ) } ≥ 0,   ∀ x_i ∈ X_i.   (3.5)

XVIII - 16

Summing the inequality (3.5) over i = 1, ..., m, we obtain x̃^k ∈ X and

    Σ_{i=1}^m (x_i − x̃_i^k)^T { f_i(x̃_i^k) − A_i^T λ̃^k + β A_i^T ( Σ_{j=i}^m A_j (x_j^k − x̃_j^k) ) }
      ≥ Σ_{i=1}^m (x_i − x̃_i^k)^T r_i ( x_i^k − x̃_i^k )   (3.6)

XVIII - 17

for all x ∈ X. Adding the term

    β Σ_{i=2}^m (x_i − x̃_i^k)^T A_i^T ( Σ_{j=1}^{i−1} A_j (x_j^k − x̃_j^k) )

to both sides of (3.6), we get x̃^k ∈ X and, for all x ∈ X,

    Σ_{i=1}^m (x_i − x̃_i^k)^T { f_i(x̃_i^k) − A_i^T λ̃^k + β A_i^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ) }
      ≥ Σ_{i=1}^m (x_i − x̃_i^k)^T { r_i (x_i^k − x̃_i^k) + β A_i^T ( Σ_{j=1}^{i−1} A_j (x_j^k − x̃_j^k) ) }.   (3.7)

XVIII - 18

Because Σ_{j=1}^m A_j x̃_j^k − b = (1/β)(λ^k − λ̃^k), we have

    (λ − λ̃^k)^T ( Σ_{j=1}^m A_j x̃_j^k − b ) = (λ − λ̃^k)^T (1/β) (λ^k − λ̃^k).

Adding (3.7) and the last equality together, we get w̃^k ∈ W and, for all w ∈ W,

    Σ_{i=1}^m (x_i − x̃_i^k)^T { f_i(x̃_i^k) − A_i^T λ̃^k + β A_i^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ) } + (λ − λ̃^k)^T ( Σ_{j=1}^m A_j x̃_j^k − b )
      ≥ Σ_{i=1}^m (x_i − x̃_i^k)^T { r_i (x_i^k − x̃_i^k) + β A_i^T ( Σ_{j=1}^{i−1} A_j (x_j^k − x̃_j^k) ) } + (λ − λ̃^k)^T (1/β) (λ^k − λ̃^k).   (3.8)

Using the notation of d_1(w^k, w̃^k) and d_2(w^k, w̃^k), this is exactly (3.1), and the assertion is proved. □

XVIII - 19

Lemma 3.2. Let w̃^k = (x̃_1^k, x̃_2^k, ..., x̃_m^k, λ̃^k) be generated by the ADM step (2.1) from the given vector w^k = (x_1^k, ..., x_m^k, λ^k). Then we have

    (w̃^k − w*)^T d_1(w^k, w̃^k) ≥ (λ^k − λ̃^k)^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ),   ∀ w* ∈ W*,   (3.9)

where d_1(w^k, w̃^k) is defined in (3.2).

Proof. Since w* ∈ W, it follows from (3.1) that

    (w̃^k − w*)^T d_1(w^k, w̃^k) ≥ (w̃^k − w*)^T d_2(w^k, w̃^k).   (3.10)

We consider the right-hand side of (3.10). By using (3.3), we get

    (w̃^k − w*)^T d_2(w^k, w̃^k) = β ( Σ_{j=1}^m A_j (x̃_j^k − x_j*) )^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ) + (w̃^k − w*)^T F(w̃^k).   (3.11)

Then we look at the right-hand side of (3.11). Since w̃^k ∈ W and w* ∈ W*, by using the monotonicity of F, we have

    (w̃^k − w*)^T F(w̃^k) ≥ (w̃^k − w*)^T F(w*) ≥ 0.

XVIII - 20

Because

    Σ_{j=1}^m A_j x_j* = b   and   β ( Σ_{j=1}^m A_j x̃_j^k − b ) = λ^k − λ̃^k,

it follows from (3.11) that

    (w̃^k − w*)^T d_2(w^k, w̃^k) ≥ (λ^k − λ̃^k)^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ).   (3.12)

Substituting (3.12) into (3.10), the assertion (3.9) follows immediately. □

Since (see (2.3) and (3.2))

    d_1(w^k, w̃^k) = M (w^k − w̃^k),   (3.13)

it follows from (3.9) that

    (w̃^k − w*)^T M (w^k − w̃^k) ≥ (λ^k − λ̃^k)^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ),   ∀ w* ∈ W*.   (3.14)

Now, based on the last two lemmas, we are ready to prove the main theorem.

XVIII - 21

Theorem 3.1 (Main Theorem). Let w̃^k = (x̃_1^k, ..., x̃_m^k, λ̃^k) be generated by the ADM step (2.1) from the given vector w^k = (x_1^k, ..., x_m^k, λ^k). Then we have

    (w^k − w*)^T M (w^k − w̃^k) ≥ (1/2) ‖w^k − w̃^k‖_H^2 + (1/2) ‖w^k − w̃^k‖_Q^2,   ∀ w* ∈ W*,   (3.15)

where M, H and Q are defined in (2.3), (2.4) and (2.10), respectively.

Proof. First, it follows from (3.14) that

    (w^k − w*)^T M (w^k − w̃^k) ≥ (w^k − w̃^k)^T M (w^k − w̃^k) + (λ^k − λ̃^k)^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ),   (3.16)

for all w* ∈ W*.

XVIII - 22

Now we treat the terms on the right-hand side of (3.16). Using the matrix M (see (2.3)), we have

    (w^k − w̃^k)^T M (w^k − w̃^k)
      = Σ_{i=1}^m r_i ‖x_i^k − x̃_i^k‖^2 + β Σ_{i=2}^m (x_i^k − x̃_i^k)^T A_i^T ( Σ_{j=1}^{i−1} A_j (x_j^k − x̃_j^k) ) + (1/β) ‖λ^k − λ̃^k‖^2.   (3.17)

XVIII - 23

For the second term on the right-hand side of (3.16), by a simple manipulation, we obtain

    (λ^k − λ̃^k)^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ) = (w^k − w̃^k)^T N (w^k − w̃^k),   (3.18)

where N denotes the block matrix whose only nonzero block row is the last one, namely ( A_1, A_2, ..., A_m, 0 ).

XVIII - 24

Adding (3.17) and (3.18) together, it follows that

    (w^k − w̃^k)^T M (w^k − w̃^k) + (λ^k − λ̃^k)^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) )
      = (w^k − w̃^k)^T ( M + N ) (w^k − w̃^k)
      = (1/2) (w^k − w̃^k)^T [ (M + N) + (M + N)^T ] (w^k − w̃^k),

where

    (M + N) + (M + N)^T =
      [ 2r_1 I_{n_1}    βA_1^T A_2     ...     βA_1^T A_m      A_1^T
        βA_2^T A_1     2r_2 I_{n_2}    ...     βA_2^T A_m      A_2^T
           ⋮                ⋮           ⋱          ⋮             ⋮
        βA_m^T A_1     βA_m^T A_2      ...    2r_m I_{n_m}     A_m^T
        A_1            A_2             ...     A_m          (2/β) I_l ].

XVIII - 25

Applying the notation of the matrices H and Q and the condition (2.2) to the right-hand side of the last equality, we obtain

    (w^k − w̃^k)^T M (w^k − w̃^k) + (λ^k − λ̃^k)^T ( Σ_{j=1}^m A_j (x_j^k − x̃_j^k) ) ≥ (1/2) ‖w^k − w̃^k‖_H^2 + (1/2) ‖w^k − w̃^k‖_Q^2.

Substituting the last inequality into (3.16), the theorem is proved. □

It follows from (3.15) that

    ⟨ M H^{−1} M^T (w^k − w*), M^{−T} H (w^k − w̃^k) ⟩ ≥ (1/2) ‖w^k − w̃^k‖_{H+Q}^2.

In other words, by setting

    G = M H^{−1} M^T,   (3.19)

M H^{−1} M^T (w^k − w*) is the gradient of the distance function (1/2) ‖w − w*‖_G^2 at w = w^k, and −M^{−T} H (w^k − w̃^k) is a descent direction of (1/2) ‖w − w*‖_G^2 at the current point w^k whenever w^k ≠ w̃^k.
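
Both facts used on this slide can be verified numerically on random data. The sketch below (hypothetical sizes; it uses the uniform choice r_i = β‖A_i‖_2^2 in place of the iterate-dependent condition (2.2)) checks (i) that the symmetric matrix of slide XVIII - 24 dominates H + Q, and (ii) that with G = MH^{−1}M^T the inner product ⟨G(w^k − w*), M^{−T}H(w^k − w̃^k)⟩ reduces to (w^k − w*)^T M (w^k − w̃^k).

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(3)
m, l = 3, 4
n = [2, 3, 2]
A = [rng.standard_normal((l, ni)) for ni in n]
beta = 0.9
r = [beta * np.linalg.norm(Ai, 2) ** 2 for Ai in A]   # uniform version of (2.2)

def zeros(p, q):
    return np.zeros((p, q))

# assemble M (2.3), Q (2.10) and N, the matrix appearing in (3.18)
M_rows, Q_rows, N_rows = [], [], []
for i in range(m):
    M_rows.append([beta * A[i].T @ A[j] if j < i else
                   (r[i] * np.eye(n[i]) if j == i else zeros(n[i], n[j]))
                   for j in range(m)] + [zeros(n[i], l)])
    Q_rows.append([beta * A[i].T @ A[j] for j in range(m)] + [A[i].T])
    N_rows.append([zeros(n[i], n[j]) for j in range(m)] + [zeros(n[i], l)])
M_rows.append([zeros(l, nj) for nj in n] + [np.eye(l) / beta])
Q_rows.append([A[j] for j in range(m)] + [np.eye(l) / beta])
N_rows.append([A[j] for j in range(m)] + [zeros(l, l)])
M, Q, N = np.block(M_rows), np.block(Q_rows), np.block(N_rows)
H = block_diag(*([ri * np.eye(ni) for ri, ni in zip(r, n)] + [np.eye(l) / beta]))

# (i) the symmetric matrix of slide XVIII - 24 dominates H + Q (this is where (2.2) enters)
S = (M + N) + (M + N).T
print(np.all(np.linalg.eigvalsh(S - (H + Q)) > -1e-8))

# (ii) with G = M H^{-1} M^T, the inner product reduces to (w^k - w*)^T M (w^k - wt^k)
G = M @ np.linalg.solve(H, M.T)
u = rng.standard_normal(M.shape[0])   # stand-in for w^k - w*
v = rng.standard_normal(M.shape[0])   # stand-in for w^k - wt^k
print(np.isclose((G @ u) @ np.linalg.solve(M.T, H @ v), u @ (M @ v)))
```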

XVIII - 26

3.2  The contractive property

In this subsection, we mainly prove that the sequence generated by the proposed ADM with Gaussian back substitution is contractive with respect to the set W*. Note that we follow the usual definition of contractive type methods. With this contractive property, the convergence of the proposed linearized ADM with Gaussian back substitution can be easily derived with routine analysis.

Theorem 3.2. Let w̃^k = (x̃_1^k, ..., x̃_m^k, λ̃^k) be generated by the ADM step (2.1) from the given vector w^k = (x_1^k, ..., x_m^k, λ^k), and let the matrix G be given by (3.19). For the new iterate w^{k+1} produced by the Gaussian back substitution (2.7), there exists a constant c_0 > 0 such that

    ‖w^{k+1} − w*‖_G^2 ≤ ‖w^k − w*‖_G^2 − c_0 ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ),   ∀ w* ∈ W*,   (3.20)

where H and Q are defined in (2.4) and (2.10), respectively.

XVIII - 27

Proof. For G = M H^{−1} M^T and any α > 0, we obtain

    ‖w^k − w*‖_G^2 − ‖w^{k+1} − w*‖_G^2
      = ‖w^k − w*‖_G^2 − ‖(w^k − w*) − α M^{−T} H (w^k − w̃^k)‖_G^2
      = 2α (w^k − w*)^T M (w^k − w̃^k) − α^2 ‖w^k − w̃^k‖_H^2.   (3.21)

Substituting the result of Theorem 3.1 into the right-hand side of the last equation, we get

    ‖w^k − w*‖_G^2 − ‖w^{k+1} − w*‖_G^2 ≥ α ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ) − α^2 ‖w^k − w̃^k‖_H^2
      = α(1 − α) ‖w^k − w̃^k‖_H^2 + α ‖w^k − w̃^k‖_Q^2,

and thus

    ‖w^{k+1} − w*‖_G^2 ≤ ‖w^k − w*‖_G^2 − α ( (1 − α) ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ),   ∀ w* ∈ W*.   (3.22)

Set c_0 = α(1 − α). Recall that α ∈ [0.5, 1). The assertion is proved. □

Corollary 3.1. The assertion of Theorem 3.2 also holds if the Gaussian back substitution step is (2.8).

XVIII - 28

Proof. Analogous to the proof of Theorem 3.2, we have

    ‖w^k − w*‖_G^2 − ‖w^{k+1} − w*‖_G^2 = 2 γ α_k* (w^k − w*)^T M (w^k − w̃^k) − (γ α_k*)^2 ‖w^k − w̃^k‖_H^2,   (3.23)

where α_k* is given by (2.9). According to (2.9), we have

    α_k* ‖w^k − w̃^k‖_H^2 = (1/2) ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ).

Then it follows from the above equality and (3.15) that

    ‖w^k − w*‖_G^2 − ‖w^{k+1} − w*‖_G^2
      ≥ γ α_k* ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ) − (1/2) γ^2 α_k* ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 )
      = (1/2) γ (2 − γ) α_k* ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ).

XVIII - 29

Because α_k* ≥ 1/2, it follows from the last inequality that

    ‖w^{k+1} − w*‖_G^2 ≤ ‖w^k − w*‖_G^2 − (1/4) γ (2 − γ) ( ‖w^k − w̃^k‖_H^2 + ‖w^k − w̃^k‖_Q^2 ),   ∀ w* ∈ W*.   (3.24)

Since γ ∈ (0, 2), the assertion of this corollary follows from (3.24) directly. □

3.3  Convergence

The proposed lemmas and theorems are adequate to establish the global convergence of the proposed ADM with Gaussian back substitution, and the analytic framework is quite typical in the context of contractive type methods.

Theorem 3.3. Let {w^k} and {w̃^k} be the sequences generated by the proposed ADM with Gaussian back substitution. Then we have:

1. The sequence {w^k} is bounded.
2. lim_{k→∞} ‖w^k − w̃^k‖ = 0.

XVIII - 30

3. Any cluster point of {w̃^k} is a solution point of (1.6).
4. The sequence {w̃^k} converges to some w^∞ ∈ W*.

Proof. The first assertion follows from (3.20) directly. In addition, from (3.20) we get

    Σ_{k=0}^∞ c_0 ‖w^k − w̃^k‖_H^2 ≤ ‖w^0 − w*‖_G^2,

and thus lim_{k→∞} ‖w^k − w̃^k‖_H^2 = 0, and consequently

    lim_{k→∞} ‖x_i^k − x̃_i^k‖ = 0,   i = 1, ..., m,   (3.25)

and

    lim_{k→∞} ‖λ^k − λ̃^k‖ = 0.   (3.26)

The second assertion is proved. Substituting (3.25) into (3.5), for i = 1, 2, ..., m, we have

    x̃_i^k ∈ X_i,   lim_{k→∞} (x_i − x̃_i^k)^T { f_i(x̃_i^k) − A_i^T λ̃^k } ≥ 0,   ∀ x_i ∈ X_i.   (3.27)

XVIII - 31

It follows from (2.1) and (3.26) that

    lim_{k→∞} ( Σ_{j=1}^m A_j x̃_j^k − b ) = 0.   (3.28)

Combining (3.27) and (3.28), we get

    w̃^k ∈ W,   lim_{k→∞} (w − w̃^k)^T F(w̃^k) ≥ 0,   ∀ w ∈ W,   (3.29)

and thus any cluster point of {w̃^k} is a solution point of (1.6). The third assertion is proved.

It follows from the first assertion and lim_{k→∞} ‖w^k − w̃^k‖_H = 0 that {w̃^k} is also bounded. Let w^∞ be a cluster point of {w̃^k} and let the subsequence {w̃^{k_j}} converge to w^∞. It follows from (3.29) that

    w̃^{k_j} ∈ W,   lim_{j→∞} (w − w̃^{k_j})^T F(w̃^{k_j}) ≥ 0,   ∀ w ∈ W,   (3.30)

XVIII - 32

and consequently

    (x_i − x_i^∞)^T { f_i(x_i^∞) − A_i^T λ^∞ } ≥ 0,   ∀ x_i ∈ X_i,   i = 1, ..., m,
    Σ_{j=1}^m A_j x_j^∞ − b = 0.

This means that w^∞ ∈ W* is a solution point of (1.6). Since {w^k} is Fejér monotone and lim_{k→∞} ‖w^k − w̃^k‖ = 0, the sequence {w̃^k} cannot have another cluster point, and {w̃^k} converges to w^∞ ∈ W*. □

References

[1] S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, Nov. 2010.

[2] B. S. He, M. Tao and X. M. Yuan, Alternating Direction Method with Gaussian Back Substitution for Separable Convex Programming, SIAM J. Optim., 22, 2012.

[3] B. S. He and X. M. Yuan, Linearized Alternating Direction Method with Gaussian Back Substitution for Separable Convex Programming, HTML/2011/10/3192.html, Numerical Algebra, Control and Optimization (Special Issue in Honor of Prof. Xuchu He's 90th Birthday), 3(2), 2013.

[4] C. H. Ye and X. M. Yuan, A Descent Method for Structured Monotone Variational Inequalities, Optimization Methods and Software, 22, 2007.
