The Direct Extension of ADMM for Multi-block Convex Minimization Problems is Not Necessarily Convergent

Yinyu Ye
K. T. Li Professor of Engineering, Department of Management Science and Engineering, Stanford University, and The International Center of Management Science and Engineering, Nanjing University, Nanjing, China
http://www.stanford.edu/~yyye

Joint work with Caihua Chen, Bingsheng He, and Xiaoming Yuan

April 25, 2014
Outline
1. Background and Motivation
2. Divergent Examples for the Extended ADMM
3. The Small-Stepsize Variant of ADMM
4. Conclusions
1. Background and Motivation
Alternating Direction Method of Multipliers I

$$\min\;\{\theta_1(x_1)+\theta_2(x_2) \mid A_1x_1+A_2x_2=b,\; x_1\in X_1,\; x_2\in X_2\}$$

$\theta_1(x_1)$ and $\theta_2(x_2)$ are closed proper convex functions; $X_1$ and $X_2$ are convex sets.

Alternating direction method of multipliers (Glowinski & Marrocco 75, Gabay & Mercier 76):
$$\begin{aligned}
x_1^{k+1} &= \arg\min\{L_A(x_1, x_2^k, \lambda^k) \mid x_1\in X_1\},\\
x_2^{k+1} &= \arg\min\{L_A(x_1^{k+1}, x_2, \lambda^k) \mid x_2\in X_2\},\\
\lambda^{k+1} &= \lambda^k - \beta(A_1x_1^{k+1}+A_2x_2^{k+1}-b),
\end{aligned}$$
where the augmented Lagrangian function $L_A$ is defined as
$$L_A(x_1,x_2,\lambda)=\sum_{i=1}^{2}\theta_i(x_i)-\lambda^T\Big(\sum_{i=1}^{2}A_ix_i-b\Big)+\frac{\beta}{2}\Big\|\sum_{i=1}^{2}A_ix_i-b\Big\|^2.$$
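To make the scheme concrete, here is a minimal numpy sketch (ours, not from the talk) of the two-block iteration on the toy problem $\min \tfrac{1}{2}\|x_1-c\|^2+\|x_2\|_1$ s.t. $x_1-x_2=0$, where both subproblems have closed forms; the function names are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*||.||_1: elementwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_two_block(c, beta=1.0, iters=100):
    """Two-block ADMM for min 0.5*||x1 - c||^2 + ||x2||_1  s.t.  x1 - x2 = 0."""
    x2 = np.zeros_like(c)
    lam = np.zeros_like(c)
    for _ in range(iters):
        # x1-step: smooth quadratic subproblem, closed form
        x1 = (c + lam + beta * x2) / (1.0 + beta)
        # x2-step: l1-proximal subproblem, closed form
        x2 = soft_threshold(x1 - lam / beta, 1.0 / beta)
        # multiplier step on the residual A1*x1 + A2*x2 - b = x1 - x2
        lam = lam - beta * (x1 - x2)
    return x1

c = np.array([3.0, 0.5, -2.0])
print(admm_two_block(c))        # ~ [2, 0, -1]
print(soft_threshold(c, 1.0))   # the exact solution, for comparison
```

The iterates approach the exact solution soft_threshold(c, 1), consistent with the two-block convergence theory recalled next.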
Alternating Direction Method of Multipliers II

Theoretical results of ADMM:
- Douglas-Rachford splitting method applied to its dual (Gabay 76).
- A special implementation of the proximal point algorithm (Eckstein & Bertsekas 92). Thus the convergence of ADMM can be easily established by classical operator theory.
- O(1/k) convergence rate (He & Yuan 12, Monteiro & Svaiter 13).
- Linear convergence under certain conditions (Lions & Mercier 79, Eckstein 89, ...).

Applications of ADMM: partial differential equations, mechanics, image processing, compressed sensing, statistical learning, computer vision, semidefinite programming, ...
ADMM for Multi-block Convex Minimization Problems

Convex minimization problems with three blocks:
$$\begin{aligned}
\min\;& \theta_1(x_1)+\theta_2(x_2)+\theta_3(x_3)\\
\text{s.t.}\;& A_1x_1+A_2x_2+A_3x_3=b\\
& x_1\in X_1,\; x_2\in X_2,\; x_3\in X_3
\end{aligned}$$
$\theta_1(x_1)$, $\theta_2(x_2)$ and $\theta_3(x_3)$ are closed proper convex functions; $X_1$, $X_2$ and $X_3$ are convex sets.

The direct extension of ADMM:
$$\begin{aligned}
x_1^{k+1} &= \arg\min\{L_A(x_1, x_2^k, x_3^k, \lambda^k) \mid x_1\in X_1\}\\
x_2^{k+1} &= \arg\min\{L_A(x_1^{k+1}, x_2, x_3^k, \lambda^k) \mid x_2\in X_2\}\\
x_3^{k+1} &= \arg\min\{L_A(x_1^{k+1}, x_2^{k+1}, x_3, \lambda^k) \mid x_3\in X_3\}\\
\lambda^{k+1} &= \lambda^k - \beta(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b)
\end{aligned}$$
where
$$L_A(x_1,x_2,x_3,\lambda)=\sum_{i=1}^{3}\theta_i(x_i)-\lambda^T\Big(\sum_{i=1}^{3}A_ix_i-b\Big)+\frac{\beta}{2}\Big\|\sum_{i=1}^{3}A_ix_i-b\Big\|^2.$$
Applications of the Extended ADMM

The extended ADMM finds many applications: robust PCA with noisy and incomplete data, the image alignment problem, the latent variable Gaussian graphical model, the quadratic discriminant analysis model, etc. It is popular in practice and has outperformed other variants of ADMM most of the time.

Therefore, one would expect that the extended ADMM always converges. However, ...
Theoretical Results of the Extended ADMM

Not easy to analyze the convergence:
- The operator theory for the two-block ADMM cannot be directly extended to the ADMM with three blocks.
- Big difference between the ADMM with two blocks and with three blocks.

Existing results for global convergence:
- Strong convexity, plus $\beta$ in a specific range (Han & Yuan 12).
- Certain conditions on the problem, plus a sufficiently small stepsize $\gamma$ in the update of the multipliers (Hong & Luo 12), i.e.,
$$\lambda^{k+1}=\lambda^k-\gamma\beta(A_1x_1^{k+1}+A_2x_2^{k+1}+A_3x_3^{k+1}-b).$$
- A correction step (He & Tao & Yuan 12, He & Tao & Yuan IMA).

But these did not answer the open question: does the direct extension of ADMM converge under the simple convexity assumption?
2. Divergent Examples for the Extended ADMM
Strategy to Construct the Counter-example

A sufficient condition to guarantee the convergence of the extended ADMM:
$$A_1^TA_2=0,\quad\text{or}\quad A_2^TA_3=0,\quad\text{or}\quad A_3^TA_1=0.$$
Consider the case $A_1^TA_2=0$, in which the extended ADMM reduces to the ADMM with two blocks (by regarding $(x_1,x_2)$ as one variable).

Consequently, our strategy to construct a non-convergence example:
- $A_1$, $A_2$ and $A_3$ are similar but not identical.
- No objective function, so that the operator is a linear mapping and the convergence is independent of the choice of $\beta$. Then we set $\beta=1$ for simplicity.

Thus, we simply consider a system of homogeneous linear equations with three variables:
$$A_1x_1+A_2x_2+A_3x_3=0.$$
Divergent Example of the Extended ADMM I

Concretely, we take
$$A=(A_1,A_2,A_3)=\begin{pmatrix}1&1&1\\1&1&2\\1&2&2\end{pmatrix}.$$
Thus the extended ADMM with $\beta=1$ can be specified as
$$\begin{pmatrix}3&0&0&0&0&0\\4&6&0&0&0&0\\5&7&9&0&0&0\\1&1&1&1&0&0\\1&1&2&0&1&0\\1&2&2&0&0&1\end{pmatrix}
\begin{pmatrix}x_1^{k+1}\\x_2^{k+1}\\x_3^{k+1}\\\lambda^{k+1}\end{pmatrix}
=\begin{pmatrix}0&-4&-5&1&1&1\\0&0&-7&1&1&2\\0&0&0&1&2&2\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\end{pmatrix}
\begin{pmatrix}x_1^{k}\\x_2^{k}\\x_3^{k}\\\lambda^{k}\end{pmatrix}.$$
Divergent Example of the Extended ADMM II

Or equivalently,
$$\begin{pmatrix}x_2^{k+1}\\x_3^{k+1}\\\lambda^{k+1}\end{pmatrix}=M\begin{pmatrix}x_2^{k}\\x_3^{k}\\\lambda^{k}\end{pmatrix},
\quad\text{where}\quad
M=\frac{1}{162}\begin{pmatrix}144&-9&-9&-9&18\\8&157&-5&13&-8\\64&122&122&-58&-64\\56&-35&-35&91&-56\\-88&-26&-26&-62&88\end{pmatrix}.$$
Divergent Example of the Extended ADMM III

The matrix $M=V\,\mathrm{Diag}(d)\,V^{-1}$, where
$$d=\begin{pmatrix}0.9836+0.2984i\\0.9836-0.2984i\\0.8744+0.2310i\\0.8744-0.2310i\\0\end{pmatrix}$$
and the columns of $V$ are the corresponding eigenvectors (complex conjugate in pairs, the last one being $\tfrac{1}{\sqrt{3}}(0,0,1,1,1)^T$).

Note that $\rho(M)=|d_1|=|d_2|\approx 1.0278>1$.
Divergent Example of the Extended ADMM IV

Take the initial point $(x_2^0,x_3^0,\lambda^0)=V(:,1)+V(:,2)\in\mathbb{R}^5$ (real, since the two columns are conjugate). Then
$$\begin{pmatrix}x_2^{k+1}\\x_3^{k+1}\\\lambda^{k+1}\end{pmatrix}
=V\,\mathrm{Diag}(d)^{k+1}V^{-1}\begin{pmatrix}x_2^{0}\\x_3^{0}\\\lambda^{0}\end{pmatrix}
=(0.9836+0.2984i)^{k+1}V(:,1)+(0.9836-0.2984i)^{k+1}V(:,2),$$
which is divergent since $|0.9836\pm 0.2984i|>1$.
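This can be verified numerically. The following numpy sketch (ours, not from the talk; the helper name admm_map is illustrative) assembles one Gauss-Seidel sweep of the scheme above as a linear map on $(x,\lambda)$ and confirms $\rho(M)\approx 1.0278>1$.

```python
import numpy as np

def admm_map(A, c=0.0, gamma=1.0):
    """Linear fixed-point map of one sweep of the direct 3-block ADMM
    (beta = 1) applied to  min c*(x1^2 + x2^2 + x3^2)
    s.t. A1*x1 + A2*x2 + A3*x3 = 0,  acting on (x1, x2, x3, lambda).
    gamma scales the multiplier update (gamma = 1 is the plain scheme)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    G = A.T @ A                      # Gram matrix of the penalty term
    L = np.zeros((n + m, n + m))     # multiplies the new iterate
    R = np.zeros((n + m, n + m))     # multiplies the old iterate
    # x_i-step: 2c*x_i + sum_{j<=i} G_ij x_j^{k+1}
    #           = -sum_{j>i} G_ij x_j^k + A_i^T lam^k
    L[:n, :n] = np.tril(G) + 2.0 * c * np.eye(n)
    R[:n, :n] = -np.triu(G, 1)
    R[:n, n:] = A.T
    # multiplier step: lam^{k+1} = lam^k - gamma * A x^{k+1}
    L[n:, :n] = gamma * A
    L[n:, n:] = np.eye(m)
    R[n:, n:] = np.eye(m)
    return np.linalg.solve(L, R)

A = np.array([[1.0, 1, 1],
              [1,   1, 2],
              [1,   2, 2]])
M = admm_map(A)
print(max(abs(np.linalg.eigvals(M))))   # ~ 1.0278 > 1
```

(The first column of this 6x6 map is zero because $x_1^k$ never appears on the right-hand side, so it has the same nonzero spectrum as the 5x5 matrix $M$ above.)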
Strong Convexity Helps?

Consider the following example:
$$\min\;0.05x_1^2+0.05x_2^2+0.05x_3^2\quad\text{s.t.}\quad\begin{pmatrix}1&1&1\\1&1&2\\1&2&2\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}=0.\qquad(1)$$
- The matrix $M$ of the extended ADMM ($\beta=1$) has $\rho(M)=1.0087>1$.
- One can find a proper initial point such that the extended ADMM diverges.
- So even for strongly convex programming, the extended ADMM is not necessarily convergent for a given $\beta>0$.
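A compact check under the same assumptions (ours, not from the talk): the strongly convex terms $0.05x_i^2$ simply add curvature $2\times 0.05=0.1$ to the diagonal of each $x$-subproblem in the sweep built above.

```python
import numpy as np

A = np.array([[1.0, 1, 1], [1, 1, 2], [1, 2, 2]])
G = A.T @ A
# Same Gauss-Seidel sweep as in the previous sketch, with the curvature
# 2*0.05 = 0.1 of the objective added to each diagonal block (beta = 1).
L = np.block([[np.tril(G) + 0.1 * np.eye(3), np.zeros((3, 3))],
              [A,                            np.eye(3)]])
R = np.block([[-np.triu(G, 1),   A.T],
              [np.zeros((3, 3)), np.eye(3)]])
M = np.linalg.solve(L, R)
print(max(abs(np.linalg.eigvals(M))))   # ~ 1.0087 > 1
```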
3. The Small-Stepsize Variant of ADMM
The Stepsize of ADMM

In the direct extension of ADMM, the Lagrangian multiplier is updated by
$$\lambda^{k+1}:=\lambda^k-\gamma\beta(A_1x_1^{k+1}+A_2x_2^{k+1}+\cdots+A_jx_j^{k+1}-b).$$
Convergence is proved for:
- $j=1$ (augmented Lagrangian method): $\gamma\in(0,2)$ (Hestenes 69, Powell 69).
- $j=2$ (alternating direction method of multipliers): $\gamma\in\big(0,\frac{1+\sqrt{5}}{2}\big)$ (Glowinski 84).
- $j\ge 3$: $\gamma$ sufficiently small, provided additional conditions on the problem (Hong & Luo 12).

Question: Is there a problem-data-independent $\gamma$ such that the method converges?
A Numerical Study (Ongoing)

Consider the linear system
$$\begin{pmatrix}1&1&1\\1&1&1+\gamma\\1&1+\gamma&1+\gamma\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}=0.$$

Table: the spectral radius of the iteration matrix

| $\gamma$  | 1      | 0.1    | 1e-2   | 1e-3 | 1e-4 | 1e-5 | 1e-6 | 1e-7 |
| $\rho(M)$ | 1.0278 | 1.0026 | 1.0001 | >1   | >1   | >1   | >1   | >1   |

Thus, there seems to be no problem-data-independent $\gamma$ for which the small-stepsize variant works.
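A sketch of this experiment (ours), assuming, as the construction suggests, that the multiplier stepsize and the problem data share the same $\gamma$. For $\gamma\le 10^{-4}$ the excess of $\rho(M)$ over 1 becomes tiny, so double precision deserves care; the loop below stops there.

```python
import numpy as np

for g in [1.0, 0.1, 1e-2, 1e-3, 1e-4]:
    A = np.array([[1.0, 1,     1],
                  [1,   1,     1 + g],
                  [1,   1 + g, 1 + g]])
    G = A.T @ A
    L = np.block([[np.tril(G),       np.zeros((3, 3))],
                  [g * A,            np.eye(3)]])   # dual step scaled by gamma
    R = np.block([[-np.triu(G, 1),   A.T],
                  [np.zeros((3, 3)), np.eye(3)]])
    rho = max(abs(np.linalg.eigvals(np.linalg.solve(L, R))))
    print(f"gamma = {g:g}:  rho(M) = {rho:.8f}")   # stays above 1 throughout
```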
4. Conclusions
Conclusions

- We construct examples showing that the direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent for any given algorithm parameter $\beta$.
- Even when the objective function is strongly convex, the direct extension of ADMM can lose convergence for a given $\beta$.
- There does not exist a problem-data-independent stepsize $\gamma$ such that the small-stepsize variant of ADMM would work.
- Is there a cyclic non-converging example?
- Our results support the need for a correction step in ADMM-type methods (He & Tao & Yuan 12, He & Tao & Yuan IMA).
- Question: Is there a simple correction of the ADMM for multi-block convex minimization problems? Or how can we treat all blocks equally?
How to Treat All Blocks Equally?

Answer: independent uniform random permutation in each iteration!
- Select the block-update order uniformly at random; this effectively reduces the ADMM algorithm to the one-block case.
- Or fix the first block, and then select the order of the remaining blocks uniformly at random; this effectively reduces the ADMM algorithm to the two-block case.
- It works for the example, and it works in general (my conjecture); see the simulation sketch below.
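As an illustration of the conjecture, here is our simulation (not from the talk) that reruns the divergent example with a fresh uniformly random block order per sweep; in our runs the constraint residual decays rather than blowing up.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1, 1], [1, 1, 2], [1, 2, 2]])
x = rng.standard_normal(3)      # scalar blocks (x1, x2, x3)
lam = rng.standard_normal(3)
for sweep in range(2000):
    for i in rng.permutation(3):            # fresh uniform order each sweep
        r = A @ x - A[:, i] * x[i]          # constraint residual without block i
        # exact minimization over block i (beta = 1, zero objective):
        x[i] = A[:, i] @ (lam - r) / (A[:, i] @ A[:, i])
    lam = lam - A @ x                       # full multiplier step
print(np.linalg.norm(A @ x))                # ~ 0 in our runs
```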
Thank You!