Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory

Size: px

Start display at page:

Download "Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory"

Allen Gerard Mason
5 years ago
Views:

1 Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory Xin Liu(4Ð) State Key Laboratory of Scientific and Engineering Computing Institute of Computational Mathematics and Scientific/Engineering Computing Academy of Mathematics and Systems Science Chinese Academy of Sciences, China 2012 International Workshop on Signal Processing, Optimization, and Control USTC, Hefei, Anhui, China Xin Liu (AMSS) ADMM July 1st, / 22

2 Outline 1 Introduction and Applications Basic Idea Algorithm Framework Applications 2 Theoretical Results Brief Introduction Convergence Results 3 Conclusion and Future works Xin Liu (AMSS) ADMM July 1st, / 22

3 Section I. Introduction and Application Xin Liu (AMSS) ADMM July 1st, / 22

4 Divide and Conquer An Ancient Strategy Cô ˆ Â», ƒ ))5šfW{6(5SUN TZU, ART OF WAR6) šf(535õ470 BC) Divide et impera Julius Caesar (100õ44 BC) Mathematical Point of View: Split and Alternate Xin Liu (AMSS) ADMM July 1st, / 22

5 Splitting Techniques Case 1: Nondifferentiable Term min f (x) + g(bx) Case 2: Highly Nonconvex min f (x) + g(y) s.t. Bx y = 0. min f (g(x)) min f (y) s.t. g(x) y = 0. Case 3: Inconsistent Objective and Constraint min f (x) s.t. c(x) = 0 min f (x) s.t. c(y) = 0, x = y. Xin Liu (AMSS) ADMM July 1st, / 22

6 Splitting Instances Instance 1: Compressive Sensing min Wx 1 + µ 2 Ax b 2 2 min y 1 + µ 2 Ax b 2 2 s.t. Wx y = 0. Instance 2: Nonlinear l 1 Minimization min f (x) 1. min y 1 s.t. f (x) = y. Xin Liu (AMSS) ADMM July 1st, / 22

7 Splitting Instances Convex Models Instance 3: Dual Problem of Compressive Sensing (Yang-Zhang 2009) min b T y + 1 2µ y 2 2 s.t. W T A T y 1. min b T y + 1 2µ y 2 2 s.t. z 1, z = W T A T y. Xin Liu (AMSS) ADMM July 1st, / 22

8 Augmented Lagrangian Method Equality Constrained Problems min f (x) s.t. c(x) = 0. Augmented Lagrangian Function (Henstenes 1969, Powell 1969, Rockafellar 1973) L β (x, λ) = f (x) λ T c(x) + β 2 c(x) 2 2. Augmented Lagrangian Method x k+1 arg min L β (x, λ k ); ALM : λ k+1 λ k τ β c(x k+1 ); update β if necessary. Xin Liu (AMSS) ADMM July 1st, / 22

9 Augmented Lagrangian Method (Cont d) Problems with Equality Constraints min f (x) s.t. c(x) = 0. x Ω Augmented Lagrangian Method Extension x k+1 arg min L β (x, λ k ); x Ω ALM : λ k+1 λ k τ β c(x k+1 ); update β if necessary. Xin Liu (AMSS) ADMM July 1st, / 22

10 Alternating Direction Method of Multiplier Block Structure {x x Ω} = p {x x i Ω i }. i=1 (Augmented Lagrangian) Alternating Direction Method (of Multiplier) (Glowinski-Marocco 1975, Gabay-Mercier 1976,...) ADMM : x k+1 1 arg min x1 Ω 1 L β (x 1, x k 2,..., xk p, λ k ); x k+1 2 arg min x2 Ω 2 L β (x k+1 1, x 2, x k 3,..., xk p, λ k );... xp k+1 arg min xp Ω p L β (x1 k+1,..., xp 1 k+1, x p, λ k ); λ k+1 λ k τ β c(x1 k+1,..., xp k+1 ). Xin Liu (AMSS) ADMM July 1st, / 22

11 Applications I Phase Retrieval (Wen-Yang-L.) X-ray crystallography, transmission electron microscopy Original model: Reformulation: min ˆψ C n,z C m k k i=1 min ˆψ C n k i=1 1 F Qi ˆψ b 2 i z i b i 2 2 s.t. z i = F Q i ˆψ, i = 1,..., k. Augmented Lagrange function: k ( 1 L β (z i, ψ, y i ) = 2 z i b i y i (F Q iψ z i ) + β ) 2 F Q iψ z i 2 2. i=1 Xin Liu (AMSS) ADMM July 1st, / 22

12 Applications II Portfolio Optimization (Wen-Peng-L.-Bai-Sun) Asset Allocation under the Basel Accord Risk Measures (Value-at-Risk) integer programming Original model: min ( Ru) (p), u U r0 where U r0 = {u R d µ T u r 0, 1 T u = 1, u 0}; ( ) (p) refers to the p-th smallest component of a vector. Reformulation: Augmented Lagrange function: min u U r0, x R n x (p) s.t. x + Ru = 0. L β (x, u, λ) := x (p) λ T (x + Ru) + β 2 x + Ru 2. Xin Liu (AMSS) ADMM July 1st, / 22

13 Applications III Matrix Factorization (Zhang et al.) Nonnegative matrix factorization, structure enforcing matrix factorization Original model: min W R m k, H R n k A WHT 2 F s.t. W T 1, H T 2, where T 1, T 2 can be {X X T X = I}, or {X X 0}, or any other matrix sets allowing easy projection Reformulation: min A WH T 2 W, H, S 1 T 1, S 2 T F s.t. W = S 1, H = S 2. 2 Augmented Lagrange function: L (β1, β 2 )(W, H, S 1, S 2, Λ) = A WH T 2 F Λ 1 (W S 1 ) Λ 2 (H S 2 ) + β 1 2 W S 1 2 F + β 2 2 H S 2 2 F. Xin Liu (AMSS) ADMM July 1st, / 22

14 Section II. Theoretical Results Xin Liu (AMSS) ADMM July 1st, / 22

15 Brief Introduction Intuition Splitting brings easy subproblem Splitting induces equality constraint Augmented Lagrange Alternating solves the split targets in turn From line search to ADM Line search based optimization - one dimensional subspace Subspace method - multi-dimensional subspace ADMM - high-order subspaces Convergence Based on Strict Conditions Two blocks, joint convexity, separability (Gabay-Mercier 1976) Multiple blocks, variant versions (He, Yuan et al., Goldfarb and Ma, etc.) complexity acceleration customization Two blocks, linear convergence rate (Yin-Deng 2012) Xin Liu (AMSS) ADMM July 1st, / 22

16 Some New Results Towards a General Scheme nonconvex and nonseparable cases Local convergence and rate (Yang-L.-Zhang) Global convergence (L.-Yang-Zhang) under some assumptions (ongoing) special case: multiple blocks, separable + strongly convex + second order differentiable Nonlinear Splitting and Iteration Scheme Original Nonlinear System: F : R n R n Splitting: G : R n R n R n i.e. G(x, x) := L(x) R(x) F(x). 1 G x G, and 2 G x G. Consider G(x, x, λ) to be a splitting of F := x L β (x, λ) A generalized ADMM scheme: x k+1 G(x, x k, λ k ) = 0; GADMM : λ k+1 λ k τ β c(x k+1 ). Xin Liu (AMSS) ADMM July 1st, / 22

17 Local Convergence Result Error System e k+1 = M(τ)e k + o( e k ) where [ M(τ) = [ 1 G ] 1 2 G [ 1 G ] 1 ( c ) T τ c [ 1 G ] 1 2 G I τ c [ 1 G ] 1 ( c ) T ] Local convergence: e k ((x k x ) T, (λ k λ ) T ) T Implicit Function Theorem + Taylor Expansion Assumptions: xx L β (x, λ ) 0 and c(x ) full row rank Results: local convergence: η > 0, ρ(m(τ)) < 1, τ (0, η); R-linear rate: ρ(m(τ)). Xin Liu (AMSS) ADMM July 1st, / 22

18 Global Convergence Relative Error System e k+1 = M(τ) k e k where [ M(τ) k = [ 1 G k L ] 1 2 G k U τa[ 1 G k L ] 1 2 G k U [ 1 G k L ] 1 A T I τa[ 1 G k L ] 1 A T ] Global convergence: e k ((x k x k 1 ) T, (λ k λ k 1 ) T ) T Mean Value Theorem + Average Hessian ( 1 G k L, 2 G k U ) Difficulty: non-stationary iteration L β strongly convex and L β is Lipschitz continuous ρ(m(τ) k ) 1 ɛ ( k) global convergence Xin Liu (AMSS) ADMM July 1st, / 22

19 Global Convergence (Cont d) l 2 Restriction (ongoing) M(τ) k ) 2 1 ɛ ( k) Assumptions: linear constraints block-wise convexity second order differentiability block diagonal dominance Result: global convergence Special Case Assumptions: separability: 2 G k U constant, 1 G k L non-stationary in block diagonal strongly convexity linear constraints second order differentiability Result: β > 0 and η > 0; global convergence, β (0, β), τ (0, η). Xin Liu (AMSS) ADMM July 1st, / 22

20 Section III. Conclusion and Future works Xin Liu (AMSS) ADMM July 1st, / 22

21 Conclusion Powerful tool for hard optimization problem with structure; Lack of convergence results for nonconvex problems; Excellent performance in practice. Future Works There is still room for further improvement of the algorithm; Convergence results for lots of known successful cases are still unclear; Gap from the stationary point to the global optimizer. Xin Liu (AMSS) ADMM July 1st, / 22

22 Thank you for your attention! Xin Liu (AMSS) ADMM July 1st, / 22

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables

Recent Developments of Alternating Direction Method of Multipliers with Multi-Block Variables Department of Systems Engineering and Engineering Management The Chinese University of Hong Kong 2014 Workshop