Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization


1 Alternating Direction Augmented Lagrangian Algorithms for Convex Optimization
Joint work with Bo Huang, Shiqian Ma, Tony Qin, Katya Scheinberg, Zaiwen Wen and Wotao Yin
Department of IEOR, Columbia University
MOPTA Conference, Lehigh University, August 19, 2010

2 Alternating direction augmented Lagrangian methods
Long history: goes back to Gabay and Mercier, Glowinski and Marrocco, Lions and Mercier, Passty, etc.
- Optimization models from partial differential equations
- Maximal monotone operators
- Variational inequalities
- Nonlinear convex optimization
- Linear programming
- Nonsmooth $\ell_1$-minimization, compressive sensing

3 Outline
1. ADAL methods for minimizing a convex function that can be written as a sum of simpler functions
2. ADAL methods for solving SDPs

4 Introduction
(SUM-K) $\min\ F(x) \equiv \sum_{i=1}^{K} f_i(x)$
(SUM-2) $\min\ F(x) \equiv f(x) + g(x)$
Minimize the sum of convex functions. Assume the following problem is easy:
$\min_x\ \tau f_i(x) + \tfrac{1}{2}\|x - y\|^2$
Examples of $f_i$: $\|x\|_1$, $\|x\|_2$, $\|Ax - b\|^2$, $\|X\|_*$, $-\log\det(X)$, ...
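
To make the "easy subproblem" concrete, here is a minimal NumPy sketch (illustrative, not from the slides) of the proximal map for $f_i(x) = \|x\|_1$, i.e. the solution of $\min_x \tau\|x\|_1 + \tfrac{1}{2}\|x - y\|^2$:

```python
import numpy as np

def prox_l1(y, tau):
    # Soft-thresholding: solves min_x tau*||x||_1 + 0.5*||x - y||^2 componentwise.
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

print(prox_l1(np.array([3.0, -0.2, 0.7]), 0.5))   # [ 2.5 -0.   0.2]
```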

5 Examples
Compressed sensing: $\min\ \rho\|x\|_1 + \|Ax - b\|_2^2$
Matrix rank minimization: $\min\ \rho\|X\|_* + \|\mathcal{A}(X) - b\|_2^2$
Robust PCA: $\min_{X,Y}\ \|X\|_* + \rho\|Y\|_1 \ :\ X + Y = M$
Sparse inverse covariance selection: $\min\ -\log\det(X) + \langle\Sigma, X\rangle + \rho\|X\|_1$

6 Variable Splitting (SUM-2)
$\min\ f(x) + g(x)$
Variable splitting:
$\min\ f(x) + g(y) \quad \text{s.t.}\ x = y$
Augmented Lagrangian function:
$\mathcal{L}(x, y; \lambda) := f(x) + g(y) - \langle\lambda, x - y\rangle + \tfrac{1}{2\mu}\|x - y\|^2$
Augmented Lagrangian method:
$(x^{k+1}, y^{k+1}) := \arg\min_{(x,y)} \mathcal{L}(x, y; \lambda^k)$
$\lambda^{k+1} := \lambda^k - (x^{k+1} - y^{k+1})/\mu$

7 Alternating Direction Augmented Lagrangian (ADAL)
$\mathcal{L}(x, y; \lambda) := f(x) + g(y) - \langle\lambda, x - y\rangle + \tfrac{1}{2\mu}\|x - y\|^2$
Solve the augmented Lagrangian subproblem alternatingly:
$x^{k+1} := \arg\min_x \mathcal{L}(x, y^k; \lambda^k)$
$y^{k+1} := \arg\min_y \mathcal{L}(x^{k+1}, y; \lambda^k)$
$\lambda^{k+1} := \lambda^k - (x^{k+1} - y^{k+1})/\mu$
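
A minimal generic sketch of this loop (not the authors' code): completing the square in each subproblem reduces the $x$- and $y$-updates to proximal steps, where `prox_f(v, t)` is assumed to return $\arg\min_x t f(x) + \tfrac{1}{2}\|x - v\|^2$ and similarly for `prox_g`:

```python
import numpy as np

def adal(prox_f, prox_g, x0, mu=1.0, iters=100):
    """Generic ADAL loop for min f(x) + g(y) s.t. x = y."""
    y = x0.copy()
    lam = np.zeros_like(x0)
    for _ in range(iters):
        x = prox_f(y + mu * lam, mu)      # x-subproblem
        y = prox_g(x - mu * lam, mu)      # y-subproblem
        lam = lam - (x - y) / mu          # multiplier update
    return x, y, lam

# tiny illustrative example: min 0.5*||x - b||^2 + rho*||x||_1
b, rho = np.array([2.0, -0.3, 1.0]), 0.5
prox_f = lambda v, t: (v + t * b) / (1.0 + t)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - rho * t, 0.0)
x, y, lam = adal(prox_f, prox_g, np.zeros(3))
print(x)   # close to the soft-threshold of b: [1.5, 0., 0.5]
```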

8 Symmetric ADAL
$\mathcal{L}(x, y; \lambda) := f(x) + g(y) - \langle\lambda, x - y\rangle + \tfrac{1}{2\mu}\|x - y\|^2$
Symmetric version:
$x^{k+1} := \arg\min_x \mathcal{L}(x, y^k; \lambda^k)$
$\lambda^{k+1/2} := \lambda^k - (x^{k+1} - y^k)/\mu$
$y^{k+1} := \arg\min_y \mathcal{L}(x^{k+1}, y; \lambda^{k+1/2})$
$\lambda^{k+1} := \lambda^{k+1/2} - (x^{k+1} - y^{k+1})/\mu$
Optimality conditions lead to (assuming $f$ and $g$ are smooth)
$\lambda^{k+1/2} = \nabla f(x^{k+1}), \qquad \lambda^{k+1} = -\nabla g(y^{k+1})$

9 Alternating Linearization Method (SUM-2)
$\min\ F(x) \equiv f(x) + g(x)$
Define
$Q_g(u, v) := f(u) + g(v) + \langle\nabla g(v), u - v\rangle + \tfrac{1}{2\mu}\|u - v\|^2$
$Q_f(u, v) := f(u) + \langle\nabla f(u), v - u\rangle + \tfrac{1}{2\mu}\|u - v\|^2 + g(v)$
Alternating Linearization Method (ALM):
$x^{k+1} := \arg\min_x Q_g(x, y^k)$
$y^{k+1} := \arg\min_y Q_f(x^{k+1}, y)$
Proposed by Kiwiel, Rosa and Ruszczynski for stochastic programming. Gauss-Seidel-like algorithm.

10 Complexity Bound of ALM
Theorem (Goldfarb, Ma and Scheinberg, 2009). Assume $\nabla f$ and $\nabla g$ are Lipschitz continuous with constants $L(f)$ and $L(g)$. For $\mu \le 1/\max\{L(f), L(g)\}$, ALM satisfies
$F(y^k) - F(x^*) \le \frac{\|x^0 - x^*\|^2}{4\mu k}$
Therefore:
- convergence in objective value
- $O(1/\epsilon)$ iterations for an $\epsilon$-optimal solution
The first complexity result for splitting and alternating direction type methods.
Question: Can we improve the complexity?

11 Optimal Gradient Methods
$\min\ f(x)$ (assuming $\nabla f$ is Lipschitz continuous)
$\epsilon$-optimal solution: $f(x) - f(x^*) \le \epsilon$
Classical gradient method, complexity $O(1/\epsilon)$:
$x^k = x^{k-1} - \tau_k \nabla f(x^{k-1})$
Nesterov's acceleration technique (1983), complexity $O(1/\sqrt{\epsilon})$:
$x^k := y^{k-1} - \tau_k \nabla f(y^{k-1})$
$y^k := x^k + \frac{k-1}{k+2}(x^k - x^{k-1})$
Optimal first-order method; the best one can get.
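
A small sketch of the accelerated scheme on a smooth quadratic, assuming a known Lipschitz constant and using the $(k-1)/(k+2)$ momentum rule above; the test problem and step size are illustrative:

```python
import numpy as np

def nesterov_gradient(grad_f, x0, step, iters=200):
    """Accelerated gradient method with the (k-1)/(k+2) momentum rule."""
    x_prev = x0.copy()
    y = x0.copy()
    for k in range(1, iters + 1):
        x = y - step * grad_f(y)                      # gradient step at the extrapolated point
        y = x + (k - 1.0) / (k + 2.0) * (x - x_prev)  # momentum/extrapolation
        x_prev = x
    return x

# example: f(x) = 0.5*||A x - b||^2, step = 1/L with L = ||A||_2^2
A = np.array([[3.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A, 2) ** 2
print(nesterov_gradient(grad, np.zeros(2), 1.0 / L))  # approx. [0.5, -0.5]
```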

12 ISTA and FISTA (Beck and Teboulle, 2009)
Assume $g$ is smooth.
$\min\ F(x) \equiv f(x) + g(x)$
Fixed point algorithm (also called ISTA in compressed sensing):
$x^{k+1} := \arg\min_x Q_f(x, x^k)$
or equivalently
$x^{k+1} := \arg\min_x\ \tau f(x) + \tfrac{1}{2}\|x - (x^k - \tau\nabla g(x^k))\|^2$
Never minimizes $g$.
Iteration complexity: $O(1/\epsilon)$ for an $\epsilon$-optimal solution ($F(x^k) - F(x^*) \le \epsilon$).

13 ISTA and FISTA (Beck and Teboulle, 2009)
$\min\ F(x) \equiv f(x) + g(x)$
Fast ISTA (FISTA):
$x^k := \arg\min_x\ \tau f(x) + \tfrac{1}{2}\|x - (y^k - \tau\nabla g(y^k))\|^2$
$t_{k+1} := \left(1 + \sqrt{1 + 4t_k^2}\right)/2$
$y^{k+1} := x^k + \frac{t_k - 1}{t_{k+1}}(x^k - x^{k-1})$
Complexity $O(1/\sqrt{\epsilon})$.
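
A compact FISTA sketch for the lasso-type instance $\min_x \rho\|x\|_1 + \tfrac{1}{2}\|Ax - b\|^2$ (so $f$ is the $\ell_1$ term and $g$ is the smooth term, matching the split above); the step size choice and variable names are illustrative:

```python
import numpy as np

def fista_lasso(A, b, rho, iters=300):
    """FISTA sketch for min rho*||x||_1 + 0.5*||A x - b||^2."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2          # step <= 1/L(g)
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    t = 1.0
    for _ in range(iters):
        x = soft(y - tau * A.T @ (A @ y - b), rho * tau)    # prox-gradient step at y
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0   # momentum parameter update
        y = x + (t - 1.0) / t_next * (x - x_prev)           # extrapolation
        x_prev, t = x, t_next
    return x
```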

14 Fast Alternating Linearization Method
ALM:
$x^{k+1} := \arg\min_x Q_g(x, y^k)$
$y^{k+1} := \arg\min_y Q_f(x^{k+1}, y)$
Accelerate ALM in the same way as FISTA.
Fast Alternating Linearization Method (FALM):
$x^k := \arg\min_x Q_g(x, z^k)$
$y^k := \arg\min_y Q_f(x^k, y)$
$w^k := (x^k + y^k)/2$
$t_{k+1} := \left(1 + \sqrt{1 + 4t_k^2}\right)/2$
$z^{k+1} := w^k + \frac{1}{t_{k+1}}\left(t_k(y^k - w^{k-1}) - (w^k - w^{k-1})\right)$
The computational effort at each iteration is almost unchanged.

15 Complexity Bound for FALM
$\min\ F(x) \equiv f(x) + g(x)$
Theorem (Goldfarb, Ma and Scheinberg, 2009). Assume $\nabla f$ and $\nabla g$ are Lipschitz continuous with constants $L(f)$ and $L(g)$. For $\mu \le 1/\max\{L(f), L(g)\}$, FALM satisfies
$F(y^k) - F(x^*) \le \frac{\|x^0 - x^*\|^2}{\mu(k+1)^2}$
Therefore:
- convergence in objective value
- $O(1/\sqrt{\epsilon})$ iterations for an $\epsilon$-optimal solution
- optimal first-order method

16 ALM with skipping steps
At the $k$-th iteration of ALM-S:
$x^{k+1} := \arg\min_x \mathcal{L}_\mu(x, y^k; \lambda^k)$
If $F(x^{k+1}) > \mathcal{L}_\mu(x^{k+1}, y^k; \lambda^k)$, then $x^{k+1} := y^k$
$y^{k+1} := \arg\min_y Q_f(y, x^{k+1})$
$\lambda^{k+1} := \nabla f(x^{k+1}) - (x^{k+1} - y^{k+1})/\mu$
Note that only one function is required to be smooth.

17 Complexity result of ALM-S
Theorem (Goldfarb and Scheinberg, 2010). Assume $\nabla f$ is Lipschitz continuous. For $\mu \le 1/L(f)$, the iterates $y^k$ in ALM-S satisfy
$F(y^k) - F(x^*) \le \frac{\|x^0 - x^*\|^2}{2\mu(k + k_{ns})}, \quad \forall k,$
where $k_{ns}$ is the number of iterations among the first $k$ for which $F(x^{k+1}) \le \mathcal{L}_\mu(x^{k+1}, y^k; \lambda^k)$.
Thus $O(1/\epsilon)$ iterations are needed to obtain an $\epsilon$-optimal solution.
A similar algorithm can be designed for FALM with $O(1/\sqrt{\epsilon})$ complexity, where again only one function is required to be smooth.

18 Complexity result of ADAL
Theorem (Goldfarb and Huang, 2010). Assume both $\nabla f$ and $\nabla g$ are Lipschitz continuous. For $\mu \le 1/\max\{L(f), L(g)\}$, ADAL needs $O(1/\epsilon)$ iterations to obtain an $\epsilon$-optimal solution.
Thus far we have been unable to develop a fast $O(1/\sqrt{\epsilon})$ version of ADAL.

19 SUM-K
From P. L. Lions and B. Mercier's 1979 paper on operator splitting: the generalization from 2 to $K$ functions is possible, but it seems that proving convergence for $K \ge 3$ is difficult.
$\min\ F(x) \equiv f(x) + g(x) + h(x)$
Define
$Q_{gh}(u, v, w) := f(u) + g(v) + \langle\nabla g(v), u - v\rangle + \|u - v\|^2/2\mu + h(w) + \langle\nabla h(w), u - w\rangle + \|u - w\|^2/2\mu.$
Gauss-Seidel-like scheme:
$x^{k+1} := \arg\min Q_{gh}(x, y^k, z^k)$
$y^{k+1} := \arg\min Q_{fh}(x^{k+1}, y, z^k)$
$z^{k+1} := \arg\min Q_{fg}(x^{k+1}, y^{k+1}, z)$
However, no complexity results are known for this Gauss-Seidel-like algorithm!

20 MSA: A Jacobi-Type Algorithm
$\min\ F(x) \equiv f(x) + g(x) + h(x)$
Multiple Splitting Algorithm (MSA), a Jacobi-type algorithm whose subproblems can be solved in parallel:
$x^{k+1} := \arg\min Q_{gh}(x, w^k, w^k)$
$y^{k+1} := \arg\min Q_{fh}(w^k, y, w^k)$
$z^{k+1} := \arg\min Q_{fg}(w^k, w^k, z)$
$w^{k+1} := (x^{k+1} + y^{k+1} + z^{k+1})/3$
We have a complexity result!
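
A generic sketch of this Jacobi-type scheme for $K$ functions, assuming each $f_i$ is available through its gradient and proximal map; the reduction of each $Q$-subproblem to a single prox step (and the resulting constant $\mu/(K-1)$) is this sketch's own bookkeeping, not a formula from the slides:

```python
import numpy as np

def msa(fs, w0, mu, iters=100):
    """Jacobi-type multiple splitting sketch for min sum_i f_i(x), K = len(fs) >= 2.

    fs is a list of (grad_i, prox_i) pairs, where prox_i(v, t) solves
    argmin_u t*f_i(u) + 0.5*||u - v||^2.
    """
    K = len(fs)
    w = w0.copy()
    for _ in range(iters):
        blocks = []
        for i, (grad_i, prox_i) in enumerate(fs):
            # linearize all functions except f_i at the common point w
            s = sum(g(w) for j, (g, _) in enumerate(fs) if j != i)
            # block subproblem: f_i(u) + <s, u - w> + (K-1)/(2*mu)*||u - w||^2
            t = mu / (K - 1)
            blocks.append(prox_i(w - t * s, t))
        w = sum(blocks) / K   # the K blocks are independent, so they can run in parallel
    return w
```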

21 Complexity Bound for MSA
$\min\ F(x) \equiv f(x) + g(x) + h(x)$
Theorem (Goldfarb and Ma, 2009). Assume $\nabla f$, $\nabla g$ and $\nabla h$ are Lipschitz continuous with constants $L(f)$, $L(g)$ and $L(h)$. For $\mu \le 1/\max\{L(f), L(g), L(h)\}$, MSA satisfies
$\min\{F(x^k), F(y^k), F(z^k)\} - F(x^*) \le \frac{\|x^0 - x^*\|^2}{\mu k}.$
Therefore:
- convergence in objective value
- $O(1/\epsilon)$ iterations for an $\epsilon$-optimal solution

22 Fast Multiple Splitting Algorithm
Fast Multiple Splitting Algorithm (FaMSA):
$x^k := \arg\min Q_{gh}(x, w_x^k, w_x^k)$
$y^k := \arg\min Q_{fh}(w_y^k, y, w_y^k)$
$z^k := \arg\min Q_{fg}(w_z^k, w_z^k, z)$
$w^k := (x^k + y^k + z^k)/3$
$t_{k+1} := \left(1 + \sqrt{1 + 4t_k^2}\right)/2$
$w_x^{k+1} := w^k + \frac{1}{t_{k+1}}\left[t_k(x^k - w^k) - (w^k - w^{k-1})\right]$
$w_y^{k+1} := w^k + \frac{1}{t_{k+1}}\left[t_k(y^k - w^k) - (w^k - w^{k-1})\right]$
$w_z^{k+1} := w^k + \frac{1}{t_{k+1}}\left[t_k(z^k - w^k) - (w^k - w^{k-1})\right]$

23 Complexity Bound for FaMSA
$\min\ F(x) \equiv f(x) + g(x) + h(x)$
Theorem (Goldfarb and Ma, 2009). Assume $\nabla f$, $\nabla g$ and $\nabla h$ are Lipschitz continuous with constants $L(f)$, $L(g)$ and $L(h)$. For $\mu \le 1/\max\{L(f), L(g), L(h)\}$, FaMSA satisfies
$\min\{F(x^k), F(y^k), F(z^k)\} - F(x^*) \le \frac{4\|x^0 - x^*\|^2}{\mu(k+1)^2}$
Therefore:
- convergence in objective value
- $O(1/\sqrt{\epsilon})$ iterations for an $\epsilon$-optimal solution
- optimal first-order method

24 Comparison of ALM/FALM and MSA/FaMSA
ALM/FALM:
- Gauss-Seidel-like algorithms
- expected to be faster than MSA/FaMSA since the information from the current iteration is used
- complexity results for (SUM-2), no results for (SUM-K) when $K \ge 3$
- only one function needs to be smooth
MSA/FaMSA:
- Jacobi-like algorithms
- can be done in parallel
- complexity results for (SUM-K) for any $K \ge 2$

25 ALM and FALM for Robust PCA
Robust PCA: $f(X) = \|X\|_*$, $g(Y) = \rho\|Y\|_1$,
$\min_{X, Y \in \mathbb{R}^{m \times n}} \{\|X\|_* + \rho\|Y\|_1 : X + Y = M\}$
Subproblem w.r.t. $X$ (a matrix shrinkage operator, corresponds to an SVD):
$X^{k+1} := \arg\min_X\ f(X) + g(Y^k) + \langle\nabla g(Y^k), M - X - Y^k\rangle + \|X + Y^k - M\|_F^2/2\mu$
Subproblem w.r.t. $Y$ (a vector shrinkage operator):
$Y^{k+1} := \arg\min_Y\ f(X^{k+1}) + \langle\nabla f(X^{k+1}), M - X^{k+1} - Y\rangle + \|X^{k+1} + Y - M\|_F^2/2\mu + g(Y)$
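
The two shrinkage operators referred to above, as a minimal NumPy sketch (singular value thresholding for the nuclear-norm subproblem, entrywise soft thresholding for the $\ell_1$ subproblem); function names are illustrative:

```python
import numpy as np

def matrix_shrink(V, tau):
    # Singular value thresholding: argmin_X tau*||X||_* + 0.5*||X - V||_F^2 (one SVD per call).
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def vector_shrink(V, tau):
    # Entrywise soft thresholding: argmin_Y tau*||Y||_1 + 0.5*||Y - V||_F^2.
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)
```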

26 Numerical Results Surveillance video

27 Numerical Results (cont.) Surveillance video

28 Numerical Results (cont.) Shadow and specularities removal from face images

29 Numerical Results (cont.) Video denoising

30 Summary of Numerical Results
Problem          n_1   n_2   SVDs   CPU
Hall                                 :03
Escalator                            :53
Campus (color)                       :01:01
Face                                 :39
Video denoising                      :12:49

31 Conclusion and future work
Our contribution:
- Alternating linearization and multiple splitting methods
- Optimal first-order methods
- First complexity results for splitting and alternating direction methods
Current and future work:
- Extension of ALM/FALM, MSA/FaMSA to constrained problems
- Extension of MSA/FaMSA to nonsmooth problems
- Development of a fast ADAL with $O(1/\sqrt{\epsilon})$ complexity
- Line search variants
- Applications in many fields such as medical imaging, machine learning, model selection, optimal acquisition basis selection (radar), etc.

32 ADAL applied to SDP
Dual problem:
$(D)\quad \min_{y \in \mathbb{R}^m,\, S \in S^n}\ -b^\top y \quad \text{s.t.}\quad \mathcal{A}^*(y) + S = C,\ S \succeq 0,$
where $\mathcal{A}(X) := \left(\langle A^{(1)}, X\rangle, \ldots, \langle A^{(m)}, X\rangle\right)^\top$ and $\mathcal{A}^*(y) := \sum_{i=1}^m y_i A^{(i)}$.
Let $A := \left[\mathrm{vec}(A^{(1)}), \ldots, \mathrm{vec}(A^{(m)})\right]^\top \in \mathbb{R}^{m \times n^2}$.
Constraints: $\mathcal{A}(X) = b \iff A\,\mathrm{vec}(X) = b$.
Operators related to $\mathcal{A}(\cdot)$: $\mathcal{A}(\mathcal{A}^*(y)) = (AA^\top)y$, $\mathcal{A}^*(\mathcal{A}(X)) = \mathrm{mat}\left((A^\top A)\,\mathrm{vec}(X)\right)$.
Assumption: the matrix $A$ has full row rank and the Slater condition holds for the primal SDP.

33 Augmented Lagrangian method
Augmented Lagrangian function:
$\mathcal{L}_\mu(X, y, S) = -b^\top y + \langle X, \mathcal{A}^*(y) + S - C\rangle + \frac{1}{2\mu}\|\mathcal{A}^*(y) + S - C\|_F^2.$
Algorithmic framework: compute $y^{k+1}$ and $S^{k+1}$ at the $k$-th iteration by
$(DL)\quad \min_{y \in \mathbb{R}^m,\, S \in S^n} \mathcal{L}_\mu(X^k, y, S), \quad \text{s.t.}\ S \succeq 0,$
then update the primal variable $X^{k+1}$ by
$X^{k+1} := X^k + \frac{\mathcal{A}^*(y^{k+1}) + S^{k+1} - C}{\mu}.$
Pros and cons:
- Pros: rich theory, well understood, and many algorithms
- Cons: $\mathcal{L}_\mu(X^k, y, S)$ is not separable in $y$ and $S$, and the subproblem (DL) is difficult to minimize.

34 Alternating direction method
Divide the variables into different blocks according to their roles, and minimize the augmented Lagrangian function with respect to one block at a time while all other blocks are fixed.
Framework:
$y^{k+1} := \arg\min_{y \in \mathbb{R}^m} \mathcal{L}_\mu(X^k, y, S^k),$
$S^{k+1} := \arg\min_{S \in S^n} \mathcal{L}_\mu(X^k, y^{k+1}, S), \quad S \succeq 0,$
$X^{k+1} := X^k + \frac{\mathcal{A}^*(y^{k+1}) + S^{k+1} - C}{\mu}.$

35 Alternating direction method
Solving the subproblem $y^{k+1} = \arg\min_{y \in \mathbb{R}^m} \mathcal{L}_\mu(X^k, y, S^k)$.
First-order optimality condition:
$\nabla_y \mathcal{L}_\mu := \mathcal{A}(X^k) - b + \frac{1}{\mu}\mathcal{A}\left(\mathcal{A}^*(y^{k+1}) + S^k - C\right) = 0.$
Since $AA^\top$ is invertible,
$y^{k+1} := y(S^k, X^k) := -(AA^\top)^{-1}\left(\mu(\mathcal{A}(X^k) - b) + \mathcal{A}(S^k - C)\right).$
Compute $AA^\top$ prior to executing the algorithm. The operator $AA^\top$ is the identity for many SDP relaxations of combinatorial optimization problems, such as the maxcut, maximum stable set and bisection problems.

36 Alternating direction method
Solving the subproblem $S^{k+1} = \arg\min_{S \in S^n} \mathcal{L}_\mu(X^k, y^{k+1}, S),\ S \succeq 0$.
Equivalent formulation:
$\min_{S \in S^n} \|S - V^{k+1}\|_F^2, \quad S \succeq 0,$
where $V^{k+1} := V(S^k, X^k) = C - \mathcal{A}^*(y(S^k, X^k)) - \mu X^k$.
Hence, the solution is $S^{k+1} := V_+^{k+1} := Q_+ \Sigma_+ Q_+^\top$, where
$V^{k+1} = Q\Sigma Q^\top = \begin{pmatrix} Q_+ & Q_- \end{pmatrix} \begin{pmatrix} \Sigma_+ & 0 \\ 0 & \Sigma_- \end{pmatrix} \begin{pmatrix} Q_+ & Q_- \end{pmatrix}^\top.$
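
In code, the split of $V$ into $V_+$ and $V_-$ is one symmetric eigendecomposition; a minimal sketch (illustrative):

```python
import numpy as np

def psd_parts(V):
    """Split a symmetric V into V_+ (projection onto the PSD cone) and V_-.

    V = V_+ - V_- with V_+, V_- PSD; V_+ solves min_{S >= 0} ||S - V||_F^2.
    """
    w, Q = np.linalg.eigh(V)
    V_plus = (Q * np.maximum(w, 0.0)) @ Q.T   # Q diag(max(w,0)) Q^T
    return V_plus, V_plus - V
```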

37 Alternating direction method
Updating the Lagrange multiplier $X^{k+1}$.
Updating formula:
$X^{k+1} := X^k + \frac{\mathcal{A}^*(y^{k+1}) + S^{k+1} - C}{\mu}$
Equivalent formulation:
$X^{k+1} = \frac{1}{\mu}(S^{k+1} - V^{k+1}) = \frac{1}{\mu}V_-^{k+1},$
where $V_-^{k+1} := -Q_- \Sigma_- Q_-^\top$. Note that $X^{k+1}$ is also the optimal solution of
$\min_{X \in S^n} \|\mu X + V^{k+1}\|_F^2, \quad X \succeq 0.$
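
Putting the three updates together, a minimal sketch of the resulting ADAL iteration for the dual SDP, under the simplifying assumption $AA^\top = I$ (as for the maxcut-type relaxations mentioned above) so that the $y$-update needs no linear solve; representing $\mathcal{A}$ by a list of constraint matrices is an illustrative choice:

```python
import numpy as np

def adal_sdp(A_mats, b, C, mu=1.0, iters=200):
    """ADAL sketch for the dual SDP  min -b'y  s.t.  A*(y) + S = C, S PSD.

    A_mats is a list of symmetric constraint matrices A^(i); this sketch
    assumes A A^T = I (e.g. orthonormalized constraints).
    """
    n = C.shape[0]
    X = np.zeros((n, n))
    S = np.zeros((n, n))
    Aop = lambda M: np.array([np.sum(Ai * M) for Ai in A_mats])   # A(M)
    Astar = lambda y: sum(yi * Ai for yi, Ai in zip(y, A_mats))   # A*(y)
    for _ in range(iters):
        y = -(mu * (Aop(X) - b) + Aop(S - C))        # y(S, X) with AA^T = I
        V = C - Astar(y) - mu * X                    # V(S, X)
        w, Q = np.linalg.eigh(V)
        S = (Q * np.maximum(w, 0.0)) @ Q.T           # S = V_+
        X = (S - V) / mu                             # X = V_- / mu
    return X, y, S
```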

38 Convergence results
Let $P(V) := (V_+, V_-)$.
Fixed point formulation:
$y^{k+1} = y(S^k, X^k),$
$\left(S^{k+1}, \mu X^{k+1}\right) = P(V^{k+1}) = P(V(S^k, X^k)).$
Equivalence of the optimality conditions:
- $(X, y, S)$ satisfies the KKT optimality conditions $\mathcal{A}(X) = b$, $\mathcal{A}^*(y) + S = C$, $SX = 0$, $X \succeq 0$, $S \succeq 0$.
- $(X, y, S)$ satisfies $y = y(S, X)$ and $(S, \mu X) = P(V(S, X))$.
Main result: the sequence $\{(X^k, y^k, S^k)\}$ generated by our algorithm from any starting point converges to a solution $(X^*, y^*, S^*) \in \mathcal{X}^*$, where $\mathcal{X}^*$ is the set of primal and dual solution pairs.
No complexity results are currently known.

39 Convergence results
For any $V, \hat V \in S^n$,
$\|P(V) - P(\hat V)\|_F \le \|V - \hat V\|_F,$
with equality holding if and only if $V_+^\top \hat V_- = 0$ and $V_-^\top \hat V_+ = 0$.
For any $S, X, \hat S, \hat X \in S_+^n$,
$\|V(S, X) - V(\hat S, \hat X)\|_F \le \left\|\left(S - \hat S,\ \mu(X - \hat X)\right)\right\|_F,$
with equality holding if and only if $V - \hat V = (S - \hat S) - \mu(X - \hat X)$.
Let $(X^*, y^*, S^*)$ be an optimal solution. If
$\|P(V(S, X)) - P(V(S^*, X^*))\|_F = \left\|\left(S - S^*,\ \mu(X - X^*)\right)\right\|_F,$
then $(S, \mu X)$ is a fixed point, and hence $(X, y, S)$ is a primal and dual optimal solution pair.

40 An expanded problem
Primal problem:
$\min\ \langle C, X\rangle \quad \text{s.t.}\quad \mathcal{A}(X) = b,\ \mathcal{B}(X) \ge d,\ X \succeq 0,\ X \ge 0,$
where $\mathcal{B}(X) := \left(\langle B^{(1)}, X\rangle, \ldots, \langle B^{(q)}, X\rangle\right)^\top$, $B^{(i)} \in S^n$.
Dual problem:
$\min_{y \in \mathbb{R}^m,\, v \in \mathbb{R}^q,\, S \in S^n,\, Z \in S^n}\ -b^\top y - d^\top v$
$\text{s.t.}\quad \mathcal{A}^*(y) + \mathcal{B}^*(v) + S + Z = C, \quad v \ge 0,\ S \succeq 0,\ Z \ge 0.$

41 A multiple splitting scheme
Augmented Lagrangian function:
$\mathcal{L}_\mu = -b^\top y - d^\top v + \langle X, \mathcal{A}^*(y) + \mathcal{B}^*(v) + S + Z - C\rangle + \frac{1}{2\mu}\|\mathcal{A}^*(y) + \mathcal{B}^*(v) + S + Z - C\|_F^2.$
Framework:
$y^{k+1} := \arg\min_{y \in \mathbb{R}^m} \mathcal{L}_\mu(X^k, y, v^k, Z^k, S^k),$
$v^{k+1} := \arg\min_{v \in \mathbb{R}^q} \mathcal{L}_\mu(X^k, y^{k+1}, v, Z^k, S^k), \quad v \ge 0,$
$Z^{k+1} := \arg\min_{Z \in S^n} \mathcal{L}_\mu(X^k, y^{k+1}, v^{k+1}, Z, S^k), \quad Z \ge 0,$
$S^{k+1} := \arg\min_{S \in S^n} \mathcal{L}_\mu(X^k, y^{k+1}, v^{k+1}, Z^{k+1}, S), \quad S \succeq 0,$
$X^{k+1} := X^k + \frac{\mathcal{A}^*(y^{k+1}) + \mathcal{B}^*(v^{k+1}) + S^{k+1} + Z^{k+1} - C}{\mu}.$

42 Eigenvalue decomposition
Partial eigenvalue decomposition: $X$ and $S$ share the same set of eigenvectors since $XS = 0$, so we only need to compute either $V_+^k$ or $V_-^k$.
Adaptive scheme (suppose that $V^{k-1}$ is known): let $\kappa_+(V^{k-1})$ be the total number of positive eigenvalues of $V^{k-1}$; compute $V_+^k$ for $S^k$ if $\kappa_+(V^{k-1}) \le n/2$, otherwise compute $V_-^k$ for $X^k$.
Direct and iterative methods: DSYEVX in LAPACK and ARPACK.
Warm start techniques: $V_+^k$ is the optimal solution of $\min_{S \succeq 0} \|S - V^k\|_F^2$.
Nonlinear programming approach:
$R^{k+1} := \arg\min_{R \in \mathbb{R}^{n \times \kappa}} \|RR^\top - V^{k+1}\|_F^2,$
starting from $R := Q_+^k (\Sigma_+^k)^{1/2}$.

43 Adjusting the penalty parameter µ
Tune $\mu$ so that the primal and dual infeasibilities are balanced; they are proportional to $1/\mu$ and $\mu$, respectively.
We decrease (increase) $\mu$ by a factor $\gamma$, $0 < \gamma < 1$, if the primal infeasibility is less than (greater than) a multiple of the dual infeasibility for a number of consecutive iterations.
$\mu$ is required to remain within an interval $[\mu_{\min}, \mu_{\max}]$, where $0 < \mu_{\min} < \mu_{\max} < \infty$.
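
A small sketch of such an adjustment rule; the factor, ratio, patience and bounds below are illustrative placeholders, not the values used in the experiments:

```python
def update_mu(mu, pinf, dinf, state, gamma=0.5, ratio=10.0, patience=50,
              mu_min=1e-4, mu_max=1e4):
    """Shrink (grow) mu after pinf has stayed below (above) a multiple of dinf
    for `patience` consecutive iterations; state = {'low': 0, 'high': 0} is
    carried across iterations by the caller."""
    if pinf < dinf / ratio:
        state['low'] += 1
        state['high'] = 0
    elif pinf > dinf * ratio:
        state['high'] += 1
        state['low'] = 0
    else:
        state['low'] = state['high'] = 0
    if state['low'] >= patience:
        mu, state['low'] = max(gamma * mu, mu_min), 0      # decrease by factor gamma
    elif state['high'] >= patience:
        mu, state['high'] = min(mu / gamma, mu_max), 0     # increase by factor 1/gamma
    return mu
```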

44 Other practical issues
Step size $\rho$ for updating $X$:
$X^{k+1} = X^k + \rho\,\frac{\mathcal{A}^*(y^{k+1}) + S^{k+1} - C}{\mu} = (1 - \rho)X^k + \frac{\rho}{\mu}(S^{k+1} - V^{k+1}) = (1 - \rho)X^k + \rho \hat X^{k+1},$
where $\hat X^{k+1}$ denotes the full ($\rho = 1$) update.
Termination rule (optimality and feasibilities):
$\mathrm{pinf} = \frac{\|\mathcal{A}(X) - b\|_2}{1 + \|b\|_2}, \quad \mathrm{dinf} = \frac{\|C - \mathcal{A}^*(y) - S - Z\|_F}{1 + \|C\|_1}, \quad \mathrm{gap} = \frac{|b^\top y - \langle C, X\rangle|}{1 + |b^\top y| + |\langle C, X\rangle|}.$
Stop when $\delta := \max\{\mathrm{pinf}, \mathrm{dinf}, \mathrm{gap}\} \le \epsilon$.
Stagnation: with an iteration counter $\mathrm{it}_{stag}$, also stop if
$(\mathrm{it}_{stag} > h_1 \text{ and } \delta \le 10\epsilon)$ or $(\mathrm{it}_{stag} > h_2 \text{ and } \delta \le 10^2\epsilon)$ or $(\mathrm{it}_{stag} > h_3 \text{ and } \delta \le 10^3\epsilon)$,
where $1 < h_1 < h_2 < h_3$ are integers representing different levels of difficulty.
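
The three stopping measures in code, following the formulas above (taking the $\|C\|_1$ normalization entrywise is an assumption of this sketch; `Aop` and `Astar` are the operators $\mathcal{A}$ and $\mathcal{A}^*$ passed in as callables):

```python
import numpy as np

def stopping_measure(Aop, Astar, b, C, X, y, S, Z):
    # pinf, dinf and gap as defined on the slide; returns delta = max of the three.
    pinf = np.linalg.norm(Aop(X) - b) / (1.0 + np.linalg.norm(b))
    dinf = np.linalg.norm(C - Astar(y) - S - Z, 'fro') / (1.0 + np.abs(C).sum())
    pobj, dobj = np.sum(C * X), b @ y
    gap = abs(dobj - pobj) / (1.0 + abs(dobj) + abs(pobj))
    return max(pinf, dinf, gap)
```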

45 Numerical Results
Test problems: frequency assignment problems and maximal stable set problems.
Comparison with SDPNAL (Zhao, Sun and Toh): a semi-smooth Newton-CG method for minimizing the dual augmented Lagrangian function.
Codes were written in C as MEX-files callable from MATLAB (Release 7.3.0), and all experiments were performed on a Dell Precision 670 workstation with an Intel Xeon 3.4 GHz CPU and 6 GB of RAM.

46 Numerical Results: frequency assignment problems
Graph $G = (V, E)$; weight matrix $W \in S^r$.
Formulation:
$\min\ \left\langle \frac{1}{2k}\,\mathrm{diag}(We) + \frac{k-1}{2k}\,W,\ X\right\rangle$
$\text{s.t.}\quad X_{i,j} \ge -\frac{1}{k-1},\ (i,j) \in E\setminus T, \qquad X_{i,j} = -\frac{1}{k-1},\ (i,j) \in T,$
$\mathrm{diag}(X) = e, \quad X \succeq 0.$
Operator $\mathcal{A}$: the edges in $T$ and $\mathrm{diag}(X) = e$. Operator $\mathcal{B}$: the edges in $E\setminus T$.
Scaling: $X_{ij}/\sqrt{2} = -\frac{\sqrt{2}}{2(k-1)}$ and $X_{ij}/\sqrt{2} \ge -\frac{\sqrt{2}}{2(k-1)}$.
$y^{k+1} = -\left(\mu(\mathcal{A}(X^k) - b) + \mathcal{A}(S^k - C)\right),$
$v^{k+1} = \max\left(-\left(\mu(\mathcal{B}(X^k) - d) + \mathcal{B}(S^k - C)\right),\ 0\right).$

47 Numerical Results: frequency assignment problems
Table: computational results of ADAL (n, m̂, pinf, dinf, itr, gap, cpu) and SDPNAL (gap, cpu) on the fap frequency assignment instances.

48 Numerical Results: frequency assignment problems
Figure: performance profiles of two variants of ADAL (SDPAD ρ=1.6, SDPAD ρ=1), SDPNAL and mprw; panels: (a) gap, (b) cpu.

49 Numerical Results: computing θ(G) and θ_+(G)
Formulation:
$\theta(G) := \max\ \langle ee^\top, X\rangle \quad \text{s.t.}\ X_{ij} = 0,\ (i,j) \in E,\quad \langle I, X\rangle = 1,\quad X \succeq 0,$
$\theta_+(G) := \max\ \langle ee^\top, X\rangle \quad \text{s.t.}\ X_{ij} = 0,\ (i,j) \in E,\quad \langle I, X\rangle = 1,\quad X \succeq 0,\ X \ge 0.$
Scaling: $X_{ij}/\sqrt{2} = 0$ and $\frac{1}{\sqrt{n}}\langle I, X\rangle = \frac{1}{\sqrt{n}}$.

50 Numerical Results: computing θ(G)
Table: computational results of ADAL (n, m̂, pinf, dinf, itr, gap, cpu) and SDPNAL (gap, cpu) on θ(G) instances from the theta, MANN-a, sanr, c-fat, hamming, brock, keller, p-hat, G, 1dc/1et/1tc/1zc and 2dc graph collections.

51 Numerical Results: computing θ_+(G)
Table: computational results of ADAL (n, m̂, pinf, dinf, itr, gap, cpu) and SDPNAL (gap, cpu) on θ_+(G) instances from the same graph collections as for θ(G).

52 Numerical Results: computing θ(G)
Figure: performance profiles of two variants of ADAL (SDPAD ρ=1.6, SDPAD ρ=1) and SDPNAL; panels: (a) gap, (b) cpu.

53 Numerical Results: computing θ_+(G)
Figure: performance profiles of two variants of ADAL (SDPAD ρ=1.6, SDPAD ρ=1) and SDPNAL; panels: (a) gap, (b) cpu.

54 Numerical Results: BIQ problems
The binary integer quadratic programming problem:
$\min\ x^\top Q x \quad \text{s.t.}\ x \in \{0, 1\}^n,$
whose SDP relaxation is:
$\min\ \left\langle \begin{pmatrix} Q & 0 \\ 0 & 0 \end{pmatrix}, X \right\rangle$
$\text{s.t.}\quad X_{ii} - X_{n,i} = 0,\ i = 1, \ldots, n-1, \quad X_{nn} = 1, \quad X \succeq 0,\ X \ge 0.$
Scaling: $\sqrt{2/3}\,(X_{ii} - X_{n,i}) = 0$.
The best upper bound is taken from the Biq Mac library, and
$\%\mathrm{dgap} := \frac{\text{best upper bound} - \mathrm{dobj}}{\text{best upper bound}} \times 100\%.$

55 Numerical Results: BIQ problems
Figure: performance profiles of two variants of ADAL (SDPAD ρ=1.6, SDPAD ρ=1) and SDPNAL; panels: (a) %dgap, (b) cpu.
