Asynchronous Algorithms for Conic Programs, including Optimal, Infeasible, and Unbounded Ones
Asynchronous Algorithms for Conic Programs, including Optimal, Infeasible, and Unbounded Ones
Wotao Yin
joint work with Fei Feng, Robert Hannah, Yanli Liu, Ernest Ryu (UCLA, Math)
DIMACS: Distributed Optimization, Information Processing, and Learning, August 17
Overview
conic programming problem (P):
  minimize c^T x subject to Ax = b, x in K,
where K is a closed convex cone.
this talk:
- a first-order iteration
- parallel: linear speedup, async
- still working if the problem is unsolvable
Approach overview
Douglas-Rachford fixed-point iteration^1: z^{k+1} = T z^k
T depends on A, b, c and has nice properties:
- convergence guarantees and rates
- coordinate friendly: break z into m blocks, cost(T_i) ~ (1/m) cost(T)
- diverges nicely: if (P) has no primal-dual solution pair, ||z^k|| -> infinity, yet z^k - z^{k+1} tells a whole lot
^1 equivalent to standard ADMM, but the different form is important
Douglas-Rachford splitting (Lions-Mercier '79)
proximal mapping of a closed function h:
  prox_{gamma h}(x) = argmin_z { h(z) + (1/(2 gamma)) ||z - x||^2 }
the Douglas-Rachford Splitting (DRS) method solves
  minimize f(x) + g(x)
by iterating z^{k+1} = T z^k, defined as:
  x^{k+1/2} = prox_{gamma g}(z^k)
  x^{k+1}   = prox_{gamma f}(2 x^{k+1/2} - z^k)
  z^{k+1}   = z^k + (x^{k+1} - x^{k+1/2})
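The three-step iteration above can be sketched in a few lines. The problem instance below (f(x) = |x|, g(x) = (x - 3)^2 / 2) and the function names are illustrative choices, not from the talk; the step size gamma is absorbed into the prox operators:

```python
import numpy as np

def drs(prox_f, prox_g, z0, iters=500):
    """Douglas-Rachford splitting for minimize f(x) + g(x)."""
    z = np.asarray(z0, dtype=float)
    for _ in range(iters):
        x_half = prox_g(z)               # x^{k+1/2} = prox_{gamma g}(z^k)
        x_next = prox_f(2 * x_half - z)  # x^{k+1}   = prox_{gamma f}(2 x^{k+1/2} - z^k)
        z = z + (x_next - x_half)        # z^{k+1}   = z^k + (x^{k+1} - x^{k+1/2})
    return x_half, z

# toy check with gamma = 1: f(x) = |x| (prox = soft-threshold),
# g(x) = (x - 3)^2 / 2; the minimizer of f + g is x = 2
gamma = 1.0
prox_f = lambda v: np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)
prox_g = lambda v: (v + gamma * 3.0) / (1.0 + gamma)
x, _ = drs(prox_f, prox_g, np.array([0.0]))
```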
Apply DRS to conic programming
  minimize c^T x subject to Ax = b, x in K
rewrite as
  minimize f(x) + g(x), where f(x) = c^T x + delta_{Ax=b}(x) and g(x) = delta_K(x)
the cone K is nonempty, closed, and convex
- each iteration: project onto K, then project onto {x : Ax = b}
- per-iteration cost: O(n^2) if x in R^n (by pre-factorizing AA^T)
- prior work: ADMM for SDP (Wen-Goldfarb-Y. '09)
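A minimal sketch of this setup, with K taken to be the nonnegative orthant (so the conic program is a tiny LP) and all data invented for illustration; each affine projection reuses the pre-formed AA^T:

```python
import numpy as np

# assumed toy data: minimize c^T x s.t. Ax = b, x in K = R^n_+
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
gamma = 1.0

AAt = A @ A.T                            # formed once; factorize in practice
def proj_affine(v):                      # projection onto {x : Ax = b}
    return v - A.T @ np.linalg.solve(AAt, A @ v - b)

prox_f = lambda v: proj_affine(v - gamma * c)  # prox of c^T x + delta_{Ax=b}
prox_g = lambda v: np.maximum(v, 0.0)          # projection onto K

z = np.zeros(2)
for _ in range(2000):
    x_half = prox_g(z)
    x_next = prox_f(2 * x_half - z)
    z += x_next - x_half
# x_half is approximately (1, 0), the optimal vertex, with p* = 1
```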
Other choices of splitting
- linearized ADMM and primal-dual splitting: avoid inverting the full A
- variations of Frank-Wolfe: avoid expensive projections onto the SDP cone
- subgradient and bundle methods
- ...
Coordinate friendly^2 (CF)
(block) coordinate updates are fast only if the subproblems are simple
definition: T : H -> H is CF if, for any z and i in [m], letting
  z^+ := (z_1, ..., (Tz)_i, ..., z_m),
it holds that
  cost[ {z, M(z)} -> {z^+, M(z^+)} ] = O( (1/m) cost[z -> Tz] ),
where M(z) is some quantity maintained in memory
^2 Peng-Wu-Xu-Yan-Y. AMSA '16
Composed operators
9 rules^3 for CF compositions T_1 T_2 cover many examples
general principles:
- T_1 T_2 inherits the (weaker) separability property
- if T_1 is CF and T_2 is either cheap, easy-to-maintain, or directly CF, then T_1 T_2 is CF
- if T_1 is separable or cheap, T_1 T_2 is easier to make CF
^3 Peng-Wu-Xu-Yan-Y. AMSA '16
Lists of CF T_1 T_2
- many convex image processing models
- portfolio optimization
- most sparse optimization problems
- all LPs, all SOCPs, and SDPs without large cones
- most ERM problems
- ...
Example: DRS for SOCP
second-order cone: Q^n = {x in R^n : x_1 >= ||(x_2, ..., x_n)||_2}
the DRS operator has the form
  T = linear o proj_{Q^{n_1} x ... x Q^{n_p}}
- CF is trivial if all cones are small
- now consider a big cone; property:
    proj_{Q^n}(x) = (alpha x_1, beta x_2, ..., beta x_n),
  where alpha, beta depend on x_1 and gamma := ||(x_2, ..., x_n)||_2
- given gamma, updating x_i and refreshing gamma costs O(1)
- by maintaining gamma, proj_{Q^n} is cheap, and T = linear o cheap is CF
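The closed-form projection and the O(1) refresh of gamma can be sketched as follows; the formula is the standard second-order-cone projection, and the variable names are illustrative:

```python
import numpy as np

def proj_soc(x):
    """Project onto Q^n = {x : x[0] >= ||x[1:]||_2} (standard closed form)."""
    t, v = x[0], x[1:]
    nv = np.linalg.norm(v)
    if nv <= t:                       # already inside the cone
        return x.copy()
    if nv <= -t:                      # inside the polar cone: projects to 0
        return np.zeros_like(x)
    alpha = (t + nv) / 2.0            # new first entry ("alpha" scaling)
    out = np.empty_like(x)
    out[0] = alpha
    out[1:] = (alpha / nv) * v        # "beta" scaling of the tail
    return out

# coordinate-friendly trick: maintain gamma (here its square) in memory;
# after changing one coordinate, refreshing it costs O(1), not O(n)
x = np.array([1.0, 3.0, 4.0])
gamma_sq = float(np.sum(x[1:] ** 2))  # 25.0, kept in memory
old, new = x[2], 0.0
gamma_sq += new**2 - old**2           # O(1) refresh
x[2] = new
```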
Fixed-point iterations
full update: z^{k+1} = T z^k
(block) coordinate update (CU): choose i_k in [m],
  z_i^{k+1} = z_i^k + eta ((T z^k)_i - z_i^k)  if i = i_k,
  z_i^{k+1} = z_i^k                             otherwise.
parallel CU: p agents choose I_k in [m],
  z_i^{k+1} = z_i^k + eta ((T z^k)_i - z_i^k)  if i in I_k,
  z_i^{k+1} = z_i^k                             otherwise.
eta depends on properties of T, i_k, and I_k
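The coordinate update above can be sketched on a toy contractive map T z = M z + q; M, q, eta, and the iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.array([[0.5, 0.2], [0.2, 0.5]])   # ||M|| < 1, so T is contractive
q = np.array([1.0, 1.0])
T = lambda z: M @ z + q
eta = 0.9

z = np.zeros(2)
for k in range(4000):
    i = rng.integers(2)                  # choose block i_k uniformly
    z[i] += eta * (T(z)[i] - z[i])       # a real CF setup evaluates only (Tz)_i
# z approaches the unique fixed point z* = (I - M)^{-1} q = (10/3, 10/3)
```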
Sync-parallel versus async-parallel
[diagram: update timelines of Agents 1-3; synchronous (faster agents must wait, with idle gaps) versus asynchronous (all agents are non-stop)]
ARock: async-parallel CU
p agents; every agent continuously does: pick i_k in [m], then update
  z_i^{k+1} = z_i^k + eta ((T z^{k-d_k})_i - z_i^{k-d_k})  if i = i_k,
  z_i^{k+1} = z_i^k                                         otherwise.
new notation: the read z^{k-d_k} = (z_1^{k-d_{k,1}}, ..., z_m^{k-d_{k,m}}) may be stale
- k increases after any agent completes an update
- allow inconsistent atomic read/write
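A single-threaded simulation of the stale reads can illustrate the scheme without real threads; the delay model (each coordinate read independently from a bounded history window) and all constants are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
M = np.array([[0.5, 0.2], [0.2, 0.5]])   # contractive T z = M z + q
q = np.array([1.0, 1.0])
T = lambda z: M @ z + q
eta, tau, m = 0.5, 5, 2                  # step size, max delay, # blocks

z = np.zeros(m)
history = [z.copy()]
for k in range(6000):
    # inconsistent stale read: each coordinate may come from a different
    # past iterate, at most tau steps old
    zhat = np.array([history[-1 - rng.integers(min(tau, len(history)))][i]
                     for i in range(m)])
    i = rng.integers(m)                  # pick i_k uniformly
    z[i] += eta * (T(zhat)[i] - zhat[i])
    history.append(z.copy())
    history = history[-(tau + 1):]       # keep only a bounded window
# despite the stale reads, z approaches the fixed point (I - M)^{-1} q
```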
Various theories and meanings
- 1980s: T is contractive in a weighted max-norm ||.||_{w,inf}; partially/totally async
- recent in the ML community: async SG and async BCD
  - early works: random i_k, bounded delays, E f has sufficient descent, delays treated as noise, delays independent of i_k
  - state of the art: essentially cyclic i_k allowed, unbounded delays (t^{-4} or faster decay), Lyapunov analysis, delays as overdue progress, delays can depend on i_k
- ARock: T is nonexpansive in ||.||_2; unbounded delays (t^{-4} or faster decay), Lyapunov analysis, delays as overdue progress, delays independent of i_k; provable running time async : sync = 1 : log(p) in a Poisson system; prox is async
- Combettes-Eckstein: async projective splitting, free of parameters
- in distributed computation, "async" also refers to random activations, possibly without delays
ARock convergence^4
notation: m = number of blocks; tau = max async delay; uniform random block selection (non-uniform is okay)
Theorem (known max delay). Assume T is nonexpansive and has a fixed point, and the delays do not depend on i_k. Use step sizes eta_k in [eps, 1/(2 m^{-1/2} tau + 1)). Then x^k -> x* in Fix T almost surely.
consequences:
- no sync needed at least until O(sqrt(m)) agents are used
- sharp when tau ~ sqrt(m)
^4 Peng-Xu-Yan-Y. SISC '16
Optimization and fixed-point examples
Applications
More complicated applications
- LP, QP, SOCP, some SDPs
- image reconstruction minimization
- nonnegative matrix factorization
- decentralized optimization (no global coordination anymore!)
QCQP test: ARock versus SCS^5
^5 O'Donoghue, Chu, Parikh, Boyd
Practice
- coding: OpenMP, C++11, MPI; easier than you think
- performance: if done correctly, async speed >= sync speed; much faster when systems get bigger and/or unbalanced
An ideal solver
- finds a solution if there is one
- when there is no solution, reliably reports no solution
- provides a certificate
- suggests the cheapest fix
status: achievable for LP, not yet for SOCPs
Conic programming
  p* = min c^T x subject to Ax = b (i.e., x in L), x in K
K is a closed convex cone
every problem falls in exactly one of seven cases:
1) p* finite: 1a) has a primal-dual solution pair, 1b) only a primal solution, 1c) no primal solution
2) p* = -infinity: 2a) has an improving direction, 2b) no improving direction
3) p* = +infinity: 3a) dist(L, K) > 0, has a separating hyperplane; 3b) dist(L, K) = 0, no strict separating hyperplane
(nearly) pathological cases fail existing solvers
Example 1
3-variable problem:
  minimize x_1 subject to x_2 = 1, 2 x_2 x_3 >= x_1^2.
since x_2, x_3 >= 0, the problem is equivalent to
  minimize x_1 subject to x_2 = 1, (x_1, x_2, x_3) in the rotated second-order cone.
classification: (2b) feasible, unbounded^6, no improving direction^7
solver results:
- SDPT3: Failed, no p* reported
- SeDuMi: Inaccurate/Solved, p* = -infinity
- Mosek: Inaccurate/Unbounded, p* = -infinity
^6 p* = -infinity, by letting x_3 -> infinity and x_1 -> -infinity
^7 reason: any improving direction u has the form (u_1, 0, u_3), but the cone constraint gives 2 u_2 u_3 = 0 >= u_1^2, so u_1 = 0, which implies c^T u = u_1 = 0 (not improving).
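The unboundedness claim in footnote 6 is easy to check numerically along the curve x = (-t, 1, t^2 / 2), a feasible family chosen here for illustration:

```python
import numpy as np

# along x = (-t, 1, t**2 / 2), both constraints of Example 1 hold for every t
# (the cone constraint holds with equality), while the objective x_1 -> -infinity
for t in [1.0, 10.0, 100.0]:
    x = np.array([-t, 1.0, t**2 / 2])
    assert x[1] == 1.0                     # x_2 = 1
    assert 2 * x[1] * x[2] >= x[0] ** 2    # 2 x_2 x_3 >= x_1^2
# so p* = -infinity, yet (per footnote 7) no single improving direction exists
```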
Example 2
3-variable problem:
  minimize 0 subject to x in L, x in K,
where the affine constraints Ax = b force x = (1, alpha, alpha)^T, alpha in R (defining L), and K is the second-order cone x_3 >= ||(x_1, x_2)||_2.
classification: (3b) infeasible^8, L intersect K = empty, dist(L, K) = 0^9, no strict separating hyperplane
solver results:
- SDPT3: Infeasible, p* = +infinity
- SeDuMi: Solved, p* = 0
- Mosek: Failed, p* not reported
^8 x in L implies x = (1, alpha, alpha)^T, alpha in R, which always violates the second-order cone constraint.
^9 dist(L, K) <= ||(1, alpha, alpha) - (1, alpha, (alpha^2 + 1)^{1/2})||_2 -> 0 as alpha -> infinity.
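Footnotes 8 and 9 can be checked numerically; the comparison point (1, alpha, sqrt(alpha^2 + 1)) on the cone boundary follows footnote 9:

```python
import numpy as np

def gap(a):
    """Distance from the point of L at parameter a to a nearby point of K."""
    x_L = np.array([1.0, a, a])                    # in L
    x_K = np.array([1.0, a, np.sqrt(a**2 + 1)])    # on the cone boundary
    assert x_L[2] < np.hypot(x_L[0], x_L[1])       # every x in L violates K
    return float(np.linalg.norm(x_L - x_K))

# dist(L, K) = 0 in the limit: the gap shrinks like 1/(2a), yet never reaches 0
assert gap(1e6) < gap(1e3) < gap(1.0)
```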
Then, what happens to DRS?
in the 1970s, Pazy and Rockafellar: assume T is firmly nonexpansive and run z^{k+1} = T(z^k); this converges if a primal-dual solution pair exists; otherwise ||z^k|| -> infinity
in 1979, Baillon-Bruck-Reich nailed the divergence: z^k - z^{k+1} -> v = Proj_{cl ran(I-T)}(0)
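The Baillon-Bruck-Reich behavior already shows up on the simplest infeasible instance: take f and g to be indicators of two distinct points a and b (an invented toy, not from the talk), so both prox maps are constant. DRS then diverges linearly while z^k - z^{k+1} is constant:

```python
import numpy as np

a, b = np.array([3.0]), np.array([0.0])   # two disjoint "sets": no solution
prox_f, prox_g = (lambda v: a), (lambda v: b)

z = np.array([10.0])
diffs = []
for _ in range(5):
    x_half = prox_g(z)                    # always b
    x_next = prox_f(2 * x_half - z)       # always a
    z_new = z + (x_next - x_half)
    diffs.append(float(z[0] - z_new[0]))  # z^k - z^{k+1}
    z = z_new
# z grows without bound, but z^k - z^{k+1} is the constant b - a = -3.0,
# which equals v = Proj_{cl ran(I-T)}(0): it certifies infeasibility and
# measures the gap between the two sets
```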
Our analysis results (Liu-Ryu-Y. '17)
- rate of convergence: ||z^k - z^{k+1} - v|| <= eps + O(1/(k+1))
- deciphered Proj_{cl ran(I-T)}(0)
- a workflow running three similar DRS iterations, differing only by constants:
  1) DRS; 2) feasibility DRS with c = 0; 3) boundedness DRS with b = 0
  most pathological cases are identified
- computes an improving direction if one exists
- computes a separating hyperplane if one exists
- for all infeasible problems, minimally changes the problem to restore strong feasibility
Decision flow
Infeasible SDP test set (Liu-Pataki '17)
[table: percentage of successful infeasibility detection on the clean and messy examples of Liu-Pataki '17, for m = 10 and m = 20, comparing SeDuMi, SDPT3, Mosek, and PP^10 + SeDuMi; the numbers were not captured in transcription]
^10 preprocessing by Permenter-Parrilo
Identify weakly infeasible SDPs
[table: success percentages of the proposed method (stopping rule: ||z^k|| exceeding a large threshold) on clean and messy examples, m = 10 and m = 20; the numbers were not captured in transcription]
our percentages are much higher!
Identify strongly infeasible SDPs
[table: success percentages of the proposed method (stopping rule based on the iterates near k = 5e4) on clean and messy examples, m = 10 and m = 20; the numbers were not captured in transcription]
our percentages are much higher!
Thank you!
References: UCLA CAM reports
- 15-37: ARock
- 16-13: coordinate friendly, SOCP applications
- 17-30: unbounded and realistic-delay async BCD
- 17-31: DRS for unsolvable conic programs
- arXiv: provably async-to-sync speedup
Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h
More informationConvex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013
Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for
More informationResearch overview. Seminar September 4, Lehigh University Department of Industrial & Systems Engineering. Research overview.
Research overview Lehigh University Department of Industrial & Systems Engineering COR@L Seminar September 4, 2008 1 Duality without regularity condition Duality in non-exact arithmetic 2 interior point
More informationPreconditioning via Diagonal Scaling
Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.
More informationSparsity and Compressed Sensing
Sparsity and Compressed Sensing Jalal Fadili Normandie Université-ENSICAEN, GREYC Mathematical coffees 2017 Recap: linear inverse problems Dictionary Sensing Sensing Sensing = y m 1 H m n y y 2 R m H A
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationPrimal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions
Primal-dual coordinate descent A Coordinate Descent Primal-Dual Algorithm with Large Step Size and Possibly Non-Separable Functions Olivier Fercoq and Pascal Bianchi Problem Minimize the convex function
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationIII. Applications in convex optimization
III. Applications in convex optimization nonsymmetric interior-point methods partial separability and decomposition partial separability first order methods interior-point methods Conic linear optimization
More informationDouglas-Rachford Splitting: Complexity Estimates and Accelerated Variants
53rd IEEE Conference on Decision and Control December 5-7, 204. Los Angeles, California, USA Douglas-Rachford Splitting: Complexity Estimates and Accelerated Variants Panagiotis Patrinos and Lorenzo Stella
More informationAdaptive Primal Dual Optimization for Image Processing and Learning
Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University
More informationHYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING
SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX
More informationLecture 5: September 15
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS
More informationProjection methods to solve SDP
Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method
More informationFacial Reduction and Geometry on Conic Programming
Facial Reduction and Geometry on Conic Programming Masakazu Muramatsu UEC Bruno F. Lourenço, and Takashi Tsuchiya Seikei University GRIPS Based on the papers of LMT: 1. A structural geometrical analysis
More informationCoordinate Update Algorithm Short Course Subgradients and Subgradient Methods
Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n
More informationCoordinate Descent and Ascent Methods
Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:
More informationLecture 5. Theorems of Alternatives and Self-Dual Embedding
IE 8534 1 Lecture 5. Theorems of Alternatives and Self-Dual Embedding IE 8534 2 A system of linear equations may not have a solution. It is well known that either Ax = c has a solution, or A T y = 0, c
More informationInfeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization
Infeasibility Detection and an Inexact Active-Set Method for Large-Scale Nonlinear Optimization Frank E. Curtis, Lehigh University involving joint work with James V. Burke, University of Washington Daniel
More informationShiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers
Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)
More informationPrimal-Dual Geometry of Level Sets and their Explanatory Value of the Practical Performance of Interior-Point Methods for Conic Optimization
Primal-Dual Geometry of Level Sets and their Explanatory Value of the Practical Performance of Interior-Point Methods for Conic Optimization Robert M. Freund M.I.T. June, 2010 from papers in SIOPT, Mathematics
More informationOn positive duality gaps in semidefinite programming
On positive duality gaps in semidefinite programming Gábor Pataki arxiv:82.796v [math.oc] 3 Dec 28 January, 29 Abstract We present a novel analysis of semidefinite programs (SDPs) with positive duality
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationSolving conic optimization problems via self-dual embedding and facial reduction: a unified approach
Solving conic optimization problems via self-dual embedding and facial reduction: a unified approach Frank Permenter Henrik A. Friberg Erling D. Andersen August 18, 216 Abstract We establish connections
More informationMore First-Order Optimization Algorithms
More First-Order Optimization Algorithms Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye Chapters 3, 8, 3 The SDM
More informationThe Alternating Direction Method of Multipliers
The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied
More informationAsynchronous Non-Convex Optimization For Separable Problem
Asynchronous Non-Convex Optimization For Separable Problem Sandeep Kumar and Ketan Rajawat Dept. of Electrical Engineering, IIT Kanpur Uttar Pradesh, India Distributed Optimization A general multi-agent
More informationInfeasibility detection in the alternating direction method of multipliers for convex optimization
Infeasibility detection in the alternating direction method of multipliers for convex optimization Goran Banjac, Paul Goulart, Bartolomeo Stellato, and Stephen Boyd June 20, 208 Abstract The alternating
More informationLecture 5: September 12
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS
More informationarxiv: v2 [math.oc] 2 Mar 2017
CYCLIC COORDINATE UPDATE ALGORITHMS FOR FIXED-POINT PROBLEMS: ANALYSIS AND APPLICATIONS YAT TIN CHOW, TIANYU WU, AND WOTAO YIN arxiv:6.0456v [math.oc] Mar 07 Abstract. Many problems reduce to the fixed-point
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More information12. Interior-point methods
12. Interior-point methods Convex Optimization Boyd & Vandenberghe inequality constrained minimization logarithmic barrier function and central path barrier method feasibility and phase I methods complexity
More informationSparse Optimization Lecture: Dual Certificate in l 1 Minimization
Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Instructor: Wotao Yin July 2013 Note scriber: Zheng Sun Those who complete this lecture will know what is a dual certificate for l 1 minimization
More informationarxiv: v4 [math.oc] 29 Jan 2018
Noname manuscript No. (will be inserted by the editor A new primal-dual algorithm for minimizing the sum of three functions with a linear operator Ming Yan arxiv:1611.09805v4 [math.oc] 29 Jan 2018 Received:
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 29, 2016 Outline Convex vs Nonconvex Functions Coordinate Descent Gradient Descent Newton s method Stochastic Gradient Descent Numerical Optimization
More informationTutorial on Convex Optimization for Engineers Part II
Tutorial on Convex Optimization for Engineers Part II M.Sc. Jens Steinwandt Communications Research Laboratory Ilmenau University of Technology PO Box 100565 D-98684 Ilmenau, Germany jens.steinwandt@tu-ilmenau.de
More information1 Overview. 2 Extreme Points. AM 221: Advanced Optimization Spring 2016
AM 22: Advanced Optimization Spring 206 Prof. Yaron Singer Lecture 7 February 7th Overview In the previous lectures we saw applications of duality to game theory and later to learning theory. In this lecture
More informationProximal splitting methods on convex problems with a quadratic term: Relax!
Proximal splitting methods on convex problems with a quadratic term: Relax! The slides I presented with added comments Laurent Condat GIPSA-lab, Univ. Grenoble Alpes, France Workshop BASP Frontiers, Jan.
More informationFrank-Wolfe Method. Ryan Tibshirani Convex Optimization
Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)
More informationELEMENTARY REFORMULATION AND SUCCINCT CERTIFICATES IN CONIC LINEAR PROGRAMMING. Minghui Liu
ELEMENTARY REFORMULATION AND SUCCINCT CERTIFICATES IN CONIC LINEAR PROGRAMMING Minghui Liu A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment
More informationBias-free Sparse Regression with Guaranteed Consistency
Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March
More information