Asynchronous Algorithms for Conic Programs, including Optimal, Infeasible, and Unbounded Ones


Asynchronous Algorithms for Conic Programs, including Optimal, Infeasible, and Unbounded Ones
Wotao Yin, joint with Fei Feng, Robert Hannah, Yanli Liu, Ernest Ryu (UCLA, Math)
DIMACS: Distributed Optimization, Information Processing, and Learning, August '17

Overview
conic programming problem (P): minimize c^T x subject to Ax = b, x ∈ K, where K is a closed convex cone
this talk: a first-order iteration
parallel: linear speedup, async
still working if the problem is unsolvable

Approach overview
Douglas-Rachford¹ fixed-point iteration z^{k+1} = T z^k
T depends on A, b, c and has nice properties:
- convergence guarantees and rates
- coordinate friendly: break z into m blocks, cost(T_i) ≈ (1/m) cost(T)
- diverges nicely: if (P) has no primal-dual solution pair, ‖z^k‖ → ∞, and z^k − z^{k+1} tells a whole lot

¹ equivalent to standard ADMM, but the different form is important

Douglas-Rachford splitting (Lions-Mercier '79)
proximal mapping of a closed function h:
  prox_{γh}(x) = argmin_z { h(z) + (1/(2γ)) ‖z − x‖² }
Douglas-Rachford Splitting (DRS) solves minimize f(x) + g(x) by iterating z^{k+1} = T z^k, defined as:
  x^{k+1/2} = prox_{γg}(z^k)
  x^{k+1}  = prox_{γf}(2 x^{k+1/2} − z^k)
  z^{k+1}  = z^k + (x^{k+1} − x^{k+1/2})
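A minimal sketch of this iteration in Python (names such as drs_step, prox_f, prox_g are illustrative; the proximal maps are assumed to be supplied by the user):

```python
import numpy as np

def drs_step(z, prox_f, prox_g, gamma=1.0):
    """One Douglas-Rachford iteration z^{k+1} = T z^k for minimizing f(x) + g(x)."""
    x_half = prox_g(z, gamma)                 # x^{k+1/2} = prox_{gamma*g}(z^k)
    x_next = prox_f(2 * x_half - z, gamma)    # x^{k+1}   = prox_{gamma*f}(2 x^{k+1/2} - z^k)
    return z + (x_next - x_half)              # z^{k+1}   = z^k + (x^{k+1} - x^{k+1/2})

# tiny usage example: minimize |x| + (1/2)(x - 3)^2 in 1-D (minimizer is x* = 2)
prox_f = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g, 0)   # prox of |x| (soft-threshold)
prox_g = lambda v, g: (v + 3 * g) / (1 + g)                       # prox of (1/2)(x - 3)^2
z = np.array([0.0])
for _ in range(200):
    z = drs_step(z, prox_f, prox_g)
print(prox_g(z, 1.0))   # x^{k+1/2} converges to the minimizer, approx. 2
```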

Apply DRS to conic programming
minimize c^T x subject to Ax = b, x ∈ K
equivalently: minimize ( c^T x + δ_{Ax=b}(x) ) + δ_K(x), with f(x) = c^T x + δ_{Ax=b}(x) and g(x) = δ_K(x)
cone K is nonempty, closed, convex
each iteration: project onto K, then project onto {x : Ax = b}
per-iteration cost: O(n²) if x ∈ R^n (by pre-factorizing AA^T)
prior work: ADMM for SDP (Wen-Goldfarb-Y. '09)
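A rough sketch of the two proximal maps in Python, assuming K is the nonnegative orthant just to keep the example short (the talk treats general cones); prox_{γf} reduces to projecting v − γc onto {x : Ax = b}, with AA^T factored once up front:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_prox_f(A, b, c):
    """prox of f(x) = c^T x + delta_{Ax=b}(x): project (v - gamma*c) onto {x : Ax = b}.
    Assumes A has full row rank so that AA^T is positive definite."""
    AAt = cho_factor(A @ A.T)                       # factor AA^T once, reuse every iteration
    def prox_f(v, gamma):
        u = v - gamma * c
        return u - A.T @ cho_solve(AAt, A @ u - b)  # u - A^T (AA^T)^{-1} (Au - b)
    return prox_f

def prox_g(v, gamma):
    """prox of g(x) = delta_K(x) for K = nonnegative orthant: projection onto K."""
    return np.maximum(v, 0.0)
```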

Other choices of splitting
- linearized ADMM and primal-dual splitting: avoid inverting the full A
- variations of Frank-Wolfe: avoid expensive projections onto the SDP cone
- subgradient and bundle methods
- ...

Coordinate friendly² (CF)
(block) coordinate update is fast only if the subproblems are simple
definition: T : H → H is CF if, for any z and i ∈ [m], with z⁺ := (z_1, ..., (Tz)_i, ..., z_m), it holds that
  cost[ {z, M(z)} → {z⁺, M(z⁺)} ] = O( (1/m) · cost[z → Tz] )
where M(z) is some quantity maintained in memory

² Peng-Wu-Xu-Yan-Y. AMSA '16
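A toy illustration of the definition, with the maintained quantity M(z) = Az (the operator and class name are mine, not from the talk): updating one coordinate refreshes the cache with one column of A instead of recomputing the full product.

```python
import numpy as np

class CachedAffineOperator:
    """Toy CF example: T z = z - eta*(A z - b) for square A, with maintained M(z) = A z."""
    def __init__(self, A, b, z0, eta=0.1):
        self.A, self.b, self.eta = A, b, eta
        self.z = z0.astype(float).copy()
        self.Az = A @ self.z                   # maintained quantity M(z)

    def update_coordinate(self, i):
        new_zi = self.z[i] - self.eta * (self.Az[i] - self.b[i])   # (Tz)_i
        self.Az += (new_zi - self.z[i]) * self.A[:, i]             # refresh M(z): O(n), not O(n^2)
        self.z[i] = new_zi
```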

Composed operators
9 rules³ for CF T₁ ∘ T₂ cover many examples
general principles:
- T₁ ∘ T₂ inherits the (weaker) separability property
- if T₁ is CF and T₂ is either cheap, easy-to-maintain, or directly CF, then T₁ ∘ T₂ is CF
- if T₁ is separable or cheap, T₁ ∘ T₂ is easier to make CF

³ Peng-Wu-Xu-Yan-Y. AMSA '16

Lists of CF T₁ ∘ T₂
- many convex image processing models
- portfolio optimization
- most sparse optimization problems
- all LPs, all SOCPs, and SDPs without large cones
- most ERM problems
- ...

Example: DRS for SOCP
second-order cone: Q^n = {x ∈ R^n : x_1 ≥ ‖(x_2, ..., x_n)‖_2}
DRS operator has the form T = linear ∘ proj_{Q^{n_1} × ... × Q^{n_p}}
CF is trivial if all cones are small
now, consider a big cone; property: proj_{Q^n}(x) = (α x_1, β x_2, ..., β x_n), where α, β depend on x_1 and γ := ‖(x_2, ..., x_n)‖_2
given γ and updating x_i, refreshing γ costs O(1)
by maintaining γ, proj_{Q^n} is cheap, and T = linear ∘ cheap is CF
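A sketch of the second-order-cone projection using the standard closed form and a cached γ (function names are illustrative, not from the talk); the O(1) refresh maintains γ² under a single-coordinate change:

```python
import numpy as np

def proj_soc(x1, rest, gamma):
    """Project (x1, rest) onto Q^n = {x : x_1 >= ||x_{2:}||_2}, given gamma = ||rest||_2."""
    if gamma <= x1:
        return x1, rest                          # already inside the cone
    if gamma <= -x1:
        return 0.0, np.zeros_like(rest)          # projects to the origin
    alpha = (x1 + gamma) / 2.0
    return alpha, (alpha / gamma) * rest         # scale the tail, lift x_1

def refresh_gamma_sq(gamma_sq, old_xi, new_xi):
    """O(1) update of gamma^2 = sum_{j>=2} x_j^2 after a single coordinate changes."""
    return gamma_sq + new_xi**2 - old_xi**2
```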

Fixed-point iterations
full update: z^{k+1} = T z^k
(block) coordinate update (CU): choose i_k ∈ [m],
  z_i^{k+1} = z_i^k + η((T z^k)_i − z_i^k)  if i = i_k,   z_i^{k+1} = z_i^k  otherwise
parallel CU: p agents choose I_k ⊆ [m],
  z_i^{k+1} = z_i^k + η((T z^k)_i − z_i^k)  if i ∈ I_k,   z_i^{k+1} = z_i^k  otherwise
η depends on properties of T, i_k, and I_k
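A serial sketch of the coordinate update rule (illustrative names; a real implementation would evaluate only the i-th block of T, which is exactly where coordinate friendliness pays off):

```python
import numpy as np

def coordinate_update(z, T, eta=0.5, rng=np.random.default_rng()):
    """One random (block) coordinate update z^{k+1} from z^k."""
    i = rng.integers(len(z))        # pick i_k uniformly at random from [m]
    Tz = T(z)                       # full T applied here only for clarity
    z = z.copy()
    z[i] += eta * (Tz[i] - z[i])    # relax only the chosen coordinate
    return z
```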

Sync-parallel versus async-parallel
(timeline diagram of three agents)
synchronous: faster agents sit idle waiting for the slowest one
asynchronous: all agents are non-stop

ARock: async-parallel CU
p agents; every agent continuously does: pick i_k ∈ [m], then
  z_i^{k+1} = z_i^k + η((T z^{k−d_k})_i − z_i^{k−d_k})  if i = i_k,   z_i^{k+1} = z_i^k  otherwise
new notation: z^{k−d_k} = (z_1^{k−d_{k,1}}, ..., z_m^{k−d_{k,m}}) may be stale
k increases after any agent completes an update
allow inconsistent atomic read/write
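A simplified thread-based sketch of this access pattern in Python (illustrative only: Python threads give no real speedup here, and the snapshot read is cruder than ARock's inconsistent atomic reads; T_i is an assumed user function returning the i-th block of T applied to a possibly stale z):

```python
import numpy as np
import threading

def arock_sketch(T_i, z, m, num_agents=4, steps_per_agent=1000, eta=0.5):
    """Agents repeatedly pick a random block and update it using a possibly stale read of z."""
    def agent():
        rng = np.random.default_rng()
        for _ in range(steps_per_agent):
            i = int(rng.integers(m))
            z_stale = z.copy()                             # snapshot; others may write meanwhile
            z[i] += eta * (T_i(z_stale, i) - z_stale[i])   # write back only block i

    threads = [threading.Thread(target=agent) for _ in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return z
```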

Various theories and meanings
1969-'90s: T is contractive in a weighted max-norm ‖·‖_{w,∞}, partially/totally async
recent in the ML community: async SG and async BCD
- early works: random i_k, bounded delays, E f has sufficient descent, treat delays as noise, delays independent of i_k
- state-of-the-art: allow essentially cyclic i_k, unbounded delays (t⁻⁴ or faster decay), Lyapunov analysis, delays as overdue progress, delays can depend on i_k
ARock: T is nonexpansive in ℓ₂, unbounded delays (t⁻⁴ or faster decay), Lyapunov analysis, delays as overdue progress, delays independent of i_k, provable running time async:sync = 1 : log(p) in a Poisson system, prox is async
Combettes-Eckstein: async projective splitting, free of parameters
in distributed computing, "async" can also refer to random activations, possibly without delays

ARock convergence⁴
notation: m = # blocks, τ = max async delay, uniform random selection (non-uniform is okay)

Theorem (known max delay). Assume: T is nonexpansive and has a fixed point, and delays do not depend on i_k. Use step size η_k ∈ [ε, 1/(2m^{−1/2}τ + 1)). Then x^k → x* ∈ Fix T almost surely.

consequence: no sync needed at least until using O(√m) agents; sharp when τ ≈ √m

⁴ Peng-Xu-Yan-Y. SISC '16

Optimization and fixed-point examples

Applications

More complicated applications
- LP, QP, SOCP, some SDPs
- image reconstruction minimization
- nonnegative matrix factorization
- decentralized optimization (no global coordination anymore!)

QCQP test: ARock versus SCS⁵

⁵ O'Donoghue, Chu, Parikh, Boyd '15

Practice
coding: OpenMP, C++11, MPI; easier than you think
performance: if done correctly, async speed ≥ sync speed; much faster when systems get bigger and/or unbalanced

An ideal solver
- find a solution if there is one
- when there is no solution, reliably report "no solution"
- provide a certificate
- suggest the cheapest fix
status: achievable for LP, not yet for SOCPs

Conic programming
p* = min c^T x subject to Ax = b (call this affine set L), x ∈ K, where K is a closed convex cone
every problem falls in exactly one of seven cases:
1) p* finite: 1a) has a primal-dual solution pair, 1b) only a primal solution, 1c) no primal solution
2) p* = −∞: 2a) has an improving direction, 2b) no improving direction
3) p* = +∞ (infeasible): 3a) dist(L, K) > 0, there is a separating hyperplane; 3b) dist(L, K) = 0, no strict separating hyperplane
(nearly) pathological cases fail existing solvers

Example 1
3-variable problem: minimize x_1 subject to x_2 = 1, 2 x_2 x_3 ≥ x_1²
since x_2, x_3 ≥ 0, the problem is equivalent to: minimize x_1 subject to x_2 = 1, (x_1, x_2, x_3) in the rotated second-order cone
classification: (2b) feasible, unbounded⁶, no improving direction⁷
solver results:
  SDPT3: Failed, p* not reported
  SeDuMi: Inaccurate/Solved, p* = −175514
  Mosek: Inaccurate/Unbounded, p* = −∞

⁶ p* = −∞, by letting x_3 → ∞ and x_1 → −∞
⁷ reason: any improving direction u has the form (u_1, 0, u_3), but the cone constraint gives 2 u_2 u_3 = 0 ≥ u_1², so u_1 = 0, which implies c^T u = 0 (not improving)
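A quick numeric sanity check of footnote 6 (not from the talk's code; it just exhibits feasible points with arbitrarily negative objective):

```python
import numpy as np

# Feasible points of Example 1 with objective -> -inf:
# take x2 = 1, x3 = t, x1 = -sqrt(2 t); then 2*x2*x3 = 2t >= x1^2 holds.
for t in [1e2, 1e4, 1e6]:
    x1, x2, x3 = -np.sqrt(2 * t), 1.0, t
    assert 2 * x2 * x3 >= x1**2 - 1e-9      # cone constraint (up to rounding)
    print(f"t = {t:.0e}:  objective x1 = {x1:.3e}")
```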

Example 2
3-variable problem: minimize 0 subject to
  x_1 = 1, x_2 = x_3            (x ∈ L, i.e., Ax = b with A = [1 0 0; 0 1 −1], b = (1, 0))
  x_3 ≥ (x_1² + x_2²)^{1/2}     (x ∈ K, second-order cone)
classification: (3b) infeasible⁸, L ∩ K = ∅ but dist(L, K) = 0⁹, no strict separating hyperplane
solver results:
  SDPT3: Infeasible, p* = +∞
  SeDuMi: Solved, p* = 0
  Mosek: Failed, p* not reported

⁸ x ∈ L implies x = [1, α, α]^T, α ∈ R, which always violates the second-order cone constraint
⁹ dist(L, K) ≤ ‖[1, α, α] − [1, α, (α² + 1)^{1/2}]‖₂ → 0 as α → ∞
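Footnote 9 checked numerically (a small illustration, not from the talk): points of L get arbitrarily close to K even though L ∩ K is empty.

```python
import numpy as np

for alpha in [1e1, 1e3, 1e5]:
    p_L = np.array([1.0, alpha, alpha])                  # in L: x1 = 1, x2 = x3
    p_K = np.array([1.0, alpha, np.hypot(1.0, alpha)])   # in K: x3 = sqrt(x1^2 + x2^2)
    print(f"alpha = {alpha:.0e}:  ||p_L - p_K|| = {np.linalg.norm(p_L - p_K):.2e}")
```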

Then, what happens to DRS?
in the 1970s, Pazy and Rockafellar: assume T is firmly nonexpansive and run z^{k+1} = T(z^k); it converges if a primal-dual solution pair exists; otherwise, ‖z^k‖ → ∞
in 1979, Baillon-Bruck-Reich nailed the limit: z^k − z^{k+1} → v = Proj_{cl ran(I−T)}(0)

Our analysis results (Liu-Ryu-Y. '17)
- rate of convergence: ‖z^k − z^{k+1}‖ ≤ ‖v‖ + ε + O(1/(k+1))
- deciphered Proj_{cl ran(I−T)}
- a workflow running three similar DRS instances, differing only by constants: 1) DRS, 2) feasibility DRS with c = 0, 3) boundedness DRS with b = 0; most pathological cases are identified
- compute an improving direction if one exists
- compute a separating hyperplane if one exists
- for all infeasible problems, minimally change the problem to restore strong feasibility
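A schematic of that workflow in Python (my paraphrase of the slide, not the paper's actual decision rules): run DRS on the original data, on the feasibility problem (c = 0), and on the boundedness problem (b = 0), and inspect whether the fixed-point residual z^k − z^{k+1} vanishes.

```python
import numpy as np

def run_drs(T, z0, iters=10000):
    """Iterate z^{k+1} = T z^k and return the last iterate and residual z^k - z^{k+1}."""
    z = z0
    for _ in range(iters):
        z_next = T(z)
        residual = z - z_next
        z = z_next
    return z, residual

def classify(T_original, T_feas, T_bnd, z0, tol=1e-6):
    """Very rough classification sketch: a vanishing residual behaves like a solvable problem;
    a nonzero limit signals infeasibility or unboundedness."""
    _, r_orig = run_drs(T_original, z0)
    _, r_feas = run_drs(T_feas, z0)   # same DRS with c = 0
    _, r_bnd  = run_drs(T_bnd, z0)    # same DRS with b = 0
    if np.linalg.norm(r_orig) < tol:
        return "likely has a primal-dual solution pair"
    if np.linalg.norm(r_feas) >= tol:
        return "likely infeasible (r_feas relates to a separating hyperplane)"
    if np.linalg.norm(r_bnd) >= tol:
        return "likely unbounded (r_bnd relates to an improving direction)"
    return "pathological / needs closer inspection"
```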

Decision flow

Infeasible SDP test set (Liu-Pataki '17)
percentage of successful detection on clean and messy examples in Liu-Pataki '17:

                     m = 10           m = 20
                  Clean   Messy    Clean   Messy
  SeDuMi             0       0        1       0
  SDPT3              0       0        0       0
  Mosek              0       0       11       0
  PP¹⁰ + SeDuMi    100       0      100       0

¹⁰ preprocessing by Permenter-Parrilo '14

Identify weakly infeasible SDPs

              m = 10           m = 20
            Clean   Messy    Clean   Messy
  Proposed   100      21      100      99

(stopping criterion: ‖z^{1e7}‖₂ ≥ 800)
our detection percentages are far better!

Identify strongly infeasible SDPs

              m = 10           m = 20
            Clean   Messy    Clean   Messy
  Proposed   100     100      100     100

(stopping criterion: ‖z^{5e4} − z^{5e4+1}‖₂ ≥ 10⁻³)
our detection percentages are far better!

Thank you!
References: UCLA CAM reports
- 15-37: ARock
- 16-13: coordinate friendly, SOCP applications
- 17-30: unbounded and realistic-delay async BCD
- 17-31: DRS for unsolvable conic programs
- arXiv:1708.05136: provably async-to-sync speedup