ARock: an algorithmic framework for asynchronous parallel coordinate updates

1 ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin (UCLA Math, U. Waterloo DCO) UCLA CAM Report, ShanghaiTech SSDS 15, June 25, 2015 1 / 40

2 Background 2 / 40

3 Serial computing (diagram): the tasks t_1, t_2, ..., t_N of a problem are processed by a single CPU one after another 3 / 40

4 Parallel computing (diagram): the tasks t_1, t_2, ..., t_n of a problem are distributed across multiple CPUs and processed simultaneously 4 / 40

5 Sync-parallel versus async-parallel (timeline diagram of Agents 1-3): Synchronous (a new iteration starts only after the last agent finishes, so faster agents sit idle); Asynchronous (all agents run non-stop) 5 / 40

6 ARock: an algorithmic framework of async-parallel coordinate updates 6 / 40

7 The fixed-point problem
Hilbert space H. Operator T : H → H.
find x ∈ H such that x = Tx
equivalent problem: let S := I − T; find x ∈ H such that 0 = Sx
abstracts many problems: convex optimization; statistical regression; optimal control; linear and nonlinear systems of equations; ordinary and partial differential equations. 7 / 40

8 Krasnosel'skii-Mann (KM) iteration
require: a nonexpansive operator T, that is, ‖Tx − Ty‖ ≤ ‖x − y‖ for all x, y ∈ H
iteration: x^{k+1} = (1 − λ)x^k + λTx^k
equivalent form with S = I − T: x^{k+1} = x^k − λSx^k
special cases: gradient descent, proximal-point algorithm, many operator-splitting algorithms such as Douglas-Rachford and ADMM 8 / 40
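
To make the KM update concrete, here is a minimal sketch (not from the slides): km_iterate applies x^{k+1} = (1 − λ)x^k + λTx^k to any operator T, and the example instantiates T = I − (2/L)∇f for an assumed small least-squares problem, so the iteration becomes a relaxed gradient descent.

```python
import numpy as np

def km_iterate(T, x0, lam=0.5, iters=2000):
    """Krasnosel'skii-Mann iteration: x^{k+1} = (1 - lam) * x^k + lam * T(x^k)."""
    x = x0.copy()
    for _ in range(iters):
        x = (1 - lam) * x + lam * T(x)
    return x

# Assumed example: T = I - (2/L) * grad f with f(x) = 0.5 * ||Ax - b||^2,
# which is nonexpansive because grad f is L-Lipschitz.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 0.0])
L = np.linalg.norm(A.T @ A, 2)                      # Lipschitz constant of grad f
T = lambda x: x - (2.0 / L) * (A.T @ (A @ x - b))

print(km_iterate(T, np.zeros(2)))                   # fixed point of T
print(np.linalg.solve(A.T @ A, A.T @ b))            # least-squares solution, for comparison
```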

9 Parallel coordinate update
suppose H = H_1 × ... × H_m
totally m agents (workstations, CPUs, cores); agent i updates x_i ∈ H_i, and the m agents work in parallel:
    agent i:  x_i^{k+1} = x_i^k − η_k (Sx^k)_i,  i = 1, ..., m
require: each (Sx)_i is much easier to compute than Sx (otherwise, parallel computing does not save time) 9 / 40

10 ARock: Async-parallel coordinate KM
suppose H = H_1 × ... × H_m
totally p agents; each agent randomly picks i ∈ {1, ..., m} and updates only the coordinate x_i:
    x_i^{k+1} = x_i^k − η_k (Sx̂^k)_i,   x_j^{k+1} = x_j^k for j ≠ i
x̂^k: the result of reading x from global memory
x^k: the status of x in global memory right before it is updated 10 / 40

11 Random coordinate selection
each coordinate x_i is selected with probability p_i, where min_i p_i > 0
cost of randomness: agents cannot cache data, global memory required (with exceptions)
benefits of randomness: enforce the update frequencies p_i (even if the agents have different speeds and the coordinates have different complexities): automatic load balance; breaks a pattern, often faster than the fixed cyclic order 11 / 40
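
A tiny sketch of the non-uniform coordinate sampling each agent performs inside its loop; the probabilities p below are made-up values, and the empirical frequencies should match them.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4
p = np.array([0.4, 0.3, 0.2, 0.1])    # assumed update frequencies, min_i p_i > 0

# inside its loop, an agent would draw one index at a time; here we draw many to check frequencies
samples = rng.choice(m, size=100_000, p=p)
print(np.bincount(samples, minlength=m) / samples.size)   # empirical frequencies, close to p
```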

12 Applications and numerical results 12 / 40

13 Linear equations (asynchronous Jacobi)
require: an invertible square matrix A with nonzero diagonal entries
let D be the diagonal part of A; then Ax = b is equivalent to the fixed-point equation (I − D^{-1}A)x + D^{-1}b = x, and we set Tx := (I − D^{-1}A)x + D^{-1}b
T is nonexpansive if ‖I − D^{-1}A‖_2 ≤ 1, e.g., when A is diagonally dominant
x^{k+1} = Tx^k recovers the Jacobi algorithm 13 / 40
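
A small sketch with an assumed diagonally dominant A (not one of the paper's datasets): it builds the Jacobi operator Tx = (I − D^{-1}A)x + D^{-1}b, checks the spectral norm ‖I − D^{-1}A‖_2 from the nonexpansiveness condition above, and runs the classic synchronous iteration x^{k+1} = Tx^k.

```python
import numpy as np

# assumed small diagonally dominant system
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])

D_inv = 1.0 / np.diag(A)
T = lambda x: x - D_inv * (A @ x - b)       # T x = (I - D^{-1} A) x + D^{-1} b

print(np.linalg.norm(np.eye(3) - D_inv[:, None] * A, 2))   # <= 1 here, so T is nonexpansive

x = np.zeros(3)
for _ in range(100):                        # synchronous Jacobi: x^{k+1} = T x^k
    x = T(x)
print(x, np.linalg.solve(A, b))             # compare with the direct solve
```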

14 Algorithm 1: ARock for linear equations
Input: shared variable x ∈ R^n, K > 0; set the global iteration counter k = 0;
while k < K, every agent asynchronously and continuously do
    sample i ∈ {1, ..., m} uniformly at random;
    add −(η_k / a_ii) (Σ_j a_ij x̂_j^k − b_i) to the shared variable x_i;
    update the global counter k ← k + 1; 14 / 40
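
Below is an illustrative Python sketch of Algorithm 1, not the authors' open-source code: several threads share one vector x, and each repeatedly samples a coordinate, reads a possibly stale copy x̂, and applies the update above. The data A, b are assumed toy values, and Python's GIL serializes the arithmetic, so this models the update rule rather than demonstrating real speedup.

```python
import random
import threading
import numpy as np

def arock_linear(A, b, eta=0.9, p=4, updates_per_agent=10_000):
    """Async coordinate updates for Ax = b (sketch): x_i <- x_i - (eta/a_ii)(sum_j a_ij xhat_j - b_i)."""
    n = len(b)
    x = np.zeros(n)                            # shared iterate in "global memory"

    def agent():
        for _ in range(updates_per_agent):
            i = random.randrange(n)            # uniform random coordinate
            xhat = x.copy()                    # possibly stale/inconsistent read
            x[i] -= (eta / A[i, i]) * (A[i] @ xhat - b[i])

    threads = [threading.Thread(target=agent) for _ in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
print(arock_linear(A, b), np.linalg.solve(A, b))   # compare with the direct solve
```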

15 Numerical comparison
problem: solve Ax = b, where b ∈ R^n and A ∈ R^{n×n} are taken from the two datasets
    Name        Type     Size (n)    Bandwidth (w)
    Dataset I   sparse   1,000,000   5
    Dataset II  dense    5,000       N/A
we compare ARock (async) and Jacobi (sync), running on 1, 2, 4, ..., 32 cores of a workstation 15 / 40

16 Residual-vs-time plot
(figure: residual versus time (s) for async and sync on 1, 2, 4, ..., 32 cores; left panel: sparse A, size 1 million, bandwidth 5, 100 epochs; right panel: dense A, size 5,000, 50 epochs)
ARock (async) and Jacobi (sync) both have almost linear speedup
ARock (async) is much faster due to asynchronicity and its Gauss-Seidel kind of efficiency (next slide) 16 / 40

17 Residual-vs-epoch plot
(figure: residual versus number of epochs for async on 1-32 cores, sync Jacobi, and Gauss-Seidel; left panel: sparse A, size 1 million, bandwidth 5; right panel: dense A)
ARock matches Gauss-Seidel's epoch efficiency 17 / 40

18 Minimizing smooth functions
require: a convex and Lipschitz-differentiable function f: minimize_x f(x)
if ∇f is L-Lipschitz, then x* = (I − (2/L)∇f) x*, where T := I − (2/L)∇f is nonexpansive
ARock is very fast when each partial gradient ∇_{x_i} f(x) is easy to compute 18 / 40

19 Minimizing composite functions
require: a convex smooth function g and a convex (possibly nonsmooth) function f: minimize_x f(x) + g(x)
proximal map: prox_{γf}(y) = argmin_x f(x) + (1/(2γ)) ‖x − y‖^2
fixed-point form: x* = prox_{γf}(I − γ∇g) x*, where T := prox_{γf} ∘ (I − γ∇g)
ARock is very fast given easy-to-compute partial gradients ∇_{x_i} g(x) and either a separable or an easy-to-compute f (e.g., ℓ_1 and ℓ_{1,2}) 19 / 40
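
An assumed example (not from the slides): f = λ‖·‖_1, whose proximal map is soft-thresholding, and g(x) = ½‖Ax − b‖^2. The sketch builds T = prox_{γf} ∘ (I − γ∇g) and iterates it, which is the plain synchronous forward-backward method; ARock would apply the same T one random coordinate at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1

L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of grad g
gamma = 1.0 / L

soft = lambda y, t: np.sign(y) * np.maximum(np.abs(y) - t, 0.0)   # prox of t * ||.||_1
grad_g = lambda x: A.T @ (A @ x - b)
T = lambda x: soft(x - gamma * grad_g(x), gamma * lam)            # prox_{gamma f}(I - gamma grad g)

x = np.zeros(5)
for _ in range(2000):                   # iterate T to (approximately) its fixed point
    x = T(x)
print(x)
print(np.linalg.norm(T(x) - x))         # close to 0: x is (nearly) a fixed point, hence a minimizer
```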

20 Example: sparse logistic regression
n features, N labeled samples; each sample a_i ∈ R^n has its label b_i ∈ {1, −1}
ℓ_1-regularized logistic regression:
    minimize_{x ∈ R^n}  λ ‖x‖_1 + (1/N) Σ_{i=1}^N log(1 + exp(−b_i a_i^T x)),   (1)
compare sync-parallel and ARock (async-parallel) on two datasets:
    Name     N (# samples)   n (# features)   # nonzeros in {a_1, ..., a_N}
    rcv1     20,242          47,236           1,498,952
    news20   19,996          1,355,191        9,097,… 20 / 40
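
A sketch of the coordinate step ARock would apply to (1), on assumed random data rather than rcv1/news20: the i-th partial derivative of the logistic loss followed by a soft-threshold. A practical implementation would cache the inner products a_j^T x and update them incrementally so that each coordinate step stays cheap; the step size below is a rough, untuned bound.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, lam = 200, 10, 0.01
A = rng.standard_normal((N, n))          # rows are the samples a_j (assumed toy data)
b = rng.choice([-1.0, 1.0], size=N)      # labels b_j

def grad_i(x, i):
    """i-th partial derivative of (1/N) * sum_j log(1 + exp(-b_j a_j^T x))."""
    sigma = 1.0 / (1.0 + np.exp(b * (A @ x)))     # = exp(-b_j a_j^T x) / (1 + exp(-b_j a_j^T x))
    return np.mean(-b * A[:, i] * sigma)

soft = lambda y, t: np.sign(y) * max(abs(y) - t, 0.0)   # scalar soft-threshold, prox of t * |.|

def coord_step(x, i, gamma):
    """One forward-backward step on coordinate i, as ARock would apply it."""
    x = x.copy()
    x[i] = soft(x[i] - gamma * grad_i(x, i), gamma * lam)
    return x

gamma = 4.0 * N / np.linalg.norm(A, 2) ** 2   # <= 1/L for the logistic loss; rough choice
x = np.zeros(n)
for _ in range(5000):
    x = coord_step(x, rng.integers(n), gamma)
print(x)
```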

21 Speedup tests
(table: wall-clock time in seconds and speedup of async vs. sync on rcv1 and news20, for 1, 2, 4, ..., 32 cores)
reasons for sync's poor speedup:
load imbalance (next slide): as more cores are used in parallel, it is more likely that one of them handles a coordinate corresponding to a large number of nonzeros in the samples
before each new iteration, all cores wait for the last core to finish
ARock (async) has nearly linear speedup and is not affected by load imbalance 21 / 40

22 Sparsity pattern and load imbalance
(figure: for rcv1 (left) and news20 (right), each dot gives the number of nonzeros in one coordinate block of about 50 features)
a larger ratio between the largest and smallest block means worse load balance 22 / 40

23 More applications 23 / 40

24 Minimizing composite functions
require: both f and g are convex (possibly nonsmooth) functions: minimize_x f(x) + g(x)
reflective proximal map: refl_{γf} := 2 prox_{γf} − I
the maps refl_{γf}, refl_{γg}, and thus refl_{γf} ∘ refl_{γg} are nonexpansive
fixed-point form: z* = refl_{γf} ∘ refl_{γg}(z*), x* = prox_{γg}(z*), where T_PRS := refl_{γf} ∘ refl_{γg} is the Peaceman-Rachford splitting operator
also works with the Douglas-Rachford splitting operator (1/2)(I + T_PRS)
ARock is very fast given a separable refl_{γf} and easy-to-compute (refl_{γg})_i 24 / 40
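
A minimal sketch of the reflected proximal maps and the PRS/DRS operators for an assumed toy splitting, f the indicator of the nonnegative orthant and g(x) = ½‖x − c‖^2: it iterates the Douglas-Rachford map ½(I + T_PRS) and recovers x* = prox_{γg}(z*), which for this choice is the projection max(c, 0).

```python
import numpy as np

c = np.array([-1.0, 2.0, 0.5, -3.0])
gamma = 0.5

# assumed toy splitting: f = indicator of the nonnegative orthant, g(x) = 0.5 * ||x - c||^2
prox_f = lambda z: np.maximum(z, 0.0)
prox_g = lambda z: (z + gamma * c) / (1.0 + gamma)
refl = lambda prox: (lambda z: 2.0 * prox(z) - z)        # refl_{gamma h} = 2 prox_{gamma h} - I
T_prs = lambda z: refl(prox_f)(refl(prox_g)(z))          # Peaceman-Rachford operator

z = np.zeros_like(c)
for _ in range(200):
    z = 0.5 * z + 0.5 * T_prs(z)       # Douglas-Rachford map (1/2)(I + T_PRS)
x = prox_g(z)
print(x, np.maximum(c, 0.0))           # minimizer of f + g is the projection of c onto x >= 0
```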

25 Parallel/distributed ADMM
require: m convex functions f_i (possibly nonsmooth)
consensus problem: minimize_x Σ_{i=1}^m f_i(x) + g(x)
reformulation: minimize_{x_1,...,x_m, y} Σ_{i=1}^m f_i(x_i) + g(y) subject to x_i − y = 0, i = 1, ..., m
applying Douglas-Rachford ARock to the dual problem gives an async-parallel ADMM:
the m f_i-subproblems are solved in the async-parallel fashion
y and the z_i (dual variables) are updated in global memory 25 / 40

26 Algorithm 2: ARock (async-parallel ADMM) for consensus optimization
Input: shared variables y^0, z_i^0 for all i, and K > 0
while k < K, every agent asynchronously and continuously do
    sample i from {1, ..., m} with equal probability;
    locally compute (ŵ_{dg}^k)_i, x̂_i^k, and (ŵ_{df}^k)_i by (2a)-(2c), respectively;
    update the global z_i^{k+1} and ŷ^{k+1} by (3a) and (3b), respectively;
    update the global counter k ← k + 1;
local computation:
    (ŵ_{dg}^k)_i = ẑ_i^k + γ ŷ^k,   (2a)
    x̂_i^k = argmin_{x_i} f_i(x_i) − ⟨2(ŵ_{dg}^k)_i − ẑ_i^k, x_i⟩ + (γ/2) ‖x_i‖^2,   (2b)
    (ŵ_{df}^k)_i = 2(ŵ_{dg}^k)_i − ẑ_i^k − γ x̂_i^k.   (2c)
global update:
    z_i^{k+1} = z_i^k + η_k ((ŵ_{df}^k)_i − (ŵ_{dg}^k)_i),   (3a)
    ŷ^{k+1} = ŷ^k + (1/(γm)) (ẑ_i^k − ẑ_i^{k+1}).   (3b) 26 / 40

27 Async-parallel decentralized ADMM
a graph of connected agents: G = (V, E)
decentralized consensus optimization problem: minimize_{x_i ∈ R^d, i ∈ V} f(x) := Σ_{i∈V} f_i(x_i) subject to x_i = x_j for all (i, j) ∈ E
ADMM reformulation: constraints x_i = y_ij, x_j = y_ij for all (i, j) ∈ E
apply ARock:
version 1: nodes asynchronously activate
version 2: edges (and the nodes of each edge) asynchronously activate
both versions: each agent keeps f_i private and talks only to its neighbors 27 / 40

28 notation: E(i) denotes all edges of agent i, with E(i) = L(i) ∪ R(i); L(i): neighbors j of agent i with j < i; R(i): neighbors j of agent i with j > i
Algorithm 3: ARock for the decentralized consensus problem
Input: each agent i sets x_i^0 ∈ R^d, dual variables z_{e,i}^0 for e ∈ E(i), K > 0.
while k < K, any activated agent i do
    receive ẑ_{li,l}^k from neighbors l ∈ L(i) and ẑ_{ir,r}^k from neighbors r ∈ R(i);
    update the local x̂_i^k, z_{li,i}^{k+1}, and z_{ir,i}^{k+1} according to (4a)-(4c), respectively;
    send z_{li,i}^{k+1} to neighbors l ∈ L(i) and z_{ir,i}^{k+1} to neighbors r ∈ R(i).
    x̂_i^k ← argmin_{x_i} f_i(x_i) + ⟨Σ_{l∈L(i)} ẑ_{li,l}^k + Σ_{r∈R(i)} ẑ_{ir,r}^k, x_i⟩ + (γ|E(i)|/2) ‖x_i‖^2,   (4a)
    z_{ir,i}^{k+1} = z_{ir,i}^k − η_k ((ẑ_{ir,i}^k + ẑ_{ir,r}^k)/2 + γ x̂_i^k),  r ∈ R(i),   (4b)
    z_{li,i}^{k+1} = z_{li,i}^k − η_k ((ẑ_{li,i}^k + ẑ_{li,l}^k)/2 + γ x̂_i^k),  l ∈ L(i).   (4c) 28 / 40

29 Literature 29 / 40

30 Brief history
The first async-parallel algorithm appeared in 1969 for solving linear equations. It was extended to fixed-point problems under the absolute-contraction¹ type of assumption. For years, such methods mainly solved linear, nonlinear, and differential equations. Some recent work solves statistical regression, machine learning, and sensor network problems.
¹ An operator T : R^n → R^n is Lipschitz contractive if |T(x) − T(y)| ≤ A|x − y| component-wise, where |x| denotes the vector with components |x_i|, i = 1, ..., n, and A ∈ R^{n×n} is a matrix with spectral radius strictly less than 1. 30 / 40

31 Recent work
Bertsekas-Tsitsiklis 89: async-parallel gradient-projection method
Liu et al. 13: async-parallel stochastic coordinate descent for minimizing convex smooth functions
Liu and Wright 14: async-parallel stochastic proximal coordinate descent for minimizing convex composite objective functions
Hsieh et al. 15: async-parallel implementation of LIBLINEAR (for ℓ_2-regularized empirical risk minimization)
Other async-parallel / async-ADMM methods: Wei-Ozdaglar 13, Iutzeler et al. 13, Zhang-Kwok 14, Hong 14, ... 31 / 40

32 ARock contributions
A framework for nonexpansive operators that have fixed points
Applications: async-parallel algorithms for linear equations, (smooth and nonsmooth) function minimization, distributed and decentralized optimization, ...
Similar to recent work, random coordinate updates: automatic load balance
Analysis: almost sure convergence of x^k to x* ∈ Fix T; linear convergence (when S is strongly monotone); fixed step sizes
Open-source C code for reproducible research 32 / 40

33 Under the hood 33 / 40

34 Iteration is redefined
Synchronous: a new iteration starts when all agents finish
Asynchronous: a new iteration starts when any agent finishes 34 / 40

35 Reading consistency
multiple agents simultaneously read and write x in global memory; while an agent reads x into its cache, x might be updated by other agents.
definitions: let x^0, ..., x^k, ... be the states of x in the memory;
x̂^k is called consistent if x̂^k = x^j for some j ≤ k;
x̂^k is called inconsistent if x̂^k ≠ x^j for every j ≤ k. 35 / 40

36 Reading consistency and memory lock
example: Agent 1 reads [0, 0, 0, 0]^T = x^0: a consistent read; Agent 1 reads [0, 0, 0, 2]^T ∉ {x^0, x^1, x^2}: an inconsistent read
ARock allows inconsistent reads 36 / 40

37 Atomic coordinate update
when each coordinate update is atomic (a single CPU instruction), the read of each single coordinate is consistent, that is,
    x_i^k = x̂_i^k + Σ_{d ∈ J_i(k)} (x_i^{d+1} − x_i^d),
where the sum collects the interim changes of x_i
x̂_i^k: the result of the read; x_i^k: the status of x_i right before it is updated; J_i(k): the index set of the interim changes of x_i
since k increases by one for each coordinate update, we have J_i(k) ∩ J_j(k) = ∅ for i ≠ j; therefore, letting J(k) = ∪_{i=1}^m J_i(k), we have
    x^k = x̂^k + Σ_{d ∈ J(k)} (x^{d+1} − x^d)
we assume that |J(k)| ≤ τ for all k 37 / 40

38 Special cases of ARock
if p = m = 1 (one agent and one coordinate), ARock reduces to the KM iteration
if p = m > 1 and τ = 0 (no delay), ARock reduces to sync-parallel coordinate update
if p = 1 (only one agent), ARock reduces to Nesterov's randomized coordinate update 38 / 40

39 Analysis challenges and techniques
Challenges:
asynchrony: stale information is used in the update
inconsistency: x̂^k may not equal any state of x that ever existed
coordinate update: the search direction touches only one coordinate
no objective function: must work with ‖z^k − z*‖^2 and ‖Tz^k − z^k‖^2
Techniques:
bounded delay, or infinite delay with a light tail
a new metric
a non-negative almost supermartingale
the stale x̂^k is related to the current x^k through atomic updates
random selection: expected progress over all coordinates 39 / 40

40 Thank you!
Reference: Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin. ARock: an algorithmic framework for asynchronous parallel coordinate updates. UCLA CAM Report.
Website: wotaoyin/arock 40 / 40
