Dual Methods. Lecturer: Ryan Tibshirani. Convex Optimization 10-725/36-725

1 Dual Methods
Lecturer: Ryan Tibshirani
Convex Optimization 10-725/36-725

2 Last time: proximal Newton method
Consider the problem
    min_x g(x) + h(x)
where g, h are convex, g is twice differentiable, and h is simple. Proximal Newton method: let x^(0) ∈ R^n, and repeat:
    v^(k) = argmin_v ∇g(x^(k-1))^T v + (1/2) v^T ∇²g(x^(k-1)) v + h(x^(k-1) + v)
    x^(k) = x^(k-1) + t_k v^(k),   k = 1, 2, 3, ...
Step sizes are typically chosen by backtracking. Iterations here are typically very expensive (computing v^(k) is typically a formidable task), but typically very few iterations are needed until convergence: under appropriate conditions, we get local quadratic convergence.

3 Reminder: conjugate functions
Recall that given f : R^n → R, the function
    f*(y) = max_x y^T x − f(x)
is called its conjugate. Conjugates appear frequently in dual programs, since
    −f*(y) = min_x f(x) − y^T x
If f is closed and convex, then f** = f. Also,
    x ∈ ∂f*(y) ⟺ y ∈ ∂f(x) ⟺ x ∈ argmin_z f(z) − y^T z
If f is strictly convex, then
    ∇f*(y) = argmin_z f(z) − y^T z
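
To make the correspondence concrete, here is a small numerical check (an illustrative quadratic, not an example from the slides): for f(x) = (1/2) x^T Q x with Q positive definite, f*(y) = (1/2) y^T Q^{-1} y, and the unique maximizer of y^T x − f(x) equals ∇f*(y) = Q^{-1} y.

# Illustrative check of the conjugate correspondence for f(x) = 0.5 x^T Q x
# (Q positive definite): grad f*(y) = Q^{-1} y = argmax_x y^T x - f(x).
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                  # positive definite
y = rng.standard_normal(n)

# maximize y^T x - f(x) by plain gradient ascent
x = np.zeros(n)
step = 1.0 / np.linalg.eigvalsh(Q).max()
for _ in range(2000):
    x = x + step * (y - Q @ x)           # gradient of y^T x - 0.5 x^T Q x

print(np.allclose(x, np.linalg.solve(Q, y)))          # maximizer equals grad f*(y)
print(np.isclose(y @ x - 0.5 * x @ Q @ x,
                 0.5 * y @ np.linalg.solve(Q, y)))    # attained value equals f*(y)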

4 Proof details
We will show that x ∈ ∂f*(y) ⟺ y ∈ ∂f(x), assuming that f is convex and closed.
Proof of "⟸": Suppose y ∈ ∂f(x). Then x ∈ M_y, the set of maximizers of y^T z − f(z) over z. But f*(y) = max_z y^T z − f(z) and ∂f*(y) = cl(conv(∪_{z ∈ M_y} {z})). Thus x ∈ ∂f*(y).
Proof of "⟹": From what we showed above, if x ∈ ∂f*(y), then y ∈ ∂f**(x), but f** = f.
Clearly y ∈ ∂f(x) ⟺ x ∈ argmin_z f(z) − y^T z.
Lastly, if f is strictly convex, then we know that f(z) − y^T z has a unique minimizer over z, and this must be ∇f*(y).

5 Outline
Today:
- Dual (sub)gradient methods
- Dual decomposition
- Augmented Lagrangians
- A peek at ADMM

6 Dual (sub)gradient methods
Even if we can't derive the dual (conjugate) in closed form, we can still use dual-based subgradient or gradient methods.
Example: consider the problem
    min_x f(x) subject to Ax = b
Its dual problem is
    max_u −f*(−A^T u) − b^T u
where f* is the conjugate of f. Defining g(u) = −f*(−A^T u) − b^T u, note that
    ∂g(u) = A ∂f*(−A^T u) − b

7 Therefore, using what we know about conjugates,
    ∂g(u) ∋ Ax − b,  where  x ∈ argmin_z f(z) + u^T Az
The dual subgradient method (for maximizing the dual objective) starts with an initial dual guess u^(0), and repeats for k = 1, 2, 3, ...
    x^(k) ∈ argmin_x f(x) + (u^(k−1))^T Ax
    u^(k) = u^(k−1) + t_k (Ax^(k) − b)
Step sizes t_k, k = 1, 2, 3, ..., are chosen in standard ways.

8 Recall that if f is strictly convex, then f* is differentiable, and so this becomes dual gradient ascent, which repeats for k = 1, 2, 3, ...
    x^(k) = argmin_x f(x) + (u^(k−1))^T Ax
    u^(k) = u^(k−1) + t_k (Ax^(k) − b)
(The difference is that each x^(k) is unique, here.) Again, step sizes t_k, k = 1, 2, 3, ..., are chosen in standard ways. Also, proximal gradients and acceleration can be applied as they would usually.
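
As a concrete sketch (an illustrative problem, not from the slides), take f(x) = (1/2)||x − a||_2^2, which is strongly convex with parameter 1; the x-update then has the closed form x = a − A^T u, and dual gradient ascent is just a few lines. The same template gives the subgradient method of the previous slide when the argmin is not unique.

# Dual gradient ascent sketch for min 0.5||x - a||_2^2 subject to Ax = b
# (illustrative problem; f is strongly convex, so the dual is differentiable).
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 8
A = rng.standard_normal((m, n))
a = rng.standard_normal(n)
b = rng.standard_normal(m)

u = np.zeros(m)
t = 1.0 / np.linalg.norm(A, 2) ** 2       # safe fixed step for this f
for _ in range(5000):
    x = a - A.T @ u                       # x^(k) = argmin_x f(x) + (u^(k-1))^T Ax
    u = u + t * (A @ x - b)               # u^(k) = u^(k-1) + t (Ax^(k) - b)

x_exact = a - A.T @ np.linalg.solve(A @ A.T, A @ a - b)   # projection onto {Ax = b}
print(np.linalg.norm(A @ x - b), np.allclose(x, x_exact, atol=1e-5))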

9 Lipschitz gradients and strong convexity
Assume that f is a closed and convex function. Then f is strongly convex with parameter d ⟺ ∇f* is Lipschitz with parameter 1/d.
Proof of "⟹": Recall that if g is strongly convex with minimizer x, then
    g(y) ≥ g(x) + (d/2) ||y − x||^2,  for all y
Hence, defining x_u = ∇f*(u) and x_v = ∇f*(v),
    f(x_v) − u^T x_v ≥ f(x_u) − u^T x_u + (d/2) ||x_u − x_v||_2^2
    f(x_u) − v^T x_u ≥ f(x_v) − v^T x_v + (d/2) ||x_u − x_v||_2^2
Adding these together, using Cauchy-Schwarz, and rearranging shows that
    ||x_u − x_v||_2 ≤ ||u − v||_2 / d

10 Convergence guarantees
The following results follow from combining the last fact with what we already know about gradient descent:
- If f is strongly convex with parameter d, then dual gradient ascent with fixed step sizes t_k = d, k = 1, 2, 3, ..., converges at the rate O(1/ε).
- If f is strongly convex with parameter d and ∇f is Lipschitz with parameter L, then dual gradient ascent with step sizes t_k = 2/(1/d + 1/L), k = 1, 2, 3, ..., converges at the rate O(log(1/ε)).
Note that these results describe convergence of the dual objective to its optimal value.

11 Dual decomposition
Consider
    min_x Σ_{i=1}^B f_i(x_i)  subject to  Ax = b
Here x = (x_1, ..., x_B) ∈ R^n divides into B blocks of variables, with each x_i ∈ R^{n_i}. We can also partition A accordingly:
    A = [A_1 ... A_B],  where A_i ∈ R^{m × n_i}
A simple but powerful observation, in the calculation of the (sub)gradient, is that the minimization decomposes into B separate problems:
    x^+ ∈ argmin_x Σ_{i=1}^B f_i(x_i) + u^T Ax
    ⟺  x_i^+ ∈ argmin_{x_i} f_i(x_i) + u^T A_i x_i,  i = 1, ..., B

12 Dual decomposition algorithm: repeat for k = 1, 2, 3, ...
    x_i^(k) ∈ argmin_{x_i} f_i(x_i) + (u^(k−1))^T A_i x_i,  i = 1, ..., B
    u^(k) = u^(k−1) + t_k ( Σ_{i=1}^B A_i x_i^(k) − b )
Can think of these steps as:
- Broadcast: send u to each of the B processors, each optimizes in parallel to find x_i
- Gather: collect A_i x_i from each processor, update the global dual variable u
(Figure: the dual variable u is exchanged with the B parallel processors, each holding one block x_i.)
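
Here is a minimal sketch of the broadcast/gather pattern on an illustrative separable problem (not from the slides): f_i(x_i) = (1/2)||x_i − a_i||_2^2, so each block update is available in closed form and the B subproblems could run on separate processors.

# Dual decomposition sketch: min sum_i 0.5||x_i - a_i||^2 s.t. sum_i A_i x_i = b
# (illustrative problem). Each block update is independent ("broadcast"),
# and the dual update only needs the gathered products A_i x_i ("gather").
import numpy as np

rng = np.random.default_rng(2)
m, B, ni = 4, 3, 5
A_blocks = [rng.standard_normal((m, ni)) for _ in range(B)]
a_blocks = [rng.standard_normal(ni) for _ in range(B)]
b = rng.standard_normal(m)

u = np.zeros(m)
t = 1.0 / np.linalg.norm(np.hstack(A_blocks), 2) ** 2   # safe fixed step here
for _ in range(5000):
    # broadcast u; each block solves its own subproblem
    x_blocks = [a_i - A_i.T @ u for A_i, a_i in zip(A_blocks, a_blocks)]
    # gather A_i x_i and update the global dual variable
    resid = sum(A_i @ x_i for A_i, x_i in zip(A_blocks, x_blocks)) - b
    u = u + t * resid

print(np.linalg.norm(resid))              # primal residual, should be ~ 0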

13 Dual decomposition with inequality constraints
Consider
    min_x Σ_{i=1}^B f_i(x_i)  subject to  Σ_{i=1}^B A_i x_i ≤ b
Dual decomposition (projected subgradient method): repeat for k = 1, 2, 3, ...
    x_i^(k) ∈ argmin_{x_i} f_i(x_i) + (u^(k−1))^T A_i x_i,  i = 1, ..., B
    u^(k) = ( u^(k−1) + t_k ( Σ_{i=1}^B A_i x_i^(k) − b ) )_+
where (u)_+ denotes the positive part of u, i.e., ((u)_+)_i = max{0, u_i}, i = 1, ..., m.
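
A sketch of the projected variant, with the blocks collapsed into a single quadratic f for brevity (again an illustrative problem, not from the slides); the only change from before is that the dual variable is clipped at zero after each step.

# Projected dual (sub)gradient sketch: min 0.5||x - a||^2 s.t. Ax <= b
# (illustrative problem). The dual update is truncated at zero.
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 6
A = rng.standard_normal((m, n))
a = rng.standard_normal(n)
b = A @ rng.standard_normal(n) + 1.0      # built so a strictly feasible point exists

u = np.zeros(m)
t = 1.0 / np.linalg.norm(A, 2) ** 2
for _ in range(5000):
    x = a - A.T @ u                       # same x-update as before
    u = np.maximum(u + t * (A @ x - b), 0.0)   # never let prices go negative

print(np.max(A @ x - b))                  # constraint violation, ~ 0 or negative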

14 Price coordination interpretation (Vandenberghe):
- Have B units in a system, each unit chooses its own decision variable x_i (how to allocate its goods)
- Constraints are limits on shared resources (rows of A); each component u_j of the dual variable is the price of resource j
- Dual update:
    u_j^+ = (u_j − t s_j)_+,  j = 1, ..., m
  where s = b − Σ_{i=1}^B A_i x_i are slacks
- Increase price u_j if resource j is over-utilized (s_j < 0); decrease price u_j if resource j is under-utilized (s_j > 0); never let prices get negative

15 Augmented Lagrangian method (also known as: method of multipliers)
Disadvantage of dual ascent: it requires strong conditions to ensure convergence. This is improved by the augmented Lagrangian method, also called the method of multipliers. We transform the primal problem:
    min_x f(x) + (ρ/2) ||Ax − b||_2^2  subject to  Ax = b
where ρ > 0 is a parameter. This is clearly equivalent to the original problem, and the objective is strongly convex when A has full column rank. Use dual gradient ascent: repeat for k = 1, 2, 3, ...
    x^(k) = argmin_x f(x) + (u^(k−1))^T Ax + (ρ/2) ||Ax − b||_2^2
    u^(k) = u^(k−1) + ρ (Ax^(k) − b)

16 Notice the step size choice t_k = ρ, k = 1, 2, 3, ..., in the dual algorithm. Why? Since x^(k) minimizes f(x) + (u^(k−1))^T Ax + (ρ/2)||Ax − b||_2^2 over x, we have
    0 ∈ ∂f(x^(k)) + A^T ( u^(k−1) + ρ (Ax^(k) − b) ) = ∂f(x^(k)) + A^T u^(k)
This is the stationarity condition for the original primal problem; one can show under mild conditions that Ax^(k) − b approaches zero (i.e., primal iterates approach feasibility), hence in the limit the KKT conditions are satisfied and x^(k), u^(k) approach optimality.
Advantage: much better convergence properties. Disadvantage: we lose decomposability! (Separability is compromised by the augmented Lagrangian...)
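
A minimal sketch of the method of multipliers (an illustrative problem, not from the slides): min (1/2)||Px − q||_2^2 subject to Ax = b, where f is convex but not strictly convex (P is wide), so plain dual gradient ascent is not directly justified; the quadratic penalty makes the x-subproblem well posed. The final print also checks the stationarity identity above.

# Method of multipliers sketch: min 0.5||Px - q||^2 s.t. Ax = b (illustrative).
import numpy as np

rng = np.random.default_rng(4)
p, m, n = 4, 3, 6
P = rng.standard_normal((p, n))
q = rng.standard_normal(p)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
rho = 1.0

H = P.T @ P + rho * A.T @ A               # Hessian of the augmented objective
u = np.zeros(m)
for _ in range(1000):
    # x^(k) minimizes f(x) + u^T Ax + (rho/2)||Ax - b||^2
    x = np.linalg.solve(H, P.T @ q - A.T @ u + rho * A.T @ b)
    u = u + rho * (A @ x - b)             # dual step with fixed size rho

print(np.linalg.norm(A @ x - b))                      # feasibility residual -> 0
print(np.linalg.norm(P.T @ (P @ x - q) + A.T @ u))    # stationarity holds each step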

17 Alternating direction method of multipliers
Alternating direction method of multipliers, or ADMM: the best of both worlds, i.e., we get strong convergence properties along with decomposability.
Consider
    min_{x,z} f(x) + g(z)  subject to  Ax + Bz = c
As before, we augment the objective:
    min_{x,z} f(x) + g(z) + (ρ/2) ||Ax + Bz − c||_2^2  subject to  Ax + Bz = c
for a parameter ρ > 0. We define the augmented Lagrangian
    L_ρ(x, z, u) = f(x) + g(z) + u^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||_2^2

18 ADMM repeats the steps, for k = 1, 2, 3, ...
    x^(k) = argmin_x L_ρ(x, z^(k−1), u^(k−1))
    z^(k) = argmin_z L_ρ(x^(k), z, u^(k−1))
    u^(k) = u^(k−1) + ρ (Ax^(k) + Bz^(k) − c)
Note that the usual method of multipliers would have replaced the first two steps by a joint minimization
    (x^(k), z^(k)) = argmin_{x,z} L_ρ(x, z, u^(k−1))

19 Convergence guarantees
Under modest assumptions on f, g (these do not require A, B to be full rank), the ADMM iterates satisfy, for any ρ > 0:
- Residual convergence: r^(k) = Ax^(k) + Bz^(k) − c → 0 as k → ∞, i.e., primal iterates approach feasibility
- Objective convergence: f(x^(k)) + g(z^(k)) → f* + g*, where f* + g* is the optimal primal objective value
- Dual convergence: u^(k) → u*, where u* is a dual solution
For details, see Boyd et al. (2010). Note that we do not generically get primal convergence, but this is true under more assumptions.
Convergence rate: not known in general; theory is currently being developed, e.g., in Hong and Luo (2012), Deng and Yin (2012), Iutzeler et al. (2014), Nishihara et al. (2015). Roughly, it behaves like a first-order method (or a bit faster).

20 ADMM in scaled form
It is often easier to express the ADMM algorithm in a scaled form, where we replace the dual variable u by a scaled variable w = u/ρ. In this parametrization, the ADMM steps are:
    x^(k) = argmin_x f(x) + (ρ/2) ||Ax + Bz^(k−1) − c + w^(k−1)||_2^2
    z^(k) = argmin_z g(z) + (ρ/2) ||Ax^(k) + Bz − c + w^(k−1)||_2^2
    w^(k) = w^(k−1) + Ax^(k) + Bz^(k) − c
Note that here the kth iterate w^(k) is just given by a running sum of residuals:
    w^(k) = w^(0) + Σ_{i=1}^k ( Ax^(i) + Bz^(i) − c )
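
As a concrete instance of these scaled steps (the lasso, a standard example that is not derived on these slides), consider min_x (1/2)||Ax − b||_2^2 + λ||z||_1 subject to x − z = 0; the x-update is a linear solve and the z-update is soft-thresholding.

# Scaled-form ADMM sketch for the lasso: min 0.5||Ax - b||^2 + lam*||z||_1
# subject to x - z = 0 (illustrative instance, not from the slides).
import numpy as np

rng = np.random.default_rng(5)
m, n = 30, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam, rho = 1.0, 1.0

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

x, z, w = np.zeros(n), np.zeros(n), np.zeros(n)   # w is the scaled dual u/rho
H = A.T @ A + rho * np.eye(n)                     # factor once and reuse in practice
for _ in range(500):
    x = np.linalg.solve(H, A.T @ b + rho * (z - w))       # x-update (linear solve)
    z = soft_threshold(x + w, lam / rho)                  # z-update (prox of lam*||.||_1)
    w = w + x - z                                         # running sum of residuals

print(np.linalg.norm(x - z))                      # primal residual, ~ 0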

21 Example: alternating projections
Consider finding a point in the intersection of convex sets C, D ⊆ R^n:
    min_x I_C(x) + I_D(x)
To get this into ADMM form, we express it as
    min_{x,z} I_C(x) + I_D(z)  subject to  x − z = 0
Each ADMM cycle involves two projections:
    x^(k) = P_C ( z^(k−1) − w^(k−1) )
    z^(k) = P_D ( x^(k) + w^(k−1) )
    w^(k) = w^(k−1) + x^(k) − z^(k)
Like the classical alternating projections method, but more efficient.
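
A sketch on an illustrative pair of sets (not from the slides): C = {x : Mx = d}, an affine set, and D the nonnegative orthant, both with cheap closed-form projections.

# Alternating projections via ADMM (scaled form) for C = {x : Mx = d},
# D = {x : x >= 0} (illustrative sets with closed-form projections).
import numpy as np

rng = np.random.default_rng(6)
m, n = 5, 12
M = rng.standard_normal((m, n))
d = M @ np.abs(rng.standard_normal(n))    # built so that C and D intersect

MMt = M @ M.T
def proj_C(v):                            # projection onto the affine set
    return v - M.T @ np.linalg.solve(MMt, M @ v - d)

def proj_D(v):                            # projection onto the orthant
    return np.maximum(v, 0.0)

z, w = np.zeros(n), np.zeros(n)
for _ in range(1000):
    x = proj_C(z - w)
    z = proj_D(x + w)
    w = w + x - z

print(np.linalg.norm(x - z))              # x ~ z, a point near the intersection of C and D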

22 References
- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2010), "Distributed optimization and statistical learning via the alternating direction method of multipliers"
- W. Deng and W. Yin (2012), "On the global and linear convergence of the generalized alternating direction method of multipliers"
- M. Hong and Z. Luo (2012), "On the linear convergence of the alternating direction method of multipliers"
- F. Iutzeler, P. Bianchi, Ph. Ciblat, and W. Hachem (2014), "Linear convergence rate for distributed optimization with the alternating direction method of multipliers"
- R. Nishihara, L. Lessard, B. Recht, A. Packard, and M. Jordan (2015), "A general analysis of the convergence of ADMM"
- L. Vandenberghe, Lecture Notes for EE 236C, UCLA, Spring
