5. Subgradient method

Size: px
Start display at page:

Download "5. Subgradient method"

Transcription

1 L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1

2 Subgradient method to minimize a nondifferentiable convex function f: choose x (0) and repeat g (k 1) is any subgradient of f at x (k 1) x (k) = x (k 1) t k g (k 1), k = 1, 2,... Step size rules fixed step: t k constant fixed length: t k g (k 1) 2 = x (k) x (k 1) 2 is constant diminishing: t k 0, t k = k=1 Subgradient method 5-2

3 Assumptions f has finite optimal value f, minimizer x f is convex, dom f = R n f is Lipschitz continuous with constant G > 0: f(x) f(y) G x y 2 x, y this is equivalent to g 2 G for all x and g f(x) (see next page) Subgradient method 5-3

4 Proof. assume g 2 G for all subgradients; choose g y f(y), g x f(x): g T x (x y) f(x) f(y) g T y (x y) by the Cauchy-Schwarz inequality G x y 2 f(x) f(y) G x y 2 assume g 2 > G for some g f(x); take y = x + g/ g 2 : f(y) f(x) + g T (y x) = f(x) + g 2 > f(x) + G Subgradient method 5-4

5 Analysis the subgradient method is not a descent method the key quantity in the analysis is the distance to the optimal set with x + = x (i), x = x (i 1), g = g (i 1), t = t i : x + x 2 2 = x tg x 2 2 = x x 2 2 2tg T (x x ) + t 2 g 2 2 x x 2 2 2t (f(x) f ) + t 2 g 2 2 combine inequalities for i = 1,..., k, and define f (k) best = min 0 i<k f(x(i) ): k 2( t i )(f (k) best f ) x (0) x 2 2 x (k) x k x (0) x t 2 i g (i 1) 2 2 k t 2 i g (i 1) 2 2 Subgradient method 5-5

6 Fixed step size: t i = t f (k) best f x(0) x 2 2 2kt + G2 t 2 does not guarantee convergence of f (k) best for large k, f (k) best is approximately G2 t/2-suboptimal Fixed step length: t i = s/ g (i 1) 2 f (k) best f G x(0) x 2 2 2ks + Gs 2 does not guarantee convergence of f (k) best for large k, f (k) best is approximately Gs/2-suboptimal Subgradient method 5-6

7 Diminishing step size: t i 0, t i = f (k) best f x (0) x G 2 2 k t i k t 2 i can show that ( k t 2 i )/( k t i ) 0; hence, f (k) best converges to f Subgradient method 5-7

8 Example: 1-norm minimization minimize Ax b 1 subgradient is given by A T sign(ax b) example with A R , b R 500 Fixed steplength t k = s/ g (k 1) 2 for s = 0.1, 0.01, (f(x (k) ) f )/f (f best (x (k) ) f )/f k k Subgradient method 5-8

9 Diminishing step size: t k = 0.01/ k and t k = 0.01/k (f best (x (k) ) f )/f 0.01/ k 0.01/k k Subgradient method 5-9

10 Optimal step size for fixed number of iterations from page 5-5: if s i = t i g (i 1) 2 and x (0) x 2 R: f (k) best f k R s 2 i k s i /G for given k, bound is minimized by fixed step length s i = s = R/ k resulting bound after k steps is f (k) best f GR k guarantees accuracy f (k) best f ɛ in k = O(1/ɛ 2 ) iterations Subgradient method 5-10

11 Optimal step size when f is known right-hand side in first inequality of page 5-5 is minimized by t i = f(x(i 1) ) f g (i 1) 2 2 optimized bound is ( f(x (i 1) ) f ) 2 g (i 1) 2 2 x (i 1) x 2 2 x (i) x 2 2 applying recursively (with x (0) x 2 R and g (i) 2 G) gives f (k) best f GR k Subgradient method 5-11

12 Exercise: find point in intersection of convex sets find a point in the intersection of m closed convex sets C 1,..., C m : minimize f(x) = max {f 1 (x),..., f m (x)} where f j (x) = inf y C j x y 2 is Euclidean distance of x to C j f = 0 if the intersection is nonempty (from p. 4-14): g f(ˆx) if g f j (ˆx) and C j is farthest set from ˆx (from p. 4-20) subgradient g f j (ˆx) follows from projection P j (ˆx) on C j : g = 0 (if ˆx C j ), g = 1 ˆx P j (ˆx) 2 (ˆx P j (ˆx)) (if ˆx C j ) note that g 2 = 1 if ˆx C j Subgradient method 5-12

13 Subgradient method optimal step size (page 5-11) for f = 0 and g (i 1) 2 = 1 is t i = f(x (i 1) ) at iteration k, find farthest set C j (with f(x (k 1) ) = f j (x (k 1) )), and take x (k) = x (k 1) f(x(k 1) ) f j (x (k 1) ) (x(k 1) P j (x (k 1) )) = P j (x (k 1) ) at each step, we project the current point onto the farthest set a version of the alternating projections algorithm for m = 2, projections alternate onto one set, then the other later, we will see faster versions of this that are almost as simple Subgradient method 5-13

14 Optimality of the subgradient method can the f (k) best f GR/ k bound on page 5-10 be improved? Problem class f is convex, with a minimizer x we know a starting point x (0) with x (0) x 2 R we know the Lipschitz constant G of f on {x x x (0) 2 R} f is defined by an oracle: given x, oracle returns f(x) and a subgradient Algorithm class: k iterations of any method that chooses x (i) in x (0) + span{g (0), g (1),..., g (i 1) } Subgradient method 5-14

15 Test problem and oracle f(x) = max x i + 1,...,k 2 x 2 2, x (0) = 0 solution: x = 1 k (1, }. {{.., 1 }, 0, }. {{.., 0 } ) and f = 1 2k k n k R = x (0) x 2 = 1/ k and G = 1 + 1/ k oracle returns subgradient eĵ + x where ĵ = min{j x j = Iteration: for i = 0,..., k 1, entries x (i) i+1,..., x(i) k max x i},...,k are zero; therefore f (k) best f = min i<k f(x(i) ) f f = GR 2(1 + k) Conclusion: O(1/ k) bound cannot be improved Subgradient method 5-15

16 Summary: subgradient method handles general nondifferentiable convex problem often leads to very simple algorithms convergence can be very slow no good stopping criterion theoretical complexity: O(1/ɛ 2 ) iterations to find ɛ-suboptimal point an optimal 1st-order method: O(1/ɛ 2 ) bound cannot be improved Subgradient method 5-16

17 References S. Boyd, lecture notes and slides for EE364b, Convex Optimization II Yu. Nesterov, Introductory Lectures on Convex Optimization. A Basic Course (2004) with the example on page 5-15 of this lecture Subgradient method 5-17

Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method. Ryan Tibshirani Convex Optimization Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

More information

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

More information

Subgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725

Subgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization /36-725 Subgradient Method Guest Lecturer: Fatma Kilinc-Karzan Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725 Adapted from slides from Ryan Tibshirani Consider the problem Recall:

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

10. Ellipsoid method

10. Ellipsoid method 10. Ellipsoid method EE236C (Spring 2008-09) ellipsoid method convergence proof inequality constraints 10 1 Ellipsoid method history developed by Shor, Nemirovski, Yudin in 1970s used in 1979 by Khachian

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Lecture: Smoothing.

Lecture: Smoothing. Lecture: Smoothing http://bicmr.pku.edu.cn/~wenzw/opt-2018-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Smoothing 2/26 introduction smoothing via conjugate

More information

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus 1/41 Subgradient Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes definition subgradient calculus duality and optimality conditions directional derivative Basic inequality

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

Lecture 7: September 17

Lecture 7: September 17 10-725: Optimization Fall 2013 Lecture 7: September 17 Lecturer: Ryan Tibshirani Scribes: Serim Park,Yiming Gu 7.1 Recap. The drawbacks of Gradient Methods are: (1) requires f is differentiable; (2) relatively

More information

Lecture 14 Ellipsoid method

Lecture 14 Ellipsoid method S. Boyd EE364 Lecture 14 Ellipsoid method idea of localization methods bisection on R center of gravity algorithm ellipsoid method 14 1 Localization f : R n R convex (and for now, differentiable) problem:

More information

Stochastic Subgradient Method

Stochastic Subgradient Method Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)

More information

Lecture 25: Subgradient Method and Bundle Methods April 24

Lecture 25: Subgradient Method and Bundle Methods April 24 IE 51: Convex Optimization Spring 017, UIUC Lecture 5: Subgradient Method and Bundle Methods April 4 Instructor: Niao He Scribe: Shuanglong Wang Courtesy warning: hese notes do not necessarily cover everything

More information

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Lecture 6: September 17

Lecture 6: September 17 10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 6: September 17 Scribes: Scribes: Wenjun Wang, Satwik Kottur, Zhiding Yu Note: LaTeX template courtesy of UC Berkeley EECS

More information

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725

Gradient Descent. Ryan Tibshirani Convex Optimization /36-725 Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 4 Subgradient Shiqian Ma, MAT-258A: Numerical Optimization 2 4.1. Subgradients definition subgradient calculus duality and optimality conditions Shiqian

More information

10. Unconstrained minimization

10. Unconstrained minimization Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

More information

Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization. Some slides borrowed from Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Lecture 5: September 12

Lecture 5: September 12 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 12 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Barun Patra and Tyler Vuong Note: LaTeX template courtesy of UC Berkeley EECS

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

MATH 829: Introduction to Data Mining and Analysis Computing the lasso solution

MATH 829: Introduction to Data Mining and Analysis Computing the lasso solution 1/16 MATH 829: Introduction to Data Mining and Analysis Computing the lasso solution Dominique Guillot Departments of Mathematical Sciences University of Delaware February 26, 2016 Computing the lasso

More information

Fast proximal gradient methods

Fast proximal gradient methods L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725

Dual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725 Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

Subgradient Methods. Stephen Boyd (with help from Jaehyun Park) Notes for EE364b, Stanford University, Spring

Subgradient Methods. Stephen Boyd (with help from Jaehyun Park) Notes for EE364b, Stanford University, Spring Subgradient Methods Stephen Boyd (with help from Jaehyun Park) Notes for EE364b, Stanford University, Spring 2013 14 May 2014; based on notes from January 2007 Contents 1 Introduction 3 2 Basic subgradient

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1 EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Math 273a: Optimization Convex Conjugacy

Math 273a: Optimization Convex Conjugacy Math 273a: Optimization Convex Conjugacy Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Convex conjugate (the Legendre transform) Let f be a closed proper

More information

Lecture 5: September 15

Lecture 5: September 15 10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS

More information

Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent

Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent 10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for

More information

Lecture 6: September 12

Lecture 6: September 12 10-725: Optimization Fall 2013 Lecture 6: September 12 Lecturer: Ryan Tibshirani Scribes: Micol Marchetti-Bowick Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Problems; Algorithms - A) SUVRIT SRA Massachusetts Institute of Technology PKU Summer School on Data Science (July 2017) Course materials http://suvrit.de/teaching.html

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Lecture 1: Background on Convex Analysis

Lecture 1: Background on Convex Analysis Lecture 1: Background on Convex Analysis John Duchi PCMI 2016 Outline I Convex sets 1.1 Definitions and examples 2.2 Basic properties 3.3 Projections onto convex sets 4.4 Separating and supporting hyperplanes

More information

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients. strong and weak subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients strong and weak subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE364b, Stanford University Basic inequality recall basic inequality

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Gradient methods for minimizing composite functions Yu. Nesterov May 00 Abstract In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum

More information

Final exam.

Final exam. EE364b Convex Optimization II June 018 Prof. John C. Duchi Final exam By now, you know how it works, so we won t repeat it here. (If not, see the instructions for the EE364a final exam.) Since you have

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu February

More information

Convex Optimization. Prof. Nati Srebro. Lecture 12: Infeasible-Start Newton s Method Interior Point Methods

Convex Optimization. Prof. Nati Srebro. Lecture 12: Infeasible-Start Newton s Method Interior Point Methods Convex Optimization Prof. Nati Srebro Lecture 12: Infeasible-Start Newton s Method Interior Point Methods Equality Constrained Optimization f 0 (x) s. t. A R p n, b R p Using access to: 2 nd order oracle

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

Primal Solutions and Rate Analysis for Subgradient Methods

Primal Solutions and Rate Analysis for Subgradient Methods Primal Solutions and Rate Analysis for Subgradient Methods Asu Ozdaglar Joint work with Angelia Nedić, UIUC Conference on Information Sciences and Systems (CISS) March, 2008 Department of Electrical Engineering

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December

More information

Gradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725

Gradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725 Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex

More information

Lecture 6 : Projected Gradient Descent

Lecture 6 : Projected Gradient Descent Lecture 6 : Projected Gradient Descent EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Consider the following update. x l+1 = Π C (x l α f(x l )) Theorem Say f : R d R is (m, M)-strongly

More information

Radial Subgradient Descent

Radial Subgradient Descent Radial Subgradient Descent Benja Grimmer Abstract We present a subgradient method for imizing non-smooth, non-lipschitz convex optimization problems. The only structure assumed is that a strictly feasible

More information

Dual Decomposition.

Dual Decomposition. 1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016

Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016 Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)

More information

Linear Analysis Lecture 5

Linear Analysis Lecture 5 Linear Analysis Lecture 5 Inner Products and V Let dim V < with inner product,. Choose a basis B and let v, w V have coordinates in F n given by x 1. x n and y 1. y n, respectively. Let A F n n be the

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

Convex Optimization Lecture 16

Convex Optimization Lecture 16 Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

Universal Gradient Methods for Convex Optimization Problems

Universal Gradient Methods for Convex Optimization Problems CORE DISCUSSION PAPER 203/26 Universal Gradient Methods for Convex Optimization Problems Yu. Nesterov April 8, 203; revised June 2, 203 Abstract In this paper, we present new methods for black-box convex

More information

A SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS

A SIMPLE PARALLEL ALGORITHM WITH AN O(1/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS A SIMPLE PARALLEL ALGORITHM WITH AN O(/T ) CONVERGENCE RATE FOR GENERAL CONVEX PROGRAMS HAO YU AND MICHAEL J. NEELY Abstract. This paper considers convex programs with a general (possibly non-differentiable)

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

Lecture 2: Subgradient Methods

Lecture 2: Subgradient Methods Lecture 2: Subgradient Methods John C. Duchi Stanford University Park City Mathematics Institute 206 Abstract In this lecture, we discuss first order methods for the minimization of convex functions. We

More information

Barrier Method. Javier Peña Convex Optimization /36-725

Barrier Method. Javier Peña Convex Optimization /36-725 Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly

More information

Unconstrained minimization

Unconstrained minimization CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained

More information

6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection

6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE. Three Alternatives/Remedies for Gradient Projection 6.252 NONLINEAR PROGRAMMING LECTURE 10 ALTERNATIVES TO GRADIENT PROJECTION LECTURE OUTLINE Three Alternatives/Remedies for Gradient Projection Two-Metric Projection Methods Manifold Suboptimization Methods

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

SHARPNESS, RESTART AND ACCELERATION

SHARPNESS, RESTART AND ACCELERATION SHARPNESS, RESTART AND ACCELERATION VINCENT ROULET AND ALEXANDRE D ASPREMONT ABSTRACT. The Łojasievicz inequality shows that sharpness bounds on the minimum of convex optimization problems hold almost

More information

Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018

Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018 Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 08 Instructor: Quoc Tran-Dinh Scriber: Quoc Tran-Dinh Lecture 4: Selected

More information

Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

More information

Motivation Subgradient Method Stochastic Subgradient Method. Convex Optimization. Lecture 15 - Gradient Descent in Machine Learning

Motivation Subgradient Method Stochastic Subgradient Method. Convex Optimization. Lecture 15 - Gradient Descent in Machine Learning Convex Optimization Lecture 15 - Gradient Descent in Machine Learning Instructor: Yuanzhang Xiao University of Hawaii at Manoa Fall 2017 1 / 21 Today s Lecture 1 Motivation 2 Subgradient Method 3 Stochastic

More information

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Mengdi Wang Ethan X. Fang Han Liu Abstract Classical stochastic gradient methods are well suited

More information

8 Numerical methods for unconstrained problems

8 Numerical methods for unconstrained problems 8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields

More information

2. Quasi-Newton methods

2. Quasi-Newton methods L. Vandenberghe EE236C (Spring 2016) 2. Quasi-Newton methods variable metric methods quasi-newton methods BFGS update limited-memory quasi-newton methods 2-1 Newton method for unconstrained minimization

More information

Primal-dual Subgradient Method for Convex Problems with Functional Constraints

Primal-dual Subgradient Method for Convex Problems with Functional Constraints Primal-dual Subgradient Method for Convex Problems with Functional Constraints Yurii Nesterov, CORE/INMA (UCL) Workshop on embedded optimization EMBOPT2014 September 9, 2014 (Lucca) Yu. Nesterov Primal-dual

More information

In English, this means that if we travel on a straight line between any two points in C, then we never leave C.

In English, this means that if we travel on a straight line between any two points in C, then we never leave C. Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from

More information

LMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009

LMI MODELLING 4. CONVEX LMI MODELLING. Didier HENRION. LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ. Universidad de Valladolid, SP March 2009 LMI MODELLING 4. CONVEX LMI MODELLING Didier HENRION LAAS-CNRS Toulouse, FR Czech Tech Univ Prague, CZ Universidad de Valladolid, SP March 2009 Minors A minor of a matrix F is the determinant of a submatrix

More information

Lecture 1: Basic Concepts

Lecture 1: Basic Concepts ENGG 5781: Matrix Analysis and Computations Lecture 1: Basic Concepts 2018-19 First Term Instructor: Wing-Kin Ma This note is not a supplementary material for the main slides. I will write notes such as

More information

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The

More information

GRADIENT = STEEPEST DESCENT

GRADIENT = STEEPEST DESCENT GRADIENT METHODS GRADIENT = STEEPEST DESCENT Convex Function Iso-contours gradient 0.5 0.4 4 2 0 8 0.3 0.2 0. 0 0. negative gradient 6 0.2 4 0.3 2.5 0.5 0 0.5 0.5 0 0.5 0.4 0.5.5 0.5 0 0.5 GRADIENT DESCENT

More information

Lecture 3: Linesearch methods (continued). Steepest descent methods

Lecture 3: Linesearch methods (continued). Steepest descent methods Lecture 3: Linesearch methods (continued). Steepest descent methods Coralia Cartis, Mathematical Institute, University of Oxford C6.2/B2: Continuous Optimization Lecture 3: Linesearch methods (continued).

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information