Math 273a: Optimization Subgradient Methods
1 Math 273a: Optimization, Subgradient Methods
Instructor: Wotao Yin, Department of Mathematics, UCLA, Fall 2015
Online discussions on piazza.com
2 Nonsmooth convex function
Recall: for $\bar x \in \mathbb{R}^n$,
$$\partial f(\bar x) := \{ g \in \mathbb{R}^n : f(y) \ge f(\bar x) + \langle g, y - \bar x \rangle \text{ for all } y \}.$$
If $f$ is differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$ is a singleton.
For any $x, y$, $g_x \in \partial f(x)$, and $g_y \in \partial f(y)$, we have $\langle g_x - g_y, x - y \rangle \ge 0$ ($\partial f$ is a monotone point-to-set map).
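As a quick numerical sanity check of this definition, the sketch below (Python with NumPy; the example $f(x) = \|x\|_1$ with $g = \operatorname{sign}(x)$ anticipates a later slide and is only an illustration) verifies the subgradient inequality at random points.

```python
import numpy as np

# Minimal numerical check of the subgradient inequality
#   f(y) >= f(x) + <g, y - x>  for all y,
# illustrated with f(x) = ||x||_1 and the subgradient g = sign(x).
rng = np.random.default_rng(0)
f = lambda z: np.abs(z).sum()

x = rng.standard_normal(5)
g = np.sign(x)                      # a valid element of the subdifferential at x
for _ in range(10_000):
    y = rng.standard_normal(5)
    assert f(y) >= f(x) + g @ (y - x) - 1e-12   # small slack for rounding
```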
3 Which functions have subgradients?
Theorem (Nesterov). Let $f$ be a closed convex function and $x_0 \in \operatorname{int}(\operatorname{dom}(f))$. Then $\partial f(x_0)$ is a nonempty bounded set.
The proof uses supporting hyperplanes of the epigraph to show existence, and local Lipschitz continuity of convex functions to show boundedness.
(This and the next slide are taken from Damek's lecture. Figure taken from Boyd and Vandenberghe.)
4 The converse
Lemma (Nesterov Lm 3.1.6). If $\partial f(x) \neq \emptyset$ for all $x \in \operatorname{dom}(f)$, then $f$ is convex.
Proof. Take $x, y \in \operatorname{dom}(f)$ and $\alpha \in [0, 1]$; let $y_\alpha = (1-\alpha)x + \alpha y = x + \alpha(y - x)$ and $g \in \partial f(y_\alpha)$. Then
$$f(y) \ge f(y_\alpha) + \langle g, y - y_\alpha \rangle = f(y_\alpha) + (1-\alpha)\langle g, y - x \rangle, \qquad (1)$$
$$f(x) \ge f(y_\alpha) + \langle g, x - y_\alpha \rangle = f(y_\alpha) - \alpha\langle g, y - x \rangle. \qquad (2)$$
Multiply equation (1) by $\alpha$ and equation (2) by $(1-\alpha)$, and add them together to get $\alpha f(y) + (1-\alpha) f(x) \ge f(y_\alpha)$.
5 Compute subgradients: general rules
Differentiable functions: $\partial f(x) = \{\nabla f(x)\}$.
Composition with an affine map: $\varphi(x) = f(Ax + b)$ gives $\partial \varphi(x) = A^T \partial f(Ax + b)$.
Positive sums: for $\alpha, \beta > 0$ and $f(x) = \alpha f_1(x) + \beta f_2(x)$, $\partial f(x) = \alpha\, \partial f_1(x) + \beta\, \partial f_2(x)$.
Maximums: $f(x) = \max_{i \in \{1,\dots,n\}} f_i(x)$ gives $\partial f(x) = \operatorname{conv}\{ \partial f_i(x) : f_i(x) = f(x) \}$.
6 Examples
$f(x) = |x|$ on $\mathbb{R}$:
$$\partial f(x) = \begin{cases} \{\operatorname{sign}(x)\} & x \neq 0; \\ [-1, 1] & \text{otherwise.} \end{cases}$$
(Figure taken from Boyd and Vandenberghe.)
7 Examples
$f(x) = \sum_{i=1}^n |\langle a_i, x \rangle - b_i|$. Define
$$I_-(x) = \{i : \langle a_i, x \rangle - b_i < 0\}, \quad I_+(x) = \{i : \langle a_i, x \rangle - b_i > 0\}, \quad I_0(x) = \{i : \langle a_i, x \rangle - b_i = 0\}.$$
Then
$$\partial f(x) = \sum_{i \in I_+(x)} a_i - \sum_{i \in I_-(x)} a_i + \sum_{i \in I_0(x)} [-a_i, a_i].$$
$f(x) = \max_{i \in \{1,\dots,n\}} x_i$. Then
$$\partial f(x) = \operatorname{conv}\{ e_i : x_i = f(x) \}, \qquad \partial f(0) = \operatorname{conv}\{ e_i : i \in \{1,\dots,n\} \}.$$
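A minimal code sketch of one valid subgradient selection for each of these two examples (Python/NumPy; the tie-breaking choices, such as taking $0$ on $I_0(x)$, are just one admissible selection from the subdifferential).

```python
import numpy as np

def subgrad_abs_residuals(A, b, x):
    """One subgradient of f(x) = sum_i |<a_i, x> - b_i|, where the a_i are the rows of A.
    np.sign gives 0 on the indices in I_0(x), an admissible choice from [-a_i, a_i]."""
    return A.T @ np.sign(A @ x - b)

def subgrad_max_coordinate(x):
    """One subgradient of f(x) = max_i x_i: the vertex e_j of the convex hull,
    where j is an index attaining the maximum."""
    g = np.zeros_like(x, dtype=float)
    g[np.argmax(x)] = 1.0
    return g
```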
8 Examples
$f(x) = \|x\|_2$. $f$ is differentiable away from $0$, so
$$\partial f(x) = \left\{ \frac{x}{\|x\|_2} \right\}, \qquad x \neq 0.$$
At $0$, go back to the subgradient inequality: $g \in \partial f(0)$ if and only if $\|y\|_2 \ge \langle g, y \rangle$ for all $y$, i.e., $\frac{\langle g, y \rangle}{\|y\|_2} \le 1$ for all $y \neq 0$. Thus $g$ lies in the dual ball of $B_2(0,1)$, which is $B_2(0,1)$ itself: $\partial f(0) = B_2(0, 1)$. This is a common pattern!
9 Examples
$f(x) = \|x\|_\infty = \max_{i \in \{1,\dots,n\}} |x_i|$. For $x \neq 0$,
$$\partial f(x) = \operatorname{conv}\{\, g_i : g_i \in \partial |x_i|,\ |x_i| = f(x) \,\} = \operatorname{conv}\{\operatorname{sign}(x_i)\, e_i : |x_i| = f(x)\},$$
where $|x_i|$ is viewed as a function of $x$.
Going back to the subgradient inequality at $0$: $\|y\|_\infty \ge 0 + \langle g, y \rangle$. Thus $g \in \partial f(0)$ if and only if $\frac{\langle g, y \rangle}{\|y\|_\infty} \le 1$ for all $y \neq 0$, so $\partial f(0)$ is the dual ball of the $\ell_\infty$ norm: $B_1(0, 1)$.
10 Examples
$f(x) = \|x\|_1 = \sum_{i=1}^n |x_i|$. Then, for all $x$,
$$\partial f(x) = \sum_{x_i > 0} e_i - \sum_{x_i < 0} e_i + \sum_{x_i = 0} [-e_i, e_i].$$
In particular,
$$\partial f(0) = \sum_{i=1}^n [-e_i, e_i] = B_\infty(0, 1).$$
11 Optimality condition: $0 \in \partial f(x)$ if and only if $x$ is a minimizer
If $f$ is smooth and convex, $0 \in \partial f(x) = \{\nabla f(x)\}$ means $\nabla f(x) = 0$.
In general: if $0 \in \partial f(x)$, then $f(y) \ge f(x) + \langle 0, y - x \rangle = f(x)$ for all $y \in \mathbb{R}^n$, so $x$ is a minimizer!
The converse is also true: if $x^\star$ is a minimizer, then $f(y) \ge f(x^\star) = f(x^\star) + \langle 0, y - x^\star \rangle$ for all $y$, so $0 \in \partial f(x^\star)$.
12 The subgradient method
Iteration: $x^{k+1} \leftarrow x^k - \alpha_k g^k$, where $g^k \in \partial f(x^k)$.
Questions: Applications? Are $f(x^k)$ and $\|x^k - x^\star\|$ monotonic? How to choose $\alpha_k$?
13 Applications
Finding a point in the intersection of closed convex sets $C_1, \dots, C_m$:
$$\text{minimize } f(x) = \max\{\operatorname{dist}(x, C_1), \dots, \operatorname{dist}(x, C_m)\}.$$
Subgradient: if $f(x) = \operatorname{dist}(x, C_j)$ and $x \notin C_j$, then
$$g = \frac{x - \operatorname{Proj}_{C_j}(x)}{\|x - \operatorname{Proj}_{C_j}(x)\|} = \frac{x - \operatorname{Proj}_{C_j}(x)}{\operatorname{dist}(x, C_j)}.$$
Minimizing nonsmooth convex functions, e.g., piecewise-linear convex functions.
Dual subgradient method (generalizes the Uzawa algorithm), dual decomposition (more to come in this lecture).
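For the feasibility application, here is a sketch of how such a subgradient could be computed, assuming user-supplied projection oracles onto each set (the function and argument names below are illustrative, not from the slides).

```python
import numpy as np

def max_dist_subgrad(x, projections):
    """A subgradient of f(x) = max_j dist(x, C_j), given a list of projection
    maps Proj_{C_j}. Returns 0 if x already lies in every set (f(x) = 0)."""
    projected = [P(x) for P in projections]
    dists = [np.linalg.norm(x - p) for p in projected]
    j = int(np.argmax(dists))          # a set achieving the maximum distance
    if dists[j] == 0.0:                # x is in the intersection
        return np.zeros_like(x, dtype=float)
    return (x - projected[j]) / dists[j]
```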
14 Convergence overview
Typically, convergence is established by identifying a monotonically nonincreasing sequence, such as $f(x^k) - f^\star$ or $\|x^k - x^\star\|^2$.
However, since the subgradient $g(x)$ is not continuous in $x$, neither sequence is monotonic here.
Instead, we will define the total descent and use it to bound $f(x^k) - f^\star$.
The choice of step sizes $\alpha_k$ is critical.
15 Monotonicity of $f(x^k)$?
The definition $f(y) \ge f(x) + \langle g, y - x \rangle$, $g \in \partial f(x)$, applied at $x^{k+1}$ yields
$$f(x^{k+1}) \le f(x^k) - \langle g^{k+1}, x^k - x^{k+1} \rangle = f(x^k) - \alpha_k \langle g^{k+1}, g^k \rangle.$$
It is generally difficult to estimate $\langle g^{k+1}, g^k \rangle$ since $g$ is not continuous. (No matter how close $x^{k+1}$ is to $x^k$, their subgradients $g^{k+1}$ and $g^k$ can be very different.)
Therefore, we cannot guarantee $f(x^{k+1}) < f(x^k)$.
Note: taking the implicit iteration $x^{k+1} = x^k - \alpha_k g^{k+1}$ with $g^{k+1} \in \partial f(x^{k+1})$ (the proximal method), we would ensure $f(x^{k+1}) \le f(x^k)$. It is more expensive to compute, though.
16 Monotonicity of $\|x^k - x^\star\|^2$?
Let us assume that $x^\star$ exists. Then
$$\|x^{k+1} - x^\star\|^2 = \|(x^k - x^\star) - \alpha_k g^k\|^2 = \|x^k - x^\star\|^2 - 2\alpha_k \langle g^k, x^k - x^\star \rangle + \alpha_k^2 \|g^k\|^2.$$
To have monotonicity $\|x^{k+1} - x^\star\|^2 \le \|x^k - x^\star\|^2$, we need
$$-2\alpha_k \langle g^k, x^k - x^\star \rangle + \alpha_k^2 \|g^k\|^2 \le 0 \iff \langle g^k, x^k - x^\star \rangle \ge \frac{\alpha_k}{2}\|g^k\|^2.$$
However, even when $x^k$ is near $x^\star$, $g^k$ may not vanish. Therefore, $\|x^k - x^\star\|^2$ is generally not monotonic unless $\|g^k\| \le G$ (commonly assumed for the subgradient method) and $\alpha_k = O(\|x^k - x^\star\|)$, which is unrealistic to ensure since $x^\star$ is unknown.
17 However, it is often easy to have an estimate of $f^\star$. For example, $f^\star = 0$ in many applications.
The definition $f(y) \ge f(x) + \langle g, y - x \rangle$, $g \in \partial f(x)$, yields
$$\alpha_k \langle g^k, x^k - x^\star \rangle \ge \alpha_k (f(x^k) - f^\star) \ge 0.$$
Interpretation: the term $\alpha_k \langle g^k, x^k - x^\star \rangle$ guarantees a sufficient descent of at least $\alpha_k (f(x^k) - f^\star)$.
However, the ascending term $\alpha_k^2 \|g^k\|^2$ can be as large as $\alpha_k^2 G^2$.
18 After substitution, we get the bound
$$\|x^{k+1} - x^\star\|^2 \le \|x^k - x^\star\|^2 - 2\alpha_k (f(x^k) - f^\star) + \alpha_k^2 \|g^k\|^2.$$
The telescoping sum over $k$ gives
$$\|x^{k+1} - x^\star\|^2 \le \|x^0 - x^\star\|^2 - 2\sum_{i=0}^k \alpha_i (f(x^i) - f^\star) + \sum_{i=0}^k \alpha_i^2 \|g^i\|^2.$$
Therefore,
$$\|x^{k+1} - x^\star\|^2 + 2\sum_{i=0}^k \alpha_i (f(x^i) - f^\star) \le \|x^0 - x^\star\|^2 + \sum_{i=0}^k \alpha_i^2 \|g^i\|^2,$$
which bounds the total descent $\sum_{i=0}^k \alpha_i (f(x^i) - f^\star)$ by the total ascent $\sum_{i=0}^k \alpha_i^2 \|g^i\|^2$. Clearly, the $\alpha_k$ play the key role.
19 Step size and convergence
By $\|x^{k+1} - x^\star\|^2 \ge 0$ and the last bound,
$$2\sum_{i=0}^k \alpha_i (f(x^i) - f^\star) \le \|x^0 - x^\star\|^2 + \sum_{i=0}^k \alpha_i^2 \|g^i\|^2.$$
Letting
$f^k_{\text{best}} = \min\{f(x^i) : i = 0, 1, \dots, k\}$ (thus $f^k_{\text{best}} - f^\star \le f(x^i) - f^\star$ for all $i \le k$),
$\|g\| \le G$ (equivalent to Lipschitz $f$: $|f(x) - f(y)| \le G\|x - y\|$ for all $x, y$),
we have
$$f^k_{\text{best}} - f^\star \le \frac{\|x^0 - x^\star\|^2 + G^2 \sum_{i=0}^k \alpha_i^2}{2\sum_{i=0}^k \alpha_i}.$$
20 For the bound to vanish, we need $\sum_{i=0}^k \alpha_i$ to be unbounded while $\sum_{i=0}^k \alpha_i^2$ stays bounded (or at least grows more slowly) as $k \to \infty$.
To have $f^k_{\text{best}} - f^\star \to 0$, we require, for example, $\sum_{i=0}^\infty \alpha_i = \infty$ and $\sum_{i=0}^\infty \alpha_i^2 < \infty$; or, more weakly, $\sum_{i=0}^\infty \alpha_i = \infty$ and $\lim_{k} \alpha_k = 0$ (via the truncation trick).
Otherwise, $f^k_{\text{best}} \not\to f^\star$ in general.
21 Fixed step size
Fixing $\alpha_k \equiv \alpha$, we get
$$f^k_{\text{best}} - f^\star \le \frac{\|x^0 - x^\star\|^2 + G^2 (k+1)\alpha^2}{2(k+1)\alpha} = \frac{\|x^0 - x^\star\|^2}{2\alpha(k + 1)} + \frac{\alpha G^2}{2}.$$
As $k \to \infty$, the bound tends to $\alpha G^2/2$, so the achievable accuracy is $f^k_{\text{best}} - f^\star \le \alpha G^2/2 = O(\alpha)$.
In the early stage, while $k < \frac{\|x^0 - x^\star\|^2}{\alpha^2 G^2}$,
$$\frac{\|x^0 - x^\star\|^2}{2\alpha(k + 1)} + \frac{\alpha G^2}{2} \le \frac{\|x^0 - x^\star\|^2}{\alpha(k + 1)},$$
and thus the (non-asymptotic, conditional) rate of convergence is $O(\frac{1}{\alpha k})$.
Larger $\alpha$: faster convergence, lower final accuracy. Smaller $\alpha$: slower convergence, higher final accuracy.
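A standard consequence of the bound above, assuming the iteration horizon $k$ is fixed in advance and $R = \|x^0 - x^\star\|$ (or an upper bound on it) is known, is a closed-form optimal constant step:

```latex
% With a constant step \alpha_i \equiv \alpha and R := \|x^0 - x^\star\|:
\min_{\alpha > 0} \;
  \frac{R^2 + G^2 (k+1)\,\alpha^2}{2(k+1)\,\alpha}
  \;=\; \frac{R\,G}{\sqrt{k+1}},
\qquad \text{attained at } \alpha = \frac{R}{G\sqrt{k+1}}.
```

With this horizon-dependent choice, $f^k_{\text{best}} - f^\star \le RG/\sqrt{k+1}$, i.e., the $O(1/\sqrt{k})$ rate is recovered without a diminishing step, at the price of fixing $k$ (and knowing $R$) in advance.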
22 Fixed step length: $\alpha_k = \alpha / \|g^k\|$
We have
$$f^k_{\text{best}} - f^\star \le \frac{\|x^0 - x^\star\|^2 + \sum_{i=0}^k \alpha_i^2 \|g^i\|^2}{2\sum_{i=0}^k \alpha_i} \le \frac{\|x^0 - x^\star\|^2\, G}{2\alpha(k + 1)} + \frac{\alpha G}{2}.$$
As $k \to \infty$, the achievable accuracy is $f^k_{\text{best}} - f^\star \le \alpha G/2 = O(\alpha)$, slightly better than with a fixed step size.
While $k < \frac{\|x^0 - x^\star\|^2}{\alpha^2}$, we have the (non-asymptotic, conditional) rate of convergence $O(\frac{G}{\alpha k})$.
Larger $\alpha$: faster convergence, lower final accuracy. Smaller $\alpha$: slower convergence, higher final accuracy.
23 Polyak step size
Assume that $f^\star$ is known ($x^\star$ is not). Example: $f^\star = 0$ in projection problems.
Set
$$\alpha_k = \frac{f(x^k) - f^\star}{\|g^k\|^2} = \arg\min_{\alpha} \left\{ \|x^k - x^\star\|^2 - 2\alpha (f(x^k) - f^\star) + \alpha^2 \|g^k\|^2 \right\}.$$
Then
$$\|x^k - x^\star\|^2 - 2\alpha_k (f(x^k) - f^\star) + \alpha_k^2 \|g^k\|^2 = \|x^k - x^\star\|^2 - \frac{(f(x^k) - f^\star)^2}{\|g^k\|^2}$$
(so $\|x^k - x^\star\|^2$ is monotonic), and thus, after summing over $k$,
$$\|x^{k+1} - x^\star\|^2 \le \|x^0 - x^\star\|^2 - \frac{1}{G^2}\sum_{i=0}^k (f(x^i) - f^\star)^2.$$
Finally,
$$f^k_{\text{best}} - f^\star \le \frac{\|x^0 - x^\star\|\, G}{\sqrt{k + 1}} = O\left(\frac{1}{\sqrt{k}}\right).$$
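Here is a sketch of the Polyak step in code (Python/NumPy), on the illustrative problem $f(x) = \|Ax - b\|_1$ under the assumption that $Ax = b$ is consistent, so that $f^\star = 0$ is known; the names A, b, x0 and the iteration budget are placeholders.

```python
import numpy as np

def polyak_subgradient(A, b, x0, f_star=0.0, iters=500):
    """Subgradient method with the Polyak step for f(x) = ||Ax - b||_1,
    assuming f_star is the known optimal value (0 when Ax = b is consistent)."""
    x, f_best = x0.astype(float), np.inf
    for _ in range(iters):
        r = A @ x - b
        f = np.abs(r).sum()
        f_best = min(f_best, f)
        g = A.T @ np.sign(r)                  # a subgradient of f at x
        if not np.any(g):                     # 0 is in the subdifferential: stop
            break
        x = x - ((f - f_star) / (g @ g)) * g  # Polyak step alpha_k
    return x, f_best
```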
24 Oracle optimality
For an $O(\frac{1}{\sqrt{k}})$ method to guarantee $f^k_{\text{best}} - f^\star \le \epsilon$, we need $O(1/\epsilon^2)$ iterations. Is this tight for the subgradient method?
Answer: yes. Suppose $x^{k+1}$ is computed by an arbitrary method as
$$x^{k+1} \in x^0 + \operatorname{span}\{g^0, g^1, \dots, g^k\},$$
where the oracle gives arbitrary $g^k \in \partial f(x^k)$ and $f(x^k)$.
Theorem (Nesterov Thm 3.2.1). There is a nonsmooth convex function with $\|g\| \le G$ uniformly such that the above method obeys
$$f(x^k) - f(x^\star) \ge \frac{\|x^0 - x^\star\|\, G}{2(1 + \sqrt{k + 1})}.$$
25 The subgradient algorithm
$f$ is a proper closed convex function. Problem: minimize$_x$ $f(x)$.
Algorithm: pick any starting point $x^1$; then repeat
pick $g^k \in \partial f(x^k)$
set $\alpha_k$ ($\alpha_0/k$, fixed size, fixed length, or Polyak)
$x^{k+1} \leftarrow x^k - \alpha_k g^k$
$k \leftarrow k + 1$
(Monitor $f(x^k)$ and $f^k_{\text{best}}$, especially if using fixed size or fixed length.)
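A minimal implementation sketch of this algorithm (Python/NumPy). The oracle interface, the diminishing rule $\alpha_k = \alpha_0/k$, and the iteration budget are choices made for the example, not prescribed by the slide.

```python
import numpy as np

def subgradient_method(oracle, x1, alpha0=1.0, iters=1000):
    """oracle(x) returns (f(x), g) with g in the subdifferential of f at x.
    Uses the diminishing step alpha_k = alpha0 / k and tracks f_best."""
    x = x1
    f_best, x_best = np.inf, x1
    for k in range(1, iters + 1):
        f, g = oracle(x)
        if f < f_best:                    # monitor the best iterate seen so far
            f_best, x_best = f, x
        x = x - (alpha0 / k) * g
    return x_best, f_best

# Illustrative usage on f(x) = ||x||_1 (subgradient: sign(x)):
# x_best, f_best = subgradient_method(lambda x: (np.abs(x).sum(), np.sign(x)),
#                                     x1=np.array([3.0, -2.0, 1.0]))
```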
26 Variant: projected subgradient method
$f$ is a proper closed convex function; $C$ is a nonempty closed convex set. Problem:
$$\text{minimize}_x\ f(x), \quad \text{subject to } x \in C.$$
Pick $g^k \in \partial f(x^k)$ and $\alpha_k$, and apply
$$x^{k+1} \leftarrow \operatorname{Proj}_C(x^k - \alpha_k g^k).$$
27 Since the projection is nonexpansive, $\|\operatorname{Proj}_C(x) - \operatorname{Proj}_C(y)\| \le \|x - y\|$ for all $x, y \in \mathbb{R}^n$, the analysis remains the same:
$$\|x^{k+1} - x^\star\|^2 = \|\operatorname{Proj}_C(x^k - \alpha_k g^k) - \operatorname{Proj}_C(x^\star)\|^2 \le \|(x^k - \alpha_k g^k) - x^\star\|^2 = \|(x^k - x^\star) - \alpha_k g^k\|^2 = \|x^k - x^\star\|^2 - 2\alpha_k \langle g^k, x^k - x^\star \rangle + \alpha_k^2 \|g^k\|^2 = \dots$$
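A sketch of the projected variant (Python/NumPy) on an illustrative constrained problem: minimize $\|Ax - b\|_1$ over the box $C = [0, 1]^n$, whose projection is coordinatewise clipping; the step rule $\alpha_k = \alpha_0/\sqrt{k}$ and all names are assumptions made for the example.

```python
import numpy as np

def projected_subgradient(A, b, x0, alpha0=1.0, iters=1000):
    """Projected subgradient method for min ||Ax - b||_1 subject to 0 <= x <= 1."""
    proj = lambda z: np.clip(z, 0.0, 1.0)    # Proj_C for the box C = [0, 1]^n
    x = proj(x0.astype(float))
    f_best, x_best = np.inf, x
    for k in range(1, iters + 1):
        r = A @ x - b
        f = np.abs(r).sum()
        if f < f_best:
            f_best, x_best = f, x
        g = A.T @ np.sign(r)                 # a subgradient of the objective
        x = proj(x - (alpha0 / np.sqrt(k)) * g)
    return x_best, f_best
```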
28 Summary for subgradient methods
Universal: handles non-differentiable convex problems and, in particular, the dual problems of linearly constrained convex problems (later lectures).
No monotonicity of either $f(x^k)$ or $\operatorname{dist}(x^k, X^\star)$, except for Polyak's step size (which requires a known $f^\star$).
Convergence relies on uniformly bounded subgradients (or Lipschitz $f$).
The rate of convergence of $f^k_{\text{best}} - f^\star$ depends on the step size.
A constant step size (or length) does not guarantee $f^k_{\text{best}} \to f^\star$.
If we need $f^k_{\text{best}} \to f^\star$, use diminishing step sizes; the rate is at best $O(1/\sqrt{k})$.
Convergence is quite slow (but the method is widely applicable).
Some nonsmooth problems have better rates with other methods, e.g., prox-linear iteration, operator splitting, dual smoothing (later lectures).