Accelerating Nesterov's Method for Strongly Convex Functions
Slide 1: Accelerating Nesterov's Method for Strongly Convex Functions. Hao Chen, Xiangrui Meng. MATH 301, 2011.
Slides 2-3: Outline. 1. The Gap; 2. ...; 3. ...
Slide 4: Our talk begins with a tiny gap.
For any x_0 and any constants µ > 0, L > µ, there exists a function f ∈ S^{∞,1}_{µ,L} such that for any first-order method
    f(x_k) - f* ≥ (µ/2) ((√κ - 1)/(√κ + 1))^{2k} ||x_0 - x*||²,   κ = L/µ.
Nesterov's method generates a sequence {x_k}_{k≥0} such that
    f(x_k) - f* ≤ L ((√κ - 1)/√κ)^k ||x_0 - x*||²,   κ = L/µ.
Slide 5: At a closer look, the gap is not tiny.
Assume that κ is large. Given a small tolerance ε > 0, to make f(x_k) - f* < ε, the ideal first-order method needs
    K = (log ε - log(µ/2)) / (2 log((√κ - 1)/(√κ + 1))) ≈ (√κ/4) log(1/ε)
iterations, while Nesterov's method needs
    K = (log ε - log L) / log((√κ - 1)/√κ) ≈ √κ log(1/ε)
iterations, about 4 times the ideal number.
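As a quick sanity check of these counts, the short computation below evaluates both formulas and the ratio of the per-iteration decay exponents; the values of κ, ε, µ, L are illustrative and not taken from the slides.

```python
import numpy as np

kappa, eps, mu = 1e4, 1e-8, 1.0   # illustrative values
L = kappa * mu
sq = np.sqrt(kappa)

K_ideal = (np.log(eps) - np.log(mu / 2)) / (2 * np.log((sq - 1) / (sq + 1)))
K_nesterov = (np.log(eps) - np.log(L)) / np.log((sq - 1) / sq)
print(K_ideal, K_nesterov)          # iteration counts from the two formulas

# Ratio of the per-iteration decay exponents; it approaches 4 as kappa grows.
print(2 * np.log((sq + 1) / (sq - 1)) / np.log(sq / (sq - 1)))
```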
Slide 6: Can we reduce the gap?
Can we reduce the gap for quadratic functions,
    minimize f(x) = (1/2) x^T A x - b^T x,   µ I_n ⪯ A ⪯ L I_n?
In this case we do have an ideal method, the conjugate gradient method, which attains the optimal convergence rate.
Can we reduce it for general strongly convex functions,
    minimize f(x),   f ∈ S_{µ,L}?
Slide 7: Outline.
Slide 8: Nesterov's constant step scheme, III.
0. Choose y_0 = x_0 ∈ R^n.
1. k-th iteration (k ≥ 0):
    x_{k+1} = y_k - h ∇f(y_k),
    y_{k+1} = x_{k+1} + β(x_{k+1} - x_k),
where h = 1/L and β = (1 - √(µh)) / (1 + √(µh)).
Q: Is Nesterov's choice of h and β optimal?
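For reference, a minimal NumPy sketch of this constant step scheme; the function name and signature are illustrative, and grad is assumed to return ∇f.

```python
import numpy as np

def nesterov_constant_step(grad, x0, mu, L, n_iter=1000):
    """Nesterov's constant step scheme III, as stated on the slide (a sketch)."""
    h = 1.0 / L
    beta = (1 - np.sqrt(mu * h)) / (1 + np.sqrt(mu * h))
    x = y = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x_next = y - h * grad(y)           # gradient step from the extrapolated point
        y = x_next + beta * (x_next - x)   # momentum / extrapolation step
        x = x_next
    return x
```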
Slide 9: On quadratic functions.
When minimizing a quadratic function f(x) = (1/2) x^T A x - b^T x, Nesterov's updates become:
0. Choose y_0 = x_0 ∈ R^n.
1. k-th iteration (k ≥ 0):
    x_{k+1} = y_k - h(A y_k - b),
    y_{k+1} = x_{k+1} + β(x_{k+1} - x_k).
Slide 10: Eigendecomposition.
Let A = V Λ V^T be A's eigendecomposition. Define x̄_k = V^T x_k, ȳ_k = V^T y_k for all k, and b̄ = V^T b. Then Nesterov's updates can be written as:
0. Choose ȳ_0 = x̄_0 ∈ R^n.
1. k-th iteration (k ≥ 0):
    x̄_{k+1} = ȳ_k - h(Λ ȳ_k - b̄),
    ȳ_{k+1} = x̄_{k+1} + β(x̄_{k+1} - x̄_k).
Λ is diagonal, hence the updates are actually element-wise:
    x̄_{k+1,i} = ȳ_{k,i} - h(λ_i ȳ_{k,i} - b̄_i),   i = 1, ..., n,
    ȳ_{k+1,i} = x̄_{k+1,i} + β(x̄_{k+1,i} - x̄_{k,i}),   i = 1, ..., n.
Slide 11: Recurrence relation.
We can eliminate the sequence {ȳ_k} from the update scheme:
    x̄_{k+1,i} = ȳ_{k,i} - h(λ_i ȳ_{k,i} - b̄_i)
              = (x̄_{k,i} + β(x̄_{k,i} - x̄_{k-1,i})) - h(λ_i (x̄_{k,i} + β(x̄_{k,i} - x̄_{k-1,i})) - b̄_i)
              = (1 + β)(1 - λ_i h) x̄_{k,i} - β(1 - λ_i h) x̄_{k-1,i} + h b̄_i.
Let ē_k = V^T (x_k - x*) = V^T (x_k - V Λ^{-1} V^T b) = x̄_k - Λ^{-1} b̄ for all k. We have the following recurrence relation on the error:
    ē_{k+1,i} = (1 + β)(1 - λ_i h) ē_{k,i} - β(1 - λ_i h) ē_{k-1,i}.
Slide 12: Characteristic equation.
The characteristic equation for the recurrence relation is
    ξ_i² = (1 + β)(1 - λ_i h) ξ_i - β(1 - λ_i h).
Denote the two roots by ξ_{i,1} and ξ_{i,2}, and assume for simplicity that they are distinct. The general solution is
    ē_{k,i} = C_{i,1} ξ_{i,1}^k + C_{i,2} ξ_{i,2}^k.
Let C_i = |C_{i,1}| + |C_{i,2}| and θ_i = max{|ξ_{i,1}|, |ξ_{i,2}|}. We have |ē_{k,i}| ≤ C_i θ_i^k. Hence
    ||x_k - x*||² = ||x̄_k - x̄*||² = Σ_i ē_{k,i}² ≤ Σ_i C_i² θ_i^{2k} ≤ C θ^{2k},
where C = Σ_i C_i² and θ = max_i θ_i.
Slide 13: Finding the optimal convergence rate.
Our problem becomes
    minimize θ
    subject to θ ≥ |ξ_1(λ)|, |ξ_2(λ)| for all λ ∈ [µ, L],
where ξ_1(λ) and ξ_2(λ) are the roots of
    ξ² = (1 + β)(1 - λh) ξ - β(1 - λh),
and h, β, θ are the variables.
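The worst-case rate for a given (h, β) can be evaluated numerically by scanning λ over [µ, L] and taking the largest root magnitude of the characteristic equation; a small sketch (function name and grid size are illustrative):

```python
import numpy as np

def worst_case_rate(h, beta, mu, L, n_grid=10001):
    """Largest root magnitude of xi^2 = (1+beta)(1-lam*h)*xi - beta*(1-lam*h) over lam in [mu, L]."""
    lams = np.linspace(mu, L, n_grid)
    a = (1 + beta) * (1 - lams * h)        # characteristic equation: xi^2 - a*xi + c = 0
    c = beta * (1 - lams * h)
    disc = np.sqrt(a**2 - 4 * c + 0j)      # may be complex
    roots = np.stack([(a + disc) / 2, (a - disc) / 2])
    return float(np.abs(roots).max())
```

For example, with Nesterov's h = 1/L and β = (1 - √(µ/L))/(1 + √(µ/L)) this returns approximately 1 - √(µ/L), in line with slide 14.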
Slide 14: Special cases.
If β = 0, we are doing gradient descent. The optimal rate is θ = (L - µ)/(L + µ), attained at h = 2/(L + µ).
If h = 1/L, the optimal rate is θ = 1 - √(µh) = 1 - √(µ/L), attained at β = (1 - √(µh))/(1 + √(µh)) = (√L - √µ)/(√L + √µ), which confirms Nesterov's choice.
Q: Why choose h = 1/L? It guarantees the largest decrease in function value for a function whose gradient has Lipschitz constant L.
Slide 15: The optimal convergence rate.
Considering all combinations of h and β, we reach the following optimal solution:
    h = 4/(3L + µ)   (the harmonic mean of 1/L and 2/(L + µ)),
    β = (1 - √(µh)) / (1 + √(µh)),
    θ = 1 - √(µh) = 1 - 2/√(3κ + 1).
Slide 16: Comparing the convergence rates.
Nesterov's method (h = 1/L):
    ||x_k - x*|| ≤ C (1 - 1/√κ)^k ||x_0 - x*||.
Note that this is better than the convergence rate we have on general strongly convex functions.
Nesterov's method (h = 4/(3L + µ)):
    ||x_k - x*|| ≤ C (1 - 2/√(3κ + 1))^k ||x_0 - x*||.
Conjugate gradient:
    ||x_k - x*||_A ≤ 2 (1 - 2/(√κ + 1))^k ||x_0 - x*||_A.
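A quick comparison of the three contraction factors for a single illustrative condition number (κ = 10^4 is an assumption, not a value from the slides):

```python
import numpy as np

kappa = 1e4                                    # illustrative condition number
rate_default = 1 - 1 / np.sqrt(kappa)          # Nesterov, h = 1/L
rate_tuned   = 1 - 2 / np.sqrt(3 * kappa + 1)  # Nesterov, h = 4/(3L+mu)
rate_cg      = 1 - 2 / (np.sqrt(kappa) + 1)    # conjugate gradient (A-norm)
print(rate_default, rate_tuned, rate_cg)
# ~0.9900, ~0.9885, ~0.9802: the tuned step helps modestly, while CG's decay
# exponent is roughly twice that of the default Nesterov scheme.
```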
Slide 17: What's happening on the eigenspace. Figure: error along the eigendirections, |ē_{k,i}|.
Slide 18: The model problem.
    minimize f(x) = (1/2) x^T A x - b^T x,
where A = T + δ I_n ∈ R^{n×n}, T is the tridiagonal matrix with 2 on the diagonal and -1 on the off-diagonals, and b = randn(n, 1) ∈ R^n. We chose n = 10^6 and δ = 0.05.
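A construction of this model problem in SciPy, for reference; n is taken smaller here purely for illustration, and the bounds µ ≈ δ, L ≈ 4 + δ follow from the spectrum of the tridiagonal part lying in (0, 4).

```python
import numpy as np
import scipy.sparse as sp

n, delta = 10_000, 0.05                     # smaller n than the slides, for illustration
T = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
             offsets=[-1, 0, 1], format="csr")
A = T + delta * sp.eye(n)
b = np.random.randn(n)

grad = lambda x: A @ x - b                  # gradient of f(x) = 0.5 x'Ax - b'x
mu, L = delta, 4.0 + delta                  # eigenvalues of T lie in (0, 4)
```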
Slide 19: Figure: ||x_k - x*||.
Slide 20: Figure: f(x_k) - f*.
Slide 21: Outline.
Slide 22: Back to Nesterov's proof.
A pair of sequences {φ_k(x)} and {λ_k}, λ_k ≥ 0, is called an estimate sequence of a function f(x) if λ_k → 0 and, for any x ∈ R^n and all k ≥ 0,
    φ_k(x) ≤ (1 - λ_k) f(x) + λ_k φ_0(x).
If for a sequence {x_k} we have
    f(x_k) ≤ φ_k* ≡ min_{x ∈ R^n} φ_k(x),
then
    f(x_k) - f* ≤ λ_k [φ_0(x*) - f*] → 0.
Slide 23: A useful estimate sequence provided by Nesterov.
    λ_{k+1} = (1 - α_k) λ_k,
    φ_{k+1}(x) = (1 - α_k) φ_k(x) + α_k [f(y_k) + ⟨∇f(y_k), x - y_k⟩ + (µ/2) ||x - y_k||²],
where {y_k} is an arbitrary sequence in R^n, α_k ∈ (0, 1) with Σ_{k=0}^∞ α_k = ∞, λ_0 = 1, and φ_0 is an arbitrary function on R^n.
Slide 24: A specific choice of φ_0(x).
Take φ_0(x) ≡ φ_0* + (γ_0/2) ||x - v_0||² and set x_0 = v_0, φ_0* = f(x_0). The previous estimate sequence then keeps the form
    φ_k(x) ≡ φ_k* + (γ_k/2) ||x - v_k||²,
with
    γ_{k+1} = (1 - α_k) γ_k + α_k µ,
    v_{k+1} = [(1 - α_k) γ_k v_k + α_k µ y_k - α_k ∇f(y_k)] / γ_{k+1},
    φ*_{k+1} = (1 - α_k) φ_k* + α_k f(y_k) - (α_k² / (2 γ_{k+1})) ||∇f(y_k)||²
               + (α_k (1 - α_k) γ_k / γ_{k+1}) ((µ/2) ||y_k - v_k||² + ⟨∇f(y_k), v_k - y_k⟩).
Slide 25: Let the update be x_{k+1} = y_k - h_k ∇f(y_k) and use the inequalities
    φ_k* ≥ f(x_k) ≥ f(y_k) + ⟨∇f(y_k), x_k - y_k⟩ + (µ/2) ||x_k - y_k||²,
    f(x_{k+1}) ≤ f(y_k) - (h_k (2 - L h_k)/2) ||∇f(y_k)||².
We obtain
    φ*_{k+1} ≥ f(x_{k+1}) + (-α_k²/(2 γ_{k+1}) + h_k (2 - L h_k)/2) ||∇f(y_k)||²
               + (1 - α_k) ⟨∇f(y_k), (α_k γ_k / γ_{k+1})(v_k - y_k) + (x_k - y_k)⟩
               + (µ (1 - α_k)/2) ((α_k γ_k / γ_{k+1}) ||v_k - y_k||² + ||x_k - y_k||²).
Slide 26: The same bound,
    φ*_{k+1} ≥ f(x_{k+1}) + (-α_k²/(2 γ_{k+1}) + h_k (2 - L h_k)/2) ||∇f(y_k)||²
               + (1 - α_k) ⟨∇f(y_k), (α_k γ_k / γ_{k+1})(v_k - y_k) + (x_k - y_k)⟩
               + (µ (1 - α_k)/2) ((α_k γ_k / γ_{k+1}) ||v_k - y_k||² + ||x_k - y_k||²),
with Nesterov's choice y_k = (α_k γ_k v_k + γ_{k+1} x_k)/(γ_k + α_k µ), h_k = 1/L, γ_0 ≥ µ. Since γ_{k+1} = (1 - α_k) γ_k + α_k µ, we have γ_k ≥ µ, so α_k can be as large as √(µ/L) at each step, which leads to the convergence rate 1 - √(µ/L) = 1 - 1/√κ.
Slide 27: A simplified version.
Take γ_k ≡ µ and h_k ≡ 1/L, so that
    y_k = (α_k v_k + x_k)/(α_k + 1),
    v_k - y_k = (v_k - x_k)/(α_k + 1),
    x_k - y_k = α_k (x_k - v_k)/(α_k + 1),
and
    φ*_{k+1} ≥ f(x_{k+1}) + (-α_k²/(2µ) + 1/(2L)) ||∇f(y_k)||² + (µ α_k (1 - α_k) / (2(1 + α_k))) ||x_k - v_k||².
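To make the simplified scheme concrete, here is a minimal sketch of one iteration with γ_k ≡ µ and h_k ≡ 1/L; the function and variable names are illustrative, and α is supplied by the caller (the default √(µ/L) or the adaptive choice discussed on the next slides).

```python
import numpy as np

def simplified_step(x, v, alpha, grad, mu, L):
    """One iteration of the simplified estimate-sequence scheme (gamma_k = mu, h_k = 1/L); a sketch."""
    y = (alpha * v + x) / (alpha + 1)                         # Nesterov's choice of y_k
    g = grad(y)
    x_next = y - g / L                                        # gradient step
    v_next = (1 - alpha) * v + alpha * y - (alpha / mu) * g   # update of the estimate-sequence center
    return x_next, v_next
```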
Slide 28: Figure: ||x_k - v_k||² / ||∇f(y_k)||² for f(x) = (1/2) ||Ax - b||² + λ smooth(||x||_1, τ) + (µ/2) ||x||².
Slide 29: For the estimate-sequence argument to go through we need
    µ α_k (1 - α_k) / (2(1 + α_k)) ||x_k - v_k||² ≥ (α_k²/(2µ) - 1/(2L)) ||∇f((α_k v_k + x_k)/(α_k + 1))||².
Since the decay rate is Π_k (1 - α_k), we want to find a large α_k for which this inequality holds. Evaluating ∇f((α_k v_k + x_k)/(α_k + 1)) is time-consuming, so we hope our first guess of α_k is good. Since ||∇f(y_k)|| tends to decrease, our procedure is to find an α_k ≥ √(µ/L) for which
    µ α_k (1 - α_k) / (2(1 + α_k)) ||x_k - v_k||² - (α_k²/(2µ)) ||∇f(y_{k-1})||²
is large; such an α_k usually makes the inequality hold (see the sketch below).
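A heuristic sketch of this adaptive choice of α_k; the candidate grid and the selection rule are assumptions for illustration, not the authors' exact procedure, and in practice the inequality would still be checked with the true ∇f(y_k) before accepting α_k.

```python
import numpy as np

def choose_alpha(x, v, grad_prev_norm, mu, L):
    """Pick alpha_k >= sqrt(mu/L) maximizing the surrogate gain from the slide (a heuristic sketch)."""
    candidates = np.sqrt(mu / L) * np.array([1.0, 1.5, 2.0, 3.0, 5.0])   # illustrative grid
    diff2 = float(np.dot(x - v, x - v))
    best_alpha, best_gain = np.sqrt(mu / L), -np.inf
    for a in candidates:
        if a >= 1.0:                      # alpha_k must stay in (0, 1)
            continue
        gain = (mu * a * (1 - a) / (2 * (1 + a))) * diff2 \
               - (a**2 / (2 * mu)) * grad_prev_norm**2
        if gain > best_gain:
            best_alpha, best_gain = a, gain
    return best_alpha
```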
Slide 30: Figure: ||∇f(y_k)|| for f(x) = (1/2) ||Ax - b||² + λ smooth(||x||_1, τ) + (µ/2) ||x||².
Slide 31: Test 1: smooth-BPDN.
The first test is a smoothed version of Basis Pursuit De-Noising:
    minimize f(x) = (1/2) ||Ax - b||² + λ smooth(||x||_1, τ) + (µ/2) ||x||²,
where A = (1/√n) randn(m, n), m = 1000, n = 3000, λ = 0.2, τ = 0.001, and µ = ... The ground truth x is a random sparse vector with 125 non-zeros, and b = Ax + ε. We use the following estimate for L:
    L̂ = (1 + √(m/n))² + λ/τ + µ.
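The sketch below sets up this test problem; the slides do not define smooth(||x||_1, τ), so the pseudo-Huber smoothing of |x_i| is used here as a plausible stand-in (it contributes λ/τ to the Lipschitz estimate, matching L̂ above), and the values of µ and the noise level are illustrative since they are not legible in the transcription.

```python
import numpy as np

m, n, lam, tau, mu = 1000, 3000, 0.2, 1e-3, 1e-3     # mu is an illustrative value
A = np.random.randn(m, n) / np.sqrt(n)
x_true = np.zeros(n)
x_true[np.random.choice(n, 125, replace=False)] = np.random.randn(125)
b = A @ x_true + 0.01 * np.random.randn(m)           # illustrative noise level

def f_and_grad(x):
    r = A @ x - b
    s = np.sqrt(x**2 + tau**2)                       # pseudo-Huber smoothing of |x_i| (assumed)
    f = 0.5 * r @ r + lam * np.sum(s - tau) + 0.5 * mu * x @ x
    g = A.T @ r + lam * (x / s) + mu * x
    return f, g

L_hat = (1 + np.sqrt(m / n))**2 + lam / tau + mu     # estimate used on the slide
```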
Slide 32: Figure: ||x_k - x*||.
Slide 33: Figure: f(x_k) - f*.
Slide 34: Test 2: anisotropic bowl.
The second test is
    minimize f(x) = Σ_{i=1}^n i x_i⁴ + (1/2) ||x||²
    subject to ||x|| ≤ τ.
We choose n = 500 and τ = 4. x_0 is randomly chosen from the boundary {x : ||x|| = τ}. For this problem we have L = 12 n τ² = 96000 and µ = 1.
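A sketch of this test problem; the objective form Σ_i i x_i⁴ + (1/2)||x||² and the Lipschitz bound are as reconstructed above, and the projection used to enforce ||x|| ≤ τ is an implementation detail not spelled out on the slide.

```python
import numpy as np

n, tau = 500, 4.0
idx = np.arange(1, n + 1)

def f(x):
    return np.sum(idx * x**4) + 0.5 * x @ x

def grad(x):
    return 4 * idx * x**3 + x

def project_ball(x, radius=tau):
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else x * (radius / nrm)

x0 = np.random.randn(n)
x0 = x0 / np.linalg.norm(x0) * tau        # random point on the boundary ||x|| = tau
L_hat, mu_hat = 12 * n * tau**2, 1.0      # 12 * 500 * 16 = 96000
```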
Slide 35: Figure: ||x_k - x*||.
Slide 36: Figure: f(x_k) - f*.
Slide 37: Test 3: back to quadratic functions.
Let's check the performance of the adaptive algorithm on quadratic functions:
    minimize f(x) = (1/2) x^T A x - b^T x.
We choose A ~ (1/m) W_n(I_n, m), where n = 4500 and m = ... We use the following estimates for L and µ:
    L̂ = (1 + √(n/m))²,   µ̂ = (1 - √(n/m))².
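A sketch of this test matrix; m is set to an illustrative value here because its value did not survive the transcription, and the estimates are the Marchenko-Pastur edges of the sample covariance spectrum.

```python
import numpy as np

n, m = 4500, 9000                          # m is illustrative
X = np.random.randn(m, n)
A = X.T @ X / m                            # A ~ (1/m) W_n(I_n, m), i.e. a sample covariance matrix
b = np.random.randn(n)

L_hat  = (1 + np.sqrt(n / m))**2           # upper Marchenko-Pastur edge
mu_hat = (1 - np.sqrt(n / m))**2           # lower Marchenko-Pastur edge
```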
Slide 38: Figure: ||x_k - x*||.
Slide 39: Figure: f(x_k) - f*.
Slide 40: Comparing with TFOCS (AT). Figure: ||x_k - x*||.
Slide 41: Figure: f(x_k) - f*.
Slide 42: Final thoughts.
The convergence rate of Nesterov's method depends on the problem type; for quadratic problems the attainable speed is doubled (the conjugate gradient rate of slide 16). There is room to improve Nesterov's optimal gradient method on strongly convex functions. Whether Nesterov's method can be improved universally, with a theoretical proof, is still an open question.