Coordinate Descent Methods on Huge-Scale Optimization Problems
- Elmer Wilkinson
- 6 years ago
Coordinate Descent Methods on Huge-Scale Optimization Problems
Zhimin Peng
Optimization Group Meeting
Warm up exercise?
Q: Why do mathematicians, after a dinner at a Chinese restaurant, always insist on taking the leftovers home?
A: Because they know the Chinese remainder theorem!
Q: What does the zero say to the eight?
A: Nice belt!
Motivation
Consider the optimization problem $\min_{x \in \mathbb{R}^N} f(x)$.
Why coordinate descent (CD) methods?
CD based on the maximal absolute value of the gradient:
1. Choose $i_k = \arg\max_{1 \le i \le n} |\nabla_i f(x_k)|$.
2. Update $x_{k+1} = x_k - \alpha_{i_k} \nabla_{i_k} f(x_k)\, e_{i_k}$.
What's the problem with it?
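To make the question concrete, here is a minimal NumPy sketch of this greedy (max-|gradient|) rule on a quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$. The quadratic test function, the step size $\alpha_i = 1/A_{ii}$, and the name `greedy_cd` are illustrative assumptions, not part of the slides.

```python
import numpy as np


def greedy_cd(A, b, x0, iters=200):
    """Greedy coordinate descent on f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite).

    Assumed step size: alpha_i = 1 / A[i, i], the coordinatewise Lipschitz constant.
    """
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        grad = A @ x - b                  # full gradient: O(N^2) work per iteration
        i = int(np.argmax(np.abs(grad)))  # i_k = argmax_i |grad_i f(x_k)|
        x[i] -= grad[i] / A[i, i]         # x_{k+1} = x_k - alpha_i * grad_i f(x_k) * e_i
    return x


# Hypothetical 2x2 example.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = greedy_cd(A, b, np.zeros(2))
```

In this naive form, selecting $i_k$ requires all partial derivatives, so each iteration costs about as much as a full gradient step even though only one coordinate changes.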
Huge-scale problems?
Sources: the Internet, telecommunications, finite element schemes, weather prediction.
Features: expensive function evaluations, huge data.
Conclusion: we need CD methods!
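Unconstrained Optimization
$\min_{x \in \mathbb{R}^N} f(x)$
Notation:
Decomposition of $\mathbb{R}^N$: $\mathbb{R}^N = \bigotimes_{i=1}^{n} \mathbb{R}^{n_i}$, so that $N = \sum_{i=1}^{n} n_i$.
Partition of the unit matrix: $I_N = (U_1, U_2, \dots, U_n) \in \mathbb{R}^{N \times N}$, $U_i \in \mathbb{R}^{N \times n_i}$.
Any $x = (x^{(1)}, x^{(2)}, \dots, x^{(n)})^T \in \mathbb{R}^N$ can be represented as $x = \sum_{i=1}^{n} U_i x^{(i)}$.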
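A small NumPy sketch of the block notation, using hypothetical block sizes $n_i$: the matrices $U_i$ are column slices of the identity, $x^{(i)} = U_i^T x$ extracts a block, and $\sum_i U_i x^{(i)}$ recovers $x$. The helper name `unit_blocks` is made up for illustration.

```python
import numpy as np


def unit_blocks(sizes):
    """Split the N x N identity into column blocks U_1, ..., U_n with U_i of shape (N, n_i)."""
    N = sum(sizes)
    I = np.eye(N)
    offsets = np.cumsum([0] + list(sizes))
    return [I[:, offsets[i]:offsets[i + 1]] for i in range(len(sizes))]


sizes = [2, 1, 3]                     # hypothetical n_1, n_2, n_3 with N = 6
U = unit_blocks(sizes)
x = np.arange(6.0)
blocks = [Ui.T @ x for Ui in U]       # x^{(i)} = U_i^T x
x_rebuilt = sum(Ui @ xi for Ui, xi in zip(U, blocks))  # x = sum_i U_i x^{(i)}
```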
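More notation
Partial gradient of $f$: $\nabla_i f(x) = U_i^T \nabla f(x) \in \mathbb{R}^{n_i}$.
Assume that the gradient of $f$ is coordinatewise Lipschitz continuous:
$\|\nabla_i f(x + U_i h_i) - \nabla_i f(x)\|_{(i)}^* \le L_i \|h_i\|_{(i)}$, where $\|s\|^* = \max_{\|x\| = 1} \langle s, x \rangle$ is the dual norm.
Optimal coordinate steps:
$T_i(x) = x - \frac{1}{L_i} U_i \nabla_i f(x)^{\#}$, where $s^{\#} = \arg\max_x \left[\langle s, x \rangle - \frac{1}{2}\|x\|^2\right]$ (for Euclidean norms, $s^{\#} = s$).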
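As an illustration (not from the slides), take the quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$ with Euclidean block norms, so $s^{\#} = s$. Then $\nabla_i f(x + U_i h_i) - \nabla_i f(x) = U_i^T A U_i h_i$, so $L_i$ can be taken as the largest eigenvalue of the diagonal block $U_i^T A U_i$. The sketch below computes these constants and the step $T_i(x)$; all function names and the random test data are hypothetical.

```python
import numpy as np


def block_offsets(sizes):
    return np.cumsum([0] + list(sizes))


def block_lipschitz(A, sizes):
    """L_i = lambda_max(U_i^T A U_i) for the quadratic f, Euclidean block norms."""
    o = block_offsets(sizes)
    return np.array([np.linalg.eigvalsh(A[o[i]:o[i + 1], o[i]:o[i + 1]]).max()
                     for i in range(len(sizes))])


def partial_grad(A, b, x, i, sizes):
    """grad_i f(x) = U_i^T (A x - b); only the rows of block i are used."""
    o = block_offsets(sizes)
    return A[o[i]:o[i + 1], :] @ x - b[o[i]:o[i + 1]]


def coordinate_step(A, b, x, i, sizes, L):
    """T_i(x) = x - (1 / L_i) U_i grad_i f(x), using s^# = s for Euclidean norms."""
    o = block_offsets(sizes)
    x_new = x.copy()
    x_new[o[i]:o[i + 1]] -= partial_grad(A, b, x, i, sizes) / L[i]
    return x_new


rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)               # symmetric positive definite test matrix
b = rng.standard_normal(5)
sizes = [2, 3]
L = block_lipschitz(A, sizes)
x = rng.standard_normal(5)
f = lambda z: 0.5 * z @ A @ z - b @ z
```

Forming $\nabla_i f(x)$ touches only the rows of block $i$, which is what makes a single coordinate step cheap compared with a full gradient.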
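More notation
A new norm: $\|x\|_{[\alpha]} = \left[\sum_{i=1}^{n} L_i^{\alpha} \|x^{(i)}\|_{(i)}^2\right]^{1/2}$, where $\|\cdot\|_{(i)}$ is some fixed norm on $\mathbb{R}^{n_i}$.
Random counter $\mathcal{A}_\alpha$, $\alpha \in \mathbb{R}$, which generates a random number $i \in \{1, \dots, n\}$ with probability $p_\alpha^{(i)} = L_i^{\alpha} / \sum_{j=1}^{n} L_j^{\alpha}$.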
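A minimal sketch of the norm $\|\cdot\|_{[\alpha]}$ and the random counter $\mathcal{A}_\alpha$, assuming Euclidean block norms. The names `norm_alpha`, `counter_probs`, and `make_counter` and the values of $L_i$ are hypothetical; sampling uses NumPy's `Generator.choice` with explicit probabilities.

```python
import numpy as np


def norm_alpha(x_blocks, L, alpha):
    """||x||_[alpha] = (sum_i L_i^alpha ||x^{(i)}||^2)^{1/2}, Euclidean block norms assumed."""
    return float(np.sqrt(sum(Li ** alpha * np.dot(xi, xi) for Li, xi in zip(L, x_blocks))))


def counter_probs(L, alpha):
    """p_alpha^{(i)} = L_i^alpha / sum_j L_j^alpha."""
    w = np.asarray(L, dtype=float) ** alpha
    return w / w.sum()


def make_counter(L, alpha, seed=0):
    """Random counter A_alpha: each call returns an index i with probability p_alpha^{(i)}."""
    p = counter_probs(L, alpha)
    rng = np.random.default_rng(seed)

    def draw():
        return int(rng.choice(len(p), p=p))

    return draw


L = [1.0, 2.0, 5.0]                   # hypothetical coordinate Lipschitz constants
draw = make_counter(L, alpha=1.0)
```

With $\alpha = 0$ the counter is uniform and $\|\cdot\|_{[0]}$ is the ordinary Euclidean norm; with $\alpha = 1$, coordinates with larger $L_i$ are picked more often.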
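Method RCDM($\alpha$, $x_0$)
Algorithm:
1. Choose $i_k = \mathcal{A}_\alpha$.
2. Update $x_{k+1} = T_{i_k}(x_k)$.
Theorem. For any $k \ge 0$ we have
$E[f(x_k)] - f^* \le \frac{2}{k+4}\left[\sum_{j=1}^{n} L_j^{\alpha}\right] R_{1-\alpha}^2(x_0)$,
where $R_\beta(x_0) = \max_x \left\{ \max_{x^* \in X^*} \|x - x^*\|_{[\beta]} : f(x) \le f(x_0) \right\}$.
Comments: $R_\beta(x_0)$ measures how far the initial level set $\{x : f(x) \le f(x_0)\}$ extends from the optimal set $X^*$ in the norm $\|\cdot\|_{[\beta]}$. Since $x_0$ belongs to this level set, $R_\beta(x_0)$ is at least the distance from $x_0$ to $X^*$.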
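Putting the pieces together, here is a hedged sketch of RCDM($\alpha$, $x_0$) for the quadratic test case $f(x) = \frac{1}{2}x^T A x - b^T x$ with scalar coordinates ($n_i = 1$, $L_i = A_{ii}$). Keeping the full gradient up to date in $O(N)$ per step via one column of $A$ is an implementation assumption, not something stated on the slides; the function name and test data are hypothetical.

```python
import numpy as np


def rcdm(A, b, x0, alpha=0.0, iters=5000, seed=0):
    """Sketch of RCDM(alpha, x0) for f(x) = 0.5 x^T A x - b^T x with scalar coordinates,
    so L_i = A[i, i] and T_i(x) = x - grad_i f(x) / L_i * e_i."""
    rng = np.random.default_rng(seed)
    L = np.diag(A).astype(float)
    p = L ** alpha / np.sum(L ** alpha)       # distribution of the random counter A_alpha
    x = np.array(x0, dtype=float)
    g = A @ x - b                             # full gradient, computed once
    for i in rng.choice(len(L), size=iters, p=p):  # 1. i_k = A_alpha
        step = g[i] / L[i]
        x[i] -= step                          # 2. x_{k+1} = T_{i_k}(x_k)
        g -= step * A[:, i]                   # O(N) gradient update instead of O(N^2)
    return x


rng = np.random.default_rng(1)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20.0 * np.eye(20)
b = rng.standard_normal(20)
x_uniform = rcdm(A, b, np.zeros(20), alpha=0.0)
x_lipschitz = rcdm(A, b, np.zeros(20), alpha=1.0)
```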
Proof
Key inequality 1: this inequality follows from the coordinatewise Lipschitz continuity of the gradient.
Key inequality 2:
Combining the previous key inequalities, we obtain:
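A standard version of this argument, sketched here for Euclidean block norms ($s^{\#} = s$) and writing $S_\alpha = \sum_{j=1}^{n} L_j^{\alpha}$, runs as follows; its exact form may differ from the inequalities on the original slides. By the coordinatewise Lipschitz condition, $f(x + U_i h_i) \le f(x) + \langle \nabla_i f(x), h_i \rangle + \frac{L_i}{2}\|h_i\|^2$; taking $h_i = -\nabla_i f(x)/L_i$ gives

$$f(T_i(x)) \le f(x) - \frac{1}{2L_i}\|\nabla_i f(x)\|^2.$$

Averaging over $i_k$ drawn by $\mathcal{A}_\alpha$, and using the dual norm $\|s\|_{[\beta]}^* = \left[\sum_{i} L_i^{-\beta}\|s^{(i)}\|^2\right]^{1/2}$:

$$f(x_k) - \mathbb{E}_{i_k}[f(x_{k+1})] \ge \sum_{i=1}^{n} \frac{L_i^{\alpha}}{S_\alpha}\cdot\frac{\|\nabla_i f(x_k)\|^2}{2L_i} = \frac{1}{2S_\alpha}\left(\|\nabla f(x_k)\|_{[1-\alpha]}^*\right)^2.$$

Convexity, together with $f(x_k) \le f(x_0)$ (each step does not increase $f$), gives

$$f(x_k) - f^* \le \langle \nabla f(x_k), x_k - x^* \rangle \le \|\nabla f(x_k)\|_{[1-\alpha]}^*\, R_{1-\alpha}(x_0).$$

Combining the two and taking full expectations with Jensen's inequality, where $\phi_k = \mathbb{E}[f(x_k)]$:

$$\phi_k - \phi_{k+1} \ge \frac{(\phi_k - f^*)^2}{2 S_\alpha R_{1-\alpha}^2(x_0)},$$

from which a standard telescoping argument on $1/(\phi_k - f^*)$ yields an $O(1/k)$ bound of the form stated in the theorem.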
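Convergence for strongly convex functions
Strongly convex functions: $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2}\sigma(f)\|y - x\|^2$, where $\sigma = \sigma(f)$ is the convexity parameter.
Theorem. Let $f$ be strongly convex with respect to the norm $\|\cdot\|_{[1-\alpha]}$ with convexity parameter $\sigma_{1-\alpha} = \sigma_{1-\alpha}(f) > 0$. Then, for the sequence $\{x_k\}$ generated by RCDM($\alpha$, $x_0$), we have
$E[f(x_k)] - f^* \le \left(1 - \frac{\sigma_{1-\alpha}(f)}{S_\alpha(f)}\right)^k (f(x_0) - f^*)$, where $S_\alpha(f) = \sum_{j=1}^{n} L_j^{\alpha}$.
Proof: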
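For the quadratic test case with $\alpha = 0$ and scalar coordinates, $\|x\|_{[1]}^2 = \sum_i A_{ii} x_i^2$, the convexity parameter in this norm is $\sigma_1 = \lambda_{\min}(D^{-1/2} A D^{-1/2})$ with $D = \mathrm{diag}(A)$, and $S_0 = n$. The sketch below (hypothetical helper names and an illustrative $3 \times 3$ matrix) computes the contraction factor $1 - \sigma_1/S_0$ and compares the resulting bound with an empirical average over independent runs of uniform RCDM.

```python
import numpy as np


def rate_factor(A):
    """sigma_1 = lambda_min(D^-1/2 A D^-1/2) and the factor 1 - sigma_1 / S_0 with S_0 = n."""
    d = np.sqrt(np.diag(A))
    sigma1 = float(np.linalg.eigvalsh(A / np.outer(d, d)).min())
    return sigma1, 1.0 - sigma1 / A.shape[0]


def mean_gap(A, b, k, runs=200, seed=0):
    """Average of f(x_k) - f^* over independent runs of RCDM(0, 0); also returns f(x_0) - f^*."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    f = lambda x: 0.5 * x @ A @ x - b @ x
    f_star = f(np.linalg.solve(A, b))
    gaps = []
    for _ in range(runs):
        x = np.zeros(n)
        for i in rng.integers(0, n, size=k):  # uniform counter A_0
            x[i] -= (A[i] @ x - b[i]) / A[i, i]
        gaps.append(f(x) - f_star)
    return float(np.mean(gaps)), f(np.zeros(n)) - f_star


A = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])
sigma1, q = rate_factor(A)
empirical, initial_gap = mean_gap(A, b, k=10)
bound = q ** 10 * initial_gap
```

The empirical mean is typically well below the bound, since the bound has to cover the worst-case direction of the error.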
Expected quality is good! How about the result of a single run?
Define the function $f_\mu(x) = f(x) + \frac{\mu}{2}\|x - x_0\|_{[1]}^2$. Then $f_\mu$ is strongly convex with respect to $\|\cdot\|_{[1]}$, with convexity parameter $\mu$.
Theorem. Let us define $\mu = \frac{\varepsilon}{4R_1^2(x_0)}$ and choose $k \gtrsim \frac{1}{\mu}\ln\frac{1}{2\mu(1-\beta)}$. If the random point $x_k$ is generated by RCDM($0$, $x_0$) applied to $f_\mu$, then $\mathrm{Prob}(f(x_k) - f^* \le \varepsilon) \ge \beta$.
Comments: the second inequality follows from the strong convexity of $f_\mu$.
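To see why $f_\mu$ has convexity parameter $\mu$, consider again the quadratic test case $f(x) = \frac{1}{2}x^T A x - b^T x$ with scalar coordinates and $L_i = A_{ii}$. Adding $\frac{\mu}{2}\|x - x_0\|_{[1]}^2$ adds $\mu D$ to the Hessian ($D = \mathrm{diag}(A)$), so the coordinate Lipschitz constants become $(1+\mu)L_i$ and $\lambda_{\min}(D^{-1/2}(A + \mu D)D^{-1/2}) \ge \mu$ even when $A$ is singular. The sketch below builds $f_\mu$ this way and sets $\mu = \varepsilon/(4R_1^2(x_0))$; the function names and the values of $\varepsilon$ and $R_1(x_0)$ are hypothetical.

```python
import numpy as np


def mu_for_accuracy(eps, R1):
    """mu = eps / (4 R_1(x0)^2), with R1 supplied by the caller (hypothetical value below)."""
    return eps / (4.0 * R1 ** 2)


def regularize(A, b, x0, mu):
    """Return (A_mu, b_mu) such that f_mu(x) = 0.5 x^T A_mu x - b_mu^T x + const equals
    f(x) + (mu / 2) ||x - x0||_[1]^2, where ||y||_[1]^2 = sum_i A_ii y_i^2."""
    D = np.diag(np.diag(A))
    return A + mu * D, b + mu * D @ x0


A = np.array([[1.0, 1.0], [1.0, 1.0]])  # singular: f is convex but not strongly convex
b = np.array([1.0, 1.0])
x0 = np.zeros(2)
mu = mu_for_accuracy(eps=1e-2, R1=1.0)
A_mu, b_mu = regularize(A, b, x0, mu)
```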
Accelerated coordinate descent
Consider the following scheme applied to a strongly convex function with a given convexity parameter $\sigma$:
Convergence
Based on the previous accelerated algorithm, we have the following convergence theorem:
Constrained optimization
Consider the constrained minimization problem $\min_{x \in Q} f(x)$, where $Q = \bigotimes_{i=1}^{n} Q_i$ and the sets $Q_i \subseteq \mathbb{R}^{n_i}$ are closed and convex.
$f$ is convex and satisfies the smoothness assumption $\|\nabla_i f(x + U_i h_i) - \nabla_i f(x)\|_{(i)}^* \le L_i \|h_i\|_{(i)}$.
Algorithm:
(1) Choose $i$ randomly from the uniform distribution on $\{1, \dots, n\}$.
(2) $u^{(i)} = \arg\min_{u^{(i)} \in Q_i} \left[\langle \nabla_i f(x_k), u^{(i)} - x_k^{(i)} \rangle + \frac{L_i}{2}\|u^{(i)} - x_k^{(i)}\|_{(i)}^2\right]$.
(3) Update $x_{k+1} = x_k + U_i (u^{(i)} - x_k^{(i)})$.
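For Euclidean norms and scalar blocks, step (2) has a closed form: the minimizer over $Q_i$ is the projection of $x_k^{(i)} - \nabla_i f(x_k)/L_i$ onto $Q_i$. The sketch below assumes box sets $Q_i = [lo_i, hi_i]$, where the projection is `np.clip`, and again uses the quadratic test objective $f(x) = \frac{1}{2}x^T A x - b^T x$; the name `box_cd` and the test data are hypothetical.

```python
import numpy as np


def box_cd(A, b, lo, hi, iters=20000, seed=0):
    """Uniform random coordinate descent for min 0.5 x^T A x - b^T x over a box lo <= x <= hi."""
    rng = np.random.default_rng(seed)
    n = len(b)
    L = np.diag(A).astype(float)
    x = np.clip(np.zeros(n), lo, hi)          # feasible starting point
    g = A @ x - b
    for i in rng.integers(0, n, size=iters):  # (1) uniform random coordinate
        u = np.clip(x[i] - g[i] / L[i], lo[i], hi[i])  # (2) minimizer of the model over Q_i
        delta = u - x[i]
        if delta != 0.0:
            x[i] = u                          # (3) x_{k+1} = x_k + U_i (u - x_k^{(i)})
            g += delta * A[:, i]              # keep the gradient consistent with x
    return x, g


rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6.0 * np.eye(6)
b = 5.0 * rng.standard_normal(6)
lo, hi = np.zeros(6), np.ones(6)
x_box, _ = box_cd(A, b, lo, hi)
grad_box = A @ x_box - b
```

At convergence, the optimality conditions for a box can be read off the gradient: it is zero on free coordinates, nonnegative at lower bounds, and nonpositive at upper bounds.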
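Theorem. For any $k \ge 0$ we have
$\phi_k - f^* \le \frac{n}{n+k}\left(\frac{1}{2}R_1^2(x_0) + f(x_0) - f^*\right)$, where $\phi_k = E[f(x_k)]$.
If $f$ is strongly convex in $\|\cdot\|_{[1]}$ with constant $\sigma$, then
$\phi_k - f^* \le \left(1 - \frac{2\sigma}{n(1+\sigma)}\right)^k \left(\frac{1}{2}R_1^2(x_0) + f(x_0) - f^*\right)$.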
Implementation
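Google problem
Let $E \in \mathbb{R}^{n \times n}$ be the incidence matrix of a graph, let $e = (1, \dots, 1)^T$, and let $\bar{E} = E\,\mathrm{diag}(E^T e)^{-1}$.
Google problem: $\min_x \left[\frac{1}{2}\|\bar{E}x - x\|^2 + \frac{\gamma}{2}\left[\langle e, x \rangle - 1\right]^2\right]$.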
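The slides do not show how the Google problem was implemented, so the following is only a hedged sketch. Writing $B = \bar{E} - I$, the objective is $\frac{1}{2}\|Bx\|^2 + \frac{\gamma}{2}(\langle e, x \rangle - 1)^2$, the coordinate constants are $L_i = \|Be_i\|^2 + \gamma$, and a coordinate step needs only column $i$ of $B$ if the residual $Bx$ and the sum $\langle e, x \rangle$ are kept up to date. Dense NumPy arrays, the random test graph, and all function names are assumptions made for brevity; with sparse columns, each step would touch only the nonzeros of one column.

```python
import numpy as np


def google_matrix(E):
    """Return B = E_bar - I, where E_bar = E diag(E^T e)^{-1} has unit column sums."""
    col_sums = E.sum(axis=0)                  # E^T e
    return E / col_sums - np.eye(E.shape[0])  # divides column j by its sum


def objective(B, x, gamma):
    r = B @ x
    return 0.5 * r @ r + 0.5 * gamma * (x.sum() - 1.0) ** 2


def google_cd(E, gamma=1.0, iters=20000, seed=0):
    """Uniform random coordinate descent with exact minimization along each coordinate."""
    rng = np.random.default_rng(seed)
    B = google_matrix(E)
    n = B.shape[0]
    L = (B * B).sum(axis=0) + gamma           # L_i = ||B e_i||^2 + gamma
    x = np.full(n, 1.0 / n)
    r = B @ x                                 # residual E_bar x - x, kept up to date
    s = x.sum()                               # <e, x>, kept up to date
    for i in rng.integers(0, n, size=iters):
        g = B[:, i] @ r + gamma * (s - 1.0)   # partial derivative grad_i f(x)
        delta = -g / L[i]
        x[i] += delta
        r += delta * B[:, i]
        s += delta
    return x, B


n = 50
rng = np.random.default_rng(1)
E = (rng.random((n, n)) < 0.1).astype(float)
E[np.arange(1, n + 1) % n, np.arange(n)] = 1.0  # add a cycle so no column is empty
x, B = google_cd(E, gamma=1.0)
```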