Coordinate Descent Methods on Huge-Scale Optimization Problems

1 Coordinate Descent Methods on Huge-Scale Optimization Problems Zhimin Peng Optimization Group Meeting

6 Warm up exercise?
Q: Why do mathematicians, after a dinner at a Chinese restaurant, always insist on taking the leftovers home?
A: Because they know the Chinese remainder theorem!
Q: What does the zero say to the eight?
A: Nice belt!

10 Motivation
Consider the optimization problem: $\min_{x \in \mathbb{R}^N} f(x)$
Why coordinate descent (CD) methods?
CD based on the maximal absolute value of the gradient:
1. Choose $i_k = \arg\max_{1 \le i \le n} |\nabla_i f(x_k)|$
2. Update $x_{k+1} = x_k - \alpha \nabla_{i_k} f(x_k) e_{i_k}$
What's the problem with it?
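The greedy rule above can be sketched on a small example. The quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$, the matrix `A`, the vector `b`, and the step size $\alpha = 1/A_{ii}$ are illustrative assumptions, not taken from the slides. The sketch makes the cost visible: every iteration evaluates the full gradient only to update a single coordinate.

```python
def grad(A, b, x):
    """Full gradient of f(x) = 0.5*x^T A x - b^T x, i.e. A x - b (O(n^2) work)."""
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]

def greedy_cd(A, b, x0, iters=500):
    """Coordinate descent with the max-|gradient| selection rule."""
    x = list(x0)
    for _ in range(iters):
        g = grad(A, b, x)                                 # whole gradient needed ...
        i = max(range(len(x)), key=lambda j: abs(g[j]))   # ... only to pick one index
        x[i] -= g[i] / A[i][i]                            # assumed step alpha = 1/A_ii
    return x

# Hypothetical symmetric positive definite test problem.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = greedy_cd(A, b, [0.0, 0.0, 0.0])
residual = max(abs(v) for v in grad(A, b, x))
```

For huge $N$ the call to `grad` dominates the cost, which is one way to read the question on the slide.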

14 Huge scale problems?
Sources: Internet, telecommunication; finite element schemes, weather prediction
Features: expensive function evaluation; huge data
Conclusion: We need CD methods!

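18 Unconstrained Optimization
$\min_{x \in \mathbb{R}^N} f(x)$
Notations:
Decomposition of $\mathbb{R}^N$: $\mathbb{R}^N = \bigotimes_{i=1}^{n} \mathbb{R}^{n_i}$
Partition of the unit matrix: $I_N = (U_1, U_2, \dots, U_n) \in \mathbb{R}^{N \times N}$, $U_i \in \mathbb{R}^{N \times n_i}$
Any $x = (x^{(1)}, x^{(2)}, \dots, x^{(n)})^T \in \mathbb{R}^N$ can be represented as: $x = \sum_{i=1}^{n} U_i x^{(i)}$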
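A minimal sketch of the block notation, assuming contiguous blocks with hypothetical sizes $n_1 = 2$, $n_2 = 1$, $n_3 = 3$. Instead of storing the matrices $U_i$, each block keeps the list of coordinates it owns; `restrict` plays the role of $U_i^T x$ and `embed` the role of $U_i x^{(i)}$ (both names are illustrative).

```python
def make_blocks(sizes):
    """blocks[i] lists the coordinates of R^N that belong to block i."""
    blocks, start = [], 0
    for n_i in sizes:
        blocks.append(list(range(start, start + n_i)))
        start += n_i
    return blocks

def restrict(x, block):
    """x^(i) = U_i^T x: select the entries of one block."""
    return [x[j] for j in block]

def embed(x_i, block, N):
    """U_i x^(i): place a block vector into R^N, zeros elsewhere."""
    out = [0.0] * N
    for j, v in zip(block, x_i):
        out[j] = v
    return out

sizes = [2, 1, 3]                 # hypothetical block sizes, N = 6
N = sum(sizes)
blocks = make_blocks(sizes)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# x = sum_i U_i x^(i): embedding every block and adding recovers x.
parts = [embed(restrict(x, blk), blk, N) for blk in blocks]
x_rebuilt = [sum(p[j] for p in parts) for j in range(N)]
```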

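22 More notations...
Partial gradient of f(x): $\nabla_i f(x) = U_i^T \nabla f(x) \in \mathbb{R}^{n_i}$
Assume that the gradient of function f is coordinatewise Lipschitz continuous:
$\|\nabla_i f(x + U_i h_i) - \nabla_i f(x)\|_{(i)}^* \le L_i \|h_i\|_{(i)}$
where the dual norm is $\|s\|^* = \max_{\|x\| = 1} \langle s, x \rangle$
Optimal coordinate steps: $T_i(x) = x - \frac{1}{L_i} U_i \nabla_i f(x)^{\#}$, where $s^{\#} \in \arg\max_x \left[ \langle s, x \rangle - \frac{1}{2} \|x\|^2 \right]$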
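As an illustration of the coordinate step, here is a sketch for scalar blocks ($n_i = 1$) with the Euclidean norm, where $s^{\#} = s$. The quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$ is an assumed example; for it $\nabla_i f(x) = (Ax - b)_i$ and $L_i = A_{ii}$. A standard consequence of the Lipschitz assumption is $f(x) - f(T_i(x)) \ge \frac{1}{2 L_i} |\nabla_i f(x)|^2$, and the sketch evaluates both sides.

```python
def f(A, b, x):
    """f(x) = 0.5 * x^T A x - b^T x."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))

def partial_grad(A, b, x, i):
    """nabla_i f(x) = (A x - b)_i, computed from row i only: O(n) work."""
    return sum(A[i][j] * x[j] for j in range(len(x))) - b[i]

def coordinate_step(A, b, x, i):
    """T_i(x) = x - (1/L_i) * nabla_i f(x) * e_i with L_i = A[i][i]."""
    y = list(x)
    y[i] -= partial_grad(A, b, x, i) / A[i][i]
    return y

# Hypothetical data.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = [1.0, -1.0]

g0 = partial_grad(A, b, x, 0)
decrease = f(A, b, x) - f(A, b, coordinate_step(A, b, x, 0))
guaranteed = g0 ** 2 / (2 * A[0][0])   # lower bound on the decrease
```

For a quadratic the bound holds with equality, so `decrease` and `guaranteed` coincide up to rounding.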

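23 More notations...
A new norm: $\|x\|_{[\alpha]} = \left[ \sum_{i=1}^{n} L_i^{\alpha} \|x^{(i)}\|_{(i)}^2 \right]^{1/2}$, where $\|\cdot\|_{(i)}$ is some fixed norm.
Random counter $A_\alpha$, $\alpha \in \mathbb{R}$, which generates a random number $i \in \{1, \dots, n\}$ with probability $p_\alpha^{(i)} = L_i^{\alpha} / \sum_{j=1}^{n} L_j^{\alpha}$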
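A small sketch of the random counter and the weighted norm, assuming scalar blocks with $\|\cdot\|_{(i)} = |\cdot|$ and hypothetical constants `L`. The function names are illustrative; indices are 0-based in the code.

```python
import random

def counter_probs(L, alpha):
    """p_alpha^(i) = L_i^alpha / sum_j L_j^alpha."""
    w = [Li ** alpha for Li in L]
    total = sum(w)
    return [wi / total for wi in w]

def random_counter(L, alpha, rng=random):
    """Draw one index i with probability p_alpha^(i)."""
    weights = [Li ** alpha for Li in L]
    return rng.choices(range(len(L)), weights=weights, k=1)[0]

def norm_alpha(x, L, alpha):
    """||x||_[alpha] for scalar blocks (assumption: |.| on every block)."""
    return sum((Li ** alpha) * xi * xi for Li, xi in zip(L, x)) ** 0.5

L = [1.0, 2.0, 4.0]               # hypothetical Lipschitz constants
p_uniform = counter_probs(L, 0)   # alpha = 0 gives the uniform distribution
p_weighted = counter_probs(L, 1)  # alpha = 1 favours large L_i
i = random_counter(L, 1, random.Random(0))
```

With $\alpha = 0$ the counter ignores the constants $L_i$; with $\alpha = 1$ the coordinate with the largest $L_i$ is sampled most often.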

25 Method RCDM($\alpha$, $x_0$)
Algorithm:
1. Choose $i_k = A_\alpha$
2. Update $x_{k+1} = T_{i_k}(x_k)$
Theorem: For any $k \ge 0$, we have
$E[f(x_k)] - f^* \le \frac{2}{k+4} \left[ \sum_{j=1}^{n} L_j^{\alpha} \right] R_{1-\alpha}^2(x_0)$
where $R_\beta(x_0) = \max_x \left\{ \max_{x^* \in X^*} \|x - x^*\|_{[\beta]} : f(x) \le f(x_0) \right\}$
Comments: $R_\beta(x_0)$ measures the distance between the level set $\{x : f(x) \le f(x_0)\}$ of the initial point and the optimal set $X^*$. In particular, $R_\beta(x_0)$ is at least the distance between $x_0$ and $X^*$.
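The two steps of RCDM can be sketched as follows, again on an assumed quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$ with scalar blocks, Euclidean norm, and $L_i = A_{ii}$. The test matrix, the iteration count, and the seed are hypothetical choices for illustration.

```python
import random

def rcdm(A, b, x0, alpha=0.0, iters=2000, seed=0):
    """Random coordinate descent: i_k ~ A_alpha, then x_{k+1} = T_{i_k}(x_k)."""
    rng = random.Random(seed)
    n = len(x0)
    L = [A[i][i] for i in range(n)]          # coordinatewise Lipschitz constants
    weights = [Li ** alpha for Li in L]      # unnormalised p_alpha^(i)
    x = list(x0)
    for _ in range(iters):
        i = rng.choices(range(n), weights=weights, k=1)[0]
        g_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]   # partial gradient, O(n)
        x[i] -= g_i / L[i]                                   # coordinate step T_i
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = rcdm(A, b, [0.0, 0.0, 0.0])
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
```

Unlike the greedy rule, each iteration touches one row of `A` only, so the per-iteration cost does not require the full gradient.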

27 Proof Key inequality 1: The above inequality is given by the Lipschitz gradient inequality. Key inequality 2:

28 Combining the previous key inequalities, we have

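31 Convergence of strongly convex functions
Strongly convex functions: $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \sigma(f) \|y - x\|^2$, where $\sigma = \sigma(f)$ is the convexity parameter
Theorem: Let function f(x) be strongly convex with respect to the norm $\|\cdot\|_{[1-\alpha]}$ with convexity parameter $\sigma_{1-\alpha} = \sigma_{1-\alpha}(f) > 0$. Then, for the sequence $\{x_k\}$ generated by RCDM we have
$E[f(x_k)] - f^* \le \left( 1 - \frac{\sigma_{1-\alpha}(f)}{S_\alpha(f)} \right)^k (f(x_0) - f^*)$, where $S_\alpha(f) = \sum_{j=1}^{n} L_j^{\alpha}$
Proof: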
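To see the linear rate on a concrete case, the sketch below checks the one-step contraction exactly for $\alpha = 0$ (uniform sampling, $S_0(f) = n$) on an assumed $2 \times 2$ quadratic. For $f(x) = \frac{1}{2} x^T A x - b^T x$ with $L_i = A_{ii}$, the convexity parameter with respect to $\|\cdot\|_{[1]}$ is the smallest eigenvalue of $D^{-1/2} A D^{-1/2}$, $D = \mathrm{diag}(A)$, which for a $2 \times 2$ matrix equals $1 - |A_{12}| / \sqrt{A_{11} A_{22}}$. The expectation over $i$ is a finite average, so no sampling is needed.

```python
def f(A, b, x):
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))

def coordinate_step(A, b, x, i):
    """T_i(x) for scalar blocks with L_i = A[i][i]."""
    y = list(x)
    g_i = sum(A[i][j] * x[j] for j in range(len(x))) - b[i]
    y[i] -= g_i / A[i][i]
    return y

# Hypothetical 2x2 problem.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
n = 2

# sigma_1(f): smallest eigenvalue of the diagonally scaled matrix (2x2 closed form).
sigma = 1.0 - abs(A[0][1]) / (A[0][0] * A[1][1]) ** 0.5
rate = 1.0 - sigma / n                      # 1 - sigma_1(f) / S_0(f)

# Exact minimiser of the quadratic via the 2x2 inverse.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x_star = [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
          (A[0][0] * b[1] - A[1][0] * b[0]) / det]
f_star = f(A, b, x_star)

x = [1.0, -1.0]
gap = f(A, b, x) - f_star
# E[f(x_{k+1})] - f*, averaging over the n equally likely coordinates.
expected_next_gap = sum(f(A, b, coordinate_step(A, b, x, i)) - f_star
                        for i in range(n)) / n
```

Here `expected_next_gap` should not exceed `rate * gap`, which is the one-step version of the bound in the theorem.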

35 Expected quality is good! How about the result of a single run?
Define function $f_\mu(x)$ by: $f_\mu(x) = f(x) + \frac{\mu}{2} \|x - x_0\|_{[1]}^2$
$f_\mu(x)$ is strongly convex with respect to $\|\cdot\|_{[1]}$
$f_\mu(x)$ has convexity parameter $\mu$
Theorem: Let us define $\mu = \frac{\epsilon}{4 R_1^2(x_0)}$ and choose $k \ge 1 + \frac{2n}{\mu} \ln \frac{n}{2\mu(1-\beta)}$. If the random point $x_k$ is generated by RCDM($0$, $x_0$) as applied to function $f_\mu$, then $\mathrm{Prob}(f(x_k) - f^* \le \epsilon) \ge \beta$
Comments: The second inequality follows from a property of strongly convex functions.
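A sketch of the regularized function, assuming scalar blocks with $\|\cdot\|_{(i)} = |\cdot|$, so that $\|x - x_0\|_{[1]}^2 = \sum_i L_i (x_i - x_{0,i})^2$. Under that assumption the partial gradient of $f_\mu$ is $\nabla_i f(x) + \mu L_i (x_i - x_{0,i})$. The wrapper names and the test quadratic are hypothetical.

```python
def make_f_mu(f, L, x0, mu):
    """f_mu(x) = f(x) + (mu/2) * sum_i L_i * (x_i - x0_i)^2."""
    def f_mu(x):
        reg = sum(Li * (xi - x0i) ** 2 for Li, xi, x0i in zip(L, x, x0))
        return f(x) + 0.5 * mu * reg
    return f_mu

def make_partial_grad_mu(partial_grad, L, x0, mu):
    """nabla_i f_mu(x) = nabla_i f(x) + mu * L_i * (x_i - x0_i)."""
    def pg_mu(x, i):
        return partial_grad(x, i) + mu * L[i] * (x[i] - x0[i])
    return pg_mu

# Hypothetical smooth convex f: a 2x2 quadratic.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
L = [A[0][0], A[1][1]]

def f(x):
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(2))

def partial_grad(x, i):
    return sum(A[i][j] * x[j] for j in range(2)) - b[i]

x0 = [1.0, -1.0]
mu = 0.1
f_mu = make_f_mu(f, L, x0, mu)
partial_grad_mu = make_partial_grad_mu(partial_grad, L, x0, mu)
```

The regularizer vanishes at $x_0$, so `f_mu(x0)` equals `f(x0)`; any coordinate method written against `partial_grad` can be run on `partial_grad_mu` instead.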

36 Accelerated Coordinate Descent Consider the following scheme applied to a strongly convex function with a given convexity parameter σ:

37 Convergence Based on the previous accelerated algorithm, we have the following convergence theorem:

39 Constrained optimization
Consider the constrained minimization problem $\min_{x \in Q} f(x)$
$Q = \bigotimes_{i=1}^{n} Q_i$, where $Q_i \subseteq \mathbb{R}^{n_i}$ are closed and convex
f(x) is convex and satisfies the smoothness assumption: $\|\nabla_i f(x + U_i h_i) - \nabla_i f(x)\|_{(i)}^* \le L_i \|h_i\|_{(i)}$
Algorithm:
(1) Choose i randomly by the uniform distribution on $\{1, \dots, n\}$
(2) $u^{(i)} = \arg\min_{u^{(i)} \in Q_i} \left[ \langle \nabla_i f(x_k), u^{(i)} - x_k^{(i)} \rangle + \frac{L_i}{2} \|u^{(i)} - x_k^{(i)}\|_{(i)}^2 \right]$
(3) Update $x_{k+1} = x_k + U_i (u^{(i)} - x_k^{(i)})$
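The three steps above can be sketched for the simplest case: scalar blocks, Euclidean norm, and box constraints $Q_i = [\mathrm{lo}_i, \mathrm{hi}_i]$. Under these assumptions the minimization in step (2) has a closed form, namely the projection of $x_k^{(i)} - \nabla_i f(x_k) / L_i$ onto the interval. The quadratic objective, the box, and the seed are hypothetical.

```python
import random

def ucdm_box(A, b, lo, hi, x0, iters=200, seed=0):
    """Uniform coordinate descent for f(x) = 0.5*x^T A x - b^T x over a box."""
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    for _ in range(iters):
        i = rng.randrange(n)                                  # step (1): uniform index
        g_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]    # partial gradient
        u = x[i] - g_i / A[i][i]                              # unconstrained minimiser
        u = min(max(u, lo[i]), hi[i])                         # step (2): project on Q_i
        x[i] = u                                              # step (3): update block i
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
lo = [0.0, 0.0]
hi = [0.5, 0.5]
x = ucdm_box(A, b, lo, hi, [0.0, 0.0])
```

For this data the unconstrained minimizer has second coordinate $7/11 > 0.5$, so the constrained solution sits on the upper bound, at $(0.125, 0.5)$.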

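40 Theorem: For any $k \ge 0$ we have
$\phi_k - f^* \le \frac{n}{n+k} \left( \frac{1}{2} R_1^2(x_0) + f(x_0) - f^* \right)$, where $\phi_k = E[f(x_k)]$
If f is strongly convex in $\|\cdot\|_{[1]}$ with constant σ, then
$\phi_k - f^* \le \left( 1 - \frac{2\sigma}{n(1+\sigma)} \right)^k \left( \frac{1}{2} R_1^2(x_0) + f(x_0) - f^* \right)$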

41 Implementation

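43 Google problem
Let $E \in \mathbb{R}^{n \times n}$ be an incidence matrix of a graph and $e = (1, \dots, 1)^T$;
$\bar{E} = E \cdot \mathrm{diag}(E^T e)^{-1}$;
Google problem: $\min_x \frac{1}{2} \|\bar{E} x - x\|^2 + \frac{\gamma}{2} [\langle e, x \rangle - 1]^2$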
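A toy sketch of the Google problem objective, assuming a hypothetical 3-node graph in which every column of $E$ has at least one nonzero entry. Dense lists are used for readability; this naive version recomputes the residual $\bar{E} x - x$ for every partial derivative, whereas a huge-scale implementation would rely on sparsity and keep the residual updated.

```python
def normalize_columns(E):
    """E_bar = E * diag(E^T e)^{-1}: divide each column by its sum."""
    n = len(E)
    col_sums = [sum(E[i][j] for i in range(n)) for j in range(n)]
    return [[E[i][j] / col_sums[j] for j in range(n)] for i in range(n)]

def residual(Ebar, x):
    n = len(x)
    return [sum(Ebar[i][j] * x[j] for j in range(n)) - x[i] for i in range(n)]

def google_objective(Ebar, x, gamma):
    """0.5*||E_bar x - x||^2 + (gamma/2)*(<e, x> - 1)^2."""
    r = residual(Ebar, x)
    s = sum(x) - 1.0
    return 0.5 * sum(v * v for v in r) + 0.5 * gamma * s * s

def google_partial_grad(Ebar, x, gamma, i):
    """i-th entry of (E_bar - I)^T r + gamma*(<e, x> - 1)*e, with r = E_bar x - x."""
    n = len(x)
    r = residual(Ebar, x)
    return sum(Ebar[k][i] * r[k] for k in range(n)) - r[i] + gamma * (sum(x) - 1.0)

E = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]   # hypothetical graph
Ebar = normalize_columns(E)
gamma = 1.0
x_uniform = [1.0 / 3.0] * 3
value_at_uniform = google_objective(Ebar, x_uniform, gamma)
```

For this symmetric graph the uniform vector satisfies $\bar{E} x = x$ and $\langle e, x \rangle = 1$, so the objective vanishes there.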


More information

Modern Stochastic Methods. Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization

Modern Stochastic Methods. Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization Modern Stochastic Methods Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization 10-725 Last time: conditional gradient method For the problem min x f(x) subject to x C where

More information

Constrained Optimization

Constrained Optimization 1 / 22 Constrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University March 30, 2015 2 / 22 1. Equality constraints only 1.1 Reduced gradient 1.2 Lagrange

More information

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006 Quiz Discussion IE417: Nonlinear Programming: Lecture 12 Jeff Linderoth Department of Industrial and Systems Engineering Lehigh University 16th March 2006 Motivation Why do we care? We are interested in

More information

CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent

CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi Gradient Descent This lecture introduces Gradient Descent a meta-algorithm for unconstrained minimization. Under convexity of the

More information

Majorization Minimization - the Technique of Surrogate

Majorization Minimization - the Technique of Surrogate Majorization Minimization - the Technique of Surrogate Andersen Ang Mathématique et de Recherche opérationnelle Faculté polytechnique de Mons UMONS Mons, Belgium email: manshun.ang@umons.ac.be homepage:

More information

Chapter 1. Optimality Conditions: Unconstrained Optimization. 1.1 Differentiable Problems

Chapter 1. Optimality Conditions: Unconstrained Optimization. 1.1 Differentiable Problems Chapter 1 Optimality Conditions: Unconstrained Optimization 1.1 Differentiable Problems Consider the problem of minimizing the function f : R n R where f is twice continuously differentiable on R n : P

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

1 Definition of the Riemann integral

1 Definition of the Riemann integral MAT337H1, Introduction to Real Analysis: notes on Riemann integration 1 Definition of the Riemann integral Definition 1.1. Let [a, b] R be a closed interval. A partition P of [a, b] is a finite set of

More information

Subgradient methods for huge-scale optimization problems

Subgradient methods for huge-scale optimization problems CORE DISCUSSION PAPER 2012/02 Subgradient methods for huge-scale optimization problems Yu. Nesterov January, 2012 Abstract We consider a new class of huge-scale problems, the problems with sparse subgradients.

More information

Warm up. Regrade requests submitted directly in Gradescope, do not instructors.

Warm up. Regrade requests submitted directly in Gradescope, do not  instructors. Warm up Regrade requests submitted directly in Gradescope, do not email instructors. 1 float in NumPy = 8 bytes 10 6 2 20 bytes = 1 MB 10 9 2 30 bytes = 1 GB For each block compute the memory required

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives

Subgradients. subgradients and quasigradients. subgradient calculus. optimality conditions via subgradients. directional derivatives Subgradients subgradients and quasigradients subgradient calculus optimality conditions via subgradients directional derivatives Prof. S. Boyd, EE392o, Stanford University Basic inequality recall basic

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 20 Subgradients Assumptions

More information

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1,

On the interior of the simplex, we have the Hessian of d(x), Hd(x) is diagonal with ith. µd(w) + w T c. minimize. subject to w T 1 = 1, Math 30 Winter 05 Solution to Homework 3. Recognizing the convexity of g(x) := x log x, from Jensen s inequality we get d(x) n x + + x n n log x + + x n n where the equality is attained only at x = (/n,...,

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

Computational Optimization. Mathematical Programming Fundamentals 1/25 (revised)

Computational Optimization. Mathematical Programming Fundamentals 1/25 (revised) Computational Optimization Mathematical Programming Fundamentals 1/5 (revised) If you don t know where you are going, you probably won t get there. -from some book I read in eight grade If you do get there,

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

Stochastic and online algorithms

Stochastic and online algorithms Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem

More information

Gradient methods for minimizing composite functions

Gradient methods for minimizing composite functions Math. Program., Ser. B 2013) 140:125 161 DOI 10.1007/s10107-012-0629-5 FULL LENGTH PAPER Gradient methods for minimizing composite functions Yu. Nesterov Received: 10 June 2010 / Accepted: 29 December

More information

Existence of minimizers

Existence of minimizers Existence of imizers We have just talked a lot about how to find the imizer of an unconstrained convex optimization problem. We have not talked too much, at least not in concrete mathematical terms, about

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

Lecture 14 Ellipsoid method

Lecture 14 Ellipsoid method S. Boyd EE364 Lecture 14 Ellipsoid method idea of localization methods bisection on R center of gravity algorithm ellipsoid method 14 1 Localization f : R n R convex (and for now, differentiable) problem:

More information

Random Block Coordinate Descent Methods for Linearly Constrained Optimization over Networks

Random Block Coordinate Descent Methods for Linearly Constrained Optimization over Networks J. Optimization Theory & Applications manuscript No. (will be inserted by the editor) Random Block Coordinate Descent Methods for Linearly Constrained Optimization over Networks Ion Necoara, Yurii Nesterov

More information