# Lecture 5: September 12


## Transcription

10-725/36-725: Convex Optimization, Fall 2015

Lecture 5: September 12

Lecturer: Ryan Tibshirani

Scribes: Barun Patra and Tyler Vuong

Note: LaTeX template courtesy of UC Berkeley EECS dept.

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications. They may be distributed outside this class only with the permission of the Instructor.

### 5.1 Canonical Convex Programs Revisited

#### Conic Programs

Conic programs are defined as

$$
\begin{aligned}
\min_x \quad & c^T x \\
\text{subject to} \quad & Ax = b \\
& D(x) + d \in K
\end{aligned}
\tag{5.1}
$$

where

- $c, x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, and $b \in \mathbb{R}^m$;
- $D : \mathbb{R}^n \to Y$ is a linear map and $d \in Y$, for a Euclidean space $Y$;
- $K \subseteq Y$ is a closed convex cone.

#### Second-Order Cone Programs

A second-order cone program (SOCP) is defined as

$$
\begin{aligned}
\min_x \quad & c^T x \\
\text{subject to} \quad & \|D_i x + d_i\|_2 \le e_i^T x + f_i, \quad i = 1, \dots, p \\
& Ax = b
\end{aligned}
\tag{5.2}
$$

Theorem 5.1. SOCPs are conic programs.

Proof: Consider the second-order cone

$$
Q = \{(x, t) : \|x\|_2 \le t\}. \tag{5.3}
$$

The first constraint can be written as

$$
\|D_1 x + d_1\|_2 \le e_1^T x + f_1 \iff (D_1 x + d_1,\; e_1^T x + f_1) \in Q_1. \tag{5.4}
$$

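Similarly, $(D_2 x + d_2,\; e_2^T x + f_2) \in Q_2$. In general,

$$
\|D_i x + d_i\|_2 \le e_i^T x + f_i \iff (D_i x + d_i,\; e_i^T x + f_i) \in Q_i, \quad i \in \{1, 2, \dots, p\}. \tag{5.5}
$$

Taking the Cartesian product of all these cones gives the cone of the conic program, i.e. $K = Q_1 \times \cdots \times Q_p$.

Theorem 5.2. Every SOCP is a semidefinite program (SDP).

Proof: First note that, for symmetric $A$, $C$ with $C \succ 0$,

$$
\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succeq 0 \iff A - B C^{-1} B^T \succeq 0. \tag{5.6}
$$

Now we prove the following:

$$
\|x\|_2 \le t \iff \begin{bmatrix} tI & x \\ x^T & t \end{bmatrix} \succeq 0. \tag{5.7}
$$

Proof: For $t > 0$, applying (5.6) with $A = tI$, $B = x$, $C = t$ gives

$$
\begin{bmatrix} tI & x \\ x^T & t \end{bmatrix} \succeq 0
\iff tI - \tfrac{1}{t} x x^T \succeq 0
\iff t^2 I - x x^T \succeq 0
\iff a^T \left[ t^2 I - x x^T \right] a \ge 0 \quad \text{for all } a.
$$

In particular, for $a = x$, we have

$$
t^2 \|x\|_2^2 - \|x\|_2^4 \ge 0 \implies \|x\|_2^2 \le t^2 \implies \|x\|_2 \le t. \tag{5.8}
$$

Conversely, if $\|x\|_2 \le t$, then by Cauchy-Schwarz $a^T [t^2 I - x x^T] a = t^2 \|a\|_2^2 - (a^T x)^2 \ge (t^2 - \|x\|_2^2) \|a\|_2^2 \ge 0$ for all $a$. (For $t = 0$, both sides of (5.7) force $x = 0$.) Applying (5.7) to each pair $(D_i x + d_i,\; e_i^T x + f_i)$ turns every second-order cone constraint into a linear matrix inequality, so the SOCP is an SDP.

Finally, we have the following:

Theorem 5.3. Every QP is an SOCP.

Proof: Consider the QP, written with an extra variable $t$:

$$
\begin{aligned}
\min_{x, t} \quad & c^T x + t \\
\text{subject to} \quad & Dx \le d \\
& \tfrac{1}{2} x^T Q x \le t \\
& Ax = b
\end{aligned}
\tag{5.9}
$$

In particular, consider the constraint $\tfrac{1}{2} x^T Q x \le t$. We claim the following:

$$
\tfrac{1}{2} x^T Q x \le t \iff \left\| \left( \tfrac{1}{\sqrt{2}} Q^{1/2} x,\; \tfrac{1}{2}(1 - t) \right) \right\|_2 \le \tfrac{1}{2}(1 + t). \tag{5.10}
$$

Proof: Since $Q \succeq 0$, any feasible $t$ satisfies $t \ge 0$, so the right-hand side is nonnegative and both sides can be squared:

$$
\begin{aligned}
\left\| \left( \tfrac{1}{\sqrt{2}} Q^{1/2} x,\; \tfrac{1}{2}(1 - t) \right) \right\|_2 \le \tfrac{1}{2}(1 + t)
&\iff \left\| \left( \tfrac{1}{\sqrt{2}} Q^{1/2} x,\; \tfrac{1}{2}(1 - t) \right) \right\|_2^2 \le \tfrac{1}{4}(1 + t)^2 \\
&\iff \tfrac{1}{2} x^T Q x + \tfrac{1}{4}(1 - t)^2 \le \tfrac{1}{4}(1 + t)^2 \\
&\iff \tfrac{1}{2} x^T Q x \le t.
\end{aligned}
\tag{5.11}
$$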
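The equivalences (5.7) and (5.10) are easy to sanity-check numerically. The sketch below is only an illustration: the helper names (`in_second_order_cone`, `arrow_matrix_is_psd`, `qp_constraint_as_soc`), the tolerance, and the small test vectors are assumptions made for this example and do not come from the lecture.

```python
import numpy as np

def in_second_order_cone(x, t):
    """True when ||x||_2 <= t, i.e. (x, t) lies in the cone Q of (5.3)."""
    return bool(np.linalg.norm(x) <= t)

def arrow_matrix_is_psd(x, t, tol=1e-10):
    """True when [[t*I, x], [x^T, t]] is positive semidefinite, as in (5.7)."""
    n = x.size
    M = np.block([[t * np.eye(n), x.reshape(n, 1)],
                  [x.reshape(1, n), np.array([[t]])]])
    # eigvalsh is appropriate because M is symmetric.
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

def qp_constraint_as_soc(Q_half, x, t):
    """Return the (vector, scalar) pair of (5.10) for (1/2) x^T Q x <= t.

    Q_half is assumed to be a square root of Q, i.e. Q = Q_half @ Q_half.
    """
    vec = np.concatenate([Q_half @ x / np.sqrt(2.0), [0.5 * (1.0 - t)]])
    return vec, 0.5 * (1.0 + t)

# Hypothetical data: ||x||_2 = 5, so t = 6 is inside the cone and t = 4 is not.
x = np.array([3.0, 4.0])
print(in_second_order_cone(x, 6.0), arrow_matrix_is_psd(x, 6.0))
print(in_second_order_cone(x, 4.0), arrow_matrix_is_psd(x, 4.0))
```

Both checks agree on each line of output, which is what (5.7) predicts; the same pattern applied to the pair returned by `qp_constraint_as_soc` tests (5.10).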

This establishes that QPs $\subseteq$ SOCPs. Thus, we end up with the following hierarchy:

$$
\text{LPs} \subseteq \text{QPs} \subseteq \text{SOCPs} \subseteq \text{SDPs} \subseteq \text{Conic programs} \tag{5.12}
$$

1. LPs and QPs are easy algorithmically.
2. SDPs are in general hard (they do not scale to a large number of variables). Only very special SDPs are easy (like graphical lasso, trace norm imputation).
3. Conic problems are usually hard.

### 5.2 Gradient Descent

Consider the unconstrained, smooth convex optimization problem

$$
\min_x f(x) \tag{5.13}
$$

where

- $f$ is convex and differentiable, with $\mathrm{dom}(f) = \mathbb{R}^n$;
- the optimal criterion value is $f^\star = \min_x f(x)$, attained at a solution $x^\star$.

The gradient descent algorithm is defined as follows:

- Choose an initial point $x^{(0)} \in \mathbb{R}^n$.
- Repeat:

$$
x^{(k)} = x^{(k-1)} - t_k \nabla f(x^{(k-1)}), \quad k = 1, 2, 3, \dots \tag{5.14}
$$

#### Interpretation

The second-order Taylor expansion of $f$ gives us

$$
f(y) \approx f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2} (y - x)^T \nabla^2 f(x) (y - x). \tag{5.15}
$$

Consider the quadratic approximation of $f$ obtained by replacing $\nabla^2 f(x)$ with $\tfrac{1}{t} I$:

$$
f(y) \approx f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2t} \|y - x\|_2^2. \tag{5.16}
$$

Minimizing with respect to $y$, we set the gradient of the right-hand side to zero:

$$
\nabla f(x) + \tfrac{1}{t} (y - x) = 0 \implies y = x - t \nabla f(x), \tag{5.17}
$$

which gives us the gradient descent update rule. Figure 5.1a illustrates this interpretation. The dotted function is the quadratic approximation, and the red dot is the minimizer of the quadratic approximation.
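The update (5.14) can be sketched in a few lines of Python. This is a minimal illustration with a fixed step size; the function name `gradient_descent`, the quadratic test objective $f(x) = \tfrac{1}{2} x^T A x - b^T x$, and the step size 0.2 are assumptions chosen for the example, not part of the lecture.

```python
import numpy as np

def gradient_descent(grad, x0, step, num_iters):
    """Run x_k = x_{k-1} - step * grad(x_{k-1}) for num_iters iterations."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - step * grad(x)
    return x

# Hypothetical objective: f(x) = 0.5 * x^T A x - b^T x with A positive definite,
# so the gradient is A x - b and the minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad_f = lambda x: A @ x - b

# The largest eigenvalue of A is about 3.62, so step = 0.2 is below 1/L.
x_hat = gradient_descent(grad_f, x0=np.zeros(2), step=0.2, num_iters=200)
print(x_hat, np.linalg.solve(A, b))
```

The two printed vectors should agree to many digits, since this problem is well conditioned and the step size is small enough for the iteration to contract.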

Figure 5.1: (a) Gradient Descent. (b) Backtracking Line Search.

#### Backtracking Line Search

If the step size is too large, gradient descent can diverge. Analogously, if the step size is too small, gradient descent can converge too slowly. One way to avoid both problems is to choose the step size adaptively using backtracking line search:

1. Fix parameters $0 < \beta < 1$ and $0 < \alpha \le 1/2$.
2. At each iteration, start with $t = t_{\mathrm{init}}$. While $f(x - t \nabla f(x)) > f(x) - \alpha t \|\nabla f(x)\|_2^2$, shrink $t := \beta t$.
3. Update $x^+ := x - t \nabla f(x)$.

#### Backtracking Intuition

Consider Figure 5.1b. Fix $x$, $\Delta x = -\nabla f(x)$ and $\alpha$, and view both sides of the test in step 2 as functions of $t$. Shrinking $t$ moves us from right to left on the graph. We keep shrinking as long as the line $f(x) - \alpha t \|\nabla f(x)\|_2^2$ lies under the function $f(x - t \nabla f(x))$, and stop as soon as the line is at or above it.

#### Direct Minimization for Line Search

One could also think of doing an exact line search:

$$
t = \operatorname*{argmin}_{s \ge 0} f(x - s \nabla f(x)). \tag{5.18}
$$

This is usually not possible to do directly, and approximations to it are not as efficient as general backtracking, so it is not worth the effort.
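The three backtracking steps can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the default values `alpha=0.5`, `beta=0.8`, `t_init=1.0`, and the quadratic test problem are all choices made for this example rather than values given in the lecture.

```python
import numpy as np

def backtracking_step(f, grad, x, alpha=0.5, beta=0.8, t_init=1.0):
    """One gradient step whose size t is chosen by backtracking line search."""
    g = grad(x)
    t = t_init
    # Shrink t until the sufficient-decrease condition holds.
    while f(x - t * g) > f(x) - alpha * t * (g @ g):
        t *= beta
    return x - t * g, t

def gradient_descent_backtracking(f, grad, x0, num_iters=200, **kwargs):
    """Repeat backtracking_step num_iters times, restarting from t_init each time."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x, _ = backtracking_step(f, grad, x, **kwargs)
    return x

# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b

x_bt = gradient_descent_backtracking(f, grad_f, x0=np.array([5.0, -3.0]))
print(x_bt)
```

Note that no Lipschitz constant is supplied anywhere: the loop discovers an acceptable step size on its own, which is the practical appeal of backtracking.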

### Convergence Analysis

#### Gradient Descent Convergence

- The convergence result for gradient descent is the baseline used in all future lectures when comparing different algorithms.
- The rate is how quickly an algorithm converges.
- Assume $f$ is convex and differentiable with $\mathrm{dom}(f) = \mathbb{R}^n$, and that $\nabla f$ is Lipschitz continuous with constant $L > 0$.
- If $f$ is twice differentiable, this is the same as saying that the largest eigenvalue of the Hessian of $f$ is at most $L$ for all $x$. The second derivative is a notion of curvature, and $L$ is a bound on that curvature.

Figure (a): Gradient Descent Convergence.

- The quantity bounded is the criterion value at step $k$ minus the optimal value, $f(x^{(k)}) - f^\star$.
- $t$ is the step size and is at most $1/L$; $t = 1/L$ is the largest step size allowed. The result is for gradient descent with fixed step size.
- $\beta$ is less than 1 and makes the bound a little worse when using backtracking.
- The proof is straightforward: it uses the Lipschitz property to upper bound the function by a quadratic.
- Gradient descent has convergence rate $O(1/k)$. We read this as: after $k$ iterations, the gap between the criterion value at the current iterate and the optimal value shrinks like $1/k$.
- Equivalently, it finds an $\epsilon$-suboptimal point in $O(1/\epsilon)$ iterations.
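The theorem itself was an image and is not reproduced in the transcription. Assuming it is the standard statement for fixed step size $t \le 1/L$, namely $f(x^{(k)}) - f^\star \le \|x^{(0)} - x^\star\|_2^2 / (2tk)$, the $O(1/k)$ rate can be checked empirically. The least squares problem, the random seed, and all variable names below are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.standard_normal(20)

f = lambda beta: 0.5 * np.sum((y - X @ beta) ** 2)
grad = lambda beta: X.T @ (X @ beta - y)

L = np.linalg.norm(X, 2) ** 2            # largest eigenvalue of X^T X
beta_star = np.linalg.lstsq(X, y, rcond=None)[0]
f_star = f(beta_star)

t = 1.0 / L                              # largest step size allowed by the theorem
beta = np.zeros(5)
r0 = np.sum((beta - beta_star) ** 2)     # ||x^(0) - x*||_2^2

gaps, bounds = [], []
for k in range(1, 201):
    beta = beta - t * grad(beta)
    gaps.append(f(beta) - f_star)        # criterion gap at step k
    bounds.append(r0 / (2.0 * t * k))    # assumed O(1/k) bound at step k

print(gaps[9], bounds[9])
print(gaps[199], bounds[199])
```

On a problem like this the observed gap typically sits far below the bound, because least squares with a tall random matrix is strongly convex and the $O(1/k)$ guarantee is a worst case over all convex functions with Lipschitz gradient.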

#### Gradient Descent Convergence under Strong Convexity

- What happens if you know more?
- Strong convexity of $f$ means that $f(x) - \tfrac{m}{2} \|x\|_2^2$ is convex for some $m > 0$.
- If $\nabla f$ is Lipschitz continuous with constant $L > 0$, we can upper bound the function by a quadratic. Strong convexity lets us lower bound the function by a quadratic.
- Lipschitz gradient and strong convexity together mean you can sandwich the function between two quadratics.

Figure (a): Gradient Descent Convergence under Strong Convexity.

- The gap goes to 0 a lot faster: the rate under strong convexity is $O(c^k)$ for some $0 < c < 1$. Exponentially fast!
- Gradient descent finds an $\epsilon$-suboptimal point in $O(\log(1/\epsilon))$ iterations. If you want a guarantee for a small $\epsilon$, this is a lot faster because of the log.
- This is called linear convergence, since it looks linear on a semi-log plot.
- The contraction factor $c$ in the rate depends adversely on the condition number $L/m$: a higher condition number means a slower rate.
- The contraction factor $c$ approaches 1 as $L/m$ gets larger. Poor conditioning makes gradient descent do very poorly; later we will see how second-order methods help.
- The rate depends on the ratio of the largest to the smallest eigenvalue of the Hessian, which is upper bounded by $L/m$ under strong convexity.
- The condition number affects not only our upper bound; the effect is very apparent in practice too.
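The effect of the condition number is easy to see on a diagonal quadratic. The sketch below counts fixed-step iterations needed to reach a small gap for two hypothetical problems with $L/m = 2$ and $L/m = 200$; the function name, the tolerance `eps`, the starting point, and the eigenvalues are assumptions chosen for this illustration.

```python
import numpy as np

def iterations_to_tolerance(eigs, eps=1e-8, max_iters=100000):
    """Count gradient descent iterations on f(x) = 0.5 * sum(eigs * x**2).

    The Hessian is diag(eigs), so L = max(eigs), m = min(eigs), and f* = 0.
    Uses the fixed step size t = 1/L and starts from the all-ones vector.
    """
    eigs = np.asarray(eigs, dtype=float)
    t = 1.0 / eigs.max()
    x = np.ones_like(eigs)
    for k in range(max_iters):
        if 0.5 * np.sum(eigs * x ** 2) <= eps:   # criterion gap f(x) - f*
            return k
        x = x - t * eigs * x                      # gradient of f is eigs * x
    return max_iters

well_conditioned = iterations_to_tolerance([1.0, 2.0])      # L/m = 2
poorly_conditioned = iterations_to_tolerance([1.0, 200.0])  # L/m = 200
print(well_conditioned, poorly_conditioned)
```

Both runs converge linearly, but the per-step contraction along the low-curvature coordinate is $1 - m/L$, so the second problem needs roughly a hundred times more iterations.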

#### Necessary Conditions

Consider the conditions for a simple problem, $f(\beta) = \tfrac{1}{2} \|y - X\beta\|_2^2$. The gradient of this function is $X^T (X\beta - y)$ and the Hessian is $X^T X$.

- Lipschitz continuity of $\nabla f$ means $\nabla^2 f(x) \preceq LI$. As $\nabla^2 f(\beta) = X^T X$, we have $L = \sigma_{\max}^2(X)$. If $X$ is big, computing this is expensive, so it is a better idea to do backtracking.
- Strong convexity of $f$ means $\nabla^2 f(x) \succeq mI$. As $\nabla^2 f(\beta) = X^T X$, we have $m = \sigma_{\min}^2(X)$.
- If $X$ is wide (more columns than rows), then $\sigma_{\min}(X) = 0$ and $f$ can't be strongly convex.
- Even if $\sigma_{\min}(X) > 0$, we can have a very large condition number $L/m = \sigma_{\max}^2(X) / \sigma_{\min}^2(X)$. The condition number can be very large even if $f$ is strongly convex.
- The proof for the nonconvex case is simpler than the one for the convex case.

#### Practicalities

- The stopping rule we use is to stop when $\|\nabla f(x)\|_2$ is small.
- Remember that $\nabla f(x^\star) = 0$ at the solution $x^\star$.
- If $f$ is strongly convex with parameter $m$, then $\|\nabla f(x)\|_2 \le \sqrt{2m\epsilon} \implies f(x) - f^\star \le \epsilon$, so a small gradient implies we are $\epsilon$-suboptimal.

#### Pros and Cons of Gradient Descent

- Pro: simple idea, and each iteration is usually cheap.
- Pro: fast for well-conditioned, strongly convex problems.
- Con: can often be slow, since many problems aren't strongly convex or well conditioned.
- Con: can't handle nondifferentiable functions, since everything we do depends on the gradient.

#### Can we do better?

- Gradient descent has an $O(1/\epsilon)$ convergence rate over the problem class of convex, differentiable functions with Lipschitz gradients.
- Is gradient descent the best we can do if we only know the gradient?
- First-order methods are iterative methods that update $x^{(k)}$ using gradient information only.
- Such methods can attain rate $O(1/k^2)$, or $O(1/\sqrt{\epsilon})$.
- There is a method that is faster than gradient descent, using acceleration.
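For the least squares example, the constants $L$ and $m$ and the gradient-based suboptimality bound can be computed directly from the singular values of $X$. The sketch below is illustrative only; the function names and the small matrices are assumptions made for this example.

```python
import numpy as np

def least_squares_constants(X):
    """Return (L, m) for f(beta) = 0.5 * ||y - X beta||_2^2.

    L is the largest and m the smallest eigenvalue of the Hessian X^T X.
    """
    s = np.linalg.svd(X, compute_uv=False)   # singular values, descending
    n_rows, n_cols = X.shape
    # A wide X has a rank-deficient X^T X, so its smallest eigenvalue is 0.
    sigma_min = s[-1] if n_rows >= n_cols else 0.0
    return s[0] ** 2, sigma_min ** 2

def suboptimality_bound(grad_norm, m):
    """Upper bound on f(x) - f* implied by strong convexity: ||grad||^2 / (2m)."""
    return grad_norm ** 2 / (2.0 * m)

# Hypothetical tall and wide design matrices.
X_tall = np.array([[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
X_wide = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])

print(least_squares_constants(X_tall))   # condition number L/m = 9
print(least_squares_constants(X_wide))   # m = 0: not strongly convex
```

The bound in `suboptimality_bound` is the stopping rule from the Practicalities list rearranged: requiring $\|\nabla f(x)\|_2 \le \sqrt{2m\epsilon}$ is the same as requiring $\|\nabla f(x)\|_2^2 / (2m) \le \epsilon$.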

Figure (a): Nesterov Convergence Theorem.

#### What about nonconvex functions?

- Assume $f$ is differentiable with Lipschitz gradient, but now $f$ is nonconvex.
- Rather than optimality, we ask for a point $x$ such that $\|\nabla f(x)\|_2 \le \epsilon$. Such a point is called $\epsilon$-stationary.
- The smallest gradient norm among the iterates up to step $k$ is bounded as stated in the theorem below.

Figure (a): Nonconvex Convergence.

- Gradient descent has rate $O(1/\sqrt{k})$, or $O(1/\epsilon^2)$, even in the nonconvex case for finding stationary points.
- This rate can't be improved (over the class of differentiable functions with Lipschitz gradients) by any deterministic algorithm.

Figure (a): Proof of Nonconvex Convergence.
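The proof in the figure is not reproduced in the transcription. A sketch of the standard argument, under the assumptions that the step size is fixed at $t \le 1/L$ and that $f^\star$ denotes a lower bound on $f$, runs as follows. It is offered as a plausible reconstruction, not as the scribes' exact proof.

$$
\begin{aligned}
f(x^{(i+1)}) &\le f(x^{(i)}) + \nabla f(x^{(i)})^T (x^{(i+1)} - x^{(i)}) + \tfrac{L}{2} \|x^{(i+1)} - x^{(i)}\|_2^2
&& \text{(Lipschitz gradient)} \\
&= f(x^{(i)}) - t \left(1 - \tfrac{Lt}{2}\right) \|\nabla f(x^{(i)})\|_2^2
&& \text{(since } x^{(i+1)} = x^{(i)} - t \nabla f(x^{(i)})\text{)} \\
&\le f(x^{(i)}) - \tfrac{t}{2} \|\nabla f(x^{(i)})\|_2^2
&& \text{(since } t \le 1/L\text{)}.
\end{aligned}
$$

Summing over $i = 0, \dots, k$ and telescoping gives

$$
\tfrac{t}{2} \sum_{i=0}^{k} \|\nabla f(x^{(i)})\|_2^2 \le f(x^{(0)}) - f(x^{(k+1)}) \le f(x^{(0)}) - f^\star,
$$

and lower bounding the sum by $(k+1)$ times its smallest term yields

$$
\min_{i = 0, \dots, k} \|\nabla f(x^{(i)})\|_2 \le \sqrt{\frac{2 \left( f(x^{(0)}) - f^\star \right)}{t (k + 1)}},
$$

which is the $O(1/\sqrt{k})$ rate. Convexity is never used, which is why this argument is shorter than the convex one.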

### Lecture 5: September 15

10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Ryan Tibshirani Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS

### Gradient Descent. Ryan Tibshirani Convex Optimization 10-725/36-725

Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like

### Lecture 5: Gradient Descent. 5.1 Unconstrained minimization problems and Gradient descent

10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 5: Gradient Descent Scribes: Loc Do,2,3 Disclaimer: These notes have not been subjected to the usual scrutiny reserved for

### Gradient descent. Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725

Gradient descent Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Gradient descent First consider unconstrained minimization of f : R n R, convex and differentiable. We want to solve

### Lecture 6: September 12

10-725: Optimization Fall 2013 Lecture 6: September 12 Lecturer: Ryan Tibshirani Scribes: Micol Marchetti-Bowick Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

### Descent methods. min x. f(x)

Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

### Lecture 14: October 17

10-725/36-725: Convex Optimization Fall 2018 Lecture 14: October 17 Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Selected Topics in Optimization. Some slides borrowed from

Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model

### Gradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization 10-725/36-725

Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex

### Lecture 1: September 25

0-725: Optimization Fall 202 Lecture : September 25 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Subhodeep Moitra Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

### Lecture 9: September 28

10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

### Lecture 6: September 17

10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 6: September 17 Scribes: Wenjun Wang, Satwik Kottur, Zhiding Yu Note: LaTeX template courtesy of UC Berkeley EECS

### Lecture 23: Conditional Gradient Method

10-725/36-725: Convex Optimization Spring 2015 Lecture 23: Conditional Gradient Method Lecturer: Ryan Tibshirani Scribes: Shichao Yang, Diyi Yang, Zhanpeng Fang Note: LaTeX template courtesy of UC Berkeley

### Lecture 14: Newton's Method

10-725/36-725: Convex Optimization Fall 2016 Lecturer: Javier Pena Lecture 14: Newton's Method Scribes: Varun Joshi, Xuan Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes

### Lecture 17: October 27

10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 17: October 27 Scribes: Brandon Amos, Gines Hidalgo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

### Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

### Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

### Lecture 25: November 27

10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

### Lecture 23: November 21

10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 23: November 21 Scribes: Yifan Sun, Ananya Kumar, Xin Lu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Lecture 17: Primal-dual interior-point methods part II

10-725/36-725: Convex Optimization Spring 2015 Lecture 17: Primal-dual interior-point methods part II Lecturer: Javier Pena Scribes: Pinchao Zhang, Wei Ma Note: LaTeX template courtesy of UC Berkeley EECS

### Lecture 16: October 22

10-725/36-725: Convex Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 16: October 22 Scribes: Nic Dalmasso, Alan Mishler, Benja LeRoy Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Lecture 23: November 19

10-725/36-725: Convex Optimization Fall 2018 Lecturer: Ryan Tibshirani Lecture 23: November 19 Scribes: Charvi Rastogi, George Stoica, Shuo Li Charvi Rastogi: 23.1-23.4.2, George Stoica: 23.4.3-23.8, Shuo

### Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization 10-725/36-725

Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

### Lecture 7: September 17

10-725: Optimization Fall 2013 Lecture 7: September 17 Lecturer: Ryan Tibshirani Scribes: Serim Park,Yiming Gu 7.1 Recap. The drawbacks of Gradient Methods are: (1) requires f is differentiable; (2) relatively

### min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

### 10-725/36-725: Convex Optimization Spring 2015 Lecture 21: April 6

10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 21: April 6 Scribes: Chiqun Zhang, Hanqi Cheng, Waleed Ammar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Lecture 24: August 28

10-725: Optimization Fall 2012 Lecture 24: August 28 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Jiaji Zhou,Tinghui Zhou,Kawa Cheung Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Newton's Method. Ryan Tibshirani Convex Optimization 10-725/36-725

Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

### Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

Proximal Newton Method Zico Kolter (notes by Ryan Tibshirani) Convex Optimization 10-725 Consider the problem Last time: quasi-newton methods min x f(x) with f convex, twice differentiable, dom(f) = R

### 10. Unconstrained minimization

Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation

### Lecture 10: September 26

10-725: Optimization Fall 2012 Lecture 10: September 26 Lecturer: Barnabas Poczos/Ryan Tibshirani Scribes: Yipei Wang, Zhiguang Huo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

### Unconstrained minimization

CSCI5254: Convex Optimization & Its Applications Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions 1 Unconstrained

### Lecture 8: February 9

10-725/36-725: Convex Optimization Spring 2015 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

### 5. Subgradient method

L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1 Subgradient method to minimize

### 1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method

L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order

### Newton's Method. Javier Peña Convex Optimization 10-725/36-725

Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

### Line Search Methods for Unconstrained Optimisation

Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic

### Lecture 11: October 2

10-725: Optimization Fall 2012 Lecture 11: October 2 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Tongbo Huang, Shoou-I Yu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes

### Subgradient Method. Ryan Tibshirani Convex Optimization

Subgradient Method Ryan Tibshirani Convex Optimization 10-725 Consider the problem Last last time: gradient descent min x f(x) for f convex and differentiable, dom(f) = R n. Gradient descent: choose initial

### Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

### Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

### Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

### Lecture 4: September 12

10-725/36-725: Convex Optimization Fall 2016 Lecture 4: September 12 Lecturer: Ryan Tibshirani Scribes: Jay Hennig, Yifeng Tao, Sriram Vasudevan Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### Lecture 9: Numerical Linear Algebra Primer (February 11st)

10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

### ECS171: Machine Learning

ECS171: Machine Learning Lecture 4: Optimization (LFD 3.3, SGD) Cho-Jui Hsieh UC Davis Jan 22, 2018 Gradient descent Optimization Goal: find the minimizer of a function min f (w) w For now we assume f

### Proximal Newton Method. Ryan Tibshirani Convex Optimization 10-725/36-725

Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

### Stochastic Gradient Descent. Ryan Tibshirani Convex Optimization

Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so

### Subgradient Method. Guest Lecturer: Fatma Kilinc-Karzan. Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725

Subgradient Method Guest Lecturer: Fatma Kilinc-Karzan Instructors: Pradeep Ravikumar, Aarti Singh Convex Optimization 10-725/36-725 Adapted from slides from Ryan Tibshirani Consider the problem Recall:

### Optimization methods

Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

### Lecture 15: October 15

10-725: Optimization Fall 2012 Lecturer: Barnabas Poczos Lecture 15: October 15 Scribes: Christian Kroer, Fanyi Xiao Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

### Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 3. Gradient Method

Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 3 Gradient Method Shiqian Ma, MAT-258A: Numerical Optimization 2 3.1. Gradient method Classical gradient method: to minimize a differentiable convex

### CPSC 540: Machine Learning

CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers

### Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

### On Nesterov s Random Coordinate Descent Algorithms - Continued

On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower

### Lecture 6: September 19

36-755: Advanced Statistical Theory I Fall 2016 Lecture 6: September 19 Lecturer: Alessandro Rinaldo Scribe: YJ Choe Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

### The Steepest Descent Algorithm for Unconstrained Optimization

The Steepest Descent Algorithm for Unconstrained Optimization Robert M. Freund February, 2014 c 2014 Massachusetts Institute of Technology. All rights reserved. 1 1 Steepest Descent Algorithm The problem

### Coordinate Descent and Ascent Methods

Coordinate Descent and Ascent Methods Julie Nutini Machine Learning Reading Group November 3 rd, 2015 1 / 22 Projected-Gradient Methods Motivation Rewrite non-smooth problem as smooth constrained problem:

### Convex Optimization Lecture 16

Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean

### Lecture 20: November 1st

10-725: Optimization Fall 2012 Lecture 20: November 1st Lecturer: Geoff Gordon Scribes: Xiaolong Shen, Alex Beutel Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

### NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained

NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume

### Lecture 4: January 26

10-725/36-725: Convex Optimization Spring 2015 Lecturer: Javier Pena Lecture 4: January 26 Scribes: Vipul Singh, Shinjini Kundu, Chia-Yin Tsai Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

### ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

### Optimization Tutorial 1. Basic Gradient Descent

E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

### Canonical Problem Forms. Ryan Tibshirani Convex Optimization

Canonical Problem Forms Ryan Tibshirani Convex Optimization 10-725 Last time: optimization basics Optimization terology (e.g., criterion, constraints, feasible points, solutions) Properties and first-order

### Optimization methods

Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

### Introduction to gradient descent

6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

### IFT Lecture 2 Basics of convex analysis and gradient descent

IF 6085 - Lecture Basics of convex analysis and gradient descent his version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribes: Assya rofimov,

### Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE7C (Spring 018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October

### 6. Proximal gradient method

L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

### Lecture 14: October 11

10-725: Optimization Fall 2012 Lecture 14: October 11 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Zitao Liu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have not

### arxiv: v1 [math.oc] 1 Jul 2016

Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

### CSCI : Optimization and Control of Networks. Review on Convex Optimization

CSCI7000-016: Optimization and Control of Networks Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one

### A Brief Review on Convex Optimization

A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review

### Lecture 6 : Projected Gradient Descent

Lecture 6 : Projected Gradient Descent EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Consider the following update. x l+1 = Π C (x l α f(x l )) Theorem Say f : R d R is (m, M)-strongly

### Convex Optimization. Problem set 2. Due Monday April 26th

Convex Optimization Problem set 2 Due Monday April 26th 1 Gradient Decent without Line-search In this problem we will consider gradient descent with predetermined step sizes. That is, instead of determining

### Convex Optimization. (EE227A: UC Berkeley) Lecture 6. Suvrit Sra. (Conic optimization) 07 Feb, 2013

Convex Optimization (EE227A: UC Berkeley) Lecture 6 (Conic optimization) 07 Feb, 2013 Suvrit Sra Organizational Info Quiz coming up on 19th Feb. Project teams by 19th Feb Good if you can mix your research

### Gradient Descent. Dr. Xiaowei Huang

Gradient Descent Dr. Xiaowei Huang https://cgi.csc.liv.ac.uk/~xiaowei/ Up to now, Three machine learning algorithms: decision tree learning k-nn linear regression only optimization objectives are discussed,

### Convex Optimization. 9. Unconstrained minimization. Prof. Ying Cui. Department of Electrical Engineering Shanghai Jiao Tong University

Convex Optimization 9. Unconstrained minimization Prof. Ying Cui Department of Electrical Engineering Shanghai Jiao Tong University 2017 Autumn Semester SJTU Ying Cui 1 / 40 Outline Unconstrained minimization

### 4 damped (modified) Newton methods

4 damped (modified) Newton methods 4.1 damped Newton method Exercise 4.1 Determine with the damped Newton method the unique real zero x of the real valued function of one variable f(x) = x 3 +x 2 using

### Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

### Lecture 1: January 12

10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 1: January 12 Scribes: Seo-Jin Bang, Prabhat KC, Josue Orellana 1.1 Review We begin by going through some examples and key

### Higher-Order Methods

Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth

### Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization 10-725/36-725

Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

### Contents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016

ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................

### Lecture 5 : Projections

Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

### 1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:

Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion

### Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

### CPSC 540: Machine Learning

CPSC 540: Machine Learning Gradient Descent, Newton-like Methods Mark Schmidt University of British Columbia Winter 2017 Admin Auditting/registration forms: Submit them in class/help-session/tutorial this

### Solution Methods. Richard Lusby. Department of Management Engineering Technical University of Denmark

Solution Methods Richard Lusby Department of Management Engineering Technical University of Denmark Lecture Overview (jg Unconstrained Several Variables Quadratic Programming Separable Programming SUMT

### Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,

### Gradient Methods Using Momentum and Memory

Chapter 3 Gradient Methods Using Momentum and Memory The steepest descent method described in Chapter always steps in the negative gradient direction, which is orthogonal to the boundary of the level set

### Unconstrained Optimization

1 / 36 Unconstrained Optimization ME598/494 Lecture Max Yi Ren Department of Mechanical Engineering, Arizona State University February 2, 2015 2 / 36 3 / 36 4 / 36 5 / 36 1. preliminaries 1.1 local approximation

### Convex functions, subdifferentials, and the L.a.s.s.o.

Convex functions, subdifferentials, and the L.a.s.s.o. MATH 697 AM:ST September 26, 2017 Warmup Let us consider the absolute value function in R p u(x) = (x, x) = x 2 1 +... + x2 p Evidently, u is differentiable

### Lecture 10: Duality in Linear Programs

10-725/36-725: Convex Optimization Spring 2015 Lecture 10: Duality in Linear Programs Lecturer: Ryan Tibshirani Scribes: Jingkun Gao and Ying Zhang Disclaimer: These notes have not been subjected to the

### Convex Optimization / Homework 1, due September 19

Convex Optimization 1-725/36-725 Homework 1, due September 19 Instructions: You must complete Problems 1 3 and either Problem 4 or Problem 5 (your choice between the two). When you submit the homework,

### Convex Optimization. Prof. Nati Srebro. Lecture 12: Infeasible-Start Newton s Method Interior Point Methods

Convex Optimization Prof. Nati Srebro Lecture 12: Infeasible-Start Newton s Method Interior Point Methods Equality Constrained Optimization f 0 (x) s. t. A R p n, b R p Using access to: 2 nd order oracle

### 10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

### How to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India

How to Escape Saddle Points Efficiently? Praneeth Netrapalli Microsoft Research India Chi Jin UC Berkeley Michael I. Jordan UC Berkeley Rong Ge Duke Univ. Sham M. Kakade U Washington Nonconvex optimization

### E5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization

E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained

### Lecture 9 Sequential unconstrained minimization

S. Boyd EE364 Lecture 9 Sequential unconstrained minimization brief history of SUMT & IP methods logarithmic barrier function central path UMT & SUMT complexity analysis feasibility phase generalized inequalities