ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method


1 ORIE 4741: Learning with Big Messy Data
Proximal Gradient Method
Professor Udell
Operations Research and Information Engineering
Cornell
November 13, 2017

2 Announcements
Be a TA for CS/ORIE 1380: Data Science for All!
syllabus: roster/sp18/class/orie/1380
apply:

3-4 Regularized empirical risk minimization
choose a model by solving

    minimize \sum_{i=1}^n l(x_i, y_i; w) + r(w)

with variable w \in R^d, where:
- parameter vector w \in R^d
- loss function l : X \times Y \times R^d \to R
- regularizer r : R^d \to R

why?
- we want to minimize the risk E_{(x,y) \sim P} l(x, y; w)
- we approximate it by the empirical risk \sum_{i=1}^n l(x_i, y_i; w)
- we add the regularizer to help the model generalize

5 Solving regularized risk minimization
how should we fit these models?
- with a different software package for each model?
- with a different algorithm for each model?
- with a general purpose optimization solver?
desiderata: fast, flexible.
we'll use the proximal gradient method.

6-10 Proximal operator
define the proximal operator of the function r : R^d \to R,

    prox_r : R^d \to R^d,    prox_r(z) = argmin_w ( r(w) + (1/2) \|w - z\|_2^2 )

- generalized projection: if 1_C is the indicator of the set C, prox_{1_C}(w) = \Pi_C(w)
- implicit gradient step: if w = prox_r(z) and r is smooth, then \nabla r(w) + w - z = 0, i.e., w = z - \nabla r(w)
- simple to evaluate: closed form solutions for many functions
more info: [Parikh & Boyd 2013]

11 Maps from functions to functions
there is no consistent notation for maps from functions to functions. for a function f : R^d \to R,
- prox maps f to a new function prox_f : R^d \to R^d; prox_f(x) evaluates this function at the point x
- \nabla maps f to a new function \nabla f : R^d \to R^d; \nabla f(x) evaluates this function at the point x
- \partial / \partial x maps f to a new function \partial f / \partial x : R^d \to R^d; \partial f / \partial x |_{x = \bar x} evaluates this function at the point \bar x
this one has the most confusing notation of all...

12 Maps from functions to functions
in a nice programming language, you can write something like prox(f)(x) or prox(f, x)
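To make the higher-order view concrete, here is a minimal Python sketch (not from the slides; the generic solver and the check are my own) that builds prox(f) as a function and evaluates it numerically. In practice you would use the closed forms on the next slide instead.

```python
import numpy as np
from scipy.optimize import minimize

def prox(f):
    """Return the function z -> argmin_w f(w) + 0.5 * ||w - z||^2."""
    def prox_f(z):
        z = np.asarray(z, dtype=float)
        objective = lambda w: f(w) + 0.5 * np.sum((w - z) ** 2)
        return minimize(objective, x0=z).x   # generic numerical solve
    return prox_f

# check against a case with a closed form: for f(w) = ||w||^2,
# the optimality condition 2w + (w - z) = 0 gives prox_f(z) = z / 3
z = np.array([3.0, -6.0, 1.5])
print(prox(lambda w: np.sum(w ** 2))(z))   # approximately [1., -2., 0.5]
```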

13 Let's evaluate some proximal operators!
define the proximal operator of the function r : R^d \to R,

    prox_r(z) = argmin_w ( r(w) + (1/2) \|w - z\|_2^2 )

- r(w) = 0 (identity)
- r(w) = \sum_{i=1}^d r_i(w_i) (separable)
- r(w) = \|w\|_2^2 (shrinkage)
- r(w) = \|w\|_1 (soft-thresholding)
- r(w) = 1(w \geq 0) (projection)
- r(w) = \sum_{i=1}^{d-1} (w_{i+1} - w_i)^2 (smoothing)
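Several of these have simple closed forms. A sketch in NumPy (the function names are my own), written for the scaled operator prox_{\alpha r}:

```python
import numpy as np

def prox_zero(z, alpha=1.0):
    # r(w) = 0: the prox is the identity
    return z

def prox_sq_l2(z, alpha=1.0):
    # r(w) = ||w||_2^2: argmin alpha*||w||^2 + 0.5*||w - z||^2  =>  w = z / (1 + 2*alpha)
    return z / (1.0 + 2.0 * alpha)

def prox_l1(z, alpha=1.0):
    # r(w) = ||w||_1: soft-thresholding, s_alpha(z) = sign(z) * max(|z| - alpha, 0)
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def prox_nonneg(z, alpha=1.0):
    # r(w) = 1(w >= 0): projection onto the nonnegative orthant
    return np.maximum(z, 0.0)
```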

14-15 Proximal gradient method
want to solve

    minimize l(w) + r(w)

where l : R^d \to R is smooth and r : R^d \to R has a fast prox operator.
proximal gradient method: pick a step size sequence {\alpha_t} and w_0 \in R^d, then repeat

    w_{t+1} = prox_{\alpha_t r}(w_t - \alpha_t \nabla l(w_t))

complexity: O(nd) to evaluate the gradient of the loss function, O(d) to prox and to update.
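A minimal sketch of this loop in Python (my own interface; grad_loss and prox_reg are assumed to be supplied by the user, e.g. the closed-form proxes sketched above):

```python
import numpy as np

def proximal_gradient(grad_loss, prox_reg, w0, step, iters=500):
    """Proximal gradient iteration: w <- prox_{step * r}(w - step * grad_loss(w)).

    grad_loss(w):       gradient of the smooth loss l at w
    prox_reg(z, alpha): prox of the regularizer r scaled by alpha, evaluated at z
    """
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        w = prox_reg(w - step * grad_loss(w), step)
    return w
```

For instance, with grad_loss(w) = -X.T @ (y - X @ w) and prox_reg = prox_nonneg from the sketch above, this reproduces the NNLS iteration on the next slides.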

16-17 Example: NNLS
want to solve

    minimize (1/2) \|y - Xw\|^2 + 1(w \geq 0)

recall

    \nabla ( (1/2) \|y - Xw\|^2 ) = -X^T (y - Xw),    prox_{1(\cdot \geq 0)}(w) = max(0, w)

proximal gradient method: pick a step size sequence {\alpha_t} and w_0 \in R^d; for t = 0, 1, ...

    w_{t+1} = max(0, w_t + \alpha_t X^T (y - Xw_t))

note: this is not the same as max(0, (X^T X)^{-1} X^T y)

18 Example: NNLS
proximal gradient method for NNLS: pick a step size sequence {\alpha_t} and w_0 \in R^d; for t = 0, 1, ...
- compute g_t = X^T (y - Xw_t)   (O(nd) flops)
- w_{t+1} = max(0, w_t + \alpha_t g_t)   (O(d) flops)
O(nd) flops per iteration
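A sketch of this NNLS loop in NumPy (assumed names; the step size is left to the caller, e.g. 1/L from the Lipschitz slide later in the deck):

```python
import numpy as np

def nnls_proxgrad(X, y, step, iters=1000):
    """Nonnegative least squares, minimize 0.5*||y - X w||^2 subject to w >= 0,
    by proximal (projected) gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = X.T @ (y - X @ w)              # O(nd) flops
        w = np.maximum(0.0, w + step * g)  # gradient step + projection, O(d) flops
    return w
```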

19-20 Example: NNLS
option: do work up front to reduce per-iteration complexity.
proximal gradient method for NNLS: pick a step size sequence {\alpha_t} and w_0 \in R^d;
form b = X^T y (O(nd) flops) and G = X^T X (O(nd^2) flops); for t = 0, 1, ...

    w_{t+1} = max(0, w_t + \alpha_t (b - G w_t))   (O(d^2) flops)

O(nd^2) flops to begin, O(d^2) flops per iteration.
note: can compute b and G in parallel.
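The same loop with the Gram matrix formed up front, a sketch under the same assumptions:

```python
import numpy as np

def nnls_proxgrad_gram(X, y, step, iters=1000):
    """NNLS by proximal gradient, precomputing b = X^T y and G = X^T X."""
    b = X.T @ y   # O(nd) flops, once
    G = X.T @ X   # O(n d^2) flops, once
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = np.maximum(0.0, w + step * (b - G @ w))  # O(d^2) flops per iteration
    return w
```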

21-22 Example: Lasso
want to solve

    minimize (1/2) \|y - Xw\|^2 + \lambda \|w\|_1

recall

    \nabla ( (1/2) \|y - Xw\|^2 ) = -X^T (y - Xw)

and prox_{\mu \|\cdot\|_1}(w) = s_\mu(w), the soft-thresholding operator, where

    (s_\mu(w))_i = w_i - \mu   if w_i \geq \mu
                 = 0           if |w_i| \leq \mu
                 = w_i + \mu   if w_i \leq -\mu

proximal gradient method: pick a step size sequence {\alpha_t} and w_0 \in R^d;
form b = X^T y (O(nd) flops) and G = X^T X (O(nd^2) flops); for t = 0, 1, ...

    w_{t+1} = s_{\alpha_t \lambda}(w_t + \alpha_t (b - G w_t))   (O(d^2) flops)

notice: the hard part (O(d^2)) is computing the gradient...!
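A corresponding sketch for the lasso iteration (names are mine; this is the classic iterative soft-thresholding scheme):

```python
import numpy as np

def soft_threshold(z, mu):
    # (s_mu(z))_i = sign(z_i) * max(|z_i| - mu, 0)
    return np.sign(z) * np.maximum(np.abs(z) - mu, 0.0)

def lasso_proxgrad(X, y, lam, step, iters=1000):
    """Lasso, minimize 0.5*||y - X w||^2 + lam*||w||_1, by proximal gradient."""
    b, G = X.T @ y, X.T @ X    # precompute, as on the slide
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = soft_threshold(w + step * (b - G @ w), step * lam)
    return w
```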

23 Convergence
two questions to ask:
- will the iteration ever stop?
- what kind of point will it stop at?
if the iteration stops, we say it has converged.

24 Convergence: what kind of point will it stop at?
let's suppose r is differentiable.(1) if we find w so that

    w = prox_{\alpha_t r}(w - \alpha_t \nabla l(w))

then

    w = argmin_{w'} ( \alpha_t r(w') + (1/2) \|w' - (w - \alpha_t \nabla l(w))\|_2^2 )
    0 = \alpha_t \nabla r(w) + w - w + \alpha_t \nabla l(w) = \alpha_t \nabla ( r(w) + l(w) )

so the gradient of the objective is 0. if l and r are convex, that means w minimizes l + r.
(1) take Convex Optimization for the proof for non-differentiable r

25 Convergence: will it stop?
definitions: p^* = inf_w ( l(w) + r(w) )
assumptions: the loss function is continuously differentiable with L-Lipschitz derivative:

    l(w') \leq l(w) + \langle \nabla l(w), w' - w \rangle + (L/2) \|w' - w\|^2

for simplicity, consider a constant step size \alpha_t = \alpha.

26 Proximal point method converges
prove it first for l = 0 (aka the proximal point method). for any t = 0, 1, ...,

    w_{t+1} = argmin_w ( \alpha r(w) + (1/2) \|w - w_t\|^2 )

so in particular,

    \alpha r(w_{t+1}) + (1/2) \|w_{t+1} - w_t\|^2 \leq \alpha r(w_t) + (1/2) \|w_t - w_t\|^2
    (1/(2\alpha)) \|w_{t+1} - w_t\|^2 \leq r(w_t) - r(w_{t+1})

now add up these inequalities for t = 0, 1, ..., T:

    (1/(2\alpha)) \sum_{t=0}^T \|w_{t+1} - w_t\|^2 \leq \sum_{t=0}^T ( r(w_t) - r(w_{t+1}) ) \leq r(w_0) - p^*

it converges!

27 Proximal gradient method converges (I)
now prove it for l \neq 0. for any t = 0, 1, ...,

    w_{t+1} = argmin_w ( \alpha r(w) + (1/2) \|w - (w_t - \alpha \nabla l(w_t))\|^2 )

so in particular,

    r(w_{t+1}) + (1/(2\alpha)) \|w_{t+1} - (w_t - \alpha \nabla l(w_t))\|^2 \leq r(w_t) + (1/(2\alpha)) \|w_t - (w_t - \alpha \nabla l(w_t))\|^2

expanding the squares,

    r(w_{t+1}) + (1/(2\alpha)) \|w_{t+1} - w_t\|^2 + (\alpha/2) \|\nabla l(w_t)\|^2 + \langle \nabla l(w_t), w_{t+1} - w_t \rangle \leq r(w_t) + (\alpha/2) \|\nabla l(w_t)\|^2
    r(w_{t+1}) + (1/(2\alpha)) \|w_{t+1} - w_t\|^2 + \langle \nabla l(w_t), w_{t+1} - w_t \rangle \leq r(w_t)

28 Proximal gradient method converges (II)
now use l(w') \leq l(w) + \langle \nabla l(w), w' - w \rangle + (L/2) \|w' - w\|^2 with w' = w_{t+1}, w = w_t:

    l(w_{t+1}) + r(w_{t+1}) + (1/(2\alpha)) \|w_{t+1} - w_t\|^2 + \langle \nabla l(w_t), w_{t+1} - w_t \rangle
        \leq l(w_t) + r(w_t) + \langle \nabla l(w_t), w_{t+1} - w_t \rangle + (L/2) \|w_{t+1} - w_t\|^2
    l(w_{t+1}) + r(w_{t+1}) + (1/(2\alpha)) \|w_{t+1} - w_t\|^2 \leq l(w_t) + r(w_t) + (L/2) \|w_{t+1} - w_t\|^2
    (1/(2\alpha) - L/2) \|w_{t+1} - w_t\|^2 \leq l(w_t) + r(w_t) - ( l(w_{t+1}) + r(w_{t+1}) )

29 Proximal gradient method converges (III)
now add up these inequalities for t = 0, 1, ..., T:

    (1/(2\alpha) - L/2) \sum_{t=0}^T \|w_{t+1} - w_t\|^2 \leq \sum_{t=0}^T ( l(w_t) + r(w_t) - (l(w_{t+1}) + r(w_{t+1})) ) \leq l(w_0) + r(w_0) - p^*

if 1/(2\alpha) - L/2 \geq 0, i.e., \alpha \leq 1/L: it converges!

30-31 Lipschitz?
the loss function is differentiable with L-Lipschitz derivative:

    l(w') \leq l(w) + \langle \nabla l(w), w' - w \rangle + (L/2) \|w' - w\|^2

what is L? for l(w) = \|Xw - y\|^2,

    l(w') = \|Xw' - y\|^2 = \|X(w' - w) + Xw - y\|^2
          = \|Xw - y\|^2 + 2 (Xw - y)^T X (w' - w) + \|X(w' - w)\|^2
          = l(w) + \langle \nabla l(w), w' - w \rangle + \|X(w' - w)\|^2
          \leq l(w) + \langle \nabla l(w), w' - w \rangle + \|X\|^2 \|w' - w\|^2

so L = 2 \|X\|^2, where \|X\| is the maximum singular value of X.
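In code, the corresponding safe constant step size \alpha = 1/L can be computed from the spectral norm. A sketch (my own helper, for the scaling l(w) = \|Xw - y\|^2 used on this slide):

```python
import numpy as np

def safe_step_size(X):
    """Step size alpha = 1/L for l(w) = ||X w - y||^2, where L = 2 * sigma_max(X)^2."""
    sigma_max = np.linalg.norm(X, 2)   # largest singular value of X
    return 1.0 / (2.0 * sigma_max ** 2)
```

For the (1/2) \|y - Xw\|^2 scaling used in the NNLS and lasso slides, the analogous constant is L = \|X\|^2, so \alpha = 1/\|X\|^2.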

32 Speeding it up
recall: computing the gradient is slow.
idea: approximate the gradient!
a stochastic gradient \hat\nabla l(w) is a random variable with E \hat\nabla l(w) = \nabla l(w)

33-35 Stochastic gradient: examples
a stochastic gradient obeys E \hat\nabla l(w) = \nabla l(w). examples: for l(w) = \sum_{i=1}^n (y_i - w^T x_i)^2,
- single stochastic gradient. pick a random example i, and set

    \hat\nabla l(w) = n \nabla_w (y_i - w^T x_i)^2 = -2n (y_i - w^T x_i) x_i

- minibatch stochastic gradient. pick a random set of examples S, and set

    \hat\nabla l(w) = (n/|S|) \nabla_w \sum_{i \in S} (y_i - w^T x_i)^2 = -(2n/|S|) \sum_{i \in S} (y_i - w^T x_i) x_i

  (here |S| is the number of elements in the set S.)
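A sketch of these two estimators in NumPy (helper names are mine), for l(w) = \sum_{i=1}^n (y_i - w^T x_i)^2:

```python
import numpy as np

def stochastic_grad(X, y, w, rng):
    """Single-example stochastic gradient of l(w) = sum_i (y_i - w^T x_i)^2."""
    n = X.shape[0]
    i = rng.integers(n)                            # pick a random example i
    return -2.0 * n * (y[i] - X[i] @ w) * X[i]

def minibatch_grad(X, y, w, rng, batch=32):
    """Minibatch stochastic gradient with |S| = batch examples (batch <= n)."""
    n = X.shape[0]
    S = rng.choice(n, size=batch, replace=False)   # random set of examples S
    r = y[S] - X[S] @ w
    return -(2.0 * n / batch) * (X[S].T @ r)

# usage: rng = np.random.default_rng(0); g = stochastic_grad(X, y, w, rng)
```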

36-37 Stochastic proximal gradient
stochastic proximal gradient method: pick a step size sequence {\alpha_t} and w_0 \in R^d, then repeat:
- pick i \in {1, ..., n} uniformly at random
- w_{t+1} = prox_{\alpha_t r}( w_t + \alpha_t 2n (y_i - w_t^T x_i) x_i )   (O(d) flops)
per-iteration complexity: O(d)!
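Putting the pieces together, a minimal sketch of the stochastic proximal gradient loop for this least squares loss, reusing the assumed prox_reg(z, alpha) interface from the earlier sketches:

```python
import numpy as np

def stochastic_proxgrad(X, y, prox_reg, step, iters=10000, seed=0):
    """Stochastic proximal gradient for l(w) = sum_i (y_i - w^T x_i)^2 + r(w)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)                         # uniform random example
        g = -2.0 * n * (y[i] - X[i] @ w) * X[i]     # stochastic gradient, O(d)
        w = prox_reg(w - step * g, step)            # prox step, O(d) if the prox is cheap
    return w
```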

38 Extensions
next lecture, we'll see loss functions that are not differentiable.
we can still use the proximal gradient method! we just need to define the subgradient.

39 Subgradient
define the subgradient of a convex function f : R^d \to R at x:

    \partial f(x) = \{ g : \forall y, f(y) \geq f(x) + \langle g, y - x \rangle \}

the subgradient \partial f maps points to sets!
it follows the chain rule: if f = h \circ g with h : R \to R and g : R^d \to R differentiable, then

    \partial f(x) = \partial h(g(x)) \nabla g(x)

40 Subgradient
for f : R \to R convex, here's a simpler equivalent condition:
- if f is differentiable at x, \partial f(x) = \{ f'(x) \}
- if f is not differentiable at x, it will still be differentiable just to the left and the right of x,(2) so let

    g_+ = \lim_{\epsilon \downarrow 0} f'(x + \epsilon),    g_- = \lim_{\epsilon \downarrow 0} f'(x - \epsilon)

  then \partial f(x) is the set of all convex combinations (i.e., weighted averages) of those one-sided derivatives:

    \partial f(x) = \{ \alpha g_+ + (1 - \alpha) g_- : \alpha \in [0, 1] \}

(2) because a convex function is differentiable almost everywhere
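For example, for f(x) = |x| this recipe gives \partial f(x) = {sign(x)} away from zero and \partial f(0) = [-1, 1]. A tiny sketch of a subgradient oracle for the l1 norm (my own helper; it returns one arbitrary element of the subdifferential, choosing 0 at the kinks):

```python
import numpy as np

def subgrad_l1(w):
    """One subgradient of f(w) = ||w||_1: sign(w_i) where w_i != 0, and 0 at kinks."""
    return np.sign(w)   # np.sign returns 0 at 0, a valid element of [-1, 1]
```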

41 Proximal subgradient method
proximal subgradient method: pick a step size sequence {\alpha_t} and w_0 \in R^d, then repeat:
- pick any g \in \partial l(w_t)
- w_{t+1} = prox_{\alpha_t r}(w_t - \alpha_t g)
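A minimal sketch of this loop, assuming a user-supplied subgradient oracle subgrad_loss(w) and the scaled prox interface used in the earlier sketches:

```python
import numpy as np

def proximal_subgradient(subgrad_loss, prox_reg, w0, steps, iters=1000):
    """Proximal subgradient iteration w <- prox_{alpha_t r}(w - alpha_t g), g in subdiff l(w).

    steps: callable t -> alpha_t, e.g. lambda t: 1.0 / (t + 1) for a decreasing schedule
    """
    w = np.array(w0, dtype=float)
    for t in range(iters):
        alpha = steps(t)
        g = subgrad_loss(w)                 # any subgradient of the loss at w
        w = prox_reg(w - alpha * g, alpha)  # prox step on the regularizer
    return w
```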

42 Convergence for stochastic proximal (sub)gradient
pick your poison:
- stochastic (sub)gradient, fixed step size \alpha_t = \alpha: iterates converge quickly, then wander within a small ball
- stochastic (sub)gradient, decreasing step size \alpha_t = 1/t: iterates converge slowly to the solution
- minibatch stochastic (sub)gradient with increasing minibatch size, fixed step size \alpha_t = \alpha: iterates converge quickly to the solution, but later iterations take (much) longer
proofs: [Bertsekas, 2010]
conditions: l is convex, subdifferentiable, and Lipschitz continuous, and r is convex and Lipschitz continuous where it is < \infty, or all iterates are bounded

43-46 Proximal subgradient method
Q: Why can't we just use gradient descent to solve all our problems?
A: Because some regularizers and loss functions aren't differentiable!
Q: Why can't we just use subgradient descent to solve all our problems?
A: Because some of our regularizers don't even have subgradients defined everywhere (e.g., the indicator 1(w \geq 0), which has no subgradient at points outside the nonnegative orthant).

47 References
- Vandenberghe, lecture on the proximal gradient method: lectures/proxgrad.pdf
- Yin, lecture on the proximal method: slides/lec05_proximaloperatordual.pdf
- Parikh and Boyd, paper on proximal algorithms: https://stanford.edu/~boyd/papers/pdf/prox_algs.pdf
- Bertsekas, convergence proofs for every proximal gradient style method you can dream of
