Lecture 12 Unconstrained Optimization (contd.) Constrained Optimization. October 15, 2008

Outline

- Lecture 11: gradient descent algorithm
- Improvement to the result in Lecture 11
- At what rate will it converge?
- Constrained minimization over simple sets

Convex Optimization 1

Unconstrained Minimization

  minimize f(x) subject to x ∈ Rⁿ.

Assumption 1: The function f is convex and continuously differentiable over Rⁿ, and the optimal value f* = inf_{x ∈ Rⁿ} f(x) is finite.

Gradient descent algorithm: x_{k+1} = x_k − α ∇f(x_k)
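As a quick illustration (not part of the lecture), a minimal sketch of the constant-stepsize gradient method; the quadratic objective, starting point, and stepsize below are my choices:

```python
import numpy as np

def gradient_descent(grad, x0, alpha, num_iters):
    """Iterate x_{k+1} = x_k - alpha * grad(x_k) for a fixed number of steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad(x)
    return x

# Illustration on f(x) = ||x||^2 / 2, whose gradient is x and whose minimizer is 0.
x_final = gradient_descent(lambda x: x, x0=[4.0, -2.0], alpha=0.1, num_iters=200)
print(x_final)  # close to [0, 0]
```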

Theorem 1: Bounded Gradients

Theorem 1: Let Assumption 1 hold, and suppose that the gradients are bounded: ‖∇f(x)‖ ≤ L for all x. Then the gradient method generates a sequence {x_k} such that

  lim inf_{k→∞} f(x_k) ≤ f* + αL²/2.

We first proved that for any y ∈ Rⁿ and all k,

  ‖x_{k+1} − y‖² ≤ ‖x_k − y‖² − 2α_k (f(x_k) − f(y)) + α_k² ‖∇f(x_k)‖².
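The error floor f* + αL²/2 in Theorem 1 can be checked numerically; a sketch where the function f(x) = √(1 + x²), with f* = 1 and gradient bound L = 1, and the stepsize α = 0.5 are my illustrative choices:

```python
import math

f = lambda x: math.sqrt(1.0 + x * x)          # convex, f* = 1 at x = 0
grad = lambda x: x / math.sqrt(1.0 + x * x)   # |grad f| <= L = 1

alpha, L, f_star = 0.5, 1.0, 1.0
x, best = 5.0, float("inf")
for _ in range(1000):
    best = min(best, f(x))
    x -= alpha * grad(x)

# Theorem 1 only guarantees best <= f* + alpha * L**2 / 2 = 1.25; here the
# iterates actually reach the minimizer, so best is essentially f* = 1.
print(best)
```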

Theorem 2: Lipschitz Gradients

Q-approximation Lemma: For a continuously differentiable function with Lipschitz gradients (constant M), we have

  f(y) ≤ f(x) + ∇f(x)ᵀ(y − x) + (M/2) ‖y − x‖²  for all x, y ∈ Rⁿ.

Theorem: Let Assumption 1 hold, and assume that the gradients of f are Lipschitz. Then, for α with α < 2/M, we have

  lim_{k→∞} ∇f(x_k) = 0.

If, in addition, an optimal solution exists [i.e., min_x f(x) is attained at some x*], then every accumulation point of the sequence {x_k} is optimal.
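The Q-approximation Lemma is easy to spot-check numerically; a sketch using the softplus function f(x) = log(1 + eˣ), whose gradient is Lipschitz with M = 1/4 (this test function and the grid of points are my choices):

```python
import math

f = lambda x: math.log1p(math.exp(x))          # softplus; f'' = s(1-s) <= 1/4, so M = 1/4
grad = lambda x: 1.0 / (1.0 + math.exp(-x))    # the sigmoid

M = 0.25
ok = all(
    f(y) <= f(x) + grad(x) * (y - x) + (M / 2) * (y - x) ** 2 + 1e-12
    for x in [i * 0.5 for i in range(-10, 11)]
    for y in [j * 0.5 for j in range(-10, 11)]
)
print(ok)  # True: the quadratic upper bound holds at every grid pair
```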

Proof: Using the Q-approximation Lemma with y = x_{k+1} and x = x_k, we have

  f(x_{k+1}) ≤ f(x_k) − α ‖∇f(x_k)‖² + (α²M/2) ‖∇f(x_k)‖²
            = f(x_k) − (α/2)(2 − αM) ‖∇f(x_k)‖².

By summing these relations and using 2 − αM > 0, we can see that Σ_k ‖∇f(x_k)‖² < ∞, implying lim_{k→∞} ∇f(x_k) = 0. Suppose a solution exists and the sequence {x_k} has an accumulation point x̄. Then, by continuity of the gradient, we have ∇f(x_k) → ∇f(x̄) along an appropriate subsequence.

Since ∇f(x_k) → 0, it follows that ∇f(x̄) = 0, implying that x̄ is a solution. Where do we use convexity?

Correct Theorem 2: Assume that the gradients of f are Lipschitz. Then, for α with α < 2/M, we have

  Σ_{k=1}^∞ ‖∇f(x_k)‖² < ∞.

If, in addition, an optimal solution exists [i.e., min_x f(x) is attained at some x*], then every accumulation point of the sequence {x_k} is optimal.

Note the difference between a limit point and an accumulation point: {x_k} need not converge.

Example: f(x) = 1/(1 + x²). It satisfies the conditions, and yet x_k → ∞.
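The divergence in this example is easy to observe numerically; a sketch (the stepsize 0.4, which satisfies α < 2/M for M = 2 here, and the iteration count are my choices):

```python
f = lambda x: 1.0 / (1.0 + x * x)              # inf f = 0, but no minimizer exists
grad = lambda x: -2.0 * x / (1.0 + x * x) ** 2

alpha, x = 0.4, 1.0
for _ in range(100_000):
    x -= alpha * grad(x)

# The iterates drift off to infinity even though the gradients vanish.
print(x, grad(x))
```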

What does convexity buy?

Answer: Convergence of the sequence {x_k}.

Recall/define X* as the optimal set.

Theorem 3: Let Assumption 1 hold. Further, assume that the gradients of f are Lipschitz and α < 2/M. If X* is nonempty, then

  lim_{k→∞} x_k = x*  for some x* ∈ X*.

Proof (outline): The previous theorem tells us a lot: the limit of every convergent subsequence of {x_k} is a point in X*.

Questions:
- Is there a convergent subsequence?
- Do all the subsequences converge to the same point in X*?

The conditions of Lemma 1 are satisfied. Fix y = x*, where x* ∈ X*:

  ‖x_{k+1} − x*‖² ≤ ‖x_k − x*‖² − 2α (f(x_k) − f*) + α² ‖∇f(x_k)‖².

Drop the negative term:

  ‖x_{k+1} − x*‖² ≤ ‖x_k − x*‖² + α² ‖∇f(x_k)‖².

Note from Theorem 2 that Σ_{l=1}^∞ ‖∇f(x_l)‖² < ∞.

Therefore, adding α² Σ_{l=k+1}^∞ ‖∇f(x_l)‖² to both sides, we obtain

  ‖x_{k+1} − x*‖² + α² Σ_{l=k+1}^∞ ‖∇f(x_l)‖² ≤ ‖x_k − x*‖² + α² Σ_{l=k}^∞ ‖∇f(x_l)‖².

Define

  u_k = ‖x_k − x*‖² + α² Σ_{l=k}^∞ ‖∇f(x_l)‖².

Thus the inequality reads u_{k+1} ≤ u_k, which implies {u_k} converges (it is nonincreasing and bounded below by 0). We already know the tail Σ_{l=k}^∞ ‖∇f(x_l)‖² → 0 as k → ∞. Therefore, ‖x_k − x*‖ converges. Therefore, {x_k} is bounded ⇒ a convergent subsequence exists. Take a convergent subsequence {x_{s_k}} and let its limit point be x̄. We know

x̄ ∈ X* from the previous Theorem. Thus, lim_{k→∞} ‖x_{s_k} − x̄‖ = 0. But we just showed that lim_{k→∞} ‖x_k − x̄‖ exists (apply the argument above with x* = x̄, which is valid since x̄ ∈ X*). Therefore, the limit must be 0, and {x_k} converges to x̄.

Quick summary:
- Without convexity: no convergence of the iterates.
- With convexity: convergence of the iterates.

Go back and check f(x) = 1/(1 + x²). It can't be convex.

General note: What do I get from all this analysis?

Rates of convergence

Successfully solved the problem. Finish the course and go home? Unfortunately, no. Gradient descent has poor rates of convergence, empirically observed when we simulate. See the last page for an example.

Let's consider the class of strongly convex functions. Recall:

  ∇²f(x) ⪰ mI  for all x ∈ Rⁿ
  f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2) ‖y − x‖²
  f(x) − f* ≤ (1/(2m)) ‖∇f(x)‖²
  f(x) − f* ≥ (m/2) ‖x − x*‖²
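Both strong-convexity consequences above can be spot-checked numerically; a sketch with f(x) = x⁴ + x² (my choice: f''(x) = 12x² + 2 ≥ m = 2, with x* = 0 and f* = 0):

```python
f = lambda x: x**4 + x**2            # strongly convex with m = 2, minimized at x* = 0
grad = lambda x: 4 * x**3 + 2 * x

m, f_star, x_star = 2.0, 0.0, 0.0
pts = [i * 0.25 for i in range(-20, 21)]
# f(x) - f* <= (1/(2m)) ||grad f(x)||^2  and  f(x) - f* >= (m/2) ||x - x*||^2
upper_ok = all(f(x) - f_star <= grad(x) ** 2 / (2 * m) + 1e-12 for x in pts)
lower_ok = all(f(x) - f_star >= (m / 2) * (x - x_star) ** 2 - 1e-12 for x in pts)
print(upper_ok, lower_ok)  # True True
```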

Theorem 4: Let Assumption 1 hold. Further, assume that f is strongly convex, the gradients of f are Lipschitz, and α < min(2, m)/M. Then

  ‖x_k − x*‖ ≤ c qᵏ,  0 < q < 1.

Note: Geometric rate of convergence. (Normal people call it exponential. ;))

Proof: Strong convexity ⇒ unique minimum. The basic iterate relation

with y = x* gives

  ‖x_{k+1} − x*‖² ≤ ‖x_k − x*‖² − 2α (f(x_k) − f*) + α² ‖∇f(x_k)‖²
                 ≤ ‖x_k − x*‖² − 2α (m/2) ‖x_k − x*‖² + α² ‖∇f(x_k)‖²
                 = ‖x_k − x*‖² − αm ‖x_k − x*‖² + α² ‖∇f(x_k) − ∇f(x*)‖²
                 ≤ ‖x_k − x*‖² − αm ‖x_k − x*‖² + α² M² ‖x_k − x*‖²
                 = (1 − mα + α²M²) ‖x_k − x*‖²
                 ≤ (1 − mα + α²M²)^{k+1} ‖x_0 − x*‖²

Remark: The result holds when 0 < α < 2/m.

Geometric is not bad. But how many functions are strongly convex?
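The geometric rate is visible on any strongly convex quadratic; a sketch where the diagonal quadratic (m = 1, M = 10) and the stepsize α = 0.1 are my choices, giving a per-step contraction factor of max_i |1 − α D_i| = 0.9:

```python
import numpy as np

# Strongly convex quadratic f(x) = 0.5 * x^T D x with D = diag(1, 10): m = 1, M = 10.
D = np.array([1.0, 10.0])
grad = lambda x: D * x
alpha = 0.1
x = np.array([1.0, 1.0])
dists = []
for _ in range(50):
    dists.append(np.linalg.norm(x))   # ||x_k - x*|| with x* = 0
    x = x - alpha * grad(x)

ratios = [dists[k + 1] / dists[k] for k in range(49)]
print(max(ratios))  # every step shrinks the distance by a factor of at most 0.9
```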

Constrained minimization over simple sets

  minimize f(x) subject to x ∈ X.

Assumption 2:
- f is continuously differentiable and convex on an open set containing X
- The set X is closed and convex
- f* > −∞

Optimality condition: ∇f(x*)ᵀ(x − x*) ≥ 0 for all x ∈ X.

Simple sets (easy projection): balls, hyperplanes, halfspaces.

Recall the projection P_X[x]:

  ‖x − P_X[x]‖ ≤ ‖x − y‖  when y ∈ X
  ‖P_X[x] − P_X[y]‖ ≤ ‖x − y‖  when x, y ∈ Rⁿ

Gradient projection algorithm: x_{k+1} = P_X[x_k − α_k ∇f(x_k)]

Can we obtain results similar to Theorems 1-4?

Lemma 1 and Theorem 1: Let Assumptions 1 and 2 hold, and suppose that the gradients are bounded. Then, for any y ∈ X and all k,

  ‖x_{k+1} − y‖² ≤ ‖x_k − y‖² − 2α_k (f(x_k) − f(y)) + α_k² ‖∇f(x_k)‖².
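A minimal sketch of the gradient projection algorithm on a set with an easy projection; the unit-ball constraint and the objective f(x) = ‖x − c‖²/2 are my illustrative choices:

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

# Minimize f(x) = ||x - c||^2 / 2 over the unit ball; c lies outside the ball,
# so the constrained minimizer is c / ||c||.
c = np.array([3.0, 4.0])
grad = lambda x: x - c
alpha, x = 0.5, np.zeros(2)
for _ in range(100):
    x = project_ball(x - alpha * grad(x))

print(x)  # close to c / ||c|| = [0.6, 0.8]
```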

and hence, for α_k = α,

  lim inf_{k→∞} f(x_k) ≤ f* + αL²/2.

Proof: By the definition of the method, it follows that for any k,

  ‖x_{k+1} − y‖² = ‖P_X[x_k − α_k ∇f(x_k)] − y‖²
                 ≤ ‖x_k − α_k ∇f(x_k) − y‖²
                 = ‖x_k − y‖² − 2α_k ∇f(x_k)ᵀ(x_k − y) + α_k² ‖∇f(x_k)‖².

The rest is identical.

Theorem 2: Not very interesting. After all, ∇f(x*) need not be 0. What we could show is something like

  lim_{k→∞} ∇f(x_k)ᵀ(x − x_k) ≥ 0  for all x ∈ X.

Theorem 3: Let Assumption 2 hold. Further, assume that the gradients of f are Lipschitz and α < 2/M. If X* is nonempty, then

  lim_{k→∞} x_k = x*  for some x* ∈ X*.

Theorem 4: Let Assumption 2 hold. Further, assume that f is strongly convex, the gradients of f are Lipschitz, and α < min(2, m)/M. Then

  ‖x_k − x*‖ ≤ c qᵏ,  0 < q < 1.

Proofs

Pack it. I am going home! Prof. Nedić will do it. I think.