Existence of minimizers


We have just talked a lot about how to find the minimizer of an unconstrained convex optimization problem. We have not talked too much, at least not in concrete mathematical terms, about the conditions under which functionals achieve their minimum over a domain.

Here is a fundamental result in analysis: the Weierstrass extreme value theorem. If f(x) is a continuous functional on a compact set K ⊂ R^N, then it attains its minimum value at least once. That is,

  min_{x ∈ K} f(x)

has a minimizer on K: there exists an x⋆ ∈ K such that f(x⋆) ≤ f(x) for all x ∈ K. For a proof of this, see just about any introductory text on analysis. The same is also true for f achieving its maximum value on K.

In the unconstrained setting, we are interested in

  min_{x ∈ R^N} f(x),

where f is convex. Simple examples illustrate that the minimum does not necessarily have to be achieved for any x. That is, there may be no x⋆ such that f(x⋆) ≤ f(x) for all x ∈ R^N. For example, f(x) = e^x does not have a minimizer on the real line, even though it is as convex and as smooth as can be.

There is, however, a class of functions for which we can guarantee a global minimizer in the unconstrained setting. If the sublevel sets of f,

  S(f, β) = {x ∈ R^N : f(x) ≤ β},

are compact (closed and bounded), then there will be at least one global minimizer. This should be easy to see: just choose β such that S(f, β) is non-empty, then

  min_{x ∈ S(f, β)} f(x)

has a minimizer (by the extreme value theorem), and this also clearly corresponds to a minimizer of f over R^N. If f is continuous (which all convex functions with dom f = R^N are), then having compact sublevel sets is the same as being coercive: for every sequence {x_k} ⊂ R^N with ‖x_k‖₂ → ∞, we have f(x_k) → ∞ as well. (I will let you prove that at home.)

Until now, we have taken it for granted that local minimizers for convex functions are also global minimizers. We will nail this down right now. Let f(x) be a convex function on R^N, and suppose x⋆ is a local minimizer of f in that there exists an ε > 0 such that

  f(x⋆) ≤ f(x) for all ‖x − x⋆‖₂ ≤ ε.

Then x⋆ is also a global minimizer: f(x⋆) ≤ f(x) for all x ∈ R^N.

To prove this, suppose that there were an x̂ ≠ x⋆ such that f(x̂) < f(x⋆). Then by the convexity of f,

  f(x⋆ + θ(x̂ − x⋆)) ≤ (1 − θ)f(x⋆) + θf(x̂) < f(x⋆), for all 0 < θ ≤ 1.

But choosing a small enough value of θ puts x⋆ + θ(x̂ − x⋆) in the neighborhood where x⋆ is supposed to be a local minimizer. Specifically, if we take θ < ε/‖x̂ − x⋆‖₂, then the inequality above directly contradicts the assertion that x⋆ is a local minimizer. Thus no such x̂ can exist.

Our final result in this section gives a sufficient (but definitely not necessary) condition for the minimizer to be unique. Let f be strictly convex on R^N. If f has a global minimizer, then it is unique. This is again easy to argue by contradiction. Let x⋆ be a global minimizer, and suppose that there existed an x̂ ≠ x⋆ with f(x̂) = f(x⋆). But then there would be many x which achieve smaller values, as for all 0 < θ < 1,

  f(θx⋆ + (1 − θ)x̂) < θf(x⋆) + (1 − θ)f(x̂) = f(x⋆).

As this would contradict the assertion that x⋆ is a global minimizer, no such x̂ can exist.

We close this section by noting that the entire discussion above would stay the same if we replaced min_{x ∈ R^N} f(x) with min_{x ∈ U} f(x) for any open set U ⊂ R^N.

Optimality conditions: unconstrained case

How do we know when we have a minimizer of a convex function on our hands? What is our certificate of optimality? For the time being, we will assume that f is differentiable, and that ∇f(x) exists at every point we are considering. There is a comparable set of results for non-smooth f that we will discuss in the last segment of the course.

In our discussion of algorithms for unconstrained optimization over the past two weeks, we have often mentioned the following, but have never actually discussed exactly why it is true. Let f be a convex differentiable function on R^N. Then x⋆ solves

  min_{x ∈ R^N} f(x)

if and only if ∇f(x⋆) = 0.

The proof of this relies on the critical fact that we can decrease f by moving in a direction which makes an obtuse angle with the gradient. Let f be a function on R^N that is differentiable at x, and let d ∈ R^N be a vector obeying ⟨d, ∇f(x)⟩ < 0. Then for small enough t > 0, f(x + td) < f(x). We call such a d a descent direction from x. Similarly, if ⟨d, ∇f(x)⟩ > 0, then for small enough t > 0, f(x + td) > f(x). We call such a d an ascent direction from x.

This fundamental fact is a direct consequence of the Taylor theorem: for any u ∈ R^N,

  f(x + u) = f(x) + ⟨u, ∇f(x)⟩ + h(u)‖u‖₂,

where h(u) : R^N → R is some function satisfying h(u) → 0 as u → 0. Taking u = td, we have

  f(x + td) = f(x) + t (⟨d, ∇f(x)⟩ + h(td)‖d‖₂).

For t > 0 small enough, we can make |h(td)| ‖d‖₂ < |⟨d, ∇f(x)⟩|, and so the term inside the parentheses above is negative if ⟨d, ∇f(x)⟩ is negative, and it is positive if ⟨d, ∇f(x)⟩ is positive.

At a particular point x⋆, the only way we can make ⟨d, ∇f(x⋆)⟩ ≥ 0 for all choices of d is if ∇f(x⋆) = 0. So clearly

  x⋆ is a minimizer ⟹ ∇f(x⋆) = 0.

On the other hand, if f is convex, then

  f(x⋆ + td) ≥ f(x⋆) + t ⟨d, ∇f(x⋆)⟩,

for all t ∈ R and choices of d ∈ R^N. This now makes it clear that

  ∇f(x⋆) = 0 ⟹ x⋆ is a minimizer.

Again, for everything we have said in this section, you can use any open domain U in place of R^N.

Optimality conditions: constrained case

In this section, we consider the general constrained problem

  min_{x ∈ C} f(x),

where C is a closed, convex set, and f is again a convex function. We have the following fundamental result.

Let f be a differentiable convex function, and C be a closed convex set. Then x⋆ is a minimizer of

  min_{x ∈ C} f(x)

if and only if x⋆ ∈ C and

  ⟨y − x⋆, ∇f(x⋆)⟩ ≥ 0, for all y ∈ C.

This result is geometrically intuitive; it is saying that every vector from x⋆ to another point y in C must make an obtuse angle with −∇f(x⋆). That is, there cannot be any descent directions from x⋆ that lead to another point in C. Here is a picture:

[Figure: level lines of f(x), the convex set C, the minimizer x⋆ on its boundary, and the gradient ∇f(x⋆).]

To prove this, we first argue that ⟨y − x⋆, ∇f(x⋆)⟩ ≥ 0 for all y ∈ C implies that x⋆ is optimal. Since f is convex, for any y ∈ C,

  f(y) ≥ f(x⋆) + ⟨y − x⋆, ∇f(x⋆)⟩,

and so

  f(y) − f(x⋆) ≥ ⟨y − x⋆, ∇f(x⋆)⟩ ≥ 0.

Since this holds for every y ∈ C, x⋆ is a minimizer.

Now suppose that x⋆ is a minimizer. If there were a y ∈ C such that ⟨y − x⋆, ∇f(x⋆)⟩ < 0, then d = y − x⋆ would be a descent direction, and there would exist a 0 < t < 1 such that

  f(x⋆ + t(y − x⋆)) < f(x⋆).

Since C is convex and x⋆, y ∈ C, we know x⋆ + t(y − x⋆) ∈ C. But this contradicts the assertion that x⋆ is a minimizer, so no such y exists.

Examples

The abstract geometrical result in the previous section will eventually lead us to the Karush-Kuhn-Tucker (KKT) conditions. But we will build up to this by looking at what it tells us in several important (and prevalent) cases. We assume throughout this section that f is convex, differentiable, and defined on all of R^N.
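Before working through the special cases below, the variational inequality itself can be sanity-checked numerically on a one-dimensional toy problem (my own illustrative example, not from the notes, assuming NumPy is available): minimize f(x) = (x − 2)² over C = [0, 1]. Since f is decreasing on C, the minimizer is x⋆ = 1, and every y ∈ C satisfies ⟨y − x⋆, f′(x⋆)⟩ ≥ 0, while an interior non-minimizer fails the test.

```python
import numpy as np

# Toy problem (illustrative): minimize f(x) = (x - 2)^2 over C = [0, 1].
f_prime = lambda x: 2.0 * (x - 2.0)

# f is decreasing on [0, 1], so the constrained minimizer sits at the
# right endpoint of the interval.
x_star = 1.0

# Optimality condition: <y - x*, f'(x*)> >= 0 for every y in C.
ys = np.linspace(0.0, 1.0, 101)
assert all((y - x_star) * f_prime(x_star) >= 0 for y in ys)

# An interior non-minimizer such as x = 0.5 fails: moving toward y = 1
# is a descent direction that stays inside C.
x_bad = 0.5
assert any((y - x_bad) * f_prime(x_bad) < 0 for y in ys)
```

Note that f′(x⋆) = −2 ≠ 0 here: the gradient need not vanish at a constrained minimizer, only the inner-product condition against feasible directions must hold.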

Linear constraints

Consider a convex optimization problem with linear[1] constraints,

  min_x f(x) subject to Ax = b,

where A is M × N and b ∈ R^M. At a solution x⋆, we have

  ⟨y − x⋆, ∇f(x⋆)⟩ ≥ 0, for all y such that Ay = b.

Since Ax⋆ = b as well, this is equivalent to

  ⟨h, ∇f(x⋆)⟩ ≥ 0, for all h ∈ Null(A).

Since h ∈ Null(A) ⟹ −h ∈ Null(A), we must have

  ⟨h, ∇f(x⋆)⟩ = 0, for all h ∈ Null(A),

i.e. the gradient is orthogonal to the null space of A. This means that it is in the row space, ∇f(x⋆) ∈ Range(A^T), and so there is a ν ∈ R^M such that

  ∇f(x⋆) + A^T ν = 0.

[1] We really should be saying "affine constraints", but "linear constraints" is typical nomenclature for this type of problem.

Summary: x⋆ is a solution to

  min_x f(x) subject to Ax = b

if and only if

1. Ax⋆ = b, and
2. there exists ν ∈ R^M such that ∇f(x⋆) + A^T ν = 0.

[Figure: level lines of f(x), the feasible set x⋆ + Null(A), and the gradient ∇f(x⋆) ∈ Range(A^T) orthogonal to it.]
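When f is quadratic, these two conditions can be solved simultaneously as one linear system. A minimal sketch (hypothetical problem of my own choosing, NumPy assumed): for f(x) = ½‖x‖₂², we have ∇f(x) = x, so feasibility and ∇f(x⋆) + A^T ν = 0 stack into a single (N+M) × (N+M) "KKT system".

```python
import numpy as np

# Hypothetical problem: minimize (1/2)||x||^2 subject to Ax = b.
# Here grad f(x) = x, so the optimality conditions
#     x* + A^T nu = 0   and   A x* = b
# form one square linear system in the unknowns (x*, nu).
rng = np.random.default_rng(0)
M, N = 3, 6
A = rng.standard_normal((M, N))
b = rng.standard_normal(M)

K = np.block([[np.eye(N), A.T],
              [A, np.zeros((M, M))]])
rhs = np.concatenate([np.zeros(N), b])
sol = np.linalg.solve(K, rhs)
x_star, nu = sol[:N], sol[N:]

# Verify both conditions from the summary.
assert np.allclose(A @ x_star, b)            # 1. feasibility
assert np.allclose(x_star + A.T @ nu, 0.0)   # 2. gradient in Range(A^T)

# For this particular f, x* is the minimum-norm solution of Ax = b,
# x* = A^T (A A^T)^{-1} b, which gives an independent cross-check.
assert np.allclose(x_star, A.T @ np.linalg.solve(A @ A.T, b))
```

The same block-system structure reappears for any quadratic objective; only the upper-left block of K changes.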

Non-negativity constraints

Now consider the convex program

  min_x f(x) subject to x ≥ 0.

At a solution x⋆, we will have

  ⟨y − x⋆, ∇f(x⋆)⟩ ≥ 0, for all y ∈ R^N_+. (1)

Since both 0 ∈ R^N_+ and 2x⋆ ∈ R^N_+, this means

  ⟨x⋆, ∇f(x⋆)⟩ = 0, (2)

and so

  ⟨y, ∇f(x⋆)⟩ ≥ 0, for all y ∈ R^N_+,

meaning that the gradient has only non-negative values as well,

  ∇f(x⋆) ≥ 0. (3)

The conditions (2) and (3) are sufficient as well, as together they imply (1). Condition (3) is the same as saying there exists a λ ≥ 0 such that

  ∇f(x⋆) − λ = 0.

We can also see that (2) and (3), along with the fact that x⋆ ∈ R^N_+, mean that ∇f(x⋆) and x⋆ can only be non-zero at different indices:

  [∇f(x⋆)]_n > 0 ⟹ x⋆_n = 0,    x⋆_n > 0 ⟹ [∇f(x⋆)]_n = 0.

Summary: x⋆ is a solution to

  min_x f(x) subject to x ≥ 0

if and only if

1. x⋆ ≥ 0, and there exists a λ ∈ R^N such that
2. λ ≥ 0, and
3. λ_n x⋆_n = 0 for all n = 1, ..., N, and
4. ∇f(x⋆) − λ = 0.

[Figure: the non-negative orthant R^N_+, the minimizer x⋆ on its boundary, and the gradient ∇f(x⋆) = λ⋆.]
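Conditions 1–4 can be checked numerically on a toy problem (my own example, not from the notes, assuming NumPy): for f(x) = ‖x − c‖₂², the minimizer over x ≥ 0 simply clips the negative entries of c to zero, and λ = ∇f(x⋆) then satisfies the complementarity conditions index by index.

```python
import numpy as np

# Toy example: f(x) = ||x - c||^2 minimized over x >= 0. The problem is
# separable, so each coordinate solves min_{x_n >= 0} (x_n - c_n)^2,
# whose solution clips negative entries of c to zero.
c = np.array([1.5, -2.0, 0.0, 3.0, -0.5])
x_star = np.maximum(c, 0.0)

grad = 2.0 * (x_star - c)   # grad f(x*); this plays the role of lambda
lam = grad

assert np.all(x_star >= 0)              # 1. primal feasibility
assert np.all(lam >= 0)                 # 2. lambda >= 0
assert np.allclose(lam * x_star, 0.0)   # 3. complementary slackness
# 4. grad f(x*) - lambda = 0 holds by construction (lam = grad), and the
# support pattern matches the summary: lam and x* are nonzero on
# disjoint index sets.
assert not np.any((lam > 0) & (x_star > 0))
```

Note how the nonzero entries of λ sit exactly at the indices where the constraint x ≥ 0 is active (x⋆_n = 0).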

A single convex inequality constraint

Now consider the convex program

  min_x f(x) subject to g(x) ≤ 0,

where g is also a differentiable convex function. We will argue that in this case, the optimality conditions for x⋆,

  g(x⋆) ≤ 0, and ⟨y − x⋆, ∇f(x⋆)⟩ ≥ 0 for all y with g(y) ≤ 0,

are equivalent to one of these two conditions holding:

1. g(x⋆) < 0 and ∇f(x⋆) = 0, or
2. g(x⋆) = 0 and the gradients of g and f are negatively aligned:

  ∇g(x⋆) = −λ ∇f(x⋆), for some λ > 0.

Establishing this relies on the following geometric fact[2]: Let u, v be vectors in R^N. If no d exists such that

  ⟨d, u⟩ < 0 and ⟨d, v⟩ < 0 simultaneously, (4)

then u and v are negatively aligned,

  u = −λv, for some λ > 0. (5)

The converse also holds, as if (5) is true, there is no way (4) can be true.

[2] This is a special case of the famous Gordan Theorem.

The argument for this is simple. The sets {x : ⟨x, u⟩ < 0} and {x : ⟨x, v⟩ < 0} are open half spaces, and these half spaces are disjoint if and only if (5) holds.

[Figure: negatively aligned vectors u and v, with the hyperplanes {x : ⟨x, u⟩ = 0} and {x : ⟨x, v⟩ = 0}.]

Suppose now that there is an x⋆ such that g(x⋆) = 0 and a λ⋆ > 0 so that ∇g(x⋆) = −λ⋆ ∇f(x⋆). Let x be any other feasible point: g(x) ≤ 0. Then, by the convexity of g,

  g(x⋆ + θ(x − x⋆)) ≤ 0, for all 0 ≤ θ ≤ 1.

Since the above is true for all θ in this range, we know that x − x⋆ cannot be an ascent direction for g from x⋆. Thus

  ⟨x − x⋆, ∇g(x⋆)⟩ ≤ 0.

Since ∇g(x⋆) = −λ⋆ ∇f(x⋆), we now know

  ⟨x − x⋆, ∇f(x⋆)⟩ ≥ 0.

Then by the convexity of f,

  f(x) ≥ f(x⋆) + ⟨x − x⋆, ∇f(x⋆)⟩ ≥ f(x⋆),

and so x⋆ is a minimizer.

Now suppose that x⋆, with g(x⋆) ≤ 0, is a minimizer. We know that ∇g(x⋆) and ∇f(x⋆) must be negatively aligned, as otherwise our geometric fact dictates that there is a d that is a descent direction for both g and f, meaning there is a 0 < t < 1 such that

  f(x⋆ + td) < f(x⋆), and g(x⋆ + td) < g(x⋆) ≤ 0.

This would mean that there is a feasible point at which f is smaller than it is at x⋆, directly contradicting the assertion that x⋆ is a minimizer. Thus no such d can exist.

We can collect all of this into the following summary: x⋆ is a solution to

  min_x f(x) subject to g(x) ≤ 0,

if and only if

1. g(x⋆) ≤ 0, and there exists a λ⋆ ∈ R such that
2. λ⋆ ≥ 0, and
3. λ⋆ g(x⋆) = 0, and
4. ∇f(x⋆) + λ⋆ ∇g(x⋆) = 0.

[Figure: level lines of f(x), the feasible set {x : g(x) ≤ 0}, the minimizer x⋆ on its boundary, and the opposing gradients ∇f(x⋆) and λ⋆ ∇g(x⋆).]
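To close, the four conditions in the summary can be verified by hand on a small instance (illustrative example of my own, not from the notes): minimize f(x) = (x − 4)² subject to g(x) = x² − 1 ≤ 0. The unconstrained minimizer x = 4 is infeasible, so the constraint is active at the solution x⋆ = 1, and solving condition 4 for the multiplier gives λ⋆ = 3 > 0.

```python
# Illustrative instance: f(x) = (x - 4)^2, g(x) = x^2 - 1 <= 0.
# The feasible set is [-1, 1]; f decreases toward 4, so the solution
# sits on the boundary at x* = 1 with the constraint active.
f_prime = lambda x: 2.0 * (x - 4.0)
g = lambda x: x**2 - 1.0
g_prime = lambda x: 2.0 * x

x_star = 1.0
# Solve condition 4 for the multiplier: f'(x*) + lam * g'(x*) = 0.
lam = -f_prime(x_star) / g_prime(x_star)   # lam = 6/2 = 3

assert g(x_star) <= 0                                 # 1. feasibility
assert lam >= 0                                       # 2. lam >= 0
assert lam * g(x_star) == 0                           # 3. complementarity
assert f_prime(x_star) + lam * g_prime(x_star) == 0   # 4. stationarity
```

Here the gradients are negatively aligned as the notes predict: f′(x⋆) = −6 and g′(x⋆) = 2 point in opposite directions, with λ⋆ = 3 as the scaling.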