Lecture 2

In this lecture we will study the unconstrained problem

    minimize f(x),                                                    (2.1)

where $x \in \mathbb{R}^n$. Optimality conditions aim to identify properties that potential minimizers need to satisfy in relation to $f(x)$. We will review the well-known local optimality conditions for differentiable functions from calculus. We then introduce convex functions and discuss some of their properties.

2.1 Unconstrained optimization

Solutions to (2.1) come in different flavours, as in the following definition.

Definition 2.1. A point $x^* \in \mathbb{R}^n$ is
- a global minimizer of (2.1) if for all $x \in \mathbb{R}^n$, $f(x^*) \leq f(x)$;
- a local minimizer, if there is an open neighbourhood $U$ of $x^*$ such that $f(x^*) \leq f(x)$ for all $x \in U$;
- a strict local minimizer, if there is an open neighbourhood $U$ of $x^*$ such that $f(x^*) < f(x)$ for all $x \in U \setminus \{x^*\}$;
- an isolated minimizer, if there is an open neighbourhood $U$ of $x^*$ such that $x^*$ is the only local minimizer in $U$.

Without any further assumptions on $f$, finding a minimizer is a hopeless task: we simply can't examine the function at all points in $\mathbb{R}^n$. The situation becomes more tractable if we assume some smoothness conditions. Recall that $C^k(U)$ denotes the set of functions that are $k$ times continuously differentiable on some set $U$. We write $\nabla f(x)$ for the gradient of $f$ at $x$, i.e., the vector

    $\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right).$
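As a quick numerical illustration of the gradient (a minimal numpy sketch; the quadratic test function, the helper names and the evaluation points are arbitrary choices made for illustration), we can compare the analytic gradient with a central finite-difference approximation and observe that it vanishes at the minimizer:

import numpy as np

# Simple smooth test function f(x1, x2) = (x1 - 1)^2 + 2*(x2 + 3)^2,
# whose unique minimizer is x* = (1, -3).
def f(x):
    return (x[0] - 1)**2 + 2*(x[1] + 3)**2

def grad_f(x):
    # analytic gradient of f
    return np.array([2*(x[0] - 1), 4*(x[1] + 3)])

def numerical_grad(f, x, h=1e-6):
    # central finite differences, one coordinate at a time
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x))
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)
    return g

x0 = np.array([0.5, 2.0])
print(grad_f(x0), numerical_grad(f, x0))      # the two gradients agree
print(grad_f(np.array([1.0, -3.0])))          # the gradient vanishes at the minimizer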

The following first-order necessary condition for optimality is well known.

Theorem 2.2. Let $x^*$ be a local minimizer of $f$ and assume that $f \in C^1(U)$ for a neighbourhood $U$ of $x^*$. Then $\nabla f(x^*) = 0$.

There are simple examples that show that this is not a sufficient condition: maxima and saddle points will also have a vanishing gradient. If we have access to second-order information, in the form of the second derivative, or Hessian, of $f$, then we can say more. Recall that the Hessian of $f$ at $x$, $\nabla^2 f(x)$, is the $n \times n$ symmetric matrix given by the second derivatives,

    $\nabla^2 f(x) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j} \right)_{1 \leq i, j \leq n}.$

In the one-variable case we have learned that if $x^*$ is a local minimizer of $f \in C^2([a,b])$, then $f'(x^*) = 0$ and $f''(x^*) \geq 0$. Moreover, the conditions $f'(x^*) = 0$ and $f''(x^*) > 0$ guarantee that we have a local minimizer. These conditions generalise to higher dimensions, but first we need to know what $f''(x) > 0$ means when we have more than one variable. Recall also that a matrix $A$ is positive semidefinite, written $A \succeq 0$, if for every $x \in \mathbb{R}^n$, $x^T A x \geq 0$, and positive definite, written $A \succ 0$, if $x^T A x > 0$ for every $x \neq 0$. The property that the Hessian matrix is positive semidefinite is a multivariate generalization of the property that the second derivative is nonnegative. The known conditions for a minimizer involving the second derivative generalize accordingly.

Theorem 2.3. Let $f \in C^2(U)$ for some open set $U$ and $x^* \in U$. If $x^*$ is a local minimizer, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semidefinite. Conversely, if $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, then $x^*$ is a strict local minimizer.

Unfortunately, the above criteria are not able to identify global minimizers, as differentiability is a local property.

2.2 Convex functions

We now come to the central notion of this course.

Definition 2.4. A set $C \subseteq \mathbb{R}^n$ is convex if for all $x, y \in C$ and $\lambda \in [0,1]$, the point $\lambda x + (1-\lambda) y \in C$. A convex body is a convex set that is closed and bounded.

Definition 2.5. Let $S \subseteq \mathbb{R}^n$. A function $f \colon S \to \mathbb{R}$ is called convex if $S$ is convex and for all $x, y \in S$ and $\lambda \in [0,1]$,

    $f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y).$

The function $f$ is called strictly convex if

    $f(\lambda x + (1-\lambda) y) < \lambda f(x) + (1-\lambda) f(y)$

for all $x \neq y$ and $\lambda \in (0,1)$. A function $f$ is called concave if $-f$ is convex.
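As a sanity check of Definition 2.5 (a minimal sketch; the choice $f(x) = \|x\|^2$ and the random sample points are arbitrary, not taken from the notes), one can sample pairs of points and values of $\lambda$ and verify the convexity inequality numerically:

import numpy as np

# f(x) = ||x||^2 is convex; sample random points and values of lambda
# and check f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y).
def f(x):
    return np.dot(x, x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lam = rng.uniform()
    lhs = f(lam*x + (1 - lam)*y)
    rhs = lam*f(x) + (1 - lam)*f(y)
    assert lhs <= rhs + 1e-12
print("convexity inequality holds on all sampled points")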

Figure 2.1: A convex set and a convex function

Figure 2.1 illustrates what a convex function of one variable looks like. The graph of the function lies below any line segment connecting two points on it. Convex functions have pleasant properties, while at the same time covering many of the functions that arise in applications. Perhaps the most important property is that local minima are global minima.

Theorem 2.6. Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be a convex function. Then any local minimizer of $f$ is a global minimizer.

Proof. Let $x^*$ be a local minimizer and assume that it is not a global minimizer. Then there exists a vector $y \in \mathbb{R}^n$ such that $f(y) < f(x^*)$. Since $f$ is convex, for any $\lambda \in (0,1]$ and $x = \lambda y + (1-\lambda) x^*$ we have

    $f(x) \leq \lambda f(y) + (1-\lambda) f(x^*) < \lambda f(x^*) + (1-\lambda) f(x^*) = f(x^*).$

This holds for all such $x$ on the line segment connecting $y$ and $x^*$. Since every open neighbourhood $U$ of $x^*$ contains a bit of this line segment, this means that every open neighbourhood $U$ of $x^*$ contains an $x \neq x^*$ such that $f(x) < f(x^*)$, in contradiction to the assumption that $x^*$ is a local minimizer. It follows that $x^*$ has to be a global minimizer.

Remark 2.7. Note that in the above theorem we made no assumptions about the differentiability of the function $f$! In fact, while a convex function is always continuous, it need not be differentiable. The function $f(x) = |x|$ is a typical example: it is convex, but not differentiable at $x = 0$.

Example 2.8. Affine functions $f(x) = \langle x, a \rangle + b$ and the exponential function $e^x$ are examples of convex functions.

Example 2.9. In optimization we will often work with functions of matrices, where an $m \times n$ matrix is considered as a vector in $\mathbb{R}^{m \times n} \cong \mathbb{R}^{mn}$. If the matrix is symmetric, that is, if $A^T = A$, then we only care about the entries on and above the diagonal, and we consider the space $\mathcal{S}^n$ of symmetric matrices as a vector space of dimension $n(n+1)/2$ (the number of entries on and above the main diagonal). Important functions on symmetric matrices that are convex are the operator norm $\|A\|_2$, defined as

    $\|A\|_2 := \max_{x \colon \|x\|_2 \leq 1} \frac{\|Ax\|_2}{\|x\|_2},$

and the function $-\log \det(X)$, defined on the set of positive definite symmetric matrices $\mathcal{S}^n_{++}$.
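The convexity of the operator norm claimed in Example 2.9 can also be checked numerically (a small sketch; the dimension and the random symmetric test matrices are arbitrary, and np.linalg.norm with ord=2 is used to evaluate $\|\cdot\|_2$):

import numpy as np

# Check the convexity inequality for the operator norm ||.||_2 on
# random symmetric matrices (np.linalg.norm(A, 2) is the spectral norm).
rng = np.random.default_rng(1)

def random_symmetric(n):
    B = rng.standard_normal((n, n))
    return (B + B.T) / 2

for _ in range(200):
    A1, A2 = random_symmetric(4), random_symmetric(4)
    lam = rng.uniform()
    lhs = np.linalg.norm(lam*A1 + (1 - lam)*A2, 2)
    rhs = lam*np.linalg.norm(A1, 2) + (1 - lam)*np.linalg.norm(A2, 2)
    assert lhs <= rhs + 1e-10
print("operator norm satisfies the convexity inequality on all samples")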

There are useful ways of characterising convexity using differentiability.

Theorem 2.10.
1. Let $f \in C^1(\mathbb{R}^n)$. Then $f$ is convex if and only if for all $x, y \in \mathbb{R}^n$,

    $f(y) \geq f(x) + \nabla f(x)^T (y - x).$

2. Let $f \in C^2(\mathbb{R}^n)$. Then $f$ is convex if and only if $\nabla^2 f(x)$ is positive semidefinite for all $x$. If $\nabla^2 f(x)$ is positive definite for all $x$, then $f$ is strictly convex.

Example 2.11. Consider a quadratic function of the form

    $f(x) = \frac{1}{2} x^T A x + b^T x + c,$

where $A \in \mathbb{R}^{n \times n}$ is symmetric. Writing out the product, we get

    $x^T A x = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix} \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix} \begin{pmatrix} a_{11} x_1 + \cdots + a_{1n} x_n \\ \vdots \\ a_{n1} x_1 + \cdots + a_{nn} x_n \end{pmatrix} = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j.$

Because $A$ is symmetric, we have $a_{ij} = a_{ji}$, and the above product simplifies to

    $x^T A x = \sum_{i=1}^n a_{ii} x_i^2 + 2 \sum_{1 \leq i < j \leq n} a_{ij} x_i x_j.$

This is a quadratic function, because it involves products of the $x_i$. The gradient and the Hessian of $f(x)$ are found by computing the partial derivatives of $f$:

    $\frac{\partial f}{\partial x_i} = \sum_{j=1}^n a_{ij} x_j + b_i, \qquad \frac{\partial^2 f}{\partial x_i \partial x_j} = a_{ij}.$

In summary, we have

    $\nabla f(x) = A x + b, \qquad \nabla^2 f(x) = A.$

Using the previous theorem, we see that $f$ is convex if and only if $A$ is positive semidefinite. A typical example for such a function is

    $f(x) = \|Ax - b\|^2 = (Ax - b)^T (Ax - b) = x^T A^T A x - 2 b^T A x + b^T b.$

The matrix $A^T A$ is always symmetric and positive semidefinite (why?), so that the function $f$ is convex.
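The formula $\nabla f(x) = Ax + b$ and the positive semidefiniteness of $A^T A$ can be verified numerically (a minimal sketch; the dimensions, the random data and the choice $c = 0$ are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
n = 4

# A symmetric matrix A and a vector b define f(x) = 0.5*x^T A x + b^T x (here c = 0)
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
b = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x + b @ x

# The finite-difference gradient at a random point matches Ax + b
x = rng.standard_normal(n)
h = 1e-6
num_grad = np.array([(f(x + h*np.eye(n)[i]) - f(x - h*np.eye(n)[i])) / (2*h)
                     for i in range(n)])
print(np.allclose(num_grad, A @ x + b, atol=1e-4))

# A^T A is symmetric positive semidefinite: all eigenvalues are nonnegative
M = rng.standard_normal((6, n))
print(np.all(np.linalg.eigvalsh(M.T @ M) >= -1e-10))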

A convenient way to visualise a function $f \colon \mathbb{R}^2 \to \mathbb{R}$ is through contour plots. A level set of the function $f$ is a set of the form $\{x : f(x) = c\}$, where $c$ is the level. Each such level set is a curve in $\mathbb{R}^2$, and a contour plot is a plot of a collection of such curves for various $c$. If one colours the areas between adjacent curves, one gets a plot as in the following figure. A convex function has the property that there is only one sink in the contour plot.

In [1]: import numpy as np
        import numpy.linalg as la
        import matplotlib.pyplot as plt
        %matplotlib inline

        # Create random data: we use the randn function
        X = np.random.randn(3,2)
        y = np.random.randn(3)

        # Solve the least squares problem: minimize ||X beta - y||^2.
        # The index [0] picks out the solution vector beta
        # (the function lstsq gives more output than just the beta vector).
        beta = la.lstsq(X, y, rcond=None)[0]

        # Create the objective function and plot its contours
        def f(a, b):
            return np.sum((a*X[:,0] + b*X[:,1] - y)**2)

        # Find the "right" boundaries around the minimum
        xx = np.linspace(beta[0]-8, beta[0]+8, 100)
        yy = np.linspace(beta[1]-8, beta[1]+8, 100)
        XX, YY = np.meshgrid(xx, yy)

        # Evaluate f on the grid
        Z = np.zeros(XX.shape)
        for i in range(Z.shape[0]):
            for j in range(Z.shape[1]):
                Z[i,j] = f(XX[i,j], YY[i,j])

        plt.contourf(XX, YY, Z, cmap="coolwarm")
        plt.show()
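Running the cell produces a filled contour plot of the least-squares objective. By Example 2.11 its Hessian is $2 X^T X$, which is positive semidefinite (and, for randomly drawn $X$, almost surely positive definite), so the level sets are nested ellipses centred at the minimizer $\beta$ and the plot has a single sink, as expected for a convex function.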