Math 273a: Optimization, Basic concepts. Instructor: Wotao Yin, Department of Mathematics, UCLA, Spring 2015. Slides based on Chong-Zak, 4th Ed.

Goals of this lecture. The general form of an optimization problem: minimize f(x) subject to x ∈ Ω. We study the following topics: terminology, types of minimizers, and optimality conditions.

Unconstrained vs. constrained optimization. Consider minimize f(x) subject to x ∈ Ω, where x ∈ R^n; the set Ω is called the feasible set. If Ω = R^n, then the problem is called unconstrained; otherwise, the problem is called constrained. In general, more sophisticated techniques are needed to solve constrained problems.

(Off topic) Later, we will study some nonsmooth analysis and algorithms that allow f to take the extended value +∞. Then we can write any constrained problem in the unconstrained form minimize f(x) + ι_Ω(x), where the indicator function is ι_Ω(x) = 0 if x ∈ Ω and +∞ if x ∉ Ω. The objective function f(x) + ι_Ω(x) is nonsmooth.
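
A minimal numerical sketch of this reformulation (the box Ω = [0, 1]² and the quadratic objective below are made-up illustrations, not from the slides):

    import numpy as np

    def indicator(x, in_set):
        # Indicator function of a set: 0 on the set, +infinity off it.
        return 0.0 if in_set(x) else np.inf

    # Hypothetical feasible set Omega = [0, 1]^2 and a hypothetical objective f.
    f = lambda x: (x[0] - 2.0)**2 + (x[1] + 1.0)**2
    in_box = lambda x: bool(np.all((x >= 0.0) & (x <= 1.0)))
    F = lambda x: f(x) + indicator(x, in_box)   # unconstrained, extended-valued objective

    print(F(np.array([0.5, 0.5])))   # finite value: the point is feasible
    print(F(np.array([2.0, 2.0])))   # inf: the point is infeasible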

Types of solutions. x* is a local minimizer if there is ε > 0 such that f(x) ≥ f(x*) for all x ∈ Ω \ {x*} with ‖x − x*‖ < ε. x* is a global minimizer if f(x) ≥ f(x*) for all x ∈ Ω \ {x*}. If ≥ is replaced with >, they are a strict local minimizer and a strict global minimizer, respectively. (Figure: x_1 is a strict global minimizer; x_2 is a strict local minimizer; x_3 is a local minimizer.)

Convexity and global minimizers. A set Ω is convex if λx + (1 − λ)y ∈ Ω for any x, y ∈ Ω and λ ∈ [0, 1]. A function f is convex if f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for any x, y ∈ Ω and λ ∈ [0, 1]. A function is convex if and only if its epigraph (the set of points on or above its graph) is convex. An optimization problem is convex if both the objective function and the feasible set are convex. Theorem: any local minimizer of a convex optimization problem is a global minimizer.
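
As a sanity check of the definition (evidence, not a proof), one can sample the convexity inequality at random points; the convex quadratic below is a made-up example:

    import numpy as np

    f = lambda x: 0.5 * (x @ x) + np.sum(x)   # a convex quadratic, chosen only for illustration

    rng = np.random.default_rng(0)
    for _ in range(1000):
        x, y = rng.normal(size=3), rng.normal(size=3)
        lam = rng.uniform()
        lhs = f(lam * x + (1 - lam) * y)
        rhs = lam * f(x) + (1 - lam) * f(y)
        assert lhs <= rhs + 1e-12   # f(lam x + (1-lam) y) <= lam f(x) + (1-lam) f(y)
    print("convexity inequality held on all sampled pairs")

A single violating triple (x, y, λ) would certify non-convexity; passing the check only supports it.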

Derivatives. First-order derivative of f at x: Df(x) = [∂f/∂x_1, ..., ∂f/∂x_n], a row vector. Gradient of f: ∇f = (Df)^T, which is a column vector. The gradient represents the slope of the tangent to the graph of the function. It gives the linear approximation of f at a point, and it points in the direction of the greatest rate of increase.

Hessian (i.e., second derivative) of f: the symmetric matrix F(x) with entries (F(x))_{ij} = ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i. For a one-dimensional function f(x) with x ∈ R, it reduces to f''(x). F(x) is the Jacobian of ∇f(x), that is, F(x) = J(∇f(x)). Alternative notation: H(x) and ∇²f(x) are also used for the Hessian. The Hessian gives a quadratic approximation of f at a point. The gradient and Hessian are local properties that help us recognize local solutions and determine a direction to move toward the next point.

Example. Consider f(x_1, x_2) = x_1³ + x_1² − x_1x_2 + x_2² + 5x_1 + 8x_2 + 4. Then ∇f(x) = [3x_1² + 2x_1 − x_2 + 5, −x_1 + 2x_2 + 8]^T ∈ R² and F(x) = [[6x_1 + 2, −1], [−1, 2]] ∈ R^{2×2}. Observation: if f is a quadratic function (remove x_1³ in the above example), then ∇f(x) is an affine vector function and F(x) is a constant symmetric matrix for any x.
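
The gradient and Hessian above (as reconstructed) can be checked against central finite differences; the test point below is arbitrary:

    import numpy as np

    f    = lambda x: x[0]**3 + x[0]**2 - x[0]*x[1] + x[1]**2 + 5*x[0] + 8*x[1] + 4
    grad = lambda x: np.array([3*x[0]**2 + 2*x[0] - x[1] + 5, -x[0] + 2*x[1] + 8])
    hess = lambda x: np.array([[6*x[0] + 2, -1.0], [-1.0, 2.0]])

    def num_grad(f, x, h=1e-6):
        # Central-difference approximation of the gradient.
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x); e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2 * h)
        return g

    x = np.array([1.0, -2.0])
    print(np.allclose(grad(x), num_grad(f, x), atol=1e-5))   # analytic gradient matches
    print(np.allclose(hess(x), hess(x).T))                   # the Hessian is symmetric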

Taylor expansion. Suppose φ ∈ C^m (m times continuously differentiable). The Taylor expansion of φ at a point a is φ(a + h) = φ(a) + φ'(a)h + (φ''(a)/2!)h² + ... + (φ^(m)(a)/m!)h^m + o(h^m). There are other ways to write the last two terms. Example: consider x, d ∈ R^n and f ∈ C². Define φ(α) = f(x + αd). Then φ'(α) = ∇f(x + αd)^T d and φ''(α) = d^T F(x + αd) d. Hence, f(x + αd) = f(x) + (∇f(x)^T d) α + o(α) = f(x) + (∇f(x)^T d) α + (d^T F(x) d / 2) α² + o(α²). (Here o(α) collects the terms asymptotically smaller than α near 0, that is, o(α)/α → 0 as α → 0.)
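
A quick numerical check of the second-order expansion, reusing the example function above (the point x and direction d are arbitrary): the remainder divided by α² should shrink as α → 0.

    import numpy as np

    f    = lambda x: x[0]**3 + x[0]**2 - x[0]*x[1] + x[1]**2 + 5*x[0] + 8*x[1] + 4
    grad = lambda x: np.array([3*x[0]**2 + 2*x[0] - x[1] + 5, -x[0] + 2*x[1] + 8])
    hess = lambda x: np.array([[6*x[0] + 2, -1.0], [-1.0, 2.0]])

    x = np.array([1.0, -2.0])
    d = np.array([1.0, 1.0])
    for alpha in [1e-1, 1e-2, 1e-3]:
        second_order = f(x) + alpha * (grad(x) @ d) + 0.5 * alpha**2 * (d @ hess(x) @ d)
        remainder = abs(f(x + alpha * d) - second_order)
        print(alpha, remainder / alpha**2)   # the ratio tends to 0, consistent with o(alpha^2)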

Feasible direction. A vector d ∈ R^n is a feasible direction at x ∈ Ω if d ≠ 0 and x + αd ∈ Ω for all sufficiently small α > 0. (It is possible that d is an infeasible step, that is, x + d ∉ Ω. But if there is some room in Ω to move from x in the direction d, then d is a feasible direction.) (Figure: d_1 is feasible, d_2 is infeasible.) If Ω = R^n or x lies in the interior of Ω, then any d ∈ R^n \ {0} is a feasible direction. Feasible directions are introduced to establish optimality conditions, especially for points on the boundary of the feasible set of a constrained problem.

First-order necessary condition. Let C¹ be the set of continuously differentiable functions. Theorem 6.1 (FONC): suppose f ∈ C¹ and x* is a local minimizer of f over Ω; then d^T ∇f(x*) ≥ 0 for every feasible direction d at x*. Proof: let d be any feasible direction. First-order Taylor expansion: f(x* + αd) = f(x*) + α d^T ∇f(x*) + o(α). If d^T ∇f(x*) < 0, which does not depend on α, then f(x* + αd) < f(x*) for all sufficiently small α > 0 (that is, all α ∈ (0, ᾱ) for some ᾱ > 0). This is a contradiction since x* is a local minimizer.

Corollary (FONC at an interior point): if, in addition, x* lies in the interior of Ω (in particular, if the problem is unconstrained), then ∇f(x*) = 0. Proof: since any d ∈ R^n \ {0} is a feasible direction, we can set d = −∇f(x*). From Theorem 6.1, we have d^T ∇f(x*) = −‖∇f(x*)‖² ≥ 0. Since ‖∇f(x*)‖² ≥ 0, we have ‖∇f(x*)‖² = 0 and thus ∇f(x*) = 0. Comment: this condition also reduces the problem minimize f(x) to solving the equation ∇f(x*) = 0.
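
To illustrate the comment with a made-up convex quadratic (not from the slides): for f(x) = (1/2) x^T Q x − b^T x with Q symmetric positive definite, solving ∇f(x) = Qx − b = 0 delivers the minimizer directly.

    import numpy as np

    Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # assumed symmetric positive definite
    b = np.array([1.0, -1.0])
    f = lambda x: 0.5 * (x @ Q @ x) - b @ x

    x_star = np.linalg.solve(Q, b)           # solve grad f(x) = Qx - b = 0
    print(x_star)

    # Nearby points never do better, consistent with x_star being the (global) minimizer.
    rng = np.random.default_rng(1)
    print(all(f(x_star + 1e-3 * rng.normal(size=2)) >= f(x_star) for _ in range(100)))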

(Figure: x_1 fails to satisfy the FONC; x_2 satisfies the FONC.)

Second-order necessary condition. In the FONC, there are two possibilities: d^T ∇f(x*) > 0 or d^T ∇f(x*) = 0. In the first case, f(x* + αd) > f(x*) for all sufficiently small α > 0. In the second case, the vanishing d^T ∇f(x*) allows us to check higher-order derivatives.

Let C² be the set of twice continuously differentiable functions. Theorem (SONC): suppose f ∈ C², x* is a local minimizer of f over Ω, and d is a feasible direction at x*; if d^T ∇f(x*) = 0, then d^T F(x*) d ≥ 0. Proof: assume there is a feasible direction d with d^T ∇f(x*) = 0 and d^T F(x*) d < 0. By the second-order Taylor expansion (with a vanishing first-order term), we have f(x* + αd) = f(x*) + (d^T F(x*) d / 2) α² + o(α²), where by our assumption d^T F(x*) d < 0. Hence, for all sufficiently small α > 0, we have f(x* + αd) < f(x*), which contradicts that x* is a local minimizer.

The necessary conditions are not sufficient. Counterexamples: f(x) = x³, with f'(x) = 3x² and f''(x) = 6x; at x = 0 both the FONC and the SONC hold, yet 0 is not a local minimizer. f(x) = x_1² − x_2²; 0 is a saddle point: ∇f(0) = 0, but 0 is neither a local minimizer nor a local maximizer. By the SONC, 0 is not a local minimizer!
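
The saddle example can be inspected numerically: at 0 the gradient of f(x) = x_1² − x_2² vanishes, but the Hessian has a negative eigenvalue, so the SONC fails (a sketch using the eigenvalue test for definiteness):

    import numpy as np

    grad = lambda x: np.array([2 * x[0], -2 * x[1]])   # gradient of x1^2 - x2^2
    H = np.array([[2.0, 0.0], [0.0, -2.0]])            # its (constant) Hessian

    print(grad(np.zeros(2)))        # [0. 0.]  -> FONC is satisfied at 0
    print(np.linalg.eigvalsh(H))    # [-2. 2.] -> indefinite Hessian, so the SONC fails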

Second-order sufficient condition. Theorem (SOSC, interior case): suppose f ∈ C² and x* is an interior point of Ω. If (1) ∇f(x*) = 0 and (2) F(x*) is positive definite, then x* is a strict local minimizer. Comments: part 2 states that F(x*) is positive definite: x^T F(x*) x > 0 for all x ≠ 0. The condition is not necessary for a strict local minimizer (e.g., f(x) = x⁴ has a strict minimizer at 0 with f''(0) = 0). Proof: for any d ≠ 0 with ‖d‖ = 1, we have d^T F(x*) d ≥ λ_min(F(x*)) > 0. Use the second-order Taylor expansion: f(x* + αd) = f(x*) + (α²/2) d^T F(x*) d + o(α²) ≥ f(x*) + (α²/2) λ_min(F(x*)) + o(α²). Then there exists ᾱ > 0, independent of d, such that f(x* + αd) > f(x*) for all α ∈ (0, ᾱ).

(Figure: graph of f(x) = x_1² + x_2². The point 0 satisfies the SOSC.)
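
A matching SOSC check for f(x) = x_1² + x_2² at x* = 0 (a sketch: positive definiteness verified through the smallest eigenvalue):

    import numpy as np

    grad = lambda x: np.array([2 * x[0], 2 * x[1]])   # gradient of x1^2 + x2^2
    H = np.array([[2.0, 0.0], [0.0, 2.0]])            # its (constant) Hessian

    print(grad(np.zeros(2)))                  # [0. 0.] -> the gradient vanishes at 0
    print(np.linalg.eigvalsh(H).min() > 0)    # True   -> the Hessian is positive definite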

Roles of optimality conditions. Recognize a solution: given a candidate solution, check the optimality conditions to verify that it is a solution. Measure the quality of an approximate solution: measure how close a point is to being a solution. Develop algorithms: reduce an optimization problem to solving a (nonlinear) equation (finding a root of the gradient). Later, we will see other forms of optimality conditions and how they lead to equivalent subproblems, as well as algorithms.

Quiz questions. 1. Show that for Ω = {x ∈ R^n : Ax = b}, d ≠ 0 is a feasible direction at x ∈ Ω if and only if Ad = 0. 2. Show that for any unconstrained quadratic program, which has the form minimize f(x) := (1/2) x^T Q x − b^T x, if x* satisfies the second-order necessary condition, then x* is a global minimizer. 3. Show that for any unconstrained quadratic program with Q ⪰ 0 (Q symmetric and positive semidefinite), x* is a global minimizer if and only if x* satisfies the first-order necessary condition; that is, the problem is equivalent to solving Qx = b. 4. Consider minimize c^T x subject to x ∈ Ω. Suppose that c ≠ 0 and the problem has a global minimizer. Can the minimizer lie in the interior of Ω?