OPER 627: Nonlinear Optimization Lecture 2: Math Background and Optimality Conditions

Department of Statistical Sciences and Operations Research, Virginia Commonwealth University. Aug 28, 2013.

Quiz
What is my son's name? Where was my best picture taken?
Yes/No questions:
- A convergent sequence could only have one accumulation point
- The sequence {kπ − ⌊kπ⌋}_{k=1}^∞ has many, but only a finite number of, accumulation points
- We will study optimization problems with a nondifferentiable objective function
- A finite union of open sets is open
- An infinite union of closed sets is closed
- Lipschitz continuous functions must be continuous

Today's Outline
- More math background
- Optimality conditions

Cal-Cool-Less
The gradient of a function f : R^n → R is denoted by
$$\nabla f(x) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} \\ \vdots \\ \dfrac{\partial f}{\partial x_n} \end{bmatrix}$$
The Hessian of a function f is the matrix of second partial derivatives:
$$\nabla^2 f(x) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
The Hessian matrix ∇²f(x) is symmetric if f is twice continuously differentiable.
Chain rule: ∇[h(g(x))] = h′(g(x)) ∇g(x)
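To make the calculus concrete, here is a minimal Python sketch (not from the lecture; the test function f and the finite-difference helpers are illustrative assumptions) that approximates the gradient and Hessian numerically and checks that the Hessian is symmetric for a twice continuously differentiable f.

```python
import numpy as np

def f(x):
    # Illustrative smooth test function f : R^2 -> R (an assumption, not from the slides)
    return x[0]**2 + 3*x[0]*x[1] + np.sin(x[1])

def grad_fd(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hess_fd(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

x0 = np.array([1.0, 2.0])
print(grad_fd(f, x0))                  # close to [2*1 + 3*2, 3*1 + cos(2)]
H = hess_fd(f, x0)
print(np.allclose(H, H.T, atol=1e-4))  # symmetric, since f is twice continuously differentiable
```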

Mean-value theorem (first-order expansion)
Let f : R^n → R be continuously differentiable over an open set I. Then for x ∈ I and p ∈ R^n (with the segment from x to x + p contained in I):
1. There exists t ∈ (0, 1) such that f(x + p) = f(x) + ∇f(x + tp)ᵀ p
2. f(x + p) = f(x) + ∇f(x)ᵀ p + o(‖p‖)
Notation o(·): h(α) is o(α) if lim_{α→0} h(α)/α = 0
Geometry: the gradient ∇f(x) captures the local trend of f at the point x
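A quick one-dimensional sanity check of statement 1 (my own example, not on the slide): take f(x) = x², x = 1, p = 1, so the intermediate point t can be solved for explicitly:
$$f(x+p) = f(2) = 4, \qquad f(x) + f'(x+tp)\,p = 1 + 2(1+t) \;\Rightarrow\; 1 + 2(1+t) = 4 \;\Rightarrow\; t = \tfrac12 \in (0,1).$$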

Second-order expansion
Let f : R^n → R be twice continuously differentiable over an open set I. For x, x + y ∈ I (with the segment from x to x + y contained in I):
1. ∇f(x + y) = ∇f(x) + ∫₀¹ ∇²f(x + ty) y dt
2. There exists α ∈ (0, 1) such that f(x + y) = f(x) + yᵀ∇f(x) + ½ yᵀ∇²f(x + αy) y
3. f(x + y) = f(x) + yᵀ∇f(x) + ½ yᵀ∇²f(x) y + o(‖y‖²)
When ∇f(x) = 0, ∇²f(x) becomes the principal factor.
That's it! I promise we will only see ∇f and ∇²f in this course!
In general, Taylor's expansion (in one dimension): f(x + y) = f(x) + Σ_{k=1}^{n} (1/k!) f^{(k)}(x) yᵏ + o(|y|ⁿ)
People make fun of optimization: optimization = Taylor's expansion with n = 2
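The o(‖y‖²) claim in statement 3 can be checked numerically. The following sketch (illustrative; the function and the point are my own choices, not from the lecture) shows the error of the quadratic model divided by ‖y‖² shrinking as y → 0.

```python
import numpy as np

def f(x):
    # Illustrative smooth function (an assumption, not from the slides)
    return np.exp(x[0]) + x[0] * x[1]**2

def grad(x):
    return np.array([np.exp(x[0]) + x[1]**2, 2 * x[0] * x[1]])

def hess(x):
    return np.array([[np.exp(x[0]), 2 * x[1]],
                     [2 * x[1],     2 * x[0]]])

x = np.array([0.5, -1.0])
d = np.array([1.0, 2.0])            # fixed direction, shrinking step
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    y = t * d
    model = f(x) + grad(x) @ y + 0.5 * y @ hess(x) @ y
    print(t, abs(f(x + y) - model) / (y @ y))   # ratio -> 0, i.e. the error is o(||y||^2)
```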

PD and PSD
A symmetric matrix A ∈ R^{n×n} is:
- Positive definite (PD) if xᵀAx > 0 for all x ≠ 0
- Positive semidefinite (PSD) if xᵀAx ≥ 0 for all x ∈ R^n
Some useful results on PD/PSD:
- For any A ∈ R^{m×n}, AᵀA is PSD; AᵀA is PD if and only if A has full column rank
- If A ∈ R^{n×n}, then AᵀA is PD ⟺ A is nonsingular
- A is PD ⟺ A⁻¹ is PD
- A is PD ⟺ eig(A) > 0; A is PSD ⟺ eig(A) ≥ 0
- If A is PD with smallest eigenvalue λ_A and largest eigenvalue Λ_A, then λ_A ‖x‖₂² ≤ xᵀAx ≤ Λ_A ‖x‖₂²
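A small numerical illustration of two of these facts (a sketch using my own random matrix, not part of the slides): AᵀA built from a full-column-rank A is PD, and the quadratic form is sandwiched by the extreme eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 3))       # tall matrix; full column rank with probability 1
A = B.T @ B                           # A = B^T B is symmetric and should be PD

eigs = np.linalg.eigvalsh(A)          # eigenvalues of a symmetric matrix
print(np.all(eigs > 0))               # PD  <=>  all eigenvalues > 0

lam, Lam = eigs.min(), eigs.max()     # smallest / largest eigenvalue
for _ in range(5):
    x = rng.standard_normal(3)
    q = x @ A @ x
    print(lam * (x @ x) <= q <= Lam * (x @ x))   # lambda_A ||x||^2 <= x'Ax <= Lambda_A ||x||^2
```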

Ideas for optimization from Cal-cool-less?
- The gradient and Hessian provide tools for studying the local properties of a function
- A special Hessian matrix (PD, PSD) gives those local properties nice structure
- Looking for the minimum of a function? Combine them!

Key concepts
- Stationary points: ∇f(x) = 0
- Local minimum x̂: f(x̂) ≤ f(x) for all x ∈ N(x̂) ∩ S
- Global minimum x*: f(x*) ≤ f(x) for all x ∈ S
- A sad but true fact: the algorithms in this course will only guarantee a local minimum

Optimality conditions: Necessary conditions
Necessary conditions for an unconstrained local minimizer x* of f:
FONC: Assume f ∈ C¹ on an open set I; then ∇f(x*) = 0.
- We call x* a stationary point
- Originally formulated by Fermat (1637)
- What if the set is closed instead of open? We will see that in the 2nd half of the course
- Q: why is this not sufficient? (See the examples below.)
SONC: Assume f ∈ C² on an open set I; then ∇²f(x*) is PSD
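Two standard examples (mine, not from the slide) show why stationarity alone is not sufficient, and how SONC screens out some stationary points:
$$f(x) = x^3:\ f'(0) = 0, \text{ but } x = 0 \text{ is not a local minimizer (inflection point).}$$
$$f(x_1, x_2) = x_1^2 - x_2^2:\ \nabla f(0) = 0,\ \nabla^2 f(0) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} \text{ is indefinite, so SONC fails: the origin is a saddle point, not a minimizer.}$$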

Optimality conditions: Sufficient conditions
First-order information alone is clearly not sufficient to claim a local minimizer.
SOSC: Let f ∈ C² on an open set I, and suppose x* ∈ I satisfies:
- ∇f(x*) = 0
- ∇²f(x*) is PD
Then x* is a strict unconstrained local minimizer. Furthermore, there exist δ > 0 and ε > 0 such that
f(x) ≥ f(x*) + (δ/2) ‖x − x*‖² for all x with ‖x − x*‖ ≤ ε
The gap between SONC and SOSC is very narrow! But there is no sufficient and necessary optimality condition in general.
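One-dimensional illustrations of that narrow gap (again my own examples, not from the slide):
$$f(x) = x^4:\ f'(0) = 0,\ f''(0) = 0 \ (\text{PSD, not PD}),\ \text{so SOSC fails, yet } x = 0 \text{ is a strict local minimizer.}$$
$$f(x) = x^3:\ f'(0) = 0,\ f''(0) = 0,\ \text{so SONC holds, yet } x = 0 \text{ is not a local minimizer.}$$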

How we shall use the optimality conditions
1. Use them directly (a sketch of this recipe follows below):
   - First find all points satisfying ∇f(x) = 0
   - Then check SONC: see if ∇²f(x) is PSD
   - Finally, for the remaining points, check if ∇²f(x) is PD
   - Q: What do we get after all these?
2. Alternatively: find all points satisfying FONC (stationary points), and pick the one with the minimum objective value
   - Q: Is it a global minimum?
3. In general, solving ∇f(x) = 0 could be as hard as solving the original optimization problem!
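Here is a small symbolic sketch of recipe 1 (my own example function; the use of sympy is an assumption about tooling, not part of the course): find the stationary points, then classify each one with its Hessian.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**3 - 3*x1 + x2**2                      # illustrative example, not from the lecture

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))

# Step 1 (FONC): find all stationary points
stationary = sp.solve(grad, (x1, x2), dict=True)

# Steps 2-3 (SONC / SOSC): classify each stationary point via the Hessian
for pt in stationary:
    eigs = list(H.subs(pt).eigenvals().keys())
    if all(e > 0 for e in eigs):
        label = "strict local minimizer (SOSC holds)"
    elif all(e >= 0 for e in eigs):
        label = "SONC holds, but inconclusive"
    else:
        label = "not a local minimizer (SONC fails)"
    print(pt, label)
# (1, 0) is a strict local minimizer; (-1, 0) fails SONC (a saddle point)
```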

Optimality conditions and optimization algorithms
1. Algorithms check whether a solution satisfies the optimality conditions, and terminate if so (see the sketch below)
2. The behavior of many algorithms in the neighborhood of a local minimum depends on whether certain optimality conditions hold; if SOSC holds, then algorithms typically converge very fast locally
Keep them in mind! We will see them a lot soon.
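As a sketch of point 1 (my own minimal implementation, not an algorithm covered yet in the course): gradient descent that uses the FONC, ‖∇f(x)‖ ≤ tol, as its stopping test.

```python
import numpy as np

def gradient_descent(grad, x0, step=0.1, tol=1e-6, max_iter=10_000):
    """Plain gradient descent that stops when the FONC is (approximately) met,
    i.e. when ||grad f(x)|| <= tol. A sketch, not a robust implementation."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:       # optimality condition used as stopping test
            return x, k
        x = x - step * g
    return x, max_iter

# Example: f(x) = (x1 - 1)^2 + 2*(x2 + 3)^2, whose gradient is given below
grad = lambda x: np.array([2*(x[0] - 1), 4*(x[1] + 3)])
xstar, iters = gradient_descent(grad, [0.0, 0.0])
print(xstar, iters)                         # approaches the minimizer (1, -3)
```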

All we have is the LOCAL result...
Accent of a nonlinear optimizer: "solve" = "find a local optimum"
Which local optimum we find depends on the starting point.
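A small numerical illustration of this dependence (my own example; scipy.optimize.minimize is just a convenient local solver here, not one of the course algorithms): the nonconvex function f(x) = x⁴ − 3x² + x has two local minimizers, and which one we land on depends on the starting point.

```python
from scipy.optimize import minimize

f = lambda x: x[0]**4 - 3*x[0]**2 + x[0]      # nonconvex: two local minimizers
for x0 in (-2.0, 2.0):
    res = minimize(f, [x0])                   # local method: finds a nearby stationary point
    print(x0, res.x, res.fun)
# starting from -2 and from +2 lands on two different local minimizers
```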

Key concept: Convexity
From now on, convexity is your best friend.
- A set S is convex if, for all x, y ∈ S and all α ∈ [0, 1], αx + (1 − α)y ∈ S
- A function f is convex if:
  (1) its domain S (where it is defined) is a convex set, and
  (2) f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) for all α ∈ [0, 1] and all x, y ∈ S
A $1,000,000,000 result: when optimizing a convex function over a convex set, ANY local optimal solution is globally optimal!
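As a one-line verification (mine, not on the slide), take f(x) = x² on R; the defining inequality (2) reduces to an obvious nonnegativity statement, so x² is convex:
$$\alpha f(x) + (1-\alpha) f(y) - f\bigl(\alpha x + (1-\alpha) y\bigr) = \alpha x^2 + (1-\alpha) y^2 - \bigl(\alpha x + (1-\alpha) y\bigr)^2 = \alpha (1-\alpha)(x-y)^2 \;\ge\; 0.$$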

Next time
- Optimization under convexity: optimization utopia
- No class next Monday (Labor Day)
- Homework 1 out, due on Sept. 9th