Chapter 4: Unconstrained nonlinear optimization

Edoardo Amaldi, DEIB, Politecnico di Milano
edoardo.amaldi@polimi.it
Website: http://home.deib.polimi.it/amaldi/opt-15-16.shtml
Academic year 2015-16

4.1 Examples

1) Statistical estimation

A random variable X has probability density f(x, θ), where θ ∈ R^m is the parameter vector, and we are given n independent observations x_1, ..., x_n of X.

Maximum likelihood: estimates θ̂ of θ are derived by maximizing

L(θ) = f(x_1, θ) f(x_2, θ) ... f(x_n, θ).

Assumption: there exists a θ for which all factors are positive.

Since ln is monotonically increasing, θ̂ also maximizes

ln(L(θ)) = ∑_{j=1}^n ln(f(x_j, θ)).

If f is differentiable with respect to θ at θ̂, the necessary optimality conditions are

∑_{j=1}^n ∇_θ f(x_j, θ̂) / f(x_j, θ̂) = 0.
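For a general density the stationarity condition above is solved numerically. A minimal sketch (my own illustration, assuming an exponential density f(x, θ) = θ e^(−θx), which is not an example used in these slides): minimize −ln L(θ) with a standard solver.

```python
# Sketch: maximum likelihood by numerically maximizing ln L(theta),
# here for an exponential density f(x, theta) = theta * exp(-theta * x).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)      # n observations, true theta = 1/2

def neg_log_likelihood(theta):
    t = theta[0]
    if t <= 0:                                # keep all factors positive
        return np.inf
    return -np.sum(np.log(t) - t * x)         # -ln L(theta)

res = minimize(neg_log_likelihood, x0=[1.0], method="Nelder-Mead")
print("theta_hat =", res.x[0])                # close to the closed form 1 / x.mean() = 0.5
```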

For the Gaussian density and θ = (µ, σ), we have

f(x) = (1 / (σ √(2π))) exp(−(x − µ)² / (2σ²))

ln(L(θ)) = −(n/2) ln(2π) − n ln(σ) − (1 / (2σ²)) ∑_{j=1}^n (x_j − µ)².

The maximum is achieved at a stationary point:

∂[ln(L(θ))]/∂µ = (1/σ²) ∑_{j=1}^n (x_j − µ) = 0

and

∂[ln(L(θ))]/∂σ = −n/σ + (1/σ³) ∑_{j=1}^n (x_j − µ)² = 0.

Therefore

µ̂ = (1/n) ∑_{j=1}^n x_j        σ̂ = √( (1/n) ∑_{j=1}^n (x_j − µ̂)² ).
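As a sanity check (my own addition), the closed-form estimates can be compared with a direct numerical maximization of ln L(µ, σ); note that σ̂ uses the 1/n normalization rather than the unbiased 1/(n−1).

```python
# Sketch: closed-form Gaussian MLE vs. numerical maximization of ln L(mu, sigma).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=500)
n = x.size

mu_hat = x.mean()                                  # (1/n) sum_j x_j
sigma_hat = np.sqrt(((x - mu_hat) ** 2).mean())    # 1/n normalization, not 1/(n-1)

def neg_log_L(p):
    mu, sigma = p
    if sigma <= 0:
        return np.inf
    return n * np.log(sigma) + ((x - mu) ** 2).sum() / (2 * sigma ** 2)   # constant dropped

res = minimize(neg_log_L, x0=[0.0, 1.0], method="Nelder-Mead")
print(mu_hat, sigma_hat)    # closed form
print(res.x)                # numerical optimum, should agree to solver tolerance
```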

2) 3-D image reconstruction (computed tomography, see Chapter 1)

Problem: Given V ⊂ R³ subdivided into n voxels V_j and the measurements provided by m beams, reconstruct a 3-D image of V, that is, determine the density x_j of each V_j.

The attenuation of the i-th beam depends on the total amount of matter on its way: ∑_{j ∈ J_i} a_ij x_j.

Let b_i be the measurement of the i-th beam at the exit point. Given m beams with prescribed directions, we have:

∑_{j ∈ J_i} a_ij x_j = b_i        i = 1, ..., m
x_j ≥ 0        j = 1, ..., n

which is usually infeasible due to measurement errors, non-uniformity of the V_j's, ...

Since in general m < n, one possible formulation is:

min ∑_{i=1}^m ( b_i − ∑_{j ∈ J_i} a_ij x_j )² + δ ∑_{j=1}^n x_j
s.t. x_j ≥ 0        j = 1, ..., n

with δ > 0.

3) Linear regression ...
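A small sketch of the regularized formulation (with a random sparse A and synthetic measurements b standing in for the real beam geometry; δ and all data are illustrative choices of mine). The nonnegativity constraints are handled as simple bounds.

```python
# Sketch: min ||Ax - b||^2 + delta * sum(x)  s.t. x >= 0, on synthetic data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
m, n = 30, 80                                          # fewer beams than voxels (m < n)
A = rng.random((m, n)) * (rng.random((m, n)) < 0.2)    # sparse beam/voxel incidence weights
x_true = rng.random(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)         # noisy measurements
delta = 0.1

def obj(x):
    r = A @ x - b
    return r @ r + delta * x.sum()

def grad(x):
    return 2 * A.T @ (A @ x - b) + delta

res = minimize(obj, x0=np.zeros(n), jac=grad, method="L-BFGS-B",
               bounds=[(0, None)] * n)
x_hat = res.x                                          # reconstructed voxel densities
print(obj(x_hat) <= obj(np.zeros(n)))                  # objective improved over the start
```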

4.2 Optimality conditions

Consider a generic optimization problem

min_{x ∈ S} f(x)

where S ⊆ R^n and f ∈ C¹ or C².

Unconstrained case: S = R^n.

We extend the necessary and sufficient optimality conditions (first and second order), and then consider the special case where f and S are convex.

Definition: d ∈ R^n is a feasible direction at x̄ ∈ S if there exists ᾱ > 0 such that

x̄ + αd ∈ S        ∀ α ∈ [0, ᾱ].        (1)

N.B.: At any interior point of S all directions (all d ∈ R^n) are feasible.

First-order necessary optimality conditions: If f ∈ C¹ on S and x̄ is a local minimum of f over S, then for every feasible direction d ∈ R^n at x̄

∇f(x̄)^t d ≥ 0,

namely there is no feasible descent direction at x̄.

Proof. According to (1), consider φ : [0, ᾱ] → R defined by φ(α) = f(x̄ + αd). Since x̄ is a local minimum of f over S, α = 0 is a local minimum of φ.

Taylor expansion of φ at α = 0:

φ(α) = φ(0) + α φ'(0) + o(α).

N.B.: u(α) = o(α) if u(α) tends to 0 faster than α when α → 0.

Suppose that φ'(0) < 0: as α → 0⁺ we can neglect the higher-order term and obtain φ(α) − φ(0) < 0, which contradicts the local optimality of α = 0. Therefore φ'(0) ≥ 0 and, since φ'(α) = ∇f(x̄ + αd)^t d, we have ∇f(x̄)^t d ≥ 0.

Example: min_{x1, x2 ≥ 0} f(x1, x2) = x1² − x1 + x2 + x1 x2

(contour plot of f over the feasible region omitted)

x* = (1/2, 0)^t is a global minimum, and ∇f(x*)^t d ≥ 0 for all feasible directions d at x* (all those with d2 ≥ 0), even though ∇f(x*) = (0, 3/2)^t ≠ 0.
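A short numerical check (my own) of the first-order condition at x* = (1/2, 0): the feasible directions at x* are exactly those with d2 ≥ 0, since the constraint x2 ≥ 0 is active and x1 = 1/2 is interior to its own bound.

```python
# Sketch: check grad f(x*)^t d >= 0 over sampled feasible directions at x* = (1/2, 0).
import numpy as np

def grad_f(x1, x2):
    # f(x1, x2) = x1^2 - x1 + x2 + x1*x2
    return np.array([2 * x1 - 1 + x2, 1 + x1])

g = grad_f(0.5, 0.0)                    # = (0, 3/2), nonzero
rng = np.random.default_rng(3)
d = rng.standard_normal((1000, 2))
d[:, 1] = np.abs(d[:, 1])               # feasible directions: d2 >= 0 (x2 = 0 is active)
print(g, np.all(d @ g >= 0))            # prints the gradient and True
```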

Second-order necessary optimality conditions: If f ∈ C² on S and x̄ is a local minimum of f over S, then

i) ∇f(x̄)^t d ≥ 0 for every feasible direction d ∈ R^n at x̄,
ii) if ∇f(x̄)^t d = 0, then d^t ∇²f(x̄) d ≥ 0.

Proof. To verify (ii) we proceed as in the first-order case. Suppose ∇f(x̄)^t d = 0; then

φ(α) = φ(0) + α φ'(0) + (1/2) α² φ''(0) + o(α²),    with φ'(0) = 0.

If φ''(0) < 0, for sufficiently small values of α we have φ(α) − φ(0) ≈ (1/2) α² φ''(0) < 0, namely 0 would not be a local minimum of φ. Hence φ''(0) ≥ 0 and φ''(0) = d^t ∇²f(x̄) d ≥ 0.

Corollary (unconstrained case): If f ∈ C² on S and x̄ ∈ int(S) is a local minimum of f over S, then

1) ∇f(x̄) = 0 (stationarity condition),
2) ∇²f(x̄) is positive semidefinite.

Proof. Since x̄ ∈ int(S), all d ∈ R^n are feasible directions at x̄. The fact that ∇f(x̄)^t d ≥ 0 for every d and for −d implies 1). Point 2) is an immediate consequence of d^t ∇²f(x̄) d ≥ 0 for all d ∈ R^n.

Three types of candidate points: local minima, local maxima and saddle points.

Clearly these optimality conditions are not sufficient. For instance, f(x) = x³ has f'(0) = 0 and f''(0) = 0, but x = 0 is not a local minimum.
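A one-line numerical illustration (mine) that the conditions are not sufficient: for f(x) = x³ both derivatives vanish at 0, yet f takes values below f(0) arbitrarily close to 0.

```python
# Sketch: f(x) = x^3 has f'(0) = f''(0) = 0, but 0 is not a local minimum.
import numpy as np

f = lambda x: x ** 3
eps = np.logspace(-1, -8, 8)                # points approaching 0 from the left
print(all(f(-e) < f(0.0) for e in eps))     # True: f decreases just left of 0
```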

Example: min_{x1, x2 ≥ 0} f(x1, x2) = x1³ − x1² x2 + 2 x2²

(plot of f omitted)

Candidate points: (0, 0)^t and (6, 9)^t.

The point (0, 0)^t belongs to the boundary, and (6, 9)^t is not a local minimum even though, for x1 = 6, x2 = 9 it is a local minimum w.r.t. x2 and, for x2 = 9, x1 = 6 it is a local minimum w.r.t. x1.
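The nature of the candidate points can be checked (my own addition) by inspecting the Hessian of f: at (6, 9) its eigenvalues have mixed signs, so the point is a saddle, even though both diagonal entries, which govern the one-dimensional restrictions along the coordinate axes, are positive.

```python
# Sketch: classify the candidate points of f(x1, x2) = x1^3 - x1^2*x2 + 2*x2^2 via the Hessian.
import numpy as np

def hess_f(x1, x2):
    return np.array([[6 * x1 - 2 * x2, -2 * x1],
                     [-2 * x1,          4.0  ]])

for p in [(0.0, 0.0), (6.0, 9.0)]:
    print(p, np.linalg.eigvalsh(hess_f(*p)))
# (0, 0): eigenvalues 0 and 4 (positive semidefinite, second-order test inconclusive)
# (6, 9): one negative and one positive eigenvalue -> saddle point
```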

Sufficient optimality conditions: If f ∈ C² on S and x̄ ∈ int(S) is such that ∇f(x̄) = 0 and ∇²f(x̄) is positive definite, then x̄ is a strict local minimum of f over S, namely

f(x) > f(x̄)        ∀ x ∈ N_ε(x̄) ∩ S, x ≠ x̄.

Proof. Let d ∈ B_ε(0) be any feasible direction such that x̄ + d ∈ S ∩ B_ε(x̄). Then

f(x̄ + d) = f(x̄) + ∇f(x̄)^t d + (1/2) d^t ∇²f(x̄) d + o(||d||²)

with ∇f(x̄) = 0.

Since ∇²f(x̄) is positive definite, there exists a > 0 such that d^t ∇²f(x̄) d ≥ a ||d||², where a is the smallest eigenvalue of ∇²f(x̄). Thus for ||d|| sufficiently small

f(x̄ + d) − f(x̄) ≥ (a/2) ||d||² + o(||d||²) > 0,

which implies f(x̄ + d) > f(x̄), namely x̄ is a strict local minimum along d. Since this holds for all d ∈ R^n such that x̄ + d ∈ S ∩ B_ε(x̄), x̄ is a strict local minimum, and f is locally strictly convex around x̄.
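A small numerical check (my own example, not from the slides) of the bound used in the proof: at a stationary point with positive definite Hessian, the increase f(x̄ + d) − f(x̄) is positive for all small d and behaves like (1/2) d^t ∇²f(x̄) d ≥ (a/2) ||d||².

```python
# Sketch: verify the second-order sufficient condition at x_bar = (0, 0) for
# f(x) = x1^2 + 3*x2^2 + x1*x2 + x1^3 (grad f(0) = 0, Hessian positive definite).
import numpy as np

f = lambda x: x[0] ** 2 + 3 * x[1] ** 2 + x[0] * x[1] + x[0] ** 3
H = np.array([[2.0, 1.0], [1.0, 6.0]])        # Hessian of f at the origin
a = np.linalg.eigvalsh(H).min()               # smallest eigenvalue, a > 0

rng = np.random.default_rng(4)
d = 1e-3 * rng.standard_normal((1000, 2))     # small perturbations around x_bar = 0
increase = np.array([f(di) for di in d]) - f(np.zeros(2))
print(a > 0, np.all(increase > 0))            # True True: strict local minimum
print(np.min(increase / (0.5 * (d ** 2).sum(axis=1))))   # roughly >= a, up to o(||d||^2)
```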

Convex problems

min_{x ∈ C} f(x)

where C ⊆ R^n is convex and f is convex.

We know that, if f : C → R is convex, every local minimum is a global minimum.

Necessary and sufficient condition for global optimality: Let f : C → R be convex and of class C¹ on the convex set C ⊆ R^n. Then x* is a global minimum of f on C if and only if

∇f(x*)^t (y − x*) ≥ 0        ∀ y ∈ C.

Proof. Necessity: if f ∈ C¹ and x* is a local minimum (and hence, by convexity, also a global minimum), then ∇f(x*)^t d ≥ 0 for all feasible directions d at x*, namely for d = y − x* with y ∈ C.

Sufficiency: f is convex if and only if f(y) ≥ f(x*) + ∇f(x*)^t (y − x*) for all y ∈ C. The assumption ∇f(x*)^t (y − x*) ≥ 0 then implies f(y) ≥ f(x*) for every y ∈ C.
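A small numerical illustration (my own example) of the characterization ∇f(x*)^t (y − x*) ≥ 0 for all y ∈ C, on a convex quadratic over the box C = [0, 1]²: the inequality holds at the constrained minimizer x* = (1, 0) and fails at a non-optimal feasible point.

```python
# Sketch: test grad f(x)^t (y - x) >= 0 over sampled y in C = [0,1]^2
# for f(x) = (x1 - 2)^2 + (x2 + 1)^2, whose minimizer over C is x* = (1, 0).
import numpy as np

grad_f = lambda x: np.array([2 * (x[0] - 2), 2 * (x[1] + 1)])

rng = np.random.default_rng(5)
Y = rng.random((2000, 2))                       # random feasible points y in C

def condition_holds(x):
    return bool(np.all((Y - x) @ grad_f(x) >= -1e-12))

print(condition_holds(np.array([1.0, 0.0])))    # True at the global minimizer x*
print(condition_holds(np.array([0.0, 0.0])))    # False at a non-optimal feasible point
```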

Definition: Let C ⊆ R^n be convex. A point x ∈ C is an extreme point of C if it cannot be expressed as a convex combination of two different points of C, namely

x = α x1 + (1 − α) x2 with x1, x2 ∈ C and α ∈ (0, 1)

implies that x1 = x2.

Property (maximization of convex functions): Let f be a convex function defined on a convex, bounded and closed set C. If f has a (finite) maximum over C, then there exists an optimal extreme point of C.

Proof. Suppose that x* is a global maximum of f over C, but not an extreme point.

1) We first verify that the maximum is achieved at a point of the boundary ∂C. Since C is convex, bounded and closed, any x* ∈ int(C) can be expressed as a convex combination of two points y1, y2 ∈ ∂C, that is, there exist y1, y2 ∈ ∂C and α ∈ (0, 1) such that x* = α y1 + (1 − α) y2. By convexity

f(x*) ≤ α f(y1) + (1 − α) f(y2) ≤ max{f(y1), f(y2)}.

Since f(x*) is the maximum of f over C, we have f(y1), f(y2) ≤ f(x*), and the chain of inequalities forces f(y1) = f(y2) = f(x*). Thus y1 and y2 are also global maxima.

2) Suppose now that x* ∈ ∂C is not an extreme point. Consider the intersection T1 = C ∩ H, where H is a supporting hyperplane of C at x* ∈ ∂C. Clearly T1 is of dimension at most n − 1. Since T1 is compact, there exists a global maximum x̄1 of f over T1, i.e.,

max_{x ∈ T1} f(x) = f(x̄1) = f(x*),

and, as before, we may take x̄1 ∈ ∂T1.

Claim: if x̄1 is an extreme point of T1, then x̄1 is also an extreme point of C.

If x̄1 is not an extreme point of T1, we similarly define T2, ... In the worst case dim(T_n) = 0; such an isolated point x̄_n is clearly an extreme point. Since an extreme point of T_i is also an extreme point of T_{i−1}, x̄_n must be an extreme point of C.

Illustrations: a polyhedron, and a convex set with an infinite number of extreme points.

Particular case: linear programming (a linear function is both convex and concave, and the polyhedron of the feasible solutions has a finite number of extreme points).
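A small illustration (my own example, not from the slides) of the property: for a convex quadratic maximized over the box [0, 1]², evaluating f at the four extreme points already yields a value at least as large as over a dense sample of the whole box.

```python
# Sketch: the maximum of a convex f over the box C = [0,1]^2 is attained at a vertex.
import numpy as np
from itertools import product

f = lambda x: (x[0] - 0.3) ** 2 + 2 * (x[1] - 0.2) ** 2 + x[0] * x[1]   # convex (PD Hessian)

vertices = [np.array(v) for v in product([0.0, 1.0], repeat=2)]
best_vertex = max(f(v) for v in vertices)          # best value over the 4 extreme points

rng = np.random.default_rng(6)
sample = rng.random((20000, 2))                    # dense sample of points of C
best_sample = max(f(x) for x in sample)

print(best_vertex >= best_sample)                  # True
```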