Introduction to Nonlinear Stochastic Programming


School of Mathematics, The University of Edinburgh

Introduction to Nonlinear Stochastic Programming

Jacek Gondzio
Email: J.Gondzio@ed.ac.uk
URL: http://www.maths.ed.ac.uk/~gondzio

SPS 2007, Bergamo, April 2007

Outline

- Convexity: convex sets, convex functions; local optimum, global optimum
- Duality: Lagrange duality; Wolfe duality; primal-dual pairs of LPs and QPs
- Nonlinear Stochastic Program with Recourse
- Interior Point Methods for Optimization: a unified view of Linear, Quadratic and Nonlinear programming

Consider the general optimization problem

min f(x)  s.t.  g(x) ≤ 0,

where x ∈ R^n, and f: R^n → R and g: R^n → R^m are convex, twice differentiable.

Basic assumptions:
- f and g are convex: if there exists a local minimum, then it is a global one.
- f and g are twice differentiable: we can use second-order Taylor approximations of them.

Convexity

Reading: Bertsekas, D., Nonlinear Programming, Athena Scientific, Massachusetts, 1995. ISBN 1-886529-14-0.

Convexity is a key property in optimization.

Def. A set C ⊆ R^n is convex if λx + (1−λ)y ∈ C for all x, y ∈ C and all λ ∈ [0, 1].

[Figure: a convex set and a nonconvex set.]

Def. Let C be a convex subset of R^n. A function f: C → R is convex if

f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y),  for all x, y ∈ C and all λ ∈ [0, 1].

[Figure: a convex function and a nonconvex function.]
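The defining inequality can be probed numerically. The sketch below (the helper name and sampling scheme are our own illustration, not from the slides) tests random convex combinations of sample points; a failed trial refutes convexity, while passing all trials only suggests it:

```python
import numpy as np

def looks_convex(f, points, trials=200, seed=0):
    """Test f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) on random pairs.
    A failed trial refutes convexity; passing every trial only suggests it."""
    rng = np.random.default_rng(seed)
    pts = list(points)
    for _ in range(trials):
        x, y = pts[rng.integers(len(pts))], pts[rng.integers(len(pts))]
        lam = rng.uniform()
        if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + 1e-12:
            return False
    return True

grid = [np.array([t]) for t in np.linspace(-3.0, 3.0, 25)]
convex_ok = looks_convex(lambda v: float(v[0] ** 2), grid)        # x^2: convex
nonconvex_ok = looks_convex(lambda v: float(np.sin(v[0])), grid)  # sin: not convex
```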

Convexity and Optimization

Consider a problem

minimize f(x) subject to x ∈ X,

where X is a set of feasible solutions and f: X → R is an objective function.

Def. A vector x̂ is a local minimum of f if there exists ε > 0 such that f(x̂) ≤ f(x) for all x with ‖x − x̂‖ < ε.

Def. A vector x̂ is a global minimum of f if f(x̂) ≤ f(x) for all x ∈ X.

Lemma. If X is a convex set and f: X → R is a convex function, then a local minimum is a global minimum.

Proof. Suppose that x is a local minimum, but not a global one. Then there exists y ≠ x such that f(y) < f(x). From the convexity of f, for all λ ∈ [0, 1], we have

f((1−λ)x + λy) ≤ (1−λ)f(x) + λf(y) < (1−λ)f(x) + λf(x) = f(x).

In particular, for a sufficiently small λ, the point z = (1−λ)x + λy lies in the ε-neighbourhood of x and f(z) < f(x). This contradicts the assumption that x is a local minimum.

Useful properties

1. For any collection {C_i | i ∈ I} of convex sets, the intersection ∩_{i∈I} C_i is convex.
2. If C is a convex set and f: C → R is a convex function, the level sets {x ∈ C | f(x) ≤ α} and {x ∈ C | f(x) < α} are convex for all scalars α.
3. Let C ⊆ R^n be a convex set and f: C → R be differentiable over C.
   (a) The function f is convex if and only if f(y) ≥ f(x) + ∇f(x)^T (y − x) for all x, y ∈ C.
   (b) If the inequality is strict whenever x ≠ y, then f is strictly convex.
4. Let C ⊆ R^n be a convex set and f: C → R be twice continuously differentiable over C.
   (a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex.
   (b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex.
   (c) If f is convex, then ∇²f(x) is positive semidefinite for all x ∈ C.
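Property 4(a) suggests a simple numerical check: sample the Hessian on a grid and inspect its smallest eigenvalue. A minimal sketch (the helper and the two example functions are our own illustration):

```python
import numpy as np

def hessian_psd_on(hess, samples, tol=1e-10):
    """Property 4(a) check: smallest Hessian eigenvalue >= 0 at each sample."""
    return all(np.linalg.eigvalsh(hess(x)).min() >= -tol for x in samples)

# f(x) = x1^4 + x2^2: Hessian diag(12*x1^2, 2) is PSD everywhere, so f is convex.
hess_f = lambda x: np.diag([12.0 * x[0] ** 2, 2.0])
# g(x) = x1^2 - x2^2: Hessian diag(2, -2) is indefinite, so g is not convex.
hess_g = lambda x: np.diag([2.0, -2.0])

grid = [np.array([a, b]) for a in np.linspace(-2, 2, 9) for b in np.linspace(-2, 2, 9)]
f_convex = hessian_psd_on(hess_f, grid)
g_convex = hessian_psd_on(hess_g, grid)
```

Note this is only a sampled, necessary-style check; it cannot certify positive semidefiniteness between the grid points.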

Lemma. If f: R^n → R and g: R^n → R^m are convex, then the general optimization problem

min f(x)  s.t.  g(x) ≤ 0

is convex.

Proof. Since the objective function f is convex, we only need to prove that the feasible set X = {x ∈ R^n : g(x) ≤ 0} is convex. Define, for i = 1, 2, ..., m,

X_i = {x ∈ R^n : g_i(x) ≤ 0}.

From Property 2, X_i is convex for all i. We observe that

X = {x ∈ R^n : g_i(x) ≤ 0, i = 1..m} = ∩_i X_i,

i.e., X is an intersection of convex sets and, from Property 1, X is a convex set.

Duality

Reading: Bertsekas, D., Nonlinear Programming, Athena Scientific, Massachusetts, 1995. ISBN 1-886529-14-0.

Consider a general optimization problem

min f(x)  s.t.  g(x) ≤ 0,  x ∈ X ⊆ R^n,   (1)

where f: R^n → R and g: R^n → R^m. The set X is arbitrary; it may include, for example, an integrality constraint. Let x̂ be an optimal solution of (1) and define f̂ = f(x̂).

Introduce the Lagrange multiplier y_i ≥ 0 for every inequality constraint g_i(x) ≤ 0. Define y = (y_1, ..., y_m)^T and the Lagrangian

L(x, y) = f(x) + y^T g(x).

The y are also called dual variables.

Consider the problem

L_D(y) = min_x L(x, y)  s.t.  x ∈ X ⊆ R^n.

Its optimal solution x depends on y, and so does the optimal objective L_D(y).

Lemma. For any y ≥ 0, L_D(y) is a lower bound on f̂ (the optimal value of (1)), i.e., f̂ ≥ L_D(y) for all y ≥ 0.

Proof.
f̂ = min {f(x) : g(x) ≤ 0, x ∈ X}
  ≥ min {f(x) + y^T g(x) : g(x) ≤ 0, y ≥ 0, x ∈ X}
  ≥ min {f(x) + y^T g(x) : y ≥ 0, x ∈ X} = L_D(y).

Corollary. f̂ ≥ max_{y≥0} L_D(y), i.e., f̂ ≥ max_{y≥0} min_{x∈X} L(x, y).
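A one-dimensional example makes the bound concrete. For min x² s.t. 1 − x ≤ 0 the dual function has a closed form, and every L_D(y) sits below f̂ = 1 (problem data are our own illustration):

```python
import numpy as np

# Toy problem:  min x^2  s.t.  1 - x <= 0   (feasible set x >= 1, so x_hat = 1, f_hat = 1).
# Lagrangian: L(x, y) = x^2 + y*(1 - x). For fixed y >= 0 the unconstrained
# minimum over x is at x = y/2, giving the dual function L_D(y) = y - y^2/4.
f_hat = 1.0
L_D = lambda y: y - y ** 2 / 4.0

ys = np.linspace(0.0, 4.0, 41)
lower_bounds = L_D(ys)           # each value is a lower bound on f_hat
best_bound = lower_bounds.max()  # attained at y = 2, where L_D(2) = 1 = f_hat
```

Here the best lower bound equals f̂, which is the strong duality discussed below (the problem is convex and strictly feasible points exist).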

Lagrangian Duality

If g_i(x) > 0 for some i, then max_{y≥0} L(x, y) = +∞ (we let the corresponding y_i grow to +∞).

If g_i(x) ≤ 0 for all i, then max_{y≥0} L(x, y) = f(x), because Σ_i y_i g_i(x) ≤ 0 and the maximum is attained when y_i g_i(x) = 0, i = 1, 2, ..., m.

Hence problem (1) is equivalent to the following MinMax problem:

min_{x∈X} max_{y≥0} L(x, y),

which could also be written as

f̂ = min_{x∈X} max_{y≥0} L(x, y).

Consider the following problem

min {f(x) : g(x) ≤ 0, x ∈ X},

where f, g and X are arbitrary. With this problem we associate the Lagrangian

L(x, y) = f(x) + y^T g(x),

where y are dual variables (Lagrange multipliers).

Weak duality always holds:

min_{x∈X} max_{y≥0} L(x, y) ≥ max_{y≥0} min_{x∈X} L(x, y).

We have made no assumptions about the functions f and g or the set X.

If f and g are convex, X is convex and certain regularity conditions are satisfied, then

min_{x∈X} max_{y≥0} L(x, y) = max_{y≥0} min_{x∈X} L(x, y).

This is called strong duality.

Duality and Convexity

Weak duality holds regardless of the form of the functions f, g and the set X:

min_{x∈X} max_{y≥0} L(x, y) ≥ max_{y≥0} min_{x∈X} L(x, y).

What do we need for the inequality in weak duality to become an equation? If

- X ⊆ R^n is convex;
- f and g are convex;
- the optimal solution is finite;
- some mysterious regularity conditions hold,

then strong duality holds, that is,

min_{x∈X} max_{y≥0} L(x, y) = max_{y≥0} min_{x∈X} L(x, y).

An example of regularity conditions (Slater's condition): there exists x ∈ int(X) such that g(x) < 0.

Lagrange duality does not need differentiability.

Suppose f and g are convex and differentiable. Suppose X is convex. The dual function

L_D(y) = min_{x∈X} L(x, y)

requires minimization with respect to x. Instead of minimization with respect to x, we ask for stationarity with respect to x:

∇_x L(x, y) = 0.

Lagrange dual problem:  max_{y≥0} L_D(y)  (i.e., max_{y≥0} min_{x∈X} L(x, y)).

Wolfe dual problem:

max  L(x, y)
s.t. ∇_x L(x, y) = 0,
     y ≥ 0.

Dual Linear Program

Consider a linear program

min c^T x  s.t.  Ax = b, x ≥ 0,

where c, x ∈ R^n, b ∈ R^m, A ∈ R^{m×n}. We associate Lagrange multipliers y ∈ R^m and s ∈ R^n (s ≥ 0) with the constraints Ax = b and x ≥ 0, and write the Lagrangian

L(x, y, s) = c^T x − y^T (Ax − b) − s^T x.

To determine the Lagrangian dual L_D(y, s) = min_x L(x, y, s) we need stationarity with respect to x:

∇_x L(x, y, s) = c − A^T y − s = 0.

Hence

L_D(y, s) = c^T x − y^T (Ax − b) − s^T x = b^T y + x^T (c − A^T y − s) = b^T y,

and the dual LP has the form

max b^T y  s.t.  A^T y + s = c,  y free, s ≥ 0,

where y ∈ R^m and s ∈ R^n.

Primal Problem:  min c^T x   s.t. Ax = b,        x ≥ 0.
Dual Problem:    max b^T y   s.t. A^T y + s = c, s ≥ 0.
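The primal-dual pair can be verified on a tiny instance: feasible primal and dual points with zero duality gap certify that both are optimal, by weak duality (the problem data below are our own illustration):

```python
import numpy as np

# Primal:  min c^T x   s.t. Ax = b, x >= 0
# Dual:    max b^T y   s.t. A^T y + s = c, s >= 0
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

x = np.array([1.0, 0.0])   # candidate primal solution
y = np.array([1.0])        # candidate dual solution
s = c - A.T @ y            # dual slacks: s = (0, 1), so s >= 0

primal_feasible = np.allclose(A @ x, b) and (x >= 0).all()
dual_feasible = (s >= 0).all()
gap = c @ x - b @ y        # duality gap; equals x^T s for feasible points
# gap == 0 certifies that x and (y, s) are both optimal.
```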

Dual Quadratic Program

Consider a quadratic program

min c^T x + ½ x^T Q x  s.t.  Ax = b, x ≥ 0,

where c, x ∈ R^n, b ∈ R^m, A ∈ R^{m×n}, Q ∈ R^{n×n}. We associate Lagrange multipliers y ∈ R^m and s ∈ R^n (s ≥ 0) with the constraints Ax = b and x ≥ 0, and write the Lagrangian

L(x, y, s) = c^T x + ½ x^T Q x − y^T (Ax − b) − s^T x.

To determine the Lagrangian dual L_D(y, s) = min_x L(x, y, s) we need stationarity with respect to x:

∇_x L(x, y, s) = c + Qx − A^T y − s = 0.

Hence

L_D(y, s) = c^T x + ½ x^T Q x − y^T (Ax − b) − s^T x
          = b^T y + x^T (c + Qx − A^T y − s) − ½ x^T Q x
          = b^T y − ½ x^T Q x,

and the dual QP has the form

max b^T y − ½ x^T Q x  s.t.  A^T y + s − Qx = c,  x, s ≥ 0,

where y ∈ R^m and x, s ∈ R^n.

Primal Problem:  min c^T x + ½ x^T Qx   s.t. Ax = b,             x ≥ 0.
Dual Problem:    max b^T y − ½ x^T Qx   s.t. A^T y + s − Qx = c, s ≥ 0.

General Stochastic Program with Recourse

Reading: Kall, P. and S.W. Wallace, Stochastic Programming, John Wiley & Sons, Chichester, UK, 1994.

General Stochastic Program with Recourse

Consider a deterministic problem

minimize f(x) subject to g(x) ≤ 0, x ∈ X,

where f: R^n → R is an objective function and g_i: R^n → R, i = 1...m, are constraints.

Consider its stochastic analogue

minimize f(x, ξ) subject to g(x, ξ) ≤ 0, x ∈ X,

where ξ is a random vector varying over a set Ξ ⊆ R^k, f: R^n × Ξ → R is an objective function, and g_i: R^n × Ξ → R, i = 1...m, are constraints.

Define

g_i^+(x, ξ) = 0 if g_i(x, ξ) ≤ 0, and g_i(x, ξ) otherwise,

and a nonlinear recourse function

Q(x, ξ) = min {q(y) : u_i(y) ≥ g_i^+(x, ξ), i = 1..m, y ∈ Y ⊆ R^n},   (2)

where q: R^n → R and u_i: R^n → R are given. Replace the original stochastic program by the stochastic program with recourse

min_{x∈X} E_ξ {f(x, ξ) + Q(x, ξ)}.

Convexity of Recourse Problems

Lemma. If the functions q(·) and g_i(·, ξ), i = 1..m, are convex and the functions u_i(·), i = 1..m, are concave, then the nonlinear recourse function (2) is convex with respect to its first argument.

Proof. Let y_1 and y_2 be the optimal solutions of the recourse problems for x_1 and x_2, respectively. By the convexity of g_i(·, ξ) and the concavity of u_i(·) we have, for any λ ∈ [0, 1]:

g_i(λx_1 + (1−λ)x_2, ξ) ≤ λ g_i(x_1, ξ) + (1−λ) g_i(x_2, ξ)
                        ≤ λ u_i(y_1) + (1−λ) u_i(y_2)
                        ≤ u_i(λy_1 + (1−λ)y_2).

Hence ȳ = λy_1 + (1−λ)y_2 is feasible in (2) for x̄ = λx_1 + (1−λ)x_2, and by the convexity of q(·)

Q(x̄, ξ) ≤ q(ȳ) ≤ λ q(y_1) + (1−λ) q(y_2) = λ Q(x_1, ξ) + (1−λ) Q(x_2, ξ),

which completes the proof.
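The lemma can be seen at work on a toy recourse model with finitely many scenarios, where the recourse problem has a closed form and the expected cost can be checked for midpoint convexity on a grid (the model and all names below are our own illustration):

```python
import numpy as np

# Toy model: first-stage cost f(x, xi) = x, and a linear recourse problem
#   Q(x, xi) = min { q*y : y >= g+(x, xi), y >= 0 }
# with a single constraint g(x, xi) = xi - x, so Q(x, xi) = q*max(xi - x, 0).
q = 3.0
scenarios = np.array([1.0, 2.0, 4.0])   # equally likely realizations of xi
probs = np.full(3, 1.0 / 3.0)

def expected_cost(x):
    """E_xi { f(x, xi) + Q(x, xi) }"""
    return x + probs @ (q * np.maximum(scenarios - x, 0.0))

# Midpoint-convexity check on a grid, as the lemma predicts:
xs = np.linspace(0.0, 5.0, 51)
midpoint_convex = all(
    expected_cost(0.5 * (a + b)) <= 0.5 * (expected_cost(a) + expected_cost(b)) + 1e-9
    for a in xs for b in xs
)
```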

Lemma. If f(·, ξ) and Q(·, ξ) are convex in x for all ξ ∈ Ξ, and if X is a convex set, then

min_{x∈X} E_ξ {f(x, ξ) + Q(x, ξ)}

is a convex program.

Proof. Take x, y ∈ X, λ ∈ [0, 1] and z = λx + (1−λ)y. From the convexity of f(·, ξ) and Q(·, ξ) with respect to their first argument, for all ξ ∈ Ξ we have

f(z, ξ) ≤ λ f(x, ξ) + (1−λ) f(y, ξ)

and

Q(z, ξ) ≤ λ Q(x, ξ) + (1−λ) Q(y, ξ),

respectively. Adding these inequalities, we obtain, for all ξ ∈ Ξ,

f(z, ξ) + Q(z, ξ) ≤ λ[f(x, ξ) + Q(x, ξ)] + (1−λ)[f(y, ξ) + Q(y, ξ)],

implying

E_ξ {f(z, ξ) + Q(z, ξ)} ≤ λ E_ξ {f(x, ξ) + Q(x, ξ)} + (1−λ) E_ξ {f(y, ξ) + Q(y, ξ)}.

Hence the objective in the stochastic program with recourse is convex.

Interior Point Methods: from LP via QP to NLP

Reading:
Wright, S., Primal-Dual Interior-Point Methods, SIAM, 1997.
Nocedal, J. and S. Wright, Numerical Optimization, Springer-Verlag, 1999.
Conn, A., N. Gould and Ph. Toint, Trust-Region Methods, SIAM, 2000.

The logarithmic barrier −ln x_j replaces the inequality x_j ≥ 0.

[Figure: graph of ln x.]

Observe that

min e^{−Σ_{j=1}^n ln x_j}  ⟺  max Π_{j=1}^n x_j.

The minimization of −Σ_{j=1}^n ln x_j is equivalent to the maximization of the product of distances from all hyperplanes defining the positive orthant: it prevents all x_j from approaching zero.

Use the Logarithmic Barrier

Primal Problem:  min c^T x   s.t. Ax = b,        x ≥ 0.
Dual Problem:    max b^T y   s.t. A^T y + s = c, s ≥ 0.

Primal Barrier Problem:  min c^T x − µ Σ_{j=1}^n ln x_j   s.t. Ax = b.
Dual Barrier Problem:    max b^T y + µ Σ_{j=1}^n ln s_j   s.t. A^T y + s = c.

Primal Barrier Problem:

min c^T x − µ Σ_{j=1}^n ln x_j  s.t.  Ax = b.

Lagrangian:

L(x, y, µ) = c^T x − y^T (Ax − b) − µ Σ_{j=1}^n ln x_j.

Stationarity:

∇_x L(x, y, µ) = c − A^T y − µ X^{−1} e = 0.

Denote s = µ X^{−1} e, i.e., XSe = µe.

The first-order optimality conditions are:

Ax = b,
A^T y + s = c,
XSe = µe,
(x, s) > 0.
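For a tiny LP these conditions can be solved directly for each µ, showing the central point x(µ) drifting towards the optimal vertex as µ → 0 (the problem data and the bisection solver below are our own illustration):

```python
import numpy as np

# Tiny LP:  min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  (optimal vertex: (1, 0)).
# Conditions: x1 + x2 = 1,  y + s_i = c_i,  x_i * s_i = mu.  With s_i = c_i - y
# and x_i = mu / s_i, primal feasibility becomes  mu/(1-y) + mu/(2-y) = 1, y < 1.
c = np.array([1.0, 2.0])

def central_point(mu):
    lo, hi = -1.0e6, 1.0 - 1e-12       # phi below is increasing in y here
    for _ in range(200):               # plain bisection on phi(y) = 0
        mid = 0.5 * (lo + hi)
        phi = mu / (1.0 - mid) + mu / (2.0 - mid) - 1.0
        lo, hi = (mid, hi) if phi < 0 else (lo, mid)
    y = 0.5 * (lo + hi)
    s = c - y
    x = mu / s
    return x, y, s

x, y, s = central_point(1e-8)   # close to the vertex (1, 0) for small mu
```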

First-Order Optimality Conditions

Simplex Method:       Interior Point Method:
Ax = b                Ax = b
A^T y + s = c         A^T y + s = c
XSe = 0               XSe = µe
x, s ≥ 0.             x, s ≥ 0.

[Figure: complementarity pairs (x_j, s_j). Simplex: basic x > 0, s = 0; nonbasic x = 0, s > 0. IPM: "basic" x > 0, s ≈ 0; "nonbasic" x ≈ 0, s > 0.]

Theory: IPMs converge in O(√n) or O(n) iterations.
Practice: IPMs converge in O(log n) iterations... but one iteration may be expensive!

Newton Method

We use the Newton method to find a stationary point of the barrier problem, i.e., a root of a nonlinear equation f(x) = 0.

[Figure: tangent-line construction of the Newton iterates x_k, x_{k+1}, x_{k+2}.]

The tangent line

z − f(x_k) = ∇f(x_k) (x − x_k)

is a local approximation of the graph of the function f(x). Substituting z = 0 gives a new point

x_{k+1} = x_k − (∇f(x_k))^{−1} f(x_k).
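The scalar iteration fits in a few lines (the helper below is our own sketch of the standard method):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0:  x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Root of f(x) = x^2 - 2 starting from x0 = 2: converges quadratically to sqrt(2).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 2.0)
```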

Newton Method

The first-order optimality conditions for the barrier problem form a large system of nonlinear equations

F(x, y, s) = 0,

where F: R^{2n+m} → R^{2n+m} is the mapping

F(x, y, s) = [ Ax − b ;  A^T y + s − c ;  XSe − µe ].

Actually, the first two blocks are linear; only the last one, corresponding to the complementarity condition, is nonlinear.

For a given point (x, y, s) we find the Newton direction (Δx, Δy, Δs) by solving the system of linear equations:

[ A    0    0 ] [Δx]   [ b − Ax        ]
[ 0    A^T  I ] [Δy] = [ c − A^T y − s ]
[ S    0    X ] [Δs]   [ µe − XSe      ].

IPM for LP

min c^T x                min c^T x − µ Σ_{j=1}^n ln x_j
s.t. Ax = b,      →      s.t. Ax = b.
     x ≥ 0.

The first-order conditions (for the barrier problem):

Ax = b,  A^T y + s = c,  XSe = µe.

Newton direction:

[ A    0    0 ] [Δx]   [ξ_p]   [ b − Ax        ]
[ 0    A^T  I ] [Δy] = [ξ_d] = [ c − A^T y − s ]
[ S    0    X ] [Δs]   [ξ_µ]   [ µe − XSe      ].

Augmented system:

[ −Θ^{−1}  A^T ] [Δx]   [ ξ_d − X^{−1} ξ_µ ]
[  A       0   ] [Δy] = [ ξ_p              ].
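The whole scheme fits in a few dozen lines of dense linear algebra. The sketch below (our own illustration, not production code) solves the full, unreduced Newton system at every iteration rather than the augmented system, and uses a fixed centering parameter σ with a fraction-to-the-boundary step:

```python
import numpy as np

def ipm_lp(A, b, c, x, y, s, sigma=0.1, iters=30):
    """Bare-bones primal-dual IPM for  min c^T x  s.t. Ax = b, x >= 0.
    Requires x > 0 and s > 0 throughout."""
    m, n = A.shape
    for _ in range(iters):
        if x @ s < 1e-10 * n:          # complementarity small enough: stop
            break
        mu = sigma * (x @ s) / n       # target point on the central path
        # residuals: xi_p = b - Ax, xi_d = c - A^T y - s, xi_mu = mu*e - XSe
        r = np.concatenate([b - A @ x, c - A.T @ y - s, mu - x * s])
        # block matrix [[A, 0, 0], [0, A^T, I], [S, 0, X]]
        K = np.zeros((2 * n + m, 2 * n + m))
        K[:m, :n] = A
        K[m:m + n, n:n + m] = A.T
        K[m:m + n, n + m:] = np.eye(n)
        K[m + n:, :n] = np.diag(s)
        K[m + n:, n + m:] = np.diag(x)
        d = np.linalg.solve(K, r)
        dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
        # ratio test: step length keeping x and s strictly positive
        alpha = 0.9995 * min(1.0,
                             *(-x[dx < 0] / dx[dx < 0]),
                             *(-s[ds < 0] / ds[ds < 0]))
        x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    return x, y, s

# min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0; optimum at x = (1, 0).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x, y, s = ipm_lp(A, b, c, np.array([0.5, 0.5]), np.zeros(1), np.ones(2))
```

Real implementations solve the augmented (or normal-equations) system with sparse factorizations instead of forming K densely; that is where the "one iteration may be expensive" remark comes from.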

IPM for QP

min c^T x + ½ x^T Q x         min c^T x + ½ x^T Q x − µ Σ_{j=1}^n ln x_j
s.t. Ax = b,           →      s.t. Ax = b.
     x ≥ 0.

The first-order conditions (for the barrier problem):

Ax = b,  A^T y + s − Qx = c,  XSe = µe.

Newton direction:

[ A    0    0 ] [Δx]   [ξ_p]   [ b − Ax             ]
[ −Q   A^T  I ] [Δy] = [ξ_d] = [ c − A^T y − s + Qx ]
[ S    0    X ] [Δs]   [ξ_µ]   [ µe − XSe           ].

Augmented system:

[ −Q − Θ^{−1}  A^T ] [Δx]   [ ξ_d − X^{−1} ξ_µ ]
[  A           0   ] [Δy] = [ ξ_p              ].

IPM for NLP

min f(x)                    min f(x) − µ Σ_{i=1}^m ln z_i
s.t. g(x) + z = 0,    →     s.t. g(x) + z = 0.
     z ≥ 0.

Lagrangian:

L(x, y, z, µ) = f(x) + y^T (g(x) + z) − µ Σ_{i=1}^m ln z_i.

The first-order conditions (for the barrier problem):

∇f(x) + ∇g(x)^T y = 0,
g(x) + z = 0,
YZe = µe.

Newton direction, where A(x) = ∇g(x) and Q(x, y) = ∇²_{xx} L:

[ Q(x, y)  A(x)^T  0 ] [Δx]   [ −∇f(x) − A(x)^T y ]
[ A(x)     0       I ] [Δy] = [ −g(x) − z         ]
[ 0        Z       Y ] [Δz]   [ µe − YZe          ].

Augmented system:

[ Q(x, y)  A(x)^T   ] [Δx]   [ −∇f(x) − A(x)^T y ]
[ A(x)     −ZY^{−1} ] [Δy] = [ −g(x) − µY^{−1}e   ].

Generic Interior-Point NLP Algorithm

Initialize:
  k = 0
  (x^0, y^0, z^0) such that y^0 > 0 and z^0 > 0
  µ_0 = (1/m) (y^0)^T z^0

Repeat until optimality:
  k = k + 1
  µ_k = σ µ_{k−1}, where σ ∈ (0, 1)
  Compute A(x) and Q(x, y); compute the Newton direction towards the µ-center
  Ratio test:
    α_1 := max {α > 0 : y + αΔy ≥ 0},
    α_2 := max {α > 0 : z + αΔz ≥ 0}.
  Choose the step: α ≤ min {α_1, α_2} (use trust region or line search)
  Make step:
    x^{k+1} = x^k + αΔx,  y^{k+1} = y^k + αΔy,  z^{k+1} = z^k + αΔz.
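The ratio test above can be sketched in a few lines (the helper and sample data are our own illustration):

```python
import numpy as np

def ratio_test(v, dv):
    """Largest alpha with v + alpha*dv >= 0 componentwise
    (infinity when no component of dv is negative)."""
    neg = dv < 0
    return np.min(-v[neg] / dv[neg]) if neg.any() else np.inf

y, dy = np.array([1.0, 2.0]), np.array([-0.5, 1.0])
z, dz = np.array([0.5, 0.2]), np.array([-1.0, -0.1])
alpha_1 = ratio_test(y, dy)                  # y1 hits zero at alpha = 2
alpha_2 = ratio_test(z, dz)                  # z1 hits zero at alpha = 0.5
alpha = 0.9995 * min(1.0, alpha_1, alpha_2)  # stay strictly in the interior
```

The 0.9995 safety factor keeps the iterate strictly positive, so the logarithmic barrier terms remain well defined at the next iteration.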