Stochastic Programming: Math Review and Multi-Period Models


IE 495 Lecture 5: Stochastic Programming Math Review and Multi-Period Models
Prof. Jeff Linderoth
January 27, 2003

Outline
Homework questions? I would start on it fairly soon if I were you...
A fairly lengthy math (review?) session: differentiability, KKT conditions
Modeling examples: Jacob and MIT; multi-period production planning

Yucky Math Review: Derivatives
Let $f$ be a function from $\mathbb{R}^n \to \mathbb{R}$. The directional derivative of $f$ with respect to the direction $d$ is
$$f'(x, d) = \lim_{\lambda \downarrow 0} \frac{f(x + \lambda d) - f(x)}{\lambda}.$$
If this directional derivative exists for every $d \in \mathbb{R}^n$ and is a linear function of $d$, say $f'(x, d) = g^T d$, then $f$ is differentiable. The unique vector $g$ is called the gradient of $f$ at $x$; we denote it $\nabla f(x)$.

Not Everything is Differentiable
Probably everything you have ever tried to optimize has been differentiable. This will not be the case in this class! Even nice, simple, convex functions may not be differentiable at all points in their domain. Examples?
A vector $\eta \in \mathbb{R}^n$ is a subgradient of a convex function $f$ at a point $x$ iff (if and only if)
$$f(z) \geq f(x) + \eta^T (z - x) \quad \forall z \in \mathbb{R}^n.$$
The graph of the (linear) function $h(z) = f(x) + \eta^T (z - x)$ is a supporting hyperplane to the convex set $\mathrm{epi}(f)$ at the point $(x, f(x))$.
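As a quick numeric illustration (this example is mine, not from the slides), take $f(x) = |x|$, which is convex but kinked at $0$: checking the subgradient inequality on a grid of points shows that every slope $\eta \in [-1, 1]$ is a subgradient at $0$, while steeper slopes are not.

```python
def is_subgradient(f, x, eta, test_points):
    """Check the subgradient inequality f(z) >= f(x) + eta*(z - x) on sample points z."""
    return all(f(z) >= f(x) + eta * (z - x) - 1e-12 for z in test_points)

zs = [z / 10.0 for z in range(-50, 51)]      # sample points in [-5, 5]
assert is_subgradient(abs, 0.0, 0.5, zs)     # any eta in [-1, 1] is a subgradient at 0
assert is_subgradient(abs, 0.0, -1.0, zs)
assert not is_subgradient(abs, 0.0, 1.5, zs) # slopes outside [-1, 1] violate the inequality
```

Each supporting line $h(z) = \eta z$ with $|\eta| \leq 1$ stays below the graph of $|x|$, which is exactly the supporting-hyperplane picture above.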

More Definitions
The set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$, denoted $\partial f(x)$.
? Is $\partial f(x)$ a convex set?
Thm: $\eta \in \partial f(x)$ iff $f'(x, d) \geq \eta^T d \; \forall d \in \mathbb{R}^n$.

Optimality Conditions
We are interested in determining conditions under which we can verify that a solution is optimal. To KISS, we will (for now) focus on minimizing functions that are
One-dimensional
Lipschitz continuous ($|f(a) - f(b)| \leq L|a - b|$)
Differentiable
Recall: a function $f$ is convex on a set $S$ if for all $a \in S$, $b \in S$, and $\lambda \in [0, 1]$, $f(\lambda a + (1-\lambda)b) \leq \lambda f(a) + (1-\lambda) f(b)$.
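The convexity inequality just recalled is easy to test numerically. A small sketch (the helper name and sample functions are my own choices):

```python
def check_convexity(f, a, b, samples=11):
    """Test f(la*a + (1-la)*b) <= la*f(a) + (1-la)*f(b) on a grid of la in [0, 1]."""
    for k in range(samples):
        la = k / (samples - 1)
        if f(la * a + (1 - la) * b) > la * f(a) + (1 - la) * f(b) + 1e-12:
            return False
    return True

assert check_convexity(lambda x: x * x, -2.0, 3.0)                 # x^2 is convex
assert not check_convexity(lambda x: 1 - (x - 1) ** 2, -2.0, 3.0)  # concave: inequality fails
```

A grid check like this can only refute convexity on the sampled segment, not prove it, but it is a handy sanity check when experimenting with example functions.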

Why do we care?
Because they are important. Because Prof. Linderoth says so!
Many optimization algorithms work to find points that satisfy these conditions.
When faced with a problem that you don't know how to handle, write down the optimality conditions. Often you can learn a lot about a problem by examining the properties of its optimal solutions.

Preliminaries
Call the following problem P: $z^* = \min \{ f(x) : x \in S \}$.
Def: Any point $x^* \in S$ that gives a value of $f(x^*) = z^*$ is a global minimum of P; $x^* = \arg\min_{x \in S} f(x)$.
Def: A local minimum of P is any point $x^l \in S$ such that $f(x^l) \leq f(y)$ for all $y$ in a neighborhood of $x^l$ (all $y \in S \cap N_\epsilon(x^l)$).
Thm: If $S$ is convex and $f$ is convex on $S$, then any local minimum of P is a global minimum of P.

Oh No... A Proof!
Since $x^l$ is a local minimum, there is a neighborhood $N_\epsilon(x^l)$ around $x^l$ such that $f(x) \geq f(x^l) \; \forall x \in S \cap N_\epsilon(x^l)$. Suppose that $x^l$ is not a global minimum, so there exists $\hat{x} \in S$ with $f(\hat{x}) < f(x^l)$. Since $f$ is convex, for all $\lambda \in [0, 1]$,
$$f(\lambda \hat{x} + (1 - \lambda) x^l) \leq \lambda f(\hat{x}) + (1 - \lambda) f(x^l) < \lambda f(x^l) + (1 - \lambda) f(x^l) = f(x^l).$$
For $\lambda > 0$ and very small, $\lambda \hat{x} + (1 - \lambda) x^l \in S \cap N_\epsilon(x^l)$. But this contradicts $f(x) \geq f(x^l) \; \forall x \in S \cap N_\epsilon(x^l)$. Q.E.D.

Starting Simple: Optimizing 1-D Functions
Consider optimizing the following function of a scalar variable $x \in \mathbb{R}$: $z^* = \min f(x)$. Call an optimal solution to this problem $x^*$ ($x^* = \arg\min f(x)$).
What is necessary for a point $x$ to be an optimal solution? $f'(x) = 0$.
Ex. $f(x) = (x-1)^2$; $f'(x) = 2(x-1) = 0 \Rightarrow x = 1$.
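Many algorithms find such stationary points iteratively rather than by solving $f'(x) = 0$ in closed form. A minimal sketch (the function name and step size are my own choices, not from the lecture) using plain gradient descent on this example:

```python
def minimize_1d(fprime, x0=0.0, step=0.1, iters=200):
    """Plain gradient descent on a smooth 1-D function, given its derivative."""
    x = x0
    for _ in range(iters):
        x -= step * fprime(x)
    return x

# f(x) = (x - 1)^2 has f'(x) = 2(x - 1); the iteration converges to the stationary point x = 1.
x_star = minimize_1d(lambda x: 2 * (x - 1))
assert abs(x_star - 1.0) < 1e-6
```

For this quadratic, each iteration multiplies the error by 0.8, so 200 iterations are far more than enough.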

[Figure: plot of $f(x) = (x-1)^2$ for $x \in [-2, 2]$]

Is That All We Need?
Is $f'(x) = 0$ also sufficient for $x$ to be a (locally) optimal solution?
Ex. $f(x) = 1 - (x-1)^2$; $f'(x) = -2(x-1) = 0 \Rightarrow x = 1$.

[Figure: plot of $f(x) = 1 - (x-1)^2$ for $x \in [-2, 2]$]

Obviously Not
Since $x = 1$ is a local maximum of $f(x) = 1 - (x-1)^2$, the $f'(x) = 0$ condition is obviously not all we need to ensure that we get a local minimum.
? What is the sufficient condition for a point $\hat{x}$ with $f'(\hat{x}) = 0$ to be (locally) optimal? $f''(\hat{x}) > 0$! This is equivalent to saying that $f$ is convex at $\hat{x}$.
? Who has heard of the following terms? Hessian matrix? Positive (semi)definite?
If $f$ is convex for all $x$, then (from the previous Thm.) any local minimum is also a global minimum.
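The second-derivative test can be illustrated numerically with a central finite difference (a sketch of my own, not part of the slides), distinguishing the two examples above:

```python
def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

f_min = lambda x: (x - 1) ** 2       # f''(1) = 2 > 0, so x = 1 is a local minimum
f_max = lambda x: 1 - (x - 1) ** 2   # f''(1) = -2 < 0, so x = 1 is a local maximum
assert second_derivative(f_min, 1.0) > 0
assert second_derivative(f_max, 1.0) < 0
```

In $n$ dimensions the analogous object is the Hessian matrix, and "$f''(\hat{x}) > 0$" becomes positive definiteness.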

(1-D) Constrained Optimization
Now we consider the following problem for a scalar variable $x \in \mathbb{R}$: $z^* = \min_{0 \leq x \leq u} f(x)$.
There are three cases for where an optimal solution $x^*$ might be:
$x^* = 0$
$0 < x^* < u$
$x^* = u$

Breaking It Down
If $0 < x^* < u$, then the necessary and sufficient conditions for optimality are the same as in the unconstrained case.
If $x^* = 0$, then we need $f'(x^*) \geq 0$ (necessary); $f'(x^*) > 0$ (sufficient).
If $x^* = u$, then we need $f'(x^*) \leq 0$ (necessary); $f'(x^*) < 0$ (sufficient).
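A small brute-force check of the boundary case (the example function and grid are my own): for $f(x) = (x-2)^2$ over $[0, 1]$, the unconstrained stationary point $x = 2$ is infeasible, so the minimum sits at the upper boundary, where the derivative is nonpositive.

```python
# f(x) = (x - 2)^2 minimized over [0, u] with u = 1: the optimum is at x* = u.
u = 1.0
f = lambda x: (x - 2) ** 2
fp = lambda x: 2 * (x - 2)
grid = [u * k / 1000 for k in range(1001)]
x_star = min(grid, key=f)
assert x_star == u        # brute force agrees: the minimizer is the endpoint u
assert fp(x_star) <= 0    # necessary condition at x* = u (here f'(1) = -2)
```

At the lower boundary the inequality flips: an optimum at $x^* = 0$ requires $f'(0) \geq 0$.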

KKT Conditions
How do these conditions generalize to optimization problems with more than one variable?
The intuition: if a constraint holds with equality (is binding), then any direction that improves the objective must violate that constraint. Formally: the negative gradient of the objective function must be a nonnegative linear combination of the gradients of the binding constraints.
KKT stands for Karush-Kuhn-Tucker.
Story time! Remember the optimality conditions from linear programming? These are just the KKT conditions!

Example ($x \in \mathbb{R}^2$)
minimize $x_1 + x_2$
subject to $x_1^2 + x_2^2 \leq 2$
$x_2 \geq 0$
You see that the optimal solution is $x^* = (-\sqrt{2}, 0)$.

The Canonical Problem
minimize $f(x)$
subject to $g_i(x) \leq b_i$, $i = 1, 2, \ldots, m$
$x_j \geq 0$, $j = 1, 2, \ldots, n$

KKT Conditions
Geometrically, if $\hat{x}$ is an optimal solution, then we must be able to write $-\nabla f(\hat{x})$ as a nonnegative linear combination of the gradients of the binding constraints. If a constraint is not binding, its weight must be 0:
$$-\nabla f(\hat{x}) = \sum_{i=1}^m \lambda_i \nabla g_i(\hat{x}) - \mu$$
$\lambda_i = 0$ if $g_i(\hat{x}) < b_i$ ($\forall i$)
$\mu_j = 0$ if $\hat{x}_j > 0$ ($\forall j$)

KKT Conditions
If $\hat{x}$ is an optimal solution to P, then there exist multipliers $\lambda_1, \lambda_2, \ldots, \lambda_m$, $\mu_1, \mu_2, \ldots, \mu_n$ that satisfy the following conditions:
$g_i(\hat{x}) \leq b_i$, $i = 1, 2, \ldots, m$
$\hat{x}_j \geq 0$, $j = 1, 2, \ldots, n$
$-\dfrac{\partial f(\hat{x})}{\partial x_j} - \sum_{i=1}^m \lambda_i \dfrac{\partial g_i(\hat{x})}{\partial x_j} + \mu_j = 0$, $j = 1, 2, \ldots, n$
$\lambda_i \geq 0$, $i = 1, 2, \ldots, m$
$\mu_j \geq 0$, $j = 1, 2, \ldots, n$
$\lambda_i (b_i - g_i(\hat{x})) = 0$, $i = 1, 2, \ldots, m$
$\mu_j \hat{x}_j = 0$, $j = 1, 2, \ldots, n$

Returning to the Example
minimize $x_1 + x_2$
subject to $x_1^2 + x_2^2 \leq 2$ ($\lambda$)
$x_2 \geq 0$ ($\mu$)

KKT Conditions
Primal feasibility: $x_1^2 + x_2^2 \leq 2$, $x_2 \geq 0$.
Dual feasibility: $\lambda \geq 0$, $\mu \geq 0$.
Stationarity: $-1 = \lambda(2x_1)$, $-1 = \lambda(2x_2) - \mu$.

KKT Conditions, cont.
Complementary slackness: $\lambda(2 - x_1^2 - x_2^2) = 0$, $\mu x_2 = 0$.
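All four groups of conditions can be verified numerically at the candidate optimum. The multiplier values below are my own computation (assuming the problem is min $x_1 + x_2$ over the disk $x_1^2 + x_2^2 \leq 2$ with $x_2 \geq 0$, optimum $(-\sqrt{2}, 0)$):

```python
import math

# Candidate optimum and multipliers for: min x1 + x2  s.t.  x1^2 + x2^2 <= 2,  x2 >= 0
x1, x2 = -math.sqrt(2), 0.0
lam, mu = 1 / (2 * math.sqrt(2)), 1.0
tol = 1e-9

assert x1 ** 2 + x2 ** 2 <= 2 + tol and x2 >= -tol   # primal feasibility
assert lam >= 0 and mu >= 0                          # dual feasibility
assert abs(-1 - 2 * lam * x1) < tol                  # stationarity, x1 component
assert abs(-1 - (2 * lam * x2 - mu)) < tol           # stationarity, x2 component
assert abs(lam * (2 - x1 ** 2 - x2 ** 2)) < tol      # complementary slackness
assert abs(mu * x2) < tol
```

Both constraints are binding at the optimum, which is why both $\lambda$ and $\mu$ may be strictly positive.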

Generalizing to Nondifferentiable Functions
In full generality, this would require some fairly heavy-duty convex analysis. Convex analysis is a great subject; you should all study it! Instead, I first want to show that when passing to nondifferentiable functions, we replace $\nabla f(x) = 0$ with $0 \in \partial f(x)$.
I am sorry for all the theorems, but a little more math never hurt anyone. (At least as far as I know.)

Theorem
Let $f: \mathbb{R}^n \to \mathbb{R}$ be a convex function and let $S$ be a nonempty convex set. Then $\hat{x} = \arg\min_{x \in S} f(x)$ if (and only if) there exists a subgradient $\eta$ of $f$ at $\hat{x}$ such that $\eta^T(x - \hat{x}) \geq 0 \; \forall x \in S$.
Proof. (Duh!) I will prove only the very, very easy direction. If $\eta$ is a subgradient of $f$ at $\hat{x}$ such that $\eta^T(x - \hat{x}) \geq 0 \; \forall x \in S$, then
$$f(x) \geq f(\hat{x}) + \eta^T(x - \hat{x}) \geq f(\hat{x}) \quad \forall x \in S.$$
So $\hat{x} = \arg\min_{x \in S} f(x)$. Q.E.D.

Theorem
Let $f: \mathbb{R}^n \to \mathbb{R}$ be a convex function. Then $\hat{x} = \arg\min_{x \in \mathbb{R}^n} f(x)$ if (and only if) $0 \in \partial f(\hat{x})$.
Proof. By the previous theorem (with $S = \mathbb{R}^n$), $\hat{x} = \arg\min_{x \in \mathbb{R}^n} f(x)$ if and only if there exists a subgradient $\eta$ with $\eta^T(x - \hat{x}) \geq 0 \; \forall x \in \mathbb{R}^n$. Choose $x = \hat{x} - \eta$. Then $\eta^T(\hat{x} - \eta - \hat{x}) = -\eta^T\eta \geq 0$. This can only happen when $\eta = 0$ ($\sum_i \eta_i^2 = 0 \Rightarrow \eta_i = 0 \; \forall i$). So $\eta = 0 \in \partial f(\hat{x})$. Q.E.D.
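Subgradients are not just for optimality certificates; they also drive algorithms. A minimal sketch of the subgradient method on $f(x) = |x - 3|$ (the diminishing step sizes $1/k$ and the helper names are my own choices, not from the slides):

```python
def subgradient_method(f, subgrad, x0, iters=2000):
    """Subgradient method with diminishing step sizes 1/k; f need not be differentiable."""
    x, best = x0, x0
    for k in range(1, iters + 1):
        x = x - subgrad(x) / k
        if f(x) < f(best):
            best = x          # track the best iterate: f values need not decrease monotonically
    return best

f = lambda x: abs(x - 3)                # minimized at x = 3, not differentiable there
g = lambda x: 1.0 if x >= 3 else -1.0   # a valid subgradient of f at every x
x_best = subgradient_method(f, g, x0=0.0)
assert abs(x_best - 3.0) < 0.01
```

Unlike gradient descent, the iterates oscillate around the kink, which is why the method keeps the best point seen rather than the last one.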

Now in Full Generality
Thm: For a convex function $f: \mathbb{R}^n \to \mathbb{R}$ and convex functions $g_i: \mathbb{R}^n \to \mathbb{R}$, $i = 1, 2, \ldots, m$, if we have some nice regularity conditions (which you should assume we have unless I tell you otherwise), $\hat{x}$ is an optimal solution to $\min\{f(x) : g_i(x) \leq 0, \; i = 1, 2, \ldots, m\}$ if and only if the following conditions hold:
$g_i(\hat{x}) \leq 0$, $i = 1, 2, \ldots, m$
$\exists \lambda_1, \lambda_2, \ldots, \lambda_m \in \mathbb{R}$ such that $0 \in \partial f(\hat{x}) + \sum_{i=1}^m \lambda_i \partial g_i(\hat{x})$
$\lambda_i \geq 0$, $i = 1, 2, \ldots, m$
$\lambda_i g_i(\hat{x}) = 0$, $i = 1, 2, \ldots, m$

Daddy Has Big Plans
MIT costs $39,060/year right now. In 10 years, when Jacob is ready for MIT, it will cost more than $80,000/year. (YIKES!) Let's design a stochastic programming problem to help us out.
In Y years, we would like to reach a tuition goal of G. We will assume that Helen and I rebalance our portfolio every v years, so that there are T = Y/v times when we need to make a decision about what to buy. There are T periods in our stochastic programming problem.

Details
We are given a universe N of investment vehicles and a set T = {1, 2, ..., T} of investment periods.
Let $\omega_{it}$, $i \in N$, $t \in T$, be the return of investment $i \in N$ in period $t \in T$.
If we exceed our goal G, we earn an interest rate of q that Helen and I can enjoy in our golden years. If we don't meet the goal G, Helen and I will have to borrow money at a rate r so that Jacob can go to MIT.
We have $b now.

Variables
$x_{it}$, $i \in N$, $t \in T$: amount of money to invest in vehicle $i$ during period $t$
$y$: excess money at the end of the horizon
$w$: shortage in money at the end of the horizon

(Deterministic) Formulation
maximize $qy - rw$
subject to
$\sum_{i \in N} x_{i1} = b$
$\sum_{i \in N} \omega_{it} x_{i,t-1} = \sum_{i \in N} x_{it} \quad \forall t \in T \setminus \{1\}$
$\sum_{i \in N} \omega_{iT} x_{iT} - y + w = G$
$x_{it} \geq 0 \; \forall i \in N, t \in T$; $y, w \geq 0$

Random Returns
As evidenced by our recent performance, my wife and I are bad at picking stocks. In our defense, returns on investments are random variables.
Imagine that at each time $t$ there are a number of potential outcomes, $R$, for the returns.

Scenarios
The scenarios consist of all possible sequences of outcomes. Ex. Imagine R = 4 and T = 3. Then the scenarios would be...
t=1  t=2  t=3
 1    1    1
 1    1    2
 1    1    3
 1    1    4
 1    2    1
 ...
 4    4    4

Making It Stochastic
$x_{its}$, $i \in N$, $t \in T$, $s \in S$: amount of money to invest in vehicle $i$ during period $t$ in scenario $s$
$y_s$: excess money at the end of the horizon in scenario $s$
$w_s$: shortage in money at the end of the horizon in scenario $s$
Note that the (random) return $\omega_{its}$ is now a function of the scenario $s$. It depends on the mapping of the scenarios to the scenario tree.

A Stochastic Version
maximize $\sum_{s \in S} p_s (q y_s - r w_s)$, where $p_s$ is the probability of scenario $s$
subject to
$\sum_{i \in N} x_{i1s} = b \quad \forall s \in S$
$\sum_{i \in N} \omega_{its} x_{i,t-1,s} = \sum_{i \in N} x_{its} \quad \forall t \in T \setminus \{1\}, s \in S$
$\sum_{i \in N} \omega_{iTs} x_{iTs} - y_s + w_s = G \quad \forall s \in S$
$x_{its} \geq 0 \; \forall i \in N, t \in T, s \in S$; $y_s, w_s \geq 0 \; \forall s \in S$
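A quick back-of-the-envelope on the size of this scenario-indexed model: every decision variable is copied once per scenario, so with R = 4 outcomes over T = 3 periods (and an assumed |N| = 3 assets, a number I chose for illustration), the counts come out as follows.

```python
# Rough size of the scenario-indexed model: one copy of every variable per scenario.
n_assets, R, T = 3, 4, 3              # |N| = 3 is an assumed value for illustration
S = R ** T                            # 64 scenarios
n_vars = n_assets * T * S + 2 * S     # x_{its} plus y_s and w_s
n_cons = S * (1 + (T - 1) + 1)        # budget, balance, and goal rows per scenario
assert (S, n_vars, n_cons) == (64, 704, 256)
```

Whether this formulation is actually a correct multistage model is the question the next slide raises, so take it as written, not as the final word.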

Next Time
? Is this correct? Answer the question above...
Writing the deterministic equivalent of multistage problems
(Maybe) one more modeling example
Properties of the recourse function (starting BL 3.1)