Non-Linear Optimization


Non-Linear Optimization: Distinguishing Features; Common Examples: EOQ, Balancing Risks, Minimizing Risk. 15.057 Spring 03 Vande Vate 1

Hierarchy of Models: Network Flows, Linear Programs, Mixed Integer Linear Programs

A More Academic View: Mixed Integer Linear Programs fall under Non-Convex Optimization; Network Flows and Linear Programs fall under Convex Optimization

A More Academic View: Integer Models are Non-Convex Optimization; Networks & Linear Models are Convex Optimization

Convexity, the Distinguishing Feature: Separates Hard from Easy. Convex Combination = Weighted Average: non-negative weights that sum to 1.

Convex Functions: the function lies below the line through convex combinations of its values. Examples? [Figure: plot of a convex function with a chord above it]
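The "lies below the line" property can be checked numerically. A minimal Python sketch; the function f and the tolerance are illustrative assumptions, not from the slides:

```python
# Numerically check the defining inequality of a convex function:
# f(w*x + (1-w)*y) <= w*f(x) + (1-w)*f(y) for all weights w in [0, 1].

def f(x):
    return (x - 10) ** 2 + 50  # a simple convex quadratic (hypothetical)

def lies_below_chord(f, x, y, steps=100):
    """True if f at every convex combination of x and y lies at or
    below the corresponding convex combination of f(x) and f(y)."""
    for i in range(steps + 1):
        w = i / steps                       # weight in [0, 1]
        point = w * x + (1 - w) * y         # convex combination of inputs
        chord = w * f(x) + (1 - w) * f(y)   # convex combination of values
        if f(point) > chord + 1e-9:
            return False
    return True

print(lies_below_chord(f, 0, 30))   # True: the function lies below the line
```

Running the same check on a concave function (e.g., the negated quadratic) returns False, since a concave function lies above its chords.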

What's Easy: Find the minimum of a Convex Function. A local minimum is a global minimum.

Convex Set: A set S is CONVEX if every convex combination of points in S is also in S. Example: the set of points above a convex function. [Figure: the region above a convex function]

What's Easy: Find the minimum of a Convex Function over (subject to) a Convex Set.

Concave Function: the function lies ABOVE the line. Examples? [Figure: plot of a concave function with a chord below it]

What's Easy: Find the maximum of a Concave Function over (subject to) a Convex Set.

Academic Questions: Is a linear function convex or concave? Do the feasible solutions of a linear program form a convex set? Do the feasible solutions of an integer program form a convex set?

Ugly - Hard: [Figure: a non-convex cost-vs-volume curve]

Integer Programming is Hard. Why? [Figure: integer feasible points for Bands vs. Coils under a Coils limit, a Bands limit, and a production capacity constraint]

Review: Convex Optimization: Convex (min) or Concave (max) objective over a Convex feasible region. Non-Convex Optimization. Stochastic Optimization: incorporates randomness.

Agenda: Convex Optimization (Unconstrained Optimization, Constrained Optimization); Non-Convex Optimization (Convexification, Heuristics)

Convex Optimization, Unconstrained Optimization: If the partial derivatives exist (smooth), find a point where the gradient is 0. Otherwise (not smooth), find a point where 0 is a subgradient.

Unconstrained Convex Optimization, Smooth: Find a point where the gradient is 0, i.e., find a solution to ∇f(x) = 0. Analytically (when possible); iteratively otherwise.

Solving ∇f(x) = 0: Newton's Method. Approximate f near x with the second-order expansion f(y) ≈ f(x) + ∇f(x)ᵀ(y-x) + ½(y-x)ᵀHₓ(y-x). Computing the next iterate involves inverting Hₓ. Quasi-Newton Methods: approximate H and update the approximation so we can easily update the inverse. (BFGS: Broyden, Fletcher, Goldfarb, Shanno)
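A minimal sketch of the Newton iteration described here, in Python. The one-dimensional objective f(x) = x² + eˣ is a hypothetical example; in n dimensions the division becomes a linear solve with the Hessian, which is what quasi-Newton methods such as BFGS avoid by maintaining a cheap-to-update approximation of the inverse:

```python
import math

# Newton's method for unconstrained minimization in one dimension:
# iterate x <- x - f'(x)/f''(x) until the gradient is (near) zero.

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=100):
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:      # gradient ~ 0: stationary point found
            break
        x -= g / hess(x)      # Newton step
    return x

# Hypothetical convex objective f(x) = x**2 + exp(x).
xmin = newton_minimize(lambda x: 2 * x + math.exp(x),
                       lambda x: 2 + math.exp(x),
                       x0=0.0)
print(xmin)   # the unique minimizer, where 2x + exp(x) = 0
```

For a convex objective the iteration converges rapidly (quadratically near the solution), which is why Newton-type methods dominate smooth unconstrained optimization.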

Line Search: Newton/Quasi-Newton methods yield a direction to the next iterate; a 1-dimensional search in this direction. Several methods.

Unconstrained Convex Optimization, Non-smooth: Subgradient Optimization. Find a point where 0 is a subgradient.

What's a Subgradient? Like a gradient: γₓ is a subgradient of f at x if f(y) ≥ f(x) + γₓ(y-x) for all y. [Figure: a piecewise-linear function with supporting lines f(y) = f(x) - 2(y-x) and f(y) = f(x) + (y-x) at a kink x, i.e., subgradients -2 and 1] 0 is a subgradient at x if and only if x is a minimum point.

Steepest Descent: If 0 is not a subgradient at x, a subgradient indicates where to go: the direction of steepest descent. Find the best point in that direction: line search.
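The subgradient version of descent can be sketched as follows. The non-smooth function f(x) = |x - 2| + |x + 1| and the diminishing step sizes 1/k are illustrative assumptions; f is minimized anywhere on [-1, 2], where 0 is a subgradient:

```python
# Subgradient descent on the non-smooth convex f(x) = |x - 2| + |x + 1|.
# At a kink, any value between the one-sided slopes is a valid subgradient.

def subgrad(x):
    """One subgradient of f at x (picks 0 for a term exactly at its kink)."""
    g = 0.0
    for a in (2.0, -1.0):
        if x > a:
            g += 1.0
        elif x < a:
            g -= 1.0
        # at x == a the term admits any subgradient in [-1, 1]; use 0
    return g

x = 10.0
for k in range(1, 2000):
    g = subgrad(x)
    if g == 0:              # 0 is a subgradient: x is a global minimum
        break
    x -= (1.0 / k) * g      # diminishing steps guarantee convergence
print(x)
```

Starting from x = 10 the iterates walk left until they enter the flat optimal interval [-1, 2], at which point 0 becomes a subgradient and the loop stops.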

Examples: EOQ Model, Balancing Risk, Minimizing Risk

EOQ: How large should each order be? Trade-off: Cost of Inventory (known) vs. Cost of transactions (what?). Larger orders: Higher Inventory Cost, Lower Ordering Costs.

The Idea: Increase the order size until the incremental cost of holding the last item equals the incremental savings in ordering costs. If the costs exceed the savings? If the savings exceed the costs?

Modeling Costs: Q is the order quantity. The average inventory level is Q/2. h*c is the inventory cost in $/unit/year. Total Inventory Cost: h*c*Q/2. What does the last item contribute to inventory cost? h*c/2

Modeling Costs: D is the annual demand. How many orders do we place? D/Q. The transaction cost is A per transaction. Total Transaction Cost: A*D/Q

Total Cost = h*c*Q/2 + A*D/Q. What kind of function? [Figure: Total Cost vs. Order Quantity, a convex curve]

Incremental Savings: What does the last item save? Savings of the last item: A*D/(Q-1) - A*D/Q = [A*D*Q - A*D*(Q-1)]/[Q*(Q-1)] ≈ A*D/Q². Order up to the point where the extra carrying cost matches the incremental savings: h*c/2 = A*D/Q², so Q² = 2*A*D/(h*c) and Q = sqrt(2*A*D/(h*c))
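The closing formula can be computed directly; the parameter values below are hypothetical:

```python
import math

# EOQ sketch: balance the per-item holding cost h*c/2 against the
# marginal ordering savings A*D/Q**2, giving Q = sqrt(2*A*D/(h*c)).

def eoq(A, D, h, c):
    """Economic order quantity: A = cost per order, D = annual demand,
    h = carrying rate (fraction/year), c = unit value ($/unit)."""
    return math.sqrt(2 * A * D / (h * c))

A, D, h, c = 50.0, 1200.0, 0.25, 20.0   # hypothetical values
Q = eoq(A, D, h, c)
total_cost = h * c * Q / 2 + A * D / Q  # holding cost + ordering cost
print(Q, total_cost)
```

A useful sanity check: at the EOQ, the annual holding cost h*c*Q/2 exactly equals the annual ordering cost A*D/Q, which is precisely the balance condition derived on the slide.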

Key Assumptions? Known constant rate of demand

Value? No one can agree on the ordering cost. Each value of the ordering cost implies a value of Q, from which we get an inventory investment c*Q/2 and a number of orders per year, D/Q. Trace the balance for each value of the ordering cost.

The EOQ Trade-off: Known values: Annual Demand D, Product value c, Inventory carrying percentage h. Unknown: transaction cost A. For each value of A: Calculate Q = sqrt(2*A*D/(h*c)); Calculate Inventory Investment c*Q/2; Calculate Annual Orders D/Q.
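The trace described on this slide can be sketched in a few lines; D, c, h, and the candidate values of A are hypothetical:

```python
import math

# Trace the EOQ trade-off curve: for each assumed transaction cost A,
# compute Q, the inventory investment c*Q/2, and orders per year D/Q.

D, c, h = 1200.0, 20.0, 0.25   # hypothetical known values

curve = []
for A in (5, 10, 25, 50, 100, 200):
    Q = math.sqrt(2 * A * D / (h * c))
    investment = c * Q / 2      # dollars tied up in inventory
    orders = D / Q              # orders placed per year
    curve.append((A, round(investment), round(orders, 1)))

for row in curve:
    print(row)
```

As A grows, inventory investment rises and orders per year fall; plotting investment against orders per year gives the benchmark curve on the next slide.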

The Tradeoff: Where are you? Where can you be? But wait! What prevents getting there? [Figure: Benchmark EOQ Trade-off curve, Inventory Investment ($0-$3,000) vs. Orders Per Year (0-300)]

Balancing Risks

Variability: Some events are inherently variable: when customers arrive, how many customers arrive, transit times, daily usage, stock prices... Hard to predict exactly: dice, lotteries.

Random Variables, Examples: Outcome of rolling a die, Closing stock price, Daily usage, Time between customer arrivals, Transit time, Seasonal demand

Distribution: The values of a random variable and their frequencies. Example: rolling 2 fair dice. [Figure: the 36 equally likely outcomes stacked by sum]
Value                 2     3     4     5     6     7     8     9     10    11    12
Number of Outcomes    1     2     3     4     5     6     5     4     3     2     1
Fraction of Outcomes  0.028 0.056 0.083 0.111 0.139 0.167 0.139 0.111 0.083 0.056 0.028
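The theoretical distribution in the table can be reproduced by enumerating the 36 outcomes:

```python
from collections import Counter
from fractions import Fraction

# Theoretical distribution of the sum of two fair dice,
# built by enumerating all 36 equally likely outcomes.

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
dist = {v: Fraction(n, 36) for v, n in sorted(counts.items())}

for value, prob in dist.items():
    print(value, prob, float(prob))   # e.g. 7 has probability 6/36 = 1/6
```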

Theoretical vs Empirical. Empirical Distribution: based on observations.
Value                 2     3     4     5     6     7     8     9     10    11    12
Number of Outcomes    1     2     1     5     3     9     8     3     3     1     -
Fraction of Outcomes  0.03  0.06  0.03  0.14  0.08  0.25  0.22  0.08  0.08  0.03  -
Theoretical Distribution: based on a model.
Value                 2     3     4     5     6     7     8     9     10    11    12
Fraction of Outcomes  0.03  0.06  0.08  0.11  0.14  0.17  0.14  0.11  0.08  0.06  0.03

Empirical vs Theoretical: One perspective: If the dice are fair and we roll many, many times, the empirical should match the theoretical. Another perspective: If the dice are reasonably fair, the theoretical is close and saves the trouble of rolling.

Empirical vs Theoretical: The Empirical Distribution is flawed because it relies on limited observations. The Theoretical Distribution is flawed because it necessarily ignores details about reality. Exactitude? It's random.

Continuous vs Discrete: Discrete: value of dice, number of units sold. Continuous: essentially a theoretical convenience; if we measure it, it's discrete.

Probability: Discrete: What's the probability we roll a 12 with two fair dice? 1/36. Continuous: What's the probability the temperature will be exactly 72.00°F tomorrow at noon EST? Zero! Events: What's the probability that the temperature will be at least 72°F tomorrow at noon EST?

Continuous Distribution: Standard Normal Distribution. The probability the random variable is greater than 2 is the area under the curve above 2. [Figure: the standard normal density]
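That area can be computed with the Python standard library, no table lookup needed:

```python
from statistics import NormalDist

# Probability a standard normal random variable exceeds 2:
# the area under the density curve to the right of 2.
z = NormalDist(mu=0, sigma=1)
p = 1 - z.cdf(2)
print(round(p, 4))   # about 0.0228
```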

Total Probability: Empirical or theoretical, continuous or discrete: probability is between 0 and 1, and total probability (over all possible outcomes) is 1.

Summary Stats, The Mean: Weights each outcome by its probability. AKA Expected Value, Average. May not even be a possible outcome. Example: Win $1 on Heads, nothing on Tails.

Summary Stats, The Variance: Measures spread about the mean; how unpredictable the quantity is. Which would you rather manage? [Figure: Normal distributions with variance 1 and variance 9]

Variance: [Figure: Normal distributions with variance 1 and variance 9]

Std. Deviation: Variance is measured in units squared (think sum of squared errors). Standard Deviation is the square root; it's measured in the same units as the random variable. The two rise and fall together. Coefficient of Variation: Standard Deviation/Mean, spread relative to the average.
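A small sketch of these summary statistics on a hypothetical daily-usage sample:

```python
from statistics import mean, pstdev

# Mean, standard deviation, and coefficient of variation of a
# hypothetical daily-usage sample.  The coefficient of variation
# expresses spread relative to the average, so it is unit-free.

usage = [12, 15, 9, 14, 11, 13, 10, 16]   # hypothetical observations
m = mean(usage)
s = pstdev(usage)    # population standard deviation (square root of variance)
cv = s / m           # coefficient of variation
print(m, round(s, 2), round(cv, 2))
```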

Balancing Risk, Basic Insight: Bet on the outcome of a variable process. Choose a value. You pay $0.5/unit for the amount your bet exceeds the outcome. You earn the smaller of your value and the outcome. Question: What value do you choose?

Similar to... Anything you are familiar with?

The Distribution: Mean 5, Std. Dev. 1. [Figure: the outcome distribution]

The Idea: Balance the risks. Look at the last item: What did it promise? What risk did it pose? If the promise is greater than the risk? If the risk is greater than the promise?

Measuring Risk and Return: Revenue from the last item: $1 if the outcome is greater, $0 otherwise. Expected Revenue: $1 * Probability the outcome is greater than our choice. Risk posed by the last item: $0.5 if the outcome is smaller, $0 otherwise. Expected Risk: $0.5 * Probability the outcome is smaller than our choice.

Balancing Risk and Reward: Expected Revenue: $1 * Probability the outcome is greater than our choice. Expected Risk: $0.5 * Probability the outcome is smaller than our choice. How are the probabilities related?

Risk & Reward: [Figure: the distribution split at our choice into "Prob. outcome is smaller" and "Prob. outcome is larger"] How are they related?

Balance: Expected Revenue: $1 * (1 - Probability the outcome is smaller than our choice). Expected Risk: $0.5 * Probability the outcome is smaller than our choice. Set these equal: 1*(1-P) = 0.5*P, so 1 = 1.5*P and P = 2/3 = Probability the outcome is smaller than our choice.
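Given a distribution for the outcome, the balance condition translates directly into a quantile lookup. The sketch below assumes the normal outcome distribution (mean 5, std. dev. 1) shown on the earlier slide:

```python
from statistics import NormalDist

# The balance condition 1*(1 - P) = 0.5*P gives P = 2/3: pick the
# value whose cumulative probability under the outcome distribution
# is 2/3.  Mean 5, std. dev. 1 match the slide's distribution.

outcome = NormalDist(mu=5, sigma=1)
choice = outcome.inv_cdf(2 / 3)   # the 2/3 quantile
print(round(choice, 2))
```

Because 2/3 > 1/2, the chosen value sits a little above the mean: the $1 upside per unit outweighs the $0.5 downside, so it pays to bet slightly high.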

Making the Choice: [Figure: the cumulative probability curve; our choice is the value where the cumulative probability reaches 2/3]

Constrained Optimization: Feasible Direction techniques; Eliminating constraints (Implicit Function); Penalty Methods; Duality

Feasible Directions (cf. unconstrained optimization): Start at a point x0. Identify a feasible improving direction d. Find a best feasible solution in direction d: x + εd. Repeat. A feasible direction: one you can move in. A feasible solution: don't move too far. Typically for a convex feasible region.

Constrained Optimization, Penalty Methods: Move constraints into the objective with penalties or barriers; as the solution approaches the constraint, the penalty increases. Example: min f(x) s.t. x² ≤ 3x becomes min f(x) + t/(3x - x²); as x² approaches 3x, the penalty increases rapidly.
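A sketch of the barrier idea on this slide. The objective f(x) = (x - 5)² is a hypothetical choice whose unconstrained minimum, x = 5, violates x² ≤ 3x (feasible interval [0, 3]), so the constrained minimum sits on the boundary; the inner minimization uses a simple golden-section search:

```python
# Barrier method for min (x - 5)**2 subject to x**2 <= 3*x.
# The barrier term t/(3x - x**2) blows up near the boundary x = 0
# and x = 3; shrinking t lets the minimizer approach the boundary.

def barrier_objective(x, t):
    return (x - 5) ** 2 + t / (3 * x - x ** 2)

def minimize_1d(g, lo, hi, iters=200):
    """Golden-section search on a unimodal function over (lo, hi)."""
    phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if g(c) < g(d):
            b = d
        else:
            a = c
    return (a + b) / 2

for t in (1.0, 0.1, 0.01, 0.001, 1e-6):
    x = minimize_1d(lambda y: barrier_objective(y, t), 1e-9, 3 - 1e-9)
print(round(x, 3))   # approaches the boundary x = 3 as t shrinks
```

Each barrier subproblem stays strictly inside the feasible region; as t decreases, the unconstrained minimizers trace a path toward the constrained optimum at x = 3.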

Relatively reliable tools exist for: quadratic objective, linear constraints, continuous variables.

Summary: Easy Problems: Convex Minimization, Concave Maximization. Unconstrained Optimization: local gradient information. Constrained problems: tricks for reducing to unconstrained or simply constrained problems. NLP tools are practical only for smaller problems.