Computational Stochastic Programming
1 Computational Stochastic Programming
Jeff Linderoth, Dept. of ISyE / Dept. of CS, Univ. of Wisconsin-Madison
SPXIII, Bergamo, Italy, July 7, 2013
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 89
2 Mission Impossible!
It is impossible to discuss all of computational SP in 90 minutes. I will focus on a few basic topics, and (try to) provide references for a few others.
3-4 Bias
The bias of an estimator is the difference between the estimator's expected value E[θ̂] and the true value of the parameter θ. An estimator is unbiased if E[θ̂ - θ] = 0.
I Am Biased! But this lecture is also extremely biased, in the English sense of the word. It will cover the things that I know best. There is lots of great work in computational SP that I (unfortunately) won't mention.
5-6 What I WILL Cover
- Stochastic LP with recourse (primarily 2-stage)
- Decomposition: Benders decomposition, Lagrangian relaxation, dual decomposition, stochastic approximation
- Modern/bundle-type methods: trust-region methods, regularized decomposition, the level method
- Multistage extensions
- Software tools: the SMPS format, some available software tools for modeling and solving, the role of parallel computing
7-8 Other Things Covered
- Sampling: exterior sampling methods (sample average approximation); interior sampling methods (stochastic quasi-gradient, stochastic decomposition, mirror-prox methods)
- Stochastic integer programming: integer L-shaped, dual decomposition
9-11 Stochastic Programming
A Stochastic Program:
    min_{x ∈ X} f(x) := E_ω[F(x, ω)]
2-Stage Stochastic LP with Recourse. The recourse problem:
    F(x, ω) := c^T x + Q(x, ω)
c^T x: pay me now. Q(x, ω): pay me later, where
    Q(x, ω) := min { q(ω)^T y : W(ω) y = h(ω) - T(ω) x, y ≥ 0 }
so that
    E[F(x, ω)] = c^T x + E[Q(x, ω)] := c^T x + φ(x)
12-13 Decomposition Algorithms
Two ways of thinking: the algorithms are equivalent regardless of how you think about them, but thinking in different ways gives different insights.
Complementary viewpoints:
1. As a large-scale problem to which you apply decomposition techniques
2. As an oracle-based convex optimization problem
14 Decomposition: A Popular Method
15-16 Extensive Form
This is sometimes called the deterministic equivalent, but I prefer the term extensive form.
Assume Ω = {ω^1, ω^2, ..., ω^S} ⊂ R^r, P(ω = ω^s) = p_s for s = 1, 2, ..., S, and write
    T_s := T(ω^s), h_s := h(ω^s), q_s := q(ω^s), W_s := W(ω^s).
Then we can write the extensive form as just one large LP:
    min c^T x + p_1 q_1^T y_1 + p_2 q_2^T y_2 + ... + p_S q_S^T y_S
    s.t. Ax = b
         T_1 x + W_1 y_1 = h_1
         T_2 x + W_2 y_2 = h_2
         ...
         T_S x + W_S y_S = h_S
         x ∈ X, y_s ∈ Y for all s
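As a concrete illustration (mine, not from the slides), the block-angular structure above can be assembled mechanically. A minimal pure-Python sketch, with dense row-lists standing in for what a real implementation would keep sparse; the function name and the dictionary keys are my own conventions:

```python
def extensive_form(A, b, c, scen):
    """Assemble the extensive-form LP
        min c^T x + sum_s p_s q_s^T y_s
        s.t. A x = b;  T_s x + W_s y_s = h_s  for each scenario s
    as dense row-lists (cost, rows, rhs).  Each scenario is a dict
    with keys "p", "q", "T", "W", "h"."""
    n1 = len(c)                     # first-stage variables
    S = len(scen)
    n2 = len(scen[0]["q"])          # second-stage variables per scenario
    cost = list(c)
    for sc in scen:                 # probability-weighted recourse costs
        cost += [sc["p"] * qj for qj in sc["q"]]
    rows, rhs = [], []
    for ai, bi in zip(A, b):        # first-stage rows: [A | 0 ... 0]
        rows.append(list(ai) + [0.0] * (S * n2))
        rhs.append(bi)
    for s, sc in enumerate(scen):   # scenario rows: [T_s | 0 .. W_s .. 0]
        for ti, wi, hi in zip(sc["T"], sc["W"], sc["h"]):
            row = list(ti) + [0.0] * (S * n2)
            for j, wij in enumerate(wi):
                row[n1 + s * n2 + j] = wij
            rows.append(row)
            rhs.append(hi)
    return cost, rows, rhs
```

Note how the scenario blocks overlap only in the x columns: this is exactly the structure the decomposition methods below exploit.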
17-18 Small SPs Are Easy!
[Figure: solution time versus number of scenarios, CPLEX on the extensive form vs. the L-shaped method.]
In my experience, a barrier/interior-point method is faster than simplex/pivoting-based methods for solving extensive-form LPs.
19-21 The Upshot
If the extensive form is too large to solve directly, then we must exploit its structure: if I fix the first-stage variables x, the problem decomposes by scenario, since each y_s then appears only in its own block of constraints T_s x + W_s y_s = h_s.
Key Idea (Benders Decomposition): characterize the solution of each scenario linear program as a function of the first-stage solution x.
22-23 Benders of the Extensive Form
z_LP(x) = sum_{s=1}^S z_LP^s(x), where
    z_LP^s(x) = inf { q_s^T y : W_s y = h_s - T_s x, y ≥ 0 }.
The dual of the LP defining z_LP^s(x) is
    sup { (h_s - T_s x)^T π : W_s^T π ≤ q_s }.
Set of dual feasible solutions for scenario s: Π_s = {π ∈ R^m : W_s^T π ≤ q_s}.
Vertices of Π_s: V(Π_s) = {v_{1s}, v_{2s}, ..., v_{V_s,s}}.
Extreme rays of Π_s: R(Π_s) = {r_{1s}, r_{2s}, ..., r_{R_s,s}}.
24 (In Case You Forgot...) Minkowski's Theorem
There are two ways to describe a polyhedron: as an intersection of halfspaces (by its facets), or by appropriate combinations of its extreme points and extreme rays.
Minkowski-Weyl Theorem. Let P = {x ∈ R^n : Ax ≤ b} have extreme points V(P) = {v_1, v_2, ..., v_V} and extreme rays R(P) = {r_1, r_2, ..., r_R}. Then
    P = { x ∈ R^n : x = sum_{j=1}^V λ_j v_j + sum_{j=1}^R μ_j r_j,  sum_{j=1}^V λ_j = 1,  λ_j ≥ 0 ∀j,  μ_j ≥ 0 ∀j }.
25-27 A Little LP Theory
We assume that z_LP^s(x) is bounded below.
The LP is feasible (z_LP^s(x) < +∞) iff there is no direction of unboundedness in the dual LP:
    there is no r ∈ R(Π_s) (so W_s^T r ≤ 0) such that (h_s - T_s x)^T r > 0.
So x is a feasible solution (z_LP^s(x) < +∞) iff
    (h_s - T_s x)^T r_{ks} ≤ 0,  k = 1, ..., R_s.
If an LP has an optimal solution, then it has an optimal solution at an extreme point. By strong duality, the optimal primal and dual LP values agree, so
    z_LP^s(x) = max_{j=1,2,...,V_s} { (h_s - T_s x)^T v_{js} }  whenever r_{ks}^T (h_s - T_s x) ≤ 0 for k = 1, 2, ..., R_s.
28-29 Developing Benders Decomposition
Scenario s second-stage feasibility set:
    C_s := {x : ∃ y ≥ 0 with W_s y = h_s - T_s x} = {x : h_s - T_s x ∈ pos(W_s)}
First-stage feasibility set: X := {x ∈ R^n_+ : Ax = b}. Second-stage feasibility set: C := ∩_{s=1}^S C_s.
    (2SP)  min_{x ∈ X ∩ C} f(x) := c^T x + sum_{s=1}^S p_s Q(x, ω^s)
By the LP theory above:
    x ∈ C_s  ⟺  (h_s - T_s x)^T r_{js} ≤ 0,  j = 1, ..., R_s
    θ_s ≥ Q(x, ω^s)  ⟺  θ_s ≥ (h_s - T_s x)^T v_{js},  j = 1, ..., V_s
30-31 Benders / L-Shaped
Use these results: introduce auxiliary variables θ_s to represent the value of Q(x, ω^s). (N.B. I am changing notation just a little bit.)
Unaggregated (full multicut):
    min c^T x + sum_{s∈S} p_s θ_s
    s.t. r^T T_s x ≥ r^T h_s         ∀ s ∈ S, r ∈ R(Π_s)
         θ_s + v^T T_s x ≥ v^T h_s   ∀ s ∈ S, v ∈ V(Π_s)
         Ax = b, x ≥ 0
32-33 L-Shaped Method
Aggregate the inequalities and remove the per-scenario variables θ_s. Instead introduce a single variable
    Θ ≥ sum_{s∈S} p_s θ_s ≥ sum_{s∈S} p_s (v_s^T h_s - v_s^T T_s x)   (choosing any v_s ∈ V(Π_s)).
Fully aggregated (L-shaped):
    min c^T x + Θ
    s.t. r^T T_s x ≥ r^T h_s                                        ∀ s ∈ S, r ∈ R(Π_s)
         Θ + sum_{s∈S} p_s v_s^T T_s x ≥ sum_{s∈S} p_s v_s^T h_s    ∀ v_s ∈ V(Π_s)
         Ax = b, x ≥ 0
N.B. Different aggregations are possible.
34-35 A Whole Spectrum
Complete aggregation (traditional L-shaped): one variable for φ(x). Complete multicut: S variables for φ(x). We can do anything in between: partition the scenarios into C clusters S_1, S_2, ..., S_C and set
    φ_[S_k](x) = sum_{s∈S_k} p_s Q(x, ω^s),   Θ_[S_k] ≥ sum_{s∈S_k} p_s Q(x, ω^s).
References: the original multicut paper is [Birge and Louveaux, 1988]. You need not stay with one fixed aggregation: see the recent paper by Trukhanov et al. [2010] and the Ph.D. thesis of Janjarassuk [2009].
36-37 Benders Decomposition
Regardless of the aggregation, the linear program is likely to have exponentially many constraints. Benders decomposition is a cutting-plane method for solving one of these linear programs. I will describe it for the (full) multicut case; the other algorithms are really just aggregated versions of this one.
Basic questions: for a given x^k, θ_1^k, ..., θ_S^k, we must check, for each scenario s ∈ S:
1. Is there r ∈ R(Π_s) such that r^T T_s x^k < r^T h_s?
2. Is there v ∈ V(Π_s) such that θ_s^k + v^T T_s x^k < v^T h_s?
38 Find a Ray?
Phase-1 LP and its dual:
    U_s(x^k) = min { 1^T u + 1^T v : W_s y + u - v = h_s - T_s x^k, y, u, v ≥ 0 }
             = max { π^T (h_s - T_s x^k) : W_s^T π ≤ 0, -1 ≤ π ≤ 1 }
If U_s(x^k) > 0, then by strong duality it has an optimal dual solution π^k such that (π^k)^T (h_s - T_s x^k) > 0, so adding the inequality
    (h_s - T_s x)^T π^k ≤ 0
will exclude the solution x^k. The π^k from the Phase-1 LP is the dual extreme ray!
39 Find a Vertex
Question: is there v ∈ V(Π_s) such that θ_s^k + v^T T_s x^k < v^T h_s? Note this is the same as solving
    max_{v ∈ V(Π_s)} (h_s - T_s x^k)^T v,
i.e. sup { (h_s - T_s x^k)^T π : W_s^T π ≤ q_s }. And by duality, this is also the same as solving
    z_LP^s(x^k) = inf { q_s^T y : W_s y = h_s - T_s x^k, y ≥ 0 }
and looking at the (optimal) dual variables for the constraints W_s y = h_s - T_s x^k.
40-41 Master Problem
The cutting-plane algorithm will identify
    R̂_s ⊆ R(Π_s): a subset of the extreme rays of the dual feasible set Π_s
    V̂_s ⊆ V(Π_s): a subset of the extreme points of Π_s
The master problem is the full LP with R(Π_s), V(Π_s) replaced by these subsets:
    min c^T x + sum_{s∈S} p_s θ_s
    s.t. r^T T_s x ≥ r^T h_s         ∀ s ∈ S, r ∈ R̂_s
         θ_s + v^T T_s x ≥ v^T h_s   ∀ s ∈ S, v ∈ V̂_s
         Ax = b, x ≥ 0
42-48 Benders / Cutting-Plane Method
1. k = 1, R̂_s = V̂_s = ∅ ∀ s ∈ S, lb = -∞, ub = ∞, x^1 ∈ X.
2. Done = true. For each s ∈ S:
   - Solve the Phase-1 LP to get U_s(x^k). If U_s(x^k) > 0: Q(x^k, ω^s) = ∞; let π_s^k be the optimal dual solution to the Phase-1 LP; R̂_s ← R̂_s ∪ {π_s^k}; Done = false; go to 5.
   - Solve the Phase-2 LP for Q(x^k, ω^s), and let π_s^k be its optimal dual multiplier. If Q(x^k, ω^s) > θ_s^k, then V̂_s ← V̂_s ∪ {π_s^k}, Done = false.
3. ub = c^T x^k + sum_{s∈S} p_s Q(x^k, ω^s).
4. If ub - lb ≤ ε or Done = true, then stop: x^k is an optimal solution.
5. Solve the master problem. Let lb be its optimal solution value, let k ← k + 1, and let x^k be the optimal solution to the master problem. Go to 2.
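The loop above can be sketched end-to-end on a toy problem. The sketch below is mine, not from the slides: a one-dimensional first stage with simple recourse Q(x, ω^s) = q_s max(h_s - x, 0), which has complete recourse (so the Phase-1 step never fires and no feasibility cuts are needed), closed-form subproblem values and subgradients, and a single-cut master solved by enumerating cut intersections instead of calling an LP solver. All names and data are my own:

```python
def solve_master(c, cuts, lo, hi):
    """Minimize c*x + max_i(a_i*x + b_i) on [lo, hi]: the minimum of a
    1-D piecewise-linear convex function lies at an endpoint or where
    two cut lines cross, so enumerate those candidates."""
    cand = [lo, hi]
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            (a1, b1), (a2, b2) = cuts[i], cuts[j]
            if abs(a1 - a2) > 1e-12:
                x = (b2 - b1) / (a1 - a2)
                if lo <= x <= hi:
                    cand.append(x)
    x = min(cand, key=lambda t: c * t + max(a * t + b for a, b in cuts))
    return x, max(a * x + b for a, b in cuts)        # (x, theta)

def l_shaped(c, scen, lo, hi, tol=1e-8, itmax=100):
    """Single-cut L-shaped loop for min c*x + sum_s p_s*q_s*(h_s-x)^+."""
    Q = lambda x, q, h: q * max(h - x, 0.0)
    Qsub = lambda x, q, h: -q if x < h else 0.0      # a subgradient of Q(., s)
    x, ub, cuts = lo, float("inf"), [(0.0, 0.0)]     # theta >= 0 is valid: Q >= 0
    for _ in range(itmax):
        phi = sum(p * Q(x, q, h) for p, q, h in scen)   # expected recourse
        g = sum(p * Qsub(x, q, h) for p, q, h in scen)  # its subgradient
        ub = min(ub, c * x + phi)
        cuts.append((g, phi - g * x))     # optimality cut: theta >= phi + g*(x'-x)
        x, theta = solve_master(c, cuts, lo, hi)
        if ub - (c * x + theta) <= tol:   # master value is a lower bound
            break
    return x, ub
```

On scen = [(0.5, 1.0, 2.0), (0.5, 3.0, 6.0)] (each tuple is (p_s, q_s, h_s)) with c = 1 on [0, 10], the method stops at x = 6 with value 6 after three cuts.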
49-50 A First Example
    min x_1 + x_2
    s.t. ω_1 x_1 + x_2 ≥ 7
         ω_2 x_1 + x_2 ≥ 4
         x_1 ≥ 0, x_2 ≥ 0
where ω = (ω_1, ω_2) ∈ Ω = {(1, 1/3), (5/2, 2/3), (4, 1)}, each outcome with p_s = 1/3.
Huh? This problem doesn't make sense! (We are asking constraints that depend on the random ω to hold, but x must be chosen before ω is observed.)
51-52 Recourse Formulation
    min x_1 + x_2 + λ sum_{s∈S} p_s (y_{1s} + y_{2s})
    s.t. ω_{1s} x_1 + x_2 + y_{1s} ≥ 7,   s = 1, 2, 3
         ω_{2s} x_1 + x_2 + y_{2s} ≥ 4,   s = 1, 2, 3
         x_1, x_2, y_{1s}, y_{2s} ≥ 0,    s = 1, 2, 3
Stop! Gammer Time?
53 Oracle-Based Methods
54 Two-Stage Stochastic LP with Recourse
Our problem is
    min_{x∈X} f(x) := c^T x + E[Q(x, ω)]
where X = {x ∈ R^n_+ : Ax = b} and
    Q(x, ω) = min_{y≥0} { q(ω)^T y : T(ω) x + W(ω) y = h(ω) }.
In Q(x, ω), as x changes, the right-hand side of the linear program changes. So we should care very much about the value function of a linear program with respect to changes in its right-hand side:
    v : R^m → R,   v(z) = min_{y ∈ R^p_+} { q^T y : W y = z }.
55-56 Nice Theorems
Nice Theorem 1. Assume Π = {π ∈ R^m : W^T π ≤ q} ≠ ∅ and that there exists z^0 ∈ R^m with some y^0 ≥ 0 such that W y^0 = z^0. Then v(z) is a proper, convex, polyhedral function, and
    ∂v(z^0) = arg max { π^T z^0 : π ∈ Π }.
Nice Theorem 2. Under similar conditions (on each scenario's W_s, q_s),
    f(x) = c^T x + E[Q(x, ω)] = c^T x + φ(x)
is proper, convex, and polyhedral, and subgradients of f come from (transformed and aggregated) optimal dual solutions of the second-stage subproblems:
    c - sum_{s=1}^S p_s T_s^T π_s^* ∈ ∂f(x),   π_s^* ∈ arg max_{π∈Π_s} π^T (h_s - T_s x).
57 Easy Peasy?
    (2SP)  min_{x ∈ X∩C} f(x)
We know that f(x) is a nice (proper, convex, polyhedral) function. It is also true that X ∩ C is a nice polyhedral set, so it should be easy to solve (2SP). What's the problem?! f(x) is given implicitly: to evaluate f(x), we must solve S linear programs.
58-59 Overarching Theme
We will approximate (often underapproximate) f by ever-improving functions of the form
    f(x) ≈ c^T x + m_k(x)
where m_k(x) is a model of our expected recourse function:
    m_k(x) ≈ sum_{s=1}^S p_s Q(x, ω^s) := φ(x).
We will also build ever-improving outer approximations C_k ⊇ C. Since we know that Q(·, ω^s) is convex, and
    -T_s^T π_s^k ∈ ∂Q(x^k, ω^s)  for  π_s^k ∈ arg max_{π ∈ Π_s} π^T (h_s - T_s x^k),
we can underapproximate Q(·, ω^s) using a (sub)gradient inequality.
60 Building a Model
By the definition of convexity,
    Q(x, ω^s) ≥ Q(x^k, ω^s) + y^T (x - x^k)   ∀ y ∈ ∂Q(x^k, ω^s)
             = Q(x^k, ω^s) + [-T_s^T π_s^k]^T (x - x^k)   for some π_s^k ∈ arg max_{π∈Π_s} π^T (h_s - T_s x^k)
             = [Q(x^k, ω^s) + (π_s^k)^T T_s x^k] - (π_s^k)^T T_s x
             = β_s^k + (α_s^k)^T x.
We (sometimes) aggregate these together to build a model of φ(x):
    φ(x) = sum_{s=1}^S p_s Q(x, ω^s) ≥ β̄^k + (ᾱ^k)^T x,   β̄^k = sum_{s=1}^S p_s β_s^k,   ᾱ^k = sum_{s=1}^S p_s α_s^k.
61 Our Model
Choose some different x^j ∈ X and π_s^j ∈ arg max_{π∈Π_s} π^T (h_s - T_s x^j), j = 1, ..., k-1. Our model (to minimize):
    β_s^j = Q(x^j, ω^s) + (π_s^j)^T T_s x^j,   α_s^j = -T_s^T π_s^j
    β̄^j = sum_{s=1}^S p_s β_s^j,   ᾱ^j = sum_{s=1}^S p_s α_s^j
    m_k(x) = max_{j=1,...,k-1} { β̄^j + (ᾱ^j)^T x }
We model the process of minimizing the maximum using an auxiliary variable θ:
    θ ≥ β̄^j + (ᾱ^j)^T x,   j = 1, ..., k-1.
62 An Oracle-Based Method for Convex Minimization
We assume f : X ⊆ R^n → R is a convex function given by an oracle: we can get values f(x^k) and subgradients s_k ∈ ∂f(x^k) for any x^k ∈ X.
1. Find x^1 ∈ X; k ← 1; θ^1 ← -∞; ub ← ∞; I = ∅.
2. Subproblem: compute f(x^k) and s_k ∈ ∂f(x^k); ub ← min{f(x^k), ub}.
3. If θ^k = f(x^k): STOP, x^k is optimal.
4. Else I ← I ∪ {k}. Solve the master
       min_{θ, x∈X} { θ : θ ≥ f(x^i) + s_i^T (x - x^i), i ∈ I }.
   Let the solution be (x^{k+1}, θ^{k+1}); k ← k + 1; go to 2.
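A minimal pure-Python sketch of this loop (mine, not from the slides), specialized to a one-dimensional X = [lo, hi] so the master can be solved by enumerating endpoints and cut intersections; the stopping test uses the gap between the best observed value and the master (model) value rather than exact equality:

```python
def cutting_plane(f, fsub, lo, hi, tol=1e-6, itmax=60):
    """Kelley-style cutting-plane method: minimize a convex f on [lo, hi]
    given only an oracle for values f(x) and subgradients fsub(x)."""
    x, cuts = lo, []
    ub, xbest = float("inf"), lo
    for _ in range(itmax):
        fx, s = f(x), fsub(x)
        if fx < ub:
            ub, xbest = fx, x
        cuts.append((s, fx - s * x))      # cut: f(y) >= fx + s*(y - x)
        m = lambda y: max(a * y + b for a, b in cuts)    # the model m_k
        cand = [lo, hi]                   # master: minimize m on [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if abs(a1 - a2) > 1e-14:
                    y = (b2 - b1) / (a1 - a2)
                    if lo <= y <= hi:
                        cand.append(y)
        x = min(cand, key=m)
        if ub - m(x) <= tol:              # model minimum is a lower bound
            break
    return xbest, ub
```

For example, minimizing f(x) = (x - 3)^2 on [0, 10] returns a point within about 10^-3 of x = 3.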
63-65 Worth 1000 Words
[Figures: the piecewise-linear cutting-plane model of φ(x) being built up from successive iterates x^1, x^2, ....]
66-67 A Dumb Algorithm?
    x^{k+1} ∈ arg min_{x ∈ R^n_+} { c^T x + m_k(x) : Ax = b }
What happens if you start the algorithm with an initial iterate that is the optimal solution x*? Are you done? Unfortunately, no. At the first iterations we have a very poor model m_k(·), so when we minimize this model we may move very far away from x*. A variety of methods in stochastic programming use well-known techniques from nonlinear programming/convex optimization to ensure that the iterations are well-behaved.
68-69 Regularizing
Borrow the trust-region concept from NLP (Linderoth and Wright [2003]). At iteration k, keep an incumbent solution x^k and impose the constraints ‖x - x^k‖ ≤ Δ_k: with Δ_k large this behaves like the L-shaped method; with Δ_k small you stay very close. This is often called regularizing the method.
Another (good) idea: penalize the length of the step you will take:
    min c^T x + sum_j θ_j + 1/(2ρ) ‖x - x^k‖²
With ρ large this behaves like the L-shaped method; with ρ small you stay very close. This is known as the regularized decomposition method, pioneered in stochastic programming by Ruszczyński [1986].
70 Trust-Region Effect: Step Length
[Figure: step size versus iteration.]
71 Trust-Region Effect: Function Values
[Figure: function value and master value per iteration for the L-shaped method and the trust-region method.]
72 Bundle-Trust
These ideas are known in the nondifferentiable optimization community as bundle trust-region methods.
- Bundle: build up a bundle of subgradients to better approximate your function (get a better model m(·)).
- Trust region: stay close (in a region you trust) until you build up a good enough bundle to model your function accurately.
- Accept a new iterate if it improves the objective by a sufficient amount, and potentially increase Δ_k or ρ (serious step). Otherwise, improve the estimate of φ(x^k), re-solve the master problem, and potentially reduce Δ_k or ρ (null step).
These methods can be shown to converge, even if cuts are deleted from the master problem.
73-76 Vanilla Trust Region
    f(x) = c^T x + φ(x),   f̂_k(x) = c^T x + m_k(x)
1. Let x^1 ∈ X, Δ > 0, μ ∈ (0, 1), k = 1, y^1 = x^1.
2. Compute f(y^1) and subgradient model update information ((β̄^j, ᾱ^j) if L-shaped).
3. Master: let y^{k+1} ∈ arg min { c^T x + m_k(x) : x ∈ B(x^k, Δ) ∩ X }.
4. Compute the predicted decrease δ_k = f(x^k) - f̂_k(y^{k+1}).
5. If δ_k ≤ ε, stop: y^{k+1} is optimal.
6. Subproblems: compute f(y^{k+1}) and subgradient information; update m_k(x) with the subgradient information from y^{k+1}. If f(x^k) - f(y^{k+1}) ≥ μ δ_k, then serious step: x^{k+1} ← y^{k+1}. Else null step: x^{k+1} ← x^k.
7. k ← k + 1; go to 3.
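A toy pure-Python sketch of this serious/null-step loop (mine, not from the slides), again with a one-dimensional feasible set [lo, hi] so that the trust-region master min{m_k(y) : y ∈ B(x^k, Δ) ∩ [lo, hi]} can be solved by enumerating endpoints and cut intersections; for simplicity c = 0 here, so f itself plays the role of c^T x + φ(x):

```python
def bundle_tr(f, fsub, lo, hi, x0, delta=1.0, mu=0.5, tol=1e-8, itmax=200):
    """Vanilla bundle trust-region loop for a convex f on [lo, hi]."""
    x, fx = x0, f(x0)
    cuts = [(fsub(x0), fx - fsub(x0) * x0)]        # initial cut at x0
    for _ in range(itmax):
        a, b = max(lo, x - delta), min(hi, x + delta)   # B(x, delta) ∩ [lo, hi]
        m = lambda y: max(ai * y + bi for ai, bi in cuts)
        cand = [a, b]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if abs(a1 - a2) > 1e-14:
                    y = (b2 - b1) / (a1 - a2)
                    if a <= y <= b:
                        cand.append(y)
        y = min(cand, key=m)                       # trust-region master
        pred = fx - m(y)                           # predicted decrease delta_k
        if pred <= tol:
            return x, fx                           # model predicts no progress
        fy, sy = f(y), fsub(y)
        cuts.append((sy, fy - sy * y))             # update the model at y
        if fx - fy >= mu * pred:                   # sufficient actual decrease
            x, fx = y, fy                          # serious step
        # else: null step -- keep x; the richer model will do better next time
    return x, fx
```

Started at x0 = 0 with delta = 1 on f(x) = (x - 3)^2, it takes serious steps 0 → 1 → 2 → 3 and then stops, the step length limited by the trust region rather than by the (initially poor) model.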
77-81 Level Method
Basic idea: instead of restricting the search to points in the neighborhood of the current iterate, you restrict the search to points whose model objective value lies in a neighborhood of the best value seen so far. The idea is from Lemaréchal et al. [1995].
    m_k(x) = max_{i=1,...,k} { f(x^i) + s_i^T (x - x^i) }
1. Choose λ ∈ (0, 1), x^1 ∈ X, k = 1.
2. Compute f(x^k) and s_k ∈ ∂f(x^k); update m_k(x).
3. Minimize the model: z^k = min_{x∈X} m_k(x). Let z̄^k = min_{i=1,...,k} f(x^i) be the best objective value you've seen so far.
4. Project: let l_k = z^k + λ(z̄^k - z^k) and
       x^{k+1} ∈ arg min_{x∈X} { ‖x - x^k‖² : m_k(x) ≤ l_k }.
   k ← k + 1; go to 2.
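A pure-Python sketch of the level method (mine, not from the slides) for one-dimensional X = [lo, hi]: in 1-D the level set {x : m_k(x) ≤ l_k} is an interval, so the projection step is just a clamp, and the model minimization enumerates endpoints and cut intersections:

```python
def level_method(f, fsub, lo, hi, lam=0.29, tol=1e-6, itmax=120):
    """Level method for a convex f on [lo, hi], given a value/subgradient
    oracle (f, fsub)."""
    x, cuts = lo, []
    ub, xbest = float("inf"), lo
    for _ in range(itmax):
        fx, s = f(x), fsub(x)
        if fx < ub:
            ub, xbest = fx, x                 # z-bar_k: best value so far
        cuts.append((s, fx - s * x))
        m = lambda y: max(a * y + b for a, b in cuts)
        cand = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if abs(a1 - a2) > 1e-14:
                    y = (b2 - b1) / (a1 - a2)
                    if lo <= y <= hi:
                        cand.append(y)
        zk = min(m(y) for y in cand)          # z_k = min model value
        if ub - zk <= tol:
            break
        level = zk + lam * (ub - zk)          # the level l_k
        A, B = lo, hi                         # {y : m(y) <= level} = [A, B]
        for a, b in cuts:
            if a > 1e-14:
                B = min(B, (level - b) / a)
            elif a < -1e-14:
                A = max(A, (level - b) / a)
        x = min(max(x, A), B)                 # project x_k onto the level set
    return xbest, ub
```

On f(x) = (x - 3)^2 + 1 over [0, 10] this converges to the minimum value 1 near x = 3; λ = 0.29 is close to the gap-optimal choice discussed on the next slide.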
82-85 Convergence Rate
A function f : X → R is Lipschitz continuous over its domain X if there exists L ∈ R such that |f(y) - f(x)| ≤ L‖y - x‖ for all x, y ∈ X. The diameter of a compact set X is diam(X) := max_{x,y∈X} ‖x - y‖.
Smart Guy Theorem:
    z̄^k - z^k ≤ ε   for all   k ≥ C(λ) (L D / ε)²,   where D = diam(X) and C(λ) = 1 / (λ (1 - λ)² (2 - λ)).
This rate is independent of the number of variables of the problem. The minimum is C(λ*) = 4, attained at λ* = 1/(2 + √2) ≈ 0.29.
86 Oracle-Based Level Method Papers with Computational Experience
Some computational experience in Zverovich's Ph.D. thesis [Zverovich, 2011]
Zverovich et al. [2012] have a nice, comprehensive comparison between
Solving the extensive form with the simplex method and the barrier method
The L-shaped method (aggregated forms)
Regularized decomposition
The level method
The trust region method
Who's the winner? Hard to pick, but I think the level method wins, and simplex on the extensive form is slowest
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 45 / 89
87 Oracle-Based Level Method Performance Profile [Zverovich et al., 2012] Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 46 / 89
89 Other Ideas for 2SP Dual Decomposition Methods
A Dual Idea: Dual Decomposition
Create copies of the first-stage decision variables for each scenario
minimize $\sum_{s \in S} p_s (c^T x_s + q^T y_s)$
subject to $A x_s = b \quad \forall s \in S$
$T_s x_s + W y_s = h_s \quad \forall s \in S$
$x_s \ge 0, \ y_s \ge 0 \quad \forall s \in S$
$x_1 = x_2 = \dots = x_{|S|}$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 47 / 89
91 Other Ideas for 2SP Dual Decomposition Methods Relax Nonanticipativity
The constraints $x_0 = x_1 = x_2 = \dots = x_{|S|}$ are called nonanticipativity constraints.
We relax the nonanticipativity constraints, so the problem decomposes by scenario, and then we do Lagrangian relaxation:
$\max_{\lambda_1, \dots, \lambda_{|S|}} \sum_{s \in S} D_s(\lambda_s)$
where $D_s(\lambda_s) = \min_{(x_s, y_s) \in F_s} \{ p_s (c^T x_s + q^T y_s) + \lambda_s^T (x_s - x_0) \}$
and $F_s = \{ (x, y) \mid Ax = b, \ T_s x + W_s y = h_s, \ x \ge 0, \ y \ge 0 \}$
Even Fancier: You can do Augmented Lagrangian or Progressive Hedging [Rockafellar and Wets, 1991] by adding a quadratic proximal term to the Lagrangian function
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 48 / 89
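A concrete sketch of the relaxation: a hypothetical two-scenario toy problem (all data invented for this example) where the copy constraint $x_1 = x_2$ is relaxed with one multiplier and the dual is maximized by a subgradient method. The scenario "LP solves" are done by checking the breakpoints of a piecewise-linear objective rather than calling an LP solver:

```python
# Toy problem (hypothetical data): two equally likely scenarios,
#   min sum_s p_s (c x_s + q y_s)  s.t.  y_s >= d_s - x_s,
#       0 <= x_s <= 5, y_s >= 0, and x_1 = x_2 (nonanticipativity),
# with c = 1, q = 2, d = (1, 3).  The coupled optimal value is 3.
def scenario_min(p, c, q, d, lin):
    """min over 0 <= x <= 5 of p*(c*x + q*max(d - x, 0)) + lin*x.
    Piecewise linear in x, so checking breakpoints {0, d, 5} suffices;
    this stands in for a real scenario LP solve."""
    obj = lambda x: p * (c * x + q * max(d - x, 0.0)) + lin * x
    return min((obj(x), x) for x in (0.0, d, 5.0))

lam, best = 0.0, float("-inf")          # multiplier for x_1 = x_2
for k in range(1, 51):
    v1, x1 = scenario_min(0.5, 1.0, 2.0, 1.0, +lam)   # D_1 carries +lam*x_1
    v2, x2 = scenario_min(0.5, 1.0, 2.0, 3.0, -lam)   # D_2 carries -lam*x_2
    best = max(best, v1 + v2)           # D(lam) is a lower bound on v*
    lam += (0.25 / k) * (x1 - x2)       # subgradient ascent step on the dual
```

By weak duality every dual value is a lower bound on the coupled optimum, and the best dual value found reaches it here because the problem is an LP (no duality gap).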
93 Other Ideas for 2SP Bunching
Bunching: This idea is found in the works of Wets [1988] and Gassmann [1990]
If $W_s = W$, $q_s = q$ for $s = 1, \dots, |S|$, then to evaluate $\phi(x)$ we must solve $|S|$ linear programs that differ only in their right-hand side.
Therefore, the dual LPs differ only in the objective function: $\pi_s \in \arg\max_{\pi} \{ \pi^T (h_s - T_s \hat{x}) : \pi^T W \le q \}$.
Basic Idea
$\pi_s$ is feasible for all scenarios, and we have a dual feasible basis matrix $W_B$
For a new scenario $(h_r, T_r)$, with new objective $(h_r - T_r \hat{x})$, if all reduced costs are negative, then $\pi_s$ is also optimal for scenario $r$
Use dual simplex to solve the scenario linear programs when evaluating $\phi(x)$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 49 / 89
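The basis check behind bunching can be sketched directly (the matrix, costs, and right-hand sides below are invented for illustration): dual feasibility of a basis depends only on $(W, q)$, which are shared by all scenarios, so each additional right-hand side needs only the primal feasibility check $B^{-1}h \ge 0$.

```python
import numpy as np

def bunch_by_basis(W, q, basis, rhs_list):
    """Return indices of right-hand sides for which the given basis of W
    stays optimal, and those needing a fresh (dual simplex) solve."""
    B = W[:, basis]
    Binv = np.linalg.inv(B)
    pi = q[basis] @ Binv                      # dual prices for this basis
    assert np.all(q - pi @ W >= -1e-9)        # reduced costs: dual feasible
    same, other = [], []
    for i, h in enumerate(rhs_list):
        # basis optimal for rhs h iff the basic solution B^{-1} h is feasible
        (same if np.all(Binv @ h >= -1e-9) else other).append(i)
    return same, other

W = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
q = np.array([1.0, 3.0, 1.0])
rhs = [np.array([2.0, 1.0]), np.array([1.0, -1.0]), np.array([0.5, 3.0])]
same, other = bunch_by_basis(W, q, [0, 2], rhs)   # same = [0, 2], other = [1]
```

Here the rhs vectors play the role of $h_s - T_s \hat{x}$: two of the three scenarios share the basis and need no simplex iterations at all.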
95 Other Ideas for 2SP Interior Point Methods
Interior Point Methods
min $c^T x + p_1 q_1^T y_1 + p_2 q_2^T y_2 + \dots + p_S q_S^T y_S$
s.t. $Ax = b$
$T_s x + W_s y_s = h_s, \quad s = 1, \dots, S$
$x \in X, \quad y_s \in Y, \quad s = 1, \dots, S$
Since the extensive form is highly structured, the matrices of the KKT systems that must be solved (via Newton-type methods) in interior point methods can also be exploited.
He's The Expert! Definitely stick around for Jacek Gondzio's final plenary Recent computational advances in solving very large stochastic programs, 5PM on Friday.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 50 / 89
96 Computing
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 51 / 89
97 Computing SMPS SMPS Format
How do we specify a stochastic programming instance to the solver?
We could form the extensive form ourselves, but...
For really big problems, forming the extensive form is out of the question. We need to specify just the random parts of the model.
We can do this using SMPS format
There is some recent research work on developing stochastic programming support in an AML.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 52 / 89
98 Computing SMPS Modeling Language Support
AMPL: Talk by Gautam Mitra: Formulation and solver support for optimisation under uncertainty, Thursday afternoon, Room 3.
SML: Colombo et al. [2009]. Adds keywords extending AMPL that encode problem structure.
GAMS: Uses the GAMS EMP (Extended Math Programming) framework.
LINDO: Has support for built-in sampling procedures (we'll talk about sampling shortly).
MPL: Has built-in decomposition solvers.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 53 / 89
100 Computing SMPS SMPS Components
Core file: Like an MPS file for the base instance
Time file: Specifies the time dependence structure
Stoch file: Specifies the randomness
SMPS References: Birge et al. [1987], Gassmann and Schweitzer [2001]
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 54 / 89
102 Computing SMPS SMPS Core File
Like an MPS file specifying a base scenario
Must permute the rows and columns so that the time indexing is sequential. (Columns for stage $j$ listed before columns for stage $j+1$.)
min $x_1 + x_2 + \lambda \sum_{s \in S} p_s (y_{1s} + y_{2s})$
s.t. $\omega_{1s} x_1 + x_2 + y_{1s} \ge 7, \quad s = 1, 2, 3$
$\omega_{2s} x_1 + x_2 + y_{2s} \ge 4, \quad s = 1, 2, 3$
$x_1, x_2, y_{1s}, y_{2s} \ge 0, \quad s = 1, 2, 3$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 55 / 89
103 Computing SMPS little.cor
NAME little
ROWS
 G R0001
 G R0002
 N R0003
COLUMNS
 C0001 R C0001 R C0001 R
 C0002 R C0002 R C0002 R
 C0003 R C0003 R
 C0004 R C0004 R
RHS
 B R B R
ENDATA
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 56 / 89
104 Computing SMPS little.tim
Specify which row/column starts each time period. Must be sequential!
*
TIME little
PERIODS IMPLICIT
 C0001 R0001 T1
 C0003 R0001 T2
ENDATA
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 57 / 89
105 Computing SMPS Stoch File
BLOCKS: Specify a block of parameters that changes together
INDEP: Specify that all the parameters you are specifying are independent random variables
SCENARIO: Specify a base scenario, then specify what things change and when...
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 58 / 89
106 Computing SMPS little.sto
*
STOCH little
BLOCKS DISCRETE
 BL BLOCK1 T C0001 R C0001 R
 BL BLOCK1 T C0001 R C0001 R
 BL BLOCK1 T C0001 R C0001 R
ENDATA
*
STOCH little
INDEP DISCRETE
 C0001 R C0001 R
 C0001 R C0001 R
ENDATA
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 59 / 89
108 Computing Utility Libraries Some Utility Libraries
If you need to read and write SMPS files and manipulate and query the instance as part of building an algorithm in a programming language, you can try the following libraries
PySP [Watson et al., 2012]: Based on Pyomo [Hart et al., 2011]. Also lets you build models. Some algorithmic support, especially for progressive-hedging-type algorithms [Watson and Woodruff, 2011]
Coin-SMI: From the Coin-OR collection of open-source code.
SUTIL: A little bit dated, but being refactored now. Has implemented methods for sampling from the distribution specified in SMPS files
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 60 / 89
110 Computing Parallel Computation Parallelizing
In decomposition algorithms, the evaluation of $\phi(x)$ (solving the different scenario LPs) can be done independently.
If you have $K$ computers, send each of them $|S|/K$ of the linear programs, and your evaluation of $\phi(x)$ will be completed $K$ times faster.
Factors Affecting Efficiency
Synchronization: waiting for all parallel machines to complete
Solving the master problem: worker machines wait while the master computation completes
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 61 / 89
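A minimal sketch of the idea, using a thread pool to farm out scenario evaluations on one machine (the scenario data and the closed-form "LP value" below are invented stand-ins for real recourse LP solves):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scenarios (p_s, q_s, d_s); Q_s(x) = q_s * max(d_s - x, 0)
# plays the role of the scenario LP value.
scenarios = [(0.25, 2.0, d) for d in (1.0, 2.0, 3.0, 4.0)]

def scenario_value(x, p, q, d):
    return p * q * max(d - x, 0.0)      # one independent "LP solve"

def phi(x, workers=4):
    # Each scenario evaluation is independent, so they can run concurrently;
    # with K machines instead of threads, each would take |S|/K of the LPs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda s: scenario_value(x, *s), scenarios))
```

The same map-then-sum pattern is what a distributed implementation does across machines, which is exactly where the synchronization costs on this slide come from.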
111 Computing Parallel Computation Worker Usage
(figure: number of idle workers vs. time in seconds)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 62 / 89
112 Computing Parallel Computation Stamp Out Synchronicity!
We start a new iteration only after all LPs have been evaluated
In cloud/heterogeneous computing environments, different processors act at different speeds, so many may wait idle for the slowpoke
Even worse, in many cloud environments, machines can be reclaimed before completing their tasks.
Distributed Computing Fact: Asynchronous methods are preferred for traditional parallel computing environments. They are nearly required for heterogeneous and dynamic environments!
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 63 / 89
113 Computing Parallel Computation ATR: An Asynchronous Trust Region Method
Keep a basket $B$ of trial points for which we are evaluating the objective function
Make the decision on whether or not to accept a new iterate $x^{k+1}$ after the entire $f(x^k)$ is computed
Convergence theory and cut deletion theory are similar to the synchronous algorithm
Populate the basket quickly by initially solving the master problem after only $\alpha\%$ of the scenario LPs have been solved
Greatly reduces the synchronicity requirements
Might be doing some unnecessary work: the candidate points might be better if you waited for complete information from the preceding iterations
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 64 / 89
114 Computing Parallel Computation The World's Largest LP
Storm: a cargo flight scheduling problem (Mulvey and Ruszczyński)
In 2000, we aimed to solve an instance with 10,000,000 scenarios
$x \in R^{121}$, $y(\omega_s) \in R^{1259}$
The deterministic equivalent is of size $A \in R^{985{,}032{,}889 \times 12{,}590{,}000{,}121}$
Cuts/iteration: 1024. # Chunks: 1024. $|B| = 4$
Started from an N = solution, 0 = 1
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 65 / 89
115 Computing Parallel Computation The Super Storm Computer
Number  Type           Location
184     Intel/Linux    Argonne
254     Intel/Linux    New Mexico
36      Intel/Linux    NCSA
265     Intel/Linux    Wisconsin
88      Intel/Solaris  Wisconsin
239     Sun/Solaris    Wisconsin
124     Intel/Linux    Georgia Tech
90      Intel/Solaris  Georgia Tech
13      Sun/Solaris    Georgia Tech
9       Intel/Linux    Columbia U.
10      Sun/Solaris    Columbia U.
33      Intel/Linux    Italy (INFN)
1345    (total)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 66 / 89
116 Computing Parallel Computation TA-DA!!!!!
Wall clock time: 31:53:37
CPU time: 1.03 years
Avg. # machines: 433
Max # machines: 556
Parallel efficiency: 67%
Master iterations: 199
CPU time solving the master problem: 1:54:37
Maximum number of rows in master problem
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 67 / 89
117 Computing Parallel Computation Number of Workers
(figure: number of workers vs. time in seconds)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 68 / 89
118 Sampling and Monte Carlo Monte Carlo Methods Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 69 / 89
120 Sampling and Monte Carlo The Ugly Truth
Imagine the following (real) problem. A telecom company wants to expand its network in a way that meets an unknown (random) demand.
There are 86 unknown demands. Each demand is independent and may take on one of five values.
$|S| = |\Omega| = \prod_{k=1}^{86} 5 = 5^{86} \approx 10^{60}$, roughly the number of subatomic particles in the universe.
How do we solve a problem that has more variables and more constraints than the number of subatomic particles in the universe?
The answer is we can't! We solve an approximating problem obtained through sampling.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 70 / 89
122 Sampling and Monte Carlo Monte Carlo Methods
$(*) \quad \min_{x \in X} \{ f(x) := E_P[F(x, \omega)] = \int_{\Omega} F(x, \omega) \, dP(\omega) \}$
Most of the theory presented holds for (*), a very general SP problem
Naturally it holds for our favorite SP problem:
$X := \{ x \mid Ax = b, \ x \ge 0 \}$
$f(x) := c^T x + E[Q(x, \omega)]$
$Q(x, \omega) := \min_{y \ge 0} \{ q(\omega)^T y \mid Wy = h(\omega) - T(\omega) x \}$
The Dirty Secret: Evaluating $f(x)$ is completely intractable!
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 71 / 89
124 Sampling and Monte Carlo Sampling Methods
Interior sampling methods: sample while solving
L-shaped method [Dantzig and Infanger, 1991]
Stochastic decomposition [Higle and Sen, 1991]
Stochastic approximation methods
Stochastic quasi-gradient [Ermoliev, 1983]
Mirror-descent stochastic approximation [Nemirovski et al., 2009]
Exterior sampling methods: sample, then solve
Sample average approximation
Key: obtain (statistical) bounds on solution quality
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 72 / 89
125 Sampling and Monte Carlo Sample Average Approximation
Instead of solving (*), we solve an approximating problem. Let $\xi^1, \dots, \xi^N$ be $N$ realizations of the random variable $\xi$:
$\min_{x \in S} \{ f_N(x) := N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \}$
$f_N(x)$ is just the sample average function
For any $x$, we consider $f_N(x)$ a random variable, as it depends on the random sample
Since the $\xi^j$ are drawn from $P$, $f_N(x)$ is an unbiased estimator of $f(x)$: $E[f_N(x)] = f(x)$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 73 / 89
126 Sampling and Monte Carlo Lower Bounds
$v^* = \min_{x \in S} \{ f(x) := E_P[F(x, \omega)] = \int_{\Omega} F(x, \omega) \, dP(\omega) \}$
For some sample $\xi^1, \dots, \xi^N$, let
$v_N = \min_{x \in S} \{ f_N(x) := N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \}$
Thm: $E[v_N] \le v^*$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 74 / 89
127 Sampling and Monte Carlo Proof
For every fixed $\bar{x} \in X$,
$\min_{x \in X} N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \le N^{-1} \sum_{j=1}^{N} F(\bar{x}, \xi^j)$
Taking expectations,
$E[v_N] = E\left[ \min_{x \in X} N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \right] \le E\left[ N^{-1} \sum_{j=1}^{N} F(\bar{x}, \xi^j) \right] = f(\bar{x})$
Since this holds for every $\bar{x} \in X$,
$E[v_N] \le \min_{x \in X} f(x) = v^*$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 75 / 89
128 Sampling and Monte Carlo Next?
Now we need to somehow estimate $E[v_N]$
Idea: Generate $M$ independent samples, $\xi^{1,j}, \dots, \xi^{N,j}$, $j = 1, \dots, M$, each of size $N$, and solve the corresponding SAA problems
$\min_{x \in X} \{ f_N^j(x) := N^{-1} \sum_{i=1}^{N} F(x, \xi^{i,j}) \} \quad (1)$
for each $j = 1, \dots, M$. Let $v_N^j$ be the optimal value of problem (1), and compute
$L_{N,M} := M^{-1} \sum_{j=1}^{M} v_N^j$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 76 / 89
129 Sampling and Monte Carlo Lower Bounds
The estimate $L_{N,M}$ is an unbiased estimate of $E[v_N]$. By our last theorem, it provides a statistical lower bound for the true optimal value $v^*$.
When the $M$ batches $\xi^{1,j}, \xi^{2,j}, \dots, \xi^{N,j}$, $j = 1, \dots, M$, are i.i.d. (although the elements within each batch do not need to be i.i.d.), we have by the Central Limit Theorem that
$\sqrt{M} \left[ L_{N,M} - E(v_N) \right] \Rightarrow N(0, \sigma_L^2)$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 77 / 89
130 Sampling and Monte Carlo Confidence Intervals
I can estimate the variance of my estimate $L_{N,M}$ as
$s_L^2(M) := \frac{1}{M-1} \sum_{j=1}^{M} \left( v_N^j - L_{N,M} \right)^2$
Defining $z_\alpha$ to satisfy $P\{ N(0,1) \le z_\alpha \} = 1 - \alpha$, and replacing $\sigma_L$ by $s_L(M)$, we can obtain an approximate $(1-\alpha)$-confidence interval for $E[v_N]$ to be
$\left[ L_{N,M} - \frac{z_\alpha s_L(M)}{\sqrt{M}}, \ L_{N,M} + \frac{z_\alpha s_L(M)}{\sqrt{M}} \right]$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 78 / 89
131 Sampling and Monte Carlo Upper Bounds
$v^* := \min_{x \in X} \{ f(x) := E_P[F(x, \omega)] := \int_{\Omega} F(x, \omega) \, dP(\omega) \}$
Quick, someone prove...
$f(\hat{x}) \ge v^* \quad \forall \hat{x} \in X$
How can we estimate $f(\hat{x})$?
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 79 / 89
132 Sampling and Monte Carlo Estimating $f(\hat{x})$
Generate $T$ independent batches of samples of size $N$, denoted by $\xi^{1,j}, \xi^{2,j}, \dots, \xi^{N,j}$, $j = 1, 2, \dots, T$, where each batch has the unbiasedness property, namely
$E\left[ f_N^j(x) := N^{-1} \sum_{i=1}^{N} F(x, \xi^{i,j}) \right] = f(x)$, for all $x \in X$.
We can then use the average value defined by
$U_{N,T}(\hat{x}) := T^{-1} \sum_{j=1}^{T} f_N^j(\hat{x})$
as an estimate of $f(\hat{x})$.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 80 / 89
133 Sampling and Monte Carlo More Confidence Intervals
By applying the Central Limit Theorem again, we have that
$\sqrt{T} \left[ U_{N,T}(\hat{x}) - f(\hat{x}) \right] \Rightarrow N(0, \sigma_U^2(\hat{x}))$ as $T \to \infty$, where $\sigma_U^2(\hat{x}) := \mathrm{Var}\left[ f_N(\hat{x}) \right]$.
We can estimate $\sigma_U^2(\hat{x})$ by the sample variance estimator $s_U^2(\hat{x}; T)$ defined by
$s_U^2(\hat{x}; T) := \frac{1}{T-1} \sum_{j=1}^{T} \left[ f_N^j(\hat{x}) - U_{N,T}(\hat{x}) \right]^2$
By replacing $\sigma_U^2(\hat{x})$ with $s_U^2(\hat{x}; T)$, we can proceed as above to obtain a $(1-\alpha)$-confidence interval for $f(\hat{x})$:
$\left[ U_{N,T}(\hat{x}) - \frac{z_\alpha s_U(\hat{x}; T)}{\sqrt{T}}, \ U_{N,T}(\hat{x}) + \frac{z_\alpha s_U(\hat{x}; T)}{\sqrt{T}} \right]$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 81 / 89
134 Sampling and Monte Carlo Putting It All Together
$v_N$ is the optimal value of the sample average problem:
$v_N := \min_{x \in S} \{ f_N(x) := N^{-1} \sum_{j=1}^{N} F(x, \omega^j) \}$
Estimate $E(v_N)$ as $\hat{E}(v_N) = L_{N,M} = M^{-1} \sum_{j=1}^{M} v_N^j$: solve $M$ stochastic LPs, each of sample size $N$.
$f_{\bar{N}}(x)$ is the sample average function: draw $\omega^1, \dots, \omega^{\bar{N}}$ from $P$ and set $f_{\bar{N}}(x) := \bar{N}^{-1} \sum_{j=1}^{\bar{N}} F(x, \omega^j)$
For stochastic LPs with recourse, this means solving $\bar{N}$ LPs.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 82 / 89
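The whole recipe can be sketched on a toy newsvendor-style problem (all data here is invented for illustration): $F(x, D) = x + 3\max(D - x, 0)$ with $D$ uniform on $\{1, \dots, 10\}$, whose true optimal value is $8.8$ at $x^* = 7$. The SAA subproblems are solved by checking the breakpoints of the piecewise-linear sample average instead of calling an LP solver, and $z_\alpha \approx 1.96$ gives roughly 95% intervals:

```python
import numpy as np

rng = np.random.default_rng(0)
sample_D = lambda n: rng.integers(1, 11, size=n).astype(float)  # D ~ U{1..10}

def F(x, d):                      # first-stage cost + recourse cost
    return x + 3.0 * np.maximum(d - x, 0.0)

def solve_saa(sample):
    """min_x f_N(x): piecewise linear in x, so the minimizer is 0 or a
    sample point (stands in for solving the sampled stochastic LP)."""
    cands = np.unique(np.concatenate(([0.0], sample)))
    vals = np.array([F(x, sample).mean() for x in cands])
    i = int(np.argmin(vals))
    return float(vals[i]), float(cands[i])

M, N, Nbar = 10, 500, 2000
vN = [solve_saa(sample_D(N))[0] for _ in range(M)]    # M replications
L = float(np.mean(vN))                                # estimates E[v_N] <= v*
half_L = 1.96 * float(np.std(vN, ddof=1)) / np.sqrt(M)

_, xhat = solve_saa(sample_D(N))                      # candidate solution
costs = F(xhat, sample_D(Nbar))                       # fresh evaluation sample
U = float(costs.mean())                               # estimates f(xhat) >= v*
half_U = 1.96 * float(costs.std(ddof=1)) / np.sqrt(Nbar)
# [L - half_L, L + half_L] and [U - half_U, U + half_U] bracket v* statistically
```

Because the replications and the evaluation sample are independent, both the lower and upper estimates come with the confidence intervals derived on the preceding slides.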
135 Sampling and Monte Carlo Recapping Theorems
Thm. $E(v_N) \le v^* \le f(x) \ \forall x$
Thm. $f_{\bar{N}}(\hat{x}) - \hat{E}(v_N) \to f(\hat{x}) - v^*$, as $N, M, \bar{N} \to \infty$
We are mostly interested in estimating the quality of a given solution $\hat{x}$: this is $f(\hat{x}) - v^*$.
$f_{\bar{N}}(\hat{x})$ is computed by solving $\bar{N}$ independent LPs.
$\hat{E}(v_N)$ is computed by solving $M$ independent stochastic LPs.
Independent $\Rightarrow$ no synchronization $\Rightarrow$ easy to do in parallel
Independent $\Rightarrow$ can construct confidence intervals around the estimates
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 83 / 89
136 Sampling and Monte Carlo An Experiment
$M$ times: solve a stochastic sampled approximation of size $N$ (thus obtaining an estimate of $E(v_N)$).
For each of the $M$ solutions $x^1, \dots, x^M$, estimate $f(\hat{x})$ by solving $\bar{N}$ LPs.
Test Instances
Name     Application                  $|\Omega|$   $(m_1, n_1)$   $(m_2, n_2)$
LandS    HydroPower Planning          $10^6$       (2,4)          (7,12)
gbd      Fleet Routing                             (?,?)          (?,?)
storm    Cargo Flight Scheduling                   (185, 121)     (?, 1291)
20term   Vehicle Assignment                        (1,5)          (71,102)
ssn      Telecom. Network Design                   (1,89)         (175,706)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 84 / 89
137 Sampling and Monte Carlo Convergence of Optimal Solution Value
$9 \le M \le 12$, $\bar{N} = 10^6$
(tables: lower-bound estimates under Monte Carlo sampling and Latin Hypercube sampling for 20term, gbd, LandS, storm, and ssn at N = 50, 100, 500, 1000, ...)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 85 / 89
138 Sampling and Monte Carlo 20term Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 86 / 89
139 Sampling and Monte Carlo ssn Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 87 / 89
140 Sampling and Monte Carlo storm Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 88 / 89
141 Sampling and Monte Carlo gbd Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 89 / 89
142 Bibliography Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
143 References J. R. Birge and F. V. Louveaux. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operations Research, 34: , J. R. Birge, M. A. H. Dempster, H. I. Gassmann, E. A. Gunn, and A. J. King. A standard input format for multiperiod stochastic linear programs. COAL Newsletter, 17:1 19, J. R. Birge, C. J. Donohue, D. F. Holmes, and O. G. Svintsiski. A parallel implementation of the nested decomposition algorithm for multistage stochastic linear programs. Mathematical Programming, 75: , M. Colombo, A. Grothey, J. Hogg, K. Woodsend, and J. Gondzio. A structure-conveying modelling language for mathematical and stochastic programming. Mathematical Programming Computation, 1(4): , G. Dantzig and G. Infanger. Large-scale stochastic linear programs Importance sampling and Bender s decomposition. In C. Brezinski and U. Kulisch, editors, Computational and Applied Mathematics I (Dublin, 1991), pages North-Holland, Amsterdam, Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
144 References Y. Ermoliev. Stochastic quasi-gradient methods and their application to systems optimization. Stochastics, 4:1 37, H. I. Gassmann. MSLiP: A computer code for the multistage stochastic linear programming problem. Mathematical Programming, 47: , H.I. Gassmann and E. Schweitzer. A comprehensive input format for stochastic linear programs. Annals of Operations Research, 104:89 125, A. Gavironski. Implementation of stochastic quasigradient methods. In Numerical Techniques for Stochastic Optimization. Springer-Verlag, W. E. Hart, J.-P. Watson, and D. L. Woodruff. Pyomo: Modeling and solving mathematical programs in Python. Mathematical Programming Computation, 3(3): , J. L. Higle and S. Sen. Stochastic decomposition: An algorithm for two stage linear programs with recourse. Mathematics of Operations Research, 16(3): , Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
145 References U. Janjarassuk. Using Computational Grids for Effective Solution of Stochastic Programs. PhD thesis, Department of Industrial and Systems Engineering, Lehigh University, G. Lan, A. Nemirovski, and A. Shapiro. Validation analysis of mirror descent stochastic approximation method. Mathematical Programming, pages 1 34, C. Lemaréchal, A. Nemirovskii, and Y. Nesterov. New variants of bundle methods. Mathematical Programming, 69: , J. T. Linderoth and S. J. Wright. Implementing a decomposition algorithm for stochastic programming on a computational grid. Computational Optimization and Applications, 24: , Special Issue on Stochastic Programming. A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19: , H. Robbins and S. Monro. On a stochastic approximation method. Annals of Mathematical Statistics, 22: , Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
146 References
R.T. Rockafellar and Roger J.-B. Wets. Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research, 16(1), 1991. A. Ruszczyński. A regularized decomposition for minimizing a sum of polyhedral functions. Mathematical Programming, 35. A. Ruszczyński. Parallel decomposition of multistage stochastic programming problems. Mathematical Programming, 58. S. Trukhanov, L. Ntaimo, and A. Schaefer. Adaptive multicut aggregation for two-stage stochastic linear programs with recourse. European Journal of Operational Research, 206. J.-P. Watson and D. L. Woodruff. Progressive hedging innovations for a class of stochastic mixed-integer resource allocation problems. Computational Management Science, 8(4), 2011. J.-P. Watson, D. L. Woodruff, and W. E. Hart. PySP: Modeling and solving stochastic programs in Python. Mathematical Programming Computation, 4(2), 2012. R. J. B. Wets. Large-scale linear programming techniques in stochastic programming. In Numerical Techniques for Stochastic Optimization. Springer-Verlag, 1988.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
147 Bibliography
V. Zverovich. Modelling and Solution Methods for Stochastic Optimization. PhD thesis, Brunel University, 2011. V. Zverovich, C. I. Fábián, E. F. D. Ellison, and G. Mitra. A computational study of a solver system for processing two-stage stochastic LPs with enhanced Benders decomposition. Mathematical Programming Computation, 4(3), 2012.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 2 / 14
148 Bibliography Miscellaneous Topics Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 2 / 14
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationStochastic Dual Dynamic Integer Programming
Stochastic Dual Dynamic Integer Programming Jikai Zou Shabbir Ahmed Xu Andy Sun December 26, 2017 Abstract Multistage stochastic integer programming (MSIP) combines the difficulty of uncertainty, dynamics,
More informationDecomposition methods in optimization
Decomposition methods in optimization I Approach I: I Partition problem constraints into two groups: explicit and implicit I Approach II: I Partition decision variables into two groups: primary and secondary
More informationSolving Dual Problems
Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem
More informationDuality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and
More informationA DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS
Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624
More informationOutline. Relaxation. Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING. 1. Lagrangian Relaxation. Lecture 12 Single Machine Models, Column Generation
Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING 1. Lagrangian Relaxation Lecture 12 Single Machine Models, Column Generation 2. Dantzig-Wolfe Decomposition Dantzig-Wolfe Decomposition Delayed Column
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationCS711008Z Algorithm Design and Analysis
CS711008Z Algorithm Design and Analysis Lecture 8 Linear programming: interior point method Dongbo Bu Institute of Computing Technology Chinese Academy of Sciences, Beijing, China 1 / 31 Outline Brief
More informationLecture 9: Dantzig-Wolfe Decomposition
Lecture 9: Dantzig-Wolfe Decomposition (3 units) Outline Dantzig-Wolfe decomposition Column generation algorithm Relation to Lagrangian dual Branch-and-price method Generated assignment problem and multi-commodity
More informationA Benders Algorithm for Two-Stage Stochastic Optimization Problems With Mixed Integer Recourse
A Benders Algorithm for Two-Stage Stochastic Optimization Problems With Mixed Integer Recourse Ted Ralphs 1 Joint work with Menal Güzelsoy 2 and Anahita Hassanzadeh 1 1 COR@L Lab, Department of Industrial
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More informationBilevel Integer Optimization: Theory and Algorithms
: Theory and Algorithms Ted Ralphs 1 Joint work with Sahar Tahernajad 1, Scott DeNegre 3, Menal Güzelsoy 2, Anahita Hassanzadeh 4 1 COR@L Lab, Department of Industrial and Systems Engineering, Lehigh University
More informationLagrange Relaxation: Introduction and Applications
1 / 23 Lagrange Relaxation: Introduction and Applications Operations Research Anthony Papavasiliou 2 / 23 Contents 1 Context 2 Applications Application in Stochastic Programming Unit Commitment 3 / 23
More informationOptimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes
Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding
More information2.098/6.255/ Optimization Methods Practice True/False Questions
2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationOverview of course. Introduction to Optimization, DIKU Monday 12 November David Pisinger
Introduction to Optimization, DIKU 007-08 Monday November David Pisinger Lecture What is OR, linear models, standard form, slack form, simplex repetition, graphical interpretation, extreme points, basic
More informationHomework 3. Convex Optimization /36-725
Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationColumn Generation for Extended Formulations
1 / 28 Column Generation for Extended Formulations Ruslan Sadykov 1 François Vanderbeck 2,1 1 INRIA Bordeaux Sud-Ouest, France 2 University Bordeaux I, France ISMP 2012 Berlin, August 23 2 / 28 Contents
More informationComputations with Disjunctive Cuts for Two-Stage Stochastic Mixed 0-1 Integer Programs
Computations with Disjunctive Cuts for Two-Stage Stochastic Mixed 0-1 Integer Programs Lewis Ntaimo and Matthew W. Tanner Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU,
More informationCS 6820 Fall 2014 Lectures, October 3-20, 2014
Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given
More informationInterior Point Methods for Convex Quadratic and Convex Nonlinear Programming
School of Mathematics T H E U N I V E R S I T Y O H F E D I N B U R G Interior Point Methods for Convex Quadratic and Convex Nonlinear Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio
More informationInteger Programming Chapter 15
Integer Programming Chapter 15 University of Chicago Booth School of Business Kipp Martin November 9, 2016 1 / 101 Outline Key Concepts Problem Formulation Quality Solver Options Epsilon Optimality Preprocessing
More informationDecomposition Algorithms with Parametric Gomory Cuts for Two-Stage Stochastic Integer Programs
Decomposition Algorithms with Parametric Gomory Cuts for Two-Stage Stochastic Integer Programs Dinakar Gade, Simge Küçükyavuz, Suvrajeet Sen Integrated Systems Engineering 210 Baker Systems, 1971 Neil
More informationApplications. Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang
Introduction to Large-Scale Linear Programming and Applications Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang Daniel J. Epstein Department of Industrial and Systems Engineering, University of
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationMultidisciplinary System Design Optimization (MSDO)
Multidisciplinary System Design Optimization (MSDO) Numerical Optimization II Lecture 8 Karen Willcox 1 Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Today s Topics Sequential
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationMINLP: Theory, Algorithms, Applications: Lecture 3, Basics of Algorothms
MINLP: Theory, Algorithms, Applications: Lecture 3, Basics of Algorothms Jeff Linderoth Industrial and Systems Engineering University of Wisconsin-Madison Jonas Schweiger Friedrich-Alexander-Universität
More informationNetwork Flows. 6. Lagrangian Relaxation. Programming. Fall 2010 Instructor: Dr. Masoud Yaghini
In the name of God Network Flows 6. Lagrangian Relaxation 6.3 Lagrangian Relaxation and Integer Programming Fall 2010 Instructor: Dr. Masoud Yaghini Integer Programming Outline Branch-and-Bound Technique
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationBenders Decomposition Methods for Structured Optimization, including Stochastic Optimization
Benders Decomposition Methods for Structured Optimization, including Stochastic Optimization Robert M. Freund May 2, 2001 Block Ladder Structure Basic Model minimize x;y c T x + f T y s:t: Ax = b Bx +
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationDisjunctive Decomposition for Two-Stage Stochastic Mixed-Binary Programs with Random Recourse
Disjunctive Decomposition for Two-Stage Stochastic Mixed-Binary Programs with Random Recourse Lewis Ntaimo Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College Station,
More informationDecomposition Algorithms for Two-Stage Chance-Constrained Programs
Mathematical Programming manuscript No. (will be inserted by the editor) Decomposition Algorithms for Two-Stage Chance-Constrained Programs Xiao Liu Simge Küçükyavuz Luedtke James Received: date / Accepted:
More informationScenario Grouping and Decomposition Algorithms for Chance-constrained Programs
Scenario Grouping and Decomposition Algorithms for Chance-constrained Programs Siqian Shen Dept. of Industrial and Operations Engineering University of Michigan Joint work with Yan Deng (UMich, Google)
More informationInterior Point Methods for Mathematical Programming
Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained
More informationFenchel Decomposition for Stochastic Mixed-Integer Programming
Fenchel Decomposition for Stochastic Mixed-Integer Programming Lewis Ntaimo Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College Station, TX 77843, USA, ntaimo@tamu.edu
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationStochastic Programming: From statistical data to optimal decisions
Stochastic Programming: From statistical data to optimal decisions W. Römisch Humboldt-University Berlin Department of Mathematics (K. Emich, H. Heitsch, A. Möller) Page 1 of 24 6th International Conference
More informationInterior-Point versus Simplex methods for Integer Programming Branch-and-Bound
Interior-Point versus Simplex methods for Integer Programming Branch-and-Bound Samir Elhedhli elhedhli@uwaterloo.ca Department of Management Sciences, University of Waterloo, Canada Page of 4 McMaster
More informationOptimization and Root Finding. Kurt Hornik
Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding
More informationAcceleration and Stabilization Techniques for Column Generation
Acceleration and Stabilization Techniques for Column Generation Zhouchun Huang Qipeng Phil Zheng Department of Industrial Engineering & Management Systems University of Central Florida Sep 26, 2014 Outline
More informationPart 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)
Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationDecomposition with Branch-and-Cut Approaches for Two Stage Stochastic Mixed-Integer Programming
Decomposition with Branch-and-Cut Approaches for Two Stage Stochastic Mixed-Integer Programming by Suvrajeet Sen SIE Department, University of Arizona, Tucson, AZ 85721 and Hanif D. Sherali ISE Department,
More informationScenario decomposition of risk-averse two stage stochastic programming problems
R u t c o r Research R e p o r t Scenario decomposition of risk-averse two stage stochastic programming problems Ricardo A Collado a Dávid Papp b Andrzej Ruszczyński c RRR 2-2012, January 2012 RUTCOR Rutgers
More informationOptimisation and Operations Research
Optimisation and Operations Research Lecture 22: Linear Programming Revisited Matthew Roughan http://www.maths.adelaide.edu.au/matthew.roughan/ Lecture_notes/OORII/ School
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationMotivating examples Introduction to algorithms Simplex algorithm. On a particular example General algorithm. Duality An application to game theory
Instructor: Shengyu Zhang 1 LP Motivating examples Introduction to algorithms Simplex algorithm On a particular example General algorithm Duality An application to game theory 2 Example 1: profit maximization
More informationGeneration and Representation of Piecewise Polyhedral Value Functions
Generation and Representation of Piecewise Polyhedral Value Functions Ted Ralphs 1 Joint work with Menal Güzelsoy 2 and Anahita Hassanzadeh 1 1 COR@L Lab, Department of Industrial and Systems Engineering,
More informationNumerical Methods for Inverse Kinematics
Numerical Methods for Inverse Kinematics Niels Joubert, UC Berkeley, CS184 2008-11-25 Inverse Kinematics is used to pose models by specifying endpoints of segments rather than individual joint angles.
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationA. Shapiro Introduction
ESAIM: PROCEEDINGS, December 2003, Vol. 13, 65 73 J.P. Penot, Editor DOI: 10.1051/proc:2003003 MONTE CARLO SAMPLING APPROACH TO STOCHASTIC PROGRAMMING A. Shapiro 1 Abstract. Various stochastic programming
More informationStochastic Integer Programming An Algorithmic Perspective
Stochastic Integer Programming An Algorithmic Perspective sahmed@isye.gatech.edu www.isye.gatech.edu/~sahmed School of Industrial & Systems Engineering 2 Outline Two-stage SIP Formulation Challenges Simple
More informationPedro Munari - COA 2017, February 10th, University of Edinburgh, Scotland, UK 2
Pedro Munari [munari@dep.ufscar.br] - COA 2017, February 10th, University of Edinburgh, Scotland, UK 2 Outline Vehicle routing problem; How interior point methods can help; Interior point branch-price-and-cut:
More informationExtended Formulations, Lagrangian Relaxation, & Column Generation: tackling large scale applications
Extended Formulations, Lagrangian Relaxation, & Column Generation: tackling large scale applications François Vanderbeck University of Bordeaux INRIA Bordeaux-Sud-Ouest part : Defining Extended Formulations
More informationSolving Bilevel Mixed Integer Program by Reformulations and Decomposition
Solving Bilevel Mixed Integer Program by Reformulations and Decomposition June, 2014 Abstract In this paper, we study bilevel mixed integer programming (MIP) problem and present a novel computing scheme
More informationStochastic Gradient Descent. Ryan Tibshirani Convex Optimization
Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so
More information10 Numerical methods for constrained problems
10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationKaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization
Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth
More informationComputational Optimization. Augmented Lagrangian NW 17.3
Computational Optimization Augmented Lagrangian NW 17.3 Upcoming Schedule No class April 18 Friday, April 25, in class presentations. Projects due unless you present April 25 (free extension until Monday
More informationContents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016
ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................
More informationIE 495 Final Exam. Due Date: May 1, 2003
IE 495 Final Exam Due Date: May 1, 2003 Do the following problems. You must work alone on this exam. The only reference materials you are allowed to use on the exam are the textbook by Birge and Louveaux,
More information5. Subgradient method
L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1 Subgradient method to minimize
More informationThe Multi-Path Utility Maximization Problem
The Multi-Path Utility Maximization Problem Xiaojun Lin and Ness B. Shroff School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 47906 {linx,shroff}@ecn.purdue.edu Abstract
More informationLINEAR AND NONLINEAR PROGRAMMING
LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico
More information