Computational Stochastic Programming
1 Computational Stochastic Programming
Jeff Linderoth, Dept. of ISyE / Dept. of CS, Univ. of Wisconsin-Madison
SPXIII, Bergamo, Italy, July 7, 2013
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 89
2 Mission Impossible!
It is impossible to discuss all of computational SP in 90 minutes. I will focus on a few basic topics, and (try to) provide references for a few others.
3-4 Bias
The bias of an estimator is the difference between the estimator's expected value E[θ̂] and the true value of the parameter θ. An estimator is unbiased if E[θ̂ - θ] = 0.
I Am Biased! But this lecture is also extremely biased, in the English sense of the word. It will cover the things that I know best. There is lots of great work in computational SP that I (unfortunately) won't mention.
5-6 What I WILL Cover
- Stochastic LP with recourse (primarily 2-stage)
- Decomposition: Benders decomposition, Lagrangian relaxation, dual decomposition, stochastic approximation
- Modern/bundle-type methods: trust-region methods, regularized decomposition, the level method
- Multistage extensions
- Software tools: the SMPS format, some available software tools for modeling and solving, the role of parallel computing
7-8 Other Things Covered
- Sampling: exterior sampling methods (sample average approximation); interior sampling methods (stochastic quasi-gradient, stochastic decomposition, mirror-prox methods)
- Stochastic integer programming: integer L-shaped, dual decomposition
9-11 Stochastic Programming
A Stochastic Program:
    min_{x ∈ X} f(x) := E_ω[F(x, ω)]
2-Stage Stochastic LP with Recourse. The recourse problem:
    F(x, ω) := c^T x + Q(x, ω)
c^T x: pay me now. Q(x, ω): pay me later, where
    Q(x, ω) := min { q(ω)^T y : W(ω) y = h(ω) - T(ω) x, y ≥ 0 }
so that
    E[F(x, ω)] = c^T x + E[Q(x, ω)] := c^T x + φ(x)
12-13 Decomposition Algorithms
Two ways of thinking: the algorithms are equivalent regardless of how you think about them, but thinking in different ways gives different insights.
Complementary viewpoints:
1. As a large-scale problem to which you apply decomposition techniques
2. As an oracle-based convex optimization problem
14 Decomposition: A Popular Method
15-16 Extensive Form
This is sometimes called the deterministic equivalent, but I prefer the term extensive form.
Assume Ω = {ω^1, ω^2, ..., ω^S} ⊂ R^r, P(ω = ω^s) = p_s for s = 1, 2, ..., S, and write
    T_s := T(ω^s), h_s := h(ω^s), q_s := q(ω^s), W_s := W(ω^s).
Then we can write the extensive form as just one large LP:
    min c^T x + p_1 q_1^T y_1 + p_2 q_2^T y_2 + ... + p_S q_S^T y_S
    s.t. Ax = b
         T_1 x + W_1 y_1 = h_1
         T_2 x + W_2 y_2 = h_2
         ...
         T_S x + W_S y_S = h_S
         x ∈ X, y_s ∈ Y for all s
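As a concrete illustration (mine, not from the slides), the block-angular structure above can be assembled mechanically. A minimal pure-Python sketch, with dense row-lists standing in for what a real implementation would keep sparse; the function name and the dictionary keys are my own conventions:

```python
def extensive_form(A, b, c, scen):
    """Assemble the extensive-form LP
        min c^T x + sum_s p_s q_s^T y_s
        s.t. A x = b;  T_s x + W_s y_s = h_s  for each scenario s
    as dense row-lists (cost, rows, rhs).  Each scenario is a dict
    with keys "p", "q", "T", "W", "h"."""
    n1 = len(c)                     # first-stage variables
    S = len(scen)
    n2 = len(scen[0]["q"])          # second-stage variables per scenario
    cost = list(c)
    for sc in scen:                 # probability-weighted recourse costs
        cost += [sc["p"] * qj for qj in sc["q"]]
    rows, rhs = [], []
    for ai, bi in zip(A, b):        # first-stage rows: [A | 0 ... 0]
        rows.append(list(ai) + [0.0] * (S * n2))
        rhs.append(bi)
    for s, sc in enumerate(scen):   # scenario rows: [T_s | 0 .. W_s .. 0]
        for ti, wi, hi in zip(sc["T"], sc["W"], sc["h"]):
            row = list(ti) + [0.0] * (S * n2)
            for j, wij in enumerate(wi):
                row[n1 + s * n2 + j] = wij
            rows.append(row)
            rhs.append(hi)
    return cost, rows, rhs
```

Note how the scenario blocks overlap only in the x columns: this is exactly the structure the decomposition methods below exploit.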
17-18 Small SPs Are Easy!
[Figure: solution time versus number of scenarios, CPLEX on the extensive form vs. the L-shaped method.]
In my experience, a barrier/interior-point method is faster than simplex/pivoting-based methods for solving extensive-form LPs.
19-21 The Upshot
If the extensive form is too large to solve directly, then we must exploit its structure: if I fix the first-stage variables x, the problem decomposes by scenario, since each y_s then appears only in its own block of constraints T_s x + W_s y_s = h_s.
Key Idea (Benders Decomposition): characterize the solution of each scenario linear program as a function of the first-stage solution x.
22-23 Benders of the Extensive Form
z_LP(x) = sum_{s=1}^S z_LP^s(x), where
    z_LP^s(x) = inf { q_s^T y : W_s y = h_s - T_s x, y ≥ 0 }.
The dual of the LP defining z_LP^s(x) is
    sup { (h_s - T_s x)^T π : W_s^T π ≤ q_s }.
Set of dual feasible solutions for scenario s: Π_s = {π ∈ R^m : W_s^T π ≤ q_s}.
Vertices of Π_s: V(Π_s) = {v_{1s}, v_{2s}, ..., v_{V_s,s}}.
Extreme rays of Π_s: R(Π_s) = {r_{1s}, r_{2s}, ..., r_{R_s,s}}.
24 (In Case You Forgot...) Minkowski's Theorem
There are two ways to describe a polyhedron: as an intersection of halfspaces (by its facets), or by appropriate combinations of its extreme points and extreme rays.
Minkowski-Weyl Theorem. Let P = {x ∈ R^n : Ax ≤ b} have extreme points V(P) = {v_1, v_2, ..., v_V} and extreme rays R(P) = {r_1, r_2, ..., r_R}. Then
    P = { x ∈ R^n : x = sum_{j=1}^V λ_j v_j + sum_{j=1}^R μ_j r_j,  sum_{j=1}^V λ_j = 1,  λ_j ≥ 0 ∀j,  μ_j ≥ 0 ∀j }.
25-27 A Little LP Theory
We assume that z_LP^s(x) is bounded below.
The LP is feasible (z_LP^s(x) < +∞) iff there is no direction of unboundedness in the dual LP:
    there is no r ∈ R(Π_s) (so W_s^T r ≤ 0) such that (h_s - T_s x)^T r > 0.
So x is a feasible solution (z_LP^s(x) < +∞) iff
    (h_s - T_s x)^T r_{ks} ≤ 0,  k = 1, ..., R_s.
If an LP has an optimal solution, then it has an optimal solution at an extreme point. By strong duality, the optimal primal and dual LP values agree, so
    z_LP^s(x) = max_{j=1,2,...,V_s} { (h_s - T_s x)^T v_{js} }  whenever r_{ks}^T (h_s - T_s x) ≤ 0 for k = 1, 2, ..., R_s.
28-29 Developing Benders Decomposition
Scenario s second-stage feasibility set:
    C_s := {x : ∃ y ≥ 0 with W_s y = h_s - T_s x} = {x : h_s - T_s x ∈ pos(W_s)}
First-stage feasibility set: X := {x ∈ R^n_+ : Ax = b}. Second-stage feasibility set: C := ∩_{s=1}^S C_s.
    (2SP)  min_{x ∈ X ∩ C} f(x) := c^T x + sum_{s=1}^S p_s Q(x, ω^s)
By the LP theory above:
    x ∈ C_s  ⟺  (h_s - T_s x)^T r_{js} ≤ 0,  j = 1, ..., R_s
    θ_s ≥ Q(x, ω^s)  ⟺  θ_s ≥ (h_s - T_s x)^T v_{js},  j = 1, ..., V_s
30-31 Benders / L-Shaped
Use these results: introduce auxiliary variables θ_s to represent the value of Q(x, ω^s). (N.B. I am changing notation just a little bit.)
Unaggregated (full multicut):
    min c^T x + sum_{s∈S} p_s θ_s
    s.t. r^T T_s x ≥ r^T h_s         ∀ s ∈ S, r ∈ R(Π_s)
         θ_s + v^T T_s x ≥ v^T h_s   ∀ s ∈ S, v ∈ V(Π_s)
         Ax = b, x ≥ 0
32-33 L-Shaped Method
Aggregate the inequalities and remove the per-scenario variables θ_s. Instead introduce a single variable
    Θ ≥ sum_{s∈S} p_s θ_s ≥ sum_{s∈S} p_s (v_s^T h_s - v_s^T T_s x)   (choosing any v_s ∈ V(Π_s)).
Fully aggregated (L-shaped):
    min c^T x + Θ
    s.t. r^T T_s x ≥ r^T h_s                                        ∀ s ∈ S, r ∈ R(Π_s)
         Θ + sum_{s∈S} p_s v_s^T T_s x ≥ sum_{s∈S} p_s v_s^T h_s    ∀ v_s ∈ V(Π_s)
         Ax = b, x ≥ 0
N.B. Different aggregations are possible.
34-35 A Whole Spectrum
Complete aggregation (traditional L-shaped): one variable for φ(x). Complete multicut: S variables for φ(x). We can do anything in between: partition the scenarios into C clusters S_1, S_2, ..., S_C and set
    φ_[S_k](x) = sum_{s∈S_k} p_s Q(x, ω^s),   Θ_[S_k] ≥ sum_{s∈S_k} p_s Q(x, ω^s).
References: the original multicut paper is [Birge and Louveaux, 1988]. You need not stay with one fixed aggregation: see the recent paper by Trukhanov et al. [2010] and the Ph.D. thesis of Janjarassuk [2009].
36-37 Benders Decomposition
Regardless of the aggregation, the linear program is likely to have exponentially many constraints. Benders decomposition is a cutting-plane method for solving one of these linear programs. I will describe it for the (full) multicut case; the other algorithms are really just aggregated versions of this one.
Basic questions: for a given x^k, θ_1^k, ..., θ_S^k, we must check, for each scenario s ∈ S:
1. Is there r ∈ R(Π_s) such that r^T T_s x^k < r^T h_s?
2. Is there v ∈ V(Π_s) such that θ_s^k + v^T T_s x^k < v^T h_s?
38 Find a Ray?
Phase-1 LP and its dual:
    U_s(x^k) = min { 1^T u + 1^T v : W_s y + u - v = h_s - T_s x^k, y, u, v ≥ 0 }
             = max { π^T (h_s - T_s x^k) : W_s^T π ≤ 0, -1 ≤ π ≤ 1 }
If U_s(x^k) > 0, then by strong duality it has an optimal dual solution π^k such that (π^k)^T (h_s - T_s x^k) > 0, so adding the inequality
    (h_s - T_s x)^T π^k ≤ 0
will exclude the solution x^k. The π^k from the Phase-1 LP is the dual extreme ray!
39 Find a Vertex
Question: is there v ∈ V(Π_s) such that θ_s^k + v^T T_s x^k < v^T h_s? Note this is the same as solving
    max_{v ∈ V(Π_s)} (h_s - T_s x^k)^T v,
i.e. sup { (h_s - T_s x^k)^T π : W_s^T π ≤ q_s }. And by duality, this is also the same as solving
    z_LP^s(x^k) = inf { q_s^T y : W_s y = h_s - T_s x^k, y ≥ 0 }
and looking at the (optimal) dual variables for the constraints W_s y = h_s - T_s x^k.
40-41 Master Problem
The cutting-plane algorithm will identify
    R̂_s ⊆ R(Π_s): a subset of the extreme rays of the dual feasible set Π_s
    V̂_s ⊆ V(Π_s): a subset of the extreme points of Π_s
The master problem is the full LP with R(Π_s), V(Π_s) replaced by these subsets:
    min c^T x + sum_{s∈S} p_s θ_s
    s.t. r^T T_s x ≥ r^T h_s         ∀ s ∈ S, r ∈ R̂_s
         θ_s + v^T T_s x ≥ v^T h_s   ∀ s ∈ S, v ∈ V̂_s
         Ax = b, x ≥ 0
42-48 Benders / Cutting-Plane Method
1. k = 1, R̂_s = V̂_s = ∅ ∀ s ∈ S, lb = -∞, ub = ∞, x^1 ∈ X.
2. Done = true. For each s ∈ S:
   - Solve the Phase-1 LP to get U_s(x^k). If U_s(x^k) > 0: Q(x^k, ω^s) = ∞; let π_s^k be the optimal dual solution to the Phase-1 LP; R̂_s ← R̂_s ∪ {π_s^k}; Done = false; go to 5.
   - Solve the Phase-2 LP for Q(x^k, ω^s), and let π_s^k be its optimal dual multiplier. If Q(x^k, ω^s) > θ_s^k, then V̂_s ← V̂_s ∪ {π_s^k}, Done = false.
3. ub = c^T x^k + sum_{s∈S} p_s Q(x^k, ω^s).
4. If ub - lb ≤ ε or Done = true, then stop: x^k is an optimal solution.
5. Solve the master problem. Let lb be its optimal solution value, let k ← k + 1, and let x^k be the optimal solution to the master problem. Go to 2.
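The loop above can be sketched end-to-end on a toy problem. The sketch below is mine, not from the slides: a one-dimensional first stage with simple recourse Q(x, ω^s) = q_s max(h_s - x, 0), which has complete recourse (so the Phase-1 step never fires and no feasibility cuts are needed), closed-form subproblem values and subgradients, and a single-cut master solved by enumerating cut intersections instead of calling an LP solver. All names and data are my own:

```python
def solve_master(c, cuts, lo, hi):
    """Minimize c*x + max_i(a_i*x + b_i) on [lo, hi]: the minimum of a
    1-D piecewise-linear convex function lies at an endpoint or where
    two cut lines cross, so enumerate those candidates."""
    cand = [lo, hi]
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            (a1, b1), (a2, b2) = cuts[i], cuts[j]
            if abs(a1 - a2) > 1e-12:
                x = (b2 - b1) / (a1 - a2)
                if lo <= x <= hi:
                    cand.append(x)
    x = min(cand, key=lambda t: c * t + max(a * t + b for a, b in cuts))
    return x, max(a * x + b for a, b in cuts)        # (x, theta)

def l_shaped(c, scen, lo, hi, tol=1e-8, itmax=100):
    """Single-cut L-shaped loop for min c*x + sum_s p_s*q_s*(h_s-x)^+."""
    Q = lambda x, q, h: q * max(h - x, 0.0)
    Qsub = lambda x, q, h: -q if x < h else 0.0      # a subgradient of Q(., s)
    x, ub, cuts = lo, float("inf"), [(0.0, 0.0)]     # theta >= 0 is valid: Q >= 0
    for _ in range(itmax):
        phi = sum(p * Q(x, q, h) for p, q, h in scen)   # expected recourse
        g = sum(p * Qsub(x, q, h) for p, q, h in scen)  # its subgradient
        ub = min(ub, c * x + phi)
        cuts.append((g, phi - g * x))     # optimality cut: theta >= phi + g*(x'-x)
        x, theta = solve_master(c, cuts, lo, hi)
        if ub - (c * x + theta) <= tol:   # master value is a lower bound
            break
    return x, ub
```

On scen = [(0.5, 1.0, 2.0), (0.5, 3.0, 6.0)] (each tuple is (p_s, q_s, h_s)) with c = 1 on [0, 10], the method stops at x = 6 with value 6 after three cuts.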
49-50 A First Example
    min x_1 + x_2
    s.t. ω_1 x_1 + x_2 ≥ 7
         ω_2 x_1 + x_2 ≥ 4
         x_1 ≥ 0, x_2 ≥ 0
where ω = (ω_1, ω_2) ∈ Ω = {(1, 1/3), (5/2, 2/3), (4, 1)}, each outcome with p_s = 1/3.
Huh? This problem doesn't make sense! (We are asking constraints that depend on the random ω to hold, but x must be chosen before ω is observed.)
51-52 Recourse Formulation
    min x_1 + x_2 + λ sum_{s∈S} p_s (y_{1s} + y_{2s})
    s.t. ω_{1s} x_1 + x_2 + y_{1s} ≥ 7,   s = 1, 2, 3
         ω_{2s} x_1 + x_2 + y_{2s} ≥ 4,   s = 1, 2, 3
         x_1, x_2, y_{1s}, y_{2s} ≥ 0,    s = 1, 2, 3
Stop! Gammer Time?
53 Oracle-Based Methods
54 Two-Stage Stochastic LP with Recourse
Our problem is
    min_{x∈X} f(x) := c^T x + E[Q(x, ω)]
where X = {x ∈ R^n_+ : Ax = b} and
    Q(x, ω) = min_{y≥0} { q(ω)^T y : T(ω) x + W(ω) y = h(ω) }.
In Q(x, ω), as x changes, the right-hand side of the linear program changes. So we should care very much about the value function of a linear program with respect to changes in its right-hand side:
    v : R^m → R,   v(z) = min_{y ∈ R^p_+} { q^T y : W y = z }.
55-56 Nice Theorems
Nice Theorem 1. Assume Π = {π ∈ R^m : W^T π ≤ q} ≠ ∅ and that there exists z^0 ∈ R^m with some y^0 ≥ 0 such that W y^0 = z^0. Then v(z) is a proper, convex, polyhedral function, and
    ∂v(z^0) = arg max { π^T z^0 : π ∈ Π }.
Nice Theorem 2. Under similar conditions (on each scenario's W_s, q_s),
    f(x) = c^T x + E[Q(x, ω)] = c^T x + φ(x)
is proper, convex, and polyhedral, and subgradients of f come from (transformed and aggregated) optimal dual solutions of the second-stage subproblems:
    c - sum_{s=1}^S p_s T_s^T π_s^* ∈ ∂f(x),   π_s^* ∈ arg max_{π∈Π_s} π^T (h_s - T_s x).
57 Easy Peasy?
    (2SP)  min_{x ∈ X∩C} f(x)
We know that f(x) is a nice (proper, convex, polyhedral) function. It is also true that X ∩ C is a nice polyhedral set, so it should be easy to solve (2SP). What's the problem?! f(x) is given implicitly: to evaluate f(x), we must solve S linear programs.
58-59 Overarching Theme
We will approximate (often underapproximate) f by ever-improving functions of the form
    f(x) ≈ c^T x + m_k(x)
where m_k(x) is a model of our expected recourse function:
    m_k(x) ≈ sum_{s=1}^S p_s Q(x, ω^s) := φ(x).
We will also build ever-improving outer approximations C_k ⊇ C. Since we know that Q(·, ω^s) is convex, and
    -T_s^T π_s^k ∈ ∂Q(x^k, ω^s)  for  π_s^k ∈ arg max_{π ∈ Π_s} π^T (h_s - T_s x^k),
we can underapproximate Q(·, ω^s) using a (sub)gradient inequality.
60 Building a Model
By the definition of convexity,
    Q(x, ω^s) ≥ Q(x^k, ω^s) + y^T (x - x^k)   ∀ y ∈ ∂Q(x^k, ω^s)
             = Q(x^k, ω^s) + [-T_s^T π_s^k]^T (x - x^k)   for some π_s^k ∈ arg max_{π∈Π_s} π^T (h_s - T_s x^k)
             = [Q(x^k, ω^s) + (π_s^k)^T T_s x^k] - (π_s^k)^T T_s x
             = β_s^k + (α_s^k)^T x.
We (sometimes) aggregate these together to build a model of φ(x):
    φ(x) = sum_{s=1}^S p_s Q(x, ω^s) ≥ β̄^k + (ᾱ^k)^T x,   β̄^k = sum_{s=1}^S p_s β_s^k,   ᾱ^k = sum_{s=1}^S p_s α_s^k.
61 Our Model
Choose some different x^j ∈ X and π_s^j ∈ arg max_{π∈Π_s} π^T (h_s - T_s x^j), j = 1, ..., k-1. Our model (to minimize):
    β_s^j = Q(x^j, ω^s) + (π_s^j)^T T_s x^j,   α_s^j = -T_s^T π_s^j
    β̄^j = sum_{s=1}^S p_s β_s^j,   ᾱ^j = sum_{s=1}^S p_s α_s^j
    m_k(x) = max_{j=1,...,k-1} { β̄^j + (ᾱ^j)^T x }
We model the process of minimizing the maximum using an auxiliary variable θ:
    θ ≥ β̄^j + (ᾱ^j)^T x,   j = 1, ..., k-1.
62 An Oracle-Based Method for Convex Minimization
We assume f : X ⊆ R^n → R is a convex function given by an oracle: we can get values f(x^k) and subgradients s_k ∈ ∂f(x^k) for any x^k ∈ X.
1. Find x^1 ∈ X; k ← 1; θ^1 ← -∞; ub ← ∞; I = ∅.
2. Subproblem: compute f(x^k) and s_k ∈ ∂f(x^k); ub ← min{f(x^k), ub}.
3. If θ^k = f(x^k): STOP, x^k is optimal.
4. Else I ← I ∪ {k}. Solve the master
       min_{θ, x∈X} { θ : θ ≥ f(x^i) + s_i^T (x - x^i), i ∈ I }.
   Let the solution be (x^{k+1}, θ^{k+1}); k ← k + 1; go to 2.
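A minimal pure-Python sketch of this loop (mine, not from the slides), specialized to a one-dimensional X = [lo, hi] so the master can be solved by enumerating endpoints and cut intersections; the stopping test uses the gap between the best observed value and the master (model) value rather than exact equality:

```python
def cutting_plane(f, fsub, lo, hi, tol=1e-6, itmax=60):
    """Kelley-style cutting-plane method: minimize a convex f on [lo, hi]
    given only an oracle for values f(x) and subgradients fsub(x)."""
    x, cuts = lo, []
    ub, xbest = float("inf"), lo
    for _ in range(itmax):
        fx, s = f(x), fsub(x)
        if fx < ub:
            ub, xbest = fx, x
        cuts.append((s, fx - s * x))      # cut: f(y) >= fx + s*(y - x)
        m = lambda y: max(a * y + b for a, b in cuts)    # the model m_k
        cand = [lo, hi]                   # master: minimize m on [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if abs(a1 - a2) > 1e-14:
                    y = (b2 - b1) / (a1 - a2)
                    if lo <= y <= hi:
                        cand.append(y)
        x = min(cand, key=m)
        if ub - m(x) <= tol:              # model minimum is a lower bound
            break
    return xbest, ub
```

For example, minimizing f(x) = (x - 3)^2 on [0, 10] returns a point within about 10^-3 of x = 3.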
63-65 Worth 1000 Words
[Figures: the piecewise-linear cutting-plane model of φ(x) being built up from successive iterates x^1, x^2, ....]
66-67 A Dumb Algorithm?
    x^{k+1} ∈ arg min_{x ∈ R^n_+} { c^T x + m_k(x) : Ax = b }
What happens if you start the algorithm with an initial iterate that is the optimal solution x*? Are you done? Unfortunately, no. At the first iterations we have a very poor model m_k(·), so when we minimize this model we may move very far away from x*. A variety of methods in stochastic programming use well-known techniques from nonlinear programming/convex optimization to ensure that the iterations are well-behaved.
68-69 Regularizing
Borrow the trust-region concept from NLP (Linderoth and Wright [2003]). At iteration k, keep an incumbent solution x^k and impose the constraints ‖x - x^k‖ ≤ Δ_k: with Δ_k large this behaves like the L-shaped method; with Δ_k small you stay very close. This is often called regularizing the method.
Another (good) idea: penalize the length of the step you will take:
    min c^T x + sum_j θ_j + 1/(2ρ) ‖x - x^k‖²
With ρ large this behaves like the L-shaped method; with ρ small you stay very close. This is known as the regularized decomposition method, pioneered in stochastic programming by Ruszczyński [1986].
70 Trust-Region Effect: Step Length
[Figure: step size versus iteration.]
71 Trust-Region Effect: Function Values
[Figure: function value and master value per iteration for the L-shaped method and the trust-region method.]
72 Bundle-Trust
These ideas are known in the nondifferentiable optimization community as bundle trust-region methods.
- Bundle: build up a bundle of subgradients to better approximate your function (get a better model m(·)).
- Trust region: stay close (in a region you trust) until you build up a good enough bundle to model your function accurately.
- Accept a new iterate if it improves the objective by a sufficient amount, and potentially increase Δ_k or ρ (serious step). Otherwise, improve the estimate of φ(x^k), re-solve the master problem, and potentially reduce Δ_k or ρ (null step).
These methods can be shown to converge, even if cuts are deleted from the master problem.
73-76 Vanilla Trust Region
    f(x) = c^T x + φ(x),   f̂_k(x) = c^T x + m_k(x)
1. Let x^1 ∈ X, Δ > 0, μ ∈ (0, 1), k = 1, y^1 = x^1.
2. Compute f(y^1) and subgradient model update information ((β̄^j, ᾱ^j) if L-shaped).
3. Master: let y^{k+1} ∈ arg min { c^T x + m_k(x) : x ∈ B(x^k, Δ) ∩ X }.
4. Compute the predicted decrease δ_k = f(x^k) - f̂_k(y^{k+1}).
5. If δ_k ≤ ε, stop: y^{k+1} is optimal.
6. Subproblems: compute f(y^{k+1}) and subgradient information; update m_k(x) with the subgradient information from y^{k+1}. If f(x^k) - f(y^{k+1}) ≥ μ δ_k, then serious step: x^{k+1} ← y^{k+1}. Else null step: x^{k+1} ← x^k.
7. k ← k + 1; go to 3.
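A toy pure-Python sketch of this serious/null-step loop (mine, not from the slides), again with a one-dimensional feasible set [lo, hi] so that the trust-region master min{m_k(y) : y ∈ B(x^k, Δ) ∩ [lo, hi]} can be solved by enumerating endpoints and cut intersections; for simplicity c = 0 here, so f itself plays the role of c^T x + φ(x):

```python
def bundle_tr(f, fsub, lo, hi, x0, delta=1.0, mu=0.5, tol=1e-8, itmax=200):
    """Vanilla bundle trust-region loop for a convex f on [lo, hi]."""
    x, fx = x0, f(x0)
    cuts = [(fsub(x0), fx - fsub(x0) * x0)]        # initial cut at x0
    for _ in range(itmax):
        a, b = max(lo, x - delta), min(hi, x + delta)   # B(x, delta) ∩ [lo, hi]
        m = lambda y: max(ai * y + bi for ai, bi in cuts)
        cand = [a, b]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if abs(a1 - a2) > 1e-14:
                    y = (b2 - b1) / (a1 - a2)
                    if a <= y <= b:
                        cand.append(y)
        y = min(cand, key=m)                       # trust-region master
        pred = fx - m(y)                           # predicted decrease delta_k
        if pred <= tol:
            return x, fx                           # model predicts no progress
        fy, sy = f(y), fsub(y)
        cuts.append((sy, fy - sy * y))             # update the model at y
        if fx - fy >= mu * pred:                   # sufficient actual decrease
            x, fx = y, fy                          # serious step
        # else: null step -- keep x; the richer model will do better next time
    return x, fx
```

Started at x0 = 0 with delta = 1 on f(x) = (x - 3)^2, it takes serious steps 0 → 1 → 2 → 3 and then stops, the step length limited by the trust region rather than by the (initially poor) model.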
77-81 Level Method
Basic idea: instead of restricting the search to points in the neighborhood of the current iterate, you restrict the search to points whose model objective value lies in a neighborhood of the best value seen so far. The idea is from Lemaréchal et al. [1995].
    m_k(x) = max_{i=1,...,k} { f(x^i) + s_i^T (x - x^i) }
1. Choose λ ∈ (0, 1), x^1 ∈ X, k = 1.
2. Compute f(x^k) and s_k ∈ ∂f(x^k); update m_k(x).
3. Minimize the model: z^k = min_{x∈X} m_k(x). Let z̄^k = min_{i=1,...,k} f(x^i) be the best objective value you've seen so far.
4. Project: let l_k = z^k + λ(z̄^k - z^k) and
       x^{k+1} ∈ arg min_{x∈X} { ‖x - x^k‖² : m_k(x) ≤ l_k }.
   k ← k + 1; go to 2.
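A pure-Python sketch of the level method (mine, not from the slides) for one-dimensional X = [lo, hi]: in 1-D the level set {x : m_k(x) ≤ l_k} is an interval, so the projection step is just a clamp, and the model minimization enumerates endpoints and cut intersections:

```python
def level_method(f, fsub, lo, hi, lam=0.29, tol=1e-6, itmax=120):
    """Level method for a convex f on [lo, hi], given a value/subgradient
    oracle (f, fsub)."""
    x, cuts = lo, []
    ub, xbest = float("inf"), lo
    for _ in range(itmax):
        fx, s = f(x), fsub(x)
        if fx < ub:
            ub, xbest = fx, x                 # z-bar_k: best value so far
        cuts.append((s, fx - s * x))
        m = lambda y: max(a * y + b for a, b in cuts)
        cand = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                (a1, b1), (a2, b2) = cuts[i], cuts[j]
                if abs(a1 - a2) > 1e-14:
                    y = (b2 - b1) / (a1 - a2)
                    if lo <= y <= hi:
                        cand.append(y)
        zk = min(m(y) for y in cand)          # z_k = min model value
        if ub - zk <= tol:
            break
        level = zk + lam * (ub - zk)          # the level l_k
        A, B = lo, hi                         # {y : m(y) <= level} = [A, B]
        for a, b in cuts:
            if a > 1e-14:
                B = min(B, (level - b) / a)
            elif a < -1e-14:
                A = max(A, (level - b) / a)
        x = min(max(x, A), B)                 # project x_k onto the level set
    return xbest, ub
```

On f(x) = (x - 3)^2 + 1 over [0, 10] this converges to the minimum value 1 near x = 3; λ = 0.29 is close to the gap-optimal choice discussed on the next slide.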
82-85 Convergence Rate
A function f : X → R is Lipschitz continuous over its domain X if there exists L ∈ R such that |f(y) - f(x)| ≤ L‖y - x‖ for all x, y ∈ X. The diameter of a compact set X is diam(X) := max_{x,y∈X} ‖x - y‖.
Smart Guy Theorem:
    z̄^k - z^k ≤ ε   for all   k ≥ C(λ) (L D / ε)²,   where D = diam(X) and C(λ) = 1 / (λ (1 - λ)² (2 - λ)).
This rate is independent of the number of variables of the problem. The minimum is C(λ*) = 4, attained at λ* = 1/(2 + √2) ≈ 0.29.
86 Oracle-Based Level Method Papers with Computational Experience
Some computational experience in Zverovich's Ph.D. thesis [Zverovich, 2011]
Zverovich et al. [2012] have a nice, comprehensive comparison between
Solving the extensive form with the simplex method and the barrier method
The L-shaped method (aggregated forms)
Regularized decomposition
The level method
The trust region method
Who's the winner? Hard to pick, but I think the level method wins, and simplex on the extensive form is slowest
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 45 / 89
87 Oracle-Based Level Method Performance Profile [Zverovich et al., 2012] Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 46 / 89
89 Other Ideas for 2SP Dual Decomposition Methods
A Dual Idea: Dual Decomposition
Create copies of the first-stage decision variables for each scenario
minimize $\sum_{s \in S} p_s (c^T x_s + q^T y_s)$
subject to $A x_s = b \quad \forall s \in S$
$T_s x_s + W y_s = h_s \quad \forall s \in S$
$x_s \ge 0, \ y_s \ge 0 \quad \forall s \in S$
$x_1 = x_2 = \dots = x_{|S|}$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 47 / 89
91 Other Ideas for 2SP Dual Decomposition Methods Relax Nonanticipativity
The constraints $x_0 = x_1 = x_2 = \dots = x_{|S|}$ are called nonanticipativity constraints.
We relax the nonanticipativity constraints, so the problem decomposes by scenario, and then we do Lagrangian relaxation:
$\max_{\lambda_1, \dots, \lambda_{|S|}} \sum_{s \in S} D_s(\lambda_s)$
where $D_s(\lambda_s) = \min_{(x_s, y_s) \in F_s} \{ p_s (c^T x_s + q^T y_s) + \lambda_s^T (x_s - x_0) \}$
and $F_s = \{ (x, y) \mid Ax = b, \ T_s x + W_s y = h_s, \ x \ge 0, \ y \ge 0 \}$
Even Fancier: You can do Augmented Lagrangian or Progressive Hedging [Rockafellar and Wets, 1991] by adding a quadratic proximal term to the Lagrangian function
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 48 / 89
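A concrete sketch of the relaxation: a hypothetical two-scenario toy problem (all data invented for this example) where the copy constraint $x_1 = x_2$ is relaxed with one multiplier and the dual is maximized by a subgradient method. The scenario "LP solves" are done by checking the breakpoints of a piecewise-linear objective rather than calling an LP solver:

```python
# Toy problem (hypothetical data): two equally likely scenarios,
#   min sum_s p_s (c x_s + q y_s)  s.t.  y_s >= d_s - x_s,
#       0 <= x_s <= 5, y_s >= 0, and x_1 = x_2 (nonanticipativity),
# with c = 1, q = 2, d = (1, 3).  The coupled optimal value is 3.
def scenario_min(p, c, q, d, lin):
    """min over 0 <= x <= 5 of p*(c*x + q*max(d - x, 0)) + lin*x.
    Piecewise linear in x, so checking breakpoints {0, d, 5} suffices;
    this stands in for a real scenario LP solve."""
    obj = lambda x: p * (c * x + q * max(d - x, 0.0)) + lin * x
    return min((obj(x), x) for x in (0.0, d, 5.0))

lam, best = 0.0, float("-inf")          # multiplier for x_1 = x_2
for k in range(1, 51):
    v1, x1 = scenario_min(0.5, 1.0, 2.0, 1.0, +lam)   # D_1 carries +lam*x_1
    v2, x2 = scenario_min(0.5, 1.0, 2.0, 3.0, -lam)   # D_2 carries -lam*x_2
    best = max(best, v1 + v2)           # D(lam) is a lower bound on v*
    lam += (0.25 / k) * (x1 - x2)       # subgradient ascent step on the dual
```

By weak duality every dual value is a lower bound on the coupled optimum, and the best dual value found reaches it here because the problem is an LP (no duality gap).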
93 Other Ideas for 2SP Bunching
Bunching: This idea is found in the works of Wets [1988] and Gassmann [1990]
If $W_s = W$, $q_s = q$ for $s = 1, \dots, |S|$, then to evaluate $\phi(x)$ we must solve $|S|$ linear programs that differ only in their right-hand side.
Therefore, the dual LPs differ only in the objective function: $\pi_s \in \arg\max_{\pi} \{ \pi^T (h_s - T_s \hat{x}) : \pi^T W \le q \}$.
Basic Idea
$\pi_s$ is feasible for all scenarios, and we have a dual feasible basis matrix $W_B$
For a new scenario $(h_r, T_r)$, with new objective $(h_r - T_r \hat{x})$, if all reduced costs are negative, then $\pi_s$ is also optimal for scenario $r$
Use dual simplex to solve the scenario linear programs when evaluating $\phi(x)$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 49 / 89
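The basis check behind bunching can be sketched directly (the matrix, costs, and right-hand sides below are invented for illustration): dual feasibility of a basis depends only on $(W, q)$, which are shared by all scenarios, so each additional right-hand side needs only the primal feasibility check $B^{-1}h \ge 0$.

```python
import numpy as np

def bunch_by_basis(W, q, basis, rhs_list):
    """Return indices of right-hand sides for which the given basis of W
    stays optimal, and those needing a fresh (dual simplex) solve."""
    B = W[:, basis]
    Binv = np.linalg.inv(B)
    pi = q[basis] @ Binv                      # dual prices for this basis
    assert np.all(q - pi @ W >= -1e-9)        # reduced costs: dual feasible
    same, other = [], []
    for i, h in enumerate(rhs_list):
        # basis optimal for rhs h iff the basic solution B^{-1} h is feasible
        (same if np.all(Binv @ h >= -1e-9) else other).append(i)
    return same, other

W = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
q = np.array([1.0, 3.0, 1.0])
rhs = [np.array([2.0, 1.0]), np.array([1.0, -1.0]), np.array([0.5, 3.0])]
same, other = bunch_by_basis(W, q, [0, 2], rhs)   # same = [0, 2], other = [1]
```

Here the rhs vectors play the role of $h_s - T_s \hat{x}$: two of the three scenarios share the basis and need no simplex iterations at all.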
95 Other Ideas for 2SP Interior Point Methods
Interior Point Methods
min $c^T x + p_1 q_1^T y_1 + p_2 q_2^T y_2 + \dots + p_S q_S^T y_S$
s.t. $Ax = b$
$T_s x + W_s y_s = h_s, \quad s = 1, \dots, S$
$x \in X, \quad y_s \in Y, \quad s = 1, \dots, S$
Since the extensive form is highly structured, the matrices of the KKT systems that must be solved (via Newton-type methods) in interior point methods can also be exploited.
He's The Expert! Definitely stick around for Jacek Gondzio's final plenary Recent computational advances in solving very large stochastic programs, 5PM on Friday.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 50 / 89
96 Computing
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 51 / 89
97 Computing SMPS SMPS Format
How do we specify a stochastic programming instance to the solver?
We could form the extensive form ourselves, but...
For really big problems, forming the extensive form is out of the question. We need to specify just the random parts of the model.
We can do this using SMPS format
There is some recent research work on developing stochastic programming support in an AML.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 52 / 89
98 Computing SMPS Modeling Language Support
AMPL: Talk by Gautam Mitra: Formulation and solver support for optimisation under uncertainty, Thursday afternoon, Room 3.
SML: Colombo et al. [2009]. Adds keywords extending AMPL that encode problem structure.
GAMS: Uses the GAMS EMP (Extended Math Programming) framework.
LINDO: Has support for built-in sampling procedures (we'll talk about sampling shortly).
MPL: Has built-in decomposition solvers.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 53 / 89
100 Computing SMPS SMPS Components
Core file: Like an MPS file for the base instance
Time file: Specifies the time dependence structure
Stoch file: Specifies the randomness
SMPS References: Birge et al. [1987], Gassmann and Schweitzer [2001]
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 54 / 89
102 Computing SMPS SMPS Core File
Like an MPS file specifying a base scenario
Must permute the rows and columns so that the time indexing is sequential. (Columns for stage $j$ listed before columns for stage $j+1$.)
min $x_1 + x_2 + \lambda \sum_{s \in S} p_s (y_{1s} + y_{2s})$
s.t. $\omega_{1s} x_1 + x_2 + y_{1s} \ge 7, \quad s = 1, 2, 3$
$\omega_{2s} x_1 + x_2 + y_{2s} \ge 4, \quad s = 1, 2, 3$
$x_1, x_2, y_{1s}, y_{2s} \ge 0, \quad s = 1, 2, 3$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 55 / 89
103 Computing SMPS little.cor
NAME little
ROWS
 G R0001
 G R0002
 N R0003
COLUMNS
 C0001 R C0001 R C0001 R
 C0002 R C0002 R C0002 R
 C0003 R C0003 R
 C0004 R C0004 R
RHS
 B R B R
ENDATA
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 56 / 89
104 Computing SMPS little.tim
Specify which row/column starts each time period. Must be sequential!
*
TIME little
PERIODS IMPLICIT
 C0001 R0001 T1
 C0003 R0001 T2
ENDATA
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 57 / 89
105 Computing SMPS Stoch File
BLOCKS: Specify a block of parameters that changes together
INDEP: Specify that all the parameters you are specifying are independent random variables
SCENARIO: Specify a base scenario, then specify what things change and when...
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 58 / 89
106 Computing SMPS little.sto
*
STOCH little
BLOCKS DISCRETE
 BL BLOCK1 T C0001 R C0001 R
 BL BLOCK1 T C0001 R C0001 R
 BL BLOCK1 T C0001 R C0001 R
ENDATA
*
STOCH little
INDEP DISCRETE
 C0001 R C0001 R
 C0001 R C0001 R
ENDATA
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 59 / 89
108 Computing Utility Libraries Some Utility Libraries
If you need to read and write SMPS files and manipulate and query the instance as part of building an algorithm in a programming language, you can try the following libraries
PySP [Watson et al., 2012]: Based on Pyomo [Hart et al., 2011]. Also lets you build models. Some algorithmic support, especially for progressive-hedging-type algorithms [Watson and Woodruff, 2011]
Coin-SMI: From the Coin-OR collection of open-source code.
SUTIL: A little bit dated, but being refactored now. Has implemented methods for sampling from the distribution specified in SMPS files
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 60 / 89
110 Computing Parallel Computation Parallelizing
In decomposition algorithms, the evaluation of $\phi(x)$ (solving the different scenario LPs) can be done independently.
If you have $K$ computers, send each of them $|S|/K$ of the linear programs, and your evaluation of $\phi(x)$ will be completed $K$ times faster.
Factors Affecting Efficiency
Synchronization: waiting for all parallel machines to complete
Solving the master problem: worker machines wait while the master computation completes
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 61 / 89
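A minimal sketch of the idea, using a thread pool to farm out scenario evaluations on one machine (the scenario data and the closed-form "LP value" below are invented stand-ins for real recourse LP solves):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical scenarios (p_s, q_s, d_s); Q_s(x) = q_s * max(d_s - x, 0)
# plays the role of the scenario LP value.
scenarios = [(0.25, 2.0, d) for d in (1.0, 2.0, 3.0, 4.0)]

def scenario_value(x, p, q, d):
    return p * q * max(d - x, 0.0)      # one independent "LP solve"

def phi(x, workers=4):
    # Each scenario evaluation is independent, so they can run concurrently;
    # with K machines instead of threads, each would take |S|/K of the LPs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda s: scenario_value(x, *s), scenarios))
```

The same map-then-sum pattern is what a distributed implementation does across machines, which is exactly where the synchronization costs on this slide come from.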
111 Computing Parallel Computation Worker Usage
(figure: number of idle workers vs. time in seconds)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 62 / 89
112 Computing Parallel Computation Stamp Out Synchronicity!
We start a new iteration only after all LPs have been evaluated
In cloud/heterogeneous computing environments, different processors act at different speeds, so many may wait idle for the slowpoke
Even worse, in many cloud environments, machines can be reclaimed before completing their tasks.
Distributed Computing Fact: Asynchronous methods are preferred for traditional parallel computing environments. They are nearly required for heterogeneous and dynamic environments!
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 63 / 89
113 Computing Parallel Computation ATR: An Asynchronous Trust Region Method
Keep a basket $B$ of trial points for which we are evaluating the objective function
Make the decision on whether or not to accept a new iterate $x^{k+1}$ after the entire $f(x^k)$ is computed
Convergence theory and cut deletion theory are similar to the synchronous algorithm
Populate the basket quickly by initially solving the master problem after only $\alpha\%$ of the scenario LPs have been solved
Greatly reduces the synchronicity requirements
Might be doing some unnecessary work: the candidate points might be better if you waited for complete information from the preceding iterations
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 64 / 89
114 Computing Parallel Computation The World's Largest LP
Storm: a cargo flight scheduling problem (Mulvey and Ruszczyński)
In 2000, we aimed to solve an instance with 10,000,000 scenarios
$x \in R^{121}$, $y(\omega_s) \in R^{1259}$
The deterministic equivalent is of size $A \in R^{985{,}032{,}889 \times 12{,}590{,}000{,}121}$
Cuts/iteration: 1024. # Chunks: 1024. $|B| = 4$
Started from an N = solution, 0 = 1
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 65 / 89
115 Computing Parallel Computation The Super Storm Computer
Number  Type           Location
184     Intel/Linux    Argonne
254     Intel/Linux    New Mexico
36      Intel/Linux    NCSA
265     Intel/Linux    Wisconsin
88      Intel/Solaris  Wisconsin
239     Sun/Solaris    Wisconsin
124     Intel/Linux    Georgia Tech
90      Intel/Solaris  Georgia Tech
13      Sun/Solaris    Georgia Tech
9       Intel/Linux    Columbia U.
10      Sun/Solaris    Columbia U.
33      Intel/Linux    Italy (INFN)
1345    (total)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 66 / 89
116 Computing Parallel Computation TA-DA!!!!!
Wall clock time: 31:53:37
CPU time: 1.03 years
Avg. # machines: 433
Max # machines: 556
Parallel efficiency: 67%
Master iterations: 199
CPU time solving the master problem: 1:54:37
Maximum number of rows in master problem
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 67 / 89
117 Computing Parallel Computation Number of Workers
(figure: number of workers vs. time in seconds)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 68 / 89
118 Sampling and Monte Carlo Monte Carlo Methods Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 69 / 89
120 Sampling and Monte Carlo The Ugly Truth
Imagine the following (real) problem. A telecom company wants to expand its network in a way that meets an unknown (random) demand.
There are 86 unknown demands. Each demand is independent and may take on one of five values.
$|S| = |\Omega| = \prod_{k=1}^{86} 5 = 5^{86} \approx 10^{60}$, roughly the number of subatomic particles in the universe.
How do we solve a problem that has more variables and more constraints than the number of subatomic particles in the universe?
The answer is we can't! We solve an approximating problem obtained through sampling.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 70 / 89
122 Sampling and Monte Carlo Monte Carlo Methods
$(*) \quad \min_{x \in X} \{ f(x) := E_P[F(x, \omega)] = \int_{\Omega} F(x, \omega) \, dP(\omega) \}$
Most of the theory presented holds for (*), a very general SP problem
Naturally it holds for our favorite SP problem:
$X := \{ x \mid Ax = b, \ x \ge 0 \}$
$f(x) := c^T x + E[Q(x, \omega)]$
$Q(x, \omega) := \min_{y \ge 0} \{ q(\omega)^T y \mid Wy = h(\omega) - T(\omega) x \}$
The Dirty Secret: Evaluating $f(x)$ is completely intractable!
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 71 / 89
124 Sampling and Monte Carlo Sampling Methods
Interior sampling methods: sample while solving
L-shaped method [Dantzig and Infanger, 1991]
Stochastic decomposition [Higle and Sen, 1991]
Stochastic approximation methods
Stochastic quasi-gradient [Ermoliev, 1983]
Mirror-descent stochastic approximation [Nemirovski et al., 2009]
Exterior sampling methods: sample, then solve
Sample average approximation
Key: obtain (statistical) bounds on solution quality
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 72 / 89
125 Sampling and Monte Carlo Sample Average Approximation
Instead of solving (*), we solve an approximating problem. Let $\xi^1, \dots, \xi^N$ be $N$ realizations of the random variable $\xi$:
$\min_{x \in S} \{ f_N(x) := N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \}$
$f_N(x)$ is just the sample average function
For any $x$, we consider $f_N(x)$ a random variable, as it depends on the random sample
Since the $\xi^j$ are drawn from $P$, $f_N(x)$ is an unbiased estimator of $f(x)$: $E[f_N(x)] = f(x)$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 73 / 89
126 Sampling and Monte Carlo Lower Bounds
$v^* = \min_{x \in S} \{ f(x) := E_P[F(x, \omega)] = \int_{\Omega} F(x, \omega) \, dP(\omega) \}$
For some sample $\xi^1, \dots, \xi^N$, let
$v_N = \min_{x \in S} \{ f_N(x) := N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \}$
Thm: $E[v_N] \le v^*$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 74 / 89
127 Sampling and Monte Carlo Proof
For every fixed $\bar{x} \in X$,
$\min_{x \in X} N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \le N^{-1} \sum_{j=1}^{N} F(\bar{x}, \xi^j)$
Taking expectations,
$E[v_N] = E\left[ \min_{x \in X} N^{-1} \sum_{j=1}^{N} F(x, \xi^j) \right] \le E\left[ N^{-1} \sum_{j=1}^{N} F(\bar{x}, \xi^j) \right] = f(\bar{x})$
Since this holds for every $\bar{x} \in X$,
$E[v_N] \le \min_{x \in X} f(x) = v^*$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 75 / 89
128 Sampling and Monte Carlo Next?
Now we need to somehow estimate $E[v_N]$
Idea: Generate $M$ independent samples, $\xi^{1,j}, \dots, \xi^{N,j}$, $j = 1, \dots, M$, each of size $N$, and solve the corresponding SAA problems
$\min_{x \in X} \{ f_N^j(x) := N^{-1} \sum_{i=1}^{N} F(x, \xi^{i,j}) \} \quad (1)$
for each $j = 1, \dots, M$. Let $v_N^j$ be the optimal value of problem (1), and compute
$L_{N,M} := M^{-1} \sum_{j=1}^{M} v_N^j$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 76 / 89
129 Sampling and Monte Carlo Lower Bounds
The estimate $L_{N,M}$ is an unbiased estimate of $E[v_N]$. By our last theorem, it provides a statistical lower bound for the true optimal value $v^*$.
When the $M$ batches $\xi^{1,j}, \xi^{2,j}, \dots, \xi^{N,j}$, $j = 1, \dots, M$, are i.i.d. (although the elements within each batch do not need to be i.i.d.), we have by the Central Limit Theorem that
$\sqrt{M} \left[ L_{N,M} - E(v_N) \right] \Rightarrow N(0, \sigma_L^2)$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 77 / 89
130 Sampling and Monte Carlo Confidence Intervals
I can estimate the variance of my estimate $L_{N,M}$ as
$s_L^2(M) := \frac{1}{M-1} \sum_{j=1}^{M} \left( v_N^j - L_{N,M} \right)^2$
Defining $z_\alpha$ to satisfy $P\{ N(0,1) \le z_\alpha \} = 1 - \alpha$, and replacing $\sigma_L$ by $s_L(M)$, we can obtain an approximate $(1-\alpha)$-confidence interval for $E[v_N]$ to be
$\left[ L_{N,M} - \frac{z_\alpha s_L(M)}{\sqrt{M}}, \ L_{N,M} + \frac{z_\alpha s_L(M)}{\sqrt{M}} \right]$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 78 / 89
131 Sampling and Monte Carlo Upper Bounds
$v^* := \min_{x \in X} \{ f(x) := E_P[F(x, \omega)] := \int_{\Omega} F(x, \omega) \, dP(\omega) \}$
Quick, someone prove...
$f(\hat{x}) \ge v^* \quad \forall \hat{x} \in X$
How can we estimate $f(\hat{x})$?
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 79 / 89
132 Sampling and Monte Carlo Estimating $f(\hat{x})$
Generate $T$ independent batches of samples of size $N$, denoted by $\xi^{1,j}, \xi^{2,j}, \dots, \xi^{N,j}$, $j = 1, 2, \dots, T$, where each batch has the unbiasedness property, namely
$E\left[ f_N^j(x) := N^{-1} \sum_{i=1}^{N} F(x, \xi^{i,j}) \right] = f(x)$, for all $x \in X$.
We can then use the average value defined by
$U_{N,T}(\hat{x}) := T^{-1} \sum_{j=1}^{T} f_N^j(\hat{x})$
as an estimate of $f(\hat{x})$.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 80 / 89
133 Sampling and Monte Carlo More Confidence Intervals
By applying the Central Limit Theorem again, we have that
$\sqrt{T} \left[ U_{N,T}(\hat{x}) - f(\hat{x}) \right] \Rightarrow N(0, \sigma_U^2(\hat{x}))$ as $T \to \infty$, where $\sigma_U^2(\hat{x}) := \mathrm{Var}\left[ f_N(\hat{x}) \right]$.
We can estimate $\sigma_U^2(\hat{x})$ by the sample variance estimator $s_U^2(\hat{x}; T)$ defined by
$s_U^2(\hat{x}; T) := \frac{1}{T-1} \sum_{j=1}^{T} \left[ f_N^j(\hat{x}) - U_{N,T}(\hat{x}) \right]^2$
By replacing $\sigma_U^2(\hat{x})$ with $s_U^2(\hat{x}; T)$, we can proceed as above to obtain a $(1-\alpha)$-confidence interval for $f(\hat{x})$:
$\left[ U_{N,T}(\hat{x}) - \frac{z_\alpha s_U(\hat{x}; T)}{\sqrt{T}}, \ U_{N,T}(\hat{x}) + \frac{z_\alpha s_U(\hat{x}; T)}{\sqrt{T}} \right]$
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 81 / 89
134 Sampling and Monte Carlo Putting It All Together
$v_N$ is the optimal value of the sample average problem:
$v_N := \min_{x \in S} \{ f_N(x) := N^{-1} \sum_{j=1}^{N} F(x, \omega^j) \}$
Estimate $E(v_N)$ as $\hat{E}(v_N) = L_{N,M} = M^{-1} \sum_{j=1}^{M} v_N^j$: solve $M$ stochastic LPs, each of sample size $N$.
$f_{\bar{N}}(x)$ is the sample average function: draw $\omega^1, \dots, \omega^{\bar{N}}$ from $P$ and set $f_{\bar{N}}(x) := \bar{N}^{-1} \sum_{j=1}^{\bar{N}} F(x, \omega^j)$
For stochastic LPs with recourse, this means solving $\bar{N}$ LPs.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 82 / 89
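The whole recipe can be sketched on a toy newsvendor-style problem (all data here is invented for illustration): $F(x, D) = x + 3\max(D - x, 0)$ with $D$ uniform on $\{1, \dots, 10\}$, whose true optimal value is $8.8$ at $x^* = 7$. The SAA subproblems are solved by checking the breakpoints of the piecewise-linear sample average instead of calling an LP solver, and $z_\alpha \approx 1.96$ gives roughly 95% intervals:

```python
import numpy as np

rng = np.random.default_rng(0)
sample_D = lambda n: rng.integers(1, 11, size=n).astype(float)  # D ~ U{1..10}

def F(x, d):                      # first-stage cost + recourse cost
    return x + 3.0 * np.maximum(d - x, 0.0)

def solve_saa(sample):
    """min_x f_N(x): piecewise linear in x, so the minimizer is 0 or a
    sample point (stands in for solving the sampled stochastic LP)."""
    cands = np.unique(np.concatenate(([0.0], sample)))
    vals = np.array([F(x, sample).mean() for x in cands])
    i = int(np.argmin(vals))
    return float(vals[i]), float(cands[i])

M, N, Nbar = 10, 500, 2000
vN = [solve_saa(sample_D(N))[0] for _ in range(M)]    # M replications
L = float(np.mean(vN))                                # estimates E[v_N] <= v*
half_L = 1.96 * float(np.std(vN, ddof=1)) / np.sqrt(M)

_, xhat = solve_saa(sample_D(N))                      # candidate solution
costs = F(xhat, sample_D(Nbar))                       # fresh evaluation sample
U = float(costs.mean())                               # estimates f(xhat) >= v*
half_U = 1.96 * float(costs.std(ddof=1)) / np.sqrt(Nbar)
# [L - half_L, L + half_L] and [U - half_U, U + half_U] bracket v* statistically
```

Because the replications and the evaluation sample are independent, both the lower and upper estimates come with the confidence intervals derived on the preceding slides.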
135 Sampling and Monte Carlo Recapping Theorems
Thm. $E(v_N) \le v^* \le f(x) \ \forall x$
Thm. $f_{\bar{N}}(\hat{x}) - \hat{E}(v_N) \to f(\hat{x}) - v^*$, as $N, M, \bar{N} \to \infty$
We are mostly interested in estimating the quality of a given solution $\hat{x}$: this is $f(\hat{x}) - v^*$.
$f_{\bar{N}}(\hat{x})$ is computed by solving $\bar{N}$ independent LPs.
$\hat{E}(v_N)$ is computed by solving $M$ independent stochastic LPs.
Independent $\Rightarrow$ no synchronization $\Rightarrow$ easy to do in parallel
Independent $\Rightarrow$ can construct confidence intervals around the estimates
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 83 / 89
136 Sampling and Monte Carlo An Experiment
$M$ times: solve a stochastic sampled approximation of size $N$ (thus obtaining an estimate of $E(v_N)$).
For each of the $M$ solutions $x^1, \dots, x^M$, estimate $f(\hat{x})$ by solving $\bar{N}$ LPs.
Test Instances
Name     Application                  $|\Omega|$   $(m_1, n_1)$   $(m_2, n_2)$
LandS    HydroPower Planning          $10^6$       (2,4)          (7,12)
gbd      Fleet Routing                             (?,?)          (?,?)
storm    Cargo Flight Scheduling                   (185, 121)     (?, 1291)
20term   Vehicle Assignment                        (1,5)          (71,102)
ssn      Telecom. Network Design                   (1,89)         (175,706)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 84 / 89
137 Sampling and Monte Carlo Convergence of Optimal Solution Value
$9 \le M \le 12$, $\bar{N} = 10^6$
(tables: lower-bound estimates under Monte Carlo sampling and Latin Hypercube sampling for 20term, gbd, LandS, storm, and ssn at N = 50, 100, 500, 1000, ...)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 85 / 89
138 Sampling and Monte Carlo 20term Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 86 / 89
139 Sampling and Monte Carlo ssn Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 87 / 89
140 Sampling and Monte Carlo storm Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 88 / 89
141 Sampling and Monte Carlo gbd Convergence, Monte Carlo Sampling
(figure: lower- and upper-bound estimates vs. N)
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 89 / 89
142 Bibliography Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
143 References J. R. Birge and F. V. Louveaux. A multicut algorithm for two-stage stochastic linear programs. European Journal of Operations Research, 34: , J. R. Birge, M. A. H. Dempster, H. I. Gassmann, E. A. Gunn, and A. J. King. A standard input format for multiperiod stochastic linear programs. COAL Newsletter, 17:1 19, J. R. Birge, C. J. Donohue, D. F. Holmes, and O. G. Svintsiski. A parallel implementation of the nested decomposition algorithm for multistage stochastic linear programs. Mathematical Programming, 75: , M. Colombo, A. Grothey, J. Hogg, K. Woodsend, and J. Gondzio. A structure-conveying modelling language for mathematical and stochastic programming. Mathematical Programming Computation, 1(4): , G. Dantzig and G. Infanger. Large-scale stochastic linear programs Importance sampling and Bender s decomposition. In C. Brezinski and U. Kulisch, editors, Computational and Applied Mathematics I (Dublin, 1991), pages North-Holland, Amsterdam, Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
144 References Y. Ermoliev. Stochastic quasi-gradient methods and their application to systems optimization. Stochastics, 4:1 37, H. I. Gassmann. MSLiP: A computer code for the multistage stochastic linear programming problem. Mathematical Programming, 47: , H.I. Gassmann and E. Schweitzer. A comprehensive input format for stochastic linear programs. Annals of Operations Research, 104:89 125, A. Gavironski. Implementation of stochastic quasigradient methods. In Numerical Techniques for Stochastic Optimization. Springer-Verlag, W. E. Hart, J.-P. Watson, and D. L. Woodruff. Pyomo: Modeling and solving mathematical programs in Python. Mathematical Programming Computation, 3(3): , J. L. Higle and S. Sen. Stochastic decomposition: An algorithm for two stage linear programs with recourse. Mathematics of Operations Research, 16(3): , Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
145 References U. Janjarassuk. Using Computational Grids for Effective Solution of Stochastic Programs. PhD thesis, Department of Industrial and Systems Engineering, Lehigh University, G. Lan, A. Nemirovski, and A. Shapiro. Validation analysis of mirror descent stochastic approximation method. Mathematical Programming, pages 1 34, C. Lemaréchal, A. Nemirovskii, and Y. Nesterov. New variants of bundle methods. Mathematical Programming, 69: , J. T. Linderoth and S. J. Wright. Implementing a decomposition algorithm for stochastic programming on a computational grid. Computational Optimization and Applications, 24: , Special Issue on Stochastic Programming. A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19: , H. Robbins and S. Monro. On a stochastic approximation method. Annals of Mathematical Statistics, 22: , Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
146 References
R.T. Rockafellar and Roger J.-B. Wets. Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research, 16(1), 1991. A. Ruszczyński. A regularized decomposition for minimizing a sum of polyhedral functions. Mathematical Programming, 35. A. Ruszczyński. Parallel decomposition of multistage stochastic programming problems. Mathematical Programming, 58. S. Trukhanov, L. Ntaimo, and A. Schaefer. Adaptive multicut aggregation for two-stage stochastic linear programs with recourse. European Journal of Operational Research, 206. J.-P. Watson and D. L. Woodruff. Progressive hedging innovations for a class of stochastic mixed-integer resource allocation problems. Computational Management Science, 8(4), 2011. J.-P. Watson, D. L. Woodruff, and W. E. Hart. PySP: Modeling and solving stochastic programs in Python. Mathematical Programming Computation, 4(2), 2012. R. J. B. Wets. Large-scale linear programming techniques in stochastic programming. In Numerical Techniques for Stochastic Optimization. Springer-Verlag, 1988.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 1 / 14
147 Bibliography
V. Zverovich. Modelling and Solution Methods for Stochastic Optimization. PhD thesis, Brunel University, 2011. V. Zverovich, C. I. Fábián, E. F. D. Ellison, and G. Mitra. A computational study of a solver system for processing two-stage stochastic LPs with enhanced Benders decomposition. Mathematical Programming Computation, 4(3), 2012.
Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 2 / 14
148 Bibliography Miscellaneous Topics Jeff Linderoth (UW-Madison) Computational SP Lecture Notes 2 / 14
More informationInterior-Point Methods for Linear Optimization
Interior-Point Methods for Linear Optimization Robert M. Freund and Jorge Vera March, 204 c 204 Robert M. Freund and Jorge Vera. All rights reserved. Linear Optimization with a Logarithmic Barrier Function
More informationStochastic Dual Dynamic Integer Programming
Stochastic Dual Dynamic Integer Programming Jikai Zou Shabbir Ahmed Xu Andy Sun December 26, 2017 Abstract Multistage stochastic integer programming (MSIP) combines the difficulty of uncertainty, dynamics,
More informationDecomposition methods in optimization
Decomposition methods in optimization I Approach I: I Partition problem constraints into two groups: explicit and implicit I Approach II: I Partition decision variables into two groups: primary and secondary
More informationSolving Dual Problems
Lecture 20 Solving Dual Problems We consider a constrained problem where, in addition to the constraint set X, there are also inequality and linear equality constraints. Specifically the minimization problem
More informationDuality in Linear Programs. Lecturer: Ryan Tibshirani Convex Optimization /36-725
Duality in Linear Programs Lecturer: Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: proximal gradient descent Consider the problem x g(x) + h(x) with g, h convex, g differentiable, and
More informationA DECOMPOSITION PROCEDURE BASED ON APPROXIMATE NEWTON DIRECTIONS
Working Paper 01 09 Departamento de Estadística y Econometría Statistics and Econometrics Series 06 Universidad Carlos III de Madrid January 2001 Calle Madrid, 126 28903 Getafe (Spain) Fax (34) 91 624
More informationOutline. Relaxation. Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING. 1. Lagrangian Relaxation. Lecture 12 Single Machine Models, Column Generation
Outline DMP204 SCHEDULING, TIMETABLING AND ROUTING 1. Lagrangian Relaxation Lecture 12 Single Machine Models, Column Generation 2. Dantzig-Wolfe Decomposition Dantzig-Wolfe Decomposition Delayed Column
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationCS711008Z Algorithm Design and Analysis
CS711008Z Algorithm Design and Analysis Lecture 8 Linear programming: interior point method Dongbo Bu Institute of Computing Technology Chinese Academy of Sciences, Beijing, China 1 / 31 Outline Brief
More informationLecture 9: Dantzig-Wolfe Decomposition
Lecture 9: Dantzig-Wolfe Decomposition (3 units) Outline Dantzig-Wolfe decomposition Column generation algorithm Relation to Lagrangian dual Branch-and-price method Generated assignment problem and multi-commodity
More informationA Benders Algorithm for Two-Stage Stochastic Optimization Problems With Mixed Integer Recourse
A Benders Algorithm for Two-Stage Stochastic Optimization Problems With Mixed Integer Recourse Ted Ralphs 1 Joint work with Menal Güzelsoy 2 and Anahita Hassanzadeh 1 1 COR@L Lab, Department of Industrial
More informationAM 205: lecture 19. Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality Today: Newton s method for optimization, survey of optimization methods Optimality Conditions: Equality Constrained Case As another example of equality
More informationBilevel Integer Optimization: Theory and Algorithms
: Theory and Algorithms Ted Ralphs 1 Joint work with Sahar Tahernajad 1, Scott DeNegre 3, Menal Güzelsoy 2, Anahita Hassanzadeh 4 1 COR@L Lab, Department of Industrial and Systems Engineering, Lehigh University
More informationLagrange Relaxation: Introduction and Applications
1 / 23 Lagrange Relaxation: Introduction and Applications Operations Research Anthony Papavasiliou 2 / 23 Contents 1 Context 2 Applications Application in Stochastic Programming Unit Commitment 3 / 23
More informationOptimization. Charles J. Geyer School of Statistics University of Minnesota. Stat 8054 Lecture Notes
Optimization Charles J. Geyer School of Statistics University of Minnesota Stat 8054 Lecture Notes 1 One-Dimensional Optimization Look at a graph. Grid search. 2 One-Dimensional Zero Finding Zero finding
More information2.098/6.255/ Optimization Methods Practice True/False Questions
2.098/6.255/15.093 Optimization Methods Practice True/False Questions December 11, 2009 Part I For each one of the statements below, state whether it is true or false. Include a 1-3 line supporting sentence
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationOverview of course. Introduction to Optimization, DIKU Monday 12 November David Pisinger
Introduction to Optimization, DIKU 007-08 Monday November David Pisinger Lecture What is OR, linear models, standard form, slack form, simplex repetition, graphical interpretation, extreme points, basic
More informationHomework 3. Convex Optimization /36-725
Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationColumn Generation for Extended Formulations
1 / 28 Column Generation for Extended Formulations Ruslan Sadykov 1 François Vanderbeck 2,1 1 INRIA Bordeaux Sud-Ouest, France 2 University Bordeaux I, France ISMP 2012 Berlin, August 23 2 / 28 Contents
More informationComputations with Disjunctive Cuts for Two-Stage Stochastic Mixed 0-1 Integer Programs
Computations with Disjunctive Cuts for Two-Stage Stochastic Mixed 0-1 Integer Programs Lewis Ntaimo and Matthew W. Tanner Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU,
More informationCS 6820 Fall 2014 Lectures, October 3-20, 2014
Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given
More informationInterior Point Methods for Convex Quadratic and Convex Nonlinear Programming
School of Mathematics T H E U N I V E R S I T Y O H F E D I N B U R G Interior Point Methods for Convex Quadratic and Convex Nonlinear Programming Jacek Gondzio Email: J.Gondzio@ed.ac.uk URL: http://www.maths.ed.ac.uk/~gondzio
More informationInteger Programming Chapter 15
Integer Programming Chapter 15 University of Chicago Booth School of Business Kipp Martin November 9, 2016 1 / 101 Outline Key Concepts Problem Formulation Quality Solver Options Epsilon Optimality Preprocessing
More informationDecomposition Algorithms with Parametric Gomory Cuts for Two-Stage Stochastic Integer Programs
Decomposition Algorithms with Parametric Gomory Cuts for Two-Stage Stochastic Integer Programs Dinakar Gade, Simge Küçükyavuz, Suvrajeet Sen Integrated Systems Engineering 210 Baker Systems, 1971 Neil
More informationApplications. Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang
Introduction to Large-Scale Linear Programming and Applications Stephen J. Stoyan, Maged M. Dessouky*, and Xiaoqing Wang Daniel J. Epstein Department of Industrial and Systems Engineering, University of
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationMultidisciplinary System Design Optimization (MSDO)
Multidisciplinary System Design Optimization (MSDO) Numerical Optimization II Lecture 8 Karen Willcox 1 Massachusetts Institute of Technology - Prof. de Weck and Prof. Willcox Today s Topics Sequential
More informationApplications of Linear Programming
Applications of Linear Programming lecturer: András London University of Szeged Institute of Informatics Department of Computational Optimization Lecture 9 Non-linear programming In case of LP, the goal
More informationDual methods and ADMM. Barnabas Poczos & Ryan Tibshirani Convex Optimization /36-725
Dual methods and ADMM Barnabas Poczos & Ryan Tibshirani Convex Optimization 10-725/36-725 1 Given f : R n R, the function is called its conjugate Recall conjugate functions f (y) = max x R n yt x f(x)
More informationMINLP: Theory, Algorithms, Applications: Lecture 3, Basics of Algorothms
MINLP: Theory, Algorithms, Applications: Lecture 3, Basics of Algorothms Jeff Linderoth Industrial and Systems Engineering University of Wisconsin-Madison Jonas Schweiger Friedrich-Alexander-Universität
More informationNetwork Flows. 6. Lagrangian Relaxation. Programming. Fall 2010 Instructor: Dr. Masoud Yaghini
In the name of God Network Flows 6. Lagrangian Relaxation 6.3 Lagrangian Relaxation and Integer Programming Fall 2010 Instructor: Dr. Masoud Yaghini Integer Programming Outline Branch-and-Bound Technique
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationBenders Decomposition Methods for Structured Optimization, including Stochastic Optimization
Benders Decomposition Methods for Structured Optimization, including Stochastic Optimization Robert M. Freund May 2, 2001 Block Ladder Structure Basic Model minimize x;y c T x + f T y s:t: Ax = b Bx +
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationDisjunctive Decomposition for Two-Stage Stochastic Mixed-Binary Programs with Random Recourse
Disjunctive Decomposition for Two-Stage Stochastic Mixed-Binary Programs with Random Recourse Lewis Ntaimo Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College Station,
More informationDecomposition Algorithms for Two-Stage Chance-Constrained Programs
Mathematical Programming manuscript No. (will be inserted by the editor) Decomposition Algorithms for Two-Stage Chance-Constrained Programs Xiao Liu Simge Küçükyavuz Luedtke James Received: date / Accepted:
More informationScenario Grouping and Decomposition Algorithms for Chance-constrained Programs
Scenario Grouping and Decomposition Algorithms for Chance-constrained Programs Siqian Shen Dept. of Industrial and Operations Engineering University of Michigan Joint work with Yan Deng (UMich, Google)
More informationInterior Point Methods for Mathematical Programming
Interior Point Methods for Mathematical Programming Clóvis C. Gonzaga Federal University of Santa Catarina, Florianópolis, Brazil EURO - 2013 Roma Our heroes Cauchy Newton Lagrange Early results Unconstrained
More informationFenchel Decomposition for Stochastic Mixed-Integer Programming
Fenchel Decomposition for Stochastic Mixed-Integer Programming Lewis Ntaimo Department of Industrial and Systems Engineering, Texas A&M University, 3131 TAMU, College Station, TX 77843, USA, ntaimo@tamu.edu
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationStochastic Programming: From statistical data to optimal decisions
Stochastic Programming: From statistical data to optimal decisions W. Römisch Humboldt-University Berlin Department of Mathematics (K. Emich, H. Heitsch, A. Möller) Page 1 of 24 6th International Conference
More informationInterior-Point versus Simplex methods for Integer Programming Branch-and-Bound
Interior-Point versus Simplex methods for Integer Programming Branch-and-Bound Samir Elhedhli elhedhli@uwaterloo.ca Department of Management Sciences, University of Waterloo, Canada Page of 4 McMaster
More informationOptimization and Root Finding. Kurt Hornik
Optimization and Root Finding Kurt Hornik Basics Root finding and unconstrained smooth optimization are closely related: Solving ƒ () = 0 can be accomplished via minimizing ƒ () 2 Slide 2 Basics Root finding
More informationAcceleration and Stabilization Techniques for Column Generation
Acceleration and Stabilization Techniques for Column Generation Zhouchun Huang Qipeng Phil Zheng Department of Industrial Engineering & Management Systems University of Central Florida Sep 26, 2014 Outline
More informationPart 4: Active-set methods for linearly constrained optimization. Nick Gould (RAL)
Part 4: Active-set methods for linearly constrained optimization Nick Gould RAL fx subject to Ax b Part C course on continuoue optimization LINEARLY CONSTRAINED MINIMIZATION fx subject to Ax { } b where
More informationSelected Topics in Optimization. Some slides borrowed from
Selected Topics in Optimization Some slides borrowed from http://www.stat.cmu.edu/~ryantibs/convexopt/ Overview Optimization problems are almost everywhere in statistics and machine learning. Input Model
More informationDecomposition with Branch-and-Cut Approaches for Two Stage Stochastic Mixed-Integer Programming
Decomposition with Branch-and-Cut Approaches for Two Stage Stochastic Mixed-Integer Programming by Suvrajeet Sen SIE Department, University of Arizona, Tucson, AZ 85721 and Hanif D. Sherali ISE Department,
More informationScenario decomposition of risk-averse two stage stochastic programming problems
R u t c o r Research R e p o r t Scenario decomposition of risk-averse two stage stochastic programming problems Ricardo A Collado a Dávid Papp b Andrzej Ruszczyński c RRR 2-2012, January 2012 RUTCOR Rutgers
More informationOptimisation and Operations Research
Optimisation and Operations Research Lecture 22: Linear Programming Revisited Matthew Roughan http://www.maths.adelaide.edu.au/matthew.roughan/ Lecture_notes/OORII/ School
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationMotivating examples Introduction to algorithms Simplex algorithm. On a particular example General algorithm. Duality An application to game theory
Instructor: Shengyu Zhang 1 LP Motivating examples Introduction to algorithms Simplex algorithm On a particular example General algorithm Duality An application to game theory 2 Example 1: profit maximization
More informationGeneration and Representation of Piecewise Polyhedral Value Functions
Generation and Representation of Piecewise Polyhedral Value Functions Ted Ralphs 1 Joint work with Menal Güzelsoy 2 and Anahita Hassanzadeh 1 1 COR@L Lab, Department of Industrial and Systems Engineering,
More informationNumerical Methods for Inverse Kinematics
Numerical Methods for Inverse Kinematics Niels Joubert, UC Berkeley, CS184 2008-11-25 Inverse Kinematics is used to pose models by specifying endpoints of segments rather than individual joint angles.
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationA. Shapiro Introduction
ESAIM: PROCEEDINGS, December 2003, Vol. 13, 65 73 J.P. Penot, Editor DOI: 10.1051/proc:2003003 MONTE CARLO SAMPLING APPROACH TO STOCHASTIC PROGRAMMING A. Shapiro 1 Abstract. Various stochastic programming
More informationStochastic Integer Programming An Algorithmic Perspective
Stochastic Integer Programming An Algorithmic Perspective sahmed@isye.gatech.edu www.isye.gatech.edu/~sahmed School of Industrial & Systems Engineering 2 Outline Two-stage SIP Formulation Challenges Simple
More informationPedro Munari - COA 2017, February 10th, University of Edinburgh, Scotland, UK 2
Pedro Munari [munari@dep.ufscar.br] - COA 2017, February 10th, University of Edinburgh, Scotland, UK 2 Outline Vehicle routing problem; How interior point methods can help; Interior point branch-price-and-cut:
More informationExtended Formulations, Lagrangian Relaxation, & Column Generation: tackling large scale applications
Extended Formulations, Lagrangian Relaxation, & Column Generation: tackling large scale applications François Vanderbeck University of Bordeaux INRIA Bordeaux-Sud-Ouest part : Defining Extended Formulations
More informationSolving Bilevel Mixed Integer Program by Reformulations and Decomposition
Solving Bilevel Mixed Integer Program by Reformulations and Decomposition June, 2014 Abstract In this paper, we study bilevel mixed integer programming (MIP) problem and present a novel computing scheme
More informationStochastic Gradient Descent. Ryan Tibshirani Convex Optimization
Stochastic Gradient Descent Ryan Tibshirani Convex Optimization 10-725 Last time: proximal gradient descent Consider the problem min x g(x) + h(x) with g, h convex, g differentiable, and h simple in so
More information10 Numerical methods for constrained problems
10 Numerical methods for constrained problems min s.t. f(x) h(x) = 0 (l), g(x) 0 (m), x X The algorithms can be roughly divided the following way: ˆ primal methods: find descent direction keeping inside
More informationLecture 25: November 27
10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More informationKaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization
Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth
More informationComputational Optimization. Augmented Lagrangian NW 17.3
Computational Optimization Augmented Lagrangian NW 17.3 Upcoming Schedule No class April 18 Friday, April 25, in class presentations. Projects due unless you present April 25 (free extension until Monday
More informationContents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016
ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................
More informationIE 495 Final Exam. Due Date: May 1, 2003
IE 495 Final Exam Due Date: May 1, 2003 Do the following problems. You must work alone on this exam. The only reference materials you are allowed to use on the exam are the textbook by Birge and Louveaux,
More information5. Subgradient method
L. Vandenberghe EE236C (Spring 2016) 5. Subgradient method subgradient method convergence analysis optimal step size when f is known alternating projections optimality 5-1 Subgradient method to minimize
More informationThe Multi-Path Utility Maximization Problem
The Multi-Path Utility Maximization Problem Xiaojun Lin and Ness B. Shroff School of Electrical and Computer Engineering Purdue University, West Lafayette, IN 47906 {linx,shroff}@ecn.purdue.edu Abstract
More informationLINEAR AND NONLINEAR PROGRAMMING
LINEAR AND NONLINEAR PROGRAMMING Stephen G. Nash and Ariela Sofer George Mason University The McGraw-Hill Companies, Inc. New York St. Louis San Francisco Auckland Bogota Caracas Lisbon London Madrid Mexico
More information