MIDAS: A Mixed Integer Dynamic Approximation Scheme


MIDAS: A Mixed Integer Dynamic Approximation Scheme

Andy Philpott (University of Auckland), Faisal Wahid (University of Auckland), Frédéric Bonnans (Inria and École Polytechnique)

May 7, 2016

Abstract

Mixed Integer Dynamic Approximation Scheme (MIDAS) is a new sampling-based algorithm for solving finite-horizon stochastic dynamic programs with monotonic Bellman functions. MIDAS approximates these value functions using step functions, leading to stage problems that are mixed integer programs. We provide a general description of MIDAS, and prove its almost-sure convergence to an $\varepsilon$-optimal policy when the Bellman functions are known to be continuous and the sampling process satisfies standard assumptions.

1 Introduction

In general, multistage stochastic programming problems are extremely difficult to solve. If one discretizes the random variable using a finite set of outcomes in each stage and represents the process as a scenario tree, then the problem size grows exponentially with the number of stages and outcomes [9]. On the other hand, if the random variables are stagewise independent, then the problem can be formulated as a dynamic programming recursion and attacked using an approximate dynamic programming algorithm. When the Bellman functions are known to be convex (if minimizing) or concave (if maximizing), they can be approximated by cutting planes. This is the basis of the popular Stochastic Dual Dynamic Programming (SDDP) algorithm originally proposed by Pereira and Pinto [7]. SDDP creates a sequence of cutting-plane outer approximations to the Bellman function at each stage by evaluating stage problems on independently sampled sequences of random outcomes. A comprehensive recent summary of this algorithm and its properties is provided by [10]. A number of algorithms that are variations of this idea have been proposed (see e.g. [3, 4, 6]). The first formal proof of almost-sure convergence for SDDP-based algorithms was provided by Chen and Powell in [3] for their CUPPS algorithm. Their proof, however, relied on a key unstated assumption identified by Philpott and Guan in [8], who provided a new proof of almost-sure convergence for problems with polyhedral Bellman functions without requiring this assumption.

Almost-sure convergence for general convex Bellman functions was recently proved using a different approach by Girardeau et al. [5].

In order to guarantee the validity of the outer approximation of the Bellman functions, SDDP-based methods require these functions to be convex (if minimizing) or concave (if maximizing). However, there are many instances of problems in which the optimal value of the stage problem is not a convex (or concave) function of the state variables. For example, if the stage problem involves state variables with components appearing in both the objective function and the right-hand side of a convex optimization problem, then the optimal value function might have a saddle point. More generally, if the stage problems are not convex then we cannot guarantee convexity of the optimal value function. This will happen when stage problems are mixed-integer programs (MIPs), or when they incorporate nonlinear effects, such as modeling head effects in hydropower production functions [2].

Our interest in problems of this type is motivated by models of hydroelectricity systems in which we seek to maximize revenue from releasing water through generating stations on a river chain. This can be modeled as a system with discrete-time stochastic dynamics
$$x_{t+1} = f_t(x_t, u_t, \xi_t), \quad x_1 = \bar{x}, \quad t = 1, 2, \ldots, T.$$
In this problem, $u_t \in U(x_t)$ is a control and $\xi_t$ is a random noise term. We assume that $f_t$ is a continuous function of its arguments. The control $u_t$ generates a reward $r_t(x_t, u_t)$ in each stage, and there is a terminal reward $Q(x_{T+1})$, where $r_t$ and $Q$ are continuous and monotonic increasing in their arguments. Given initial state $\bar{x}$, we seek an optimal policy yielding $V_1(\bar{x})$, where
$$V_t(x) = \mathbb{E}_{\xi_t}\Big[\max_{u \in U(x)} \{ r_t(x, u, \xi_t) + V_{t+1}(f_t(x, u, \xi_t)) \}\Big], \qquad V_{T+1}(x) = Q(x).$$
Here $V_t(x)$ denotes the maximum expected reward from the beginning of stage $t$ onwards, given the state is $x$, and we take action $u_t$ after observing the random disturbance $\xi_t$.

In the hydro-scheduling context, $x = (s, p)$ represents both a reservoir stock variable $s$ and a price variable $p$, and $u = (v, l)$ represents the reservoir release through generators and the reservoir spill. In this case the dynamics might be represented by
$$\begin{bmatrix} s_{t+1} \\ p_{t+1} \end{bmatrix} = \begin{bmatrix} s_t - v_t - l_t + \omega_t \\ \alpha_t p_t + b_t + \eta_t \end{bmatrix},$$
where $\omega_t$ is the (random) reservoir inflow and $\eta_t$ is the error term of an autoregressive model of price, so $\xi_t = [\,\omega_t\ \ \eta_t\,]$. Here we might define
$$r_t(s, p, v, l, \xi_t) = (\alpha_t p + b_t + \eta_t)\, g(v),$$
giving the revenue earned by released energy $g(v)$ sold at price $\alpha_t p + b_t + \eta_t$. For such a model it is easy to show (under the assumption that we can spill energy without penalty) that $V_t(s, p)$ is continuous and monotonic increasing in $s$ and $p$, but in general $V_t(s, p)$ is not concave, so representing it within the SDDP framework is not trivial.
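To make the river-chain model concrete, here is a minimal Python simulation sketch of these dynamics. All numerical values (the horizon, the AR(1) coefficients, the linear production function $g$, and the fixed-release policy) are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters; none of these values come from the paper.
T = 4                    # number of stages
alpha, b = 0.9, 5.0      # AR(1) price coefficients alpha_t, b_t (held constant)
g = lambda v: 0.8 * v    # assumed linear production function g(v)

def step(s, p, v, l, omega, eta):
    """One application of the dynamics x_{t+1} = f_t(x_t, u_t, xi_t)."""
    s_next = s - v - l + omega             # reservoir stock balance
    p_next = alpha * p + b + eta           # autoregressive price update
    reward = (alpha * p + b + eta) * g(v)  # r_t(s, p, v, l, xi_t)
    return s_next, p_next, reward

# Simulate one sample path under a naive fixed-release policy.
s, p, revenue = 100.0, 50.0, 0.0
for t in range(T):
    omega, eta = rng.uniform(0.0, 10.0), rng.normal(0.0, 2.0)  # xi_t = (omega_t, eta_t)
    v = min(s, 10.0)                       # release through generators, capped by stock
    s, p, reward = step(s, p, v, 0.0, omega, eta)              # spill l = 0
    revenue += reward
print(f"revenue along this sample path: {revenue:.1f}")
```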

A number of authors have looked to extend SDDP methods to deal with non-convex stage problems. The first approach replaces the non-convex components of the problem with convex approximations, for example using McCormick envelopes to approximate the production function of hydro plants as in [2]. The second approach convexifies the value function, e.g. using Lagrangian relaxation techniques. A recent paper [12] by Thomé et al. proposes a Lagrangian relaxation of the SDDP subproblem, and then uses the Lagrange multipliers to produce valid Benders cuts. A similar approach is adopted by Steeger and Rebennack in [11]. Abgottspon et al. [1] introduce a heuristic to add locally valid cuts that enhance a convexified approximation of the value function.

In this paper we propose a new extension of SDDP called Mixed Integer Dynamic Approximation Scheme (MIDAS). MIDAS uses the same algorithmic framework as SDDP but, instead of using cutting planes, MIDAS uses step functions to approximate the value function. The approximation requires an assumption that the value function is monotonic in the state variables. In this case each stage problem can be solved to global optimality as a MIP. We show that if the true Bellman function is continuous then MIDAS converges almost surely to a $2T\varepsilon$-optimal solution for a problem with $T$ stages.

The rest of the paper is laid out as follows. We begin with some preliminary results on $\varepsilon$-outer approximation of monotonic functions. In Section 3 we prove the convergence of MIDAS to a $2T\varepsilon$-optimal solution of a multistage deterministic optimization problem. Then, in Section 4, we extend our proof to a sampling-based algorithm applied to the multistage stochastic optimization problem, and demonstrate its almost-sure convergence. We conclude the paper in Section 5 by discussing the potential value of using MIDAS to solve multistage stochastic integer programs.

2 Mixed Integer Dynamic Approximation Scheme

Before describing MIDAS, we prove some preliminary lemmas relating to global optimization. Consider the problem
$$\text{P:} \quad \max\ Q(x) \quad \text{s.t.}\ x \in X,$$
where $Q(x)$ is an arbitrary upper semicontinuous function and $X$ is an arbitrary compact subset of $\mathbb{R}^n$. It follows that P has a global maximum attained at $x^*$, say. Given $\varepsilon > 0$, we seek an $\varepsilon$-optimal solution to P; in other words, we will find $\hat{x} \in X$ with $Q(\hat{x}) \ge Q(x^*) - \varepsilon$. Suppose that there is some upper semicontinuous function $\bar{Q}$ so that for every $x$,
$$\bar{Q}(x) \ge Q(x). \tag{1}$$

Lemma 1. Any solution $\bar{x}$ to
$$\bar{\text{P}}: \quad \max\ \bar{Q}(x) \quad \text{s.t.}\ x \in X$$
with $\bar{Q}(\bar{x}) \le Q(\bar{x}) + \varepsilon$ is an $\varepsilon$-optimal solution to P.

Proof. If $\bar{x}$ solves $\bar{\text{P}}$ then (1) implies
$$Q(x^*) \le \bar{Q}(x^*) \le \bar{Q}(\bar{x}) \le Q(\bar{x}) + \varepsilon,$$
so $Q(\bar{x}) + \varepsilon \ge Q(x^*)$, yielding the result.

MIDAS updates the value function at points $x^h$ that are the most recently visited values of the state variables, computed using a forward pass of the algorithm. We can illustrate the basic step of MIDAS using a deterministic two-stage program
$$\text{P:} \quad \max_{u,x}\ r(u) + Q(x) \quad \text{s.t.}\ x = f(u),\ u \in U,\ x \in X.$$
We assume that $Q(x)$ is monotonic increasing and continuous on $X$. Since $X$ is compact, $Q(x)$ is uniformly continuous. Thus we have

Proposition 1. There exist $\varepsilon > 0$ and $\delta > 0$ such that for every $x, y$ with $\|x - y\| < \delta$, $|Q(x) - Q(y)| < \varepsilon$.

Suppose that we have computed a finite set of values $q_h$, $h = 1, 2, \ldots, H$, that approximate $Q(x^h)$ from below, so for some $\bar{\varepsilon} \ge 0$ we have
$$q_h \le Q(x^h) \le q_h + \bar{\varepsilon}, \quad h = 1, 2, \ldots, H.$$
Let
$$Q^H(x) = \min\{\, q_h : x \le x^h + \delta \mathbf{1},\ h = 1, 2, \ldots, H \,\}. \tag{2}$$
Then $Q^H(x)$ is piecewise constant with $Q^H(x^h + \delta \mathbf{1}) = q_h$, and
$$Q^H(x) \ge Q(x) - \varepsilon - \bar{\varepsilon}. \tag{3}$$
An illustration of the approximation of $Q(x) = x + 0.1\sin(10x)$ using points $x^h = 0.2, 0.4, 0.6, 0.8$ is provided by Figure 1, where $\delta = 0.5$, $\bar{\varepsilon} = 0$, and $Q^H(x)$, shown in black, is an $\varepsilon$-upper bound on $Q(x)$, shown in grey.

[Figure 1: Approximation of $Q(x) = x + 0.1\sin(10x)$ using points $x^h = 0.2, 0.4, 0.6, 0.8$.]

A contour plot of $Q^H(x)$ when $H = 4$ for an example in two dimensions is illustrated in Figure 2.

[Figure 2: Contour plot of $Q^H(x)$ when $H = 4$.]

In Figure 2, the circled points are $x^h$, $h = 1, 2, 3, 4$. The darker shading indicates increasing values of $Q^H(x)$, which equals $Q(x^h)$ in each region containing $x^h$. In general, it is easy to see that $Q^H(x)$ as defined by (2) is an upper semicontinuous function.
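To make definition (2) concrete, the following Python sketch evaluates the step approximation $Q^H$ and applies it to the one-dimensional function of Figure 1. The values of $\delta$ and the default bound $M$ (returned when no sample point dominates $x$) are assumptions chosen for illustration.

```python
import numpy as np

def QH(x, xs, qs, delta, M):
    """Evaluate (2): Q^H(x) = min{ q_h : x <= x^h + delta*1 }, or M if no h qualifies."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cands = [q for xh, q in zip(xs, qs) if np.all(x <= np.atleast_1d(xh) + delta)]
    return min(cands) if cands else M

# One-dimensional example in the spirit of Figure 1.
Q = lambda x: x + 0.1 * np.sin(10 * x)
xs = [0.2, 0.4, 0.6, 0.8]      # sample points x^h
qs = [Q(xh) for xh in xs]      # exact evaluations, so eps-bar = 0
delta, M = 0.05, 2.0           # delta and the bound M are assumed values

for x in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"x={x:.1f}  Q(x)={Q(x):+.3f}  Q^H(x)={QH(x, xs, qs, delta, M):+.3f}")
```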

Replacing $Q$ by $Q^H$ gives an approximation to P, namely
$$\text{P}^H: \quad \max_{u,x}\ r(u) + Q^H(x) \quad \text{s.t.}\ x = f(u),\ u \in U,\ x \in X.$$
One possible MIP formulation of $\text{P}^H$ is proposed in the Appendix.

Lemma 2. Suppose for some $H$ that $(u, x)$ is a feasible solution to $\text{P}^H$. Then $(u, x)$ is a feasible solution to P and
$$r(u) + Q^H(x) \ge r(u) + Q(x) - \varepsilon - \bar{\varepsilon}.$$

Proof. The result follows from (3).

Lemma 3. Any solution $(\tilde{u}, f(\tilde{u}))$ to $\text{P}^H$ with $Q^H(f(\tilde{u})) \le Q(f(\tilde{u})) + \varepsilon$ is a $(2\varepsilon + \bar{\varepsilon})$-optimal solution to P.

Proof. If $(\tilde{u}, f(\tilde{u}))$ solves $\text{P}^H$ then any optimal solution $(u^*, f(u^*))$ to P satisfies
$$r(\tilde{u}) + Q^H(f(\tilde{u})) \ge r(u^*) + Q^H(f(u^*)).$$
Thus
$$r(\tilde{u}) + Q(f(\tilde{u})) + \varepsilon \ge r(\tilde{u}) + Q^H(f(\tilde{u})) \ge r(u^*) + Q^H(f(u^*)) \ge r(u^*) + Q(f(u^*)) - \varepsilon - \bar{\varepsilon},$$
where the last inequality follows from (3).

Now consider the following algorithm (Algorithm 1) to solve P.

Algorithm 1: MIDAS for the two-stage problem
1. Set $H = 1$, $x^1 = x \in X_\delta$, $q_1 = M$.
2. Solve $\text{P}^H$ to give $(u^{H+1}, x^{H+1})$, where $x^{H+1} = f(u^{H+1})$.
3. If $\|x^{H+1} - x^h\| < \delta$ for some $h \le H$ then stop.
4. Otherwise let $q_{H+1} = Q(x^{H+1})$.
5. Increase $H$ by 1 and go to step 2.
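The following is a minimal runnable sketch of Algorithm 1 on an assumed one-dimensional instance. It stands in for the MIP solve of $\text{P}^H$ in step 2 with brute-force enumeration over a discretized control set, so it illustrates the logic of the iteration rather than the paper's MIP implementation.

```python
import numpy as np

# Two-stage instance; all problem data here is assumed for illustration.
Q = lambda x: x + 0.1 * np.sin(10 * x)   # continuous, monotonic increasing on [0, 1]
r = lambda u: 0.3 * np.sqrt(u)           # first-stage reward
f = lambda u: u                          # transition x = f(u)
U = np.linspace(0.05, 1.0, 200)          # discretized control set (stands in for U)
delta, M = 0.05, 2.0                     # delta and the upper bound M are assumed

def QH(x, pts):
    """Step approximation (2) built from the evaluated points {(x^h, q_h)}."""
    cands = [q for xh, q in pts if x <= xh + delta]
    return min(cands) if cands else M

pts = [(0.05, M)]                        # step 1: x^1 in X_delta with q_1 = M
while True:
    # Step 2: "solve" P^H by enumeration instead of the MIP of the Appendix.
    u_new = max(U, key=lambda u: r(u) + QH(f(u), pts))
    x_new = f(u_new)
    # Step 3: stop once the new state lies within delta of a visited state.
    if any(abs(x_new - xh) < delta for xh, _ in pts):
        break
    # Step 4: evaluate Q at the new point and refine the approximation.
    pts.append((x_new, Q(x_new)))

print(f"candidate solution u = {u_new:.3f}, r(u) + Q(f(u)) = {r(u_new) + Q(x_new):.4f}")
```

On this instance the loop stops after two iterations at $u = 1$, which is the true maximizer since both $r$ and $Q$ are increasing here.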

Algorithm 1 stops the first time that $x^{H+1} \in \{x : \|x - x^h\| < \delta\}$ for some $h \le H$. This means that the algorithm continues as long as $\|x^{H+1} - x^h\| \ge \delta$ for all $h \le H$. Since $X_\delta$ is compact and $\delta$ is fixed, the algorithm will terminate after a finite number of steps $H$ with $\|x^{H+1} - x^h\| < \delta$ for some $h \le H$.

Lemma 4. Suppose for some $H$ that $(u^{H+1}, x^{H+1})$ solves $\text{P}^H$ and $\|x^{H+1} - x^h\| < \delta$ for some $h \le H$. Then $(u^{H+1}, x^{H+1})$ is a $(2\varepsilon + \bar{\varepsilon})$-optimal solution to P.

Proof. We have that $(u^{H+1}, x^{H+1})$ solves $\text{P}^H$, where $x^{H+1} = f(u^{H+1})$, and by Lemma 2, for every feasible $(u, x)$ with $x \in X_\delta$,
$$r(u) + Q^H(x) \ge r(u) + Q(x) - \varepsilon - \bar{\varepsilon}.$$
Let $N_\delta(x^h) = \{x : \|x - x^h\| < \delta\}$. Since $Q(x)$ is uniformly continuous, by Proposition 1, for every $x \in N_\delta(x^h)$,
$$|Q(x) - Q(x^h)| < \varepsilon.$$
Since $x^{H+1}$ is the first iterate to lie in $N_\delta(x^h)$, we have, by construction, that $Q^H(x) = Q^H(x^h)$ for every $x \in N_\delta(x^h)$. Thus
$$Q^H(x^{H+1}) = Q^H(x^h) = q_h = Q(x^h).$$
It follows that
$$r(u^{H+1}) + Q^H(x^{H+1}) = r(u^{H+1}) + Q(x^h) < r(u^{H+1}) + Q(x^{H+1}) + \varepsilon.$$
Now Lemma 3 implies that a solution $(u^{H+1}, x^{H+1})$ to $\text{P}^H$ with $Q^H(x^{H+1}) \le Q(x^{H+1}) + \varepsilon$ is a $(2\varepsilon + \bar{\varepsilon})$-optimal solution to P.

3 Multistage optimization problems

In Section 2 we discussed how MIDAS approximates the value function, and proved that maximizing over this outer approximation yields a $(2\varepsilon + \bar{\varepsilon})$-optimal solution. In this section we prove the convergence of the MIDAS algorithm for a deterministic multistage optimization problem. Given an initial state $\bar{x}$, we consider a multistage deterministic optimization problem of the form
$$\text{MP:} \quad \max_{x,u}\ \sum_{t=1}^{T} r_t(x_t, u_t) + Q(x_{T+1})$$
$$\text{s.t.}\quad x_{t+1} = f_t(x_t, u_t), \quad t = 1, 2, \ldots, T,$$
$$x_1 = \bar{x},$$
$$u_t \in U_t(x_t),\ x_t \in X_t, \quad t = 1, 2, \ldots, T.$$

This gives the dynamic programming formulation
$$V_t(x) = \max_{u \in U(x)} \{ r_t(x, u) + V_{t+1}(f_t(x, u)) \}, \qquad V_{T+1}(x) = Q(x),$$
where we seek a policy that maximizes $V_1(\bar{x})$. We assume that $r_t(x, u)$ is continuous and increasing and that, for every $t$, $V_t(x)$ is a monotonic increasing continuous function of $x$. Observe that we now allow the reward function $r_t$ to depend on the state $x$ as well as the control. The deterministic MIDAS algorithm (Algorithm 2) applied to MP is as follows.

Algorithm 2: Deterministic MIDAS
1. Set $H = 1$, $Q^H_t(x) = M$, $t = 1, 2, \ldots, T+1$.
2. Forward pass: Set $x^H_1 = \bar{x}$. For $t = 1$ to $T$:
(a) Solve $\max_{u \in U(x^H_t)} \{ r_t(x^H_t, u) + Q^H_{t+1}(f_t(x^H_t, u)) \}$ to give $u^H_t$;
(b) If $\|f_t(x^H_t, u^H_t) - x^h_{t+1}\| < \delta$ for some $h < H$ then set $x^H_{t+1} = x^h_{t+1}$, else set $x^H_{t+1} = f_t(x^H_t, u^H_t)$.
3. If for every $t = 1$ to $T+1$ there is some $h < H$ with $x^H_t = x^h_t$ then stop.
4. Backward pass: Update $Q^H_{T+1}(x)$ to $Q^{H+1}_{T+1}(x)$ by adding $q^H_{T+1} = Q(x^H_{T+1})$ at the point $x^H_{T+1}$. Then for every $t = T$ down to 1:
(a) Solve $\varphi = \max_{u \in U(x^H_t)} \{ r_t(x^H_t, u) + Q^{H+1}_{t+1}(f_t(x^H_t, u)) \}$;
(b) Update $Q^H_t(x)$ to $Q^{H+1}_t(x)$ by adding $q^H_t = \varphi$ at the point $x^H_t$.
5. Increase $H$ by 1 and go to step 2.

We now show that this algorithm will stop in step 3 after a finite number of iterations at a solution $(u^H_1, u^H_2, \ldots, u^H_T)$ that is a $2T\varepsilon$-optimal solution to the dynamic program starting in state $\bar{x}$. First observe that the sequence $\{(x^H_1, x^H_2, \ldots, x^H_{T+1})\}$ lies in a compact set, so it has some convergent subsequence. It follows that we can find $H$ and $h \le H$ with $\|x^{H+1}_t - x^h_t\| < \delta$ for every $t = 1, 2, \ldots, T+1$. We demonstrate this using the following two lemmas.

Lemma 5. Let $N_\delta(S) = \bigcup_{s \in S} \{x : \|x - s\| < \delta\}$. Then any infinite set $S$ lying in a compact set contains a finite subset $Z$ with $S \subseteq N_\delta(Z)$.

Proof. Consider an infinite set $S$ lying in a compact set $X$. Since $S$ is a subset of the closed set $X$, its closure $\bar{S}$ lies in $X$, and so $\bar{S}$ is compact. Now for fixed $\delta > 0$, it is easy to see that $\{N_\delta(\{s\}) : s \in S\}$ is an open covering of $\bar{S}$, since any limit point of $S$ must be in $N_\delta(\{s\})$ for some $s \in S$. Since $\bar{S}$ is compact, there exists a finite subcover, i.e. a finite set $Z = \{z_1, z_2, \ldots, z_N\} \subseteq S$ with $S \subseteq \bar{S} \subseteq \bigcup_{k=1}^{N} N_\delta(z_k)$.

Lemma 6. Suppose for every $t$ there is some $h \le H$ with $\|x^{H+1}_t - x^h_t\| < \delta$. Then the sequence of actions $(u^{H+1}_1, u^{H+1}_2, \ldots, u^{H+1}_T)$ is a $2T\varepsilon$-optimal solution to the dynamic program starting in state $\bar{x}$.

Proof. If for every $t = 1, 2, \ldots, T+1$ there is some $h \le H$ with $\|x^{H+1}_t - x^h_t\| < \delta$, then the deterministic MIDAS algorithm stops in step 3. We can then apply Lemma 4 to each stage problem, where we substitute $V_{t+1}(x)$ for $Q(x)$. Denote by $u^h_t$ an optimal solution at state $x^h_t$ of the (true) $t$th stage problem of MP, so
$$u^h_t \in \arg\max_{u \in U(x^h_t)} \{ r_t(x^h_t, u) + V_{t+1}(f_t(x^h_t, u)) \}.$$
Now apply induction on $t$ to show that in step 4(b) we have
$$q^h_t \ge V_t(x^h_t) + 2(t - T - 1)\varepsilon$$
for all $t = 1, 2, \ldots, T$ and all $h = 1, 2, \ldots, H$.

If $t = T$, then we have $q^h_{T+1} = Q(x^h_{T+1}) = V_{T+1}(x^h_{T+1})$, $h = 1, 2, \ldots, H$, so $\bar{\varepsilon} = 0$. It follows by Lemma 4 that for all $h = 1, 2, \ldots, H$,
$$q^h_T = \max_{u \in U(x^h_T)} \{ r_T(x^h_T, u) + Q^{h+1}_{T+1}(f_T(x^h_T, u)) \} \ge r_T(x^h_T, u^h_T) + V_{T+1}(f_T(x^h_T, u^h_T)) - 2\varepsilon = V_T(x^h_T) - 2\varepsilon.$$
Now suppose the result is true for $t \in \{2, 3, \ldots, T\}$. Then, choosing $\bar{\varepsilon} = 2(T - t + 1)\varepsilon$, we have for all $h = 1, 2, \ldots, H$,
$$V_t(x^h_t) - q^h_t \le \bar{\varepsilon}.$$
Thus Lemma 4 gives, for all $h = 1, 2, \ldots, H$,
$$q^h_{t-1} = \max_{u \in U(x^h_{t-1})} \{ r_{t-1}(x^h_{t-1}, u) + Q^h_t(f_{t-1}(x^h_{t-1}, u)) \} \ge r_{t-1}(x^h_{t-1}, u^h_{t-1}) + V_t(f_{t-1}(x^h_{t-1}, u^h_{t-1})) - 2\varepsilon - \bar{\varepsilon}$$
$$= V_{t-1}(x^h_{t-1}) - 2\varepsilon + 2(t - T - 1)\varepsilon = V_{t-1}(x^h_{t-1}) + 2((t-1) - T - 1)\varepsilon,$$
yielding the result for $t - 1$. Now when $t = 1$ we have $x^h_1 = \bar{x}$, so for all $h = 1, 2, \ldots, H$,
$$q^h_1 = \max_{u \in U(\bar{x})} \{ r_1(\bar{x}, u) + Q^h_2(f_1(\bar{x}, u)) \} \ge r_1(\bar{x}, u^h_1) + V_2(f_1(\bar{x}, u^h_1)) - 2T\varepsilon,$$
showing that $q^h_1$ is the value of a $2T\varepsilon$-optimal solution to MP.

We can now demonstrate the convergence of MIDAS in a finite number of steps.

Theorem 1. MIDAS terminates in a finite number of iterations with a $2T\varepsilon$-optimal solution to the dynamic program starting in state $\bar{x}$.

Proof. The sequence $S = \{(x^H_1, x^H_2, \ldots, x^H_{T+1})\}_{H=1,2,\ldots}$ generated by MIDAS lies in a compact set, so by Lemma 5 it contains a finite subset $Z$ with $S \subseteq N_\delta(Z)$. Let $H^*$ be the largest iteration index appearing in $Z$. It follows that for every $t = 1, 2, \ldots, T+1$ there exists some $h \le H^*$ with $\|x^{H^*+1}_t - x^h_t\| < \delta$. The result follows by Lemma 6.

4 Multistage stochastic optimization problems

We extend the model MP described in Section 3 to include random noise $\xi_t$ in the state transition function $f_t(x_t, u_t, \xi_t)$. This creates the following multistage stochastic program:
$$\text{MSP:} \quad \max\ \mathbb{E}\Big[ \sum_{t=1}^{T} r_t(x_t, u_t) + V_{T+1}(x_{T+1}) \Big]$$
$$\text{s.t.}\quad x_{t+1} = f_t(x_t, u_t, \xi_t), \quad t = 1, 2, \ldots, T,$$
$$x_1 = \bar{x},$$
$$u_t \in U(x_t), \quad t = 1, 2, \ldots, T,$$
$$x_t \in X_t, \quad t = 1, 2, \ldots, T.$$
In MSP we seek an optimal policy of actions $u_t$ that are measurable with respect to the filtration $(\Omega_t, \mathcal{F}_t, P_t)$ generated by $\xi_t$, $t = 1, 2, \ldots, T$. In this paper we restrict attention to a specific form of MSP that has a dynamic programming formulation. Let $V_t(x)$ denote the maximum expected reward from the beginning of stage $t$ onwards, given the state is $x$, where we take action $u$ after observing the random disturbance $\xi_t$. Here $V_{T+1}(x)$ is a given monotonic continuous function. The dynamic programming recursion is
$$V_t(x) = \mathbb{E}_{\xi_t}\Big[ \max_{u \in U(x)} \{ r_t(x, u) + V_{t+1}(f_t(x, u, \xi_t)) \} \Big], \qquad V_{T+1}(x) = Q(x),$$
where we seek a policy that maximizes $V_1(\bar{x})$. We make the following assumptions:
1. $\xi_t$ is stagewise independent;
2. the set $\Omega_t$ of random outcomes in each stage $t = 1, 2, \ldots, T$ is discrete and finite ($\Omega_t = \{\xi_{ti} : i = 1, \ldots, N < \infty\}$ with probabilities $P_{ti} > 0$);
3. $X_t$ is a compact set.

The particular form of $\xi_t$ we assume yields a finite set of possible sample paths or scenarios $\{\omega(j) : j = 1, 2, \ldots, \prod_{t=1}^{T} N_t\}$. The MIDAS algorithm selects one of these randomly in each forward pass.
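Before stating the algorithm, it may help to see the scenario structure in code. The sketch below, with assumed toy outcome sets $\Omega_t$, enumerates the $\prod_{t=1}^{T} N_t$ scenarios and draws a forward-pass scenario by sampling one outcome per stage independently; this sampling scheme reappears below when we discuss the Forward Pass Sampling Property.

```python
import itertools
import random

random.seed(1)

# Assumed toy outcome sets Omega_t, each entry a pair (xi_ti, P_ti).
Omega = [
    [(-1.0, 0.5), (2.0, 0.5)],              # stage 1
    [(-1.0, 0.3), (0.0, 0.4), (1.0, 0.3)],  # stage 2
    [(0.0, 0.6), (3.0, 0.4)],               # stage 3
]

# The scenario set {omega(j)}: all prod_t N_t combinations of stage outcomes.
scenarios = list(itertools.product(*[[xi for xi, _ in stage] for stage in Omega]))
print(f"{len(scenarios)} scenarios in total, e.g. omega(1) = {scenarios[0]}")

def sample_forward_path():
    """Sample one outcome per stage independently, each with P_ti > 0."""
    return tuple(
        random.choices([xi for xi, _ in stage], weights=[p for _, p in stage])[0]
        for stage in Omega
    )

print("forward-pass scenario:", sample_forward_path())
```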

This gives Algorithm 3, described below.

Algorithm 3: Sampled MIDAS
Set $H = 1$, $Q^H_t(x) = M$, $t = 1, 2, \ldots, T+1$. While $H < H_{\max}$ do:
1. Forward pass: Set $x^H_1 = \bar{x}$. For $t = 1$ to $T$:
(a) Sample $\xi_t$ to give $\xi^H_t \in \Omega_t$;
(b) Solve $\max_{u \in U(x^H_t)} \{ r_t(x^H_t, u) + Q^H_{t+1}(f_t(x^H_t, u, \xi^H_t)) \}$ to give $u^H_t$;
(c) If $\|f_t(x^H_t, u^H_t, \xi^H_t) - x^h_{t+1}\| < \delta$ for some $h < H$ then set $x^H_{t+1} = x^h_{t+1}$, else set $x^H_{t+1} = f_t(x^H_t, u^H_t, \xi^H_t)$.
2. Backward pass: Update $Q^H_{T+1}(x)$ to $Q^{H+1}_{T+1}(x)$ by adding $q^H_{T+1} = V_{T+1}(x^H_{T+1})$ at the point $x^H_{T+1}$. Then for every $t = T$ down to 1:
(a) Solve $\varphi = \sum_{i=1}^{N} P_{ti} \max_{u \in U(x^H_t)} \{ r_t(x^H_t, u) + Q^{H+1}_{t+1}(f_t(x^H_t, u, \xi_{ti})) \}$;
(b) Update $Q^H_t(x)$ to $Q^{H+1}_t(x)$ by adding $q^H_t = \varphi$ at the point $x^H_t$.
3. Increase $H$ by 1 and go to step 1.

Observe that Sampled MIDAS has no stopping criterion. However, we have the following result.

Lemma 7. There is some $H^*$ such that for every $H > H^*$, $\|f_t(x^H_t, u^H_t, \xi^H_t) - x^h_{t+1}\| < \delta$ for some $h < H$.

Proof. Consider the event tree defined by the random variables $\xi_t$, consisting of nodes $n \in \mathcal{N}$. Any policy, when implemented, defines a state variable $x(n)$ for each node $n \in \mathcal{N}$ of this tree. Let $x = \{x(n) : n \in \mathcal{N}\}$. Each forward pass $H$ updates the values of $x(n)$ (from $x^H(n)$ to $x^{H+1}(n)$) for the nodes $n$ corresponding to $\xi^H_t$, $t = 1, 2, \ldots, T$. For nodes $n$ not on this sample path we let $x^{H+1}(n) = x^H(n)$, thus defining $x^{H+1}$. This defines an infinite sequence $S = \{x^H\}_{H=1}^{\infty}$ of points, all of which lie in a compact set. By Lemma 5, $S$ contains a finite subset $Z$ with $S \subseteq N_\delta(Z)$. Let $H^*$ be the largest iteration index appearing in $Z$. Suppose it is not true that for every $t = 1, 2, \ldots, T$, every $\xi_t \in \Omega_t$, and every $H > H^*$ there is some $h \le H$ with $\|f_t(x^H_t, u^H_t, \xi_t) - x^h_{t+1}\| < \delta$. Then for some $t$ and $\xi_t$ (corresponding to a node $n \in \mathcal{N}$) and some $H > H^*$,
$$\|f_t(x^H_t, u^H_t, \xi_t) - x^h_{t+1}\| \ge \delta \quad \text{for every } h \le H,$$
so $\|x^{H+1}(n) - x^h_{t+1}\| \ge \delta$, giving $x^{H+1} \notin N_\delta(Z)$, a contradiction.

By Lemma 7, after $H^*$ iterations every sample path produced in step 1 yields state values that have been visited before. It is clear that no additional points will be added in subsequent backward passes, and so the MIDAS algorithm will simply replicate points that it has already computed, with no improvement in its approximation of $Q_{t+1}(x)$ at each $t$. If this situation could be identified, then we might terminate the algorithm at this point. This raises the question of whether the solution obtained at this point is a $2T\varepsilon$-optimal solution to MSP.

The answer to this question depends on the sampling strategy used in step 1(a). Following [8], we assume that this satisfies the Forward Pass Sampling Property.

Forward Pass Sampling Property (FPSP): For each $j = 1, 2, \ldots, \prod_{t=1}^{T} N_t$, with probability 1,
$$\big| \{ H : \{\xi^H_t,\ t = 1, 2, \ldots, T\} = \omega(j) \} \big| = \infty.$$
FPSP states that each scenario $\omega(j)$ is traversed infinitely many times with probability 1 in the forward pass. There are many sampling methods satisfying this property. For example, one such method is to independently sample a single outcome in each stage in the forward pass, with positive probability for each $\xi_{ti}$; this meets FPSP by the Borel-Cantelli lemma. Another sampling method that satisfies FPSP is to repeat an exhaustive enumeration of the scenarios $\omega(j)$, $j = 1, 2, \ldots, \prod_{t=1}^{T} N_t$, in the forward pass.

Theorem 2. If step 1(a) satisfies FPSP then Sampled MIDAS converges almost surely to a $2T\varepsilon$-optimal policy of MSP.

Proof. We prove the result by induction. Consider the sequence $S = \{x^H\}_{H > H^*}$ defined by Lemma 7. Each $x^H(n)$ in $S$ is some point that has been visited in a previous iteration of Sampled MIDAS, and no new points are added. Consider any leaf node $m$ in $\mathcal{N}$, corresponding to $t = T+1$. By FPSP this node is visited an infinite number of times, so at least once by some iteration $H > H^*$ with probability 1. The collection $\{x^H(m) : H = 1, 2, \ldots\} \subseteq \{x^h_{T+1} : h = 1, 2, \ldots, H^*\}$. For all such $h$, the terminal value function $V_{T+1}(x^h_{T+1})$ is known exactly, so $\bar{\varepsilon} = 0$ in the conditions of Lemma 4. It follows that for each $x^h_T$, $h = 1, 2, \ldots, H^*$, and each $\xi_{Ti}$,
$$\max_{u \in U(x^h_T)} \{ r_T(x^h_T, u) + Q^H_{T+1}(f_T(x^h_T, u, \xi_{Ti})) \} \ge \max_{u \in U(x^h_T)} \{ r_T(x^h_T, u) + V_{T+1}(f_T(x^h_T, u, \xi_{Ti})) \} - 2\varepsilon,$$
so
$$\sum_i P_{Ti} \max_{u \in U(x^h_T)} \{ r_T(x^h_T, u) + Q^H_{T+1}(f_T(x^h_T, u, \xi_{Ti})) \} \ge \sum_i P_{Ti} \max_{u \in U(x^h_T)} \{ r_T(x^h_T, u) + V_{T+1}(f_T(x^h_T, u, \xi_{Ti})) \} - 2\varepsilon.$$
Now suppose that for some $t \in \{2, 3, \ldots, T\}$ and each $x^h_t$, $h = 1, 2, \ldots, H^*$,
$$\sum_i P_{ti} \max_{u \in U(x^h_t)} \{ r_t(x^h_t, u) + Q^H_{t+1}(f_t(x^h_t, u, \xi_{ti})) \} \ge \sum_i P_{ti} \max_{u \in U(x^h_t)} \{ r_t(x^h_t, u) + V_{t+1}(f_t(x^h_t, u, \xi_{ti})) \} - \bar{\varepsilon},$$
where $\bar{\varepsilon} = 2(T - t + 1)\varepsilon$.

Thus step 2 of Sampled MIDAS and Lemma 4 give, for all $h = 1, 2, \ldots, H^*$,
$$q^h_{t-1} = \sum_i P_{t-1,i} \max_{u \in U(x^h_{t-1})} \{ r_{t-1}(x^h_{t-1}, u) + Q^h_t(f_{t-1}(x^h_{t-1}, u, \xi_{t-1,i})) \}$$
$$\ge \sum_i P_{t-1,i} \max_{u \in U(x^h_{t-1})} \{ r_{t-1}(x^h_{t-1}, u) + V_t(f_{t-1}(x^h_{t-1}, u, \xi_{t-1,i})) \} - 2\varepsilon - \bar{\varepsilon}$$
$$= V_{t-1}(x^h_{t-1}) - 2\varepsilon + 2(t - T - 1)\varepsilon = V_{t-1}(x^h_{t-1}) + 2((t-1) - T - 1)\varepsilon,$$
yielding the result for $t - 1$. Then by induction we have
$$Q^H_1(\bar{x}) \ge V_1(\bar{x}) + 2(1 - T - 1)\varepsilon = V_1(\bar{x}) - 2T\varepsilon,$$
yielding the result.

5 Conclusion

In this paper we have proposed a method called MIDAS for solving multistage stochastic dynamic programming problems with monotonic Bellman functions. We have demonstrated the almost-sure convergence of MIDAS to a $2T\varepsilon$-optimal policy in a finite number of steps. The quality of the policy depends on the value of $\delta$, since a smaller $\delta$ gives a smaller $\varepsilon$. We may expect the number of iterations to increase as $\delta$ decreases. Furthermore, since the current formulation treats each stage problem as a MIP, the stage problems grow in size with each iteration. So small values of $\delta$ give more iterations, each of which involves problem instances of increasing size. There remains considerable scope for tuning the algorithm by choosing $\delta$ adaptively as the algorithm proceeds.

MIDAS can be extended in several ways. Currently, we approximate the stage problems using a MIP formulation that treats the Bellman functions as piecewise constant. One formulation of such a MIP is given in the Appendix. A more accurate approximation might be achieved using an alternative formulation, for example one using specially ordered sets to yield a piecewise affine approximation. As long as this provides an $\varepsilon$-outer approximation of the Bellman function, we can incorporate it into a MIDAS scheme, which will converge almost surely by the same arguments given above.

Although our convergence result applies to problems with continuous Bellman functions, MIDAS can be applied to multistage stochastic integer programming (MSIP) problems. Solving a deterministic equivalent of an MSIP can be difficult, as the scenario tree grows exponentially with the number of stages and outcomes, and MIP algorithms generally scale poorly. However, since solving many small MIPs will be faster than solving a single large MIP, we might expect MIDAS to produce good candidate solutions to MSIP problems with much less computational effort. Our hope is that MIDAS will provide a practical computational approach to solving these difficult multistage stochastic integer programming problems.

Appendix: A MIP representation of $Q^H(x)$

Consider the two-stage problem
$$\text{P:} \quad \max_u\ r(u) + Q(x) \quad \text{s.t.}\ x = f(u),\ u \in U,\ x \in X,$$
where $Q(x)$ is continuous and monotonic increasing, and $X = \{x : 0 \le x_i \le K_i,\ i = 1, 2, \ldots, n\}$. Since $X$ is compact, we have an upper bound $M$ on $Q(x)$. Suppose we have computed a finite set of values $q_h$, $h = 1, 2, \ldots, H$, that approximate $Q(x^h)$, so
$$Q(x^h) - q_h \le \bar{\varepsilon}, \quad h = 1, 2, \ldots, H.$$
(For example we could choose $\bar{\varepsilon} = 0$ if $Q(x^h) = q_h$, $h = 1, 2, \ldots, H$.) For any $x \in X_\delta = \{x : \delta \le x_i \le K_i,\ i = 1, 2, \ldots, n\}$, define $Q^H(x)$ to be the optimal value of the mixed integer program
$$Q^H(x) = \max\ \varphi$$
$$\text{s.t.}\quad \varphi \le q_h + (M - q_h)(1 - w_h), \quad h = 1, 2, \ldots, H,$$
$$x_k \ge x^h_k z^h_k + \delta, \quad k = 1, 2, \ldots, n,\ h = 1, 2, \ldots, H,$$
$$\textstyle\sum_{k=1}^{n} z^h_k = 1 - w_h, \quad h = 1, 2, \ldots, H,$$
$$w_h \in \{0, 1\}, \quad h = 1, 2, \ldots, H,$$
$$z^h_k \in \{0, 1\}, \quad k = 1, 2, \ldots, n,\ h = 1, 2, \ldots, H,$$
and for $x \in X \setminus X_\delta$ set
$$Q^H(x) = \min\{ Q^H(y) : y \ge x,\ y \in X_\delta \}.$$
The problem P is approximated by
$$\text{P}^H: \quad \max_u\ r(u) + Q^H(x) \quad \text{s.t.}\ x = f(u),\ u \in U,\ x \in X.$$
Since $Q$ is uniformly continuous, there exist $\varepsilon$ and $\delta$ so that for every $x, y$ with $\|x - y\| < \delta$ we have $|Q(x) - Q(y)| < \varepsilon$. We now show that $Q^H(x)$ gives an upper bound on $Q(x)$ to a tolerance of $\varepsilon + \bar{\varepsilon}$.

Proposition 2. For every $x \in X$, $Q^H(x) \ge Q(x) - \varepsilon - \bar{\varepsilon}$.

Proof. Consider first a given point $x \in X_\delta$, and any $w_h, z^h_k$ feasible for the MIP defining $Q^H(x)$. If $w_h = 0$ for every $h = 1, 2, \ldots, H$, then $\varphi \le M$ is the only binding constraint on $\varphi$, so at optimality
$$\varphi = M \ge Q(x),$$
yielding the result.

Otherwise assume that the optimal solution has $w_h = 1$ for some $h$. We define
$$H(x) = \{ h : x_k < x^h_k + \delta \text{ for every } k \},$$
and show that
$$Q^H(x) = \min\{ q_h : h \in H(x) \}. \tag{4}$$
First, if $h \in H(x)$ then $w_h = 1$. This is because choosing $w_h = 0$ implies $z^h_k = 1$ for some $k$, so for at least one $k$ we would need $x_k \ge x^h_k + \delta$, contradicting $h \in H(x)$. Now if $h \notin H(x)$ then a feasible solution may have either $w_h = 0$ or $w_h = 1$. Observe, however, that if $q_h < \min\{ q_{h'} : h' \in H(x) \}$ then choosing $w_h = 1$ would yield a value of $\varphi$ strictly lower than the value obtained by choosing $w_h = 0$, and so choosing $w_h = 1$ cannot maximize $\varphi$. It follows that
$$\min\{ q_h : h \in H(x) \} \le \min\{ q_h : h \notin H(x),\ w_h = 1 \}.$$
Thus the optimal value is
$$Q^H(x) = \min\{ q_h : w_h = 1 \} = \min\{ q_h : h \in H(x) \}.$$
Now by monotonicity of $Q$, for every $h \in H(x)$, $Q(x) \le Q(x^h + \delta e)$, so
$$Q(x) - q_h \le Q(x^h + \delta e) - q_h \le Q(x^h + \delta e) - Q(x^h) + \bar{\varepsilon} < \varepsilon + \bar{\varepsilon}$$
by continuity of $Q$. Since $Q^H(x) = \min\{ q_h : h \in H(x) \}$, we thus have
$$Q^H(x) \ge Q(x) - \varepsilon - \bar{\varepsilon},$$
as required.

To complete the proof, suppose finally that $x \in X \setminus X_\delta$, so $Q^H(x) = \min\{ Q^H(y) : y \ge x,\ y \in X_\delta \}$. Then by the argument above applied to each $y \in X_\delta$,
$$\min\{ Q^H(y) : y \ge x,\ y \in X_\delta \} \ge \min\{ Q(y) : y \ge x,\ y \in X_\delta \} - \varepsilon - \bar{\varepsilon} \ge Q(x) - \varepsilon - \bar{\varepsilon},$$
since $Q$ is monotonic increasing.
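As a concrete check of this representation, the sketch below implements the MIP defining $Q^H(x)$ for a fixed $x \in X_\delta$ and compares it with the closed form (4). The toy data and the use of the PuLP modeling library with its CBC solver are assumptions of convenience; the paper does not prescribe a particular solver.

```python
import pulp

# Assumed toy data: n = 2 state dimensions, H = 3 evaluated points.
n, delta, M = 2, 0.1, 10.0
xh = [[0.2, 0.3], [0.5, 0.5], [0.8, 0.6]]   # points x^h
q  = [1.0, 2.5, 4.0]                        # values q_h (monotone in x^h)
x  = [0.45, 0.40]                           # fixed point with delta <= x_i <= K_i

prob = pulp.LpProblem("QH", pulp.LpMaximize)
phi = pulp.LpVariable("phi")
w = [pulp.LpVariable(f"w_{h}", cat=pulp.LpBinary) for h in range(len(xh))]
z = [[pulp.LpVariable(f"z_{h}_{k}", cat=pulp.LpBinary) for k in range(n)]
     for h in range(len(xh))]

prob += phi                                              # objective: max phi
for h in range(len(xh)):
    prob += phi <= q[h] + (M - q[h]) * (1 - w[h])        # phi bound active when w_h = 1
    prob += pulp.lpSum(z[h]) == 1 - w[h]                 # one certificate z if w_h = 0
    for k in range(n):
        prob += x[k] >= xh[h][k] * z[h][k] + delta       # z^h_k = 1 => x_k >= x^h_k + delta

prob.solve(pulp.PULP_CBC_CMD(msg=0))

# Closed form (4): min{ q_h : x_k < x^h_k + delta for all k }, or M if H(x) is empty.
Hx = [h for h in range(len(xh)) if all(x[k] < xh[h][k] + delta for k in range(n))]
closed = min((q[h] for h in Hx), default=M)
print("MIP value:", pulp.value(phi), " closed form (4):", closed)
```

On this data, $H(x) = \{2, 3\}$ and both evaluations return $q_2 = 2.5$.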

References

[1] H. Abgottspon, K. Njalsson, M.A. Bucher, and G. Andersson. Risk-averse medium-term hydro optimization considering provision of spinning reserves. In 2014 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), pages 1-6. IEEE, 2014.

[2] S. Cerisola, J.M. Latorre, and A. Ramos. Stochastic dual dynamic programming applied to nonconvex hydrothermal models. European Journal of Operational Research, 218(3), 2012.

[3] Z. Chen and W.B. Powell. Convergent cutting-plane and partial-sampling algorithm for multistage stochastic linear programs with recourse. Journal of Optimization Theory and Applications, 102(3):497-524, 1999.

[4] C.J. Donohue and J.R. Birge. The abridged nested decomposition method for multistage stochastic linear programs with relatively complete recourse. Algorithmic Operations Research, 1(1), 2006.

[5] P. Girardeau, V. Leclere, and A.B. Philpott. On the convergence of decomposition methods for multistage stochastic convex programs. Mathematics of Operations Research, 40(1):130-145, 2015.

[6] M. Hindsberger and A.B. Philpott. ReSa: A method for solving multistage stochastic linear programs. Journal of Applied Operational Research, 6(1):2-15.

[7] M.V.F. Pereira and L.M.V.G. Pinto. Multistage stochastic optimization applied to energy planning. Mathematical Programming, 52(1-3):359-375, 1991.

[8] A.B. Philpott and Z. Guan. On the convergence of stochastic dual dynamic programming and related methods. Operations Research Letters, 36(4):450-455, 2008.

[9] A. Shapiro. On complexity of multistage stochastic programs. Operations Research Letters, 34(1):1-8, 2006.

[10] A. Shapiro. Analysis of stochastic dual dynamic programming method. European Journal of Operational Research, 209(1):63-72, 2011.

[11] G. Steeger and S. Rebennack. Strategic bidding for multiple price-maker hydroelectric producers. IIE Transactions, 47(9), 2015.

[12] F. Thome, M.V.F. Pereira, S. Granville, and M. Fampa. Non-convexities representation on hydrothermal operation planning using SDDP. Technical report, PSR, available at psr-inc.com.br/portal/psr/publicacoes.


More information

QUADRATIC MAJORIZATION 1. INTRODUCTION

QUADRATIC MAJORIZATION 1. INTRODUCTION QUADRATIC MAJORIZATION JAN DE LEEUW 1. INTRODUCTION Majorization methods are used extensively to solve complicated multivariate optimizaton problems. We refer to de Leeuw [1994]; Heiser [1995]; Lange et

More information

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness. CS/ECE/ISyE 524 Introduction to Optimization Spring 2016 17 14. Duality ˆ Upper and lower bounds ˆ General duality ˆ Constraint qualifications ˆ Counterexample ˆ Complementary slackness ˆ Examples ˆ Sensitivity

More information

Math 273a: Optimization Subgradients of convex functions

Math 273a: Optimization Subgradients of convex functions Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions

More information

Optimality Results in Inventory-Pricing Control: An Alternate Approach

Optimality Results in Inventory-Pricing Control: An Alternate Approach Optimality Results in Inventory-Pricing Control: An Alternate Approach Woonghee Tim Huh, Columbia University Ganesh Janakiraman, New York University May 9, 2006 Abstract We study a stationary, single-stage

More information

Mathematical Preliminaries for Microeconomics: Exercises

Mathematical Preliminaries for Microeconomics: Exercises Mathematical Preliminaries for Microeconomics: Exercises Igor Letina 1 Universität Zürich Fall 2013 1 Based on exercises by Dennis Gärtner, Andreas Hefti and Nick Netzer. How to prove A B Direct proof

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

A Parametric Simplex Algorithm for Linear Vector Optimization Problems A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear

More information

16.4 Multiattribute Utility Functions

16.4 Multiattribute Utility Functions 285 Normalized utilities The scale of utilities reaches from the best possible prize u to the worst possible catastrophe u Normalized utilities use a scale with u = 0 and u = 1 Utilities of intermediate

More information

Column Generation for Extended Formulations

Column Generation for Extended Formulations 1 / 28 Column Generation for Extended Formulations Ruslan Sadykov 1 François Vanderbeck 2,1 1 INRIA Bordeaux Sud-Ouest, France 2 University Bordeaux I, France ISMP 2012 Berlin, August 23 2 / 28 Contents

More information

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016 Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the

More information

A Shadow Simplex Method for Infinite Linear Programs

A Shadow Simplex Method for Infinite Linear Programs A Shadow Simplex Method for Infinite Linear Programs Archis Ghate The University of Washington Seattle, WA 98195 Dushyant Sharma The University of Michigan Ann Arbor, MI 48109 May 25, 2009 Robert L. Smith

More information

Benders Decomposition

Benders Decomposition Benders Decomposition Yuping Huang, Dr. Qipeng Phil Zheng Department of Industrial and Management Systems Engineering West Virginia University IENG 593G Nonlinear Programg, Spring 2012 Yuping Huang (IMSE@WVU)

More information

Basing Decisions on Sentences in Decision Diagrams

Basing Decisions on Sentences in Decision Diagrams Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence Basing Decisions on Sentences in Decision Diagrams Yexiang Xue Department of Computer Science Cornell University yexiang@cs.cornell.edu

More information

Risk-Averse Dynamic Optimization. Andrzej Ruszczyński. Research supported by the NSF award CMMI

Risk-Averse Dynamic Optimization. Andrzej Ruszczyński. Research supported by the NSF award CMMI Research supported by the NSF award CMMI-0965689 Outline 1 Risk-Averse Preferences 2 Coherent Risk Measures 3 Dynamic Risk Measurement 4 Markov Risk Measures 5 Risk-Averse Control Problems 6 Solution Methods

More information

Final Exam - Math Camp August 27, 2014

Final Exam - Math Camp August 27, 2014 Final Exam - Math Camp August 27, 2014 You will have three hours to complete this exam. Please write your solution to question one in blue book 1 and your solutions to the subsequent questions in blue

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Distributed power allocation for D2D communications underlaying/overlaying OFDMA cellular networks

Distributed power allocation for D2D communications underlaying/overlaying OFDMA cellular networks Distributed power allocation for D2D communications underlaying/overlaying OFDMA cellular networks Marco Moretti, Andrea Abrardo Dipartimento di Ingegneria dell Informazione, University of Pisa, Italy

More information