Distributionally Robust Stochastic Control with Conic Confidence Sets


Insoon Yang

Abstract

The theory of (standard) stochastic optimal control is based on the assumption that the probability distribution of uncertain variables is fully known. In practice, however, obtaining an accurate distribution is often challenging. To resolve this issue, we study a distributionally robust stochastic control problem that minimizes a cost function of interest given that the distribution of uncertain variables is not known but lies in a so-called ambiguity set. We first investigate a dynamic programming approach and identify conditions for the existence and optimality of non-randomized Markov policies. We then propose a duality-based reformulation method for an associated Bellman equation in cases with conic confidence sets. This reformulation alleviates the computational issues inherent in the infinite-dimensional minimax optimization problem in the Bellman equation without sacrificing optimality. The effectiveness of the proposed method is demonstrated through an application to a stochastic inventory control problem.

I. INTRODUCTION

Standard stochastic control methods assume that the probability distribution of uncertain variables (e.g., disturbances) is available. However, this assumption is often restrictive in practice because obtaining an accurate distribution requires large-scale, high-resolution sensor measurements over a long training period or multiple periods. Situations in which uncertain variables are not directly observed could be much more challenging; computational methods, such as filtering or statistical learning techniques, are often used to obtain the (posterior) distribution of the uncertain variables given limited observations. The accuracy of the obtained distribution is often poor, as it is subject to the quality of the observations, computational methods, and prior knowledge about the variables. If poor distributional information is employed in constructing a stochastic optimal controller, it does not guarantee optimality and can even cause catastrophic system behaviors [1], [2].
To overcome this issue of limited distribution information in stochastic control, we investigate a distributionally robust control approach. This emerging stochastic control method minimizes a cost function of interest, assuming that the distribution of uncertain variables is not completely known but contained in a pre-specified ambiguity set of probability distributions. In finite-state Markov decision process settings, several types of ambiguity sets have been considered: [3], [4], [5] employ ambiguity sets with moment constraints, confidence intervals and Wasserstein distance, respectively. However, for continuous-state control problems, only a few cases with moment constraint-based and total variation distance ambiguity are well studied [6], [7], [8]. In this paper, we consider an important class of ambiguity sets that are characterized with confidence sets and probability intervals in a continuous-state stochastic control setting.

This work is supported in part by NSF under ECCS and CNS. I. Yang is with the Electrical Engineering Department, University of Southern California, Los Angeles, CA 90089, USA (insoonya@usc.edu).

The contributions of this work are twofold. First, we investigate a dynamic programming solution to discrete-time distributionally robust control problems in general finite-horizon cases and provide conditions for the existence and optimality of non-randomized Markov policies. This existence result is based on the lower semi-continuity of an associated dynamic programming operator. Our dynamic programming approach identifies an important structural property: the system state is a sufficient statistic. Second, we propose a dual formulation of the Bellman equation in cases with conic confidence sets. This approach converts the computationally challenging infinite-dimensional minimax problem into a semi-infinite program, which can be solved by existing convergent algorithms. We also show that the proposed reformulation is exact using strong duality and the nesting condition for confidence sets proposed by Wiesemann et al. [9].
The utility of our approach is demonstrated through an application to a stochastic inventory control problem.

The remainder of this paper is organized as follows. In Section II, we introduce the problem setup, including a dynamic game formulation of distributionally robust control problems. In Section III, we provide conditions for the existence and optimality of non-randomized Markov policies and another set of conditions for the convexity of the value function. Based on these analytical results, we develop a duality-based reformulation method for the Bellman equation in Section IV. A stochastic inventory control problem is considered in Section V as an application of the proposed method.

We use the following notation throughout the paper. Given a Borel space $X$, $\mathcal{P}(X)$ denotes the set of Borel probability measures on $X$. Given a cone $K$, $K^*$ represents its dual cone. We also let $\mathcal{T} := \{0, 1, \dots, T-1\}$ and $\bar{\mathcal{T}} := \{0, 1, \dots, T\}$.

II. PROBLEM SETUP

A. Ambiguity in the Distribution of Disturbances

Consider the following stochastic system subject to the disturbance $\{w_t\}_{t=0}^{T-1}$, $w_t \in \mathbb{R}^l$, defined on a standard probability space $(\Omega, \mathcal{F}, \mathbb{P})$:

$$x_{t+1} = f(x_t, u_t, w_t), \qquad (1)$$

where $x_t \in \mathbb{R}^n$ is the state, $u_t \in \mathbb{R}^m$ is the control input at stage $t$ and $f : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^l \to \mathbb{R}^n$ is a measurable function. We assume that $w_s$ and $w_t$ are independent for $s \neq t$. In many practical situations, the full disturbance distribution may not be available. To overcome this challenge, we will investigate a distributionally robust control approach that minimizes the worst-case cost associated with the system evolution under information constraints characterized as a set of probability distributions. We assume that the true probability distribution $\mu_t$ of $w_t$ is not completely known but contained in a so-called ambiguity set, $\mathcal{D}_t$.

Example 1 (Ambiguity with Confidence Sets). Consider the following set of probability measures:

$$\mathcal{D}_t := \{\mu \in \mathcal{P}(\mathbb{R}^l) \mid \mu(\mathcal{C}_t^i) \in [\underline{p}_t^i, \overline{p}_t^i],\ i \in \mathcal{I}\}, \qquad (2)$$

where $\underline{p}_t^i, \overline{p}_t^i \in [0, 1]$ are model parameters, the $\mathcal{C}_t^i$'s are confidence sets, and $\mathcal{I} := \{1, \dots, N\}$ is a given index set. With any measure in this set, the probability that $w_t$ is contained in the set $\mathcal{C}_t^i$ is between $\underline{p}_t^i$ and $\overline{p}_t^i$. Note that the user can specify the distributional information, such as $\mathcal{C}_t^i$ and $(\underline{p}_t^i, \overline{p}_t^i)$, that the controller trusts given data and statistical information about the disturbance.

Another popular type of ambiguity set in single-stage optimization problems is based on moment constraints and confidence sets [10], [11], [12], [13], [14], [9]. Recently, statistical distance-based ambiguity sets have also received a great deal of attention since they can be easily designed from an empirical distribution without requiring enough data samples to estimate moments [15], [16], [17], [18], [19], [20]. In Section IV, we consider ambiguity sets of the form (2) that are characterized with conic confidence sets. This class of ambiguity sets can be intuitively constructed from empirical distributions, as will be illustrated in Section V.

B. Distributionally Robust Control as a Dynamic Game

Let $H_t$ be the set of histories up to stage $t$, whose element is of the form¹ $h_t = (x_0, u_0, w_0, \dots, x_{t-1}, u_{t-1}, w_{t-1}, x_t)$. The set of admissible control strategies is chosen as

$$\Pi := \{\pi := (\pi_0, \dots, \pi_{T-1}) \mid \pi_t(U(x_t) \mid h_t) = 1 \ \ \forall h_t \in H_t\},$$

where $U(x_t)$ is the set of admissible actions given state $x_t$, and $\pi_t$ is a stochastic kernel from $H_t$ to $\mathbb{R}^m$. Note that the strategy space is broad enough to contain randomized non-Markov policies.² Similarly, we let $H_t^e$ be the set of extended histories up to stage $t$, whose element takes the form $h_t^e := (x_0, u_0, w_0, \mu_0, \dots, x_{t-1}, u_{t-1}, w_{t-1}, \mu_{t-1}, x_t, u_t)$. By viewing the disturbance as an adversarial player who chooses the disturbance distribution given available information $h_t^e$, we define the set of admissible distribution strategies as

$$\Gamma := \{\gamma = (\gamma_0, \dots, \gamma_{T-1}) \mid \gamma_t(\mathcal{D}_t \mid h_t^e) = 1 \ \ \forall h_t^e \in H_t^e\}.$$

¹ All the results in this paper are valid with histories of the form $h_t := (x_0, u_0, w_0, \mu_0, \dots, x_{t-1}, u_{t-1}, w_{t-1}, \mu_{t-1}, x_t)$ that also contain Player II's actions $(\mu_0, \dots, \mu_{t-1})$. However, we use the reduced version of histories because the realized distributions may not be observable in practice.

² Suppose that for each $t$, $\pi_t(\cdot \mid h_t)$ is concentrated at a measurable function $\phi_t : \mathbb{R}^n \to \mathbb{R}^m$ such that $\phi_t(x) \in U(x)$ for all $x \in \mathbb{R}^n$ and for all $h_t \in H_t$. Then, $\pi$ is a (non-randomized) Markov policy and, by a slight abuse of notation, $\pi_t$ is considered to be identical with $\phi_t$.

Consider the following cost function associated with the system evolution starting from $x_0 = x$:

$$J_x[\pi, \gamma] := \mathbb{E}^{\pi,\gamma}\left[\sum_{t=0}^{T-1} r(x_t, u_t) + q(x_T)\right],$$

where $r : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ and $q : \mathbb{R}^n \to \mathbb{R}$ are stagewise and terminal cost functions of interest. Here, $\mathbb{E}^{\pi,\gamma}$ is the expectation taken with respect to the probability measure $\mathbb{P}^{\pi,\gamma}$ induced by the strategy pair $(\pi, \gamma)$. Our goal is to choose a control policy that minimizes the worst-case cost given ambiguous information about the probability distribution $\{\mu_t\}_{t=0}^{T-1}$ of the disturbance $\{w_t\}_{t=0}^{T-1}$. To be more precise, we define the optimal distributionally robust control policies as follows:

Definition 1. A control strategy $\pi^* \in \Pi$ is said to be an optimal distributionally robust policy if it satisfies

$$\sup_{\gamma \in \Gamma} J_x[\pi^*, \gamma] \le \sup_{\gamma \in \Gamma} J_x[\pi, \gamma] \quad \forall \pi \in \Pi.$$

A desired control strategy can be obtained by solving the following minimax control problem:

$$\inf_{\pi \in \Pi} \sup_{\gamma \in \Gamma} J_x[\pi, \gamma]. \qquad (3)$$

The most important part of this problem formulation is the inner maximization of the cost function over all probability distribution policies in the strategy space $\Gamma$, which encodes distributional ambiguity through $\mathcal{D}_t$.
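To make the cost functional concrete, the following is a minimal sketch that evaluates $J_x[\pi, \gamma]$ exactly in a deliberately simple special case: a non-randomized Markov policy and a distribution strategy that selects the same finite-support distribution at every stage. The scalar dynamics, the cost functions, the constant policy, and the two-point distribution are all hypothetical values chosen for illustration; none of them come from the paper.

```python
def expected_cost(x0, policy, mu, f, r, q, horizon):
    """Exact J_x[pi, gamma] when pi is a Markov policy and gamma picks the
    same finite-support distribution mu (dict: outcome -> probability)
    at every stage."""
    def cost_to_go(t, x):
        if t == horizon:
            return q(x)                      # terminal cost q(x_T)
        u = policy(t, x)
        # stage cost plus expected cost-to-go over the disturbance w
        return r(x, u) + sum(
            p * cost_to_go(t + 1, f(x, u, w)) for w, p in mu.items()
        )
    return cost_to_go(0, x0)


# Hypothetical toy data (assumptions, not from the paper).
def f(x, u, w):
    return x + u - w          # scalar dynamics

def r(x, u):
    return abs(x) + u         # stage cost

def q(x):
    return abs(x)             # terminal cost

def policy(t, x):
    return 1                  # constant Markov policy

MU = {0: 0.5, 2: 0.5}         # two-point disturbance distribution

J = expected_cost(0, policy, MU, f, r, q, horizon=2)
print(J)
```

The recursion enumerates every disturbance path, so it is only practical for short horizons and small supports; it is meant to show what the expectation in $J_x$ averages over, not how one would compute it at scale.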
One can view this problem as a two-player zero-sum dynamic game, in which Player I chooses a control policy to minimize the cost and Player II selects the disturbance's distribution strategy that maximizes the cost. The proposed minimax formulation implies the following property:

Proposition 1. Suppose that an optimal solution to the distributionally robust control problem (3) exists and is denoted by $(\pi^*, \gamma^*)$. Then, the following inequalities hold:

$$J_x[\pi^*, \gamma] \le J_x[\pi^*, \gamma^*] \le \sup_{\gamma' \in \Gamma} J_x[\pi, \gamma'] \quad \forall (\pi, \gamma) \in \Pi \times \Gamma.$$

The first and second equalities hold with $\gamma = \gamma^*$ and $\pi = \pi^*$, respectively. In addition, $\pi^*$ is an optimal distributionally robust policy.

The first inequality implies that when the optimal policy $\pi^*$ is employed, the worst-case cost value is equal to $J_x[\pi^*, \gamma^*]$ for any distributional error consistent with the constraints in the ambiguity set $\mathcal{D}_t$ for each $t$. Thus, this approach provides a performance guarantee in the form of an upper bound, $J_x[\pi^*, \gamma^*]$, of the cost value, which is tight. Note that this performance guarantee may not be valid when a different control policy is employed, as shown in the second inequality. Furthermore, the second inequality confirms that an optimal solution to the dynamic game problem (3) provides an optimal distributionally robust policy.
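The upper-bound guarantee can be illustrated on a toy finite game. The sketch below assumes that both players have only three strategies and that the cost of every pair is given by a made-up table; it selects the policy with the smallest worst-case cost, as in Definition 1, and checks that no adversary strategy exceeds the resulting bound. It does not model histories, dynamics, or the ambiguity set.

```python
# Hypothetical cost table: COST[i][j] is the cost when Player I uses
# policy i and Player II uses distribution strategy j (made-up numbers).
COST = [
    [4.0, 2.0, 3.0],   # policy 0
    [1.0, 5.0, 2.0],   # policy 1
    [3.0, 3.5, 3.2],   # policy 2
]

def robust_policy(cost):
    """Return (index, worst-case cost) of the policy minimizing the
    maximum cost over the adversary's strategies."""
    worst = [max(row) for row in cost]
    best = min(range(len(cost)), key=lambda i: worst[i])
    return best, worst[best]

pi_star, bound = robust_policy(COST)

# Performance guarantee: under pi_star the cost never exceeds `bound`,
# whichever strategy the adversary picks.
guarantee_holds = all(c <= bound for c in COST[pi_star])
print(pi_star, bound, guarantee_holds)
```

Policy 1 attains the lowest cost against the first adversary strategy, but its worst case is the largest in the table, which is the kind of behavior the minimax formulation is designed to avoid.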

III. DYNAMIC PROGRAMMING SOLUTION

A. Existence of Optimal Distributionally Robust Policies

We begin by introducing the following dynamic programming operator $\mathbf{T}_t$, $t \in \mathcal{T}$:

$$\mathbf{T}_t v(x) := \inf_{u \in U(x)} \sup_{\mu \in \mathcal{D}_t} \left[ r(x, u) + \int_{\mathbb{R}^l} v(f(x, u, w))\, d\mu(w) \right],$$

where $v$ is a measurable function on $\mathbb{R}^n$. We then define the value function of the distributionally robust control problem (3) as

$$v_t(x) := \mathbf{T}_t \circ \mathbf{T}_{t+1} \circ \cdots \circ \mathbf{T}_{T-1}\, q(x)$$

for each $t \in \mathcal{T}$ and $v_T(x) := q(x)$. By definition, $v_t(x)$ represents the minimal worst-case expected cost value from stage $t$ to $T$ given $x_t = x$. Under the following assumption for the measurable selection condition, the value function is lower semi-continuous and thus the distributionally robust control problem (3) admits a non-randomized Markov policy, which is optimal.

Assumption 1. The following properties hold:
(i) $r(x, u)$ and $q(x)$ are lower semi-continuous and bounded below for all $(x, u) \in \mathbb{R}^n \times \mathbb{R}^m$ such that $u \in U(x)$;
(ii) For each bounded continuous function $g : \mathbb{R}^n \to \mathbb{R}$, the function $\hat{g}(x, u, \mu) := \int_{\mathbb{R}^l} g(f(x, u, w))\, d\mu(w)$ is continuous for all $(x, u, \mu) \in \mathbb{R}^n \times \mathbb{R}^m \times \mathcal{D}_t$ such that $u \in U(x)$;
(iii) The set $U(x)$ is compact for each $x \in \mathbb{R}^n$. In addition, the set-valued mapping $x \mapsto U(x)$ is upper semi-continuous.

Theorem 1. Suppose that Assumption 1 holds. Then, the value function $v_t$ is lower semi-continuous for each $t \in \mathcal{T}$. Furthermore, there exists a measurable function $\phi_t : \mathbb{R}^n \to \mathbb{R}^m$ for each $t \in \mathcal{T}$ such that $\phi_t(x) \in U(x)$ and

$$v_t(x) = \sup_{\mu \in \mathcal{D}_t} \left[ r(x, \phi_t(x)) + \int_{\mathbb{R}^l} v_{t+1}(f(x, \phi_t(x), w))\, d\mu(w) \right]$$

for all $x \in \mathbb{R}^n$. The non-randomized Markov policy $\pi^* := (\phi_0, \dots, \phi_{T-1}) \in \Pi$ is an optimal solution to the distributionally robust control problem (3), i.e.,

$$v_0(x) = \sup_{\gamma \in \Gamma} J_x[\pi^*, \gamma].$$

This theorem can be shown by extending the proof of Theorem 1 in [7] and Theorem 3.1 in [21]. The key idea is to show that the lower semi-continuity of $v$ is preserved through the dynamic programming operator. We can then use mathematical induction to show that $v_t$ is lower semi-continuous and thus, the outer minimization problem in the definition of value functions admits an optimal solution.
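The backward recursion $v_t = \mathbf{T}_t v_{t+1}$ can be sketched numerically once everything is made finite. The sketch below rests on strong simplifying assumptions that are not part of the paper: the state is restricted to a small integer grid (with clamping at the edges), the action set is finite, and the ambiguity set is replaced by a short list of candidate finite-support distributions, so the inner supremum becomes a maximum over that list. All numerical values are hypothetical.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def robust_backward_dp(states, actions, candidates, f, r, q, horizon):
    """Apply the min-max operator `horizon` times starting from v_T = q.
    `candidates` is a FINITE list of distributions (dict: w -> prob) that
    stands in for the ambiguity set. Returns (v_0, [phi_0, ..., phi_{T-1}])."""
    v = {x: q(x) for x in states}
    policies = []
    for _ in range(horizon):                      # t = T-1, ..., 0
        v_new, phi = {}, {}
        for x in states:
            best_u, best_val = None, float("inf")
            for u in actions:
                # inner maximization over the candidate distributions
                worst = max(
                    sum(p * v[f(x, u, w)] for w, p in mu.items())
                    for mu in candidates
                )
                val = r(x, u) + worst
                if val < best_val:
                    best_u, best_val = u, val
            v_new[x], phi[x] = best_val, best_u
        v = v_new
        policies.append(phi)
    policies.reverse()                            # index 0 is stage 0
    return v, policies


# Hypothetical toy problem (assumptions, not from the paper).
STATES = list(range(-5, 6))
ACTIONS = [0, 1, 2, 3]
CANDIDATES = [
    {0: 0.2, 1: 0.6, 2: 0.2},
    {0: 0.1, 1: 0.4, 2: 0.5},
    {0: 0.5, 1: 0.4, 2: 0.1},
]

def f(x, u, w):
    return clamp(x + u - w, -5, 5)    # dynamics kept on the grid

def r(x, u):
    return abs(x) + 0.1 * u

def q(x):
    return abs(x)

v0, policies = robust_backward_dp(STATES, ACTIONS, CANDIDATES, f, r, q, 3)
# Same recursion with only the first candidate: a non-robust baseline.
v0_nominal, _ = robust_backward_dp(STATES, ACTIONS, CANDIDATES[:1], f, r, q, 3)
```

Because the robust recursion maximizes over a superset of the distributions used by the baseline, `v0[x]` is never smaller than `v0_nominal[x]`. The returned `policies` are non-randomized and depend only on the current state, mirroring the structure identified in Theorem 1.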
Note that Theorem 1 allows us to identify an important structural property of the distributionally robust control problem: the system state is a sufficient statistic for Player I (controller) under Assumption 1. This observation yields the practical advantage that it suffices to focus on non-randomized Markov policies when designing an optimal distributionally robust controller.

B. Bellman Equation

Applying the dynamic programming principle [22], [23], we can evaluate the value function backward in time as follows:

Proposition 2. Suppose that Assumption 1 holds. Then, the value function $v_t$ satisfies the following Bellman equation:

$$v_t(x) = \min_{u \in U(x)} \left[ r(x, u) + \sup_{\mu \in \mathcal{D}_t} \int_{\mathbb{R}^l} v_{t+1}(f(x, u, w))\, d\mu(w) \right] \qquad (4)$$

with $v_T(x) = q(x)$.

Note that due to Theorem 1 the outer minimization problem in the Bellman equation admits an optimal solution. From the numerical perspective, however, solving the Bellman equation is challenging. In addition to the scalability issue inherent in dynamic programming, this Bellman equation involves an infinite-dimensional minimax optimization problem because the disturbance may have a continuous density. In the next section, we will resolve this computational issue for an important class of distributionally robust stochastic control problems in which confidence sets are specified. With such an ambiguity set, we will show that an optimal distributionally robust policy can be obtained by solving computationally tractable semi-infinite programs if the value function is convex.

C. Convexity of the Value Function

We now show that the value function is convex under the following conditions:

Assumption 2. The distributionally robust control problem (3) satisfies the following:
(i) $r : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$ and $q : \mathbb{R}^n \to \mathbb{R}$ are convex functions;
(ii) $f : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^l \to \mathbb{R}^n$ is an affine function;
(iii) For all $\lambda \in (0, 1)$ and for all $x_1, x_2 \in \mathbb{R}^n$, if $u_i \in U(x_i)$, $i = 1, 2$, then $\lambda u_1 + (1 - \lambda) u_2 \in U(\lambda x_1 + (1 - \lambda) x_2)$.

Proposition 3. Suppose that Assumption 2 holds. Then, the value function $v_t : \mathbb{R}^n \to \mathbb{R}$ is convex for each $t \in \mathcal{T}$.

Proof. We use mathematical induction. For $t = T$, $v_T : \mathbb{R}^n \to \mathbb{R}$ is convex since $v_T = q$.
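Suppose that $v_\tau : \mathbb{R}^n \to \mathbb{R}$ is convex for $\tau = T-1, \dots, t+1$. We now consider the value function at stage $t$. Fix $\lambda \in (0, 1)$ and $x_1, x_2 \in \mathbb{R}^n$. For any $\epsilon > 0$, there exists an $\epsilon$-optimal solution $u_i \in U(x_i)$ to the outer minimization problem in the Bellman equation (4) for $(t, x_i)$, i.e.,

$$v_t(x_i) + \epsilon > r(x_i, u_i) + \sup_{\mu \in \mathcal{D}_t} \int_{\mathbb{R}^l} v_{t+1}(f(x_i, u_i, w))\, d\mu(w).$$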

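Let $x_\lambda := \lambda x_1 + (1 - \lambda) x_2$ and $u_\lambda := \lambda u_1 + (1 - \lambda) u_2$. Due to Assumption 2 (iii), $u_\lambda \in U(x_\lambda)$. Thus,

$$v_t(x_\lambda) \le r(x_\lambda, u_\lambda) + \sup_{\mu \in \mathcal{D}_t} \int_{\mathbb{R}^l} v_{t+1}(f(x_\lambda, u_\lambda, w))\, d\mu(w).$$

We now notice that $f(x_\lambda, u_\lambda, w) = \lambda f(x_1, u_1, w) + (1 - \lambda) f(x_2, u_2, w)$ due to Assumption 2 (ii). Since $v_{t+1}$ and $r$ are convex, we obtain that

$$\begin{aligned}
v_t(x_\lambda) &\le \lambda r(x_1, u_1) + (1 - \lambda) r(x_2, u_2) \\
&\quad + \sup_{\mu \in \mathcal{D}_t} \int_{\mathbb{R}^l} \lambda v_{t+1}(f(x_1, u_1, w))\, d\mu(w) + \sup_{\mu \in \mathcal{D}_t} \int_{\mathbb{R}^l} (1 - \lambda) v_{t+1}(f(x_2, u_2, w))\, d\mu(w) \\
&< \lambda v_t(x_1) + (1 - \lambda) v_t(x_2) + \epsilon.
\end{aligned}$$

Letting $\epsilon \to 0$, we conclude that $v_t$ is convex.

Note that this proposition does not require the existence of optimal distributionally robust policies. Furthermore, we do not impose any specific structure on the ambiguity set $\mathcal{D}_t$ for the convexity of the value function.

IV. STRONG DUALITY-BASED REFORMULATION

A. Distributional Ambiguity with Conic Confidence Sets

We now focus on the distributionally robust control problem with a particular ambiguity set of the form (2) in Example 1:

$$\mathcal{D}_t := \{\mu \in \mathcal{P}(\mathbb{R}^l) \mid \mu(\mathcal{C}_t^i) \in [\underline{p}_t^i, \overline{p}_t^i],\ i \in \mathcal{I}\},$$

where $\mathcal{I} := \{1, \dots, N\}$ and the confidence set $\mathcal{C}_t^i$ has the following conic representation:

$$\mathcal{C}_t^i = \{w \in \mathbb{R}^l \mid C_t^i w \preceq_{K_i} d_t^i\},$$

where $C_t^i \in \mathbb{R}^{L_i \times l}$ and $d_t^i \in \mathbb{R}^{L_i}$ are model parameters, and $K_i$ is a proper cone. We impose the following assumption:

Assumption 3. The ambiguity set satisfies the following conditions:
(i) The confidence set $\mathcal{C}_t^N$ is compact and $\underline{p}_t^N = \overline{p}_t^N = 1$.
(ii) There exists a distribution measure $\mu \in \mathcal{D}_t$ such that $\mu(\mathcal{C}_t^i) \in (\underline{p}_t^i, \overline{p}_t^i)$ whenever $\underline{p}_t^i < \overline{p}_t^i$, $i \in \mathcal{I}$.
(iii) (Nesting Condition) For each $t \in \mathcal{T}$ and all $i, i' \in \mathcal{I}$ such that $i \neq i'$, we have either $\mathcal{C}_t^i \subset_s \mathcal{C}_t^{i'}$, $\mathcal{C}_t^{i'} \subset_s \mathcal{C}_t^i$ or $\mathcal{C}_t^i \cap \mathcal{C}_t^{i'} = \emptyset$, where $X \subset_s Y$ represents that $X$ is a strict subset of $Y$.

The first condition represents that $\mathcal{C}_t^N$ is the support of $\mu$. The second condition ensures that there exists a probability distribution that satisfies the probabilistic constraints in $\mathcal{D}_t$ as strict inequalities whenever $\underline{p}_t^i < \overline{p}_t^i$. These two regularity conditions will guarantee that strong duality holds based on the generalized Slater-type results from Shapiro [24] when the Bellman equation is reformulated in the next subsection.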
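For scalar disturbances with interval confidence sets, parts of Assumption 3 are easy to check in code. The sketch below assumes that each confidence set is a closed interval given as a `(lo, hi)` pair, tests the nesting condition pairwise, and tests whether an empirical distribution built from samples satisfies the probability bounds that define the ambiguity set. The three intervals are those shown in Fig. 1; the samples and the probability bounds are made-up values, and the helper names are hypothetical.

```python
def strict_subset(a, b):
    """True if interval a = (lo, hi) is a strict subset of interval b."""
    return a != b and b[0] <= a[0] and a[1] <= b[1]

def disjoint(a, b):
    return a[1] < b[0] or b[1] < a[0]

def is_nested(sets):
    """Nesting condition: every pair of distinct confidence sets is either
    strictly nested (in one direction) or disjoint."""
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            a, b = sets[i], sets[j]
            if not (strict_subset(a, b) or strict_subset(b, a) or disjoint(a, b)):
                return False
    return True

def in_ambiguity_set(samples, sets, lower, upper):
    """True if the empirical distribution of `samples` assigns each
    confidence set a probability within [lower[i], upper[i]]."""
    n = len(samples)
    for (lo, hi), p_lo, p_hi in zip(sets, lower, upper):
        mass = sum(lo <= w <= hi for w in samples) / n
        if not (p_lo <= mass <= p_hi):
            return False
    return True


SETS = [(5.5, 6.5), (3.5, 8.5), (0.0, 12.0)]   # intervals from Fig. 1
# Made-up samples and bounds (assumptions for illustration only).
SAMPLES = [1.0, 4.0, 5.0, 6.0, 6.2, 7.0, 8.0, 11.0, 5.8, 6.4]
LOWER = [0.36, 0.72, 1.0]
UPPER = [0.44, 0.88, 1.0]

nested = is_nested(SETS)
member = in_ambiguity_set(SAMPLES, SETS, LOWER, UPPER)
print(nested, member)
```

The last interval plays the role of the support set, so its lower and upper bounds are both set to 1 in line with condition (i). Two partially overlapping intervals such as `(0, 5)` and `(3, 8)` would fail the nesting check.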
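The third condition is called the nesting condition [9], which implies that there exists a strict partial order on the confidence sets regarding the set inclusion and that any incomparable sets are disjoint. This nesting condition, together with the two regularity conditions, provides a tractable dual formulation of the Bellman equation without loss of optimality.

B. Dual Bellman Equation

We now reformulate the infinite-dimensional minimax optimization problem in the Bellman equation (4) as a semi-infinite program, which can be numerically solved by existing convergent algorithms. Furthermore, this reformulation based on strong duality is exact, as shown in the following theorem.

Theorem 2. Suppose that Assumptions 1, 2 and 3 hold. Then, the following equality holds for all $(t, x) \in \mathcal{T} \times \mathbb{R}^n$:³

$$\begin{aligned}
v_t(x) = \inf_{u, \kappa, \lambda, \nu} \quad & r(x, u) + \sum_{i \in \mathcal{I}} (\overline{p}_t^i \kappa_i - \underline{p}_t^i \lambda_i) \\
\text{s.t.} \quad & (C_t^i w - d_t^i)^\top \nu_i + \sum_{i' \in \mathcal{A}(i)} (\kappa_{i'} - \lambda_{i'}) \ge v_{t+1}(f(x, u, w)) \quad \forall w \in \mathcal{C}_t^N,\ \forall i \in \mathcal{I} \\
& u \in U(x),\ \lambda, \kappa \in \mathbb{R}^N_+,\ \nu_i \in K_i^*,
\end{aligned}$$

where $\mathcal{A}(i) := \{i\} \cup \{i' \in \mathcal{I} \mid \mathcal{C}_t^i \subset_s \mathcal{C}_t^{i'}\}$, with the terminal condition $v_T(x) = q(x)$.

³ In the reformulated Bellman equation, $\min_u$ is merged with $\inf_{\kappa,\lambda,\nu}$ for a compact representation. The minimization problem admits an optimal solution $u$.

Proof. By introducing a slack variable $z \in \mathbb{R}$, we can rewrite the Bellman equation (4) in the following equivalent form:

$$\begin{aligned}
v_t(x) = \inf_{u \in U(x),\, z \in \mathbb{R}} \quad & z \\
\text{s.t.} \quad & \sup_{\mu \in \mathcal{D}_t} \left\{ r(x, u) + \int_{\mathbb{R}^l} v_{t+1}(f(x, u, w))\, d\mu(w) \right\} \le z
\end{aligned}$$

for each $(t, x) \in \mathcal{T} \times \mathbb{R}^n$. We first focus on the maximization problem in the inequality constraint. It can be rewritten as the following infinite-dimensional linear program:

$$\begin{aligned}
\sup_{\mu \in \mathcal{P}(\mathbb{R}^l)} \quad & r(x, u) + \int_{\mathbb{R}^l} v_{t+1}(f(x, u, w))\, d\mu(w) \\
\text{s.t.} \quad & \int_{\mathcal{C}_t^N} \mathbf{1}_{\{w \in \mathcal{C}_t^i\}}\, d\mu(w) \ge \underline{p}_t^i \quad \forall i \in \mathcal{I} \\
& \int_{\mathcal{C}_t^N} \mathbf{1}_{\{w \in \mathcal{C}_t^i\}}\, d\mu(w) \le \overline{p}_t^i \quad \forall i \in \mathcal{I}.
\end{aligned}$$

Under Assumption 3, the generalized Slater condition holds [24]. Thus, there is no duality gap and we have the following dual formulation of the problem above without loss of optimality:

$$\begin{aligned}
\inf_{\kappa, \lambda \in \mathbb{R}^N_+} \quad & r(x, u) + \sum_{i \in \mathcal{I}} (\overline{p}_t^i \kappa_i - \underline{p}_t^i \lambda_i) \\
\text{s.t.} \quad & v_{t+1}(f(x, u, w)) + \sum_{i \in \mathcal{I}} \mathbf{1}_{\{w \in \mathcal{C}_t^i\}} (\lambda_i - \kappa_i) \le 0 \quad \forall w \in \mathcal{C}_t^N.
\end{aligned}$$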

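Let $\mathcal{B}(i)$ be the index set of all the strict subsets of $\mathcal{C}_t^i$, i.e., $\mathcal{B}(i) := \{i' \in \mathcal{I} \mid \mathcal{C}_t^{i'} \subset_s \mathcal{C}_t^i\}$. We also let

$$\bar{\mathcal{C}}_t^i := \mathcal{C}_t^i \setminus \bigcup_{i' \in \mathcal{B}(i)} \mathcal{C}_t^{i'}.$$

Due to the nesting condition, $\{\bar{\mathcal{C}}_t^1, \dots, \bar{\mathcal{C}}_t^N\}$ is a disjoint partition of the support $\mathcal{C}_t^N$. Therefore, the inequality constraint of the dual problem can be rewritten as

$$v_{t+1}(f(x, u, w)) + \sum_{i' \in \mathcal{A}(i)} (\lambda_{i'} - \kappa_{i'}) \le 0 \quad \forall w \in \bar{\mathcal{C}}_t^i,\ \forall i \in \mathcal{I}.$$

The inequality constraint associated with the index $i \in \mathcal{I}$ is equivalent to

$$\sup_{w \in \bar{\mathcal{C}}_t^i} v_{t+1}(f(x, u, w)) + \sum_{i' \in \mathcal{A}(i)} (\lambda_{i'} - \kappa_{i'}) \le 0.$$

Since $w \mapsto v_{t+1}(f(x, u, w))$ is convex for each $(x, u)$, the objective function is convex with respect to $w$. Therefore, the maximum is attained at the outer boundary of $\bar{\mathcal{C}}_t^i$. We now observe that the outer boundary of $\bar{\mathcal{C}}_t^i$ corresponds to the outer boundary of $\mathcal{C}_t^i$ due to the nesting condition [9]. Thus, the left-hand side of the $i$th constraint can be written as the optimal value of

$$\begin{aligned}
\sup_{w \in \mathcal{C}_t^N} \quad & v_{t+1}(f(x, u, w)) + \sum_{i' \in \mathcal{A}(i)} (\lambda_{i'} - \kappa_{i'}) \\
\text{s.t.} \quad & C_t^i w \preceq_{K_i} d_t^i.
\end{aligned}$$

Its dual is given by the following semi-infinite program:

$$\begin{aligned}
\inf_{\nu_i \in K_i^*,\, \theta_i \in \mathbb{R}} \quad & (d_t^i)^\top \nu_i + \theta_i + \sum_{i' \in \mathcal{A}(i)} (\lambda_{i'} - \kappa_{i'}) \\
\text{s.t.} \quad & v_{t+1}(f(x, u, w)) - (C_t^i w)^\top \nu_i \le \theta_i \quad \forall w \in \mathcal{C}_t^N.
\end{aligned}$$

Putting the reformulation results all together, we have that

$$\begin{aligned}
v_t(x) = \inf \quad & z \\
\text{s.t.} \quad & r(x, u) + \sum_{i \in \mathcal{I}} (\overline{p}_t^i \kappa_i - \underline{p}_t^i \lambda_i) \le z \\
& (d_t^i)^\top \nu_i + \theta_i + \sum_{i' \in \mathcal{A}(i)} (\lambda_{i'} - \kappa_{i'}) \le 0 \quad \forall i \in \mathcal{I} \\
& v_{t+1}(f(x, u, w)) - (C_t^i w)^\top \nu_i \le \theta_i \quad \forall w \in \mathcal{C}_t^N,\ \forall i \in \mathcal{I} \\
& u \in U(x),\ \kappa, \lambda \in \mathbb{R}^N_+,\ \nu_i \in K_i^*,\ z \in \mathbb{R},\ \theta \in \mathbb{R}^N.
\end{aligned}$$

Viewing $z$ and $\theta$ as slack variables, we conclude that the statement in the theorem holds.

Note that the convexity of $w \mapsto v_{t+1}(f(x, u, w))$ and the nesting condition play a critical role in preserving optimality in the proposed reformulation, as originally observed by Wiesemann et al. in the context of single-stage optimization [9]. When $(u, w) \mapsto v_{t+1}(f(x, u, w))$ is also piecewise affine, our result is consistent with Theorem 1 in [9].

Theorem 2 allows us to avoid solving the computationally challenging infinite-dimensional minimax optimization problems in the original Bellman equation. Instead, we can evaluate the value function backward in time by solving a semi-infinite program at each (discretized) state.

Fig. 1: The empirical distribution $\hat{\mu}_t$ of $w_t$ and the confidence sets $\mathcal{C}_t^i$, $i = 1, 2, 3$, with $\mathcal{C}_t^1 = [5.5, 6.5]$, $\mathcal{C}_t^2 = [3.5, 8.5]$ and $\mathcal{C}_t^3 = [0, 12]$.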
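The index sets used in the proof can be computed directly for nested interval confidence sets. The sketch below assumes scalar disturbances and closed intervals given as `(lo, hi)` pairs, and uses 0-based indices (so index 0 corresponds to the paper's $i = 1$). It builds the superset index set $\mathcal{A}(i)$ and the strict-subset index set $\mathcal{B}(i)$, locates the partition cell that contains a point $w$, and checks the fact the proof relies on: for $w$ in the $i$th cell, the confidence sets containing $w$ are exactly those indexed by $\mathcal{A}(i)$. The intervals are the ones shown in Fig. 1; the function names are hypothetical.

```python
def strict_subset(a, b):
    """True if interval a = (lo, hi) is a strict subset of interval b."""
    return a != b and b[0] <= a[0] and a[1] <= b[1]

def contains(c, w):
    return c[0] <= w <= c[1]

def ancestors(i, sets):
    """A(i): index i together with every strict superset of set i."""
    return {i} | {j for j in range(len(sets)) if strict_subset(sets[i], sets[j])}

def descendants(i, sets):
    """B(i): indices of all strict subsets of set i."""
    return {j for j in range(len(sets)) if strict_subset(sets[j], sets[i])}

def cell_index(w, sets):
    """Index i such that w lies in set i but in none of its strict subsets,
    or None if w is outside every set."""
    for i, c in enumerate(sets):
        if contains(c, w) and not any(
            contains(sets[j], w) for j in descendants(i, sets)
        ):
            return i
    return None


SETS = [(5.5, 6.5), (3.5, 8.5), (0.0, 12.0)]   # intervals from Fig. 1
A = [ancestors(i, SETS) for i in range(len(SETS))]

# On a grid over the support, the sets covering w coincide with A(cell(w)).
grid = [0.25 * k for k in range(49)]           # 0.0, 0.25, ..., 12.0
consistent = all(
    {j for j, c in enumerate(SETS) if contains(c, w)} == A[cell_index(w, SETS)]
    for w in grid
)
print(A, consistent)
```

This is why the indicator sum over all confidence sets collapses to a sum over $\mathcal{A}(i)$ on each cell, which gives one dual constraint per index $i$ instead of one per point $w$.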
The semi-infinite program in Theorem 2 can be solved by several convergent methods such as primal-dual methods, discretization methods, homotopy methods, exchange methods, and constraint sampling methods (see [25], [26], [27] and the references therein). Among them, we will use the convergent discretization method developed by Reemtsen [28] in the next section.

V. APPLICATION TO INVENTORY CONTROL

We consider a stochastic inventory control problem to demonstrate the performance of our distributionally robust control method. We use the standard setting of stochastic Newsvendor problems (e.g., [29]). Let $x_t \in \mathbb{R}$ be an inventory level of interest at stage $t$. Given the quantity $u_t \in U := [0, 10]$ ordered and the stochastic demand $w_t$ at stage $t$, the inventory level evolves as

$$x_{t+1} = x_t + u_t - w_t,$$

for $t \in \mathcal{T} := \{0, 1, \dots, 6\}$. We assume that any unsatisfied demand is backlogged for the next stage and thus allow negative state values. The stage-wise cost function is given by

$$r(x_t, u_t, w_t) = c_o (x_t + u_t - w_t)^+ + c_u (w_t - x_t - u_t)^+,$$

where $c_o = 1$ is the overage (or storage) cost and $c_u = 1$ is the underage cost (or the cost of lost sales).

Fig. 1 shows the empirical distribution $\hat{\mu}_t$ of $w_t$ for all $t \in \mathcal{T}$, and the confidence sets used in our simulations. We choose $\underline{p}_t^i$ and $\overline{p}_t^i$ as 90% and 110% of $\hat{\mu}_t(\mathcal{C}_t^i)$. Thus, $\hat{\mu}_t$ is contained in the constructed ambiguity set $\mathcal{D}_t$. We compare our distributionally robust controller designed using $\mathcal{D}_t$ and the standard stochastic optimal controller constructed with the empirical distribution $\hat{\mu}_t$ when $x_0 = 10$. Suppose that the actual distribution $\mu_t^{\text{true}}$ of $w_t$ is uniform in each confidence set and $\mu_t^{\text{true}}(\mathcal{C}_t^i) = p_t^i$. Then, $\mu_t^{\text{true}} \in \mathcal{D}_t$ but $\mu_t^{\text{true}}$ is different from the empirical distribution $\hat{\mu}_t$. In our simulation with $10^5$ trajectories of $\{w_t\}$ sampled from $\{\mu_t^{\text{true}}\}$, the distributionally robust control method reduces the total expected cost incurred by the standard controller by 29.6%. This result confirms that our controller is robust against errors in disturbance distributions while the standard controller is not.

Fig. 2: Tukey box plots of state trajectories (inventory level versus time in days) controlled by (a) the standard stochastic optimal controller and (b) the distributionally robust controller.

To investigate why the proposed controller performs better than the standard controller under distributional ambiguity, we now compare their controlled state trajectories. As shown in Fig. 2, the distributionally robust controller drives the median (and the first and third quantiles) of state trajectories closer to the origin than the standard controller. Since our controller considers the worst-case distribution in the ambiguity set, it can control the system in a desirable manner (maintaining the inventory level close to zero) even when $\mu_t^{\text{true}}$ deviates from the empirical distribution $\hat{\mu}_t$. On the other hand, the standard controller optimizes the system performance only when $\mu_t^{\text{true}} = \hat{\mu}_t$; otherwise, there is no performance guarantee. In particular, the standard controller initially increases the inventory level by using approximately 97% of the maximum allowable control value. This aggressive control action is intended to satisfy demand at later stages with the limited control range $U := [0, 10]$. However, as $\mu_t^{\text{true}}$ deviates from $\hat{\mu}_t$, this standard control strategy generates higher overage costs than expected. On the other hand, the distributionally robust controller is designed to take into account such possibilities and is capable of balancing the overage and underage costs when $\mu_t^{\text{true}} \neq \hat{\mu}_t$.

VI. CONCLUSION AND FUTURE WORK

We have proposed a duality-based dynamic programming approach to distributionally robust control problems with conic confidence sets. The structural property we identified allows us to focus on non-randomized Markov policies with state feedback. Our exact duality-based reformulation method also alleviates the computational issues in the original Bellman equation that involves infinite-dimensional minimax optimization problems. As future research, it is of great interest to develop a scalable numerical method for the reformulated Bellman equation. Furthermore, adding risk constraints may help in systematically discouraging undesirable system behaviors.

REFERENCES

[1] A. Nilim and L. El Ghaoui, "Robust control of Markov decision processes with uncertain transition matrices," Operations Research, vol. 53, no. 5.
[2] S. Samuelson and I. Yang, "Data-driven distributionally robust control of energy storage to manage wind power fluctuations," in Proceedings of the 1st IEEE Conference on Control Technology and Applications.
[3] H. Xu and S. Mannor, "Distributionally robust Markov decision processes," Mathematics of Operations Research, vol. 37, no. 2.
[4] P. Yu and H. Xu, "Distributionally robust counterpart in Markov decision processes," IEEE Transactions on Automatic Control, vol. 61, no. 9.
[5] I. Yang, "A convex optimization approach to distributionally robust Markov decision processes with Wasserstein distance," IEEE Control Systems Letters, vol. 1, no. 1.
[6] B. P. G. Van Parys, D. Kuhn, P. J. Goulart, and M. Morari, "Distributionally robust control of constrained stochastic systems," IEEE Transactions on Automatic Control, vol. 61, no. 2.
[7] I. Yang, "A dynamic game approach to distributionally robust safety specifications for stochastic systems," arXiv preprint.
[8] I. Tzortzis, C. D. Charalambous, and T. Charalambous, "Dynamic programming subject to total variation distance ambiguity," SIAM Journal on Control and Optimization, vol. 53, no. 4.
[9] W. Wiesemann, D. Kuhn, and M. Sim, "Distributionally robust convex optimization," Operations Research, vol. 62, no. 6.
[10] H. Scarf, K. J. Arrow, and S. Karlin, "A min-max solution of an inventory problem," Studies in the Mathematical Theory of Inventory and Production.
[11] J. Dupačová, "The minimax approach to stochastic programming and an illustrative application," Stochastics, vol. 20.
[12] E. Delage and Y. Ye, "Distributionally robust optimization under moment uncertainty with application to data-driven problems," Operations Research, vol. 58, no. 3.
[13] I. Popescu, "Robust mean-covariance solutions for stochastic optimization," Operations Research, vol. 55, no. 1.
[14] S. Zymler, D. Kuhn, and B. Rustem, "Distributionally robust joint chance constraints with second-order moment information," Mathematical Programming, Ser. A, vol. 137.
[15] A. Ben-Tal, D. Den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2.
[16] R. Jiang and Y. Guan, "Data-driven chance constrained stochastic program," Mathematical Programming, Ser. A, vol. 158.
[17] H. Sun and H. Xu, "Convergence analysis for distributionally robust optimization and equilibrium problems," Mathematics of Operations Research, vol. 41, no. 2.
[18] E. Erdoğan and G. Iyengar, "Ambiguous chance constrained problems and robust optimization," Mathematical Programming, Ser. B, vol. 107.
[19] P. Mohajerin Esfahani and D. Kuhn, "Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations," arXiv preprint.
[20] R. Gao and A. J. Kleywegt, "Distributionally robust stochastic optimization with Wasserstein distance," arXiv preprint.
[21] J. I. González-Trejo, O. Hernández-Lerma, and L. F. Hoyos-Reyes, "Minimax control of discrete-time stochastic systems," SIAM Journal on Control and Optimization, vol. 41, no. 5.
[22] R. Bellman, "Dynamic programming and Lagrange multipliers," Proceedings of the National Academy of Sciences, vol. 42, no. 10.
[23] O. Hernández-Lerma and J. B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer.
[24] A. Shapiro, "On duality theory of conic linear problems," in Semi-Infinite Programming. Springer, 2001.
[25] R. Hettich and K. O. Kortanek, "Semi-infinite programming: Theory, methods, and applications," SIAM Review, vol. 35, no. 3.
[26] M. López and G. Still, "Semi-infinite programming," European Journal of Operational Research, vol. 180.
[27] G. Calafiore and M. C. Campi, "Uncertain convex programs: randomized solutions and confidence levels," Mathematical Programming, Ser. A, vol. 102.
[28] R. Reemtsen, "Discretization methods for the solution of semi-infinite programming problems," Journal of Optimization Theory and Applications, vol. 71, no. 1.
[29] R. Levi, R. O. Roundy, and D. B. Shmoys, "Provably near-optimal sampling-based policies for stochastic inventory control models," Mathematics of Operations Research, vol. 32, no. 4, 2007.
