arxiv: v1 [math.oc] 27 Jul 2009

Size: px

Start display at page:

Download "arxiv: v1 [math.oc] 27 Jul 2009"

Reynard Chambers
6 years ago
Views:

1 PARTICLE METHODS FOR STOCHASTIC OPTIMAL CONTROL PROBLEMS PIERRE CARPENTIER GUY COHEN AND ANES DALLAGI arxiv: v1 [mah.oc] 27 Jul 2009 Absrac. When dealing wih numerical soluion of sochasic opimal conrol problems sochasic dynamic programming is he naural framework. In order o ry o overcome he so-called curse of dimensionaliy he sochasic programming school promoed anoher approach based on scenario rees which can be seen as he combinaion of Mone Carlo sampling ideas on he one hand and of a heurisic echnique o handle causaliy or nonanicipaiveness consrains on he oher hand. However if one considers ha he soluion of a sochasic opimal conrol problem is a feedback law which relaes conrol o sae variables he numerical resoluion of he opimizaion problem over a scenario ree should be compleed by a feedback synhesis sage in which a each ime sep of he scenario ree conrol values a nodes are ploed agains corresponding sae values o provide a firs discree shape of his feedback law from which a coninuous funcion can be finally inferred. From his poin of view he scenario ree approach faces an imporan difficuly: a he firs ime sages close o he ree roo here are a few nodes or Mone-Carlo paricles and herefore a relaively scarce amoun of informaion o guess a feedback law bu his informaion is generally of a good qualiy ha is viewed as a se of conrol value esimaes for some paricular sae values i has a small variance because he fuure of hose nodes is rich enough; on he conrary a he final ime sages near he ree leaves he number of nodes increases bu he variance ges large because he fuure of each node ges poor and someimes even deerminisic. Afer his dilemma has been confirmed by numerical experimens we have ried o derive new variaional approaches. Firs of all wo differen formulaions of he essenial consrain of nonanicipaiveness are considered: one is called algebraic and he oher one is called funcional. Nex in boh seings we obain opimaliy condiions for he corresponding opimal conrol problem. For he numerical resoluion of hose opimaliy condiions an adapive mesh discreizaion mehod is used in he sae space in order o provide informaion for feedback synhesis. This mesh is naurally derived from a bunch of sample noise rajecories which need no o be pu ino he form of a ree prior o numerical resoluion. In paricular an imporan consequence of his discrepancy wih he scenario ree approach is ha he same number of nodes or poins are available from he beginning o he end of he ime horizon. And his will be obained wihou sacrifying he qualiy of he resuls ha is he variance of he esimaes. Resuls of experimens wih a hydro-elecric dam producion managemen problem will be presened and will demonsrae he claimed improvemens. Key words. sochasic programming; measurabiliy consrains; discreizaion AMS subjec classificaions. 90C15 49M25 62L20 Inroducion. Taking ino accoun uncerainies in he decision process has become an imporan issue for all indusries. Facing he marke volailiies he weaher whims and he changing policies and regulaory consrains decision makers have o find an opimal way o inroduce heir decision process ino his uncerain framework. One way o ake uncerainies ino accoun is o use he sochasic opimizaion framework. In his approach he decision maker makes his decision by opimizing a mean value wih regard o he muliple possible scenarios weighed wih a probabiliy law. Sochasic opimizaion problems ofen involve informaion consrains: he decision maker makes his decision afer geing observaions abou he possible scenarios. In he lieraure wo differen communiies have deal wih his informaion issue using differen modelling echniques and herefore differen soluion mehods. The École Naionale Supérieure de Techniques Avancées 32 boulevard Vicor Paris Cedex 15 France Pierre.Carpenier@ensa.fr Universié de Paris-Es CERMICS École des Pons Champs sur Marne Marne la Vallée Cedex 2 France guy.cohen@mail.enpc.fr EDF R&D 1 avenue du Général de Gaulle Clamar Cedex France anes.dallagi@edf.fr 1

2 2 A. DALLAGI AND G. COHEN AND P. CARPENTIER quesion is: knowing ha sochasic opimizaion problems are ofen infinie dimensional problems how can we implemen racable soluion mehods which can be implemened using a compuer? To answer his quesion each communiy brough is own answers. The sochasic programming communiy models he informaion srucure by scenarios rees: his involves Mone Carlo sampling plus some manipulaions of he sample rajecories o enforce he ree srucure. Then racable soluion mehods for sochasic opimizaion problems consis in solving deerminisic problems over he decision rees. We refer o [ ] for furher deails on hese mehods. Neverheless sochasic programming faces an imporan difficuly: due o he ree srucure one has a few discreizaion poins nodes of he ree a he early sages which could represen a serious handicap when aemping o synhesizing a feedback law. On he conrary when approaching he las sages one has a large number of discreizaion poins bu wih a fuure which may be almos or compleely deerminisic. The sochasic opimal conrol communiy uses special srucures of he sochasic opimizaion problems ime sequenialiy and sae noions o model he informaion consrains hrough a funcional inerpreaion ha leads o he Dynamic Programming Principle. We refer o [4 5 6] for furher deails on his mehod. However sochasic dynamic programming is also confroned wih a serious obsacle known as he curse of dimensionaliy. In fac his mehod leads one o blindly discreize he whole sae space wihou aking ino accoun a generally nonuniform sae disribuion a he opimal soluion. This paper ries o bridge he gap beween hose wo communiies. We propose a racable soluion mehod for sochasic opimal conrol problems which makes use of he good ideas of boh Mone Carlo sampling and variaional mehods on he one hand funcional handling of he informaion srucure on he oher hand. Oher relaed works [8 20] are sill closer o he sochasic programming poin of view. In our approach he same number of discreizaion sample poins are used from he beginning o he end of he ime horizon and his discreizaion grid is adapive: when ransposed ino he sae space i reflecs he opimal sae disribuion a each ime sage. The proposed mehod is of a variaional naure in ha i is based on gradien calculaions which involve he forward sae variable inegraion and he backward adjoin sae co-sae evaluaion as in he Ponryagin minimum principle. The purpose is o solve he Kuhn-Tucker necessary opimaliy condiions of he considered problem including he informaion consrains. This paper is organized as follows. In 1 we discuss how o model he informaion srucure in sochasic opimizaion problems. Then we derive opimaliy condiions for sochasic opimizaion problems wih informaion consrains. In 2 hese opimaliy condiions are specialized o he siuaion of sochasic opimal conrol problems. A special aenion is paid o he so-called Markovian case in 2.4. The numerical implemenaion of a resoluion mehod based on Mone Carlo and funcional approximaions is presened in 3. Finally in 4 a case sudy namely a hydro-elecric dam managemen problem illusraes he proposed mehod and compares i o he sandard scenario ree approach. 1. Preliminaries. In his secion we presen he main framework of his paper: how o model sochasic opimizaion problems and how o represen he informaion srucure of such problems. This preliminary secion is direcly inspired from he

3 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS 3 works of he Sysem and Opimizaion Working Group 1 [2 19 9]. In he sequel of he paper he random variables defined over a probabiliy space Ω A P will be denoed using bold leers e.g. ξ L 2 Ω A P; Ξ whereas heir realizaions will be denoed using normal leers e.g. ξ Ξ Modelling informaion. When unknown facors affec a process ha we ry o conrol we consider a se Ω of possible saes of naure ω among which he rue sae ω 0 is supposed o be. Roughly speaking an informaion srucure is a pariion of Ω ino a collecion of subses G. A poseriori observaions will help us o deermine in which paricular subse of his pariion he rue sae ω 0 lies. Two exreme cases may be menioned. 1. If he pariion is simply he crude pariion { Ω} hen he a poseriori observaions are useless. We will refer o his siuaion as an open-loop informaion srucure. 2. If he pariion is he fines possible one namely all subses G are singleons hen he a poseriori observaions will ell us exacly wha is he rue sae of naure ω 0. This is he siuaion of perfec knowledge prior o making our decisions. In beween afer he a poseriori observaions become available we will remain wih some uncerainy abou he rue ω 0 ha is we will know in which paricular G he rue sae lies bu all ω s in ha G are sill possible. Moreover in dynamic siuaions new observaions may become available a each ime sage and a pariion of Ω corresponding o his informaion srucure mus be considered a each ime sage. We refer o he general case as a closed-loop decision process. In order o have a framework in which various operaions on informaion srucures become possible probabiliy heory inroduces so-called σ-algebras or σ-fields random variables and a pre-order relaion beween random variables. The laer is called measurabiliy. 2 We refer he reader o [17] for furher deails on informaion srucure and probabiliy heory. As far as his paper is concerned we here sae a resul found in [17 Theorem 8 p.108] giving he main properies of he measurabiliy relaion beween random variables. Proposiion 1.1 Measurabiliy relaion. Le Ω A P be a probabiliy space le Y 1 : Ω Y 1 and Y 2 : Ω Y 2 be wo random variables aking heir values in Y 1 and Y 2 respecively. The following saemens are equivalen: 1. Y 1 Y 2 2. σy 1 σy 2 3. φ : Y 2 Y 1 a measurable mapping such ha Y 1 = φ Y 2 4. Y 1 = EY 1 Y 2 P-a.s Modelling a sochasic opimizaion problem. In his secion we consider wo inerpreaions of a sochasic opimizaion problem: an algebraic one in which he informaion consrain is modelled hrough he sandard measurabiliy relaion Saemen 1 of Proposiion 1.1 and a funcional inerpreaion in which we use 1 SOWG École Naionale des Pons e Chaussées: Laeiia Andrieu Kengy Bary Pierre Carpenier Jean-Philippe Chancelier Guy Cohen Anes Dallagi Michel De Lara Pierre Girardeau Babakar Seck Cyrille Srugarek. 2 A random variable Y 1 is measurable wih respec o anoher random variable Y 2 if and only if he σ-field generaed by he firs is included ino he one generaed by he second: σy 1 σy 2. In his paper we will no give furher deails on hese definiions. Insead we refer he reader o [7] for furher deails on he probabiliy and measurabiliy heory.

4 4 A. DALLAGI AND G. COHEN AND P. CARPENTIER he funcional equivalen of measurabiliy relaion Saemen 3 of Proposiion Algebraic inerpreaion. Le Ω A P be a probabiliy space and le ξ : Ω Ξ be a random variable aking values in Ξ := R d ξ noise space. We denoe by U := R du he conrol space and by U he funcional space L 2 Ω A P; U. The cos funcion J : U R is defined as he expecaion of a normal inegrand 3 j : U Ξ R JU := E juξ = j Uω ξω dpω. The feasible se U fe := U as U me accouns for wo differen consrains: almos sure consrains: Ω U as := { U U Uω Γ as ω P-a.s. } 1.1 where Γ as : Ω U is a measurable se-valued mapping see [18 Definiion 14.1] which is convex and closed valued and measurabiliy consrains: U me := { U U U is G measurable } 1.2 where G is a given sub-σ-field of A. From heir respecive definiions i is easy o prove ha U me is a closed subspace of U indeed L 2 Ω G P; U and ha U as is a closed convex subse of U. The opimizaion problem under consideraion is o minimize he cos funcion JU over he feasible subse U fe. The firs model represening he sochasic opimizaion problem is hus min JU s.. U U U Ufe. 1.3 This inerpreaion is called algebraic as i uses an algebraic relaion measurabiliy o define he informaion srucure of Problem Funcional inerpreaion. According o Proposiion 1.1 a funcional model for sochasic opimizaion problems is available in which he opimizaion is achieved wih respec o funcions called feedbacks. Indeed le Y : Ω Y be a random variable called he observaion aking value in Y := R dy Φ be he space L 2 Y B o Y P Y ; U where B o Y is he Borel σ-field of Y and P Y he image of he probabiliy measure P by Y Φ as be he subse of Φ defined by Φ as := { φ Φ φ Y ω Γ as ω P-a.s. }. We define he cos funcion J : Φ R as Jφ := E j φy ξ. We are ineresed in he funcional opimizaion problem: min J φ s.. φ Φ as. 1.4 φ Φ 3 See [18 Definiion 14.27]. We remind ha he normal inegrand assumpion is done o ensure measurabiliy properies [18 Proposiion 14.28].

5 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS 5 Proposiion 1.2. If σy = G hen Problem 1.3 is equivalen o Problem 1.4 in he sense ha if U is soluion of 1.3 resp. φ soluion of 1.4 hen here exiss φ soluion of 1.4 resp. U soluion of 1.3 such ha U = φ Y P-a.s.. Proof. This is a sraighforward consequence of Proposiion 1.1 and of he definiion of he feasible ses in boh problems Opimaliy condiions for a sochasic opimizaion problem. Previous works deal wih opimaliy condiions for sochasic opimal conrol problems [15 13]. This paragraph can be viewed as a sligh exension of [15] when considering a sochasic opimizaion problem subjec o almos-sure and measurabiliy consrains. We consider Problem 1.3 presened in 1.2. We recall ha he feasible se U fe is he inersecion of a closed convex subse U as and of a linear subspace U me. In order o apply he resuls given in Appendix A we firs esablish he following lemma. Lemma 1.3. Le U as be he convex se defined by almos sure consrains 1.1 and le U me be he linear subspace defined by measurabiliy consrains 1.2. We assume ha Γ as is a G-mesurable and closed convex valued mapping. Then proj U as U me U me. Proof. We firs prove ha proj U as U ω = projγ as ω Uω P-a.s.. In- deed le FV := 1 2 V U 2 U + χ V. From he definiion of almos sure consrains we have FV = Ω f V ω ω U as dpω wih fv ω := 1 2 v Uω 2 U + χ Γ as ω v. By definiion of he projecion proj U as U is soluion of he opimizaion problem min f V ω ω dpω. V U Ω Using [18 Theorem 14.60] inerchange of minimizaion and inegraion we obain proju as U ω arg min fv ω = { } proj Γ as ω Uω P-a.s. 1.5 v U hence he claimed propery. Le us consider any U U me. We deduce from 1.5 ha proju as 1 U ω = argmin v Uω 2 v Γ as ω 2 U P-a.s.. From [1 Theorem ] measurabiliy of marginal funcions we deduce from he G-measurabiliy of boh U and Γ as ha he arg min funcion proj U as U is also a G-measurable funcion which means ha proj U as U U me. The main resul of his secion is given by he following heorem. Theorem 1.4. Assume ha Γ as is a G-mesurable and closed convex valued mapping ha funcion j is a normal inegrand such ha j ξ is differeniable P-a.s. and ha j u U ξ U for all U U. Le U be a soluion of Problem 1.3. Then E j u U ξ G χu asu. 1.6

6 6 A. DALLAGI AND G. COHEN AND P. CARPENTIER Proof. The differeniabiliy of J : U R is a sraighforward consequence of boh he inegral expression of J and he differeniabiliy assumpion on j. Moreover he expression of he derivaive of J is given by J Uω = j u Uω ξω P-a.s.. Le U be a soluion of 1.3. From Lemma 1.3 and Proposiion A.2 in he Appendix we obain proj U me J U χ U asu his las expression being equivalen o 1.6 hanks o he characerizaion of he condiional expecaion as a projecion and he expression of J. Using he equivalen saemens A.2c he opimaliy condiions given by Theorem 1.4 can be reformulaed as U = proj U as U ǫproj U me JU ǫ > 0. This las formulaion can be used in pracice o solve Problem 1.3 using a projeced gradien algorihm. The projecion over U as involves random variables bu we saw in he proof of Lemma 1.3 ha proj U as can be performed in a poinwise manner ω per ω so ha he implemenaion of he algorihm is effecive. Remark 1. The opimaliy condiions 1.6 which are given in erms of random variables also have a poinwise inerpreaion. As a maer of fac using he equivalen saemens A.2 and he poinwise inerpreaion of proj U as i is sraighforward o prove ha R χ U asu implies Rω χ Γ as ω U ω P-a.s.. 2. Sochasic opimal conrol problems. Sochasic opimal conrol problems res upon he same framework as sochasic opimizaion problems: hey can be modelled as closed-loop sochasic opimizaion problems wih a sequenial ime srucure. In his secion we deal wih he algebraic inerpreaion of sochasic opimal conrol problems in a discree ime framework. We will derive opimaliy condiions following he same principle as in Problem formulaion. We consider a sochasic opimal conrol problem in discree ime T denoing he ime horizon. A each sage = 0... T we denoe by W := R dw he noise space a ime. Le W be a se of random variables defined on Ω A P and aking values in W and le W := W 0 W T. The noise process of he problem is a random vecor W W such ha W = W 0...W T wih W W. A each sage = 0... T 1 we denoe by U := R du he conrol space and by U := L 2 Ω A P; U he space of square inegrable random variables aking values in U. A he decision maker makes a decision a conrol U U. Le U := U 0 U T 1. The conrol process of he problem is a random vecor U U such ha U = U 0... U T 1 wih U U. We also assume ha each conrol variable U is subjec o almos-sure consrains. More precisely for all = 0...T 1 le Γ as : Ω U be a se-valued mapping random se le U as := { U U ω Γ as ω P-a.s.} and le U as := U0 as UT as 1. The almos-sure consrains wries U U as = 0...T

7 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS Dynamics. A each sage = 0...T we denoe by X := R dx he sae space a ime. Le X be a se of random variables defined on Ω A P and aking values in X and le X := X 0... X T. The sae process of he problem is a random vecor X X such ha X = X 0...X T wih X X. I arises from he dynamics f : X U W +1 X +1 of he sysem namely X 0 = W 0 P-a.s. X +1 = f X P-a.s. = 0...T a 2.2b Remark 2. The shif on he ime index beween he random variable X and he conrol U on he one hand and he noise W +1 on he oher hand enlighens he fac ha we are in he so-called decision-hazard scheme: a ime he decision maker chooses a conrol variable U before having any informaion abou he noise W +1 which affecs he dynamics of he sysem Cos funcion. We consider a each sage = 0...T 1 an inegral cos funcion L : X U W +1 R. Moreover a he final sage T we consider a final cos funcion K : X T R. Summing up all hese insananeous coss we obain he overall cos funcion of he problem jx u w := T 1 =0 L x u w +1 + Kx T and he decision maker has o choose he conrol variables U in order o minimize he overall cos expecaion T 1 JX U := E L X + KX T. 2.3 = Informaion srucure. Generally speaking he informaion available on he sysem a ime is modelled as a random variable Y : Ω Y where Y := R dy is he observaion space. Le Y be he se of random variables aking heir values in Y. We suppose ha here exiss an observaion mapping h : W 0 W T Y such ha Y = h W 0...W T P-a.s.. For all = 0...T we denoe by G = σy he sub-σ-field of A generaed by he random variable Y : 4 G = σ h W 0... W T. The decision maker knows Y when choosing he appropriae conrol U a ime { so ha he informaion consrain is U Y. Using he noaions U me = U U is G measurable } and U me = U0 me UT me 1 he measurabiliy consrains of he problem wries U U me = 0...T We also inroduce he sequence of σ-fields F associaed wih he noise =0...T process W where F is he σ-field generaed by he noises prior o : F = σ W 0...W = 0...T. This sequence is a filraion as i saisfies he inclusions F 0 F 1... F T A. When G = F we are in he case of complee causal informaion. 4 Noe ha he σ-fields G s are fixed in he sense ha hey do no depend on he conrol variable U.

8 8 A. DALLAGI AND G. COHEN AND P. CARPENTIER Opimizaion problem. According o he noaion given in he previous paragraphs he sochasic opimal conrol problem we consider here is min U X U X T 1 E L X + KX T 2.5a =0 subjec o boh he dynamics consrains 2.2 and he conrol consrains U U as U me = 0...T b We follow here he algebraic inerpreaion of an opimizaion problem given in 1.2 as he informaion srucure is defined using a measurabiliy relaion. In he way we modelled Problem 2.5 we have o minimize he cos funcion JUX wih respec o boh U and X under he dynamics consrains 2.2. Bu he sae variables X are in fac inermediary variables and i is possible o eliminae hem by recursively incorporaing he dynamics equaions 2.2 ino he cos funcion 2.3. The resuling cos funcion J only depends on he conrol variables U. Is expression is JU = E ju W wih ju w = T 1 =0 L u 0...u w 0... w +1 + Ku 0... u T 1 w 0...w T. In his seing Problem 2.5 is equivalen o min U U JU s.. U U as U me. 2.6 The derivaives of he cos funcion J can be obained from he derivaives of J using he well known adjoin sae mehod. This is based on he following resul saed here wihou proof. Proposiion 2.1. Assuming ha funcions f and L are coninuously differeniable wih respec o heir firs wo argumens and ha funcion K is coninuously differeniable he parial derivaives of j wih respec o u are given by j u u w = L u x u w +1 + λ +1f u x u w +1 where he sae vecor x 0...x T saisfies he forward dynamics equaion x 0 = w 0 x +1 = f x u w +1 whereas he adjoin sae or co-sae vecor λ 0...λ T is chosen o saisfy he backward dynamics equaion λ T = K x T λ = L x x u w +1 + f x x u w +1 λ Assumpions. In order o derive opimaliy condiions for Problem 2.5 we make he following assumpions. Assumpion 1 Consrains srucure. Γ as are closed convex se-valued mappings = 0...T 1.

9 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS 9 Assumpion 2 Differeniabiliy. 1. Funcions f dynamics and L cos are coninuous differeniable wih respec o heir firs wo argumens sae and conrol = 0...T Funcion K final cos is coninuously differeniable. 3. Funcions L and f are normal inegrands = 0...T The derivaives of f L and K are square inegrable = 0... T 1. Assumpion 3 Nonanicipaiviy and measurabiliy. 1. G F = 0...T. 2. Mappings Γ as are G -measurable = 0...T 1. Assumpion 2 will allow us all inegraion and derivaion operaions needed for obaining opimaliy condiions. The measurabiliy condiion on Γ as in Assumpion 3 expresses ha he almos-sure consrains mus a leas have he same measurabiliy as he decision variables. The firs condiion in Assumpion 3 expresses he causaliy of he problem: he decision maker has no access o informaion in he fuure! Under his assumpion here exiss a measurable mapping h : W 0 W Y such ha he informaion variable Y wries Y = h W 0...W P-a.s.. From Assumpion 1 and using Lemma 1.3 he following propery is readily available. Proposiion 2.2 Consrains srucure. For all = 0...T 1 1. U as is a closed convex subse of U 2. proj U as U me U me Opimaliy condiions in sochasic opimal conrol problems. We presen here necessary opimaliy condiions for he sochasic opimal conrol problem 2.5 which are an exension of he condiions given in Theorem Non-adaped opimaliy condiions. A firs se of opimaliy condiions is given in he nex heorem. Theorem 2.3. Le he wo random processes X =0...T X and U =0...T 1 U be a soluion of Problem 2.5. Suppose ha Assumpion 1 2 and 3 are saisfied. Then here exiss a random process λ =0...T X such ha for all = 0... T 1 X 0 = W 0 X +1 = f X 2.7a 2.7b λ T = K X T λ = L x X + f x X λ c 2.7d E L ux + λ +1 f ux G χu asu. 2.7e Proof. From he equivalence beween Problem 2.5 and Problem 2.6 we obain using Theorem 1.4 ha he soluion U saisfies proj U me J U χ U asu

10 10 A. DALLAGI AND G. COHEN AND P. CARPENTIER and herefore E j u UW G χu asu for all = 0... T 1. The desired resul follows from Proposiion 2.1. The condiions given by Theorem 2.3 are called non-adaped opimaliy condiions because he dual random process λ =0...T is no adaped o he naural filraion F =0...T ha is λ generally depends on he fuure. We will see in he nex secion ha similar opimaliy condiions can be wrien wih help of an adaped dual random process Adaped opimaliy condiions. The following heorem presens opimaliy condiions involving an adaped dual random process. Theorem 2.4. Le he wo random processes X =0...T X and U =0...T 1 U be a soluion of Problem 2.5. Assume ha Assumpion 1 2 and 3 are saisfied. Then here exiss a process Λ =0...T X adaped o he filraion F =0...T such ha for all = 0... T 1 X 0 = W 0 X +1 = f X 2.8a 2.8b Λ T = K X T Λ = E L x X + f x X Λ +1 F 2.8c 2.8d E L ux + Λ +1 f ux G χu asu. 2.8e Proof. All assumpions of Theorem 2.3 are me so ha here exiss a random process λ =0...T saisfying 2.7. Define for all he random variable Λ by Λ := E λ F. By consrucion he process Λ =0...T is adaped o he filraion F =0...T. A sage T we have Λ T = Eλ T F T = E K X T F T = K X T because X T is F T -measurable. For all = T using he law of oal expecaion E F = E E F +1 F and since all variables X and W +1 are F +1 -measurable we deduce from 2.7 ha Λ = E L x X + f x X Eλ +1 F +1 F = E L x X + f x X Λ +1 F hence he adaped backward dynamics equaions given in 2.8. Assumpion 3 implies ha G F F +1. Using E G = E E F +1 G and he measurabiliy properies of X and W +1 he las opimaliy condiion in 2.7 becomes E L ux + Eλ +1 F +1f ux G χu asu hence he las opimaliy condiion given in 2.8. Noe ha in he opimaliy condiions given in Theorem 2.4 a each sage he gradien is projeced over he subspace generaed by he observaion σ-field G whereas he adaped dual random variable is projeced over he subspace generaed by F which corresponds o he naural filraion of Problem 2.5.

11 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS Opimaliy condiions in he Markovian case. As far as informaion srucure is concerned he opimaliy condiions 2.7 and 2.8 were obained assuming only nonanicipaiveness see Assumpion 3 and he fac ha he observaion σ- fields G =0...T do no depend on he decisions U =0...T 1. The las assumpion allows us o avoid he so-called dual effec of conrol see [3] for furher deails. Anoher feaure ofen available in pracice for he informaion srucure is he perfec memory propery which inuiively means ha he informaion is no los over ime. The las propery implies ha G =0...T is a filraion namely G G +1 = 0...T 1. We will assume in he sequel ha he perfec memory propery holds and we will moreover assume complee causal noise observaion so ha G = F = 0...T. 5 Then boh opimaliy condiions 2.7 and 2.8 involve condiional expecaions wih respec o a random observaion variable he dimension of which increases wih ime a new noise random variable becomes available a each sage. This leads o a compuaional difficuly of he same naure as he so-called curse of dimensionaliy. To address his difficuly i would be an easier siuaion o have a consan dimension for he observaion space as in he sochasic dynamic programming principle [5 6] when he opimal conrol a only depends on he sae variable a he same ime sage. We hus consider new more resricive assumpions which mach he sochasic opimal conrol framework an lead us o he desired siuaion. Assumpion 4 Markovian case. 1. G = F = 0...T perfec memory and causal noise observaion. 2. The random variables W 0...W T are independen whie noise. 3. The mappings Γ as are consan deerminisic consrains: = 0...T 1 Γ as U Γ as ω = Γas P-a.s.. The sandard formulaion of a sochasic opimal conrol problem in he Markovian case is o assume ha he sae is compleely and perfecly observed. The problem formulaion is accordingly min U X U X T 1 E L X + KX T 2.9a =0 subjec o boh he dynamics consrains 2.2 and he conrol consrains U Γ as P-a.s. and U X = 0...T b We now consider he opimaliy condiions 2.7 and 2.8 and we specialize hem o he Markovian case Markovian case: non-adaped opimaliy condiions. We presen a non-adaped version of he opimaliy condiions of Problem 2.5 wih Markovian assumpions. We begin by presening a resul inspired by he sochasic dynamic programming principle. Theorem 2.5. Suppose ha Assumpions 1 2 and 4 are fulfilled and assume ha here exis wo random processes U =0...T 1 U and X =0...T X soluion 5 Noe ha complee causal noise observaion implies he perfec memory propery as far as F =0...T is a filraion.

12 12 A. DALLAGI AND G. COHEN AND P. CARPENTIER of Problem 2.5. Then here exiss a process λ =0...T 1 X saisfying 2.7 and such ha for all = 0...T 1 a λ +1 X +1 W +2...W T b U X c E L ux + λ +1 f ux F = E L u X + λ +1 f u X X P-a.s.. Proof. Firs Assumpion 3 being implied by Assumpion 4 he exisence of he process λ =0...T X saisfying 2.7 is given by Theorem 2.3. Denoing by H he Hamilonian a ime namely H x u w λ = L x u w+λ f x u w opimaliy condiions 2.7d and 2.7e wrie λ = H x X λ a E H u X λ +1 F χu asu. 2.10b The proof of saemens a and b is obained by inducion. For he sake of simpliciy we firs prove he resul when U as = U so ha 2.10b reduces o he equaliy condiion: E H ux λ +1 F = c A sage T we know from 2.7c ha λ T = µ T 1 X T wih µ T 1 being a measurable funcion 6 and hence λ T X T. 2.11a Then using 2.7b he opimaliy condiion 2.10c akes he form: E H T 1 u XT 1 U T 1 W T µ T 1 f T 1 X T 1 U T 1 W T FT 1 = 0. X T 1 and U T 1 being boh F T 1 -measurable random variables and W T being independen of F T 1 whie noise assumpion we deduce ha he condiional expecaion in he las expression reduces o an expecaion. Le G T 1 denoes he funcion resuling from is inegraion namely G T 1 x u = E H T 1 u x u WT µ T 1 f T 1 x u W T. G T 1 is a mesurable mapping 7 and he opimaliy condiion wries G T 1 X T 1 U T 1 = 0. Using he measurable selecion heorem available for implici measurable funcions [14 Theorem 8] 8 we deduce ha here exiss a measurable mapping γ T 1 : X T 1 U T 1 such ha G T 1 XT 1 γ T 1 X T 1 = 0. As a 6 In fac µ T 1 = K. From Assumpion 2 µ T 1 is a coninuous mapping. 7 in fac a coninuous one from Assumpion 2 8 See for insance [21 Secion 7] for a survey of measurable selecion heorems corresponding o he implici case. Noe ha [14 Theorem 8] needs a paricular assumpion concerning he σ-field equipping X T 1. We assume here ha such assumpion holds.

13 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS 13 conclusion he conrol variable U T 1 = γ T 1 X T 1 saisfies he opimaliy condiion 2.10c a = T 1 and is such ha U T 1 X T b A sage assume ha λ +1 X +1 W +2...W T. Then here exiss a measurable funcion µ such ha λ +1 = µ X +1 W W T. 2.12a The opimaliy condiion 2.10c a sage akes he form E H u X µ f X W W T F = 0. Wih he same reasoning as a sage T we deduce ha his condiional expecaion reduces o an expecaion so ha he opimaliy condiion wries G X = 0 G being a measurable funcion given by G x u = E H u x u µ f x u W W T. Using again [14 Theorem 8] we deduce ha here exiss a measurable mapping γ : X U such ha U = γ X saisfies he opimaliy condiion 2.10c a. We have accordingly U X. 2.12b Ulimaely saring from he opimaliy condiion 2.10a namely λ = H x X λ +1 using he inducion assumpion 2.12a ogeher wih 2.12b and 2.7c we obain ha λ = H x X γ X µ f X γ X W W T. We conclude ha λ X... W T so ha he desired resul holds rue. 9 Le s go now o he general case U as U. From A.2 he opimaliy condiion 2.10b is again equivalen o an equaliy condiion: proj U as U ǫe H u X λ +1 F U = 0 and he same argumens as in he previous case remain valid. A las from U X λ +1 X... W T and he whie noise assumpion we deduce ha λ +1 depends on F only hrough X so ha E L ux + λ +1 f ux F = E L u X + λ +1 f u X X. 9 Noe ha we obained as an inermediae resul ha λ +1 `X... W T.

14 14 A. DALLAGI AND G. COHEN AND P. CARPENTIER Saemen c is hus saisfied and he proof is complee. In he Markovian case he opimal soluion of Problem 2.5 measurabiliy wih respec o all pas noise variables saisfies he measurabiliy consrains of Problem 2.9 measurabiliy wih respec o he curren sae variable. The wo problems are equivalen same min and argmin. In fac he feasible se of 2.5 conains he feasible se of 2.9 and Theorem 2.5 shows us ha any opimal soluion of 2.5 is feasible for 2.9 and is herefore also opimal also for 2.9. Ulimaely he opimaliy condiions of Problem 2.9 can be wrien as X 0 = W 0 X +1 = f X 2.13a 2.13b λ T = K X T λ = L x X + f x X λ c 2.13d E L u X +λ +1 f u X X χu asu. 2.13e Remark 3. Le G 1 =0...T 1 and G 2 =0...T 1 be he gradien processes associaed wih 2.7 and 2.13 respecively: G 1 :=E L u X + f u X λ +1 F G 2 :=E L u X + f u X λ +1 X. Unlike Problem 2.5 we are unable o compue he opimaliy condiions 2.13 of Problem 2.9 by differeniaing he Lagrangian funcion because he condiioning erm X depends iself on he conrol variables U. Consequenly G 2 is no claimed o represen he projeced gradien of Problem 2.9. The equaliy beween he gradien G 1 and G2 holds rue only a he opimum Markovian case: adaped opimaliy condiions. We now presen he adaped version of he opimaliy condiions of Problem 2.5 under Markovian assumpions. Theorem 2.6. Suppose ha Assumpions 1 2 and 4 are fulfilled and assume ha here exiss wo random processes U =0...T 1 U and X =0...T 1 X soluion of Problem 2.5. Then here exiss a process Λ =0...T 1 X saisfying 2.8 and such ha for all = 0...T 1 a Λ +1 X +1 b U X c E L u X + Λ +1 f u X F = E L ux + Λ +1 f ux X P-a.s.. Proof. The proof follows he same scheme as he one of Theorem 2.5. We jus poin ou ha λ +1 X +1 W W T and Λ +1 = Eλ +1 F +1 implies ha Λ +1 X +1.

15 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS 15 From he equivalence beween Problem 2.5 measurabiliy wih respec o he pas noise and Problem 2.9 measurabiliy wih respec o he sae in he Markovian framework we may consider ha he opimaliy condiions of Problem 2.9 in he adaped version are X 0 = W 0 X +1 = f X 2.14a 2.14b Λ T = K X T Λ = E L x X + f x X Λ +1 X 2.14c 2.14d E L ux +Λ +1 f ux X χu asu. 2.14e The opimaliy condiions 2.13 and 2.14 involve condiional expecaions wih respec o he sae variable he dimension of which is in mos cases fixed ha is i does no depend on he ime sage. In order o solve Problem 2.5 and equivalenly Problem 2.9 we have o discreize hose condiions and in paricular o approximae he condiional expecaions. The lieraure on condiional expecaion approximaion only offers biased esimaors wih an inegraed squared error depending on he dimension of he condiioning erm. On he conrary approximaing an expecaion hrough a Mone-Carlo echnique involves non-biased esimaors he variance of which does no depend on he dimension of he underlining space. In he nex secion we propose a funcional inerpreaion of he sochasic opimal conrol problem in order o ge rid of hose condiional expecaions and deal only wih expecaions Opimaliy condiions from a funcional poin of view. Consider he sochasic opimal conrol Problem 2.5. Under Markovian assumpions we have shown in 2.3 ha i is equivalen o Problem 2.9 and ha 2.14 is a se of necessary opimaliy condiions. Hereafer we ransform he opimaliy condiions 2.14 using Theorem 2.6 and he funcional inerpreaion of he measurabiliy relaion beween random variables see Proposiion 1.1. Theorem 2.7. Suppose ha Assumpions 1 2 and 4 are fulfilled and assume ha here exis wo random processes U =0...T 1 U and X =0...T 1 X soluion of Problem 2.5. Le Λ =0...T 1 X be a random process saisfying he opimaliy condiions 2.8. Then here exiss wo sequences of mappings Λ =0...T and φ =0...T 1 Λ : X X and φ : X U such ha for all = 0...T 1 Λ = Λ X U = φ X 2.15a 2.15b Λ = E E L x ` φ + f x ` φ Λ+1`f φ 2.15c ` φ L u` φ + Λ +1`f φ f u χ Φ asφ 2.15d wih Φ as := { φ L 2 X B o X P X ; U φ x Γ as x X } and PX he probabiliy measure associaed wih X.

16 16 A. DALLAGI AND G. COHEN AND P. CARPENTIER Proof. By Theorem 2.6 he random processes U =0...T 1 X =0...T and Λ =0...T 1 are such ha U X and Λ X. By Proposiion 1.1 here exis measurable mappings φ : X U and Λ : X X such ha U = φ X and Λ = Λ X. From U U = L 2 Ω A P; U we deduce ha φ L 2 X B o X P X ; U : Ω U ω 2 dpω = Since U U as Ω φ X ω 2 dpω = X φ x 2 dp X x < +. φ X ω Γ as P-a.s. we obain ha φ Φ as. The co-sae dynamic equaions in 2.14 rewries Λ X = E L x `X φ X + f x `X φ X Λ+1X +1 which using X +1 = f X φ X becomes Λ X = E L x `X φ X X + f x `X φ X Λ+1`fX φ X X. From he whie noise assumpion he random variable W +1 is independen of X so ha he condiional expecaion urns ou o be jus an expecaion wih respec o he noise random variable. The co-sae dynamics equaion wries accordingly as a funcional equaliy: Λ = E L x φ + f x φ Λ +1`f φ. Using similar argumens we easily obain from he las condiion in 2.14: E L u φ + Λ +1`f φ f u φ χ Φ asφ which complees he proof. Theorem 2.7 provides he new funcional opimaliy condiions 2.15 for Problem 2.5 in he Markovian case. These opimaliy condiions do no involve condiional expecaions bu jus expecaions. Therefore we may hope ha in he approximaion process of hese condiions we will obain non biased esimaes he variance of which will no depend on he dimension of he sae space. 3. Adapive discreizaion echnique. In his secion we develop racable numerical mehods for obaining he soluion of Problem 2.5. We will limi ourselves here o he Markovian framework bu mehods for all cases described in 2 can be found in [9]. We briefly discuss wo classical soluion mehods namely sochasic programming and dynamic programming. Then we presen an adapive mesh algorihm which consiss in discreizing he opimaliy condiions obained a 2.4 and in using hem in a gradien-like algorihm Discree represenaion of a funcion. As far as numerical resoluion is concerned we need o manipulae funcions which are infinie dimensional objecs and which in mos cases do no have a closed-form expression. Thus we mus have a discree represenaion of such an objec. Le φ : X U be a funcion. We suppose ha we have a disposal a fixed or variable grid x in X ha is a collecion of elemens in X: x = x i i=1...n X n.

17 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS 17 A poin x i will be called a paricle. In order o obain a discree represenaion of φ we define is race ha is a grid u on U which corresponds o he values of φ on x: u = φx i i=1...n Un. We denoe by T U : U X X n U n he race operaor which wih any funcion φ : X U associaes he couple of grids x u ha is he n poins x i φx i X U. On he oher hand knowing he race of φ we need o compue an approximaion of he value aken by φ a any x X. Le R U : X n U n U X be an inerpolaion-regression operaor which wih any couple of grids x u associaes φ : X U represening he iniial funcion. Such an inerpolaion-regression operaor may be defined in differen ways polynomial inerpolaion kernel approximaion closes neighbor ec Sochasic programming. One way o discreize sochasic opimal conrol problems of ype 2.5 is o model he informaion srucure as a decision ree. 1. Firs simulae a given number N of scenarios W kk=1...n =0...T of he noise process. Then by some ree generaion procedures see e.g. [16] or [12] organize hese scenarios in a scenario ree a any node of any ime sage here is a single pas rajecory bu muliple fuures: see Figure 3.1. =0 =1 =2 =3 =T =0 =1 =2 =3 =T N scenarios Scenarios ree Fig From scenarios o a ree 2. Second wrie he componens of he problem sae dynamics and cos expecaion on he scenario ree noe ha he informaion consrains are buil-in in such a ree srucure. Then he approximaion on he scenario ree of Problem 2.5 is solved using an appropriae deerminisic non-linear programming package. The opimal soluion consiss of sae and conrol paricles a each node of he ree. An inerpolaionregression procedure as suggesed a 3.1 has o be performed a each ime sage in order o synhesize a feedback law. Such a mehodology is relaively easy o implemen and need no in fac any Markovian assumpion. Neverheless i faces a serious drawback: a he firs ime

18 18 A. DALLAGI AND G. COHEN AND P. CARPENTIER sages close o he ree roo only a few paricles or nodes are available and experience shows ha he esimaes of he opimal feedback hey provide has a relaively weak variance because he number of pending scenarios a each such node is sill large enough; on he conrary a he final ime sages near he ree leaves a large number of paricles is available bu wih a huge variance. In boh cases he feedback synhesis he inerpolaion-regression process from he available grid will be inaccurae. Such an observaion will be highlighed laer on he case sudy Sochasic dynamic programming. We consider Problem 2.5 in he Markovian case which is hus equivalen o Problem 2.9. From Theorem 2.5 he opimal conrol process U =0...T 1 can be searched as a collecion of feedback laws φ =0...T 1 depending on he process sae X =0...T ha is U = φ X P-a.s.. According o he Dynamic Programming Principle he resoluion is buil up backward from = T o = 0 by solving he Bellman equaion a each ime sage for all sae values x X. Such a principle leads o he following algorihm [5 6]. Algorihm 1 Sochasic Dynamic Programming. A sage T define he Bellman funcion V T as V T x := Kx x X T. Recursively for = T compue he Bellman funcion V as V x = min E L x u + V +1 f x u x X u Γ as he opimal feedback law φ being obained as φ x = arg min E L x u + V +1 f x u x X. u Γ as This algorihm is only concepual because i operaes on infinie dimensional objecs V and expecaions canno always be evaluaed analyically. We mus indeed manipulae hose objecs as indicaed a 3.1. For every = 0...T le x := x i i=1...n be a fixed grid of n discreizaion poins in he sae space X. We approximae he funcions appearing in Algorihm 1 by heir race over ha grid: v i = V x i i = 1...n = 0...T u i = φ x i i = 1...n = 0...T 1. We also need o approximae he expecaions by he Mone Carlo mehod. Le W kk=1...n =0...T denoe N independen and idenically disribued scenarios of he random noise process. The discreized sochasic dynamic programming algorihm is as follows. Algorihm 2 Discreized Sochasic Dynamic Programming. A sage T compue he race v T of he Bellman funcion V T : v i T = V Tx i T i = 1... n T

19 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS 19 Recursively for = T approximae V +1 by inerpolaion-regression: V +1 = R R x +1 v +1 compue he wo grids v and u ha is for each i = 1... n v i 1 = min u Γ as N u i = argmin u Γ as N [ k=1 1 N L x i u W k +1 + V+1 f x i u W k +1 ] N [ L x i u W k +1 + V+1 f x i u W k +1 ] k=1 and obain he feedback law as φ = R U x u. Remark 4. The inerpolaion-regression operaor is in mos cases mandaory in Algorihm 2. As a maer of fac for a given ime sage and a given conrol value u U here usually does no exis any index j such ha x j +1 = f x i u W+1 k which means ha V +1 has o be compued ou of he grid x +1. Noe however ha he Bellman funcion a ime sage T is known analyically so ha inerpolaion is no needed for V T. This mehod faces an imporan difficuly: he curse of dimensionaliy. In fac one generally discreizes each sae coordinae a each ime sage using a scalar grid and a fixed number of poins. Therefore he grid x is obained as he Caresian produc of he scalar grids over all he coordinaes. Thus he number of paricles of ha grid increases exponenially wih he sae space dimension. This is he wellknown drawback of mos mehods derived from Dynamic Programming which do no ake advanage of he repariion of he opimal sae paricles in he sae space in order o concenrae compuaions in significan pars of he sae space The adapive mesh algorihm. Considering he difficulies faced by boh sochasic and dynamic programming mehods we propose an alernaive mehod for solving Problem 2.5 in he Markovian case. The mehod based on opimaliy condiions 2.15 aims a dealing wih he same number of noise paricles from he beginning of he ime horizon o he end: we hus hope ha he generaed feedback law esimaors will have a reduced and fixed variance during all ime sages; aemping o alleviae he curse of dimensionaliy by operaing on an adapive discreizaion grid auomaically generaed from he primary noise discreizaion grid Approximaion. Le us denoe by W kk=1...n =0...T a se of N independen and idenically disribued scenarios obained from he noise random process. Given random conrol grids u := U k k=1...n for = 0...T 1 we can compue he sae random grids x := X k k=1...n by propagaing he sae dynamics equaion: X k 0 = W k 0 k = 1... N 3.1a X k +1 = f X k Uk k k = 1...N = 0...T b

20 20 A. DALLAGI AND G. COHEN AND P. CARPENTIER The feedback a ime sage is obained using an inerpolaion-regression operaor on he grids x u. Noe ha he sae space grids x are no a priori fixed as in Dynamic Programming mehods. They are in fac adaped o he conrol paricules as hey change whenever he decision maker changes is conrol sraegy. Remark 5. If he opimal sae is concenraed in a region of he sae space he opimal feedback will be synhesized only inside ha region. In fac we need no compue i elsewhere because he sae will hardly reach oher regions. In order o obain approximaions Λ of he co-sae funcions Λ inroduced a Theorem 2.7 we compue paricles Λ k by inegraing he co-sae backward dynamic equaion over he sae grids x. We hus obain co-sae grids l = Λ k k=1...n and make use of inerpolaion-regression operaors R X in order o compue he co-sae funcion for values ou of he curren grid. More specifically he process is iniiaed wih Λ T = Λ T = K ; hen for all = T one compues: Λ k = 1 N NX L x X k U k W+1 j + f x X k U k W+1 j Λ b +1 `fx k U k W+1 j j=1 bλ = R X x l. 3.2a 3.2b G k Ulimaely for all k = 1...N and for all = 0...T 1 he gradien paricles are obained as G k = 1 N NX» L u j=1 `Xk U k W+1 j + f u `Xk U k W j bλ f +1 `Xk U k W j. 3.3 As already noiced he direcion associaed wih hese paricles represens he gradien only a he opimum Algorihm. We can now derive a descen-like algorihm o solve Problem 2.5 under Markovian assumpions. A each ieraion sae paricles are propagaed forward wih no ineracion beween paricles hen co-sae paricles are propagaed backward now wih ineracion caused by he regression-inerpolaion operaions. Then gradien paricles are compued using 3.3 and he conrol paricles are updaed using a gradien-like mehod. Ulimaely a funcional represenaion of he feedback laws is obained hanks o a regression-inerpolaion operaor. Algorihm 3. Sep [0]. Le u [0] =0...T 1 = U [0]k Sep [l]. k=1...n be he iniial conrol grids. =0...T 1 1. Compue he sae grids x [l] =0...T by propagaing he dynamics 3.1 wih U = U [l]. 2. Compue boh co-sae grids l [l] =0...T and funcional approximaions Λ [l] =0...T 1 by propagaing he dynamics 3.2 wih U = U [l] and X = X [l]. 3. Compue he gradien paricles G [l]k k=1...n using Equaion 3.3 =0...T 1 wih U = U [l] X = X [l] and Λ = Λ [l].

21 PARTICLE METHODS FOR STOCHASTIC OPTIMIZATION PROBLEMS For all = 0...T 1 and k = 1...N updae he conrol paricles by performing a projeced gradien sep: U [l]k ρ [l] G [l]k. U [l+1]k = proj Γ as 5. Sop if some degree of accuracy is reached. Else se l = l+1 and ierae. Sep [ ]. 1. Se he grids: u [ ] x [ ] =0...T 1 = U [l]k k=1...n =0...T 1 = X [l]k =0...T 1 k=1...n =0...T Obain for all = 0... T 1 he feedback law: [ ] φ = R U x u [ ]. The only a priori discreizaion made during his algorihm is relaive o he noise sampling. Once such a discreizaion has been performed all oher grids used o ulimaely obain he feedback laws are derived by inegraing dynamic equaions. In addiion no condiional expecaion approximaions are involved in he process. We jus approximae expecaions using Mone Carlo echniques and i is well-known ha he variance of such an approximaion does no depend on he dimension of he underlying space. The only space-size dependen operaions are in fac he inerpolaion operaors used o approximae he co-sae mappings during he ieraive process plus he feedback laws once convergence is achieved. Furhermore his adapive discreizaion makes compuaions concenrae in effecive sae space regions unlike dynamic programming which explores he whole sae space. 4. Case sudy. We consider he producion managemen of an hydro-elecric dam. The problem is formulaed as a sochasic opimal conrol and we consider solving i by he hree mehods described in Model. The problem is formulaed in discree ime over 24 hours using a consan ime sep of one hour. The index = 0...T where T = 24 defines he ime discreizaion grid. The waer volume sored in he dam a ime sage = 0...T is a one dimensional random variable X L 2 Ω A P; R corresponding o he sysem sae. This sorage variable has o remain beween given bounds x minimal volume o be kep in he dam and x maximal waer volume he dam can conain so ha for all = 0...T he following almos-sure consrain mus hold: x X x P-a.s The waer inflow ino he dam a sage is denoed by A. I is a one dimensional random variable wih known probabiliy law. We denoe by U L 2 Ω A P; R he one dimensional random variable corresponding o he desired volume of waer o be urbinaed a sage and by E he effecively urbinaed waer volume during he same ime sage. In mos cases E = U. Bu his equaliy is no achievable if he dam goes under is minimal volume. Therefore we have for all = 0...T 1: E = minu X + A +1 x P-a.s.. 4.2

22 22 A. DALLAGI AND G. COHEN AND P. CARPENTIER The shif in he ime indices is due o he fac ha he decision o urbinae is supposed o be made before observing he waer inflow volume enering he dam: we are in a decision-hazard framework. Moreover he conrol variables U are subjec o he following bounds: for all = 0...T 1 u U u P-a.s Taking ino accoun a possible overflow he dam dynamics wrie for = 0...T 1: X +1 = minx E + A +1 x P-a.s Noe ha consrains 4.1 are fully aken ino accoun in he modelling of he dam dynamics. An elecriciy producion P is associaed wih he effecively urbinaed waer volume and i also depends on he waer sorage indeed on he waer level in he dam due o he fall heigh effec: P = gx E. 4.5 Le D =1...T denoes he elecriciy demand which is supposed o be a sochasic process wih known probabiliy law. In our decision-hazard framework producion P has o mee demand D +1 : eiher P D +1 and he producion excess is sold on he elecriciy marke or P D +1 and he gap mus be compensaed for eiher by buying power on he marke or by paying a penaly. The associaed cos is modelled as c D +1 P. 4.6 Ulimaely we suppose ha a penaly funcion K on he final sock X T is given and ha he iniial condiion X 0 is a random variable wih known probabiliy law. Le W =0...T be he noise random process defined as W 0 = X 0 W = A D = 1...T. We assume ha he noises are fully observed in a non-anicipaive way and ha he conrol variables are measurable wih respec o he pas noises. The dam managemen problem is hen he following. min U X T 1 E c D+1 gx E + KX T 4.7a =0 subjec o he consrains and o he measurabiliy consrains U W 0...W = 0...T b I precisely corresponds o he sochasic opimal conrol problem formulaion 2.5 when using he following noaions: w = a d L x u w = c d g x minu x + a x f x u w = min x minu x + a x + a x Γ as = [u u].

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking