Optimal approximate dynamic programming algorithms for a general class of storage problems


Juliana M. Nascimento
Warren B. Powell

Department of Operations Research and Financial Engineering, Princeton University

July 24, 2008

Abstract

There are many applications which involve the dynamic control of a scalar quantity (water, money, vaccines) in the presence of a vector of exogenous parameters that influence the right amount of the resource that should be held. These problems can be formulated as dynamic programs, but the resulting vector-valued state variable makes them computationally intractable. Approximate dynamic programming is promising, but optimal algorithms require that states be visited infinitely often, which has the effect of requiring that we explore all the states. We prove convergence of an algorithm that uses pure exploitation, eliminating any need to explicitly visit all the states.

1 Introduction

We consider the problem of managing a single resource class (vaccines, money, water, commodities, inventory of a product) in the presence of multiple parameters that evolve exogenously (disease, prices, interest rates, climate, technology, demands). Let R_t be the scalar quantity of resource on hand, and let W_t be a vector-valued Markov process describing the relevant exogenous parameters, which means that S_t = (R_t, W_t) is the state of our system. x_t is a potentially vector-valued control. x_t might describe the amount of product we purchase from different suppliers, and the amount we sell to different types of customers. An optimal policy is described by Bellman's equation

    V_t(S_t) = max_{x_t ∈ X_t} ( C_t(S_t, x_t) + γ E{ V_{t+1}(S_{t+1}) | S_t } )   (1)

where S_{t+1} = f(S_t, x_t, W_{t+1}). We are going to assume that R_t and W_t are discrete. Although S_t may only have four or five dimensions, this is often enough to render (1) computationally intractable. Aside from the challenge of enumerating the state space, computing the expectation may also be computationally difficult.

There are many applications of this problem class. One nice example involves determining the hourly flows of different types of raw energy resources (coal, nuclear, ethanol, solar, windmills and hydroelectric power) to serve different types of demand (light and heavy industrial, residential and different types of transportation). Assume that the only form of storage is the aggregate amount of water in water reservoirs. The vector of decisions involves determining how much of the different types of energy resources to convert to serve the different types of demand. The only decision that links different time periods is the amount of water held in reservoirs (this is the scalar resource variable). Prices and rainfall evolve exogenously according to a Markov process (this is the vector-valued exogenous state variable).

We propose a Monte Carlo-based algorithm based on the principles of approximate dynamic programming with a lookup-table value function (since the state is discrete).
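To make the intractability concrete, the following is a minimal sketch (all names, sizes, and the deterministic toy transition are invented for illustration, not taken from the paper) of solving equation (1) exactly by backward induction on a small discretized state S_t = (R_t, W_t). Even a three-dimensional exogenous vector multiplies the state count quickly.

```python
# Hypothetical miniature of equation (1): exact backward dynamic programming
# over a discretized state S_t = (R_t, W_t). All names, sizes, and the
# contribution/transition functions are illustrative stand-ins.
import itertools

T = 3                      # planning horizon
R_levels = range(6)        # scalar resource levels R_t
W_levels = list(itertools.product(range(4), repeat=3))  # 3-dim exogenous W_t
gamma = 0.95

# Placeholder one-period contribution and (deterministic, for brevity) transition;
# the real model has W_t evolving as a Markov process and an expectation in (1).
def contribution(R, W, x):
    return min(x, R) * (1 + sum(W)) - 0.1 * x

def next_R(R, x):
    return max(R - x, 0)

V = {(R, W): 0.0 for R in R_levels for W in W_levels}  # V_T(S) = 0
for t in reversed(range(T)):
    V = {
        (R, W): max(
            contribution(R, W, x) + gamma * V[(next_R(R, x), W)]
            for x in range(R + 1)
        )
        for R in R_levels for W in W_levels
    }

# Even this toy sweep touches |R| * |W| = 6 * 64 = 384 states per period;
# each extra exogenous dimension multiplies the count again.
print(len(R_levels) * len(W_levels))  # 384
```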
We exploit the common property that C_t(S_t, x_t) is concave in the resource dimension R_t, reflecting the property that additional resources contribute diminishing marginal returns. Approximate dynamic programming makes decisions by solving problems of the form

    v̂^n_t = max_{x_t ∈ X^n_t} ( C_t(S^n_t, x_t) + γ E{ V̄^{n-1}_{t+1}(S_{t+1}) | S^n_t } )   (2)

where V̄^{n-1}_t(s) is an approximation of the value function at state s, and where X^n_t is the feasible region, which may depend on S^n_t. Let x^n_t be the value of x_t that solves (2). We use a lookup-table representation of the value function, which means that we perform updates using recursions of the form

    V̄^n_t(S^n_t) = (1 − α_{n-1}) V̄^{n-1}_t(S^n_t) + α_{n-1} v̂^n_t.

The power of lookup tables is their generality, since they make no assumptions about the structure of the value function. However, virtually the entire literature that uses lookup tables also assumes small action spaces that make it possible to search over all possible decisions (see Sutton & Barto (1998)). In our work, x_t is a vector, which generally forces us to use some type of math programming algorithm. The structure of our problem makes this possible.

In our algorithm, S^n_{t+1} = f(S^n_t, x^n_t, W_{t+1}(ω^n)), which means that the next state is chosen greedily. We do not require any form of state exploration, a common strategy to obtain a proof of convergence which does not necessarily provide a practical algorithm (see Bertsekas & Tsitsiklis (1996) and Powell (2007) for complete discussions of these issues). But we do introduce a step that maintains concavity in the resource dimension. We prove that the algorithm converges almost surely to an optimal policy. Our proof technique combines the convergence proof for a monotone mapping in Bertsekas & Tsitsiklis (1996) with the proof of the SPAR algorithm in Powell et al. (2004). Current proofs of convergence for approximate dynamic programming algorithms such as Q-learning (Tsitsiklis (1994), Jaakkola et al. (1994)) and optimistic policy iteration (Tsitsiklis (2002)) require that we visit states (and possibly actions) infinitely often. A convergence proof for a Real Time Dynamic Programming (Barto et al. (1995)) algorithm that considers a pure exploitation scheme is provided in Bertsekas & Tsitsiklis (1996) [Prop.
5.3 and 5.4], but it assumes that expected values can be computed and the initial approximations are optimistic.
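The lookup-table smoothing recursion above can be sketched in a few lines. This is a toy illustration with invented numbers: a single fixed state, synthetic noisy observations, and a harmonic stepsize (one of many valid choices satisfying the usual stepsize conditions).

```python
# Minimal sketch of the lookup-table smoothing recursion
# Vbar^n(S) = (1 - alpha_{n-1}) * Vbar^{n-1}(S) + alpha_{n-1} * vhat^n,
# at one fixed state S, using a harmonic stepsize; vhat is synthetic noise
# around a made-up true value.
import random

random.seed(7)
true_value = 10.0
vbar = 0.0           # initial approximation at state S
visits = 0

for n in range(1, 2001):
    visits += 1
    alpha = 1.0 / visits                    # stepsize 1/N(S): plain averaging
    vhat = true_value + random.gauss(0, 1)  # noisy observation of the value
    vbar = (1 - alpha) * vbar + alpha * vhat

# With alpha = 1/N, vbar is exactly the sample mean, so it ends up close to 10.
print(vbar)
```

With a state-dependent stepsize like this one, the recursion is an averaging scheme, which is why conditions such as (11)-(12) later in the paper take the form they do.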

We make no such assumptions, but it is important to emphasize that our result depends on the concavity of the optimal value functions. Other approaches to deal with our problem class would be different flavors of Benders decomposition (Van Slyke & Wets (1969), Higle & Sen (1991), Chen & Powell (1999)) and sample average approximation (Shapiro (2003)) (SAA). However, Benders decomposition will not handle an arbitrary type of exogenous information. On the other hand, SAA relies on generating random samples outside of the optimization problems and then solving the corresponding deterministic problems using an appropriate optimization algorithm. Numerical experiments with the SAA approach applied to problems where an integer solution is required can be found in Ahmed & Shapiro (2002).

We begin in section 2 with an illustration of a storage problem. Section 3 gives the optimality equations and describes some properties of the value function. Section 4 describes the algorithm in detail. Section 5 gives the convergence proof, which is the heart of the paper. Section 6 concludes the paper.

2 A storage model

In this section we illustrate a storage problem for an application where product can be purchased from multiple suppliers, sold to multiple customers, and which must be managed in the presence of multiple types of exogenous parameters (such as weather that might affect the demand for the product, or prices of competing products). This model is only an illustration, since the central result of the paper is a convergence proof for a general problem class.

We assume that the exogenous information process W_t is Markov, and that it includes both the demand for the stored asset as well as any other exogenous information. The demand may be vector-valued, representing different customer types, which we represent using D_t = (D_{t1}, ..., D_{t,B^d}), where B^d is the number of different sources of demand. The demand does not need to be fully satisfied, but no backlogging is allowed. We emphasize that different types of information can also be contained in W_t. Examples include rates of return,

interest rates, weather conditions, temperatures and competitive response. We assume the demand vector is integer valued. We also assume the information vector W_t is an element of a set W, which has finite support.

We denote by R_t the scalar storage level right before a decision is taken. We also define S_t = (W_t, R_t) as the pre-decision state, where we use S_t and (W_t, R_t) interchangeably. The decision x_t = (x^d_t, x^s_t, x^r_t) has three components. The first component, x^d_t, is a vector that represents how much to sell to satisfy the demands D_t. The second component, x^s_t, is also a vector that represents how much to buy from B^s different suppliers to replenish the asset level. Finally, x^r_t is a scalar that denotes the quantity that should be transferred (discarded), in case the storage level is too high, even after demand has been satisfied. The feasible set, which is dependent on the current state (W_t, R_t), is given by

    X_t(W_t, R_t) = { x^d_t ∈ R^{B^d}, x^s_t ∈ R^{B^s}, x^r_t ∈ R :
        Σ_{i=1}^{B^d} x^d_{ti} + x^r_t ≤ R_t,
        0 ≤ x^d_{ti} ≤ D_{ti}, i = 1, ..., B^d,
        0 ≤ x^s_{ti} ≤ M_i, i = 1, ..., B^s,
        0 ≤ x^s_{ti} ≤ u^s_t(W_t), i = 1, ..., B^s,
        0 ≤ x^r_t ≤ u^r_t(W_t) },

where u^s_t(·) and u^r_t(·) are nonnegative integer valued functions and M_i is a deterministic integer bound. The first constraint indicates that backlogging is not allowed, the second indicates that we do not sell more than is demanded and the third indicates that each supplier can only offer a limited amount of assets. The last two constraints impose upper bounds that vary with the information vector W_t. The contribution generated by the decision is given by

    C_t(W_t, R_t, x_t) = − Σ_{i=1}^{B^d} c^u_i(W_t)(D_{ti} − x^d_{ti}) − c^o(W_t)( R_t − Σ_{i=1}^{B^d} x^d_{ti} ) − c^r(W_t) x^r_t − Σ_{i=1}^{B^s} c^s_i(W_t) x^s_{ti} + Σ_{i=1}^{B^d} c^d_i(W_t) x^d_{ti},

where c^u_i(·), c^o(·), c^r(·), c^s_i(·) and c^d_i(·) are nonnegative scalar functions. The first three terms represent underage, overage and transfer costs, respectively. The fourth term represents the buying cost and the last one represents the money that is made satisfying demand.

The decision takes the system to a new storage level, denoted by the scalar R^x_t, given by

    R^x_t = f^x(R_t, x_t) = R_t − Σ_{i=1}^{B^d} x^d_{ti} + Σ_{i=1}^{B^s} x^s_{ti} − x^r_t,

where f^x(R_t, x_t) is our transition function that returns the post-decision resource state. In the remainder of the paper, we use f^x(R_t, x_t) without assuming any specific structure. We define S^x_t = (W_t, R^x_t) as the post-decision state, which is the state immediately after we have made a decision but before any new information has arrived. As with the pre-decision state, we use S^x_t and (W_t, R^x_t) interchangeably. New information W_{t+1} becomes available at time period t+1 and the storage level evolves to R_{t+1} = R^x_t + R̂_{t+1}, where R̂_{t+1} is a nonnegative integer valued function that represents exogenous changes in the storage level (such as rainfall into a reservoir, or hurricane damage to power generation capacity). Note that in this illustration, the decision x_t impacts the asset level in a linear way. Moreover, it does not influence in any way the information vector W_{t+1}. For completeness, we denote by R_0 the initial asset level, which is assumed to be a nonnegative integer. Given our assumptions, we have that the pre/post asset levels are nonnegative and bounded from above. We let B^pre and B^pos be positive integers that are upper bounds for R_t and R^x_t, respectively.

3 The Optimal Value Functions

We define, recursively, the optimal value functions associated with the storage class. We denote by V_t(W_t, R_t) the optimal value function around the pre-decision state (W_t, R_t) and

by V^x_t(W_t, R^x_t) the optimal value function around the post-decision state (W_t, R^x_t). At time t = T, since it is the end of the planning horizon, the value of being in any state (W_T, R^x_T) is zero. Hence, V^x_T(W_T, R^x_T) = 0. At time t−1, for t = T, ..., 1, the value of being in any post-decision state (W_{t−1}, R^x_{t−1}) does not involve the solution of a deterministic optimization problem; it only involves an expectation, since the next pre-decision state (W_t, R_t) only depends on the information that first becomes available at t. On the other hand, the value of being in any pre-decision state (W_t, R_t) does not involve expectations, since the next post-decision state (W_t, R^x_t) is a deterministic function of W_t; it only requires the solution of an optimization problem. Therefore,

    V^x_{t−1}(W_{t−1}, R^x_{t−1}) = E[ V_t(W_t, R_t) | (W_{t−1}, R^x_{t−1}) ],   (3)
    V_t(W_t, R_t) = max_{x_t ∈ X_t(W_t, R_t)} C_t(W_t, R_t, x_t) + γ V^x_t(W_t, R^x_t).   (4)

In the remainder of the paper, we only use the value function V^x_t(S^x_t) defined around the post-decision state variable, since it allows us to make decisions by solving a deterministic problem as in (4). We show that V^x_t(W_t, ·) is concave and piecewise linear with break points R = 1, ..., B^pos. This structural property combined with the optimization/expectation inversion is the foundation of our algorithmic strategy and its proof of convergence.

For W ∈ W, let v_t(W) = (v_t(W, 1), ..., v_t(W, B^pos)) be a vector representing the slopes of a function V_t(W, ·) : [0, ∞) → R that is concave and piecewise linear with breakpoints r = 1, ..., B^pos. It is easy to see that the function F_t(v_t(W), W, ·), where

    F_t(v_t(W), W, R) = max_{x,y} C_t(W, R, x) + γ Σ_{r=1}^{B^pos} v_t(W, r) y_r
        subject to x ∈ X_t(W, R),
        Σ_{r=1}^{B^pos} y_r = f^x(R, x),
        0 ≤ y ≤ 1,

is also concave and piecewise linear with breakpoints r = 1, ..., B^pos, since the demand vector D_t and the upper bounds u^s_t(·), u^r_t(·) and M_i in the constraint set X_t(W, R) are all integer valued. Moreover, the optimal solution (x*, y*) to the linear programming problem that defines F_t(v_t(W), W, R) does not depend on V_t(W, 0) and is an integer vector

whenever the argument R is integer. We also have that F_t(v_t(W), W, R) is bounded for all W ∈ W. We use F_t to prove the following proposition about the optimal value function.

Proposition 1. For t = 0, ..., T and information vector W ∈ W, the optimal value function V^x_t(W, ·) is concave and piecewise linear with breakpoints R = 1, ..., B^pos. We denote its slopes by v_t(W) = (v_t(W, 1), ..., v_t(W, B^pos)), where, for R = 1, ..., B^pos and t < T, v_t(W, R) is given by

    v_t(W, R) = V^x_t(W, R) − V^x_t(W, R−1)
              = E[ F_{t+1}(v_{t+1}(W_{t+1}), W_{t+1}, R + R̂_{t+1}(W_{t+1})) − F_{t+1}(v_{t+1}(W_{t+1}), W_{t+1}, R − 1 + R̂_{t+1}(W_{t+1})) | W ].   (5)

Proof. The proof is by backward induction on t. The base case t = T holds as V^x_T(W_T, ·) is equal to zero for all W_T ∈ W_T. For t < T the proof is obtained noting that

    V^x_t(W_t, R^x_t) = E[ γ V^x_{t+1}(W_{t+1}, 0) + F_{t+1}(v_{t+1}(W_{t+1}), W_{t+1}, R^x_t + R̂_{t+1}(W_{t+1})) | W_t ].

Due to the concavity of V^x_t(W, ·), the slope vector v_t(W) is monotone decreasing, that is, v_t(W, R) ≥ v_t(W, R+1). Moreover, throughout the paper, we work with the translated version V_t(W, ·) of V^x_t(W, ·) given by V_t(W, R+y) = Σ_{r=1}^R v_t(W, r) + y v_t(W, R+1), where R is a nonnegative integer and 0 ≤ y ≤ 1, since the optimal solution (x*, y*) associated with F_t(v_{t+1}(W_{t+1}), W_{t+1}, R) does not depend on V^x_t(W, 0).

We next introduce the dynamic programming operator H associated with the storage class. We define H using the slopes of piecewise linear functions instead of the functions themselves. Let v = {v_t(W) for t = 0, ..., T, W ∈ W} be a set of slope vectors, where v_t(W) = (v_t(W, 1), ..., v_t(W, B^pos)). The dynamic programming operator H associated with the storage class maps a set of slope vectors v into a new set Hv as follows. For t = 0, ..., T−1,

W ∈ W and R = 1, ..., B^pos,

    (Hv)_t(W, R) = E[ F_{t+1}(v_{t+1}(W_{t+1}), W_{t+1}, R + R̂_{t+1}(W_{t+1})) − F_{t+1}(v_{t+1}(W_{t+1}), W_{t+1}, R − 1 + R̂_{t+1}(W_{t+1})) | W ].   (6)

A well known property of dynamic programming operators is that the optimal value functions are their unique fixed point (see, for example, Puterman (1994), Theorem 6.1.1). Therefore, the set of slopes v corresponding to the optimal value functions V_t(W, R) for t = 0, ..., T, W ∈ W and R = 1, ..., B^pos, is the unique fixed point of H. Moreover, the dynamic programming operator H defined by (6) is assumed to satisfy the following conditions. Let ṽ = {ṽ_t(W) for t = 0, ..., T, W ∈ W} and ṽ' = {ṽ'_t(W) for t = 0, ..., T, W ∈ W} be sets of slope vectors such that ṽ_t(W) = (ṽ_t(W, 1), ..., ṽ_t(W, B^pos)) and ṽ'_t(W) = (ṽ'_t(W, 1), ..., ṽ'_t(W, B^pos)) are monotone decreasing and ṽ_t(W) ≤ ṽ'_t(W). Then, for W ∈ W and R = 1, ..., B^pos:

    (Hṽ)_t(W) is monotone decreasing,   (7)
    (Hṽ)_t(W, R) ≤ (Hṽ')_t(W, R),   (8)
    (Hṽ)_t(W, R) − ηe ≤ (H(ṽ − ηe))_t(W, R) ≤ (H(ṽ + ηe))_t(W, R) ≤ (Hṽ)_t(W, R) + ηe,   (9)

where η is a positive integer and e is a vector of ones. The conditions in equations (7) and (9) imply that the mapping H is continuous (see the discussion in Bertsekas & Tsitsiklis (1996), page 158). The dynamic programming operator H and the associated conditions (7)-(9) are used later on to construct deterministic sequences that are provably convergent to the optimal slopes.

4 The SPAR-Storage Algorithm

We propose a pure exploitation algorithm, namely the SPAR-Storage Algorithm, that provably learns the optimal decisions to be taken at parts of the state space that can be reached by an optimal policy, which are determined by the algorithm itself. This is accomplished

by learning the slopes of the optimal value functions at important parts of the state space, through the construction of value function approximations V̄^n_t(W, ·) that are concave and piecewise linear with breakpoints R = 1, ..., B^pos. The approximation is represented by its slopes v̄^n_t(W) = (v̄^n_t(W, 1), ..., v̄^n_t(W, B^pos)). Figure 1 illustrates the idea. In order to do so, the algorithm combines Monte Carlo simulation in a pure exploitation scheme and stochastic approximation integrated with a projection operation.

Figure 1: Optimal value function V_t(W, ·) (unknown) and the constructed approximation V̄^n_t(W, ·), shown with their slopes v_t(W, R) and v̄^n_t(W, R) over the asset dimension, highlighting the important region.

Figure 2 describes the SPAR-Storage algorithm. The algorithm requires an initial concave piecewise linear value function approximation V̄^0_t(W, ·), represented by its slopes v̄^0_t(W) = (v̄^0_t(W, 1), ..., v̄^0_t(W, B^pos)), for each information vector W ∈ W. Therefore the initial slope vector v̄^0_t(W) has to be monotone decreasing. For example, it is valid to set all the initial slopes equal to zero. For completeness, since we know that the optimal value function at the end of the horizon is equal to zero, we set v̄^n_T(W_T, R) = 0 for all iterations n, information vectors W_T ∈ W_T and asset levels R = 1, ..., B^pos. The algorithm also requires an initial asset level to be used in all iterations. Thus, for all n ≥ 0, R^{x,n}_{−1} is set to be a nonnegative integer, as in STEP 0b.

At the beginning of each iteration n, the algorithm observes a sample realization of the information sequence W^n_0, ..., W^n_T, as in STEP 1. The sample can be obtained from a sample generator or actual data. After that, the algorithm goes over time periods t = 0, ..., T. First, the pre-decision asset level R^n_t is computed, as in STEP 2. Then, the decision x^n_t, which is optimal with respect to the current pre-decision state (W^n_t, R^n_t) and value function approximation V̄^{n-1}_t(W^n_t, ·), is taken, as stated in STEP 3.
We have that V̄^n_t(W, R+y) = Σ_{r=1}^R v̄^n_t(W, r) + y v̄^n_t(W, R+1), where R is a nonnegative integer and 0 ≤ y ≤ 1.

STEP 0: Algorithm initialization:
  STEP 0a: Initialize v̄^0_t(W) for t = 1, ..., T−1 and W ∈ W, monotone decreasing.
  STEP 0b: Set R^{x,n}_{−1} = r, a nonnegative integer, for all n ≥ 0.
  STEP 0c: Set n = 1.
STEP 1: Sample/observe the information sequence W^n_0, ..., W^n_T. Do for t = 0, ..., T:
  STEP 2: Compute the pre-decision asset level: R^n_t = R^{x,n}_{t−1} + R̂_t(W^n_t).
  STEP 3: Find the optimal solution x^n_t of
      max_{x_t ∈ X_t(W^n_t, R^n_t)} C_t(W^n_t, R^n_t, x_t) + γ V̄^{n-1}_t(W^n_t, f^x(R^n_t, x_t)).
  STEP 4: Compute the post-decision asset level: R^{x,n}_t = f^x(R^n_t, x^n_t).
  STEP 5: Update slopes: if t < T then
    STEP 5a: Observe v̂^n_{t+1}(R^{x,n}_t) and v̂^n_{t+1}(R^{x,n}_t + 1). See (10).
    STEP 5b: For W ∈ W and R = 1, ..., B^pos:
        z^n_t(W, R) = (1 − ᾱ^n_t(W, R)) v̄^{n-1}_t(W, R) + ᾱ^n_t(W, R) v̂^n_{t+1}(R).
    STEP 5c: Perform the projection operation v̄^n_t = Π_C(z^n_t). See (14).
STEP 6: Increase n by one and go to STEP 1.

Figure 2: SPAR-Storage Algorithm

Next, taking into account the decision, the algorithm computes the post-decision asset level R^{x,n}_t, as in STEP 4. Time period t is concluded by updating the slopes of the value function approximation. Steps 5a-5c describe the procedure and figure 3 illustrates it. Sample slopes relative to the post-decision states (W^n_t, R^{x,n}_t) and (W^n_t, R^{x,n}_t + 1) are observed, see STEP 5a and figure 3a.
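The forward pass of one iteration (STEPS 1-4) can be sketched as follows. Everything here is a toy stand-in: the information process, contribution, and exogenous inflow are invented, and STEP 3 is solved by brute-force search over a scalar sell quantity instead of the linear program the paper assumes.

```python
# Schematic single iteration (fixed n) of the SPAR-Storage forward pass.
# All model ingredients below are illustrative, not from the paper.
import random

random.seed(0)
T, B_pos, gamma = 4, 5, 0.9

def sample_information():          # STEP 1: observe W_0^n, ..., W_T^n
    return [random.choice([0, 1]) for _ in range(T + 1)]

def contribution(W, R, x):         # toy C_t(W_t, R_t, x_t): sell x at price 1+W
    return (1 + W) * x

def vbar_eval(slope_vec, R):       # Vbar(W, R) = sum of the first R slopes
    return sum(slope_vec[:R])

# slopes[t][W] holds (vbar(W,1), ..., vbar(W,B_pos)); all zero is a valid
# monotone-decreasing initialization (STEP 0a)
slopes = {t: {W: [0.0] * B_pos for W in (0, 1)} for t in range(T + 1)}
R_post = 2                         # STEP 0b: initial post-decision level

W_seq = sample_information()
for t in range(T):
    Rhat = random.choice([0, 1])               # exogenous inflow R̂_t
    R_pre = min(R_post + Rhat, B_pos)          # STEP 2
    W = W_seq[t]
    # STEP 3: greedy decision by enumeration (x = amount sold, x <= R_pre)
    best_x = max(
        range(R_pre + 1),
        key=lambda x: contribution(W, R_pre, x)
        + gamma * vbar_eval(slopes[t][W], R_pre - x),
    )
    R_post = R_pre - best_x                    # STEP 4
    print(t, W, R_pre, best_x)                 # trace of the forward pass
# (STEP 5, the slope update and projection, is sketched separately below.)
```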

After that, these samples are used to update the approximation slopes v̄^{n-1}_t(W^n_t), through a temporary slope vector z^n_t(W^n_t). This procedure requires the use of a stepsize rule that is state dependent, denoted by ᾱ^n_t(W, R), and it may lead to a violation of the property that the slopes are monotonically decreasing, see STEP 5b and figure 3b. Thus, a projection operation Π_C is performed to restore the property and updated slopes v̄^n_t(W^n_t) are obtained, see STEP 5c and figure 3c. After the end of the planning horizon T is reached, the iteration counter is incremented, as in STEP 6, and a new iteration is started from STEP 1.

We have that x^n_t is easily computed, since it is the first component of the optimal solution to the linear programming problem in the definition of F_t(v̄^{n-1}_t(W^n_t), W^n_t, R^n_t). Moreover, given our assumptions and the properties of F_t discussed in the previous section, it is clear that R^n_t, x^n_t and R^{x,n}_t are all integer valued. We also know that they are bounded. Therefore, the sequences of decisions, pre and post decision states generated by the algorithm, which are denoted by {x^n_t}_{n≥0}, {S^n_t}_{n≥0} = {(W^n_t, R^n_t)}_{n≥0} and {S^{x,n}_t}_{n≥0} = {(W^n_t, R^{x,n}_t)}_{n≥0}, respectively, have at least one accumulation point. Since these are sequences of random variables, their accumulation points, denoted by x_t, S_t and S^x_t, respectively, are also random variables.

The sample slopes used to update the approximation slopes are obtained by replacing the expectation and the slopes v_{t+1}(W_{t+1}) of the optimal value function in (5) by a sample realization of the information W^n_{t+1} and the current slope approximation v̄^{n-1}_{t+1}(W^n_{t+1}), respectively. Thus, for t = 0, ..., T−1, the sample slope is given by

    v̂^n_{t+1}(R) = F_{t+1}(v̄^{n-1}_{t+1}(W^n_{t+1}), W^n_{t+1}, R + R̂_{t+1}(W^n_{t+1})) − F_{t+1}(v̄^{n-1}_{t+1}(W^n_{t+1}), W^n_{t+1}, R − 1 + R̂_{t+1}(W^n_{t+1})).   (10)

The update procedure is then divided into two parts. First, a temporary set of slope vectors z^n_t = {z^n_t(W, R) : W ∈ W, R = 1, ..., B^pos} is produced combining the current approximation and the sample slopes using the stepsize rule ᾱ^n_t(W, R).
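The two-part update (STEPS 5b-5c) can be sketched for one information vector W = W^n_t: smooth the two sampled slopes into the vector, then restore monotonicity with the level projection of (14), which overwrites violating neighbors with the newly updated values. The numbers and the stepsize below are illustrative.

```python
# Sketch of STEPS 5b-5c at one information vector W = W_t^n.
# The level projection forces slopes left of R_post up to the updated value
# and slopes right of R_post + 1 down to it, so the vector is monotone
# decreasing (concavity restored). Inputs are invented for illustration.

def update_and_project(vbar, R_post, vhat_lo, vhat_hi, alpha):
    """vbar: slopes (vbar(W,1), ..., vbar(W,B_pos)); index R-1 holds slope R."""
    z = list(vbar)                                               # temporary z^n
    z[R_post - 1] = (1 - alpha) * z[R_post - 1] + alpha * vhat_lo  # STEP 5b
    z[R_post] = (1 - alpha) * z[R_post] + alpha * vhat_hi
    # STEP 5c, level projection (14): only violating slopes are overwritten
    for R in range(R_post - 1):              # 1-indexed R < R_post
        z[R] = max(z[R], z[R_post - 1])
    for R in range(R_post + 1, len(z)):      # 1-indexed R > R_post + 1
        z[R] = min(z[R], z[R_post])
    return z

v = update_and_project([9.0, 7.0, 5.0, 3.0, 1.0], R_post=2,
                       vhat_lo=2.0, vhat_hi=0.0, alpha=0.5)
print(v)  # [9.0, 4.5, 2.5, 2.5, 1.0] -- monotone decreasing again
assert all(v[i] >= v[i + 1] for i in range(len(v) - 1))
```

Note that the projection touches only slopes that violate monotonicity relative to the two freshly updated entries; all other slopes are left unchanged, which is what makes the operation a "level" projection.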
We have that

    ᾱ^n_t(W, R) = α^n_t 1_{W = W^n_t} ( 1_{R = R^{x,n}_t} + 1_{R = R^{x,n}_t + 1} ),

where α^n_t is a scalar between 0 and 1 and can depend only on information that became available up until iteration n and time t. Moreover, on the event that (W_t, R_t) is an accumulation point of {(W^n_t, R^{x,n}_t)}_{n≥0}, we make the standard assumptions that

    Σ_{n=1}^∞ ᾱ^n_t(W_t, R_t) = ∞ a.s.,   (11)
    Σ_{n=1}^∞ (ᾱ^n_t(W_t, R_t))^2 ≤ B^α < ∞ a.s.,   (12)

where B^α is a constant. Clearly, the rule α^n_t = 1/NV^n_t(W_t, R_t) satisfies all the conditions, where NV^n_t(W_t, R_t) is the number of visits to state (W_t, R_t) up until iteration n.

Figure 3: Update procedure of the approximate slopes. (3a: current approximate function V̄^{n-1}_t(W^n_t, ·), optimal decision x^n_t and sampled slopes v̂^n_{t+1}(R^{x,n}_t), v̂^n_{t+1}(R^{x,n}_t + 1); 3b: temporary approximate function with violation of concavity; 3c: level projection operation, updated approximate function with concavity restored.)

Furthermore,

for all positive integers N,

    Π_{n=N}^∞ (1 − ᾱ^n_t(W_t, R_t)) = 0 a.s.   (13)

The proof for (13) follows directly from the fact that log(1 + x) ≤ x.

The second part is the projection operation, where the temporary slope vector z^n_t(W), which may not be monotone decreasing, is transformed into another slope vector v̄^n_t(W) that has this structural property. The projection operator imposes the desired property by simply forcing the violating slopes to be equal to the newly updated ones. For W ∈ W and R = 1, ..., B^pos, the projection is given by

    Π_C(z^n_t)(W, R) =
        z^n_t(W^n_t, R^{x,n}_t),      if W = W^n_t, R < R^{x,n}_t, z^n_t(W, R) ≤ z^n_t(W^n_t, R^{x,n}_t),
        z^n_t(W^n_t, R^{x,n}_t + 1),  if W = W^n_t, R > R^{x,n}_t + 1, z^n_t(W, R) ≥ z^n_t(W^n_t, R^{x,n}_t + 1),
        z^n_t(W, R),                  otherwise.   (14)

For t = 0, ..., T, information vector W ∈ W and asset level R = 1, ..., B^pos, the sequence of slopes of the value function approximation generated by the algorithm is denoted by {v̄^n_t(W, R)}_{n≥0}. Moreover, as the function F_t(v̄^{n-1}_t(W^n_t), W^n_t, R^n_t) is bounded and the stepsizes α^n_t are between 0 and 1, we can easily see that the sample slopes v̂^n_t(R), the temporary slopes z^n_t(W, R) and, consequently, the approximated slopes v̄^n_t(W, R) are all bounded. Therefore, the slope sequence {v̄^n_t(W, R)}_{n≥0} has at least one accumulation point, as the projection operation guarantees that the updated vectors of slopes are elements of a compact set. The accumulation points are random variables and are denoted by v̄*_t(W, R), as opposed to the deterministic optimal slopes v_t(W, R). We conclude by denoting by B^v the deterministic integer that bounds v̂^n_t(R), z^n_t(W, R), v̄^n_t(W, R) and v̄*_t(W, R) for all W ∈ W and R = 1, ..., B^pos.

5 Convergence Analysis

We start this section presenting the convergence results we want to prove. The major result is the almost sure convergence of the approximation slopes corresponding to states

that are visited infinitely often. On the event that (W_t, R_t) is an accumulation point of {(W^n_t, R^{x,n}_t)}_{n≥0}, we obtain

    v̄^n_t(W_t, R_t) → v_t(W_t, R_t) and v̄^n_t(W_t, R_t + 1) → v_t(W_t, R_t + 1) a.s.

As a byproduct of the previous result, we show that, for t = 0, ..., T, on the event that (W_t, R_t, x_t) is an accumulation point of {(W^n_t, R^n_t, x^n_t)}_{n≥0},

    x_t = arg max_{x ∈ X_t(W_t, R_t)} C_t(W_t, R_t, x) + γ V_t(W_t, f^x(R_t, x)) a.s.,   (15)

where V_t(W_t, ·) is the translated optimal value function.

Equation (15) implies that the algorithm has learned almost surely an optimal decision for all states that can be reached by an optimal policy. This implication can be easily justified as follows. Pick ω in the sample space. We omit the dependence of the random variables on ω for the sake of clarity. For t = 0, since R^{x,n}_{−1} = r, a given constant, for all iterations of the algorithm, we have that R_{−1} = r. Moreover, all the elements in W_0 are accumulation points of {W^n_0}_{n≥0}, as W_0 has finite support. Thus, (15) tells us that the accumulation points x_0 of the sequence {x^n_0}_{n≥0} along the iterations with pre-decision state (W_0, R_{−1} + R̂_0(W_0)) are in fact an optimal policy for period 0 when the information is W_0. This implies that all accumulation points R^x_0 = f^x(R_{−1} + R̂_0(W_0), x_0) of {R^{x,n}_0}_{n≥0} are post-decision asset levels that can be reached by an optimal policy. By the same token, for t = 1, every element in W_1 is an accumulation point of {W^n_1}_{n≥0}. Hence, (15) tells us that the accumulation points x_1 of the sequence {x^n_1}_{n≥0} along iterations with (W^n_1, R^n_1) = (W_1, R^x_0 + R̂_1(W_1)) are indeed an optimal policy for period 1 when the information is W_1 and the pre-decision asset level is R_1 = R^x_0 + R̂_1(W_1). As before, the accumulation points R^x_1 = f^x(R_1, x_1) of {R^{x,n}_1}_{n≥0} are post-decision asset levels that can be reached by an optimal policy. The same reasoning can be applied for t = 2, ..., T.

5.1 Outline of the Convergence Proofs

Our proofs follow the ideas presented in Bertsekas & Tsitsiklis (1996) and in Powell et al. (2004).
The first proves convergence assuming that all states are visited infinitely often.

The authors do not consider a concavity-preserving step, which is the key element that has allowed us to obtain a convergence proof when a pure exploitation scheme is considered. Although the framework in Powell et al. (2004) also considers the concavity of the optimal value functions in the asset dimension, the use of a projection operation to restore concavity and a pure exploitation routine, their proof is restricted to two-stage problems.

The main concept to achieve the convergence of the approximation slopes to the optimal ones is to construct deterministic sequences of slopes, namely, {L^k_t(W, R)}_{k≥0} and {U^k_t(W, R)}_{k≥0}, that are provably convergent to the slopes of the optimal value functions. These sequences are based on the dynamic programming operator H, as introduced in (6). We then use these sequences to prove almost surely that for all k ≥ 0,

    L^k_t(W_t, R_t) ≤ v̄^n_t(W_t, R_t) ≤ U^k_t(W_t, R_t),   (16)
    L^k_t(W_t, R_t + 1) ≤ v̄^n_t(W_t, R_t + 1) ≤ U^k_t(W_t, R_t + 1),   (17)

on the event that the iteration n is sufficiently large and (W_t, R_t) is an accumulation point of {(W^n_t, R^{x,n}_t)}_{n≥0}, which implies the convergence of the approximation slopes to the optimal ones.

Establishing (16) and (17) requires several intermediate steps that need to take into consideration the pure exploitation nature of our algorithm and the concavity preserving operation. We give all the details in the proof of L^k_t(W_t, R_t) ≤ v̄^n_t(W_t, R_t) and L^k_t(W_t, R_t + 1) ≤ v̄^n_t(W_t, R_t + 1). The upper bound inequalities are obtained using a symmetrical argument. First, we define two auxiliary stochastic sequences of slopes, namely, the noise and the bounding sequences, denoted by {s̄^n_t(W, R)}_{n≥0} and {l̄^n_t(W, R)}_{n≥0}, respectively. The first sequence represents the noise introduced by the observation of the sample slopes, which replaces the observation of true expectations and the optimal slopes. The second one is a convex combination of the deterministic sequence L^k_t(W, R) and the transformed sequence (HL^k)_t(W, R). The two stochastic sequences are used to show that, on the event that the

iteration n is big enough and (W_t, ·) is an element of the random set S_t,

    v̄^{n-1}_t(W_t, ·) ≥ l̄^{n-1}_t(W_t, ·) − s̄^{n-1}_t(W_t, ·) a.s.

The set S_t contains the states (W_t, R_t) and (W_t, R_t + 1), such that (W_t, R_t) is an accumulation point of {(W^n_t, R^{x,n}_t)}_{n≥0} and the projection operation decreased or kept the same the corresponding unprojected slopes infinitely often. Then, on {(W_t, ·) ∈ S_t}, convergence to zero of the noise sequence, the convex combination property of the bounding sequence and the monotone decreasing property of the approximate slopes will give us L^k_t(W_t, ·) ≤ v̄^n_t(W_t, ·) a.s. Note that this inequality does not cover all the accumulation points of {(W^n_t, R^{x,n}_t)}_{n≥0}, since they are restricted to states in the set S_t. Nevertheless, this inequality and some properties of the projection operation are used to fulfill the requirements of a bounding technical lemma, which is used repeatedly to obtain the desired lower bound inequalities for all accumulation points.

In order to prove (15), we note that

    F_t(v̄^{n-1}_t(W^n_t), W^n_t, R^n_t, x) = C_t(W^n_t, R^n_t, x) + γ V̄^{n-1}_t(W^n_t, f^x(R^n_t, x))

is a concave function of x and X_t(W^n_t, R^n_t) is a convex set. Therefore,

    0 ∈ ∂F_t(v̄^{n-1}_t(W^n_t), W^n_t, R^n_t, x^n_t) − X_N(W^n_t, R^n_t, x^n_t),

where x^n_t is the optimal decision of the optimization problem in STEP 3 of the algorithm, ∂F_t(v̄^{n-1}_t(W^n_t), W^n_t, R^n_t, x^n_t) is the subdifferential of F_t(v̄^{n-1}_t(W^n_t), W^n_t, R^n_t, ·) at x^n_t and X_N(W^n_t, R^n_t, x^n_t) is the normal cone of X_t(W^n_t, R^n_t) at x^n_t. This inclusion and the first convergence result are then combined to prove, as shown in section 5.4, that

    0 ∈ ∂F_t(v_t(W_t), W_t, R_t, x_t) − X_N(W_t, R_t, x_t).
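The optimality inclusion can be checked numerically in one dimension. The following toy example (all numbers invented) maximizes a concave piecewise-linear function over a box and verifies that zero lies in the slope interval (the superdifferential) at the maximizer, which is the one-dimensional form of the inclusion above at an interior point, where the normal cone reduces to {0}.

```python
# One-dimensional illustration (invented numbers) of the optimality inclusion:
# for a concave piecewise-linear F maximized over the box [0, 4], the
# subdifferential at an interior maximizer must contain 0.
slopes = [3.0, 1.0, -2.0, -4.0]      # slope of F on [0,1], [1,2], [2,3], [3,4]

def F(x):  # piecewise-linear and concave, since the slopes are decreasing
    val = 0.0
    for i, s in enumerate(slopes):
        seg = min(max(x - i, 0.0), 1.0)   # length of segment i covered by x
        val += s * seg
    return val

# A concave piecewise-linear function attains its maximum at a breakpoint.
x_star = max(range(5), key=F)
# At an interior breakpoint x, the subdifferential of F is the interval
# [slopes[x], slopes[x-1]] (right slope, left slope); the normal cone of the
# box at an interior point is {0}, so optimality means 0 lies in that interval.
sub_lo, sub_hi = slopes[x_star], slopes[x_star - 1]
print(x_star, (sub_lo, sub_hi))      # 2 (-2.0, 1.0): 0 is inside
assert sub_lo <= 0.0 <= sub_hi
```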

5.2 Technical Elements

In this section, we set the stage for the convergence proofs by defining some technical elements. We start with the definition of the deterministic sequence {L^k_t(W, R)}_{k≥0}. For t = 0, ..., T−1, W ∈ W and R = 1, ..., B^pos, we have that

    L^0_t(W, R) = v_t(W, R) − 2B^v and L^{k+1}_t(W, R) = ( L^k_t(W, R) + (HL^k)_t(W, R) ) / 2.   (18)

At the end of the planning horizon T, L^k_T(W_T, R) = 0 for all k ≥ 0. The proposition below introduces the required properties of the deterministic sequence {L^k_t(W, R)}_{k≥0}. Its proof is deferred to the appendix.

Proposition 2. For t = 0, ..., T−1, information vector W ∈ W and asset levels R = 1, ..., B^pos,

    L^k_t(W) is monotone decreasing,   (19)
    (HL^k)_t(W, R) ≥ L^{k+1}_t(W, R) ≥ L^k_t(W, R),   (20)
    L^k_t(W, R) < v_t(W, R), and lim_{k→∞} L^k_t(W, R) = v_t(W, R).   (21)

The deterministic sequence {U^k_t(W, R)}_{k≥0} is defined in a symmetrical way. It also has the properties stated in proposition 2, with the reversed inequality signs.

We move on to define the random index N that is used to indicate when an iteration of the algorithm is large enough for convergence analysis purposes. Let N be the smallest integer such that all states (actions) visited (taken) by the algorithm after iteration N are accumulation points of the sequence of states (actions) generated by the algorithm. In fact, N can be required to satisfy other constraints of the type: if an event did not happen infinitely often, then it did not happen after N. Since we need N to be finite almost surely, the additional number of constraints has to be finite.

We introduce the set of iterations, namely N_t(W, R), that keeps track of the effects produced by the projection operation. For W ∈ W and R = 1, ..., B^pos, let N_t(W, R) be the set of iterations in which the unprojected slope corresponding to state (W, R), that is,

z^n_t(W, R), was too large and had to be decreased by the projection operation. Formally,

    N_t(W, R) = { n ∈ N : z^n_t(W, R) > v̄^n_t(W, R) }.

For example, based on figure 3c, n ∈ N_t(W^n_t, R^n_t + 2). A related set is the set of states S_t. A state (W, R) is an element of S_t if (W, R) is equal to an accumulation point (W_t, R_t) of {(W^n_t, R^{x,n}_t)}_{n≥0} or is equal to (W_t, R_t + 1). Its corresponding approximate slope also has to satisfy the condition z^n_t(W, R) ≤ v̄^n_t(W, R) for all n ≥ N, that is, the projection operation increased or kept the same the corresponding unprojected slopes infinitely often.

We close this section dealing with measurability issues. Let (Ω, F, P) be the probability space under consideration. The sigma-algebra F is defined by F = σ{(W^n_t, x^n_t), n ≥ 1, t = 0, ..., T}. Moreover, for n ≥ 1 and t = 0, ..., T, F^n_t = σ{ {(W^m_{t'}, x^m_{t'}), 0 < m < n, t' = 0, ..., T} ∪ {(W^n_{t'}, x^n_{t'}), t' = 0, ..., t} }. Clearly, F^n_t ⊆ F^n_{t+1} and F^n_T ⊆ F^{n+1}_0. Furthermore, given the initial slopes v̄^0_t(W) and the initial asset level r, we have that R^n_t, R^{x,n}_t and α^n_t are in F^n_t, while v̂^n_{t+1}(R^x), z^n_t(W) and v̄^n_t(W) are in F^n_{t+1}.

A pointwise argument is used in all the proofs of almost sure convergence presented in this paper. Thus, zero-measure events are discarded on an as-needed basis.

5.3 Almost sure convergence of the slopes

We prove that the approximation slopes produced by the SPAR-Storage algorithm converge almost surely to the slopes of the optimal value functions of the storage class for states that can be reached by an optimal policy. This result is stated in theorem 1 below. Along with the proof of the theorem, we present the noise and the bounding stochastic sequences and introduce three technical lemmas. Their proofs are given in the appendix so that the main reasoning is not disrupted.

Before the theorem, we introduce a technical assumption that is non-trivial due to the pure exploitation nature of our algorithm. Moreover, its verification is highly dependent on the specific problem within the storage class. An example of such verification can be found in Nascimento & Powell (to appear) for the lagged acquisition problem.
Given k ≥ 0 and t = 0,...,T−1, we assume there exists a positive random variable N_t^k such that, on {n ≥ N_t^k},

(HL^k)_t(W_t^n, R_t^n) ≤ (Hv̄^{n−1})_t(W_t^n, R_t^n) ≤ (HU^k)_t(W_t^n, R_t^n) a.s., (22)
(HL^k)_t(W_t^n, R_t^n + 1) ≤ (Hv̄^{n−1})_t(W_t^n, R_t^n + 1) ≤ (HU^k)_t(W_t^n, R_t^n + 1) a.s. (23)

Theorem 1. Assume the stepsize conditions (11)-(12). Also assume (22) and (23). Then, for all k ≥ 0 and t = 0,...,T, on the event that (W*,R*) is an accumulation point of {(W_t^n, R_t^{x,n})}_{n≥0}, the sequences of slopes {v̄_t^n(W*,R*)}_{n≥0} and {v̄_t^n(W*,R*+1)}_{n≥0} generated by the SPAR-Storage algorithm for the storage class converge almost surely to the optimal slopes v_t(W*,R*) and v_t(W*,R*+1), respectively.

Proof. As discussed in section 5.1, since the deterministic sequences {L_t^k(W,R^x)}_{k≥0} and {U_t^k(W,R^x)}_{k≥0} do converge to the optimal slopes, the convergence of the approximation sequences is obtained by showing that for each k ≥ 0 there exists a nonnegative random variable N_t^{*,k} such that, on the event that n ≥ N_t^{*,k} and (W*,R*) is an accumulation point of {(W_t^n, R_t^{x,n})}_{n≥0}, we have

L_t^k(W*,R*) ≤ v̄_t^n(W*,R*) ≤ U_t^k(W*,R*) a.s. and
L_t^k(W*,R*+1) ≤ v̄_t^n(W*,R*+1) ≤ U_t^k(W*,R*+1) a.s.

We concentrate on the inequalities L_t^k(W*,R*) ≤ v̄_t^n(W*,R*) and L_t^k(W*,R*+1) ≤ v̄_t^n(W*,R*+1). The upper bounds are obtained using a symmetrical argument.

The proof is by backward induction on t. The base case t = T is trivial, as L_T^k(W_T,R) = v̄_T^n(W_T,R) = 0 for all W_T ∈ W, R = 1,...,B^pos, k ≥ 0 and iterations n ≥ 0. Thus, we can pick, for example, N_T^{*,k} = N*, where N*, as defined in section 5.2, is a random variable that denotes when an iteration of the algorithm is large enough for convergence analysis purposes. The backward induction proof is completed when we prove, for t = T−1,...,0 and k ≥ 0, that there exists N_t^{*,k} such that, on the event that n ≥ N_t^{*,k} and (W*,R*) is an accumulation point of {(W_t^n, R_t^{x,n})}_{n≥0},

L_t^k(W*,R*) ≤ v̄_t^n(W*,R*) and L_t^k(W*,R*+1) ≤ v̄_t^n(W*,R*+1) a.s. (24)

Given the induction hypothesis for t+1, the proof for time period t is divided into two parts. In the first part, we prove for all k ≥ 0 that there exists a nonnegative random variable N_t^k such that

L_t^k(W*,R′) ≤ v̄_t^{n−1}(W*,R′) a.s. on {n ≥ N_t^k, (W*,R′) ∈ S*_t}. (25)

Its proof is by induction on k. Note that it only applies to states in the random set S*_t. Then, again for t, we take on the second part, which takes care of the states not covered by the first part, proving the existence of a nonnegative random variable N_t^{*,k} such that the lower bound inequalities are true on {n ≥ N_t^{*,k}} for all accumulation points of {(W_t^n, R_t^{x,n})}_{n≥0}.

We start the backward induction on t. Pick ω ∈ Ω. We omit the dependence of the random elements on ω for compactness. Remember that the base case t = T is trivial and we pick N_T^{*,k} = N*. We also pick, for convenience, N_T^k = N*.

Induction Hypothesis: Given t = T−1,...,0, assume, for t+1 and all k ≥ 0, the existence of integers N_{t+1}^k and N_{t+1}^{*,k} such that, for all n ≥ N_{t+1}^k, (25) is true, and, for all n ≥ N_{t+1}^{*,k}, the inequalities in (24) hold true for all accumulation points (W*,R*).

Part 1: For our fixed time period t, we prove, for any k, the existence of an integer N_t^k such that, for n ≥ N_t^k, inequality (25) is true. The proof is by forward induction on k.

We start with k = 0. For every state (W,R), we have that −B_v ≤ v_t(W,R) ≤ B_v, implying, by definition, that L_t^0(W,R) = −B_v. Therefore, (25) is satisfied for all n ≥ 1, since we know that v̄_t^{n−1}(W,R) is bounded between −B_v and B_v for all iterations. Thus, N_t^0 = max(1, N_{t+1}^{*,0}) = N_{t+1}^{*,0}.

The induction hypothesis on k assumes that there exists N_t^k such that, for all n ≥ N_t^k, (25) is true. Note that we can always make N_t^k larger than N_{t+1}^{*,k}, thus we assume that N_t^k ≥ N_{t+1}^{*,k}. The next step is the proof for k+1.

Before we move on, we depart from our pointwise argument in order to define the stochastic noise sequence and state a lemma describing an important property of this sequence.

We start by defining, for R = 1,...,B^pos, the random variable

ŝ_{t+1}^n(R) = (Hv̄^{n−1})_t(W_t^n, R) − v̂_{t+1}^n(R)

that measures the error incurred by observing a sample slope. Using ŝ_{t+1}^n(R), we define for each W ∈ W the stochastic noise sequence {s̄_t^n(W,R)}_{n≥0}. We have that s̄_t^n(W,R) = 0 on {n < N_t^k} and, on {n ≥ N_t^k}, s̄_t^n(W,R) is equal to

max( 0, (1 − ᾱ_t^n(W,R)) s̄_t^{n−1}(W,R) + ᾱ_t^n(W,R) ŝ_{t+1}^n(R_t^{x,n} 1_{{R ≤ R_t^{x,n}}} + (R_t^{x,n}+1) 1_{{R > R_t^{x,n}}}) ).

The sample slopes are defined in a way such that

IE[ ŝ_{t+1}^n(R) | F_t^n ] = 0. (26)

This conditional expectation is called the unbiasedness property. This property, together with the martingale convergence theorem and the boundedness of both the sample slopes and the approximate slopes, is crucial for proving that the noise introduced by the observation of the sample slopes, which replaces the observation of true expectations, goes to zero as the number of iterations of the algorithm goes to infinity, as is stated in the next lemma.

Lemma 1. On the event that (W*,R*) is an accumulation point of {(W_t^n, R_t^{x,n})}_{n≥0}, we have that

{s̄_t^n(W*,R*)}_{n≥0} → 0 and {s̄_t^n(W*,R*+1)}_{n≥0} → 0 a.s. (27)

Proof of lemma 1. Given in the appendix.

Returning to our pointwise argument, where we have fixed ω ∈ Ω, we use the convention that the minimum of an empty set is +∞. Let

δ_L^k = min{ ((HL^k)_t(W*,R′) − L_t^k(W*,R′))/4 : (W*,R′) ∈ S*_t, (HL^k)_t(W*,R′) > L_t^k(W*,R′) }.

If δ_L^k < +∞, we define an integer N_L ≥ N_t^k to be such that

∏_{m=N_t^k}^{N_L−1} (1 − ᾱ_t^m(W*,R′)) ≤ 1/4 and s̄_t^{n−1}(W*,R′) ≤ δ_L^k (28)

for all n ≥ N_L and states (W*,R′) ∈ S*_t. Such an N_L exists because both (13) and (27) are true. If δ_L^k = +∞, then, for all states (W*,R′) ∈ S*_t, (HL^k)_t(W*,R′) = L_t^k(W*,R′), since (20) tells us that (HL^k)_t(W*,R′) ≥ L_t^k(W*,R′). Thus, L_t^{k+1}(W*,R′) = L_t^k(W*,R′) and

we define the integer N_L to be equal to N_t^k. We let N_t^{k+1} = max(N_L, N_{t+1}^{*,k+1}) and show that (25) holds for n ≥ N_t^{k+1}.

We pick a state (W*,R′) ∈ S*_t. If L_t^{k+1}(W*,R′) = L_t^k(W*,R′), then inequality (25) follows from the induction hypothesis. We therefore concentrate on the case where L_t^{k+1}(W*,R′) > L_t^k(W*,R′).

First, we depart one more time from the pointwise argument to introduce the stochastic bounding sequence. We also state a lemma combining this sequence with the stochastic noise sequence. For W ∈ W and R = 1,...,B^pos, we have, on {n < N_t^k}, that l̄_t^n(W,R) = L_t^k(W,R) and, on {n ≥ N_t^k},

l̄_t^n(W,R) = (1 − ᾱ_t^n(W,R)) l̄_t^{n−1}(W,R) + ᾱ_t^n(W,R) (HL^k)_t(W,R).

The next lemma states that the noise and the stochastic bounding sequences can be used to provide a bound for the approximate slopes, as follows.

Lemma 2. On {n ≥ N_t^k, (W*,R′) ∈ S*_t},

v̄_t^{n−1}(W*,R′) ≥ l̄_t^{n−1}(W*,R′) − s̄_t^{n−1}(W*,R′) a.s. (29)

Proof of lemma 2. Given in the appendix.

Back to our fixed ω, a simple inductive argument proves that l̄_t^n(W,R) is a convex combination of L_t^k(W,R) and (HL^k)_t(W,R). Therefore we can write

l̄_t^{n−1}(W*,R′) = b^{n−1} L_t^k(W*,R′) + (1 − b^{n−1}) (HL^k)_t(W*,R′),

where b^{n−1} = ∏_{m=N_t^k}^{n−1} (1 − ᾱ_t^m(W*,R′)). For n ≥ N_t^{k+1} ≥ N_L, we have b^{n−1} ≤ 1/4. Moreover, L_t^k(W*,R′) ≤ (HL^k)_t(W*,R′). Thus, using (18) and the definition of δ_L^k, we obtain

l̄_t^{n−1}(W*,R′) ≥ (1/4) L_t^k(W*,R′) + (3/4) (HL^k)_t(W*,R′)
            = (1/2) L_t^k(W*,R′) + (1/2) (HL^k)_t(W*,R′) + (1/4) ((HL^k)_t(W*,R′) − L_t^k(W*,R′))
            ≥ L_t^{k+1}(W*,R′) + δ_L^k. (30)
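The convex-combination representation of the bounding sequence is easy to verify numerically. The sketch below is illustrative only (the harmonic stepsize ᾱ_m = 1/m and all constants are our own choices, not taken from the paper): iterating the smoothing recursion from L toward HL^k reproduces b^{n−1} L + (1 − b^{n−1}) HL^k exactly, and with this stepsize the product of the (1 − ᾱ_m) factors telescopes to (N_t^k − 1)/(n − 1), which drops below 1/4 once n is roughly four times N_t^k.

```python
def bounding_sequence(L, H, n_start, n_end):
    """Iterate l <- (1 - a_m) * l + a_m * H for m = n_start, ..., n_end - 1,
    starting from l = L, with harmonic stepsize a_m = 1/m. Also track the
    product b of the (1 - a_m) factors."""
    l, b = L, 1.0
    for m in range(n_start, n_end):
        a = 1.0 / m
        l = (1.0 - a) * l + a * H
        b *= 1.0 - a
    return l, b

L, H = -2.0, 6.0
l, b = bounding_sequence(L, H, n_start=10, n_end=100)
# b telescopes to (10 - 1)/(100 - 1) = 1/11, well below 1/4, and l equals the
# convex combination b * L + (1 - b) * H, so l sits within delta of H.
```

This is exactly the mechanism the proof uses: once b^{n−1} ≤ 1/4, the bounding sequence has moved at least three quarters of the way from L_t^k toward (HL^k)_t.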

Combining (29) and (30), we obtain, for all n ≥ N_t^{k+1} ≥ N_L,

v̄_t^{n−1}(W*,R′) ≥ L_t^{k+1}(W*,R′) + δ_L^k − s̄_t^{n−1}(W*,R′) ≥ L_t^{k+1}(W*,R′) + δ_L^k − δ_L^k = L_t^{k+1}(W*,R′),

where the last inequality follows from (28).

Part 2: We continue to consider the ω picked in the beginning of the proof of the theorem. In this part, we take care of the states (W*,R*) that are accumulation points but are not in S*_t. In contrast to part 1, the proof technique here is not by forward induction on k. We rely entirely on the definition of the projection operation and on the elements defined in section 5.2, as this part of the proof is all about states for which the projection operation decreased the corresponding approximate slopes infinitely often, which might happen when some of the optimal slopes are equal. Of course, this fact is not verifiable in advance, as the optimal slopes are unknown.

Remember that at iteration n, time period t, we observe the sample slopes v̂_{t+1}^n(R_t^{x,n}) and v̂_{t+1}^n(R_t^{x,n}+1), and it is always the case that v̂_{t+1}^n(R_t^{x,n}) ≥ v̂_{t+1}^n(R_t^{x,n}+1), implying that the resulting temporary slope z_t^n(W_t^n, R_t^{x,n}) is bigger than z_t^n(W_t^n, R_t^{x,n}+1). Therefore, according to our projection operator, the updated slopes v̄_t^n(W_t^n, R_t^{x,n}) and v̄_t^n(W_t^n, R_t^{x,n}+1) are always equal to z_t^n(W_t^n, R_t^{x,n}) and z_t^n(W_t^n, R_t^{x,n}+1), respectively. Due to our stepsize rule, as described in section 4, the slopes corresponding to (W_t^n, R_t^{x,n}) and (W_t^n, R_t^{x,n}+1) are the only ones updated due to a direct observation of sample slopes at iteration n, time period t. All the other slopes are modified only if a violation of the monotone decreasing property occurs. Therefore, the slopes corresponding to states with information vector W ∈ W different than W_t^n, no matter the asset level R = 1,...,B^pos, remain the same at iteration n, time period t, that is, v̄_t^{n−1}(W,R) = z_t^n(W,R) = v̄_t^n(W,R). On the other hand, it is always the case that the temporary slopes corresponding to states with information vector W_t^n and asset levels smaller than R_t^{x,n} can only be increased by the projection operation. If

necessary, they are increased to be equal to v̄_t^n(W_t^n, R_t^{x,n}). Similarly, the temporary slopes corresponding to states with information vector W_t^n and asset levels greater than R_t^{x,n}+1 can only be decreased by the projection operation. If necessary, they are decreased to be equal to v̄_t^n(W_t^n, R_t^{x,n}+1); see figure 3c.

Keeping the previous discussion in mind, it is easy to see that, for each W ∈ W, if R^Min is the minimum asset level such that (W, R^Min) is an accumulation point of {(W_t^n, R_t^{x,n})}_{n≥0}, then the slope corresponding to (W, R^Min) could only be decreased by the projection operation at a finite number of iterations, as a decreasing requirement could only originate from an asset level smaller than R^Min. However, no state with information vector W and asset level smaller than R^Min is visited by the algorithm after iteration N* (as defined in section 5.2), since only accumulation points are visited after N*. We thus have that (W, R^Min) is an element of the set S*_t, showing that S*_t is a proper set. Hence, for all states (W*,R*) that are accumulation points of {(W_t^n, R_t^{x,n})}_{n≥0} and are not elements of S*_t, there exists another state (W*,R′), where R′ is the maximum asset level smaller than R* such that (W*,R′) ∈ S*_t. We argue that for all asset levels R between R′+1 and R* (inclusive), we have that |N_t(W*,R)| = ∞. Figure 4 illustrates the situation.

Figure 4: Illustration of technical elements related to the projection operation

As introduced in section 5.2, we have that N_t(W,R) = {n ∈ IN : z_t^n(W,R) > v̄_t^n(W,R)}. By definition, the sets S*_t and N_t(W,R) share the following relationship. Given

that (W,R) is an accumulation point of {(W_t^n, R_t^{x,n})}_{n≥0}, then |N_t(W,R)| = ∞ if and only if the state (W,R) is not an element of S*_t. Therefore, |N_t(W*,R*)| = ∞, as otherwise (W*,R*) would be an element of S*_t. If R′ = R*−1, we are done. If R′ < R*−1, we have to consider two cases, namely that (W*,R*−1) is an accumulation point and that (W*,R*−1) is not an accumulation point. For the first case, we have that |N_t(W*,R*−1)| = ∞ from the fact that this state is not an element of S*_t. For the second case, since (W*,R*−1) is not an accumulation point, its corresponding slope is never updated due to a direct observation of sample slopes for n ≥ N*, by the definition of N*. Moreover, every time the slope of (W*,R*) is decreased due to a projection (which is coming from the left), the slope of (W*,R*−1) has to be decreased as well. Therefore, N_t(W*,R*) ∩ {n ≥ N*} ⊆ N_t(W*,R*−1) ∩ {n ≥ N*}, implying that |N_t(W*,R*−1)| = ∞. We then apply the same reasoning to states (W*,R*−2),...,(W*,R′+1), obtaining that the corresponding sets of iterations have an infinite number of elements. The same reasoning applies to states (W*,R*+1) that are not in S*_t.

We state a lemma that is the key element for the proof of Part 2, once again going away from the pointwise argument.

Lemma 3. Given an information vector W ∈ W and an asset level R = 1,...,B^pos−1, if for all k ≥ 0 there exists an integer random variable N^k(W,R) such that L_t^k(W,R) ≤ v̄_t^{n−1}(W,R) almost surely on {n ≥ N^k(W,R), |N_t(W,R+1)| = ∞}, then for all k ≥ 0 there exists another integer random variable N^k(W,R+1) such that L_t^k(W,R+1) ≤ v̄_t^{n−1}(W,R+1) almost surely on {n ≥ N^k(W,R+1)}.

Proof of lemma 3. Given in the appendix.

Using the properties of the projection operator, we return to the proof of Part 2 and to our fixed ω. Pick k ≥ 0 and a state (W*,R*) that is an accumulation point but is not in S*_t. The same applies if (W*,R*+1) ∉ S*_t. Consider the state (W*,R′), where R′ is the maximum asset level smaller than R* such that (W*,R′) ∈ S*_t. This state satisfies the condition of lemma 3 with N^k(W*,R′) = N_t^k (from part 1 of the proof). Thus, we can apply this lemma in order to obtain, for all k ≥ 0, an integer N^k(W*,R′+1) such that

L_t^k(W*,R′+1) ≤ v̄_t^{n−1}(W*,R′+1) for all n ≥ N^k(W*,R′+1). After that, we use lemma 3 again, this time considering the state (W*,R′+1). Note that the first application of lemma 3 gave us the integer N^k(W*,R′+1), necessary to fulfill the conditions of this second usage of the lemma. We repeat the same reasoning, applying lemma 3 successively to the states (W*,R′+2),...,(W*,R*−1). In the end, we obtain, for each k ≥ 0, an integer N^k(W*,R*) such that L_t^k(W*,R*) ≤ v̄_t^{n−1}(W*,R*) for all n ≥ N^k(W*,R*). Figure 5 illustrates this process.

Figure 5: Successive applications of lemma 3

Finally, if we pick N_t^{*,k} to be greater than N_t^k of part 1 and greater than N^k(W*,R*) and N^k(W*,R*+1) for all accumulation points (W*,R*) that are not in S*_t, then (24) is true for all accumulation points and n ≥ N_t^{*,k}.

5.4 Optimality of the Decisions

We finish the convergence analysis proving that, with probability one, the algorithm learns an optimal decision for all states that can be reached by an optimal policy.

Theorem 2. Assume the conditions of Theorem 1 are satisfied. For t = 0,...,T, on the event that (W*,R*,v*,x*) is an accumulation point of the sequence {(W_t^n, R_t^n, v̄_t^{n−1}, x_t^n)}_{n≥1}

generated by the SPAR-Storage algorithm, x* is almost surely an optimal solution of

max_{x_t ∈ X_t(W*,R*)} F_t(v_t(W*), W*, R*, x_t), (31)

where

F_t(v_t(W*), W*, R*, x_t) = C_t(W*, R*, x_t) + γ V_t(W*, R* − Σ_{i=1}^{B^d} x_t^{d,i} + Σ_{i=1}^{B^s} x_t^{s,i} − x_t^r).

Proof. Fix ω ∈ Ω. As before, the dependence on ω is omitted. At each iteration n and time t of the algorithm, the decision x_t^n in STEP 3 of the algorithm is an optimal solution to the optimization problem

max_{x_t ∈ X_t(W_t^n, R_t^n)} F_t(v̄_t^{n−1}(W_t^n), W_t^n, R_t^n, x_t).

Since F_t(v̄_t^{n−1}(W_t^n), W_t^n, R_t^n, ·) is concave and X_t(W_t^n, R_t^n) is convex, we have that

0 ∈ ∂F_t(v̄_t^{n−1}(W_t^n), W_t^n, R_t^n, x_t^n) + N_{X_t}(W_t^n, R_t^n, x_t^n),

where ∂F_t(v̄_t^{n−1}(W_t^n), W_t^n, R_t^n, x_t^n) is the subdifferential of F_t(v̄_t^{n−1}(W_t^n), W_t^n, R_t^n, ·) at x_t^n and N_{X_t}(W_t^n, R_t^n, x_t^n) is the normal cone of X_t(W_t^n, R_t^n) at x_t^n. Then, by passing to the limit, we can conclude that each accumulation point (W*,R*,v*,x*) of the sequence {(W_t^n, R_t^n, v̄_t^{n−1}, x_t^n)}_{n≥1} satisfies the condition 0 ∈ ∂F_t(v*_t(W*), W*, R*, x*) + N_{X_t}(W*, R*, x*).

We now derive an expression for the subdifferential. We have that

∂F_t(v*_t(W*), W*, R*, x_t) = ∇C_t(W*, R*, x_t) + γ ∂V*_t(W*, R* + A·x_t),

where A = (−1,...,−1, 1,...,1, −1), with B^d entries equal to −1 followed by B^s entries equal to 1, that is, A·x_t = −Σ_{i=1}^{B^d} x_t^{d,i} + Σ_{i=1}^{B^s} x_t^{s,i} − x_t^r. From (Bertsekas et al., 2003, Proposition 4.2.5), for x_t ∈ IN^{B^d+B^s+1},

∂V*_t(W*, R* + A·x_t) = {(A_1 y, ..., A_{B^d+B^s+1} y)^T : y ∈ [v*_t(W*, R* + A·x_t + 1), v*_t(W*, R* + A·x_t)]}.

Therefore, as x_t is integer valued,

∂F_t(v*_t(W*), W*, R*, x_t) = {∇C_t(W*, R*, x_t) + γAy : y ∈ [v*_t(W*, R* + A·x_t + 1), v*_t(W*, R* + A·x_t)]}.
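The role of the slopes in STEP 3's optimization can be illustrated with a toy instance. All numbers, names and the single-purchase-decision setup below are our own assumptions, not the paper's model: because the value function is concave and piecewise linear with unit breakpoints, enumerating a small feasible set and the classical greedy rule (keep buying while the discounted marginal slope exceeds the unit cost) select the same decision.

```python
def value(slopes, R):
    """Concave piecewise-linear value function: V(R) = sum of first R slopes."""
    return sum(slopes[:R])

def best_purchase(slopes, R0, cost, gamma=1.0, x_max=3):
    # Enumeration over the feasible set {0, ..., x_max}.
    by_enum = max(range(x_max + 1),
                  key=lambda x: -cost * x + gamma * value(slopes, R0 + x))
    # Greedy marginal rule: buy one more unit while its slope beats the cost.
    x = 0
    while x < x_max and gamma * slopes[R0 + x] > cost:
        x += 1
    return by_enum, x

slopes = [5.0, 3.0, 2.0, 1.0]   # nonincreasing slopes => concave V
print(best_purchase(slopes, R0=1, cost=2.5))  # (1, 1): both rules agree
```

The interval of slopes in the subdifferential above is what makes the greedy stopping test valid: the decision is optimal exactly when the unit cost lands between the slopes on either side of the post-decision asset level.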

Since (W*, R* + A·x*) is an accumulation point of {(W_t^n, R_t^{x,n})}_{n≥0}, it follows from theorem 1 that v*_t(W*, R* + A·x*) = v_t(W*, R* + A·x*) and v*_t(W*, R* + A·x* + 1) = v_t(W*, R* + A·x* + 1). Hence, ∂F_t(v*_t(W*), W*, R*, x*) = ∂F_t(v_t(W*), W*, R*, x*) and 0 ∈ ∂F_t(v_t(W*), W*, R*, x*) + N_{X_t}(W*, R*, x*), which proves that x* is an optimal solution of (31).

6 Summary

We proposed a pure exploitation approximate dynamic programming algorithm in order to find an optimal policy for problems in the storage class. Problems in this class are of high practical importance, but may suffer from the curse of dimensionality, preventing the use of standard techniques such as backward dynamic programming, real-time dynamic programming (RTDP) and Q-learning. The key property of the storage class is that the optimal value functions associated with its problems are concave and piecewise linear with integer breakpoints in the asset dimension. This feature was used extensively both in the design of our algorithm and in the convergence proofs, allowing for the pure exploitation scheme. The algorithm uses Monte Carlo samples to learn the optimal value function only in important parts of the state space, which are determined by the algorithm itself.

Acknowledgements

The authors would like to acknowledge the valuable comments and suggestions of Andrzej Ruszczyński. This research was supported in part by AFOSR contract FA

References

Ahmed, S. & Shapiro, A. (2002), The sample average approximation method for stochastic programs with integer recourse, E-print available at http://

Barto, A. G., Bradtke, S. J. & Singh, S. P. (1995), Learning to act using real-time dynamic programming, Artificial Intelligence, Special Volume on Computational Research on Interaction and Agency 72.

Bertsekas, D. & Tsitsiklis, J. (1996), Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.

Bertsekas, D., Nedic, A. & Ozdaglar, A. (2003), Convex Analysis and Optimization, Athena Scientific, Belmont, Massachusetts.

Chen, Z.-L. & Powell, W. B. (1999), A convergent cutting-plane and partial-sampling algorithm for multistage stochastic linear programs with recourse, Journal of Optimization Theory and Applications 102(3).

Higle, J. & Sen, S. (1991), Stochastic decomposition: An algorithm for two stage linear programs with recourse, Mathematics of Operations Research 16(3).

Jaakkola, T., Jordan, M. I. & Singh, S. P. (1994), Convergence of stochastic iterative dynamic programming algorithms, in J. D. Cowan, G. Tesauro & J. Alspector, eds, Advances in Neural Information Processing Systems, Vol. 6, Morgan Kaufmann Publishers, San Francisco.

Nascimento, J. & Powell, W. B. (to appear), An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem, Mathematics of Operations Research.

Powell, W. B. (2007), Approximate Dynamic Programming: Solving the Curses of Dimensionality, John Wiley and Sons, New York.

Powell, W. B., Ruszczyński, A. & Topaloglu, H. (2004), Learning algorithms for separable


More information

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3 Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Predator - Prey Model Trajectories and the nonlinear conservation law

Predator - Prey Model Trajectories and the nonlinear conservation law Predaor - Prey Model Trajecories and he nonlinear conservaion law James K. Peerson Deparmen of Biological Sciences and Deparmen of Mahemaical Sciences Clemson Universiy Ocober 28, 213 Ouline Drawing Trajecories

More information

Logic in computer science

Logic in computer science Logic in compuer science Logic plays an imporan role in compuer science Logic is ofen called he calculus of compuer science Logic plays a similar role in compuer science o ha played by calculus in he physical

More information

GENERALIZATION OF THE FORMULA OF FAA DI BRUNO FOR A COMPOSITE FUNCTION WITH A VECTOR ARGUMENT

GENERALIZATION OF THE FORMULA OF FAA DI BRUNO FOR A COMPOSITE FUNCTION WITH A VECTOR ARGUMENT Inerna J Mah & Mah Sci Vol 4, No 7 000) 48 49 S0670000970 Hindawi Publishing Corp GENERALIZATION OF THE FORMULA OF FAA DI BRUNO FOR A COMPOSITE FUNCTION WITH A VECTOR ARGUMENT RUMEN L MISHKOV Received

More information

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004 ODEs II, Lecure : Homogeneous Linear Sysems - I Mike Raugh March 8, 4 Inroducion. In he firs lecure we discussed a sysem of linear ODEs for modeling he excreion of lead from he human body, saw how o ransform

More information

Variational Iteration Method for Solving System of Fractional Order Ordinary Differential Equations

Variational Iteration Method for Solving System of Fractional Order Ordinary Differential Equations IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765X. Volume 1, Issue 6 Ver. II (Nov - Dec. 214), PP 48-54 Variaional Ieraion Mehod for Solving Sysem of Fracional Order Ordinary Differenial

More information

SMT 2014 Calculus Test Solutions February 15, 2014 = 3 5 = 15.

SMT 2014 Calculus Test Solutions February 15, 2014 = 3 5 = 15. SMT Calculus Tes Soluions February 5,. Le f() = and le g() =. Compue f ()g (). Answer: 5 Soluion: We noe ha f () = and g () = 6. Then f ()g () =. Plugging in = we ge f ()g () = 6 = 3 5 = 5.. There is a

More information

The Asymptotic Behavior of Nonoscillatory Solutions of Some Nonlinear Dynamic Equations on Time Scales

The Asymptotic Behavior of Nonoscillatory Solutions of Some Nonlinear Dynamic Equations on Time Scales Advances in Dynamical Sysems and Applicaions. ISSN 0973-5321 Volume 1 Number 1 (2006, pp. 103 112 c Research India Publicaions hp://www.ripublicaion.com/adsa.hm The Asympoic Behavior of Nonoscillaory Soluions

More information

Some Ramsey results for the n-cube

Some Ramsey results for the n-cube Some Ramsey resuls for he n-cube Ron Graham Universiy of California, San Diego Jozsef Solymosi Universiy of Briish Columbia, Vancouver, Canada Absrac In his noe we esablish a Ramsey-ype resul for cerain

More information

Optimality Conditions for Unconstrained Problems

Optimality Conditions for Unconstrained Problems 62 CHAPTER 6 Opimaliy Condiions for Unconsrained Problems 1 Unconsrained Opimizaion 11 Exisence Consider he problem of minimizing he funcion f : R n R where f is coninuous on all of R n : P min f(x) x

More information

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach

Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach 1 Decenralized Sochasic Conrol wih Parial Hisory Sharing: A Common Informaion Approach Ashuosh Nayyar, Adiya Mahajan and Demoshenis Tenekezis arxiv:1209.1695v1 [cs.sy] 8 Sep 2012 Absrac A general model

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

KINEMATICS IN ONE DIMENSION

KINEMATICS IN ONE DIMENSION KINEMATICS IN ONE DIMENSION PREVIEW Kinemaics is he sudy of how hings move how far (disance and displacemen), how fas (speed and velociy), and how fas ha how fas changes (acceleraion). We say ha an objec

More information

14 Autoregressive Moving Average Models

14 Autoregressive Moving Average Models 14 Auoregressive Moving Average Models In his chaper an imporan parameric family of saionary ime series is inroduced, he family of he auoregressive moving average, or ARMA, processes. For a large class

More information

An Introduction to Backward Stochastic Differential Equations (BSDEs) PIMS Summer School 2016 in Mathematical Finance.

An Introduction to Backward Stochastic Differential Equations (BSDEs) PIMS Summer School 2016 in Mathematical Finance. 1 An Inroducion o Backward Sochasic Differenial Equaions (BSDEs) PIMS Summer School 2016 in Mahemaical Finance June 25, 2016 Chrisoph Frei cfrei@ualbera.ca This inroducion is based on Touzi [14], Bouchard

More information

On the Optimal Policy Structure in Serial Inventory Systems with Lost Sales

On the Optimal Policy Structure in Serial Inventory Systems with Lost Sales On he Opimal Policy Srucure in Serial Invenory Sysems wih Los Sales Woonghee Tim Huh, Columbia Universiy Ganesh Janakiraman, New York Universiy May 21, 2008 Revised: July 30, 2008; December 23, 2008 Absrac

More information

Convergence of the Neumann series in higher norms

Convergence of the Neumann series in higher norms Convergence of he Neumann series in higher norms Charles L. Epsein Deparmen of Mahemaics, Universiy of Pennsylvania Version 1.0 Augus 1, 003 Absrac Naural condiions on an operaor A are given so ha he Neumann

More information

Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Capacity Constraints

Applying Genetic Algorithms for Inventory Lot-Sizing Problem with Supplier Selection under Storage Capacity Constraints IJCSI Inernaional Journal of Compuer Science Issues, Vol 9, Issue 1, No 1, January 2012 wwwijcsiorg 18 Applying Geneic Algorihms for Invenory Lo-Sizing Problem wih Supplier Selecion under Sorage Capaciy

More information

On a Discrete-In-Time Order Level Inventory Model for Items with Random Deterioration

On a Discrete-In-Time Order Level Inventory Model for Items with Random Deterioration Journal of Agriculure and Life Sciences Vol., No. ; June 4 On a Discree-In-Time Order Level Invenory Model for Iems wih Random Deerioraion Dr Biswaranjan Mandal Associae Professor of Mahemaics Acharya

More information

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015

Explaining Total Factor Productivity. Ulrich Kohli University of Geneva December 2015 Explaining Toal Facor Produciviy Ulrich Kohli Universiy of Geneva December 2015 Needed: A Theory of Toal Facor Produciviy Edward C. Presco (1998) 2 1. Inroducion Toal Facor Produciviy (TFP) has become

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3

d 1 = c 1 b 2 - b 1 c 2 d 2 = c 1 b 3 - b 1 c 3 and d = c b - b c c d = c b - b c c This process is coninued unil he nh row has been compleed. The complee array of coefficiens is riangular. Noe ha in developing he array an enire row may be divided or

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

Planning in POMDPs. Dominik Schoenberger Abstract

Planning in POMDPs. Dominik Schoenberger Abstract Planning in POMDPs Dominik Schoenberger d.schoenberger@sud.u-darmsad.de Absrac This documen briefly explains wha a Parially Observable Markov Decision Process is. Furhermore i inroduces he differen approaches

More information

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes

23.5. Half-Range Series. Introduction. Prerequisites. Learning Outcomes Half-Range Series 2.5 Inroducion In his Secion we address he following problem: Can we find a Fourier series expansion of a funcion defined over a finie inerval? Of course we recognise ha such a funcion

More information

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS CENRALIZED VERSUS DECENRALIZED PRODUCION PLANNING IN SUPPLY CHAINS Georges SAHARIDIS* a, Yves DALLERY* a, Fikri KARAESMEN* b * a Ecole Cenrale Paris Deparmen of Indusial Engineering (LGI), +3343388, saharidis,dallery@lgi.ecp.fr

More information

The expectation value of the field operator.

The expectation value of the field operator. The expecaion value of he field operaor. Dan Solomon Universiy of Illinois Chicago, IL dsolom@uic.edu June, 04 Absrac. Much of he mahemaical developmen of quanum field heory has been in suppor of deermining

More information

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems

Particle Swarm Optimization Combining Diversification and Intensification for Nonlinear Integer Programming Problems Paricle Swarm Opimizaion Combining Diversificaion and Inensificaion for Nonlinear Ineger Programming Problems Takeshi Masui, Masaoshi Sakawa, Kosuke Kao and Koichi Masumoo Hiroshima Universiy 1-4-1, Kagamiyama,

More information

A Shooting Method for A Node Generation Algorithm

A Shooting Method for A Node Generation Algorithm A Shooing Mehod for A Node Generaion Algorihm Hiroaki Nishikawa W.M.Keck Foundaion Laboraory for Compuaional Fluid Dynamics Deparmen of Aerospace Engineering, Universiy of Michigan, Ann Arbor, Michigan

More information

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles

Diebold, Chapter 7. Francis X. Diebold, Elements of Forecasting, 4th Edition (Mason, Ohio: Cengage Learning, 2006). Chapter 7. Characterizing Cycles Diebold, Chaper 7 Francis X. Diebold, Elemens of Forecasing, 4h Ediion (Mason, Ohio: Cengage Learning, 006). Chaper 7. Characerizing Cycles Afer compleing his reading you should be able o: Define covariance

More information

11!Hí MATHEMATICS : ERDŐS AND ULAM PROC. N. A. S. of decomposiion, properly speaking) conradics he possibiliy of defining a counably addiive real-valu

11!Hí MATHEMATICS : ERDŐS AND ULAM PROC. N. A. S. of decomposiion, properly speaking) conradics he possibiliy of defining a counably addiive real-valu ON EQUATIONS WITH SETS AS UNKNOWNS BY PAUL ERDŐS AND S. ULAM DEPARTMENT OF MATHEMATICS, UNIVERSITY OF COLORADO, BOULDER Communicaed May 27, 1968 We shall presen here a number of resuls in se heory concerning

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

Chapter 3 Boundary Value Problem

Chapter 3 Boundary Value Problem Chaper 3 Boundary Value Problem A boundary value problem (BVP) is a problem, ypically an ODE or a PDE, which has values assigned on he physical boundary of he domain in which he problem is specified. Le

More information

Operations Research. An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Operations Research. An Approximate Dynamic Programming Algorithm for Monotone Value Functions This aricle was downloaded by: [140.1.241.64] On: 05 January 2016, A: 21:41 Publisher: Insiue for Operaions Research and he Managemen Sciences (INFORMS) INFORMS is locaed in Maryland, USA Operaions Research

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite American Journal of Operaions Research, 08, 8, 8-9 hp://wwwscirporg/journal/ajor ISSN Online: 60-8849 ISSN Prin: 60-8830 The Opimal Sopping Time for Selling an Asse When I Is Uncerain Wheher he Price Process

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

Chapter 7: Solving Trig Equations

Chapter 7: Solving Trig Equations Haberman MTH Secion I: The Trigonomeric Funcions Chaper 7: Solving Trig Equaions Le s sar by solving a couple of equaions ha involve he sine funcion EXAMPLE a: Solve he equaion sin( ) The inverse funcions

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :45 PM 8/8/04 Copyrigh 04 Richard T. Woodward. An inroducion o dynamic opimizaion -- Opimal Conrol and Dynamic Programming AGEC 637-04 I. Overview of opimizaion Opimizaion is

More information

Subway stations energy and air quality management

Subway stations energy and air quality management Subway saions energy and air qualiy managemen wih sochasic opimizaion Trisan Rigau 1,2,4, Advisors: P. Carpenier 3, J.-Ph. Chancelier 2, M. De Lara 2 EFFICACITY 1 CERMICS, ENPC 2 UMA, ENSTA 3 LISIS, IFSTTAR

More information

Lecture 9: September 25

Lecture 9: September 25 0-725: Opimizaion Fall 202 Lecure 9: Sepember 25 Lecurer: Geoff Gordon/Ryan Tibshirani Scribes: Xuezhi Wang, Subhodeep Moira, Abhimanu Kumar Noe: LaTeX emplae couresy of UC Berkeley EECS dep. Disclaimer:

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN Inernaional Journal of Scienific & Engineering Research, Volume 4, Issue 10, Ocober-2013 900 FUZZY MEAN RESIDUAL LIFE ORDERING OF FUZZY RANDOM VARIABLES J. EARNEST LAZARUS PIRIYAKUMAR 1, A. YAMUNA 2 1.

More information

Basic Circuit Elements Professor J R Lucas November 2001

Basic Circuit Elements Professor J R Lucas November 2001 Basic Circui Elemens - J ucas An elecrical circui is an inerconnecion of circui elemens. These circui elemens can be caegorised ino wo ypes, namely acive and passive elemens. Some Definiions/explanaions

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990),

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990), SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F Trench SIAM J Marix Anal Appl 11 (1990), 601-611 Absrac Le T n = ( i j ) n i,j=1 (n 3) be a real symmeric

More information

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t.

O Q L N. Discrete-Time Stochastic Dynamic Programming. I. Notation and basic assumptions. ε t : a px1 random vector of disturbances at time t. Econ. 5b Spring 999 C. Sims Discree-Time Sochasic Dynamic Programming 995, 996 by Chrisopher Sims. This maerial may be freely reproduced for educaional and research purposes, so long as i is no alered,

More information

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities:

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities: Mah 4 Eam Review Problems Problem. Calculae he 3rd Taylor polynomial for arcsin a =. Soluion. Le f() = arcsin. For his problem, we use he formula f() + f () + f ()! + f () 3! for he 3rd Taylor polynomial

More information

Cash Flow Valuation Mode Lin Discrete Time

Cash Flow Valuation Mode Lin Discrete Time IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728,p-ISSN: 2319-765X, 6, Issue 6 (May. - Jun. 2013), PP 35-41 Cash Flow Valuaion Mode Lin Discree Time Olayiwola. M. A. and Oni, N. O. Deparmen of Mahemaics

More information

Unit Root Time Series. Univariate random walk

Unit Root Time Series. Univariate random walk Uni Roo ime Series Univariae random walk Consider he regression y y where ~ iid N 0, he leas squares esimae of is: ˆ yy y y yy Now wha if = If y y hen le y 0 =0 so ha y j j If ~ iid N 0, hen y ~ N 0, he

More information

Existence of positive solution for a third-order three-point BVP with sign-changing Green s function

Existence of positive solution for a third-order three-point BVP with sign-changing Green s function Elecronic Journal of Qualiaive Theory of Differenial Equaions 13, No. 3, 1-11; hp://www.mah.u-szeged.hu/ejqde/ Exisence of posiive soluion for a hird-order hree-poin BVP wih sign-changing Green s funcion

More information

not to be republished NCERT MATHEMATICAL MODELLING Appendix 2 A.2.1 Introduction A.2.2 Why Mathematical Modelling?

not to be republished NCERT MATHEMATICAL MODELLING Appendix 2 A.2.1 Introduction A.2.2 Why Mathematical Modelling? 256 MATHEMATICS A.2.1 Inroducion In class XI, we have learn abou mahemaical modelling as an aemp o sudy some par (or form) of some real-life problems in mahemaical erms, i.e., he conversion of a physical

More information