Operations Research. An Approximate Dynamic Programming Algorithm for Monotone Value Functions


Publisher: Institute for Operations Research and the Management Sciences (INFORMS). Journal: Operations Research.

To cite this article: Daniel R. Jiang, Warren B. Powell (2015) An Approximate Dynamic Programming Algorithm for Monotone Value Functions. Operations Research 63(6).

Copyright 2015, INFORMS.

OPERATIONS RESEARCH, Vol. 63, No. 6, November-December 2015. INFORMS.

An Approximate Dynamic Programming Algorithm for Monotone Value Functions

Daniel R. Jiang, Warren B. Powell
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey {drjiang@princeton.edu, powell@princeton.edu}

Many sequential decision problems can be formulated as Markov decision processes (MDPs) where the optimal value function (or cost-to-go function) can be shown to satisfy a monotone structure in some or all of its dimensions. When the state space becomes large, traditional techniques, such as the backward dynamic programming algorithm (i.e., backward induction or value iteration), may no longer be effective in finding a solution within a reasonable time frame, and thus we are forced to consider other approaches, such as approximate dynamic programming (ADP). We propose a provably convergent ADP algorithm called Monotone-ADP that exploits the monotonicity of the value functions to increase the rate of convergence. In this paper, we describe a general finite-horizon problem setting where the optimal value function is monotone, present a convergence proof for Monotone-ADP under various technical assumptions, and show numerical results for three application domains: optimal stopping, energy storage/allocation, and glycemic control for diabetes patients. The empirical results indicate that by taking advantage of monotonicity, we can attain high quality solutions within a relatively small number of iterations, using up to two orders of magnitude less computation than is needed to compute the optimal solution exactly.

Keywords: approximate dynamic programming; monotonicity; optimal stopping; energy storage; glycemic control.
Subject classifications: dynamic programming/optimal control: Markov, finite state.
Area of review: Optimization.
History: Received July 2014; revisions received May 2015, July 2015; accepted August 2015. Published online in Articles in Advance November 4, 2015.

1. Introduction

Sequential decision problems are an important concept in many fields, including operations research, economics, and finance. For a small, tractable problem, the backward dynamic programming (BDP) algorithm (also known as backward induction or finite-horizon value iteration) can be used to compute the optimal value function, from which we get an optimal decision-making policy (Puterman 1994). However, the state space for many real-world applications can be immense, making this algorithm very computationally intensive. Hence, we must often turn to the field of approximate dynamic programming (ADP), which seeks to solve these problems via approximation techniques. One way to obtain a better approximation is to exploit (problem-dependent) structural properties of the optimal value function, and doing so often accelerates the convergence of ADP algorithms. In this paper, we consider the case where the optimal value function is monotone with respect to a partial order.

Although this paper focuses on the theory behind our ADP algorithm and not a specific application, we first point out that our technique can be broadly utilized. Monotonicity is a very common property because it is true in many situations that "more is better." To be more precise, problems that satisfy free disposal (to borrow a term from economics) or no holding costs are likely to contain monotone structure. There are also less obvious ways that monotonicity can come into play, such as environmental variables that influence the stochastic evolution of a primary state variable (e.g., extreme weather can lead to increased expected travel times; high natural gas prices can lead to higher electricity spot prices). The following list is a small sample of real-world applications spanning the literature of the aforementioned disciplines (and their subfields) that satisfy the special property of monotone value functions.
Operations Research

The problem of optimal replacement of machine parts is well studied in the literature (see, e.g., Feldstein and Rothschild 1974, Pierskalla and Voelker 1976, and Rust 1987) and can be formulated as a regenerative optimal stopping problem in which the value function is monotone in the current health of the part and the state of its environment. Section 7 discusses this model and provides detailed numerical results.

The problem of batch servicing of customers at a service station, as discussed in Papadaki and Powell (2002), features a value function that is monotone in the number of customers. Similarly, the related problem of multiproduct batch dispatch studied in Papadaki and Powell (2003b) can be shown to have a monotone value function in the multidimensional state variable that contains the number of products awaiting dispatch.

Energy

In the energy storage and allocation problem, one must optimally control a storage device that interfaces with the spot market and a stochastic energy supply (such as wind or solar). The goal is to reliably satisfy a possibly stochastic demand in the most profitable way. We can show that without holding costs, the value function is monotone in the resource (see Scott and Powell 2012 and Salas and Powell 2013). Once again, refer to Section 7 for numerical work in this problem class.

The value function from the problem of maximizing revenue using battery storage while bidding hourly in the electricity market can be shown to satisfy monotonicity in the resource, bid, and remaining battery lifetime (see Jiang and Powell 2015).

Healthcare

Hsih (2010) develops a model for optimal dosing applied to glycemic control in diabetes patients. At each decision epoch, one of several treatments (e.g., sensitizers, secretagogues, alpha-glucosidase inhibitors, or peptide analogs) with varying levels of strength (i.e., ability to decrease glucose levels) but also varying side effects, such as weight gain, needs to be administered. The value function in this problem is monotone whenever the utility function of the state of health is monotone. See Section 7 for the complete model and numerical results.

Statins are often used as treatment against heart disease or stroke in diabetes patients with lipid abnormalities. The optimal time for statin initiation, however, is a difficult medical problem because of the competing forces of health benefits and side effects. Kurt et al. (2011) model the problem as an MDP with a value function monotone in a risk factor known as the lipid ratio.
Finance

The problem of mutual fund cash balancing, described in Nascimento and Powell (2010), is faced by fund managers who must decide on the amount of cash to hold, taking into account various market characteristics and investor demand. The value functions turn out to be monotone in the interest rate and the portfolio's rate of return.

The pricing problem for American options (see Luenberger 1998) uses the theory of optimal stopping, and depending on the model of the price process, monotonicity can be shown in various state variables: for example, the current stock price or the volatility (see Ekström 2004).

Economics

Kaplan and Violante (2014) model the decisions of consumers after receiving fiscal stimulus payments to explain observed consumption behavior. The household has both liquid and illiquid assets (the state variable), in which the value functions are clearly monotone.

A classical model of search unemployment in economics describes a situation where at each period, a worker has a decision of accepting a wage offer or continuing to search for employment. The resulting value functions can be shown to be increasing with wage (see §10.7 of Stokey and Lucas 1989 and McCall 19).

This paper makes the following contributions. We describe and prove the convergence of an algorithm, called Monotone-ADP (M-ADP), for learning monotone value functions by preserving monotonicity after each update. We also provide empirical results for the algorithm in the context of various applications in operations research, energy, and healthcare as experimental evidence that exploiting monotonicity dramatically improves the rate of convergence. The performance of Monotone-ADP is compared to several established algorithms: kernel-based reinforcement learning (Ormoneit and Sen 2002), approximate policy iteration (Bertsekas 2011), asynchronous value iteration (Bertsekas 2007), and Q-learning (Watkins and Dayan 1992).

The paper is organized as follows. Section 2 gives a literature review, followed by the problem formulation and algorithm description in Sections 3 and 4.
Next, Section 5 provides the assumptions necessary for convergence, and Section 6 states and proves the convergence theorem, with several proofs of lemmas and propositions postponed until the appendix and online supplement (available as supplemental material at http://dx.doi.org/ /opre ). Section 7 describes numerical experiments over a suite of problems, with the largest one having a seven-dimensional state variable and nearly 20 million states per time period. We conclude in Section 8.

2. Literature Review

General monotone functions (not necessarily a value function) have been extensively studied in the academic literature. The statistical estimation of monotone functions is known as isotonic or monotone regression and has been studied as early as 1955; see Ayer et al. (1955) or Brunk (1955). The main idea of isotonic regression is to minimize a weighted error under the constraint of monotonicity (see Barlow et al. for a thorough description). The problem can be solved in a variety of ways, including the Pool Adjacent Violators Algorithm (PAVA) described in Ayer et al. (1955). More recently, Mammen (1991) builds on this previous research by describing an estimator that combines kernel regression and PAVA to produce a smooth regression function. Additional studies from the statistics literature include Mukerjee (1988), Ramsay (1998), and Dette et al. (2006). Although these approaches are outside the context of dynamic programming, the fact that they were developed and well studied highlights the pertinence of monotone functions.

From the operations research literature, monotone value functions and conditions for monotone optimal policies are broadly described in Puterman (1994, §4.7), and some general theory is derived therein. Similar discussions of the topic can be found in Ross (1983), Stokey and Lucas

(1989), Müller (1997), and Smith and McCardle (2002). The algorithm that we describe in this paper is first used in Papadaki and Powell (2002) as a heuristic to solve the stochastic batch service problem, where the value function is monotone. However, the convergence of the algorithm is not analyzed and the state variable is scalar. Finally, in Papadaki and Powell (2003a), the authors prove the convergence of the Discrete Online Monotone Estimation (DOME) algorithm, which takes advantage of a monotonicity preserving step to iteratively estimate a discrete monotone function. DOME, though, was not designed for dynamic programming, and the proof of convergence requires independent observations across iterations, which is an assumption that cannot be made for Monotone-ADP.

Another common property of value functions, especially in resource allocation problems, is convexity/concavity. Rather than using a monotonicity preserving step as Monotone-ADP does, algorithms such as the Successive Projective Approximation Routine (SPAR) of Powell et al. (2004), the Lagged Acquisition ADP Algorithm of Nascimento and Powell (2009), and the Leveling Algorithm of Topaloglu and Powell (2003) use a concavity preserving step, which is the same as maintaining monotonicity in the slopes. The proof of convergence for our algorithm, Monotone-ADP, uses ideas found in Tsitsiklis (1994) (later also used in Bertsekas and Tsitsiklis 1996) and Nascimento and Powell (2009). Convexity has also been exploited successfully in multistage linear stochastic programs (see, e.g., Birge 1985, Pereira and Pinto 1991, and Asamov and Powell 2015). In our work, we take as inspiration the value of convexity demonstrated in the literature and show that monotonicity is another important structural property that can be leveraged in an ADP setting.

3. Mathematical Formulation

We consider a generic problem with a time horizon $t = 0, 1, 2, \ldots, T$. Let $\mathcal{S}$ be the state space under consideration, where $|\mathcal{S}| < \infty$, and let $\mathcal{A}$ be the set of actions or decisions available at each time step. Let $S_t \in \mathcal{S}$ be the random variable representing the state at time $t$ and $a_t \in \mathcal{A}$ be the action taken at time $t$. For a state $S_t \in \mathcal{S}$ and an action $a_t \in \mathcal{A}$, let $C_t(S_t, a_t)$ be a contribution or reward received in period $t$ and $C_T(S_T)$ be the terminal contribution. Let $A_t^\pi : \mathcal{S} \to \mathcal{A}$ be the decision function at time $t$ for a policy $\pi$ from the class $\Pi$ of all admissible policies. Our goal is to maximize the expected total contribution, giving us the following objective function:

$$\sup_{\pi \in \Pi} \mathbf{E}\left[ \sum_{t=0}^{T-1} C_t\bigl(S_t, A_t^\pi(S_t)\bigr) + C_T(S_T) \right],$$

where we seek a policy to choose the actions $a_t$ sequentially based on the states $S_t$ that we visit. Let $(W_t)_{t=0}^{T}$ be a discrete time stochastic process that encapsulates all of the randomness in our problem; we call it the information process. Assume that $W_t \in \mathcal{W}$ for each $t$ and that there exists a state transition function $f : \mathcal{S} \times \mathcal{A} \times \mathcal{W} \to \mathcal{S}$ that describes the evolution of the system. Given a current state $S_t$, an action $a_t$, and an outcome of the information process $W_{t+1}$, the next state is given by

$$S_{t+1} = f(S_t, a_t, W_{t+1}). \tag{1}$$

Let $s \in \mathcal{S}$. The optimal policy can be expressed through a set of optimal value functions using the well-known Bellman's equation:

$$V_t^*(s) = \sup_{a \in \mathcal{A}} \bigl[ C_t(s, a) + \mathbf{E}[ V_{t+1}^*(S_{t+1}) \mid S_t = s,\ a_t = a ] \bigr] \quad \text{for } t = 0, 1, 2, \ldots, T-1,$$
$$V_T^*(s) = C_T(s), \tag{2}$$

with the understanding that $S_{t+1}$ transitions from $S_t$ according to (1). In many cases, the terminal contribution function $C_T(S_T)$ is zero. Suppose that the state space $\mathcal{S}$ is equipped with a partial order, denoted $\preceq$, and the following monotonicity property is satisfied for every $t$:

$$s \preceq s' \implies V_t^*(s) \le V_t^*(s'). \tag{3}$$

In other words, the optimal value function $V_t^*$ is order-preserving over the state space $\mathcal{S}$. In the case where the state space is multidimensional (see Section 7 for examples), a common example of $\preceq$ is componentwise inequality, which we henceforth denote using the traditional $\le$. A second example that arises very often is the following definition of $\preceq$, which we call the generalized componentwise inequality.
Assume that each state $s$ can be decomposed into $s = (m, i)$ for some $m \in \mathcal{M}$ and $i \in \mathcal{I}$. For two states $s = (m, i)$ and $s' = (m', i')$, we have

$$s \preceq s' \iff m \le m',\ i = i'. \tag{4}$$

In other words, we know that whenever $i$ is held constant, the value function is monotone in the primary variable $m$. An example of when such a model would be useful is when $m$ represents the amount of some held resource that we are both buying and selling, while $i$ represents additional state-of-the-world information, such as prices of related goods, transport times on a shipping network, or weather information. Depending on the specific model, the relationship between the value of $i$ and the optimal value function may be quite complex and a priori unknown to us. However, it is likely to be obvious that for $i$ held constant, the value function is increasing in $m$, the amount of resource that we own. Hence, the definition (4) is natural for this situation. The following proposition is given in the setting of the generalized componentwise inequality and provides a simple condition that can be used to verify monotonicity in the value function.

Proposition 1. Suppose that every $s \in \mathcal{S}$ can be written as $s = (m, i)$ for some $m \in \mathcal{M}$ and $i \in \mathcal{I}$, and let $S_t = (M_t, I_t)$ be the state at time $t$, with $M_t \in \mathcal{M}$ and $I_t \in \mathcal{I}$. Let the partial order $\preceq$ on the state space $\mathcal{S}$ be described by (4). Assume the following assumptions hold.

(i) For every $s, s' \in \mathcal{S}$ with $s \preceq s'$, $a \in \mathcal{A}$, and $w \in \mathcal{W}$, the state transition function satisfies $f(s, a, w) \preceq f(s', a, w)$.

(ii) For each $t < T$, $s, s' \in \mathcal{S}$ with $s \preceq s'$, and $a \in \mathcal{A}$, $C_t(s, a) \le C_t(s', a)$ and $C_T(s) \le C_T(s')$.

(iii) For each $t < T$, $M_t$ and $W_{t+1}$ are independent.

Then the value functions $V_t^*$ satisfy the monotonicity property of (3).

Proof. See the online supplement.

There are other similar ways to check for monotonicity; for example, see Puterman (1994) or Theorem 9.11 of Stokey and Lucas (1989) for conditions on the transition probabilities. We choose to provide the above proposition because of its relevance to our example applications in Section 7.

The most traditional form of Bellman's equation has been given in (2), which we refer to as the pre-decision state version. Next, we discuss some alternative formulations from the literature that can be very useful for certain problem classes. A second formulation, called the Q-function (or state-action) form of Bellman's equation, is popular in the field of reinforcement learning, especially in applications of the widely used Q-learning algorithm (see Watkins and Dayan 1992):

$$Q_t^*(s, a) = \mathbf{E}\Bigl[ C_t(s, a) + \max_{a_{t+1} \in \mathcal{A}} Q_{t+1}^*(S_{t+1}, a_{t+1}) \,\Big|\, S_t = s,\ a_t = a \Bigr] \quad \text{for } t = 0, 1, 2, \ldots, T-1,$$
$$Q_T^*(s, a) = C_T(s), \tag{5}$$

where we must now impose the additional requirement that $\mathcal{A}$ is a finite set. $Q^*$ is known as the state-action value function, and the state space in this case is enlarged to be $\mathcal{S} \times \mathcal{A}$.

A third formulation of Bellman's equation is in the context of post-decision states (see Powell 2011 for a detailed treatment of this important technique). Essentially, the post-decision state, which we denote $S_t^a$, represents the state after the decision has been made, but before the random information $W_{t+1}$ has arrived (the state-action pair is also a post-decision state). For example, in the simple problem of purchasing additional inventory $x$ to the current stock $R$ to satisfy a next-period stochastic demand, the post-decision state can be written as $R + x$, and the pre-decision state is $R$.
It must be the case that $S_t^a$ contains the same information as the state-action pair $(S_t, a_t)$, meaning that regardless of whether we condition on $S_t^a$ or $(S_t, a_t)$, the conditional distribution of $W_{t+1}$ is the same. The attractiveness of this method is that (1) in certain problems, $S_t^a$ is of lower dimension than $(S_t, a_t)$ and (2) when writing Bellman's equation in terms of the post-decision state space (using a redefined value function), the supremum and the expectation are interchanged, giving us some computational advantages. Let $s^a$ be a post-decision state from the post-decision state space $\mathcal{S}^a$. Bellman's equation becomes

$$V_t^{a,*}(s^a) = \mathbf{E}\Bigl[ \sup_{a \in \mathcal{A}} \bigl[ C_{t+1}(S_{t+1}, a) + V_{t+1}^{a,*}(S_{t+1}^a) \bigr] \,\Big|\, S_t^a = s^a \Bigr] \quad \text{for } t = 0, 1, 2, \ldots, T-2,$$
$$V_{T-1}^{a,*}(s^a) = \mathbf{E}[ C_T(S_T) \mid S_{T-1}^a = s^a ], \tag{6}$$

where $V^{a,*}$ is known as the post-decision value function. In approximate dynamic programming, the original Bellman's equation formulation (2) can be used if the transition probabilities are known. When the transition probabilities are unknown, we must often rely purely on experience or some form of black box simulator. In these situations, formulations (5) and (6) of Bellman's equation, where the optimization is within the expectation, become extremely useful. For the remainder of this paper, rather than distinguishing between the three forms of the value function ($V^*$, $Q^*$, and $V^{a,*}$), we simply use $V^*$ and call it the optimal value function, with the understanding that it may be replaced with any of the definitions. Similarly, to simplify notation, we do not distinguish between the three forms of the state space ($\mathcal{S}$, $\mathcal{S} \times \mathcal{A}$, and $\mathcal{S}^a$) and simply use $\mathcal{S}$ to represent the domain of the value function (for some $t$).

Let $d = |\mathcal{S}|$ and $D = (T+1)|\mathcal{S}|$. We view the optimal value function as a vector in $\mathbb{R}^D$; that is to say, $V^* \in \mathbb{R}^D$ has a component at $(t, s)$ denoted as $V_t^*(s)$. Moreover, for a fixed $t \le T$, the notation $V_t^* \in \mathbb{R}^d$ is used to describe $V^*$ restricted to $t$; i.e., the components of $V_t^*$ are $V_t^*(s)$ with $s$ varying over $\mathcal{S}$. We adopt this notational system for arbitrary value functions $V \in \mathbb{R}^D$ as well. Finally, we define the generalized dynamic programming operator $H : \mathbb{R}^D \to \mathbb{R}^D$, which applies the right-hand sides of either (2), (5), or (6) to an arbitrary $V \in \mathbb{R}^D$, i.e., replacing $V^*$, $Q^*$, and $V^{a,*}$ with $V$. For example, if $H$ is defined in the context of (2), then the component of $HV$ at $(t, s)$ is given by

$$(HV)_t(s) = \begin{cases} \sup_{a \in \mathcal{A}} \bigl[ C_t(s, a) + \mathbf{E}[ V_{t+1}(S_{t+1}) \mid S_t = s,\ a_t = a ] \bigr] & \text{for } t = 0, 1, 2, \ldots, T-1, \\ C_T(s) & \text{for } t = T. \end{cases} \tag{7}$$

For (5) and (6), $H$ can be defined in an analogous way. We now state a lemma concerning useful properties of $H$. Parts of it are similar to Assumption 4 of Tsitsiklis (1994), but we can show that these statements always hold true for our more specific problem setting, where $H$ is a generalized dynamic programming operator.

Lemma 1. The following statements are true for $H$, when it is defined using (2), (5), or (6).

(i) $H$ is monotone; i.e., for $V, V' \in \mathbb{R}^D$ such that $V \le V'$, we have that $HV \le HV'$ (componentwise).

(ii) For any $t < T$, let $V, V' \in \mathbb{R}^D$ be such that $V_{t+1} \le V'_{t+1}$. It then follows that $(HV)_t \le (HV')_t$.

(iii) The optimal value function $V^*$ uniquely satisfies the fixed point equation $HV = V$.

(iv) Let $V \in \mathbb{R}^D$ and let $e$ be a vector of ones with dimension $D$. For any $\eta > 0$,

$$HV - \eta e \le H(V - \eta e) \le H(V + \eta e) \le HV + \eta e.$$

Proof. See Appendix A.
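To make the pre-decision form concrete, the following is a minimal sketch of backward dynamic programming, i.e., repeated application of the right-hand side of (2) from $t = T$ down to $t = 0$, when transition probabilities are known. The toy storage problem (capacity 4, sell at most one unit per period for a reward of 1, an inflow of 0 or 1 with equal probability) and all function names are hypothetical illustrations, not taken from the paper. The toy problem is chosen so that the transition and contribution functions are nondecreasing in the resource level, and the computed values are therefore monotone as in (3).

```python
def backward_dp(T, states, actions, contribution, terminal, transition):
    """Apply the right-hand side of Bellman's equation (2) backward in time.

    transition(s, a) returns a list of (probability, next_state) pairs.
    Returns V where V[t][s] is the value of state s at time t, t = 0..T.
    """
    V = [dict() for _ in range(T + 1)]
    for s in states:
        V[T][s] = terminal(s)                      # V_T(s) = C_T(s)
    for t in range(T - 1, -1, -1):
        for s in states:
            V[t][s] = max(
                contribution(t, s, a)
                + sum(p * V[t + 1][s2] for p, s2 in transition(s, a))
                for a in actions
            )
    return V


# Hypothetical toy problem (an assumption made only for illustration).
CAPACITY = 4
STATES = range(CAPACITY + 1)
ACTIONS = (0, 1)                                   # sell nothing or one unit


def contribution(t, s, a):
    return min(a, s)                               # cannot sell more than held


def terminal(s):
    return 0.0


def transition(s, a):
    remaining = s - min(a, s)
    return [(0.5, min(CAPACITY, remaining + w)) for w in (0, 1)]


V = backward_dp(2, STATES, ACTIONS, contribution, terminal, transition)
print([V[0][s] for s in STATES])
```

The cost of this exact approach grows with the number of states times the number of actions times the number of outcomes per time period, which is the computational burden that motivates the approximate algorithm described next.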

4. Algorithm

In this section, we formally describe the Monotone-ADP algorithm. Assume a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and let $\bar{V}_t^n$ be the approximation of $V_t^*$ at iteration $n$, with the random variable $S_t^n \in \mathcal{S}$ representing the state that is visited (by the algorithm) at time $t$ in iteration $n$. The observation of the optimal value function at time $t$, iteration $n$, and state $S_t^n$ is denoted $\hat{v}_t^n(S_t^n)$ and is calculated using the estimate of the value function from iteration $n - 1$. The raw observation $\hat{v}_t^n(S_t^n)$ is then smoothed with the previous estimate $\bar{V}_t^{n-1}(S_t^n)$, using a stochastic approximation step, to produce the smoothed observation $z_t^n(S_t^n)$. Before presenting the description of the ADP algorithm, some definitions need to be given. We start with $\Pi_M$, the monotonicity preserving projection operator. Note that the term "projection" is being used loosely here; the space that we "project" onto actually changes with each iteration.

Definition 1. For $s^r \in \mathcal{S}$ and $z^r \in \mathbb{R}$, let $(s^r, z^r)$ be a reference point to which other states are compared. Let $V_t \in \mathbb{R}^d$ and define the projection operator $\Pi_M : \mathcal{S} \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^d$, where the component of the vector $\Pi_M(s^r, z^r, V_t)$ at $s$ is given by

$$\Pi_M(s^r, z^r, V_t)(s) = \begin{cases} z^r & \text{if } s = s^r, \\ z^r \vee V_t(s) & \text{if } s^r \preceq s,\ s \ne s^r, \\ z^r \wedge V_t(s) & \text{if } s \preceq s^r,\ s \ne s^r, \\ V_t(s) & \text{otherwise.} \end{cases} \tag{8}$$

In the context of the Monotone-ADP algorithm, $V_t$ is the current value function approximation, $(s^r, z^r)$ is the latest observation of the value ($s^r$ is the latest visited state), and $\Pi_M(s^r, z^r, V_t)$ is the updated value function approximation. Violations of the monotonicity property of (3) are corrected by $\Pi_M$ in the following ways: if $z^r \ge V_t(s)$ and $s^r \preceq s$, then $V_t(s)$ is too small and is increased to $z^r = z^r \vee V_t(s)$; and if $z^r \le V_t(s)$ and $s \preceq s^r$, then $V_t(s)$ is too large and is decreased to $z^r = z^r \wedge V_t(s)$.
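The four cases of (8) translate almost directly into code. The sketch below is one possible lookup-table implementation, assuming states are tuples compared by componentwise inequality and the approximation is stored as a dictionary; the names `leq`, `monotone_projection`, and `is_monotone` are hypothetical and not from the paper.

```python
def leq(s, s2):
    """Componentwise partial order on equal-length tuples."""
    return all(x <= y for x, y in zip(s, s2))


def monotone_projection(s_ref, z_ref, V):
    """Return a new dict holding Pi_M(s_ref, z_ref, V) as defined in (8)."""
    out = {}
    for s, v in V.items():
        if s == s_ref:
            out[s] = z_ref                 # keep the latest observation
        elif leq(s_ref, s):
            out[s] = max(z_ref, v)         # s dominates s_ref: raise if too small
        elif leq(s, s_ref):
            out[s] = min(z_ref, v)         # s is dominated: lower if too large
        else:
            out[s] = v                     # incomparable states are untouched
    return out


def is_monotone(V):
    """Check property (3) for every comparable pair of states."""
    return all(V[s] <= V[s2] for s in V for s2 in V if leq(s, s2))


# Two observations on a 3 x 3 grid that starts at zero everywhere.
grid = {(i, j): 0.0 for i in range(3) for j in range(3)}
V1 = monotone_projection((1, 1), 5.0, grid)
V2 = monotone_projection((2, 0), 2.0, V1)
print(V2)
```

In this example the first observation lifts every state that dominates (1, 1) to 5, while the second observation at (2, 0) leaves those states alone because they already exceed 2. Each call scans all states, so the cost per update is linear in the size of the lookup table.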
See Figure 1 for an example showing a sequence of two observations and the resulting projections in the Cartesian plane, where $\preceq$ is the componentwise inequality in two dimensions.

[Figure 1. Example illustrating the projection operator $\Pi_M$.]

We now provide some additional motivation for the definition of $\Pi_M$. Because $z_t^n(S_t^n)$ is the latest observed value and it is obtained via stochastic approximation (see Step 2b of Figure 2), our intuition guides us to keep this value, i.e., by setting $\bar{V}_t^n(S_t^n) = z_t^n(S_t^n)$. For $s \in \mathcal{S}$ and $z \in \mathbb{R}$, let us define the set

$$\mathcal{V}_M(s, z) = \{ V \in \mathbb{R}^d : V(s) = z,\ V \text{ monotone over } \mathcal{S} \},$$

which fixes the value at $s$ to be $z$ while restricting to the set of all possible $V$ that satisfy the monotonicity property (3). Now, to get the approximate value function of iteration $n$ and time $t$, we want to find $\bar{V}_t^n$ that is close to $\bar{V}_t^{n-1}$ but also satisfies the monotonicity property:

$$\bar{V}_t^n \in \arg\min \bigl\{ \| V_t - \bar{V}_t^{n-1} \|_2 : V_t \in \mathcal{V}_M\bigl(S_t^n, z_t^n(S_t^n)\bigr) \bigr\}, \tag{9}$$

where $\| \cdot \|_2$ is the Euclidean norm. Let us now briefly pause and consider a possible alternative, where we do not require $\bar{V}_t^n(S_t^n) = z_t^n(S_t^n)$. Instead, suppose we introduce a vector $\hat{V}_t^{n-1} \in \mathbb{R}^d$ such that $\hat{V}_t^{n-1}(s) = \bar{V}_t^{n-1}(s)$ for $s \ne S_t^n$ and $\hat{V}_t^{n-1}(S_t^n) = z_t^n(S_t^n)$. Next, project $\hat{V}_t^{n-1}$ onto the space of vectors $V$ that are monotone over $\mathcal{S}$ to produce $\bar{V}_t^n$ (this would be a proper projection, where the space does not change). The problem with this approach arises in the early iterations where we have poor estimates of the value function: for example, if $\bar{V}_t^0(s) = 0$ for all $s$, then $\hat{V}_t^0$ is a vector of mostly zeros and the likely result of the projection, $\bar{V}_t^1$, would be the original vector $\bar{V}_t^0$; hence, no progress is made. A potential explanation for the failure of such a strategy is that it is a naive adaptation of the natural approach for a batch framework to a recursive setting. The next proposition shows that this representation of $\bar{V}_t^n$ is equivalent to one that is obtained using the projection operator $\Pi_M$.

Proposition 2. The solution to the minimization (9) can be characterized using $\Pi_M$.
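Specifically,

$$\Pi_M\bigl(S_t^n, z_t^n(S_t^n), \bar{V}_t^{n-1}\bigr) \in \arg\min \bigl\{ \| V_t - \bar{V}_t^{n-1} \|_2 : V_t \in \mathcal{V}_M\bigl(S_t^n, z_t^n(S_t^n)\bigr) \bigr\},$$

so that we can write $\bar{V}_t^n = \Pi_M\bigl(S_t^n, z_t^n(S_t^n), \bar{V}_t^{n-1}\bigr)$.

Proof. See Appendix B.

We now introduce, for each $t$, a (possibly stochastic) stepsize sequence $\alpha_t^n \le 1$ used for smoothing in new observations. The algorithm only directly updates values (i.e., not including updates from the projection operator) for states that are visited, so for each $s \in \mathcal{S}$, let

$$\alpha_t^n(s) = \alpha_t^{n-1} \, \mathbf{1}_{\{ s = S_t^n \}}.$$

Let $\hat{v}_t^n \in \mathbb{R}^d$ be a noisy observation of the quantity $(H \bar{V}^{n-1})_t$, and let $w_t^n \in \mathbb{R}^d$ represent the additive noise associated with the observation:

$$\hat{v}_t^n = (H \bar{V}^{n-1})_t + w_t^n.$$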

Figure 2. Monotone-ADP algorithm.

Step 0a. Initialize $\bar{V}_t^0 \in [0, V_{\max}]$ for each $t \le T - 1$ such that monotonicity is satisfied within $\bar{V}_t^0$, as described in (3).
Step 0b. Set $\bar{V}_T^n(s) = C_T(s)$ for each $s \in \mathcal{S}$ and $n \le N$.
Step 0c. Set $n = 1$.
Step 1. Select an initial state $S_0^n$.
Step 2. For $t = 0, 1, \ldots, (T - 1)$:
Step 2a. Sample a noisy observation of the future value: $\hat{v}_t^n = (H \bar{V}^{n-1})_t + w_t^n$.
Step 2b. Smooth in the new observation with the previous value at each $s$: $z_t^n(s) = (1 - \alpha_t^n(s)) \, \bar{V}_t^{n-1}(s) + \alpha_t^n(s) \, \hat{v}_t^n(s)$.
Step 2c. Perform the monotonicity projection operator: $\bar{V}_t^n = \Pi_M\bigl(S_t^n, z_t^n(S_t^n), \bar{V}_t^{n-1}\bigr)$.
Step 2d. Choose the next state $S_{t+1}^n$ given $\mathcal{F}^{n-1}$.
Step 3. If $n < N$, increment $n$ and return to Step 1.

Although the algorithm is asynchronous and only updates the value for $S_t^n$ (therefore, it only needs $\hat{v}_t^n(S_t^n)$, the component of $\hat{v}_t^n$ at $S_t^n$), it is convenient to assume $\hat{v}_t^n(s)$ and $w_t^n(s)$ are defined for all $s$. We also require a vector $z_t^n \in \mathbb{R}^d$ to represent the smoothed observation of the future value; i.e., $z_t^n(s)$ is $\hat{v}_t^n(s)$ smoothed with the previous value $\bar{V}_t^{n-1}(s)$ via the stepsize $\alpha_t^n(s)$. Let us denote the history of the algorithm up until iteration $n$ by the filtration $\{\mathcal{F}^n\}_{n \ge 1}$, where

$$\mathcal{F}^n = \sigma\bigl\{ (S_t^m, w_t^m)_{m \le n,\ t \le T} \bigr\}.$$

A precise description of the algorithm is given in Figure 2. Notice from the description that if the monotonicity property (3) is satisfied at iteration $n - 1$, then the fact that the projection operator $\Pi_M$ is applied ensures that the monotonicity property is satisfied again at iteration $n$. Our benchmarking results of Section 7 show that maintaining monotonicity in such a way is an invaluable aspect of the algorithm that allows it to produce very good policies in a relatively small number of iterations. Traditional approximate (or asynchronous) value iteration, on which Monotone-ADP is based, is asymptotically convergent but extremely slow to converge in practice (once again, see Section 7).
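The steps of Figure 2 can be sketched in a few dozen lines for a lookup-table representation. The sketch below rests on several simplifying assumptions that are ours rather than the paper's: states are the integers 0, 1, ..., ordered in the usual way (so every pair of states is comparable and the "otherwise" case of (8) never occurs); the expectation in Step 2a is computed exactly, so the noise term is zero; states are sampled uniformly at random in Steps 1 and 2d; and the stepsize is one over the number of visits to the current state. The toy storage problem and all names are hypothetical.

```python
import random


def project(s_ref, z_ref, V_t):
    """Pi_M from (8) for scalar states 0..len(V_t)-1 under the usual order."""
    return [z_ref if s == s_ref
            else max(z_ref, v) if s > s_ref
            else min(z_ref, v)
            for s, v in enumerate(V_t)]


def monotone_adp(T, n_states, actions, contribution, terminal, transition,
                 n_iters, seed=0):
    rng = random.Random(seed)
    V = [[0.0] * n_states for _ in range(T)]           # Step 0a: monotone start
    V.append([terminal(s) for s in range(n_states)])   # Step 0b
    visits = [[0] * n_states for _ in range(T)]
    for _ in range(n_iters):                           # Steps 0c and 3
        s = rng.randrange(n_states)                    # Step 1
        for t in range(T):                             # Step 2
            # Step 2a with an exact expectation (zero noise). V[t + 1] has
            # not been touched yet in this iteration, so it is still the
            # estimate from the previous iteration.
            v_hat = max(
                contribution(t, s, a)
                + sum(p * V[t + 1][s2] for p, s2 in transition(s, a))
                for a in actions
            )
            visits[t][s] += 1
            alpha = 1.0 / visits[t][s]                 # assumed stepsize rule
            z = (1.0 - alpha) * V[t][s] + alpha * v_hat    # Step 2b
            V[t] = project(s, z, V[t])                 # Step 2c
            s = rng.randrange(n_states)                # Step 2d (exploration)
    return V


# Hypothetical toy problem: sell at most one unit per period for reward 1.
CAPACITY = 4
ACTIONS = (0, 1)


def contribution(t, s, a):
    return min(a, s)


def terminal(s):
    return 0.0


def transition(s, a):
    remaining = s - min(a, s)
    return [(0.5, min(CAPACITY, remaining + w)) for w in (0, 1)]


V = monotone_adp(3, CAPACITY + 1, ACTIONS, contribution, terminal,
                 transition, n_iters=200)
print(V[0])
```

Because Step 2c is applied after every update, each `V[t]` remains nondecreasing in the state throughout the run, regardless of which states happen to be sampled. Uniform sampling is only one way of visiting every state infinitely often; other exploration schemes could be substituted in Steps 1 and 2d.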
As we have mentioned, $\Pi_M$ is not a standard projection operator, as it projects to a different space on every iteration, depending on the state visited and value observed; therefore, traditional convergence results no longer hold. The remainder of the paper establishes the asymptotic convergence of Monotone-ADP.

4.1. Extensions of Monotone-ADP

We now briefly present two possible extensions of Monotone-ADP. First, consider a discounted, infinite horizon MDP. An extension (or perhaps, simplification) to this case can be obtained by removing the loop over $t$ (and all subscripts of $t$ and $T$) and acquiring one observation per iteration, exactly resembling asynchronous value iteration for infinite horizon problems. Second, we consider possible extensions when representations of the approximate value function other than lookup table are used; for example, imagine we are using basis functions $\{\phi_g\}_{g \in \mathcal{G}}$ for some feature set $\mathcal{G}$ combined with a coefficient vector $\theta_t^n$ (which has components $\theta_{tg}^n$), giving the approximation

$$\bar{V}_t^n(s) = \sum_{g \in \mathcal{G}} \theta_{tg}^n \, \phi_g(s).$$

Equation (9) is the starting point for adapting Monotone-ADP to handle this case. An analogous version of this update might be given by

$$\theta_t^n \in \arg\min \bigl\{ \| \theta_t - \theta_t^{n-1} \|_2 : \bar{V}_t^n(S_t^n) = z_t^n(S_t^n) \text{ and } \bar{V}_t^n \text{ monotone} \bigr\}, \tag{10}$$

where we have altered the objective to minimize distance in the coefficient space. Unlike (9), there is, in general, no simple and easily computable solution to (10), but special cases may exist. The analysis of this situation is beyond the scope of this paper and left to future work. In this paper, we consider the finite horizon case using a lookup table representation.

5. Assumptions

We begin by providing some technical assumptions that are needed for convergence analysis. The first assumption gives, in more general terms than previously discussed, the monotonicity of the value functions.

Assumption 1. The two monotonicity assumptions are as follows.

(i) The terminal value function $C_T$ is monotone over $\mathcal{S}$ with respect to $\preceq$.

(ii) For any $t < T$ and any vector $V \in \mathbb{R}^D$ such that $V_{t+1}$ is monotone over $\mathcal{S}$ with respect to $\preceq$, it is true that $(HV)_t$ is monotone over the state space as well.
The above assumption implies that for any choice of terminal value function $V_T^* = C_T$ that satisfies monotonicity, the value functions for the previous time periods are monotone as well. Examples of sufficient conditions include monotonicity in the contribution function plus a condition on the transition function, as in (i) of Proposition 1, or a condition on the transition probabilities, as in Puterman (1994). Intuitively speaking, when the statement "starting with more at $t$ $\Rightarrow$ ending with more at $t + 1$" applies, in expectation, to the problem at hand, Assumption 1 is satisfied. One obvious example that satisfies monotonicity occurs in resource or asset management scenarios; oftentimes in these problems, it is true that for any outcome of the random information $W_{t+1}$ that occurs (e.g., random demand, energy production, or profits), we end with more of the resource at time $t + 1$ whenever we start with more of the resource at time $t$. Mathematically, this property of resource allocation problems translates to the stronger statement:

$$(S_{t+1} \mid S_t = s,\ a_t = a) \preceq (S_{t+1} \mid S_t = s',\ a_t = a) \quad \text{a.s.}$$

for all $a \in \mathcal{A}$ when $s \preceq s'$. This is essentially the situation that Proposition 1 describes.
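For a small model, the "start with more, end with more" property can be checked by brute force before relying on it. The sketch below is a hypothetical illustration for scalar resource states: `storage_transition` is an assumed storage dynamic (withdraw up to `a` units, then add an inflow `w`, capped at a capacity), and `transition_is_monotone` enumerates every comparable pair of states, every action, and every outcome, in the spirit of condition (i) of Proposition 1.

```python
from itertools import product


def storage_transition(s, a, w, capacity=5):
    """Assumed storage dynamics: withdraw up to a units, then add inflow w."""
    return min(capacity, s - min(a, s) + w)


def transition_is_monotone(f, states, actions, outcomes):
    """Check f(s, a, w) <= f(s2, a, w) whenever s <= s2 (scalar states)."""
    return all(
        f(s, a, w) <= f(s2, a, w)
        for s, s2 in product(states, repeat=2) if s <= s2
        for a in actions
        for w in outcomes
    )


print(transition_is_monotone(storage_transition, range(6), (0, 1, 2), (0, 1)))
```

A transition that reverses the order, such as one mapping `s` to `5 - s`, fails the same check. Passing this check covers only the transition condition; the contribution functions must also be monotone for the proposition to apply.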

Assumption 2. For all $s \in \mathcal{S}$ and $t < T$, the sampling policy satisfies

$$\sum_{n=1}^{\infty} \mathbf{P}(S_t^n = s \mid \mathcal{F}^{n-1}) = \infty \quad \text{a.s.}$$

By the Extended Borel-Cantelli Lemma (see Breiman 1992), any scheme for choosing states that satisfies the above condition will visit every state infinitely often with probability one.

Assumption 3. Suppose that the contribution function $C_t(s, a)$ is bounded: without loss of generality, let us assume that for all $s \in \mathcal{S}$, $t < T$, and $a \in \mathcal{A}$, $0 \le C_t(s, a) \le C_{\max}$, for some $C_{\max} > 0$. Furthermore, suppose that $0 \le C_T(s) \le C_{\max}$ for all $s \in \mathcal{S}$ as well. This naturally implies that there exists $V_{\max} > 0$ such that $0 \le V_t^*(s) \le V_{\max}$.

The next three assumptions are standard ones made on the observations $\hat{v}_t^n$, the noise $w_t^n$, and the stepsize sequence $\alpha_t^n$; see Bertsekas and Tsitsiklis (1996) (e.g., Assumption 4.3 and Proposition 4.6) for additional details.

Assumption 4. The observations that we receive are bounded (by the same constant $V_{\max}$): $0 \le \hat{v}_t^n(s) \le V_{\max}$ almost surely, for all $s \in \mathcal{S}$ and $t < T$.

Note that the lower bounds of zero in Assumptions 3 and 4 are chosen for convenience and can be shifted by a constant to suit the application (as is done in Section 7).

Assumption 5. The following holds almost surely: $\mathbf{E}[ w_t^{n+1}(s) \mid \mathcal{F}^n ] = 0$, for any state $s \in \mathcal{S}$ and $t < T$. This property means that $w_t^n$ is a martingale difference noise process.

Assumption 6. For each $s \in \mathcal{S}$ and $t < T$, suppose $\alpha_t^n$ is $\mathcal{F}^n$-measurable and

(i) $\sum_{n=1}^{\infty} \alpha_t^n(s) = \infty$ a.s.,

(ii) $\sum_{n=1}^{\infty} \alpha_t^n(s)^2 < \infty$ a.s.

5.1. Remarks on Simulation

Before proving the theorem, we offer some additional comments regarding the assumptions as they pertain to simulation. If $H$ is defined in the context of (2), then it is not easy to perform Step 2a of Figure 2, $\hat{v}_t^n = (H \bar{V}^{n-1})_t + w_t^n$, such that Assumption 5 is satisfied.
Because the supremum is outside of the expectation operator, an upward bias would be present in the observation $\hat v^n_t(s)$ unless the expectation can be computed exactly, in which case $w^n_t(s) = 0$ and we have

$$\hat v^n_t(s) = \sup_{a \in \mathcal{A}} \Big[ C_t(s, a) + E\big[\bar V^{n-1}_{t+1}(S_{t+1}) \mid S_t = s,\, a_t = a\big] \Big]. \tag{11}$$

Thus, any approximation scheme used to calculate the expectation inside of the supremum would cause Assumption 5 to be unsatisfied. When the approximation scheme is a sample mean, the bias disappears asymptotically with the number of samples (see Kleywegt et al. 2002, which discusses the sample average approximation or SAA method). It is therefore possible that although theoretical convergence is not guaranteed, a large enough sample may still achieve decent results in practice.

On the other hand, in the context of (5) and (6), the expectation and the supremum are interchanged. This means that we can trivially obtain an unbiased estimate of $(H \bar V^{n-1})_t$ by sampling one outcome of the information process $W^n_{t+1}$ from the distribution $W_{t+1} \mid S_t = s$; computing the next state $S^n_{t+1}$; and solving a deterministic optimization problem (i.e., the optimization within the expectation). In these two cases, we would respectively use

$$\hat v^n_t(s, a) = C_t(s, a) + \max_{a_{t+1} \in \mathcal{A}} \bar Q^{n-1}_{t+1}(S^n_{t+1}, a_{t+1}) \tag{12}$$

and

$$\hat v^n_t(s^a) = \sup_{a \in \mathcal{A}} \big[ C_{t+1}(S^n_{t+1}, a) + \bar V^{a,n-1}_{t+1}(S^{a,n}_{t+1}) \big], \tag{13}$$

where $\bar Q^{n-1}_{t+1}$ is the approximation to $Q^*_{t+1}$, $\bar V^{a,n-1}_{t+1}$ is the approximation to $V^{a,*}_{t+1}$, and $S^{a,n}_{t+1}$ is the post-decision state obtained from $S^n_{t+1}$ and $a$. Notice that (11) contains an expectation whereas (12) and (13) do not, making them particularly well suited for model-free situations, where distributions are unknown and only samples or experience are available. Hence, the best choice of model depends heavily upon the problem domain.

Finally, we give a brief discussion of the choice of stepsize. There are a variety of ways in which we can satisfy Assumption 6, and here we offer the simplest example.
Consider any deterministic sequence $\{a^n\}$ such that the usual stepsize conditions are satisfied:

$$\sum_{n=0}^{\infty} a^n = \infty \quad \text{and} \quad \sum_{n=0}^{\infty} (a^n)^2 < \infty.$$

Let $N(s, n, t) = \sum_{m=1}^{n} 1_{\{s = S^m_t\}}$ be the random variable representing the total number of visits of state $s$ at time $t$ until iteration $n$. Then $\alpha^n_t = a^{N(S^n_t, n, t)}$ satisfies Assumption 6.

Convergence Analysis of the Monotone-ADP Algorithm

We are now ready to show the convergence of the algorithm. Note that although there is a significant similarity between this algorithm and the DOME algorithm described in Papadaki and Powell (2003a), the proof technique is very different. The convergence proof for the DOME algorithm cannot be directly extended to our problem because of differences in the assumptions. Our proof draws on proof techniques found in Tsitsiklis (1994) and Nascimento and Powell (2009). In the latter, the authors prove convergence of a purely exploitative ADP algorithm given a concave, piecewise-linear value function

for the lagged asset acquisition problem. We cannot exploit certain properties inherent to that problem, but in our algorithm we assume exploration of all states, a requirement that can be avoided when we are able to assume concavity. Furthermore, a significant difference in this proof is that we consider the case where the ordering $\preceq$ on $\mathcal{S}$ may not be a total ordering. A consequence of this is that we extend to the case where the monotonicity property covers multiple dimensions (e.g., the relation $\preceq$ on $\mathcal{S}$ is the componentwise inequality), which was not allowed in Nascimento and Powell (2009).

Theorem 1. Under Assumptions 1-6, for each $t \le T$ and $s \in \mathcal{S}$, the estimates $\bar V^n_t(s)$ produced by the Monotone-ADP Algorithm of Figure 2 converge to the optimal value function $V^*_t(s)$ almost surely.

Before providing the proof for this convergence result, we present some preliminary definitions and results. First, we define two deterministic bounding sequences, $U^k$ and $L^k$. The two sequences $U^k$ and $L^k$ can be thought of, jointly, as a sequence of shrinking rectangles, with $U^k$ being the upper bounds and $L^k$ being the lower bounds. The central idea to the proof is showing that the estimates $\bar V^n$ enter (and stay) in smaller and smaller rectangles, for a fixed $\omega \in \Omega$ (we assume that the $\omega$ does not lie in a discarded set of probability zero). We can then show that the rectangles converge to the point $V^*$, which in turn implies the convergence of $\bar V^n$ to the optimal value function. This idea is attributed to Tsitsiklis (1994) and is illustrated in Figure 3.

The sequences $U^k$ and $L^k$ are written recursively. Let

$$U^0 = V^* + V_{\max}\, e, \qquad L^0 = V^* - V_{\max}\, e, \tag{14}$$

and let

$$U^{k+1} = \frac{U^k + HU^k}{2}, \qquad L^{k+1} = \frac{L^k + HL^k}{2}.$$

Lemma 2. For all $k \ge 0$, we have that

$$HU^k \le U^{k+1} \le U^k, \qquad HL^k \ge L^{k+1} \ge L^k.$$

Furthermore,

$$U^k \to V^*, \qquad L^k \to V^*. \tag{15}$$

Figure 3. Central idea of convergence proof. (The figure plots the estimate $\bar V^n(s)$ against the iteration $n$, together with the bounds $U^k(s)$, $L^k(s)$, $U^{k+1}(s)$, $L^{k+1}(s)$, $U^{k+2}(s)$, $L^{k+2}(s)$ and the limit $V^*(s)$.)

Proof. The proof of this lemma is given in Bertsekas and Tsitsiklis (1996) (see Lemmas 4.5 and 4.6). The properties of $H$ given in Proposition 1 are used for this result.

Lemma 3. The bounding sequences satisfy the monotonicity property; that is, for $k \ge 0$, $t \le T$, $s \in \mathcal{S}$, $s' \in \mathcal{S}$ such that $s \preceq s'$, we have

$$U^k_t(s) \le U^k_t(s'), \qquad L^k_t(s) \le L^k_t(s').$$

Proof. See Appendix C.

We continue with some definitions pertaining to the projection operator $\Pi_M$. A "$-$" in the superscript signifies "the value of $s$ is too small" and the "$+$" signifies "the value of $s$ is too large."

Definition 2. For $t < T$ and $s \in \mathcal{S}$, let $\mathcal{N}^{\Pi-}_t(s)$ be a random set representing the iterations for which $s$ was increased by the projection operator at time $t$. Similarly, let $\mathcal{N}^{\Pi+}_t(s)$ represent the iterations for which $s$ was decreased:

$$\mathcal{N}^{\Pi-}_t(s) = \{n : s \ne S^n_t \text{ and } \bar V^{n-1}_t(s) < \bar V^n_t(s)\},$$
$$\mathcal{N}^{\Pi+}_t(s) = \{n : s \ne S^n_t \text{ and } \bar V^{n-1}_t(s) > \bar V^n_t(s)\}.$$

Definition 3. For $t < T$ and $s \in \mathcal{S}$, let $N^{\Pi-}_t(s)$ be the last iteration for which the state $s$ was increased by $\Pi_M$ at time $t$:

$$N^{\Pi-}_t(s) = \max \mathcal{N}^{\Pi-}_t(s).$$

Similarly, let

$$N^{\Pi+}_t(s) = \max \mathcal{N}^{\Pi+}_t(s).$$

Note that $N^{\Pi-}_t(s) = \infty$ if $|\mathcal{N}^{\Pi-}_t(s)| = \infty$ and $N^{\Pi+}_t(s) = \infty$ if $|\mathcal{N}^{\Pi+}_t(s)| = \infty$.

Definition 4. Let $N^{\Pi}$ be large enough so that for iterations $n \ge N^{\Pi}$, any state increased (decreased) finitely often by the projection operator $\Pi_M$ is no longer affected by $\Pi_M$. In other words, if some state is increased (decreased) by $\Pi_M$ on an iteration after $N^{\Pi}$, then that state is increased (decreased) by $\Pi_M$ infinitely often. We can write the following:

$$N^{\Pi} = \max\Big( \{N^{\Pi-}_t(s) : t < T,\ s \in \mathcal{S},\ N^{\Pi-}_t(s) < \infty\} \cup \{N^{\Pi+}_t(s) : t < T,\ s \in \mathcal{S},\ N^{\Pi+}_t(s) < \infty\} \Big).$$

We now define, for each $t$, two random subsets $\mathcal{S}^-_t$ and $\mathcal{S}^+_t$ of the state space $\mathcal{S}$ where $\mathcal{S}^-_t$ contains states that are increased by the projection operator $\Pi_M$ finitely often and $\mathcal{S}^+_t$ contains states that are decreased by the projection operator finitely often. The role that these two sets play in the proof is as follows: We first show convergence for states that are projected finitely often ($s \in \mathcal{S}^-_t$ or $s \in \mathcal{S}^+_t$).

Next, because convergence already holds for states that are projected finitely often, we use an induction-like argument to extend the property to states that are projected infinitely often ($s \in \mathcal{S} \setminus \mathcal{S}^-_t$ or $s \in \mathcal{S} \setminus \mathcal{S}^+_t$). This step requires the definition of a tree structure that arranges the set of states and its partial ordering in an intuitive way.

Definition 5. For $t < T$, define

$$\mathcal{S}^-_t = \{s \in \mathcal{S} : N^{\Pi-}_t(s) < \infty\} \quad \text{and} \quad \mathcal{S}^+_t = \{s \in \mathcal{S} : N^{\Pi+}_t(s) < \infty\},$$

to be random subsets of states that are projected finitely often.

Lemma 4. The random sets $\mathcal{S}^-_t$ and $\mathcal{S}^+_t$ are almost surely nonempty.

Proof. See Appendix D.

We now provide several remarks regarding the projection operator $\Pi_M$. The value of a state $s$ can only be increased by $\Pi_M$ if we visit a "smaller" state; i.e., $S^n_t \preceq s$. This statement is obvious from the second condition of (8). Similarly, the value of the state can only be decreased by $\Pi_M$ if the visited state is "larger"; i.e., $S^n_t \succeq s$. Intuitively, it can be useful to imagine that, in some sense, the values of states can be "pushed up" from the "left" and "pushed down" from the "right." Finally, because of our assumption that $\preceq$ is only a partial ordering, the update process (from $\Pi_M$) becomes more difficult to analyze than in the total ordering case. To facilitate the analysis of the process, we introduce the notions of lower (upper) immediate neighbors and lower (upper) update trees.

Definition 6. For $s = (m, i) \in \mathcal{S}$, we define the set of lower immediate neighbors $\mathcal{S}_L(s)$ in the following way:

$$\mathcal{S}_L(s) = \{s' \in \mathcal{S} : s' \preceq s,\ s' \ne s,\ \nexists\, s'' \in \mathcal{S},\ s'' \ne s,\ s'' \ne s',\ s' \preceq s'' \preceq s\}.$$

In other words, there does not exist $s''$ in between $s'$ and $s$.
The set of upper immediate neighbors $\mathcal{S}_U(s)$ is defined in a similar way:

$$\mathcal{S}_U(s) = \{s' \in \mathcal{S} : s' \succeq s,\ s' \ne s,\ \nexists\, s'' \in \mathcal{S},\ s'' \ne s,\ s'' \ne s',\ s' \succeq s'' \succeq s\}.$$

The intuition for the next lemma is that if some state $s$ is increased by $\Pi_M$, then it must have been caused by visiting a lower state. In particular, either the visited state was one of the lower immediate neighbors or one of the lower immediate neighbors was also increased by $\Pi_M$. In either case, one of the lower immediate neighbors has the same value as $s$. This lemma is crucial later in the proof.

Lemma 5. Suppose the value of $s$ is increased by $\Pi_M$ on some iteration $n$: $s \ne S^n_t$ and $\bar V^{n-1}_t(s) < \bar V^n_t(s)$. Then there exists another state $s' \in \mathcal{S}_L(s)$ (in the set of lower immediate neighbors) whose value is equal to the newly updated value: $\bar V^n_t(s') = \bar V^n_t(s)$.

Proof. See Appendix E.

Definition 7. Consider some $\omega \in \Omega$. Let $s \in \mathcal{S} \setminus \mathcal{S}^-_t$, meaning that $s$ is increased by $\Pi_M$ infinitely often: $|\mathcal{N}^{\Pi-}_t(s)| = \infty$. A lower update tree $\mathcal{T}^-_t(s)$ is an organization of the states in the set $L = \{s' \in \mathcal{S} : s' \preceq s\}$ where the value of each node is an element of $L$. The tree $\mathcal{T}^-_t(s)$ is constructed according to the following rules.
(i) The root node of $\mathcal{T}^-_t(s)$ has value $s$.
(ii) Consider an arbitrary node $j$ with value $s_j$.
(a) If $s_j \in \mathcal{S} \setminus \mathcal{S}^-_t$, then for each $s_{jc} \in \mathcal{S}_L(s_j)$, add a child node with value $s_{jc}$ to the node $j$.
(b) If $s_j \in \mathcal{S}^-_t$, then $j$ is a leaf node (it does not have any child nodes).

The tree $\mathcal{T}^-_t(s)$ is unique and can easily be built by starting with the root node and successively applying the rules. The upper update tree $\mathcal{T}^+_t(s)$ is defined in a completely analogous way.

Note that the lower update tree is random and we now argue that for each $\omega$, it is well defined. We observe that it cannot be the case for some state $s'$ to be an element of $\mathcal{S} \setminus \mathcal{S}^-_t$ while $\mathcal{S}_L(s') = \{\}$ because for it to be increased infinitely often, there must exist at least one lower state whose observations cause the monotonicity violations.
Using this fact along with the finiteness of $\mathcal{S}$ and Lemma 4, which states that $\mathcal{S}^-_t$ is nonempty, it is clear that all paths down the tree reach a leaf node (i.e., an element of $\mathcal{S}^-_t$). The reason for discontinuing the tree at states in $\mathcal{S}^-_t$ is that our convergence proof employs an induction-like argument up the tree, starting with states in $\mathcal{S}^-_t$. Lastly, we remark that it is possible for multiple nodes to have the same value.

As an illustrative example, consider the case with $\mathcal{S} = \{0, 1, 2\}^2$ with $\preceq$ being the componentwise inequality $\le$. Assume that for a particular $\omega \in \Omega$, $s = (s_x, s_y) \in \mathcal{S}^-_t$ if and only if $s_x = 0$ or $s_y = 0$ (lower boundary of the square). Figure 4 shows the realization of the lower update tree evaluated at the state $(2, 2)$.

Figure 4. Illustration of the lower update tree. (Panels: the state space $\mathcal{S}$, with the sets $\mathcal{S}^-_t$ and $\mathcal{S} \setminus \mathcal{S}^-_t$ marked; the tree $\mathcal{T}^-_t((2, 2))$ with root $(2, 2)$.)
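The construction rules for lower immediate neighbors and the lower update tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid $\{0, 1, 2\}^2$ with the componentwise order is read off the example and its figure, and the helper names `leq`, `lower_immediate_neighbors`, `lower_update_tree`, and `count_nodes` are hypothetical, not taken from the paper.

```python
from itertools import product


def leq(s, s2):
    """Componentwise partial order on tuples."""
    return all(x <= y for x, y in zip(s, s2))


def lower_immediate_neighbors(s, states):
    """States strictly below s with no third state in between."""
    below = [q for q in states if q != s and leq(q, s)]
    return [q for q in below
            if not any(r != q and leq(q, r) for r in below)]


def lower_update_tree(s, states, finitely_projected):
    """Return nested (value, children) tuples built from the root s downward."""
    if s in finitely_projected:
        return (s, [])  # states increased finitely often are leaf nodes
    children = [lower_update_tree(q, states, finitely_projected)
                for q in lower_immediate_neighbors(s, states)]
    return (s, children)


def count_nodes(tree):
    """Count nodes, including repeated values on different branches."""
    return 1 + sum(count_nodes(child) for child in tree[1])


states = list(product(range(3), repeat=2))
# Assumed realization from the example: the lower boundary is projected finitely often.
s_minus = {s for s in states if s[0] == 0 or s[1] == 0}
tree = lower_update_tree((2, 2), states, s_minus)
print(lower_immediate_neighbors((2, 2), states))  # prints [(1, 2), (2, 1)]
print(count_nodes(tree))                          # prints 11
```

The state $(1, 1)$ appears twice in the resulting tree, once under $(1, 2)$ and once under $(2, 1)$, which is the situation the remark about multiple nodes sharing a value refers to.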

The next lemma is a useful technical result used in the convergence proof.

Lemma 6. For any $s \in \mathcal{S}$,

$$\lim_{m \to \infty} \Big[ \prod_{n=1}^{m} \big(1 - \alpha^n_t(s)\big) \Big] = 0 \quad \text{a.s.}$$

Proof. See Appendix F.

With these preliminaries in mind (other elements will be defined as they arise), we begin the convergence analysis.

Proof of Theorem 1. As previously mentioned, to show that the sequence $\bar V^n_t(s)$ (almost surely) converges to $V^*_t(s)$ for each $t$ and $s$, we need to argue that $\bar V^n_t(s)$ eventually enters every rectangle (or "interval," when we discuss a specific component of the vector $\bar V^n$) defined by the sequence $L^k_t(s)$ and $U^k_t(s)$. Recall that the estimates of the value function produced by the algorithm are indexed by $n$ and the bounding rectangles are indexed by $k$. Hence, we aim to show that for each $k$, we have that for $n$ sufficiently large, it is true that $\forall\, s \in \mathcal{S}$,

$$L^k_t(s) \le \bar V^n_t(s) \le U^k_t(s). \tag{16}$$

Following this step, an application of (15) in Lemma 2 completes the proof. We show the second inequality of (16) and remark that the first can be shown in a completely symmetric way. The goal is then to show that $\exists\, N^k_t < \infty$ a.s. such that $\forall\, n \ge N^k_t$ and $\forall\, s \in \mathcal{S}$,

$$\bar V^n_t(s) \le U^k_t(s). \tag{17}$$

Choose $\omega \in \Omega$. For ease of presentation, the dependence of the random variables on $\omega$ is omitted. We use backward induction on $t$ to show this result, which is the same technique used in Nascimento and Powell (2009). The inductive step is broken up into two cases, $s \in \mathcal{S}^-_t$ and $s \in \mathcal{S} \setminus \mathcal{S}^-_t$.

Base case, $t = T$. Since for all $s \in \mathcal{S}$, $k$, and $n$, we have that (by definition) $\bar V^n_T(s) = U^k_T(s) = 0$, we can arbitrarily select $N^k_T$. Suppose that for each $k$, we choose $N^k_T = N^{\Pi}$, allowing us to use the property of $N^{\Pi}$ that if $s \in \mathcal{S}^-_t$, then the estimate of the value at $s$ is no longer affected by $\Pi_M$ on iterations $n \ge N^{\Pi}$.

Induction hypothesis, $t + 1$. Assume for $t + 1 \le T$ that $\forall\, k \ge 0$, $\exists\, N^k_{t+1} < \infty$ such that $N^k_{t+1} \ge N^{\Pi}$ and $\forall\, n \ge N^k_{t+1}$, we have that $\forall\, s \in \mathcal{S}$, $\bar V^n_{t+1}(s) \le U^k_{t+1}(s)$.

Inductive step from $t + 1$ to $t$. The remainder of the proof concerns this inductive step and is broken up into two cases, $s \in \mathcal{S}^-_t$ and $s \in \mathcal{S} \setminus \mathcal{S}^-_t$. For each $s$, we show the existence of a state dependent iteration $\tilde N^k_t(s) \ge N^{\Pi}$, such that for $n \ge \tilde N^k_t(s)$, (17) holds. The state independent iteration $N^k_t$ is then taken to be the maximum of $\tilde N^k_t(s)$ over $s$.

Case 1: $s \in \mathcal{S}^-_t$. To prove this case, we induct forward on $k$. Note that we are still inducting backward on $t$, so the induction hypothesis for $t + 1$ still holds. The inductive step is proved in essentially the same manner as Theorem 2 of Tsitsiklis (1994).

Base case, $k = 0$ (within induction on $t$). By Assumption 3 and (14), we have that $U^0_t(s) \ge V_{\max}$. But by Assumption 4, the updating equation (Step 2b of Figure 2), and the initialization of $\bar V^0_t(s) \in [0, V_{\max}]$, we can easily see that $\bar V^n_t(s) \in [0, V_{\max}]$ for any $n$ and $s$. Therefore, $\bar V^n_t(s) \le U^0_t(s)$, for any $n$ and $s$, so we can choose $N^0_t$ arbitrarily. Let us choose $\tilde N^0_t(s) = N^0_{t+1}$, and since $N^0_{t+1}$ came from the induction hypothesis for $t + 1$, it is also true that $\tilde N^0_t(s) \ge N^{\Pi}$.

Induction hypothesis, $k$ (within induction on $t$). Assume for $k \ge 0$ that $\exists\, \tilde N^k_t(s) < \infty$ such that $\tilde N^k_t(s) \ge N^k_{t+1} \ge N^{\Pi}$ and $\forall\, n \ge \tilde N^k_t(s)$, we have $\bar V^n_t(s) \le U^k_t(s)$.

Before we begin the inductive step from $k$ to $k + 1$, we define some additional sequences and state a few useful lemmas.

Definition 8. The positive incurred noise, since a starting iteration $m$, is represented by the sequence $W^{n,m}_t(s)$. For $s \in \mathcal{S}$, it is defined as follows:

$$W^{m,m}_t(s) = 0,$$
$$W^{n+1,m}_t(s) = \big[ (1 - \alpha^n_t(s))\, W^{n,m}_t(s) + \alpha^n_t(s)\, w^{n+1}_t(s) \big]^+ \quad \text{for } n \ge m.$$

The term $W^{n+1,m}_t(s)$ is only updated from $W^{n,m}_t(s)$ when $s = S^n_t$, i.e., on iterations where the state is visited by the algorithm, because the stepsize $\alpha^n_t(s) = 0$ whenever $s \ne S^n_t$.

Lemma 7. For any starting iteration $m \ge 0$ and any state $s \in \mathcal{S}$, under Assumptions 4, 5, and 6, $W^{n,m}_t(s)$ asymptotically vanishes:

$$\lim_{n \to \infty} W^{n,m}_t(s) = 0 \quad \text{a.s.}$$

Proof. The proof is analogous to that of Lemma 6.2 in Nascimento and Powell (2009), which uses a martingale convergence argument.
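The behavior described by Definition 8 and Lemma 7 can be observed in a small simulation. The sketch below is illustrative only: it follows a single state over its visits (the stepsize is zero otherwise), and it assumes a harmonic stepsize in the visit count and uniform zero-mean bounded noise. Both choices, and the function name `positive_incurred_noise`, are assumptions made for the example rather than anything prescribed by the paper.

```python
import random


def positive_incurred_noise(num_visits, seed=0):
    """Simulate W <- [(1 - alpha) W + alpha w]^+ over successive visits to one state."""
    rng = random.Random(seed)
    w_pos = 0.0                          # starting value W^{m,m} = 0
    path = [w_pos]
    for visit in range(1, num_visits + 1):
        alpha = 1.0 / (visit + 1)        # assumed stepsize: sum diverges, sum of squares converges
        noise = rng.uniform(-1.0, 1.0)   # assumed noise: zero mean, bounded
        w_pos = max((1.0 - alpha) * w_pos + alpha * noise, 0.0)  # positive part
        path.append(w_pos)
    return path


path = positive_incurred_noise(20000)
print(max(path[:100]), path[-1])
```

The positive part keeps the sequence nonnegative, while the shrinking stepsizes average out the noise, so the late values are much smaller than the early ones.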
To reemphasize the presence of $\omega$, we note that the following definition and the subsequent lemma both use the realization $\tilde N^k_t(s)(\omega)$ from the $\omega$ chosen at the beginning of the proof.

Definition 9. The other auxiliary sequence that we need is $X^n_t(s)$, which applies the smoothing step to $(HU^k)_t(s)$. For any state $s \in \mathcal{S}$, let

$$X^{\tilde N^k_t(s)}_t(s) = U^k_t(s),$$
$$X^{n+1}_t(s) = (1 - \alpha^n_t(s))\, X^n_t(s) + \alpha^n_t(s)\, (HU^k)_t(s) \quad \text{for } n \ge \tilde N^k_t(s).$$

Lemma 8. For $n \ge \tilde N^k_t(s)$ and state $s \in \mathcal{S}^-_t$,

$$\bar V^n_t(s) \le X^n_t(s) + W^{n, \tilde N^k_t(s)}_t(s).$$

Proof. See Appendix G.
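The recursion in Definition 9 is ordinary exponential smoothing toward a fixed target, so unrolling it gives a convex combination of the starting value and the target, weighted by the running product of $(1 - \alpha)$ terms. A minimal numerical sketch follows; the values `u` and `h` stand in for $U^k_t(s)$ and $(HU^k)_t(s)$ and, like the stepsize sequence, are hypothetical choices for illustration.

```python
import math


def smooth(u, h, alphas):
    """Apply X <- (1 - alpha) X + alpha h starting from X = u and return the whole path."""
    x = u
    path = [x]
    for alpha in alphas:
        x = (1.0 - alpha) * x + alpha * h
        path.append(x)
    return path


u, h = 10.0, 4.0                               # hypothetical values with h <= u
alphas = [1.0 / (n + 2) for n in range(50)]    # assumed stepsizes in (0, 1)
path = smooth(u, h, alphas)

beta = math.prod(1.0 - a for a in alphas)      # product of the (1 - alpha) factors
closed_form = beta * u + (1.0 - beta) * h      # unrolled form of the recursion
print(path[-1], closed_form)
```

As the product of the $(1 - \alpha)$ factors shrinks, the smoothed value moves from `u` down toward `h`; the proof relies on this when it waits until that product is small enough.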

Inductive step from $k$ to $k + 1$. If $U^k_t(s) = (HU^k)_t(s)$, then by Lemma 2, we see that $U^k_t(s) = U^{k+1}_t(s)$ so $\bar V^n_t(s) \le U^k_t(s) \le U^{k+1}_t(s)$ for any $n \ge \tilde N^k_t(s)$ and the proof is complete. Since we know that $(HU^k)_t(s) \le U^k_t(s)$ by Lemma 2, we can now assume that $s \in K$, where

$$K = \{s' \in \mathcal{S} : (HU^k)_t(s') < U^k_t(s')\}.$$

In this case, we can define

$$\delta^k = \min_{s \in \mathcal{S}^-_t \cap K} \frac{U^k_t(s) - (HU^k)_t(s)}{4} > 0.$$

Choose $\tilde N^{k+1}_t(s) \ge \tilde N^k_t(s)$ such that

$$\prod_{n = \tilde N^k_t(s)}^{\tilde N^{k+1}_t(s) - 1} \big(1 - \alpha^n_t(s)\big) \le \frac{1}{4}$$

and for all $n \ge \tilde N^{k+1}_t(s)$,

$$W^{n, \tilde N^k_t(s)}_t(s) \le \delta^k.$$

Note that $\tilde N^{k+1}_t(s)$ clearly exists because both sequences converge to zero, by Lemmas 6 and 7. Recursively using the definition of $X^n_t(s)$, we get that

$$X^n_t(s) = \beta^n_t(s)\, U^k_t(s) + (1 - \beta^n_t(s))\, (HU^k)_t(s),$$

where $\beta^n_t(s) = \prod_{l = \tilde N^k_t(s)}^{n-1} (1 - \alpha^l_t(s))$. Notice that for $n \ge \tilde N^{k+1}_t(s)$, we know that $\beta^n_t(s) \le 1/4$, so we can write

$$
\begin{aligned}
X^n_t(s) &= \beta^n_t(s)\, U^k_t(s) + (1 - \beta^n_t(s))\, (HU^k)_t(s) \\
&= \beta^n_t(s)\, \big[ U^k_t(s) - (HU^k)_t(s) \big] + (HU^k)_t(s) \\
&\le \tfrac{1}{4} \big[ U^k_t(s) - (HU^k)_t(s) \big] + (HU^k)_t(s) \\
&= \tfrac{1}{2} \big[ U^k_t(s) + (HU^k)_t(s) \big] - \tfrac{1}{4} \big[ U^k_t(s) - (HU^k)_t(s) \big] \\
&\le U^{k+1}_t(s) - \delta^k.
\end{aligned}
\tag{18}
$$

We can apply Lemma 8 and (18) to get

$$\bar V^n_t(s) \le X^n_t(s) + W^{n, \tilde N^k_t(s)}_t(s) \le \big( U^{k+1}_t(s) - \delta^k \big) + \delta^k = U^{k+1}_t(s),$$

for all $n \ge \tilde N^{k+1}_t(s)$. Thus, the inductive step from $k$ to $k + 1$ is complete.

Case 2: $s \in \mathcal{S} \setminus \mathcal{S}^-_t$. Recall that we are still in the inductive step from $t + 1$ to $t$ (where the hypothesis was the existence of $N^k_{t+1}$). As previously mentioned, the proof for this case relies on an induction-like argument over the tree $\mathcal{T}^-_t(s)$. The following lemma is the core of our argument, and the proof is provided below.

Lemma 9. Consider some $k \ge 0$ and a node $j$ of $\mathcal{T}^-_t(s)$ with value $s_j \in \mathcal{S} \setminus \mathcal{S}^-_t$ and let the $C_j \ge 1$ child nodes of $j$ be denoted by the set $\{s_{j,1}, s_{j,2}, \ldots, s_{j,C_j}\}$. Suppose that for each $s_{j,c}$ where $1 \le c \le C_j$, we have that $\exists\, \tilde N^k_t(s_{j,c}) < \infty$ such that $\forall\, n \ge \tilde N^k_t(s_{j,c})$,

$$\bar V^n_t(s_{j,c}) \le U^k_t(s_{j,c}). \tag{19}$$

Then $\exists\, \tilde N^k_t(s_j) < \infty$ such that $\forall\, n \ge \tilde N^k_t(s_j)$,

$$\bar V^n_t(s_j) \le U^k_t(s_j).$$

Proof. First, note that by the induction hypothesis, part (ii) of Lemma 1, and Lemma 2, we have the inequality

$$(H \bar V^n)_t(s) \le (HU^k)_t(s) \le U^k_t(s). \tag{20}$$

We break the proof into several steps.

Step 1. Let us consider the iteration $\tilde N$ defined by

$$\tilde N = \min\big\{ n \in \mathcal{N}^{\Pi-}_t(s_j) : n \ge \max_c \tilde N^k_t(s_{j,c}) \big\},$$

which exists because $s_j \in \mathcal{S} \setminus \mathcal{S}^-_t$ and is increased infinitely often. This means that $\Pi_M$ increased the value of state $s_j$ on iteration $\tilde N$. As the first step, we show that $\forall\, n \ge \tilde N$,

$$\bar V^n_t(s_j) \le U^k_t(s_j) + W^{n, \tilde N}_t(s_j), \tag{21}$$

using an induction argument.

Base case, $n = \tilde N$. Using Lemma 5, we know that for some $c \in \{1, 2, \ldots, C_j\}$, we have

$$\bar V^n_t(s_j) = \bar V^n_t(s_{j,c}) \le U^k_t(s_{j,c}) \le U^k_t(s_j) + W^{n, \tilde N}_t(s_j).$$

The fact that $\tilde N \ge \tilde N^k_t(s_{j,c})$ for every $c$ justifies the first inequality and the second inequality above follows from the monotonicity within $U^k$ (see Lemma 3) and that $W^{\tilde N, \tilde N}_t(s_j) = 0$.

Induction hypothesis, $n$. Suppose (21) is true for $n$ where $n \ge \tilde N$.

Inductive step from $n$ to $n + 1$. Consider the following two cases:

(I) Suppose $n + 1 \in \mathcal{N}^{\Pi-}_t(s_j)$. The proof for this is exactly the same as for the base case, except we use $W^{n+1, \tilde N}_t(s_j) \ge 0$ to show the inequality. Again, this step depends heavily on Lemma 5 and on every child node representing a state that satisfies (19).

(II) Suppose $n + 1 \notin \mathcal{N}^{\Pi-}_t(s_j)$. There are again two cases to consider:

(A) Suppose $S^{n+1}_t = s_j$. Then

$$
\begin{aligned}
\bar V^{n+1}_t(s_j) &= z^{n+1}_t(s_j) \\
&= (1 - \alpha^{n+1}_t(s_j))\, \bar V^n_t(s_j) + \alpha^{n+1}_t(s_j)\, \hat v^{n+1}_t(s_j) \\
&\le (1 - \alpha^{n+1}_t(s_j)) \big( U^k_t(s_j) + W^{n, \tilde N}_t(s_j) \big) + \alpha^{n+1}_t(s_j) \big[ (H \bar V^n)_t(s_j) + w^{n+1}_t(s_j) \big] \\
&\le U^k_t(s_j) + W^{n+1, \tilde N}_t(s_j),
\end{aligned}
$$

where the first inequality follows from the induction hypothesis for $n$ and the second inequality follows by (20).


More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN Inernaional Journal of Scienific & Engineering Research, Volume 4, Issue 10, Ocober-2013 900 FUZZY MEAN RESIDUAL LIFE ORDERING OF FUZZY RANDOM VARIABLES J. EARNEST LAZARUS PIRIYAKUMAR 1, A. YAMUNA 2 1.

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :45 PM 8/8/04 Copyrigh 04 Richard T. Woodward. An inroducion o dynamic opimizaion -- Opimal Conrol and Dynamic Programming AGEC 637-04 I. Overview of opimizaion Opimizaion is

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details!

Finish reading Chapter 2 of Spivak, rereading earlier sections as necessary. handout and fill in some missing details! MAT 257, Handou 6: Ocober 7-2, 20. I. Assignmen. Finish reading Chaper 2 of Spiva, rereading earlier secions as necessary. handou and fill in some missing deails! II. Higher derivaives. Also, read his

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims

Problem Set 5. Graduate Macro II, Spring 2017 The University of Notre Dame Professor Sims Problem Se 5 Graduae Macro II, Spring 2017 The Universiy of Nore Dame Professor Sims Insrucions: You may consul wih oher members of he class, bu please make sure o urn in your own work. Where applicable,

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

An Introduction to Backward Stochastic Differential Equations (BSDEs) PIMS Summer School 2016 in Mathematical Finance.

An Introduction to Backward Stochastic Differential Equations (BSDEs) PIMS Summer School 2016 in Mathematical Finance. 1 An Inroducion o Backward Sochasic Differenial Equaions (BSDEs) PIMS Summer School 2016 in Mahemaical Finance June 25, 2016 Chrisoph Frei cfrei@ualbera.ca This inroducion is based on Touzi [14], Bouchard

More information

Linear Response Theory: The connection between QFT and experiments

Linear Response Theory: The connection between QFT and experiments Phys540.nb 39 3 Linear Response Theory: The connecion beween QFT and experimens 3.1. Basic conceps and ideas Q: How do we measure he conduciviy of a meal? A: we firs inroduce a weak elecric field E, and

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

) were both constant and we brought them from under the integral.

) were both constant and we brought them from under the integral. YIELD-PER-RECRUIT (coninued The yield-per-recrui model applies o a cohor, bu we saw in he Age Disribuions lecure ha he properies of a cohor do no apply in general o a collecion of cohors, which is wha

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

EXERCISES FOR SECTION 1.5

EXERCISES FOR SECTION 1.5 1.5 Exisence and Uniqueness of Soluions 43 20. 1 v c 21. 1 v c 1 2 4 6 8 10 1 2 2 4 6 8 10 Graph of approximae soluion obained using Euler s mehod wih = 0.1. Graph of approximae soluion obained using Euler

More information

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN The MIT Press, 2014 Lecure Slides for INTRODUCTION TO MACHINE LEARNING 3RD EDITION alpaydin@boun.edu.r hp://www.cmpe.boun.edu.r/~ehem/i2ml3e CHAPTER 2: SUPERVISED LEARNING Learning a Class

More information

A Specification Test for Linear Dynamic Stochastic General Equilibrium Models

A Specification Test for Linear Dynamic Stochastic General Equilibrium Models Journal of Saisical and Economeric Mehods, vol.1, no.2, 2012, 65-70 ISSN: 2241-0384 (prin), 2241-0376 (online) Scienpress Ld, 2012 A Specificaion Tes for Linear Dynamic Sochasic General Equilibrium Models

More information

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H.

ACE 562 Fall Lecture 5: The Simple Linear Regression Model: Sampling Properties of the Least Squares Estimators. by Professor Scott H. ACE 56 Fall 005 Lecure 5: he Simple Linear Regression Model: Sampling Properies of he Leas Squares Esimaors by Professor Sco H. Irwin Required Reading: Griffihs, Hill and Judge. "Inference in he Simple

More information

BU Macro BU Macro Fall 2008, Lecture 4

BU Macro BU Macro Fall 2008, Lecture 4 Dynamic Programming BU Macro 2008 Lecure 4 1 Ouline 1. Cerainy opimizaion problem used o illusrae: a. Resricions on exogenous variables b. Value funcion c. Policy funcion d. The Bellman equaion and an

More information

Optimality Conditions for Unconstrained Problems

Optimality Conditions for Unconstrained Problems 62 CHAPTER 6 Opimaliy Condiions for Unconsrained Problems 1 Unconsrained Opimizaion 11 Exisence Consider he problem of minimizing he funcion f : R n R where f is coninuous on all of R n : P min f(x) x

More information

GMM - Generalized Method of Moments

GMM - Generalized Method of Moments GMM - Generalized Mehod of Momens Conens GMM esimaion, shor inroducion 2 GMM inuiion: Maching momens 2 3 General overview of GMM esimaion. 3 3. Weighing marix...........................................

More information

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite

The Optimal Stopping Time for Selling an Asset When It Is Uncertain Whether the Price Process Is Increasing or Decreasing When the Horizon Is Infinite American Journal of Operaions Research, 08, 8, 8-9 hp://wwwscirporg/journal/ajor ISSN Online: 60-8849 ISSN Prin: 60-8830 The Opimal Sopping Time for Selling an Asse When I Is Uncerain Wheher he Price Process

More information

Notes for Lecture 17-18

Notes for Lecture 17-18 U.C. Berkeley CS278: Compuaional Complexiy Handou N7-8 Professor Luca Trevisan April 3-8, 2008 Noes for Lecure 7-8 In hese wo lecures we prove he firs half of he PCP Theorem, he Amplificaion Lemma, up

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

Lecture 4 Notes (Little s Theorem)

Lecture 4 Notes (Little s Theorem) Lecure 4 Noes (Lile s Theorem) This lecure concerns one of he mos imporan (and simples) heorems in Queuing Theory, Lile s Theorem. More informaion can be found in he course book, Bersekas & Gallagher,

More information

Comparing Means: t-tests for One Sample & Two Related Samples

Comparing Means: t-tests for One Sample & Two Related Samples Comparing Means: -Tess for One Sample & Two Relaed Samples Using he z-tes: Assumpions -Tess for One Sample & Two Relaed Samples The z-es (of a sample mean agains a populaion mean) is based on he assumpion

More information

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation:

Hamilton- J acobi Equation: Weak S olution We continue the study of the Hamilton-Jacobi equation: M ah 5 7 Fall 9 L ecure O c. 4, 9 ) Hamilon- J acobi Equaion: Weak S oluion We coninue he sudy of he Hamilon-Jacobi equaion: We have shown ha u + H D u) = R n, ) ; u = g R n { = }. ). In general we canno

More information

Echocardiography Project and Finite Fourier Series

Echocardiography Project and Finite Fourier Series Echocardiography Projec and Finie Fourier Series 1 U M An echocardiagram is a plo of how a porion of he hear moves as he funcion of ime over he one or more hearbea cycles If he hearbea repeas iself every

More information

Inventory Control of Perishable Items in a Two-Echelon Supply Chain

Inventory Control of Perishable Items in a Two-Echelon Supply Chain Journal of Indusrial Engineering, Universiy of ehran, Special Issue,, PP. 69-77 69 Invenory Conrol of Perishable Iems in a wo-echelon Supply Chain Fariborz Jolai *, Elmira Gheisariha and Farnaz Nojavan

More information

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011

Maintenance Models. Prof. Robert C. Leachman IEOR 130, Methods of Manufacturing Improvement Spring, 2011 Mainenance Models Prof Rober C Leachman IEOR 3, Mehods of Manufacuring Improvemen Spring, Inroducion The mainenance of complex equipmen ofen accouns for a large porion of he coss associaed wih ha equipmen

More information

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 A Primal-Dual Type Algorihm wih he O(/) Convergence Rae for Large Scale Consrained Convex Programs Hao Yu and Michael J. Neely Absrac This paper considers

More information

Article from. Predictive Analytics and Futurism. July 2016 Issue 13

Article from. Predictive Analytics and Futurism. July 2016 Issue 13 Aricle from Predicive Analyics and Fuurism July 6 Issue An Inroducion o Incremenal Learning By Qiang Wu and Dave Snell Machine learning provides useful ools for predicive analyics The ypical machine learning

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010

Simulation-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Simulaion-Solving Dynamic Models ABE 5646 Week 2, Spring 2010 Week Descripion Reading Maerial 2 Compuer Simulaion of Dynamic Models Finie Difference, coninuous saes, discree ime Simple Mehods Euler Trapezoid

More information

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3

Macroeconomic Theory Ph.D. Qualifying Examination Fall 2005 ANSWER EACH PART IN A SEPARATE BLUE BOOK. PART ONE: ANSWER IN BOOK 1 WEIGHT 1/3 Macroeconomic Theory Ph.D. Qualifying Examinaion Fall 2005 Comprehensive Examinaion UCLA Dep. of Economics You have 4 hours o complee he exam. There are hree pars o he exam. Answer all pars. Each par has

More information

A Dynamic Model of Economic Fluctuations

A Dynamic Model of Economic Fluctuations CHAPTER 15 A Dynamic Model of Economic Flucuaions Modified for ECON 2204 by Bob Murphy 2016 Worh Publishers, all righs reserved IN THIS CHAPTER, OU WILL LEARN: how o incorporae dynamics ino he AD-AS model

More information

Removing Useless Productions of a Context Free Grammar through Petri Net

Removing Useless Productions of a Context Free Grammar through Petri Net Journal of Compuer Science 3 (7): 494-498, 2007 ISSN 1549-3636 2007 Science Publicaions Removing Useless Producions of a Conex Free Grammar hrough Peri Ne Mansoor Al-A'ali and Ali A Khan Deparmen of Compuer

More information

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS

CENTRALIZED VERSUS DECENTRALIZED PRODUCTION PLANNING IN SUPPLY CHAINS CENRALIZED VERSUS DECENRALIZED PRODUCION PLANNING IN SUPPLY CHAINS Georges SAHARIDIS* a, Yves DALLERY* a, Fikri KARAESMEN* b * a Ecole Cenrale Paris Deparmen of Indusial Engineering (LGI), +3343388, saharidis,dallery@lgi.ecp.fr

More information

This document was generated at 1:04 PM, 09/10/13 Copyright 2013 Richard T. Woodward. 4. End points and transversality conditions AGEC

This document was generated at 1:04 PM, 09/10/13 Copyright 2013 Richard T. Woodward. 4. End points and transversality conditions AGEC his documen was generaed a 1:4 PM, 9/1/13 Copyrigh 213 Richard. Woodward 4. End poins and ransversaliy condiions AGEC 637-213 F z d Recall from Lecure 3 ha a ypical opimal conrol problem is o maimize (,,

More information

CHAPTER 2 Signals And Spectra

CHAPTER 2 Signals And Spectra CHAPER Signals And Specra Properies of Signals and Noise In communicaion sysems he received waveform is usually caegorized ino he desired par conaining he informaion, and he undesired par. he desired par

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds

Kriging Models Predicting Atrazine Concentrations in Surface Water Draining Agricultural Watersheds 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Kriging Models Predicing Arazine Concenraions in Surface Waer Draining Agriculural Waersheds Paul L. Mosquin, Jeremy Aldworh, Wenlin Chen Supplemenal Maerial Number

More information

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II

Zürich. ETH Master Course: L Autonomous Mobile Robots Localization II Roland Siegwar Margaria Chli Paul Furgale Marco Huer Marin Rufli Davide Scaramuzza ETH Maser Course: 151-0854-00L Auonomous Mobile Robos Localizaion II ACT and SEE For all do, (predicion updae / ACT),

More information

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem.

E β t log (C t ) + M t M t 1. = Y t + B t 1 P t. B t 0 (3) v t = P tc t M t Question 1. Find the FOC s for an optimum in the agent s problem. Noes, M. Krause.. Problem Se 9: Exercise on FTPL Same model as in paper and lecure, only ha one-period govenmen bonds are replaced by consols, which are bonds ha pay one dollar forever. I has curren marke

More information

The Arcsine Distribution

The Arcsine Distribution The Arcsine Disribuion Chris H. Rycrof Ocober 6, 006 A common heme of he class has been ha he saisics of single walker are ofen very differen from hose of an ensemble of walkers. On he firs homework, we

More information

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter

State-Space Models. Initialization, Estimation and Smoothing of the Kalman Filter Sae-Space Models Iniializaion, Esimaion and Smoohing of he Kalman Filer Iniializaion of he Kalman Filer The Kalman filer shows how o updae pas predicors and he corresponding predicion error variances when

More information

Expert Advice for Amateurs

Expert Advice for Amateurs Exper Advice for Amaeurs Ernes K. Lai Online Appendix - Exisence of Equilibria The analysis in his secion is performed under more general payoff funcions. Wihou aking an explici form, he payoffs of he

More information

Seminar 4: Hotelling 2

Seminar 4: Hotelling 2 Seminar 4: Hoelling 2 November 3, 211 1 Exercise Par 1 Iso-elasic demand A non renewable resource of a known sock S can be exraced a zero cos. Demand for he resource is of he form: D(p ) = p ε ε > A a

More information

On Multicomponent System Reliability with Microshocks - Microdamages Type of Components Interaction

On Multicomponent System Reliability with Microshocks - Microdamages Type of Components Interaction On Mulicomponen Sysem Reliabiliy wih Microshocks - Microdamages Type of Componens Ineracion Jerzy K. Filus, and Lidia Z. Filus Absrac Consider a wo componen parallel sysem. The defined new sochasic dependences

More information

4 Sequences of measurable functions

4 Sequences of measurable functions 4 Sequences of measurable funcions 1. Le (Ω, A, µ) be a measure space (complee, afer a possible applicaion of he compleion heorem). In his chaper we invesigae relaions beween various (nonequivalen) convergences

More information

In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should

In this chapter the model of free motion under gravity is extended to objects projected at an angle. When you have completed it, you should Cambridge Universiy Press 978--36-60033-7 Cambridge Inernaional AS and A Level Mahemaics: Mechanics Coursebook Excerp More Informaion Chaper The moion of projeciles In his chaper he model of free moion

More information

Energy Storage and Renewables in New Jersey: Complementary Technologies for Reducing Our Carbon Footprint

Energy Storage and Renewables in New Jersey: Complementary Technologies for Reducing Our Carbon Footprint Energy Sorage and Renewables in New Jersey: Complemenary Technologies for Reducing Our Carbon Fooprin ACEE E-filliaes workshop November 14, 2014 Warren B. Powell Daniel Seingar Harvey Cheng Greg Davies

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Christos Papadimitriou & Luca Trevisan November 22, 2016

Christos Papadimitriou & Luca Trevisan November 22, 2016 U.C. Bereley CS170: Algorihms Handou LN-11-22 Chrisos Papadimiriou & Luca Trevisan November 22, 2016 Sreaming algorihms In his lecure and he nex one we sudy memory-efficien algorihms ha process a sream

More information

Bias-Variance Error Bounds for Temporal Difference Updates

Bias-Variance Error Bounds for Temporal Difference Updates Bias-Variance Bounds for Temporal Difference Updaes Michael Kearns AT&T Labs mkearns@research.a.com Sainder Singh AT&T Labs baveja@research.a.com Absrac We give he firs rigorous upper bounds on he error

More information

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004

ODEs II, Lecture 1: Homogeneous Linear Systems - I. Mike Raugh 1. March 8, 2004 ODEs II, Lecure : Homogeneous Linear Sysems - I Mike Raugh March 8, 4 Inroducion. In he firs lecure we discussed a sysem of linear ODEs for modeling he excreion of lead from he human body, saw how o ransform

More information

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS *

CHERNOFF DISTANCE AND AFFINITY FOR TRUNCATED DISTRIBUTIONS * haper 5 HERNOFF DISTANE AND AFFINITY FOR TRUNATED DISTRIBUTIONS * 5. Inroducion In he case of disribuions ha saisfy he regulariy condiions, he ramer- Rao inequaliy holds and he maximum likelihood esimaor

More information

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD

PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD PENALIZED LEAST SQUARES AND PENALIZED LIKELIHOOD HAN XIAO 1. Penalized Leas Squares Lasso solves he following opimizaion problem, ˆβ lasso = arg max β R p+1 1 N y i β 0 N x ij β j β j (1.1) for some 0.

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

Appendix 14.1 The optimal control problem and its solution using

Appendix 14.1 The optimal control problem and its solution using 1 Appendix 14.1 he opimal conrol problem and is soluion using he maximum principle NOE: Many occurrences of f, x, u, and in his file (in equaions or as whole words in ex) are purposefully in bold in order

More information