On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

MATHEMATICS OF OPERATIONS RESEARCH, Vol. 38, No. 2, May 2013, INFORMS

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Huizhen Yu, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Dimitri P. Bertsekas, Laboratory for Information and Decision Systems and Department of EECS, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16] and establishing completely the convergence of Q-learning for these SSP models.

Key words: Markov decision processes; Q-learning; stochastic approximation; dynamic programming; reinforcement learning
MSC2000 subject classification: Primary: 90C40, 93E20, 90C39; secondary: 68W15, 62L20
OR/MS subject classification: Primary: dynamic programming/optimal control, analysis of algorithms; secondary: Markov, finite state
History: Received June 6, 2011; revised April 18; published online in Articles in Advance November 28, 2012.

1. Introduction. Stochastic shortest path (SSP) problems are Markov decision processes (MDP) in which there exists an absorbing and cost-free state, and the goal is to reach that state with minimal expected cost. In this paper we focus on finite state and control models under the undiscounted total cost criterion. We call a policy proper if under that policy the goal state is reached with probability 1 (w.p.1) for every initial state, and improper otherwise. Let SD denote the set of stationary and deterministic policies. We consider a broad class of SSP models, which satisfy the following general assumption introduced in Bertsekas and Tsitsiklis [2]:

Assumption 1.1. (i) There is at least one proper policy in SD, and (ii) any improper policy in SD incurs infinite cost for at least one initial state.

We will analyze a totally asynchronous stochastic approximation algorithm, the Q-learning algorithm (Watkins [9], Tsitsiklis [8]), for solving SSP problems. This algorithm generates a sequence of so-called Q-factors, which represent expected costs associated with initial state-control pairs, and it aims to obtain in the limit the optimal Q-factors of the problem, from which the optimal costs and optimal policies can be determined. Under Assumption 1.1, Tsitsiklis [8, Theorems 2 and 4(c)] proved that if the sequence $\{Q_t\}$ of Q-learning iterates is bounded w.p.1, then $Q_t$ converges to the optimal Q-factors $Q^*$ w.p.1. Regarding the boundedness condition, earlier results given in Tsitsiklis [8, Lemma 9] and the book by Bertsekas and Tsitsiklis [3, §5.6] show that it is satisfied in the special case where both the one-stage costs and the initial values $Q_0$ are nonnegative. Alternative to Tsitsiklis [8], there is also a line of convergence analysis of Q-learning given in Abounadi et al. [1], which does not require the boundedness condition. However, it requires a more restrictive asynchronous computation framework than the totally asynchronous framework treated in Tsitsiklis [8]; in particular, it requires some additional conditions on the timing and frequency of component updates in Q-learning.
In this paper we prove that $\{Q_t\}$ is naturally bounded w.p.1 for SSP models satisfying Assumption 1.1. Our result thus furnishes the boundedness condition in the convergence proof by Tsitsiklis [8] and, together with the latter, establishes completely the convergence of Q-learning for these SSP models. This boundedness result is useful as well in other contexts concerning SSP problems. In particular, it is used in the convergence analysis of a new Q-learning algorithm for SSP, proposed recently by the authors in Yu and Bertsekas [12], where the boundedness of the iterates of the new algorithm was related to that of the classical Q-learning algorithm considered here. The line of analysis developed in this paper has also been applied by Yu in [11] to show the boundedness and convergence of Q-learning for stochastic games of the SSP type.

We organize the paper and the results as follows. In §2 we introduce notation and preliminaries. In §3 we give the boundedness proof. First we show in §3.1 that $\{Q_t\}$ is bounded above w.p.1. We then give in §3.2 a short proof that $\{Q_t\}$ is bounded below w.p.1 for a special case with nonnegative expected one-stage costs.

In §3.3 we prove that $\{Q_t\}$ is bounded below w.p.1 for the general case; the proof is long, so we divide it into several steps given in separate subsections. In §4 we illustrate some of these proof steps using a simple example.

2. Preliminaries.

2.1. Notation and definitions. Let $S_o = \{0, 1, \ldots, n\}$ denote the state space, where state 0 is the absorbing and cost-free goal state. Let $S = S_o \setminus \{0\}$. For each state $i \in S$, let $U(i)$ denote the finite set of feasible controls, and for notational convenience, let $U(0) = \{0\}$. We denote by $U$ the control space, $U = \cup_{i \in S_o} U(i)$. We define $R_o$ to be the set of state and feasible control pairs, i.e., $R_o = \{(i,u) \mid i \in S_o,\ u \in U(i)\}$, and we define $R = R_o \setminus \{(0,0)\}$.

The state transitions and associated one-stage costs are defined as follows. From state $i$ with control $u \in U(i)$, a transition to state $j$ occurs with probability $p_{ij}(u)$ and incurs a one-stage cost $\hat g(i,u,j)$, or more generally, a random one-stage cost $\hat g(i,u,j,\omega)$ where $\omega$ is a random disturbance. In the latter case random one-stage costs are all assumed to have finite variance. Let the expected one-stage cost of applying control $u$ at state $i$ be $g(i,u)$. For state 0, $p_{00}(0) = 1$ and the self transition incurs cost 0.

We denote a general history-dependent, randomized policy by $\pi$. A randomized Markov policy is a policy of the form $\pi = (\nu_0, \nu_1, \ldots)$, where each function $\nu_t$, $t \ge 0$, maps each state $i \in S_o$ to a probability distribution $\nu_t(\cdot \mid i)$ over the set of feasible controls $U(i)$. A randomized Markov policy of the form $\pi = (\nu, \nu, \ldots)$ is said to be a stationary randomized policy and is also denoted by $\nu$. A stationary deterministic policy is a stationary randomized policy that for each state $i$ assigns probability 1 to a single control $\mu(i) \in U(i)$; the policy is also denoted by $\mu$.

The problem is to solve the total cost MDP on $S_o$, where we define the total cost of a policy $\pi$ for initial state $i \in S_o$ to be $J^\pi(i) = \liminf_{k \to \infty} J_k^\pi(i)$, with $J_k^\pi(i)$ being the expected $k$-stage cost of $\pi$ starting from state $i$. Assumption 1.1 is stated for this total cost definition. The optimal cost for initial state $i$ is $J^*(i) = \inf_\pi J^\pi(i)$. Under Assumption 1.1, it is established in Bertsekas and Tsitsiklis [2] that the Bellman equation (or the total cost optimality equation)

$$ J(i) = (TJ)(i) \overset{\text{def}}{=} \min_{u \in U(i)} \Big\{ g(i,u) + \sum_{j \in S} p_{ij}(u)\, J(j) \Big\}, \qquad i \in S, \tag{2.1} $$

has a unique solution, which is the optimal cost function $J^*$, and there exists an optimal policy in SD, which is proper, of course.

The Q-learning algorithm operates on the so-called Q-factors, $Q = \{Q(i,u) \mid (i,u) \in R_o\} \in \Re^{|R_o|}$. They represent costs associated with initial state-control pairs. For each state-control pair $(i,u) \in R_o$, the optimal Q-factor $Q^*(i,u)$ is the cost of starting from state $i$, applying control $u$, and afterwards following an optimal policy. (Here $Q^*(0,0) = 0$, of course.) Then, by the results of Bertsekas and Tsitsiklis [2] mentioned above, under Assumption 1.1, the optimal Q-factors and optimal costs are related by

$$ Q^*(i,u) = g(i,u) + \sum_{j \in S} p_{ij}(u)\, J^*(j), \qquad J^*(i) = \min_{u \in U(i)} Q^*(i,u), \qquad (i,u) \in R, $$

and $Q^*$ restricted to $R$ is the unique solution of the Bellman equation for Q-factors:

$$ Q(i,u) = (FQ)(i,u) \overset{\text{def}}{=} g(i,u) + \sum_{j \in S} p_{ij}(u) \min_{v \in U(j)} Q(j,v), \qquad (i,u) \in R. \tag{2.2} $$

Under Assumption 1.1, the Bellman operators $T$ and $F$ given in Equations (2.1), (2.2) are not necessarily contraction mappings with respect to the sup-norm, but are only nonexpansive. They would be contractions with respect to a weighted sup-norm if all policies were proper (see Bertsekas and Tsitsiklis [3, Proposition 2.2]), and the convergence of Q-learning in that case was established by Tsitsiklis [8, Theorems 3 and 4(b)].
Another basic fact is that for a proper policy $\mu \in$ SD, the associated Bellman operator $F_\mu$ given by

$$ (F_\mu Q)(i,u) = g(i,u) + \sum_{j \in S} p_{ij}(u)\, Q\big(j, \mu(j)\big), \qquad (i,u) \in R, \tag{2.3} $$

is a weighted sup-norm contraction, with the norm and the modulus of contraction depending on $\mu$. This fact also follows from Bertsekas and Tsitsiklis [3, Proposition 2.2].
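To make the operator $F$ of Equation (2.2) concrete, the following minimal Python sketch applies it to a toy SSP by fixed-point iteration. Everything here is illustrative and not from the paper: the toy model data (n_states, controls, P, g) and the function name bellman_F are assumptions, and in this toy model all policies happen to be proper, so $F$ is in fact a weighted sup-norm contraction and the iteration converges to $Q^*$.

import numpy as np

# Toy SSP: state 0 is the absorbing, cost-free goal state.
n_states = 3                                  # states {0, 1, 2}
controls = {0: [0], 1: [0, 1], 2: [0]}        # feasible controls U(i)

# Transition probabilities P[(i, u)][j] = p_ij(u) and expected costs g(i, u).
P = {
    (0, 0): np.array([1.0, 0.0, 0.0]),        # state 0 is absorbing
    (1, 0): np.array([0.7, 0.0, 0.3]),
    (1, 1): np.array([0.2, 0.8, 0.0]),
    (2, 0): np.array([0.5, 0.5, 0.0]),
}
g = {(0, 0): 0.0, (1, 0): 1.0, (1, 1): 0.5, (2, 0): 2.0}

def bellman_F(Q):
    # Apply (FQ)(i, u) of Equation (2.2) componentwise; Q maps (i, u) -> value.
    FQ = {}
    for (i, u), p_iu in P.items():
        if i == 0:
            FQ[(i, u)] = 0.0                  # Q(0, 0) is kept at zero
            continue
        cost_to_go = sum(p_iu[j] * min(Q[(j, v)] for v in controls[j])
                         for j in range(n_states))
        FQ[(i, u)] = g[(i, u)] + cost_to_go
    return FQ

# Fixed-point iteration Q <- F(Q); for this toy model it converges to Q*.
Q = {key: 0.0 for key in P}
for _ in range(200):
    Q = bellman_F(Q)
print(Q)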

2.2. Q-learning algorithm. The Q-learning algorithm is an asynchronous stochastic iterative algorithm for finding $Q^*$. Given an initial $Q_0 \in \Re^{|R_o|}$ with $Q_0(0,0) = 0$, the algorithm generates a sequence $\{Q_t\}$ by updating a subset of Q-factors at each time and keeping the rest unchanged. In particular, $Q_t(0,0) = 0$ for all $t$. For each $(i,u) \in R$ and $t \ge 0$, let $j_t^{iu} \in S_o$ be the successor state of a random transition from state $i$ after applying control $u$, generated at time $t$ according to the transition probability $p_{ij}(u)$. Then, with $s = j_t^{iu}$ as a shorthand to simplify notation, the iterate $Q_{t+1}(i,u)$ is given by

$$ Q_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, Q_t(i,u) + \alpha_t(i,u) \Big( g(i,u) + \omega_t(i,u) + \min_{v \in U(s)} Q_{\tau^{iu}_{sv}(t)}(s, v) \Big). \tag{2.4} $$

The variables in the above iteration need to satisfy certain conditions, which will be specified shortly. First we describe what these variables are.

(i) $\alpha_t(i,u) \ge 0$ is a stepsize parameter, and $\alpha_t(i,u) = 0$ if the $(i,u)$th component is not selected to be updated at time $t$.

(ii) $g(i,u) + \omega_t(i,u)$ is the random one-stage cost of the transition from state $i$ to $j_t^{iu}$ with control $u$; i.e., $\omega_t(i,u)$ is the difference between the transition cost and its expected value.

(iii) $\tau^{iu}_{jv}(t)$, $(j,v) \in R_o$, are nonnegative integers with $\tau^{iu}_{jv}(t) \le t$. We will refer to them as the delayed times. In a distributed asynchronous computation model, if we associate a processor with each component $(i,u)$, whose task is to update the Q-factor for $(i,u)$, then $t - \tau^{iu}_{jv}(t)$ can be viewed as the communication delay between the processors at $(i,u)$ and $(j,v)$ at time $t$.

We now describe the conditions on the variables. We regard all the variables in the Q-learning algorithm as random variables on a common probability space $(\Omega, \mathcal{F}, P)$. This means that the stepsizes and delayed times can be chosen based on the history of the algorithm. To determine the values of these variables, including which components to update at time $t$, the algorithm may use auxiliary variables that do not appear in Equation (2.4). Thus, to describe rigorously the dependence relation between the variables, it is convenient to introduce a family $\{\mathcal{F}_t\}$ of increasing sub-$\sigma$-fields of $\mathcal{F}$. Then the following information structure condition is required:

$Q_0$ is $\mathcal{F}_0$-measurable, and for every $(i,u)$ and $(j,v) \in R$ and $t \ge 0$, $\alpha_t(i,u)$ and $\tau^{iu}_{jv}(t)$ are $\mathcal{F}_t$-measurable, and $\omega_t(i,u)$ and $j_t^{iu}$ are $\mathcal{F}_{t+1}$-measurable.

The condition means that in iteration (2.4), the algorithm either chooses the stepsize $\alpha_t(i,u)$ and the delayed times $\tau^{iu}_{jv}(t)$, $(j,v) \in R$, before generating $j_t^{iu}$, or it chooses the values of the former variables in a way that does not use the information of $j_t^{iu}$. We note that although this condition seems abstract, it is naturally satisfied by the algorithm in practice.

In probabilistic terms and with the notation just introduced, the successor states and random transition costs appearing in the algorithm need to satisfy the following relations: for all $(i,u) \in R$ and $t \ge 0$,

$$ P\big( j_t^{iu} = j \mid \mathcal{F}_t \big) = p_{ij}(u), \qquad j \in S_o, \tag{2.5} $$

$$ E\big[ \omega_t(i,u) \mid \mathcal{F}_t \big] = 0, \qquad E\big[ \omega_t(i,u)^2 \mid \mathcal{F}_t \big] \le C, \tag{2.6} $$

where $C$ is some deterministic constant. There are two more conditions on the algorithm. In the totally asynchronous computation framework, we have the following minimal requirement on the delayed times used in each component update: w.p.1,

$$ \lim_{t \to \infty} \tau^{iu}_{jv}(t) = \infty, \qquad (i,u), (j,v) \in R. \tag{2.7} $$

We require the stepsizes to satisfy a standard condition for stochastic approximation algorithms: w.p.1,

$$ \sum_{t \ge 0} \alpha_t(i,u) = \infty, \qquad \sum_{t \ge 0} \alpha_t(i,u)^2 < \infty, \qquad (i,u) \in R. \tag{2.8} $$

We collect the algorithmic conditions mentioned above in one assumption below. We note that these conditions are natural and fairly mild for the Q-learning algorithm.

Assumption 2.1 (Algorithmic Conditions). The information structure condition holds, and w.p.1, Equations (2.5)-(2.8) are satisfied.

For boundedness of the Q-learning iterates, the condition (2.7) is in fact not needed (which is not surprising intuitively, since bounded delayed times cannot contribute to instability of the iterates).
We therefore also state a weaker version of Assumption 2.1, excluding condition (2.7), and we will use it later in the boundedness results for the algorithm.

Assumption 2.2. The information structure condition holds, and w.p.1, Equations (2.5), (2.6), (2.8) are satisfied.
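The following minimal Python sketch simulates iteration (2.4) on a toy SSP, under simplifying assumptions not made in the paper: every component is updated at every step, all delayed times equal $t$ (no communication delays), and the transition-cost noise is Gaussian. The model data and names used here are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

n_states = 3                                  # states {0, 1, 2}; 0 is the goal state
controls = {0: [0], 1: [0, 1], 2: [0]}
P = {(0, 0): [1.0, 0.0, 0.0],
     (1, 0): [0.7, 0.0, 0.3],
     (1, 1): [0.2, 0.8, 0.0],
     (2, 0): [0.5, 0.5, 0.0]}
g_expected = {(0, 0): 0.0, (1, 0): 1.0, (1, 1): 0.5, (2, 0): 2.0}

Q = {key: 0.0 for key in P}                   # initial Q_0, with Q_0(0,0) = 0
for t in range(1, 20001):
    alpha = 10.0 / (10.0 + t)                 # stepsizes satisfying condition (2.8)
    for (i, u) in P:
        if i == 0:
            continue                          # Q_t(0,0) stays 0
        s = int(rng.choice(n_states, p=P[(i, u)]))   # successor j_t^{iu}, cf. (2.5)
        noise = rng.normal(0.0, 0.1)                 # zero-mean cost noise omega_t, cf. (2.6)
        target = g_expected[(i, u)] + noise + min(Q[(s, v)] for v in controls[s])
        Q[(i, u)] = (1 - alpha) * Q[(i, u)] + alpha * target   # update (2.4)
print(Q)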

2.3. Convergence of Q-learning: Earlier results. The following convergence and boundedness results for Q-learning in SSP problems are established essentially in Tsitsiklis [8]; see also Bertsekas and Tsitsiklis [3, §§4.3 and 5.6].

Theorem 2.1 (Tsitsiklis [8]). Let $\{Q_t\}$ be the sequence generated by the iteration (2.4) with any given initial $Q_0$. Then, under Assumption 2.1, $\{Q_t\}$ converges to $Q^*$ w.p.1 if either of the following holds: (i) all policies of the SSP are proper; (ii) the SSP satisfies Assumption 1.1 and in addition, $\{Q_t\}$ is bounded w.p.1. In case (i), we also have that $\{Q_t\}$ is bounded w.p.1 under Assumption 2.2 (instead of Assumption 2.1).

Note that for a proper policy $\mu \in$ SD, by considering the SSP problem that has $\mu$ as its only policy, the conclusions of Theorem 2.1 in case (i) also apply to the evaluation of policy $\mu$ with Q-learning. In this context, $Q^*$ in the conclusions corresponds to the Q-factor vector $Q_\mu$, which is the unique fixed point of the weighted sup-norm contraction mapping $F_\mu$ (see Equation (2.3)).

The contribution of this paper is to remove the boundedness requirement on $\{Q_t\}$ in case (ii). Our proof arguments will be largely different from those used to establish the preceding theorem. For completeness, however, in the rest of this section, we explain briefly the basis of the analysis that gives Theorem 2.1, and the conditions involved.

In the analytical framework of Tsitsiklis [8], we view iteration (2.4) as a stochastic approximation algorithm and rewrite it equivalently as

$$ Q_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, Q_t(i,u) + \alpha_t(i,u) \Big( \big(F Q_t^{iu}\big)(i,u) + \tilde\omega_t(i,u) \Big), \tag{2.9} $$

where $F$ is the Bellman operator given by Equation (2.2); $Q_t^{iu}$ denotes the vector of Q-factors with components $Q_{\tau^{iu}_{jv}(t)}(j,v)$, $(j,v) \in R_o$ (which involve the delayed times); and $\tilde\omega_t(i,u)$ is a noise term given by

$$ \tilde\omega_t(i,u) = g(i,u) + \omega_t(i,u) + \min_{v \in U(s)} Q_{\tau^{iu}_{sv}(t)}(s,v) - \big(F Q_t^{iu}\big)(i,u) \qquad (\text{with } s = j_t^{iu}). $$

The noise terms $\tilde\omega_t(i,u)$, $(i,u) \in R$, are $\mathcal{F}_{t+1}$-measurable. Conditional on $\mathcal{F}_t$, they can be shown to have zero mean and meet a requirement on the growth of the conditional variance, when the Q-learning algorithm satisfies certain conditions (the same as those in Assumption 2.1 except for a slightly stronger stepsize condition, which will be explained shortly). We then analyze iteration (2.9) as a special case of an asynchronous stochastic approximation algorithm where $F$ is either a contraction or a monotone nonexpansive mapping (with respect to the sup-norm) and $Q^*$ is the unique fixed point of $F$. These two cases of $F$ correspond to the two different SSP model assumptions in Theorem 2.1: when all policies of the SSP are proper, $F$ is a weighted sup-norm contraction, whereas when Assumption 1.1 holds, $F$ is monotone and nonexpansive (see §2.1). The conclusions of Theorem 2.1 for case (i) follow essentially from Tsitsiklis [8, Theorems 1 and 3] for contraction mappings, whereas Theorem 2.1 in case (ii) follows essentially from Tsitsiklis [8, Theorem 2] for monotone nonexpansive mappings.

A specific technical detail relating to the stepsize condition is worth mentioning. To apply the results of Tsitsiklis [8] here, we first consider, without loss of generality, the case where all stepsizes are bounded by some deterministic constant. Theorem 2.1 under this additional condition then follows directly from Tsitsiklis [8]; see also Bertsekas and Tsitsiklis [3, §4.3].¹ (We mention that the technical use of this additional stepsize condition is only to ensure that the noise terms $\tilde\omega_t(i,u)$, $(i,u) \in R$, have well-defined conditional expectations.) We then remove the additional stepsize condition and obtain Theorem 2.1 as the immediate consequence, by using a standard, simple truncation technique as follows.
For each positive integer $m$, define truncated stepsizes

$$ \hat\alpha^m_t(i,u) = \min\big\{ m,\ \alpha_t(i,u) \big\}, \qquad (i,u) \in R, $$

which are by definition bounded by $m$, and consider the sequence $\{\hat Q^m_t\}$ generated by iteration (2.4) with $\hat Q^m_0 = Q_0$ and with $\hat\alpha^m_t(i,u)$ in place of $\alpha_t(i,u)$. This sequence has the following properties.

¹ The stepsize condition appearing in Tsitsiklis [8] is slightly different than condition (2.8); it is $\sum_{t \ge 0} \alpha_t(i,u)^2 \le C$ w.p.1, for some (deterministic) constant $C$, instead of $C$ being $\infty$, and in addition, it is required that $\alpha_t(i,u) \in [0,1]$. However, by strengthening one technical lemma (Lemma 1) in Tsitsiklis [8] so that its conclusions hold under the weaker condition (2.8), the proof of Tsitsiklis [8] is essentially intact under the latter condition. The details of the analysis can be found in Bertsekas and Tsitsiklis [3, Proposition 4.1 and Example 4.3] (see also Corollary 4.1 therein). A reproduction of the proofs in Tsitsiklis [8], Bertsekas and Tsitsiklis [3] with slight modifications is also available in Yu [10].

If the original sequence $\{Q_t\}$ satisfies Assumption 2.1 or 2.2, then so does $\{\hat Q^m_t\}$. Moreover, since the original stepsizes $\alpha_t(i,u)$, $t \ge 0$, $(i,u) \in R$, are bounded w.p.1, we have that for each sample path from a set of probability one, $\{Q_t\}$ coincides with $\{\hat Q^m_t\}$ for some sufficiently large integer $m$. The latter means that if for each $m$, $\hat Q^m_t$ converges to $Q^*$ (or $\{\hat Q^m_t\}$ is bounded) w.p.1, then the same holds for $\{Q_t\}$. Hence the conclusions of Theorem 2.1 for case (i) are direct consequences of applying the weaker version of the theorem mentioned earlier to the sequences $\{\hat Q^m_t\}$ for each $m$. Case (ii) of Theorem 2.1 follows from exactly the same argument, in view of the fact that under Assumption 2.1, if $\{Q_t\}$ is bounded w.p.1, then $\{\hat Q^m_t\}$ is also bounded w.p.1 for each $m$. [To see this, observe that by condition (2.8), the stepsizes in $\{Q_t\}$ and $\{\hat Q^m_t\}$ coincide for $t$ sufficiently large; more precisely, w.p.1, there exists some finite (path-dependent) time $\bar t$ such that for all $t \ge \bar t$ and $(i,u) \in R$, $\hat\alpha^m_t(i,u) = \alpha_t(i,u) \in [0,1]$. It then follows by the definition of $\{\hat Q^m_t\}$ that $\|Q_t - \hat Q^m_t\|_\infty \le \max_{t' \le \bar t} \|Q_{t'} - \hat Q^m_{t'}\|_\infty$ for all $t$.] So, technically speaking, Theorem 2.1 with the general stepsizes is a corollary of its weaker version mentioned earlier.

3. Main results. We will prove in this section the following theorem. It furnishes the boundedness condition required in Tsitsiklis [8, Theorem 2] (see Theorem 2.1(ii)), and together with the latter, establishes completely the convergence of $Q_t$ to $Q^*$ w.p.1.

Theorem 3.1. Under Assumptions 1.1 and 2.2, for any given initial $Q_0$, the sequence $\{Q_t\}$ generated by the Q-learning iteration (2.4) is bounded w.p.1.

Our proof consists of several steps which will be given in separate subsections. First we show that $\{Q_t\}$ is bounded above w.p.1. This proof is short and uses the contraction property of the Bellman operator $F_\mu$ associated with a proper policy $\mu$ in SD. A similar idea has been used in earlier works of Tsitsiklis [8, Lemma 9] and Bertsekas and Tsitsiklis [3, Proposition 5.6, p. 249] to prove the boundedness of iterates for certain nonnegative SSP models.

In the proofs of this section, for brevity, we will partially suppress the word "w.p.1" when the algorithmic conditions are concerned. Whenever a subset of sample paths with a certain property is considered, it will be implicitly assumed to be the intersection of the set of paths with that property and the set of paths that satisfy the assumption on the algorithm currently in effect (e.g., Assumption 2.1 or 2.2). In the proofs, the notation "a.s." stands for almost sure convergence.

3.1. Boundedness from above.

Proposition 3.1. Under Assumptions 1.1(i) and 2.2, for any given initial $Q_0$, the sequence $\{Q_t\}$ generated by the Q-learning iteration (2.4) is bounded above w.p.1.

Proof. Let $\mu$ be any proper policy in SD; such a policy exists by Assumption 1.1(i). First we define iterates (random variables) $\{\hat Q_t\}$ on the same probability space as the Q-learning iterates $\{Q_t\}$. Let $\hat Q_0 = Q_0$ and $\hat Q_t(0,0) = 0$ for $t \ge 0$. For each $(i,u) \in R$ and $t \ge 0$, let

$$ \hat Q_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, \hat Q_t(i,u) + \alpha_t(i,u) \Big( g(i,u) + \omega_t(i,u) + \hat Q_{\tau^{iu}_{sv}(t)}\big(j_t^{iu}, \mu(j_t^{iu})\big) \Big), $$

where in the sub/superscripts of $\tau^{iu}_{sv}(t)$, $s$ is a shorthand for $j_t^{iu}$ and $v$ is a shorthand for $\mu(j_t^{iu})$, introduced to avoid notational clutter; and $\alpha_t(i,u)$, $j_t^{iu}$ and $\omega_t(i,u)$, as well as the delayed times $\tau^{iu}_{jv}(t)$, $(j,v) \in R_o$, are the same random variables that appear in the Q-learning algorithm (2.4). The sequence $\{\hat Q_t\}$ is generated by the Q-learning algorithm (2.4) for the SSP problem that has the proper policy $\mu$ as its only policy, and involves the mapping $F_\mu$, which is a weighted sup-norm contraction (see §2.1 and the discussion following Theorem 2.1). The sequence $\{\hat Q_t\}$ also satisfies Assumption 2.2 (since $\{\hat Q_t\}$ and $\{Q_t\}$ involve the same stepsizes, transition costs and delayed times). Therefore, by Theorem 2.1(i), $\{\hat Q_t\}$ is bounded w.p.1.
Consider now any sample path from the set of probability one on which $\{\hat Q_t\}$ is bounded. In view of the stepsize condition (2.8), there exists a time $\bar t$ such that $\alpha_t(i,u) \le 1$ for all $t \ge \bar t$ and $(i,u) \in R$. Let

$$ \Delta = \max_{(i,u) \in R}\ \max_{t \le \bar t}\ \big( Q_t(i,u) - \hat Q_t(i,u) \big). $$

Then

$$ Q_t(i,u) \le \hat Q_t(i,u) + \Delta, \qquad (i,u) \in R, \quad t \le \bar t. $$

We show by induction that this relation also holds for all $t > \bar t$. To this end, suppose that for some $t \ge \bar t$, the relation holds for all $t' \le t$. Then, for each $(i,u) \in R$, we have that

$$ \begin{aligned} Q_{t+1}(i,u) &\le \big(1 - \alpha_t(i,u)\big)\, Q_t(i,u) + \alpha_t(i,u) \big( g(i,u) + \omega_t(i,u) + Q_{\tau^{iu}_{sv}(t)}(s,v) \big) \\ &\le \big(1 - \alpha_t(i,u)\big) \big( \hat Q_t(i,u) + \Delta \big) + \alpha_t(i,u) \big( g(i,u) + \omega_t(i,u) + \hat Q_{\tau^{iu}_{sv}(t)}(s,v) + \Delta \big) \\ &= \hat Q_{t+1}(i,u) + \Delta, \end{aligned} $$

where the first inequality follows from the definition of $Q_{t+1}$ and the fact $\alpha_t(i,u) \ge 0$, the second inequality follows from the induction hypothesis and the fact $\alpha_t(i,u) \in [0,1]$, and the last equality follows from the definition of $\hat Q_{t+1}$. This completes the induction and shows that $\{Q_t\}$ is bounded above w.p.1.

3.2. Boundedness from below for a special case. The proof that $\{Q_t\}$ is bounded below w.p.1 is long and consists of several steps to be given in the next subsection. For a special case with nonnegative expected one-stage costs, there is a short proof, which we give here. Together with Proposition 3.1, it provides a short proof of the boundedness and hence convergence of the Q-learning iterates for a class of nonnegative SSP models satisfying Assumption 1.1. Earlier works of Tsitsiklis [8, Lemma 9] and Bertsekas and Tsitsiklis [3, Proposition 5.6, p. 249] have also considered nonnegative SSP models and established convergence results for them, but under stronger assumptions than ours. (In particular, it is assumed there that all transitions incur costs $\hat g(i,u,j) \ge 0$, as well as other conditions, so that all iterates are nonnegative.) To keep the proof simple, we will use Assumption 2.1, although Assumption 2.2 would also suffice.

Proposition 3.2. Suppose that $g(i,u) \ge 0$ for all $(i,u) \in R$ and moreover, for those $(i,u)$ with $g(i,u) = 0$, every possible transition from state $i$ under control $u$ incurs cost 0. Then, under Assumption 2.1, for any given initial $Q_0$, the sequence $\{Q_t\}$ generated by the Q-learning iteration (2.4) is bounded below w.p.1.

Proof. We write $Q_t$ as the sum of two processes: for each $(i,u) \in R_o$,

$$ Q_t(i,u) = g_t(i,u) + Y_t(i,u), \qquad t \ge 0, \tag{3.1} $$

where $g_t(0,0) = g(0,0) = 0$ and $Y_t(0,0) = 0$ for all $t$, and for each $(i,u) \in R$,

$$ g_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, g_t(i,u) + \alpha_t(i,u) \big( g(i,u) + \omega_t(i,u) \big), $$

$$ Y_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, Y_t(i,u) + \alpha_t(i,u) \min_{v \in U(s)} Q_{\tau^{iu}_{sv}(t)}(s,v), $$

with $g_0 = 0$, $Y_0 = Q_0$, and $s$ being a shorthand for $j_t^{iu}$ (to avoid notational clutter). Using the conditions (2.6) and (2.8) of the Q-learning algorithm, it follows from the standard theory of stochastic approximation (see, e.g., Bertsekas and Tsitsiklis [3, Proposition 4.1 and Example 4.3] or Kushner and Yin [5], Borkar [4]) that $g_t(i,u) \xrightarrow{a.s.} g(i,u)$ for all $(i,u) \in R$.²

Consider any sample path from the set of probability one on which this convergence takes place. Then by Equation (3.1), on that sample path, $\{Q_t\}$ is bounded below if and only if $\{Y_t\}$ is bounded below. Now from the definition of $Y_t$ and Equation (3.1) we have

$$ Y_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, Y_t(i,u) + \alpha_t(i,u) \min_{v \in U(s)} \Big( g_{\tau^{iu}_{sv}(t)}(s,v) + Y_{\tau^{iu}_{sv}(t)}(s,v) \Big). \tag{3.2} $$

By condition (2.7) of the Q-learning algorithm, and in view also of our assumption on one-stage costs, the convergence $g_t(j,v) \xrightarrow{a.s.} g(j,v)$ for all $(j,v) \in R$ implies that on the sample path under our consideration, for all $t$ sufficiently large,

$$ g_{\tau^{iu}_{jv}(t)}(j,v) \ge 0, \qquad (j,v) \in R_o. $$

Therefore, using Equation (3.2) and the fact that eventually $\alpha_t(i,u) \in [0,1]$ [cf. Equation (2.8)], we have that for all $t$ sufficiently large and for all $(i,u) \in R$,

$$ Y_{t+1}(i,u) \ \ge\ \big(1 - \alpha_t(i,u)\big)\, Y_t(i,u) + \alpha_t(i,u) \min_{v \in U(s)} Y_{\tau^{iu}_{sv}(t)}(s,v) \ \ge\ \min_{t' \le t}\ \min_{(j,v) \in R_o} Y_{t'}(j,v), $$

² This convergence follows from a basic result of stochastic approximation theory (see the aforementioned references) if besides (2.6) and (2.8), it is assumed in addition that the stepsizes are bounded by some (deterministic) constant. The desired result then follows by removing the additional condition with the stepsize truncation proof technique described in §2.3.
More details can also be found in Yu [10]; Lemma 1 therein implies the convergence desired here.

which implies that for all $t$ sufficiently large,

$$ \min_{t' \le t+1}\ \min_{(j,v) \in R_o} Y_{t'}(j,v) \ \ge\ \min_{t' \le t}\ \min_{(j,v) \in R_o} Y_{t'}(j,v). $$

Hence $\{Y_t\}$ is bounded below on that sample path. The proof is complete.

3.3. Boundedness from below in general. In this section, we will prove the following result in several steps. Together with Proposition 3.1 it implies Theorem 3.1.

Proposition 3.3. Under Assumptions 1.1 and 2.2, the sequence $\{Q_t\}$ generated by the Q-learning iteration (2.4) is bounded below w.p.1.

The proof can be outlined roughly as follows. In §3.3.1 we will introduce an auxiliary sequence $\{\bar Q_t\}$ of a certain form such that $\{Q_t\}$ is bounded below w.p.1 if and only if $\{\bar Q_t\}$ is bounded below w.p.1. In §3.3.2 and §3.3.3 we will give, for any given $\delta > 0$, a specific construction of the sequence $\{\bar Q_t\}$ for each sample path from a set of probability 1, such that each $\bar Q_t(i,u)$ can be interpreted as the expected total cost of some randomized Markov policy for a time-inhomogeneous SSP problem that can be viewed as a $\delta$-perturbation of the original problem. Finally, to complete the proof, we will show in §3.3.4 that when $\delta$ is sufficiently small, the expected total costs achievable in any of these perturbed SSP problems can be bounded uniformly from below, so that the auxiliary sequence $\{\bar Q_t\}$ constructed for the corresponding $\delta$ must be bounded below w.p.1. This then implies that the Q-learning iterates $\{Q_t\}$ must be bounded below w.p.1.

In what follows, let $\bar\Omega$ denote the set of sample paths on which the algorithmic conditions in Assumption 2.2 hold. Note that $\bar\Omega$ has probability one under Assumption 2.2.

3.3.1. Auxiliary sequence $\{\bar Q_t\}$. The first step of our proof is a technically important observation. Let us write the Q-learning iterates given in Equation (2.4) equivalently, for all $(i,u) \in R$ and $t \ge 0$, as

$$ Q_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, Q_t(i,u) + \alpha_t(i,u) \Big( g(i,u) + \omega_t(i,u) + Q_{\tau^{iu}_{sv}(t)}\big(j_t^{iu}, v_t^{iu}\big) \Big), \tag{3.3} $$

where $v_t^{iu}$ is a control that satisfies

$$ v_t^{iu} \in \arg\min_{v \in U(s)} Q_{\tau^{iu}_{sv}(t)}\big( j_t^{iu}, v \big), \tag{3.4} $$

and $s$, $v$ in the sub/superscripts of $\tau^{iu}_{sv}(t)$ are shorthand notation: $s$ stands for the state $j_t^{iu}$, and $v$ now stands for the control $v_t^{iu}$. We observe the following. Suppose we define an auxiliary sequence $\{\bar Q_t\}$, where

$$ \bar Q_t(0,0) = 0, \qquad t \ge 0, \tag{3.5} $$

and for some nonnegative integer $t_0$ and for all $(i,u) \in R$,

$$ \bar Q_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, \bar Q_t(i,u) + \alpha_t(i,u) \Big( g(i,u) + \omega_t(i,u) + \bar Q_{\tau^{iu}_{sv}(t)}\big(j_t^{iu}, v_t^{iu}\big) \Big), \qquad t \ge t_0, \tag{3.6} $$

$$ \bar Q_t(i,u) = \bar Q_{t_0}(i,u), \qquad t \le t_0. \tag{3.7} $$

Let us consider each sample path from the set $\bar\Omega$. In view of Equation (2.8), there exists $t_0' \ge t_0$ such that $\alpha_t(i,u) \in [0,1]$ for all $t \ge t_0'$ and $(i,u) \in R$. By Equations (3.3) and (3.6), we then have that for all $t \ge t_0'$ and $(i,u) \in R$,

$$ \big| Q_{t+1}(i,u) - \bar Q_{t+1}(i,u) \big| \le \big(1 - \alpha_t(i,u)\big) \big| Q_t(i,u) - \bar Q_t(i,u) \big| + \alpha_t(i,u) \big| Q_{\tau^{iu}_{sv}(t)}\big(j_t^{iu}, v_t^{iu}\big) - \bar Q_{\tau^{iu}_{sv}(t)}\big(j_t^{iu}, v_t^{iu}\big) \big| \le \max_{t' \le t} \big\| Q_{t'} - \bar Q_{t'} \big\|_\infty, $$

which implies

$$ \max_{t' \le t+1} \big\| Q_{t'} - \bar Q_{t'} \big\|_\infty \ \le\ \max_{t' \le t} \big\| Q_{t'} - \bar Q_{t'} \big\|_\infty. \tag{3.8} $$

Therefore, on that sample path, $\{Q_t\}$ is bounded below if and only if $\{\bar Q_t\}$ is bounded below. We state this as a lemma.

Lemma 3.1. For any sample path from the set $\bar\Omega$, and for any values of $t_0$ and $\bar Q_{t_0}$, the Q-learning sequence $\{Q_t\}$ is bounded below if and only if $\{\bar Q_t\}$ given by Equations (3.5)-(3.7) is bounded below.

This observation is the starting point for the proof of the lower boundedness of $\{Q_t\}$. We will construct a sequence $\{\bar Q_t\}$ that is easier to analyze than $\{Q_t\}$ itself. In particular, we will choose, for each sample path from a set of probability one, the time $t_0$ and the initial $\bar Q_{t_0}$ in such a way that the auxiliary sequence $\{\bar Q_t\}$ is endowed with a special interpretation and structure relating to perturbed versions of the SSP problem.
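The following minimal Python sketch illustrates numerically the coupling idea behind Lemma 3.1, under simplifying assumptions made only for this illustration: a single scalar component, no delayed times, and stepsizes already in [0,1]. Both recursions are driven by the same noise and differ only in which sequence they read their own previous values from, so the gap between them cannot grow, which is the content of (3.8); the names used here are illustrative.

import numpy as np

rng = np.random.default_rng(1)

Q, Qbar = 0.0, 50.0                # different initial values (Qbar_{t_0} is arbitrary)
gap = [abs(Q - Qbar)]
for t in range(1, 5001):
    alpha = 1.0 / t                # stepsizes in [0, 1] satisfying (2.8)
    noise = rng.normal(0.0, 0.1)
    # The same affine update applied to both sequences with shared randomness;
    # the coefficient 0.5 stands in for reading back a (bounded) earlier iterate.
    Q    = (1 - alpha) * Q    + alpha * (1.0 + noise + 0.5 * Q)
    Qbar = (1 - alpha) * Qbar + alpha * (1.0 + noise + 0.5 * Qbar)
    gap.append(abs(Q - Qbar))

# In the spirit of (3.8): the gap never exceeds its initial value, so one
# sequence is bounded (below) if and only if the other is.
assert max(gap) <= gap[0] + 1e-12
print(gap[0], gap[-1])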

3.3.2. Choosing $t_0$ and initial $\bar Q_{t_0}$ for a sample path. First we introduce some notation and definitions to be used throughout the rest of the proof. For a finite set $D$, let $\mathcal{P}(D)$ denote the set of probability distributions on $D$. For $p \in \mathcal{P}(D)$ and $x \in D$, let $p(x)$ denote the probability of $x$ and $\mathrm{supp}(p)$ denote the support of $p$, $\{x \in D \mid p(x) \ne 0\}$. For $p_1, p_2 \in \mathcal{P}(D)$, we write $p_1 \prec p_2$ if $p_1$ is absolutely continuous with respect to $p_2$, that is, $\mathrm{supp}(p_1) \subset \mathrm{supp}(p_2)$. For signed measures $p$ on $D$, we define the notation $p(x)$ and $\mathrm{supp}(p)$ as well as the notion of absolute continuity similarly. We denote by $\mathcal{M}(D)$ the set of signed measures $p$ on $D$ such that $\sum_{x \in D} p(x) = 1$. This set contains the set $\mathcal{P}(D)$.

For each $(i,u) \in R_o$, we define the following. Let $p^{iu}_o \in \mathcal{P}(S_o)$ correspond to the transition probabilities at $(i,u)$: $p^{iu}_o(j) = p_{ij}(u)$, $j \in S_o$. For each $\delta > 0$, let $A_\delta(i,u) \subset \mathcal{P}(S_o)$ denote the set of probability distributions that are both in the $\delta$-neighborhood of $p^{iu}_o$ and absolutely continuous with respect to $p^{iu}_o$, i.e.,

$$ A_\delta(i,u) = \Big\{ d \in \mathcal{P}(S_o) \ \Big|\ \big| d(j) - p_{ij}(u) \big| \le \delta,\ j \in S_o, \ \text{and}\ d \prec p^{iu}_o \Big\}. $$

(In particular, for $(i,u) = (0,0)$, $p^{00}_o(0) = 1$ and $A_\delta(0,0) = \{p^{00}_o\}$.) Let $g$ denote the vector of expected one-stage costs, $\{g(i,u) \mid (i,u) \in R_o\}$. Define $B_\delta$ to be the subset of vectors in the $\delta$-neighborhood of $g$ whose $(0,0)$th component is zero: with $c = \{c(i,u) \mid (i,u) \in R_o\}$,

$$ B_\delta = \Big\{ c \ \Big|\ c(0,0) = 0 \ \text{and}\ \big| c(i,u) - g(i,u) \big| \le \delta,\ (i,u) \in R \Big\}. $$

We now describe how we choose $t_0$ and $\bar Q_{t_0}$ for the auxiliary sequence $\{\bar Q_t\}$ on a certain set of sample paths that has probability one. We start by defining two sequences, a sequence $\{g_t\}$ of one-stage cost vectors³ and a sequence $\{q_t\}$ of collections of signed measures in $\mathcal{M}(S_o)$. They are random sequences defined on the same probability space as the Q-learning iterates, and they can be related to the empirical one-stage costs and empirical transition frequencies on a sample path.

We define the sequence $\{g_t\}$ as follows: for $t \ge 0$,

$$ g_{t+1}(i,u) = \big(1 - \alpha_t(i,u)\big)\, g_t(i,u) + \alpha_t(i,u) \big( g(i,u) + \omega_t(i,u) \big), \qquad (i,u) \in R, \tag{3.9} $$

with $g_0(i,u) = 0$, $(i,u) \in R$, and $g_t(0,0) = 0$, $t \ge 0$.

We define the sequence $\{q_t\}$ as follows. It has as many components as the size of the set $R$ of state-control pairs. For each $(i,u) \in R$, define the component sequence $\{q_t^{iu}\}$ by letting $q_0^{iu}$ be any given distribution in $\mathcal{P}(S_o)$ with $q_0^{iu} \prec p^{iu}_o$, and by letting

$$ q_{t+1}^{iu} = \big(1 - \alpha_t(i,u)\big)\, q_t^{iu} + \alpha_t(i,u)\, e_{j_t^{iu}}, \qquad t \ge 0, \tag{3.10} $$

where $e_j$ denotes the indicator of $j$: $e_j \in \mathcal{P}(S_o)$ with $e_j(j) = 1$, for $j \in S_o$. Since the stepsizes $\alpha_t(i,u)$ may exceed 1, in general $q_t^{iu} \in \mathcal{M}(S_o)$. Since $j_t^{iu}$ is a random successor state of state $i$ after applying control $u$ [cf. condition (2.5)], w.p.1,

$$ q_t^{iu} \prec p^{iu}_o, \qquad t \ge 0. \tag{3.11} $$

By the standard theory of stochastic approximation (see, e.g., Bertsekas and Tsitsiklis [3, Proposition 4.1 and Example 4.3] or Kushner and Yin [5], Borkar [4]; see also Footnote 2), Equations (2.6) and (2.8) imply that

$$ g_t(i,u) \xrightarrow{a.s.} g(i,u), \qquad (i,u) \in R, \tag{3.12} $$

whereas Equations (2.5) and (2.8) imply that

$$ q_t^{iu} \xrightarrow{a.s.} p^{iu}_o, \qquad (i,u) \in R. \tag{3.13} $$

Equations (3.13) and (3.11) together imply that w.p.1, eventually $q_t^{iu}$ lies in the set $\mathcal{P}(S_o)$ of probability distributions. The following is then evident, in view also of the stepsize condition (2.8).

Lemma 3.2. Let Assumption 2.2 hold. Consider any sample path from the set of probability one of paths which lie in $\bar\Omega$ and on which the convergence in Equations (3.12), (3.13) takes place. Then for any $\delta > 0$, there exists a time $t_0$ such that

$$ g_t \in B_\delta, \qquad q_t^{iu} \in A_\delta(i,u), \qquad \alpha_t(i,u) \le 1, \qquad (i,u) \in R, \quad t \ge t_0. \tag{3.14} $$

In the rest of §3.3, let us consider any sample path from the set of probability one given in Lemma 3.2. For any given $\delta > 0$, we choose $t_0$ given in Lemma 3.2 to be the initial time of the auxiliary sequence $\{\bar Q_t\}$. (Note that $t_0$ depends on the entire path and hence so does $\{\bar Q_t\}$.)

³ The sequence $\{g_t\}$ also appeared in the proof of Proposition 3.2; for convenience, we repeat the definition here.
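A minimal Python sketch of the two auxiliary statistics (3.9)-(3.10) for a single state-control pair, under illustrative assumptions (a fixed toy transition vector, Gaussian cost noise, stepsizes 1/t): a running stochastic-approximation average g_t(i,u) of the observed one-stage costs, and a vector q_t^{iu} tracking the empirical successor-state frequencies. By (3.12)-(3.13) both converge to the true parameters, which is what Lemma 3.2 exploits when picking t_0.

import numpy as np

rng = np.random.default_rng(2)

p_iu = np.array([0.7, 0.0, 0.3])     # true transition probabilities p_ij(u), j = 0, 1, 2
g_iu = 1.0                           # true expected one-stage cost g(i, u)

g_t = 0.0                            # g_0(i, u) = 0, cf. (3.9)
q_t = np.array([1.0, 0.0, 0.0])      # q_0^{iu}: any distribution absolutely continuous w.r.t. p_iu

for t in range(1, 50001):
    alpha = 1.0 / t                                  # stepsizes satisfying (2.8)
    j = int(rng.choice(3, p=p_iu))                   # successor state j_t^{iu}
    cost = g_iu + rng.normal(0.0, 0.1)               # g(i, u) + omega_t(i, u)
    g_t = (1 - alpha) * g_t + alpha * cost           # update (3.9)
    q_t = (1 - alpha) * q_t + alpha * np.eye(3)[j]   # update (3.10); e_j is the indicator of j
print(g_t, q_t)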

We now define the initial $\bar Q_{t_0}$. Our definition and the proof that follows will involve a stationary randomized policy. Recall that $\nu(u \mid i)$ denotes the probability of applying control $u$ at state $i$ under $\nu$, for $u \in U(i)$, $i \in S_o$. Recall also that $U = \cup_{i \in S_o} U(i)$ is the control space. We now regard $\nu(\cdot \mid i)$ as a distribution in $\mathcal{P}(U)$ with its support contained in the feasible control set $U(i)$ [that is, $\nu(u \mid i) = 0$ if $u \notin U(i)$].

To define $\bar Q_{t_0}$, let $\nu$ be a proper randomized stationary policy, which exists under Assumption 1.1(i). We define each component $\bar Q_{t_0}(i,u)$ of $\bar Q_{t_0}$ separately, and we associate with $\bar Q_{t_0}(i,u)$ a time-inhomogeneous Markov chain and time-varying one-stage cost functions as follows.

For each $(i,u) \in R$, consider a time-inhomogeneous Markov chain $(i_0, u_0), (i_1, u_1), \ldots$ on the space $S_o \times U$ with initial state $(i_0, u_0) = (i,u)$, whose probability distribution is denoted $P^{iu}_{t_0}$ and whose transition probabilities at time $k \ge 1$ are given by: for all $(\bar i, \bar u), (j,v) \in R_o$,

$$ P^{iu}_{t_0}\big( i_1 = j,\ u_1 = v \mid i_0 = i,\ u_0 = u \big) = q^{iu}_{t_0}(j)\, \nu(v \mid j), \qquad \text{for } k = 1, $$

$$ P^{iu}_{t_0}\big( i_k = j,\ u_k = v \mid i_{k-1} = \bar i,\ u_{k-1} = \bar u \big) = p_{\bar i j}(\bar u)\, \nu(v \mid j), \qquad \text{for } k \ge 2, $$

where $P^{iu}_{t_0}(\cdot \mid \cdot)$ denotes conditional probability. (The transition probabilities at $(\bar i, \bar u) \notin R_o$ can be defined arbitrarily because regardless of their values, w.p.1, the chain will never visit such state-control pairs at any time.) For each $(i,u) \in R$, we also define time-varying one-stage cost functions $g^{iu}_{t_0,k}: R_o \to \Re$, $k \ge 0$, by

$$ g^{iu}_{t_0,0} = g_{t_0} \quad \text{for } k = 0, \qquad g^{iu}_{t_0,k} = g \quad \text{for } k \ge 1. $$

We extend $g^{iu}_{t_0,k}$ to $S_o \times U$ by defining its values outside the domain $R_o$ to be $+\infty$, and we will treat $\infty \cdot 0 = 0$. This convention will be followed throughout. We now define

$$ \bar Q_{t_0}(i,u) = E^{P^{iu}_{t_0}}\Big[ \sum_{k=0}^{\infty} g^{iu}_{t_0,k}(i_k, u_k) \Big], \qquad (i,u) \in R, \tag{3.15} $$

where $E^{P^{iu}_{t_0}}$ denotes expectation under $P^{iu}_{t_0}$. The above expectation is well defined and finite, and furthermore, the order of summation and expectation can be exchanged, i.e.,

$$ \bar Q_{t_0}(i,u) = \sum_{k=0}^{\infty} E^{P^{iu}_{t_0}}\big[ g^{iu}_{t_0,k}(i_k, u_k) \big]. $$

This follows from the fact that under $P^{iu}_{t_0}$, from time 1 onwards, the process $\{(i_k, u_k)\}_{k \ge 1}$ evolves and incurs costs as in the original SSP problem under the stationary proper policy $\nu$. In particular, since $\nu$ is a proper policy, $\sum_{k \ge 0} |g^{iu}_{t_0,k}(i_k, u_k)|$ is finite almost surely with respect to $P^{iu}_{t_0}$, and hence the summation $\sum_{k \ge 0} g^{iu}_{t_0,k}(i_k, u_k)$ is well defined and also finite $P^{iu}_{t_0}$-almost surely. Since $\nu$ is a stationary proper policy for a finite state SSP, we have that under $\nu$, from any state in $S$, the expected time of reaching the state 0 is finite, and consequently, $E^{P^{iu}_{t_0}}\big[ \sum_{k \ge 0} |g^{iu}_{t_0,k}(i_k, u_k)| \big]$ is also finite. It then follows from the dominated convergence theorem that the two expressions given above for $\bar Q_{t_0}(i,u)$ are indeed equal.

3.3.3. Interpreting $\{\bar Q_t\}$ as costs in certain time-inhomogeneous SSP problems. We now show that with the preceding choice of $t_0$ and initial $\bar Q_{t_0}$, each component of the iterates $\bar Q_t$, $t \ge t_0$, is equal to, briefly speaking, the expected total cost of a randomized Markov policy (represented by $\{\nu^{iu}_{t,k}\}$ below) in a time-inhomogeneous SSP problem whose parameters (transition probabilities and one-stage costs, represented by $\{p^{iu}_{t,k}\}$, $\{g^{iu}_{t,k}\}$ below) lie in the $\delta$-neighborhood of those of the original problem. While the proof of this result is lengthy, it is mostly a straightforward verification. In the next, final step of our analysis, given in §3.3.4, we will, for sufficiently small $\delta$, lower-bound the costs of these time-inhomogeneous SSP problems and thereby lower-bound $\{\bar Q_t\}$.

As in the preceding subsection, for any probability distribution $P$, we write $P(\cdot \mid \cdot)$ for conditional probability and $E^P$ for expectation under $P$. Recall also that the sets $A_\delta(i,u)$, where $(i,u) \in R_o$, and the set $B_\delta$, defined in the preceding subsection, are subsets contained in the $\delta$-neighborhood of the transition probability parameters and expected one-stage cost parameters of the original SSP problem, respectively.

Lemma 3.3. Let Assumptions 1.1(i) and 2.2 hold.
Consider any sample path from the set of probability one given in Lemma 3.2. For any $\delta > 0$, with $t_0$ and $\bar Q_{t_0}$ given as in §3.3.2 for the chosen $\delta$, the iterates $\bar Q_t(i,u)$ defined by Equations (3.5)-(3.7) have the following properties for each $(i,u) \in R$ and $t \ge t_0$:

(a) $\bar Q_t(i,u)$ can be expressed as

$$ \bar Q_t(i,u) = E^{P^{iu}_t}\Big[ \sum_{k=0}^{\infty} g^{iu}_{t,k}(i_k, u_k) \Big] = \sum_{k=0}^{\infty} E^{P^{iu}_t}\big[ g^{iu}_{t,k}(i_k, u_k) \big] $$

for some probability distribution $P^{iu}_t$ of a Markov chain $\{(i_k, u_k)\}_{k \ge 0}$ on $S_o \times U$ and one-stage cost functions $g^{iu}_{t,k}: R_o \to \Re$, $k \ge 0$ (with $g^{iu}_{t,k} \equiv +\infty$ on $(S_o \times U) \setminus R_o$).

(b) The Markov chain $\{(i_k, u_k)\}_{k \ge 0}$ in (a) starts from state $(i_0, u_0) = (i,u)$ and is time-inhomogeneous. Its transition probabilities have the following product form: for all $(\bar i, \bar u), (j,v) \in R_o$,

$$ P^{iu}_t\big( i_1 = j,\ u_1 = v \mid i_0 = i,\ u_0 = u \big) = p^{iu}_{t,0}(j \mid i,u)\, \nu^{iu}_{t,1}(v \mid j), \qquad \text{for } k = 1, $$

$$ P^{iu}_t\big( i_k = j,\ u_k = v \mid i_{k-1} = \bar i,\ u_{k-1} = \bar u \big) = p^{iu}_{t,k-1}(j \mid \bar i, \bar u)\, \nu^{iu}_{t,k}(v \mid j), \qquad \text{for } k \ge 2, $$

where for all $k \ge 1$ and $(\bar i, \bar u) \in R_o$, $j \in S_o$,

$$ p^{iu}_{t,k-1}(\cdot \mid \bar i, \bar u) \in A_\delta(\bar i, \bar u), \qquad \mathrm{supp}\big( \nu^{iu}_{t,k}(\cdot \mid j) \big) \subset U(j), $$

and moreover, $p^{iu}_{t,0}(\cdot \mid i,u) = q^{iu}_t$ if $t \ge t_0$.

(c) The one-stage cost functions $g^{iu}_{t,k}$ in (a) satisfy

$$ g^{iu}_{t,k} \in B_\delta, \qquad k \ge 0, $$

and moreover, $g^{iu}_{t,0}(i,u) = g_t(i,u)$ if $t \ge t_0$.

(d) For the Markov chain in (a), there exists an integer $\bar k_t$ such that for $k \ge \bar k_t$, $\{(i_k, u_k)\}$ evolves and incurs costs as in the original SSP problem under the proper policy $\nu$; i.e., for $k \ge \bar k_t$,

$$ \nu^{iu}_{t,k}(\cdot \mid \bar i) = \nu(\cdot \mid \bar i), \qquad p^{iu}_{t,k}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o, \qquad g^{iu}_{t,k}(\bar i, \bar u) = g(\bar i, \bar u), \qquad (\bar i, \bar u) \in R_o. $$

Proof. The proof is by induction on $t$. For $t = t_0$, $\bar Q_{t_0}$ satisfies properties (a)-(d) by its definition and our choice of the sample path and $t_0$ (cf. Lemma 3.2). [In particular, for each $(i,u) \in R$, $p^{iu}_{t_0,k}$ and $\nu^{iu}_{t_0,k}$ in (a) are given by: for $k = 0$, $p^{iu}_{t_0,0}(\cdot \mid i,u) = q^{iu}_{t_0}$, $p^{iu}_{t_0,0}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o$, $(\bar i, \bar u) \in R_o \setminus \{(i,u)\}$; and for all $k \ge 1$, $p^{iu}_{t_0,k}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o$, $(\bar i, \bar u) \in R_o$, $\nu^{iu}_{t_0,k} = \nu$; whereas $\bar k_{t_0} = 1$ in (d).] For $t < t_0$, since $\bar Q_t = \bar Q_{t_0}$ by definition, they also satisfy (a)-(d).

So let us assume that properties (a)-(d) are satisfied by all $\bar Q_{t'}$, $t_0 \le t' \le t$, for some $t \ge t_0$. We will show that $\bar Q_{t+1}$ also has these properties. Consider $\bar Q_{t+1}(i,u)$ for each $(i,u) \in R$. To simplify notation, denote $\alpha = \alpha_t(i,u) \in [0,1]$ (cf. Lemma 3.2). By Equation (3.6),

$$ \bar Q_{t+1}(i,u) = (1-\alpha)\, \bar Q_t(i,u) + \alpha \big( g(i,u) + \omega_t(i,u) + \bar Q_\tau(s,v) \big), $$

where $s = j_t^{iu}$, $v = v_t^{iu}$, and $\tau = \tau^{iu}_{sv}(t)$. By the induction hypothesis, $\bar Q_t$ and $\bar Q_\tau(s,v)$ can be expressed as in (a), so denoting $P^{sv} = P^{sv}_\tau$ for short and noticing $P^{iu}_t(i_0 = i, u_0 = u) = 1$ by property (b), we have

$$ \bar Q_{t+1}(i,u) = \Big[ (1-\alpha)\, g^{iu}_{t,0}(i,u) + \alpha\big( g(i,u) + \omega_t(i,u) \big) \Big] + \sum_{k \ge 1} \Big\{ (1-\alpha)\, E^{P^{iu}_t}\big[ g^{iu}_{t,k}(i_k, u_k) \big] + \alpha\, E^{P^{sv}}\big[ g^{sv}_{\tau,k-1}(i_{k-1}, u_{k-1}) \big] \Big\} = \sum_{k \ge 0} C_k, \tag{3.16} $$

where

$$ C_0 = (1-\alpha)\, g^{iu}_{t,0}(i,u) + \alpha\big( g(i,u) + \omega_t(i,u) \big), \tag{3.17} $$

$$ C_k = (1-\alpha)\, E^{P^{iu}_t}\big[ g^{iu}_{t,k}(i_k, u_k) \big] + \alpha\, E^{P^{sv}}\big[ g^{sv}_{\tau,k-1}(i_{k-1}, u_{k-1}) \big], \qquad k \ge 1. \tag{3.18} $$

Next we will rewrite each term $C_k$ in a desirable form. During this procedure, we will construct the transition probabilities $\{p^{iu}_{t+1,k}\}$ and $\{\nu^{iu}_{t+1,k}\}$ that compose the probability distribution $P^{iu}_{t+1}$ of the time-inhomogeneous Markov chain for $t+1$, as well as the one-stage cost functions $\{g^{iu}_{t+1,k}\}$ required in the lemma. For clarity we divide the rest of the proof into five steps.

(1) We consider the term $C_0$ in Equation (3.17) and define the transition probabilities and one-stage costs for $k = 0$ and $t+1$. By the induction hypothesis and property (c), $g^{iu}_{t,0}(i,u) = g_t(i,u)$. Using this and the definition of $g_{t+1}$ [cf. Equation (3.9)], we have

$$ C_0 = (1-\alpha)\, g_t(i,u) + \alpha\big( g(i,u) + \omega_t(i,u) \big) = g_{t+1}(i,u). \tag{3.19} $$

Let us define the cost function and transition probabilities for $k = 0$ and $t+1$ by

$$ g^{iu}_{t+1,0} = g_{t+1}, \qquad p^{iu}_{t+1,0}(\cdot \mid i,u) = q^{iu}_{t+1}, \qquad p^{iu}_{t+1,0}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o, \quad (\bar i, \bar u) \in R_o \setminus \{(i,u)\}. $$

By Lemma 3.2 and our choice of the sample path, $g_{t+1} \in B_\delta$ and $q^{iu}_{t+1} \in A_\delta(i,u)$, so $g^{iu}_{t+1,0}$ and $p^{iu}_{t+1,0}$ satisfy the requirements in properties (b) and (c).

(2) We now consider the term $C_k$ in Equation (3.18), and we introduce several relations that will define the transition probabilities and one-stage costs for $k \ge 1$ and $t+1$ (the precise definitions will be given in the next two steps). Consider each $k \ge 1$. Let $P_1^k$ denote the law of $(i_k, u_k, i_{k+1})$ under $P^{iu}_t$, and let $P_2^k$ denote the law of $(i_{k-1}, u_{k-1}, i_k)$ under $P^{sv}$. Let $P_3^k$ denote the convex combination of them:

$$ P_3^k = (1-\alpha)\, P_1^k + \alpha\, P_2^k. $$

We regard $P_1^k$, $P_2^k$, $P_3^k$ as probability measures on the sample space $S_o \times U \times S_o$, and we denote by $X$, $Y$ and $Z$ the function that maps a point $(\bar i, \bar u, j)$ to its first, second, and third coordinate, respectively. By property (b) of $P^{iu}_t$ and $P^{sv}$ from the induction hypothesis, it is clear that under either $P_1^k$ or $P_2^k$, the possible values of $(X, Y)$ are from the set $R_o$ of state and feasible control pairs, so the subset $R_o \times S_o$ has probability 1 under $P_3^k$. Thus we can write $C_k$ in Equation (3.18) equivalently as

$$ C_k = \sum_{\bar i \in S_o} \sum_{\bar u \in U(\bar i)} \Big( (1-\alpha)\, P_1^k(X = \bar i, Y = \bar u)\, g^{iu}_{t,k}(\bar i, \bar u) + \alpha\, P_2^k(X = \bar i, Y = \bar u)\, g^{sv}_{\tau,k-1}(\bar i, \bar u) \Big). \tag{3.20} $$

In the next two steps, we will introduce one-stage cost functions $g^{iu}_{t+1,k}$ to rewrite Equation (3.20) equivalently as

$$ C_k = \sum_{\bar i \in S_o} \sum_{\bar u \in U(\bar i)} P_3^k(X = \bar i, Y = \bar u)\, g^{iu}_{t+1,k}(\bar i, \bar u). \tag{3.21} $$

We will also define the transition probabilities $\nu^{iu}_{t+1,k}(\cdot \mid \bar i)$ and $p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u)$ to express $P_3^k$ as

$$ P_3^k(X = \bar i, Y = \bar u) = P_3^k(X = \bar i)\, \nu^{iu}_{t+1,k}(\bar u \mid \bar i), \tag{3.22} $$

$$ P_3^k(X = \bar i, Y = \bar u, Z = j) = P_3^k(X = \bar i, Y = \bar u)\, p^{iu}_{t+1,k}(j \mid \bar i, \bar u), \tag{3.23} $$

for all $(\bar i, \bar u) \in R_o$ and $j \in S_o$. Note that in the above, by the definition of $P_3^k$,

$$ P_3^k(X = \bar i) = (1-\alpha)\, P^{iu}_t(i_k = \bar i) + \alpha\, P^{sv}(i_{k-1} = \bar i), \qquad \bar i \in S_o. \tag{3.24} $$

(3) We now define the one-stage cost functions for $k \ge 1$ and $t+1$. Consider each $k \ge 1$. Define the cost function $g^{iu}_{t+1,k}$ as follows: for each $(\bar i, \bar u) \in R_o$,

$$ g^{iu}_{t+1,k}(\bar i, \bar u) = (1-\alpha)\, \frac{P_1^k(X = \bar i, Y = \bar u)}{P_3^k(X = \bar i, Y = \bar u)}\, g^{iu}_{t,k}(\bar i, \bar u) + \alpha\, \frac{P_2^k(X = \bar i, Y = \bar u)}{P_3^k(X = \bar i, Y = \bar u)}\, g^{sv}_{\tau,k-1}(\bar i, \bar u) \tag{3.25} $$

if $P_3^k(X = \bar i, Y = \bar u) > 0$, and $g^{iu}_{t+1,k}(\bar i, \bar u) = g(\bar i, \bar u)$ otherwise. With this definition, it is clear that $C_k$ can be expressed as in Equation (3.21) and this expression is equivalent to the one given in Equation (3.20).

We verify that $g^{iu}_{t+1,k}$ satisfies the requirement in property (c); that is,

$$ g^{iu}_{t+1,k} \in B_\delta. \tag{3.26} $$

Consider each $(\bar i, \bar u) \in R_o$ and discuss two cases. If $P_3^k(X = \bar i, Y = \bar u) = 0$, then $|g^{iu}_{t+1,k}(\bar i, \bar u) - g(\bar i, \bar u)| = 0$ by definition. Suppose $P_3^k(X = \bar i, Y = \bar u) > 0$. Then by Equation (3.25), $g^{iu}_{t+1,k}(\bar i, \bar u)$ is a convex combination of $g^{iu}_{t,k}(\bar i, \bar u)$ and $g^{sv}_{\tau,k-1}(\bar i, \bar u)$, whereas $g^{iu}_{t,k}, g^{sv}_{\tau,k-1} \in B_\delta$ by the induction hypothesis (property (c)). This implies, by the definition of $B_\delta$, that $|g^{iu}_{t+1,k}(\bar i, \bar u) - g(\bar i, \bar u)| \le \delta$ for $(\bar i, \bar u) \in R$ and $g^{iu}_{t+1,k}(\bar i, \bar u) = 0$ for $(\bar i, \bar u) = (0,0)$. Combining the two cases, and in view also of the definition of $B_\delta$, we have that $g^{iu}_{t+1,k}$ satisfies Equation (3.26).

We verify that $g^{iu}_{t+1,k}$ satisfies the requirement in property (d). By the induction hypothesis $g^{iu}_{t,k} = g$ for $k \ge \bar k_t$ and $g^{sv}_{\tau,k-1} = g$ for $k \ge \bar k_\tau + 1$, whereas each component of $g^{iu}_{t+1,k}$ by definition either equals the corresponding component of $g$ or is a convex combination of the corresponding components of $g^{iu}_{t,k}$ and $g^{sv}_{\tau,k-1}$. Hence

$$ g^{iu}_{t+1,k} = g, \qquad k \ge \bar k_{t+1} \overset{\text{def}}{=} \max\{ \bar k_t,\ \bar k_\tau + 1 \}. \tag{3.27} $$

(4) We now define the transition probabilities for $k \ge 1$ and $t+1$. Consider each $k \ge 1$. Define the transition probability distributions $\nu^{iu}_{t+1,k}$ and $p^{iu}_{t+1,k}$ as follows:

$$ \nu^{iu}_{t+1,k}(\cdot \mid \bar i) = P_3^k(Y = \cdot \mid X = \bar i), \qquad \bar i \in S_o, \tag{3.28} $$

$$ p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u) = P_3^k(Z = \cdot \mid X = \bar i, Y = \bar u), \qquad (\bar i, \bar u) \in R_o. \tag{3.29} $$

If in the right-hand sides of Equations (3.28)-(3.29), an event being conditioned upon has probability zero, then let the corresponding conditional probability (which can be defined arbitrarily) be defined according to the following:

$$ P_3^k(Y = \cdot \mid X = \bar i) = \nu(\cdot \mid \bar i) \quad \text{if } P_3^k(X = \bar i) = 0, \qquad P_3^k(Z = \cdot \mid X = \bar i, Y = \bar u) = p^{\bar i \bar u}_o \quad \text{if } P_3^k(X = \bar i, Y = \bar u) = 0. $$

With the above definitions, the equalities (3.22) and (3.23) desired in step (2) of the proof clearly hold. We now verify that $\nu^{iu}_{t+1,k}$ and $p^{iu}_{t+1,k}$ satisfy the requirements in properties (b) and (d).

First, we show that $p^{iu}_{t+1,k}$ satisfies the requirement in property (b); that is,

$$ p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u) \in A_\delta(\bar i, \bar u), \qquad (\bar i, \bar u) \in R_o. $$

This holds by the definition of $p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u)$ if $P_3^k(X = \bar i, Y = \bar u) = 0$, so let us consider the case $P_3^k(X = \bar i, Y = \bar u) > 0$ for each $(\bar i, \bar u) \in R_o$. By the induction hypothesis, $P^{iu}_t$ and $P^{sv}$ satisfy property (b). Using this and the definition of $P_1^k$ and $P_2^k$, we have that for all $j \in S_o$,

$$ P_1^k(X = \bar i, Y = \bar u, Z = j) = P^{iu}_t(i_k = \bar i, u_k = \bar u)\, p^{iu}_{t,k}(j \mid \bar i, \bar u), \qquad P_2^k(X = \bar i, Y = \bar u, Z = j) = P^{sv}(i_{k-1} = \bar i, u_{k-1} = \bar u)\, p^{sv}_{\tau,k-1}(j \mid \bar i, \bar u), $$

which implies

$$ P_1^k(Z = \cdot \mid X = \bar i, Y = \bar u) = p^{iu}_{t,k}(\cdot \mid \bar i, \bar u), \qquad P_2^k(Z = \cdot \mid X = \bar i, Y = \bar u) = p^{sv}_{\tau,k-1}(\cdot \mid \bar i, \bar u), \tag{3.30} $$

and by property (b) from the induction hypothesis again,

$$ P_1^k(Z = \cdot \mid X = \bar i, Y = \bar u) \in A_\delta(\bar i, \bar u), \qquad P_2^k(Z = \cdot \mid X = \bar i, Y = \bar u) \in A_\delta(\bar i, \bar u). \tag{3.31} $$

Then, since $P_3^k = (1-\alpha) P_1^k + \alpha P_2^k$ with $\alpha \in [0,1]$, we have

$$ P_3^k(Z = \cdot \mid X = \bar i, Y = \bar u) = \frac{P_3^k(X = \bar i, Y = \bar u, Z = \cdot)}{P_3^k(X = \bar i, Y = \bar u)} = \big(1 - \beta(\bar i, \bar u)\big)\, P_1^k(Z = \cdot \mid X = \bar i, Y = \bar u) + \beta(\bar i, \bar u)\, P_2^k(Z = \cdot \mid X = \bar i, Y = \bar u), \tag{3.32} $$

where

$$ \beta(\bar i, \bar u) = \frac{\alpha\, P_2^k(X = \bar i, Y = \bar u)}{(1-\alpha)\, P_1^k(X = \bar i, Y = \bar u) + \alpha\, P_2^k(X = \bar i, Y = \bar u)}. $$

Since the set $A_\delta(\bar i, \bar u)$ is convex, using the fact that $\beta(\bar i, \bar u) \in [0,1]$, Equations (3.31)-(3.32) imply that $P_3^k(Z = \cdot \mid X = \bar i, Y = \bar u) \in A_\delta(\bar i, \bar u)$, and therefore, by definition [cf. Equation (3.29)], $p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u) = P_3^k(Z = \cdot \mid X = \bar i, Y = \bar u) \in A_\delta(\bar i, \bar u)$.

We now verify that $p^{iu}_{t+1,k}$ satisfies the requirement in property (d): for all $(\bar i, \bar u) \in R_o$,

$$ p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o, \qquad k \ge \bar k_{t+1} = \max\{ \bar k_t,\ \bar k_\tau + 1 \}. \tag{3.33} $$

By the induction hypothesis, property (d) is satisfied for $t$ and $\tau$; in particular, for all $(\bar i, \bar u) \in R_o$, $p^{iu}_{t,k}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o$ for $k \ge \bar k_t$ and $p^{sv}_{\tau,k-1}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o$ for $k \ge \bar k_\tau + 1$. In view of Equations (3.30) and (3.32), we have that if $P_3^k(X = \bar i, Y = \bar u) > 0$, then $p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u)$ is a convex combination of $p^{iu}_{t,k}(\cdot \mid \bar i, \bar u)$ and $p^{sv}_{\tau,k-1}(\cdot \mid \bar i, \bar u)$ and hence satisfies Equation (3.33). But if $P_3^k(X = \bar i, Y = \bar u) = 0$, $p^{iu}_{t+1,k}(\cdot \mid \bar i, \bar u) = p^{\bar i \bar u}_o$ by definition. Hence Equation (3.33) holds.

We now verify that $\nu^{iu}_{t+1,k}$ given by Equation (3.28) satisfies the requirements in properties (b) and (d). For each $\bar i \in S_o$, $\nu^{iu}_{t+1,k}(\cdot \mid \bar i) = \nu(\cdot \mid \bar i)$ by definition if $P_3^k(X = \bar i) = 0$; otherwise, similar to the preceding proof, $\nu^{iu}_{t+1,k}(\cdot \mid \bar i)$ can be expressed as a convex combination of $\nu^{iu}_{t,k}(\cdot \mid \bar i)$ and $\nu^{sv}_{\tau,k-1}(\cdot \mid \bar i)$:

$$ \nu^{iu}_{t+1,k}(\cdot \mid \bar i) = (1-\alpha)\, \frac{P_1^k(X = \bar i)}{P_3^k(X = \bar i)}\, \nu^{iu}_{t,k}(\cdot \mid \bar i) + \alpha\, \frac{P_2^k(X = \bar i)}{P_3^k(X = \bar i)}\, \nu^{sv}_{\tau,k-1}(\cdot \mid \bar i), $$

where if $k = 1$ and $\bar i = s$, we let $\nu^{sv}_{\tau,0}(\cdot \mid s)$ denote the distribution in $\mathcal{P}(U)$ that assigns probability 1 to the control $v$ [if $k = 1$ and $\bar i \ne s$, then the second term above is zero because $P^{sv}(i_0 = s, u_0 = v) = 1$ by the induction hypothesis and consequently, $P_2^1(X = \bar i) = P^{sv}(i_0 = \bar i) = 0$]. Combining the two cases, and using properties (b) and (d) of the induction hypothesis, we then have that $\mathrm{supp}\big( \nu^{iu}_{t+1,k}(\cdot \mid \bar i) \big) \subset U(\bar i)$ for $\bar i \in S_o$, and

$$ \nu^{iu}_{t+1,k}(\cdot \mid \bar i) = \nu(\cdot \mid \bar i), \qquad k \ge \bar k_{t+1}, \quad \bar i \in S_o, \tag{3.34} $$

which are the requirements for $\nu^{iu}_{t+1,k}$ in properties (b) and (d).

(5) In this last step of the proof, we define the Markov chain for $t+1$ and verify the expression for $\bar Q_{t+1}(i,u)$ given in property (a). Let the time-inhomogeneous Markov chain $\{(i_k, u_k)\}_{k \ge 0}$ with probability distribution $P^{iu}_{t+1}$, required in property (a) for $t+1$, be as follows. Let the chain start with $(i_0, u_0) = (i,u)$, and let its transition probabilities have the product forms given in property (b) for $t+1$, where $p^{iu}_{t+1,k}$, $k \ge 0$, and $\nu^{iu}_{t+1,k}$, $k \ge 1$, are the functions that we defined in the preceding proof. Also let the time-varying one-stage cost functions $g^{iu}_{t+1,k}$, $k \ge 0$, be as defined earlier. We have shown that these transition probabilities and one-stage cost functions satisfy the requirements in properties (b)-(d). To prove the lemma, what we still need to show is that with our definitions, the expression given in property (a) equals $\bar Q_{t+1}(i,u)$.

First of all, because our definitions of the transition probabilities and one-stage cost functions for $t+1$ satisfy property (d), they ensure that under $P^{iu}_{t+1}$, $\{(i_k, u_k)\}_{k \ge \bar k_{t+1}}$ evolves and incurs costs as in the original SSP problem under the proper stationary policy $\nu$. Consequently, $E^{P^{iu}_{t+1}}\big[ \sum_{k \ge 0} |g^{iu}_{t+1,k}(i_k, u_k)| \big]$ is well defined and finite, and the order of summation and expectation can be exchanged (the reason is the same as the one we gave at the end of §3.3.2 for the expression of $\bar Q_{t_0}$):

$$ E^{P^{iu}_{t+1}}\Big[ \sum_{k=0}^{\infty} g^{iu}_{t+1,k}(i_k, u_k) \Big] = \sum_{k=0}^{\infty} E^{P^{iu}_{t+1}}\big[ g^{iu}_{t+1,k}(i_k, u_k) \big]. \tag{3.35} $$

Hence, to prove property (a) for $t+1$, that is, to show

$$ \bar Q_{t+1}(i,u) = g^{iu}_{t+1,0}(i,u) + \sum_{k \ge 1} E^{P^{iu}_{t+1}}\big[ g^{iu}_{t+1,k}(i_k, u_k) \big], $$

we only need to show, in view of the fact that $\bar Q_{t+1}(i,u) = \sum_{k \ge 0} C_k$ [cf. Equation (3.16)], that

$$ C_0 = g^{iu}_{t+1,0}(i,u), \qquad C_k = E^{P^{iu}_{t+1}}\big[ g^{iu}_{t+1,k}(i_k, u_k) \big], \quad k \ge 1. \tag{3.36} $$

The first relation is true since by definition $g^{iu}_{t+1,0}(i,u) = g_{t+1}(i,u) = C_0$ [cf. Equation (3.19)]. We now prove the second equality for $C_k$, $k \ge 1$. For $k \ge 1$, recall that by Equation (3.21),

$$ C_k = \sum_{\bar i \in S_o} \sum_{\bar u \in U(\bar i)} P_3^k(X = \bar i, Y = \bar u)\, g^{iu}_{t+1,k}(\bar i, \bar u). $$

Hence, to prove the desired equality for $C_k$, it is sufficient to prove that

$$ P^{iu}_{t+1}(i_k = \bar i, u_k = \bar u) = P_3^k(X = \bar i, Y = \bar u), \qquad (\bar i, \bar u) \in R_o. \tag{3.37} $$

By the definition of $P^{iu}_{t+1}$, $P^{iu}_{t+1}(u_k = \bar u \mid i_k = \bar i) = \nu^{iu}_{t+1,k}(\bar u \mid \bar i)$ for all $(\bar i, \bar u) \in R_o$, so in view of Equation (3.22), the equality (3.37) will be implied if we prove

$$ P^{iu}_{t+1}(i_k = \bar i) = P_3^k(X = \bar i), \qquad \bar i \in S_o. \tag{3.38} $$

We verify Equation (3.38) by induction on $k$. For $k = 1$, using Equation (3.24) and property (b) of $P^{iu}_t$, $P^{sv}$, we have that for every $\bar i \in S_o$,

$$ P_3^1(X = \bar i) = (1-\alpha)\, P^{iu}_t(i_1 = \bar i) + \alpha\, P^{sv}(i_0 = \bar i) = (1-\alpha)\, p^{iu}_{t,0}(\bar i \mid i,u) + \alpha\, e_s(\bar i) = (1-\alpha)\, q^{iu}_t(\bar i) + \alpha\, e_{j_t^{iu}}(\bar i) = q^{iu}_{t+1}(\bar i) = p^{iu}_{t+1,0}(\bar i \mid i,u) = P^{iu}_{t+1}(i_1 = \bar i), $$

where the last three equalities follow from the definition of $q^{iu}_{t+1}$ [cf. Equation (3.10)], the definition of $p^{iu}_{t+1,0}$ and the definition of $P^{iu}_{t+1}$, respectively. Hence Equation (3.38) holds for $k = 1$.

Suppose Equation (3.38) holds for some $k \ge 1$. Then, by the definition of $P^{iu}_{t+1}$, we have that for all $j \in S_o$,

$$ P^{iu}_{t+1}(i_{k+1} = j) = \sum_{\bar i \in S_o} \sum_{\bar u \in U(\bar i)} P^{iu}_{t+1}(i_k = \bar i)\, \nu^{iu}_{t+1,k}(\bar u \mid \bar i)\, p^{iu}_{t+1,k}(j \mid \bar i, \bar u) = \sum_{\bar i \in S_o} \sum_{\bar u \in U(\bar i)} P_3^k(X = \bar i)\, \nu^{iu}_{t+1,k}(\bar u \mid \bar i)\, p^{iu}_{t+1,k}(j \mid \bar i, \bar u) = P_3^k(Z = j) = P_3^{k+1}(X = j), $$

where the second equality follows from the induction hypothesis, the third equality follows from Equations (3.22)-(3.23), and the last equality follows from the definition of $P_3^{k+1}$ and $P_3^k$. This completes the induction and proves Equation (3.38) for all $k \ge 1$, which in turn establishes Equation (3.37) for all $k \ge 1$. Consequently, for all $k \ge 1$, the desired equality (3.36) for $C_k$ holds, and we conclude that $\bar Q_{t+1}(i,u)$ equals the expressions given in Equation (3.35). This completes the proof of the lemma.

3.3.4. Lower boundedness of $\{\bar Q_t\}$. In §3.3.2 and §3.3.3, we have shown that for each sample path from a set of probability one, and for each $\delta > 0$, we can construct a sequence $\{\bar Q_t\}$ such that $\bar Q_t(i,u)$ for each $(i,u) \in R$ is the expected total cost of a randomized Markov policy in an MDP that has time-varying transition and one-stage cost parameters lying in the $\delta$-neighborhood of the respective parameters of the original SSP problem. By Lemma 3.1, therefore, to complete the boundedness proof for the Q-learning iterates $\{Q_t\}$, it is sufficient to show that when $\delta$ is sufficiently small, the expected total costs of all policies in all these neighboring MDPs cannot be unbounded from below.

The latter can in turn be addressed by considering the following total cost MDP. It has the same state space $S_o$ with state 0 being absorbing and cost-free. For each state $i \in S$, the set of feasible controls consists of not only the regular controls $U(i)$, but also the transition probabilities and one-stage cost functions. More precisely, the extended control set at state $i$ is defined to be

$$ \tilde U(i) = \Big\{ (u, p^{iu}, \kappa_i) \ \Big|\ u \in U(i),\ p^{iu} \in A_\delta(i,u),\ \kappa_i \in B^i_\delta \Big\}, $$

where $B^i_\delta$ is a set of one-stage cost functions at $i$: with $z = \{z(u) \mid u \in U(i)\}$,

$$ B^i_\delta = \Big\{ z \ \Big|\ \big| z(u) - g(i,u) \big| \le \delta,\ u \in U(i) \Big\}, $$

for the given $\delta > 0$. Applying control $(u, p^{iu}, \kappa_i)$ at $i \in S$, the one-stage cost, denoted by $c\big((u, p^{iu}, \kappa_i), i\big)$, is

$$ c\big( (u, p^{iu}, \kappa_i), i \big) = \kappa_i(u), $$

and the probability of transition from state $i$ to $j$ is $p^{iu}(j)$. We refer to this problem as the extended SSP problem. If we can show that the optimal total costs of this problem for all initial states are finite, then it will imply that $\{\bar Q_t\}$ is bounded below, because by Lemma 3.3, for each $t$ and $(i,u) \in R$, $\bar Q_t(i,u)$ equals the expected total cost of some policy in the extended SSP problem for the initial state $i$.

The extended SSP problem has a finite number of states and a compact control set for each state. Its one-stage cost $c\big((u, p^{iu}, \kappa_i), i\big)$ is a continuous function of the control component $\kappa_i$, whereas its transition probabilities are continuous functions of the control component $(u, p^{iu})$, for each state $i$. With these compactness and continuity properties, the extended SSP problem falls into the set of SSP models analyzed in Bertsekas and Tsitsiklis [2]. Based on the results of Bertsekas and Tsitsiklis [2], the optimal total cost function of the extended SSP problem is finite everywhere if Assumption 1.1 holds in this problem, that is, if the extended SSP problem satisfies the following two conditions: (i) there exists at least one proper deterministic stationary policy, and (ii) any improper deterministic stationary policy incurs infinite cost for some initial state.

Lemma 3.4 (Bertsekas and Tsitsiklis [2]). If the extended SSP problem satisfies Assumption 1.1, then its optimal total cost is finite for every initial state.

The extended SSP problem clearly has at least one proper deterministic stationary policy, which is to apply at a state $i \in S$ the control $\big(\mu(i), p^{i\mu(i)}_o, g_i\big)$, where $\mu$ is a proper policy in the set SD of the original SSP problem (such a policy exists in view of Assumption 1.1(i) on the original SSP problem) and $g_i = \{g(i,u) \mid u \in U(i)\}$. We now show that for sufficiently small $\delta$, any improper deterministic stationary policy of the extended SSP problem incurs infinite cost for some initial state.

To this end, let us restrict $\delta$ to be no greater than some $\delta_0 > 0$, for which $p_{ij}(u) > 0$ implies $p^{iu}(j) > 0$ for all $p^{iu} \in A_\delta(i,u)$ and $(i,u) \in R$; i.e.,

$$ p^{iu}_o \prec p^{iu}, \qquad p^{iu} \in A_\delta(i,u), \quad (i,u) \in R, \quad \delta \le \delta_0. \tag{3.39} $$

[Recall that we also have $p^{iu} \prec p^{iu}_o$ in view of the definition of $A_\delta(i,u)$.] To simplify notation, denote

$$ \mathcal{A}_\delta = \prod_{(i,u) \in R} A_\delta(i,u). $$

Recall the definition of the set $B_\delta$, which is the subset of vectors in the $\delta$-neighborhood of the expected one-stage cost vector $g$ of the original problem: with $c = \{c(i,u) \mid (i,u) \in R_o\}$,

$$ B_\delta = \Big\{ c \ \Big|\ c(0,0) = 0 \ \text{and}\ \big| c(i,u) - g(i,u) \big| \le \delta,\ (i,u) \in R \Big\}. $$

Note that $B_\delta = \prod_{i \in S_o} B^i_\delta$, where $B^0_\delta = \{0\}$ and $B^i_\delta$, $i \in S$, are as defined earlier [for the control sets $\tilde U(i)$ of the extended SSP problem].

For each $\xi \in \mathcal{A}_\delta$ and $c \in B_\delta$, let us call an MDP a perturbed SSP problem with parameters $(\xi, c)$, if it is the same as the original SSP problem except that the transition probabilities and one-stage costs for $(i,u) \in R$ are given by the respective components of $\xi$ and $c$.

Consider now a deterministic and stationary policy $\gamma$ of the extended SSP problem, which applies at each state $i$ some feasible control $\gamma(i) = \big(\mu(i), p^i, \kappa_i\big) \in \tilde U(i)$. The regular controls $\mu(i)$ that $\gamma$ applies at states $i$ correspond to a deterministic stationary policy of the original SSP problem, which we denote by $\mu$. Then, by Equation (3.39), $\gamma$ is proper (or improper) in the extended SSP problem if and only if $\mu$ is proper (or improper) in the original SSP problem. This is because by Equation (3.39), the topology of the transition graph of the Markov chain on $S_o$ that $\gamma$ induces in the extended SSP problem is the same as that of the Markov chain induced by $\mu$ in the original SSP problem, regardless of the two other control components $(p^i, \kappa_i)$ of $\gamma$. Therefore, for Assumption 1.1(ii) to hold in the extended SSP problem, it is sufficient that any improper policy $\mu$ in SD of the original problem has infinite cost for at least one initial state, in all perturbed SSP problems with parameters $\xi \in \mathcal{A}_\delta$ and $c \in B_\delta$ [cf. the relation between $\mathcal{A}_\delta$, $B_\delta$ and the control sets $\tilde U(i)$]. The next lemma shows that the latter is true for sufficiently small $\delta$, thus providing the result we want.

Lemma 3.5. Suppose the original SSP problem satisfies Assumption 1.1(ii).
Then there exists $\delta_1 \in (0, \delta_0]$, where $\delta_0$ is as given in Equation (3.39), such that for all $\delta \le \delta_1$, the following holds: for any improper policy $\mu \in$ SD of the original problem, there exists a state $i$ (depending on $\mu$) with

$$ \liminf_{k \to \infty}\ \inf_{\xi \in \mathcal{A}_\delta,\ c \in B_\delta} J^\mu_{k,(\xi,c)}(i) = +\infty, $$

where $J^\mu_{k,(\xi,c)}$ is the $k$-stage cost function of $\mu$ in the perturbed SSP problem with parameters $(\xi, c)$.

For the proof, we will use a relation between the long-run average cost of a stationary policy and the total cost of that policy, and we will also use a continuity property of the average cost with respect to perturbations of transition probabilities and one-stage costs. The next two lemmas state two facts that will be used in our proof.
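As a concrete illustration of the $\delta$-perturbation sets that parameterize the perturbed SSP problems above, the following minimal Python sketch checks membership of a perturbed transition vector in $A_\delta(i,u)$ (componentwise $\delta$-closeness, still a probability distribution, and absolute continuity with respect to the original) and of a perturbed cost in the corresponding $\delta$-neighborhood at one state-control pair. The data and function names are illustrative only.

import numpy as np

def in_A_delta(d, p, delta, tol=1e-12):
    # Membership test for A_delta(i, u) with base distribution p = p_o^{iu}.
    is_prob = bool(np.all(d >= -tol)) and abs(d.sum() - 1.0) <= tol
    close = bool(np.all(np.abs(d - p) <= delta + tol))
    abs_cont = bool(np.all(d[p == 0.0] <= tol))   # supp(d) contained in supp(p)
    return is_prob and close and abs_cont

p_iu = np.array([0.7, 0.0, 0.3])                  # original transition probabilities p_ij(u)
delta = 0.05

d = np.array([0.72, 0.0, 0.28])                   # mass moved only within supp(p_iu)
print(in_A_delta(d, p_iu, delta))                 # True

g_iu, c_iu = 1.0, 1.04                            # original and perturbed one-stage cost
print(abs(c_iu - g_iu) <= delta)                  # True: within the delta-neighborhood at (i, u)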


An Introduction to Malliavin calculus and its applications

An Introduction to Malliavin calculus and its applications An Inroducion o Malliavin calculus and is applicaions Lecure 5: Smoohness of he densiy and Hörmander s heorem David Nualar Deparmen of Mahemaics Kansas Universiy Universiy of Wyoming Summer School 214

More information

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic

More information

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t...

t is a basis for the solution space to this system, then the matrix having these solutions as columns, t x 1 t, x 2 t,... x n t x 2 t... Mah 228- Fri Mar 24 5.6 Marix exponenials and linear sysems: The analogy beween firs order sysems of linear differenial equaions (Chaper 5) and scalar linear differenial equaions (Chaper ) is much sronger

More information

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence

Supplement for Stochastic Convex Optimization: Faster Local Growth Implies Faster Global Convergence Supplemen for Sochasic Convex Opimizaion: Faser Local Growh Implies Faser Global Convergence Yi Xu Qihang Lin ianbao Yang Proof of heorem heorem Suppose Assumpion holds and F (w) obeys he LGC (6) Given

More information

Martingales Stopping Time Processes

Martingales Stopping Time Processes IOSR Journal of Mahemaics (IOSR-JM) e-issn: 2278-5728, p-issn: 2319-765. Volume 11, Issue 1 Ver. II (Jan - Feb. 2015), PP 59-64 www.iosrjournals.org Maringales Sopping Time Processes I. Fulaan Deparmen

More information

Convergence of the Neumann series in higher norms

Convergence of the Neumann series in higher norms Convergence of he Neumann series in higher norms Charles L. Epsein Deparmen of Mahemaics, Universiy of Pennsylvania Version 1.0 Augus 1, 003 Absrac Naural condiions on an operaor A are given so ha he Neumann

More information

Online Appendix to Solution Methods for Models with Rare Disasters

Online Appendix to Solution Methods for Models with Rare Disasters Online Appendix o Soluion Mehods for Models wih Rare Disasers Jesús Fernández-Villaverde and Oren Levinal In his Online Appendix, we presen he Euler condiions of he model, we develop he pricing Calvo block,

More information

arxiv: v1 [math.pr] 19 Feb 2011

arxiv: v1 [math.pr] 19 Feb 2011 A NOTE ON FELLER SEMIGROUPS AND RESOLVENTS VADIM KOSTRYKIN, JÜRGEN POTTHOFF, AND ROBERT SCHRADER ABSTRACT. Various equivalen condiions for a semigroup or a resolven generaed by a Markov process o be of

More information

The Asymptotic Behavior of Nonoscillatory Solutions of Some Nonlinear Dynamic Equations on Time Scales

The Asymptotic Behavior of Nonoscillatory Solutions of Some Nonlinear Dynamic Equations on Time Scales Advances in Dynamical Sysems and Applicaions. ISSN 0973-5321 Volume 1 Number 1 (2006, pp. 103 112 c Research India Publicaions hp://www.ripublicaion.com/adsa.hm The Asympoic Behavior of Nonoscillaory Soluions

More information

The Strong Law of Large Numbers

The Strong Law of Large Numbers Lecure 9 The Srong Law of Large Numbers Reading: Grimme-Sirzaker 7.2; David Williams Probabiliy wih Maringales 7.2 Furher reading: Grimme-Sirzaker 7.1, 7.3-7.5 Wih he Convergence Theorem (Theorem 54) and

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

Ann. Funct. Anal. 2 (2011), no. 2, A nnals of F unctional A nalysis ISSN: (electronic) URL:

Ann. Funct. Anal. 2 (2011), no. 2, A nnals of F unctional A nalysis ISSN: (electronic) URL: Ann. Func. Anal. 2 2011, no. 2, 34 41 A nnals of F uncional A nalysis ISSN: 2008-8752 elecronic URL: www.emis.de/journals/afa/ CLASSIFICAION OF POSIIVE SOLUIONS OF NONLINEAR SYSEMS OF VOLERRA INEGRAL EQUAIONS

More information

Stationary Distribution. Design and Analysis of Algorithms Andrei Bulatov

Stationary Distribution. Design and Analysis of Algorithms Andrei Bulatov Saionary Disribuion Design and Analysis of Algorihms Andrei Bulaov Algorihms Markov Chains 34-2 Classificaion of Saes k By P we denoe he (i,j)-enry of i, j Sae is accessible from sae if 0 for some k 0

More information

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t

Hamilton- J acobi Equation: Explicit Formulas In this lecture we try to apply the method of characteristics to the Hamilton-Jacobi equation: u t M ah 5 2 7 Fall 2 0 0 9 L ecure 1 0 O c. 7, 2 0 0 9 Hamilon- J acobi Equaion: Explici Formulas In his lecure we ry o apply he mehod of characerisics o he Hamilon-Jacobi equaion: u + H D u, x = 0 in R n

More information

2. Nonlinear Conservation Law Equations

2. Nonlinear Conservation Law Equations . Nonlinear Conservaion Law Equaions One of he clear lessons learned over recen years in sudying nonlinear parial differenial equaions is ha i is generally no wise o ry o aack a general class of nonlinear

More information

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients

Section 3.5 Nonhomogeneous Equations; Method of Undetermined Coefficients Secion 3.5 Nonhomogeneous Equaions; Mehod of Undeermined Coefficiens Key Terms/Ideas: Linear Differenial operaor Nonlinear operaor Second order homogeneous DE Second order nonhomogeneous DE Soluion o homogeneous

More information

Chapter 2. First Order Scalar Equations

Chapter 2. First Order Scalar Equations Chaper. Firs Order Scalar Equaions We sar our sudy of differenial equaions in he same way he pioneers in his field did. We show paricular echniques o solve paricular ypes of firs order differenial equaions.

More information

Mixing times and hitting times: lecture notes

Mixing times and hitting times: lecture notes Miing imes and hiing imes: lecure noes Yuval Peres Perla Sousi 1 Inroducion Miing imes and hiing imes are among he mos fundamenal noions associaed wih a finie Markov chain. A variey of ools have been developed

More information

An introduction to the theory of SDDP algorithm

An introduction to the theory of SDDP algorithm An inroducion o he heory of SDDP algorihm V. Leclère (ENPC) Augus 1, 2014 V. Leclère Inroducion o SDDP Augus 1, 2014 1 / 21 Inroducion Large scale sochasic problem are hard o solve. Two ways of aacking

More information

Optimality Conditions for Unconstrained Problems

Optimality Conditions for Unconstrained Problems 62 CHAPTER 6 Opimaliy Condiions for Unconsrained Problems 1 Unconsrained Opimizaion 11 Exisence Consider he problem of minimizing he funcion f : R n R where f is coninuous on all of R n : P min f(x) x

More information

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle

Physics 235 Chapter 2. Chapter 2 Newtonian Mechanics Single Particle Chaper 2 Newonian Mechanics Single Paricle In his Chaper we will review wha Newon s laws of mechanics ell us abou he moion of a single paricle. Newon s laws are only valid in suiable reference frames,

More information

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities:

Math 2142 Exam 1 Review Problems. x 2 + f (0) 3! for the 3rd Taylor polynomial at x = 0. To calculate the various quantities: Mah 4 Eam Review Problems Problem. Calculae he 3rd Taylor polynomial for arcsin a =. Soluion. Le f() = arcsin. For his problem, we use he formula f() + f () + f ()! + f () 3! for he 3rd Taylor polynomial

More information

A Note on Superlinear Ambrosetti-Prodi Type Problem in a Ball

A Note on Superlinear Ambrosetti-Prodi Type Problem in a Ball A Noe on Superlinear Ambrosei-Prodi Type Problem in a Ball by P. N. Srikanh 1, Sanjiban Sanra 2 Absrac Using a careful analysis of he Morse Indices of he soluions obained by using he Mounain Pass Theorem

More information

Existence of positive solution for a third-order three-point BVP with sign-changing Green s function

Existence of positive solution for a third-order three-point BVP with sign-changing Green s function Elecronic Journal of Qualiaive Theory of Differenial Equaions 13, No. 3, 1-11; hp://www.mah.u-szeged.hu/ejqde/ Exisence of posiive soluion for a hird-order hree-poin BVP wih sign-changing Green s funcion

More information

Lecture 20: Riccati Equations and Least Squares Feedback Control

Lecture 20: Riccati Equations and Least Squares Feedback Control 34-5 LINEAR SYSTEMS Lecure : Riccai Equaions and Leas Squares Feedback Conrol 5.6.4 Sae Feedback via Riccai Equaions A recursive approach in generaing he marix-valued funcion W ( ) equaion for i for he

More information

5. Stochastic processes (1)

5. Stochastic processes (1) Lec05.pp S-38.45 - Inroducion o Teleraffic Theory Spring 2005 Conens Basic conceps Poisson process 2 Sochasic processes () Consider some quaniy in a eleraffic (or any) sysem I ypically evolves in ime randomly

More information

Class Meeting # 10: Introduction to the Wave Equation

Class Meeting # 10: Introduction to the Wave Equation MATH 8.5 COURSE NOTES - CLASS MEETING # 0 8.5 Inroducion o PDEs, Fall 0 Professor: Jared Speck Class Meeing # 0: Inroducion o he Wave Equaion. Wha is he wave equaion? The sandard wave equaion for a funcion

More information

arxiv: v1 [math.fa] 9 Dec 2018

arxiv: v1 [math.fa] 9 Dec 2018 AN INVERSE FUNCTION THEOREM CONVERSE arxiv:1812.03561v1 [mah.fa] 9 Dec 2018 JIMMIE LAWSON Absrac. We esablish he following converse of he well-known inverse funcion heorem. Le g : U V and f : V U be inverse

More information

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature

On Measuring Pro-Poor Growth. 1. On Various Ways of Measuring Pro-Poor Growth: A Short Review of the Literature On Measuring Pro-Poor Growh 1. On Various Ways of Measuring Pro-Poor Growh: A Shor eview of he Lieraure During he pas en years or so here have been various suggesions concerning he way one should check

More information

Lecture 9: September 25

Lecture 9: September 25 0-725: Opimizaion Fall 202 Lecure 9: Sepember 25 Lecurer: Geoff Gordon/Ryan Tibshirani Scribes: Xuezhi Wang, Subhodeep Moira, Abhimanu Kumar Noe: LaTeX emplae couresy of UC Berkeley EECS dep. Disclaimer:

More information

Existence Theory of Second Order Random Differential Equations

Existence Theory of Second Order Random Differential Equations Global Journal of Mahemaical Sciences: Theory and Pracical. ISSN 974-32 Volume 4, Number 3 (22), pp. 33-3 Inernaional Research Publicaion House hp://www.irphouse.com Exisence Theory of Second Order Random

More information

6. Stochastic calculus with jump processes

6. Stochastic calculus with jump processes A) Trading sraegies (1/3) Marke wih d asses S = (S 1,, S d ) A rading sraegy can be modelled wih a vecor φ describing he quaniies invesed in each asse a each insan : φ = (φ 1,, φ d ) The value a of a porfolio

More information

Lecture 33: November 29

Lecture 33: November 29 36-705: Inermediae Saisics Fall 2017 Lecurer: Siva Balakrishnan Lecure 33: November 29 Today we will coninue discussing he boosrap, and hen ry o undersand why i works in a simple case. In he las lecure

More information

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018

MATH 5720: Gradient Methods Hung Phan, UMass Lowell October 4, 2018 MATH 5720: Gradien Mehods Hung Phan, UMass Lowell Ocober 4, 208 Descen Direcion Mehods Consider he problem min { f(x) x R n}. The general descen direcions mehod is x k+ = x k + k d k where x k is he curren

More information

An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem

An Optimal Approximate Dynamic Programming Algorithm for the Lagged Asset Acquisition Problem An Opimal Approximae Dynamic Programming Algorihm for he Lagged Asse Acquisiion Problem Juliana M. Nascimeno Warren B. Powell Deparmen of Operaions Research and Financial Engineering Princeon Universiy

More information

A New Perturbative Approach in Nonlinear Singularity Analysis

A New Perturbative Approach in Nonlinear Singularity Analysis Journal of Mahemaics and Saisics 7 (: 49-54, ISSN 549-644 Science Publicaions A New Perurbaive Approach in Nonlinear Singulariy Analysis Ta-Leung Yee Deparmen of Mahemaics and Informaion Technology, The

More information

Lecture 2 October ε-approximation of 2-player zero-sum games

Lecture 2 October ε-approximation of 2-player zero-sum games Opimizaion II Winer 009/10 Lecurer: Khaled Elbassioni Lecure Ocober 19 1 ε-approximaion of -player zero-sum games In his lecure we give a randomized ficiious play algorihm for obaining an approximae soluion

More information

Stable approximations of optimal filters

Stable approximations of optimal filters Sable approximaions of opimal filers Joaquin Miguez Deparmen of Signal Theory & Communicaions, Universidad Carlos III de Madrid. E-mail: joaquin.miguez@uc3m.es Join work wih Dan Crisan (Imperial College

More information

POSITIVE SOLUTIONS OF NEUTRAL DELAY DIFFERENTIAL EQUATION

POSITIVE SOLUTIONS OF NEUTRAL DELAY DIFFERENTIAL EQUATION Novi Sad J. Mah. Vol. 32, No. 2, 2002, 95-108 95 POSITIVE SOLUTIONS OF NEUTRAL DELAY DIFFERENTIAL EQUATION Hajnalka Péics 1, János Karsai 2 Absrac. We consider he scalar nonauonomous neural delay differenial

More information

Supplementary Material

Supplementary Material Dynamic Global Games of Regime Change: Learning, Mulipliciy and iming of Aacks Supplemenary Maerial George-Marios Angeleos MI and NBER Chrisian Hellwig UCLA Alessandro Pavan Norhwesern Universiy Ocober

More information

Lecture 4 Notes (Little s Theorem)

Lecture 4 Notes (Little s Theorem) Lecure 4 Noes (Lile s Theorem) This lecure concerns one of he mos imporan (and simples) heorems in Queuing Theory, Lile s Theorem. More informaion can be found in he course book, Bersekas & Gallagher,

More information

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still.

Lecture 2-1 Kinematics in One Dimension Displacement, Velocity and Acceleration Everything in the world is moving. Nothing stays still. Lecure - Kinemaics in One Dimension Displacemen, Velociy and Acceleraion Everyhing in he world is moving. Nohing says sill. Moion occurs a all scales of he universe, saring from he moion of elecrons in

More information

BOUNDEDNESS OF MAXIMAL FUNCTIONS ON NON-DOUBLING MANIFOLDS WITH ENDS

BOUNDEDNESS OF MAXIMAL FUNCTIONS ON NON-DOUBLING MANIFOLDS WITH ENDS BOUNDEDNESS OF MAXIMAL FUNCTIONS ON NON-DOUBLING MANIFOLDS WITH ENDS XUAN THINH DUONG, JI LI, AND ADAM SIKORA Absrac Le M be a manifold wih ends consruced in [2] and be he Laplace-Belrami operaor on M

More information

Optimal Server Assignment in Multi-Server

Optimal Server Assignment in Multi-Server Opimal Server Assignmen in Muli-Server 1 Queueing Sysems wih Random Conneciviies Hassan Halabian, Suden Member, IEEE, Ioannis Lambadaris, Member, IEEE, arxiv:1112.1178v2 [mah.oc] 21 Jun 2013 Yannis Viniois,

More information

arxiv:math/ v1 [math.nt] 3 Nov 2005

arxiv:math/ v1 [math.nt] 3 Nov 2005 arxiv:mah/0511092v1 [mah.nt] 3 Nov 2005 A NOTE ON S AND THE ZEROS OF THE RIEMANN ZETA-FUNCTION D. A. GOLDSTON AND S. M. GONEK Absrac. Le πs denoe he argumen of he Riemann zea-funcion a he poin 1 + i. Assuming

More information

Clarke s Generalized Gradient and Edalat s L-derivative

Clarke s Generalized Gradient and Edalat s L-derivative 1 21 ISSN 1759-9008 1 Clarke s Generalized Gradien and Edala s L-derivaive PETER HERTLING Absrac: Clarke [2, 3, 4] inroduced a generalized gradien for real-valued Lipschiz coninuous funcions on Banach

More information

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17

Designing Information Devices and Systems I Spring 2019 Lecture Notes Note 17 EES 16A Designing Informaion Devices and Sysems I Spring 019 Lecure Noes Noe 17 17.1 apaciive ouchscreen In he las noe, we saw ha a capacior consiss of wo pieces on conducive maerial separaed by a nonconducive

More information

Families with no matchings of size s

Families with no matchings of size s Families wih no machings of size s Peer Franl Andrey Kupavsii Absrac Le 2, s 2 be posiive inegers. Le be an n-elemen se, n s. Subses of 2 are called families. If F ( ), hen i is called - uniform. Wha is

More information

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS

LECTURE 1: GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS LECTURE : GENERALIZED RAY KNIGHT THEOREM FOR FINITE MARKOV CHAINS We will work wih a coninuous ime reversible Markov chain X on a finie conneced sae space, wih generaor Lf(x = y q x,yf(y. (Recall ha q

More information

Let us start with a two dimensional case. We consider a vector ( x,

Let us start with a two dimensional case. We consider a vector ( x, Roaion marices We consider now roaion marices in wo and hree dimensions. We sar wih wo dimensions since wo dimensions are easier han hree o undersand, and one dimension is a lile oo simple. However, our

More information

Optimal approximate dynamic programming algorithms for a general class of storage problems

Optimal approximate dynamic programming algorithms for a general class of storage problems Opimal approximae dynamic programming algorihms for a general class of sorage problems Juliana M. Nascimeno Warren B. Powell Deparmen of Operaions Research and Financial Engineering Princeon Universiy

More information

Boundedness and Exponential Asymptotic Stability in Dynamical Systems with Applications to Nonlinear Differential Equations with Unbounded Terms

Boundedness and Exponential Asymptotic Stability in Dynamical Systems with Applications to Nonlinear Differential Equations with Unbounded Terms Advances in Dynamical Sysems and Applicaions. ISSN 0973-531 Volume Number 1 007, pp. 107 11 Research India Publicaions hp://www.ripublicaion.com/adsa.hm Boundedness and Exponenial Asympoic Sabiliy in Dynamical

More information

On Oscillation of a Generalized Logistic Equation with Several Delays

On Oscillation of a Generalized Logistic Equation with Several Delays Journal of Mahemaical Analysis and Applicaions 253, 389 45 (21) doi:1.16/jmaa.2.714, available online a hp://www.idealibrary.com on On Oscillaion of a Generalized Logisic Equaion wih Several Delays Leonid

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN Inernaional Journal of Scienific & Engineering Research, Volume 4, Issue 10, Ocober-2013 900 FUZZY MEAN RESIDUAL LIFE ORDERING OF FUZZY RANDOM VARIABLES J. EARNEST LAZARUS PIRIYAKUMAR 1, A. YAMUNA 2 1.

More information

Chapter 6. Systems of First Order Linear Differential Equations

Chapter 6. Systems of First Order Linear Differential Equations Chaper 6 Sysems of Firs Order Linear Differenial Equaions We will only discuss firs order sysems However higher order sysems may be made ino firs order sysems by a rick shown below We will have a sligh

More information

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1

SZG Macro 2011 Lecture 3: Dynamic Programming. SZG macro 2011 lecture 3 1 SZG Macro 2011 Lecure 3: Dynamic Programming SZG macro 2011 lecure 3 1 Background Our previous discussion of opimal consumpion over ime and of opimal capial accumulaion sugges sudying he general decision

More information

Final Spring 2007

Final Spring 2007 .615 Final Spring 7 Overview The purpose of he final exam is o calculae he MHD β limi in a high-bea oroidal okamak agains he dangerous n = 1 exernal ballooning-kink mode. Effecively, his corresponds o

More information

An Excursion into Set Theory using a Constructivist Approach

An Excursion into Set Theory using a Constructivist Approach An Excursion ino Se Theory using a Consrucivis Approach Miderm Repor Nihil Pail under supervision of Ksenija Simic Fall 2005 Absrac Consrucive logic is an alernaive o he heory of classical logic ha draws

More information

Approximating positive solutions of nonlinear first order ordinary quadratic differential equations

Approximating positive solutions of nonlinear first order ordinary quadratic differential equations Dhage & Dhage, Cogen Mahemaics (25, 2: 2367 hp://dx.doi.org/.8/233835.25.2367 APPLIED & INTERDISCIPLINARY MATHEMATICS RESEARCH ARTICLE Approximaing posiive soluions of nonlinear firs order ordinary quadraic

More information

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems.

di Bernardo, M. (1995). A purely adaptive controller to synchronize and control chaotic systems. di ernardo, M. (995). A purely adapive conroller o synchronize and conrol chaoic sysems. hps://doi.org/.6/375-96(96)8-x Early version, also known as pre-prin Link o published version (if available):.6/375-96(96)8-x

More information

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs

A Primal-Dual Type Algorithm with the O(1/t) Convergence Rate for Large Scale Constrained Convex Programs PROC. IEEE CONFERENCE ON DECISION AND CONTROL, 06 A Primal-Dual Type Algorihm wih he O(/) Convergence Rae for Large Scale Consrained Convex Programs Hao Yu and Michael J. Neely Absrac This paper considers

More information

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990),

SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F. Trench. SIAM J. Matrix Anal. Appl. 11 (1990), SPECTRAL EVOLUTION OF A ONE PARAMETER EXTENSION OF A REAL SYMMETRIC TOEPLITZ MATRIX* William F Trench SIAM J Marix Anal Appl 11 (1990), 601-611 Absrac Le T n = ( i j ) n i,j=1 (n 3) be a real symmeric

More information

Math 10B: Mock Mid II. April 13, 2016

Math 10B: Mock Mid II. April 13, 2016 Name: Soluions Mah 10B: Mock Mid II April 13, 016 1. ( poins) Sae, wih jusificaion, wheher he following saemens are rue or false. (a) If a 3 3 marix A saisfies A 3 A = 0, hen i canno be inverible. True.

More information

arxiv: v1 [math.pr] 28 Nov 2016

arxiv: v1 [math.pr] 28 Nov 2016 Backward Sochasic Differenial Equaions wih Nonmarkovian Singular Terminal Values Ali Devin Sezer, Thomas Kruse, Alexandre Popier Ocober 15, 2018 arxiv:1611.09022v1 mah.pr 28 Nov 2016 Absrac We solve a

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Noes for EE7C Spring 018: Convex Opimizaion and Approximaion Insrucor: Moriz Hard Email: hard+ee7c@berkeley.edu Graduae Insrucor: Max Simchowiz Email: msimchow+ee7c@berkeley.edu Ocober 15, 018 3

More information

Notes on Kalman Filtering

Notes on Kalman Filtering Noes on Kalman Filering Brian Borchers and Rick Aser November 7, Inroducion Daa Assimilaion is he problem of merging model predicions wih acual measuremens of a sysem o produce an opimal esimae of he curren

More information

Quasi-sure Stochastic Analysis through Aggregation

Quasi-sure Stochastic Analysis through Aggregation E l e c r o n i c J o u r n a l o f P r o b a b i l i y Vol. 16 (211), Paper no. 67, pages 1844 1879. Journal URL hp://www.mah.washingon.edu/~ejpecp/ Quasi-sure Sochasic Analysis hrough Aggregaion H. Mee

More information

4. Advanced Stability Theory

4. Advanced Stability Theory Applied Nonlinear Conrol Nguyen an ien - 4 4 Advanced Sabiliy heory he objecive of his chaper is o presen sabiliy analysis for non-auonomous sysems 41 Conceps of Sabiliy for Non-Auonomous Sysems Equilibrium

More information

Robust estimation based on the first- and third-moment restrictions of the power transformation model

Robust estimation based on the first- and third-moment restrictions of the power transformation model h Inernaional Congress on Modelling and Simulaion, Adelaide, Ausralia, 6 December 3 www.mssanz.org.au/modsim3 Robus esimaion based on he firs- and hird-momen resricions of he power ransformaion Nawaa,

More information

INDEPENDENT SETS IN GRAPHS WITH GIVEN MINIMUM DEGREE

INDEPENDENT SETS IN GRAPHS WITH GIVEN MINIMUM DEGREE INDEPENDENT SETS IN GRAPHS WITH GIVEN MINIMUM DEGREE JAMES ALEXANDER, JONATHAN CUTLER, AND TIM MINK Absrac The enumeraion of independen ses in graphs wih various resricions has been a opic of much ineres

More information

Oscillation of an Euler Cauchy Dynamic Equation S. Huff, G. Olumolode, N. Pennington, and A. Peterson

Oscillation of an Euler Cauchy Dynamic Equation S. Huff, G. Olumolode, N. Pennington, and A. Peterson PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DYNAMICAL SYSTEMS AND DIFFERENTIAL EQUATIONS May 4 7, 00, Wilmingon, NC, USA pp 0 Oscillaion of an Euler Cauchy Dynamic Equaion S Huff, G Olumolode,

More information

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing

Application of a Stochastic-Fuzzy Approach to Modeling Optimal Discrete Time Dynamical Systems by Using Large Scale Data Processing Applicaion of a Sochasic-Fuzzy Approach o Modeling Opimal Discree Time Dynamical Sysems by Using Large Scale Daa Processing AA WALASZE-BABISZEWSA Deparmen of Compuer Engineering Opole Universiy of Technology

More information

The Arcsine Distribution

The Arcsine Distribution The Arcsine Disribuion Chris H. Rycrof Ocober 6, 006 A common heme of he class has been ha he saisics of single walker are ofen very differen from hose of an ensemble of walkers. On he firs homework, we

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :45 PM 8/8/04 Copyrigh 04 Richard T. Woodward. An inroducion o dynamic opimizaion -- Opimal Conrol and Dynamic Programming AGEC 637-04 I. Overview of opimizaion Opimizaion is

More information

Almost Sure Degrees of Truth and Finite Model Theory of Łukasiewicz Fuzzy Logic

Almost Sure Degrees of Truth and Finite Model Theory of Łukasiewicz Fuzzy Logic Almos Sure Degrees of Truh and Finie odel Theory of Łukasiewicz Fuzzy Logic Rober Kosik Insiue of Informaion Business, Vienna Universiy of Economics and Business Adminisraion, Wirschafsuniversiä Wien,

More information

Longest Common Prefixes

Longest Common Prefixes Longes Common Prefixes The sandard ordering for srings is he lexicographical order. I is induced by an order over he alphabe. We will use he same symbols (,

More information

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED

0.1 MAXIMUM LIKELIHOOD ESTIMATION EXPLAINED 0.1 MAXIMUM LIKELIHOOD ESTIMATIO EXPLAIED Maximum likelihood esimaion is a bes-fi saisical mehod for he esimaion of he values of he parameers of a sysem, based on a se of observaions of a random variable

More information

Approximation Algorithms for Unique Games via Orthogonal Separators

Approximation Algorithms for Unique Games via Orthogonal Separators Approximaion Algorihms for Unique Games via Orhogonal Separaors Lecure noes by Konsanin Makarychev. Lecure noes are based on he papers [CMM06a, CMM06b, LM4]. Unique Games In hese lecure noes, we define

More information