arxiv: v1 [math.oc] 11 Sep 2017


Online Learning in Weakly Coupled Markov Decision Processes: A Convergence Time Study

Xiaohan Wei, Hao Yu and Michael J. Neely
Department of Electrical Engineering, University of Southern California

Abstract: We consider multiple parallel Markov decision processes (MDPs) coupled by global constraints, where the time varying objective and constraint functions can only be observed after the decision is made. Special attention is given to how well the decision maker can perform in $T$ slots, starting from any state, compared to the best feasible randomized stationary policy in hindsight. We develop a new distributed online algorithm where each MDP makes its own decision each slot after observing a multiplier computed from past information. While the scenario is significantly more challenging than the classical online learning context, the algorithm is shown to have a tight $O(\sqrt{T})$ regret and constraint violations simultaneously. To obtain such a bound, we combine several new ingredients including ergodicity and mixing time bound in weakly coupled MDPs, a new regret analysis for online constrained optimization, a drift analysis for queue processes, and a perturbation analysis based on Farkas' Lemma.

Keywords and phrases: Stochastic programming, Constrained programming, Markov decision processes.

1. Introduction

This paper considers online constrained Markov decision processes (OCMDP) where both the objective and constraint functions can vary each time slot after the decision is made. We assume a slotted time scenario with time slots $t \in \{0, 1, 2, \ldots\}$. The OCMDP consists of $K$ parallel Markov decision processes with indices $k \in \{1, 2, \ldots, K\}$. The $k$-th MDP has state space $S^{(k)}$, action space $A^{(k)}$, and transition probability matrix $P_a^{(k)}$ which depends on the chosen action $a \in A^{(k)}$. Specifically, $P_a^{(k)} = \big(P_a^{(k)}(s, s')\big)$ where

$P_a^{(k)}(s, s') = \Pr\big(s_{t+1}^{(k)} = s' \mid s_t^{(k)} = s,\ a_t^{(k)} = a\big)$,

where $s_t^{(k)}$ and $a_t^{(k)}$ are the state and action for system $k$ on slot $t$. We assume that both the state space and the action space are finite for all $k \in \{1, 2, \ldots, K\}$.

After each MDP $k \in \{1, \ldots, K\}$ makes the decision at time $t$ (and assuming the current state is $s_t^{(k)} = s$ and the action is $a_t^{(k)} = a$), the following information is revealed:
1. The next state $s_{t+1}^{(k)}$.
2. A penalty function $f_t^{(k)}(s, a)$ that depends on the current state $s$ and the current action $a$.
3. A collection of $m$ constraint functions $g_{1,t}^{(k)}(s, a), \ldots, g_{m,t}^{(k)}(s, a)$ that depend on $s$ and $a$.

The functions $f_t^{(k)}$ and $g_{i,t}^{(k)}$ are all bounded mappings from $S^{(k)} \times A^{(k)}$ to $\mathbb{R}$ and represent different types of costs incurred by system $k$ on slot $t$ (depending on the current state and action). For example, in a multi-server data center, the different systems $k \in \{1, \ldots, K\}$ can represent different servers, the cost function for a particular server $k$ might represent energy or monetary expenditure for that server, and the constraint costs for server $k$ can represent

negative rewards such as service rates or qualities. Coupling between the server systems comes from using all of them to collectively support a common stream of arriving jobs.

A key aspect of this general problem is that the functions $f_t^{(k)}$ and $g_{i,t}^{(k)}$ are unknown until after the slot $t$ decision is made. Thus, the precise costs incurred by each system are only known at the end of the slot. For a fixed time horizon of $T$ slots, the overall penalty and constraint accumulation resulting from a policy $P$ is:

$F_T(d_0, P) := \mathbb{E}\Big[\sum_{t=1}^{T} \sum_{k=1}^{K} f_t^{(k)}\big(a_t^{(k)}, s_t^{(k)}\big) \,\Big|\, d_0, P\Big]$,

and

$G_{i,T}(d_0, P) := \mathbb{E}\Big[\sum_{t=1}^{T} \sum_{k=1}^{K} g_{i,t}^{(k)}\big(a_t^{(k)}, s_t^{(k)}\big) \,\Big|\, d_0, P\Big]$,

where $d_0$ represents a given distribution on the initial joint state vector $\big(s_0^{(1)}, \ldots, s_0^{(K)}\big)$. Note that $\big(a_t^{(k)}, s_t^{(k)}\big)$ denotes the state-action pair of the $k$th MDP, which is a pair of random variables determined by $d_0$ and $P$. Define a constraint set

$\mathcal{G} := \{(P, d_0) : G_{i,T}(d_0, P) \le 0,\ i = 1, 2, \ldots, m\}$.

Define the regret of a policy $P$ with respect to a particular joint randomized stationary policy $\Pi$ along with an arbitrary starting state distribution $d_0$ as:

$F_T(d_0, P) - F_T(d_0, \Pi)$.

The goal of OCMDP is to choose a policy $P$ so that both the regret and constraint violations grow sublinearly with respect to $T$, where regret is measured against all feasible joint randomized stationary policies $\Pi$.

1.1. A motivating example

As an example, consider a data center with a central controller and $K$ servers (see Fig. 1). Jobs arrive randomly and are stored in a queue to await service. The system operates in slotted time $t \in \{0, 1, 2, \ldots\}$ and each server $k \in \{1, \ldots, K\}$ is modeled as a 3-state MDP with states active, idle, and setup:

• Active: In this state the server is available to serve jobs. Server $k$ incurs a time varying electricity cost on every active slot, regardless of whether or not there are jobs to serve. It has a control option to stay active or transition to the idle state.
• Idle: In this state no jobs can be served. This state has multiple sleep modes as control options, each with different per-slot costs and setup times required for transitioning from idle to active.
• Setup: This is a transition state between idle and active. No jobs can be served and there are no control options. The setup costs and durations are (possibly constant) random variables depending on the preceding chosen sleep mode.

The goal is to minimize the overall electricity cost subject to stabilizing the job queue. In a typical data center scenario, the performance of each server on a given slot is governed by the current electricity price and the service rate under each decision, both of which can be time varying and unknown to the server beforehand. This problem is challenging because:

• If one server is currently in a setup state, it has zero service rate and cannot make another decision until it reaches the active state (which typically takes more than one slot), whereas other active servers can make decisions during this time. Thus, servers are acting asynchronously.
• The electricity price exhibits variation across time, location, and utility providers. Its behavior is irregular and can be difficult to predict. As an example, Fig. 2 plots the average per 5 minute spot market price (between 05/0/07 and 05/0/07) at New York zone CENTRL [1]. Servers in different locations can have different price offerings, and this piles up the uncertainty across the whole system.

Despite these difficulties, this problem fits into the formulation of this paper: the electricity price acts as the global penalty function, and stability of the queue can be treated as a global constraint that the expected total number of arrivals is less than the expected service rate.

Fig. 1. Illustration of a data center server scheduling model.

Fig. 2. A typical trace of electricity market price (price in dollars/MWh versus number of slots, each slot 5 min).

A review on data server provision can be found in [2] and references therein. Prior data center analysis often assumes the system has up-to-date information on service rates and electricity costs (see, for example, [3], [4]). On the other hand, work that treats outdated information (such as [5], [6]) generally does not consider the potential Markov structure of the problem. The current paper treats the Markov structure of the problem and allows rate and price information to be unknown and outdated.

1.2. Related work

• Online convex optimization (OCO): This concerns multi-round cost minimization with arbitrarily-varying convex loss functions. Specifically, on each slot $t$ the decision maker chooses decisions $x_t$ within a convex set $X$ (before observing the loss function $f_t(x)$) in order to minimize the total regret compared to the best fixed decision in hindsight, expressed as:

$\text{regret} = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in X} \sum_{t=1}^{T} f_t(x)$.

See [7] for an introduction to OCO. Zinkevich introduced OCO in [8] and shows that an online projection gradient descent (OGD) algorithm achieves $O(\sqrt{T})$ regret. This $O(\sqrt{T})$ regret is proven to be the best in [9], although improved performance is possible if all convex loss functions are strongly convex. The OGD decision requires computing a projection of a vector onto a set $X$. For complicated sets $X$ with functional inequality constraints, e.g., $X = \{x \in X_0 : g_k(x) \le 0,\ k \in \{1, 2, \ldots, m\}\}$, the projection can have high complexity. To circumvent the projection, work in [10, 11, 12, 13] proposes alternative algorithms with simpler per-slot complexity that satisfy the inequality constraints in the long term (rather than on every slot). Recently, new primal-dual type algorithms with low complexity are proposed in [14, 15] to solve more challenging OCO with time-varying functional inequality constraints.

• Online Markov decision processes: This extends OCO to allow systems with a more complex Markov structure. This is similar to the setup of the current paper of minimizing the regret expression above, but does not have the constraint set $\mathcal{G}$. Unlike traditional OCO, the current penalty depends not only on the current action and the current (unknown) penalty function, but on the current system state (which depends on the history of previous actions). Further, the number of policies can grow exponentially with the sizes of the state and action spaces, so that solutions can be computationally intensive. The work [16] develops an algorithm in this context with $O(\sqrt{T})$ regret. Extended algorithms and regularization methods are developed in [17][18][19] to reduce complexity and improve dependencies on the number of states and actions. Online MDP under bandit feedback (where the decision maker can only observe the penalty corresponding to the chosen action) is considered in [17][20].

• Constrained MDPs: This aims to solve classical MDP problems with known cost functions but subject to additional constraints on the budget or resources. Linear programming methods for MDPs are found, for example, in [21], and algorithms beyond LP are found in [22][23]. Formulations closest to our setup appear in recent work on weakly coupled MDPs in [24][25] that have known cost and resource functions.

• Reinforcement Learning (RL): This concerns MDPs with some unknown parameters (such as unknown functions and transition probabilities). Typically, RL makes stronger assumptions than the online setting, such as an environment that is unknown but fixed, whereas the unknown environment in the online context can change over time. Methods for RL are developed in [26][27][28][29].

1.3. Our contributions

The current paper proposes a new framework for online MDPs with time varying constraints. Further, it considers multiple MDP systems that are weakly coupled. While the scenario is

significantly more challenging than the original Zinkevich OGD context as well as other classical online learning scenarios, the algorithm is shown to achieve tight $O(\sqrt{T})$ regret in both the objective function and the constraints, which ties the optimal $O(\sqrt{T})$ regret for those simpler unconstrained OCO problems. Along the way, we show the bound grows polynomially with the number of MDPs and linearly with respect to the number of states and actions in each MDP (Theorem 5.).

2. Preliminaries

2.1. Basic Definitions

Throughout this paper, given an MDP with state space $S$ and action space $A$, a policy $P$ defines a (possibly probabilistic) method of choosing actions $a \in A$ at state $s \in S$ based on the past information. We start with some basic definitions of important classes of policies:

Definition 2.1. For an MDP, a randomized stationary policy $\pi$ defines an algorithm which, whenever the system is in state $s \in S$, chooses an action $a \in A$ according to a fixed conditional probability function $\pi(a \mid s)$, defined for all $a \in A$ and $s \in S$.

Definition 2.2. For an MDP, a pure policy $\pi$ is a randomized stationary policy with all probabilities equal to either 0 or 1. That is, a pure policy is defined by a deterministic mapping between states $s \in S$ and actions $a \in A$. Whenever the system is in a state $s \in S$, it always chooses a particular action $a_s \in A$ (with probability 1).

Note that if an MDP has a finite state and action space, the set of all pure policies is also finite. Consider the MDP associated with a particular system $k \in \{1, \ldots, K\}$. For any randomized stationary policy $\pi$, it holds that $\sum_{a \in A^{(k)}} \pi(a \mid s) = 1$ for all $s \in S^{(k)}$. Define the transition probability matrix $P_\pi^{(k)}$ under policy $\pi$ to have components as follows:

$P_\pi^{(k)}(s, s') = \sum_{a \in A^{(k)}} \pi(a \mid s)\, P_a^{(k)}(s, s'), \quad s, s' \in S^{(k)}$.   (3)

It is easy to verify that $P_\pi^{(k)}$ is indeed a stochastic matrix, that is, it has rows with nonnegative components that sum to 1. Let $d_0^{(k)} \in [0, 1]^{|S^{(k)}|}$ be an (arbitrary) initial distribution for the $k$-th MDP. Define the state distribution at time $t$ under $\pi$ as $d_{\pi,t}^{(k)}$.
(For any set $S$, we use $|S|$ to denote the cardinality of the set.)

By the Markov property of the system, we have $d_{\pi,t}^{(k)} = d_0^{(k)} \big(P_\pi^{(k)}\big)^t$. A transition probability matrix $P_\pi^{(k)}$ is ergodic if it gives rise to a Markov chain that is irreducible and aperiodic. Since the state space is finite, an ergodic matrix $P_\pi^{(k)}$ has a unique stationary distribution denoted $d_\pi^{(k)}$, so that $d_\pi^{(k)}$ is the unique probability vector solving $d = d P_\pi^{(k)}$.

Assumption 2.1 (Unichain model). There exists a universal integer $\hat{r}$ such that for any integer $r \ge \hat{r}$ and every $k \in \{1, \ldots, K\}$, the product $P_{\pi_1}^{(k)} P_{\pi_2}^{(k)} \cdots P_{\pi_r}^{(k)}$ is a transition matrix with strictly positive entries for any sequence of pure policies $\pi_1, \pi_2, \ldots, \pi_r$ associated with the $k$th MDP.

Remark 2.1. Assumption 2.1 implies that each MDP $k \in \{1, \ldots, K\}$ is ergodic under any pure policy. This follows by taking $\pi_1, \pi_2, \ldots, \pi_r$ all the same in Assumption 2.1. Since the transition matrix of any randomized stationary policy can be formed as a convex combination of those of

pure policies, any randomized stationary policy results in an ergodic MDP for which there is a unique stationary distribution. Assumption 2.1 is easy to check via the following simple sufficient condition.

Proposition 2.1. Assumption 2.1 holds if, for every $k \in \{1, \ldots, K\}$, there is a fixed ergodic matrix $P^{(k)}$ (i.e., a transition probability matrix that defines an irreducible and aperiodic Markov chain) such that for any pure policy $\pi$ on MDP $k$ we have the decomposition

$P_\pi^{(k)} = \delta_\pi P^{(k)} + (1 - \delta_\pi) Q_\pi^{(k)}$,

where $\delta_\pi \in (0, 1]$ depends on the pure policy $\pi$ and $Q_\pi^{(k)}$ is a stochastic matrix depending on $\pi$.

Proof. Fix $k \in \{1, \ldots, K\}$ and assume every pure policy on MDP $k$ has the above decomposition. Since there are only finitely many pure policies, there exists a lower bound $\delta_{\min} > 0$ such that $\delta_\pi \ge \delta_{\min}$ for every pure policy $\pi$. Since $P^{(k)}$ is an ergodic matrix, there exists an integer $r^{(k)} > 0$ large enough such that $\big(P^{(k)}\big)^r$ has strictly positive components for all $r \ge r^{(k)}$. Fix $r \ge r^{(k)}$ and let $\pi_1, \ldots, \pi_r$ be any sequence of $r$ pure policies on MDP $k$. Then

$P_{\pi_1}^{(k)} \cdots P_{\pi_r}^{(k)} \ge \delta_{\min}^{r} \big(P^{(k)}\big)^r > 0$.

The universal integer $\hat{r}$ can be taken as the maximum integer $r^{(k)}$ over all $k \in \{1, \ldots, K\}$.

Definition 2.3. A joint randomized stationary policy $\Pi$ on $K$ parallel MDPs defines an algorithm which chooses a joint action $a := \big(a^{(1)}, a^{(2)}, \ldots, a^{(K)}\big) \in A^{(1)} \times A^{(2)} \times \cdots \times A^{(K)}$ given the joint state $s := \big(s^{(1)}, s^{(2)}, \ldots, s^{(K)}\big) \in S^{(1)} \times S^{(2)} \times \cdots \times S^{(K)}$ according to a fixed conditional probability $\Pi(a \mid s)$.

The following special class of separable policies can be implemented separately over each of the $K$ MDPs and plays a role in both algorithm design and performance analysis.

Definition 2.4. A joint randomized stationary policy $\pi$ is separable if the conditional probabilities $\pi := \big(\pi^{(1)}, \pi^{(2)}, \ldots, \pi^{(K)}\big)$ decompose as a product

$\pi(a \mid s) = \prod_{k=1}^{K} \pi^{(k)}\big(a^{(k)} \mid s^{(k)}\big)$

for all $a \in A^{(1)} \times \cdots \times A^{(K)}$, $s \in S^{(1)} \times \cdots \times S^{(K)}$.

2.2. Technical assumptions

The functions $f_t^{(k)}$ and $g_{i,t}^{(k)}$ are determined by random processes defined over $t = 0, 1, 2, \ldots$. Specifically, let $\Omega$ be a finite dimensional vector space. Let $\{\omega_t\}_{t=0}^{\infty}$ and $\{\mu_t\}_{t=0}^{\infty}$ be two sequences of random vectors in $\Omega$.
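Then for all $a \in A^{(k)}$, $s \in S^{(k)}$, $i \in \{1, 2, \ldots, m\}$ we have

$g_{i,t}^{(k)}(a, s) = \hat{g}_i^{(k)}(a, s, \omega_t), \qquad f_t^{(k)}(a, s) = \hat{f}^{(k)}(a, s, \mu_t)$,

where $\hat{g}_i^{(k)}$ and $\hat{f}^{(k)}$ formally define the time-varying functions in terms of the random processes $\omega_t$ and $\mu_t$. It is assumed that the processes $\{\omega_t\}_{t=0}^{\infty}$ and $\{\mu_t\}_{t=0}^{\infty}$ are generated at the start of slot 0 (before any control actions are taken), and revealed gradually over time, so that functions $g_{i,t}^{(k)}$ and $f_t^{(k)}$ are only revealed at the end of slot $t$.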

Remark 2.2. The functions generated in this way are also called oblivious functions. Such an assumption is commonly adopted in previous unconstrained online MDP works (e.g. [16], [17] and [19]). Further, it is also shown in [17] that without this assumption, one can choose a sequence of objective functions against the decision maker in a specifically designed MDP scenario so that one never achieves sublinear regret.

The functions are also assumed to be bounded by a universal constant $\Psi$, so that:

$|\hat{g}_i^{(k)}(a, s, \omega)| \le \Psi, \quad |\hat{f}^{(k)}(a, s, \mu)| \le \Psi, \quad \forall k \in \{1, \ldots, K\},\ \forall a \in A^{(k)},\ s \in S^{(k)},\ \forall \omega, \mu \in \Omega$.   (4)

It is assumed that $\{\omega_t\}_{t=0}^{\infty}$ is independent, identically distributed (i.i.d.) and independent of $\{\mu_t\}_{t=0}^{\infty}$. Hence, the constraint functions can be arbitrarily correlated on the same slot, but appear i.i.d. over different slots. On the other hand, no specific model is imposed on $\{\mu_t\}_{t=0}^{\infty}$. Thus, the functions $f_t^{(k)}$ can be arbitrarily time varying. Let $H_t$ be the system information up to time $t$; then, for any $t \in \{0, 1, 2, \ldots\}$, $H_t$ contains state and action information up to time $t$, i.e. $s_0, \ldots, s_t$, $a_0, \ldots, a_t$, and $\{\omega_t\}_{t=0}^{\infty}$ and $\{\mu_t\}_{t=0}^{\infty}$. Throughout this paper, we make the following assumptions.

Assumption 2.2 (Independent transition). For each MDP, given the state $s_t^{(k)} \in S^{(k)}$ and action $a_t^{(k)} \in A^{(k)}$, the next state $s_{t+1}^{(k)}$ is independent of all other past information up to time $t$ as well as the state transitions $s_{t+1}^{(j)}$, $\forall j \ne k$, i.e., for all $s \in S^{(k)}$ it holds that

$\Pr\big(s_{t+1}^{(k)} = s \mid H_t,\ s_{t+1}^{(j)},\ \forall j \ne k\big) = \Pr\big(s_{t+1}^{(k)} = s \mid s_t^{(k)}, a_t^{(k)}\big)$,

where $H_t$ contains all past information up to time $t$.

Intuitively, this assumption means that all MDPs are running independently in the joint probability space and thus the only coupling among them comes from the constraints, which reflects the notion of weakly coupled MDPs in our title. Furthermore, by definition of $H_t$, given $s_t^{(k)}, a_t^{(k)}$, the next transition $s_{t+1}^{(k)}$ is also independent of function paths $\{\omega_t\}_{t=0}^{\infty}$ and $\{\mu_t\}_{t=0}^{\infty}$.

The following assumption states the constraint set is strictly feasible.

Assumption 2.3 (Slater's condition).
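There exists a real value $\eta > 0$ and a fixed separable randomized stationary policy $\tilde{\pi}$ such that

$\mathbb{E}\Big[\sum_{k=1}^{K} g_{i,t}^{(k)}\big(a_t^{(k)}, s_t^{(k)}\big) \,\Big|\, d_{\tilde{\pi}}, \tilde{\pi}\Big] \le -\eta, \quad \forall i \in \{1, 2, \ldots, m\}$,

where the initial state is $d_{\tilde{\pi}}$ and is the unique stationary distribution of policy $\tilde{\pi}$, and the expectation is taken with respect to the random initial state and the stochastic function $g_{i,t}^{(k)}(a, s)$ (i.e., $\omega_t$).

Slater's condition is a common assumption in convergence time analysis of constrained convex optimization (e.g. [30], [31]). Note that this assumption readily implies the constraint set $\mathcal{G}$ can be achieved by the above randomized stationary policy. Specifically, take $d_0^{(k)} = d_{\tilde{\pi}^{(k)}}$ and $P = \tilde{\pi}$; then, we have

$G_{i,T}(d_0, \tilde{\pi}) = \sum_{t=0}^{T-1} \mathbb{E}\Big[\sum_{k=1}^{K} g_{i,t}^{(k)}\big(a_t^{(k)}, s_t^{(k)}\big) \,\Big|\, d_{\tilde{\pi}}, \tilde{\pi}\Big] \le -\eta T < 0$.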

2.3. The state-action polyhedron

In this section, we recall the well-known linear program formulation of an MDP (see, for example, [21] and [32]). Consider an MDP with a state space $S$ and an action space $A$. Let $\Delta \subseteq \mathbb{R}^{|S||A|}$ be a probability simplex, i.e.

$\Delta = \Big\{\theta \in \mathbb{R}^{|S||A|} : \sum_{(s,a) \in S \times A} \theta(s, a) = 1,\ \theta(s, a) \ge 0\Big\}$.

Given a randomized stationary policy $\pi$ with stationary state distribution $d_\pi$, the MDP is a Markov chain with transition matrix $P_\pi$ given by (3). Thus, it must satisfy the following balance equation:

$\sum_{s \in S} d_\pi(s) P_\pi(s, s') = d_\pi(s'), \quad \forall s' \in S$.

Defining $\theta(a, s) = \pi(a \mid s)\, d_\pi(s)$ and substituting the definition of transition probability (3) into the above equation gives

$\sum_{s \in S} \sum_{a \in A} \theta(s, a) P_a(s, s') = \sum_{a \in A} \theta(s', a), \quad \forall s' \in S$.

The variable $\theta(a, s)$ is often interpreted as a stationary probability of being at state $s \in S$ and taking action $a \in A$ under some randomized stationary policy. The state-action polyhedron $\Theta$ is then defined as

$\Theta := \Big\{\theta \in \Delta : \sum_{s \in S} \sum_{a \in A} \theta(s, a) P_a(s, s') = \sum_{a \in A} \theta(s', a),\ \forall s' \in S\Big\}$.

Given any $\theta \in \Theta$, one can recover a randomized stationary policy $\pi$ at any state $s \in S$ as

$\pi(a \mid s) = \begin{cases} \dfrac{\theta(a, s)}{\sum_{a' \in A} \theta(a', s)}, & \text{if } \sum_{a' \in A} \theta(a', s) \ne 0, \\ 0, & \text{otherwise.} \end{cases}$   (5)

Given any fixed penalty function $f(a, s)$, the best policy minimizing the penalty (without constraint) is a randomized stationary policy given by the solution to the following linear program (LP):

$\min\ \langle f, \theta \rangle, \quad \text{s.t. } \theta \in \Theta$,   (6)

where $f := [f(a, s)]_{a \in A,\, s \in S}$. Note that for any policy $\pi$ given by the state-action pair $\theta$ according to (5),

$\langle f, \theta \rangle = \mathbb{E}_{s \sim d_\pi,\, a \sim \pi(\cdot \mid s)}[f(a, s)]$.

Thus, $\langle f, \theta \rangle$ is often referred to as the stationary state penalty of policy $\pi$. It can also be shown that any state-action pair in the set $\Theta$ can be achieved by a convex combination of state-action vectors of pure policies, and thus all corner points of the polyhedron $\Theta$ are from pure policies. As a consequence, the best randomized stationary policy solving (6) is always a pure policy.
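The relationship between a randomized stationary policy, its stationary distribution, and its state-action vector can be checked numerically. The following is a minimal sketch on a made-up two-state, two-action MDP: the transition numbers, the policy, and all function names are illustrative assumptions and do not come from the paper. It builds the policy matrix as in (3), approximates the stationary distribution by power iteration, forms $\theta(s, a) = \pi(a \mid s) d_\pi(s)$, checks the balance equation defining $\Theta$, and recovers the policy via (5).

```python
# Toy MDP (all numbers are illustrative assumptions).
# P[a][s][s2]: probability of moving from state s to s2 under action a.
P = [
    [[0.9, 0.1], [0.4, 0.6]],   # action 0
    [[0.2, 0.8], [0.7, 0.3]],   # action 1
]
# pi[s][a]: probability of choosing action a in state s.
pi = [[0.5, 0.5], [0.25, 0.75]]
S, A = 2, 2

def policy_matrix(P, pi):
    """Equation (3): P_pi(s, s2) = sum_a pi(a|s) P_a(s, s2)."""
    return [[sum(pi[s][a] * P[a][s][s2] for a in range(A)) for s2 in range(S)]
            for s in range(S)]

def stationary(P_pi, iters=200):
    """Power iteration d <- d P_pi; converges for an ergodic chain."""
    d = [1.0 / S] * S
    for _ in range(iters):
        d = [sum(d[s] * P_pi[s][s2] for s in range(S)) for s2 in range(S)]
    return d

def state_action_vector(pi, d):
    """theta(s, a) = pi(a|s) d_pi(s)."""
    return [[pi[s][a] * d[s] for a in range(A)] for s in range(S)]

def balance_residual(theta, P):
    """Largest violation of sum_{s,a} theta(s,a) P_a(s,s2) = sum_a theta(s2,a)."""
    return max(
        abs(sum(theta[s][a] * P[a][s][s2] for s in range(S) for a in range(A))
            - sum(theta[s2]))
        for s2 in range(S))

def recover_policy(theta):
    """Equation (5): normalize theta over actions; 0 if the state has no mass."""
    out = []
    for s in range(S):
        mass = sum(theta[s])
        out.append([theta[s][a] / mass if mass != 0 else 0.0 for a in range(A)])
    return out

d = stationary(policy_matrix(P, pi))
theta = state_action_vector(pi, d)
pi_back = recover_policy(theta)
```

In this sketch `theta` lies in the polyhedron up to numerical error, and `pi_back` coincides with `pi` because every state has positive stationary mass.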

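2.4. Preliminary results on MDPs

In this section, we give preliminary results regarding the properties of our weakly coupled MDPs under randomized stationary policies. The proofs can be found in the appendix. We start with a lemma on the uniform mixing of MDPs.

Lemma 2.1. Suppose Assumptions 2.1 and 2.2 hold. There exists a positive integer $r$ and a constant $\tau$ such that for any two state distributions $d_1$ and $d_2$,

$\sup_{\pi_1^{(k)}, \ldots, \pi_r^{(k)}} \Big\| \big(d_1^{(k)} - d_2^{(k)}\big) P_{\pi_1^{(k)}}^{(k)} P_{\pi_2^{(k)}}^{(k)} \cdots P_{\pi_r^{(k)}}^{(k)} \Big\|_1 \le e^{-1/\tau} \big\| d_1^{(k)} - d_2^{(k)} \big\|_1, \quad \forall k \in \{1, 2, \ldots, K\}$,

where the supremum is taken with respect to any sequence of $r$ randomized stationary policies $\big\{\pi_1^{(k)}, \ldots, \pi_r^{(k)}\big\}$.

Lemma 2.2. Suppose Assumptions 2.1 and 2.2 hold. Consider the product MDP with product state space $S^{(1)} \times \cdots \times S^{(K)}$ and action space $A^{(1)} \times \cdots \times A^{(K)}$. Then, the following hold:
1. The product MDP is irreducible and aperiodic under any joint randomized stationary policy.
2. The stationary state-action probability $\{\theta^{(k)}\}_{k=1}^{K}$ of any joint randomized stationary policy satisfies $\theta^{(k)} \in \Theta^{(k)}$, $\forall k \in \{1, 2, \ldots, K\}$.

An immediate conclusion we can draw from this lemma is that given any penalty and constraint functions $f^{(k)}$ and $g_i^{(k)}$, $k = 1, 2, \ldots, K$, the stationary penalty and constraint value of any joint randomized stationary policy can be expressed as

$\sum_{k=1}^{K} \langle f^{(k)}, \theta^{(k)} \rangle, \qquad \sum_{k=1}^{K} \langle g_i^{(k)}, \theta^{(k)} \rangle, \quad i = 1, 2, \ldots, m$,

with $\theta^{(k)} \in \Theta^{(k)}$. This in turn implies such stationary state-action probabilities $\{\theta^{(k)}\}_{k=1}^{K}$ can also be realized via a separable randomized stationary policy $\pi$ with

$\pi^{(k)}(a \mid s) = \dfrac{\theta^{(k)}(a, s)}{\sum_{a' \in A^{(k)}} \theta^{(k)}(a', s)}, \quad a \in A^{(k)},\ s \in S^{(k)}$,   (7)

and the corresponding stationary penalty and constraint value can also be achieved via this policy. This fact suggests that when considering the stationary state performance only, the class of separable randomized stationary policies is large enough to cover all possible stationary penalty and constraint values.

In particular, let $\tilde{\pi} = \big(\tilde{\pi}^{(1)}, \ldots, \tilde{\pi}^{(K)}\big)$ be the separable randomized stationary policy associated with the Slater condition (Assumption 2.3). Using the fact that the constraint functions $g_{i,t}^{(k)}$, $k = 1, 2, \ldots, K$ (i.e. $\omega_t$) are i.i.d. and Assumption 2.2 on independence of probability transitions, we have that the constraint functions $g_{i,t}^{(k)}$ and the state-action pairs at any time $t$ are mutually independent. Thus,

$\mathbb{E}\Big[\sum_{k=1}^{K} g_{i,t}^{(k)}\big(a_t^{(k)}, s_t^{(k)}\big) \,\Big|\, d_{\tilde{\pi}}, \tilde{\pi}\Big] = \sum_{k=1}^{K} \big\langle \mathbb{E}\big(g_{i,t}^{(k)}\big), \tilde{\theta}^{(k)} \big\rangle$,

where $\tilde{\theta}^{(k)}$ corresponds to $\tilde{\pi}$ according to (7).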

Then, Slater's condition can be translated to the following: There exists a sequence of state-action probabilities $\{\tilde{\theta}^{(k)}\}_{k=1}^{K}$ from a separable randomized stationary policy such that $\tilde{\theta}^{(k)} \in \Theta^{(k)}$, $\forall k$, and

$\sum_{k=1}^{K} \big\langle \mathbb{E}\big(g_{i,t}^{(k)}\big), \tilde{\theta}^{(k)} \big\rangle \le -\eta, \quad i = 1, 2, \ldots, m$.   (8)

The assumption on separability does not lose generality in the sense that if there is no separable randomized stationary policy that satisfies (8), then there is no joint randomized stationary policy that satisfies (8) either.

2.5. The blessing of slow-update property in online MDPs

The current state of an MDP depends on previous states and actions. As a consequence, the slot $t$ penalty not only depends on the current penalty function and current action, but also on the system history. This complication does not arise in classical online convex optimization ([7], [8]) as there is no notion of state and the slot $t$ penalty depends only on the slot $t$ penalty function and action.

Now imagine a virtual system where, on each slot $t$, a policy $\pi_t$ is chosen (rather than an action). Further imagine the MDP immediately reaching its corresponding stationary distribution $d_{\pi_t}$. Then the states and actions on previous slots do not matter and the slot $t$ performance depends only on the chosen policy $\pi_t$ and on the current penalty and constraint functions. This imaginary system now has a structure similar to classical online convex optimization as in the Zinkevich scenario [8].

A key feature of online convex optimization algorithms as in [8] is that they update their decision variables slowly. For a fixed time scale $T$ over which $O(\sqrt{T})$ regret is desired, the decision variables are typically changed no more than a distance $O(1/\sqrt{T})$ from one slot to the next. An important insight in prior (unconstrained) MDP works (e.g. [16], [17] and [19]) is that such slow updates also guarantee the approximate convergence of an MDP to its stationary distribution. As a consequence, one can design the decision policies under the imaginary assumption that the system instantly reaches its stationary distribution, and later bound the error between the true system and the imaginary system.
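If the error is on the same order as the desired $O(\sqrt{T})$ regret, then this approach works. This idea serves as a cornerstone of our algorithm design of the next section, which treats the case of multiple weakly coupled systems with both objective functions and constraint functions.

3. OCMDP algorithm

Our proposed algorithm is distributed in the sense that each time slot, each MDP solves its own subproblem and the constraint violations are controlled by a simple update of global multipliers called virtual queues at the end of each slot. Let $\Theta^{(1)}, \Theta^{(2)}, \ldots, \Theta^{(K)}$ be the state-action polyhedra of the $K$ MDPs, respectively. Let $\theta_t^{(k)} \in \Theta^{(k)}$ be a state-action vector at time slot $t$. At $t = 0$, each MDP chooses its initial state-action vector $\theta_0^{(k)}$ resulting from any separable randomized stationary policy $\pi_0^{(k)}$. For example, one could choose a uniform policy $\pi_0^{(k)}(a \mid s) = 1/|A^{(k)}|$, $\forall s \in S^{(k)}$, solve the equation $d_{\pi_0^{(k)}} = d_{\pi_0^{(k)}} P_{\pi_0^{(k)}}^{(k)}$ to get a probability vector $d_{\pi_0^{(k)}}$, and obtain $\theta_0^{(k)}(a, s) = d_{\pi_0^{(k)}}(s)/|A^{(k)}|$. For each constraint $i \in \{1, 2, \ldots, m\}$, let $Q_i(t)$ be a virtual queue defined over slots $t = 0, 1, 2, \ldots$ with the initial condition $Q_i(0) = Q_i(1) = 0$, and update equation:

$Q_i(t+1) = \max\Big\{ Q_i(t) + \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle,\ 0 \Big\}, \quad \forall t \in \{1, 2, 3, \ldots\}$.   (9)

Our algorithm uses two parameters $V > 0$ and $\alpha > 0$ and makes decisions as follows:

• At the start of each slot $t \in \{1, 2, 3, \ldots\}$, the $k$-th MDP observes $Q_i(t)$, $i = 1, 2, \ldots, m$ and chooses $\theta_t^{(k)}$ to solve the following subproblem:

$\theta_t^{(k)} = \operatorname{argmin}_{\theta \in \Theta^{(k)}} \Big\langle V f_{t-1}^{(k)} + \sum_{i=1}^{m} Q_i(t)\, g_{i,t-1}^{(k)},\ \theta \Big\rangle + \alpha \big\| \theta - \theta_{t-1}^{(k)} \big\|_2^2$.   (10)

• Construct the randomized stationary policy $\pi_t^{(k)}$ according to (5) with $\theta = \theta_t^{(k)}$, and choose the action $a_t^{(k)}$ at the $k$-th MDP according to the conditional distribution $\pi_t^{(k)}\big(\cdot \mid s_t^{(k)}\big)$.

• Update the virtual queue $Q_i(t)$ according to (9) for all $i = 1, 2, \ldots, m$.

Note that for any slot $t \ge 1$, this algorithm gives a separable randomized stationary policy, so that each MDP chooses its own policy based on its own functions $f_{t-1}^{(k)}$, $g_{i,t-1}^{(k)}$, $i \in \{1, 2, \ldots, m\}$, and a common multiplier $Q(t) := (Q_1(t), \ldots, Q_m(t))$.

The next lemma shows that solving (10) is in fact a projection onto the state-action polyhedron. For any set $X \subseteq \mathbb{R}^n$ and a vector $y \in \mathbb{R}^n$, define the projection operator $\mathcal{P}_X(y)$ as

$\mathcal{P}_X(y) = \operatorname{arginf}_{x \in X} \| x - y \|_2$.

Lemma 3.1. Fix an $\alpha > 0$ and $t \in \{1, 2, 3, \ldots\}$. The $\theta_t^{(k)}$ that solves (10) is

$\theta_t^{(k)} = \mathcal{P}_{\Theta^{(k)}}\Big( \theta_{t-1}^{(k)} - \dfrac{w_t^{(k)}}{2\alpha} \Big)$,

where $w_t^{(k)} = V f_{t-1}^{(k)} + \sum_{i=1}^{m} Q_i(t)\, g_{i,t-1}^{(k)} \in \mathbb{R}^{|A^{(k)}||S^{(k)}|}$.

Proof. By definition, we have

$\theta_t^{(k)} = \operatorname{argmin}_{\theta \in \Theta^{(k)}} \big\langle w_t^{(k)}, \theta \big\rangle + \alpha \big\| \theta - \theta_{t-1}^{(k)} \big\|_2^2$
$= \operatorname{argmin}_{\theta \in \Theta^{(k)}} \big\langle w_t^{(k)}, \theta - \theta_{t-1}^{(k)} \big\rangle + \alpha \big\| \theta - \theta_{t-1}^{(k)} \big\|_2^2 + \big\langle w_t^{(k)}, \theta_{t-1}^{(k)} \big\rangle$
$= \operatorname{argmin}_{\theta \in \Theta^{(k)}} \alpha \Big( \Big\langle \dfrac{w_t^{(k)}}{\alpha}, \theta - \theta_{t-1}^{(k)} \Big\rangle + \big\| \theta - \theta_{t-1}^{(k)} \big\|_2^2 + \Big\| \dfrac{w_t^{(k)}}{2\alpha} \Big\|_2^2 \Big)$
$= \operatorname{argmin}_{\theta \in \Theta^{(k)}} \alpha \Big\| \theta - \theta_{t-1}^{(k)} + \dfrac{w_t^{(k)}}{2\alpha} \Big\|_2^2 = \mathcal{P}_{\Theta^{(k)}}\Big( \theta_{t-1}^{(k)} - \dfrac{w_t^{(k)}}{2\alpha} \Big)$,

finishing the proof.

3.1. Intuition of the algorithm and roadmap of analysis

The intuition of this algorithm follows from the discussion in Section 2.5. Instead of the Markovian regret and constraint set, we work on the imaginary system that after the decision maker chooses any joint policy $\Pi_t$ and the penalty/constraint functions are revealed, the $K$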

parallel Markov chains reach stationary state distribution right away, with state-action probability vectors $\{\theta_t^{(k)}\}_{k=1}^{K}$ for the $K$ parallel MDPs. Thus there is no Markov state in such a system anymore and the corresponding stationary penalty and constraint function value at time $t$ can be expressed as $\sum_{k=1}^{K} \langle f_t^{(k)}, \theta_t^{(k)} \rangle$ and $\sum_{k=1}^{K} \langle g_{i,t}^{(k)}, \theta_t^{(k)} \rangle$, $i = 1, 2, \ldots, m$, respectively. As a consequence, we are now facing a relatively easier task of minimizing the following regret:

$\sum_{t=0}^{T-1} \sum_{k=1}^{K} \mathbb{E}\big\langle f_t^{(k)}, \theta_t^{(k)} \big\rangle - \sum_{t=0}^{T-1} \sum_{k=1}^{K} \mathbb{E}\big\langle f_t^{(k)}, \theta_*^{(k)} \big\rangle$,   (11)

where $\{\theta_*^{(k)}\}_{k=1}^{K}$ are the state-action probabilities corresponding to the best fixed joint randomized stationary policy within the following stationary constraint set

$\overline{\mathcal{G}} := \Big\{ \theta^{(k)} \in \Theta^{(k)},\ k \in \{1, 2, \ldots, K\} : \sum_{k=1}^{K} \big\langle \mathbb{E}\big(g_{i,t}^{(k)}\big), \theta^{(k)} \big\rangle \le 0,\ i = 1, 2, \ldots, m \Big\}$,   (12)

with the assumption that Slater's condition (8) holds.

To analyze the proposed algorithm, we need to tackle the following two major challenges:

• Whether or not the policy decision of the proposed algorithm would yield $O(\sqrt{T})$ regret and constraint violation on the imaginary system that reaches steady state instantaneously on each slot.
• Whether the error between the imaginary and true systems can be bounded by $O(\sqrt{T})$.

In the next section, we answer these questions via a multi-stage analysis piecing together the results of MDPs from Section 2.4 with multiple ingredients from convex analysis and stochastic queue analysis. We first show the $O(\sqrt{T})$ regret and constraint violation in the imaginary online linear program, incorporating a new regret analysis procedure with a stochastic drift analysis for queue processes. Then, we show that if the benchmark randomized stationary algorithm always starts from its stationary state, the discrepancy of regrets between the imaginary and true systems can be controlled via the slow-update property of the proposed algorithm together with the properties of MDPs developed in Section 2.4. Finally, for the problem with arbitrary non-stationary starting state, we reformulate it as a perturbation on the aforementioned stationary state problem and analyze the perturbation via Farkas' Lemma.

4. Convergence time analysis

4.1. Stationary state performance: An online linear program

Let $Q(t) := [Q_1(t), Q_2(t), \ldots, Q_m(t)]$ be the virtual queue vector and $L(t) = \frac{1}{2} \| Q(t) \|_2^2$. Define the drift $\Delta(t) := L(t+1) - L(t)$.

4.1.1. Sample-path analysis

This section develops a couple of bounds given a sequence of penalty functions $f_0^{(k)}, f_1^{(k)}, \ldots, f_{T-1}^{(k)}$ and constraint functions $g_{i,0}^{(k)}, g_{i,1}^{(k)}, \ldots, g_{i,T-1}^{(k)}$. The following lemma provides bounds for virtual queue processes:

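Lemma 4.1. For any $i \in \{1, 2, \ldots, m\}$ and $T \in \{1, 2, \ldots\}$, the following holds under the virtual queue update (9):

$\sum_{t=1}^{T} \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_{t-1}^{(k)} \big\rangle \le Q_i(T+1) - Q_i(1) + \Psi \sum_{t=1}^{T} \sum_{k=1}^{K} \sqrt{|A^{(k)}||S^{(k)}|}\, \big\| \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\|_2$,

where $\Psi > 0$ is the constant defined in (4).

Proof. By the queue updating rule (9), for any $t \in \mathbb{N}$,

$Q_i(t+1) = \max\Big\{ Q_i(t) + \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle,\ 0 \Big\} \ge Q_i(t) + \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle$
$= Q_i(t) + \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_{t-1}^{(k)} \big\rangle + \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\rangle$
$\ge Q_i(t) + \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_{t-1}^{(k)} \big\rangle - \sum_{k=1}^{K} \big\| g_{i,t-1}^{(k)} \big\|_2 \big\| \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\|_2$.

Note that the constraint functions are deterministically bounded,

$\big\| g_{i,t-1}^{(k)} \big\|_2 \le \sqrt{|A^{(k)}||S^{(k)}|}\, \Psi$.

Substituting this bound into the above queue bound, rearranging the terms, and summing over $t = 1, \ldots, T$ finishes the proof.

The next lemma provides a bound for the drift $\Delta(t)$.

Lemma 4.2. For any slot $t \ge 1$, we have

$\Delta(t) \le \dfrac{1}{2} m K^2 \Psi^2 + \sum_{i=1}^{m} Q_i(t) \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle$.

Proof. By definition, we have

$\Delta(t) = \dfrac{1}{2} \| Q(t+1) \|_2^2 - \dfrac{1}{2} \| Q(t) \|_2^2 \le \dfrac{1}{2} \sum_{i=1}^{m} \Big[ \Big( Q_i(t) + \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle \Big)^2 - Q_i(t)^2 \Big]$
$= \sum_{i=1}^{m} Q_i(t) \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle + \dfrac{1}{2} \sum_{i=1}^{m} \Big( \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle \Big)^2$.

Note that by the bound (4), we have

$\Big| \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle \Big| \le \sum_{k=1}^{K} \big\| g_{i,t-1}^{(k)} \big\|_\infty \big\| \theta_t^{(k)} \big\|_1 \le K \Psi$.

Substituting this bound into the drift bound finishes the proof.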
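The queue update (9) and the drift bound of Lemma 4.2 can be sanity-checked on a random sample path. The sketch below is an illustration only: the dimensions, the random constraint vectors, and the random points of the simplex standing in for the state-action vectors are all assumptions (the drift bound only uses the fact that each state-action vector is nonnegative and sums to one, so the vectors here are not required to satisfy the balance equation). The function names are hypothetical.

```python
import random

def inner(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))

def queue_update(Q, g, theta):
    """Update (9): Q_i <- max(Q_i + sum_k <g[k][i], theta[k]>, 0).

    g[k][i] is the i-th constraint vector of MDP k and theta[k] is the
    state-action vector of MDP k on the current slot."""
    K = len(theta)
    return [max(Q[i] + sum(inner(g[k][i], theta[k]) for k in range(K)), 0.0)
            for i in range(len(Q))]

def lyapunov(Q):
    """L = 0.5 * ||Q||_2^2."""
    return 0.5 * sum(q * q for q in Q)

def random_simplex_point(n, rng):
    """A random nonnegative vector with entries summing to one."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def max_drift_gap(T=200, K=3, m=2, n=4, Psi=1.0, seed=0):
    """Largest value of (drift - Lemma 4.2 bound) over T slots.

    A non-positive return value means the bound held on every slot."""
    rng = random.Random(seed)
    Q = [0.0] * m
    worst = float("-inf")
    for _ in range(T):
        g = [[[rng.uniform(-Psi, Psi) for _ in range(n)] for _ in range(m)]
             for _ in range(K)]
        theta = [random_simplex_point(n, rng) for _ in range(K)]
        Q_next = queue_update(Q, g, theta)
        drift = lyapunov(Q_next) - lyapunov(Q)
        bound = 0.5 * m * K ** 2 * Psi ** 2 + sum(
            Q[i] * sum(inner(g[k][i], theta[k]) for k in range(K))
            for i in range(m))
        worst = max(worst, drift - bound)
        Q = Q_next
    return worst
```

Calling `max_drift_gap()` with other seeds or dimensions gives a quick check that the per-slot drift never exceeds the stated bound on the sampled path.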

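Consider a convex set $X \subseteq \mathbb{R}^n$. Recall that for a fixed real number $c > 0$, a function $h : X \to \mathbb{R}$ is said to be $c$-strongly convex if $h(x) - \frac{c}{2} \|x\|_2^2$ is convex over $x \in X$. It is easy to see that if $q : X \to \mathbb{R}$ is convex, $c > 0$ and $b \in \mathbb{R}^n$, the function $q(x) + \frac{c}{2} \|x - b\|_2^2$ is $c$-strongly convex. Furthermore, if the function $h$ is $c$-strongly convex and is minimized at a point $x_{\min} \in X$, then (see, e.g., the corollary in [33]):

$h(x_{\min}) \le h(y) - \dfrac{c}{2} \| y - x_{\min} \|_2^2, \quad \forall y \in X$.   (13)

The following lemma is a direct consequence of the above strongly convex result. It also demonstrates the key property of our minimization subproblem (10).

Lemma 4.3. The following bound holds for any $k \in \{1, 2, \ldots, K\}$ and any fixed $\theta^{(k)} \in \Theta^{(k)}$:

$V \big\langle f_{t-1}^{(k)}, \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \sum_{i=1}^{m} Q_i(t) \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle + \alpha \big\| \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2$
$\le V \big\langle f_{t-1}^{(k)}, \theta^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \sum_{i=1}^{m} Q_i(t) \big\langle g_{i,t-1}^{(k)}, \theta^{(k)} \big\rangle + \alpha \big\| \theta^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2 - \alpha \big\| \theta^{(k)} - \theta_t^{(k)} \big\|_2^2$.   (14)

This lemma follows easily from the fact that the proposed algorithm (10) gives $\theta_t^{(k)} \in \Theta^{(k)}$ minimizing the left hand side, which is a $2\alpha$-strongly convex function, and then applying (13) with

$h(\theta^{(k)}) = V \big\langle f_{t-1}^{(k)}, \theta^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \sum_{i=1}^{m} Q_i(t) \big\langle g_{i,t-1}^{(k)}, \theta^{(k)} \big\rangle + \alpha \big\| \theta^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2$.

Combining the previous two lemmas gives the following "drift-plus-penalty" bound.

Lemma 4.4. For any fixed $\{\theta^{(k)}\}_{k=1}^{K}$ such that $\theta^{(k)} \in \Theta^{(k)}$ and $t \in \mathbb{N}$, we have the following bound:

$\Delta(t) + V \sum_{k=1}^{K} \big\langle f_{t-1}^{(k)}, \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \alpha \sum_{k=1}^{K} \big\| \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2$
$\le \dfrac{3}{2} m K^2 \Psi^2 + V \sum_{k=1}^{K} \big\langle f_{t-1}^{(k)}, \theta^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \sum_{i=1}^{m} Q_i(t-1) \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta^{(k)} \big\rangle + \alpha \sum_{k=1}^{K} \big\| \theta^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2 - \alpha \sum_{k=1}^{K} \big\| \theta^{(k)} - \theta_t^{(k)} \big\|_2^2$.   (15)

Proof. Using Lemma 4.2 and then Lemma 4.3, we obtain

$\Delta(t) + V \sum_{k=1}^{K} \big\langle f_{t-1}^{(k)}, \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \alpha \sum_{k=1}^{K} \big\| \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2$
$\le \dfrac{1}{2} m K^2 \Psi^2 + V \sum_{k=1}^{K} \big\langle f_{t-1}^{(k)}, \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \sum_{i=1}^{m} Q_i(t) \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta_t^{(k)} \big\rangle + \alpha \sum_{k=1}^{K} \big\| \theta_t^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2$
$\le \dfrac{1}{2} m K^2 \Psi^2 + V \sum_{k=1}^{K} \big\langle f_{t-1}^{(k)}, \theta^{(k)} - \theta_{t-1}^{(k)} \big\rangle + \sum_{i=1}^{m} Q_i(t) \sum_{k=1}^{K} \big\langle g_{i,t-1}^{(k)}, \theta^{(k)} \big\rangle + \alpha \sum_{k=1}^{K} \big\| \theta^{(k)} - \theta_{t-1}^{(k)} \big\|_2^2 - \alpha \sum_{k=1}^{K} \big\| \theta^{(k)} - \theta_t^{(k)} \big\|_2^2$.   (16)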

15 X. Wei, H. Yu, M. J. Neely/Online consrained MDPs 5 Noe ha by he queue updaing rule 9, we have for any, Q i Q i g k i,, θk K g k θ k i, KΨ, and for =, Q i Q i = 0 by he iniial condiion of he algorihm. Also, we have for any θ k Θ k, g k i,, θk K g k θ k i, KΨ. hus, we have m Q i g k i,, θk m Q i Subsiuing his bound ino 6 finishes he proof. g k i,, θk + mk Ψ Objecive bound heorem 4.. For any {θ k } K in he consrain se and any {,, 3, }, he proposed algorihm has he following saionary sae performance bound: K =0 f k, θ k K =0 f k, θ k + αk V + mk Ψ + V Ψ α In paricular, choosing α = and V = gives he O regre K =0 f k, θ k K =0 + f k, θ k K + Ψ S k A k + 3 mk Ψ, V S k A k + 5 mk Ψ. Proof. Firs of all, noe ha {g k i, }K is i.i.d. and independen of all sysem hisory up o, and hus independen of Q i, i =,,, m. We have Q i g k i,, θk = Q i K g k i,, θk 0 7 where he las inequaliy follows from he assumpion ha {θ k } K is in he consrain se.

16 Subsiuing θ k + V 3 mk Ψ +V 3 mk Ψ +V X. Wei, H. Yu, M. J. Neely/Online consrained MDPs 6 ino 5 and aking expecaion wih respec o boh sides give K K K f k, θk θ k K + α f k, θk θ k + α K θ k θ k m Q i θ k f k, θk θ k +α K θ k θ k θ k g k i,, θk α K +α K θ k θ k θ k where he second inequaliy follows from 7. Noe ha for any k, compleing he squares gives V f k, θk θ k + α θ k θ k α θ k θ k + V α/ f k Subsiuing his inequaliy ino he previous bound and rearranging he erms give V Ψ S k A k. α K V f k, K θk V f k, θk + V K Ψ S k A k + 3 α mk Ψ K K + α θ k θ k α θ k θ k. aking elescoping sums from o and dividing boh sides by V gives, K f k, K θk f k, L0 L + θk + V = K α θk θ k α + V K f k, θk + V K Ψ S k A k α where we use he fac ha L0 = 0 and θ k A drif lemma and is implicaions + V K Ψ S k A k α K θk θ k θk θ k. θ k + 3 mk Ψ + αk V V, From Lemma 4., we know ha in order o ge he consrain violaion bound, we need o look a he size of he virual queue Q i +, i =,,, m. he following drif lemma serves as a cornersone for our goal. Lemma 4.5 Lemma 5 of [5]. Le {Z, } be a discree ime sochasic process adaped o a filraion {F, }. Suppose here exis ineger 0 > 0, real consans λ R, δ max > 0, θ k + 3 mk Ψ V

17 and 0 < ζ δ max such ha X. Wei, H. Yu, M. J. Neely/Online consrained MDPs 7 Z + Z δ max, 8 { 0 δ [Z + 0 Z F ] max, if Z < λ 0 ζ, if Z λ. 9 hold for all {,,...}. hen, he following holds 4δmax. [Z] λ + 0 ζ log [ + 8δ max e ζ/4δmax], {,,...}. ζ. For any consan 0 < µ <, we have PrZ z µ, {,,...} where z = λ + 0 4δ max ζ log [ + 8δ max e ζ/4δmax] 4δ + ζ max 0 ζ log µ. Noe ha a special case of above drif lemma for 0 = daes back o he seminal paper of Hajek [34] bounding he size of a random process wih srongly negaive drif. Since hen, is power has been demonsraed in various scenarios ranging from seady sae queue bound [35] o feasibiliy analysis of sochasic opimizaion [36]. he curren generalizaion o a muli-sep drif is firs considered in [5]. his lemma is useful in he curren conex due o he following lemma, whose proof can be found in he appendix. Lemma 4.6. Le F, be he sysem hisory funcions up o ime, including f k 0,, f k, g k 0,i,, gk,i, i =,,, m, k =,,, K, and F 0 is a null se. Le 0 be an arbirary posiive ineger, hen, we have Q + Q mkψ, { [ Q + 0 Q 0 mkψ, if Q < λ F ] η, 0, if Q λ where λ = 8V KΨ+3mK Ψ +4Kα+ 0 0 mψ+mkψη 0 +η 0 η 0. Combining he previous wo lemmas gives he virual queue bound as Q 8V KΨ + 3mK Ψ + 4Kα mψ + mkψη 0 + η 0 η mkψ log [ + 8e /4 ]. We hen choose 0 =, V = and α =, which implies ha Q Cm, K, Ψ, η, 0 where Cm, K, Ψ, η = 8KΨ η + 3mK Ψ η + 4K + mψ η + mkψ + η + 4 mkψ log + 8e / he slow-updae condiion and consrain violaion In his secion, we prove he slow-updae propery of he proposed algorihm, which no only implies he he O consrain violaion bound, bu also plays a key role in Markov analysis.

18 X. Wei, H. Yu, M. J. Neely/Online consrained MDPs 8 Lemma 4.7. he sequence of sae-acion vecors θ k, {,,, } saisfies θ k θ k m A k S k Ψ Q A + k S k ΨV. α α In paricular,choosing V = and α = gives a slow-updae condiion θ k θ k A k S k Ψ + C m A k S k Ψ, where C = Cm, K, Ψ, η is defined in 0. Proof. Firs, choosing θ = θ in 4 gives V f k, θk θ k + m Rearranging he erms gives α θ k θ k V f k, θk θ k Q i g k i,, θk + α θ k θ k m V f k θ k θ k + V f θ k m Q i g k i,, θk α θk θk. Q i g k i,, θk θ k m θ k + Q m Q i g k i, θ k θ k g k i, θk θ k, where he second and hird inequaliy follow from Cauchy-Schwarz inequaliy. hus, i follows θ k θ k V f k + Q α m gk i, Applying he fac ha f k A k S k Ψ, g k i, A k S k Ψ and aking expecaion from boh sides give he firs bound in he lemma. he second bound follows direcly from he firs bound by furher subsiuing 0. heorem 4.. he proposed algorihm has he following saionary sae consrain violaion bound: K g k i,, θk C + m A k S k ΨC + A k S k Ψ, =0 where C = Cm, K, Ψ, η is defined in 0. Proof. aking expecaion from boh sides of Lemma 4. gives K g k i,, θk Q i + + Ψ A k S k θ k θ k. = = Subsiuing he bounds 0 and in o he above inequaliy gives he desired resul..

19 4.. Markov analysis X. Wei, H. Yu, M. J. Neely/Online consrained MDPs 9 So far, we have shown ha our algorihm achieves an O regre and consrain violaion simulaneously regarding he saionary online linear program wih consrain se given by in he imaginary sysem. In his secion, we show how hese saionary sae resuls leads o a igh performance bound on he original rue online MDP problem and comparing o any join randomized saionary algorihm saring from is saionary sae Approximae mixing of MDPs Le F, be he se of sysem hisory funcions up o ime, including f k 0,, f k, g k 0,i,, gk,i, i =,,, m, k =,,, K, and F 0 is a null se. Le d k π be he saionary sae disribuion a k-h MDP under he randomized saionary policy π k in he proposed algorihm. Le v k be he rue sae disribuion a ime slo under he proposed algorihm given he funcion pah F and saring sae d k 0, i.e. for any s Sk, v k s := P r s k = s F and v k 0 = d k 0. he following lemma provides a key esimae on he disance beween saionary disribuion and rue disribuion a each ime slo. I builds upon he slow-updae condiion Lemma 4.7 of he proposed algorihm and uniform mixing bound of general MDPs Lemma.. Lemma 4.8. Consider he proposed algorihm wih V = and α =. For any iniial sae disribuion {d k 0 }K and any {0,,,, }, we have dπ k v k τr A k S k Ψ + C m A k S k Ψ + e τr +, where τ and r are mixing parameers defined in Lemma. and C is an absolue consan defined in 0. Proof. By Lemma 4.7 we know ha for any {,,, }, A k θ k θ k S k Ψ + C m A k S k Ψ, hus, Since for any s S k, d k π θ k θ k s d k π s = A k S k Ψ + C m A k S k Ψ, θ k a A k a, s θ k a, s a A k θ k a, s θ k a, s, i hen follows d π d θ k k π θ k k A k S k Ψ + C m A k S k Ψ.

20 X. Wei, H. Yu, M. J. Neely/Online consrained MDPs 0 dπ Now, we use he above relaion o bound k v k for any r. dπ k v k d k π d k π + d k π v k A k S k Ψ + C m A k S k Ψ + d k π v k A k S k Ψ + C m A k S k Ψ = + d k π v k P k π k, 3 where he second inequaliy follows from he slow-updae condiion and he final equaliy follows from he fac ha given he funcion pah F, he following holds d k π v k = d k π v k P k. 4 π k o see his, noe ha from he proposed algorihm, he policy π k is deermined by F. hus, by definiion of saionary disribuion, given F, we know ha d k π = d k π P k, and i is π k enough o show ha given F, v k = v k Pk π k Firs of all, he sae disribuion v k is deermined by v k, πk and probabiliy ransiion from s o s, which are in urn deermined by F. hus, given F, for any s S k, and v k s = P rs = s s = s, F v k s, s S k P rs = s s = s, F = P rs = s a = a, s = s, F P ra = a s = s, F a A k = P a s, sp ra = a s = s, F = P a s, sπ k a s = P k π s, s, a A k a A k where he second inequaliy follows from he Assumpion., he hird equaliy follows from he fac ha π k is deermined by F, hus, for any, π k a s = P ra = a s = s, F, a A k, s S k, and he las equaliy follows from he definiion of ransiion probabiliy 3. his gives and hus 4 holds. v k s = s S k P π k. s, sv k s,

21 X. Wei, H. Yu, M. J. Neely/Online consrained MDPs We can ieraively apply he procedure 3 r imes as follows dπ k v k A k S k Ψ + C m A k S k Ψ + d k π d k π A k S k Ψ + C m A k S k Ψ + d k π v k A k S k Ψ + C m A k S k Ψ + d k π v k A k S k Ψ + C m A k S k Ψ r + P k π k P k π k P k π k d k π v k r k r k + P k π k P k π k r d k π v k P k π k, where he second inequaliy follows from he nonexpansive propery in l norm of he sochasic marix P k ha π k d k π d k π P k π k d π k d k π, and hen using he slow-updae condiion again. By Lemma., we have dπ k v k r Ieraing his inequaliy down o = 0 gives dπ k v k finishing he proof. /τ j=0 /τ j=0 A k S k Ψ + C m A k S k Ψ e j/τ r e j/τ r A k S k Ψ + C m A k S k Ψ A k S k Ψ + C m A k S k Ψ + e /τ d k π v k r r A k S k Ψ + C m A k S k Ψ e x/τ dx r x=0 A k S k Ψ + C m A k S k Ψ τr + e rτ + + dπ k 0 + e /r /τ + e rτ +. v k 0 P k π k e /r /τ 4... Benchmarking agains policies saring from saionary sae Combining he resuls derived so far, we have he following regre bound regarding any randomized saionary policy Π saring from is saionary sae disribuion d Π such ha d Π, Π in he consrain se G defined in. heorem 4.3. Le P be he sequence of randomized saionary policies resuling from he proposed algorihm wih V = and α =. Le d 0 be he saring sae of he proposed algorihm. For an one-line proof, see 39 in he appendix.

22 X. Wei, H. Yu, M. J. Neely/Online consrained MDPs For any randomized saionary policy Π saring from is saionary sae disribuion d Π such ha d Π, Π G, we have F d 0, P F d Π, Π O G i, d 0, P O m 3/ K K m 3/ K K A k S k, A k S k, i =,,, m. Proof. Firs of all, by Lemma., for any randomized saionary policy Π, here exiss some saionary sae-acion probabiliy vecors {θ k } K such ha θk Θ k, F d Π, Π = K =0 and G i, d Π, Π = K =0 g i,, θ k. As a consequence, d Π, Π G implies G i, d Π, Π = K =0 g i,, θ k 0, i {,,, m} and i follows {θ k } K is in he imaginary consrain se G defined in. hus, we are in a good shape applying heorem 4. from imaginary sysems. We hen spli F d 0, P F d Π, Π ino wo erms: F d 0, P F d 0, Π f k a k, s k d 0, P f k, θ k =0 =0 }{{} I By heorem 4., we ge II + =0 f k, θ k f, θ k } {{ } II K + Ψ S k A k + 5 mk Ψ. 5 We hen bound I. Consider each ime slo {0,,, }. We have f k, θ k = sπ k a sf k a, s f k a k, s k d 0, P = s S k a A k s S k a A k where he firs equaliy follows from he definiion of θ k d π k v k he following: Given a specific funcion pah F, he policy π k are fixed. hus, we have, v k f k a k, s k d 0, P, F = s S k v k a A k sπ k a sf k a, s,. and he second equaliy follows from and he rue sae disribuion sπ k a sf k a, s. f, θ k,

23 X. Wei, H. Yu, M. J. Neely/Online consrained MDPs 3 aking he full expecaion regarding he funcion pah gives he resul. hus, f k a k, s k d 0, P f k, θ k a s v k s d k π s π k s S k a A k Ψ v k d k π Ψ τr + C m A k S k Ψ + e τr + Ψ where he las inequaliy follows from Lemma 4.8. hus, i follows, I =0 τr + C m A k S k Ψ + e τr + Ψ τr + C m A k S k Ψ + ΨK =0 e x τr + dx τrψ + C m K A k S k + eψkτr. 6 Overall, combining 5,6 and subsiuing he consan C = Cm, K, Ψ, η defined in 0 gives he objecive regre bound. For he consrain violaion, we have G i, d 0, P = g k i, a, s d 0, P g k i, =0 = }{{} IV, θ + = g k i,, θ. } {{ } V he erm V can be readily bounded using heorem 4. as K g k i,, θk C + m A k S k ΨC + A k S k Ψ. =0 For he erm IV, we have g k i,, θk = g k i, ak, s k d 0, P = s S k a A k s S k a A k where he firs equaliy follows from he definiion of θ k d π k v k he following: Given a specific funcion pah F, he policy π k are fixed. hus, we have, v k g k a k, s k d 0, P, F = s S k v k a A k sπ k a sg k i, a, s sπ k a sg k i,, a, s and he second equaliy follows from and he rue sae disribuion sπ k a sg k a, s.

Taking the full expectation regarding the function path gives the result. Then, repeating the same proof as that of (26) shows that the term (IV) obeys the same bound as the one derived for the term (I) in (26). This finishes the proof of the constraint violation bound.

5. A more general regret bound against policies with arbitrary starting state

Recall that Theorem 4.3 compares the proposed algorithm with any randomized stationary policy $\Pi$ starting from its stationary state distribution $d_\Pi$, so that $(d_\Pi, \Pi) \in \mathcal{G}$. In this section, we generalize Theorem 4.3 and obtain a bound of the regret against all $(d_0, \Pi) \in \mathcal{G}$, where $d_0$ is an arbitrary starting state distribution (not necessarily the stationary state distribution).

The main technical difficulty in such a generalization is as follows. For any randomized stationary policy $\Pi$ such that $(d_0, \Pi) \in \mathcal{G}$, let $\{\theta_*^{(k)}\}_{k=1}^K$ be the stationary state-action probabilities such that $\theta_*^{(k)} \in \Theta^{(k)}$ and

\[ G_{i,T}(d_\Pi, \Pi) = \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle. \]

For some finite horizon $T$, there might exist some low-cost starting state distribution $d_0$ such that $G_{i,T}(d_0, \Pi) < G_{i,T}(d_\Pi, \Pi)$ for some $i \in \{1, 2, \dots, m\}$. As a consequence, one could have

\[ G_{i,T}(d_0, \Pi) \le 0 \quad \text{and} \quad \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle > 0. \]

This implies that although $(d_0, \Pi)$ is feasible for our true system, its stationary state-action probabilities $\{\theta_*^{(k)}\}_{k=1}^K$ can be infeasible with respect to the imaginary constraint set, and all our analysis so far fails to cover such randomized stationary policies.

To resolve this issue, we have to enlarge the imaginary constraint set so as to cover all state-action probabilities $\{\theta_*^{(k)}\}_{k=1}^K$ arising from any randomized stationary policy $\Pi$ such that $(d_0, \Pi) \in \mathcal{G}$. But a perturbation of the constraint set would also result in a perturbation of the objective in the imaginary system. Our main goal in this section is to bound such a perturbation and show that the perturbation bound leads to the final $O(\sqrt{T})$ regret bound.

5.1. A relaxed constraint set

We begin with a supporting lemma on the uniform mixing time bound over all joint randomized stationary policies. The proof is given in the appendix.

Lemma 5.1. Consider any randomized stationary policy $\Pi$ with arbitrary starting state distribution $d_0 \in \mathcal{S}^{(1)} \times \cdots \times \mathcal{S}^{(K)}$. Let $P_\Pi$ be the corresponding transition matrix on the product state space. Then, the following holds:

\[ \left\| (d_0 - d_\Pi) P_\Pi^t \right\|_1 \le e^{(r_1 - t)/r_1}, \quad \forall t \in \{0, 1, 2, \dots\}, \tag{27} \]

where $r_1$ is a fixed positive constant independent of $\Pi$.

Lemma 5.2 below shows that a relaxation of $O(1/T)$ on the imaginary constraint set is enough to cover all the $\{\theta_*^{(k)}\}_{k=1}^K$ discussed at the beginning of this section.
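To make the role of Lemma 5.1 concrete, the following minimal Python sketch tracks the gap $\|(d_0 - d_\Pi) P_\Pi^t\|_1$ for a single two-state chain. The transition matrix `P`, the starting distribution `d0`, and all helper names are illustrative assumptions, not objects taken from the paper. The only point is that a geometrically decaying gap has a sum over $t$ that stays bounded however long the horizon is, which is the kind of horizon-independent constant the relaxed constraint set relies on.

```python
def step(dist, P):
    """Advance a row distribution one slot: returns dist @ P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def l1_distance(u, v):
    """l1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))


def mixing_gaps(d0, d_stat, P, horizon):
    """Return the list of ||d0 P^t - d_stat||_1 for t = 0, ..., horizon - 1."""
    gaps, d = [], list(d0)
    for _ in range(horizon):
        gaps.append(l1_distance(d, d_stat))
        d = step(d, P)
    return gaps


# Hypothetical two-state chain induced by some fixed policy (numbers are made up).
P = [[0.9, 0.1],
     [0.2, 0.8]]
d_stat = [2.0 / 3.0, 1.0 / 3.0]  # stationary distribution: d_stat @ P == d_stat
d0 = [0.0, 1.0]                  # arbitrary (non-stationary) starting distribution

gaps = mixing_gaps(d0, d_stat, P, horizon=50)
total = sum(gaps)                # accumulated gap over the whole horizon
```

For this particular matrix the gap shrinks by the factor 0.7 (its second eigenvalue) every slot, so the accumulated gap stays below $(4/3)/0.3$ for every horizon.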

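Lemma 5.2. For any $T \in \{1, 2, \dots\}$ and any randomized stationary policy $\Pi$ with arbitrary starting state distribution $d_0 \in \mathcal{S}^{(1)} \times \cdots \times \mathcal{S}^{(K)}$ and stationary state-action probabilities $\{\theta_*^{(k)}\}_{k=1}^K$,

\[ \left| \sum_{t=0}^{T-1} \mathbb{E}\left( \sum_{k=1}^K f_t^{(k)}(a_t^{(k)}, s_t^{(k)}) \,\Big|\, d_0, \Pi \right) - \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta_*^{(k)} \rangle \right| \le C_1 K \Psi, \tag{28} \]

\[ \left| \sum_{t=0}^{T-1} \mathbb{E}\left( \sum_{k=1}^K g_{i,t}^{(k)}(a_t^{(k)}, s_t^{(k)}) \,\Big|\, d_0, \Pi \right) - \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle \right| \le C_1 K \Psi, \tag{29} \]

where $C_1$ is an absolute constant. In particular, $\{\theta_*^{(k)}\}_{k=1}^K$ is contained in the following relaxed constraint set

\[ \overline{\mathcal{G}} := \left\{ \theta^{(k)} \in \Theta^{(k)},\ k = 1, 2, \dots, K : \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta^{(k)} \rangle \le \frac{C_1 K \Psi}{T},\ i = 1, 2, \dots, m \right\}, \]

where $C_1$ depends only on the universal positive constant $r_1 > 0$ of Lemma 5.1.

Proof. Let $v_t$ be the joint state distribution on $\mathcal{S}^{(1)} \times \cdots \times \mathcal{S}^{(K)}$ at time $t$ under policy $\Pi$. Since $\Pi$ is a fixed policy independent of $g_{i,t}^{(k)}$, and since the probability transition is assumed to be independent of the function path given any state and action, the function $g_{i,t}^{(k)}$ and the state-action pair $(a_t^{(k)}, s_t^{(k)})$ are mutually independent. Thus, for any $t \in \{0, 1, 2, \dots, T-1\}$,

\[ \mathbb{E}\left( \sum_{k=1}^K g_{i,t}^{(k)}(a_t^{(k)}, s_t^{(k)}) \,\Big|\, d_0, \Pi \right) = \sum_{s \in \mathcal{S}^{(1)} \times \cdots \times \mathcal{S}^{(K)}} \ \sum_{a \in \mathcal{A}^{(1)} \times \cdots \times \mathcal{A}^{(K)}} v_t(s)\, \Pi(a \mid s)\, \mathbb{E}\left( \sum_{k=1}^K g_{i,t}^{(k)}(a^{(k)}, s^{(k)}) \right), \]

where $s = [s^{(1)}, \dots, s^{(K)}]$ and $a = [a^{(1)}, \dots, a^{(K)}]$, and the latter expectation is taken with respect to $g_{i,t}^{(k)}$ (i.e. the random variable $w_t$). On the other hand, for any randomized stationary policy $\Pi$, the corresponding stationary state-action probabilities can be expressed as $\{\theta_*^{(k)}\}_{k=1}^K$ with $\theta_*^{(k)} \in \Theta^{(k)}$. Thus,

\[ \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle = \sum_{s \in \mathcal{S}^{(1)} \times \cdots \times \mathcal{S}^{(K)}} \ \sum_{a \in \mathcal{A}^{(1)} \times \cdots \times \mathcal{A}^{(K)}} d_\Pi(s)\, \Pi(a \mid s)\, \mathbb{E}\left( \sum_{k=1}^K g_{i,t}^{(k)}(a^{(k)}, s^{(k)}) \right). \]

Hence, we can control the difference:

\[ \left| \sum_{t=0}^{T-1} \mathbb{E}\left( \sum_{k=1}^K g_{i,t}^{(k)}(a_t^{(k)}, s_t^{(k)}) \,\Big|\, d_0, \Pi \right) - \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle \right| \le \sum_{t=0}^{T-1} \sum_{s} \sum_{a} \left| v_t(s) - d_\Pi(s) \right| \Pi(a \mid s)\, K\Psi \]
\[ \le K\Psi \sum_{t=0}^{T-1} \left\| v_t - d_\Pi \right\|_1 \le K\Psi \sum_{t=0}^{T-1} e^{(r_1 - t)/r_1} \le eK\Psi \left( 1 + \int_0^{\infty} e^{-t/r_1}\, dt \right) = e(1 + r_1) K\Psi, \]

where the third inequality follows from Lemma 5.1, and the fourth bounds the sum of a decreasing sequence by its first term plus the corresponding integral. Taking $C_1 = e(1 + r_1)$ finishes the proof of (29), and (28) can be proved in a similar way.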

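In particular, for any randomized stationary policy $\Pi$ that satisfies the constraint, we have

\[ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle \le \left| \sum_{t=0}^{T-1} \mathbb{E}\left( \sum_{k=1}^K g_{i,t}^{(k)}(a_t^{(k)}, s_t^{(k)}) \,\Big|\, d_0, \Pi \right) - \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle \right| + \sum_{t=0}^{T-1} \mathbb{E}\left( \sum_{k=1}^K g_{i,t}^{(k)}(a_t^{(k)}, s_t^{(k)}) \,\Big|\, d_0, \Pi \right) \le C_1 K\Psi + 0 = C_1 K\Psi, \]

finishing the proof.

5.2. Best stationary performance over the relaxed constraint set

Recall that the best stationary performance in hindsight over all randomized stationary policies in the constraint set $\mathcal{G}$ can be obtained as the minimum achieved by the following linear program:

\[ \min \ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta^{(k)} \rangle \tag{30} \]
\[ \text{s.t.} \ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta^{(k)} \rangle \le 0, \quad i = 1, 2, \dots, m. \tag{31} \]

On the other hand, if we consider all the randomized stationary policies contained in the original constraint set, then, by Lemma 5.2, the relaxed constraint set $\overline{\mathcal{G}}$ contains all such policies, and the best stationary performance over this relaxed set comes from the minimum achieved by the following perturbed linear program:

\[ \min \ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta^{(k)} \rangle \tag{32} \]
\[ \text{s.t.} \ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta^{(k)} \rangle \le C_1 K\Psi, \quad i = 1, 2, \dots, m. \tag{33} \]

We aim to show that the minimum achieved by (32)-(33) is not far away from that of (30)-(31). In general, such a conclusion is not true due to the unboundedness of Lagrange multipliers in constrained optimization. However, since Slater's condition holds in our case, the perturbation can be bounded via the following well-known Farkas' Lemma [3]:

Lemma 5.3 (Farkas' Lemma). Consider a convex program with objective $f(x)$ and constraint functions $g_i(x)$, $i = 1, 2, \dots, m$:

\[ \min \ f(x), \tag{34} \]
\[ \text{s.t.} \ g_i(x) \le b_i, \quad i = 1, 2, \dots, m, \tag{35} \]
\[ x \in \mathcal{X}, \tag{36} \]

for some convex set $\mathcal{X} \subseteq \mathbb{R}^n$. Let $x^*$ be one of the solutions to the above convex program. Suppose there exists $\tilde{x} \in \mathcal{X}$ such that $g_i(\tilde{x}) < 0$ for all $i \in \{1, 2, \dots, m\}$. Then, there exists a separation hyperplane parametrized by $(1, \mu_1, \mu_2, \dots, \mu_m)$ such that $\mu_i \ge 0$ and

\[ f(x) + \sum_{i=1}^m \mu_i g_i(x) \ge f(x^*) + \sum_{i=1}^m \mu_i b_i, \quad \forall x \in \mathcal{X}. \]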
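As a sanity check on the inequality in Farkas' Lemma, here is a small Python sketch on a one-dimensional toy problem. Everything in it is an assumption made for illustration: the objective `f`, the constraint `g`, the grid `X` standing in for the convex set, and the multiplier `mu = 2.0`, which is computed by hand from the stationarity condition $f'(x^*) + \mu g'(x^*) = 0$ for this specific problem. The sketch also relaxes the constraint level by a small `eps` to show how the multiplier caps the resulting drop in the optimal value.

```python
def f(x):
    """Toy convex objective (hypothetical)."""
    return (x - 2.0) ** 2


def g(x):
    """Toy convex constraint function; x = 0 gives g = -1 < 0 (Slater point)."""
    return x - 1.0


# Grid over the convex set [0, 2], used in place of an exact solver.
X = [i / 1000.0 for i in range(2001)]


def solve(b, tol=1e-12):
    """Brute-force minimizer of f over {x in X : g(x) <= b}."""
    feasible = [x for x in X if g(x) <= b + tol]
    x_best = min(feasible, key=f)
    return x_best, f(x_best)


b = 0.0
x_star, f_star = solve(b)   # constrained optimum, attained at x = 1
mu = 2.0                    # hand-computed multiplier for this toy problem

# Left-hand side of the lemma, minimized over the whole set X.
lagrangian_min = min(f(x) + mu * g(x) for x in X)

# Relax the constraint level by eps and compare optimal values.
eps = 0.1
_, f_relaxed = solve(b + eps)
objective_drop = f_star - f_relaxed
```

In this toy instance the optimal value falls from 1 to about 0.81 when the constraint level is relaxed by 0.1, which is within the cap $\mu \cdot \varepsilon = 0.2$; the same mechanism is what makes a bound on the multiplier translate into a bound on the perturbation of the objective.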

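The parameter $\mu = (\mu_1, \mu_2, \dots, \mu_m)$ is usually referred to as a Lagrange multiplier. From the geometric perspective, Farkas' Lemma states that if Slater's condition holds, then there exists a non-vertical separation hyperplane supported at $(f(x^*), b_1, \dots, b_m)$ that contains the set $\{ (f(x), g_1(x), \dots, g_m(x)) : x \in \mathcal{X} \}$ on one side. Thus, in order to bound the perturbation of the objective with respect to the perturbation of the constraint level, we need to bound the slope of the supporting hyperplane from above, which boils down to controlling the magnitude of the Lagrange multiplier. This is summarized in the following lemma:

Lemma 5.4 (Lemma 1 of [30]). Consider the convex program (34)-(36), and define the Lagrange dual function $q(\mu) = \inf_{x \in \mathcal{X}} \left\{ f(x) + \sum_{i=1}^m \mu_i (g_i(x) - b_i) \right\}$. Suppose there exists $\tilde{x} \in \mathcal{X}$ such that $g_i(\tilde{x}) - b_i \le -\eta$ for all $i \in \{1, 2, \dots, m\}$, for some positive constant $\eta > 0$. Then, the level set $\mathcal{V}_{\bar{\mu}} = \{ \mu_1, \mu_2, \dots, \mu_m \ge 0 : q(\mu) \ge q(\bar{\mu}) \}$ is bounded for any nonnegative $\bar{\mu}$. Furthermore, we have

\[ \max_{\mu \in \mathcal{V}_{\bar{\mu}}} \|\mu\|_2 \le \frac{1}{\min_{1 \le i \le m} \{ -g_i(\tilde{x}) + b_i \}} \left( f(\tilde{x}) - q(\bar{\mu}) \right). \]

The technical importance of these two lemmas in the current context is contained in the following corollary.

Corollary 5.1. Let $\{\theta_*^{(k)}\}_{k=1}^K$ and $\{\overline{\theta}_*^{(k)}\}_{k=1}^K$ be solutions to (30)-(31) and (32)-(33), respectively. Then, the following holds:

\[ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \overline{\theta}_*^{(k)} \rangle \ge \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta_*^{(k)} \rangle - \frac{2 C_1 K^2 \sqrt{m}\, \Psi^2}{\eta}. \]

Proof. Take

\[ f(\theta^{(1)}, \dots, \theta^{(K)}) = \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta^{(k)} \rangle, \qquad g_i(\theta^{(1)}, \dots, \theta^{(K)}) = \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta^{(k)} \rangle, \]
\[ \mathcal{X} = \Theta^{(1)} \times \Theta^{(2)} \times \cdots \times \Theta^{(K)}, \]

and $b_i = 0$ in Farkas' Lemma. We then have the following display:

\[ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta^{(k)} \rangle + \sum_{i=1}^m \mu_i \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta^{(k)} \rangle \ge \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta_*^{(k)} \rangle, \]

for any $(\theta^{(1)}, \dots, \theta^{(K)}) \in \mathcal{X}$ and some $\mu_1, \mu_2, \dots, \mu_m \ge 0$. In particular, substituting $(\overline{\theta}_*^{(1)}, \dots, \overline{\theta}_*^{(K)})$ into the above display gives

\[ \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \overline{\theta}_*^{(k)} \rangle \ge \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta_*^{(k)} \rangle - \sum_{i=1}^m \mu_i \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \overline{\theta}_*^{(k)} \rangle \ge \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta_*^{(k)} \rangle - C_1 K\Psi \sum_{i=1}^m \mu_i, \tag{37} \]

where the final inequality follows from the fact that $(\overline{\theta}_*^{(1)}, \dots, \overline{\theta}_*^{(K)})$ satisfies the relaxed constraint $\sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \overline{\theta}_*^{(k)} \rangle \le C_1 K\Psi$ and $\mu_i \ge 0$ for all $i \in \{1, 2, \dots, m\}$. Now we need to bound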

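the magnitude of the Lagrange multiplier $(\mu_1, \dots, \mu_m)$. Note that in our scenario,

\[ \left| f(\theta^{(1)}, \dots, \theta^{(K)}) \right| = \left| \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle f_t^{(k)}, \theta^{(k)} \rangle \right| \le T K \Psi, \]

and the Lagrange multiplier $\mu$ is the solution to the maximization problem

\[ \max_{\mu_i \ge 0,\ i \in \{1, 2, \dots, m\}} q(\mu), \]

where $q(\mu)$ is the dual function defined in Lemma 5.4. Thus, it must be in any super level set $\mathcal{V}_{\bar{\mu}} = \{ \mu_1, \mu_2, \dots, \mu_m \ge 0 : q(\mu) \ge q(\bar{\mu}) \}$. In particular, taking $\bar{\mu} = 0$ in Lemma 5.4 and using Slater's condition (8), summed over the $T$ slots, there exists $(\tilde{\theta}^{(1)}, \dots, \tilde{\theta}^{(K)})$ such that

\[ \sum_{i=1}^m \mu_i \le \sqrt{m}\, \|\mu\|_2 \le \frac{\sqrt{m}}{\eta T} \left( f(\tilde{\theta}^{(1)}, \dots, \tilde{\theta}^{(K)}) - \inf_{(\theta^{(1)}, \dots, \theta^{(K)}) \in \mathcal{X}} f(\theta^{(1)}, \dots, \theta^{(K)}) \right) \le \frac{2 \sqrt{m}\, \Psi K}{\eta}, \]

where the final inequality follows from the deterministic bound of $|f(\theta^{(1)}, \dots, \theta^{(K)})|$ by $T \Psi K$. Substituting this bound into (37) gives the desired result.

As a simple consequence of the above corollary, we have our final bound on the regret and constraint violation regarding any $(d_0, \Pi) \in \mathcal{G}$.

Theorem 5.1. Let $\mathcal{P}$ be the sequence of randomized stationary policies resulting from the proposed algorithm with $V = \sqrt{T}$ and $\alpha = T$. Let $d_0$ be the starting state of the proposed algorithm. For any randomized stationary policy $\Pi$ starting from the state $d_0$ such that $(d_0, \Pi) \in \mathcal{G}$, we have

\[ F_T(d_0, \mathcal{P}) - F_T(d_0, \Pi) \le O\left( m^{3/2} K^2 \sum_{k=1}^K |\mathcal{A}^{(k)}| |\mathcal{S}^{(k)}| \cdot \sqrt{T} \right), \]
\[ G_{i,T}(d_0, \mathcal{P}) \le O\left( m^{3/2} K^2 \sum_{k=1}^K |\mathcal{A}^{(k)}| |\mathcal{S}^{(k)}| \cdot \sqrt{T} \right), \quad i = 1, 2, \dots, m. \]

Proof. Let $\Pi_*$ be the randomized stationary policy corresponding to the solution $\{\theta_*^{(k)}\}_{k=1}^K$ to (30)-(31), and let $\Pi$ be any randomized stationary policy such that $(d_0, \Pi) \in \mathcal{G}$. Since

\[ G_{i,T}(d_{\Pi_*}, \Pi_*) = \sum_{t=0}^{T-1} \sum_{k=1}^K \mathbb{E}\langle g_{i,t}^{(k)}, \theta_*^{(k)} \rangle \le 0, \]

it follows that $(d_{\Pi_*}, \Pi_*) \in \mathcal{G}$. By Theorem 4.3, we know that

\[ F_T(d_0, \mathcal{P}) - F_T(d_{\Pi_*}, \Pi_*) \le O\left( m^{3/2} K^2 \sum_{k=1}^K |\mathcal{A}^{(k)}| |\mathcal{S}^{(k)}| \cdot \sqrt{T} \right), \]

and $G_{i,T}(d_0, \mathcal{P})$ satisfies the bound in the statement. It is then enough to bound $F_T(d_{\Pi_*}, \Pi_*) - F_T(d_0, \Pi)$. We split it into two terms:

\[ F_T(d_{\Pi_*}, \Pi_*) - F_T(d_0, \Pi) \le \underbrace{F_T(d_{\Pi_*}, \Pi_*) - F_T(d_\Pi, \Pi)}_{\text{(I)}} + \underbrace{F_T(d_\Pi, \Pi) - F_T(d_0, \Pi)}_{\text{(II)}}. \]

By (28) in Lemma 5.2, the term (II) is bounded by $C_1 K\Psi$. It remains to bound the first term. Since $(d_0, \Pi) \in \mathcal{G}$, by Lemma 5.2, the corresponding state-action probabilities $\{\theta^{(k)}\}_{k=1}^K$ of $\Pi$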


More information

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models.

Technical Report Doc ID: TR March-2013 (Last revision: 23-February-2016) On formulating quadratic functions in optimization models. Technical Repor Doc ID: TR--203 06-March-203 (Las revision: 23-Februar-206) On formulaing quadraic funcions in opimizaion models. Auhor: Erling D. Andersen Convex quadraic consrains quie frequenl appear

More information

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY

RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY ECO 504 Spring 2006 Chris Sims RANDOM LAGRANGE MULTIPLIERS AND TRANSVERSALITY 1. INTRODUCTION Lagrange muliplier mehods are sandard fare in elemenary calculus courses, and hey play a cenral role in economic

More information

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC

1. An introduction to dynamic optimization -- Optimal Control and Dynamic Programming AGEC This documen was generaed a :37 PM, 1/11/018 Copyrigh 018 Richard T. Woodward 1. An inroducion o dynamic opimiaion -- Opimal Conrol and Dynamic Programming AGEC 64-018 I. Overview of opimiaion Opimiaion

More information

Lecture 4 Notes (Little s Theorem)

Lecture 4 Notes (Little s Theorem) Lecure 4 Noes (Lile s Theorem) This lecure concerns one of he mos imporan (and simples) heorems in Queuing Theory, Lile s Theorem. More informaion can be found in he course book, Bersekas & Gallagher,

More information

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE

MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE Topics MODULE 3 FUNCTION OF A RANDOM VARIABLE AND ITS DISTRIBUTION LECTURES 2-6 3. FUNCTION OF A RANDOM VARIABLE 3.2 PROBABILITY DISTRIBUTION OF A FUNCTION OF A RANDOM VARIABLE 3.3 EXPECTATION AND MOMENTS

More information

Scheduling of Crude Oil Movements at Refinery Front-end

Scheduling of Crude Oil Movements at Refinery Front-end Scheduling of Crude Oil Movemens a Refinery Fron-end Ramkumar Karuppiah and Ignacio Grossmann Carnegie Mellon Universiy ExxonMobil Case Sudy: Dr. Kevin Furman Enerprise-wide Opimizaion Projec March 15,

More information

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017

Two Popular Bayesian Estimators: Particle and Kalman Filters. McGill COMP 765 Sept 14 th, 2017 Two Popular Bayesian Esimaors: Paricle and Kalman Filers McGill COMP 765 Sep 14 h, 2017 1 1 1, dx x Bel x u x P x z P Recall: Bayes Filers,,,,,,, 1 1 1 1 u z u x P u z u x z P Bayes z = observaion u =

More information

10. State Space Methods

10. State Space Methods . Sae Space Mehods. Inroducion Sae space modelling was briefly inroduced in chaper. Here more coverage is provided of sae space mehods before some of heir uses in conrol sysem design are covered in he

More information

Object tracking: Using HMMs to estimate the geographical location of fish

Object tracking: Using HMMs to estimate the geographical location of fish Objec racking: Using HMMs o esimae he geographical locaion of fish 02433 - Hidden Markov Models Marin Wæver Pedersen, Henrik Madsen Course week 13 MWP, compiled June 8, 2011 Objecive: Locae fish from agging

More information

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality

Matrix Versions of Some Refinements of the Arithmetic-Geometric Mean Inequality Marix Versions of Some Refinemens of he Arihmeic-Geomeric Mean Inequaliy Bao Qi Feng and Andrew Tonge Absrac. We esablish marix versions of refinemens due o Alzer ], Carwrigh and Field 4], and Mercer 5]

More information

An Introduction to Malliavin calculus and its applications

An Introduction to Malliavin calculus and its applications An Inroducion o Malliavin calculus and is applicaions Lecure 5: Smoohness of he densiy and Hörmander s heorem David Nualar Deparmen of Mahemaics Kansas Universiy Universiy of Wyoming Summer School 214

More information
