Power Constrained DTNs: Risk MDP-LP Approach

Size: px

Start display at page:

Download "Power Constrained DTNs: Risk MDP-LP Approach"

Gregory Shaw
5 years ago
Views:

1 Power Constrined DTNs: Risk MDP-LP Approch Atul Kumr IEOR, IIT Bomby, Indi Veerrun Kvith IEOR, IIT Bomby, Indi N Hemchndr nh@iitb.c.in, IEOR, IIT Bomby, Indi. Abstrct Dely Tolernt Networks (DTNs) hve gined importnce in the recent pst, s cost-effective lterntive, in scenrios where delys cn be ccommodted. They work well in discretely connected network, where there is no direct connectivity between some/ll components of the system. But the mobility of nodes cretes occsionl contct opportunities. The rndomly moving nodes cooperte to help fixed source, in delivering messge to fr wy destintion within the given time threshold. The objective is to optimize the delivery success probbility, which turns out to be risk sensitive cost. The success probbility depends upon the contct rtes, which in turn depend upon the power used by the nodes to remin visible. The more the power used by node, the lrger is the rdius for which it is visible. However these nodes re power constrined. This leds to constrined finite horizon, Risk sensitive Mrkov Decision Process (MDP). In this pper we propose liner progrm (LP) bsed pproch to solve the corresponding dynmic progrmming equtions. This pproch enbles us in hndling the constrints. We showed using numericl simultions tht, given hrd power constrint, the solution of the constrined MDP performs significntly superior in comprison with solution obtined by optimizing joint cost. DTNs, Power lloction, Risk sensitive cost, Liner progrms, Dynmic progrmming. I. INTRODUCTION We consider lrge re with N ctive nd freely moving mobiles s in [1, wherein connectivity between different devices is vilble only occsionlly. The im is to trnsfer messge from sttic source to fixed destintion within the prescribed dedline, using the occsionl contcts between the vrious moving elements. A source trnsfers the messge to ny mobile tht comes in contct with it nd the messge is delivered to the destintion if ny one of the mobiles with messge come in contct with it. One rely mobile cnnot trnsfer the messge to nother nd this is clled the two-hop protocol (e.g., [2). And these networks re clled Dely Tolernt Networks (DTNs). Alterntively DTNs cn operte using full epidemics, i.e, the relys cn trnsfer the messge to ny other rely (see for exmple [5). The messge trnsmission performnce with full epidemics is much better; however, in terms of the power consumed it is inferior. Further there cn be flooding of messges cross the network. DTNs re operted in vrious configurtions nd using different protocols nd there is vst literture nlyzing these networks (e.g., [1, [2, [5, etc, nd references therein). One of the common techniques to nlyze these networks is using men-field dynmics (e.g., [5). This pproch is vlid in scenrios with lrge popultion. There re lso ppers tht consider the rndom system dynmics which is more ccurte with limited popultion (e.g., [1). We lso consider the rndom dynmics. An element interested in mking contct trnsmits becons (short pulses) regulrly nd contct is estblished with mobile if the lter receives one such becon. The rnge of visibility is proportionl to the power trnsmitted. Thus the more the power used, the better re contct opportunities nd the better is the probbility of successful messge trnsmission. However, the mobiles re power constrined nd the min im of this pper is to mximize the probbility of successful messge delivery, under the given power constrints. The contct process is modeled by Poisson process ([2), hence the probbility of success or equivlently probbility of delivery filure includes terms composed of powers of exponent, resulting in Risk sensitive Mrkov Decision Process (MDP) cost. Previously in DTN relted literture, such costs re hndled by exchnging the expected vlue nd the exponent using Jensen s inequlity nd the solution is obtined by optimizing bound on the objective function (e.g., [3 etc.,) Recently in [1 uthors solved the direct problem, using risk MDP pproch. However, they solved the problem using soft constrints: joint cost composed of successful delivery probbility nd the power trnsmitted is proposed, nd is optimized. While in this pper we consider the control problem with hrd constrints on the power. In [4 we showed tht the solution of risk MDP problem cn be obtined by solving corresponding Liner Progrm (LP). We obtined the solution to the power constrined DTNs by solving the LPs provided in the technicl report [4. When one is interested in hrd power control problem, the solution obtined using joint cost of [1 is obviously inferior to our direct solution of the constrined problem. More interestingly we noticed huge improvement in the performnce, becuse rndomized policy optimizes the hrd problem while the soft problem (joint cost problem) is optimized by pure policy. Thus our newly proposed LP bsed solutions re very useful in the context of hrd power constrints. II. SYSTEM MODEL AND PROBLEM DEFINITION A sttic source hs to trnsfer messge to sttic destintion within the given dedline T, nd they re sufficiently fr wy to hve ny direct communiction. The re surrounding the two hs N coopertive nd moving nodes (mobiles) tht ssist the source to deliver the messge. The source cn trnsmit only to those mobiles tht rrive in its rnge of /15/ 2015 IFIP 154

2 trnsmission. Similrly destintion cn receive informtion from only those mobiles, tht rrive within the rnge of trnsmission. And the rnge depends upon the power used for trnsmission. We sy contct occurred whenever mobile comes in the communiction rnge of the source/destintion. In lrge res with smll trnsmission rnge, the contcts re rre. In such scenrios, the contct process cn be modeled by Poisson process [2, for vriety of mobility models like rndom wlk, rndom wypoint, etc. We ssume tht the contct time is sufficient to trnsfer the messge. The source trnsfers the messge to the contcted mobile. We refer these mobiles s infected mobiles. If there is contct between the destintion nd n infected mobile within dedline T, then the messge delivery is ccomplished. Otherwise, delivery is filed. The probbility of successful delivery depends upon the power used by the source. The source derives power from bttery nd hence is power constrined. In fct, the source is provided with fixed mount of power nd it hs to ccomplish its gol utilizing the vilble power leding to hrd power constrint. The source spends power for two purposes: 1) for trnsmitting becons, to show its presence; 2) to trnsfer messge to the contcted mobiles. The power spent per trnsmitting messge could be significntly lrger thn the power spent for beconing. However beconing needs to be done t regulr instnces while the contcts re rre (in time frme of few minutes one cn t mximum mke one contct), mking the second component negligible. If the source trnsmits becons with higher power, the contct rnge increses, which further increses the contct opportunities. However the power is consumed within shorter time. On the other hnd if it trnsmits becons with lower power, it cn remin ctive for longer period, but with smller contct rnge. Thus there is n inherent trdeoff between remining ctive for longer durtion nd remining ctive with lrger contct rnge. Mobiles contcted during the erlier stges hve better chnces of delivery. Hence it might be dvntgeous to consider vrying powers for trnsmitting becons, cross the entire delivery durtion. A. Resource lloction policy We consider time slotted system, nd becons re trnsmitted with constnt power in one time slot. Without loss of generlity we ssume unit time slots. A policy represents the decisions of power levels trnsmitted in ech time slot nd the im of our pper is to obtin power policy which mximizes the probbility of successful delivery or equivlently minimizes the probbility of delivery filure. The rte of source-mobile contct Poisson process is represented by λ while tht of the destintion-mobile is given by ν. In [1, it is shown tht the contct rtes re proportionl to the power used nd hence we consider n equivlent policy in terms of (source) contct rtes. The system hs M different choices of trnsmit powers tht cn be used in ny time slot nd let Λ = {λ 0,..., λ M } represent the corresponding set of ll possible source contct rtes. Let Y t represent the contct rte chosen in time slot t. Vector Π = {π 1,, π T 1 } represents rndomized policy, where π t for ech t, is probbility distribution over Λ: π t (λ) = P rob(y t = λ) for ny λ Λ. B. Probbility of filure given policy The probbility of filure P f (Π) for given policy Π is derived in [1 nd we briefly summrize the sme here. Let X t be the number of mobiles infected t the beginning of time slot t. The sequence X t is controlled Mrkov chin, controlled by policy Π. The trnsition probbility mtrix of this controlled Mrkov chin is given by ([1): Ps λ 2 (N s 1 ), if s 1 + s 2 N p(s 1 + s 2 s 1, λ) = 0, else ( ) r with Ps λ 2 (r) := (1 e λ ) s2 e λ(r s2). s 2 Bsiclly the number infected increses by s 2 if ny s 2 mong the non-infected (N s 1 ) mobiles contct the source nd the bove is the probbility of precisely this event. A filure event occurs, when none of the X t infected mobiles contct the destintion in time slot t nd if this is true for ll the time slots. Probbility of filure is clculted by conditioning on Mrkov chin trjectory {X t } t T nd is given by (see [1 for detils): P f (Π) = E α,π [ e ν t Xt. (1) In the bove E α,π represents the expecttion under policy Π nd when the initil condition X 0 is distributed ccording to α, written s X 0 α. Here P rob(x 0 = s) = α(s). C. Totl power spent given policy The contct rte λ is proportionl to p β, where p is the trnsmitted power nd β is constnt depending upon propgtion chrcteristics of the re in which the mobiles re operting (Appendix of [1). In other words if one chooses rte λ, the power trnsmitted is proportionl to λ β. Without loss of generlity, let the constnt of proportionlity be one. Thus the totl (rndom) power spent over the T time slots equls: D. Power control problem T 1 P(Π) = Y β t. (2) t=0 The problem is to minimize the probbility of filure P f, given hrd constrint B on the verge totl power, E α,π [P, spent by the source: [ min Π Eα,Π e ν T t=0 Xt [ T 1 such tht E α,π t=0 Y β t B. (3) 155

3 Alterntively, in [1 the uthors consider joint cost depending both upon the probbility of filure P f nd e hp, term proportionl to the power [ spent: min Π Eα,Π e ν T t=0 Xt+h T 1 t=0 Y β t. (4) In the bove, h defines the weight fctor given for totl power term in the joint cost. We refer this s soft constrint (SC) problem, becuse this does not gurntee to operte within given hrd bound on the power spent. We showed in the technicl report ([4) tht the solution of constrined risk MDP problem cn be derived using the solution of n pproprite Liner Progrm (19) given in the Appendix. Thus we cn directly obtin the solution to the hrd constrint (HC) problem. The detils re given below in section III. A direct solution would obviously perform better, we compre the two solutions in section IV to determine the percentge of improvement obtined. III. LP BASED APPROACH The soft constrint (SC) problem (4) cn be cst s risk sensitive MDP problem of Appendix. The joint cost in (4), except for the logrithmic function, is similr to the risk sensitive cost J(α, Π) given s eqution (9) of Appendix, with running nd terminl costs given by rt SC (s, λ) = νs + hλ β nd rt SC (s) = νs. (5) By monotonicity, the optimiztion of cost is equivlent to optimizing the logrithm of the sme cost. The corresponding dynmic progrmming (DP) equtions of the SC problem re given by (11) fter substituting the running nd terminl costs ppropritely. One cn obtin the nlysis of the optiml policy by solving the dul LP given by (13) of Appendix. The hrd constrined (HC) problem (3) is similr to the constrined risk MDP problem (17) of Appendix. The corresponding running nd terminl costs re rt HC (s, λ) = νs nd rt HC (s) = νs, (6) while the constrint function f HC t (s, λ) = λ β for ll t. (7) Its optiml policy cn be obtined by solving Dul LP given by (19). IV. NUMERICAL ANALYSIS In [1, Lemm 2, uthors obtined structurl properties of the policy, tht optimizes the SC problem. Lemm 2 estblishes the existence of switch off threshold s off on the number infected. The optiml policy switches off (zero contct rte) the becon trnsmission, once the number infected reches the threshold. It lso showed tht the contct rte chosen below the threshold is lwys non-zero. However this sttement needs smll correction 1. For every s < s off, there exists threshold Ts (depending upon s) such tht λ t (s) λ 1 for ll t < T s nd λ t (s) = 0 for ll t T s. Thus the difference is tht, beyond s off it is lwys OFF s in [1. However below switch off popultion threshold, the ctul switch off threshold depends upon the number infected s. A. Verifiction nd comprison of SC, HC problems We begin with the verifiction of our solution to SC problem. We obtin the required solution by solving the LP (13) with the running nd terminl costs s given by (5). For simultions, we used Mtlb nd AMPL. We did most of the coding in Mtlb except for LP prt. We used AMPL to model the LP nd Gurobi solver to solve the LP. The solution x of the LP provides the optiml policy Π x s given by eqution (15) of Appendix. We then verify tht the solution stisfies the Lemm 2 of [1, fter the correction. We consider n exmple with N = 15, S = {0, 1,... 15}, T = 20, h = 20, ν = 0.1, β = 2.1 nd λ = {0, 0.1,..., 0.3}. For this exmple, the s off = s given by [1, Lemm 2. The simultion results re following the structure given by the [1, Lemm 2, s seen from the Tble I. For exmple for ll s s off, Ts = 0 nd for others it is non-zero. We hve conducted few more exmples nd verified the sme. In similr wy we obtin the solution for HC problem, now solving LP (19) with running nd terminl costs given by (6). We lso consider the constrint given by (7). We consider the following procedure for verifiction. We first solve SC problem for some vlue of weight fctor h. We compute the totl power spent by the SC problem using gin the dditionl stte component Ψ of the Appendix. Tht is, we solve SC problem lso using LP (19) with ft SC 0 for ll t. We compute the totl verge power PSC spent by the system under SC optiml policy Π SC, once gin using the eqution (18) with f t ( t ) = β t nd x = x Π SC. We then obtin the solution of HC problem with bound B set to PSC. Note here tht this procedure is only used for computing the power utilized under the lredy computed optiml policy Π SC, nd not for the purpose of constrined optimiztion. With this procedure we noticed tht both the policies consume sme power, i.e., PSC = P HC. But there is good improvement in the performnce with HC policy (see Tbles II-III). In the limited exmples tht we conducted, we observed n improvement s high s 26%. In ll these exmples we set M = 1, resulting in ON-OFF control. Thus, when the two problems obtin optiml policies with the sme power constrint, the HC solution performs superior, 1 In [1, pge 9 in the proof of Lemm2, the line fter the sentence strting with When n < n off, Q nλ β 1 < λ 1... need not be true lwys. There cn be scenrios in which f T 1 (0) < f T 1 (λ 1 ). However the lines fter tht re correct. Hence for ny n < n off, if there exists t + 1 such tht λ t+1 (n) λ 1 then for ll τ t λ τ (n) λ 1. Thus we hve the bove modifiction with Tn = t, the first t for which λ t+1 (n) λ

4 Sttes (s) Threshold time (Ts ) TABLE I VERIFICATION OF SC POLICY USING [1, LEMMA 2 T, N h ν β λ 1 P Pf S Pf H % Improvement 5, , , , TABLE II IMPROVEMENT IN P f : HC VERSUS SC, WITH EQUALLY LIKELY INITIAL CONDITIONS, α(s) = 1/( S ) FOR s S. T, N h ν β λ 1 P Pf S Pf H % Improvement 5, , , , TABLE III IMPROVEMENT IN P f : HC VERSUS SC STARTING WITH ZERO INFECTED MOBILES, α(s) = 1 {s=0} FOR s S. obviously becuse it directly solves the constrined problem. However the more interesting observtion is tht the improvement gined cn be significnt. We would now like to look t the comprison problem with different perspective. Sy we re given ny rbitrry totl verge power constrint B. Requirement is n optiml policy tht opertes within this power constrint nd which minimizes the filure probbility P f, precisely the HC problem. But if one pproches this vi SC problem, then one needs to solve the SC problem for vrious vlues of weight fctors h to obtin vrious {Pf SC (h)} h nd the corresponding totl verge powers {PSC (h)} h. Consider only those h for which totl verge power is less thn given threshold, i.e., PSC (h) B. Among these chose the best filure probbility Pf SC (h ) s the solution. Tht is, one needs to continue the serch mong SC policies, until they hit upon tht vlue of h for which the totl verge power is the mximum possible one, which is still below the given limit B. The SC solutions re known to be pure policies: π t =1 or 0, for ll t. We lso observe this to be the cse in simultions. With pure policies, the vrious choices of totl verge power would be discrete. One cn hve vrious SC solutions by considering vrious vlues of weight fctors h. However the set of ll possible totl verge powers obtined even fter exhusting the entire rnge of h, would be finite. On the other hnd HC solution is rndomized policy nd chieves the bound B with equlity, s long s it is fesible. Thus the improvement seen by directly using our LP bsed HC solution would be much more significnt thn tht demonstrted in Tbles II nd III. This effect is shown in Figure 1. This figure plots best P f performnce versus power constrint B, under both HC nd SC policies. The curve with dotted mrks, represents the best performnce fcilitted by SC solution, s function of the power constrint B, obtined by trying ll possible vlues of h. As seen from the figure, the Pf SC performnce remins constnt over rnge of power constrints B, confirming our erlier discussions. This is minly becuse the optiml policies of SC problem re lwys pure. The other curve in Figure 1 represents the P f performnce under HC policy s function of power constrint B. The entries of the Tbles II nd III correspond to the SC nd HC pir of points, where SC points re precisely the corner points of the SC curve tht re ner the HC curve. These entries lredy showed n improvement (up to 26%), nd we hve much lrger gins in the performnce t the other points (see the horizontl portions of the SC curve in Figure1). Fig. 1. P f performnce s function of power bound B B. Structurl properties We noticed from vrious exmples of the simultions tht the SC policies re ll pure policies while the HC policies re rndomized. Further, for ny given time slot t the policy suggests complete switch ON for ll sttes less thn threshold s t, rndomized switch ON-OFF t the threshold stte s = 157

5 s t nd complete switch OFF for ll sttes, s > s t. This threshold depends upon the time slot nd of course, the power constrint B. ACKNOWLEDGEMENTS: The work towrds the risk sensitive MDP problem originted with Prof. Eitn Altmn s remrk bout finding the connections between LPs nd risk sensitive MDP problems. V. CONCLUSIONS We considered power constrined Dely Tolernt Networks. We obtined optiml policies for this problem, vi the solution of n pproprite LP, fter modeling it s constrined risk sensitive MDP. The equivlence of the two is provided in the technicl report. Previously joint cost comprising of probbility of delivery filure nd term proportionl to totl power spent is considered. While in this pper we directly solve the constrined optimiztion problem. We compred the probbility of filure performnce of the DTNs under the policy so obtined, with tht of the optiml policy obtined by considering n unconstrined problem with the joint cost. We observed huge improvement. This improvement is becuse of two fctors. When the requirement is to operte optimlly within given power constrint our solution provides the optiml solution while the solution of the unconstrined problem with joint cost would be sub-optiml. Secondly nd more importntly, the optimiztion of the unconstrined risk sensitive cost results in pure policies which provide only finite choices of totl verge power spent. While our proposed constrined risk MDP solution results in optiml policies tht re rndomized nd hence provide solution, which stisfies with equlity the constrint defining the power constrint. This is true s long s the power constrint is chievble. Thus our solution performs significntly superior nd is very useful in the scenrios tht demnd for strict power constrints. APPENDIX: FINITE HORIZON RISK SENSITIVE MDP AND LINEAR PROGRAMMING Mrkov Decision Process (MDP) provides tool for solving sequentil decision mking problems in stochstic situtions. A typicl MDP consists of set S of ll possible sttes, set A of ll possible ctions nd n immedite rewrd function : S A R for ech time slot t. The terminl cost r t r T depends only upon s S. The set S nd A cn depend upon the time slot t, however we consider the sme set for ll the time indices. It is further chrcterized by trnsition function p : S A S, which defines the ction dependent stte trnsitions. Here p(s s, ) gives the probbility of the stte trnsition from s to s, when ction is chosen. We consider finite horizon problem nd let {X t } t T, {Y t } t T 1 respectively represent the trjectories of the stte nd the ction. In the lst time slot T, there is no requirement for further ction nd we only hve terminl cost. A policy = (π t, π t+1 π T 1 ) is sequence of stte dependent nd possibly rndomized ctions, given for time slots between t nd T 1. Given policy Π t nd initil condition X t = s, Π t the stte nd ction pir evolve rndomly over the time slot t < n < T, with trnsitions s governed by the following lws: q Πt (s, s, ) = P (X n = s, Y n = X n 1 = s, Y n 1 = ) = π n (s, )p(s s, ) where p(s s, ) = P (X n = s X n 1 = s, Y n 1 = ) nd π n (s, ) = P (Y n = S n = ). (8) Let E s,πt represent the expecttion opertor with initil condition X t = s nd when the policy Π t is used. Let E α,πt represent the sme expecttion opertor when the initil condition is distributed ccording to α, written s X t α. Here α(s) = P (X t = s). We re interested in optimizing the following risk sensitive objective: J t(α, Π t ) = γ 1 log (E [ α,πt e γ ) T 1 n=t rn(xn,yn)+r T (X T ). (9) The bove represents the cost to go from time slot t to T under the policy Π t with X t α. The vlue function, function of (s, t), is defined s the optiml vlue of the bove risk sensitive objective given the initil condition X t = s: V t (s) := min J t (s, Π t ) for ny s S. (10) Π t We re interested in the optiml policy Π 0 = Π (we discrd 0 in superscript when it strts from 0) tht optimizes the risk cost J 0 (s, Π 0 ), or equivlently policy tht chieves the vlue function, i.e., Π such tht V 0 (s) = J 0 (s, Π ) for ll s S. Dynmic progrmming (DP) is well known technique, tht provides solution to such control problems, nd DP equtions re given by bckwrd induction s below ([6): V T (s) = r T (s), nd for ny 0 t T 1, nd s S { [ } V t(s) = min A r t(s, ) + γ 1 log p(s s, )e γv t+1(s ). We consider the following trnsltion of the vlue function: u t (s) = e γvt(s) for ll 0 t T 1, nd s S. The DP equtions cn now be rewritten s: u t (s) = e γr T (s) nd for ny 0 t T 1, nd s S { } u t (s) = min Liner Progrmming Formultion e γrt(s,) [ p(s s, )u t+1 (s ). (11) The dynmic progrmming bsed pproch suffers from the curse of dimension. As we increse the number of sttes nd/or time epochs, the complexity of the problem increses significntly. This results in limited pplicbility of dynmic progrmming. In the context of liner MDPs, it is well known fct tht DP problem cn be reformulted s Liner Progrm (LP), under considerble generlity (see for e.g., [7, [8 in the context of infinite horizon problems). However this conversion my not solve the problem of dimension. But recent improvements in LP solvers mkes it n ttrctive lterntive. 158

6 Further nd more importntly the LP bsed pproch extends esily nd provides solutions for constrined MDPs. In technicl report ([4), we extend the LP bsed ide to finite horizon risk MDPs. In this ppendix we briefly summrize the corresponding results, while the detils nd the proofs re vilble in ([4). We hve shown in ([4) tht the solution of the unconstrined risk MDP problem (10) cn be obtined vi the solution of ny one of the two LPs, priml nd dul. In ll the discussions below, we bsorb γ into the running costs r t (.). The priml LP is given by: mx α(s)u 0 (s) (12) {u t(s)} s S,t T 1} s S subject to: u T 1(s) b s, for ll s,, u t(s) e r t(s,) p(s s, )u t+1(s ) 0 with b s, := e r T 1(s,) p(s s, )e r T (s ). for ll, s nd t T 2 In the bove {α(s); s S} is ny positive set of weights stisfying s S α(s) = 1. These cn be interpreted s the probbility distribution on initil condition. For exmple to solve (10) with t = 0, the problem with initil condition X 0 = s, one needs to set α(s) = 1 nd α(s ) = 0 for ny s s. The solution of the priml gives the trnslted vlue functions {u t (s)} while the optiml policy is directly obtined using the Dul LP: min [ e r T 1(s,) p(s s, )e r T (s ) x(t 1, s, ) (13) s S subject to: x(0, s, ) = α(s ) for ll s S x(t, s, ) = [ e rt 1(s,) p(s s, )x(t 1, s, ) s S for ll 1 t T 1 nd s S. Here gin α represents the probbility distribution on the initil condition. We hve the following results (detils in [4). We discrd the nottion superscript 0 for risk policies in the following. The bold letters represent the vectors, e.g., x = {x(t, s, )} t,s, represents fesibility vector of Dul LP (13). While s n k represents the vector s n k = [s k,, s n. Theorem 1: The following results connecting the Dul LP (13) nd the trnslted risk MDP (11) re true. 1) Fesible region nd the set of risk Policies: There is one to one correspondence between the two s below: i) For ny policy Π of risk MPD, there exists vector x Π which stisfies ll the constrints of Dul LP (13). The fesible vector is given by the eqution (see (8)): x Π(0, s 0, 0) = α(s 0)π 0(s 0, 0) for ll s 0 S, 0 A, x Π(t, s t, t) = α(s t 1 0)e n=0 rn(sn,n) Π t n=0 q Π (s n, n s n 1, n 1) t 1 0,s t 1 0 for ll s t S, t A, nd 1 t < T. (14) ii) Given vector x in the fesibility region of Dul LP, define policy Π x using the following rule: π x,t (s, ) := x(t, s, ) x(t, s, ) for ll s S, nd A. (15) The vector x Πx defined by eqution (14) of point (i) is gin in the fesibility region nd equls x. 2) Optiml policies nd solutions: () If x is n optiml solution of the Dul LP, then Π x defined by (15) is n optiml policy for risk MDP. 3) Expecttion t optiml Policy: For ny fesible point x of Dul LP nd for ny integrble function f, s t, t x(t, s t, t )f(s t, t ) Constrined risk MDP = E Πx [ e t 1 n=0 rn(xn,yn) f(x t, Y t ). (16) We now consider constrined MDP problem (detils re in [4), with n dditionl constrint s given below: min Π J 0(α, Π) (17) Subject to: t Eα,Π [f t (X t, Y t ) B, for some set of integrble function {f t }, initil distribution α nd bound B. The eqution (16) of Theorem 1 could hve been useful in obtining the expecttion defining the constrint, but for the extr fctor Ψ 1 t with Ψ t := e t 1 n=0 rn(xn,yn), s seen from the right hnd side of the eqution (16). We propose to dd Ψ t s dditionl stte component to the originl Mrkov process {X t } to tckle this problem. We consider two component stte evolution {(X t, Ψ t )} nd the corresponding probbility trnsition mtrix depends explicitly upon time index s below: p t+1 (s, ψ t+1 s, ψ t, ) = 1 {ψ t+1 =ψ te r t (s,) }p(s s, ). With the introduction of the new stte component, for ny Dul LP fesible point x we hve: s t,ψ t, t x(t, s t, ψ t, t) ψ t f(s t, t) = E Πx [f(x t, Y t). (18) 159

7 Thus one cn obtin optiml policy of constrined risk MDP (17) by considering n dditionl stte component nd by dding n extr constrint to the Dul LP (13) s below: min [ e r T 1(s,) p(s s, )e r T (s ) x(t 1, s, ) (19) s subject to: x(t, s, ) = ψ t x(t, s, ψ t, ) x(0, s, ψ 0, ) = α(s)1 {ψ0 =1} for ll s, ψ 0 x(t, s, ψ t, ) = t s,ψ t,,s,ψ t 1 e rt 1(s,) p(s, ψ t s, ψ t 1, )x(t 1, s, ψ t 1, ) x(t, s, ψ t, )ψ tf t(s, ) B. for ll 1 t T 1 nd s, ψ t nd We would like to mention here tht ψ 0 is lwys initilized to one, i.e., ψ 0 = 1, ψ 1 cn tke t mximum S A vlues while ψ t for ny t cn tke t mximum S t A t possible vlues. There will lso be considerble deletions if the concerned mpping ( t 0, s t 0) e t n=0 rn(sn,n) is not one-one. One needs to consider this time dependent stte spce while solving the Dul LP given bove nd we omit the discussion of these obvious detils. REFERENCES [1 E. Altmn, V. Kvith, F. De Pellegrini, V. Kmble, nd V. Borkr, Risk sensitive optiml control frmework pplied to dely tolernt networks, in INFOCOM, 2011 Proceedings IEEE. IEEE, 2011, pp [2 R. Groenevelt, P. Nin, nd G. Koole, Messge dely in mnet, in ACM SIGMETRICS Performnce Evlution Review, vol. 33, no. 1. ACM, 2005, pp [3 E. Altmn, T. Bşr, nd F. De Pellegrini, Optiml monotone forwrding policies in dely tolernt mobile d-hoc networks, Performnce Evlution, vol. 67, no. 4, pp , [4 Atul Kumr, Veerrun Kvith nd N. Hemchndr, Finite horizon risk sensitive MDP nd liner progrmming, Mnuscript under preprtion. Technicl report vilble t [5 Khouzni, M. H. R., Eshghi, S., Srkr, S., Shroff, N. B., nd Venktesh, S. S. (2012, June). Optiml energy-wre epidemic routing in DTNs, In Proceedings of the thirteenth ACM interntionl symposium on Mobile Ad Hoc Networking nd Computing (pp ). ACM. [6 S. P. Corluppi nd S. I. Mrcus, Risk-sensitive queueing, in Proceedings of the Annul Allerton Conference on Communiction Control nd Computing, vol. 35. Citeseer, 1997, pp [7 M. L. Putermn, Mrkov decision processes: discrete stochstic dynmic progrmming. John Wiley & Sons, [8 Eitn Altmn, Constrined Mrkov decision processes, volume 7, 1999, CRC Press. 160

Finite Horizon Risk Sensitive MDP and Linear Programming

Finite Horizon Risk Sensitive MDP nd Liner Progrmming Atul Kumr, Veerrun Kvith nd N. Hemchndr IEOR, Indin Institute of Technology Bomby, Indi Abstrct In the context of stndrd Mrkov decision processes (MDPs),