Mobility-Induced Service Migration in Mobile Micro-Clouds

arXiv:1503.05141v1 [cs.DC] 17 Mar 2015

Shiqiang Wang, Rahul Urgaonkar, Ting He, Murtaza Zafer, Kevin Chan, and Kin K. Leung

Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom
IBM T. J. Watson Research Center, Yorktown Heights, NY, United States
Samsung Research America, San Jose, CA, United States
Army Research Laboratory, Adelphi, MD, United States
Email: {shiqiang.wang, kin.leung}@imperial.ac.uk, {rurgaon, the}@us.ibm.com, murtaza.zafer.us@ieee.org, kevin.s.chan.civ@mail.mil

Abstract: Mobile micro-cloud is an emerging technology in distributed computing, which is aimed at providing seamless computing/data access to the edge of the network when a centralized service may suffer from poor connectivity and long latency. Different from the traditional cloud, a mobile micro-cloud is smaller and deployed closer to users, typically attached to a cellular basestation or wireless network access point. Due to the relatively small coverage area of each basestation or access point, when a user moves across areas covered by different basestations or access points which are attached to different micro-clouds, issues of service performance and service migration become important. In this paper, we consider such issues. We model the general problem as a Markov decision process (MDP), and show that, in the special case where the mobile user follows a one-dimensional asymmetric random walk mobility model, the optimal policy for service migration is a threshold policy. We obtain the analytical solution for the cost resulting from arbitrary thresholds, and then propose an algorithm for finding the optimal thresholds. The proposed algorithm is more efficient than standard mechanisms for solving MDPs.

Index Terms: Cloud computing, Markov decision process (MDP), mobile micro-cloud, mobility, service migration, wireless networks

I. INTRODUCTION

Cloud technologies have been developing successfully in the past decade, which enable the
centralization of computing and data resources so that they can be accessed on demand by different end users. Traditionally, clouds are centralized, in the sense that services are provided by large data-centers that may be located far away from the user. A user may suffer from poor connectivity and long latency when it connects to such a centralized service. In recent years, efforts have been made to distribute the cloud closer to users, to provide faster access and higher reliability to end users in a particular geographical area.

A notable concept in this regard is the mobile micro-cloud, where a small cloud consisting of a small set of servers is attached directly to the wireless communication infrastructure (e.g., a cellular basestation or wireless access point) to provide service to users within its coverage. Applications of the mobile micro-cloud include data and computation offloading for mobile devices [1], [2], which complements the relatively low computational and data storage capacity of mobile users. It is also beneficial for scenarios requiring high robustness or high data-processing capability closer to the user, such as in hostile environments [3] or for vehicular networks [4].

[Figure 1. Application scenario with mobile micro-cloud.]

Footnote: Contributions of the author to this work are not related to his current employment at Samsung Research America.

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

There are a few other concepts which are similar to that of the mobile micro-cloud, including edge computing [5], Cloudlet [3], and Follow Me Cloud [6]. We use the term mobile micro-cloud throughout this
paper.

A significant issue in the mobile micro-cloud is service migration caused by the mobility of users. Because different micro-clouds are attached to different basestations or access points, a decision needs to be made on whether and where to migrate the service when a user moves outside the service area of a micro-cloud that is providing its service. Consider the scenario as shown in Fig. 1, which resembles the case where a micro-cloud is connected to a basestation that covers a particular area, and these micro-clouds are also interconnected with each other via a backhaul network. When a mobile user moves from one area to another area, we can either continue to run the service on the micro-cloud for the previous area, and transmit data to/from the user via the backhaul network, or we can migrate the service to the micro-cloud responsible for the new area. In both cases, a cost is incurred: there is a data transmission cost for the first case, and a migration cost for the second case.
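To make this trade-off concrete, the following is a minimal sketch (not part of the original mechanism) that simulates a single user on a line of areas and accumulates the discounted cost of two extreme strategies: never migrating, or always migrating to the user's current area. The function name `discounted_cost`, the per-slot data transmission cost `beta`, the unit migration cost, the discount factor `gamma`, and the movement probabilities `p` and `q` are assumptions chosen for illustration; the sketch also ignores any limit on the distance between user and service.

```python
import random


def discounted_cost(policy, p, q, beta, gamma, slots, seed=0):
    """Discounted sum cost along one simulated 1-D random walk.

    policy: "never" leaves the service in the starting area and relays data
            over the backhaul; "always" migrates to the user's current area.
    p, q:   probabilities of moving one area right / left in a slot.
    beta:   assumed data transmission cost per slot (migration cost is 1).
    """
    rng = random.Random(seed)
    user = host = 0                      # area indices of user and service
    total = 0.0
    for t in range(slots):
        cost = 0.0
        if user != host:
            if policy == "always":
                host = user              # migrate at the start of the slot
                cost = 1.0               # migration cost, normalized to 1
            else:
                cost = beta              # data relayed via the backhaul
        total += (gamma ** t) * cost
        x = rng.random()                 # user movement for the next slot
        if x < p:
            user += 1
        elif x < p + q:
            user -= 1
    return total


if __name__ == "__main__":
    for name in ("never", "always"):
        print(name, discounted_cost(name, p=0.3, q=0.2, beta=0.5,
                                    gamma=0.9, slots=200, seed=1))
```

Because the random generator is used only for the user's movement, both strategies are evaluated on the same trajectory for a given seed, so their costs are directly comparable.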

In the literature, only a few papers have studied the impact of mobility and its relationship to service migration for mobile micro-clouds. In [7], analytical results on various performance factors of the mobile micro-cloud are studied, by assuming a symmetric 2-dimensional (2-D) random walk mobility model. A service migration procedure based on Markov decision process (MDP) for 1-D random walk is studied in [8]. It mainly focuses on formulating the problem with MDP, which can then be solved with standard techniques for solving MDPs.

In this paper, similarly to [8], we consider an MDP formulation of the migration problem. In contrast to [8], we propose an optimal threshold policy to solve for the optimal action of the MDP, which is more efficient than standard solution techniques. A threshold policy means that we always migrate the service for a user from one micro-cloud to another when the user is in states bounded by a particular set of thresholds, and do not migrate otherwise. We first prove the existence of an optimal threshold policy and then propose an algorithm with polynomial time-complexity for finding the optimal thresholds. The analysis in this paper can also help us gain new insights into the migration problem, which set the foundation for more complicated scenarios for further study in the future.

The remainder of this paper is organized as follows. In Section II, we describe the problem formulation. Section III shows that an optimal threshold policy exists and proposes an algorithm to obtain the optimal thresholds. Simulation results are shown in Section IV. Section V draws conclusions.

II. PROBLEM FORMULATION

We consider a 1-D region partitioned into a discrete set of areas, each of which is served by a micro-cloud, as shown in Fig. 1. Such a scenario models user mobility on roads, for instance. A time-slotted system (Fig. 2) is considered, which can be viewed as a sampled version of a continuous-time system, where the sampling can either be equally spaced over time or occur right after a handoff instance. Mobile users are assumed to follow a 1-D asymmetric
random walk mobility model. In every new timeslot, a node moves with probability $p$ (or $q$) to the area that is on the right (or left) of its previous area, and it stays in the same area with probability $1 - p - q$. If the system is sampled at handoff instances, then $1 - p - q = 0$, but we consider the general case with $0 \le 1 - p - q \le 1$. Obviously, this mobility model can be described as a Markov chain. We only focus on a single mobile user in our analysis; equivalently, we assume that there is no correlation in the service or mobility among different users.

The state of the user is defined as the offset between the mobile user location and the location of the micro-cloud running the service at the beginning of a slot, before possible service migration, i.e., the state in slot $t$ is defined as $s_t = u_t - h_t$, where $u_t$ is the location (index of area) of the mobile user, and $h_t$ the location of the micro-cloud hosting the service. Note that $s_t$ can be zero, positive, or negative. At the beginning of each timeslot, the current state is observed, and the decision on whether to migrate the service is made. If migration is necessary, it happens right after the state observation, i.e., at the beginning of the timeslot. We assume that the time taken for migration is negligible compared with the length of a slot.

[Figure 2. Timing of the proposed mechanism.]

We study whether and where to migrate the service when the mobile user has moved from one area to another. The cost in a single timeslot $C_a(s)$ is defined as the cost under state $s$ when performing action $a$, where $a$ represents a migration decision for the service such as no migration or migration to a specified micro-cloud. The goal is to minimize the discounted sum cost over time. Specifically, under the current state $s_0$, we would like to find a policy $\pi$ that maps each possible state $s$ to an action $a = \pi(s)$ such that the expected long-term discounted sum cost is minimized, i.e.,

$$V(s_0) = \min_{\pi} E\left[ \sum_{t=0}^{\infty} \gamma^t C_{\pi(s_t)}(s_t) \,\middle|\, s_0 \right] \tag{1}$$

where $E[\cdot \mid \cdot]$ denotes the conditional expectation, and $0 < \gamma < 1$ is a discount factor.

Because we consider a scenario where all micro-clouds are connected via the backhaul (as shown in Fig. 1), and the backhaul is regarded as a central entity (which resembles the case for cellular networks, for example), we consider the following one-timeslot cost function for taking action $a$ in state $s$ in this paper:

$$C_a(s) = \begin{cases} 0, & \text{if no migration or data transmission} \\ \beta, & \text{if only data transmission} \\ 1, & \text{if only migration} \\ \beta + 1, & \text{if both migration and data transmission} \end{cases} \tag{2}$$

Equation (2) is explained as follows. If the action $a$ under state $s$ causes no migration or data transmission (e.g., if the node and the micro-cloud hosting the service are in the same location, i.e., $s = 0$, and we do not migrate the service to another location), we do not need to communicate via the backhaul network, and the cost is zero. A non-zero cost is incurred when the node and the micro-cloud hosting the service are in different locations. In this case, if we do not migrate to the current node location at the beginning of the timeslot, the data between the micro-cloud and mobile user need to be transmitted via the backhaul network. This data transmission incurs a cost of $\beta$. When we perform migration, we need resources to support migration. The migration cost is assumed to be 1, i.e., the cost $C_a(s)$ is normalized by the migration cost. Finally, if both migration and data transmission occur, in which case we migrate to a location that is different from the current node
location, the total cost is $\beta + 1$.

Lemma 1. Migrating to locations other than the current location of the mobile user is not optimal.

Proof: We consider an arbitrary trajectory of the mobile user. Denote $t_u$ as the first timeslot (starting from the current timeslot) that the mobile user is in the location indexed $u$. Assume that the user is currently in location $u_0$, then the current timeslot is $t_{u_0}$.

Case 1, migrating to location $u \ne u_0$ at $t_{u_0}$: This incurs a cost of $\beta + 1$ at timeslot $t_{u_0}$, because $t_u > t_{u_0}$ as a node cannot be in two different locations at the same time. Define a variable $t_m \in [t_{u_0} + 1, t_u]$ being the largest timeslot index such that we do not perform further migration at timeslots within the interval $[t_{u_0} + 1, t_m - 1]$, which means that either we perform migration at $t_m$ or we have $t_m = t_u$. Then, we have a cost of $\beta$ at each of the timeslots $t \in [t_{u_0} + 1, t_m - 1]$.

Case 2, no migration at $t_{u_0}$: In this case, the cost at each timeslot $t \in [t_{u_0}, t_m - 1]$ is either $\beta$ (if $s_t = u_t - h_t \ne 0$) or zero (if $s_t = 0$). For the timeslot $t_m$, we construct the following policy. If $t_m < t_u$, we migrate to the same location as in Case 1, which means that the cost at $t_m$ cannot be larger than that in Case 1. If $t_m = t_u$, we migrate to $u$, which can increase the cost at $t_m$ by at most one unit compared with the cost in Case 1.

With the above policy, the costs at timeslots $t > t_m$ in Cases 1 and 2 are the same. The cost at $t_{u_0}$ in Case 1 is one unit larger than that in Case 2, and the cost at $t_m$ in Case 1 is at most one unit smaller than that in Case 2. Because $0 < \gamma < 1$, Case 2 brings lower discounted sum cost than Case 1. Therefore, there exists a better policy than migrating to $u \ne u_0$ at $t_{u_0}$. This holds for any movement pattern of the mobile user, and it ensures that the cost in any timeslot is either $0$, $\beta$, or $1$.

From Lemma 1, we only have two candidate actions, which are migrating to the current user location or not migrating. This simplifies the action space to two actions: a migration action, denoted as $a = 1$; and a no-migration action, denoted as $a = 0$. In practice, there is usually a
limit on the maximum allowable distance between the mobile user and the micro-cloud hosting its service for the service to remain usable. We model this limitation by a maximum negative offset $M$ and a maximum positive offset $N$ (where $M < 0$, $N > 0$), such that the service must be migrated ($a = 1$) when the node enters state $M$ or $N$. This implies that, although the node can move in an unbounded space, the state space of our MDP for service migration control is finite. The overall transition diagram of the resulting MDP is illustrated in Fig. 3. Note that because each state transition is the concatenated effect of (possible) migration and node movement, and the states are defined as the offset between node and host location, the next state after taking a migration action is either $-1$, $0$, or $1$.

With the above considerations, the cost function in (2) can be modified to the following:

$$C_a(s) = \begin{cases} 0, & \text{if } s = 0 \\ \beta, & \text{if } s \ne 0,\ M < s < N,\ a = 0 \\ 1, & \text{if } s \ne 0,\ M \le s \le N,\ a = 1 \end{cases} \tag{3}$$

With the one-timeslot cost defined as in (3), we obtain the following Bellman's equations for the discounted sum cost when respectively taking action $a = 0$ and $a = 1$:

$$V(s \mid a = 0) = \begin{cases} \gamma \sum_{j=-1}^{1} p_{0j} V(j), & \text{if } s = 0 \\ \beta + \gamma \sum_{j=s-1}^{s+1} p_{sj} V(j), & \text{if } s \ne 0,\ M < s < N \end{cases} \tag{4}$$

$$V(s \mid a = 1) = \begin{cases} \gamma \sum_{j=-1}^{1} p_{0j} V(j), & \text{if } s = 0 \\ 1 + \gamma \sum_{j=-1}^{1} p_{0j} V(j), & \text{if } s \ne 0,\ M \le s \le N \end{cases} \tag{5}$$

where $p_{ij}$ denotes the (one-step) transition probability from state $i$ to state $j$; their specific values are related to the parameters $p$ and $q$ as defined earlier. The optimal cost $V^*(s)$ is

$$V^*(s) = \begin{cases} \min\{V(s \mid a = 0),\ V(s \mid a = 1)\}, & \text{if } M < s < N \\ V(s \mid a = 1), & \text{if } s = M \text{ or } s = N \end{cases} \tag{6}$$

III. OPTIMAL THRESHOLD POLICY

A. Existence of Optimal Threshold Policy

We first show that there exists a threshold policy which is optimal for the MDP in Fig. 3.

Proposition 1. There exists a threshold policy $(k_1, k_2)$, where $M < k_1 \le 0$ and $0 \le k_2 < N$, such that when $k_1 \le s \le k_2$, the optimal action for state $s$ is $a^*(s) = 0$, and when $s < k_1$ or $s > k_2$, $a^*(s) = 1$.

Proof: It is obvious that different actions for state zero, $a(0) = 0$ and $a(0) = 1$, are essentially the same, because the mobile user and the micro-cloud hosting its service are in the same
location under state zero, either action does not incur cost and we always have $C_{a(0)}(0) = 0$. Therefore, we can conveniently choose $a^*(0) = 0$.

In the following, we show that, if it is optimal to migrate at $s = k_1 - 1$ and $s = k_2 + 1$, then it is optimal to migrate at all states $s$ with $M \le s \le k_1 - 1$ or $k_2 + 1 \le s \le N$. We relax the restriction that we always migrate at states $M$ and $N$ for now, and later discuss that the results also hold for the unrelaxed case. We only focus on $k_2 + 1 \le s \le N$, because the case for $M \le s \le k_1 - 1$ is similar.

If it is optimal to migrate at $s = k_2 + 1$, we have

$$V(k_2 + 1 \mid a = 1) \le \beta \sum_{t=0}^{\infty} \gamma^t = \frac{\beta}{1 - \gamma} \tag{7}$$

where the right hand-side of (7) is the discounted sum cost of a never-migrate policy supposing that the user never returns back to state zero when starting from state $s = k_2 + 1$. This cost is an upper bound of the cost incurred from any possible state-transition path without migration, and migration cannot bring
higher cost than this because otherwise it contradicts with the presumption that it is optimal to migrate at $s = k_2 + 1$.

[Figure 3. MDP model for service migration. The solid lines denote transition under action $a = 0$ and the dotted lines denote transition under action $a = 1$. When taking action $a = 1$ from any state, the next state is $s = -1$ with probability $q$, $s = 0$ with probability $1 - p - q$, or $s = 1$ with probability $p$.]

Suppose we do not migrate at state $s'$ where $k_2 + 1 < s' \le N$, then we have a (one-timeslot) cost of $\beta$ in each timeslot until the user reaches a migration state (i.e., a state at which we perform migration). From (5), we know that $V(s \mid a = 1)$ is constant for $s \ne 0$. Therefore, any state-transition path $L$ starting from state $s'$ has a discounted sum cost of

$$V_L(s') = \beta \sum_{t=0}^{t_m - 1} \gamma^t + \gamma^{t_m} V(k_2 + 1 \mid a = 1)$$

where $t_m > 0$ is a parameter representing the first timeslot that the user is in a migration state after reaching state $s'$ (assuming that we reach state $s'$ at $t = 0$), which is dependent on the state-transition path $L$. We have

$$V_L(s') - V(k_2 + 1 \mid a = 1) = \beta \frac{1 - \gamma^{t_m}}{1 - \gamma} - \left(1 - \gamma^{t_m}\right) V(k_2 + 1 \mid a = 1) = \left(1 - \gamma^{t_m}\right) \left( \frac{\beta}{1 - \gamma} - V(k_2 + 1 \mid a = 1) \right) \ge 0$$

where the last inequality follows from (7). It follows that, for any possible state-transition path $L$, $V_L(s') \ge V(k_2 + 1 \mid a = 1)$. Hence, it is always optimal to migrate at state $s'$, which brings cost $V(s' \mid a = 1) = V(k_2 + 1 \mid a = 1)$.

The result also holds with the restriction that we always migrate at states $M$ and $N$, because no matter what thresholds $(k_1, k_2)$ we have for the relaxed problem, migrating at states $M$ and $N$ always yields a threshold policy.

Proposition 1 shows the existence of an optimal threshold policy. The optimal threshold policy exists for arbitrary values of $M$, $N$, $p$, and $q$.

B. Simplifying the Cost Calculation

The existence of the optimal threshold policy allows us to simplify the cost calculation, which helps us develop an algorithm that has lower complexity than standard MDP solution algorithms. When the thresholds are given as $(k_1, k_2)$, the value updating function (6) is changed to the
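following:

$$V(s) = \begin{cases} V(s \mid a = 0), & \text{if } k_1 \le s \le k_2 \\ V(s \mid a = 1), & \text{otherwise} \end{cases} \tag{8}$$

From (4) and (5), we know that, for a given policy with thresholds $(k_1, k_2)$, we only need to compute $V(s)$ with $k_1 - 1 \le s \le k_2 + 1$, because the values of $V(s)$ with $s \le k_1 - 1$ are identical, and the values of $V(s)$ with $s \ge k_2 + 1$ are also identical. Note that we always have $k_1 - 1 \ge M$ and $k_2 + 1 \le N$, because $k_1 > M$ and $k_2 < N$ as we always migrate when at states $M$ and $N$. Define

$$\mathbf{v}_{(k_1,k_2)} = \left[ V(k_1 - 1)\ \ V(k_1)\ \cdots\ V(0)\ \cdots\ V(k_2)\ \ V(k_2 + 1) \right]^T \tag{9}$$

$$\mathbf{c}_{(k_1,k_2)} = \Big[ 1\ \ \underbrace{\beta\ \cdots\ \beta}_{-k_1 \text{ elements}}\ \ 0\ \ \underbrace{\beta\ \cdots\ \beta}_{k_2 \text{ elements}}\ \ 1 \Big]^T \tag{10}$$

$$\mathbf{P}_{(k_1,k_2)} = \begin{bmatrix} p_{0,k_1-1} & \cdots & p_{00} & \cdots & p_{0,k_2+1} \\ p_{k_1,k_1-1} & \cdots & p_{k_1,0} & \cdots & p_{k_1,k_2+1} \\ \vdots & & \vdots & & \vdots \\ p_{0,k_1-1} & \cdots & p_{00} & \cdots & p_{0,k_2+1} \\ \vdots & & \vdots & & \vdots \\ p_{k_2,k_1-1} & \cdots & p_{k_2,0} & \cdots & p_{k_2,k_2+1} \\ p_{0,k_1-1} & \cdots & p_{00} & \cdots & p_{0,k_2+1} \end{bmatrix} \tag{11}$$

where superscript $T$ denotes the transpose of the matrix. Then, (4) and (5) can be rewritten as

$$\mathbf{v}_{(k_1,k_2)} = \mathbf{c}_{(k_1,k_2)} + \gamma \mathbf{P}_{(k_1,k_2)} \mathbf{v}_{(k_1,k_2)} \tag{12}$$

The value vector $\mathbf{v}_{(k_1,k_2)}$ can be obtained by

$$\mathbf{v}_{(k_1,k_2)} = \left( \mathbf{I} - \gamma \mathbf{P}_{(k_1,k_2)} \right)^{-1} \mathbf{c}_{(k_1,k_2)} \tag{13}$$

The matrix $\left( \mathbf{I} - \gamma \mathbf{P}_{(k_1,k_2)} \right)$ is invertible for $0 < \gamma < 1$, because in this case there exists a unique solution for $\mathbf{v}_{(k_1,k_2)}$ from (12). Equation (13) can be computed using Gaussian elimination that has a complexity of $O\left( (|M| + N)^3 \right)$. However, noticing that $\mathbf{P}_{(k_1,k_2)}$ is a sparse matrix (because $p_{ij} = 0$ for $|i - j| > 1$), there can exist more efficient algorithms to compute (13).

C. Algorithm for Finding the Optimal Thresholds

To find the optimal thresholds, we can perform a search on values of $(k_1, k_2)$. Further, because an increase/decrease in $V(s)$ for some $s$ increases/decreases each element in the cost vector $\mathbf{v}$ due to cost propagation following balance equations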
(4) and (5), we only need to minimize $V(s)$ for a specific state $s$. We propose an algorithm for finding the optimal thresholds, as shown in Algorithm 1, which is a modified version of the standard policy iteration mechanism [9, Ch. 3].

Algorithm 1. Modified policy iteration algorithm for finding the optimal thresholds
 1: Initialize $k_1^* \leftarrow 0$, $k_2^* \leftarrow 0$
 2: repeat
 3:   $k_1' \leftarrow k_1^*$, $k_2' \leftarrow k_2^*$   // record previous thresholds
 4:   Construct $\mathbf{c}_{(k_1',k_2')}$ and $\mathbf{P}_{(k_1',k_2')}$ according to (10) and (11)
 5:   Evaluate $\mathbf{v}_{(k_1',k_2')}$ from (13)
 6:   Extend $\mathbf{v}_{(k_1',k_2')}$ to obtain $V(s)$ for all $M \le s \le N$
 7:   for $i = 1, 2$ do
 8:     if $i = 1$ then
 9:       if $1 + \gamma \sum_{j=-1}^{1} p_{0j} V(j) < V(k_1^*)$ then
10:         dir $\leftarrow 1$, loopvec $\leftarrow [k_1^* + 1, k_1^* + 2, \ldots, 0]$
11:         $k_1^* \leftarrow k_1^* + 1$
12:       else
13:         dir $\leftarrow 0$, loopvec $\leftarrow [k_1^* - 1, k_1^* - 2, \ldots, M + 1]$
14:       end if
15:     else if $i = 2$ then
16:       if $1 + \gamma \sum_{j=-1}^{1} p_{0j} V(j) < V(k_2^*)$ then
17:         dir $\leftarrow 1$, loopvec $\leftarrow [k_2^* - 1, k_2^* - 2, \ldots, 0]$
18:         $k_2^* \leftarrow k_2^* - 1$
19:       else
20:         dir $\leftarrow 0$, loopvec $\leftarrow [k_2^* + 1, k_2^* + 2, \ldots, N - 1]$
21:       end if
22:     end if
23:     for $k_i$ = each value in loopvec do
24:       if dir $= 0$ then
25:         if $\beta + \gamma \sum_{j=k_i-1}^{k_i+1} p_{k_i,j} V(j) < V(k_i)$ then
26:           $k_i^* \leftarrow k_i$
27:         else if $\beta + \gamma \sum_{j=k_i-1}^{k_i+1} p_{k_i,j} V(j) > V(k_i)$ then
28:           exit for
29:         end if
30:       else if dir $= 1$ then
31:         if $1 + \gamma \sum_{j=-1}^{1} p_{0j} V(j) < V(k_i)$ then
32:           $k_i^* \leftarrow k_i - \mathrm{sign}(k_i)$
33:         else if $1 + \gamma \sum_{j=-1}^{1} p_{0j} V(j) > V(k_i)$ then
34:           exit for
35:         end if
36:       end if
37:     end for
38:   end for
39: until $k_1^* = k_1'$ and $k_2^* = k_2'$
40: return $k_1^*$, $k_2^*$

Algorithm 1 is explained as follows. We keep iterating until the thresholds no longer change, which implies that the optimal thresholds have been found. The thresholds $(k_1^*, k_2^*)$ are those obtained from each iteration.

Lines 4-6 compute $V(s)$ for all $s$ under the given thresholds $(k_1', k_2')$. Then, Lines 8-22 determine the search direction for $k_1^*$ and $k_2^*$. Because $V(s)$ in each iteration is computed using the current thresholds $(k_1', k_2')$, we have actions $a(k_1') = a(k_2') = 0$, and (4) is automatically satisfied when replacing its left hand-side with $V(k_1')$ or $V(k_2')$. Lines 9 and 16 check whether iterating according to (5) can yield lower cost. If it does, it means that migrating is a better action at state $k_1'$ (or $k_2'$), which also implies that we should
migrate at states $s$ with $M \le s \le k_1'$ (or $k_2' \le s \le N$) according to Proposition 1. In this case, $k_1^*$ (or $k_2^*$) should be set closer to zero, and we search through those thresholds that are closer to zero than $k_1'$ (or $k_2'$). If Line 9 (or Line 16) is not satisfied, according to Proposition 1, it is good not to migrate at states $s$ with $k_1' \le s \le 0$ (or $0 \le s \le k_2'$), so we search $k_1^*$ (or $k_2^*$) in the direction approaching $M$ (or $N$), to see whether it is good not to migrate under those states.

Lines 23-37 adjust the thresholds. If we are searching toward state $M$ or $N$ and Line 25 is satisfied, it means that it is better not to migrate under this state ($k_i$), and we update the threshold to $k_i$. When Line 27 is satisfied, it means that it is better to migrate at state $k_i$. According to Proposition 1, we should also migrate at any state closer to $M$ or $N$ than state $k_i$, thus we exit the loop. If we are searching toward state zero and Line 31 is satisfied, it is good to migrate under this state ($k_i$), therefore the threshold is set to one state closer to zero ($k_i - \mathrm{sign}(k_i)$). When Line 33 is satisfied, we should not migrate at state $k_i$. According to Proposition 1, we should also not migrate at any state closer to zero than state $k_i$, and we exit the loop.

Proposition 2. The threshold-pair $(k_1^*, k_2^*)$ is different in every iteration of the loop starting at Line 2, otherwise the loop terminates.

Proof: The loop starting at Line 2 changes $k_1^*$ and $k_2^*$ in every iteration so that $V(s)$ for all $s$ become smaller. It is therefore impossible that $k_1^*$ and $k_2^*$ are the same as in one of the previous iterations and at the same time reduce the value of $V(s)$, because $V(s)$ computed from (13) is the stationary cost value for thresholds $(k_1', k_2')$. The only case when $(k_1^*, k_2^*)$ are the same as in the previous iteration (which does not change $V(s)$) terminates the loop.

Corollary 1. The number of iterations in Algorithm 1 is $O(|M| N)$.

Proof: According to Proposition 2, there can be at most $|M| N + 1$ iterations in the loop starting at Line 2.

If we use Gaussian elimination to compute
(13), the time-complexity of Algorithm 1 is $O\left( |M| N (|M| + N)^3 \right)$.

IV. SIMULATION RESULTS

We compare the proposed threshold method with the standard value iteration and policy iteration methods [9, Ch. 3]. Simulations are run on MATLAB, on a computer with 64-bit Windows 7, Intel Core i7-2600 CPU, and 8GB memory. The value iteration terminates according to an error bound of $\epsilon = 0.1$ in the discounted sum cost. Note that the proposed method and the standard policy iteration method always produce the optimal cost. The number of states is given by $|M| = N = 10$. The transition
probabilities $p$ and $q$ are randomly generated. Simulations are run with 1000 different random seeds in each setting to obtain the average performance. The running time and the discounted sum costs under different values of $\beta$ are shown in Fig. 4.

The results show that the proposed method always has the lowest running time, and the running time of the standard policy iteration method is 2 to 5 times larger than that of the proposed algorithm, while the value iteration approach takes longer still. This is because the proposed algorithm simplifies the solution search procedure compared with standard mechanisms. The results also show that the proposed method can provide the optimal cost compared with a never-migrate (except for states $M$ and $N$) or always-migrate policy. It is also interesting to observe that the optimal cost approaches the cost of a never-migrate policy when $\beta$ is small, and it approaches the cost of an always-migrate policy when $\beta$ is large. Such a result is intuitive, because a small $\beta$ implies a small data transmission cost, and when $\beta$ is small enough, it is not really necessary to migrate; when $\beta$ is large, the data transmission cost is so large that it is always good to migrate to avoid data communication via the backhaul network.

[Figure 4. Performance under different $\beta$: (a) $\gamma = 0.5$, (b) $\gamma = 0.9$, (c) $\gamma = 0.99$. Each panel shows the running time (s) of the proposed, policy iteration, and value iteration methods, and the discounted sum cost of the optimal, never-migrate, and always-migrate policies.]

V. CONCLUSION AND FUTURE WORK

In this paper, we have proposed a threshold policy-based mechanism for service migration in mobile micro-clouds. We have shown the existence of an optimal threshold policy and proposed an algorithm for finding the optimal thresholds. The proposed algorithm has
polynomial time-complexity which is independent of the discount factor $\gamma$. This is promising because the time-complexity of standard algorithms for solving MDPs, such as value iteration or policy iteration, is generally dependent on the discount factor, and they can only be shown to have polynomial time-complexity when the discount factor is regarded as a constant [10]. (Footnote: This is unless the MDP is deterministic, which is not the case in this paper.) Although the analysis in this paper is based on 1-D random walk of mobile users, it can serve as a theoretical basis for more complicated scenarios, such as 2-D user-mobility, in the future.

ACKNOWLEDGMENT

This research was partly sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defence or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

REFERENCES

[1] Y. Abe, R. Geambasu, K. Joshi, H. A. Lagar-Cavilla, and M. Satyanarayanan, "vTube: efficient streaming of virtual appliances over last-mile networks," in Proc. of the 4th annual Symposium on Cloud Computing. ACM, 2013.
[2] K. Ha, Z. Chen, W. Hu, W. Richter, P. Pillai, and M. Satyanarayanan, "Towards wearable cognitive assistance," in Proc. of ACM MobiSys, 2014.

[3] M. Satyanarayanan, G. Lewis, E. Morris, S. Simanta, J. Boleng, and K. Ha, "The role of cloudlets in hostile environments," IEEE Pervasive Computing, vol. 12, no. 4, pp. 40-49, Oct. 2013.
[4] S. Wang, L. Le, N. Zahariev, and K. K. Leung, "Centralized rate control mechanism for cellular-based vehicular networks," in Proc. of IEEE GLOBECOM 2013, 2013.
[5] S. Davy, J. Famaey, J. Serrat-Fernandez, J. Gorricho, A. Miron, M. Dramitinos, P. Neves, S. Latre, and E. Goshen, "Challenges to support edge-as-a-service," IEEE Communications Magazine, vol. 52, no. 1, pp. 132-139, Jan. 2014.
[6] T. Taleb and A. Ksentini, "Follow me cloud: interworking federated clouds and distributed mobile networks," IEEE Network, vol. 27, no. 5, pp. 12-19, Sept. 2013.
[7] T. Taleb and A. Ksentini, "An analytical model for follow me cloud," in Proc. of IEEE GLOBECOM 2013, 2013.
[8] A. Ksentini, T. Taleb, and M. Chen, "A Markov decision process-based service migration procedure for follow me cloud," in Proc. of IEEE ICC 2014, 2014.
[9] W. B. Powell, Approximate Dynamic Programming: Solving the curses of dimensionality. John Wiley & Sons, 2007.
[10] I. Post and Y. Ye, "The simplex method is strongly polynomial for deterministic Markov decision processes," in Proc. of ACM-SIAM Symposium on Discrete Algorithms (SODA), 2013, pp. 1465-1473.
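
As a supplementary illustration of the cost evaluation for given thresholds in (12) and of the threshold search discussed in Section III-C, the following is a minimal sketch under stated assumptions. It is not Algorithm 1: it evaluates a threshold policy by repeatedly applying the fixed-point relation (12) instead of Gaussian elimination on (13), and it finds the best thresholds by exhaustive search over all pairs. The function names `evaluate_thresholds` and `best_thresholds`, the number of sweeps, and the parameter values in the usage example are hypothetical choices for illustration.

```python
def evaluate_thresholds(k1, k2, M, N, p, q, beta, gamma, sweeps=500):
    """Approximate V(s), M <= s <= N, for thresholds (k1, k2).

    Assumes M < k1 <= 0 <= k2 < N. Iterates v <- c + gamma * P * v,
    which converges to the solution of (12) because 0 < gamma < 1.
    """
    stay = 1.0 - p - q                   # probability of staying in the area
    V = {s: 0.0 for s in range(M, N + 1)}
    for _ in range(sweeps):
        # Discounted cost-to-go when the slot starts from offset zero
        # (either s = 0 already, or right after a migration).
        from_zero = gamma * (q * V[-1] + stay * V[0] + p * V[1])
        new_V = {}
        for s in range(M, N + 1):
            if s == 0:
                new_V[s] = from_zero                      # no cost at s = 0
            elif k1 <= s <= k2:                           # action a = 0
                new_V[s] = beta + gamma * (
                    q * V[s - 1] + stay * V[s] + p * V[s + 1])
            else:                                         # action a = 1
                new_V[s] = 1.0 + from_zero
        V = new_V
    return V


def best_thresholds(M, N, p, q, beta, gamma):
    """Exhaustive search over (k1, k2); a brute-force stand-in for Algorithm 1."""
    best = None
    for k1 in range(M + 1, 1):           # M < k1 <= 0
        for k2 in range(0, N):           # 0 <= k2 < N
            V = evaluate_thresholds(k1, k2, M, N, p, q, beta, gamma)
            # An optimal policy minimizes V(s) for every s, so it also
            # minimizes the sum over all states.
            total = sum(V.values())
            if best is None or total < best[0] - 1e-9:
                best = (total, k1, k2)
    return best[1], best[2]


if __name__ == "__main__":
    for beta in (0.01, 0.5, 5.0):
        print(beta, best_thresholds(-5, 5, 0.3, 0.2, beta, 0.9))
```

The exhaustive search evaluates all $|M| N$ threshold pairs, so it is only meant as a reference for small state spaces against which a faster search can be checked.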