ANNUAL CONFERENCE OF ITA, SEPT.

Controlling Autonomous Data Ferries with Partial Observations

Ting He, Kang-Won Lee, and Ananthram Swami

Abstract: We consider supporting communications in highly-partitioned mobile wireless networks via controllable data ferries. While existing ferry control techniques assume either stationary nodes or complete ferry observation of node locations, we address the more challenging scenario of highly mobile nodes and partial ferry observations. Using the tool of Partially Observable Markov Decision Processes (POMDPs), we develop a comprehensive framework in which we expand the solution space from predetermined trajectories to policies that map ferry observations to navigation actions dynamically. Under this framework, we present an optimal policy and several efficient heuristic policies. We compare the proposed policies with predetermined control through analysis and simulations with respect to multiple node mobility parameters, including speed, locality, activeness, and range of movement. The comparison shows a significant performance gain of up to twice the contact rate in cases of high uncertainty. In cases of low uncertainty, we give a sufficient condition under which predetermined control is optimal.

Index Terms: Dynamic data ferry control, Partially Observable Markov Decision Process, Performance analysis.

I. INTRODUCTION

The emerging application of Mobile Ad Hoc Networks (MANETs) in mission-critical environments brings new technical challenges. Consider a MANET consisting of highly mobile nodes covering a large, infrastructureless field. Examples of such networks can be found in military tactical operations, search-and-rescue scenarios, as well as mobile sensor networks. Such networks exhibit properties drastically different from traditional networks: low density, high mobility, and poor connectivity. On top of these, the mobility in these networks is usually dictated by mission needs and physical constraints such as roads and obstacles, which may not coincide with the mobility needed to provide sufficient contact opportunities. These challenges make it extremely difficult to enable efficient communications via in-network solutions.

Meanwhile, the fast development of robot technology has brought to life robots with increasingly sophisticated mobility, communication, and computation capabilities at significantly reduced size and cost. In particular, existing work [2] has shown that it is feasible to build light-weight robots from off-the-shelf elements that can be mounted on many types of mobile platforms, including unmanned ground or aerial vehicles. The advance in robotic technology and the challenges in mission-critical MANETs inspire our investigation of applying autonomous data ferries in highly-partitioned MANETs.

T. He and K.-W. Lee are with the IBM T.J. Watson Research Center, Hawthorne, NY. E-mail: {the,kangwon}@us.ibm.com. A. Swami is with the Army Research Laboratory, Adelphi, MD. E-mail: aswami@arl.army.mil.

Research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defence under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the author(s) and do not represent the official policies of the U.S. and U.K. Governments. The Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. Part of this work was reported in ACM MobiHoc [1].
Specifically, we propose to enhance these networks with dedicated communication nodes acting as data ferries. We envision a heterogeneous network structure as in [3], where the network consists of regular nodes that passively move according to their inherent mobility patterns and data ferries that actively move according to external control. The regular nodes have communication needs among themselves, while the data ferries only serve as relays in a load-carry-and-deliver manner. To fully utilize these ferries, however, effective control of their mobility is needed.

The problem of ferry mobility control has been extensively studied in the case of stationary regular nodes with known locations [3]-[5]. The solutions there involve designing ferry routes to cover the nodes and letting the ferries follow these routes deterministically like shuttles. When nodes are mobile, however, very little work has been done, with the only existing solutions to our knowledge [6], [7] assuming complete observation of nodes' current locations by the ferries. In practice, this assumption may fail due to obstacles, limited communication range, or poor channel conditions.

In this paper, we consider the scenario where nodes move stochastically and data ferries can only observe a subset of the field at a time, referred to as partial observations. Without complete observations, the ferries will have uncertainty about node locations. Our goal is to develop a solution that enables the ferries to self-navigate based only on partial observations and statistical knowledge of node mobility patterns.

A. Related Work

The idea of using controlled mobility to facilitate communications was first proposed in [8], where nodes deliberately change trajectories to send messages to otherwise disconnected nodes. This solution has limited application in mission-critical networks since regular nodes often cannot afford to change trajectories for communication purposes. For such applications, designated communication nodes are used instead [3], [4], [6], [7]. To compensate for the sparsity of the networks, a delay-tolerant scheme is used, where communication nodes physically carry data between task nodes [3], [4], [6], [7] or from sensors to base stations [5]. The solution in [7] actually belongs to the stationary-node category, as nodes are stationary unless chosen to be ferries.

Our work belongs to this category. The existing work is limited by several assumptions: most solutions assume the task nodes to be stationary with known positions [3]-[5]; when task nodes are mobile, the solutions assume that ferries know nodes' current locations through either out-of-band communications [6] or external location management systems [7]. There have been multiple extensions in terms of channel models, performance metrics, traffic demands, etc., but the main assumption of a fully observable network state remains. In contrast, we aim to explicitly handle stochastic node movements and incomplete ferry observations.

Technically, our approach belongs to stochastic control, in particular Partially Observable Markov Decision Processes (POMDP) [9]. Although previously applied to agent navigation in robotics [10], [11], its application to mobility control in communication networks has not been explored before. Our problem differs from the agent navigation problem in that the subsystem that evolves dynamically (task nodes) is different from the subsystem that is controlled (data ferries), which leads to unique properties as explained later.

B. Our Approach and Results

We take the approach of stochastic control, where in contrast to predetermined control in stationary networks, we show that the optimal control in mobile networks needs to be dynamic in nature. Specifically, our contributions are:

A comprehensive control framework: We model the entire network of nodes and ferries as an interactive system represented by a Partially Observable Markov Decision Process (POMDP), which seamlessly combines the tasks of information collection and decision making. The framework enables a systematic solution to handle uncertainties due to stochastic node movements and partial ferry observations.

Optimal and efficient heuristic policies: Based on the proposed framework, we develop both optimal and heuristic policies by applying POMDP techniques. We develop a recursive algorithm to compute the optimal policy and, due to the complexity of the optimal policy, a set of efficient heuristic algorithms. Moreover, we show that our problem has a unique structure that allows an explicit policy representation linked back to ferry trajectories.

Analysis and simulations: We conduct rigorous analysis of the proposed solution, verified by simulations. In particular, since dynamic control generally requires higher costs than predetermined control, we want to characterize their performance gap and how it is influenced by external parameters, to provide general guidance on system design. Using the ferry-node inter-contact time as the performance measure, we derive both upper bounds (achievability) and lower bounds (converse). Our achievability results show promising performance of the proposed dynamic control in that it outperforms predetermined control the most when improvement is most needed: when the field is large (or the sensing range is small) and the prior knowledge is minimal (nodes' steady-state distributions are close to uniform). In this case, our solution can reduce the inter-contact time by at least Ω(√n) for a domain of size n, and even Θ(n) if nodes are constrained to 1-D routes. The converse shows that this reduction is at most half, achievable by our policy under symmetric and infrequent mobility.

Fig. 1. Example of two consecutive slots: one (gateway) node per domain; contact only occurs in the second slot. r: ferry sensing radius.
On the other hand, when nodes' steady-state distributions are localized, we also give a sufficient condition under which predetermined control is optimal.

The rest of the paper is organized as follows. Section II formulates the problem of dynamic ferry control. Section III presents a control framework via POMDP. Section IV develops the optimal and heuristic policies, whose performance is analyzed in Section V. Section VI evaluates the solution through simulations, and Section VII concludes the paper.

II. PROBLEM FORMULATION

A. Network Model

As illustrated in Fig. 1, we assume that the network of regular (i.e., non-ferry) nodes is divided into N partitions, each occupying a disjoint region called a domain. Within each domain, at least one node is selected as the gateway for inter-domain communications (assuming sufficient intra-domain contacts). We make this distinction because selected nodes may need special equipment (e.g., high-power antennas) to communicate with ferries, although for our purpose it suffices to model only these gateways, which will simply be called nodes in the sequel. Each node is equipped with a radio of range r and moves within its domain according to some stochastic mobility model to be specified later.

Assume that each data ferry is equipped with a radio of the same range as the nodes. In addition, it has a buffer to store messages and a controller to specify its movements according to a control policy π (defined later). Assume the ferry self-navigates periodically with a predetermined interval (called the slot length), and at the end of each slot t, it senses the node's presence (e.g., by broadcasting beacons) to obtain a binary observation z_t (z_t = 1 for contact and 0 for miss). Assume the sensing range to be equal to the radio range. Upon contact, the ferry will exchange messages with the node and then move to the next domain to ferry data.

B. Main Problem: Ferry Mobility Control

The main problem in this paper is to design a control policy that optimizes the inter-domain communication performance. To facilitate control, we discretize the space by partitioning each domain into a number of cells of radius r (equivalently, r denotes the smaller of the ferry/node ranges; see Fig. 1) such that the ferry can sense the entire cell from its center.
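The discretization above only requires that every point of a cell lie within sensing range r of the cell center. A minimal sketch of one such partition, assuming square cells of side r*sqrt(2) over a rectangular domain (the square shape is our illustrative assumption; the paper does not fix a particular cell geometry):

```python
import math

def make_cells(width, height, r):
    """Partition a width x height rectangular domain into square cells of
    side r*sqrt(2), so that every point of a cell lies within distance r of
    the cell center (the ferry can sense the whole cell from its center)."""
    side = r * math.sqrt(2)
    nx, ny = math.ceil(width / side), math.ceil(height / side)
    centers = [((i + 0.5) * side, (j + 0.5) * side)
               for j in range(ny) for i in range(nx)]
    return centers  # cell k is identified with centers[k]

# Example: a 100 x 60 field with sensing radius r = 10.
print(len(make_cells(100, 60, 10)))
```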

Each control action u_t specifies the cell for the ferry to move to during slot t. We model the design criteria by a reward function E[R(z_1^T, u_1^T)], where R(·) is the performance metric as a function of the observation history z_1^T and the action history u_1^T up to time T, and T > 0 defines a time window of control (called the horizon). (We use x_{t1}^{t2} to denote the vector (x_{t1}, ..., x_{t2}) for t1 ≤ t2.) We focus on the infinite-horizon case (T = ∞) for long-term control.

Compared with ferry mobility control in the literature (e.g., [3]), the main difference here is that instead of designing the entire ferry trajectory u_1^T ahead of time, we look at it as a stochastic sequence, where each action u_t can be optimized individually at run time. This means that the control decision u_t can be based not only on prior knowledge such as nodes' steady-state distributions, but also on run-time information including the ferry's past observations z_1^{t-1} and actions u_1^{t-1}. Accordingly, the control problem also shifts from designing the trajectory to designing the control policy π, which is a sequence of mappings π = (π_t)_{t=1}^T that map all the available information to the next action at each slot, i.e., π_t(z_1^{t-1}, u_1^{t-1}) = u_t. Our goal is to find a policy π* that maximizes the total reward:

    π* = arg max_π E[ R(z_1^T, u_1^T) | u_t = π_t(z_1^{t-1}, u_1^{t-1}) ].    (1)

III. DESIGN FRAMEWORK

We will focus on the control of an individual ferry. Consider the case when there is one (gateway) node per domain and the number of domains N is large (the cases of multiple nodes per domain and small N are addressed separately in [1]). To ensure scalability, we divide the ferry control into global control and local control as follows.

Global control: The global control determines which domain to go to. An intuitive solution is to iterate among all the domains in a predetermined or random order. More sophisticated solutions involve optimization of certain performance metrics. Since the domains are fixed, we can view an entire domain as a stationary node and apply existing solutions for ferry route design [3] to order the domains.

Local control: The local control handles the specific movements among cells within a domain. This is a more challenging task because the ferry can only see one cell at a time, and will be the focus of this paper.

A. A Simple Example

To illustrate the difficulty of this problem, let us consider a simple example. Suppose that each domain is only partitioned into two cells, labeled cell 1 and cell 2. In each slot, a node moves from cell 1 to cell 2 with probability p and from cell 2 to cell 1 with probability q. Assuming that a contact occurs every time the ferry is in the same cell as a node, after which it moves to a different domain, we want to maximize the contact rate r_c = lim_{T→∞} R(z_1^T, u_1^T)/T, where R(·) is the number of contacts.

Simple as it seems, much thought is required to achieve optimality. A baseline policy of random movements will only give a contact rate of 0.25. An intuitive solution is to check the other cell whenever the ferry has a miss. This policy leads to oscillating movements, whose contact rate can be shown to be

    r_c^o = (p + q)(1 - pq) / [2 + p + q - (2 - p - q)(1 - pq)].

Another intuitive solution is to let the ferry wait in one cell, whose contact rate can be shown to be

    r_c^w = (p + q)pq / [p^2 + q^2 + pq(p + q)].
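To make the comparison concrete, the following is a minimal Monte Carlo sketch of this two-cell example. It assumes one concrete reading of the model (the ferry's observation at the end of a slot is checked against the node's position after it has moved in that slot, and after every contact the ferry jumps to a fresh domain whose node is drawn from the steady state); the exact numbers depend on these modeling assumptions, so the sketch only illustrates how such policies can be compared empirically, not the closed-form rates above.

```python
import random

def simulate(policy, p, q, slots=200_000, seed=1):
    """Estimate the contact rate of a two-cell local policy.
    policy(last_checked) -> cell to check (1 or 2); last_checked is None in a fresh domain."""
    rng = random.Random(seed)
    pi1 = q / (p + q)                      # steady-state probability of cell 1
    node = 1 if rng.random() < pi1 else 2  # node of the first domain, drawn from steady state
    last, contacts = None, 0
    for _ in range(slots):
        u = policy(last)
        # the node moves during the slot
        if node == 1 and rng.random() < p:
            node = 2
        elif node == 2 and rng.random() < q:
            node = 1
        if u == node:                      # contact: ferry switches to a fresh domain
            contacts += 1
            node = 1 if rng.random() < pi1 else 2
            last = None
        else:                              # miss: remember where we looked
            last = u
    return contacts / slots

oscillate = lambda last: 2 if last in (None, 1) else 1   # check the other cell after a miss
wait      = lambda last: 2                               # always wait in cell 2
rand_move = lambda last: random.choice((1, 2))           # baseline: random movements

for p, q in [(0.3, 0.3), (0.7, 0.7), (0.25, 0.75)]:
    rates = {name: simulate(f, p, q) for name, f in
             [("osc", oscillate), ("wait", wait), ("rand", rand_move)]}
    print(p, q, {k: round(v, 3) for k, v in rates.items()})
```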
We see that none of these policies can guarantee optimality: e.g., the waiting policy is better if p = q > 0.5, and the oscillating policy is better if p = q < 0.5. A smarter design is perhaps to keep chasing the node to its most likely cell, called the myopic policy. Interestingly, even this policy is not optimal in general (e.g., for certain values of p and q the oscillating policy outperforms the myopic policy). In the sequel, we introduce a systematic approach to achieve optimality.

B. A Systematic Solution via POMDP

The preceding simple example turns out to require a sophisticated solution [12]. In general, the problem is even trickier since the ferry will not know the exact location of the node before contact. In this section, we introduce a systematic design framework using the tool of Partially Observable Markov Decision Processes (POMDP) [9]. A POMDP models a control problem where the goal is to select control actions, based on run-time observations reflecting the state of the controlled system, so as to optimize a reward. A POMDP problem is represented by the tuple <S, U, Z, P, P_z, r, γ, b_0> [10]:

1) S: the state space, defined as the set of internal states of the controlled system; here the state s_t denotes the cell of the current node at the beginning of slot t, and the state space S is the set of all cells, denoted by {1, 2, ..., n};

2) U: the action space, defined as the set of control options; here the action u_t denotes the cell the ferry will move to in slot t, which can be either in the current domain or in the next domain (specified by the global control); accordingly, U is divided into follow actions U_f = {1, ..., n} (i ∈ U_f means to move to cell i of the current domain) and switch actions U_s = {1', ..., n'} (j ∈ U_s means to switch to cell j of the next domain);

3) Z: the observation space, defined as the set of possible observations; here Z = {0, 1}, with 1 denoting contact and 0 miss (z_t ∈ Z is taken at the end of slot t);

4) P: the state transition model specifying the transition probabilities of the system state; here it is the transition matrix of the node mobility model (assuming Markovian mobility), where P(i, j) = Pr{s_{t+1} = j | s_t = i} is the probability for the node to move from cell i to cell j during slot t;

5) P_z: the observation model specifying how observations are related to states and actions; here P_z(s, u, z) = Pr{z_t = z | s_{t+1} = s, u_t = u} is the probability of sensing z if the node moves to cell s and the ferry to cell u; we assume P_z(s, u, z) = 1 if and only if z = 1(s = u) (perfect sensing);

6) r(z, u): the payoff function specifying the reward of taking action u and observing z; here we use r(z, u) = z to give a unit reward for each contact;

7) γ ∈ (0, 1]: the discount factor specifying how much delayed rewards will be discounted; the total reward is given by R(z_1^T, u_1^T) = Σ_{t=1}^T γ^t r(z_t, u_t) (there are other reward structures, but we select the discounted reward as it is the most common one for the infinite horizon);

8) b_0: the initial belief specifying the initial distribution of the system state; here we assume it is the node's steady-state distribution b_0 (i.e., the solution to b_0^T = b_0^T P, assuming it exists).

Many of the above elements are design choices and can be extended. The main assumption behind this framework is the Markovian node mobility model. While noting that it is a limiting assumption, we point out that most mobility models, e.g., random walk, random waypoint, and their variations, can be mapped to Markovian models under proper quantization.

IV. OPTIMAL AND HEURISTIC POLICIES

We now develop policies for the local control under the framework in Section III-B. We will first develop a mathematical representation of the optimal policy and then investigate the algorithms. For clarity, we assume nodes in different domains move i.i.d. with transition matrix P and steady-state distribution b_0. The non-i.i.d. case can be easily handled by plugging in different matrices and distributions.

A. Optimal Policy

It is known in POMDP theory that a sufficient statistic is the belief vector b_t, defined as the posterior distribution of the node (at the beginning of slot t) given all past actions and observations, i.e., b_t(s) = Pr{s_t = s | z_1^{t-1}, u_1^{t-1}}. It thus suffices to design the policy based on b_t. We introduce an auxiliary function V_T^π, called the value function, where V_T^π(b) denotes the total discounted reward gained by policy π over a horizon T starting from belief b [9]. It is known that the optimal value function must satisfy the following equation, known as the value iteration:

    V_T(b) = γ max_{u∈U} [ r(b, u) + Σ_{z∈Z} P_z(b, u, z) V_{T-1}(f(b, u, z)) ],    (2)

where r(b, u) is the expected reward in the current slot, P_z(b, u, z) the probability of observing z, and f(b, u, z) the belief after taking action u and observing z. The optimal policy is the solution to (2), i.e., π_T*(b) is the action achieving V_T(b).

We now translate the above definitions to our problem. In our problem, the belief vector experiences two updates per slot: a belief transition and a Bayesian update. The belief transition takes place after the ferry takes an action u_t at the beginning of slot t and before it takes an observation z_t at the end of slot t, to model node movements during this slot:

    b'_t = f_1(b_t, u_t) = { P^T b_t  if u_t ∈ U_f,
                             b_0      if u_t ∈ U_s, }    (3)

where the belief is reset if the ferry switches to a new domain. The Bayesian update occurs after the ferry observes z_t:

    b_{t+1} = f_2(b'_t, u_t, z_t) = { b_0            if z_t = 1,
                                      (b'_t)^{\u_t}  if z_t = 0, }    (4)

where the belief is reset after a contact (and the ferry will switch domains), or updated by the Bayesian rule after a miss. (The notation b' = b^{\u} means b'(u) = 0 and b'(s) = b(s)/(1 - b(u)) for s ≠ u.) Then the procedure continues in the next slot.
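A minimal sketch of the two belief updates (3) and (4) in code, using plain Python lists for belief vectors (the function and variable names are ours, chosen for illustration):

```python
def belief_transition(b, P, u_is_switch, b0):
    """Eq. (3): node-movement update. P[i][j] = Pr{next cell j | current cell i}."""
    if u_is_switch:
        return list(b0)                       # belief resets in a new domain
    n = len(b)
    return [sum(b[i] * P[i][j] for i in range(n)) for j in range(n)]  # P^T b

def bayesian_update(b_prime, u, z, b0):
    """Eq. (4): observation update after checking cell u and observing z."""
    if z == 1:
        return list(b0)                       # contact: reset (ferry switches domains)
    miss_mass = 1.0 - b_prime[u]
    post = [x / miss_mass for x in b_prime]   # renormalize over the remaining cells
    post[u] = 0.0                             # the node is certainly not in cell u
    return post

# One slot of the two-cell example: follow action u = 1 (0-indexed), then a miss.
P = [[0.7, 0.3], [0.4, 0.6]]                  # p = 0.3, q = 0.4
b0 = [4/7, 3/7]                               # steady state of P
b1 = belief_transition(b0, P, u_is_switch=False, b0=b0)
b2 = bayesian_update(b1, u=1, z=0, b0=b0)
print(b1, b2)
```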
Based on the above belief updates and the definitions in Section III-B, the value iteration (2) in our problem becomes

    V_T(b) = γ max_{u∈U} [ b'(u) + b'(u) V_{T-1}(b_0) + (1 - b'(u)) V_{T-1}((b')^{\u}) ],    (5)

where b' = f_1(b, u), and the action achieving (5) gives the optimal policy. Physically, V_T(b) is the maximum discounted number of contacts over T slots starting from a node distribution b. Although γ does not appear in the physical problem, it is needed to ensure a unique optimal solution in the long run.

Fact 4.1 ([13]): There exists a unique stationary policy π that optimizes the total reward as T → ∞ if γ < 1, and its value function V_∞ is invariant under (2).

B. Policy-Computing Algorithms

Although mathematically sound, equation (5) cannot be directly used to compute the policy, as there are infinitely many belief vectors. In this section, we explore algorithms that can compute the policy in a finite number of steps.

1) Hardness of the Optimal Policy: Sondik [9] first showed the possibility of computing the optimal policy by utilizing the piecewise-linear and convex property of the optimal value function: V_T(b) = max_{α∈Γ_T} b^T α, where α is a coefficient vector, called a value vector, that represents a linear segment of V_T. The problem is then reduced to finding the set of value vectors Γ_T ⊂ (R^+)^n. Our problem differs from the classic POMDP in that we have two belief updates per slot, (3) and (4), that interleave with the action. Consequently, our optimal value function has a unique form as follows.

Proposition 4.2: The optimal value function of (5) can be written as

    V_T(b) = max( max_{α∈Γ_T^f} b^T α,  max_{α∈Γ_T^s} b_0^T α ).    (6)

Moreover, Γ_T^f and Γ_T^s (T ≥ 1) satisfy the recursion (for T = 0, Γ_0^f = Γ_0^s = {(0, ..., 0)}):

    Γ_T^f = P Γ'_T,    Γ_T^s = arg max_{α∈Γ'_T} b_0^T α,    (7)

where P Γ'_T = {Pα : α ∈ Γ'_T}, and Γ'_T = {α̃_1^{u,α}, α̃_2^{u} : u ∈ S, α ∈ Γ_{T-1}^f} for

    α̃_1^{u,α}(s) = { γ(1 + c_1)  if s = u,
                      γ α(s)     otherwise, }    (8)

    α̃_2^{u}(s)   = { γ(1 + c_1)  if s = u,
                      γ c_2      otherwise, }    (9)

c_1 = max_{α ∈ Γ_{T-1}^f ∪ Γ_{T-1}^s} b_0^T α, and c_2 = max_{α ∈ Γ_{T-1}^s} b_0^T α.

The above result also gives us an explicit algorithm to compute the optimal policy. During the offline phase (when bootstrapping the controller), we start with Γ_0^f = Γ_0^s = {0} and iteratively apply (7) for T = 1, 2, ... to compute Γ_T^f and Γ_T^s for any given T. During the online phase (when applying control), we extract the policy by solving for the action that achieves the maximization on the right-hand side of (5), where V_{T-1}(b_0) and V_{T-1}(b^{\u}) are computed from Γ_{T-1}^f and Γ_{T-1}^s by (6).

The drawback of the above algorithm is that the number of value vectors grows fast. It can be shown that |Γ_T^f| = |S|(|S|^T - 1)/(|S| - 1) = O(|S|^T) and |Γ_T^s| = 1, and thus the worst-case complexity grows exponentially with T (O(|S|^{T+1})). This is actually not specific to our algorithm, but due to the intrinsic hardness of POMDPs.

2) Efficient Heuristic Policies: Due to the high complexity of the optimal policy, we turn to heuristic policies. Most heuristic policies seek to approximate the optimal value function. For example, we can approximate the exact value iteration (5) by the Interpolation-Extrapolation method [10], which quantizes the belief space into a finite set of representative belief vectors and runs value iterations only on these beliefs; the values at other beliefs are computed via interpolation. Alternatively, we can approximate the value-vector-based algorithm in Section IV-B1 by the Point-Based Value Iteration method [11], which limits the set of value vectors to those achieving the maximum value in (6) for at least one of the representative beliefs. Other heuristic methods include reducing the problem to a fully observable case (MDP and QMDP approximations), directly approximating the value function by curve fitting (Least-Squares Fit), etc.; see [10], [11] and references therein.

An extreme case is when we ignore the value function altogether, which leads to a much simpler policy, called the myopic policy, that greedily optimizes the probability of immediate contact:

    π_m(b) = arg max_{u∈U} b'(u).    (10)

Unlike the above sophisticated policies, this policy does not require any offline computation and has a very short online response time. On the other hand, it may suffer more performance loss due to greediness. In certain cases, however, we show that the myopic policy gives the optimal action.

Theorem 4.3: At horizon T and belief state b, it is optimal to follow the myopic policy if

    γ b'_{(1)} ≥ (1 - b'_{(2)}) [ 1 + b'_{(1)} γ(1 - γ^{T-1})/(1 - γ) ],    (11)

where b'_{(1)} = max_{u∈U} b'(u) (achieved at u*) is the largest immediate reward, and b'_{(2)} = max_{u∈U\{u*}} b'(u) the second largest immediate reward.

This theorem gives a sufficient condition for the myopic policy to be optimal. The condition depends on two parameters: the number of remaining slots T and the current belief b. By this theorem, we can safely short-circuit value iterations in certain subsets of the belief space by following the myopic policy where condition (11) is satisfied.
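To illustrate the structure of (5) and the myopic rule (10), here is a small, deliberately naive sketch: it evaluates V_T by direct recursion over beliefs (exponential in T, so only usable for tiny horizons) rather than by the value-vector algorithm above, and it reuses the belief updates of Section III-B. All names are ours; switch actions are modeled simply as checking a cell of a fresh domain whose belief is b_0.

```python
from functools import lru_cache

def make_model(P, b0, gamma):
    n = len(b0)

    def transition(b):                       # eq. (3), follow actions: P^T b
        return tuple(sum(b[i] * P[i][j] for i in range(n)) for j in range(n))

    def clear(b, u):                         # belief after a miss at cell u, eq. (4)
        m = 1.0 - b[u]
        return tuple(0.0 if s == u else b[s] / m for s in range(n))

    def candidate_actions(b):
        bp_follow, bp_switch = transition(b), tuple(b0)
        for u in range(n):                   # follow actions
            yield u, bp_follow
        for u in range(n):                   # switch actions (fresh domain)
            yield u, bp_switch

    @lru_cache(maxsize=None)
    def V(T, b):                             # eq. (5), naive recursion over beliefs
        if T == 0:
            return 0.0
        best = 0.0
        for u, bp in candidate_actions(b):
            miss_p = 1.0 - bp[u]
            future_miss = 0.0 if miss_p <= 1e-12 else miss_p * V(T - 1, clear(bp, u))
            val = gamma * (bp[u] * (1.0 + V(T - 1, tuple(b0))) + future_miss)
            best = max(best, val)
        return best

    def myopic(b):                           # eq. (10): maximize immediate contact prob.
        return max(candidate_actions(b), key=lambda a: a[1][a[0]])[0]

    return V, myopic

# Two-cell example with p = 0.3, q = 0.4.
P = [[0.7, 0.3], [0.4, 0.6]]
b0 = (4/7, 3/7)
V, myopic = make_model(P, b0, gamma=0.9)
print("V_5(b0) =", round(V(5, b0), 4), " myopic action at b0:", myopic(b0))
```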
3) Explicit Policy Representation: Different from general POMDPs, our problem has a special property, as follows.

Proposition 4.4: Each belief-based dynamic policy π corresponds to a unique trajectory u_π = (u_π(t))_{t=1}^d (d can be ∞) within one domain, such that π is equivalent to a policy of following u_π and repeating it in different domains until contact, after which this process is repeated in a new domain.

This is because the ferry's observations are binary and the controller restarts (in a new domain) after each contact, which reduces a policy to a sequence of actions (u_π) as long as the observations are consecutive misses. This property allows us to represent the policy explicitly by u_π. Note that this is different from predetermined ferry trajectories: instead of following u_π deterministically, the ferry will stop and switch domains immediately after a contact, and its actual trajectory will be stochastic. To distinguish it from the actual trajectory, we will call u_π the policy trajectory of policy π.

V. PERFORMANCE ANALYSIS

We analyze the performance of the proposed policies using the measure of average inter-contact time E[K], i.e., the mean number of slots between two consecutive ferry-node contacts (1/E[K] is the contact rate). Our goal is two-fold. First, we want to compare dynamic policies with predetermined policies. Since dynamic policies are more expensive to implement, the comparison will provide guidance on the performance-cost tradeoff. Moreover, we want to see how the performance is influenced by node mobility models. Instead of limiting ourselves to specific models, we focus on generic parameters such as speed, locality, activeness, and the size and shape of the field, to draw insights on when dynamic control is worthwhile and when predetermined control suffices.

A. Dynamic vs. Predetermined Control

To analyze the fundamental performance gain, we need to compare the optimal dynamic policy with the optimal predetermined counterpart. We begin with the following observation.

Claim 5.1: The optimal predetermined policy is to keep switching among the most likely cells of different domains (called the switching policy).

Therefore, it suffices to compare dynamic policies with the switching policy. In the sequel, we use the notation that (b_(i))_{i=1}^n is an ordered sequence of the elements of b_0 such that b_(1) ≥ b_(2) ≥ ⋯. Let K_SW denote the inter-contact time of the switching policy. It is easy to see that E[K_SW] = 1/b_(1).
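Proposition 4.4 suggests a simple way to inspect any belief-based local policy: feed it consecutive misses and record the cells it visits. A minimal sketch (our own helper, reusing the belief updates and the greedy rule sketched earlier) that unrolls a policy into its policy trajectory u_π together with the per-step contact probabilities used in the bounds below:

```python
def policy_trajectory(policy, P, b0, max_len=20):
    """Unroll a belief-based local policy into its policy trajectory u_pi
    (Proposition 4.4) by assuming consecutive misses; also return the
    conditional contact probabilities p_t along the trajectory."""
    n = len(b0)
    b = list(b0)
    cells, probs = [], []
    for _ in range(max_len):
        bp = [sum(b[i] * P[i][j] for i in range(n)) for j in range(n)]  # eq. (3)
        u = policy(bp)                        # next cell, given the pre-observation belief
        cells.append(u)
        probs.append(bp[u])                   # p_t: contact prob. given all previous misses
        miss = 1.0 - bp[u]
        if miss <= 0.0:
            break
        b = [0.0 if s == u else bp[s] / miss for s in range(n)]          # eq. (4), miss branch
    return cells, probs

# Example: the myopic policy in the two-cell model.
P = [[0.7, 0.3], [0.4, 0.6]]
b0 = [4/7, 3/7]
greedy = lambda bp: max(range(len(bp)), key=lambda u: bp[u])
print(policy_trajectory(greedy, P, b0, max_len=5))
```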

For dynamic policies, an exact analysis is intractable, as we cannot even write down the optimal policy explicitly (see Section IV-B). Instead, we seek to bound the performance from both above and below.

For the upper bound, we need to analyze a good but simple dynamic policy. Specifically, consider the myopic policy with policy trajectory u = (u(t))_{t=1}^d, and let p_t denote the conditional probability of contact in step t given that there has been no contact in the previous t - 1 steps. Then p_t and u(t) can be computed recursively by

    p_t = max_s b'_t(s),    u(t) = arg max_s b'_t(s),    (12)

where b'_{t+1} = P^T (b'_t)^{\u(t)} and b'_1 = b_0; d is the smallest t such that p_{t+1} < b_(1).

For the lower bound, we need to modify the problem into one that allows superior performance. Consider a hypothetical scenario where all nodes are stationary with distribution b_0, such that the ferry never needs to visit a cell twice. The policy trajectory in this case is the sequence of cells corresponding to b_(1), b_(2), .... The conditional contact probability p̃_t is given by p̃_t = b_(t) / (1 - Σ_{i=1}^{t-1} b_(i)), and the trajectory length d̃ is the smallest t such that p̃_{t+1} < b_(1). With these notations, we present the following bounds.

Theorem 5.2: The average inter-contact time E[K_DY] of the optimal dynamic policy satisfies

    (K̃ + d̃(1 - p̃))/p̃  ≤  E[K_DY]  ≤  (K + d(1 - p))/p,    (13)

where p = 1 - Π_{t=1}^d (1 - p_t), K = Σ_{t=1}^d t p_t Π_{i=1}^{t-1} (1 - p_i), and p̃, K̃ are similarly defined with p_t, d replaced by p̃_t, d̃.

This theorem provides both achievability and converse results on dynamic ferry control. The achievability result (the upper bound) gives a guaranteed performance gain in comparison with predetermined control (E[K_SW]). The converse result (the lower bound) establishes a fundamental limit that no control policy can outperform, which can serve as a reference in evaluating the effectiveness of specific policies.

B. Impact of Domain Size and Node Speed

We now apply the result in Theorem 5.2 to analyze the impact of node mobility parameters. We first examine the impact of domain size, measured by the number of cells n per domain, and node speed, measured by the number of cells m a node can move across per slot (i.e., it can move to locations at most m cells away). We consider a symmetric mobility model where nodes are uniformly distributed in the steady state, so as to focus only on n and m; asymmetric mobility models will be treated separately in Section V-C.

Intuitively, the ferry starts with little knowledge about the node location, but will learn more as it visits cells and makes its belief vector more concentrated. However, due to node movements, if the ferry misses the node in one cell, its probabilities of seeing the node in neighboring cells will also drop in the next slot, and this effect will propagate to farther neighbors over time. To compensate for this effect, we make the ferry follow a dominating trajectory, defined as a sequence of cells c(1), c(2), ... such that no c(k) can be reached by a node starting from cell c(j), k - j slots earlier. The idea is to let the ferry scan the field by visiting only one cell per neighborhood to best utilize the contact opportunities; see Fig. 2 for illustrations. Using this policy, we obtain the following bounds.

Fig. 2. Dominating trajectories for a 2-D random walk: (a) and (b) are both the longest.

Corollary 5.3: If b_0 is uniform, then E[K_SW] = n, and

    n/2  ≤  E[K_DY]  ≤  n - d(n)/2,    (14)

where d(n) is the length of the longest dominating trajectory.
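Under the reconstruction of (12)-(14) above, the bounds are straightforward to evaluate numerically. A small sketch (our own helper functions) that builds the per-step contact probabilities of (12) and evaluates the upper bound of (13); with a uniform b_0 and a dominating trajectory of length d, the per-step probabilities are 1/(n - t + 1) and the bound comes out close to n - d/2, matching (14).

```python
def myopic_probs(P, b0, horizon=50):
    """Per-step contact probabilities p_t of the myopic policy trajectory, eq. (12),
    with the trajectory length capped at `horizon`."""
    n = len(b0)
    b = list(b0)
    b1 = max(b0)                                  # b_(1), the switching threshold
    probs = []
    for _ in range(horizon):
        bp = [sum(b[i] * P[i][j] for i in range(n)) for j in range(n)]
        u = max(range(n), key=lambda s: bp[s])
        if probs and bp[u] < b1:                  # d: stop once p_{t+1} < b_(1)
            break
        probs.append(bp[u])
        miss = 1.0 - bp[u]
        b = [0.0 if s == u else bp[s] / miss for s in range(n)]
    return probs

def intercontact_upper_bound(probs):
    """Upper bound of eq. (13): (K + d(1-p))/p for one pass with contact probs p_1..p_d."""
    d = len(probs)
    survive, K = 1.0, 0.0
    for t, pt in enumerate(probs, start=1):
        K += t * pt * survive
        survive *= (1.0 - pt)
    p = 1.0 - survive
    return (K + d * (1.0 - p)) / p

# Uniform steady state over n cells with a dominating trajectory of length d:
n, d = 100, 10
probs = [1.0 / (n - t + 1) for t in range(1, d + 1)]
print(intercontact_upper_bound(probs))   # close to n - d/2 (exactly n - (d-1)/2 = 95.5 here)

# Myopic-policy bound for the two-cell example of Section III-A (p = 0.3, q = 0.4):
P2, b2 = [[0.7, 0.3], [0.4, 0.6]], [4/7, 3/7]
print(intercontact_upper_bound(myopic_probs(P2, b2)))
```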
Corollary 5.3 gives closed-form bounds on the control performance in terms of the domain size and the length of the dominating trajectory. Intuitively, the more stretched the domain is and the slower the node moves, the longer the dominating trajectory, and the better dynamic policies will perform. For example, for random walks (i.e., m = 1), d(n) ≈ n/2 if the domain is 1-D (trajectory 1, 3, ...) and d(n) ≈ √n if 2-D (see Fig. 2). In general, it can be shown that d(n) = Θ(n/m) in 1-D and d(n) = Ω(√n/m) in 2-D. This result is consistent with the intuition that it is easier to catch nodes that move slowly or in constrained domains.

Meanwhile, since a predetermined controller only uses the steady-state distribution, its performance is independent of the above parameters. As the domain size n increases (for fixed m), the absolute performance gain E[K_SW] - E[K_DY] will always increase, at least as Ω(√n). If, in addition, nodes are constrained to some routes (1-D mobility), then even the relative gain (E[K_SW] - E[K_DY])/E[K_SW] will be positive, i.e., there will be a positive-fraction improvement in the contact rate. This is significant given the fact that the contact rate can at most be doubled, as seen from the lower bound. Although not proved for 2-D mobility, simulation shows that the upper bound is loose in this case and dynamic control still provides significant improvement in the contact rate (see Section VI).

C. Impact of Node Locality

As nodes' mobility becomes asymmetric, their steady-state distribution b_0 will deviate from the uniform distribution. After a certain point, b_0 alone can already give a good prediction of the node location, and dynamic control becomes unnecessary. Indeed, we have confirmed the above intuition with the following result.

Corollary 5.4: If the node mobility is sufficiently biased such that b_(1) > b_(2)/(1 - b_(1)), then E[K_DY] = E[K_SW], i.e., the switching policy is optimal.

This result provides a sufficient condition for the switching policy to be optimal. When this condition applies, we know for sure that having the ferry hop among the most probable cells of different domains is the best policy, and no dynamic controller is needed.

D. Impact of Node Activeness

Intuitively, there are two aspects of a node's level of mobility: how far a node can move in one slot, and with what probability. The first aspect reflects node speed and has been discussed in Section V-B; the second reflects how active the nodes are and will be the focus of this subsection. To model node activeness, we start from a base transition matrix P and scale all the off-diagonal entries by a parameter β > 0, called the activeness parameter. We then modify the diagonal entries to make it a valid transition matrix again, i.e., the new matrix P_β is given by (P_β)_ij = βP_ij for i ≠ j and (P_β)_ii = 1 - β Σ_{j≠i} P_ij, where 0 < β < 1/[max_i Σ_{j≠i} P_ij]. Intuitively, β models how frequently nodes cross cell boundaries during one slot. It is orthogonal to the steady-state distribution, as stated below.

Lemma 5.5: Let P be a base transition matrix and P_β its scaled version according to an activeness parameter β. Then their associated steady-state distributions are equal for all β ∈ (0, 1/(1 - min_i P_ii)).

Since b_0 is independent of β, the previous results that only depend on b_0, e.g., the performance of the switching policy and the lower bound in (13), are also independent of β. Although the general upper bound in (13) still holds here (with P replaced by P_β), we can narrow the gap by explicitly including β, as follows.

Theorem 5.6: If we scale the transition matrix in Theorem 5.2 by an activeness parameter β, then besides the bounds in (13), E[K_DY] also satisfies E[K_DY] ≤ (K̂ + d̃(1 - p̂))/p̂, where

    p̂ = Σ_{t=1}^{d̃} b_(t) - βδ Σ_{t=1}^{d̃} b_(t) (d̃ - t)(d̃ - t + 1)/2,    (15)

    K̂ = Σ_{t=1}^{d̃} t ( b_(t) - βδ Σ_{i=1}^{t-1} b_(i) (t - i) ),    (16)

for δ = max_{i≠j} P_ij and d̃ defined as in Theorem 5.2. In particular, as β → 0, E[K_DY] converges to the lower bound at rate O(β).

Besides verifying the intuition that, as nodes become less active, it will be easier to infer node locations from past observations and the control performance will improve, the above result also guarantees that the convergence rate is (at least) linear in β, i.e., an x% decrease in node activeness will generate at least an x% decrease in the performance gap relative to a ferry in a stationary network.

VI. SIMULATIONS

We now test the proposed policies via simulations. We use a generalized random walk to model node mobility, where, under peak speed m as defined in Section V-B, a node moves from its current cell i to one of its (m - 1)-hop neighbors, cell j, with transition probability P_ij proportional to e^{-τ ||j - h||} for a tightness parameter τ and a home cell h (||j - h|| is the taxicab distance between cells j and h). The home cell h models a spot that the node drifts toward (for τ > 0) or away from (for τ < 0) with strength |τ|, which can be tuned to control its locality. Moreover, we control its activeness by setting Σ_{j≠i} P_ij = β ∈ (0, 1).
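A sketch of this simulation mobility model, assuming a 1-D domain for brevity: it builds the generalized-random-walk matrix from (τ, h, m), sets the off-diagonal mass of each row to the activeness parameter β as in Section V-D, and checks numerically that the steady-state distribution does not depend on β (Lemma 5.5). All function names are ours.

```python
import numpy as np

def random_walk_matrix(n, m, tau, h, beta):
    """Generalized random walk on a 1-D domain of n cells: from cell i the node
    jumps to a cell j within (m-1) hops with probability proportional to
    exp(-tau * |j - h|); the total off-diagonal mass per row is set to beta."""
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in range(max(0, i - (m - 1)), min(n, i + m)) if j != i]
        w = np.array([np.exp(-tau * abs(j - h)) for j in nbrs])
        P[i, nbrs] = beta * w / w.sum()      # activeness: off-diagonal mass = beta
        P[i, i] = 1.0 - beta                 # remaining mass stays put
    return P

def steady_state(P):
    """Left eigenvector of P for eigenvalue 1, normalized to a distribution."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

n, m, tau, h = 20, 2, 0.5, 10
b_low  = steady_state(random_walk_matrix(n, m, tau, h, beta=0.1))
b_high = steady_state(random_walk_matrix(n, m, tau, h, beta=0.6))
print(np.max(np.abs(b_low - b_high)))        # ~0: activeness does not change b_0 (Lemma 5.5)
```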
Policies: We first compare various dynamic control policies as well as the predetermined switching policy; see Figs. 3 and 4. Besides the myopic policy ("MY"), the switching policy ("SW"), and the optimal policy ("VI"; see Section IV-B), we also implement several state-of-the-art heuristic POMDP policies: Point-Based Value Iteration [11] ("PBVI"), Interpolation-Extrapolation [10] ("IE"), and Fast-Informed Bound [10] ("FIB"). (FIB computes an upper bound on the value function by storing only one value vector that dominates the optimal set of value vectors in every element.) Although suboptimal, the myopic policy closely approximates the optimal and the other sophisticated policies at a much lower complexity. To see the reason, we plot both the discounted reward and the actual performance under different γ in Fig. 5. While the discounted reward is sensitive to γ, the actual performance is not, which explains the good performance of the myopic policy, as γ controls the level of greediness. Similar observations hold in 2-D cases.

Mobility parameters: We then verify the impact of various node mobility parameters. To test the impact of domain size and layout, we plot the mean inter-contact times of the switching and the myopic policies under both 1-D and 2-D mobilities, together with the bounds in Corollary 5.3; see Fig. 6. The switching policy and the lower bound are invariant to the domain layout as long as b_0 is fixed; the myopic policy, however, is sensitive. As predicted by the analysis, it performs better in constrained domains (1-D), improving on the switching policy by an amount that grows linearly with n. As the domain becomes unconstrained (2-D), the performance degrades, but is clearly better than the upper bound, implying that the upper bound is loose in 2-D and dynamic control can still provide a significant performance gain. As the node (peak) speed m increases (not shown), the myopic policy becomes less sensitive to the domain layout, with inter-contact times for 1-D and 2-D both converging to the upper bound with d(n) ≈ n/m, i.e., E[K_MY] ≈ (2m - 1)n/(2m).

Next, we test the impact of locality and activeness by varying the tightness parameter τ and the activeness parameter β; see Figs. 7 and 8. Fig. 7 shows that the ferry's performance is highly sensitive to variations in node locality. Dynamic control ("MY") outperforms predetermined control ("SW") when the node's steady-state distribution is close to uniform (τ ≈ 0), but if τ exceeds a certain point, the distribution becomes so localized that the two policies become identical. Interestingly, instead of smooth transitions, we see sharp jumps at both ends of the optimality region of the switching policy.

Fig. 3. Policy comparisons: performance.
Fig. 4. Policy comparisons: complexity (offline computation time and online response time).
Fig. 5. Impact of the discount factor γ (same settings as in Fig. 3).
Fig. 6. Impact of domain size and layout.
Fig. 7. Impact of locality, controlled by the tightness τ.
Fig. 8. Impact of activeness.

This region contains the optimality region derived from Corollary 5.4 (outside the vertical lines), verifying the sufficient optimality condition. Fig. 8 confirms that the switching policy and the lower bound are independent of the level of activeness, while the myopic policy quickly converges to the lower bound as β decreases, as predicted by Theorem 5.6.

VII. CONCLUSIONS

We studied mobility control of autonomous data ferries in mobile networks. In contrast to existing solutions, the proposed solution is fully dynamic. Our evaluations show promising performance improvement, measured by increases in the contact rate, especially when the performance suffers most. Although the presented results are based on the specific scenario of one gateway per domain and separate global/local control, our approach is also applicable to the more general scenario of multiple gateways per domain and joint control, where we have obtained initial results [1]. Our work reveals more possibilities. While the current solution is designed for statistically stationary mobility, can similar techniques be used to track node speed, in cases where nodes drift in fixed directions but with uncertain speed, so that the ferry can catch the nodes accurately? Moreover, we have focused on contact optimization, but are the solutions amenable to optimizing end-to-end performance metrics such as delay, and how does this change the comparison between policies? Our study provides a sound foundation for tackling these problems.

REFERENCES

[1] T. He, K.-W. Lee, and A. Swami, "Flying in the Dark: Controlling Autonomous Data Ferries with Partial Observations," in ACM MobiHoc, 2010.
[2] T. Brown, B. Argrow, C. Dixon, S. Doshi, R.-G. Thekkekunnel, and D. Henkel, "Ad hoc UAV ground network (AUGNet)," in AIAA Unmanned Unlimited Conference, 2004.
[3] W. Zhao, M. Ammar, and E. Zegura, "Controlling the mobility of multiple data transport ferries in a delay-tolerant network," in IEEE INFOCOM, 2005.
[4] D. Henkel and T. Brown, "On controlled node mobility in delay-tolerant networks of unmanned aerial vehicles," in International Symposium on Advanced Radio Technologies, 2006.
[5] Data Mule: Mobility for Enhanced Performance in Sensor Networks.
[6] W. Zhao, M. Ammar, and E. Zegura, "A message ferrying approach for data delivery in sparse mobile ad hoc networks," in ACM MobiHoc, 2004.
[7] J. Wu, S. Yang, and F. Dai, "Logarithmic store-carry-forward routing in mobile ad hoc networks," IEEE Trans. Parallel and Distributed Systems.
[8] Q. Li and D. Rus, "Sending messages to mobile users in disconnected ad-hoc wireless networks," in ACM MobiCom, 2000.
[9] E. Sondik, "The optimal control of partially observable Markov decision processes," Ph.D. dissertation, Stanford University, 1971.
[10] M. Hauskrecht, "Value-function approximations for partially observable Markov decision processes," Journal of AI Research.
[11] J. Pineau, G. Gordon, and S. Thrun, "Anytime point-based approximations for large POMDPs," Journal of AI Research.
[12] I. MacPhee and B. Jordan, "Optimal search for a moving target," Probability in the Engineering and Informational Sciences, 1995.
[13] E. Sondik, "The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs," Operations Research.
[14] T. He, K.-W. Lee, and A. Swami, "Controlling Autonomous Data Ferries: Proof of Selected Theorems," IBM Tech. Rep. RC498, April.


More information

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:

More information

Multi-channel Opportunistic Access: A Case of Restless Bandits with Multiple Plays

Multi-channel Opportunistic Access: A Case of Restless Bandits with Multiple Plays Multi-channel Opportunistic Access: A Case of Restless Bandits with Multiple Plays Sahand Haji Ali Ahmad, Mingyan Liu Abstract This paper considers the following stochastic control problem that arises

More information

Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks

Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks Continuous-Model Communication Complexity with Application in Distributed Resource Allocation in Wireless Ad hoc Networks Husheng Li 1 and Huaiyu Dai 2 1 Department of Electrical Engineering and Computer

More information

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016 Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the

More information

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences

Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Data Gathering and Personalized Broadcasting in Radio Grids with Interferences Jean-Claude Bermond a,b,, Bi Li b,a,c, Nicolas Nisse b,a, Hervé Rivano d, Min-Li Yu e a Univ. Nice Sophia Antipolis, CNRS,

More information

Worst case analysis for a general class of on-line lot-sizing heuristics

Worst case analysis for a general class of on-line lot-sizing heuristics Worst case analysis for a general class of on-line lot-sizing heuristics Wilco van den Heuvel a, Albert P.M. Wagelmans a a Econometric Institute and Erasmus Research Institute of Management, Erasmus University

More information

On Two Class-Constrained Versions of the Multiple Knapsack Problem

On Two Class-Constrained Versions of the Multiple Knapsack Problem On Two Class-Constrained Versions of the Multiple Knapsack Problem Hadas Shachnai Tami Tamir Department of Computer Science The Technion, Haifa 32000, Israel Abstract We study two variants of the classic

More information

Planning by Probabilistic Inference

Planning by Probabilistic Inference Planning by Probabilistic Inference Hagai Attias Microsoft Research 1 Microsoft Way Redmond, WA 98052 Abstract This paper presents and demonstrates a new approach to the problem of planning under uncertainty.

More information

The Multi-Agent Rendezvous Problem - The Asynchronous Case

The Multi-Agent Rendezvous Problem - The Asynchronous Case 43rd IEEE Conference on Decision and Control December 14-17, 2004 Atlantis, Paradise Island, Bahamas WeB03.3 The Multi-Agent Rendezvous Problem - The Asynchronous Case J. Lin and A.S. Morse Yale University

More information

Sensor Tasking and Control

Sensor Tasking and Control Sensor Tasking and Control Sensing Networking Leonidas Guibas Stanford University Computation CS428 Sensor systems are about sensing, after all... System State Continuous and Discrete Variables The quantities

More information

Reinforcement Learning II

Reinforcement Learning II Reinforcement Learning II Andrea Bonarini Artificial Intelligence and Robotics Lab Department of Electronics and Information Politecnico di Milano E-mail: bonarini@elet.polimi.it URL:http://www.dei.polimi.it/people/bonarini

More information

Online Learning to Optimize Transmission over an Unknown Gilbert-Elliott Channel

Online Learning to Optimize Transmission over an Unknown Gilbert-Elliott Channel Online Learning to Optimize Transmission over an Unknown Gilbert-Elliott Channel Yanting Wu Dept. of Electrical Engineering University of Southern California Email: yantingw@usc.edu Bhaskar Krishnamachari

More information

EE 550: Notes on Markov chains, Travel Times, and Opportunistic Routing

EE 550: Notes on Markov chains, Travel Times, and Opportunistic Routing EE 550: Notes on Markov chains, Travel Times, and Opportunistic Routing Michael J. Neely University of Southern California http://www-bcf.usc.edu/ mjneely 1 Abstract This collection of notes provides a

More information

Motivation for introducing probabilities

Motivation for introducing probabilities for introducing probabilities Reaching the goals is often not sufficient: it is important that the expected costs do not outweigh the benefit of reaching the goals. 1 Objective: maximize benefits - costs.

More information

Chapter 16 Planning Based on Markov Decision Processes

Chapter 16 Planning Based on Markov Decision Processes Lecture slides for Automated Planning: Theory and Practice Chapter 16 Planning Based on Markov Decision Processes Dana S. Nau University of Maryland 12:48 PM February 29, 2012 1 Motivation c a b Until

More information

CS 7180: Behavioral Modeling and Decisionmaking

CS 7180: Behavioral Modeling and Decisionmaking CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and

More information

Session-Based Queueing Systems

Session-Based Queueing Systems Session-Based Queueing Systems Modelling, Simulation, and Approximation Jeroen Horters Supervisor VU: Sandjai Bhulai Executive Summary Companies often offer services that require multiple steps on the

More information

Consensus Algorithms are Input-to-State Stable

Consensus Algorithms are Input-to-State Stable 05 American Control Conference June 8-10, 05. Portland, OR, USA WeC16.3 Consensus Algorithms are Input-to-State Stable Derek B. Kingston Wei Ren Randal W. Beard Department of Electrical and Computer Engineering

More information

On Prediction and Planning in Partially Observable Markov Decision Processes with Large Observation Sets

On Prediction and Planning in Partially Observable Markov Decision Processes with Large Observation Sets On Prediction and Planning in Partially Observable Markov Decision Processes with Large Observation Sets Pablo Samuel Castro pcastr@cs.mcgill.ca McGill University Joint work with: Doina Precup and Prakash

More information

Introduction to Reinforcement Learning. CMPT 882 Mar. 18

Introduction to Reinforcement Learning. CMPT 882 Mar. 18 Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and

More information

Region-Based Dynamic Programming for Partially Observable Markov Decision Processes

Region-Based Dynamic Programming for Partially Observable Markov Decision Processes Region-Based Dynamic Programming for Partially Observable Markov Decision Processes Zhengzhu Feng Department of Computer Science University of Massachusetts Amherst, MA 01003 fengzz@cs.umass.edu Abstract

More information

arxiv:cs/ v1 [cs.ni] 27 Feb 2007

arxiv:cs/ v1 [cs.ni] 27 Feb 2007 Joint Design and Separation Principle for Opportunistic Spectrum Access in the Presence of Sensing Errors Yunxia Chen, Qing Zhao, and Ananthram Swami Abstract arxiv:cs/0702158v1 [cs.ni] 27 Feb 2007 We

More information

AN ABSTRACT OF THE THESIS OF

AN ABSTRACT OF THE THESIS OF AN ABSTRACT OF THE THESIS OF Thai Duong for the degree of Master of Science in Computer Science presented on May 20, 2015. Title: Data Collection in Sensor Networks via the Novel Fast Markov Decision Process

More information

Reinforcement learning

Reinforcement learning Reinforcement learning Based on [Kaelbling et al., 1996, Bertsekas, 2000] Bert Kappen Reinforcement learning Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error

More information

Control Theory : Course Summary

Control Theory : Course Summary Control Theory : Course Summary Author: Joshua Volkmann Abstract There are a wide range of problems which involve making decisions over time in the face of uncertainty. Control theory draws from the fields

More information

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles

More information

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam:

Marks. bonus points. } Assignment 1: Should be out this weekend. } Mid-term: Before the last lecture. } Mid-term deferred exam: Marks } Assignment 1: Should be out this weekend } All are marked, I m trying to tally them and perhaps add bonus points } Mid-term: Before the last lecture } Mid-term deferred exam: } This Saturday, 9am-10.30am,

More information

Optimality of Myopic Sensing in Multi-Channel Opportunistic Access

Optimality of Myopic Sensing in Multi-Channel Opportunistic Access Optimality of Myopic Sensing in Multi-Channel Opportunistic Access Tara Javidi, Bhasar Krishnamachari,QingZhao, Mingyan Liu tara@ece.ucsd.edu, brishna@usc.edu, qzhao@ece.ucdavis.edu, mingyan@eecs.umich.edu

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS

Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Partially Observable Markov Decision Processes (POMDPs) Pieter Abbeel UC Berkeley EECS Many slides adapted from Jur van den Berg Outline POMDPs Separation Principle / Certainty Equivalence Locally Optimal

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Distributed Estimation and Detection for Smart Grid

Distributed Estimation and Detection for Smart Grid Distributed Estimation and Detection for Smart Grid Texas A&M University Joint Wor with: S. Kar (CMU), R. Tandon (Princeton), H. V. Poor (Princeton), and J. M. F. Moura (CMU) 1 Distributed Estimation/Detection

More information

16.4 Multiattribute Utility Functions

16.4 Multiattribute Utility Functions 285 Normalized utilities The scale of utilities reaches from the best possible prize u to the worst possible catastrophe u Normalized utilities use a scale with u = 0 and u = 1 Utilities of intermediate

More information

Physics-Aware Informative Coverage Planning for Autonomous Vehicles

Physics-Aware Informative Coverage Planning for Autonomous Vehicles Physics-Aware Informative Coverage Planning for Autonomous Vehicles Michael J. Kuhlman 1, Student Member, IEEE, Petr Švec2, Member, IEEE, Krishnanand N. Kaipa 2, Member, IEEE, Donald Sofge 3, Member, IEEE,

More information

10 Robotic Exploration and Information Gathering

10 Robotic Exploration and Information Gathering NAVARCH/EECS 568, ROB 530 - Winter 2018 10 Robotic Exploration and Information Gathering Maani Ghaffari April 2, 2018 Robotic Information Gathering: Exploration and Monitoring In information gathering

More information

Internet Monetization

Internet Monetization Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition

More information

EE C128 / ME C134 Feedback Control Systems

EE C128 / ME C134 Feedback Control Systems EE C128 / ME C134 Feedback Control Systems Lecture Additional Material Introduction to Model Predictive Control Maximilian Balandat Department of Electrical Engineering & Computer Science University of

More information

Sequential decision making under uncertainty. Department of Computer Science, Czech Technical University in Prague

Sequential decision making under uncertainty. Department of Computer Science, Czech Technical University in Prague Sequential decision making under uncertainty Jiří Kléma Department of Computer Science, Czech Technical University in Prague https://cw.fel.cvut.cz/wiki/courses/b4b36zui/prednasky pagenda Previous lecture:

More information

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro

CMU Lecture 12: Reinforcement Learning. Teacher: Gianni A. Di Caro CMU 15-781 Lecture 12: Reinforcement Learning Teacher: Gianni A. Di Caro REINFORCEMENT LEARNING Transition Model? State Action Reward model? Agent Goal: Maximize expected sum of future rewards 2 MDP PLANNING

More information

1 Modelling and Simulation

1 Modelling and Simulation 1 Modelling and Simulation 1.1 Introduction This course teaches various aspects of computer-aided modelling for the performance evaluation of computer systems and communication networks. The performance

More information

Factored State Spaces 3/2/178

Factored State Spaces 3/2/178 Factored State Spaces 3/2/178 Converting POMDPs to MDPs In a POMDP: Action + observation updates beliefs Value is a function of beliefs. Instead we can view this as an MDP where: There is a state for every

More information

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas

Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast. Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Network Algorithms and Complexity (NTUA-MPLA) Reliable Broadcast Aris Pagourtzis, Giorgos Panagiotakos, Dimitris Sakavalas Slides are partially based on the joint work of Christos Litsas, Aris Pagourtzis,

More information

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks

More information

Near-optimal policies for broadcasting files with unequal sizes

Near-optimal policies for broadcasting files with unequal sizes Near-optimal policies for broadcasting files with unequal sizes Majid Raissi-Dehkordi and John S. Baras Institute for Systems Research University of Maryland College Park, MD 20742 majid@isr.umd.edu, baras@isr.umd.edu

More information

Learning in Zero-Sum Team Markov Games using Factored Value Functions

Learning in Zero-Sum Team Markov Games using Factored Value Functions Learning in Zero-Sum Team Markov Games using Factored Value Functions Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 27708 mgl@cs.duke.edu Ronald Parr Department of Computer

More information

Policy Gradient Reinforcement Learning for Robotics

Policy Gradient Reinforcement Learning for Robotics Policy Gradient Reinforcement Learning for Robotics Michael C. Koval mkoval@cs.rutgers.edu Michael L. Littman mlittman@cs.rutgers.edu May 9, 211 1 Introduction Learning in an environment with a continuous

More information

Reinforcement Learning and Control

Reinforcement Learning and Control CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make

More information

Persuading a Pessimist

Persuading a Pessimist Persuading a Pessimist Afshin Nikzad PRELIMINARY DRAFT Abstract While in practice most persuasion mechanisms have a simple structure, optimal signals in the Bayesian persuasion framework may not be so.

More information

Efficient Sensor Network Planning Method. Using Approximate Potential Game

Efficient Sensor Network Planning Method. Using Approximate Potential Game Efficient Sensor Network Planning Method 1 Using Approximate Potential Game Su-Jin Lee, Young-Jin Park, and Han-Lim Choi, Member, IEEE arxiv:1707.00796v1 [cs.gt] 4 Jul 2017 Abstract This paper addresses

More information

Introduction. next up previous Next: Problem Description Up: Bayesian Networks Previous: Bayesian Networks

Introduction. next up previous Next: Problem Description Up: Bayesian Networks Previous: Bayesian Networks Next: Problem Description Up: Bayesian Networks Previous: Bayesian Networks Introduction Autonomous agents often find themselves facing a great deal of uncertainty. It decides how to act based on its perceived

More information

An Introduction to Markov Decision Processes. MDP Tutorial - 1

An Introduction to Markov Decision Processes. MDP Tutorial - 1 An Introduction to Markov Decision Processes Bob Givan Purdue University Ron Parr Duke University MDP Tutorial - 1 Outline Markov Decision Processes defined (Bob) Objective functions Policies Finding Optimal

More information