Age of Information: Whittle Index for Scheduling Stochastic Arrivals

Age of Information: Whittle Index for Scheduling Stochastic Arrivals Yu-Pin Hsu Deartment of Communication Engineering National Taiei University yuinhsu@mail.ntu.edu.tw arxiv:80.03422v2 [math.oc] 7 Ar 208 Abstract Age of information is a new concet that characterizes the freshness of information at end devices. This aer studies the age of information from a scheduling ersective. We consider a wireless broadcast network where a base-station udates many users on stochastic information arrivals. Suose that only one user can be udated for each time. In this context, we aim at develoing a transmission scheduling algorithm for minimizing the long-run average age. To develo a low-comlexity transmission scheduling algorithm, we aly the Whittle s framework for restless bandits. We successfully derive the Whittle index in a closed form and establish the indexability. Based on the Whittle index, we roose a scheduling algorithm, while exerimentally showing that it closely aroximates an age-otimal scheduling algorithm. I. INTRODUCTION In recent years, there has been a dramatic roliferation of research on an age of information. The age of information is insired by a variety of network alications requiring timely information to accomlish some tasks. Examles include information udates for smart-hone users, e.g., traffic and transortation, as well as status udates for smart systems, e.g., smart transortation systems and smart grid systems. On the one hand, a smart-hone user needs timely traffic and transortation information for lanning the best route. On the other hand, timely information about vehicles ositions and seeds is needed for lanning a collision-free transortation system. In both cases, snashots of the information are generated by their sources at some eochs and sent to the end devices (e.g., smart-hone users and vehicles) in the form of ackets over wired or wireless networks. Since the information at the end devices is exected to be as timely as ossible, the age of information is therefore roosed to cature the freshness of the information at the end devices; more recisely, it measures the elased time since the generation of the freshest ackets. The goal is to develo networks suorting the agesensitive alications. It is interesting that either throughutotimal design or delay-otimal design might not result in the minimum age []. In this aer, we consider that a base-station (BS) udates many users over a wireless broadcast network, where new information is randomly generated. We assume that the BS can udate at most one user for each transmission oortunity. Under the transmission constraint, a transmission scheduling algorithm manages how the channel resources are allocated for each time, deending on the acket arrivals and the ages of the information. The scheduling design is a critical issue to rovide good network erformance. We hence design and analyze a scheduling algorithm to minimize the long-run average age. The wireless broadcast network is similar to the model in the earlier work [2] of the resent author; however, a lowcomlexity scheduling algorithm is unexlored. To fill this ga, this work investigates an age-otimal scheduling from the ersective of restless bandits [3]. Whittle [4] considers a relaxed restless multi-armed bandit roblem and decoule the roblem into many sub-roblems consisting of a single bandit, while roosing an index olicy and a concet of indexability. The Whittle index olicy is asymtotically otimal under certain conditions [5], and in ractice erforms strikingly well [6]. Note that each user in our roblem can be viewed as a restless bandit; as such, we aly the Whittle s aroach to develo a scheduling algorithm. A. Contributions We transform our roblem into a relaxed restless multiarmed bandit roblem and investigate the roblem from the Whittle s ersective. However, in general, a closed form of the Whittle index might be unavailable. To tackle this issue, we formulate each decouled sub-roblem as a Markov decision rocess (MDP), with the urose of minimizing an average cost. Since our MDP involves an average cost otimization over infinite horizon with a countably infinite state sace, our roblem is challenging to analyze [7]. We rove that an otimal olicy of the MDP is stationary and deterministic; in articular, it is a simle threshold tye. We then derive an otimal threshold by exloiting the threshold tye along with a ost-action age. It turns out that the ost-action age simlifies the calculation of the average cost; as such, we successfully obtain the Whittle index in a closed form and show the indexability. Finally, we roose a Whittle index scheduling algorithm and numerically validate its erformance. B. Related works The age of information has attracted many interests from the research community, e.g., [, 8, 9] and see the survey [0]. Of the most relevant works on scheduling multile users are [ 4]. The works [, 2] consider queues at a BS to store all out-of-date ackets, different from ours. The aer [3] considers a buffer to store the latest information with

2 sources s s 2 s N Λ (t) Λ 2 (t) Λ N (t) base-station users u u 2 u N A (t) A 2 (t) A N (t) age increases linearly with slots. Then, the dynamics of the age of information for user u i is { if Λi (t) = and D(t) = i; X i (t+) = X i (t)+ else, where the age in the next slot is one if the user gets udated on the new information; otherwise, the age increases by one. Since the BS can udate at most one user for each slot, X i (t) for all i, X i (t) X j (t) for all i j, and N i= X i(t) +2+ +N for all t. Fig.. A BS udates N users u,,u N on information from sources s,,s N, resectively. eriodic arrivals; whereas information udates in [4] can be generated at will. Our work contributes to the age of information by develoing a low-comlexity algorithm for scheduling stochastic information arrivals. A. Network model II. SYSTEM OVERVIEW We consider a wireless broadcast network consisting of a base-station (BS) and N wireless users u,,u N in Fig.. Each useru i is interested in a tye of information generated by a source s i, for i =,,N, resectively. All information is sent in the form of ackets by the BS over a noiseless broadcast channel. We consider a discrete-time system with slot t = 0,,. The ackets from the sources (if any) arrive at the BS at the beginning of each slot. The arrivals at the BS for different users are indeendent of each other, and also indeendent and identically distributed (i.i.d.) over slots, governed by a Bernoulli distribution. Precisely, by Λ i (t) we indicate if a acket from source s i arrives at the BS in slot t, where Λ i (t) = if there is a acket, with P[Λ i (t) = ] = i ; otherwise, Λ i (t) = 0. We assume that the BS can send at most one acket during each slot, i.e., the BS can udate at most one user in each slot. Moreover, we focus on the setting that the BS does not buffer a acket if it is not transmitted in the arriving slot. The no-buffer network is motivated by [2]. By D(t) {0,, N} we denote a decision of the BS in slot t, where D(t) = 0 if no one will be udated in slot t and D(t) = i for i =,,N if user u i is scheduled to be udated in slot t. A scheduling algorithm θ = {D(0),D(), } secifies a decision for each slot. Next, we will define an age of information as our design criterion. B. Age of information model The age of information imlies the freshness of the information at the users. We initialize the ages of all arriving ackets at the BS to be zero. The age of information at a user becomes one on receiving a acket, due to one slot of the transmission time. Let X i (t) be the age of information for user u i in slot t before the BS makes a scheduling decision. Suose that the C. Problem formulation We define the average age under a scheduling algorithm θ by [ T ] lim su T T + E N θ X i (t), t=0 i= where E θ reresents the conditional exectation, given that the algorithm θ is emloyed. Note that we focus on the total age, but our work can be easily extended to a weighted sum of the ages. Our goal is to develo a low-comlexity scheduling algorithm whose average age is close to the minimum by leveraging the Whittle s methodology [4]. III. SCHEDULING ALGORITHM DESIGN We will develo a scheduling algorithm based on restless bandits [3] in stochastic control theory. To reach the goal, in this section, we start with casting our roblem as a restless multi-armed bandit roblem [3], followed by introducing the Whittle index [4] as a solution to the multi-armed bandit roblem. A challenge of this aroach is to obtain the Whittle index. We then exlicitly derive the Whittle index in a simler way using a ost-action age. We finally roose a scheduling algorithm based on the Whittle index. A. Restless bandits and Whittle s aroach A restless bandit generalizes a classic bandit by allowing the bandit to kee evolving under a assive action, but in a distinct way from its continuation under an active action. However, the restless bandits roblem, in general, is PSPACEhard [3]. Whittle hence investigates a relaxed version, where a constraint on the number of active bandits for each slot is relaced by the exected number. With this relaxation, Whittle then alies a Lagrangian aroach to decoule the multiarmed bandit roblem into multile sub-roblems. We can regard each user in our roblem as a restless bandit. Following the Whittle s aroach, we can decoule our roblem into N sub-roblems. A sub-roblem consists of a user u i and adheres to the network model in Section II with N =, excet for an additional cost C for udating the user. In each sub-roblem, we aim at determining whether or not the user should be udated for each slot, in order to strike a balance between the udating cost and the cost incurred by age. In fact, the cost C is a scalar Lagrange multilier in the Lagrangian aroach. Since each sub-roblem consists of a single user, hereafter we omit the index i for simlicity.

3 B. Decouled sub-roblem We formulate the sub-roblem as a Markov decision rocess (MDP), with the comonents [5] as follows. States: We define the state s(t) of the MDP in slot t by s(t) = (X(t),Λ(t)). This is an infinite-state MDP as the age is ossibly unbounded. Actions: Let a(t) {0,} be an action of the MDP in slot t indicating the BS s decision, where a(t) = if the BS decides to udate the user and a(t) = 0 if the BS decides to idle. Transition robabilities: The transition robability from state s = (x,λ) to state s under action a(t) = a is: P[s = (x+,) s = (x,λ),a(t) = 0] = ; P[s = (x+,0) s = (x,λ),a(t) = 0] = ; P[s = (,) s = (x,),a(t) = ] = ; P[s = (,0) s = (x,),a(t) = ] = ; P[s = (x+,) s = (x,0),a(t) = ] = ; P[s = (x+,0) s = (x,0),a(t) = ] =. Cost: Let C(s(t),a(t)) be an immediate cost if action a(t) is taken in slot t under states(t), with the definition as follows. ( ) C s(t) = (x,λ),a(t) = a (x+ x a λ)+c a, where the first art x+ x a λ is the resulting age in the next slot and the second art is the incurred cost for udating the user. A olicy µ = {a(0),a(), } of the MDP secifies an action for each slot. A olicy µ is history deendent if a(t) deends on s(0),,s(t) and a(0),a(t ). A olicy is stationary if a(t ) = a(t 2 ) when s(t ) = s(t 2 ) for any t,t 2. Moreover, a randomized olicy chooses an action with a robability, while a deterministic olicy chooses an action with certainty. The average cost under a olicy µ is defined by lim su T T + E µ [ T ] C(s(t),a(t)). t=0 Definition. A olicy µ (that can be history deendent) is cost-otimal if it minimizes the average cost. The objective of the MDP is to find a olicy µ that minimizes the average cost. According to [5], there may not exist a cost-otimal olicy that is stationary or deterministic. Hence, in the next section, we aim at investigating a costotimal olicy. C. Characterizing a cost-otimal olicy We will study structures of a cost-otimal olicy in this section. First, we show that a cost-otimal olicy is stationary and deterministic as follows. Theorem 2. There exists a stationary and deterministic olicy that is cost-otimal, indeendent of initial state s(0). () Proof: Given initial state s(0) = s, we define the exected Fig. 2. - 2 3 4 - - - The age X(t) under the olicy f forms a DTMC. total discounted cost [5] under olicy µ by [ T ] J α (s;µ) = limsue µ α t C(s(t),a(t)) s(0) = s, T t=0 where 0 < α < is a discount factor. Moreover, let J α (s) = min µ J α (s;µ) be the minimum exected total discounted cost. A olicy that minimizes the exected total α-discounted cost is called α-otimal olicy. According to [6], a deterministic stationary olicy is costotimal if the following two conditions hold. ) There exists a deterministic stationary olicy of the MDP such that the associated average cost is finite: Let f be the deterministic stationary olicy of always choosing the action a(t) = for each slot t if there is an arrival. The age X(t) under the olicy f forms a discretetime Markov chain (DTMC) in Fig. 2. The steady-state distribution π = (π,π 2,,) of the DTMC is π i = ( ) i for all i =,2,. Hence, the average age is iπ i = i( ) i =. i= i= On the other hand, the average udating cost is C as the arrival robability is. Hence, the average cost under the olicy f is the average age (i.e., /) lus the average udating cost (i.e., C ), which is finite and yields the result. 2) There exists a non-negative L such that the relative cost function h α (s) J α (s) J α (0) L for all s and α, where 0 is a reference state: Similar to [2], we can show that J α (x,λ) is a non-decreasing function in age x given arrival indicator λ; moreover,j α (x,λ) is a nonincreasing function in λ given x. Then, we can choose L = 0 by choosing the reference state 0 = (0,). By verifying the two conditions, the theorem immediately follows from [6]. Next, we further investigate a cost-otimal olicy by showing that it is a secial tye of deterministic stationary olicy. Definition 3. A threshold-tye olicy is a deterministic stationary olicy of the MDP. The action for state (x,0) is to idle, for all x. Moreover, if the action for state (x,) is to udate, then the action for state (x+,) is to udate as well. In other words, there exists a threshold X {,2, } such that the action is to udate if there is an arrival and the age is greater than or equal to X; otherwise, the action is to idle.

4 X Fig. 3. The ost-action age Y(t) under the threshold-tye olicy forms a DTMC. Theorem 4. If C 0, then there exists a olicy of the threshold tye that is cost-otimal. Proof: It is obvious that an otimal action for state (x,0) is to idle if C 0. To establish the otimality of the threshold structure for state (x, ), we need the discounted cost otimality equation (see the roof of Theorem 2 and [5]): if the first condition in the roof of Theorem 2 holds, then for any state s the minimum exected total discounted cost J α (s) satisfies X - X + J α (s) = min a {0,} C(s,a)+αE[J α(s )], where the exectation is taken over all ossible next state s reachable from state s. Subsequently, we intend to rove that an α-otimal olicy is the threshold tye. Let J α (s;a) = C(s;a)+αE[J α (s )]. Then, J α (s) = min a {0,} J α (s;a). Moreover, an α-otimal action for state s is argmin a {0,} J α (s;a) [5]. Suose an α-otimal action for state (x,) is to udate, i.e., J α (x,;) J α (x,;0) 0. Then, an α-otimal action for state (x+,) is still to udate since J α (x+,;) J α (x+,;0) =(+C +αe[j α (,λ )]) (x+2+αe[j α (x+2,λ )]) (a) (+C +αe[j α (,λ )]) (x++αe[j α (x+,λ )]) =J α (x,;) J α (x,;0) 0, where (a) results from the non-decreasing function of J α (x,λ) in x given λ (see roof of Theorem 2). Hence, an α-otimal olicy is the threshold tye. Finally, we conclude that a cost-otimal olicy is the threshold tye as well since a cost-otimal olicy is the limit oint of α-otimal olicies with α if both conditions in the roof of Theorem 2 hold [2]. To find an otimal threshold for minimizing the average cost, we exlicitly derive the average cost in the next theorem. Theorem 5. Given the threshold-tye olicy with the threshold X {,2, }, then the average cost, denoted by C( X), under the olicy is C( X) = X 2 2 +( 2 ) X + 2 +C X +. (2) Proof: Let Y(t) be the age after an action in slot t; recisely, ifs(t) = (x,λ) anda(t) = a, theny(t) = x+ x a λ. - Note that Y(t), called ost-action age (similar to the ostdecsion state [7]), is different from the re-action age X(t). The ost-action age Y(t) forms a DTMC in Fig. 3. Moreover, we associate each state in the DTMC with a cost. The DTMC incurs the cost of C+ in slot t when the ost-action age in slot t is Y(t) = since the ost-action age Y(t) = imlies that the BS udates the user, while incurring the cost of y in slot t when the ost-action age is Y(t) = y. Then, the steady-state distribution π = (π,π 2, ) of the DTMC is π i = X+ X+ if i =,, X; ( ) i X if i = X +,. Therefore, the average cost of the DTMC is (+C)π + iπ i = i=2 X 2 2 +( 2 ) X + 2 +C X +. We want to elaborate on the ost-action age in the roof. There might be alternatives for obtaining the average cost as follows: As in many literature, e.g., [3], we can find an otimal action for each state by solving the average cost otimality equation [5]. However, we might not arrive at the average cost otimality equation as this is an infinitestate MDP [6]. Even though the average cost otimality equation is established, it is usually hard to solve the otimality equation directly. Given threshold X, then state (X(t),Λ(t)) forms a twodimensional DTMC. It is usually hard to solve the steadystate distribution for a multi-dimensional DTMC. Given threshold X, the re-action age X(t) forms a DTMC as well. However, we cannot associate each state with a cost, since the cost deends on not only state but also action (see Eq. ()). On the contrary, the cost for the ost-action age Y(t) is determined by state only. D. Deriving the Whittle index Now, we are ready to define the Whittle index. Definition 6. We define the Whittle index I(s) by the cost C that makes both actions for state s equally desirable. Theorem 7. The Whittle index of the sub-roblem for state (x,λ) is { 0 if λ = 0; I(x,λ) = x 2 2 x 2 + x if λ =. Proof: It is obvious that the Whittle index for state (x,0) is I(x,0) = 0 as both actions result in the same immediate cost and the same age of next slot if C = 0. Let g(x) = C(x) in Eq. (2) for the domain of {x R : x }. Note that g(x) is strictly convex in the domain. Let x be the minimizer of g(x). Then, an otimal threshold for minimizing the average cost C( X) is either x or x : the otimal threshold is X = x if C( x ) C( x ) and

5 X = x if C( x ) C( x ). If there is a tie, both choices are otimal. Hence, both actions for state (x, ) are equally desirable if and only if the age x satisfies C(x) = C(x+), i.e., x = x and both thresholds of x and x+ are otimal. By solving the above equation, we obtain the cost to make both actions equally desirable, as the Whittle index in the theorem. According to Theorem 7, both actions might have a tie. If there is a tie, we break the tie in favor of idling. Then, we can exlicitly exress the otimal threshold in the next theorem. Theorem 8. The otimal threshold for minimizing the average cost is X = x if the cost satisfies I(x,) C < I(x,), for all x =,2,. Proof: Since I(x) is the cost to make both actions for state (x,) equally desirable and we break a tie in favor of idling, then the otimal threshold is X = x + if the cost is C = I(x), for all x. We claim that the otimal threshold monotonically increases with cost C, and then the theorem follows. To verify the claim, we can focus on an α-otimal olicy according to the roof of Theorem 4. Suose that an α- otimal action, associated with a cost C, for state (x,) is to idle, i.e., x++αe[j α (x+,λ )] +C +αe[j α (,λ )]. Then, an α-otimal action, associated with a cost C 2 C, for state (x,) is to idle as well since x++αe[j α (x+,λ )] +C +αe[j α (,λ )] +C 2 +αe[j α (,λ )]. Then, the monotonicity is established. Next, according to [4], we have to demonstrate the indexability defined as follows. Definition 9. Given cost C, let S(C) be the set of states s such that the otimal action for the states is to idle. The subroblem is indexable if the set S(C) monotonically increases from the emty set to the entire state sace, as C increases from to. Theorem 0. The sub-roblem is indexable. Proof: If C < 0, the otimal action for every state is to udate; as such, S(0) =. If C 0, then S(C) is comosed of the set {s = (x,0) : x =,2, } and a set of (x,) for some x s. According to Theorem 8, the otimal threshold monotonically increases as C increases, and hence the set S(C) monotonically increases to the entire state sace. E. Scheduling algorithm design Now, we are ready to roose a scheduling algorithm based on the Whittle index. For each slot t, the BS observes age X i (t) and arrival indicator Λ i (t) for every user u i ; then, udate a user u i with the highest value of the Whittle index 6 5.5 5 4.5 4 3.5 0.2 0.4 0.6 0.8 Fig. 4. Average age for two users under the roosed Whittle index scheduling algorithm and an age-otimal scheduling algorithm. I(X i (t),λ i (t)). We can think of the index I(X i (t),λ i (t)) as the cost to udate user u i. The intuition of the scheduling algorithm is that the BS intends to send the most valuable acket. In Fig. 4, we comare the roosed algorithm with the age-otimal scheduling algorithm in [2] for two users over 00,000 slots. It turns out that the simle index algorithm almost achieves the minimum average age. IV. CONCLUSION This aer treated a wireless broadcast network, where many users are interested in different tyes of information delivered by a base-station. Under a transmission constraint, we studied a transmission scheduling roblem, with resect to the age of information. We have roosed a low-comlexity scheduling algorithm leveraging the Whittle s methodology. Numerical studies showed that the roosed algorithm almost minimizes the average age. To investigate a regime under which the roosed algorithm is otimal would be an interesting extension. ACKNOWLEDGEMENT The author is grateful to anonymous reviewers for their constructive comments. REFERENCES [] S. Kaul, R. D. Yates, and M. Gruteser, Real-Time Status: How Often Should One Udate? Proc of IEEE INFOCOM,. 273 2735, 202. [2] Y.-P. Hsu, E. Modiano, and L. Duan, Age of Information: Design and Analysis of Otimal Scheduling Algorithms, Proc. of IEEE ISIT,. 56 565, 207. [3] J. Gittins, K. Glazebrook, and R. Weber, Multi-Armed Bandit Allocation Indices. John Wiley & Sons, 20. [4] P. Whittle, Restless Bandits: Activity Allocation in a Changing World, Journal of alied robability,. 287 298, 988. [5] R. R. Weber and G. Weiss, On an index olicy for restless bandits, Journal of Alied Probability, vol. 27, no. 3,. 637 648, 990. [6] M. Larranaga, U. Ayesta, and I. M. Verloo, Stochastic and Fluid Index Policies for Resource Allocation Problems, Proc. of IEEE INFOCOM,. 230 238, 205. [7] D. P. Bertsekas, Dynamic Programming and Otimal Control Vol. I and II. Athena Scientific, 202. [8] M. Costa, M. Codreanu, and A. Ehremides, On the Age of Information in Status Udate Systems with Packet Management, IEEE Trans. Inf. Theory, vol. 62, no. 4,. 897 90, 206.

[9] A. M. Bedewy, Y. Sun, and N. B. Shroff, Age-Otimal Information Udates in Multiho Networks, Proc. of IEEE ISIT,. 576 580, 207. [0] A. Kosta, N. Paas, and V. Angelakis, Age of Information: A New Concet, Metric, and Tool, Foundations and Trends R in Networking, vol. 2, no. 3,. 62 259, 207. [] Q. He, D. Yuan, and A. Ehremides, Otimal Link Scheduling for Age Minimization in Wireless Systems, acceted to IEEE Trans. Inf. Theory, 207. [2] C. Joo and A. Eryilmaz, Wireless Scheduling for Information Freshness and Synchrony: Drift-based Design and Heavy- Traffic Analysis, Proc. of IEEE WIOPT,. 8, 207. [3] I. Kadota, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, Minimizing Age of Information in Broadcast Wireless Networks, Proc. of Allerton, 206. [4] R. D. Yates, P. Ciblat, A. Yener, and M. Wigger, Age-Otimal Constrained Cache Udating, Proc. of IEEE ISIT,. 4 45, 207. [5] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. The MIT Press, 994. [6] L. I. Sennott, Average Cost Otimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs, Oerations Research, vol. 37,. 626 633, 989. [7] W. B. Powell, Aroximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, 20. 6