Bayesian Congestion Control over a Markovian Network Bandwidth Process

Slide 1: Bayesian Congestion Control over a Markovian Network Bandwidth Process
Parisa Mansourifard (USC), joint work with Bhaskar Krishnamachari (USC) and Tara Javidi (UCSD). November 4, 2013.

Slide 2: Outline
- Introduction: problem formulation, main results
- Analysis: some key properties
- Simulation results
- Summary

Slide 3: Motivation
In many network protocols, a device must set its communication parameters to maximize the utilization of a resource whose availability is a stochastic process. One prominent example is congestion control, in which a transmitter must select its transmission rate to utilize the available bandwidth, which varies randomly due to the dynamic traffic load imposed by other users on the network. The goal is to find the optimal policy that maximizes the total reward (utilization minus penalty).

Slide 4: Introduction (continued)
- Assume the available bandwidth varies as a Markovian process; a sender must set its transmission rate at each time step.
- If the sender selects a rate higher than the available bandwidth:
  - it can utilize the whole available bandwidth,
  - but it has to pay an over-utilization penalty,
  - and perfect information about the current available bandwidth is revealed.
- If the sender selects a rate lower than the available bandwidth:
  - it experiences no loss (no penalty),
  - but the available bandwidth is under-utilized,
  - and the sender gets only partial information about the available bandwidth.
- Hence there is a trade-off between getting more information about the available bandwidth and paying less penalty.

Slide 5: Problem Formulation — Assumptions
- The available bandwidth B_t follows a discrete-time, finite-state Markov process.
- The horizon is finite, denoted T, with discrete time steps t = 1, 2, ..., T.
- The transition matrix is known.
- The state of the process is not fully observable, so the problem is a Partially Observable Markov Decision Process (POMDP).

Slide 6: Problem Formulation
- At each time step, the decision-maker selects an action based on the history of observations.
- It earns a reward that is a function of the selected action and the state (belief vector).
- Objective: select the sequential actions that maximize the total expected discounted reward.

Slide 7: POMDP tuple {M, P, B, A, O, U, R}
- State: B_t, an element of the finite state set M = {1, 2, ..., M}.
- State transition: the transition probabilities of the state B_t over time are assumed known and stationary, given by an M × M transition probability matrix P with elements P_{i,j} = Pr(B_{t+1} = j | B_t = i) for i, j ∈ M and all t.
- Belief vector: the probability distribution of the resource state given all past observations, denoted b_t = [b_t(1), ..., b_t(M)] with elements b_t(k) = Pr(B_t = k), k ∈ M.

Slide 8: POMDP tuple {M, P, B, A, O, U, R}
- Action: at each time step, according to the current belief, we choose a rate r_t ∈ A = {1, ..., M}.
- Observed information: defined by the events o_t(r_t) ∈ O as follows:
  - o_t(r_t) = {B_t = i}, i = 1, ..., r_t − 1: the event of fully observing B_t,
  - o_t(r_t) = {B_t ≥ r_t}: the event of partial observation.

Slide 9: Problem Formulation
[Figure: an example of belief updating with M = 6, p_1 ≠ 0, p_{−1} ≠ 0, B_t = 5, r_t = 3 (partial observation, since r_t ≤ B_t).]

Slide 10: Problem Formulation
[Figure: an example of belief updating with M = 6, p_1 ≠ 0, p_{−1} ≠ 0, B_t = 2, r_t = 3 (full observation, since r_t > B_t).]

Slide 11: POMDP tuple {M, P, B, A, O, U, R}
- Reward: the immediate reward earned is a mapping R : A × O → R:
  R(B_t, r_t) = B_t − C (r_t − B_t)   if r_t > B_t,
  R(B_t, r_t) = r_t                   if r_t ≤ B_t.        (1)
- C is the over-utilization penalty coefficient, B_t is the available bandwidth, and r_t is the selected rate.
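
As a concrete illustration of the reward in (1), here is a minimal Python sketch of the immediate reward and its expectation under a belief vector; the function names and the 1-indexed state convention are illustrative choices, not from the slides.

```python
import numpy as np

def reward(B, r, C):
    """Immediate reward R(B_t, r_t): the selected rate if it fits under the
    bandwidth, otherwise the full bandwidth minus an over-utilization penalty."""
    return B - C * (r - B) if r > B else r

def expected_reward(b, r, C):
    """Expected immediate reward R(b; r) under a belief b over states 1..M,
    with b[k-1] = Pr(B_t = k)."""
    b = np.asarray(b, dtype=float)
    states = np.arange(1, len(b) + 1)
    over = states < r   # states in which the rate r over-shoots the bandwidth
    return (np.sum(b[over] * (states[over] - C * (r - states[over])))
            + r * np.sum(b[~over]))
```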

Slide 12: Optimal Policy and Value Function
- A policy π specifies a sequence of functions π_1, ..., π_T, with π_t : B → A and r_t = π_t(b_t).
- Goal: maximize the total expected discounted reward over the finite horizon T, over all admissible policies π:
  max_π J^π_T(b_0) = max_π E^π[ Σ_{t=0}^{T} β^t R(B_t; r_t) | b_0 ],        (2)
  where 0 ≤ β ≤ 1 is the discount factor and b_0 is the initial belief vector.
- The optimal policy π^opt is a policy that maximizes (2); it exists since the number of admissible policies is finite.

Slide 13: Optimal Policy and Value Function
Dynamic programming (DP):
  V_t(b_t) = max_{r_t} V_t(b_t; r_t),
  V_t(b_t; r_t) = R(b_t; r_t) + β E{ V_{t+1}(b_{t+1}) | b_t, r_t },   t < T,
  V_T(b_T; r_T) = R_T(b_T; r_T).
- The value function V_t(b_t) is the maximum remaining expected reward accrued starting from time t when the current belief vector is b_t.
- V_t(b_t; r_t) is the remaining expected reward accrued from time t when action r_t is chosen at time t and the optimal policy is followed for times t + 1, ..., T.
- Optimal policy: r_t^opt(b) = arg max_{r ∈ A} V_t(b; r).
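
The backward induction above can be sketched directly on beliefs, since the beliefs reachable from a given b_0 over a finite horizon form a finite set. The memoized recursion below is a minimal, illustrative sketch (tractable only for very small M and T); it assumes the belief-update rule spelled out on the backup slide (full observation when r_t > B_t, truncation and renormalization otherwise), and none of the names come from the authors' code.

```python
from functools import lru_cache
import numpy as np

def make_value_function(P, C, beta, T):
    """Finite-horizon backward induction on beliefs. States are 1..M and
    P[i-1, j-1] = Pr(B_{t+1} = j | B_t = i). Returns a function b0 -> V_0(b0)."""
    M = P.shape[0]
    states = np.arange(1, M + 1)

    @lru_cache(maxsize=None)
    def V(t, b_key):
        if t > T:
            return 0.0
        b = np.array(b_key)
        best = -np.inf
        for r in range(1, M + 1):
            # immediate expected reward R(b; r)
            over = states < r
            val = (np.sum(b[over] * (states[over] - C * (r - states[over])))
                   + r * np.sum(b[~over]))
            # full observation (r > B_t = i): next belief is row i of P
            for i in range(1, r):
                if b[i - 1] > 0:
                    val += beta * b[i - 1] * V(t + 1, tuple(np.round(P[i - 1], 12)))
            # partial observation (B_t >= r): next belief is (T_r b) P
            p_ge = b[r - 1:].sum()
            if p_ge > 0:
                trunc = np.where(states >= r, b, 0.0) / p_ge
                val += beta * p_ge * V(t + 1, tuple(np.round(trunc @ P, 12)))
            best = max(best, val)
        return best

    return lambda b0: V(0, tuple(np.round(b0, 12)))
```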

Slide 14: Main Results
Assumption 1: The matrix P satisfies the State-Independent State Change (SISC) property.
- SISC property: P_{i,i+k} = P_{j,j+k} ≜ p_k for all i, j ∈ M, i.e. the probability p_k of moving k steps higher is independent of the current state; k < 0 corresponds to moving |k| steps lower.

Slide 15: Main Results
Assumption 2: The matrix P satisfies the SISC property with edge effects.
- Edge effect: since the state set M is bounded on both sides, the transition probabilities are modified at the limits (edges) of the state set, with probability mass that would leave the set folded back onto the edge states.
- For example, for M = 4 with p_1 = p and p_{−1} = 1 − p:
  P = [ 1−p   p    0    0
        1−p   0    p    0
         0   1−p   0    p
         0    0   1−p   p ]
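
A small sketch of building a transition matrix with the SISC property and edge effects, covering both the M = 4 example above and the p_1 = p_{−1} = 0.3, p_0 = 0.4 setting used later in the simulations; the function name and the one-step-only restriction are illustrative simplifications.

```python
import numpy as np

def sisc_matrix(M, p_down, p_up):
    """SISC transition matrix restricted to one-step moves: the state moves one
    step down with probability p_down, one step up with probability p_up, and
    stays put otherwise. Probability mass that would leave {1, ..., M} is folded
    back onto the edge states (the 'edge effect')."""
    assert 0 <= p_down + p_up <= 1
    P = np.zeros((M, M))
    for i in range(M):
        P[i, i] += 1.0 - p_down - p_up
        P[i, max(i - 1, 0)] += p_down    # folds onto state 1 at the lower edge
        P[i, min(i + 1, M - 1)] += p_up  # folds onto state M at the upper edge
    return P

# The M = 4 example above:            sisc_matrix(4, p_down=1 - p, p_up=p)
# The simulation setting later on:    sisc_matrix(10, 0.3, 0.3)
```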

Slide 16: Main Results
Theorem 1 (lower bound):
  r_lb = min{ r ∈ M : Σ_{i=1}^{r} b(i) ≥ 1/(1 + C) }.
- This lower bound equals the myopic action, which at each time step selects the action maximizing the immediate expected reward and ignores its impact on the future reward: r_myopic(b) = arg max_{r ∈ M} R(b; r).
- Percentile threshold structure: the lowest action whose cumulative distribution (the sum of the beliefs up to that action) exceeds a threshold.
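
A short sketch of the percentile-threshold (myopic) action, using the critical ratio 1/(1 + C) as reconstructed above; the helper name is illustrative.

```python
import numpy as np

def myopic_action(b, C):
    """Lower bound / myopic action: the smallest rate r whose cumulative belief
    sum_{i<=r} b(i) reaches 1/(1+C). States are 1..M, with b[k-1] = Pr(B_t = k)."""
    cdf = np.cumsum(b)
    return int(np.searchsorted(cdf, 1.0 / (1.0 + C)) + 1)  # +1: index -> state
```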

Slide 17: Main Results
Theorem 2 (upper bound): Under Assumption 1 or 2,
  r_ub = min{ r ∈ M : f(β) S̄_r + [(1 + C) − f(β) r] S_r − C ≤ 0 },
  where S_r ≜ Σ_{i=r+1}^{r_h} b(i),  S̄_r ≜ Σ_{i=r+1}^{r_h} i b(i),  and f(β) ≜ β (1 − β^{T−t}) / (1 − β).
- Ordering of the bounds: r_l ≤ r_lb = r_myopic ≤ r_opt ≤ r_ub ≤ r_h, where r_l and r_h are the lowest and the highest states with non-zero beliefs.
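
A sketch of searching for the Theorem 2 threshold; it assumes the reconstruction of f(β) given above, with the remaining horizon T − t passed in explicitly, so treat the exact horizon factor as an assumption rather than the authors' formula.

```python
import numpy as np

def upper_bound_action(b, C, beta, horizon_left):
    """Smallest r satisfying f(beta)*Sbar_r + [(1+C) - f(beta)*r]*S_r - C <= 0,
    with S_r = sum_{i>r} b(i) and Sbar_r = sum_{i>r} i*b(i). States are 1..M."""
    b = np.asarray(b, dtype=float)
    M = len(b)
    states = np.arange(1, M + 1)
    f = beta * (1 - beta**horizon_left) / (1 - beta) if beta < 1 else float(horizon_left)
    for r in range(1, M + 1):
        S = b[r:].sum()                    # total belief strictly above r
        Sbar = (states[r:] * b[r:]).sum()  # belief-weighted states strictly above r
        if f * Sbar + ((1 + C) - f * r) * S - C <= 0:
            return r
    return M
```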

Slide 18: Analysis
Lemma 1: The expected immediate reward is a uni-modal function of the action r.
Lemma 2: V_t(b; r) and V_t(b) are convex with respect to the belief vector b: for b = λ b_1 + (1 − λ) b_2 with 0 ≤ λ ≤ 1,
  V_t(b; r) ≤ λ V_t(b_1; r) + (1 − λ) V_t(b_2; r)   for all r ∈ M,
  V_t(b) ≤ λ V_t(b_1) + (1 − λ) V_t(b_2).
Lemma 3: The future expected reward, V^f_t(b; r), is monotonically increasing in the action:
  V^f_t(b; r_1) − V^f_t(b; r_2) ≥ 0   for r_1 ≥ r_2.

Slide 19: Analysis
First-order stochastic dominance (FOSD): Let b_1, b_2 ∈ B be any two belief vectors. b_1 first-order stochastically dominates b_2 (b_1 ≥_s b_2) if
  Σ_{j=r}^{M} b_1(j) ≥ Σ_{j=r}^{M} b_2(j)   for all r ∈ {1, ..., M}.
Lemma 4: Under Assumption 1 or 2, the value function is an FOSD-increasing function of the belief vector, i.e., if b_1 ≥_s b_2, then V(b_1) ≥ V(b_2).
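
A short check of the FOSD relation between two belief vectors, exactly as defined above; purely illustrative.

```python
import numpy as np

def fosd_geq(b1, b2, tol=1e-12):
    """True if b1 first-order stochastically dominates b2, i.e. the tail sums
    sum_{j>=r} b1(j) >= sum_{j>=r} b2(j) hold for every r."""
    tail1 = np.cumsum(np.asarray(b1, dtype=float)[::-1])[::-1]  # tail sums per state
    tail2 = np.cumsum(np.asarray(b2, dtype=float)[::-1])[::-1]
    return bool(np.all(tail1 >= tail2 - tol))
```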

Slide 20: Analysis
- b^α: the version of b shifted up by α steps, i.e. b^α(i) = b(i − α).
Lemma 5: R(b^α; r) = R(b; r − α) + α,  and  r_myopic(b^α) = r_myopic(b) + α.
Lemma 7: Under Assumption 2,
  V_t(b^α) ≤ V_t(b) + α (1 − β^{T−t}) / (1 − β).

Slide 21: Simulation Results — Upper and Lower Bounds on Optimal Actions
[Figure: the gap between the lower and the upper bounds versus β, and versus the variance (at β = 0.8), for C = 5.]
- Except in the figures where their effect is varied, the simulation parameters are fixed as follows: M = 10, C = 5, β = 0.8, and transition probabilities p_1 = p_{−1} = 0.3, p_0 = 0.4.

Slide 22: Simulation Results — Myopic and Upper-Bound Policies
[Figure: the total expected discounted reward of the two sub-optimal policies versus β, for C = 5 and horizon T = 100.]
- We compare two sub-optimal policies: (i) the myopic policy and (ii) the upper-bound (UB) policy.
- These policies pick the myopic and UB actions, respectively, at every time step and update their belief vectors according to those actions (see the rollout sketch below).
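
The rollout sketch below simulates one run of the myopic percentile-threshold policy against a sampled bandwidth path, with belief updates as on the backup slide. The parameters mirror the setting quoted above (M = 10, C = 5, β = 0.8, p_1 = p_{−1} = 0.3, p_0 = 0.4, T = 100), but the loop itself is an illustrative reconstruction, not the authors' simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
M, C, beta, T = 10, 5.0, 0.8, 100
states = np.arange(1, M + 1)

# SISC transition matrix with edge effects: p_1 = p_{-1} = 0.3, p_0 = 0.4
P = np.zeros((M, M))
for i in range(M):
    P[i, i] += 0.4
    P[i, max(i - 1, 0)] += 0.3
    P[i, min(i + 1, M - 1)] += 0.3

b = np.full(M, 1.0 / M)            # initial belief: uniform over 1..M
B = int(rng.integers(1, M + 1))    # initial (hidden) bandwidth state
total = 0.0
for t in range(T):
    r = int(np.searchsorted(np.cumsum(b), 1.0 / (1.0 + C)) + 1)  # myopic action
    total += beta**t * ((B - C * (r - B)) if r > B else r)       # discounted reward
    if r > B:                          # over-shoot: B_t is fully observed
        b = P[B - 1].copy()
    else:                              # partial observation: B_t >= r
        trunc = np.where(states >= r, b, 0.0)
        b = (trunc / trunc.sum()) @ P
    B = int(rng.choice(states, p=P[B - 1]))                      # bandwidth evolves
print("myopic policy, total discounted reward:", total)
```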

Slide 23: Conclusions and Future Work
Summary:
- We formulated a Bayesian congestion control problem in which a source must select its transmission rate (the action) over a network whose available bandwidth (the resource) evolves as a stochastic process.
- We modeled the problem as a POMDP and derived key properties of the myopic and the optimal policies.
- We proved structural results providing bounds on the optimal actions, yielding tractable sub-optimal solutions that simulations show perform well.
Future work:
- We conjecture that there may be even better approximations of the optimal policy with the same percentile threshold structure.
- Finding an upper bound for a general form of the transition matrix.
- Heuristic policies that outperform the two sub-optimal policies.

Slide 24: Thank you for your attention.

Slide 25: Simulation Results — Upper and Lower Bounds on Optimal Actions
[Figure: actions selected by EM (τ = 4) and their corresponding lower and upper bounds, for C = 5, β = 0.8, M = 10, and transition probabilities p_1 = p_{−1} = 0.3, p_0 = 0.4. The stars in the figure indicate the non-zero beliefs.]
- The figure shows a policy sequence in which the selected actions do not exceed B_t.

Slide 26: Simulation Results — Myopic and Upper-Bound Policies
[Figure: the total expected discounted reward of the two sub-optimal policies versus C, for β = 0.8, τ = 4, horizon T = 100, and transition probabilities p_1 = p_{−1} = 0.3, p_0 = 0.4.]

Slide 27: POMDP tuple {M, P, B, A, O, U, R}
- Belief updating: a mapping U : A × O × B → B:
  b_{t+1} = (T_{r_t} b_t) P   if r_t ≤ B_t,
  b_{t+1} = I_{B_t} P         if r_t > B_t,
  where I_{B_t} is the unit (indicator) row vector of the observed state B_t, and T_r is a non-linear operation on a belief vector b:
  T_r b(i) = 0                           if i < r,
  T_r b(i) = b(i) / Σ_{j=r}^{M} b(j)     if i ≥ r.
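
A direct transcription of this update rule into code (the same rule that was inlined in the DP and rollout sketches earlier); the function name is illustrative.

```python
import numpy as np

def update_belief(b, r, B, P):
    """One-step belief update. If the rate over-shoots (r > B), the bandwidth B
    is fully observed and the next belief is the B-th row of P (I_{B_t} P).
    Otherwise the belief is truncated to {B_t >= r}, renormalized (T_r b), and
    pushed through P."""
    if r > B:
        return P[B - 1].copy()
    states = np.arange(1, len(b) + 1)
    trunc = np.where(states >= r, b, 0.0)
    return (trunc / trunc.sum()) @ P
```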

Slide 28: Main Results
Proposition 1 (a looser upper bound): r_ub2 ≥ r_ub, with
  r_ub2 = min{ r ∈ M : Σ_{i=1}^{r} b(i) ≥ (1 + U) / (1 + C + U) },   where U ≜ f(β) (r_h − r_l) and f(β) is as in Theorem 2.
- This is a percentile threshold structure with an extra term U in the numerator and the denominator of the threshold.
- r_ub2 is an increasing function of U.
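
A sketch of the looser percentile threshold of Proposition 1, again under the reconstructed horizon factor f(β); the names and the explicit remaining-horizon argument are assumptions.

```python
import numpy as np

def loose_upper_bound(b, C, beta, horizon_left):
    """r_ub2: the smallest r whose cumulative belief reaches (1+U)/(1+C+U),
    where U = f(beta) * (r_h - r_l) and r_l, r_h bracket the belief's support."""
    support = np.nonzero(b)[0] + 1                 # states with non-zero belief
    r_l, r_h = int(support.min()), int(support.max())
    f = beta * (1 - beta**horizon_left) / (1 - beta) if beta < 1 else float(horizon_left)
    U = f * (r_h - r_l)
    threshold = (1 + U) / (1 + C + U)
    return int(np.searchsorted(np.cumsum(b), threshold) + 1)
```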

Slide 29: Analysis
Lemma 6: Under Assumption 1,
  V_t(b^α; r) − V_t(b; r − α) = α (1 − β^{T−t+1}) / (1 − β),
  r_t^opt(b^α) = r_t^opt(b) + α,
  V_t(b^α) = V_t(b) + α (1 − β^{T−t}) / (1 − β).
- Note: for β = 1, substitute (1 − β^x)/(1 − β) by x.
Lemma 7: Under Assumption 2,
  V_t(b^α) ≤ V_t(b) + α (1 − β^{T−t}) / (1 − β).

Slide 30: Optimal Policy and Value Function
- V_t(b_t; r_t) is the sum of two terms: (i) the immediate expected reward and (ii) the discounted future expected reward.
- There is no efficiently computable exact solution for this POMDP, so we present upper and lower bounds on the optimal actions.
