Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem


1 Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem
Parisa Mansourifard (USC)
Joint work with Bhaskar Krishnamachari (USC) and Tara Javidi (UCSD)
September 4, 2013

2 Introduction: Outline
- Introduction
- Problem formulation
- Main results
- Analysis: some key properties
- Simulation results
- Summary

3 Introduction: Motivation
In many network protocols, a device must set its communication parameters to maximize the utilization of a resource whose availability is a stochastic process. One prominent example is congestion control, in which a transmitter must select the transmission rate to utilize the available bandwidth, which varies randomly with the traffic load imposed by other users of the network.
The goal is to find the optimal policy that maximizes the total reward (utilization minus penalty).

4 Introduction
Assume the available bandwidth varies as a Markovian process, and a sender wants to set its transmission rate at each time step.
If the sender selects a rate higher than the available bandwidth,
- it can utilize the whole available bandwidth,
- but it has to pay an over-utilization penalty;
- perfect information about the current available bandwidth is revealed.
If the sender selects a rate lower than the available bandwidth,
- it does not experience loss (no penalty),
- but the available bandwidth is under-utilized;
- the sender gets only partial information about the available bandwidth.
The result is a trade-off between getting more information about the available bandwidth and paying less penalty.

5 Problem Formulation
Assumptions:
- A discrete-time, finite-state Markov process whose state is denoted by $B_t$, with finite horizon $T$ and discrete time steps $t = 1, 2, \dots, T$.
- A known transition matrix.
- The state of the process is not fully observable, so the problem is a Partially Observable Markov Decision Process (POMDP).

6 Problem Formulation
- At each time step, the decision-maker selects an action based on the history of observations.
- It earns a reward that is a function of the selected action and the state (belief vector).
- Objective: select the sequence of actions that maximizes the total expected discounted reward.

7 Problem Formulation: POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
State: $B_t$ is one of the elements of a finite state set denoted by $\mathcal{M} = \{1, 2, \dots, M\}$.
State transition: the transition probabilities of the states $B_t$ over time are
- assumed to be known and stationary,
- given by an $M \times M$ transition probability matrix $P$,
- with elements $P_{i,j} = \Pr(B_{t+1} = j \mid B_t = i)$, $i, j \in \mathcal{M}$, $\forall t$.
Belief vector: the probability distribution of the resource state,
- given all past observations,
- denoted by a belief vector $b_t = [b_t(1), \dots, b_t(M)]$,
- with elements $b_t(k) = \Pr(B_t = k)$, $k \in \mathcal{M}$.

8 Problem Formulation: POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
Action: at each time step, according to the current belief, we choose an action $r_t \in \mathcal{A} = \{1, \dots, M\}$.
Observed information: defined by the events $o_t(r_t) \in \mathcal{O}$ as follows:
- $o_t(r_t) = \{B_t = i\}$, $i = 1, \dots, r_t - 1$, is the event of fully observing $B_t$.
- $o_t(r_t) = \{B_t \ge r_t\}$ is the event of partial observation.
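The observation model above can be sketched in a few lines. This is only an illustration: the function name `observe` and the tuple encoding of the two events are assumptions made here, not notation from the talk.

```python
def observe(B, r):
    """Return the observation produced by rate r when the true bandwidth is B.

    Hypothetical encoding: ("full", B) stands for the event {B_t = B},
    ("partial", r) stands for the event {B_t >= r}.
    """
    if r > B:
        # Over-utilization: the sender learns the bandwidth exactly.
        return ("full", B)
    # Otherwise the sender only learns that the bandwidth is at least r.
    return ("partial", r)
```

For instance, `observe(2, 3)` encodes full observation of $B_t = 2$, while `observe(5, 3)` encodes the partial observation $\{B_t \ge 3\}$.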

9 Problem Formulation
Figure: An example of belief updating with $M = 6$, $p_{-1} \neq 0$, $p_1 \neq 0$, $B_t = 5$, $r_t = 3$. Here $r_t \le B_t$, so the observation is partial.

10 Problem Formulation
Figure: An example of belief updating with $M = 6$, $p_{-1} \neq 0$, $p_1 \neq 0$, $B_t = 2$, $r_t = 3$. Here $r_t > B_t$, so $B_t$ is fully observed.

11 Problem Formulation: POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
Belief updating: a mapping $U : \mathcal{A} \times \mathcal{O} \times \mathcal{B} \to \mathcal{B}$,
$$b_{t+1} = \begin{cases} T_{r_t} b_t \, P & \text{if } r_t \le B_t, \\ I_{B_t} P & \text{if } r_t > B_t, \end{cases}$$
where $T_r$ is a non-linear operation on a belief vector $b$:
$$T_r b(i) = \begin{cases} 0 & \text{if } i < r, \\ \dfrac{b(i)}{\sum_{j=r}^{M} b(j)} & \text{if } i \ge r. \end{cases}$$
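A minimal sketch of this belief update in plain Python follows. It assumes that $I_{B_t}$ denotes the unit vector with a one in position $B_t$ (the slide does not define it), that beliefs are lists indexed from 0 for states $1, \dots, M$, and that `P` is a list of rows. The helper names `truncate`, `propagate` and `update_belief` are illustrative only.

```python
def truncate(b, r):
    """T_r: zero the states below r and renormalise (states are 1..M)."""
    tail = sum(b[r - 1:])
    if tail == 0:
        raise ValueError("the event {B >= r} has zero probability under b")
    return [0.0 if i + 1 < r else b[i] / tail for i in range(len(b))]


def propagate(b, P):
    """One Markov step: (bP)(j) = sum_i b(i) * P[i][j]."""
    M = len(b)
    return [sum(b[i] * P[i][j] for i in range(M)) for j in range(M)]


def update_belief(b, P, r, B):
    """Next belief after choosing rate r when the true bandwidth is B."""
    if r > B:
        # Full observation: the posterior is the unit vector at B (assumed I_B).
        unit = [1.0 if i + 1 == B else 0.0 for i in range(len(b))]
        return propagate(unit, P)
    # Partial observation {B >= r}: truncate, renormalise, then propagate.
    return propagate(truncate(b, r), P)
```

Note that after a full observation the next belief is simply row $B_t$ of $P$, so the earlier history no longer matters.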

12 Problem Formulation: POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
Reward: the immediate reward earned is a mapping $R : \mathcal{A} \times \mathcal{O} \to \mathbb{R}$:
$$R(B_t, r_t) = \begin{cases} q B_t - C_u (r_t - B_t) & \text{if } r_t > B_t, \\ q r_t - C_l (B_t - r_t) & \text{if } r_t \le B_t, \end{cases} \tag{1}$$
- $C_u$ and $C_l$ are the over-utilization and under-utilization penalty coefficients,
- $q$ is the gain per unit.
Reward function in this talk: coefficients $q = 1$, $C_l = 0$, $C_u = C$, where $B_t$ is the available bandwidth and $r_t$ is the selected rate.
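As a quick illustration, the reward in (1) and its expectation under a belief vector can be written as below. The function names are hypothetical, and the default `C_u=5.0` is borrowed from the simulation settings later in the talk rather than being part of the definition.

```python
def reward(B, r, q=1.0, C_u=5.0, C_l=0.0):
    """Immediate reward for bandwidth B and selected rate r, as in (1)."""
    if r > B:
        # Over-utilization: gain on the whole bandwidth, penalty on the excess.
        return q * B - C_u * (r - B)
    # Under-utilization: gain on the rate, penalty on the unused bandwidth.
    return q * r - C_l * (B - r)


def expected_reward(b, r, **coeffs):
    """Expected immediate reward under belief b (b[i] is Pr(B = i + 1))."""
    return sum(p * reward(i + 1, r, **coeffs) for i, p in enumerate(b))
```

With the talk's coefficients, `reward(5, 3)` is 3 (the rate is fully served) and `reward(2, 3)` is $2 - 5 = -3$ (one unit of over-utilization).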

13 Problem Formulation: Reward function of the newsvendor problem
In the newsvendor problem with perishable products, $B_t$ is the demand and $r_t$ is the selected inventory level. The reward function is
$$R^{\text{Inventory}}(B_t, r_t) = \begin{cases} p B_t - c\, r_t & \text{if } r_t > B_t, \\ (p - c)\, r_t - C_l (B_t - r_t) & \text{if } r_t \le B_t, \end{cases}$$
- $p$: the revenue per unit sold,
- $c$: the cost per unit purchased,
- $C_l$: the customer dissatisfaction cost per unit of unserved demand.

14 Optimal Policy and Value Function
The policy $\pi$ specifies a sequence of functions $\pi_1, \dots, \pi_T$, with $\pi_t : \mathcal{B} \to \mathcal{A}$ and $r_t = \pi_t(b_t)$.
Goal: to maximize the total expected discounted reward over the finite horizon $T$, over all admissible policies $\pi$:
$$\max_{\pi} J_T^{\pi}(b_0) = \max_{\pi} E^{\pi}\left[ \sum_{t=0}^{T} \beta^t R(B_t; r_t) \,\Big|\, b_0 \right], \tag{2}$$
- $0 \le \beta \le 1$ is the discount factor and $b_0$ is the initial belief vector.
- The optimal policy $\pi^{opt}$ is a policy that maximizes (2).
- It exists since the number of admissible policies is finite.

15 Optimal Policy and Value Function: Dynamic programming (DP)
$$V_t(b_t) = \max_{r_t} V_t(b_t; r_t),$$
$$V_t(b_t; r_t) = R(b_t; r_t) + \beta\, E\{V_{t+1}(b_{t+1}) \mid r_t\}, \quad t < T,$$
$$V_T(b_T; r_T) = R(b_T; r_T).$$
- The value function $V_t(b_t)$ is the maximum remaining expected reward accrued from time $t$ onward when the current belief vector is $b_t$.
- $V_t(b_t; r_t)$ is the remaining expected reward accrued from time $t$ onward when action $r_t$ is chosen at time $t$ and the optimal policy is followed at times $t + 1, \dots, T$.
- $V_t(b_t; r_t)$ is the sum of two terms: (i) the immediate expected reward and (ii) the discounted future expected reward.
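For very small instances the recursion can be evaluated by brute force, which helps to see what the expectation over $b_{t+1}$ involves. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses the talk's coefficients ($q = 1$, $C_l = 0$, $C_u = C$), takes the expectation over the observation outcomes (full observation of $B_t = i < r_t$ with probability $b_t(i)$, partial observation with probability $\sum_{i \ge r_t} b_t(i)$), and the name `solve_pomdp` is hypothetical. Its cost grows exponentially with the horizon.

```python
from functools import lru_cache


def solve_pomdp(b0, P, T, C=5.0, beta=0.8):
    """Return (V_1(b0), optimal first action) by exhaustive recursion."""
    M = len(b0)
    rows = [tuple(row) for row in P]  # after fully observing B, next belief is row B of P

    def propagate(b):
        return tuple(sum(b[i] * P[i][j] for i in range(M)) for j in range(M))

    def immediate(b, r):
        # Expected immediate reward with q = 1, C_l = 0, C_u = C.
        return sum(p * (B - C * (r - B) if r > B else r)
                   for B, p in enumerate(b, start=1))

    @lru_cache(maxsize=None)
    def value(t, b):
        best_value, best_action = float("-inf"), None
        for r in range(1, M + 1):
            v = immediate(b, r)
            if t < T:
                # Full observation of B = 1, ..., r - 1.
                for B in range(1, r):
                    if b[B - 1] > 0:
                        v += beta * b[B - 1] * value(t + 1, rows[B - 1])[0]
                # Partial observation {B >= r}.
                tail = sum(b[r - 1:])
                if tail > 0:
                    truncated = tuple(0.0 if k + 1 < r else b[k] / tail
                                      for k in range(M))
                    v += beta * tail * value(t + 1, propagate(truncated))[0]
            if v > best_value:  # ties resolved towards the smallest action
                best_value, best_action = v, r
        return best_value, best_action

    return value(1, tuple(b0))
```

With `T=1` the recursion reduces to maximizing the immediate expected reward, so the returned action is the myopic one.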

16 Optimal Policy and Value Function: Optimal policy
$$r_t^{opt}(b) = \arg\max_{r \in \mathcal{A}} V_t(b; r).$$
There is no efficiently computable solution for the above POMDP problem, so we present upper and lower bounds on the optimal actions.

17 Main Results: Assumption 1
The $P$ matrix satisfies the State-Independent State Change (SISC) property.
SISC property: $P_{i,i+k} = P_{j,j+k} = p_k$. The matrix is described only by the probability $p_k$ of moving $k$ steps higher, independent of the current state; $k < 0$ corresponds to moving $|k|$ steps lower.

18 Main Results: Assumption 2
The $P$ matrix satisfies the SISC property with edge effects.
Edge effect: the transition matrix is affected by the limits (edges) of the state set, since the state set $\mathcal{M}$ is bounded on both sides.
For example, for $M = 4$ with one-step moves only, where a single parameter $p$ fixes $p_{-1}$ and $p_1$ with $p_{-1} + p_1 = 1$:
$$P = \begin{bmatrix} p_{-1} & p_1 & 0 & 0 \\ p_{-1} & 0 & p_1 & 0 \\ 0 & p_{-1} & 0 & p_1 \\ 0 & 0 & p_{-1} & p_1 \end{bmatrix}$$
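One way to build such a matrix is sketched below. The edge rule is an assumption made for illustration: probability mass that would leave the state set is kept at the nearest edge state. The function name `sisc_matrix` and the dictionary format `{k: p_k}` are likewise hypothetical.

```python
def sisc_matrix(M, steps):
    """Build an M x M transition matrix from step probabilities {k: p_k}.

    Assumed edge rule: a move that would leave {1, ..., M} is clamped to
    the nearest edge state, so every row still sums to one.
    """
    if abs(sum(steps.values()) - 1.0) > 1e-9:
        raise ValueError("step probabilities must sum to one")
    P = [[0.0] * M for _ in range(M)]
    for i in range(M):                       # i is the 0-based current state
        for k, p in steps.items():
            j = min(max(i + k, 0), M - 1)    # clamp the target to the state set
            P[i][j] += p
    return P
```

For example, `sisc_matrix(4, {-1: 0.3, 0: 0.4, 1: 0.3})` uses the step probabilities of the simulation section on a four-state chain; only the first and last rows differ from the pure SISC pattern.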

19 Main Results: Theorem 1 (Lower bound)
$$r^{lb} = \min\left\{ r \in \mathcal{M} : \sum_{i=1}^{r} b(i) \ge \frac{1}{1 + C} \right\}.$$
This lower bound is equal to the myopic action, which at each time step selects the action maximizing the immediate expected reward and ignores its impact on the future reward:
$$r^{myopic}(b) = \arg\max_{r \in \mathcal{M}} R(b; r).$$
It has a percentile threshold structure: it is the lowest action at which the cumulative distribution (the sum of the beliefs up to the action) reaches a threshold.
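The percentile rule is easy to check numerically against a direct maximization of the expected immediate reward. The sketch below assumes the threshold $1/(1 + C)$, which follows from the talk's reward ($q = 1$, $C_l = 0$, $C_u = C$) because raising the rate from $r$ to $r + 1$ changes the expected reward by $1 - (1 + C)\sum_{i \le r} b(i)$. Function names are illustrative.

```python
def myopic_action(b, C):
    """Smallest r whose cumulative belief reaches 1 / (1 + C)."""
    threshold = 1.0 / (1.0 + C)
    cdf = 0.0
    for r, p in enumerate(b, start=1):
        cdf += p
        if cdf >= threshold:
            return r
    return len(b)  # guard against rounding when the threshold equals one


def brute_force_myopic(b, C):
    """Argmax of the expected immediate reward (smallest maximizer on ties)."""
    def expected_reward(r):
        return sum(p * (B - C * (r - B) if r > B else r)
                   for B, p in enumerate(b, start=1))
    return max(range(1, len(b) + 1), key=expected_reward)
```

A larger penalty $C$ lowers the threshold and therefore pushes the myopic rate towards the low end of the belief support.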

20 Main Results: Theorem 2 (Upper bound)
Under Assumption 1 or 2,
$$r^{ub} = \min\left\{ r \in \mathcal{M} : f(\beta)\, S_r + \left[(1 + C) - f(\beta)\, r\right] s_r - C \le 0 \right\},$$
$$s_r \triangleq \sum_{i=r+1}^{r_h} b(i), \qquad S_r \triangleq \sum_{i=r+1}^{r_h} i\, b(i), \qquad f(\beta) \triangleq \beta\, \frac{1 - \beta^{T-t}}{1 - \beta}.$$
The bounds are ordered as
$$r_l \le r^{lb} = r^{myopic} \le r^{opt} \le r^{ub} \le r_h,$$
where $r_l$ is the lowest and $r_h$ the highest state with non-zero belief.
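A hedged sketch of how this upper bound could be evaluated is given below. It assumes the condition reads "$\le 0$" with one sum over the beliefs above $r$ and one over the belief-weighted states above $r$, which is the reading under which the rule reduces to the myopic threshold when $f(\beta) = 0$. The value of $f(\beta)$ is passed in as a plain number `f` so that no particular exponent is assumed; the function name is hypothetical.

```python
def upper_bound_action(b, C, f):
    """Smallest r with f*S_r + [(1 + C) - f*r]*s_r - C <= 0 (assumed form).

    b[i] is the belief of state i + 1; f is the value of f(beta).
    """
    M = len(b)
    for r in range(1, M + 1):
        s = sum(b[r:])                                  # sum of b(i) for i > r
        S = sum((i + 1) * b[i] for i in range(r, M))    # sum of i * b(i) for i > r
        if f * S + ((1 + C) - f * r) * s - C <= 0:
            return r
    return M
```

With `f = 0` the condition becomes $\sum_{i > r} b(i) \le C / (1 + C)$, i.e. the myopic rule, and a larger `f` can only move the returned action upward.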

21 Main Results: Proposition 1
A looser upper bound, $r^{ub2} \ge r^{ub}$, is
$$r^{ub2} = \min\left\{ r \in \mathcal{M} : \sum_{i=1}^{r} b(i) \ge \frac{1 + U}{1 + C + U} \right\}, \quad \text{where } U = \beta\, \frac{1 - \beta^{T-t}}{1 - \beta}\, (r_h - r_l).$$
- It has a percentile threshold structure with an extra term $U$ in the numerator and denominator of the threshold.
- $r^{ub2}$ is an increasing function of $U$.
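The looser bound is again a percentile rule, so it can be sketched in the same way as the myopic action. In the sketch below, `f` stands for the discount-dependent factor multiplying $(r_h - r_l)$ in $U$ and is passed in as a number; the function name is illustrative.

```python
def looser_upper_bound(b, C, f):
    """Smallest r whose cumulative belief reaches (1 + U) / (1 + C + U)."""
    support = [i + 1 for i, p in enumerate(b) if p > 0]
    U = f * (support[-1] - support[0])      # U = f * (r_h - r_l)
    threshold = (1.0 + U) / (1.0 + C + U)
    cdf = 0.0
    for r, p in enumerate(b, start=1):
        cdf += p
        if cdf >= threshold:
            return r
    return len(b)  # guard against rounding
```

Setting `f = 0` gives $U = 0$ and recovers the myopic threshold $1/(1 + C)$; as `f` grows the threshold approaches one and the bound approaches $r_h$.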

22 Analysis
Lemma 1: The expected immediate reward is a unimodal function of the action $r$.
Lemma 2: $V_t(b; r)$ and $V_t(b)$ are convex with respect to the belief vector $b$. For $b = \lambda b_1 + (1 - \lambda) b_2$ and $0 \le \lambda \le 1$,
$$V_t(b; r) \le \lambda V_t(b_1; r) + (1 - \lambda) V_t(b_2; r), \quad \forall r \in \mathcal{M},$$
$$V_t(b) \le \lambda V_t(b_1) + (1 - \lambda) V_t(b_2).$$
Lemma 3: The future expected reward, $V_t^f(b; r)$, is monotonically increasing in the action:
$$V_t^f(b; r_1) - V_t^f(b; r_2) \ge 0, \quad \forall r_1 \ge r_2.$$

23 Analysis: First-Order Stochastic Dominance
Let $b_1, b_2 \in \mathcal{B}$ be any two belief vectors. $b_1$ first-order stochastically dominates $b_2$ (or $b_1$ is FOSD greater than $b_2$), denoted $b_1 \ge_s b_2$, if
$$\sum_{j=r}^{M} b_1(j) \ge \sum_{j=r}^{M} b_2(j), \quad \forall r \in \{1, \dots, M\}.$$
Lemma 4: Under Assumption 1 or 2, the value function is a FOSD-increasing function of the belief vector, i.e., if $b_1 \ge_s b_2$, then $V(b_1) \ge V(b_2)$.
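The dominance condition compares the upper tails of the two beliefs, which can be checked in a single pass from the highest state down. The function name and the tolerance argument below are assumptions for illustration.

```python
def fosd_geq(b1, b2, tol=1e-12):
    """True if b1 first-order stochastically dominates b2.

    Both beliefs are lists over the same states 1..M; every upper tail
    sum of b1 must be at least the matching tail sum of b2.
    """
    tail1 = tail2 = 0.0
    for p1, p2 in zip(reversed(b1), reversed(b2)):
        tail1 += p1
        tail2 += p2
        if tail1 < tail2 - tol:
            return False
    return True
```

For example, `[0.1, 0.2, 0.7]` dominates `[0.3, 0.3, 0.4]` because it places more mass on high bandwidth states at every cut-off.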

24 Analysis
$b^{\alpha}$: the version of $b$ shifted up by $\alpha$ steps, i.e. $b^{\alpha}(i + \alpha) = b(i)$.
Lemma 5:
$$R(b^{\alpha}; r) = R(b; r - \alpha) + \alpha, \qquad r^{myopic}(b^{\alpha}) = r^{myopic}(b) + \alpha.$$
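Lemma 5 can be checked numerically on a small example. The sketch below reads the shift as moving all belief mass $\alpha$ states higher, which is the reading under which the identity holds for the talk's reward ($q = 1$, $C_l = 0$, $C_u = C$). Beliefs are stored as dictionaries from state to probability so that the shifted support needs no padding; all names are hypothetical.

```python
def expected_reward(belief, r, C):
    """Expected immediate reward; belief maps state B to its probability."""
    return sum(p * (B - C * (r - B) if r > B else r)
               for B, p in belief.items())


def shift(belief, alpha):
    """Move all probability mass alpha states higher."""
    return {B + alpha: p for B, p in belief.items()}


def myopic(belief, C):
    """Argmax of the expected reward over the support range [r_l, r_h]."""
    candidates = range(min(belief), max(belief) + 1)
    return max(candidates, key=lambda r: expected_reward(belief, r, C))
```

Shifting the belief by $\alpha$ adds exactly $\alpha$ to the reward of the correspondingly shifted rate, so the whole reward curve, and hence its maximizer, moves by $\alpha$.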

25 Analysis
Lemma 6: Under Assumption 1,
$$V_t(b^{\alpha}; r) - V_t(b; r - \alpha) = \alpha\, \frac{1 - \beta^{T-t+1}}{1 - \beta},$$
$$r_t^{opt}(b^{\alpha}) = r_t^{opt}(b) + \alpha,$$
$$V_t(b^{\alpha}) = V_t(b) + \alpha\, \frac{1 - \beta^{T-t+1}}{1 - \beta}.$$
Note that for $\beta = 1$, $\frac{1 - \beta^{x}}{1 - \beta}$ must be replaced by $x$.
Lemma 7: Under Assumption 2,
$$V_t(b^{\alpha}) \le V_t(b) + \alpha\, \frac{1 - \beta^{T-t+1}}{1 - \beta}.$$

26 Simulation Results

27 Simulation Results: Upper and Lower Bound on Optimal Actions
Figure: Selected actions by EM ($\tau = 4$) and their corresponding lower and upper bounds, for $C = 5$, $\beta = 0.8$, $M = 10$, and transition probabilities $p_{-1} = p_1 = 0.3$, $p_0 = 0.4$.
Note that the stars in the figure indicate the non-zero beliefs. The figure shows the policy sequence in the case where the selected actions do not exceed $B_t$.

28 Simulation Results: Upper and Lower Bound on Optimal Actions
Figure: The gap between the lower and the upper bounds versus $\beta$ and versus the variance ($\beta = 0.8$), for $C = 5$.
Except in the figures where their effect is studied, the simulation parameters are fixed as follows: $M = 10$, $C = 5$, $\beta = 0.8$, and transition probabilities $p_{-1} = p_1 = 0.3$, $p_0 = 0.4$.

29 Simulation Results: Myopic and Upper-Bound policies
Figure: The total expected discounted reward for two sub-optimal policies versus $\beta$, for $C = 5$ and horizon $T = 100$.
We now compare two sub-optimal policies: (i) the myopic policy and (ii) the upper-bound (UB) policy. These policies pick the myopic and UB actions, respectively, at all time steps and update their belief vectors according to these actions.
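A single sample path of the myopic policy can be simulated as sketched below. This is not the authors' simulation code: it assumes the talk's reward ($q = 1$, $C_l = 0$, $C_u = C$), the percentile threshold $1/(1 + C)$ for the myopic action, a unit-vector posterior after a full observation, and an initial state drawn from `b0`. The function name and the `seed` argument are hypothetical. The UB policy would differ only in the rule used to pick `r`.

```python
import random


def simulate_myopic(P, b0, T, C=5.0, beta=0.8, seed=0):
    """Discounted reward of the myopic policy along one random sample path."""
    rng = random.Random(seed)
    M = len(b0)
    states = list(range(1, M + 1))
    B = rng.choices(states, weights=b0)[0]   # hidden initial bandwidth
    b, total = list(b0), 0.0
    for t in range(T):
        # Myopic action: smallest r whose cumulative belief reaches 1/(1+C).
        cdf, r = 0.0, M
        for k, p in enumerate(b, start=1):
            cdf += p
            if cdf >= 1.0 / (1.0 + C):
                r = k
                break
        total += beta ** t * (B - C * (r - B) if r > B else r)
        # Posterior given the observation, followed by one Markov step.
        if r > B:
            post = [1.0 if k == B else 0.0 for k in states]
        else:
            tail = sum(b[r - 1:])
            post = [0.0 if k < r else b[k - 1] / tail for k in states]
        b = [sum(post[i] * P[i][j] for i in range(M)) for j in range(M)]
        B = rng.choices(states, weights=P[B - 1])[0]
    return total
```

Averaging the returned value over many seeds gives a Monte Carlo estimate of the total expected discounted reward of the policy.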

30 Simulation Results: Myopic and Upper-Bound policies
Figure: The total expected discounted reward for two sub-optimal policies versus $C$, for $\beta = 0.8$, $\tau = 4$, horizon $T = 100$, and transition probabilities $p_{-1} = p_1 = 0.3$, $p_0 = 0.4$.

31 Related Works
- Existing techniques, such as the Transmission Control Protocol (TCP), adopt an Additive Increase Multiplicative Decrease (AIMD) algorithm that adjusts the congestion window based on the transmission acknowledgments.
- A. Bensoussan et al., "A multiperiod newsvendor problem with partially observed demand," 2007, consider a POMDP problem where the demand is a Markovian process. They consider the setting where the resources and actions are both continuous, as well as the case where the resources are discrete but the actions remain continuous. For these settings, they also show that the optimal actions exceed the myopic actions.
- P. Mansourifard, B. Krishnamachari, T. Javidi, "Bayesian Congestion Control over a Markovian Network Bandwidth Process," invited paper, Asilomar 2013.

32 Conclusions and Future Work: Summary
- We formulated a Bayesian congestion control problem in which a source must select the transmission rate (the action) over a network whose available bandwidth (the resource) evolves as a stochastic process.
- We modeled the problem as a POMDP and derived some key properties of the myopic and the optimal policies.
- We proved structural results providing bounds on the optimal actions, yielding tractable sub-optimal solutions that perform well in simulations.
- We conjecture that there may be an even better approximation of the optimal policy with a similar percentile threshold structure.

33 Thank you for your attention
