Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem
1 Bayesian Congestion Control over a Markovian Network Bandwidth Process: A Multiperiod Newsvendor Problem
Parisa Mansourifard (USC)
Joint work with Bhaskar Krishnamachari (USC) and Tara Javidi (UCSD)
September 4, 2013
2 Introduction: Outline
- Introduction
- Problem formulation
- Main results
- Analysis: some key properties
- Simulation results
- Summary
3 Introduction: Motivation
- In many network protocols, a device must set its communication parameters to maximize the utilization of a resource whose availability is a stochastic process.
- One prominent example is congestion control, in which a transmitter must select its transmission rate to utilize the available bandwidth, which varies randomly with the traffic load imposed by other users on the network.
- The goal is to find the optimal policy that maximizes the total reward (utilization minus penalty).
4 Introduction
Assume the available bandwidth varies as a Markovian process. A sender wants to set its transmission rate at each time step.
If the sender selects a rate higher than the available bandwidth,
- it can utilize the whole available bandwidth,
- but it has to pay an over-utilization penalty,
- and perfect information about the current available bandwidth is revealed.
If the sender selects a rate lower than the available bandwidth,
- it does not experience loss (no penalty),
- but the available bandwidth is under-utilized,
- and the sender gets only partial information about the available bandwidth.
There is therefore a trade-off between getting more information about the available bandwidth and paying less penalty.
5 Problem Formulation
Assumptions:
- The available bandwidth is a discrete-time, finite-state Markov process whose state is denoted by $B_t$; the finite horizon is denoted by $T$ and the discrete time steps by $t = 1, 2, \ldots, T$.
- The transition matrix is known.
- The state of the process is not fully observable, so the problem is a Partially Observable Markov Decision Process (POMDP).
6 Problem Formulation
- At each time step, the decision-maker selects an action based on the history of observations.
- It earns a reward that is a function of the selected action and the state (belief vector).
- Objective: select the sequence of actions that maximizes the total expected discounted reward.
7 Problem Formulation
POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
State: $B_t$ is one of the elements of a finite state set denoted by $\mathcal{M} = \{1, 2, \ldots, M\}$.
State transition: the transition probabilities of the states $B_t$ over time are
- assumed to be known and stationary,
- given by an $M \times M$ transition probability matrix $P$,
- with elements $P_{i,j} = \Pr(B_{t+1} = j \mid B_t = i)$, $i, j \in \mathcal{M}$, $\forall t$.
Belief vector: the probability distribution of the resource state,
- given all past observations,
- denoted by a belief vector $b_t = [b_t(1), \ldots, b_t(M)]$,
- with elements $b_t(k) = \Pr(B_t = k)$, $k \in \mathcal{M}$.
8 Problem Formulation
POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
Action: at each time step, according to the current belief, we choose an action $r_t \in \mathcal{A} = \{1, \ldots, M\}$.
Observed information: defined by the events $o_t(r_t) \in \mathcal{O}$ as follows:
- $o_t(r_t) = \{B_t = i\}$, $i = 1, \ldots, r_t - 1$, is the event of fully observing $B_t$.
- $o_t(r_t) = \{B_t \ge r_t\}$ is the event of partial observation.
9 Problem Formulation
Figure: An example of belief updating with $M = 6$, $p_{-1} \neq 0$, $p_1 \neq 0$, $B_t = 5$, $r_t = 3$. Here $r_t \le B_t$, so the sender only learns that $B_t \ge r_t$.
10 Problem Formulation
Figure: An example of belief updating with $M = 6$, $p_{-1} \neq 0$, $p_1 \neq 0$, $B_t = 2$, $r_t = 3$. Here $r_t > B_t$, so $B_t$ is fully observed.
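11 Problem Formulation
POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
Belief updating: a mapping $U : \mathcal{A} \times \mathcal{O} \times \mathcal{B} \to \mathcal{B}$,
$$b_{t+1} = \begin{cases} T_{r_t} b_t P & \text{if } r_t \le B_t, \\ I_{B_t} P & \text{if } r_t > B_t, \end{cases}$$
where $I_{B_t}$ is presumably the unit row vector with a 1 in position $B_t$, and $T_r$ is a non-linear operation on a belief vector $b$:
$$T_r b(i) = \begin{cases} 0 & \text{if } i < r, \\ \dfrac{b(i)}{\sum_{j=r}^{M} b(j)} & \text{if } i \ge r. \end{cases}$$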
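The belief update above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: beliefs are plain lists in which index `k - 1` holds the probability of state `k`, `P` is a list of rows, and the helper names `truncate`, `propagate`, and `update_belief` are hypothetical, not taken from the talk.

```python
def truncate(b, r):
    """T_r: zero out states below r and renormalize the rest (states are 1-indexed)."""
    tail = sum(b[r - 1:])
    if tail == 0:
        raise ValueError("the observation B_t >= r has zero probability under b")
    return [0.0 if i < r else b[i - 1] / tail for i in range(1, len(b) + 1)]


def propagate(b, P):
    """Row vector times matrix: one Markov transition applied to belief b."""
    M = len(b)
    return [sum(b[i] * P[i][j] for i in range(M)) for j in range(M)]


def update_belief(b, r, B, P):
    """Next belief after choosing rate r when the true bandwidth is B."""
    if r <= B:
        # Partial observation: the sender only learns that B_t >= r.
        return propagate(truncate(b, r), P)
    # Full observation: B_t is revealed, so the posterior is a unit vector
    # (assumed meaning of I_{B_t}).
    unit = [1.0 if i == B else 0.0 for i in range(1, len(b) + 1)]
    return propagate(unit, P)
```

With an identity transition matrix, `update_belief([0.2, 0.3, 0.5], 2, 3, P)` keeps only states 2 and 3 and renormalizes them, while choosing `r = 3` when `B = 2` collapses the belief onto state 2.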
12 Problem Formulation
POMDP tuple $\{\mathcal{M}, P, \mathcal{B}, \mathcal{A}, \mathcal{O}, U, R\}$
Reward: the immediate reward earned is a mapping $R : \mathcal{A} \times \mathcal{O} \to \mathbb{R}$:
$$R(B_t, r_t) = \begin{cases} q B_t - C_u (r_t - B_t) & \text{if } r_t > B_t, \\ q r_t - C_l (B_t - r_t) & \text{if } r_t \le B_t, \end{cases} \qquad (1)$$
- $C_u$: over-utilization and $C_l$: under-utilization penalty coefficients,
- $q$: the gain per unit.
Reward function in this talk:
- coefficients $q = 1$, $C_l = 0$, $C_u = C$,
- $B_t$: the available bandwidth and $r_t$: the selected rate.
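As a small illustration of reward function (1), the sketch below evaluates it for a single state and in expectation over a belief. The function names and the default `C_u = 5.0` (the value of $C$ used in the simulation slides) are choices made here for illustration only.

```python
def reward(B, r, q=1.0, C_u=5.0, C_l=0.0):
    """Immediate reward (1) for true bandwidth B and selected rate r."""
    if r > B:
        # Over-utilization: all of B is used, but the excess r - B is penalized.
        return q * B - C_u * (r - B)
    # Under-utilization: only r is used; the unused B - r may be penalized.
    return q * r - C_l * (B - r)


def expected_reward(b, r, **coeffs):
    """Expected immediate reward under belief b (index k - 1 holds state k)."""
    return sum(p * reward(i, r, **coeffs) for i, p in enumerate(b, start=1))
```

For example, with the defaults, `reward(5, 3)` is 3 (the rate is fully served) and `reward(2, 3)` is 2 - 5 = -3 (one unit of over-utilization).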
13 Problem Formulation
Reward function of the newsvendor problem
For the newsvendor problem with perishable products, with $B_t$ the demand and $r_t$ the selected inventory level, the reward function is
$$R^{Inventory}(B_t, r_t) = \begin{cases} p B_t - c\, r_t & \text{if } r_t > B_t, \\ (p - c)\, r_t - C_l (B_t - r_t) & \text{if } r_t \le B_t, \end{cases}$$
- $p$: the benefit per unit sold,
- $c$: the cost per unit purchased,
- $C_l$: the customer dissatisfaction per unit of unserved demand.
14 Optimal Policy and Value Function
The policy $\pi$ specifies a sequence of functions $\pi_1, \ldots, \pi_T$, with $\pi_t : \mathcal{B} \to \mathcal{A}$ and $r_t = \pi_t(b_t)$.
Goal: maximize the total expected discounted reward over the finite horizon $T$, over all admissible policies $\pi$:
$$\max_\pi J_T^\pi(b_0) = \max_\pi E^\pi\left[ \sum_{t=0}^{T} \beta^t R(B_t; r_t) \,\Big|\, b_0 \right], \qquad (2)$$
- $0 \le \beta \le 1$: the discount factor, and $b_0$: the initial belief vector.
- The optimal policy $\pi^{opt}$ is a policy that maximizes (2).
- It exists since the number of admissible policies is finite.
15 Optimal Policy and Value Function
Dynamic programming (DP):
$$V_t(b_t) = \max_{r_t} V_t(b_t; r_t),$$
$$V_t(b_t; r_t) = R(b_t; r_t) + \beta\, E\{V_{t+1}(b_{t+1}) \mid r_t\}, \quad t < T,$$
$$V_T(b_T; r_T) = R(b_T; r_T).$$
- The value function $V_t(b_t)$ is the maximum remaining expected reward accrued from time $t$ onward when the current belief vector is $b_t$.
- $V_t(b_t; r_t)$ is the remaining expected reward accrued from time $t$ onward when action $r_t$ is chosen at time $t$ and the optimal policy is followed at times $t + 1, \ldots, T$.
- $V_t(b_t; r_t)$ is the sum of two terms: (i) the immediate expected reward $R(b_t; r_t) = \sum_{i} b_t(i) R(i, r_t)$ and (ii) the discounted future expected reward.
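For very small instances the DP recursion can be evaluated by brute force, which makes the two terms concrete. The sketch below is not the authors' method: it simply enumerates every action and every possible observation, so its cost grows exponentially with the remaining horizon. It assumes the reward of this talk ($q = 1$, $C_l = 0$, $C_u = C$); the function names are hypothetical.

```python
def reward(B, r, C=5.0):
    """Reward of this talk: q = 1, C_l = 0, C_u = C."""
    return B - C * (r - B) if r > B else r


def step(b, P):
    """One Markov transition applied to belief b (row vector times matrix)."""
    M = len(b)
    return [sum(b[i] * P[i][j] for i in range(M)) for j in range(M)]


def value(b, t, T, P, beta=0.8, C=5.0):
    """Return (V_t(b), a maximizing action) by exhaustive recursion."""
    M = len(b)
    best_v, best_r = float("-inf"), None
    for r in range(1, M + 1):
        # (i) immediate expected reward
        v = sum(b[i - 1] * reward(i, r, C) for i in range(1, M + 1))
        if t < T:
            # (ii) discounted future reward, averaged over observations
            for i in range(1, r):  # full observation B_t = i < r
                if b[i - 1] > 0:
                    unit = [1.0 if k == i else 0.0 for k in range(1, M + 1)]
                    v += beta * b[i - 1] * value(step(unit, P), t + 1, T, P, beta, C)[0]
            tail = sum(b[r - 1:])  # partial observation B_t >= r
            if tail > 0:
                trunc = [0.0 if k < r else b[k - 1] / tail for k in range(1, M + 1)]
                v += beta * tail * value(step(trunc, P), t + 1, T, P, beta, C)[0]
        if v > best_v:  # strict comparison keeps the lowest maximizer
            best_v, best_r = v, r
    return best_v, best_r
```

As a sanity check, with an identity transition matrix on four states, a belief concentrated on state 3, three decision epochs, and $\beta = 0.5$, the recursion returns the value 3(1 + 0.5 + 0.25) = 5.25 with action 3.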
16 Optimal Policy and Value Function
Optimal policy:
$$r_t^{opt}(b) = \arg\max_{r \in \mathcal{A}} V_t(b; r).$$
- No efficiently computable solution is known for the above POMDP problem.
- We present upper and lower bounds on the optimal actions.
17 Main Results
Assumption 1: The matrix $P$ satisfies the State-Independent State Change (SISC) property.
SISC property: $P_{i,i+k} = P_{j,j+k} = p_k$ for all states $i, j$. The transition probabilities depend only on the step size: $p_k$ is the probability of moving $k$ steps higher, independent of the current state, and $k < 0$ corresponds to moving $|k|$ steps lower.
18 Main Results
Assumption 2: The matrix $P$ satisfies the SISC property with edge effects.
Edge effect: since the state set $\mathcal{M}$ is bounded on both sides, the transition matrix is affected by the limits (edges) of the state set. For example, for $M = 4$ and a two-point step distribution in which one of $p_{-1}$, $p_1$ equals $p$ and the other equals $1 - p$, the rows of $P$ for the interior states follow the SISC pattern, while the rows for states 1 and 4 are modified because the chain cannot leave the state set.
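One way to build such a matrix is sketched below. The edge convention is an assumption made here for illustration: any probability mass that would step outside $\{1, \ldots, M\}$ is assigned to the nearest edge state. The convention used in the talk may differ, and the function name `sisc_matrix` is hypothetical.

```python
def sisc_matrix(M, steps):
    """Build an M x M transition matrix from step probabilities.

    steps maps a step size k to its probability p_k (k < 0 means moving down).
    Assumed edge effect: mass that would leave {1..M} is lumped onto the
    nearest edge state, so every row still sums to 1.
    """
    P = [[0.0] * M for _ in range(M)]
    for i in range(M):  # i is the 0-based index of state i + 1
        for k, p_k in steps.items():
            j = min(max(i + k, 0), M - 1)  # clamp the target to the state set
            P[i][j] += p_k
    return P
```

With the step distribution of the simulation slides, `sisc_matrix(4, {-1: 0.3, 0: 0.4, 1: 0.3})` has interior rows of the form (0.3, 0.4, 0.3) around the diagonal, while the first and last rows place 0.7 on the edge state itself.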
19 Main Results
Theorem 1 (lower bound):
$$r^{lb} = \min\left\{ r \in \mathcal{M} : \sum_{i=1}^{r} b(i) \ge \frac{1}{1 + C} \right\}.$$
This lower bound is equal to the myopic action, which at each time step maximizes the immediate expected reward and ignores the impact of the action on the future reward:
$$r^{myopic}(b) = \arg\max_{r \in \mathcal{M}} R(b; r).$$
It has a percentile threshold structure: it is the lowest action at which the cumulative distribution (the sum of the beliefs up to the action) reaches a threshold.
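The percentile threshold can be checked numerically against a direct maximization of the expected immediate reward. The sketch below assumes the reward of this talk ($q = 1$, $C_l = 0$, $C_u = C$), for which the increment $R(b; r+1) - R(b; r)$ equals $1 - (1 + C)\sum_{i \le r} b(i)$, giving the threshold $1/(1+C)$. Function names are illustrative.

```python
def myopic_threshold(b, C):
    """Lowest action whose cumulative belief reaches 1 / (1 + C)."""
    cum = 0.0
    for r, p in enumerate(b, start=1):
        cum += p
        if cum >= 1.0 / (1.0 + C):
            return r
    return len(b)  # guard against rounding when the threshold is close to 1


def myopic_bruteforce(b, C):
    """Lowest action maximizing the expected immediate reward R(b; r)."""
    def expected(r):
        return sum(p * (i - C * (r - i) if r > i else r)
                   for i, p in enumerate(b, start=1))
    # The key (value, -r) breaks ties in favor of the lowest action.
    return max(range(1, len(b) + 1), key=lambda r: (expected(r), -r))
```

For the belief `[0.1, 0.2, 0.3, 0.4]` and `C = 5`, the threshold is 1/6; the cumulative belief first reaches it at `r = 2`, which is also where the expected reward (1.4) is largest.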
20 Main Results
Theorem 2 (upper bound): Under Assumption 1 or 2,
$$r^{ub} = \min\{ r \in \mathcal{M} : f(\beta)\, S_r + [(1 + C) - f(\beta)\, r]\, s_r - C \le 0 \},$$
where
$$s_r \triangleq \sum_{i=r+1}^{r_h} b(i), \qquad S_r \triangleq \sum_{i=r+1}^{r_h} i\, b(i), \qquad f(\beta) \triangleq \beta\, \frac{1 - \beta^{T-t}}{1 - \beta}.$$
The bounds are ordered as
$$r_l \le r^{lb} = r^{myopic} \le r^{opt} \le r^{ub} \le r_h,$$
where $r_l$ is the lowest and $r_h$ the highest state with non-zero belief.
21 Main Results
Proposition 1: A looser upper bound, $r^{ub2} \ge r^{ub}$, is
$$r^{ub2} = \min\left\{ r \in \mathcal{M} : \sum_{i=1}^{r} b(i) \ge \frac{1 + U}{1 + C + U} \right\}, \quad \text{where } U = \beta\, \frac{1 - \beta^{T-t}}{1 - \beta}\, (r_h - r_l).$$
- This is a percentile threshold structure with an extra term $U$ in the numerator and denominator of the threshold.
- $r^{ub2}$ is an increasing function of $U$.
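The three bounds can be computed side by side for a given belief. The sketch below is an illustration under assumptions: the discount term is read as $\beta(1-\beta^{n})/(1-\beta)$ with $n$ the number of remaining steps (passed in as the hypothetical parameter `remaining`), $s_r$ is the belief mass above $r$, and $S_r$ is the belief-weighted sum of the states above $r$.

```python
def discount_sum(beta, n):
    """(1 - beta^n) / (1 - beta), replaced by n when beta == 1."""
    return float(n) if beta == 1 else (1 - beta ** n) / (1 - beta)


def action_bounds(b, C, beta, remaining):
    """Return (lower bound, upper bound, looser upper bound) for belief b."""
    M = len(b)
    support = [i for i in range(1, M + 1) if b[i - 1] > 0]
    r_l, r_h = support[0], support[-1]
    f = beta * discount_sum(beta, remaining)
    U = f * (r_h - r_l)
    lb = ub = ub2 = None
    cum = 0.0
    for r in range(1, M + 1):
        cum += b[r - 1]
        s = sum(b[i - 1] for i in range(r + 1, r_h + 1))      # mass above r
        S = sum(i * b[i - 1] for i in range(r + 1, r_h + 1))  # weighted mass above r
        if lb is None and cum >= 1 / (1 + C):
            lb = r
        if ub is None and f * S + ((1 + C) - f * r) * s - C <= 0:
            ub = r
        if ub2 is None and cum >= (1 + U) / (1 + C + U):
            ub2 = r
    # Fall back to the top state if rounding kept a threshold from being met.
    return lb or M, ub or M, ub2 or M
```

For the belief `[0.1, 0.2, 0.3, 0.4]` with `C = 5`, `beta = 0.8`, and three remaining steps, this gives the lower bound 2 and both upper bounds equal to 3, consistent with the ordering of the bounds.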
22 Analysis
Lemma 1: The expected immediate reward is a unimodal function of the action $r$.
Lemma 2: $V_t(b; r)$ and $V_t(b)$ are convex with respect to the belief vector $b$: for $b = \lambda b_1 + (1 - \lambda) b_2$ with $0 \le \lambda \le 1$,
$$V_t(b; r) \le \lambda V_t(b_1; r) + (1 - \lambda) V_t(b_2; r), \quad \forall r \in \mathcal{M},$$
$$V_t(b) \le \lambda V_t(b_1) + (1 - \lambda) V_t(b_2).$$
Lemma 3: The future expected reward $V_t^f(b; r)$ is monotonically increasing in the action:
$$V_t^f(b; r_1) - V_t^f(b; r_2) \ge 0, \quad \forall r_1 \ge r_2.$$
23 Analysis
First-order stochastic dominance (FOSD): Let $b_1, b_2 \in \mathcal{B}$ be any two belief vectors. $b_1$ first-order stochastically dominates $b_2$ (or $b_1$ is FOSD greater than $b_2$), denoted $b_1 \ge_s b_2$, if
$$\sum_{j=r}^{M} b_1(j) \ge \sum_{j=r}^{M} b_2(j), \quad \forall r \in \{1, \ldots, M\}.$$
Lemma 4: Under Assumption 1 or 2, the value function is an FOSD-increasing function of the belief vector, i.e., if $b_1 \ge_s b_2$, then $V(b_1) \ge V(b_2)$.
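The FOSD ordering is straightforward to test numerically by comparing tail sums. A minimal sketch follows; the function name and the small tolerance `tol`, which absorbs floating-point rounding, are choices made here.

```python
def fosd_geq(b1, b2, tol=1e-12):
    """True if b1 first-order stochastically dominates b2.

    Compares the tail sums sum_{j >= r} b(j) for every r, from the top
    state downward.
    """
    tail1 = tail2 = 0.0
    for p1, p2 in zip(reversed(b1), reversed(b2)):
        tail1 += p1
        tail2 += p2
        if tail1 < tail2 - tol:
            return False
    return True
```

For instance, `[0.1, 0.2, 0.7]` dominates `[0.3, 0.3, 0.4]` because it places more mass on the higher states at every tail, while the converse does not hold.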
24 Analysis
$b^\alpha$: the version of $b$ shifted upward by $\alpha$ steps, i.e. $b^\alpha(i) = b(i - \alpha)$.
Lemma 5:
$$R(b^\alpha; r) = R(b; r - \alpha) + \alpha, \qquad r^{myopic}(b^\alpha) = r^{myopic}(b) + \alpha.$$
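Lemma 5 can be verified numerically on a small example. The sketch assumes that the shift moves probability mass $\alpha$ states upward, i.e. $b^\alpha(i) = b(i - \alpha)$, which is the reading under which both identities of the lemma hold, and it uses the reward of this talk ($q = 1$, $C_l = 0$, $C_u = C$). All function names are illustrative.

```python
def expected_reward(b, r, C=5.0):
    """Expected immediate reward R(b; r) with q = 1, C_l = 0, C_u = C."""
    return sum(p * (i - C * (r - i) if r > i else r)
               for i, p in enumerate(b, start=1))


def shift_up(b, alpha):
    """Shifted belief b^alpha(i) = b(i - alpha); the top alpha entries must be zero."""
    if any(b[len(b) - alpha:]):
        raise ValueError("shifting would push probability mass past state M")
    return [0.0] * alpha + b[:len(b) - alpha]


def myopic(b, C=5.0):
    """Lowest action maximizing the expected immediate reward."""
    return max(range(1, len(b) + 1), key=lambda r: (expected_reward(b, r, C), -r))
```

With `b = [0.2, 0.5, 0.3, 0, 0, 0]` and `alpha = 2`, the expected reward of every action `r >= 3` under the shifted belief equals the expected reward of `r - 2` under `b` plus 2, and the myopic action moves from 1 to 3.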
25 Analysis
Lemma 6: Under Assumption 1,
$$V_t(b^\alpha; r) - V_t(b; r - \alpha) = \alpha\, \frac{1 - \beta^{T-t+1}}{1 - \beta},$$
$$r_t^{opt}(b^\alpha) = r_t^{opt}(b) + \alpha,$$
$$V_t(b^\alpha) = V_t(b) + \alpha\, \frac{1 - \beta^{T-t+1}}{1 - \beta}.$$
Note that for $\beta = 1$, $\frac{1 - \beta^x}{1 - \beta}$ must be replaced by $x$.
Lemma 7: Under Assumption 2,
$$V_t(b^\alpha) \le V_t(b) + \alpha\, \frac{1 - \beta^{T-t+1}}{1 - \beta}.$$
26 Simulation Results
27 Simulation Results: Upper and Lower Bound on Optimal Actions
Figure: Actions selected by EM ($\tau = 4$) and their corresponding lower and upper bounds, for $C = 5$, $\beta = 0.8$, $M = 10$, and transition probabilities $p_{-1} = p_1 = 0.3$, $p_0 = 0.4$.
Note that the stars in the figure indicate the non-zero beliefs. The figure shows the policy sequence in the case where the selected actions do not exceed $B_t$.
28 Simulation Results: Upper and Lower Bound on Optimal Actions
Figure: The gap between the lower and the upper bounds versus $\beta$ and versus the variance ($\beta = 0.8$), for $C = 5$.
Unless a figure studies the effect of a particular parameter, the simulation parameters are fixed as follows: $M = 10$, $C = 5$, $\beta = 0.8$, and transition probabilities $p_{-1} = p_1 = 0.3$, $p_0 = 0.4$.
29 Simulation Results: Myopic and Upper-Bound Policies
Figure: The total expected discounted reward of two sub-optimal policies versus $\beta$, for $C = 5$ and horizon $T = 100$.
We compare two sub-optimal policies: (i) the myopic policy and (ii) the upper-bound (UB) policy. These policies pick the myopic and UB actions, respectively, at every time step and update their belief vectors according to these actions.
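A single sample path of the myopic policy can be simulated as sketched below. This is not the simulation code behind the figures: the function name, the seeding, and the choice to draw the initial bandwidth from the initial belief are assumptions made for illustration, and the reward is the one of this talk ($q = 1$, $C_l = 0$, $C_u = C$). Averaging many such paths would estimate the expected discounted reward.

```python
import random


def simulate_myopic(b0, P, T, C=5.0, beta=0.8, seed=0):
    """Discounted reward of one sample path under the myopic policy."""
    rng = random.Random(seed)
    M = len(b0)
    states = list(range(1, M + 1))
    B = rng.choices(states, weights=b0)[0]  # assumed: B drawn from the initial belief
    b, total = list(b0), 0.0
    for t in range(T):
        # Myopic action: lowest r whose cumulative belief reaches 1 / (1 + C).
        cum, r = 0.0, M
        for k in states:
            cum += b[k - 1]
            if cum >= 1.0 / (1.0 + C):
                r = k
                break
        total += beta ** t * (B - C * (r - B) if r > B else r)
        if r > B:  # full observation: B is revealed
            post = [1.0 if k == B else 0.0 for k in states]
        else:      # partial observation: only B >= r is learned
            tail = sum(b[r - 1:])
            post = [0.0 if k < r else b[k - 1] / tail for k in states]
        b = [sum(post[i] * P[i][j] for i in range(M)) for j in range(M)]
        B = rng.choices(states, weights=P[B - 1])[0]  # bandwidth moves on
    return total
```

In the degenerate case of an identity transition matrix and a belief concentrated on state 3, the policy selects rate 3 at every step, so three steps with `beta = 0.5` yield 3 + 1.5 + 0.75 = 5.25.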
30 Simulation Results: Myopic and Upper-Bound Policies
Figure: The total expected discounted reward of two sub-optimal policies versus $C$, for $\beta = 0.8$, $\tau = 4$, horizon $T = 100$, and transition probabilities $p_{-1} = p_1 = 0.3$, $p_0 = 0.4$.
31 Related Works
- Existing techniques, such as the Transmission Control Protocol (TCP), adopt an Additive Increase Multiplicative Decrease (AIMD) algorithm that adjusts the congestion window based on transmission acknowledgments.
- A. Bensoussan et al., "A multiperiod newsvendor problem with partially observed demand," 2007, consider a POMDP problem where the demand is a Markovian process. They consider the setting where the resources and actions are both continuous, as well as the case where the resources are discrete but the actions remain continuous. For these settings, they also show that the optimal actions exceed the myopic actions.
- P. Mansourifard, B. Krishnamachari, T. Javidi, "Bayesian Congestion Control over a Markovian Network Bandwidth Process," invited paper, Asilomar 2013.
32 Conclusions and Future Work
Summary
- We formulated a Bayesian congestion control problem in which a source must select the transmission rate (the action) over a network whose available bandwidth (the resource) evolves as a stochastic process.
- We modeled the problem as a POMDP and derived some key properties of the myopic and the optimal policies.
- We proved structural results that provide bounds on the optimal actions, yielding tractable sub-optimal solutions that were shown via simulations to perform well.
- We conjecture that there may be an even better approximation of the optimal policy with a similar percentile threshold structure.
33 Thank you for your attention
Economic Theory 22, 447 455 (2003) Stochastic convexity in dynamic programming Alp E. Atakan Department of Economics, Columbia University New York, NY 10027, USA (e-mail: aea15@columbia.edu) Received:
More informationStochastic Models. Edited by D.P. Heyman Bellcore. MJ. Sobel State University of New York at Stony Brook
Stochastic Models Edited by D.P. Heyman Bellcore MJ. Sobel State University of New York at Stony Brook 1990 NORTH-HOLLAND AMSTERDAM NEW YORK OXFORD TOKYO Contents Preface CHARTER 1 Point Processes R.F.
More informationAllocating Resources, in the Future
Allocating Resources, in the Future Sid Banerjee School of ORIE May 3, 2018 Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making online resource allocation: basic model......
More informationIn diagnostic services, agents typically need to weigh the benefit of running an additional test and improving
MANAGEMENT SCIENCE Vol. 59, No. 1, January 213, pp. 157 171 ISSN 25-199 (print) ISSN 1526-551 (online) http://dx.doi.org/1.1287/mnsc.112.1576 213 INFORMS Diagnostic Accuracy Under Congestion Saed Alizamir