Bayesian Congestion Control over a Markovian Network Bandwidth Process: A Multiperiod Newsvendor Problem


1 Bayesian Congestion Control over a Markovian Network Bandwidth Process: A Multiperiod Newsvendor Problem
Parisa Mansourifard (USC)
Joint work with Bhaskar Krishnamachari (USC) and Tara Javidi (UCSD)
September 4, 2013

2 Outline
- Introduction: problem formulation, main results
- Analysis: some key properties
- Simulation results
- Summary

3 Introduction: Motivation
In many network protocols, a device must set its communication parameters so as to maximize the utilization of a resource whose availability is a stochastic process. One prominent example is congestion control: a transmitter must select its transmission rate to utilize the available bandwidth, which varies randomly due to the dynamic traffic load imposed by other users on the network. The goal is to find the optimal policy maximizing the total reward (utilization minus penalty).

4 Introduction
Assume the available bandwidth varies as a Markovian process. A sender wants to set its transmission rate at each time step.
If the sender selects a rate higher than the available bandwidth:
- it can utilize the whole available bandwidth,
- but it has to pay an over-utilization penalty,
- and perfect information about the current available bandwidth is revealed.
If the sender selects a rate less than or equal to the available bandwidth:
- it experiences no loss (no penalty),
- but the available bandwidth is under-utilized,
- and the sender gets only partial information about the available bandwidth.
Hence there is a trade-off between gathering more information about the available bandwidth and paying less penalty.

5 Problem Formulation
Assumptions:
- a discrete-time, finite-state Markov process, whose state is denoted by $B_t$,
- a finite horizon $T$, with discrete time steps $t = 1, 2, \dots, T$,
- a known transition matrix,
- the state of the process is not fully observable.
This makes the problem a Partially Observable Markov Decision Process (POMDP).

6 Problem Formulation
At each time step, the decision-maker selects an action based on the history of observations. It earns a reward which is a function of the selected action and the state (belief vector).
Objective: select the sequential actions so as to maximize the total expected discounted reward.

7 Problem Formulation
POMDP tuple $\{M, P, B, A, O, U, R\}$:
State: $B_t$ is an element of a finite state set denoted by $M = \{1, 2, \dots, M\}$.
State transition: the transition probabilities of the states $B_t$ over time are
- assumed to be known and stationary,
- indicated by an $M \times M$ transition probability matrix $P$,
- with elements $P_{i,j} = \Pr(B_{t+1} = j \mid B_t = i)$ for all $i, j \in M$ and all $t$.
Belief vector: the probability distribution of the resource state,
- given all past observations,
- denoted by the belief vector $b_t = [b_t(1), \dots, b_t(M)]$,
- with elements $b_t(k) = \Pr(B_t = k)$, $k \in M$.

8 Problem Formulation
POMDP tuple $\{M, P, B, A, O, U, R\}$:
Action: at each time step, according to the current belief, we choose an action $r_t \in A = \{1, \dots, M\}$.
Observed information: defined by the events $o_t(r_t) \in O$ as follows:
- $o_t(r_t) = \{B_t = i\}$ for some $i = 1, \dots, r_t - 1$: the event of fully observing $B_t$,
- $o_t(r_t) = \{B_t \ge r_t\}$: the event of partial observation.

9 Problem Formulation
Figure: an example of belief updating with $M = 6$, $p_1 \neq 0$, $p_{-1} \neq 0$, $B_t = 5$, $r_t = 3$ (partial observation, since $r_t \le B_t$).

10 Problem Formulation
Figure: an example of belief updating with $M = 6$, $p_1 \neq 0$, $p_{-1} \neq 0$, $B_t = 2$, $r_t = 3$ (full observation, since $r_t > B_t$).

11 Problem Formulation
POMDP tuple $\{M, P, B, A, O, U, R\}$:
Belief updating: a mapping $U : A \times O \times B \to B$,
$$b_{t+1} = \begin{cases} (T_{r_t} b_t)\, P & \text{if } r_t \le B_t \\ I_{B_t}\, P & \text{if } r_t > B_t, \end{cases}$$
where $I_{B_t}$ is the unit (indicator) row vector at state $B_t$, and $T_r$ is a non-linear operation on a belief vector $b$:
$$T_r b(i) = \begin{cases} 0 & \text{if } i < r \\ \dfrac{b(i)}{\sum_{j=r}^{M} b(j)} & \text{if } i \ge r. \end{cases}$$
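
Both branches of the update are straightforward to implement. The following Python/NumPy snippet is our own minimal sketch (function names such as `truncate` and `belief_update` are not from the talk); states are 1-indexed as in the slides:

```python
import numpy as np

def truncate(b, r):
    """The T_r operator: condition the belief on the partial observation B >= r.

    b[i-1] holds b(i) for states i = 1, ..., M.
    """
    out = np.zeros_like(b)
    tail = b[r - 1:].sum()            # sum_{j=r}^{M} b(j)
    out[r - 1:] = b[r - 1:] / tail    # renormalize the surviving mass
    return out

def belief_update(b, r, B, P):
    """One belief-update step given action r, true state B, and matrix P.

    r <= B: only {B >= r} is learned, so apply T_r (partial observation).
    r > B : the state is fully revealed; the belief collapses to I_B.
    Either way, the result is pushed one step forward through P.
    """
    if r <= B:
        post = truncate(b, r)
    else:
        post = np.zeros(len(b))
        post[B - 1] = 1.0             # indicator vector I_B
    return post @ P
```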

12 Problem Formulation
POMDP tuple $\{M, P, B, A, O, U, R\}$:
Reward: the immediate reward earned is a mapping $R : A \times O \to \mathbb{R}$:
$$R(B_t, r_t) = \begin{cases} q B_t - C_u (r_t - B_t) & \text{if } r_t > B_t \\ q r_t - C_l (B_t - r_t) & \text{if } r_t \le B_t, \end{cases} \qquad (1)$$
- $C_u$: the over-utilization penalty coefficient; $C_l$: the under-utilization penalty coefficient,
- $q$: the gain per unit.
Reward function in this talk: $q = 1$, $C_l = 0$, $C_u = C$, where $B_t$ is the available bandwidth and $r_t$ the selected rate.
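
In the dynamic program below, what matters is the immediate reward averaged over the belief, $R(b; r) = \sum_i b(i)\, R(i, r)$. A minimal sketch with the talk's coefficients ($q = 1$, $C_l = 0$, $C_u = C$); the function name is our own:

```python
import numpy as np

def expected_reward(b, r, C):
    """Expected immediate reward R(b; r) with q = 1, C_l = 0, C_u = C.

    States i < r are over-utilized: reward i - C * (r - i).
    States i >= r are (weakly) under-utilized: reward r.
    """
    states = np.arange(1, len(b) + 1)          # 1-indexed states
    per_state = np.where(states < r, states - C * (r - states), r)
    return float(b @ per_state)
```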

13 Problem Formulation
Reward function of the newsvendor problem with perishable products, where $B_t$ is the demand and $r_t$ the selected inventory level:
$$R^{\text{Inventory}}(B_t, r_t) = \begin{cases} p B_t - c r_t & \text{if } r_t > B_t \\ (p - c)\, r_t - C_l (B_t - r_t) & \text{if } r_t \le B_t, \end{cases}$$
- $p$: the benefit per unit sold,
- $c$: the cost per unit purchased,
- $C_l$: the dissatisfaction of the customers per unit of unserved demand.

14 Optimal Policy and Value Function
The policy $\pi$ specifies a sequence of functions $\pi_1, \dots, \pi_T$, with $\pi_t : B \to A$ and $r_t = \pi_t(b_t)$.
Goal: maximize the total expected discounted reward over the finite horizon $T$, over all admissible policies $\pi$:
$$\max_\pi J^\pi_T(b_0) = \max_\pi E^\pi\Big[\sum_{t=0}^{T} \beta^t R(B_t; r_t) \,\Big|\, b_0\Big], \qquad (2)$$
- $0 \le \beta \le 1$: the discount factor; $b_0$: the initial belief vector.
- The optimal policy $\pi^{\text{opt}}$ is a policy that maximizes (2).
- It exists since the number of admissible policies is finite.

15 Optimal Policy and Value Function
Dynamic programming (DP):
$$V_t(b_t) = \max_{r_t} V_t(b_t; r_t),$$
$$V_t(b_t; r_t) = R(b_t; r_t) + \beta\, E\{V_{t+1}(b_{t+1}) \mid r_t\}, \quad t < T,$$
$$V_T(b_T; r_T) = R_T(b_T; r_T).$$
The value function $V_t(b_t)$ is the maximum remaining expected reward accrued starting from time $t$ when the current belief vector is $b_t$. $V_t(b_t; r_t)$ is the remaining expected reward accrued after time $t$ when choosing action $r_t$ at time $t$ and following the optimal policy for times $t+1, \dots, T$. It is the sum of two terms: (i) the immediate expected reward and (ii) the discounted future expected reward, where the expectation in the future term splits over the observation events: full observation ($B_t = i < r_t$, with probability $b_t(i)$) and partial observation ($B_t \ge r_t$, with probability $\sum_{i \ge r_t} b_t(i)$).
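
For small $M$ and short horizons the recursion can be evaluated exactly, because only finitely many beliefs are reachable from $b_0$: each step either collapses the belief to a revealed state or truncates it at the chosen rate. The sketch below is our own brute-force illustration (not the authors' code) and is only practical for tiny instances:

```python
import numpy as np
from functools import lru_cache

def solve_dp(b0, P, C, beta, T):
    """Exact finite-horizon DP over the beliefs reachable from b0."""
    M = P.shape[0]
    states = np.arange(1, M + 1)

    def reward(b, r):
        # expected immediate reward with q = 1, C_l = 0, C_u = C
        return float(b @ np.where(states < r, states - C * (r - states), r))

    def key(b):
        return tuple(np.round(b, 12))   # stabilize the cache key

    @lru_cache(maxsize=None)
    def V(b_key, t):
        b = np.asarray(b_key)
        best = -np.inf
        for r in range(1, M + 1):
            v = reward(b, r)
            if t < T:
                # full observation: B = i < r, with probability b(i)
                for i in range(1, r):
                    if b[i - 1] > 0:
                        e = np.zeros(M)
                        e[i - 1] = 1.0
                        v += beta * b[i - 1] * V(key(e @ P), t + 1)
                # partial observation: B >= r, with probability sum_{i>=r} b(i)
                tail = b[r - 1:].sum()
                if tail > 0:
                    tb = np.zeros(M)
                    tb[r - 1:] = b[r - 1:] / tail
                    v += beta * tail * V(key(tb @ P), t + 1)
            best = max(best, v)
        return best

    return V(key(np.asarray(b0, dtype=float)), 0)
```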

16 Optimal Policy and Value Function
Optimal policy:
$$r^{\text{opt}}_t(b) = \arg\max_{r \in A} V_t(b; r).$$
There is no efficiently computable solution for the above POMDP problem, so we present upper and lower bounds on the optimal actions.

17 Main Results
Assumption 1: the matrix $P$ satisfies the State-Independent State Change (SISC) property.
SISC property: $P_{i,i+k} = P_{j,j+k}$ for all $i, j$. That is, the probability of moving $k$ steps higher, denoted $p_k$, is independent of the current state; $k < 0$ corresponds to moving $|k|$ steps lower.

18 Main Results
Assumption 2: the matrix $P$ satisfies the SISC property with edge effects.
Edge effect: since the state set $M$ is bounded on both sides, the transition matrix is affected by the limits (edges) of the state set. For example, for $M = 4$ with $p_1 = p$ and $p_{-1} = 1 - p$:
$$P = \begin{pmatrix} 1-p & p & 0 & 0 \\ 1-p & 0 & p & 0 \\ 0 & 1-p & 0 & p \\ 0 & 0 & 1-p & p \end{pmatrix}.$$
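
One simple way to realize the SISC-with-edge-effects structure is to clamp any move that would leave the state set to the nearest edge state; this reproduces the $M = 4$ example above, though the paper's exact edge convention may differ. A sketch:

```python
import numpy as np

def sisc_matrix(M, step_probs):
    """SISC transition matrix with edge effects (out-of-range moves clamped).

    step_probs: dict k -> p_k, the probability of moving k steps
    (must sum to 1), e.g. {-1: 0.3, 0: 0.4, 1: 0.3}.
    """
    P = np.zeros((M, M))
    for i in range(1, M + 1):
        for k, pk in step_probs.items():
            j = min(max(i + k, 1), M)   # clamp at the edges of {1, ..., M}
            P[i - 1, j - 1] += pk
    return P

# the talk's simulation setting: M = 10, p_1 = p_{-1} = 0.3, p_0 = 0.4
P = sisc_matrix(10, {-1: 0.3, 0: 0.4, 1: 0.3})
```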

19 Main Results
Theorem 1 (lower bound):
$$r^{\text{lb}} = \min\Big\{ r \in M : \sum_{i=1}^{r} b(i) \ge \frac{1}{1+C} \Big\}.$$
This lower bound equals the myopic action, which at each time step selects the action maximizing the immediate expected reward and ignores its impact on the future reward:
$$r^{\text{myopic}}(b) = \arg\max_{r \in M} R(b; r).$$
A percentile threshold structure: the lowest action whose cumulative distribution (the sum of the beliefs up to that action) exceeds a threshold.
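
The percentile structure makes the lower bound a one-line scan of the belief CDF. A sketch (the function name is our own):

```python
import numpy as np

def myopic_action(b, C):
    """r_lb: the smallest r with sum_{i<=r} b(i) >= 1/(1+C); 1-indexed."""
    cdf = np.cumsum(b)
    return int(np.searchsorted(cdf, 1.0 / (1.0 + C)) + 1)

# e.g. a uniform belief over 10 states with C = 5 gives r = 2,
# since the threshold is 1/6 and the CDF first reaches it at state 2.
```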

20 Main Results
Theorem 2 (upper bound): under Assumption 1 or 2,
$$r^{\text{ub}} = \min\big\{ r \in M : f(\beta)\,\bar{S}_r + [(1+C) - f(\beta)\, r]\, S_r - C \le 0 \big\},$$
where $S_r \triangleq \sum_{i=r+1}^{r_h} b(i)$, $\bar{S}_r \triangleq \sum_{i=r+1}^{r_h} i\, b(i)$, and $f(\beta) \triangleq \beta\,\frac{1-\beta^{t}}{1-\beta}$.
Ordering:
$$r_l \le r^{\text{lb}} = r^{\text{myopic}} \le r^{\text{opt}} \le r^{\text{ub}} \le r_h,$$
where $r_l$ and $r_h$ are the lowest and the highest states with non-zero beliefs.
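
The upper bound is likewise a single scan over candidate actions. Below is a sketch of the condition as reconstructed above (summing to $M$ is equivalent to summing to $r_h$, since beliefs above $r_h$ are zero); for $\beta = 1$ we substitute $f(\beta)$ by $t$, per the note on slide 25:

```python
import numpy as np

def upper_bound_action(b, C, beta, t):
    """r_ub from Theorem 2 (as reconstructed above); 1-indexed."""
    M = len(b)
    states = np.arange(1, M + 1)
    f = t if beta == 1 else beta * (1 - beta**t) / (1 - beta)   # f(beta)
    for r in range(1, M + 1):
        S = b[r:].sum()                      # S_r     = sum_{i>r} b(i)
        S_bar = (states[r:] * b[r:]).sum()   # bar S_r = sum_{i>r} i * b(i)
        if f * S_bar + ((1 + C) - f * r) * S - C <= 0:
            return r
    return M   # unreachable: at r = M the condition reduces to -C <= 0
```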

21 Main Results
Proposition 1: a looser upper bound, $r^{\text{ub2}} \ge r^{\text{ub}}$,
$$r^{\text{ub2}} = \min\Big\{ r \in M : \sum_{i=1}^{r} b(i) \ge \frac{1+U}{1+C+U} \Big\}, \quad \text{where } U = \beta\,\frac{1-\beta^{t}}{1-\beta}\,(r_h - r_l).$$
- A percentile threshold structure with an extra term $U$ in the numerator and the denominator of the threshold; at $U = 0$ it reduces to the myopic threshold $\frac{1}{1+C}$.
- $r^{\text{ub2}}$ is an increasing function of $U$.
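
The looser bound needs only the belief CDF and the support endpoints, making it as cheap as the myopic rule. A sketch under the same assumptions as above:

```python
import numpy as np

def upper_bound2_action(b, C, beta, t):
    """r_ub2 from Proposition 1; 1-indexed."""
    nz = np.flatnonzero(b)
    r_l, r_h = nz[0] + 1, nz[-1] + 1   # lowest/highest non-zero-belief states
    f = t if beta == 1 else beta * (1 - beta**t) / (1 - beta)
    U = f * (r_h - r_l)
    threshold = (1 + U) / (1 + C + U)
    return int(np.searchsorted(np.cumsum(b), threshold) + 1)
```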

22 Analysis
Lemma 1: the expected immediate reward is a uni-modal function of the action $r$.
Lemma 2: $V_t(b; r)$ and $V_t(b)$ are convex with respect to the belief vector $b$: for $b = \lambda b_1 + (1 - \lambda) b_2$ with $0 \le \lambda \le 1$,
$$V_t(b; r) \le \lambda V_t(b_1; r) + (1 - \lambda) V_t(b_2; r), \quad \forall r \in M,$$
$$V_t(b) \le \lambda V_t(b_1) + (1 - \lambda) V_t(b_2).$$
Lemma 3: the future expected reward, $V^f_t(b; r)$, is monotonically increasing in the action:
$$V^f_t(b; r_1) - V^f_t(b; r_2) \ge 0, \quad \forall r_1 \ge r_2.$$

23 Analysis
First-order stochastic dominance: let $b_1, b_2 \in B$ be any two belief vectors. Then $b_1$ first-order stochastically dominates $b_2$ (or $b_1$ is FOSD-greater than $b_2$), denoted $b_1 \ge_s b_2$, if
$$\sum_{j=r}^{M} b_1(j) \ge \sum_{j=r}^{M} b_2(j), \quad \forall r \in \{1, \dots, M\}.$$
Lemma 4: under Assumption 1 or 2, the value function is a FOSD-increasing function of the belief vector, i.e., if $b_1 \ge_s b_2$, then $V(b_1) \ge V(b_2)$.
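
The FOSD check compares tail sums at every threshold; a small helper (our own) makes the ordering concrete:

```python
import numpy as np

def fosd_geq(b1, b2, tol=1e-12):
    """True iff b1 >=_s b2: sum_{j>=r} b1(j) >= sum_{j>=r} b2(j) for all r."""
    tails1 = np.cumsum(np.asarray(b1)[::-1])[::-1]   # tail sums for r = 1..M
    tails2 = np.cumsum(np.asarray(b2)[::-1])[::-1]
    return bool(np.all(tails1 >= tails2 - tol))
```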

24 Analysis
$b^\alpha$: the version of $b$ shifted up by $\alpha$ steps, i.e., $b^\alpha(i) = b(i - \alpha)$; negative $\alpha$ shifts down.
Lemma 5:
$$R(b^\alpha; r) = R(b; r - \alpha) + \alpha,$$
$$r^{\text{myopic}}(b^\alpha) = r^{\text{myopic}}(b) + \alpha.$$

25 Analysis
Lemma 6: under Assumption 1,
$$V_t(b^\alpha; r) - V_t(b; r - \alpha) = \alpha\,\frac{1-\beta^{t+1}}{1-\beta},$$
$$r^{\text{opt}}_t(b^\alpha) = r^{\text{opt}}_t(b) + \alpha,$$
$$V_t(b^\alpha) = V_t(b) + \alpha\,\frac{1-\beta^{t+1}}{1-\beta}.$$
Note: for $\beta = 1$, substitute $\frac{1-\beta^x}{1-\beta}$ with $x$.
Lemma 7: under Assumption 2,
$$V_t(b^\alpha) \le V_t(b) + \alpha\,\frac{1-\beta^{t+1}}{1-\beta}.$$

26 Simulation Results

27 Simulation Results: Upper and Lower Bounds on Optimal Actions
Figure: selected actions by EM ($\tau = 4$) and their corresponding lower and upper bounds, for $C = 5$, $\beta = 0.8$, $M = 10$, and transition probabilities $p_1 = p_{-1} = 0.3$, $p_0 = 0.4$. The stars in the figure indicate the non-zero beliefs. The figure shows a policy sequence in which the selected actions do not exceed $B_t$.

28 Simulation Results: Upper and Lower Bounds on Optimal Actions
Figure: the gap between the lower and the upper bounds versus $\beta$, and versus the variance (at $\beta = 0.8$), for $C = 5$. Except in the figures where their effect is varied, the simulation parameters are fixed as follows: $M = 10$, $C = 5$, $\beta = 0.8$, and transition probabilities $p_1 = p_{-1} = 0.3$, $p_0 = 0.4$.

29 Simulation Results: Myopic and Upper-Bound Policies
Figure: the total expected discounted reward for the two sub-optimal policies versus $\beta$, for $C = 5$ and horizon $T = 100$.
We now compare two sub-optimal policies: (i) the myopic policy and (ii) the upper-bound (UB) policy. These policies pick the myopic and UB actions, respectively, at every time step, and update their belief vectors according to these actions.
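
A Monte Carlo rollout of this comparison fits in a few lines. The sketch below is our own reconstruction of the experiment's shape, not the authors' code: it samples a bandwidth trajectory from $P$, plays a given belief-based policy, and accumulates the discounted reward; swapping in the UB rule from the earlier sketch gives the second curve.

```python
import numpy as np

def rollout(policy, P, C, beta, T, b0, B0, rng):
    """Discounted reward of one trajectory under a belief-based policy.

    policy(b, t) returns a 1-indexed rate; the belief is updated with the
    partial/full observation rule of slide 11.
    """
    M = P.shape[0]
    states = np.arange(1, M + 1)
    b, B, total = np.asarray(b0, dtype=float), B0, 0.0
    for t in range(T):
        r = policy(b, t)
        # immediate reward with q = 1, C_l = 0, C_u = C
        total += beta**t * ((B - C * (r - B)) if r > B else r)
        # belief update: full observation if r > B, else condition on B >= r
        if r > B:
            post = np.zeros(M)
            post[B - 1] = 1.0
        else:
            post = np.where(states >= r, b, 0.0)
            post /= post.sum()
        b = post @ P
        B = int(rng.choice(states, p=P[B - 1]))   # true bandwidth evolves
    return total

# example: average the myopic policy's reward in the talk's setting
rng = np.random.default_rng(0)
M, C, beta, T = 10, 5.0, 0.8, 100
P = np.zeros((M, M))
for i in range(M):                  # p_1 = p_{-1} = 0.3, p_0 = 0.4, clamped
    for k, pk in ((-1, 0.3), (0, 0.4), (1, 0.3)):
        P[i, min(max(i + k, 0), M - 1)] += pk
b0 = np.full(M, 1.0 / M)
myopic = lambda b, t: int(np.searchsorted(np.cumsum(b), 1 / (1 + C)) + 1)
avg = np.mean([rollout(myopic, P, C, beta, T, b0, int(rng.integers(1, M + 1)), rng)
               for _ in range(200)])
```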

30 Simulation Results: Myopic and Upper-Bound Policies
Figure: the total expected discounted reward for the two sub-optimal policies versus $C$, for $\beta = 0.8$, $\tau = 4$, horizon $T = 100$, and transition probabilities $p_1 = p_{-1} = 0.3$, $p_0 = 0.4$.

31 Related Works
Existing techniques, such as the Transmission Control Protocol (TCP), adopt an Additive Increase Multiplicative Decrease (AIMD) algorithm that adjusts the congestion window based on transmission acknowledgments.
A. Bensoussan et al., "A multiperiod newsvendor problem with partially observed demand," 2007, consider a POMDP in which the demand is a Markovian process. They treat the setting where the resources and actions are both continuous, as well as the case where the resources are discrete but the actions remain continuous. For these settings, they also show that the optimal actions exceed the myopic actions.
P. Mansourifard, B. Krishnamachari, and T. Javidi, "Bayesian Congestion Control over a Markovian Network Bandwidth Process," invited paper, Asilomar 2013.

32 Conclusions and Future Work
Summary:
- We formulated a Bayesian congestion control problem in which a source must select the transmission rate (the action) over a network whose available bandwidth (the resource) evolves as a stochastic process.
- We modeled the problem as a POMDP and derived some key properties of the myopic and the optimal policies.
- We proved structural results providing bounds on the optimal actions, yielding tractable sub-optimal solutions that simulations show to perform well.
- We conjecture that there may be an even better approximation of the optimal policy with a similar percentile threshold structure.

33 Thank you for your attention.
