Allocating Resources, in the Future

1 Allocating Resources, in the Future — Sid Banerjee, School of ORIE. May 3, 2018, Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making

2–6 online resource allocation: basic model

[Slide figure: arrival sequence θ^(1), θ^(2), θ^(3), ..., θ^(t), ..., θ^(T); remaining capacity B_t shown depleting (B_1 = 3, B_2 = 3, B_3 = 2, ..., B_t = 1)]

- single resource, initial capacity B; T agents arrive sequentially
- agent t has type θ^(t) = reward earned if agent is allocated
- principal makes irrevocable decisions; resource is non-replenishable

assumptions on agent types {θ^(t)}:
- finite set of values {v_1, ..., v_n} (e.g. θ^(t) = v_i with prob p_i, i.i.d.)
- in general: arrivals can be time-varying, correlated

online resource allocation problem: allocate resources to maximize the sum of rewards (a minimal simulation sketch follows)
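To make the setup concrete, here is a minimal Python simulation of the basic model; all parameters (T, B, the value set, and the probabilities) are illustrative choices of mine, not from the talk. It compares a naive greedy policy against the hindsight (prophet) optimum that keeps the B largest rewards.

```python
# A minimal sketch of the basic model (illustrative parameters):
# single resource with capacity B, T i.i.d. arrivals; greedy online policy
# vs. the hindsight optimum that allocates to the B highest-value agents.
import numpy as np

rng = np.random.default_rng(0)
T, B = 1000, 50
values, probs = np.array([1.0, 2.0, 5.0]), np.array([0.5, 0.3, 0.2])

arrivals = rng.choice(values, size=T, p=probs)

# greedy online: accept every arrival while capacity remains
greedy_reward = arrivals[:B].sum()

# offline (prophet) optimum: in hindsight, keep the B largest rewards
offline_reward = np.sort(arrivals)[-B:].sum()

print(f"greedy={greedy_reward:.0f}  offline={offline_reward:.0f}  "
      f"regret={offline_reward - greedy_reward:.0f}")
```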

7 online resource allocation: first generalization

[Slide figure: arrival sequence θ^(1), ..., θ^(T); θ ~ (A_i, v_i) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each agent has type θ_i = (A_i, v_i) with prob p_i
- A_i ∈ {0, 1}^d: resource requirement; v_i: value
- also known as: network revenue management; single-minded buyer

8 online resource allocation: second generalization

[Slide figure: arrival sequence θ^(1), ..., θ^(T); θ ~ (v_i1, v_i2) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each agent has type θ = (v_i1, v_i2, ..., v_id) and wants a single resource
- also known as: online weighted matching; unit-demand buyer

9 online allocation across fields

related problems are studied in Markov decision processes, online algorithms, prophet inequalities, revenue management, etc.

informational variants:
- distributional knowledge
- bandit settings
- adversarial inputs

10 the technological zeitgeist

the deep learning revolution: vast improvements in machine learning for data-driven prediction

11–12 axiomatizing the zeitgeist

the deep learning revolution: vast improvements in machine learning for data-driven prediction

axiom: we have access to black-box predictive algorithms

core question of this talk: how does having such an oracle affect online resource allocation?

TL;DR:
- new online allocation policies with strong regret bounds
- re-examining old questions leads to surprising new insights

13 bridging online allocation and predictive models

The Bayesian Prophet: A Low-Regret Framework for Online Decision Making — Alberto Vera & S.B. (2018)

14 focus of talk: allocation with single-minded agents

[Slide figure: arrival sequence θ^(1), ..., θ^(T); θ ~ (A_i, v_i) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each agent has type θ = (A, v): A = resource requirement, v = value
- agent has type θ_i with prob p_i, i.i.d.

online allocation problem: allocate resources to maximize the sum of rewards

15–16 performance measure

the optimal policy can be computed via dynamic programming, but this:
- requires exact distributional knowledge
- suffers the curse of dimensionality: state space = T × B_1 × ... × B_d
- does not quantify the cost of uncertainty

prophet benchmark V_off: the OFFLINE optimal policy, which has full knowledge of {θ^(1), θ^(2), ..., θ^(T)}

17 performance measure: regret

prophet benchmark V_off: OFFLINE knows the entire type sequence {θ^(t) : t = 1, ..., T}

for the network revenue management setting, V_off is given by

  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ N_i[1:T]

where N_i[1:T] = # of arrivals of type θ_i = (A_i, v_i) over {1, 2, ..., T}

regret: E[Regret] = E[V_off − V_alg] (an LP sketch follows)
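As a sketch of how the prophet benchmark can be computed once the arrival counts are realized, the following solves the offline LP above with scipy.optimize.linprog; the instance data (v, A, B, N) is made up for illustration.

```python
# A sketch of the OFFLINE benchmark LP for network revenue management.
import numpy as np
from scipy.optimize import linprog

v = np.array([5.0, 3.0, 2.0])          # value of each type
A = np.array([[1, 1, 0],               # resource requirements: A[j, i] = units of
              [0, 1, 1]])              # resource j used by one type-i agent
B = np.array([40.0, 30.0])             # initial capacities (B_1, B_2)
N = np.array([25, 30, 45])             # realized arrival counts N_i[1:T]

# max v'x  s.t.  A x <= B,  0 <= x <= N   (linprog minimizes, so negate v)
res = linprog(-v, A_ub=A, b_ub=B, bounds=list(zip(np.zeros_like(N), N)))
print("V_off =", -res.fun, " x* =", res.x)
```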

18–22 online allocation with prediction oracle

- given a black-box predictive oracle for the performance of OFFLINE (specifically, for any t, B, we have statistical information about V_off[t, T])
- let π_t = P[V_off[t, T] decreases if OFFLINE rejects the t-th arrival]

Bayes selector: accept the t-th arrival iff π_t > 0.5 (a Monte Carlo sketch of the selector follows)

theorem [Vera & B, 2018]: under mild tail bounds on N_i[t:T], the Bayes selector has E[Regret] independent of T, B_1, B_2, ..., B_d
- arrivals can be time-varying, correlated; discounted rewards
- works for general settings (single-minded, unit-demand, etc.)
- can use an approximate oracle (e.g., from samples)
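One plausible way to instantiate the Bayes selector from samples is Monte Carlo estimation of π_t against the offline LP. This is my own sketch, not code from the paper: the helper names (v_off, pi_t) are mine, and the LP relaxation stands in for the offline oracle.

```python
# A Monte Carlo sketch of the Bayes selector: estimate
# pi_t = P[V_off(t,T) drops if OFFLINE must reject the current arrival]
# by sampling future arrival counts and solving the offline LP with and
# without the current arrival available.
import numpy as np
from scipy.optimize import linprog

def v_off(v, A, B, counts):
    """Offline LP value given capacities B and arrival counts."""
    res = linprog(-v, A_ub=A, b_ub=B, bounds=[(0, c) for c in counts])
    return -res.fun

def pi_t(i, v, A, B, p, horizon, rng, samples=200):
    """Estimated prob. that rejecting one type-i arrival hurts OFFLINE."""
    hits = 0
    for _ in range(samples):
        future = rng.multinomial(horizon, p)       # sampled future counts
        with_cur = future.copy()
        with_cur[i] += 1                           # current arrival available
        # rejecting hurts iff the extra arrival strictly raises V_off
        # (small tolerance guards against LP solver noise)
        if v_off(v, A, B, with_cur) > v_off(v, A, B, future) + 1e-9:
            hits += 1
    return hits / samples

# Bayes selector rule: accept the current type-i arrival iff pi_t(...) > 0.5
```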

23–25 standard approach: randomized admission control (RAC)

offline optimum V_off:
  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ N_i[1:T]

(upfront) fluid LP V_fl:
  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ E[N_i[1:T]] = T p_i

E[V_off] ≤ V_fl (via Jensen's inequality and concavity of V_off w.r.t. N_i)

fluid RAC: accept type θ_i with prob x_i / (T p_i) (see the sketch below)

proposition [Gallego & van Ryzin '97], [Maglaras & Meissner '06]: fluid RAC has E[Regret] = Θ(√T)
N.B. this is a static policy!
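A minimal sketch of the static fluid RAC, under the same illustrative instance-data conventions as above: solve the fluid LP once with expected counts T·p_i, then admit a type-i arrival with probability x_i / (T·p_i).

```python
# A sketch of the static fluid RAC (illustrative instance data).
import numpy as np
from scipy.optimize import linprog

v = np.array([5.0, 3.0, 2.0])
A = np.array([[1, 1, 0], [0, 1, 1]])
B = np.array([40.0, 30.0])
p = np.array([0.3, 0.3, 0.4])
T = 100

expected = T * p                                 # E[N_i[1:T]]
res = linprog(-v, A_ub=A, b_ub=B, bounds=[(0, e) for e in expected])
accept_prob = res.x / expected                   # static admission probabilities
print("fluid value V_fl =", -res.fun, " accept probs =", accept_prob)
```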

26–28 RAC with re-solving

offline optimum V_off:
  max   Σ_i v_i x_i
  s.t.  Σ_i A_i x_i ≤ B
        0 ≤ x_i ≤ N_i

re-solved fluid LP V_fl(t):
  max   Σ_i v_i x_i[t]
  s.t.  Σ_i A_i x_i[t] ≤ B[t]
        0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i

RAC with re-solving: at time t, accept type θ_i with prob x_i[t] / ((T − t) p_i)

- regret improves to o(√T) [Reiman & Wang '08]
- O(1) regret under (dual) non-degeneracy [Jasin & Kumar '12]
- most results use V_fl as the benchmark (including the "prophet inequality")

proposition [Vera & B '18]: for degenerate instances, V_fl − E[V_off] = Ω(√T) (a worked example follows)
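A standard single-type example (my illustration, not taken from the slides) shows where the Ω(√T) gap comes from in the degenerate case where capacity exactly matches expected demand:

```latex
% Degenerate instance: one resource, one type with v = 1, arrival
% probability p, and capacity tuned to expected demand, B = Tp.
% The fluid LP sees no randomness, while OFFLINE pays for fluctuations:
\[
  V_{\mathrm{fl}} = \min(Tp,\, B) = B,
  \qquad
  \mathbb{E}[V_{\mathrm{off}}] = \mathbb{E}[\min(N, B)],
  \quad N \sim \mathrm{Bin}(T, p),
\]
\[
  V_{\mathrm{fl}} - \mathbb{E}[V_{\mathrm{off}}]
  = \mathbb{E}\bigl[(B - N)^{+}\bigr]
  = \Theta(\sqrt{T}),
\]
% since N has mean B and standard deviation Theta(sqrt(T)) (by the CLT).
```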

29–33 Bayes selector for i.i.d. arrivals

Bayes selector:
- π_t = P[V_off[t, T] decreases if OFFLINE rejects the t-th arrival]
- accept the t-th arrival iff π_t > 0.5

re-solved fluid LP:
  max   Σ_i v_i x_i[t]
  s.t.  A x[t] ≤ B[t],  0 ≤ x_i[t] ≤ E[N_i[t:T]]

the re-solved LP gives an approximate admission oracle

fluid Bayes selector: accept type θ_i iff x_i[t] / E[N_i[t:T]] > 0.5 (see the sketch below)

proposition [Vera & B, 2018]: the fluid Bayes selector has E[Regret] ≤ 2 v_max Σ_{i=1}^{n} p_i^{-1}

- proposed for multi-secretary by [Gurvich & Arlotto, 2017]
- NRM via partial re-solving [Bumpensanti & Wang, 2018]
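A minimal sketch of the fluid Bayes selector's decision rule (the function and argument names are mine): re-solve the fluid LP at the current state and accept iff the LP plans to serve more than half of the expected remaining demand of this type.

```python
# A sketch of the fluid Bayes selector decision rule (illustrative names).
import numpy as np
from scipy.optimize import linprog

def fluid_bayes_accept(i, v, A, B_rem, p, remaining):
    """Accept a type-i arrival given remaining capacities B_rem and
    remaining horizon length `remaining`."""
    exp_counts = remaining * p                   # E[N_j[t:T]]
    res = linprog(-v, A_ub=A, b_ub=B_rem,
                  bounds=[(0, e) for e in exp_counts])
    # the Bayes selector rule: accept iff x_i[t] / E[N_i[t:T]] > 0.5
    return res.x[i] / exp_counts[i] > 0.5
```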

34 proof outline

the proof comprises two parts:
1. compensated coupling: a regret bound for the Bayes selector for a generic online decision problem
2. bound the compensation for online packing problems via LP sensitivity and measure concentration

35–38 the compensated coupling: make OFFLINE follow ONLINE

- for any time t and budget B[t], let V_off(t, B[t]) = value of OFFLINE starting from the current state
- for any action a, the disagreement set Q_t(a) = set of sample paths ω where a is sub-optimal (given B[t])
- we can compensate OFFLINE to follow the same action a as ONLINE:

  V_off(t, B[t]) ≤ R^alg_t + v_max · 1{ω ∈ Q_t(a)} + V_off(t+1, B[t+1])

- iterating, we get E[V_off] ≤ E[V_alg] + v_max Σ_{t=1}^{T} P[Q_t(a_t)] (the telescoping step is spelled out below)
- note: the Bayes selector picks a_t = argmin_a P[Q_t(a)]
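Spelling out the "iterating" step: summing the compensation inequality over t = 1, ..., T (with V_off(T+1, ·) = 0) and taking expectations telescopes to the stated bound.

```latex
\[
  \mathbb{E}\bigl[V_{\mathrm{off}}(1, B)\bigr]
  \;\le\; \mathbb{E}\Bigl[\sum_{t=1}^{T} R^{\mathrm{alg}}_t\Bigr]
        + v_{\max} \sum_{t=1}^{T} \mathbb{P}\bigl[Q_t(a_t)\bigr]
  \;=\; \mathbb{E}[V_{\mathrm{alg}}]
        + v_{\max} \sum_{t=1}^{T} \mathbb{P}\bigl[Q_t(a_t)\bigr].
\]
```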

39–40 compensated coupling for single-resource allocation

for any time t and budget B[t]:
- if the Bayes selector rejects type θ_i, assume OFFLINE front-loads θ_i: an error occurs only if OFFLINE rejects all future θ_i
- if the Bayes selector accepts type θ_i, assume OFFLINE back-loads θ_i: an error occurs only if OFFLINE accepts all future θ_i

claim: the smaller of the two events has probability ≤ e^{−c(T−t)} (the resulting regret sum is worked out below)
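Combined with the compensated-coupling bound, the claim yields horizon-independent regret: the per-step disagreement probabilities form a geometric series (this one-line calculation is my expansion of the slide, assuming the claim's constant c > 0).

```latex
\[
  \mathbb{E}[\mathrm{Regret}]
  \;\le\; v_{\max} \sum_{t=1}^{T} e^{-c(T-t)}
  \;=\; v_{\max} \sum_{s=0}^{T-1} e^{-cs}
  \;\le\; \frac{v_{\max}}{1 - e^{-c}},
\]
% a constant independent of the horizon T and the budget B.
```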

41 summary: online allocation via the Bayes selector

- a new online allocation policy with horizon-independent regret
- a way to use black-box predictive algorithms
- generic regret bounds for any online decision problem

42 Thanks!
