Allocating Resources, in the Future
1 Allocating Resources, in the Future. Sid Banerjee, School of ORIE. May 3, 2018. Simons Workshop on Mathematical and Computational Challenges in Real-Time Decision Making.
2-6 online resource allocation: basic model

[figure: agents θ(1), θ(2), ..., θ(t), ..., θ(T) arrive in sequence; the remaining capacity B_t counts down as units are allocated]

- single resource, initial capacity B; T agents arrive sequentially
- agent t has type θ(t) = reward earned if agent is allocated (a minimal simulation sketch follows below)
- principal makes irrevocable decisions; resource is non-replenishable
- assumptions on agent types {θ(t)}: finite set of values {v_1, ..., v_n} (e.g. θ(t) = v_i with prob p_i, i.i.d.); in general, arrivals can be time-varying, correlated
- online resource allocation problem: allocate resources to maximize the sum of rewards
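To make the model concrete, here is a minimal Python simulation sketch. The value set, probabilities, horizon, capacity, and the naive "accept while capacity remains" policy are illustrative assumptions, not from the talk.

```python
# Basic model: single resource of capacity B, T i.i.d. arrivals from a
# finite value set; a greedy policy accepts whenever capacity remains.
import numpy as np

rng = np.random.default_rng(0)
values, probs = np.array([1.0, 2.0, 5.0]), np.array([0.5, 0.3, 0.2])  # illustrative
T, B = 100, 20

def run_greedy():
    capacity, reward = B, 0.0
    for t in range(T):
        theta = rng.choice(values, p=probs)  # agent t's type = reward if allocated
        if capacity > 0:                     # greedy: accept whenever feasible
            capacity -= 1
            reward += theta
    return reward

print(np.mean([run_greedy() for _ in range(1000)]))  # avg reward of greedy
```

Greedy is of course not optimal here; the rest of the talk is about policies that save capacity for higher-value arrivals.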
7 online resource allocation: first generalization

[figure: arrival sequence θ(1), ..., θ(T); θ ~ (A_i, v_i) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents
- each has type θ_i = (A_i, v_i), where A_i ∈ {0,1}^d is the resource requirement and v_i the value
- agent has type θ_i with prob p_i
- also known as: network revenue management; single-minded buyer
8 online resource allocation: second generalization

[figure: arrival sequence θ(1), ..., θ(T); θ ~ (v_i1, v_i2, ...) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each has type θ = (v_i1, v_i2, ..., v_id), and wants a single resource
- also known as: online weighted matching; unit-demand buyer
9 online allocation across fields

- related problems studied in Markov decision processes, online algorithms, prophet inequalities, revenue management, etc.
- informational variants: distributional knowledge, bandit settings, adversarial inputs
10 the technological zeitgeist

- the deep learning revolution: vast improvements in machine learning for data-driven prediction
11-12 axiomatizing the zeitgeist

- the deep learning revolution: vast improvements in machine learning for data-driven prediction
- axiom: we have access to black-box predictive algorithms
- core question of this talk: how does having such an oracle affect online resource allocation?
- TL;DR: new online allocation policies with strong regret bounds; re-examining old questions leads to surprising new insights
13 bridging online allocation and predictive models

- "The Bayesian Prophet: A Low-Regret Framework for Online Decision Making", Alberto Vera & S.B. (2018)
14 focus of talk: allocation with single-minded agents

[figure: arrival sequence θ(1), ..., θ(T); θ ~ (A_i, v_i) w.p. p_i]

- d resources, initial capacities (B_1, B_2, ..., B_d); T agents arrive sequentially
- each has type θ = (A, v): A = resource requirement, v = value; agent has type θ_i with prob p_i, i.i.d.
- online allocation problem: allocate resources to maximize the sum of rewards
15-16 performance measure

- the optimal policy can be computed via dynamic programming (a single-resource sketch follows below), but:
  - it requires exact distributional knowledge
  - curse of dimensionality: state space = T × B_1 × ... × B_d
  - it does not quantify the cost of uncertainty
- prophet benchmark V_off: the OFFLINE optimal policy, which has full knowledge of {θ(1), θ(2), ..., θ(T)}
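For intuition, the DP is easy to write for the single-resource case (d = 1). This minimal sketch uses illustrative values and probabilities, and already makes both drawbacks visible: it needs the exact distribution, and its T × B state space grows multiplicatively with each added resource.

```python
# Exact DP (value iteration) for the single-resource model.
import numpy as np

values, probs = np.array([1.0, 2.0, 5.0]), np.array([0.5, 0.3, 0.2])  # illustrative
T, B = 100, 20

# V[t, b] = optimal expected reward-to-go with b units left before arrival t
V = np.zeros((T + 1, B + 1))
for t in range(T - 1, -1, -1):
    for b in range(1, B + 1):
        # Bellman equation: for each realized value, accept iff
        # value + V[t+1, b-1] beats rejecting (V[t+1, b])
        V[t, b] = probs @ np.maximum(values + V[t + 1, b - 1], V[t + 1, b])

print(V[0, B])  # optimal online value from the initial state
```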
17 performance measure: regret

- prophet benchmark V_off: OFFLINE knows the entire type sequence {θ(t), t = 1, ..., T}
- for the network revenue management setting, V_off is given by the hindsight LP (solved in the sketch below):

    max  Σ_i v_i x_i
    s.t. Σ_i A_i x_i ≤ B
         0 ≤ x_i ≤ N_i[1:T]

  where N_i[1:T] = # of arrivals of type θ_i = (A_i, v_i) over {1, 2, ..., T}
- regret: E[Regret] = E[V_off − V_alg]
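As a concrete illustration of the benchmark, a sketch that solves the hindsight LP above for one realization of the counts N_i[1:T]; the instance data (v, A, B, N) is made up for illustration.

```python
# Hindsight (prophet) benchmark V_off: an LP over realized arrival counts.
import numpy as np
from scipy.optimize import linprog

v = np.array([4.0, 3.0, 2.0])     # value of each type
A = np.array([[1, 1, 0],          # column i = resource bundle A_i of type i
              [0, 1, 1]])
B = np.array([10.0, 8.0])         # initial capacities
N = np.array([6, 7, 9])           # realized arrival counts N_i[1:T]

# linprog minimizes, so negate v to maximize total reward
res = linprog(-v, A_ub=A, b_ub=B, bounds=list(zip(np.zeros(3), N)),
              method="highs")
V_off = -res.fun
print(f"V_off = {V_off:.2f}, x* = {res.x}")
```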
18-22 online allocation with prediction oracle

- given a black-box predictive oracle for the performance of OFFLINE (specifically, for any t, B, we have statistical information about V_off[t,T])
- let π_t = P[V_off[t,T] decreases if OFFLINE is forced to reject the t-th arrival], i.e., the probability that OFFLINE accepts arrival t (a Monte Carlo sketch of this oracle follows below)
- Bayes selector: accept the t-th arrival iff π_t > 0.5
- theorem [Vera & B., 2018]: under mild tail bounds on N_i[t:T], the Bayes selector has E[Regret] independent of T, B_1, B_2, ..., B_d
  - arrivals can be time-varying, correlated; discounted rewards
  - works for general settings (single-minded, unit-demand, etc.)
  - can use an approximate oracle (e.g., built from samples)
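Here is a Monte Carlo sketch of the oracle π_t for the single-resource, unit-demand special case, where V_off of a state with b units is simply the sum of the b largest remaining values. The values, probabilities, and sample count are illustrative assumptions, not from the talk.

```python
# Monte Carlo estimate of pi_t = P[V_off decreases if OFFLINE must reject now].
import numpy as np

rng = np.random.default_rng(1)
values, probs = np.array([1.0, 2.0, 5.0]), np.array([0.5, 0.3, 0.2])  # illustrative

def top_sum(vals, b):
    """OFFLINE value of a single resource with b units: sum of the b largest values."""
    return np.sort(vals)[::-1][:b].sum() if b > 0 else 0.0

def pi_t(v_now, b, horizon, n_samples=500):
    """Fraction of sampled futures where rejecting the current arrival hurts OFFLINE."""
    hits = 0
    for _ in range(n_samples):
        future = rng.choice(values, size=horizon, p=probs)  # sampled future arrivals
        v_free = top_sum(np.append(future, v_now), b)       # unconstrained OFFLINE
        v_skip = top_sum(future, b)                         # OFFLINE forced to reject now
        hits += v_skip < v_free
    return hits / n_samples

# Bayes selector: accept the current arrival iff pi_t > 0.5
print(pi_t(v_now=2.0, b=3, horizon=20))
```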
23-25 standard approach: randomized admission control (RAC)

- offline optimum V_off:

    max  Σ_i v_i x_i
    s.t. Σ_i A_i x_i ≤ B
         0 ≤ x_i ≤ N_i[1:T]

- (upfront) fluid LP V_fl (solved in the sketch below):

    max  Σ_i v_i x_i
    s.t. Σ_i A_i x_i ≤ B
         0 ≤ x_i ≤ E[N_i[1:T]] = T p_i

- E[V_off] ≤ V_fl (via Jensen's inequality and concavity of V_off w.r.t. N_i)
- fluid RAC: accept type θ_i with prob x_i / (T p_i)
- proposition: fluid RAC has E[Regret] = Θ(√T) [Gallego & van Ryzin '97], [Maglaras & Meissner '06]
- N.B. this is a static policy!
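A sketch of the fluid LP and the static policy it induces, on the same illustrative instance as above; note the only change from the hindsight LP is replacing the random counts N_i[1:T] by their means T p_i.

```python
# Upfront fluid LP and the static randomized admission control it induces.
import numpy as np
from scipy.optimize import linprog

v = np.array([4.0, 3.0, 2.0])
A = np.array([[1, 1, 0], [0, 1, 1]])
B = np.array([10.0, 8.0])
p = np.array([0.5, 0.3, 0.2]); T = 30   # illustrative type probabilities, horizon

res = linprog(-v, A_ub=A, b_ub=B, bounds=list(zip(np.zeros(3), T * p)),
              method="highs")
x_fl = res.x
accept_prob = x_fl / (T * p)  # static fluid RAC: accept type i w.p. x_i / (T p_i)
print(accept_prob)
```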
26-28 RAC with re-solving

- offline optimum V_off: as above, max Σ_i v_i x_i s.t. Σ_i A_i x_i ≤ B, 0 ≤ x_i ≤ N_i[1:T]
- re-solved fluid LP V_fl(t) (a sketch follows below):

    max  Σ_i v_i x_i[t]
    s.t. Σ_i A_i x_i[t] ≤ B[t]
         0 ≤ x_i[t] ≤ E[N_i[t:T]] = (T − t) p_i

- RAC with re-solving: at time t, accept type θ_i with prob x_i[t] / ((T − t) p_i)
- regret improves to o(√T) [Reiman & Wang '08]; O(1) regret under (dual) non-degeneracy [Jasin & Kumar '12]
- most results use V_fl as the benchmark (including the "prophet inequality")
- proposition [Vera & B. '18]: for degenerate instances, V_fl − E[V_off] = Ω(√T)
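The re-solving variant only changes the LP's state-dependent data (remaining budget B[t] and remaining horizon T − t); a sketch on the same illustrative instance:

```python
# RAC with re-solving: re-solve the fluid LP from the current state (t, B_t).
import numpy as np
from scipy.optimize import linprog

v = np.array([4.0, 3.0, 2.0])
A = np.array([[1, 1, 0], [0, 1, 1]])
p = np.array([0.5, 0.3, 0.2]); T = 30   # illustrative

def resolve_fluid(B_t, t):
    """Re-solve the fluid LP with remaining budget B_t and horizon T - t."""
    ub = (T - t) * p                    # E[N_i[t:T]] for i.i.d. arrivals
    res = linprog(-v, A_ub=A, b_ub=B_t,
                  bounds=list(zip(np.zeros(len(v)), ub)), method="highs")
    return res.x

x_t = resolve_fluid(B_t=np.array([6.0, 4.0]), t=12)
print(x_t / ((T - 12) * p))   # time-t acceptance probabilities x_i[t] / ((T-t) p_i)
```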
29-33 Bayes selector for i.i.d. arrivals

- Bayes selector: π_t = P[V_off[t,T] decreases if OFFLINE is forced to reject the t-th arrival]; accept the t-th arrival iff π_t > 0.5
- re-solved fluid LP: max Σ_i v_i x_i[t] s.t. A x[t] ≤ B[t], 0 ≤ x_i[t] ≤ E[N_i[t:T]]
- the re-solved LP gives an approximate admission oracle
- fluid Bayes selector: accept type θ_i iff x_i[t] / E[N_i[t:T]] > 0.5 (a sketch follows below)
- proposition [Vera & B., 2018]: the fluid Bayes selector has E[Regret] ≤ 2 v_max Σ_{i=1}^n p_i^{−1}
- proposed for multi-secretary by [Gurvich & Arlotto, 2017]; NRM via partial re-solving [Bumpensanti & Wang, 2018]
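A sketch of the fluid Bayes selector's decision rule on the same illustrative instance; note that, unlike RAC, the output is a deterministic accept/reject rather than an acceptance probability.

```python
# Fluid Bayes selector: threshold the fluid acceptance fraction at 1/2.
import numpy as np
from scipy.optimize import linprog

v = np.array([4.0, 3.0, 2.0])
A = np.array([[1, 1, 0], [0, 1, 1]])
p = np.array([0.5, 0.3, 0.2]); T = 30   # illustrative

def fluid_bayes_accept(i, B_t, t):
    """Accept a type-i arrival iff its fluid acceptance fraction exceeds 1/2."""
    ub = (T - t) * p                    # E[N_i[t:T]]
    res = linprog(-v, A_ub=A, b_ub=B_t,
                  bounds=list(zip(np.zeros(len(v)), ub)), method="highs")
    return res.x[i] / ub[i] > 0.5       # deterministic threshold, no randomization

print(fluid_bayes_accept(i=1, B_t=np.array([6.0, 4.0]), t=12))
```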
34 proof outline

the proof comprises two parts:
1. compensated coupling: a regret bound for the Bayes selector in a generic online decision problem
2. bounding the compensation for online packing problems, via LP sensitivity and measure concentration
35-38 the compensated coupling: make OFFLINE follow ONLINE

- for any time t and budget B[t], let V_off(t, B[t]) = value of OFFLINE starting from the current state
- for any action a, disagreement set Q_t(a) = set of sample paths ω where a is sub-optimal (given B[t])
- we can compensate OFFLINE to follow the same action a as ONLINE (written out below):

    V_off(t, B[t]) ≤ R_t^alg + v_max 1{ω ∈ Q_t(a)} + V_off(t+1, B[t+1])

- iterating, we get E[V_off] ≤ E[V_alg] + v_max Σ_{t=1}^T P[Q_t(a_t)]
- note: the Bayes selector picks a_t = argmin_a P[Q_t(a)]
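Written out, the telescoping step is just the per-round inequality summed over t = 1, ..., T; a LaTeX sketch of the two displays above:

```latex
\begin{align*}
V_{\mathrm{off}}(t, B[t])
  &\le R^{\mathrm{alg}}_t + v_{\max}\,\mathbf{1}\{\omega \in Q_t(a_t)\}
      + V_{\mathrm{off}}(t+1, B[t+1]) \\
% summing over t = 1, ..., T telescopes the V_off terms
% (R_t^alg sums to V_alg, and V_off(T+1, .) = 0):
\mathbb{E}[V_{\mathrm{off}}]
  &\le \mathbb{E}[V_{\mathrm{alg}}]
      + v_{\max} \sum_{t=1}^{T} \mathbb{P}[Q_t(a_t)]
\end{align*}
```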
39-40 compensated coupling for single-resource allocation

- for any time t and budget B[t]:
  - if the Bayes selector rejects type θ_i, assume OFFLINE front-loads θ_i: an error occurs only if OFFLINE rejects all future θ_i
  - if the Bayes selector accepts type θ_i, assume OFFLINE back-loads θ_i: an error occurs only if OFFLINE accepts all future θ_i
- claim: the smaller of the two events has probability ≤ e^{−c(T−t)}
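Combining the claim with the compensated-coupling bound shows why the regret is horizon-independent; a sketch, where the constant c > 0 comes from the assumed tail bounds on N_i[t:T]:

```latex
% Exponentially small per-step compensation => O(1) total regret (geometric sum):
\[
\mathbb{E}[\mathrm{Regret}]
  \;\le\; v_{\max} \sum_{t=1}^{T} \mathbb{P}[Q_t(a_t)]
  \;\le\; v_{\max} \sum_{t=1}^{T} e^{-c\,(T-t)}
  \;\le\; \frac{v_{\max}}{1 - e^{-c}}
  \;=\; O(1).
\]
```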
41 summary: online allocation via the Bayes selector

- new online allocation policy with horizon-independent regret
- a way to use black-box predictive algorithms
- generic regret bounds for any online decision problem
42 Thanks!