An Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement
|
|
- Reynold Waters
- 5 years ago
- Views:
Transcription
1 An Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement Satyanath Bhat Joint work with: Shweta Jain, Sujit Gujar, Y. Narahari Department of Computer Science and Automation, Indian Institute of Science Bangalore August 3, 2015
2 Outline Preliminaries: Optimal auction: An Economics perspective The stochastic Multi-armed bandit problem Motivation Problem statement Main results Empirical evaluations Future work
3 Optimal Auction Simple Economics of Optimal Auctions Optimal Auctions Third Degree Price Discrimination in Monopolistic Market a a Jeremy Bulow and John Roberts. The simple economics of optimal auctions. In: The Journal of Political Economy (1989), pp
4 Monopoly Single Seller Price set by Seller Profit maximization :price vs demand
5 Price Discrimination First Degree:- Maximum price to each customer. Second Degree:- Charge prices based on quantity. Third Degree:- Separate market for different sets of customers, different prices set for different groups.
6 Third Degree Price Discrimination Image Courtesy:
7 Third Degree Price Discrimination Image Courtesy:
8 Third Degree Price Discrimination Image Courtesy:
9 Third Degree Price Discrimination Image Courtesy:
10 Optimal Auction
11 Multi-armed bandits Estimating the bias of a coin confidently Toss a coin with bias p, t times Random variables X 1,, X t {0, 1} Estimator ˆp = i X i/t Hoeffding s Inequality: P ((ˆp p) > ɛ) e 2tɛ2
12 Multi-armed bandits K armed bandit Samples from each arm distributed as a Bernoulli r.v Arm i is equivalent to a coin producing a mean reward µ i N i (t) : No. of times arm i is pulled in total t trials S i (t): sum of the rewards produced by arm i upto t trials ˆµ i = S i (t)/n i (t) Define UCB for ˆµ i + =: ˆµ i + 2 log t N i (t) Arm with highest UCB is pulled(algorithm UCB1 a ) a Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time Analysis of the Multiarmed Bandit Problem. In: Journal of Machine Learning (2002), pp
13 Motivation A hospital wishes to procure a large number of units of a generic drug from a pool of suppliers. The efficiency of a drug is stochastic and the mean varies across suppliers For each supplier: the cost of production per unit and the capacity is private to him Hospital aims to learn the mean efficiency of the drug from each supplier and minimize its cost
14 Model c 1 q 1 k 1 ^c 1 k^ 1 ^c 2 q 2 c 2 k 2 ^ k 2 c n Supplier Selection Rule i Experience the Quality of the Procured Item Make Payment According to Items procured ^ k n ^q i Update Learnt Qualities c n q n k n Repeat until procurement stops Agents
15 Problem Optimization Problem: n ) max (x i Rq i t i i=1 s.t. x i {0, 1,..., ˆk i }, x i L i Challenges: Unknown qualities q i s, strategic costs c i s and strategic capacities k i s Goal: Design an incentive compatible MAB mechanism, to procure L units of the item while learning the qualities of the suppliers
16 Auctions with known qualities Theorem (BIC and IR characterization) A mechanism is BIC and IR iff i N, 1. X i (ĉ i, ˆk i ; q) is non-increasing in ĉ i, q and ˆk i [k i, k i ] 2. ρ i (ĉ i, ˆk i ; q) is non-negative, and non-decreasing in ˆk i 3. ρ i (ĉ i, ˆk i ; q) = ρ i ( c i, ˆk i ; q) + c i ĉ i X i (z, ˆk i ; q)dz
17 Auctions with known qualities Theorem (Optimal payment structure) Suppose the allocation rule maximizes n c1 cn k1 ( kn ( Rq i c i + F ) ) i(c i k i ) c 1 k 1 f i (c i k i ) i=1 c n k n x i (c i, k i, c i, k i )f 1 (c 1, k 1 )... f n (c n, k n ) dc 1... dc n dk 1... dk n subject to monotonicity condition (1). Also suppose that the payment is given by ci T i (c i, k i ; q) = c i X i (c i, k i ; q) + X i (z, k i ; q)dz c i then such a payment scheme and allocation scheme constitute an optimal auction satisfying BIC and IR.
18 2D-OPT Algorithm 1: 2D-OPT Mechanism Input: i, Bids b i = (ĉ i ˆki ), reward parameter R Output: An optimal, DSIC, IR Mechanism M = (x, t) 1 Allocation is given by x = ALLOC(N, ĉ, ˆk, q, L) 2 for i N && x i 0 do 3 G i := Rq i H i (b i ) 4 y = ALLOC(N \ {i}, ĉ i, (ˆk i x i ), q i, x i ) 5 Payment to i, t i = y k max(g 1 i (Rq k H k (b k )), c i ) + ( x i y k ) ci k 6 end k N\{i}
19 Auctions with unknown qualities Well-behaved allocation Rule An allocation rule x is called a Well Behaved Allocation Rule if: x j i depends only on the agent s bids and the reward realization till j units and is non decreasing in terms of costs For any three distinct agents {α, β, γ} such that j th round unit is allocated to β. A change of bid by agent α should not transfer allocation of j th round unit from β to γ if other quantities are fixed till j units For all reward realizations s, x i (c i, k i ; s) is non-decreasing with increase in capacity k i Theorem For a well behaved allocation rule, there exists a transformation that produces the transformed allocation ( x) and payment ( t) such that the resulting mechanism M = ( x, t) is stochastic BIC and IR.
20 2D-UCB Algorithm 2: 2D-UCB Mechanism Input: i N, bids ĉ i [c i, c i ], ˆk i [k i, k i ], parameter µ (0, 1), Reward parameter R Output: A mechanism M = (x, t) 1 i N, ˆq + i = 1, ˆq i = 0, n i = 1 2 Obtain modified bids as (α, β) 3 = ((α 1 (ĉ 1 ), β 1 (ĉ 1 ),..., (α n(ĉ n), β n(ĉ n)) using resampling 4 Allocate one unit to all agents and estimate empirical quality ˆq 5 ˆq i = q i (i)/n i, ˆq + i = ˆq i + 1 ln(t) 2n i 6 for t = n to L do 7 Compute H i = α i + F i (α i ˆk i ) f i (α i ˆk i ) 8 Let i = arg max {js.t.kj >n j } Rˆq+ j H j and Ĝi = Rˆq + i H i 9 if Ĝj > 0 then 10 Procure the unit from agent i and update ˆq i 11 ˆq + i = ˆq i + 2ni ln(t) 12 else 13 break \\ Don t allocate future units to anyone 14 Make payment to each agent i, T i = ĉ i n i + P i, where, P i = { 1µ n i (c ĉ i ), ifβ i > ĉ i 0, otherwise.
21 Empirical Evaluations Average utility per unit ɛ = L 1 6 ɛ= L 1 3 ɛ= L 1 2 ɛ= L 2 3 2D-UCB 2D-OPT ,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90, Number of units Observations: All the mechanisms approach 2D-OPT The performance of 2D-UCB is superior as it approaches 2D-OPT faster
22 Future Work An extension where allocation happens to subset of agents at any round would be interesting The complete characterization of a learning algorithm in this space is still open A theoretical analysis of regret is also a possible direction of future work
23 THANK YOU
arxiv: v3 [cs.gt] 17 Jun 2015
An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance Shweta Jain, Sujit Gujar, Satyanath Bhat, Onno Zoeter, Y. Narahari arxiv:1406.7157v3 [cs.gt] 17 Jun 2015 June 18,
More informationThe Multi-Armed Bandit Problem
The Multi-Armed Bandit Problem Electrical and Computer Engineering December 7, 2013 Outline 1 2 Mathematical 3 Algorithm Upper Confidence Bound Algorithm A/B Testing Exploration vs. Exploitation Scientist
More informationMechanisms with Learning for Stochastic Multi-armed Bandit Problems
Mechanisms with Learning for Stochastic Multi-armed Bandit Problems Shweta Jain 1, Satyanath Bhat 1, Ganesh Ghalme 1, Divya Padmanabhan 1, and Y. Narahari 1 Department of Computer Science and Automation,
More informationBandit models: a tutorial
Gdt COS, December 3rd, 2015 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions) Bandit game: a each round t, an agent chooses
More informationRedistribution Mechanisms for Assignment of Heterogeneous Objects
Redistribution Mechanisms for Assignment of Heterogeneous Objects Sujit Gujar Dept of Computer Science and Automation Indian Institute of Science Bangalore, India sujit@csa.iisc.ernet.in Y Narahari Dept
More informationMulti-armed bandit models: a tutorial
Multi-armed bandit models: a tutorial CERMICS seminar, March 30th, 2016 Multi-Armed Bandit model: general setting K arms: for a {1,..., K}, (X a,t ) t N is a stochastic process. (unknown distributions)
More informationTwo optimization problems in a stochastic bandit model
Two optimization problems in a stochastic bandit model Emilie Kaufmann joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishnan Journées MAS 204, Toulouse Outline From stochastic optimization
More informationAdvanced Machine Learning
Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his
More informationBandit Algorithms. Zhifeng Wang ... Department of Statistics Florida State University
Bandit Algorithms Zhifeng Wang Department of Statistics Florida State University Outline Multi-Armed Bandits (MAB) Exploration-First Epsilon-Greedy Softmax UCB Thompson Sampling Adversarial Bandits Exp3
More informationLecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem
Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:
More informationGame Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 2012
Game Theory Lecture Notes By Y. Narahari Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 01 Incentive Compatibility and Revelation Theorem Note: This is
More informationThe information complexity of sequential resource allocation
The information complexity of sequential resource allocation Emilie Kaufmann, joint work with Olivier Cappé, Aurélien Garivier and Shivaram Kalyanakrishan SMILE Seminar, ENS, June 8th, 205 Sequential allocation
More informationGrundlagen der Künstlichen Intelligenz
Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability
More informationKnapsack Auctions. Sept. 18, 2018
Knapsack Auctions Sept. 18, 2018 aaacehicbvdlssnafj34rpuvdelmsigvskmkoasxbtcuk9ghncfmjpn26gqszibsevojbvwvny4ucevsnx/jpm1cww8mndnnxu69x08ylcqyvo2v1bx1jc3svnl7z3dv3zw47mg4fzi0ccxi0forjixy0lzumdjlbegrz0jxh93kfvebceljfq8mcxejnoa0pbgplxnmwxxs2tu49ho16lagvjl/8hpoua6dckkhrizrtt2zytwtgeaysqtsaqvanvnlbdfoi8ivzkjkvm0lys2qubqzmi07qsqjwim0ih1noyqidlpzqvn4qpuahrhqjys4u393zcischl5ujjfus56ufif109veovmlcepihzpb4upgyqgetowoijgxsaaicyo3hxiiriik51hwydgl568tdqnum3v7bulsvo6ikmejsejqaibxiimuaut0ayypijn8arejcfjxxg3pualk0brcwt+wpj8acrvmze=
More informationLecture 4: Lower Bounds (ending); Thompson Sampling
CMSC 858G: Bandits, Experts and Games 09/12/16 Lecture 4: Lower Bounds (ending); Thompson Sampling Instructor: Alex Slivkins Scribed by: Guowei Sun,Cheng Jie 1 Lower bounds on regret (ending) Recap from
More informationx ax 1 2 bx2 a bx =0 x = a b. Hence, a consumer s willingness-to-pay as a function of liters on sale, 1 2 a 2 2b, if l> a. (1)
Answers to Exam Economics 201b First Half 1. (a) Observe, first, that no consumer ever wishes to consume more than 3/2 liters (i.e., 1.5 liters). To see this, observe that, even if the beverage were free,
More informationOn the Complexity of Best Arm Identification in Multi-Armed Bandit Models
On the Complexity of Best Arm Identification in Multi-Armed Bandit Models Aurélien Garivier Institut de Mathématiques de Toulouse Information Theory, Learning and Big Data Simons Institute, Berkeley, March
More informationGame Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 2012
Game Theory Lecture Notes By Y. Narahari Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 2012 Myerson Optimal Auction Note: This is a only a draft version,
More informationPayment Rules for Combinatorial Auctions via Structural Support Vector Machines
Payment Rules for Combinatorial Auctions via Structural Support Vector Machines Felix Fischer Harvard University joint work with Paul Dütting (EPFL), Petch Jirapinyo (Harvard), John Lai (Harvard), Ben
More informationCOS 402 Machine Learning and Artificial Intelligence Fall Lecture 22. Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 22 Exploration & Exploitation in Reinforcement Learning: MAB, UCB, Exp3 How to balance exploration and exploitation in reinforcement
More informationOnline Learning with Feedback Graphs
Online Learning with Feedback Graphs Claudio Gentile INRIA and Google NY clagentile@gmailcom NYC March 6th, 2018 1 Content of this lecture Regret analysis of sequential prediction problems lying between
More informationThe information complexity of best-arm identification
The information complexity of best-arm identification Emilie Kaufmann, joint work with Olivier Cappé and Aurélien Garivier MAB workshop, Lancaster, January th, 206 Context: the multi-armed bandit model
More informationMechanism design and allocation algorithms for energy-network markets with piece-wise linear costs and quadratic externalities
1 / 45 Mechanism design and allocation algorithms for energy-network markets with piece-wise linear costs and quadratic externalities Alejandro Jofré 1 Center for Mathematical Modeling & DIM Universidad
More informationEvaluation of multi armed bandit algorithms and empirical algorithm
Acta Technica 62, No. 2B/2017, 639 656 c 2017 Institute of Thermomechanics CAS, v.v.i. Evaluation of multi armed bandit algorithms and empirical algorithm Zhang Hong 2,3, Cao Xiushan 1, Pu Qiumei 1,4 Abstract.
More informationStat 260/CS Learning in Sequential Decision Problems. Peter Bartlett
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Concentration inequalities. P(X ǫ) exp( ψ (ǫ))). Cumulant generating function bounds. Hoeffding
More informationIntroduction to Bandit Algorithms. Introduction to Bandit Algorithms
Stochastic K-Arm Bandit Problem Formulation Consider K arms (actions) each correspond to an unknown distribution {ν k } K k=1 with values bounded in [0, 1]. At each time t, the agent pulls an arm I t {1,...,
More informationIntroducing strategic measure actions in multi-armed bandits
213 IEEE 24th International Symposium on Personal, Indoor and Mobile Radio Communications: Workshop on Cognitive Radio Medium Access Control and Network Solutions Introducing strategic measure actions
More informationRevisiting the Exploration-Exploitation Tradeoff in Bandit Models
Revisiting the Exploration-Exploitation Tradeoff in Bandit Models joint work with Aurélien Garivier (IMT, Toulouse) and Tor Lattimore (University of Alberta) Workshop on Optimization and Decision-Making
More informationTruthful Learning Mechanisms for Multi Slot Sponsored Search Auctions with Externalities
Truthful Learning Mechanisms for Multi Slot Sponsored Search Auctions with Externalities Nicola Gatti a, Alessandro Lazaric b, Marco Rocco a, Francesco Trovò a a Politecnico di Milano, piazza Leonardo
More informationLecture 3: Lower Bounds for Bandit Algorithms
CMSC 858G: Bandits, Experts and Games 09/19/16 Lecture 3: Lower Bounds for Bandit Algorithms Instructor: Alex Slivkins Scribed by: Soham De & Karthik A Sankararaman 1 Lower Bounds In this lecture (and
More informationMechanism Design II. Terence Johnson. University of Notre Dame. Terence Johnson (ND) Mechanism Design II 1 / 30
Mechanism Design II Terence Johnson University of Notre Dame Terence Johnson (ND) Mechanism Design II 1 / 30 Mechanism Design Recall: game theory takes the players/actions/payoffs as given, and makes predictions
More informationThe Multi-Arm Bandit Framework
The Multi-Arm Bandit Framework A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course In This Lecture A. LAZARIC Reinforcement Learning Algorithms Oct 29th, 2013-2/94
More information1 MDP Value Iteration Algorithm
CS 0. - Active Learning Problem Set Handed out: 4 Jan 009 Due: 9 Jan 009 MDP Value Iteration Algorithm. Implement the value iteration algorithm given in the lecture. That is, solve Bellman s equation using
More informationStochastic Contextual Bandits with Known. Reward Functions
Stochastic Contextual Bandits with nown 1 Reward Functions Pranav Sakulkar and Bhaskar rishnamachari Ming Hsieh Department of Electrical Engineering Viterbi School of Engineering University of Southern
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show
More informationCS599: Algorithm Design in Strategic Settings Fall 2012 Lecture 11: Ironing and Approximate Mechanism Design in Single-Parameter Bayesian Settings
CS599: Algorithm Design in Strategic Settings Fall 2012 Lecture 11: Ironing and Approximate Mechanism Design in Single-Parameter Bayesian Settings Instructor: Shaddin Dughmi Outline 1 Recap 2 Non-regular
More informationCS599: Algorithm Design in Strategic Settings Fall 2012 Lecture 12: Approximate Mechanism Design in Multi-Parameter Bayesian Settings
CS599: Algorithm Design in Strategic Settings Fall 2012 Lecture 12: Approximate Mechanism Design in Multi-Parameter Bayesian Settings Instructor: Shaddin Dughmi Administrivia HW1 graded, solutions on website
More informationBandit View on Continuous Stochastic Optimization
Bandit View on Continuous Stochastic Optimization Sébastien Bubeck 1 joint work with Rémi Munos 1 & Gilles Stoltz 2 & Csaba Szepesvari 3 1 INRIA Lille, SequeL team 2 CNRS/ENS/HEC 3 University of Alberta
More informationSolution: Since the prices are decreasing, we consider all the nested options {1,..., i}. Given such a set, the expected revenue is.
Problem 1: Choice models and assortment optimization Consider a MNL choice model over five products with prices (p1,..., p5) = (7, 6, 4, 3, 2) and preference weights (i.e., MNL parameters) (v1,..., v5)
More informationGame Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 2012
Game Theory Lecture Notes By Y. Narahari Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 2012 Properties of Social Choice Functions Note: This is a only
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationInformed Principal in Private-Value Environments
Informed Principal in Private-Value Environments Tymofiy Mylovanov Thomas Tröger University of Bonn June 21, 2008 1/28 Motivation 2/28 Motivation In most applications of mechanism design, the proposer
More informationOnline Learning and Sequential Decision Making
Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Sequential Decision
More informationGame Theory. Lecture Notes By Y. Narahari. Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 2012
Game Theory Lecture Notes By Y. Narahari Department of Computer Science and Automation Indian Institute of Science Bangalore, India July 2012 The Vickrey-Clarke-Groves Mechanisms Note: This is a only a
More informationMicroeconomics II Lecture 4: Incomplete Information Karl Wärneryd Stockholm School of Economics November 2016
Microeconomics II Lecture 4: Incomplete Information Karl Wärneryd Stockholm School of Economics November 2016 1 Modelling incomplete information So far, we have studied games in which information was complete,
More informationAnnealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm
Annealing-Pareto Multi-Objective Multi-Armed Bandit Algorithm Saba Q. Yahyaa, Madalina M. Drugan and Bernard Manderick Vrije Universiteit Brussel, Department of Computer Science, Pleinlaan 2, 1050 Brussels,
More informationLearning Algorithms for Minimizing Queue Length Regret
Learning Algorithms for Minimizing Queue Length Regret Thomas Stahlbuhk Massachusetts Institute of Technology Cambridge, MA Brooke Shrader MIT Lincoln Laboratory Lexington, MA Eytan Modiano Massachusetts
More informationAlgorithmic Game Theory and Applications
Algorithmic Game Theory and Applications Lecture 18: Auctions and Mechanism Design II: a little social choice theory, the VCG Mechanism, and Market Equilibria Kousha Etessami Reminder: Food for Thought:
More informationPAC Subset Selection in Stochastic Multi-armed Bandits
In Langford, Pineau, editors, Proceedings of the 9th International Conference on Machine Learning, pp 655--66, Omnipress, New York, NY, USA, 0 PAC Subset Selection in Stochastic Multi-armed Bandits Shivaram
More informationStochastic bandits: Explore-First and UCB
CSE599s, Spring 2014, Online Learning Lecture 15-2/19/2014 Stochastic bandits: Explore-First and UCB Lecturer: Brendan McMahan or Ofer Dekel Scribe: Javad Hosseini In this lecture, we like to answer this
More informationOrdinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods
Ordinal optimization - Empirical large deviations rate estimators, and multi-armed bandit methods Sandeep Juneja Tata Institute of Fundamental Research Mumbai, India joint work with Peter Glynn Applied
More informationMulti-armed Bandit Problems with Strategic Arms
Multi-armed Bandit Problems with Strategic Arms Mark Braverman Jieming Mao Jon Schneider S. Matthew Weinberg May 11, 2017 Abstract We study a strategic version of the multi-armed bandit problem, where
More informationMicroeconomic Theory (501b) Problem Set 10. Auctions and Moral Hazard Suggested Solution: Tibor Heumann
Dirk Bergemann Department of Economics Yale University Microeconomic Theory (50b) Problem Set 0. Auctions and Moral Hazard Suggested Solution: Tibor Heumann 4/5/4 This problem set is due on Tuesday, 4//4..
More informationLecture 6: Communication Complexity of Auctions
Algorithmic Game Theory October 13, 2008 Lecture 6: Communication Complexity of Auctions Lecturer: Sébastien Lahaie Scribe: Rajat Dixit, Sébastien Lahaie In this lecture we examine the amount of communication
More informationMulti-armed Bandits in the Presence of Side Observations in Social Networks
52nd IEEE Conference on Decision and Control December 0-3, 203. Florence, Italy Multi-armed Bandits in the Presence of Side Observations in Social Networks Swapna Buccapatnam, Atilla Eryilmaz, and Ness
More informationOrdinal Optimization and Multi Armed Bandit Techniques
Ordinal Optimization and Multi Armed Bandit Techniques Sandeep Juneja. with Peter Glynn September 10, 2014 The ordinal optimization problem Determining the best of d alternative designs for a system, on
More informationAlireza Shafaei. Machine Learning Reading Group The University of British Columbia Summer 2017
s s Machine Learning Reading Group The University of British Columbia Summer 2017 (OCO) Convex 1/29 Outline (OCO) Convex Stochastic Bernoulli s (OCO) Convex 2/29 At each iteration t, the player chooses
More informationMulti-Armed Bandit Formulations for Identification and Control
Multi-Armed Bandit Formulations for Identification and Control Cristian R. Rojas Joint work with Matías I. Müller and Alexandre Proutiere KTH Royal Institute of Technology, Sweden ERNSI, September 24-27,
More informationLearning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning
JMLR: Workshop and Conference Proceedings vol:1 8, 2012 10th European Workshop on Reinforcement Learning Learning Exploration/Exploitation Strategies for Single Trajectory Reinforcement Learning Michael
More informationCrowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning
Crowd-Learning: Improving the Quality of Crowdsourcing Using Sequential Learning Mingyan Liu (Joint work with Yang Liu) Department of Electrical Engineering and Computer Science University of Michigan,
More informationOn Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits
1 On Regret-Optimal Learning in Decentralized Multi-player Multi-armed Bandits Naumaan Nayyar, Dileep Kalathil and Rahul Jain Abstract We consider the problem of learning in single-player and multiplayer
More informationLecture 4. 1 Examples of Mechanism Design Problems
CSCI699: Topics in Learning and Game Theory Lecture 4 Lecturer: Shaddin Dughmi Scribes: Haifeng Xu,Reem Alfayez 1 Examples of Mechanism Design Problems Example 1: Single Item Auctions. There is a single
More informationTsinghua Machine Learning Guest Lecture, June 9,
Tsinghua Machine Learning Guest Lecture, June 9, 2015 1 Lecture Outline Introduction: motivations and definitions for online learning Multi-armed bandit: canonical example of online learning Combinatorial
More informationMotivation. Game Theory 24. Mechanism Design. Setting. Preference relations contain no information about by how much one candidate is preferred.
Motivation Game Theory 24. Mechanism Design Preference relations contain no information about by how much one candidate is preferred. Idea: Use money to measure this. Albert-Ludwigs-Universität Freiburg
More informationAn Adaptive Algorithm for Selecting Profitable Keywords for Search-Based Advertising Services
An Adaptive Algorithm for Selecting Profitable Keywords for Search-Based Advertising Services Paat Rusmevichientong David P. Williamson July 6, 2007 Abstract Increases in online searches have spurred the
More informationOn Random Sampling Auctions for Digital Goods
On Random Sampling Auctions for Digital Goods Saeed Alaei Azarakhsh Malekian Aravind Srinivasan Saeed Alaei, Azarakhsh Malekian, Aravind Srinivasan Random Sampling Auctions... 1 Outline Background 1 Background
More informationInformation Choice in Macroeconomics and Finance.
Information Choice in Macroeconomics and Finance. Laura Veldkamp New York University, Stern School of Business, CEPR and NBER Spring 2009 1 Veldkamp What information consumes is rather obvious: It consumes
More informationRobust Mechanism Design with Correlated Distributions. Michael Albert and Vincent Conitzer and
Robust Mechanism Design with Correlated Distributions Michael Albert and Vincent Conitzer malbert@cs.duke.edu and conitzer@cs.duke.edu Prior-Dependent Mechanisms In many situations we ve seen, optimal
More informationOn the Informed Principal Model with Common Values
On the Informed Principal Model with Common Values Anastasios Dosis ESSEC Business School and THEMA École Polytechnique/CREST, 3/10/2018 Anastasios Dosis (ESSEC and THEMA) Informed Principal with Common
More informationAdvanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12
Advanced Topics in Machine Learning and Algorithmic Game Theory Fall semester, 2011/12 Lecture 4: Multiarmed Bandit in the Adversarial Model Lecturer: Yishay Mansour Scribe: Shai Vardi 4.1 Lecture Overview
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, joint work with Emilie Kaufmann CNRS, CRIStAL) to be presented at COLT 16, New York
More informationLearning Optimal Online Advertising Portfolios with Periodic Budgets
Learning Optimal Online Advertising Portfolios with Periodic Budgets Lennart Baardman Operations Research Center, MIT, Cambridge, MA 02139, baardman@mit.edu Elaheh Fata Department of Aeronautics and Astronautics,
More informationAdaptive Concentration Inequalities for Sequential Decision Problems
Adaptive Concentration Inequalities for Sequential Decision Problems Shengjia Zhao Tsinghua University zhaosj12@stanford.edu Ashish Sabharwal Allen Institute for AI AshishS@allenai.org Enze Zhou Tsinghua
More informationWelfare Maximization with Production Costs: A Primal Dual Approach
Welfare Maximization with Production Costs: A Primal Dual Approach Zhiyi Huang Anthony Kim The University of Hong Kong Stanford University January 4, 2015 Zhiyi Huang, Anthony Kim Welfare Maximization
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com Navin Goyal Microsoft Research India navingo@microsoft.com Abstract The multi-armed
More informationMechanism Design Tutorial David C. Parkes, Harvard University Indo-US Lectures Week in Machine Learning, Game Theory and Optimization
1 Mechanism Design Tutorial David C. Parkes, Harvard University Indo-US Lectures Week in Machine Learning, Game Theory and Optimization 2 Outline Classical mechanism design Preliminaries (DRMs, revelation
More informationA Contract for Demand Response based on Probability of Call
A Contract for Demand Response based on Probability of Call Jose Vuelvas, Fredy Ruiz and Giambattista Gruosso Pontificia Universidad Javeriana - Politecnico di Milano 2018 1 2 Outline Introduction Problem
More informationCombinatorial Multi-Armed Bandit: General Framework, Results and Applications
Combinatorial Multi-Armed Bandit: General Framework, Results and Applications Wei Chen Microsoft Research Asia, Beijing, China Yajun Wang Microsoft Research Asia, Beijing, China Yang Yuan Computer Science
More informationCsaba Szepesvári 1. University of Alberta. Machine Learning Summer School, Ile de Re, France, 2008
LEARNING THEORY OF OPTIMAL DECISION MAKING PART I: ON-LINE LEARNING IN STOCHASTIC ENVIRONMENTS Csaba Szepesvári 1 1 Department of Computing Science University of Alberta Machine Learning Summer School,
More informationMulti-armed bandit based policies for cognitive radio s decision making issues
Multi-armed bandit based policies for cognitive radio s decision making issues Wassim Jouini SUPELEC/IETR wassim.jouini@supelec.fr Damien Ernst University of Liège dernst@ulg.ac.be Christophe Moy SUPELEC/IETR
More informationarxiv: v1 [cs.lg] 7 Sep 2018
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms Alihan Hüyük Bilkent University Cem Tekin Bilkent University arxiv:809.02707v [cs.lg] 7 Sep 208
More informationPure Exploration Stochastic Multi-armed Bandits
C&A Workshop 2016, Hangzhou Pure Exploration Stochastic Multi-armed Bandits Jian Li Institute for Interdisciplinary Information Sciences Tsinghua University Outline Introduction 2 Arms Best Arm Identification
More informationAlgorithmic Game Theory Introduction to Mechanism Design
Algorithmic Game Theory Introduction to Mechanism Design Makis Arsenis National Technical University of Athens April 216 Makis Arsenis (NTUA) AGT April 216 1 / 41 Outline 1 Social Choice Social Choice
More informationMechanism Design and Truthful Algorithms
Mechanism Design and Truthful Algorithms Ocan Sankur 13/06/2013 Ocan Sankur (ULB) Mechanism Design and Truthful Algorithms June 13, 2013 1 / 25 Mechanism Design Mechanism design is about designing games
More informationLearning and Selecting the Right Customers for Reliability: A Multi-armed Bandit Approach
Learning and Selecting the Right Customers for Reliability: A Multi-armed Bandit Approach Yingying Li, Qinran Hu, and Na Li Abstract In this paper, we consider residential demand response (DR) programs
More informationCO759: Algorithmic Game Theory Spring 2015
CO759: Algorithmic Game Theory Spring 2015 Instructor: Chaitanya Swamy Assignment 1 Due: By Jun 25, 2015 You may use anything proved in class directly. I will maintain a FAQ about the assignment on the
More informationReinforcement Learning
Reinforcement Learning Lecture 5: Bandit optimisation Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Introduce bandit optimisation: the
More informationAn Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes
An Estimation Based Allocation Rule with Super-linear Regret and Finite Lock-on Time for Time-dependent Multi-armed Bandit Processes Prokopis C. Prokopiou, Peter E. Caines, and Aditya Mahajan McGill University
More informationTwo generic principles in modern bandits: the optimistic principle and Thompson sampling
Two generic principles in modern bandits: the optimistic principle and Thompson sampling Rémi Munos INRIA Lille, France CSML Lunch Seminars, September 12, 2014 Outline Two principles: The optimistic principle
More informationPure Exploration Stochastic Multi-armed Bandits
CAS2016 Pure Exploration Stochastic Multi-armed Bandits Jian Li Institute for Interdisciplinary Information Sciences Tsinghua University Outline Introduction Optimal PAC Algorithm (Best-Arm, Best-k-Arm):
More informationReinforcement Learning
Reinforcement Learning Lecture 6: RL algorithms 2.0 Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Present and analyse two online algorithms
More informationChapter 2. Equilibrium. 2.1 Complete Information Games
Chapter 2 Equilibrium The theory of equilibrium attempts to predict what happens in a game when players behave strategically. This is a central concept to this text as, in mechanism design, we are optimizing
More informationThe Monopolist. The Pure Monopolist with symmetric D matrix
University of California, Davis Department of Agricultural and Resource Economics ARE 252 Optimization with Economic Applications Lecture Notes 5 Quirino Paris The Monopolist.................................................................
More informationBuying Random yet Correlated Wind Power
Buying Random yet Correlated Wind Power Wenyuan Tang Department of Electrical Engineering University of Southern California Los Angeles, CA 90089, USA wenyuan@usc.edu Rahul Jain EE & ISE Departments University
More informationLearning to play K-armed bandit problems
Learning to play K-armed bandit problems Francis Maes 1, Louis Wehenkel 1 and Damien Ernst 1 1 University of Liège Dept. of Electrical Engineering and Computer Science Institut Montefiore, B28, B-4000,
More informationFinite-time Analysis of the Multiarmed Bandit Problem*
Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finite-time Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A-8010
More informationONLINE ADVERTISEMENTS AND MULTI-ARMED BANDITS CHONG JIANG DISSERTATION
c 2015 Chong Jiang ONLINE ADVERTISEMENTS AND MULTI-ARMED BANDITS BY CHONG JIANG DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and
More informationInformational Confidence Bounds for Self-Normalized Averages and Applications
Informational Confidence Bounds for Self-Normalized Averages and Applications Aurélien Garivier Institut de Mathématiques de Toulouse - Université Paul Sabatier Thursday, September 12th 2013 Context Tree
More informationarxiv: v3 [cs.gt] 13 Aug 2014
Mechanism Design for Crowdsourcing: An Optimal 1-1/e Competitive Budget-Feasible Mechanism for Large Markets Nima Anari, Gagan Goel, Afshin Nikzad arxiv:1405.2452v3 [cs.gt] 13 Aug 2014 Abstract In this
More informationImplementability, Walrasian Equilibria, and Efficient Matchings
Implementability, Walrasian Equilibria, and Efficient Matchings Piotr Dworczak and Anthony Lee Zhang Abstract In general screening problems, implementable allocation rules correspond exactly to Walrasian
More information