An Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement

Size: px

Start display at page:

Download "An Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement"

Reynold Waters
5 years ago
Views:

1 An Optimal Bidimensional Multi Armed Bandit Auction for Multi unit Procurement Satyanath Bhat Joint work with: Shweta Jain, Sujit Gujar, Y. Narahari Department of Computer Science and Automation, Indian Institute of Science Bangalore August 3, 2015

2 Outline Preliminaries: Optimal auction: An Economics perspective The stochastic Multi-armed bandit problem Motivation Problem statement Main results Empirical evaluations Future work

3 Optimal Auction Simple Economics of Optimal Auctions Optimal Auctions Third Degree Price Discrimination in Monopolistic Market a a Jeremy Bulow and John Roberts. The simple economics of optimal auctions. In: The Journal of Political Economy (1989), pp

4 Monopoly Single Seller Price set by Seller Profit maximization :price vs demand

5 Price Discrimination First Degree:- Maximum price to each customer. Second Degree:- Charge prices based on quantity. Third Degree:- Separate market for different sets of customers, different prices set for different groups.

6 Third Degree Price Discrimination Image Courtesy:

7 Third Degree Price Discrimination Image Courtesy:

8 Third Degree Price Discrimination Image Courtesy:

9 Third Degree Price Discrimination Image Courtesy:

10 Optimal Auction

11 Multi-armed bandits Estimating the bias of a coin confidently Toss a coin with bias p, t times Random variables X 1,, X t {0, 1} Estimator ˆp = i X i/t Hoeffding s Inequality: P ((ˆp p) > ɛ) e 2tɛ2

12 Multi-armed bandits K armed bandit Samples from each arm distributed as a Bernoulli r.v Arm i is equivalent to a coin producing a mean reward µ i N i (t) : No. of times arm i is pulled in total t trials S i (t): sum of the rewards produced by arm i upto t trials ˆµ i = S i (t)/n i (t) Define UCB for ˆµ i + =: ˆµ i + 2 log t N i (t) Arm with highest UCB is pulled(algorithm UCB1 a ) a Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time Analysis of the Multiarmed Bandit Problem. In: Journal of Machine Learning (2002), pp

13 Motivation A hospital wishes to procure a large number of units of a generic drug from a pool of suppliers. The efficiency of a drug is stochastic and the mean varies across suppliers For each supplier: the cost of production per unit and the capacity is private to him Hospital aims to learn the mean efficiency of the drug from each supplier and minimize its cost

14 Model c 1 q 1 k 1 ^c 1 k^ 1 ^c 2 q 2 c 2 k 2 ^ k 2 c n Supplier Selection Rule i Experience the Quality of the Procured Item Make Payment According to Items procured ^ k n ^q i Update Learnt Qualities c n q n k n Repeat until procurement stops Agents

15 Problem Optimization Problem: n ) max (x i Rq i t i i=1 s.t. x i {0, 1,..., ˆk i }, x i L i Challenges: Unknown qualities q i s, strategic costs c i s and strategic capacities k i s Goal: Design an incentive compatible MAB mechanism, to procure L units of the item while learning the qualities of the suppliers

16 Auctions with known qualities Theorem (BIC and IR characterization) A mechanism is BIC and IR iff i N, 1. X i (ĉ i, ˆk i ; q) is non-increasing in ĉ i, q and ˆk i [k i, k i ] 2. ρ i (ĉ i, ˆk i ; q) is non-negative, and non-decreasing in ˆk i 3. ρ i (ĉ i, ˆk i ; q) = ρ i ( c i, ˆk i ; q) + c i ĉ i X i (z, ˆk i ; q)dz

17 Auctions with known qualities Theorem (Optimal payment structure) Suppose the allocation rule maximizes n c1 cn k1 ( kn ( Rq i c i + F ) ) i(c i k i ) c 1 k 1 f i (c i k i ) i=1 c n k n x i (c i, k i, c i, k i )f 1 (c 1, k 1 )... f n (c n, k n ) dc 1... dc n dk 1... dk n subject to monotonicity condition (1). Also suppose that the payment is given by ci T i (c i, k i ; q) = c i X i (c i, k i ; q) + X i (z, k i ; q)dz c i then such a payment scheme and allocation scheme constitute an optimal auction satisfying BIC and IR.

18 2D-OPT Algorithm 1: 2D-OPT Mechanism Input: i, Bids b i = (ĉ i ˆki ), reward parameter R Output: An optimal, DSIC, IR Mechanism M = (x, t) 1 Allocation is given by x = ALLOC(N, ĉ, ˆk, q, L) 2 for i N && x i 0 do 3 G i := Rq i H i (b i ) 4 y = ALLOC(N \ {i}, ĉ i, (ˆk i x i ), q i, x i ) 5 Payment to i, t i = y k max(g 1 i (Rq k H k (b k )), c i ) + ( x i y k ) ci k 6 end k N\{i}

19 Auctions with unknown qualities Well-behaved allocation Rule An allocation rule x is called a Well Behaved Allocation Rule if: x j i depends only on the agent s bids and the reward realization till j units and is non decreasing in terms of costs For any three distinct agents {α, β, γ} such that j th round unit is allocated to β. A change of bid by agent α should not transfer allocation of j th round unit from β to γ if other quantities are fixed till j units For all reward realizations s, x i (c i, k i ; s) is non-decreasing with increase in capacity k i Theorem For a well behaved allocation rule, there exists a transformation that produces the transformed allocation ( x) and payment ( t) such that the resulting mechanism M = ( x, t) is stochastic BIC and IR.

20 2D-UCB Algorithm 2: 2D-UCB Mechanism Input: i N, bids ĉ i [c i, c i ], ˆk i [k i, k i ], parameter µ (0, 1), Reward parameter R Output: A mechanism M = (x, t) 1 i N, ˆq + i = 1, ˆq i = 0, n i = 1 2 Obtain modified bids as (α, β) 3 = ((α 1 (ĉ 1 ), β 1 (ĉ 1 ),..., (α n(ĉ n), β n(ĉ n)) using resampling 4 Allocate one unit to all agents and estimate empirical quality ˆq 5 ˆq i = q i (i)/n i, ˆq + i = ˆq i + 1 ln(t) 2n i 6 for t = n to L do 7 Compute H i = α i + F i (α i ˆk i ) f i (α i ˆk i ) 8 Let i = arg max {js.t.kj >n j } Rˆq+ j H j and Ĝi = Rˆq + i H i 9 if Ĝj > 0 then 10 Procure the unit from agent i and update ˆq i 11 ˆq + i = ˆq i + 2ni ln(t) 12 else 13 break \\ Don t allocate future units to anyone 14 Make payment to each agent i, T i = ĉ i n i + P i, where, P i = { 1µ n i (c ĉ i ), ifβ i > ĉ i 0, otherwise.

21 Empirical Evaluations Average utility per unit ɛ = L 1 6 ɛ= L 1 3 ɛ= L 1 2 ɛ= L 2 3 2D-UCB 2D-OPT ,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 90, Number of units Observations: All the mechanisms approach 2D-OPT The performance of 2D-UCB is superior as it approaches 2D-OPT faster

22 Future Work An extension where allocation happens to subset of agents at any round would be interesting The complete characterization of a learning algorithm in this space is still open A theoretical analysis of regret is also a possible direction of future work

23 THANK YOU

arxiv: v3 [cs.gt] 17 Jun 2015

arxiv: v3 [cs.gt] 17 Jun 2015 An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance Shweta Jain, Sujit Gujar, Satyanath Bhat, Onno Zoeter, Y. Narahari arxiv:1406.7157v3 [cs.gt] 17 Jun 2015 June 18,