Characterizing Truthful Multi-Armed Bandit Mechanisms


[Extended Abstract]

Moshe Babaioff, Microsoft Research, Mountain View, CA
Yogeshwer Sharma, Cornell University, Ithaca, NY
Aleksandrs Slivkins, Microsoft Research, Mountain View, CA

ABSTRACT

We consider a multi-round auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer's goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare.

If the advertisers bid their true private values, our problem is equivalent to the multi-armed bandit problem, and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by regret, the difference in social welfare between the algorithm and the benchmark which always selects the same "best" advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that truthful mechanisms have certain strong structural properties (essentially, they must separate exploration from exploitation) and that they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.

Categories and Subject Descriptors
F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems; K.4.4 [Computers and Society]: Electronic Commerce; F.1.2 [Computation by Abstract Devices]: Modes of Computation - Online computation; J.4 [Social and Behavioral Sciences]: Economics

A full version of this paper is available [8].
This research was done while the author was an intern at Microsoft Research, Silicon Valley Center.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. EC'09, July 6-10, 2009, Stanford, California, USA. Copyright 2009 ACM /09/07...$5.00.

General Terms
theory, algorithms, economics

Keywords
mechanism design, truthful mechanisms, single-parameter auctions, multi-armed bandits, online learning

1. INTRODUCTION

In recent years there has been much interest in understanding the implication of strategic behavior on the performance of algorithms whose input is distributed among selfish agents. This study was mainly motivated by the Internet, the main arena of large-scale interaction of agents with conflicting goals. The field of Algorithmic Mechanism Design [3] studies the design of mechanisms in computational settings (for background see the recent book [33] and survey [35]).

Much attention has been drawn to the market for sponsored search (e.g. [5, 17, 36, 9, 3]), a multibillion-dollar market with numerous auctions running every second. Research on sponsored search mostly focuses on equilibria of the Generalized Second Price (GSP) auction [17, 36], the auction that is most commonly used in practice (e.g. by Google and Yahoo), or on the design of truthful auctions []. All these auctions rely on knowing the rates at which users click on the different advertisements (a.k.a. Click-Through-Rates, or CTRs), and do not consider the process in which these CTRs are learned or refined over time by observing users' behavior. We argue that strategic agents would take this process into account, as it influences their utility. Prior work [0] focused on the implication of click fraud on the methods used to learn CTRs. We, on the other hand, are interested in the implications of strategic bidding by the agents.
Thus, we consider the problem of designing truthful sponsored search auctions when the process of learning the CTRs is a part of the game. We are mainly interested in the interplay between the online learning and the strategic aspects of the problem. To isolate this issue, we consider the following setting, which is a natural strategic version of the multi-armed bandit (MAB) problem. In this setting, there are $k$ agents. Each agent $i$ has a single advertisement, and a private value $v_i > 0$ for every click she gets. The mechanism is an online algorithm that first solicits bids from the agents, and then runs for $T$ rounds. In each round the mechanism picks an agent (using the bids and the clicks observed in the past rounds), displays her advertisement, and receives feedback on whether there was a click or not. Payments are assigned after round $T$. Each agent tries to maximize her own utility: the difference between the value that she derives from clicks and the payment she pays. We assume that initially no information is known about the likelihood of each agent to be clicked, and in particular there are no Bayesian priors.

We are interested in designing mechanisms which are truthful (in dominant strategies): every agent maximizes her utility by bidding truthfully, for any bids of the others and for any clicks that would have been received. The goal is to maximize the social welfare (footnote 1). Since the payments cancel out, this is equivalent to maximizing the total value derived from clicks, where an agent's contribution to that total is her private value times the number of clicks she receives. We call this setting the MAB mechanism design problem.

In the absence of strategic behavior this problem reduces to a standard MAB formulation in which an algorithm repeatedly chooses one of the $k$ alternatives ("arms") and observes the associated payoff: the value-per-click of the corresponding ad if the ad is clicked, and 0 otherwise. The crucial aspect in MAB problems is the tradeoff between acquiring more information (exploration) and using the current information to choose a good agent (exploitation). MAB problems have been studied intensively for the past three decades (see [1, 13, 18]). In particular, the above formulation is well-understood [6, 7, 14] in terms of regret relative to the benchmark which always chooses the same "best" alternative. This notion of regret naturally extends to the strategic setting outlined above, the total payoff being exactly equal to the social welfare, and the regret being exactly the loss in social welfare. Thus one can directly compare MAB algorithms and MAB mechanisms in terms of welfare loss (regret).

Broadly, we ask how the design of MAB algorithms is affected by the restriction of truthfulness: what is the difference between the best algorithms and the best truthful mechanisms?
We are interested both in the structural properties and in the gap in performance (in terms of regret). We are not aware of any prior work that characterizes truthful learning algorithms or proves negative results on their performance.

Our contributions. We present two main contributions. First, we present a characterization of (dominant-strategy) truthful mechanisms. Second, we present a lower bound on the regret that such mechanisms must suffer. This regret is significantly larger than the regret of the best MAB algorithms.

Formally, a mechanism for the MAB mechanism design problem is a pair $(A, P)$, where $A$ is the allocation rule (essentially, an MAB algorithm), and $P$ is the payment rule. Note that regret is completely determined by the allocation rule. As is standard in the literature, we focus on mechanisms in which each agent's payment (averaged over clicks) is between 0 and her bid; such mechanisms are called normalized, and they satisfy voluntary participation.

The setting we study is a single-parameter auction, the most studied and well-understood type of auction. For such settings truthful mechanisms are fully characterized [30, 4]: a mechanism is truthful if and only if the allocation rule is monotone (by increasing her bid an agent cannot cause a decrease in the number of clicks she gets), and the payment rule is defined in a specific and (essentially) unique way. Yet, this characterization is not the right characterization for the MAB setting! The main problem is that in our setting click information for any agent that is not chosen at a given round is not available to the mechanism, and thus cannot be used in the computation of payments. Thus, the payment cannot depend on any unobserved clicks. We show that this has severe implications on the structure of truthful mechanisms.

[Footnote 1: Social welfare includes both the auctioneer's revenue and the agents' utility. Since in practice different sponsored search platforms compete against one another, taking into account the agents' utility increases the platform's attractiveness to the advertisers.]
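To make the "specific and (essentially) unique" payment rule concrete, here is a minimal sketch, assuming the standard single-parameter formula that the paper states later as (3.1): the payment equals $b_i C_i(b_i) - \int_0^{b_i} C_i(x)\,dx$ for a non-decreasing click curve $C_i$. The function name `myerson_payment`, the `click_curve` callable, and the `breakpoints` argument are illustrative assumptions, not names from the paper.

```python
from typing import Callable, Sequence


def myerson_payment(bid: float,
                    click_curve: Callable[[float], int],
                    breakpoints: Sequence[float]) -> float:
    """Return bid * C(bid) minus the integral of C over [0, bid].

    Assumptions of this sketch: C is a non-decreasing step function,
    and `breakpoints` lists every bid value at which C may jump, so C
    is constant strictly between consecutive breakpoints.
    """
    # Split [0, bid] at the breakpoints that fall strictly inside it.
    pts = sorted({0.0, bid, *[p for p in breakpoints if 0.0 < p < bid]})
    integral = 0.0
    for lo, hi in zip(pts, pts[1:]):
        # C is constant on (lo, hi), so sampling the midpoint is exact.
        integral += click_curve((lo + hi) / 2.0) * (hi - lo)
    return bid * click_curve(bid) - integral
```

Note that the integral needs $C_i(x)$ for every bid $x$ below the submitted bid, including bids under which the agent would have been shown in different rounds. This is exactly where unobserved clicks become a problem in the MAB setting.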
The first notable property of a truthful mechanism is a much stronger version of monotonicity:

Definition 1.1. A realization consists of click information for all agents at all rounds (including unobserved ones). An allocation rule is pointwise monotone if for each realization, each bid profile and each round, if an agent is played at the round, then she is also played after increasing her bid (fixing everything else).

Let us consider allocation rules that satisfy the following two natural conditions. First, an allocation rule is scale-free if it is invariant under multiplying all bids by the same positive number (essentially, changing the currency unit). Second, it is Independent of Irrelevant Alternatives (IIA, for short) if for any given realization, bid profile and round, a change of bid of agent $i$ cannot transfer the allocation in this round from agent $j$ to agent $l$, where these are three distinct agents. (Note that the second condition trivially holds if there are only two agents.)

We show that any truthful mechanism must have a strict separation between exploration and exploitation. A crucial feature of exploration is the ability to influence the allocation in forthcoming rounds. To make this point more concrete, we call a round influential for a given realization if for some bid profile changing the realization for this round can affect the allocation in some future round. We show that in any such round, the allocation cannot depend on the bids. Thus, influential rounds are essentially useless for exploitation.

Definition 1.2. An allocation rule $A$ is called exploration-separated if for any given realization, the allocation in any influential round for that realization does not depend on the bids.

We are now ready to present our main structural result, which is in fact a complete characterization.

Theorem 1.3. Consider the MAB mechanism design problem. Let $A$ be a non-degenerate (footnote 2) deterministic allocation rule which is scale-free and satisfies IIA.
Then mechanism $(A, P)$ is normalized and truthful for some payment rule $P$ if and only if $A$ is pointwise monotone and exploration-separated.

[Footnote 2: Non-degeneracy is a mild technical assumption, formally defined in the preliminaries, which ensures that (essentially) if a given allocation happens for some bid profile $(b_i, b_{-i})$ then the same allocation happens for all bid profiles $(x, b_{-i})$, where $x$ ranges over some non-degenerate interval. Without this assumption, all structural results hold (essentially) almost surely w.r.t. the $k$-dimensional Lebesgue measure on the bid vectors. The exposition becomes significantly more cumbersome, yet leads to the same lower bounds on regret. For clarity, we assume non-degeneracy throughout this version of the paper.]

We also obtain a similar (but somewhat more complicated) characterization without assuming that allocations are scale-free and satisfy IIA (Theorem 3.8). We then use it to derive Theorem 1.3. We emphasize that our characterization results hold regardless of whether the auctioneer's goal is to maximize welfare or revenue or any other objective.

In view of Theorem 1.3, we present a lower bound on the performance of exploration-separated algorithms. We consider a setting, termed the stochastic MAB mechanism design problem, in which each click on a given advertisement is an independent random event which happens with a fixed probability, a.k.a. the CTR. The expected payoff from choosing a given agent is her private value times her CTR. For ease of exposition, assume that the bids lie in the interval $[0, 1]$. Then the non-strategic version is the stochastic MAB problem in which the payoff from choosing a given arm $i$ is an independent sample in $[0, 1]$ with a fixed mean $\mu_i$. In both versions, regret is defined with respect to a hypothetical allocation rule (resp. algorithm) that always chooses an arm with the maximal expected payoff. Specifically, regret is the expected difference between the social welfare (resp. total payoff) of the benchmark and that of the allocation rule (resp. algorithm). The goal is to minimize $R(T)$, the worst-case regret over all problem instances on $T$ rounds.

We show that the worst-case regret of any exploration-separated mechanism is larger than that of the optimal MAB algorithm: $\Omega(T^{2/3})$ vs. $O(\sqrt{T \log T})$ for a fixed number of agents. We obtain an even more pronounced difference if we restrict our attention to the $\delta$-gap problem instances: instances for which the best agent is better than the second-best by a (comparatively large) amount $\delta$, that is $\mu_1 v_1 - \mu_2 v_2 = \delta \cdot (\max_i v_i)$, where arms are arranged such that $\mu_1 v_1 \ge \mu_2 v_2 \ge \dots \ge \mu_k v_k$. Such instances are known to be easy for the MAB algorithms.
Namely, an algorithm can achieve the optimal worst-case regret $O(\sqrt{kT \log T})$ and regret $O(\frac{k}{\delta} \log T)$ on $\delta$-gap instances [6, 6]. However, for exploration-separated mechanisms the worst-case regret $R_\delta(T)$ over the $\delta$-gap instances is polynomial in $T$ as long as worst-case regret is even remotely non-trivial (i.e., sublinear). Thus, for the $\delta$-gap instances the gap between algorithms and truthful mechanisms in the worst-case regret is exponential in $T$.

Theorem 1.4. Consider the stochastic MAB mechanism design problem with $k$ agents. Let $A$ be a deterministic allocation rule that is exploration-separated. Then $A$ has worst-case regret $R(T) = \Omega(k^{1/3}\, T^{2/3})$. Moreover, if $R(T) = O(T^{\gamma})$ for some $\gamma < 1$ then for every fixed $\delta \le \frac{1}{4}$ and $\lambda < 2(1 - \gamma)$ the worst-case regret over the $\delta$-gap instances is $R_\delta(T) = \Omega(\delta\, T^{\lambda})$.

We note that our lower bounds hold for a more general setting in which the values-per-click can change over time, and the advertisers are allowed to change their bids at every time step. To complete the picture, we present a very simple (deterministic) mechanism that is truthful and normalized, and matches the lower bound $R(T) = \Omega(k^{1/3}\, T^{2/3})$ up to logarithmic factors.

We also provide a number of extensions. First, we prove a similar (but slightly weaker) regret bound without the scale-free assumption. Second, we extend some of our results to randomized mechanisms; in this setting, (dominant-strategy) truthfulness means truthfulness for each realization of the private randomness. Third, we consider a weaker notion of truthfulness for randomized mechanisms (truthfulness for each realization of the clicks, but in expectation over the random seed), and use this notion to provide algorithmic results for the version of the MAB mechanism design problem in which clicks are chosen by an adversary. Fourth, we discuss an even more permissive notion of truthfulness: truthfulness in expectation over the clicks.

Other related work and discussion.
The question of how the performance of a truthful mechanism compares to that of the optimal algorithm for the corresponding non-strategic problem has been considered in the literature in a number of other auction settings. Performance gaps have been shown for various scheduling problems [4, 3, 16] and for online auctions for expiring goods [8]. Other papers presented approximation gaps due to computational constraints, e.g. for combinatorial auctions [7, 16] and combinatorial public projects [34], showing a gap via a structural result for truthful mechanisms.

The study of MAB mechanisms was initiated by Gonen and Pavlov [19]. The authors present a MAB mechanism which is claimed to be truthful in a certain approximate sense. Unfortunately, this mechanism does not satisfy the claimed properties; this was also confirmed with the authors through personal communication (see also a similar note in [15]).

MAB algorithms were used in the design of Cost-Per-Action sponsored search auctions in Nazerzadeh et al. [31], where the authors construct a mechanism with approximate properties of truthfulness and individual rationality. Approximately truthful mechanisms are reasonable assuming the agents would not lie unless it leads to significant gains. However, this solution concept is weaker than the exact notion and it may still be rational for the agents to deviate (perhaps significantly) from being truthful. Moreover, as truthful bidding is not a Nash equilibrium, agents might have an increased incentive to deviate if they speculate that others are deviating. All of that may result in unpredictable, and possibly highly suboptimal, outcomes. In this paper we focus on understanding what can be achieved with exact truthfulness, mainly proving results of a structural and lower-bounding nature. We note in passing that providing similar results for the approximately truthful setting such as the one in [31] is a worthy and challenging open question.

Independently and concurrently, Devanur and Kakade [15] have studied truthful MAB mechanisms with a focus on maximizing the revenue.
They present a lower bound of $\Omega(T^{2/3})$ on the loss in revenue with respect to the VCG (Vickrey-Clarke-Groves) payment, as well as a truthful mechanism that matches the lower bound. (This mechanism is almost identical to the one that we present in order to match the lower bound in Theorem 4.1.)

Our lower bounds use (a novel application of) the relative entropy technique from [6, 7]; see [3] for an account. For other applications of this technique, see e.g. [14, 1, 4, 10].

Our work focuses on regret in a prior-free setting in which the algorithm has no prior on CTRs. This is in contrast to the recent line of work on dynamic auctions [11, 5] which considers fully Bayesian settings in which there is a known prior on CTRs, and VCG-like social welfare-maximizing mechanisms are feasible. In our prior-free setting VCG mechanisms cannot be applied, as such mechanisms require the allocation to exactly maximize the expected social welfare, which is impossible (and not well-defined) without a prior.

We require the mechanisms to satisfy a strong notion of truthfulness: bidding truthfully is optimal for every possible realization. This is desirable as it does not require the agents to be risk neutral. Moreover, such a notion does not require agents to consider the process that generates the clicks. In particular, even in the presence of click spamming by others an agent's best strategy is still to bid truthfully. Finally, an agent never regrets in retrospect that she has been truthful.

Map of the paper. Section 2 is preliminaries. The truthfulness characterization is developed and proved in Section 3. The lower bounds on regret and the simple mechanism that matches them are in Section 4. Extensions and open questions are in Section 5. Due to the page limit, some of the proofs are deferred to the full version [8].

2. DEFINITIONS AND PRELIMINARIES

In the MAB mechanism design problem, there is a set $K$ of $k$ agents numbered from 1 to $k$. Each agent $i$ has a value $v_i > 0$ for every click she gets; this value is known only to agent $i$. Initially, each agent $i$ submits a bid $b_i > 0$, possibly different from $v_i$ (footnote 3). The game lasts for $T$ rounds, where $T$ is the given time horizon. A realization represents the click information for all agents and all rounds. Formally, it is a tuple $\rho = (\rho_1, \dots, \rho_k)$ such that for every agent $i$ and round $t$, the bit $\rho_i(t) \in \{0, 1\}$ indicates whether $i$ gets a click if played at round $t$. An instance of the MAB mechanism design problem consists of the number of agents $k$, time horizon $T$, a vector of private values $v = (v_1, \dots, v_k)$, a vector of bids (bid profile) $b = (b_1, \dots, b_k)$, and realization $\rho$.

A mechanism is a pair $(A, P)$, where $A$ is the allocation rule and $P$ is the payment rule.
An allocation rule is represented by a function $A$ that maps bid profile $b$, realization $\rho$ and a round $t$ to the agent $i$ that is chosen (receives an impression) in this round: $A(b; \rho; t) = i$. We also denote $A_i(b; \rho; t) = \mathbf{1}_{\{A(b; \rho; t) = i\}}$. The allocation is online in the sense that at each round it can only depend on clicks observed prior to that round. Moreover, it does not know the realization in advance; in every round it only observes the realization for the agent that is shown in that round. A payment rule is a tuple $P = (P_1, \dots, P_k)$, where $P_i(b; \rho) \in \mathbb{R}$ denotes the payment charged to agent $i$ when the bids are $b$ and the realization is $\rho$ (footnote 4). The payment can only depend on observed clicks. A mechanism is called normalized if for any agent $i$, bids $b$ and realization $\rho$ it holds that $P_i(b; \rho)$ is non-negative and at most $b_i$ times the number of clicks agent $i$ got.

[Footnote 3: One can also consider a more realistic and general model in which the value-per-click of an agent changes over time and the agents are allowed to change their bid at every round. The case in which the value-per-click of each agent does not change over time is a special case. In that case truthfulness implies that each agent basically submits one bid as in our model (the same bid at every round), thus our main results (necessary conditions for truthfulness and regret lower bounds) also hold for the more general model.]

[Footnote 4: We allow the mechanism to determine the payments at the end of the $T$ rounds, and not after every round. This makes the task of designing a truthful mechanism easier and thus strengthens our necessary condition for truthfulness]

For a given realization $\rho$ and bid profile $b$, the number of clicks received by agent $i$ is denoted $C_i(b; \rho)$. Call $C = (C_1, \dots, C_k)$ the click-allocation for $A$. The utility that agent $i$ with value $v_i$ gets from the mechanism $(A, P)$ when the bids are $b$ and the realization is $\rho$ is $U_i(v_i; b; \rho) = v_i \cdot C_i(b; \rho) - P_i(b; \rho)$ (quasi-linear utility). The mechanism is truthful if for any agent $i$, value $v_i$, bid profile $b$ and realization $\rho$ it is the case that $U_i(v_i; v_i, b_{-i}; \rho) \ge U_i(v_i; b_i, b_{-i}; \rho)$.
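The definitions above can be illustrated with a small simulation. The following is a minimal sketch, not the paper's mechanism: it assumes 0-indexed rounds, represents a realization as a list of per-agent click bits, and gives the allocation rule access only to the bids and the observed history, mirroring the online restriction. All names (`run_allocation`, `utility`, `highest_bid_rule`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One history entry: (round, agent shown, observed click bit).
History = List[Tuple[int, int, int]]
# An online allocation rule sees only the bids, the observed history,
# and the current round number; it never sees the full realization.
AllocationRule = Callable[[Sequence[float], History, int], int]


def run_allocation(rule: AllocationRule,
                   bids: Sequence[float],
                   realization: Sequence[Sequence[int]],
                   horizon: int) -> Tuple[List[int], History]:
    """Return the click counts C_i(b; rho) and the observed history."""
    clicks = [0] * len(bids)
    history: History = []
    for t in range(horizon):
        i = rule(bids, history, t)
        bit = realization[i][t]  # only the shown agent's bit is revealed
        clicks[i] += bit
        history.append((t, i, bit))
    return clicks, history


def utility(value: float, clicks_i: int, payment_i: float) -> float:
    """Quasi-linear utility U_i = v_i * C_i - P_i."""
    return value * clicks_i - payment_i


def highest_bid_rule(bids: Sequence[float], history: History, t: int) -> int:
    """Toy rule for illustration only: always show the highest bidder."""
    return max(range(len(bids)), key=lambda i: bids[i])
```

For example, with bids `[2.0, 1.0]` and realization `[[1, 0, 1], [1, 1, 1]]` over three rounds, the toy rule shows agent 0 in every round, so agent 1's bits are never observed and cannot enter any payment computation.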
In the stochastic MAB mechanism design problem, an adversary specifies a vector $\mu = (\mu_1, \dots, \mu_k)$ of CTRs (concealed from $A$), then for each agent $i$ and round $t$, realization $\rho_i(t)$ is chosen independently with mean $\mu_i$. Thus, an instance of the problem includes $\mu$ rather than a fixed realization. For a given problem instance $I$, let $i^* \in \arg\max_i \mu_i v_i$; then regret on this instance is defined as

$$R_I(T) = T\, v_{i^*}\, \mu_{i^*} - \mathbb{E}\left[\sum_{t=1}^{T} \sum_{i=1}^{k} \mu_i\, v_i\, A_i(b; \rho; t)\right]. \qquad (2.1)$$

For a given parameter $v_{\max}$, the worst-case regret (footnote 5) $R(T; v_{\max})$ denotes the supremum of $R_I(T)$ over all problem instances $I$ in which all private values are at most $v_{\max}$. Similarly, we define $R_\delta(T; v_{\max})$, the worst-case $\delta$-regret, by taking the supremum only over instances with $\delta$-gap.

Most of our results are stated for non-degenerate allocation rules, defined as follows. An interval is called non-degenerate if it has positive length. Fix bid profile $b$, realization $\rho$, and rounds $t$ and $t'$ with $t \le t'$. Let $i = A(b; \rho; t)$ and let $\rho'$ be the realization obtained from $\rho$ by flipping the bit $\rho_i(t)$. An allocation rule $A$ is non-degenerate w.r.t. $(b, \rho, t, t')$ if there exists a non-degenerate interval $I \ni b_i$ such that $A_i(x, b_{-i}; \varphi; s) = A_i(b; \varphi; s)$ for each $\varphi \in \{\rho, \rho'\}$, each $s \in \{t, t'\}$, and all $x \in I$. An allocation rule is non-degenerate if it is non-degenerate w.r.t. each tuple $(b, \rho, t, t')$.

3. TRUTHFULNESS CHARACTERIZATION

Before presenting our characterization we begin by describing some related background. The click allocation $C$ is non-decreasing if for each agent $i$, increasing her bid (and keeping everything else fixed) does not decrease $C_i$. Prior work has established a characterization of truthful mechanisms for single-parameter domains (domains in which the private information of each agent is one-dimensional), relating click allocation monotonicity and truthfulness (see below).
For our problem, this result is a characterization of MAB algorithms that are truthful for a given realization $\rho$, assuming that the entire realization $\rho$ can be used to compute payments (when computing payments one can use click information for every round and every agent, even if the agent was not shown at that round). One of our main contributions is a characterization of MAB allocation rules that can be truthfully implemented when payment computation is restricted to only use click information of the actual impressions assigned by the allocation rule.

An MAB allocation rule $A$ is truthful with unrestricted payment computation if it is truthful with a payment rule that can use the entire realization $\rho$ in its computation. We next present the prior result characterizing truthful mechanisms with unrestricted payment computation.

[Footnote 4, continued: (the condition used to derive the lower bounds on regret).]

[Footnote 5: By abuse of notation, when clear from the context, the worst-case regret is sometimes simply called regret.]

Theorem 3.1 (Myerson [30], Archer and Tardos [4]). Let $(A, P)$ be a normalized mechanism for the MAB mechanism design problem. It is truthful with unrestricted payment computation if and only if for any given realization $\rho$ the corresponding click-allocation $C$ is non-decreasing and the payment rule is given by (footnote 6)

$$P_i(b_i, b_{-i}; \rho) = b_i \cdot C_i(b_i, b_{-i}; \rho) - \int_0^{b_i} C_i(x, b_{-i}; \rho)\, dx. \qquad (3.1)$$

We can now move to characterize truthful MAB mechanisms when the payment computation is restricted. The following notation will be useful: for a given realization $\rho$, let $\rho \oplus \mathbf{1}(i, t)$ be the realization that coincides with $\rho$ everywhere, except that the bit $\rho_i(t)$ is flipped.

The first notable property of truthful mechanisms is a stronger version of monotonicity. Recall (see Definition 1.1) that an allocation rule $A$ is pointwise monotone if for each realization $\rho$, bid profile $b$, round $t$ and agent $i$, if $A_i(b_i, b_{-i}; \rho; t) = 1$ then $A_i(b_i^+, b_{-i}; \rho; t) = 1$ for any $b_i^+ > b_i$. In words, increasing a bid cannot cause a loss of an impression.

Lemma 3.2. Consider the MAB mechanism design problem. Let $(A, P)$ be a normalized truthful mechanism such that $A$ is a non-degenerate deterministic allocation rule. Then $A$ is pointwise monotone.

Proof. For a contradiction, assume not. Then there is a realization $\rho$, a bid profile $b$, a round $t$ and agent $i$ such that agent $i$ loses an impression in round $t$ by increasing her bid from $b_i$ to some larger value $b_i^+$. In other words, we have $A_i(b_i^+, b_{-i}; \rho; t) < A_i(b_i, b_{-i}; \rho; t)$. Without loss of generality, let us assume that there are no clicks after round $t$, that is $\rho_j(t') = 0$ for any agent $j$ and any round $t' > t$ (since changes in $\rho$ after round $t$ do not affect anything before round $t$).

Let $\rho' = \rho \oplus \mathbf{1}(i, t)$. The allocation in round $t$ cannot depend on this bit, so it must be the same for both realizations. Now, for each realization $\varphi \in \{\rho, \rho'\}$ the mechanism must be able to compute the price for agent $i$ when bids are $(b_i^+, b_{-i})$.
That involves computing the integral $I_i(\varphi) = \int_{x \le b_i^+} C_i(x, b_{-i}; \varphi)\, dx$ from (3.1). We claim that $I_i(\rho) \ne I_i(\rho')$. However, the mechanism cannot distinguish between $\rho$ and $\rho'$ since they only differ in bit $(i, t)$ and agent $i$ does not get an impression in round $t$. This is a contradiction.

It remains to prove the claim. Without loss of generality, assume that $\rho_i(t) = 0$ (otherwise interchange the roles of $\rho$ and $\rho'$). We first note that $C_i(x, b_{-i}; \rho) \le C_i(x, b_{-i}; \rho')$ for every $x$. This is because everything is the same in $\rho$ and $\rho'$ until round $t$ (so the impressions are the same too), there are no clicks after round $t$, and in round $t$ the behavior of $A$ on the two realizations can be different only if agent $i$ gets an impression, in which case she is clicked under $\rho'$ and not clicked under $\rho$.

Since $A$ is non-degenerate, there exists a non-degenerate interval $I$ containing $b_i$ such that changing the bid of agent $i$ to any value in this interval does not change the allocation at round $t$ (both for $\rho$ and for $\rho'$). For any $x \in I$ we have $C_i(x, b_{-i}; \rho) < C_i(x, b_{-i}; \rho')$, where the difference is due to the click in round $t$. It follows that $I_i(\rho) < I_i(\rho')$. Claim proved. Hence, the mechanism cannot be implemented truthfully.

[Footnote 6: Archer and Tardos [4] was the first paper in the Theoretical Computer Science literature that presented a characterization of truthful mechanisms for single-parameter domains, in the context of machine scheduling.]

Recall (see Definition 1.2) that round $t$ is influential for a given realization $\rho$ if for some bid profile $b$ there exists a round $t' > t$ such that $A(b; \rho; t') \ne A(b; \rho \oplus \mathbf{1}(j, t); t')$ for $j = A(b; \rho; t)$. In words: changing the relevant part of the realization at round $t$ affects the allocation in some future round $t'$. An allocation rule $A$ is called exploration-separated if for any given realization $\rho$ and round $t$ that is influential for $\rho$, it holds that $A(b; \rho; t) = A(b'; \rho; t)$ for any two bid vectors $b, b'$ (the allocation at $t$ does not depend on the bids).

The main structural implication is that truthful implies exploration-separated. To illustrate the ideas behind this implication, we first state and prove it for two agents.

Proposition 3.3.
Consider the MAB mechanism design problem with two agents. Let $A$ be a non-degenerate scale-free deterministic allocation rule. If $(A, P)$ is a normalized truthful mechanism for some $P$, then it is exploration-separated.

Proof. Assume $A$ is not exploration-separated. Then there is a counterexample $(\rho, t)$: a realization $\rho$ and a round $t$ such that round $t$ is influential and the allocation in round $t$ depends on bids. We want to prove that this leads to a contradiction.

Let us pick a counterexample $(\rho, t)$ with some useful properties. Since round $t$ is influential, there exists a realization $\rho$ and bid profile $b$ such that the allocation at some round $t' > t$ (the influenced round) is different under realization $\rho$ and another realization $\rho' = \rho \oplus \mathbf{1}(j, t)$, where $j = A(b; \rho; t)$ is the agent chosen at round $t$ under $\rho$. Without loss of generality, let us pick a counterexample with minimum value of $t'$ over all choices of $(b, \rho, t)$. For ease of exposition, from this point on let us assume that $j = 2$. For the counterexample we can also assume that $\rho_1(t') = 1$, and that there are no clicks after round $t'$, that is $\rho_l(t'') = \rho'_l(t'') = 0$ for all $t'' > t'$ and for all $l \in \{1, 2\}$.

We know that the allocation in round $t$ depends on bids. This means that agent 1 gets an impression in round $t$ for some bid profile $\hat{b} = (\hat{b}_1, \hat{b}_2)$ under realization $\rho$, that is $A(\hat{b}; \rho; t) = 1$. As the mechanism is scale-free this means that, denoting $b_1^+ = \hat{b}_1\, b_2 / \hat{b}_2$, we have $A(b_1^+, b_2; \rho; t) = 1$. Since $A(b_1, b_2; \rho; t) = 2$ and $A(b_1^+, b_2; \rho; t) = 1$, pointwise monotonicity (Lemma 3.2) implies that $b_1^+ > b_1$. We conclude that there exists a bid $b_1^+ > b_1$ for agent 1 such that $A(b_1^+, b_2; \rho; t) = 1$.

Now, the mechanism needs to compute prices for agent 1 for bids $(b_1^+, b_2)$ under realizations $\rho$ and $\rho'$, that is $P_1(b_1^+, b_2; \rho)$ and $P_1(b_1^+, b_2; \rho')$. Therefore, the mechanism needs to compute the integral $I_1(\varphi) = \int_{x \le b_1^+} C_1(x, b_2; \varphi)\, dx$ for both realizations $\varphi \in \{\rho, \rho'\}$.

First of all, for all $x \le b_1^+$ and for all $t'' < t'$, $A(x, b_2; \rho; t'') = A(x, b_2; \rho'; t'')$, since otherwise the minimality of $t'$ will be violated.
The only difference in the allocation can occur in round $t'$. Let us assume $A_1(b_1, b_2; \rho; t') < A_1(b_1, b_2; \rho'; t')$ (otherwise, we can swap $\rho$ and $\rho'$). We make the claim that for all bids $x \le b_1^+$ of agent 1, the influence of round $t$ on round $t'$ is in the same "direction":

$$A_1(x, b_2; \rho; t') \le A_1(x, b_2; \rho'; t') \quad \text{for all } x \le b_1^+. \qquad (3.2)$$

Suppose (3.2) does not hold. Then there is an $x < b_1^+$ such that $1 = A_1(x, b_2; \rho; t') > A_1(x, b_2; \rho'; t') = 0$. (Note that we have used the fact that the mechanism is deterministic.) If $x < b_1$ then pointwise monotonicity is violated under realization $\rho$, since $A_1(x, b_2; \rho; t') > A_1(b_1, b_2; \rho; t')$; otherwise it is violated under realization $\rho'$, giving a contradiction in both cases. The claim (3.2) follows.

Since $A$ is non-degenerate, there exists a non-degenerate interval $I$ containing $b_1$ such that if agent 1 bids any value $x \in I$ then $A_1(x, b_2; \rho; t') < A_1(x, b_2; \rho'; t')$. Now by (3.2) it follows that $I_1(\rho) < I_1(\rho')$. However, the mechanism cannot distinguish between $\rho$ and $\rho'$ when the bid of agent 1 is $b_1^+$, since the differing bit $\rho_2(t)$ is not observed. Therefore the mechanism cannot compute prices, contradiction.

3.1 General Truthfulness Characterization

Let us develop the general truthfulness characterization that does not assume that an allocation is scale-free and IIA. We will later use it to derive Theorem 1.3.

Definition 3.4. Fix realization $\rho$ and bid vector $b$. A round $t$ is called $(b; \rho)$-secured from agent $i$ if $A(b_i^+, b_{-i}; \rho; t) = A(b_i, b_{-i}; \rho; t)$ for any $b_i^+ > b_i$. A round $t$ is called bid-independent w.r.t. $\rho$ if the allocation $A(b; \rho; t)$ is a constant function of $b$.

The following definitions elaborate on the notion of an influential round.

Definition 3.5. A round $t$ is called $(b; \rho)$-influential, for bid profile $b$ and realization $\rho$, if for some round $t' > t$ it holds that $A(b; \rho; t') \ne A(b; \rho'; t')$ for realization $\rho' = \rho \oplus \mathbf{1}(j, t)$ such that $j = A(b; \rho; t)$ (footnote 7). In this case, $t'$ is called the influenced round and $j$ is called the influencing agent of round $t$. The agent $i$ is called an influenced agent of round $t$ if $i \in \{A(b; \rho; t'), A(b; \rho'; t')\}$.

Note that a round is influential w.r.t. realization $\rho$ if and only if it is $(b; \rho)$-influential for some $b$. The central property in our characterization is that each $(b; \rho)$-influential round is $(b; \rho)$-secured.

Definition 3.6.
A deterministic allocation is called weakly separated if for every realization $\rho$ and each bid vector $b$, it holds that if round $t$ is $(b; \rho)$-influential with influenced agent $i$ then it is $(b; \rho)$-secured from $i$.

We notice that exploration-separated is a stronger notion.

Observation 3.7. For a deterministic allocation, exploration-separated implies weakly separated (footnote 8).

We are now ready to state our general characterization.

[Footnote 7: Note that realizations $\rho$ and $\rho'$ are interchangeable.]

[Footnote 8: To see this, simply use the definitions. Fix realization $\rho$ and bid vector $b$, and let $t$ be a $(b; \rho)$-influential round with influenced agent $i$. We need to show that $t$ is $(b; \rho)$-secured from $i$. Round $t$ is $(b; \rho)$-influential, thus influential w.r.t. $\rho$, thus (since the allocation is exploration-separated) it is bid-independent w.r.t. $\rho$, thus agent $i$ cannot change the allocation in round $t$ by increasing her bid.]

Theorem 3.8. Consider the MAB mechanism design problem. Let $A$ be a non-degenerate deterministic allocation rule. Then mechanism $(A, P)$ is normalized and truthful for some payment rule $P$ if and only if $A$ is pointwise monotone and weakly separated.

Proof. For the "only if" direction, $A$ is pointwise monotone by Lemma 3.2, and the fact that $A$ is weakly separated is proved similarly to Proposition 3.3 (albeit with a few extra details). We defer it to the full version [8].

We focus on the "if" direction. Let $A$ be a deterministic allocation rule which is pointwise monotone and weakly separated. We need to provide a payment rule $P$ such that the resulting mechanism $(A, P)$ is truthful and normalized. Since $A$ is pointwise monotone, it immediately follows that it is monotone (i.e., as an agent increases her bid, the number of clicks that she gets cannot decrease). Therefore it follows from Theorem 3.1 that mechanism $(A, P)$ is truthful and normalized if and only if $P$ is given by (3.1). We need to show that $P$ can be computed using only the knowledge of the clicks (bits from the realization) that were revealed during the execution of $A$.

Assume we want to compute the payment for agent $i$ in bid profile $(b_i, b_{-i})$ and realization $\rho$.
We will prove that we can compute $C_i(x) := C_i(x, b_{-i}; \rho)$ for all $x \le b_i$. To compute $C_i(x)$, we show that it is possible to simulate the execution of the mechanism with $\mathrm{bid}_i = x$. In some rounds, agent $i$ loses an impression, and in others she retains the impression (pointwise monotonicity ensures that agent $i$ cannot gain an impression when decreasing her bid). In the rounds in which she loses an impression, the mechanism does not observe the bits of $\rho$, so we prove that those bits are irrelevant for computing $C_i(x)$. In other words, while running with $\mathrm{bid}_i = x$, if the mechanism needs to observe a bit that was not revealed when running with $\mathrm{bid}_i = b_i$, we arbitrarily set that bit to 1 and simulate the execution of $A$. We want to prove that this computes $C_i(x)$ correctly.

Let $t_1 < t_2 < \dots < t_n$ be the rounds in which agent $i$ did not get an impression while bidding $x$, but did get an impression while bidding $b_i$. Let $\rho_0 := \rho$, and let us define realization $\rho_l$ inductively for every $l \in [n]$ by setting $\rho_l := \rho_{l-1} \oplus \mathbf{1}(j_l, t_l)$, where $j_l = A(x, b_{-i}; \rho_{l-1}; t_l)$ is the agent that got the impression at round $t_l$ with realization $\rho_{l-1}$ and bids $(x, b_{-i})$.

First, we claim that $j_l \ne i$ for any $l$. Indeed, suppose not, and pick the smallest $l$ such that $j_{l+1} = i$. Then $t_l$ is an $(x, b_{-i}; \rho_l)$-influential round, with influenced agent $j_{l+1} = i$. Thus $t_l$ is $(x, b_{-i}; \rho_l)$-secured from $i$. Since $A(x, b_{-i}; \rho_l; t_l) = A(x, b_{-i}; \rho_{l-1}; t_l) = j_l \ne i$ by minimality of $l$, agent $i$ does not get an impression in round $t_l$ if she raises her bid to $b_i$. That is, $A(b; \rho_l; t_l) \ne i$. However, the changes in realizations $\rho_0, \dots, \rho_{l-1}$ only concern the rounds in which agent $i$ is chosen, so they are not seen by the allocation if the bid profile is $b$ (to prove this formally, use induction). Thus, $A(b; \rho_l; t_l) = A(b; \rho; t_l) = i$, contradiction. Claim proved.

It follows that $A(b; \rho; t_l) = i$ for each $l$. (This is because by induction, the change from $\rho_{l-1}$ to $\rho_l$ is not seen by the allocation if the bid profile is $b$.)

We claim that $A_i(x, b_{-i}; \rho; t') = A_i(x, b_{-i}; \rho_n; t')$ for every round $t'$, which will prove the theorem.
If not, then there exists $l$ such that $A_i(x, b_{-i}; \rho_l; t') \ne A_i(x, b_{-i}; \rho_{l-1}; t')$ for some $t'$ (and of course $t' > t_l$). Round $t_l$ is thus $(x, b_{-i}; \rho_l)$-influential with influenced round $t'$ and influenced agent $i$. Moreover, the influencing agent of that round is $j_l$, and we already proved that $j_l \ne i$. Since round $t_l$ is $(x, b_{-i}; \rho_l)$-secured from agent $i$ due to the weakly separated condition, it follows that agent $i$ does not get an impression in round $t_l$ if she raises her bid to $b_i$. That is, $A(b; \rho_l; t_l) \ne i$, contradiction.

Note that we have proven the main characterization result (Theorem 1.3) for the case of two agents, because for two agents it is not hard to see that a scale-free allocation is exploration-separated if and only if it is weakly separated, and IIA trivially holds for two agents.

Let us argue that the non-degeneracy assumption in Theorem 3.8 is indeed necessary. To this end, let us present a simple deterministic mechanism $(A, P)$ for two agents that is truthful and normalized, such that the allocation rule $A$ is pointwise monotone and scale-free, yet not weakly separated. (The catch is, of course, that it is degenerate.)

There are only two rounds. Agent 1 is allocated at round 1 if and only if $b_1 \ge b_2$. Agent 1 is allocated at round 2 if $b_1 > b_2$, or if $b_1 = b_2$ and $\rho_1(1) = 1$; otherwise agent 2 is shown. This completes the description of the allocation rule.

To obtain a payment rule $P$ which makes the mechanism normalized and truthful, consider an alternate allocation rule $A'$ which in each round selects agent 1 if and only if $b_1 \ge b_2$. (Note that $A' = A$ except when $b_1 = b_2$.) Use Theorem 3.8 for $A'$ to obtain a normalized truthful mechanism $(A', P')$, and set $P = P'$. The payment rule $P$ is well-defined since the observed clicks for $P$ and $P'$ coincide unless $b_1 = b_2$, in which case both payment rules charge 0 to both players. The resulting mechanism $(A, P)$ is normalized and truthful because the integral in (3.1) remains the same even if we change the value at a single point.

It is easy to see that the allocation rule $A$ has all the claimed properties; it fails to be non-degenerate because round $t = 1$ is influential only when $b_1 = b_2$.

3.2 Scale-free and IIA allocation rules

To complete the proof of Theorem 1.3, we show that under the right assumptions, an allocation is exploration-separated if and only if it is weakly separated.
The full proof of this result is in the full version [8].

Lemma 3.9. Consider the MAB mechanism design problem. Let $A$ be a non-degenerate deterministic allocation rule which is scale-free, pointwise monotone, and satisfies IIA. Then it is exploration-separated if and only if it is weakly separated.

Proof Sketch. We sketch the proof of Lemma 3.9 at a very high level. The "only if" direction was observed in Observation 3.7. For the "if" direction, let $A$ be a weakly separated allocation rule. We prove by contradiction that it is exploration-separated. If not, then there is a realization $\rho$ and a round $t$ such that $t$ is influential w.r.t. $\rho$ but not bid-independent w.r.t. $\rho$. Let round $t$ be influential with bid vector $b$, influencing agent $l$, and influenced agents $j$ and $j' \ne j$ in influenced round $t'$ (see box 1 in Figure 1; all boxed numbers in this sketch refer to this figure). Since round $t$ is not bid-independent w.r.t. $\rho$, there exists a bid profile $b'$ such that $l$ is not played in round $t$ with bids $b'$. Using scale-freeness, IIA, and pointwise monotonicity, we can prove that there exists a sufficiently large bid $b_i^+$ of agent $i$ such that she gets an impression in round $t$ with bids $(b_i^+, b_{-i})$ (see box 2). Using the properties of the mechanism, it can further be proved that there is an agent $i$ such that she gets the impression in round $t$ when either $i$ increases her bid, or $l$ decreases her bid (see box 3). When $i$ increases her bid to $b_i^+$, she also gets an impression in round $t'$: impressions cannot differ in round $t'$ when $l$ is not played in round $t$, so they must get transferred from $j$ and $j'$ to somebody in round $t'$, and IIA implies that this somebody should be $i$.

Recall that two different players $j$ and $j'$ get the impression in round $t'$ under $\rho$ and $\rho'$ respectively (see box 4). We prove that either agent $j$ or agent $j'$ must be equal to $l$ (this is done by looking at how the allocation in round $t'$ changes when $l$ decreases her bid). Let us break the symmetry and assume $j' = l$ (see box 5).
It is also easy to see that when $i$ increases her bid, the impression in round $t'$ gets transferred to her in $\rho$ (at some minimum value $b_i^{+\rho}$, see box 6), and the impression in round $t'$ gets transferred to her also in $\rho'$ (at some possibly different minimum value $b_i^{+\rho'}$, see box 7). Using the weakly separated assumption, we prove that $b_i^{+\rho} = b_i^{+\rho'}$ (see box 8). This can be proved by observing that $b_i^+ \ge \max\{b_i^{+\rho}, b_i^{+\rho'}\}$, and then using the fact that $A$ is weakly separated. Since these two bids were at a threshold value (these were the minimum values of bids to have transferred the impression in $\rho$ and $\rho'$ from $j$ and $l$ respectively), we are able to prove that the ratio $b_j / b_l$ must be some fixed number dependent on $\rho$, $\rho'$, and $t$. In particular, it follows that $b_l$ belongs to a finite set $S(b_{-l})$ which depends only on $b_{-l}$. However, by non-degeneracy of $A$ there must be infinitely many such $b_l$'s, which leads to a contradiction.

4. LOWER BOUNDS ON REGRET

In this section we use structural results from the previous section to derive lower bounds on regret.

Theorem 4.1. Consider the stochastic MAB mechanism design problem with $k$ agents. Let $A$ be an exploration-separated deterministic allocation rule. Then its regret is $R(T; v_{\max}) = \Omega(v_{\max}\, k^{1/3}\, T^{2/3})$.

Let $\vec{\mu}_0 = (\tfrac12, \dots, \tfrac12) \in [0, 1]^k$ be the vector of CTRs in which for each agent the CTR is $\tfrac12$. For each agent $i$, let $\vec{\mu}_i = (\mu_{i1}, \dots, \mu_{ik}) \in [0, 1]^k$ be the vector of CTRs in which agent $i$ has CTR $\mu_{ii} = \tfrac12 + \epsilon$, $\epsilon = k^{1/3}\, T^{-1/3}$, and every other agent $j \ne i$ has CTR $\mu_{ij} = \tfrac12$. As a notational convention, denote by $P_i[\cdot]$ and $E_i[\cdot]$ respectively the probability and expectation induced by the algorithm when clicks are given by $\vec{\mu}_i$. Let $I_i$ be the problem instance in which CTRs are given by $\vec{\mu}_i$ and all bids are $v_{\max}$. For each agent $i$, let $J_i$ be the problem instance in which CTRs are given by $\vec{\mu}_0$, the bid of agent $i$ is $v_{\max}$, and the bids of all other agents are $v_{\max}/2$. We will show that for any exploration-separated deterministic allocation rule $A$, one of these $2k$ instances causes high regret.

Let $N_i$ be the number of bid-independent rounds in which agent $i$ is played.
Note that $N_i$ does not depend on the bids. It is a random variable in the probability space induced by the clicks; its distribution is completely specified by the CTRs. We show that (in a certain sense) the allocation cannot distinguish between $\vec{\mu}_0$ and $\vec{\mu}_i$ if $N_i$ is too small. Specifically, let $A_t$ be the allocation in round $t$. Once the bids are fixed, this is a random variable in the probability space induced by the clicks. For a given set $S$ of agents, we consider the event $\{A_t \in S\}$ for some fixed round $t$, and upper-bound the difference between the probability of this event under $\vec{\mu}_0$ and $\vec{\mu}_i$ in terms of the expectation of $N_i$, in the following crucial claim, which is proved in the full version [8] via relative entropy techniques.

Figure 1: This figure explains all the steps in the proof of Lemma 3.9. The rows correspond to agents (whose identity is shown on the right side), and columns correspond to time rounds. The asterisks show the impressions. The arrows show how the impressions get transferred, and labels on the arrows show what causes the transfer. In labels, "in $\rho$, $b_i$" denotes that a particular transfer of impression is caused in realization $\rho$ when bid $b_i$ is increased.

Claim 4.2. For any fixed vector of bids, each round $t$, each agent $i$ and each set of agents $S$, we have

$|P_0[A_t \in S] - P_i[A_t \in S]| \le O(\epsilon^2\, E_0[N_i])$. (4.1)

Proof (of Theorem 4.1). Fix a positive constant $\beta$ to be specified later. Consider the case $k = 2$ first. If $E_0[N_i] > \beta\, T^{2/3}$ for some agent $i$, then on the problem instance $J_j$ of the other agent $j \ne i$, regret is $\Omega(T^{2/3})$. So without loss of generality let us assume $E_0[N_i] \le \beta\, T^{2/3}$ for each agent $i$. Then, plugging in the values for $\epsilon$ and $E_0[N_i]$, the right-hand side of (4.1) is at most $O(\beta)$. Take $\beta$ so that the right-hand side of (4.1) is at most $\tfrac14$. For each round $t$ there is an agent $i$ such that $P_0[A_t \ne i] \ge \tfrac12$. Then $P_i[A_t \ne i] \ge \tfrac14$ by Claim 4.2, and therefore in this round algorithm $A$ incurs regret $\Omega(\epsilon\, v_{\max})$ under problem instance $I_i$. By the Pigeonhole Principle there exists an $i$ such that this happens for at least half of the rounds $t$, which gives the desired lower bound.

Case $k \ge 3$ requires a different (and somewhat more complicated) argument. Let $R = \beta\, k^{1/3}\, T^{2/3}$ and let $N$ be the number of bid-independent rounds. Assume $E_0[N] > R$. Then $E_0[N_i] \le \tfrac1k E_0[N]$ for some agent $i$.
For the problem instance $J_i$ there are, in expectation, $E_0[N - N_i] = \Omega(R)$ bid-independent rounds in which agent $i$ is not played; each of these contributes $\Omega(v_{\max})$ to regret, so the total regret is $\Omega(v_{\max}\, R)$.

From now on assume that $E_0[N] \le R$. Note that by the Pigeonhole Principle, there are more than $k/2$ agents $i$ such that $E_0[N_i] \le 2R/k$. Furthermore, let us say that an agent $i$ is good if $P_0[A_t = i] \le \tfrac45$ for more than $T/6$ different rounds $t$. We claim that there are more than $k/2$ good agents. Suppose not. If agent $i$ is not good then $P_0[A_t = i] > \tfrac45$ for at least $\tfrac56 T$ different rounds $t$, so if there are at least $k/2$ such agents then

$T = \sum_{t=1}^{T} \sum_{i=1}^{k} P_0[A_t = i] > \tfrac{k}{2} \cdot \tfrac{5T}{6} \cdot \tfrac45 = kT/3 \ge T$,

contradiction. Claim proved.

It follows that there exists a good agent $i$ such that $E_0[N_i] \le 2R/k$. Therefore the right-hand side of (4.1) is at most $O(\beta)$. Pick $\beta$ so that the right-hand side of (4.1) is at most $\tfrac{1}{10}$. Then by Claim 4.2 for at least $T/6$ different rounds $t$ we have $P_i[A_t = i] \le \tfrac{9}{10}$. In each such round, if agent $i$ is not played then algorithm $A$ incurs regret $\Omega(\epsilon\, v_{\max})$ on problem instance $I_i$. Therefore, the (total) regret of $A$ on problem instance $I_i$ is $\Omega(\epsilon\, v_{\max}\, T) = \Omega(v_{\max}\, k^{1/3}\, T^{2/3})$.

Theorem 4.3. In the setting of Theorem 4.1, fix $k$ and $v_{\max}$ and assume that $R(T; v_{\max}) = O(v_{\max}\, T^{\gamma})$ for some $\gamma < 1$. Then for every fixed $\delta \le \tfrac14$ and $\lambda < 2(1 - \gamma)$ we have $R_\delta(T; v_{\max}) = \Omega(\delta\, v_{\max}\, T^{\lambda})$.

Proof. Fix $\lambda \in (0, 2(1 - \gamma))$. Redefine the $\vec{\mu}_i$'s with respect to a different $\epsilon$, namely $\epsilon = T^{-\lambda/2}$. Define the problem instances $I_i$ in the same way as before: all bids are $v_{\max}$, the CTRs are given by $\vec{\mu}_i$.

Let us focus on agents 1 and 2. We claim that $E_1[N_1] + E_2[N_2] \ge \beta\, T^{\lambda}$, where $\beta > 0$ is a constant to be defined later. Suppose not. Fix all bids to be $v_{\max}$. For each round $t$, consider the event $S_t = \{A_t = 1\}$. Then by Claim 4.2

$|P_1[S_t] - P_2[S_t]| \le |P_0[S_t] - P_1[S_t]| + |P_0[S_t] - P_2[S_t]| \le O(\epsilon^2 (E_1[N_1] + E_2[N_2])) \le \tfrac14$

for a sufficiently small $\beta$. Now, $P_1[S_t] \ge \tfrac12$ for at least $T/2$ rounds $t$.
This is because otherwise on problem instance $I_1$ regret would be $R(T) \ge \Omega(\epsilon\, T\, v_{\max}) = \Omega(v_{\max}\, T^{1 - \lambda/2})$, which contradicts the assumption $R(T) = O(v_{\max}\, T^{\gamma})$. Therefore $P_2[S_t] \ge \tfrac14$ for at least $T/2$ rounds $t$, hence on problem instance $I_2$ regret is at least $\Omega(\epsilon\, T\, v_{\max})$, contradiction. Claim proved.

Now without loss of generality let us assume that $E_1[N_1] \ge \tfrac{\beta}{2}\, T^{\lambda}$. Consider the problem instance in which CTRs are given by $\vec{\mu}_1$, the bid of agent 2 is $v_{\max}$, and all other bids are $v_{\max}(1 - 2\delta)/(1 + 2\epsilon)$. It is easy to see that this problem instance has $\delta$-gap. Each time agent 1 is selected, the algorithm incurs regret $\Omega(\delta\, v_{\max})$. Thus the total regret is at least $\Omega(\delta\, N_1\, v_{\max}) = \Omega(\delta\, v_{\max}\, T^{\lambda})$.

Matching upper bound. Let us describe a very simple mechanism, called the naive MAB mechanism, which matches the lower bound from Theorem 4.1 up to polylogarithmic factors (and also the lower bound from Theorem 4.3, for $\gamma = \lambda = \tfrac23$ and constant $\delta$). [footnote 9]

Fix the number of agents $k$, the time horizon $T$, and the bid vector $b$. The mechanism has two phases. In the exploration phase, each agent is played for $T_0 := k^{-2/3}\, T^{2/3}\, (\log T)^{1/3}$ rounds, in a round-robin fashion. Let $c_i$ be the number of clicks on agent $i$ in the exploration phase. In the exploitation phase, an agent $i^* \in \arg\max_i c_i b_i$ is chosen and played in all remaining rounds. Payments are defined as follows: agent $i^*$ pays $\max_{i \in [k] \setminus \{i^*\}} c_i b_i / c_{i^*}$ for every click she gets in the exploitation phase, and all others pay 0. (Exploration rounds are free for every agent.) This completes the description of the mechanism.

Observation 4.4. Consider the stochastic MAB mechanism design problem with $k$ agents. The naive mechanism is normalized, truthful and has worst-case regret $R(T; v_{\max}) = O(v_{\max}\, k^{1/3}\, T^{2/3} \log^{2/3} T)$.

Proof. The mechanism is truthful by a simple second-price argument. [footnote 10] Recall that $c_i$ is the number of clicks agent $i$ got in the exploration phase. Let $p_i = \max_{j \ne i} c_j b_j / c_i$ be the price paid (per click) by agent $i$ if she wins (all) rounds in the exploitation phase. If $v_i \ge p_i$, then by bidding anything greater than $p_i$ agent $i$ gains $v_i - p_i$ utility per click irrespective of her bid, and by bidding less than $p_i$ she gains 0, so bidding $v_i$ is weakly dominant. Similarly, if $v_i < p_i$, then by bidding anything less than $p_i$ she gains 0, while by bidding $b_i > p_i$ she loses $p_i - v_i$ per click. So bidding $v_i$ is weakly dominant in this case too.
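The two phases and the payment rule of the naive MAB mechanism described above can be sketched in Python. This is a minimal simulation sketch under stated assumptions, not the authors' implementation: the function name `naive_mab_mechanism`, the `ctrs` argument (used only to simulate clicks), rounding $T_0$ up, capping it at $T/k$ for short horizons, breaking ties toward the lowest index, and charging 0 when the winner has no exploration clicks are all choices made here for illustration. It assumes at least two agents.

```python
import math
import random


def naive_mab_mechanism(bids, ctrs, horizon, seed=0):
    """Simulate one run of the explore-then-exploit mechanism (k >= 2 agents).

    bids[i] is agent i's bid; ctrs[i] is her click probability, used only to
    simulate clicks (the mechanism itself never reads it).
    Returns (winner, price_per_click, payments).
    """
    rng = random.Random(seed)
    k = len(bids)

    # T0 = k^(-2/3) * T^(2/3) * (log T)^(1/3); rounding up is an assumption.
    t0 = math.ceil(k ** (-2 / 3) * horizon ** (2 / 3) * math.log(horizon) ** (1 / 3))
    t0 = min(t0, horizon // k)  # assumption: keep exploration within the horizon

    # Exploration phase: round robin, independent of bids, free for everyone.
    clicks = [0] * k
    for t in range(k * t0):
        agent = t % k
        if rng.random() < ctrs[agent]:
            clicks[agent] += 1

    # Exploitation phase: play an agent maximizing c_i * b_i in all remaining rounds.
    winner = max(range(k), key=lambda i: clicks[i] * bids[i])
    runner_up = max(clicks[j] * bids[j] for j in range(k) if j != winner)
    # Per-click price: max over the other agents of c_j * b_j, divided by the winner's c.
    price = runner_up / clicks[winner] if clicks[winner] > 0 else 0.0

    payments = [0.0] * k
    for _ in range(horizon - k * t0):
        if rng.random() < ctrs[winner]:
            payments[winner] += price  # only exploitation clicks are charged
    return winner, price, payments
```

In this sketch the per-click price depends only on the other agents' bids and on the exploration clicks, so raising the winner's bid leaves her price unchanged, which mirrors the second-price argument in the proof.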
For the regret bound, let $(\mu_1, \dots, \mu_k)$ be the vector of CTRs, and let $\bar{\mu}_i = c_i / T_0$ be the sample CTRs. By Chernoff bounds, for each agent $i$ we have $\Pr[\,|\bar{\mu}_i - \mu_i| > r\,] \le T^{-4}$, for $r = \sqrt{8 \log(T) / T_0}$. If in a given run of the mechanism all estimates $\bar{\mu}_i$ lie in the intervals specified above, call the run clean. The expected regret from the runs that are not clean is at most $O(v_{\max})$, and can thus be ignored. From now on let us assume that the run is clean.

Footnote 9. Independently, Devanur and Kakade [15] presented a version of the naive MAB mechanism that achieves the same regret even in the more general model in which the value-per-click of an agent changes over time and the agents are allowed to submit a different bid at every round. Instead of assigning all impressions to the same agent in the exploitation phase, their mechanism runs the same allocation and payment procedure for each exploration round separately (see [15] for details).

Footnote 10. Alternatively, one can use Theorem 3.8, since all exploration rounds are bid-independent, only exploration rounds are influential, and the payments are exactly as defined in Theorem 3.1.

The regret in the exploration phase is at most $k\, T_0\, v_{\max} = O(v_{\max}\, k^{1/3}\, T^{2/3} \log^{1/3} T)$. For the exploitation phase, let $j = \arg\max_i \mu_i b_i$, and let $i^*$ be the agent chosen in the exploitation phase. Then (since we assume that the run is clean) we have

$(\mu_{i^*} + r)\, b_{i^*} \ge \bar{\mu}_{i^*} b_{i^*} \ge \bar{\mu}_j b_j \ge (\mu_j - r)\, b_j$,

which implies $\mu_j v_j - \mu_{i^*} v_{i^*} \le r (v_j + v_{i^*}) \le 2 r\, v_{\max}$. Therefore, the regret in the exploitation phase is at most $2 r\, v_{\max}\, T = O(v_{\max}\, k^{1/3}\, T^{2/3} \log^{2/3} T)$. Therefore the total regret is as claimed.

5. EXTENSIONS AND OPEN QUESTIONS

We extend our results in several directions which are fleshed out in the full version [8].

First, we derive a regret lower bound for deterministic truthful mechanisms without assuming that the allocations are scale-free. In particular, for two agents there are no assumptions. This lower bound holds for any $k$ (the number of agents) assuming IIA, but unlike the one in Theorem 4.1 it does not depend on $k$. [footnote 11]

Second, we extend our results to randomized mechanisms.
We consider randomized mechanisms that are universally truthful, i.e. truthful for each realization of the internal random seed. For mechanisms that randomize over exploration-separated deterministic allocation rules, we obtain the same lower bounds as in Theorem 4.1 and Theorem 4.3.

Third, we consider randomized allocation rules under a weaker version of truthfulness: a mechanism is weakly truthful if for each realization, it is truthful in expectation over its random seed. We show that any randomized allocation that is pointwise monotone and satisfies a certain notion of separation between exploration and exploitation can be turned into a mechanism that is weakly truthful and normalized. Then we apply this result to two algorithms in the literature [, 14] in order to obtain regret guarantees for the version of the MAB mechanism design problem in which the clicks are chosen by an adversary. (This version corresponds to the adversarial MAB problem [7, 14, 1, 9].) In particular, for oblivious (resp. adaptive) adversaries the upper bound matches our lower bound for deterministic allocations up to $(\log k)^{1/3}$ (resp. $k^{2/3}$) factors.

Fourth, we consider the stochastic MAB mechanism design problem under a more relaxed notion of truthfulness: truthfulness in expectation, where for each vector of CTRs the expectation is taken over clicks (and the internal randomness in the mechanism, if the latter is not deterministic). Following our line of investigation, we ask whether restricting a mechanism to be truthful in expectation has any implications on the structure and regret thereof. Given our results on mechanisms that are truthful and normalized, it is tempting to seek similar results for mechanisms that are truthful in expectation and normalized in expectation. [footnote 12] We rule out this approach: we show that in order to obtain any non-trivial lower bounds on regret and (essentially) any non-trivial structural results, one needs to assume that a mechanism is ex-post normalized, at least in some ap-

Footnote 11. One would expect to obtain such a bound by a reduction to the two-agent case. Interestingly, the trivial reduction fails.

Footnote 12. A mechanism is normalized in expectation if in expectation over clicks (and possibly over the allocation's randomness), each agent is charged an amount between 0 and her bid for each click she receives.
