Optimal Auctions for Correlated Buyers with Sampling

Hu Fu, Nima Haghpanah, Jason Hartline, Robert Kleinberg

October 11, 2017

Abstract

Crémer and McLean (1988) showed that, when buyers' valuations are correlated, a seller with full knowledge of the distribution can extract the full surplus. We study whether this phenomenon persists when the seller has only partial knowledge of the distribution. We assume that the seller knows that the distribution is one of finitely many, and has access to samples (draws) from the true distribution. A learning approach that uses the samples to infer the true distribution cannot extract the full surplus, whereas a mechanism that uses the samples for randomization extracts the full surplus for each distribution, given enough samples. Our main result is a tight bound on the number of samples needed for full surplus extraction, which is the difference between the number of distributions and the dimension of the linear space they span, plus one.

The authors would like to thank David Bindel, Nick Gravin, Bart Lipman, Luca Rigotti, Bruno Strulovici, and Gjergji Zaimi, and seminar participants at the Pennsylvania Economic Theory Conference, SAET, and the North American Summer Meeting of the Econometric Society for helpful discussions.

Affiliations (in author order): Department of Computer Science, University of British Columbia; Department of Economics, Penn State University; EECS Department, Northwestern University; Department of Computer Science, Cornell University.

1 Introduction

Crémer and McLean (1988) showed that if the values of the buyers in an auction are correlated, then generically the seller can design an auction that extracts the full surplus. A criticism of this result is that the seller requires full knowledge of the distribution of values in order to fine-tune the auction appropriately. Motivated by this criticism, we study the possibility of full surplus extraction when the seller has only partial knowledge of the distribution.

Our model of partial knowledge has two components. First, it is common knowledge among the seller and the buyers that the true distribution belongs to a given finite set.[1] Second, the seller has access to a signal that may be informative of the true distribution. The signal can be interpreted as the external information that the seller obtains via market research or by observing historical bid data. In the main application of our model, the signal consists of independent draws from the true distribution, which we refer to as samples. A sample can be interpreted as the observed profile of bids in a truthful past auction. A mechanism selects the allocation and payments based on the realized signal and the buyers' bids. Crucially, we assume that while the choice of the mechanism may affect the bids, it does not affect the realized signal. To justify this assumption in our application, one can assume that the participants in the current and the past auctions have the same observable characteristics (therefore the distributions of values in the current and past auctions are identical), but are otherwise different individuals.

Our goal is to identify the number of samples that guarantees the existence of a mechanism that extracts the full surplus for all possible distributions. We interpret the number of samples as a proxy for the amount of information required to extract surplus without full knowledge of the distribution.
[1] Our results hold if the buyers have additional information, for instance if they know the true distribution.

A natural approach to solving the problem is to use the samples for inference. That is, a mechanism may use the samples to identify the true distribution with high enough certainty, and then choose an auction. For example, one may use maximum likelihood estimation or a statistical learning method. However, if the possible distributions are hard to distinguish, such an approach may require a large number of samples. We consider two classes of mechanisms, show that they must learn the true distribution with high probability, and therefore must use a large number of samples.

First, we consider selecting an auction that extracts the full surplus for the identified distribution. A problem is that such an auction may violate individual rationality for the true distribution if the identified distribution is not the true distribution. Therefore individual rationality will be violated unless the true distribution is identified with certainty. Even if we relax the participation constraint of the mechanism to hold only approximately, the mechanism must identify the true distribution with high probability and requires a large number of samples.

Second, and to avoid violating the individual rationality constraint, we consider selecting an auction that respects individual rationality for all distributions in the set. The problem with this approach is that no single auction may extract the full surplus on all possible distributions. Therefore, to extract the full surplus, the mechanism must identify the true distribution with certainty. Even if we require the mechanism to extract only an approximation of full surplus, it must learn the true distribution with high probability and requires a large number of samples.

Our main result, Theorem 2, identifies the number of samples that is necessary and sufficient for full surplus extraction. The number of samples is equal to the number of distributions in the set, minus the dimension of the linear space spanned by them, plus one.[2] This number is at least one and at most the number of distributions minus one. If the distributions are linearly independent, which is the case if the set consists of two non-identical distributions, the number of samples is one. This is in stark contrast with the learning approach, in which the number of samples grows unboundedly even with two distributions and after relaxing full surplus extraction to hold approximately.

The full surplus extracting mechanism is a direct extension of the Crémer-McLean approach. The mechanism is a second price auction plus side payments that depend on the sample realization.
[2] It is also necessary that full surplus extraction is possible for every distribution in the set.

To extract full surplus, the side payments for each buyer must be constructed to match the buyer's utility from the second price auction in expectation. This approach is common in many extensions of Crémer-McLean (e.g. McAfee and Reny, 1992; Liu et al., 2017), and is used in Lopomo et al. (2017) in a mechanism design setting with Knightian uncertainty.

Our result shows that it is more effective to use the external information for randomization than for learning. Since buyers bid before the samples are revealed, they perceive the mechanism as random. Additionally, the probability of learning the true distribution with the number of samples identified in our main theorem may be arbitrarily small. In particular, we construct a parametric family of linearly independent distributions for which one sample suffices to extract the full surplus. However, as the number of distributions in the family grows or they get closer to each other, the probability of learning the true distribution with one sample goes to zero.

To prove sufficiency of the number of samples identified in Theorem 2, the main step is to show that the number guarantees the existence of appropriate side payments. Such side payments exist if, for each buyer, the conditional probability matrix that specifies the probability of a profile of other buyers' values and samples conditioned on the buyer's type (its value as well as the index of the distribution) has full rank. We crucially use the independence assumption on samples to decompose the conditional probability matrix as an outer product of conditional probability matrices of value profiles and samples. We then identify, for a given set of vectors (representing conditional probabilities for each value of a buyer), the number of times that each vector should be multiplied by itself (in the outer product sense) such that the resulting set of vectors is linearly independent.

A challenge in proving the necessity of the number of samples is that showing that the conditional probability matrix discussed above is rank deficient is not sufficient to argue that no full surplus extracting mechanism exists. The reason is that the set of possible utilities induced by the second price auction for different distributions may satisfy certain linear equations.[3] As a result, even though the conditional probability matrix discussed above is rank deficient, payments may exist that match utilities with the given linear relationship. To prove the theorem, we explicitly study the linear dependence of the conditional probability matrix. We show that in our constructed instances, the interim induced utilities of second price auctions may not be matched by any side payments.
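The outer-product idea described above can be illustrated on a toy case. The sketch below is only an illustration under stated assumptions, not the paper's construction: it works with raw distribution vectors rather than the full conditional probability matrix, the three distributions are made-up numbers, and the helper names `rank` and `tensor_power` are hypothetical. It computes the dimension of the span, the quantity "number of distributions minus dimension plus one" stated for Theorem 2, and the rank of the repeated outer products.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Exact rank of a list of equal-length rows via Gaussian elimination."""
    m = [list(r) for r in rows]
    rk = 0
    for col in range(len(m[0])):
        # Find a row at or below position rk with a nonzero entry in this column.
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


def tensor_power(vec, k):
    """k-fold outer product of vec with itself, flattened to one long vector."""
    out = []
    for idx in product(range(len(vec)), repeat=k):
        entry = Fraction(1)
        for i in idx:
            entry *= vec[i]
        out.append(entry)
    return out


# Hypothetical example: three distinct distributions over two outcomes.
dists = [
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(3, 10), Fraction(7, 10)],
    [Fraction(4, 5), Fraction(1, 5)],
]
m = len(dists)
d = rank(dists)                 # dimension of the linear span
samples_in_theorem = m - d + 1  # the count stated for Theorem 2

# Rank of the set of k-fold outer products, for k = 1 and k = 2.
ranks_by_power = {k: rank([tensor_power(p, k) for p in dists]) for k in (1, 2)}
print(d, samples_in_theorem, ranks_by_power)
```

In this toy case the three vectors are linearly dependent as given, but their two-fold outer products are linearly independent. Exact rational arithmetic is used so that rank decisions do not depend on a floating-point tolerance.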
Our work can be seen as a reexamination of features of commonly used auction models in order to identify a feature that is responsible for the unsettling prediction of full surplus extraction. Other papers have shown that full surplus extraction is not generically possible if the assumptions of risk neutrality and unlimited liability are relaxed (Robert, 1991). The particular feature we study is the assumption that the seller has full knowledge of the distribution of types. We show that this assumption is mostly innocent: full surplus extraction persists even with sample access to the true distribution.

[3] For example, the expected utility under a distribution that is a convex combination of a set of other distributions is itself a convex combination of the utilities under those other distributions.

1.1 Related Works

Other papers have studied the design of mechanisms without knowledge of the distribution of types. Segal (2003) considers a monopolist who has a prior over the possible distributions. Brooks (2013) considers a seller who only knows that buyers' beliefs are consistent with a common prior. In Bergemann et al. (2016), the seller knows the distribution of values but not the agents' beliefs. Neeman (2003) considers (among other models) a seller who chooses auction parameters based only on the expected values of the buyers. Importantly, in those models, unlike ours, the seller does not have external access to additional information about the distribution in the form of samples. Instead, the seller can infer a buyer's value statistically from other buyers' bids.

A literature related to our work studies the genericity of priors that admit full surplus extracting mechanisms in universal type spaces (Heifetz and Neeman, 2006; Barelli, 2009; Chen and Xiong, 2013). These papers seek implementation in Bayes-Nash equilibria, whereas our solution concept is ex post equilibrium, and hence we do not explicitly model higher order beliefs. More importantly, in those papers the distribution is known to the mechanism. That is, the question is whether a mechanism exists that can extract surplus for a given distribution. In our setting, on the other hand, we ask whether a single mechanism exists that can extract surplus for all distributions in a set (equipped with the additional power of having access to samples).

A large body of literature is dedicated to robust mechanism design (Bergemann and Morris, 2005; Chung and Ely, 2007), that is, mechanisms that are less sensitive to details such as the agents' higher order beliefs. Since we study ex post implementation, the buyers' higher order beliefs need not be known.
However, since our individual rationality condition is defined in expectation, a buyer's entry decision does depend on its beliefs. Therefore, for a buyer to deduce that other buyers will participate in the mechanism, it must be common knowledge among the agents that the distribution belongs to the set of candidate distributions. Interim individual rationality is an important feature in Crémer and McLean (1988) and its extensions, and one interpretation of our work is that it is interim individual rationality, and not full knowledge of the distribution, that is crucial to the existence of full surplus extracting mechanisms.

Our work is also related to the literature on model misspecification and model uncertainty (Madarász and Prat, 2017; Bergemann and Schlag, 2011). A key difference is that in the model misspecification literature, it is assumed that the designer knows a distribution that is close to the true distribution, whereas in our setting, the true distribution belongs to a set of possible distributions. Prior-independent revenue maximization and sampling have been extensively studied in the computer science literature (e.g. Fu et al., 2013; Cole and Roughgarden, 2014; Morgenstern and Roughgarden, 2016; Cai and Daskalakis, 2017), although these works all assume independent value distributions and focus on obtaining approximately optimal mechanisms. The most relevant to this work is the single-sampling auction of Dhangwatnotai et al. (2010), who showed that with one sample from each bidder's valuation distribution, the VCG auction with the samples as reserve prices gives a 4-approximation to the optimal revenue when the distributions are regular. As an extension, Roughgarden and Talgam-Cohen (2013) gave a single-sampling mechanism for the more general interdependent value settings under various assumptions, although the benchmark is the optimal revenue under ex post individual rationality.

2 The Setting

A seller has a single indivisible item to sell to at most one of $n$ potential buyers. Each buyer $i$ privately knows its value $v_i$. The value $v_i$ belongs to a finite set $V_i \subset \mathbb{R}_+$, with a minimum element $\underline{v}_i$ and a maximum element $\bar{v}_i$. Let $v = (v_1, \dots, v_n)$ be a profile of values, and $V = \prod_i V_i$ the set of possible value profiles. For a buyer $i$, $v_{-i} \in V_{-i} = \prod_{i' \neq i} V_{i'}$ is the profile of values of buyers other than $i$. A deterministic allocation is $x \in X = \{(x_1, \dots, x_n) \in \{0, 1\}^n : \sum_i x_i \le 1\}$, where $x_i$ is the indicator for buyer $i$'s allocation. A randomized allocation is $x \in \Delta(X)$, where $x_i$ is the probability that $i$ gets the item. Buyers are risk neutral and have quasilinear utilities in money. That is, the utility of player $i$ with value $v_i \in V_i$ for allocation $x$ and payment $p$ is $v_i x_i - p$. Although our results extend to more general settings, we restrict attention to this setting to focus on the key features of our model.[4] Let $S$ be a finite set of signals with elements $s \in S$.
[4] In particular, it is possible to extend the setting to allow for interdependence of utilities, in the sense that a buyer's willingness to pay depends on other buyers' signals. It is also possible to extend the setting to a multi-alternative setting with multidimensional types. In such a setting, each player $i$ has a finite set of preference types $\Theta_i$. Each player $i$ has a valuation function $v_i : A \times \Theta_i \to \mathbb{R}$. The utility of player $i$ with preference type $\theta_i \in \Theta_i$ for alternative $a \in A$ and payment $p \in \mathbb{R}$ is $v_i(a, \theta_i) - p$.

Let $F = \{F_1, \dots, F_m\}$ be a finite set of distinct joint distributions over value profiles $v \in V$ and signals $s \in S$. That is, $F_j \in \Delta(V \times S)$ for all $j$, where $F_j(v, s)$ is the probability of $(v, s)$ according to the $j$-th distribution. We assume that it is commonly known to all buyers and the seller that the true distribution belongs to $F$.

We invoke the revelation principle and focus on direct mechanisms. A (direct) mechanism consists of a pair of functions $(x, p)$. The function $x$ is the allocation function mapping actions and signals, possibly at random, to allocations, $x : V_1 \times \dots \times V_n \times S \to \Delta(X)$. The function $p$ is the payment function mapping actions and signals to payments, $p : V_1 \times \dots \times V_n \times S \to \mathbb{R}^n$.[5] A mechanism is ex post incentive compatible (ex post IC) if the following holds for all $i$, $v_i$, $v_i'$, $v_{-i}$, and $s$:[6]

$$v_i x_i(v_i, v_{-i}, s) - p_i(v_i, v_{-i}, s) \ge v_i x_i(v_i', v_{-i}, s) - p_i(v_i', v_{-i}, s).$$

[5] By risk neutrality there is no loss in focusing on deterministic payment rules.

[6] Recall that the allocation $x_i$ is allowed to be randomized. It may seem that a more restrictive definition would be to require the condition to hold for every internal random choice of the mechanism, instead of in expectation over the random choices of the mechanism, as is stated above. However, this is not the case. We will argue this after the characterization of incentive compatibility.

A mechanism is interim individually rational (IIR) for $F$ if, for any buyer $i$, value $v_i$, and distribution $F_j \in F$,

$$\mathbb{E}_{(v,s) \sim F_j}\left[ v_i x_i(v, s) - p_i(v, s) \mid v_i \right] \ge 0. \qquad (1)$$

That is, for any value $v_i$, buyer $i$ expects non-negative utility from participation, regardless of which distribution is the true distribution. Given a set of distributions $F = \{F_1, \dots, F_m\}$, our goal is to construct a mechanism that extracts full surplus for all distributions in the set.

Definition 1. Mechanism $(x, p)$ extracts full surplus on $F$ if
1. The mechanism is ex post IC.
2. The mechanism is IIR for $F$.
3. In expectation for each distribution in $F$, the revenue of the mechanism equals the highest value. That is, for all $j$,
$$\mathbb{E}_{(v,s) \sim F_j}\left[ \sum_i p_i(v, s) \right] = \mathbb{E}_{(v,s) \sim F_j}\left[ \max_i v_i \right].$$

We simply call a direct mechanism satisfying properties (1) and (2) above an F-feasible mechanism. We say that full surplus extraction is possible for $F$ if there exists a mechanism $(x, p)$ that extracts full surplus on $F$.

In our setting, the characterization of ex post IC mechanisms is standard. Since we consider finite sets of values, the payment function is not pinned down uniquely by the allocation function (even after fixing the utility of the lowest type). Instead, any monotone extension of the allocation function to real numbers can be used to define a payment function.

Lemma 1. A mechanism $(x, p)$ is ex post IC if and only if
1. $x_i$ is monotone non-decreasing in $v_i$ for all $i$, $v_{-i}$, and $s$.
2. $p_i(v, s) = p^A_i(v_{-i}, s) + v_i x_i(v, s) - \int_{z = \underline{v}_i}^{v_i} y_i(z, v_{-i}, s)\, dz$, for some function $y_i(\cdot, v_{-i}, s) : \mathbb{R} \to [0, 1]$, where $y_i(z, v_{-i}, s)$ is monotone non-decreasing in $z$ and coincides with $x_i$ on $V_i$, that is, $y_i(v_i, v_{-i}, s) = x_i(v_i, v_{-i}, s)$ for all $v_i \in V_i$.[7]

Recall our main application of a seller who does not know the distribution of values. In our application, the signal $s$ is the history of auction values, which may contain complete profiles of values, or simply the second highest value.[8] The signal is observable by all players and the outcome of the mechanism can depend on $s$. However, buyers bid in the auction before the signal $s$ is revealed. The goal is to design a mechanism that extracts full surplus for any distribution in $F$, assuming it is common knowledge among buyers that the true distribution is in $F$.

We often focus on a special case where the signal $s$ consists of independent draws from the distribution of values. Let $D_j$ be the marginal probability distribution of $F_j$ on value profiles, i.e., $D_j(v)$ is the probability of value profile $v$. We say that $F$ is a k-sample set if $S = V^k$ and for each $j$ and $(s_1, \dots, s_k) \in S$,

$$F_j(v, s) = D_j(v) \cdot D_j(s_1) \cdots D_j(s_k).$$

In this case, we refer to each $s_l$ as a sample, and abusing notation, represent $F$ as $\{D_1, \dots, D_m\}$. By specifying the number of samples $k$ needed, we quantify the amount of information required for full surplus extraction.
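The product form of a k-sample set can be made concrete with a small computation. The following is a minimal sketch, assuming a made-up two-buyer distribution; the function name `k_sample_joint` and the numbers in `D` are hypothetical and only serve to illustrate how the joint probability of a value profile and a tuple of samples factors.

```python
from fractions import Fraction
from itertools import product


def k_sample_joint(D, k):
    """Return F(v, s) = D(v) * D(s_1) * ... * D(s_k) as a dict keyed by (v, s)."""
    F = {}
    for v in D:
        # A signal s is a tuple of k value profiles, each drawn independently from D.
        for s in product(D, repeat=k):
            prob = D[v]
            for s_l in s:
                prob *= D[s_l]
            F[(v, s)] = prob
    return F


# Hypothetical two-buyer distribution over value profiles (v_1, v_2).
D = {
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 8),
    (2, 1): Fraction(3, 8),
    (2, 2): Fraction(1, 4),
}
F = k_sample_joint(D, k=2)
print(len(F), sum(F.values()))
```

The signal space grows as the k-th power of the number of value profiles, which is why the sketch is only practical for very small instances.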
[7] We now revisit the connection between internal randomization of a mechanism and the definition of ex post IC first discussed in footnote 6. Any randomized allocation and payment rule $(y_i, p_i)$ of Lemma 1 can be implemented by offering the buyer a random take-it-or-leave-it price $P$ such that $\Pr[P \le v_i] = y_i(v_i, v_{-i}, s)$ for all $v$, $s$, plus a transfer $p^A_i(v_{-i}, s)$. For any realization of $P$, truthfulness maximizes the buyer's utility. Therefore, the mechanism is ex post IC even accounting for internal randomization of the mechanism.

[8] This is justified by assuming that the past auctions were incentive compatible, and the bidders in the past and current auctions are different but come from the same distribution.

Another special case of our model is when the family of distributions is a singleton, $F = \{F_1\}$. In this case we simply say that the distribution is known to the mechanism. Note that in that case, the signal $s$ does not reveal any extra information about the distribution and thus there is no gain in conditioning a mechanism on $s$. In particular, there is no loss in assuming that $(x, p)(v, s) = (x, p)(v, s')$ for all $v$, $s$, and $s'$.[9] We will refer to such a mechanism as an auction, and use $A$ to denote the set of auctions. For an auction, ex post IC is equivalent to dominant strategy incentive compatibility (DSIC). Also, an auction is IIR for $F = \{F_1\}$ if for all $i$ and $v_i$,

$$\mathbb{E}_{v \sim D_1}\left[ v_i x_i(v) - p_i(v) \mid v_i \right] \ge 0.$$

[9] To see this, for $F = \{F_1\}$, consider an F-feasible mechanism $(x, p)$. Construct a mechanism $(\bar{x}, \bar{p})$ that redraws $s$ as follows: $\bar{x}(v, s) = \mathbb{E}_{(v', s') \sim F_1}[x(v, s') \mid v' = v]$ and $\bar{p}(v, s) = \mathbb{E}_{(v', s') \sim F_1}[p(v, s') \mid v' = v]$. Note that $(\bar{x}, \bar{p})$ is also F-feasible, is invariant of $s$, and extracts full surplus on $F = \{F_1\}$ if $(x, p)$ does.

Crémer and McLean (1988) showed that, if the distribution is known to the mechanism and under a fairly lenient condition on the value distribution, there exists a DSIC and interim IR auction that extracts the full surplus. To state the Crémer-McLean result, we start with some notation for distributions.

Definition 2. For $j \in \{1, \dots, m\}$, $i \in \{1, \dots, n\}$, and $v_i \in V_i$,
1. let $\mathbf{D}_j = (D_j(v))_{v \in V}$ be the distribution $D_j$ over $v$, represented as a vector of size $|V|$.
2. let $\mathbf{D}_j^{v_i} = (D_j(v_{-i} \mid v_i))_{v_{-i} \in V_{-i}}$ be the distribution $D_j$ over $v_{-i}$ conditioned on $v_i$, represented as a vector of size $|V_{-i}|$.

We now state the condition used by Crémer-McLean.

Definition 3. A valuation distribution $D_j$ is said to satisfy the Crémer-McLean condition if, for each bidder $i$, the $|V_i|$ vectors in $\{\mathbf{D}_j^{v_i}\}_{v_i \in V_i}$ are linearly independent.

We now restate the Crémer-McLean theorem in our setting.

Theorem 0 (Crémer and McLean, 1988). Consider the case of a known distribution $F = \{F_1\}$. There exists an auction that extracts full surplus for $F$ if and only if the marginal distribution $D_1$ of $F_1$ on value profiles satisfies the Crémer-McLean condition.[10]

[10] Strictly speaking, to obtain the only if result the setting needs to be generalized to the one described in footnote 4. Our possibility results apply to the generalized setting as well. To avoid the extra notation, we simply treat the Crémer-McLean condition as a necessary condition for our setting as well.

Let us call a family of distributions $F$ a Crémer-McLean family if for each $F_j \in F$, the marginal distribution on values $D_j$ satisfies the Crémer-McLean condition. Directly from Definition 1, one can see that if a mechanism extracts full surplus on $F$, then it must also extract full surplus on the singleton family $\{F_j\}$ for all $j$. Therefore, by Theorem 0, $F$ must be a Crémer-McLean family. We later identify conditions that are necessary and sufficient for full surplus extraction.

3 Limits of Learning Approaches

In this section we consider two natural approaches to the problem, and show via an example that they fail. To quantify the amount of information needed, throughout this section we focus on k-sample instances. First, we consider the approach of using the samples to identify a distribution and then selecting a full surplus extracting auction for the identified distribution. We show that any such mechanism violates individual rationality. More severely, even if individual rationality is relaxed to an approximate notion, any such mechanism requires many samples. Second, and to avoid violating the IR constraint, we formulate a class of signal-feasible mechanisms that use the samples to select a feasible auction to run (that is, the auction must be IIR for all distributions in $F$). We show that to get a revenue that is close to full surplus, any such mechanism requires access to a number of samples that goes to infinity as the desired error goes to zero. The analysis of this section is based on the following instance.

Example 1. Fix a positive real number $\epsilon$, $0 < \epsilon < 1$. Consider two buyers whose values are generated by the following process: two random variables $\nu_1, \nu_2 \in \{1, \dots, H\}$ are independently and identically drawn such that $\Pr[\nu_i \ge h] = 1/h$ for all $h \in \{1, \dots, H\}$. With probability $1 - \epsilon$, the two values are defined as $v_1 = \nu_1$ and $v_2 = \nu_2$; with probability $\epsilon$, the higher of the two random variables is assigned to buyer 1, and the lower to buyer 2, that is, $v_1 = \max(\nu_1, \nu_2)$, $v_2 = \min(\nu_1, \nu_2)$. Call the resulting correlated distribution $D_1$. Define another distribution $D_2$ by exactly the same procedure, but switching the values of the two buyers at the end.
As a result, player 1 is more likely to have a higher value than player 2 in $D_1$, and the opposite holds in $D_2$. The k-sample family of distributions is $F = \{D_1, D_2\}$. Let the full surplus $FS(H)$ be the expected maximum value in either distribution (the full surplus is identical for the two distributions).

The lemma below bounds the number of samples that are required to learn the true distribution, and is used in the analyses of both subsections. As $\epsilon$ gets smaller, the two distributions are harder to distinguish, and more samples are required to distinguish them. Formally, we say that a function $h : S \to \Delta(\{1, \dots, m\})$ learns the true distribution with error at most $\delta$ if $\Pr_{s \sim (D_j)^k}[h(s) \neq j] \le \delta$ for $j \in \{1, \dots, m\}$ (the probability is also over the randomization of $h$). The lemma identifies the number of samples required to learn the true distribution with error at most $\delta$.

Lemma 2. Consider the k-sample family of distributions in Example 1. If there exists a function $h : S \to \Delta(\{1, 2\})$ that learns the true distribution with error at most $\delta$, then the number of samples $k$ must be at least

$$(1 - 2\delta) \log\left((1/\delta) - 1\right) \frac{1 - \epsilon}{\epsilon^2}.$$

We additionally show that each distribution $D_1$ and $D_2$ satisfies the Crémer-McLean condition. Therefore, if the true distribution is known, there exists an auction that extracts full surplus.

Lemma 3. Both distributions $D_1$ and $D_2$ in Example 1 satisfy the Crémer-McLean condition. Therefore, for each $D_j$ there exists an auction that extracts full surplus.

3.1 Using the Samples to Select a Full Surplus Extracting Auction

Motivated by the fact that each distribution admits a full surplus extracting auction (Lemma 3), it is natural to consider an approach that uses the samples to identify the true distribution, and then selects an auction that extracts full surplus for the identified distribution. We call such a mechanism a learning Crémer-McLean mechanism. The lemma below shows that any learning Crémer-McLean mechanism violates IIR. In addition, even if the IIR requirement is relaxed, any such mechanism requires many samples to be approximately feasible. The main idea is that an auction that extracts full surplus on $D_1$ satisfies the IIR condition for $D_1$ with equality. On the other hand, an auction that extracts full surplus on $D_2$ violates the IIR condition for $D_1$ significantly. Therefore, for a learning Crémer-McLean mechanism to not violate the IIR condition by much, it must learn the true distribution with small error. But from Lemma 2 we know that the number of samples required to limit the misidentification probability is large.

Recall that $A$ is the set of auctions (i.e., signal invariant mechanisms).
A mechanism can be specified by a learning rule $L : S \to A$ that selects an auction $L(s)$ given the signal $s$. We say that a mechanism learns the true distribution with error at most $\delta$ if there exist mutually disjoint sets $A_1, \dots, A_m \subseteq A$ such that $\Pr_{s \sim (D_j)^k}[L(s) \notin A_j] \le \delta$ for $j \in \{1, \dots, m\}$.
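To get a feel for how quickly the sample requirement of Lemma 2 grows, the bound can be evaluated numerically. The sketch below rests on two assumptions that the text does not settle: it reads the bound as $(1 - 2\delta)\log((1/\delta) - 1)(1 - \epsilon)/\epsilon^2$, and it takes the logarithm to be natural. The function name `lemma2_sample_bound` is hypothetical.

```python
import math


def lemma2_sample_bound(delta, eps):
    """Evaluate (1 - 2*delta) * log(1/delta - 1) * (1 - eps) / eps**2.

    Assumes the natural logarithm; restricted to 0 < delta < 1/2 and 0 < eps < 1.
    """
    if not (0 < delta < 0.5 and 0 < eps < 1):
        raise ValueError("expects 0 < delta < 1/2 and 0 < eps < 1")
    return (1 - 2 * delta) * math.log(1 / delta - 1) * (1 - eps) / eps ** 2


# The bound grows as the two distributions become harder to distinguish.
for eps in (0.5, 0.1, 0.01):
    print(eps, lemma2_sample_bound(0.1, eps))
```

Under this reading, halving $\epsilon$ roughly quadruples the bound, while shrinking $\delta$ increases it only logarithmically.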

A mechanism $(x, p)$ is σ-IIR for distribution $F_j$ if for all $i$, $v_i$,

$$\mathbb{E}_{(v,s) \sim F_j}\left[ v_i x_i(v, s) - p_i(v, s) \mid v_i \right] \ge -\sigma.$$

Proposition 1. Consider the k-sample family of distributions $F$ in Example 1. For $H = 2$, consider any learning Crémer-McLean mechanism that is σ-IIR for each distribution in $F$. Then the mechanism must learn the true distribution with error at most $\delta' = 4\sigma/(0.5 + \epsilon)$ and the number of samples $k$ is at least

$$(1 - 2\delta') \log\left((1/\delta') - 1\right) \frac{1 - \epsilon}{\epsilon^2}.$$

Note that as either $\epsilon$ or $\sigma$ goes to zero, the number of samples goes to infinity.

3.2 Using the Samples to Select a Feasible Auction

To avoid violating the IIR condition, one approach would be to use the samples to learn about the true distribution, and then to select an F-feasible auction (an auction that respects IIR for all distributions in $F$). We show that to get a revenue that is close to full surplus on a k-sample $F$, any such mechanism must learn the true distribution with small error and therefore needs a large number of samples.

Formally, we call a mechanism $(x, p)$ a signal-feasible mechanism if for all $s$, the auction $(x(\cdot, s), p(\cdot, s))$ is F-feasible. That is, $(x, p)$ is ex post IC, and for all $i$, $v_i$, $D_j \in F$, and $s$,

$$\mathbb{E}_{v \sim D_j}\left[ v_i x_i(v, s) - p_i(v, s) \mid v_i \right] \ge 0.$$

Note that effectively the only restriction imposed on this class is that individual rationality must be satisfied for each realization $s$, whereas to satisfy IIR (Inequality 1), a mechanism needs to satisfy the condition only in expectation over $s$.

Theorem 1. Consider the k-sample family of distributions in Example 1. Consider a signal-feasible mechanism such that

$$\mathbb{E}_{v \sim D_j,\, s \sim (D_j)^k}\left[ \sum_i p_i(v, s) \right] \ge (1 - \delta) FS(H), \quad \forall j \in \{1, 2\}.$$

Then the mechanism must learn the true distribution with error at most $\delta'$,

$$\delta' = \frac{\delta\, FS(H)}{FS(H) - (4 + 2\epsilon)},$$

and the number of samples $k$ is at least

$$(1 - 2\delta') \log\left((1/\delta') - 1\right) \frac{1 - \epsilon}{\epsilon^2}. \qquad (2)$$

The full surplus $FS(H)$ is at least the expected value of each player, which is at least $\log(H)$. In the limit as $H$ gets large, $\delta' = \delta$ and therefore the bound of Theorem 1 simply becomes

$$k \ge (1 - 2\delta) \log\left((1/\delta) - 1\right) \frac{1 - \epsilon}{\epsilon^2}.$$

For a fixed $\delta$, the rate of growth is quadratic in $1/\epsilon$, and for a fixed $\epsilon$, the rate of growth is logarithmic in $1/\delta$.

To prove Theorem 1, we show that no F-feasible auction can achieve a large revenue for both distributions. Therefore, to perform adequately, any signal-feasible mechanism must limit the probability of misidentifying the distribution. We then invoke Lemma 2 to bound the number of samples.

To show that no F-feasible auction can achieve a large revenue for both distributions, we first invoke Lemma 1. Consider any auction $(x, p)$. By Lemma 1, the auction's payment is the sum of two parts $p^A$ and $p^B$,

$$p_i(v) = p^A_i(v_{-i}) + p^B_i(v); \qquad p^B_i(v) = v_i x_i(v) - \int_{z = \underline{v}_i}^{v_i} y_i(z, v_{-i})\, dz, \qquad (3)$$

where $y_i(\cdot, v_{-i})$ is an extension of $x_i$ with support over $[\underline{v}_i, \bar{v}_i]$. The part $p^A$ is the constant term that behaves as a lottery when $v_{-i}$ is drawn at random. The part $p^B$ maintains incentive compatibility for fixed $v_{-i}$, and $\int_{z = \underline{v}_i}^{v_i} y_i(z, v_{-i})\, dz$ is the information rent of type $v_i$. We bound the expectation of each part $p^A$ and $p^B$ in the following lemmas.

The first lemma shows that the expectation of $p^A$ cannot be high simultaneously for both distributions $D_1$ and $D_2$. In particular, it shows that for each player, the sum of the expectations of $p^A$ over $D_1$ and $D_2$ is at most 2. Note that the lemma does not directly imply a bound for either distribution in isolation. That is, $\mathbb{E}_{v \sim D_1}[p^A_1(v_{-1})]$ may be a large positive number, but only if $\mathbb{E}_{v \sim D_2}[p^A_1(v_{-1})]$ is a large negative number. Indeed, in the Crémer-McLean construction with a known distribution, $p^A$ is constructed to match the expected utility of a buyer in the second price auction.

To describe our approach, let us first define $D_0$ to be the distribution resulting from drawing $v_1$ and $v_2$ independently such that $\Pr[v_i \ge h] = 1/h$ for all $h \in \{1, \dots, H\}$. The proof outline is as follows. The distribution $D_0$, which is a product distribution, is the average of $D_1$ and $D_2$. Therefore, to prove the lemma we bound the expected payment in $D_0$. By IIR, the expected payment of the lowest value in $D_j$ cannot be too large, and since $D_0$ is close to $D_j$, the expected payment of the lowest value is small in $D_0$ as well. But since $D_0$ is a product distribution, the expected payments of all values are equal, and hence all are small.

Lemma 4. In Example 1, any F-feasible auction $(x, p)$ satisfies

$$\mathbb{E}_{v \sim D_1}\left[ p^A_i(v_{-i}) \right] + \mathbb{E}_{v \sim D_2}\left[ p^A_i(v_{-i}) \right] \le 2, \quad \forall i.$$

We next bound $p^B$. This bound follows from noting that each of the two distributions $D_1$ and $D_2$ is close to the distribution $D_0$, and that the expected revenue under distribution $D_0$ can be bounded using Myerson's characterization of optimal auctions for product distributions (Myerson, 1981).

The first lemma below follows from Myerson's characterization. The revenue curve associated with $D_0$ is a constant at 1, since the revenue of posting a price $p$ is $p \cdot \Pr[v_i \ge p] = p \cdot 1/p = 1$. Therefore the virtual value (marginal revenue) of any type is zero, and the expected virtual surplus is zero. Revenue is the expected virtual surplus plus the payment of the lowest type, which is 1 for each player.

Lemma 5. In Example 1, the expected revenue of any DSIC and ex post IR auction, over distribution $D_0$, is at most 2.

The next lemma follows from the fact that the probability of any profile of values $v$ in $D_1$ and $D_2$ is at most $1 + \epsilon$ times the probability of $v$ in $D_0$.

Lemma 6. In Example 1, any F-feasible auction $(x, p)$ satisfies

$$\mathbb{E}_{v \sim D_j}\left[ \sum_i p^B_i(v) \right] \le 2(1 + \epsilon), \quad \forall j \in \{1, 2\}.$$
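The three facts about Example 1 used in this outline (that $D_0$ is the average of $D_1$ and $D_2$, that posting any price on a single buyer earns revenue 1 under $D_0$, and that probabilities under $D_1$ and $D_2$ exceed those under $D_0$ by a factor of at most $1 + \epsilon$) can be checked exactly for small parameters. The following is a minimal sketch under that reading of Example 1; the function names and the choices $H = 5$, $\epsilon = 1/10$ are illustrative assumptions.

```python
from fractions import Fraction


def marginal(H):
    """Pr[nu = h] for h in 1..H, given Pr[nu >= h] = 1/h."""
    q = {h: Fraction(1, h) - Fraction(1, h + 1) for h in range(1, H)}
    q[H] = Fraction(1, H)
    return q


def example_distributions(H, eps):
    """Build D0 (product), D1 and D2 of Example 1 as dicts over (v1, v2)."""
    q = marginal(H)
    D0, D1, D2 = {}, {}, {}
    for a in q:
        for b in q:
            base = q[a] * q[b]
            # In D1, with probability eps the larger draw is given to buyer 1,
            # which scales profiles with v1 > v2 up and those with v1 < v2 down.
            tilt = eps if a > b else (-eps if a < b else 0)
            D0[(a, b)] = base
            D1[(a, b)] = base * (1 + tilt)
            D2[(a, b)] = base * (1 - tilt)
    return D0, D1, D2


H, eps = 5, Fraction(1, 10)  # illustrative parameter choices
q = marginal(H)
D0, D1, D2 = example_distributions(H, eps)

is_average = all(D0[v] == (D1[v] + D2[v]) / 2 for v in D0)
max_ratio = max(D[v] / D0[v] for D in (D1, D2) for v in D0)
# Revenue from posting price p to one buyer whose value follows the marginal.
posted_price_revenue = [p * sum(q[h] for h in q if h >= p) for p in range(1, H + 1)]
print(is_average, max_ratio, posted_price_revenue)
```

Exact fractions are used so that the equalities hold without rounding error.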
By combining Lemma 4 and Lemma 6, we conclude that no $\mathcal{F}$-feasible auction can achieve large revenue in expectation over both distributions.
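The combination can be spelled out as follows. This is a sketch that assumes Example 1 has two buyers, as the bound in Lemma 5 suggests, and uses the decomposition $p_i = p^A_i + p^B_i$ from Equation 3. Summing Lemma 4 over the two buyers and Lemma 6 over the two distributions gives
\[
\sum_{j \in \{1,2\}} E_{v \sim D^j}\Big[\sum_i p_i(v)\Big]
= \sum_i \Big( E_{v \sim D^1}\big[p^A_i(v)\big] + E_{v \sim D^2}\big[p^A_i(v)\big] \Big)
+ \sum_{j \in \{1,2\}} E_{v \sim D^j}\Big[\sum_i p^B_i(v)\Big]
\le 2 \cdot 2 + 2 \cdot 2(1 + \epsilon) = 8 + 4\epsilon.
\]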

Lemma 7. In Example 1, any $\mathcal{F}$-feasible auction $(x, p)$ satisfies
\[
E_{v \sim D^1}\Big[\sum_i p_i(v)\Big] + E_{v \sim D^2}\Big[\sum_i p_i(v)\Big] \le 8 + 4\epsilon.
\]

We can now combine Lemma 2 and Lemma 7 to prove Theorem 1. By Lemma 7, since no $\mathcal{F}$-feasible auction can perform well on both distributions, the learning rule must select an auction that performs well for the true distribution with high probability. Lemma 2 specifies the number of samples that are required for any given probability of identifying the true distribution.

Proof of Theorem 1. For $j \in \{1, 2\}$, let $M_j \subseteq M^{SI}(\mathcal{F})$ be the subset of all $\mathcal{F}$-feasible auctions $(x, p)$ that achieve a revenue of more than $4 + 2\epsilon$ in expectation over distribution $D^j$, that is,
\[
E_{v \sim D^j}\Big[\sum_i p_i(v)\Big] > 4 + 2\epsilon. \tag{4}
\]
By Lemma 7, any auction in $M^{SI}(\mathcal{F})$ that satisfies the above inequality for $j$ must violate it for $j' \ne j$; that is, $M_1 \cap M_2 = \emptyset$. Now consider any signal-feasible mechanism $(x, p)$ identified by a learning rule $L$. By definition of $M_j$, for any given $s$,
\[
E_{v \sim D^j}\Big[\sum_i p_i(v, s)\Big] \le 4 + 2\epsilon, \quad \text{if } L(s) \notin M_j,
\]
and by IIR,
\[
E_{v \sim D^j}\Big[\sum_i p_i(v, s)\Big] \le FS(H), \quad \text{if } L(s) \in M_j.
\]

(Recall that $FS(H)$ is the full surplus of either distribution in Example 1.) Therefore we can write
\[
\begin{aligned}
E_{v \sim D^j,\, s \sim (D^j)^k}\Big[\sum_i p_i(v, s)\Big]
&\le \Pr_{s \sim (D^j)^k}\big[L(s) \in M_j\big]\, FS(H) + \Big(1 - \Pr_{s \sim (D^j)^k}\big[L(s) \in M_j\big]\Big)(4 + 2\epsilon) \\
&= \Pr_{s \sim (D^j)^k}\big[L(s) \in M_j\big]\,\big(FS(H) - (4 + 2\epsilon)\big) + (4 + 2\epsilon).
\end{aligned}
\]
By the assumption of the theorem, for all $j$,
\[
E_{v \sim D^j,\, s \sim (D^j)^k}\Big[\sum_i p_i(v, s)\Big] \ge (1 - \delta)\, FS(H).
\]
We therefore must have
\[
\Pr_{s \sim (D^j)^k}\big[L(s) \in M_j\big]\,\big(FS(H) - (4 + 2\epsilon)\big) + (4 + 2\epsilon) \ge (1 - \delta)\, FS(H),
\]
and therefore, for all $j$,
\[
\Pr_{s \sim (D^j)^k}\big[L(s) \in M_j\big] \ge \frac{(1 - \delta)\, FS(H) - (4 + 2\epsilon)}{FS(H) - (4 + 2\epsilon)} = 1 - \frac{\delta\, FS(H)}{FS(H) - (4 + 2\epsilon)}.
\]
Define the function $h$ such that $h(s) = 1$ if $L(s) \in M_1$ and $h(s) = 2$ if $L(s) \in M_2$ (and define $h$ arbitrarily elsewhere). The above inequality becomes
\[
\Pr_{s \sim (D^j)^k}\big[h(s) = j\big] \ge 1 - \frac{\delta\, FS(H)}{FS(H) - (4 + 2\epsilon)}.
\]
The theorem follows from the above inequality and Lemma 2.

4 Mechanisms with Internal Samples

In this section we prove the main theorem of the paper. The theorem quantifies the amount of information required for full surplus extraction. Let us define the terminology. Recall from Section 2 that a family of distributions $\mathcal{F}$ is k-sample if in each $F^j \in \mathcal{F}$, the signal $s = (s_1, \dots, s_k)$ consists of $k$ independent draws from

$D^j$. Recall also that for each distribution $D^j$, $\vec{D}^j = (D^j(v))_{v \in V}$ is the vector of probabilities of value profiles $v$. The dimension of a vector space is the cardinality of its basis. The dimension of a k-sample family of distributions $\mathcal{F}$ is the dimension of the linear space spanned by $\{\vec{D}^1, \dots, \vec{D}^m\}$. Note that the dimension of $\mathcal{F}$ is between 2 and $m$.$^{11}$ Recall that a family of distributions $\mathcal{F}$ is a Crémer-McLean family if for each $F^j \in \mathcal{F}$, the marginal distribution on values $D^j$ satisfies the Crémer-McLean condition.

Theorem 2. Consider any $m$ and $d$ such that $2 \le d \le m$. Full surplus extraction is possible for all Crémer-McLean k-sample families of distributions $\mathcal{F}$ of size $m$ and span $d$ if and only if $k \ge m - d + 1$.

Recall from Section 2 that to get full surplus extraction on $\mathcal{F}$, it is necessary that each distribution $D^j$ satisfies the Crémer-McLean condition. The theorem shows that, additionally, having access to $k \ge m - d + 1$ samples is necessary and sufficient for full surplus extraction. Since $2 \le d \le m$, the sufficient number of samples is at least 1 and at most $m - 1$. The dimension of the space spanned by any two distinct distributions is 2. As a result, a single sample is sufficient to extract full surplus for any two distributions. Contrast this with Theorem 1 regarding signal-feasible mechanisms.

We prove the sufficient and necessary directions of Theorem 2 separately, in Section 4.3 (Proposition 3) and Section 4.4 (Proposition 4) respectively. We start by defining a class of mechanisms in Section 4.1 that extract full surplus given the conditions of Theorem 2.

4.1 CM Mechanism with Samples

We first define a class of mechanisms that extend the Crémer-McLean construction to our setting, without requiring $\mathcal{F}$ to be k-sample. Similar to the Crémer-McLean construction and its extensions (McAfee and Reny, 1992; Lopomo et al., 2017), a mechanism in this class consists of two components. First, a second-price auction is run.
11 The dimension cannot be 1. Otherwise, the distributions must be scaled versions of each other. This is not possible for probability distributions.

The allocation of the second price auction is efficient, but buyers have positive expected utility from participation in it. Second, to extract the remaining surplus from the buyers, each buyer makes an additional side payment to the mechanism. In order to ensure that these side payments do not violate incentive compatibility, each buyer's side payment depends only on the reports of other buyers and the realized signal (i.e., the payment does not depend on the buyer's

own report). The class is defined formally below. The side payments are unrestricted in the definition below, and will be constructed later to extract the full surplus.

Definition 4. A CM mechanism with samples works as follows:
1. The allocation is efficient. That is, $x_i(v, s) > 0$ only if $v_i = \max_j v_j$, and also $\sum_i x_i(v, s) = 1$ (ties among maximum bids are broken arbitrarily).
2. The payment consists of two parts. First, a second price payment $p^{SPA}_i(v)$ that is the second highest value if $i$ gets the item and zero otherwise. Second, a side payment $q_i(v_{-i}, s)$ that for each buyer $i$ depends only on the values of the other buyers and the signal. Each buyer's payment is the sum of the two parts, $p_i(v, s) = p^{SPA}_i(v) + q_i(v_{-i}, s)$.

For each buyer $i$ and value profile $v$, let $u^{SPA}_i(v)$ be the ex post utility of buyer $i$ in the second price auction.$^{12}$ For each buyer $i$ with value $v_i$ and distribution $j$, let
\[
u^{SPA}_{i,j}(v_i) = E_{v \sim D^j}\big[u^{SPA}_i(v) \mid v_i\big]
\]
be the interim expected utility of buyer $i$ in the second price auction. Now assume that there exist side payments $q_i$ such that
\[
E_{(v,s) \sim F^j}\big[q_i(v_{-i}, s) \mid v_i\big] = u^{SPA}_{i,j}(v_i), \quad \forall i, v_i, j. \tag{5}
\]
Then the interim utility of buyer $i$ with value $v_i$ is zero in the CM mechanism with samples, for any distribution $j$. Since the allocation of the mechanism is efficient, this implies full surplus extraction for any distribution. Therefore, full surplus extraction is possible if side payments that satisfy Equation 5 exist.

The lemma below specifies conditions for the existence of such side payment functions. It provides conditions on $\mathcal{F}$ under which a solution exists for any right hand side $u^{SPA}$, without assuming any structure on $u^{SPA}$. In particular, it shows that for any $i$, the set of possible interim expected side payments $\big\{\big(E_{(v,s) \sim F^j}[q_i(v_{-i}, s) \mid v_i]\big)_{j, v_i}\big\}_{q_i}$ is equal to $\mathbb{R}^{m |V_i|}$. The proof is standard.
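The right hand side of Equation 5 is straightforward to compute for a small instance. The sketch below is illustrative only: the function names and the two-buyer distribution `toy_dist` are hypothetical and are not taken from the paper, and buyers are indexed from 0. It computes the ex post second-price utility and the interim expected utility that a side payment would need to match.

```python
from fractions import Fraction

def spa_utility(i, v):
    """Ex post utility of buyer i in a second-price auction at value profile v.
    A tied winner pays its own value, so the tie-breaking rule does not matter."""
    highest_other = max(v[j] for j in range(len(v)) if j != i)
    return max(v[i] - highest_other, 0)

def interim_spa_utility(i, v_i, dist):
    """E[u_i^SPA(v) | v_i] for a joint distribution given as {profile: probability}."""
    mass = sum(p for v, p in dist.items() if v[i] == v_i)
    total = sum(p * spa_utility(i, v) for v, p in dist.items() if v[i] == v_i)
    return total / mass

# Hypothetical correlated two-buyer distribution (an assumption of this sketch).
toy_dist = {
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 8),
    (2, 1): Fraction(3, 8), (2, 2): Fraction(1, 4),
}
targets = {v_i: interim_spa_utility(0, v_i, toy_dist) for v_i in (1, 2)}
print(targets)
```

With one such target per pair $(v_i, j)$, Equation 5 becomes a linear system in the unknown side payments $q_i(v_{-i}, s)$.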
12 The utility is well defined regardless of how ties are broken, since in case of a tie, a buyer with maximum value has zero utility regardless of how the tie is broken.

Similar to Definition 2, we first define $\vec{F}^j_{v_i}$ as the vector representation of the probability of $(v_{-i}, s)$ under distribution $j$ conditioned on $v_i$.

Definition 5. For $j \in \{1, \dots, m\}$, $i \in \{1, \dots, n\}$, and $v_i \in V_i$,

let $\vec{F}^j_{v_i} = \big(F^j(v_{-i}, s \mid v_i)\big)_{v_{-i} \in V_{-i},\, s \in S}$ be the distribution $F^j$ over $(v_{-i}, s)$ conditioned on $v_i$, represented as a vector of size $|V_{-i}| \cdot |S|$.

Compare $\vec{F}^j_{v_i}$ with $\vec{D}^j_{v_i}$ in Definition 2. Whereas $\vec{F}^j_{v_i}$ is a distribution of $(v_{-i}, s)$, $\vec{D}^j_{v_i}$ is simply a distribution of $v_{-i}$.

Lemma 8. Consider a family of distributions $\mathcal{F} = \{F^1, \dots, F^m\}$. There exists a Crémer-McLean mechanism with samples that extracts the full surplus on $\mathcal{F}$ if, for each bidder $i$, the set of $|V_i| \cdot m$ vectors $\{\vec{F}^j_{v_i}\}_{v_i \in V_i,\, j \in \{1, \dots, m\}}$ is linearly independent.

Example 2. Let us revisit Example 1. For simplicity set $H = 2$ and $\epsilon = 0.5$. For buyer 1, the interim utilities in the second price auction are $u^{SPA}_{1,j}(1) = 0$ for all $j$, and $u^{SPA}_{1,j}(2) = \Pr[v_2 = 1 \mid v_1 = 2]$, which is equal to $3/4$ if $j = 1$, and $1/3$ if $j = 2$. We look for a side payment function $q_1(v_2, s)$ to match the interim utilities. It turns out that it suffices to use a nonzero side payment only when the two values observed in the sample differ. So let us assume $s \in \{1, 2\}$ denotes the player with the higher value. The probabilities are $\Pr_{D^1}[s = 1] = 3/8$, $\Pr_{D^1}[s = 2] = 1/8$, $\Pr_{D^2}[s = 1] = 1/8$, $\Pr_{D^2}[s = 2] = 3/8$. We need $q_1$ to solve the following system, whose rows are indexed by $(v_1, j)$ and whose columns are indexed by $(v_2, s)$:
\[
\begin{array}{c|cccc}
(v_1, j) \backslash (v_2, s) & (1,1) & (1,2) & (2,1) & (2,2) \\ \hline
(1,1) & \tfrac{2}{3}\cdot\tfrac{3}{8} & \tfrac{2}{3}\cdot\tfrac{1}{8} & \tfrac{1}{3}\cdot\tfrac{3}{8} & \tfrac{1}{3}\cdot\tfrac{1}{8} \\
(2,1) & \tfrac{3}{4}\cdot\tfrac{3}{8} & \tfrac{3}{4}\cdot\tfrac{1}{8} & \tfrac{1}{4}\cdot\tfrac{3}{8} & \tfrac{1}{4}\cdot\tfrac{1}{8} \\
(1,2) & \tfrac{1}{4}\cdot\tfrac{1}{8} & \tfrac{1}{4}\cdot\tfrac{3}{8} & \tfrac{3}{4}\cdot\tfrac{1}{8} & \tfrac{3}{4}\cdot\tfrac{3}{8} \\
(2,2) & \tfrac{1}{3}\cdot\tfrac{1}{8} & \tfrac{1}{3}\cdot\tfrac{3}{8} & \tfrac{2}{3}\cdot\tfrac{1}{8} & \tfrac{2}{3}\cdot\tfrac{3}{8}
\end{array}
\;
\begin{pmatrix} q_1(1,1) \\ q_1(1,2) \\ q_1(2,1) \\ q_1(2,2) \end{pmatrix}
=
\begin{pmatrix} 0 \\ 3/4 \\ 0 \\ 1/3 \end{pmatrix}
\begin{array}{l} u_{1,1}(1) \\ u_{1,1}(2) \\ u_{1,2}(1) \\ u_{1,2}(2) \end{array}
\]
Note that the probabilities in each row do not sum up to 1 since we removed the samples where the values are equal. The solution to the system is $q_1(1,1) = 6$, $q_1(1,2) = 6$, $q_1(2,1) = -17$, $q_1(2,2) = 3$.

4.2 Learning is not Necessary for Surplus Extraction

We now argue that it is possible to extract full surplus without learning the true distribution. Consider the following extension of Example 1 with $n$ buyers.

Example 3.
The k-sample family of distributions is $\mathcal{F} = \{D^1, \dots, D^n\}$, where $D^i$ is defined as follows. Draw $n$ random variables $\nu_1, \dots, \nu_n \in \{1, 2\}$ independently and identically such that $\Pr[\nu_{i'} \ge h] = 1/h$ for all $h \in \{1, 2\}$. With probability $1 - \epsilon$, the values are assigned

directly, $v_{i'} = \nu_{i'}$ for all $i'$; with probability $\epsilon$, player $i$ has the maximum value, $v_i = \max_{i'} \nu_{i'}$, and the remaining $n - 1$ random variables are assigned uniformly at random to the remaining $n - 1$ buyers.

The following proposition shows that the span of $\mathcal{F}$ in Example 3 is $n$, and therefore full surplus extraction is possible with only 1 sample. On the other hand, with only one sample it is not possible to learn the true distribution with high probability.

Proposition 2. Full surplus extraction is possible with 1 sample for the instance given in Example 3. With 1 sample, there exists no mechanism that learns the true distribution with error at most $(1 - \epsilon)(1 - 1/n)$.

4.3 The Upper Bound on the Number of Samples Needed

We now state the sufficiency part of Theorem 2.

Proposition 3. Consider a Crémer-McLean k-sample family of distributions $\mathcal{F}$ of size $m$ and span $d$. If $k \ge m - d + 1$, then there exists a Crémer-McLean mechanism with samples that extracts the full surplus on $\mathcal{F}$.

Proposition 3 is a corollary of Lemma 8 and the lemma below.

Lemma 9. Consider a Crémer-McLean k-sample family of distributions $\mathcal{F}$ of size $m$ and span $d$. If $k \ge m - d + 1$, then for each bidder $i$, the set of $|V_i| \cdot m$ vectors $\{\vec{F}^j_{v_i}\}_{v_i \in V_i,\, j \in \{1, \dots, m\}}$ is linearly independent.

The rest of this subsection proves Lemma 9. Notation will be greatly simplified by using outer products on vectors. The outer product of two vectors $A = (a_i)_{i \in I} \in \mathbb{R}^{I}$ of size $|I|$ and $B = (b_j)_{j \in J} \in \mathbb{R}^{J}$ of size $|J|$, denoted $C = A \otimes B$, is a vector $C = (a_1 B, \dots, a_{|I|} B)$ of size $|I| \cdot |J|$. Outer products are bilinear and associative, but in general are not commutative.$^{13}$ We use the following standard property of outer products.

Lemma 10. Consider a set of linearly independent vectors $\mathcal{A} = \{A_1, \dots, A_m\}$ and, for each $j = 1, \dots, m$, a set $\mathcal{B}_j$ of linearly independent vectors. The set of vectors $\{B \otimes A_j\}_{j \in \{1, \dots, m\},\, B \in \mathcal{B}_j}$ (of size $\sum_j |\mathcal{B}_j|$) is linearly independent.

We also establish the following property on independence of outer products of vectors.
Let $(\otimes A)^k$ denote the outer product of $k$ copies of $A$.

13 The products $A \otimes B$ and $B \otimes A$ are identical only up to permutation; for example, $(1, 2) \otimes (3, 4) = (1 \cdot (3, 4), 2 \cdot (3, 4)) = (3, 4, 6, 8)$ and $(3, 4) \otimes (1, 2) = (3 \cdot (1, 2), 4 \cdot (1, 2)) = (3, 6, 4, 8)$.
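The outer product as defined above, $C = (a_1 B, \dots, a_{|I|} B)$, coincides with the Kronecker product of two one-dimensional arrays. The following minimal sketch uses NumPy's `np.kron` to reproduce the non-commutativity example and to form the product of $k$ copies of a vector; the helper name `outer_power` is an assumption of this sketch, not the paper's notation.

```python
import numpy as np

def outer_power(a, k):
    """Outer product of k copies of the vector a, as a flat array (k >= 1)."""
    a = np.asarray(a)
    result = a
    for _ in range(k - 1):
        result = np.kron(result, a)
    return result

ab = np.kron([1, 2], [3, 4])   # (1*(3,4), 2*(3,4))
ba = np.kron([3, 4], [1, 2])   # (3*(1,2), 4*(1,2))
cube = outer_power([1, 2], 3)  # three copies of (1, 2), a vector of size 2**3

print(ab, ba, cube)
```

The two products contain the same entries in a different order, which is why the order of the factors has to be fixed consistently in Equation 6.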

Lemma 11. Consider a set of $m$ vectors $\{A_1, \dots, A_m\}$. Let $d$ be the dimension of the linear space spanned by $\{A_1, \dots, A_m\}$. The set of vectors $\{(\otimes A_1)^k, \dots, (\otimes A_m)^k\}$ is linearly independent if $k \ge m - d + 1$.

We next use Lemma 10 and Lemma 11 to prove Lemma 9.

Proof of Lemma 9. Fix a buyer $i$. Recall from Definition 5 the definition of $\vec{F}^j_{v_i}$: for all $j$ and $v_i$,
\[
\vec{F}^j_{v_i} = \big(F^j(v_{-i}, s \mid v_i)\big)_{v_{-i} \in V_{-i},\, s \in S}.
\]
Since $\mathcal{F}$ is k-sample, we have
\[
\vec{F}^j_{v_i} = \big(D^j(v_{-i} \mid v_i) \cdot D^j(s_1) \cdots D^j(s_k)\big)_{v_{-i},\, s}.
\]
Using the outer product notation, this simplifies to (see Definition 2 and Definition 5)
\[
\vec{F}^j_{v_i} = \vec{D}^j_{v_i} \otimes (\otimes \vec{D}^j)^k. \tag{6}
\]
Recall that by assumption, $k \ge m - d + 1$, where $d$ is the dimension of the linear space spanned by $\{\vec{D}^1, \dots, \vec{D}^m\}$. Therefore, by Lemma 11, the $m$ vectors in $\{(\otimes \vec{D}^1)^k, \dots, (\otimes \vec{D}^m)^k\}$ are linearly independent. Also, by the assumption that for each $j$ the distribution $D^j$ satisfies the Crémer-McLean condition, the $|V_i|$ vectors in $\{\vec{D}^j_{v_i}\}_{v_i \in V_i}$ are linearly independent. We can then apply Lemma 10 to conclude that the vectors in $\{\vec{F}^j_{v_i}\}_{v_i \in V_i,\, j \in \{1, \dots, m\}}$ are linearly independent. In particular, define the set $\mathcal{A} = \{(\otimes \vec{D}^1)^k, \dots, (\otimes \vec{D}^m)^k\}$ of linearly independent vectors as argued above, with $A_j = (\otimes \vec{D}^j)^k$. Also, for each $j \in \{1, \dots, m\}$, define the set $\mathcal{B}_j = \{\vec{D}^j_{v_i}\}_{v_i \in V_i}$. Given the Crémer-McLean condition, each $\mathcal{B}_j$ consists of linearly independent vectors. Now plug into Lemma 10 to conclude that the vectors in the set
\[
\{B \otimes A_j\}_{j \in \{1, \dots, m\},\, B \in \mathcal{B}_j} = \{\vec{D}^j_{v_i} \otimes (\otimes \vec{D}^j)^k\}_{j \in \{1, \dots, m\},\, v_i \in V_i} = \{\vec{F}^j_{v_i}\}_{j \in \{1, \dots, m\},\, v_i \in V_i}
\]
are also linearly independent.

4.4 The Lower Bound on the Number of Samples Needed

We now show that the number of samples in Theorem 2 is necessary.

Proposition 4. Consider any $m$ and $d$ such that $2 \le d \le m$. If $k \le m - d$, then there exists a Crémer-McLean k-sample family of distributions $\mathcal{F}$ of size $m$ and span $d$ such that full surplus extraction is not possible for $\mathcal{F}$.

Let us first point out a difficulty. Recall that Proposition 3 was established through Lemma 8, which ensured that for each buyer $i$, the set of possible interim expected side payments is equal to $\mathbb{R}^{m |V_i|}$. Thus, for any interim utilities $u^{SPA}_i$ of the second price auction, side payments $q_i$ exist that extract full surplus. To prove the converse of the theorem, it is not sufficient to show that the set of possible interim expected side payments is a strict subset of $\mathbb{R}^{m |V_i|}$. The reason is that the set of interim utilities $u^{SPA}_i$ is structured. For instance, consider distributions $D^1$, $D^2$, and $D^3$ such that for a buyer $i$, $D^3(v_{-i} \mid v_i) = D^1(v_{-i} \mid v_i)/2 + D^2(v_{-i} \mid v_i)/2$. Then it must be that $u^{SPA}_{i,3} = u^{SPA}_{i,1}/2 + u^{SPA}_{i,2}/2$. As a result, even though the set of interim expected side payments is not equal to $\mathbb{R}^{m |V_i|}$, a side payment may exist for each utility function satisfying $u^{SPA}_{i,3} = u^{SPA}_{i,1}/2 + u^{SPA}_{i,2}/2$.

We first prove the case where $d = 2$, and later discuss the generalization, which is a simple extension. The proof is based on the following instance.

Example 4. Player 1 has two possible values, $v_1 \in \{2, 3\}$. There are only two profiles of values that the other players can possibly have, $v^1_{-1}$ and $v^2_{-1}$.$^{14}$ We only assume that the maximum value in $v^1_{-1}$ and $v^2_{-1}$ is 1, that is, $\max_{j \ne 1} v^1_j = \max_{j \ne 1} v^2_j = 1$, and otherwise leave them unconstrained. Construct basis distributions $B_1$, $B_2$ as follows, with rows indexed by $v_1 \in \{2, 3\}$ and columns indexed by $v^1_{-1}$, $v^2_{-1}$:
\[
B_1 = \begin{array}{c|cc} & v^1_{-1} & v^2_{-1} \\ \hline 2 & 1/2 & 0 \\ 3 & 0 & 1/2 \end{array},
\qquad
B_2 = \begin{array}{c|cc} & v^1_{-1} & v^2_{-1} \\ \hline 2 & 0 & 1/2 \\ 3 & 1/2 & 0 \end{array}.
\]
Consider $\alpha_1$ to $\alpha_m$, where $0 \le \alpha_j \le 1$ and $\alpha_j \ne 1/2$. Construct each distribution $D^j$ in the family $\mathcal{F}$ as a convex combination of $B_1$ and $B_2$ with weight $\alpha_j$, that is, $D^j = \alpha_j B_1 + (1 - \alpha_j) B_2$.

Assume for contradiction that a full surplus extracting mechanism exists for Example 4.
14 Strictly speaking, $V_{-1}$ is any product set containing $v^1_{-1}$ and $v^2_{-1}$. We only consider $v^1_{-1}$ and $v^2_{-1}$ since they are the only profiles that may have positive probability.

Then player 1 must be allocated the item regardless of the profile of values, since player 1 has the highest value. As a result, player 1's utility from allocation, ignoring payments, is equal to its value. Therefore, to extract surplus the expected payment of each value must be equal

to the value,
\[
E_{v \sim D^j,\, s \sim (D^j)^k}\big[p_1(v, s) \mid v_1\big] = v_1, \quad \forall j, v_1.
\]
Since player 1 gets the item regardless of its value, incentive compatibility requires that the payment of player 1 does not depend on its report. We thus write the payment function of player 1 as $p_1(v_{-1}, s)$. The above equality becomes
\[
E_{v \sim D^j,\, s \sim (D^j)^k}\big[p_1(v_{-1}, s) \mid v_1\big] = v_1, \quad \forall j, v_1.
\]
Now consider any profile $\beta = (\beta_{j,v_1})_{j,v_1}$ such that $\sum_{j,v_1} \beta_{j,v_1} = 0$. Note that this implies that $\sum_j \beta_{j,2} = -\sum_j \beta_{j,3}$. Assume further that $\sum_j \beta_{j,2} \ne 0$. We must have
\[
\sum_{j,v_1} \beta_{j,v_1}\, E_{v \sim D^j,\, s \sim (D^j)^k}\big[p_1(v_{-1}, s) \mid v_1\big]
= \sum_j \beta_{j,2} \cdot 2 + \sum_j \beta_{j,3} \cdot 3
= (2 - 3)\Big(\sum_j \beta_{j,2}\Big) \ne 0.
\]
Summarizing the argument so far, we have shown that if a full surplus extracting mechanism exists, then there must exist a function $p_1 : V_{-1} \times S \to \mathbb{R}$ such that for any profile $\beta = (\beta_{j,v_1})_{j,v_1}$ satisfying i) $\sum_{j,v_1} \beta_{j,v_1} = 0$ and ii) $\sum_j \beta_{j,2} \ne 0$, we have
\[
\sum_{j,v_1} \beta_{j,v_1}\, E_{v \sim D^j,\, s \sim (D^j)^k}\big[p_1(v_{-1}, s) \mid v_1\big] \ne 0.
\]
The next lemma shows that no such function $p_1$ exists.

Lemma 12. Consider the family of distributions $\mathcal{F} = \{D^1, \dots, D^m\}$ defined in Example 4 and assume that the number of samples is $k \le m - 2$. There exists a profile $\beta = (\beta_{j,v_1})_{j,v_1}$ with i) $\sum_{j,v_1} \beta_{j,v_1} = 0$ and ii) $\sum_j \beta_{j,2} \ne 0$ such that any payment function $p_1 : V_{-1} \times S \to \mathbb{R}$ satisfies
\[
\sum_{j,v_1} \beta_{j,v_1}\, E_{v \sim D^j,\, s \sim (D^j)^k}\big[p_1(v_{-1}, s) \mid v_1\big] = 0.
\]
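The role of the threshold $k = m - 2$ for the family of Example 4 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a proof of Lemma 12: it picks $m = 4$ hypothetical weights $\alpha_j$ (distinct and different from $1/2$, where distinctness is an assumption of this sketch), flattens each $D^j$ into a vector, and computes the rank of the $k$-fold outer powers with NumPy. The helper names are illustrative.

```python
import numpy as np

def outer_power(a, k):
    """Outer product of k copies of the vector a, as a flat array (k >= 1)."""
    result = a
    for _ in range(k - 1):
        result = np.kron(result, a)
    return result

# Basis distributions of Example 4, flattened row by row (v_1 = 2, then v_1 = 3).
B1 = np.array([0.5, 0.0, 0.0, 0.5])
B2 = np.array([0.0, 0.5, 0.5, 0.0])

alphas = [0.1, 0.3, 0.7, 0.9]  # hypothetical weights, assumed distinct
family = [a * B1 + (1 - a) * B2 for a in alphas]
m = len(family)
d = int(np.linalg.matrix_rank(np.stack(family)))  # dimension of the span

def rank_of_powers(k):
    """Rank of the m vectors obtained as k-fold outer powers of the D^j."""
    return int(np.linalg.matrix_rank(np.stack([outer_power(D, k) for D in family])))

ranks = {k: rank_of_powers(k) for k in (1, 2, 3)}
print(d, ranks)
```

In this instance the outer powers are linearly dependent for $k \le m - d$ and become linearly independent at $k = m - d + 1$, which matches the condition of Lemma 11. The dependence at $k \le m - 2$ alone does not establish Lemma 12, which also has to account for the structure of the required payments.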