CSCI699: Topics in Learning and Game Theory
Lecture, October 23
Lecturer: Ilias                                Scribes: Ruixin Qiang and Alana Shine

Today's topic is auctions with samples.

1 Introduction to auctions

Definition 1. In a single item auction, $k$ bidders bid for one item. Auction design is the problem of designing a truthful mechanism that maximizes a certain objective.

Different objectives lead to different results:

- Social welfare (benefit the society): the VCG mechanism is optimal. In the single item case it is the second price auction. It is prior-free, i.e., it does not require any assumption about the distribution of the bidders' values (this is not the criterion studied in this lecture).

- Revenue (benefit the auctioneer): Myerson's auction (1981) is optimal in the average case. Here the guarantee holds only in the Bayesian setting, i.e., we assume the bidders' values $v_i$ are drawn from some distribution, $(v_1, v_2, \dots, v_k) \sim F$.

In this lecture we discuss revenue maximization. We also assume the bidders' values are drawn independently, i.e., $F = F_1 \times F_2 \times \dots \times F_k$.

Myerson's auction can be phrased as the VCG mechanism that maximizes virtual values. The virtual value of a bidder with value $v$ is defined to be

    $\phi(v) = v - \frac{1 - F(v)}{f(v)}$,

where $F$ is the c.d.f. and $f$ the p.d.f. of the value distribution, provided the distribution is regular ($\phi(v)$ is strictly increasing).

The main goal will be to understand the empirical version of Myerson's auction: instead of knowing the distribution $F$, we can only draw samples from it.

This problem is hard in general without assumptions. For example, if the distribution assigns a very tiny probability to a point with a very large value, then with any fixed number of samples we may never observe that large value and thus cannot provide any guarantee. Therefore, we need assumptions about the class of distributions.
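To make the definition of the virtual value concrete, here is a minimal sketch (not from the lecture) that evaluates $\phi$ for a known distribution; the uniform-on-$[0,1]$ example and all function names are assumptions chosen for illustration.

```python
def virtual_value(v, cdf, pdf):
    """Virtual value phi(v) = v - (1 - F(v)) / f(v) for a value distribution with c.d.f. F and p.d.f. f."""
    return v - (1.0 - cdf(v)) / pdf(v)

# Illustrative (assumed) test case: values uniform on [0, 1], so F(v) = v and f(v) = 1.
# Then phi(v) = 2v - 1, which is strictly increasing, so the distribution is regular.
uniform_cdf = lambda v: v
uniform_pdf = lambda v: 1.0

for v in (0.25, 0.5, 0.75):
    print(v, virtual_value(v, uniform_cdf, uniform_pdf))   # -0.5, 0.0, 0.5
```

In this uniform example the reserve price, i.e., the value where $\phi$ crosses zero, is $1/2$.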

2 Auction with Samples: A First Natural Approach

There are two criteria for auctions with samples:

- Sample complexity, i.e., the number of samples required.
- Computational complexity, i.e., the running time of the algorithm.

Information theory provides a lower bound $S(n, \epsilon, D)$ on the sample complexity. Here $D$ is the class of distributions we assume $F$ lies in. The algorithm we design should have sample complexity $\mathrm{poly}(S)$ and computational complexity $\mathrm{poly}(S)$.

Open problem: an efficient algorithm with sample complexity $\tilde{O}(S)$.

Here is our first approach:

1. Take $m$ samples from each $F_i$;
2. Form the empirical distribution $\hat{F}_i$;
3. Run Myerson's auction with the $\hat{F}_i$.

This approach does not work, but a variant of it will.

3 Learning distributions

The problem of learning distributions is defined as follows.

Definition 2. Given a family of distributions $D$ and $m$ i.i.d. samples from a distribution $p \in D$, output a distribution $h$ such that, with high probability, $d(h, p) \le \epsilon$. Here $d(h, p)$ measures the distance between two distributions.

Since the definition is generic, there are three points we need to address:

- The distance function: we will discuss the total variation distance and the Kolmogorov distance.
- The number of samples required, $m(\epsilon, D, d)$.
- The meaning of "with high probability": the randomness comes from the samples and from the algorithm.
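As a minimal illustration of Definition 2 (and of step 2 of the first approach above), the following sketch draws $m$ i.i.d. samples and outputs their empirical distribution over a discrete domain $[n]$; the function name and the Dirichlet "true" distribution used in the demo are assumptions, not from the notes.

```python
import numpy as np

def empirical_distribution(samples, n):
    """Plug-in learner: output hat{p}_m(i) = N_i / m over the domain {0, ..., n-1}."""
    m = len(samples)
    counts = np.bincount(samples, minlength=n)
    return counts / m

# Draw m i.i.d. samples from an arbitrary "true" distribution p over [n] and learn it.
rng = np.random.default_rng(0)
n, m = 10, 5000
p = rng.dirichlet(np.ones(n))            # assumed "true" distribution, for the demo only
samples = rng.choice(n, size=m, p=p)
p_hat = empirical_distribution(samples, n)
```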

3.1 Learning discrete distributions subject to total variation distance

We take $D$ to be the class of all discrete distributions over the domain $[n]$. For two discrete distributions with p.m.f.s $p, q$, the total variation distance between them is defined as

    $d_{tv}(p, q) = \max_{A \subseteq [n]} |p(A) - q(A)| = \frac{1}{2} \| p - q \|_1$.

Given $m$ samples $S_1, \dots, S_m$, the optimal algorithm is simply to output the empirical distribution: letting $N_i = |\{ j \in [m] : S_j = i \}|$,

    $\hat{p}_m(i) = \frac{N_i}{m}$.

When $m$ is large enough, we will have $d_{tv}(p, \hat{p}_m) \le \epsilon$.

To bound the number of samples $m$ needed, note that $d_{tv}(p, \hat{p}_m)$ is a random variable. We first find the $m$ that makes $E[d_{tv}(p, \hat{p}_m)] \le \epsilon / C$, and then use Markov's inequality to bound the tail probability:

    $\Pr[d_{tv}(p, \hat{p}_m) > \epsilon] \le \frac{1}{C}$.

In the following proof we ignore the constant $C$, since it only rescales $\epsilon$.

Theorem 3. When $m = \Omega(n / \epsilon^2)$, the expected total variation distance between $p$ and $\hat{p}_m$ is upper bounded by $\epsilon$.

Proof. Since $N_i \sim \mathrm{Bin}(m, p(i))$, we have $E[N_i] = m\, p(i)$ and $\mathrm{Var}[N_i] = m\, p(i)(1 - p(i))$. Then

    $E[d_{tv}(p, \hat{p}_m)] \le \frac{1}{m} \sum_{i=1}^{n} E[|N_i - m\, p(i)|] \le \frac{1}{m} \sum_{i=1}^{n} \sqrt{E[(N_i - m\, p(i))^2]} = \frac{1}{m} \sum_{i=1}^{n} \sqrt{m\, p(i)(1 - p(i))} \le \frac{1}{m} \sum_{i=1}^{n} \sqrt{m\, p(i)} \le \sqrt{\frac{n}{m}}$.

The second and last inequalities follow from the Cauchy-Schwarz inequality. When $m \ge n / \epsilon^2$, we have $E[d_{tv}(p, \hat{p}_m)] \le \epsilon$.

The bound of $n / \epsilon^2$ is tight; here we provide a constructive proof.

Proof. We construct a distribution family $D$ by perturbing the uniform distribution over $[n]$. For each pair $(2i - 1, 2i)$, set the probabilities to be either $\left(\frac{1 - \epsilon}{n}, \frac{1 + \epsilon}{n}\right)$ or $\left(\frac{1 + \epsilon}{n}, \frac{1 - \epsilon}{n}\right)$. Thus there are $2^{n/2}$ different distributions in the family. To learn a distribution $h$ from samples of $p \in D$ such that $d_{tv}(p, h) \le \epsilon / 4$, we need to find the correct perturbation direction for at least $n/4$ of the pairs. Finding the correct perturbation direction for a pair requires $\Omega(1/\epsilon^2)$ samples from that pair (by information theory). Thus in total, at least $\Omega(n / \epsilon^2)$ samples are required.

As $n$ goes to infinity, no finite number of samples suffices to learn an arbitrary discrete distribution in total variation distance. However, for certain families of distributions we can learn to total variation distance at most $\epsilon$ with sample complexity $\mathrm{poly}(1/\epsilon, \log n)$:

- If the class $D$ is all log-concave distributions over $[n]$, we can learn with $\Theta(1/\epsilon^{5/2})$ samples.

- The hazard rate of a distribution is $f(x) / (1 - F(x))$, where $f$ is the p.d.f. and $F$ the c.d.f. If the hazard rate is monotone non-decreasing, we can learn with $O(\log(n) / \epsilon^3)$ samples.

If we make no assumptions about the distribution, then to achieve error probability $\delta$ the previous analysis together with Markov's inequality requires $\frac{1}{\delta} \cdot \frac{n}{\epsilon^2}$ samples. With a more careful analysis, one can show that

    $m = \Theta\!\left(\frac{n + \log(1/\delta)}{\epsilon^2}\right)$

samples suffice.
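The $\sqrt{n/m}$ rate in Theorem 3 can be checked empirically. The following is a small simulation sketch, not part of the notes, that estimates $E[d_{tv}(p, \hat{p}_m)]$ by Monte Carlo and compares it to $\sqrt{n/m}$; the uniform test distribution and all names are assumptions for illustration.

```python
import numpy as np

def tv_distance(p, q):
    """d_tv(p, q) = (1/2) * ||p - q||_1 for discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

def expected_tv_error(p, m, trials=200, seed=0):
    """Monte Carlo estimate of E[d_tv(p, hat{p}_m)] for the empirical estimator."""
    rng = np.random.default_rng(seed)
    n = len(p)
    errs = []
    for _ in range(trials):
        samples = rng.choice(n, size=m, p=p)
        p_hat = np.bincount(samples, minlength=n) / m
        errs.append(tv_distance(p, p_hat))
    return float(np.mean(errs))

n = 50
p = np.full(n, 1.0 / n)                      # uniform over [n], an assumed test case
for m in (100, 400, 1600, 6400):
    print(m, expected_tv_error(p, m), (n / m) ** 0.5)   # estimate vs. the sqrt(n/m) bound
```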

4 Learning distributions subject to Kolmogorov distance

Instead of making stronger assumptions on the distribution to achieve sample complexity independent of the support size, we can change the metric under which we learn from the total variation distance to the Kolmogorov distance.

Let $X$ be a random variable supported over $\mathbb{R}$, and let $F$ be the c.d.f. of $X$,

    $F(u) = \Pr[X \le u]$.

Learning subject to the Kolmogorov distance means learning with respect to

    $d_K(X, Y) = \max_{u \in \mathbb{R}} |F_X(u) - F_Y(u)|$.

Theorem 4 (DKW inequality). For any $X$ over $\mathbb{R}$, we can learn a distribution $Y$ with $O(1/\epsilon^2)$ samples such that $d_K(X, Y) \le \epsilon$.

Proof. Consider the empirical c.d.f. built from $m$ samples $x_1, \dots, x_m$,

    $\hat{F}_m(u) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{x_i \le u\}$.

Then

    $\max_{u \in \mathbb{R}} |\hat{F}_m(u) - F(u)| \le \epsilon$

with probability at least $9/10$. Using Chernoff bounds, $m = O\!\left(\frac{1}{\epsilon^2} \log \frac{10}{\epsilon}\right)$ samples suffice. This can also be proved using martingales or chaining.

5 Auction with Samples

Suppose we have a single-bidder, single-item auction, and assume that the distribution of the bidder's value is supported on $[0, 1]$. We will show that with $O(1/\epsilon^2)$ samples from $F$ we can achieve revenue at least $OPT - \epsilon$.

By DKW, using $O(1/\epsilon^2)$ samples from $F$ we can find $\hat{F}$ such that $\sup_{u \in \mathbb{R}} |F(u) - \hat{F}(u)| \le \epsilon$. We use this $\hat{F}$ to compute a price $\hat{p}$ that maximizes the empirical revenue:

    $\hat{p} \in \arg\max_{u \in [0,1]} u (1 - \hat{F}(u))$.

Recall that if we know $F$, the optimal price is exactly $p^* \in \arg\max_{u \in [0,1]} u (1 - F(u))$, and this $p^*$ achieves the optimal revenue $OPT = p^* (1 - F(p^*))$. Then

    $\mathrm{Rev}(\hat{p}) = \hat{p} (1 - F(\hat{p})) \ge \hat{p} (1 - \hat{F}(\hat{p}) - \epsilon) = \hat{p} (1 - \hat{F}(\hat{p})) - \epsilon \hat{p} \ge p^* (1 - \hat{F}(p^*)) - \epsilon \hat{p} \ge p^* (1 - F(p^*) - \epsilon) - \epsilon \hat{p} = p^* (1 - F(p^*)) - \epsilon (\hat{p} + p^*) \ge OPT - 2\epsilon$,

where the last inequality uses the fact that prices are bounded within the interval $[0, 1]$.

But what if the values are not bounded? Consider the following value distribution.
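The argument above is constructive: build $\hat{F}$ from the samples and post the price that maximizes the empirical revenue. Below is a minimal sketch of that procedure, assuming a continuous value distribution on $[0, 1]$; since the empirical revenue only changes at sample points, it suffices to search over the samples themselves. The Beta(2, 5) demo distribution and the function name are illustrative assumptions.

```python
import numpy as np

def empirical_reserve_price(samples):
    """Price maximizing the empirical revenue u * (fraction of sampled values >= u).

    This is the sample analogue of argmax_u u * (1 - F(u)) (for a continuous
    distribution, Pr[value >= u] = 1 - F(u)); the maximizer can be taken among
    the observed sample values themselves.
    """
    xs = np.sort(np.asarray(samples, dtype=float))
    m = len(xs)
    revenues = xs * (m - np.arange(m)) / m     # xs[j] * Pr_hat[value >= xs[j]]
    return float(xs[int(np.argmax(revenues))])

# Demo with an assumed continuous value distribution on [0, 1] (Beta(2, 5), purely illustrative).
rng = np.random.default_rng(0)
samples = rng.beta(2.0, 5.0, size=2000)
print("empirical reserve price:", empirical_reserve_price(samples))
```

By the analysis above, with $O(1/\epsilon^2)$ samples the posted price earns expected revenue at least $OPT - 2\epsilon$ when values lie in $[0, 1]$.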

The value is $M^2$ with probability $\frac{1}{M}$ and $0$ otherwise. In order to avoid a sample complexity that depends on $M$, we need an assumption on the distribution.

We now generalize to the single-item, $k$-bidder model. Assume that $F$ is a product distribution, $F = F_1 \times F_2 \times \dots \times F_k$, with $(v_1, \dots, v_k) \sim F$.

Theorem 5. In the single-item auction with $k$ bidders and independent regular distributions, if $m = \Omega\!\left(\frac{k^{10}}{\epsilon^7}\right)$, then there is an $m$-sample auction, computable in time polynomial in $k$ and $1/\epsilon$, with expected revenue at least $OPT - \epsilon$.

Theorem 6. Any such auction requires a number of samples polynomial in $k/\epsilon$.

A regular distribution is one where the virtual value $x - \frac{1 - F(x)}{f(x)}$ is non-decreasing. Let $f_i$ be the p.d.f. of the value distribution of the $i$-th bidder and $F_i$ the corresponding c.d.f. Then the virtual value of bidder $i$ with value $v$ is

    $\phi_i(v) = v - \frac{1 - F_i(v)}{f_i(v)}$.

Fact: If the $F_i$'s are regular, then Myerson's auction computes the VCG rule on the virtual values. If all virtual values are less than zero, no one gets the item; otherwise the item goes to the bidder with the highest virtual value.
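To make the Fact concrete, here is a minimal sketch of the allocation step only (payments omitted), assuming the distributions, and hence the virtual value functions $\phi_i$, are known; the uniform-bidder test case is an illustrative assumption.

```python
import numpy as np

def myerson_allocate(values, virtual_value_fns):
    """Allocation rule from the Fact: the item goes to the bidder with the highest
    virtual value, provided it is non-negative; otherwise the item is unallocated.

    values            : reported values v_1, ..., v_k
    virtual_value_fns : functions phi_i(v) = v - (1 - F_i(v)) / f_i(v)
    Returns the winner's index, or None if no one gets the item.
    """
    phis = [phi(v) for phi, v in zip(virtual_value_fns, values)]
    winner = int(np.argmax(phis))
    return winner if phis[winner] >= 0 else None

# Assumed test case: two bidders with uniform-[0, 1] values, so phi(v) = 2v - 1.
phi_uniform = lambda v: 2.0 * v - 1.0
print(myerson_allocate([0.8, 0.3], [phi_uniform, phi_uniform]))   # 0 (highest virtual value)
print(myerson_allocate([0.2, 0.3], [phi_uniform, phi_uniform]))   # None (all virtual values < 0)
```

The winner's payment (not shown) is the smallest value they could have reported and still won, namely $\phi_i^{-1}\!\left(\max\{0, \max_{j \ne i} \phi_j(v_j)\}\right)$.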