Lecture 3: Lower Bounds for Bandit Algorithms
CMSC 858G: Bandits, Experts and Games    09/19/16
Lecture 3: Lower Bounds for Bandit Algorithms
Instructor: Alex Slivkins    Scribed by: Soham De & Karthik A Sankararaman

1 Lower Bounds

In this lecture (and the first half of the next one), we prove an Ω(√(KT)) lower bound on the regret of bandit algorithms. This gives us a sense of the best possible upper bounds on regret that we can hope to prove. At a high level, there are two ways of proving a lower bound on regret:

(1) Give a family F of problem instances, the same for all algorithms, such that any algorithm fails (has high regret) on some instance in F.

(2) Give a distribution over problem instances, and show that, in expectation over this distribution, any algorithm fails.

Note that (2) implies (1), since if regret is high in expectation over problem instances, then there exists at least one problem instance with high regret. Also, (1) implies (2) if |F| is a constant. This can be seen as follows: suppose we know that any algorithm has high regret (say H) on one problem instance in F and low regret on all other instances in F; then, taking a uniform distribution over F, any algorithm has expected regret at least H/|F|. (So this argument breaks down if |F| is large.) If we prove a stronger version of (1) which says that, for any algorithm, regret is high on a constant fraction of the problem instances in F, then, considering a uniform distribution over F, this implies (2) regardless of whether |F| is large or not.

In this lecture, for proving lower bounds, we consider 0-1 rewards and the following family of problem instances (with fixed ɛ to be adjusted in the analysis):

    I_j = { µ_i = 1/2        for each arm i ≠ j,
          { µ_i = (1+ɛ)/2    for arm i = j,          for each j = 1, 2, ..., K.    (1)

(Recall that K is the number of arms.) In the previous lecture, we saw that sampling each arm Õ(1/ɛ²) times is sufficient for the upper bounds on regret that we derived.
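To make the family (1) concrete, here is a small Python sketch (not part of the original notes; the helper names are made up for illustration) that builds an instance I_j and samples 0-1 rewards from it:

```python
import random

def make_instance(K, j, eps):
    """Mean rewards for instance I_j from (1): arm j has mean (1+eps)/2,
    every other arm has mean 1/2. (Illustrative helper, not from the notes.)"""
    return [(1 + eps) / 2 if i == j else 0.5 for i in range(K)]

def pull(mu, arm, rng):
    """Draw a 0-1 reward for the given arm, with mean mu[arm]."""
    return 1 if rng.random() < mu[arm] else 0

rng = random.Random(0)
mu = make_instance(K=4, j=2, eps=0.2)   # instance I_2 with K = 4 arms
rewards = [pull(mu, 2, rng) for _ in range(1000)]  # 1000 pulls of the best arm
```

The empirical mean of `rewards` concentrates around (1+ɛ)/2 = 0.6, which is exactly the gap the lower-bound argument will show is hard to detect with too few samples.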
In this lecture, we prove that sampling each arm Ω(1/ɛ²) times is necessary to determine whether an arm is bad or not. The proof methods require KL divergence, an important tool from information theory. In the next section, we briefly study the KL divergence and some of its properties.

2 KL-divergence

Consider a finite sample space Ω, and let p, q be two probability distributions defined on Ω. Then the Kullback-Leibler divergence, or KL-divergence, is defined as:

    KL(p, q) = Σ_{x∈Ω} p(x) ln( p(x)/q(x) ) = E_p[ ln( p(x)/q(x) ) ].
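As a sanity check on the definition, a few lines of Python (an illustrative sketch, not from the notes) compute the KL-divergence of two distributions over a finite sample space, represented as dicts mapping outcomes to probabilities:

```python
import math

def kl(p, q):
    """KL(p, q) = sum_x p(x) ln(p(x)/q(x)) over a finite sample space.
    Assumes q(x) > 0 wherever p(x) > 0. (Illustrative helper, not from
    the notes.)"""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

p = {0: 0.3, 1: 0.7}
q = {0: 0.5, 1: 0.5}
print(kl(p, q))  # positive, since p != q
print(kl(p, p))  # 0 for identical distributions
```

Note that `kl(p, q)` and `kl(q, p)` generally differ, illustrating the asymmetry discussed next.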
The KL divergence is similar to a notion of distance, with the properties that it is non-negative, equals 0 iff p = q, and is small if the distributions p and q are close. However, it is not strictly a distance function, since it is not symmetric and does not satisfy the triangle inequality. The intuition for the formula is as follows: we are interested in how plausible it is that data generated from distribution p actually came from distribution q. The KL divergence answers this question by measuring the average log-likelihood ratio between p and q when the data is drawn from p.

Remark 2.1. The definition of KL-divergence, as well as the properties discussed below, extends to infinite sample spaces. However, KL-divergence for finite sample spaces suffices for this class, and is much easier to work with.

Properties of KL-divergence

We present several basic properties of KL-divergence that will be needed later. The proofs of these properties are fairly simple; we include them here for the sake of completeness.

1. Gibbs' inequality: KL(p, q) ≥ 0 for all p, q. Further, KL(p, q) = 0 iff p = q.

Proof. Define f(y) = y ln y; f is a convex function on the domain y > 0. From the definition of the KL divergence we get:

    KL(p, q) = Σ_{x∈Ω} p(x) ln( p(x)/q(x) )
             = Σ_{x∈Ω} q(x) f( p(x)/q(x) )
             ≥ f( Σ_{x∈Ω} q(x) · p(x)/q(x) )    [follows from Jensen's inequality]
             = f(1) = 0,

where Jensen's inequality states that λ_1 φ(x_1) + λ_2 φ(x_2) ≥ φ(λ_1 x_1 + λ_2 x_2) if φ is a convex function and λ_1 + λ_2 = 1 with λ_1, λ_2 > 0. Jensen's inequality further has the property that equality holds iff x_1 = x_2 or φ is a linear function. In this case, since f is not a linear function, equality holds (i.e., KL(p, q) = 0) iff p(x) = q(x) for all x.

2. Let the sample space be a product Ω = Ω_1 × Ω_2 × ... × Ω_n. Further, let p and q be two product distributions on Ω, p = p_1 × p_2 × ... × p_n and q = q_1 × q_2 × ... × q_n, such that for each j = 1, ..., n, p_j and q_j are distributions on Ω_j. Then we have the property:

    KL(p, q) = Σ_{j=1}^n KL(p_j, q_j).
Proof. Let x = (x_1, x_2, ..., x_n) ∈ Ω, with x_i ∈ Ω_i for i = 1, ..., n. Let h_i(x_i) = ln( p_i(x_i)/q_i(x_i) ). Then:

    KL(p, q) = Σ_{x∈Ω} p(x) ln( p(x)/q(x) )
             = Σ_{x∈Ω} p(x) Σ_{i=1}^n h_i(x_i)                      [since ln( p(x)/q(x) ) = Σ_{i=1}^n h_i(x_i)]
             = Σ_{i=1}^n Σ_{x_i∈Ω_i} h_i(x_i) · Σ_{x': x'_i = x_i} p(x')
             = Σ_{i=1}^n Σ_{x_i∈Ω_i} p_i(x_i) h_i(x_i)              [since Σ_{x': x'_i = x_i} p(x') = p_i(x_i)]
             = Σ_{i=1}^n KL(p_i, q_i).

3. Weaker form of Pinsker's inequality: for all A ⊂ Ω, 2 (p(A) − q(A))² ≤ KL(p, q).

Proof. To prove this property, we first claim the following:

Claim 2.2. For each event A ⊂ Ω,

    Σ_{x∈A} p(x) ln( p(x)/q(x) ) ≥ p(A) ln( p(A)/q(A) ).

Proof. Let us define the conditional distributions

    p_A(x) = p(x)/p(A)   and   q_A(x) = q(x)/q(A)   for x ∈ A.

Then the claim can be proved as follows:

    Σ_{x∈A} p(x) ln( p(x)/q(x) ) = p(A) Σ_{x∈A} p_A(x) ln( p(A) p_A(x) / ( q(A) q_A(x) ) )
        = p(A) ( Σ_{x∈A} p_A(x) ln( p_A(x)/q_A(x) ) ) + p(A) ln( p(A)/q(A) )
        ≥ p(A) ln( p(A)/q(A) ).    [since Σ_{x∈A} p_A(x) ln( p_A(x)/q_A(x) ) = KL(p_A, q_A) ≥ 0]

Now fix A ⊂ Ω. Using Claim 2.2 twice, we have:

    Σ_{x∈A} p(x) ln( p(x)/q(x) ) ≥ p(A) ln( p(A)/q(A) ),
    Σ_{x∉A} p(x) ln( p(x)/q(x) ) ≥ p(Ā) ln( p(Ā)/q(Ā) ),
where Ā denotes the complement of A. Now, let a = p(A) and b = q(A), and assume without loss of generality that a < b. Summing the two inequalities above, we have:

    KL(p, q) ≥ a ln( a/b ) + (1−a) ln( (1−a)/(1−b) )
             = ∫_a^b ( −a/x + (1−a)/(1−x) ) dx
             = ∫_a^b (x−a) / ( x(1−x) ) dx
             ≥ ∫_a^b 4(x−a) dx                    [since x(1−x) ≤ 1/4]
             = 2 (b−a)².

This proves the property.

4. Let p_ɛ denote the distribution on {0, 1} such that p_ɛ(1) = (1+ɛ)/2 and thus p_ɛ(0) = (1−ɛ)/2. Further, let p_0 denote the uniform distribution on {0, 1}, where p_0(0) = p_0(1) = 1/2. Then, for ɛ ≤ 1/2, we have the property:

    KL(p_ɛ, p_0) ≤ 2ɛ².

Proof.

    KL(p_ɛ, p_0) = (1+ɛ)/2 · ln(1+ɛ) + (1−ɛ)/2 · ln(1−ɛ)
                 = 1/2 ( ln(1+ɛ) + ln(1−ɛ) ) + ɛ/2 ( ln(1+ɛ) − ln(1−ɛ) )
                 = 1/2 ln(1−ɛ²) + ɛ/2 ln( (1+ɛ)/(1−ɛ) ).

Now, ln(1−ɛ²) < 0, and we can write

    ln( (1+ɛ)/(1−ɛ) ) = ln( 1 + 2ɛ/(1−ɛ) ) ≤ 2ɛ/(1−ɛ).

Thus, we get:

    KL(p_ɛ, p_0) ≤ ɛ/2 · 2ɛ/(1−ɛ) = ɛ²/(1−ɛ) ≤ 2ɛ²    [using ɛ ≤ 1/2].

How are these properties going to be used? We start with the same setting as in Property 2. From Property 3, we have:

    2 (p(A) − q(A))² ≤ KL(p, q) = Σ_{j=1}^n KL(p_j, q_j).    (follows from Property 2)

For example, we can define each p_j to be the distribution of a coin with small bias ɛ (p_j(1) = (1+ɛ)/2, p_j(0) = (1−ɛ)/2) and each q_j an unbiased coin (q_j(0) = q_j(1) = 1/2). Then we can use Property 4 to bound the above as:

    2 (p(A) − q(A))² ≤ Σ_{j=1}^n KL(p_j, q_j) ≤ nδ,    where δ = 2ɛ².

Thus, we arrive at the following bound: |p(A) − q(A)| ≤ √(nδ/2) = ɛ√n.
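The properties above can be checked numerically. The sketch below (illustrative, not from the notes; the helper functions are made up) verifies the 2ɛ² bound of Property 4, the additivity over products of Property 2, and the weak Pinsker inequality of Property 3 on a "majority of tosses is 1" event:

```python
import itertools
import math

def kl(p, q):
    # KL divergence over a finite sample space, given as dicts
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def product_dist(*dists):
    # Product distribution on the product sample space (Property 2's setting)
    out = {}
    for combo in itertools.product(*(d.items() for d in dists)):
        xs = tuple(x for x, _ in combo)
        out[xs] = math.prod(pr for _, pr in combo)
    return out

eps = 0.1
p_eps = {0: (1 - eps) / 2, 1: (1 + eps) / 2}   # ɛ-biased coin
p_0 = {0: 0.5, 1: 0.5}                          # fair coin

# Property 4: KL(p_eps, p_0) <= 2 * eps^2
assert kl(p_eps, p_0) <= 2 * eps**2

# Property 2: KL of an n-fold product = n times the per-coordinate KL
n = 3
P = product_dist(*[p_eps] * n)
Q = product_dist(*[p_0] * n)
assert abs(kl(P, Q) - n * kl(p_eps, p_0)) < 1e-12

# Property 3 (weak Pinsker) on the event A = "majority of the n tosses is 1"
A = [xs for xs in P if sum(xs) > n / 2]
pA, qA = sum(P[x] for x in A), sum(Q[x] for x in A)
assert 2 * (pA - qA) ** 2 <= kl(P, Q)
```

The last assertion also illustrates the combined bound: with n coordinates of bias ɛ, no event can separate the two product distributions by more than ɛ√n.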
3 A simple example: flipping one coin

We start with a simple example, which illustrates our proof technique and is interesting as a standalone result. We have a single coin, whose outcome is 0 or 1. The coin's mean is unknown. We assume that the true mean µ ∈ [0, 1] is either µ_1 or µ_2, for two known values µ_1 > µ_2. The coin is flipped T times. The goal is to identify whether µ = µ_1 or µ = µ_2.

Define Ω := {0, 1}^T to be the sample space of the outcomes of the T coin tosses. We need a decision rule

    Rule : Ω → {High, Low}

with the following two properties:

    Pr[Rule(observations) = High | µ = µ_1] ≥ 0.99    (2)
    Pr[Rule(observations) = Low  | µ = µ_2] ≥ 0.99    (3)

The question is how large T should be for such a Rule to exist. We know that if δ = µ_1 − µ_2, then T on the order of 1/δ² is sufficient. We will prove that it is also necessary. We will focus on the special case when both µ_1 and µ_2 are close to 1/2.

Claim 3.1. Let µ_1 = (1+ɛ)/2 and µ_2 = 1/2. For any rule to work (i.e., satisfy equations (2) and (3)) we need

    T ≥ Ω(1/ɛ²).    (4)

Proof. Define, for any event A ⊂ Ω, the following quantities:

    P_1(A) = Pr[A | µ = µ_1],    P_2(A) = Pr[A | µ = µ_2].

To prove the claim, we will consider the following inequality. For the event A = {ω ∈ Ω : Rule(ω) = High}, properties (2) and (3) imply

    P_1(A) − P_2(A) ≥ 0.98.    (5)

We prove the claim by showing that if (4) is false, then (5) is false, too. Specifically, we will assume that T < 1/(4ɛ²). (In fact, the argument below holds for an arbitrary event A ⊂ Ω.)

Define, for each i ∈ {1, 2}, P_{i,t} to be the distribution of the t-th coin toss under P_i. Then P_i = P_{i,1} × P_{i,2} × ... × P_{i,T}. From the KL-divergence properties, we have:

    2 (P_1(A) − P_2(A))² ≤ KL(P_1, P_2)                       [KL divergence Property 3]
                         = Σ_{t=1}^T KL(P_{1,t}, P_{2,t})     [KL divergence Property 2]
                         ≤ 2Tɛ².                              [KL divergence Property 4]

Hence, we have |P_1(A) − P_2(A)| ≤ ɛ√T < 1/2, where the last inequality holds since T < 1/(4ɛ²). This contradicts (5).
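Claim 3.1 can be illustrated with a quick Monte Carlo experiment (not from the notes; the threshold rule and all parameters below are choices made for this illustration). A natural rule, "say High iff the number of heads exceeds the threshold halfway between the two means", is far from the 0.99 requirement when T ≈ 1/(4ɛ²), but succeeds comfortably once T ≫ 1/ɛ²:

```python
import random

def rule_success(mu, T, thresh, trials, rng):
    """Fraction of trials in which the number of heads is >= thresh."""
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < mu for _ in range(T))
        hits += heads >= thresh
    return hits / trials

eps = 0.1
mu1, mu2 = (1 + eps) / 2, 0.5
rng = random.Random(1)

results = {}
for T in (int(1 / (4 * eps**2)), int(100 / eps**2)):
    thresh = T * (mu1 + mu2) / 2  # threshold halfway between the two means
    high = rule_success(mu1, T, thresh, 1000, rng)      # Pr[High | mu = mu1]
    low = 1 - rule_success(mu2, T, thresh, 1000, rng)   # Pr[Low  | mu = mu2]
    results[T] = (high, low)
    print(T, round(high, 2), round(low, 2))
```

With T near 1/(4ɛ²) both success probabilities hover far below 0.99, matching the claim that no rule (not just this one) can satisfy (2) and (3) at that horizon.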
4 Flipping several coins: bandits with prediction

Let us extend the previous example to flipping multiple coins. More formally, we consider a bandit problem with K arms, where each arm corresponds to a coin. Each arm gives a 0-1 reward, drawn independently from a fixed but unknown distribution. After T rounds, the algorithm outputs a guess y_T ∈ A for which arm is the best arm (where A is the set of all arms).¹ We call this version bandits with prediction. In this section, we will only be concerned with the quality of the prediction, rather than accumulated rewards and regret.

For each arm a ∈ A, the mean reward is denoted µ(a). (We will also write it as µ_a whenever convenient.) A particular problem instance is specified as a tuple I = (µ(a) : a ∈ A). A good algorithm for the bandits with prediction problem described above should satisfy

    Pr[y_T is correct | I] ≥ 0.99    (6)

for each problem instance I. We will use the family (1) of problem instances to argue that one needs T ≥ Ω(K/ɛ²) for any algorithm to work, i.e., satisfy (6), on all instances in this family.

Lemma 4.1. Suppose an algorithm for bandits with predictions satisfies (6) for all problem instances I_1, ..., I_K. Then T ≥ Ω(K/ɛ²).

This result is of independent interest (regardless of the lower bound on regret). In fact, we will prove a stronger lemma, which will also be the crux of the proof of the regret bound.

Lemma 4.2. Suppose T ≤ cK/ɛ², for a small enough absolute constant c. Fix any deterministic algorithm for bandits with prediction. Then there exist at least K/3 arms j such that

    Pr[y_T = j | I_j] < 3/4.

Remark 4.3. The proof for K = 2 arms is particularly simple, so we will do it first. We will then extend this proof to arbitrary K, with more subtleties. While the lemma holds for an arbitrary K ≥ 2, we will present a simplified proof which requires K ≥ 24.

We will use the standard shorthand [T] := {1, 2, ..., T}. Let us set up the sample space to be used in the proof.
Let (r_t(a) : a ∈ A, t ∈ [T]) be mutually independent 0-1 random variables such that r_t(a) has expectation µ(a). We refer to this tuple as the rewards table, where we interpret r_t(a) as the reward received by the algorithm the t-th time it chooses arm a. The sample space is Ω = {0, 1}^{K×T}, where each outcome ω ∈ Ω corresponds to a particular realization of the rewards table. Each problem instance I_j defines a distribution P_j on Ω:

    P_j(A) = Pr[A | I_j]    for each event A ⊂ Ω.

Also, let P_j^{a,t} be the distribution of r_t(a) under instance I_j, so that P_j = ∏_{a∈A, t∈[T]} P_j^{a,t}.

Proof (K = 2 arms). Define A = {ω ∈ Ω : y_T = 1}, the event that the algorithm predicts arm 1. (But the argument below holds for any event A ⊂ Ω.) Similarly to the previous section, we use the properties of KL divergence as follows:

    2 (P_1(A) − P_2(A))² ≤ KL(P_1, P_2)

¹ Recall that the best arm is the arm with the highest mean reward.
                         = Σ_{a=1}^{2} Σ_{t=1}^{T} KL(P_1^{a,t}, P_2^{a,t}) ≤ 4Tɛ².    (7)

The last inequality holds because the KL divergence of P_1^{a,t} and P_2^{a,t} is non-zero only when P_1^{a,t} ≠ P_2^{a,t}, and when they are unequal their KL divergence is at most 2ɛ². Hence,

    |P_1(A) − P_2(A)| ≤ ɛ√(2T) < 1/2,

where the last inequality holds whenever T ≤ 1/(8ɛ²). To complete the proof, observe that if Pr[y_T = j | I_j] ≥ 3/4 for both problem instances, then P_1(A) ≥ 3/4 and P_2(A) < 1/4, so their difference is at least 1/2: contradiction.

Proof (K ≥ 24). Compared to the 2-arms case, the time horizon T can be larger by a factor of O(K). The crucial improvement is a more delicate version of the KL-divergence argument in (7), which results in a right-hand side of the form O(Tɛ²/K).

For the sake of the analysis, we will consider an additional problem instance

    I_0 = { µ_a = 1/2 for all arms a },

which we call the base instance. Let E_0[·] be the expectation given this problem instance. Also, let T_a be the total number of times arm a is played. We consider the algorithm's performance on problem instance I_0, and focus on arms j that are neglected by the algorithm, in the sense that the algorithm does not choose arm j very often and is not likely to pick j for the guess y_T. Formally, we observe that:

    at least 2K/3 arms j satisfy E_0(T_j) ≤ 3T/K,      (8)
    at least 2K/3 arms j satisfy P_0(y_T = j) ≤ 3/K.   (9)

(To prove (8), assume for contradiction that more than K/3 arms have E_0(T_j) > 3T/K. Then the expected total number of times these arms are played is strictly greater than T, which is a contradiction. (9) is proved similarly.)

By Markov's inequality, E_0(T_j) ≤ 3T/K implies that Pr[T_j ≤ 24T/K] ≥ 7/8. Since the sets of arms in (8) and (9) must overlap in at least K/3 arms, we conclude:

    at least K/3 arms j satisfy Pr[T_j ≤ m] ≥ 7/8 and P_0(y_T = j) ≤ 3/K,    (10)

where m = 24T/K. We will now refine our definition of the sample space to get the required claim. For each arm a, define the t-round sample space Ω_a^t = {0, 1}^t, where each outcome corresponds to a particular realization of the tuple (r_s(a) : s ∈ [t]).
(Recall that we interpret r_t(a) as the reward received by the algorithm the t-th time it chooses arm a.) Then the full sample space we considered before can be expressed as Ω = ∏_{a∈A} Ω_a^T.
Fix an arm j satisfying the two properties in (10). We will consider a reduced sample space in which arm j is played only m = 24T/K times:

    Ω* = Ω_j^m × ∏_{arms a ≠ j} Ω_a^T.    (11)

Each problem instance I_l defines a distribution P*_l on Ω*:

    P*_l(A) = Pr[A | I_l]    for each event A ⊂ Ω*.

In other words, distribution P*_l is the restriction of P_l to the reduced sample space Ω*.

We apply the KL-divergence argument to distributions P*_0 and P*_j. For each event A ⊂ Ω*:

    2 (P*_0(A) − P*_j(A))² ≤ KL(P*_0, P*_j)
        = Σ_{arms a ≠ j} Σ_{t=1}^{T} KL(P_0^{a,t}, P_j^{a,t}) + Σ_{t=1}^{m} KL(P_0^{j,t}, P_j^{j,t})
        ≤ 0 + 2mɛ².

Note that each arm a ≠ j has identical reward distributions under instances I_0 and I_j (namely, its mean reward is 1/2). So distributions P_0^{a,t} and P_j^{a,t} are the same, and therefore their KL-divergence is 0; whereas for arm j we only need to sum over m samples. Therefore, assuming T ≤ cK/ɛ² with a small enough constant c, we can conclude that

    |P*_0(A) − P*_j(A)| ≤ ɛ√m < 1/8    for all events A ⊂ Ω*.    (12)

To apply (12), we need to make sure that the event A is in fact contained in Ω*, i.e., whether A holds is completely determined by the first m samples of arm j (and arbitrarily many samples of the other arms). In particular, we cannot take A = {y_T = j}, which would be the most natural extension of the proof technique from the 2-arms case. Instead, we apply (12) twice, to the events

    A = {y_T = j and T_j ≤ m}    and    A' = {T_j > m}.    (13)

Indeed, note that whether the algorithm samples arm j more than m times is completely determined by the first m coin tosses of arm j (together with the tosses of the other arms)!

We are ready for the final computation (note that P_l and P*_l coincide on events in Ω*):

    P_j(A)  ≤ P_0(A)  + 1/8        [by (12)]
            ≤ P_0(y_T = j) + 1/8 ≤ 3/K + 1/8 ≤ 1/4    [by our choice of arm j, using K ≥ 24]

    P_j(A') ≤ P_0(A') + 1/8        [by (12)]
            ≤ 1/8 + 1/8 = 1/4      [by our choice of arm j]

    P_j(y_T = j) = P_j(y_T = j and T_j ≤ m) + P_j(y_T = j and T_j > m)
                 ≤ P_j(A) + P_j(A')
                 ≤ 1/2 < 3/4.

Recall that this holds for any arm j satisfying the properties in (10). Since there are at least K/3 such arms, the lemma follows.

Next lecture: Lemma 4.2 is used to derive the Ω(√(KT)) lower bound on regret.
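Lemma 4.2's message, that reliable prediction requires on the order of K/ɛ² samples in total, can be seen empirically. The sketch below (illustrative, not from the notes; the uniform-sampling algorithm and all parameters are choices made for this example) runs the simple "sample every arm n times, predict the empirical best" algorithm on random instances from family (1):

```python
import random

def predict_best_arm(mu, n, rng):
    """Sample each arm n times; guess the arm with the highest empirical mean."""
    means = [sum(rng.random() < m for _ in range(n)) / n for m in mu]
    return max(range(len(mu)), key=means.__getitem__)

def success_rate(K, eps, n, trials, rng):
    """Fraction of trials in which the guess y_T is correct, over random I_j."""
    ok = 0
    for _ in range(trials):
        j = rng.randrange(K)  # instance I_j with a uniformly random best arm
        mu = [(1 + eps) / 2 if i == j else 0.5 for i in range(K)]
        ok += predict_best_arm(mu, n, rng) == j
    return ok / trials

rng = random.Random(2)
K, eps = 10, 0.2
few = success_rate(K, eps, n=10, trials=300, rng=rng)    # T = 10*K total samples
many = success_rate(K, eps, n=400, trials=300, rng=rng)  # T = 400*K total samples
print(few, many)
```

With only a handful of samples per arm (n on the order of 1/ɛ² or below), the prediction is unreliable, consistent with the lemma; with n ≫ 1/ɛ² per arm, i.e. T ≫ K/ɛ² overall, the success rate approaches 1.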
5 Bibliographic notes

The Ω(√(KT)) lower bound on regret is from Auer et al. (2002). KL-divergence and its properties are textbook material from information theory; see, e.g., Cover and Thomas (1991). The present exposition (the outline and much of the technical detail) is based on Robert Kleinberg's lecture notes (Kleinberg, 2007).

We present a substantially simpler proof compared to Auer et al. (2002) and Kleinberg (2007), in that we avoid the general chain rule for KL-divergence. Instead, we only use the special case of independent distributions (Property 2 in Section 2), which is much easier to state and to apply. The proof of Lemma 4.2 (for general K), which in prior work relies on the general chain rule, is modified accordingly. In particular, we define the reduced sample space Ω* with only a small number of samples from the bad arm, and apply the KL-divergence argument to the carefully defined events in (13), rather than the seemingly more natural event A = {y_T = j}.

References

Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48-77, 2002. Preliminary version in 36th IEEE FOCS, 1995.

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991.

Robert Kleinberg. Lecture notes: CS683: Learning, Games, and Electronic Markets (week 9), 2007.
More informationConnectedness. Proposition 2.2. The following are equivalent for a topological space (X, T ).
Connectedness 1 Motivation Connectedness is the sort of topological property that students love. Its definition is intuitive and easy to understand, and it is a powerful tool in proofs of well-known results.
More informationLecture 3: September 10
CS294 Markov Chain Monte Carlo: Foundations & Applications Fall 2009 Lecture 3: September 10 Lecturer: Prof. Alistair Sinclair Scribes: Andrew H. Chan, Piyush Srivastava Disclaimer: These notes have not
More informationDiscrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20
CS 70 Discrete Mathematics for CS Spring 2007 Luca Trevisan Lecture 20 Today we shall discuss a measure of how close a random variable tends to be to its expectation. But first we need to see how to compute
More informationLecture 10 + additional notes
CSE533: Information Theorn Computer Science November 1, 2010 Lecturer: Anup Rao Lecture 10 + additional notes Scribe: Mohammad Moharrami 1 Constraint satisfaction problems We start by defining bivariate
More informationReinforcement Learning
Reinforcement Learning Lecture 5: Bandit optimisation Alexandre Proutiere, Sadegh Talebi, Jungseul Ok KTH, The Royal Institute of Technology Objectives of this lecture Introduce bandit optimisation: the
More information2.1 Optimization formulation of k-means
MGMT 69000: Topics in High-dimensional Data Analysis Falll 2016 Lecture 2: k-means Clustering Lecturer: Jiaming Xu Scribe: Jiaming Xu, September 2, 2016 Outline Optimization formulation of k-means Convergence
More informationExpectation is linear. So far we saw that E(X + Y ) = E(X) + E(Y ). Let α R. Then,
Expectation is linear So far we saw that E(X + Y ) = E(X) + E(Y ). Let α R. Then, E(αX) = ω = ω (αx)(ω) Pr(ω) αx(ω) Pr(ω) = α ω X(ω) Pr(ω) = αe(x). Corollary. For α, β R, E(αX + βy ) = αe(x) + βe(y ).
More informationChapter 2.5 Random Variables and Probability The Modern View (cont.)
Chapter 2.5 Random Variables and Probability The Modern View (cont.) I. Statistical Independence A crucially important idea in probability and statistics is the concept of statistical independence. Suppose
More informationHomework 4 Solutions
CS 174: Combinatorics and Discrete Probability Fall 01 Homework 4 Solutions Problem 1. (Exercise 3.4 from MU 5 points) Recall the randomized algorithm discussed in class for finding the median of a set
More informationA Drifting-Games Analysis for Online Learning and Applications to Boosting
A Drifting-Games Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire
More informationLecture 19: Interactive Proofs and the PCP Theorem
Lecture 19: Interactive Proofs and the PCP Theorem Valentine Kabanets November 29, 2016 1 Interactive Proofs In this model, we have an all-powerful Prover (with unlimited computational prover) and a polytime
More informationSelecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden
1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized
More informationNotes 1 : Measure-theoretic foundations I
Notes 1 : Measure-theoretic foundations I Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Wil91, Section 1.0-1.8, 2.1-2.3, 3.1-3.11], [Fel68, Sections 7.2, 8.1, 9.6], [Dur10,
More information7.1 Coupling from the Past
Georgia Tech Fall 2006 Markov Chain Monte Carlo Methods Lecture 7: September 12, 2006 Coupling from the Past Eric Vigoda 7.1 Coupling from the Past 7.1.1 Introduction We saw in the last lecture how Markov
More information1 MDP Value Iteration Algorithm
CS 0. - Active Learning Problem Set Handed out: 4 Jan 009 Due: 9 Jan 009 MDP Value Iteration Algorithm. Implement the value iteration algorithm given in the lecture. That is, solve Bellman s equation using
More informationOn the Complexity of Best Arm Identification with Fixed Confidence
On the Complexity of Best Arm Identification with Fixed Confidence Discrete Optimization with Noise Aurélien Garivier, joint work with Emilie Kaufmann CNRS, CRIStAL) to be presented at COLT 16, New York
More information1 Recap: Interactive Proofs
Theoretical Foundations of Cryptography Lecture 16 Georgia Tech, Spring 2010 Zero-Knowledge Proofs 1 Recap: Interactive Proofs Instructor: Chris Peikert Scribe: Alessio Guerrieri Definition 1.1. An interactive
More informationA. Notation. Attraction probability of item d. (d) Highest attraction probability, (1) A
A Notation Symbol Definition (d) Attraction probability of item d max Highest attraction probability, (1) A Binary attraction vector, where A(d) is the attraction indicator of item d P Distribution over
More informationAnalysis of Thompson Sampling for the multi-armed bandit problem
Analysis of Thompson Sampling for the multi-armed bandit problem Shipra Agrawal Microsoft Research India shipra@microsoft.com avin Goyal Microsoft Research India navingo@microsoft.com Abstract We show
More informationLecture 5: Probabilistic tools and Applications II
T-79.7003: Graphs and Networks Fall 2013 Lecture 5: Probabilistic tools and Applications II Lecturer: Charalampos E. Tsourakakis Oct. 11, 2013 5.1 Overview In the first part of today s lecture we will
More informationReducing contextual bandits to supervised learning
Reducing contextual bandits to supervised learning Daniel Hsu Columbia University Based on joint work with A. Agarwal, S. Kale, J. Langford, L. Li, and R. Schapire 1 Learning to interact: example #1 Practicing
More informationCS Foundations of Communication Complexity
CS 2429 - Foundations of Communication Complexity Lecturer: Sergey Gorbunov 1 Introduction In this lecture we will see how to use methods of (conditional) information complexity to prove lower bounds for
More informationNotes from Week 8: Multi-Armed Bandit Problems
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 8: Multi-Armed Bandit Problems Instructor: Robert Kleinberg 2-6 Mar 2007 The multi-armed bandit problem The multi-armed bandit
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More information18.S097 Introduction to Proofs IAP 2015 Lecture Notes 1 (1/5/2015)
18.S097 Introduction to Proofs IAP 2015 Lecture Notes 1 (1/5/2015) 1. Introduction The goal for this course is to provide a quick, and hopefully somewhat gentle, introduction to the task of formulating
More informationLecture 1: September 25, A quick reminder about random variables and convexity
Information and Coding Theory Autumn 207 Lecturer: Madhur Tulsiani Lecture : September 25, 207 Administrivia This course will cover some basic concepts in information and coding theory, and their applications
More informationCell-Probe Lower Bounds for Prefix Sums and Matching Brackets
Cell-Probe Lower Bounds for Prefix Sums and Matching Brackets Emanuele Viola July 6, 2009 Abstract We prove that to store strings x {0, 1} n so that each prefix sum a.k.a. rank query Sumi := k i x k can
More informationCLASSICAL PROBABILITY MODES OF CONVERGENCE AND INEQUALITIES
CLASSICAL PROBABILITY 2008 2. MODES OF CONVERGENCE AND INEQUALITIES JOHN MORIARTY In many interesting and important situations, the object of interest is influenced by many random factors. If we can construct
More informationLecture Notes on Metric Spaces
Lecture Notes on Metric Spaces Math 117: Summer 2007 John Douglas Moore Our goal of these notes is to explain a few facts regarding metric spaces not included in the first few chapters of the text [1],
More information10-704: Information Processing and Learning Fall Lecture 10: Oct 3
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of
More informationLecture 2: From Classical to Quantum Model of Computation
CS 880: Quantum Information Processing 9/7/10 Lecture : From Classical to Quantum Model of Computation Instructor: Dieter van Melkebeek Scribe: Tyson Williams Last class we introduced two models for deterministic
More informationProbability and Measure
Probability and Measure Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Convergence of Random Variables 1. Convergence Concepts 1.1. Convergence of Real
More information1. When applied to an affected person, the test comes up positive in 90% of cases, and negative in 10% (these are called false negatives ).
CS 70 Discrete Mathematics for CS Spring 2006 Vazirani Lecture 8 Conditional Probability A pharmaceutical company is marketing a new test for a certain medical condition. According to clinical trials,
More informationLecture 4: Probability, Proof Techniques, Method of Induction Lecturer: Lale Özkahya
BBM 205 Discrete Mathematics Hacettepe University http://web.cs.hacettepe.edu.tr/ bbm205 Lecture 4: Probability, Proof Techniques, Method of Induction Lecturer: Lale Özkahya Resources: Kenneth Rosen, Discrete
More informationLecture 2: August 31
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy
More information2 Completing the Hardness of approximation of Set Cover
CSE 533: The PCP Theorem and Hardness of Approximation (Autumn 2005) Lecture 15: Set Cover hardness and testing Long Codes Nov. 21, 2005 Lecturer: Venkat Guruswami Scribe: Atri Rudra 1 Recap We will first
More informationRevisiting the Exploration-Exploitation Tradeoff in Bandit Models
Revisiting the Exploration-Exploitation Tradeoff in Bandit Models joint work with Aurélien Garivier (IMT, Toulouse) and Tor Lattimore (University of Alberta) Workshop on Optimization and Decision-Making
More information1 Approximate Quantiles and Summaries
CS 598CSC: Algorithms for Big Data Lecture date: Sept 25, 2014 Instructor: Chandra Chekuri Scribe: Chandra Chekuri Suppose we have a stream a 1, a 2,..., a n of objects from an ordered universe. For simplicity
More information