Supplementary: Battle of Bandits


A  Proof of Lemma

Proof. We start by noting that, for every item i,

    P(X = i) = Σ_{j : j ≠ i} P(X = i, {U, V} = {i, j})
             = Σ_{j : j ≠ i} P({U, V} = {i, j}) · P(X = i | {U, V} = {i, j})
             = Σ_{j : j ≠ i} P({U, V} = {i, j}) · Q_{ij}
             = Σ_{j : j ≠ i} P({U, V} = {i, j}) · a_i / (a_i + a_j),

where the penultimate equality is by appealing to sampling without replacement and the definition of the win probability Q_{ij} for item i in the pairwise preference model.

B  Regret analysis of Battling-Doubler

We will be using the following regret guarantee of the UCB algorithm for the classical MAB problem in order to prove the regret bounds of Battling-Doubler:

Theorem (UCB Regret [5]). Let [n] denote the set of arms and µ_i the expected reward associated with arm i ∈ [n], such that µ_1 > µ_2 ≥ … ≥ µ_n, and let H = Σ_{i=2}^{n} 1/Δ_i, where Δ_i = µ_1 − µ_i. Then the expected regret of the UCB algorithm with confidence parameter ζ > 0 is O(ζ H ln T).

Proof of Theorem 3

Proof. For any time horizon T ∈ Z_+, let B(T) denote the supremum of the expected regret of the SBM S after T steps, over all possible reward distributions on the arm set [n]. Clearly, the length of epoch l in the algorithm is exactly 2^l; we denote this by T_l = 2^l. Note that for all time steps t inside this epoch, the first k − 1 arms a_t^1, a_t^2, …, a_t^{k−1} are chosen uniformly from L (Line 7 of the algorithm), and thus the observed reward of the k-th arm also comes from a fixed distribution, as argued below. Let θ̄ = E_{a ∼ Unif(L)}[θ_a] denote the expected utility score of each of the first k − 1 arms at any step t within epoch l. Clearly, E[θ_{a_t^1}] = E[θ_{a_t^2}] = … = E[θ_{a_t^{k−1}}] = θ̄, since each of the first k − 1 arms is drawn uniformly from L.

Now the SBM S is playing a standard MAB game over the set of arms [n] with binary rewards. Let b_t denote the binary reward of the k-th arm a_t^k in the t-th step. Clearly, for all t within epoch l,

    E[b_t | a_t^1, a_t^2, …, a_t^{k−1}] = 1/2 + (1/(2(k−1))) Σ_{j=1}^{k−1} (θ_{a_t^k} − θ_{a_t^j}).

Thus at round t, if a_t^k = x for any x ∈ [n],

    E[b_t | a_t^k = x] = E[ 1/2 + (1/(2(k−1))) Σ_{j=1}^{k−1} (θ_x − θ_{a_t^j}) ] = 1/2 + (θ_x − θ̄)/2.

Now for the SBM S, the best arm, i.e. the one with the highest expected reward, is still arm 1, where

    E[b_t | a_t^k = 1] = 1/2 + (θ_1 − θ̄)/2.   (5)

By the definition of the bound function B(T), the total expected regret (in the traditional MAB sense) of the SBM S in epoch l is at most B(T_l) = B(2^l), which implies that

    E[ Σ_{t ∈ epoch l} (θ_1 − θ_{a_t^k})/2 ] ≤ B(2^l).

This in other words says that the expected contribution of the k-th arm to the regret in epoch l is at most B(2^l). The interesting thing to note is that, at any round t of epoch l, the expected regret incurred by any of the first k − 1 arms, i.e. a_t^j with j ≤ k − 1, is exactly the same as the average expected regret contributed by the set of arms drawn as the k-th arm in epoch l − 1, since Line 7 selects a_t^1, a_t^2, …, a_t^{k−1} uniformly from L; the per-step expected regret of each of the first k − 1 arms in epoch l is thus at most B(2^{l−1})/2^{l−1}. It only remains to bound the expected contribution of these remaining arms to the regret, which are drawn from a distribution that assigns to each arm a ∈ [n] a probability proportional to the frequency with which a was played as the k-th arm in the previous epoch l − 1.

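For concreteness, the following is a minimal sketch of a UCB index policy of the kind the UCB Regret theorem above refers to, exposing the confidence parameter ζ. The exact index constant of the UCB variant assumed in [5] may differ, and all identifiers here are ours, not the paper's.

import math

class UCB:
    """Minimal UCB index policy with explicit confidence parameter zeta."""
    def __init__(self, n_arms, zeta=2.0):
        self.n, self.zeta = n_arms, zeta
        self.counts = [0] * n_arms     # number of pulls of each arm
        self.means = [0.0] * n_arms    # empirical mean reward of each arm
        self.t = 0

    def select(self):
        self.t += 1
        for a in range(self.n):        # play every arm once first
            if self.counts[a] == 0:
                return a
        # UCB index: empirical mean + sqrt(zeta * ln t / #pulls)
        return max(range(self.n),
                   key=lambda a: self.means[a]
                   + math.sqrt(self.zeta * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

An SBM with this select/update interface is what Battling-Doubler advances once per round, and the bound function B(T) in the proof of Theorem 3 is precisely the worst-case expected regret of such a policy after T pulls.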
Hence the total expected regret incurred by each of the first k − 1 arms over epoch l is bounded by 2^l · B(2^{l−1})/2^{l−1} = 2 B(2^{l−1}). Now, any finite time horizon T can be uniquely decomposed as T = 2^{s+1} − 2 + Z, for some integer s and 0 ≤ Z < 2^{s+1}. Since per epoch the position-averaged regret is at most (1/k)[B(2^l) + 2(k−1)B(2^{l−1})] ≤ 3 B(2^l), the total expected regret of Battling-Doubler is given by the following function of T:

    R(T) ≤ 3 ( B(2) + B(2^2) + … + B(2^s) + B(Z) ).   (6)

Now, by our assumption, B(t) = O((ln t)^β) for any t ∈ Z_+, and the theorem claim follows by simplifying and bounding the above expression.

Proof of Corollary 4

Proof. Note that the SBM S in Battling-Doubler is UCB operating on the set of n arms, with the expected reward of arm i ∈ [n] being 1/2 + (θ_i − θ̄)/2 by equation (5). Thus for the SBM S, the total gap complexity of the suboptimal arms becomes Σ_{i=2}^{n} 1/Δ_i = H, with gaps Δ_i = (θ_1 − θ_i)/2. Now from the theorem above, we have B(t) = O(ζ H ln t) for any t ∈ Z_+. Thus we get

    E[R(T)] = E[ Σ_{t=1}^{T} (1/k) Σ_{j=1}^{k} (θ_1 − θ_{a_t^j})/2 ]
            ≤ 3 ( B(2) + B(2^2) + … + B(2^s) + B(Z) )
            = O( ζ Σ_{l=1}^{s+1} H ln 2^l ) = O(ζ H ln² T).

Proof of Corollary 5

Proof. The claim simply follows from the above corollary and the fact that the worst possible gap for the UCB algorithm can be at most O(√(n ln T / T)), as argued in [8] or [4]. The claim now follows by substituting H = O(√(nT / ln T)) in Corollary 4.

C  Regret analysis of Battling-MultiSBM

Proof of Theorem 6

Proof. We start by noting that in Battling-MultiSBM, only one SBM advances at each round. Denote by ρ_x(t) the total number of times S_x was advanced after t iterations of the algorithm, for any arm x ∈ [n]. By Line 5 of Battling-MultiSBM, the first k − 1 arms of round t are all copies of the previous round's k-th arm, a_t^1 = … = a_t^{k−1} = a_{t−1}^k, so that

    E[b_t | a_t^1, …, a_t^k] = 1/2 + (1/(2(k−1))) Σ_{j=1}^{k−1} (θ_{a_t^k} − θ_{a_t^j}) = 1/2 + (θ_{a_t^k} − θ_{a_t^1})/2.

Note that for all SBMs S_y, y ∈ [n], the best arm (the one with the highest expected reward) to play is arm 1; at round t the reward corresponding to the best arm is 1/2 + (θ_1 − θ_{a_t^1})/2. Then at any round t at which S_x advances (i.e., x was the k-th arm of the previous round), S_x internally sees a world in which the rewards are binary and the expected reward of any arm a ∈ [n] is exactly 1/2 + (θ_a − θ_x)/2. Now it is easy to see that for all SBMs S_y, y ∈ [n], the suboptimalities of the arms are the same: the suboptimality associated to arm a is Δ_a = (θ_1 − θ_a)/2, for every a ∈ [n].

The following is the key observation in this proof: the total regret incurred by Battling-MultiSBM can be written as

    R(T) = Σ_{t=1}^{T} (1/k) Σ_{j=1}^{k} (θ_1 − θ_{a_t^j})/2
         = (1/k) Σ_{t=1}^{T} [ (k−1)(θ_1 − θ_{a_t^1})/2 + (θ_1 − θ_{a_t^k})/2 ],

and since a_{t+1}^1 = a_t^k, telescoping the first term gives

    R(T) ≤ (θ_1 − θ_{a_0^1})/2 + Σ_{t=1}^{T} (θ_1 − θ_{a_t^k})/2 ≤ 1 + Σ_{t=1}^{T} (θ_1 − θ_{a_t^k})/2,   (7)

as max_{a ∈ [n]} θ_a ≤ 1. This essentially conveys that bounding the above regret is equivalent to bounding the regret of each SBM S_y, y ∈ [n]. In other words, it says that any suboptimal SBM S_y, y ∈ [n] \ {1}, cannot be advanced too many times, and any SBM cannot play a suboptimal arm a ∈ [n] \ {1} too many times. The rest of the proof justifies this claim.

Let us denote by ρ_x the total number of times SBM S_x is called, ρ_x = Σ_{t=1}^{T} 1(a_{t−1}^k = x), and by ρ_xy the total number of times SBM S_x played arm y, ρ_xy = Σ_{t=1}^{T} 1(a_{t−1}^k = x and a_t^k = y). We also denote by R_x(T) the regret incurred due to advancing SBM S_x while its internal counter has not surpassed T — in other words, the contribution of the k-th bandit's choices to the regret at all rounds t for which the k-th arm is chosen by S_x — and by R_xy(T) the part of it due to S_x playing the arm y ∈ [n]; clearly R_xy(T) = ρ_xy(T) Δ_y.

From equation (7) it is now easy to see that, in order to bound the expected regret E[R(T)], it suffices to bound the expressions E[R_xy(ρ_x(T))], x, y ∈ [n]. Note that here we assume that each SBM policy used in Battling-MultiSBM is α-robust, which is used crucially in the proof: the main insight is to exploit the fact that, due to the α-robustness of the SBMs used, ρ_x(T) is of order ln T for any suboptimal x.

We begin with the observation that, for any fixed x, y ∈ [n] with x a suboptimal arm, if we choose α > 1 and s ≥ 4α, then from the α-robustness property of the SBM we get

    P( R_xy(T) ≥ (s/Δ_y) ln T ) = P( ρ_xy(T) ≥ (s/Δ_y²) ln T ) ≤ ( Δ_y² / (s ln T) )^α ≤ (s ln T)^{−α},   (8)

where the last inequality follows from the fact that Δ_y ≤ 1 for all y ∈ [n] and α ≥ 1. Then, under the same assumptions on s, x and y, a union bound gives

    P( ∃ p ∈ {0, 1, …, ln ln T} : R_xy(e^{e^p}) ≥ (s e^p / Δ_y) ) ≤ 2 s^{−α}.

We now bound the quantity ρ_x(T) for any non-optimal fixed x, using the fact that all z ∈ [n] satisfy ρ_z(T) ≤ T, and that an SBM S_x is advanced in an iteration only if x was the k-th arm in the previous round. Thus for all suboptimal x ∈ [n] \ {1},

    P( ρ_x(T) ≥ (sn/Δ_x²) ln T ) ≤ Σ_{z ∈ [n]} P( ρ_zx(T) ≥ (s/Δ_x²) ln T )
                                 = Σ_{z ∈ [n]} P( R_zx(T) ≥ (s/Δ_x) ln T ) ≤ n (s ln T)^{−α},   (9)

where the rightmost inequality is by the union bound and (8). Now fix some x, y ∈ [n] such that x is suboptimal. The last two inequalities give rise to a random variable Z, defined as the minimal scalar for which both

    ρ_x(T') ≤ (Zn/Δ_x²) ln T'  and  R_xy(T') ≤ (Z/Δ_y) ln T'

hold for all T' ∈ {e, e^e, e^{e^2}, …, e^{e^{ln ln T}}}. By (8) and (9), we have, for all s ≥ 4α,

    P( Z ≥ s ) ≤ 2 s^{−α} + n (s ln T)^{−α}.

Also, conditioned on the event {Z ≤ s}, we have

    R_xy(ρ_x(T)) ≤ R_xy^s := (se/Δ_y) ln( sn ln T / Δ_x² ).

Combining the above, we get

    E[ R_xy(ρ_x(T)) ] ≤ O(R_xy^{8α}) + Σ_{i ≥ 1} R_xy^{8α+i} [ (4α+i)^{−α} + n ((4α+i) ln T)^{−α} ].

Now, since α ≥ max{3, 1 + ln n / ln ln T}, it can be verified that the last expression converges to O(R_xy^{8α}); hence

    E[ R_xy(ρ_x(T)) ] = O( (α/Δ_y)(ln ln T + ln n) ln(1/Δ_x) ).   (10)

Combining the above and using (7), the total expected regret E[R(T)] is at most 1 + E[R_1(ρ_1(T))] + Σ_{x,y ∈ [n]\{1}} E[R_xy(ρ_x(T))]. The desired regret bound now follows from (10).

Proof of Corollary 7

Proof. Similar to the case of Battling-Doubler (Corollary 5), the current claim simply follows from the regret guarantee of Battling-MultiSBM and the fact that the worst-case gap can be at most O(√(n ln T / T)), as argued in [8] or [4]. The claim now follows by substituting H = O(√(nT / ln T)) in Theorem 6.

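As a companion to the proof of Theorem 6, here is a schematic of a single round of the Battling-MultiSBM reduction under our reading of it: one SBM per arm, the first k − 1 positions filled with copies of the previous round's k-th arm x, and only S_x advanced on the binary reward of its chosen k-th arm. The play_subset callback and all names are hypothetical sketches, not the paper's code.

def multisbm_round(sbms, prev_kth_arm, k, play_subset):
    """One round of Battling-MultiSBM (sketch). `sbms` maps each arm to an
    SBM with select/update methods; `play_subset` is a hypothetical
    environment callback returning the winning *position* in the subset."""
    sbm = sbms[prev_kth_arm]                    # only S_x advances this round
    kth_arm = sbm.select()                      # S_x proposes the k-th arm
    subset = [prev_kth_arm] * (k - 1) + [kth_arm]
    winner_pos = play_subset(subset)            # subset-wise winner feedback
    b = 1.0 if winner_pos == k - 1 else 0.0     # binary reward of the k-th arm
    sbm.update(kth_arm, b)
    return kth_arm                              # next round's prev_kth_arm

Iterating this round function makes the bookkeeping in the proof concrete: ρ_x(T) counts how often the first line selects S_x, and ρ_xy(T) counts how often that call returns arm y.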
D  Regret analysis of Battling-Duel

Proof of Theorem 8

Proof. Without loss of generality, we will assume that the Condorcet arm is arm 1 throughout the proof. Our goal is to analyse the regret of Battling-Duel in terms of its underlying dueling bandit algorithm. Consider Q̃ to be the pairwise preference matrix perceived by the dueling bandit algorithm D; i.e., upon playing any pair of items (x_t, y_t) ∈ [n] × [n], D receives feedback according to the pairwise preference P(x_t beats y_t) = Q̃_{x_t,y_t}. We know that the regret seen by the underlying dueling bandit algorithm is given by

    R_T^{DB}(D) = Σ_{t=1}^{T} r_t^{DB}(D),  where  r_t^{DB}(D) = (Q̃_{1,x_t} + Q̃_{1,y_t} − 1)/2.

We now start by analysing the relation between Q and Q̃.

Case 1: k is even. This is the easy case, since at any round t both x_t and y_t are replicated exactly k/2 times in the played subset S_t. Thus at any round t,

    Q̃_{x_t,y_t} = P(both drawn positions hold copies of x_t) · 1 + P(one copy of x_t and one of y_t) · Q_{x_t,y_t}
                = (k−2)/(4(k−1)) + (k/(2(k−1))) Q_{x_t,y_t},

where the pair-drawing probabilities follow from the definition of the pairwise-subset choice model, i.e. sampling the pair without replacement from S_t (see the Lemma of Section A for details), together with the fact that Q_{i,i} = 1/2 for all i ∈ [n]. Similarly we get

    Q̃_{y_t,x_t} = (k−2)/(4(k−1)) + (k/(2(k−1))) Q_{y_t,x_t}.

The above expressions hold for any pair of items (x_t, y_t) ∈ [n] × [n]. It is also worth noting that these indeed give Q̃_{i,j} + Q̃_{j,i} = 1 and Q̃_{i,i} = 1/2 for all i, j ∈ [n]. In particular, for any pair of items (i, j),

    Q̃_{i,j} − 1/2 = (k/(2(k−1))) (Q_{i,j} − 1/2).

Now let us analyse the instantaneous regret of Battling-Duel at round t, r_t^{BB}(BD); using our definition of regret as defined in (3), this gives

    r_t^{BB}(BD) = (1/k) Σ_{i ∈ S_t} (Q_{1,i} − 1/2) = (1/2)(Q_{1,x_t} − 1/2) + (1/2)(Q_{1,y_t} − 1/2)
                 = ((k−1)/k) [ (Q̃_{1,x_t} − 1/2) + (Q̃_{1,y_t} − 1/2) ] = (2(k−1)/k) r_t^{DB}(D),

where the second-last equality follows from the relation between Q and Q̃ derived above. Thus, summing over t = 1, 2, …, T, the cumulative regret of Battling-Duel (BD) over T rounds becomes

    R_T^{BB}(BD) = Σ_{t=1}^{T} r_t^{BB}(BD) = (2(k−1)/k) Σ_{t=1}^{T} r_t^{DB}(D) ≤ 2 R_T^{DB}(D),

and the claim follows. Now let us consider the case when k is odd.

Case 2: k is odd. Note that, similar to the case before, at any round t the item x_t is now replicated (k+1)/2 times and y_t is replicated (k−1)/2 times, and the same computation yields

    Q̃_{x_t,y_t} = (k+1)/(4k) + ((k+1)/(2k)) Q_{x_t,y_t}.

Similarly one can show that

    Q̃_{y_t,x_t} = (k−3)/(4k) + ((k+1)/(2k)) Q_{y_t,x_t}.

It is easy to verify that, as desired, Q̃_{x_t,y_t} + Q̃_{y_t,x_t} = 1. Furthermore, the above relation also gives, for every z ∈ [n],

    Q̃_{1,z} − 1/2 = ((k+1)/(2k)) (Q_{1,z} − 1/2) + 1/(2k) ≥ ((k+1)/(2k)) (Q_{1,z} − 1/2).

Then, similar to Case 1, analysing the instantaneous regret of Battling-Duel at round t with the definition of regret in (3) gives

    r_t^{BB}(BD) = ((k+1)/(2k))(Q_{1,x_t} − 1/2) + ((k−1)/(2k))(Q_{1,y_t} − 1/2)
                 ≤ ((k+1)/(2k)) [ (Q_{1,x_t} − 1/2) + (Q_{1,y_t} − 1/2) ]
                 ≤ (Q̃_{1,x_t} − 1/2) + (Q̃_{1,y_t} − 1/2) = 2 r_t^{DB}(D),

where the first inequality uses the Condorcet property Q_{1,z} ≥ 1/2. Now, summing over t = 1, 2, …, T, the cumulative regret of Battling-Duel over T rounds becomes

    R_T^{BB}(BD) ≤ 2 Σ_{t=1}^{T} r_t^{DB}(D) = 2 R_T^{DB}(D),

and the claim for Case 2 follows.

Proof of Corollary 9

Proof. The result is immediate from Theorem 8 along with the expected regret guarantee of the RUCB algorithm, as derived in Theorem 5 of [4].

Proof of Corollary 11

Proof. The proof immediately follows from Theorem 10 along with the regret lower bound for any DB problem, as derived in [3]. This is because any smaller regret for A_BB would violate the best achievable regret for DB, which is a logical contradiction.

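Similarly, the Case 1 (k even) reduction analysed in Theorem 8 can be sketched as follows, assuming a dueling bandit object with hypothetical propose/update methods: the pair proposed by D is replicated k/2 times each, and D is told that x beat y exactly when the declared winner is a copy of x, which is what produces the perceived matrix Q̃ above.

def battling_duel_round(duel_bandit, k, play_subset):
    """One round of Battling-Duel for even k (Case 1 above, sketch).
    `duel_bandit` exposes propose() -> (x, y) and update(x, y, x_won);
    `play_subset` is the same hypothetical subset-feedback callback as in
    the earlier sketches, returning the winning position."""
    x, y = duel_bandit.propose()
    subset = [x] * (k // 2) + [y] * (k // 2)
    winner_pos = play_subset(subset)
    x_won = winner_pos < k // 2        # winner was one of the x-copies
    duel_bandit.update(x, y, x_won)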
E  Datasets for experiments

E.1  Parameters for the linear-subset choice model

For synthetic experiments with the linear-subset choice model, we use the following four different utility score vectors θ ∈ [0, 1]^n: 1. arith, 2. geom, 3. g2 and 4. g3. Both the arith and geom utility score vectors have n = 8 items, with item 1 as the best (Condorcet) item with the highest score, i.e. θ_1 > max_{2 ≤ i ≤ 8} θ_i, and the rest of the θ_i's in an arithmetic or geometric progression respectively, as their names suggest. The two score vectors are described in the following table.

Table: Parameters for the linear-subset choice model (arith and geom score vectors).

The next two utility score vectors have n = 15 items each. As before, item 1 is the Condorcet winner, with θ_1 > max_{i ≥ 2} θ_i. More specifically, for g2 the individual scores are of the form

    θ_i = 0.8 for i = 1, and θ_i = 0.2 for i ∈ [15] \ {1}.

For g3 the individual scores are of the form

    θ_i = 0.8 for i = 1; θ_i = 0.7 for i ∈ [8] \ {1}; and θ_i = 0.6 otherwise.

We also used a bigger version of the g2 dataset, with n = 50 items, to run experiments with varying subset sizes, as shown in Figure 5 (Section 5.3). Its individual scores are of the form

    θ_i = 0.8 for i = 1, and θ_i = 0.2 for i ∈ [50] \ {1}.

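The g2 and g3 vectors above (and the larger 50-item variant) admit a direct construction, sketched below. The arith and geom entries are illustrative placeholders only, chosen to satisfy the stated pattern (item 1 best, the remaining scores in arithmetic or geometric progression), since the numeric table entries did not survive.

import numpy as np

def make_scores():
    """Utility-score vectors for the linear-subset experiments (sketch)."""
    arith = np.array([0.8 - 0.1 * i for i in range(8)])   # arithmetic progression (illustrative)
    geom = np.array([0.8 * 0.8 ** i for i in range(8)])   # geometric progression (illustrative)
    g2 = np.full(15, 0.2); g2[0] = 0.8                    # one clearly best item
    g3 = np.full(15, 0.6); g3[1:8] = 0.7; g3[0] = 0.8     # three utility levels
    g2_big = np.full(50, 0.2); g2_big[0] = 0.8            # 50-item variant
    return arith, geom, g2, g3, g2_big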
E.2  Pairwise preference matrices used in synthetic experiments with the pairwise-subset choice model

We run experiments on two synthetic preference matrices, arith-pref and arxiv-pref, for the pairwise-subset choice model. The two datasets are shown in Tables 3 and 4 respectively.

Arith preference dataset

Table 3: arith-pref: Arith preference matrix.

Arxiv preference dataset

Table 4: arxiv-pref: Arxiv preference matrix.

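These matrices (and those of Section E.3 below) are consumed through the pairwise-subset choice model. A minimal simulator of that model, assuming a pair of positions is drawn uniformly without replacement from the played subset and the winner is decided by Q, as in the Lemma of Section A, is:

import random

def make_play_subset(Q, seed=0):
    """Pairwise-subset choice model (sketch): draw a pair of positions
    (U, V) uniformly without replacement from the played subset, and
    declare U the winner with probability Q[U][V]. Returns the winning
    position, matching the `play_subset` callback assumed earlier."""
    rng = random.Random(seed)
    def play_subset(subset):
        i, j = rng.sample(range(len(subset)), 2)   # two distinct positions
        u, v = subset[i], subset[j]
        return i if rng.random() < Q[u][v] else j
    return play_subset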
E.3  Pairwise preference matrices used in real-world experiments with the pairwise-subset choice model

Hudry dataset

The Hudry tournament data is a well-studied tournament on 13 items, which has the special property of being the smallest tournament for which the Banks and Copeland sets are different, as shown in [6]. The original dataset, however, does not contain a Condorcet item; hence we delete three of the 13 items so that the resulting preference matrix contains a Condorcet winner. The preference matrix is shown in Table 5.

Table 5: Hudry dataset.

Car dataset

This dataset contains pairwise preferences of 10 cars given by 60 users, where each car has 4 features. From the user preferences, we compute the underlying pairwise preference matrix Q, where Q_ij is the empirical fraction of times car i is preferred over car j by the users. The preference matrix obtained in this way for Car is given in Table 6.

Table 6: Car dataset.

Sushi dataset

This dataset contains over 100 sushis rated according to user preferences, where each sushi is represented by 7 features. Similar to Car, we construct the underlying pairwise preference matrix Q such that Q_ij is the empirical fraction of times a sushi of type i is preferred over type j. We further sample 16 sushis out of these 100 such that the underlying preference matrix contains a Condorcet winner. The dataset obtained is given in Table 7.

Table 7: Sushi dataset.

Tennis dataset

The tennis preference matrix compiles the all-time win-loss results of tennis matches among 8 international tennis players, as recorded by the Association of Tennis Professionals (ATP). For each pair of players (arms), say i and j, Q_ij is set to be the fraction of matches between i and j that were won by i. The dataset is adopted from the Tennis data used by [6], which has item 1 as its Condorcet winner. The resulting pairwise preference matrix is given in Table 8.

Table 8: Tennis dataset.

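All three empirical matrices above (Car, Sushi, Tennis) share the same construction from raw comparison counts; a small helper reproducing it (names and conventions ours) is:

import numpy as np

def empirical_preference_matrix(wins):
    """Car/Sushi/Tennis-style preference matrix from raw counts (sketch):
    wins[i, j] = number of comparisons (or matches) in which i beat j;
    Q[i, j] is the fraction of i-vs-j comparisons won by i, with the
    convention Q[i, i] = 1/2, and 1/2 for never-compared pairs."""
    wins = np.asarray(wins, dtype=float)
    totals = wins + wins.T
    with np.errstate(invalid="ignore", divide="ignore"):
        Q = np.where(totals > 0, wins / totals, 0.5)
    np.fill_diagonal(Q, 0.5)
    return Q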
