Game Theory, On-line prediction and Boosting (Freund, Schapire)
Idan Attias, April 2018

1 INTRODUCTION

The purpose of this paper is to bring out the close connection between game theory, on-line prediction and boosting, three seemingly unrelated topics.

Paper outline:
- Review of game theory.
- An algorithm for learning to play repeated games (based on the on-line prediction methods of Littlestone and Warmuth (LW)).
- A new, simple proof of von Neumann's minimax theorem (stemming from the analysis of the algorithm above).
- A method for approximately solving a game (also stemming from that analysis).
- The on-line prediction model (obtained by applying the game-playing algorithm above to an appropriate choice of game).
- A boosting algorithm (obtained by applying the same algorithm to the dual of this game).

2 GAME THEORY

Example: The loss matrix of Rock, Paper, Scissors is:

         R     P     S
    R   1/2    1     0
    P    0    1/2    1
    S    1     0    1/2
Game setting: We study two-person games in normal form. That is, each game is defined by a matrix M. There are two players, called the row player and the column player. To play the game, the row player chooses a row i, and simultaneously the column player chooses a column j.

Definition: A pure strategy is a deterministic single choice of a row (column) by a player.

Definition: The loss suffered by the row player is M(i, j).

Players' goals: The row player's goal is to minimize its loss; the column player's goal is to maximize this loss (zero-sum game). However, the results apply even when no assumptions are made about the goal or strategy of the column player.

Assumptions:
1. All the losses are in [0,1]; simple scaling can be used to get more general results.
2. The game matrix is finite (that is, the number of choices available to each player is finite). Most of the results translate, with very mild additional assumptions, to infinite matrix games.

2.1 RANDOMIZED PLAY

Game setting: The row player chooses a distribution P over the rows of M, and simultaneously the column player chooses a distribution Q over the columns.

Definition: A mixed strategy is a choice of a distribution over the matrix rows (columns) by a player.

Definition: The expected loss of the row player (the loss, from now on) is computed as

    sum_{i,j} P(i) M(i,j) Q(j) =: M(P, Q).

If the row player chooses a mixed strategy P but the column player chooses a pure strategy, a single column j, the loss is

    sum_i P(i) M(i,j) =: M(P, j).

The notation M(i, Q) is defined analogously.
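The expected loss above is just a bilinear form, so it is one line of numpy. A minimal sketch (the 2x2 matrix here is a hypothetical example, not one from the notes):

```python
import numpy as np

# Hypothetical 2x2 loss matrix with entries in [0, 1].
M = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def expected_loss(M, P, Q):
    """M(P, Q) = sum_{i,j} P(i) M(i, j) Q(j): the row player's expected loss."""
    return float(P @ M @ Q)

P = np.array([0.5, 0.5])        # mixed strategy over rows
Q = np.array([1.0, 0.0])        # pure strategy: point mass on column j = 0
print(expected_loss(M, P, Q))   # M(P, 0) = 0.5 * 0 + 0.5 * 1 = 0.5
```

Representing a pure strategy as a point-mass distribution, as done for Q here, makes M(P, j) a special case of M(P, Q).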
2.2 SEQUENTIAL PLAY

Game setting: Suppose now that instead of choosing strategies simultaneously, the play is sequential: the column player chooses its strategy Q after the row player has chosen and announced its strategy P.

Players' goals: The column player's goal is to maximize the row player's loss (zero-sum game). Given P, such a worst-case / adversarial column player will choose Q to maximize M(P, Q), so the row player's loss will be max_Q M(P, Q). Knowing this, the row player should choose P to minimize this quantity, and its loss will be min_P max_Q M(P, Q). Notice that the row player plays first here. If instead the column player plays first and the row player can choose its play with the benefit of knowing the strategy Q, the row player's loss will be max_Q min_P M(P, Q).

2.3 THE MINMAX THEOREM

Definition: A minmax strategy is a mixed strategy P* realizing the minimum min_P max_Q M(P, Q). A maxmin strategy is a mixed strategy Q* realizing the maximum max_Q min_P M(P, Q). [1]

Theorem: max_Q min_P M(P, Q) = min_P max_Q M(P, Q) =: v.

Definition: v is called the value of the game M.

Explanations: We expect the player who chooses its strategy last to have the advantage, since it plays knowing its opponent's strategy; thus max_Q min_P M(P, Q) <= min_P max_Q M(P, Q). The theorem states, however, that playing last gives no advantage. The row player has a (minmax) strategy P* such that, regardless of the strategy Q played by the column player, the loss M(P*, Q) will be at most v. Symmetrically, the column player has a (maxmin) strategy Q* such that, regardless of the strategy P played by the row player, the loss M(P, Q*) will be at least v. This means that the strategies P* and Q* are optimal in a strong sense.

Playing by classical game theory: Given a zero-sum game M, one should play using a minmax strategy (which can be computed by linear programming). Problems with this approach:

[1] Exercise: explain why these extrema exist.
- The column player may not be truly adversarial, and may behave in a manner that admits loss significantly smaller than the game value v.
- M may be unknown.
- M may be too large to compute with explicitly.

2.4 REPEATED PLAY

Motivation: We try to overcome these difficulties by playing the game repeatedly (a one-shot game is hopeless). Our goal is to learn how to play well against a particular opponent (which is not necessarily adversarial).

Game setting: We refer to the row player as the learner and to the column player as the environment. Let M be a matrix, possibly unknown to the learner. The game is played repeatedly in a sequence of rounds. On round t = 1, ..., T:
1. The learner chooses a mixed strategy P_t.
2. The environment chooses a mixed strategy Q_t (which may be chosen with knowledge of P_t).
3. The learner is permitted to observe the loss M(i, Q_t) for each row i.
4. The learner suffers loss M(P_t, Q_t).

Learner's goal: To suffer cumulative loss sum_{t=1}^T M(P_t, Q_t) which is not much worse than min_P sum_{t=1}^T M(P, Q_t), the loss of the best fixed strategy in hindsight against the actual sequence of plays Q_1, ..., Q_T.
Algorithm LW(β)

Parameter: β ∈ [0, 1).
Initialize: all the weights are set to unity, w_1(i) = 1 for 1 <= i <= n. [w_t(i) := weight at time t on row i]
Do for t = 1, ..., T:
1. Compute the mixed strategy: P_t(i) = w_t(i) / sum_i w_t(i), for 1 <= i <= n.
2. Update the weights: w_{t+1}(i) = w_t(i) β^{M(i, Q_t)} (sanity check: why do these updates make sense?).
Output: P_1, ..., P_T.

Theorem 1: For any matrix M with n rows and entries in [0,1], and for any sequence of mixed strategies Q_1, ..., Q_T played by the environment, the sequence of mixed strategies P_1, ..., P_T produced by algorithm LW with parameter β ∈ [0, 1) satisfies:

    sum_{t=1}^T M(P_t, Q_t) <= a_β min_P sum_{t=1}^T M(P, Q_t) + c_β ln(n),

where

    a_β = ln(1/β) / (1 - β),    c_β = 1 / (1 - β).

Proof: For t = 1, ..., T we have:

    sum_{i=1}^n w_{t+1}(i) = sum_{i=1}^n w_t(i) β^{M(i, Q_t)}          (by definition)
                          <= sum_{i=1}^n w_t(i) (1 - (1-β) M(i, Q_t))
                           = (sum_{i=1}^n w_t(i)) (1 - (1-β) M(P_t, Q_t)).

The inequality follows from β^x <= 1 - (1-β)x, for β > 0 and x ∈ [0, 1]. [2]

The last equality follows from:

    sum_{i=1}^n w_t(i) (1 - (1-β) M(i, Q_t))
      = (sum_i w_t(i)) · sum_i [w_t(i) / sum_i w_t(i)] (1 - (1-β) M(i, Q_t))
      = (sum_i w_t(i)) · sum_i P_t(i) (1 - (1-β) M(i, Q_t))
      = (sum_i w_t(i)) · (sum_i P_t(i) - (1-β) sum_i P_t(i) M(i, Q_t))
      = (sum_i w_t(i)) (1 - (1-β) M(P_t, Q_t)).

[Recall: sum_i P_t(i) = 1 and P_t(i) = w_t(i) / sum_i w_t(i).]

Unwrapping this recurrence gives [base case: w_1(i) = 1, so sum_{i=1}^n w_1(i) = n]:

    sum_{i=1}^n w_{T+1}(i) <= n · prod_{t=1}^T (1 - (1-β) M(P_t, Q_t)).      (*)

Note that for any 1 <= j <= n:

    β^{sum_{t=1}^T M(j, Q_t)} = w_{T+1}(j) <= sum_{i=1}^n w_{T+1}(i).        (**)

Combining (*) and (**) and taking logs gives:

    ln(β) sum_{t=1}^T M(j, Q_t) <= ln(n) + sum_{t=1}^T ln(1 - (1-β) M(P_t, Q_t))
                                <= ln(n) - (1-β) sum_{t=1}^T M(P_t, Q_t),

using ln(1-x) <= -x for x < 1. Rearranging terms, and noticing that this holds for any j (so taking the min over j gives the tightest bound), we get:

    sum_{t=1}^T M(P_t, Q_t) <= [ln(1/β) min_j sum_{t=1}^T M(j, Q_t) + ln(n)] / (1-β).

Since the minimum (over mixed strategies P) in the bound of the theorem is achieved by a pure strategy j, this implies the theorem.

Remarks:
- lim_{β→1} a_β = 1.
- For fixed β, as the number of rounds T becomes large, c_β ln(n) becomes negligible relative to T. Thus, by choosing β close to 1, the learner ensures that its loss will not be much worse than the loss of the best strategy (formalized in the following corollary).

Corollary 2: Under the conditions of Theorem 1 and with β = 1/(1 + sqrt(2 ln(n)/T)), the average per-trial loss suffered by the learner is

    (1/T) sum_{t=1}^T M(P_t, Q_t) <= min_P (1/T) sum_{t=1}^T M(P, Q_t) + Δ_T,

where Δ_T = sqrt(2 ln(n)/T) + ln(n)/T = O(sqrt(ln(n)/T)).

[2] Exercise: prove it. Hint: use a convexity argument (recall the definition).
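Algorithm LW is a few lines of numpy. Below is a sketch on a hypothetical 2x2 game in which the environment stubbornly plays column 0, so the best fixed row (row 1) suffers zero loss and the Theorem 1 bound reduces to c_β ln(n):

```python
import numpy as np

def lw(M, Qs, beta):
    """Algorithm LW: play game M against the column strategies Qs
    (distributions over columns); return the per-round losses M(P_t, Q_t)."""
    n = M.shape[0]
    w = np.ones(n)                       # w_1(i) = 1
    losses = []
    for Q in Qs:
        P = w / w.sum()                  # P_t(i) = w_t(i) / sum_i w_t(i)
        row_losses = M @ Q               # M(i, Q_t) for every row i
        losses.append(float(P @ row_losses))
        w = w * beta ** row_losses       # w_{t+1}(i) = w_t(i) * beta^{M(i, Q_t)}
    return losses

M = np.array([[1.0, 0.0],                # hypothetical game, entries in [0, 1]
              [0.0, 1.0]])
T = 200
Qs = [np.array([1.0, 0.0])] * T          # environment always plays column 0
beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(2) / T))   # the Corollary 2 choice
losses = lw(M, Qs, beta)

# Theorem 1: total loss <= a_beta * 0 + c_beta * ln(n), since row 1 never loses.
bound = np.log(2) / (1.0 - beta)
print(sum(losses) <= bound)              # True
```

The per-round losses shrink as the weight on the bad row decays geometrically, which is exactly the mechanism the proof's recurrence tracks.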
Corollary 3: Under the conditions of Corollary 2,

    (1/T) sum_{t=1}^T M(P_t, Q_t) <= v + Δ_T.

Proof: Let P* be a minmax strategy for M, so that for every column strategy Q, M(P*, Q) <= v. Using Corollary 2:

    (1/T) sum_{t=1}^T M(P_t, Q_t) <= min_P (1/T) sum_{t=1}^T M(P, Q_t) + Δ_T
                                  <= (1/T) sum_{t=1}^T M(P*, Q_t) + Δ_T
                                  <= v + Δ_T.

Remarks:
- Theorem 1 guarantees that the cumulative loss is not much larger than that of any fixed mixed strategy, and in particular not much larger than the game value (Corollary 3).
- If the environment is non-adversarial, there might be a better fixed mixed strategy for the player, in which case the algorithm is guaranteed to be almost as good as this better strategy.

2.5 PROOF OF THE MINMAX THEOREM

We prove: min_P max_Q M(P, Q) <= max_Q min_P M(P, Q) (the reverse inequality is immediate, as explained in 2.3). Suppose that we run algorithm LW (producing P_1, ..., P_T) against the maximally adversarial environment: on each round t the environment chooses

    Q_t = argmax_Q M(P_t, Q).

Denote P̄ = (1/T) sum_{t=1}^T P_t and Q̄ = (1/T) sum_{t=1}^T Q_t (both are probability distributions). We have:

    min_P max_Q M(P, Q) <= max_Q M(P̄, Q)
                         = max_Q (1/T) sum_{t=1}^T M(P_t, Q)          (definition of P̄)
                        <= (1/T) sum_{t=1}^T max_Q M(P_t, Q)
                         = (1/T) sum_{t=1}^T M(P_t, Q_t)              (definition of Q_t)
                        <= min_P (1/T) sum_{t=1}^T M(P, Q_t) + Δ_T    (Corollary 2)
                         = min_P M(P, Q̄) + Δ_T
                        <= max_Q min_P M(P, Q) + Δ_T.

Δ_T can be made arbitrarily close to zero, which completes the proof.
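The construction in this proof is itself an algorithm: run LW against the best-response environment and average the P_t. A numerical sketch on the Rock-Paper-Scissors game from section 2 (whose value is v = 1/2, attained by the uniform strategy):

```python
import numpy as np

M = np.array([[0.5, 1.0, 0.0],           # Rock-Paper-Scissors loss matrix
              [0.0, 0.5, 1.0],
              [1.0, 0.0, 0.5]])
n, T = 3, 5000
beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / T))

w = np.ones(n)
P_bar = np.zeros(n)
for _ in range(T):
    P = w / w.sum()
    P_bar += P / T                       # P_bar = (1/T) sum_t P_t
    j = int(np.argmax(P @ M))            # Q_t = argmax_Q M(P_t, Q): a pure column
    w = w * beta ** M[:, j]              # LW update against that column

# max_Q M(P_bar, Q) is attained at a pure column; by the argument above it is
# at most v + Delta_T, and here Delta_T is about 0.02.
exploitability = float(np.max(P_bar @ M))
print(exploitability)
```

The printed exploitability hovers just above 0.5, and P_bar approaches the uniform minmax strategy (1/3, 1/3, 1/3), illustrating the chain of inequalities concretely.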
2.6 APPROXIMATELY SOLVING A GAME

The algorithm LW can be used to find an approximate minmax or maxmin strategy.

Definition: An approximate minmax strategy P satisfies max_Q M(P, Q) <= v + Δ (where Δ can be made arbitrarily small). An approximate maxmin strategy is defined analogously.

Claim: P̄ = (1/T) sum_{t=1}^T P_t is an approximate minmax strategy (with the P_t produced by LW). [3]

Proof: From the proof in 2.5 we get: max_Q M(P̄, Q) <= max_Q min_P M(P, Q) + Δ_T = v + Δ_T.

Claim: Q̄ = (1/T) sum_{t=1}^T Q_t, where Q_t = argmax_Q M(P_t, Q), is an approximate maxmin strategy; from the proof in 2.5: v - Δ_T <= min_P M(P, Q̄). Moreover, each Q_t can always be chosen to be a pure strategy. [4] This fact is important for our derivation of a boosting algorithm in section 4.

3 ON-LINE PREDICTION

On-line prediction setting: In the on-line prediction model, the learner observes a sequence of examples and predicts their labels one at a time. The learner's goal is to minimize its prediction errors.

Formal definition: Let X be a finite set of instances, and let H be a finite set of hypotheses h : X → {0, 1} (there are generalizations to infinite cases). Let c : X → {0, 1} be an unknown target concept, not necessarily in H (how to choose a good H? bias-variance trade-off). The learning takes place in a sequence of rounds. On round t = 1, ..., T:
1. The learner observes an example x_t ∈ X.
2. The learner makes a randomized prediction ŷ_t ∈ {0, 1} of the label associated with x_t.
3. The learner observes the correct label c(x_t).

[3] Exercise: give a full proof, following the proof in 2.5 exactly.
[4] Exercise: prove it, using directly the definition of the loss M(P_t, Q).
The goal of the learner: To minimize the expected number (with respect to its own randomization) of mistakes it makes, relative to the best hypothesis in the class H.

Definition: The mistake matrix M, of size |H| × |X| (the game matrix in this case), is defined as follows:

    M(h, x) = 1 if h(x) ≠ c(x), and 0 otherwise.

Reduction to the repeated game problem: The environment's choice of a column corresponds to the choice of an instance x presented to the learner in a given round (a pure strategy). The learner's choice of a distribution over the matrix rows (hypotheses) corresponds to a random choice of a hypothesis with which to predict (a mixed strategy).

Applying the LW algorithm: On round t, we have a distribution P_t over H. Given the instance x_t (a pure strategy), we randomly select h_t ∈ H according to P_t and predict ŷ_t = h_t(x_t). Given c(x_t), we compute M(h, x_t) for every h ∈ H and update the weights by LW.

Analysis:

    M(P_t, x_t) = sum_{h ∈ H} P_t(h) M(h, x_t) = Pr_{h ~ P_t}[h(x_t) ≠ c(x_t)].

Therefore, the expected number of mistakes made by the learner satisfies (by Corollary 2)

    sum_{t=1}^T M(P_t, x_t) <= min_{h ∈ H} sum_{t=1}^T M(h, x_t) + O(sqrt(T ln|H|)).

Remarks:
- Analysis using Theorem 1 gives a better bound.
- The result can be generalized to any bounded loss function, and also to more general settings.
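A sketch of this reduction on a toy instance (the instance space, hypotheses, and target below are hypothetical, chosen only so that the best hypothesis makes no mistakes and the Corollary 2 bound is easy to check):

```python
import numpy as np

X = [0, 1, 2]                                    # toy instance space
H = [lambda x: int(x >= 1),                      # three candidate hypotheses
     lambda x: int(x >= 2),
     lambda x: 0]
c = lambda x: int(x >= 1)                        # target concept (equals H[0])

# Mistake matrix M(h, x) = 1 iff h(x) != c(x).
M = np.array([[float(h(x) != c(x)) for x in X] for h in H])

T = 300
beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(H)) / T))
w = np.ones(len(H))
expected_mistakes = 0.0
rng = np.random.default_rng(0)
for _ in range(T):
    k = int(rng.integers(len(X)))                # environment shows instance x_t
    P = w / w.sum()
    expected_mistakes += float(P @ M[:, k])      # Pr_{h ~ P_t}[h(x_t) != c(x_t)]
    w = w * beta ** M[:, k]                      # LW update on the shown column

# Best hypothesis makes 0 mistakes, so the bound reduces to c_beta * ln|H|.
bound = np.log(len(H)) / (1.0 - beta)
print(expected_mistakes <= bound)                # True
```

Sampling h_t from P_t and comparing h_t(x_t) to c(x_t) would give the actual randomized predictions; summing M(P_t, x_t) directly, as here, gives their expectation.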
4 BOOSTING

Weak learning definition: For γ > 0, we say the algorithm WL is a γ-weak learning algorithm for (H, c) if, for any distribution D over X, the algorithm takes as input a set of labeled examples distributed according to D and outputs a hypothesis h ∈ H with error slightly better than random guessing:

    Pr_{x ~ D}[h(x) ≠ c(x)] <= 1/2 - γ.

Boosting definition: Boosting is the problem of converting a weak learning algorithm into one that performs with good accuracy. The goal is to run the weak algorithm many times on many distributions, and to combine the selected hypotheses into a final hypothesis with a small error rate. The main issues are how to choose the distributions D_t and how to combine the hypotheses.

The boosting process: Boosting proceeds in rounds. On round t = 1, ..., T:
1. The booster constructs a distribution D_t on X, which is passed to the weak learner.
2. The weak learner produces a hypothesis h_t ∈ H with error

    Pr_{x ~ D_t}[h_t(x) ≠ c(x)] <= 1/2 - γ.

3. After T rounds the weak hypotheses h_1, ..., h_T are combined into a final hypothesis h_fin.

Assumptions:
- Probability of success: We assume that the weak learner always succeeds; thus the boosting algorithm succeeds with absolute certainty. (Usually we only assume that weak learning succeeds with high probability, in which case the boosting algorithm succeeds with probability > 1 - δ [PAC].)
- Final hypothesis error: We have full access to the labels associated with the entire domain X. Thus, we require that the final hypothesis have error zero, so that all instances are correctly classified. The algorithm can be modified to fit the more standard (and practical) model in which the final error must only be less than some positive parameter ε. Given a labeled training set, all distributions would be computed over the training set; the generalization error of the final hypothesis can then be bounded using, for instance, standard VC theory.
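To make the weak-learning definition concrete, here is a hypothetical toy pair (H, c) that is γ-weakly learnable with γ = 1/6 (this example is not from the notes): X = {0,1}^3, c = majority of the three bits, and H = the three coordinate projections. For every x, the majority agrees with at least two of the three bits, so under any distribution D the best projection has error at most 1/3 = 1/2 - 1/6, and exhaustive search over H is a valid weak learner:

```python
import numpy as np
from itertools import product

X = list(product([0, 1], repeat=3))              # X = {0,1}^3
H = [lambda x, i=i: x[i] for i in range(3)]      # coordinate projections
c = lambda x: int(sum(x) >= 2)                   # majority of the three bits

def weak_learner(D):
    """Return the h in H with smallest error under D, and that error."""
    errs = [sum(D[k] for k, x in enumerate(X) if h(x) != c(x)) for h in H]
    best = int(np.argmin(errs))
    return H[best], errs[best]

# The error is at most 1/3 for every distribution; spot-check random ones.
rng = np.random.default_rng(0)
worst = max(weak_learner(rng.dirichlet(np.ones(len(X))))[1] for _ in range(200))
print(worst <= 1 / 3)                            # True
```

Averaging over the three projections, Pr_D[h_i(x) = c(x)] is at least 2/3, so the minimum error over H is at most 1/3; the spot-check merely illustrates this bound.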
4.1 BOOSTING AND THE MINMAX THEOREM

Motivation: This part gives intuition for the boosting algorithm in the next section.

Relationship between the mistake matrix M (section 3) and the minmax theorem:

    min_P max_x M(P, x) = min_P max_Q M(P, Q) = v = max_Q min_P M(P, Q) = max_Q min_h M(h, Q).    (*)

Last equality: for any Q, min_P M(P, Q) is realized at a pure strategy h. First equality: for any P, max_Q M(P, Q) is realized at a pure strategy x. Note that M(h, Q) = Pr_{x ~ Q}[h(x) ≠ c(x)].

There exists a distribution (maxmin strategy) Q* on X such that for every h:

    M(h, Q*) = Pr_{x ~ Q*}[h(x) ≠ c(x)] >= v.

By weak learnability, there exists h such that Pr_{x ~ Q*}[h(x) ≠ c(x)] <= 1/2 - γ. Hence, v <= 1/2 - γ.

There exists a distribution (minmax strategy) P* over H such that for every x:

    M(P*, x) = Pr_{h ~ P*}[h(x) ≠ c(x)] <= v <= 1/2 - γ < 1/2.

That is, every x is misclassified by less than half of the hypotheses (as weighted by the minmax strategy P*). Therefore, the target concept c is equivalent to a weighted majority of hypotheses in H.

Corollary: If (H, c) is γ-weakly learnable, then c can be computed exactly as a weighted majority of hypotheses in H, where the weights are given by a minmax strategy (a distribution over the rows/hypotheses) of the game M.

4.2 IDEA FOR A BOOSTING ALGORITHM

The idea: By the corollary above, we can approximate c by approximating the weights of the minmax strategy (recall subsection 2.6).

Adapting LW to the boosting model: The LW algorithm does not fit the boosting model directly. Recall that on each round, algorithm LW computes a distribution over the rows of the game matrix (hypotheses). In the boosting model, however, we want to compute on each round a distribution over instances (columns of M). Rather than using the game M directly, we construct the dual of M, which is the identical game except that the roles of the row and column players have been reversed.

Constructing the dual matrix: We modify the game matrix M (rows = hypotheses, columns = instances) to fit the boosting model. We construct the dual game matrix M' as follows:
1. The algorithm computes a distribution over rows (hypotheses), while in the boosting model we want to compute at each round a distribution over the instances (columns of M). We need to swap rows and columns, so we take the transpose Mᵀ.
2. The column player of M wants to maximize the loss, but the row player of M wants to minimize it. We want to reverse the meaning of minimum and maximum, so we take -Mᵀ.
3. Our convention is that losses lie in [0,1]; for that reason we take 1 - Mᵀ.

Definition: The dual matrix is defined by

    M'(x, h) = 1 - M(h, x) = 1 if h(x) = c(x), and 0 otherwise.

Remark: Any minmax strategy of the game M becomes a maxmin strategy of the game M'. Therefore, whereas before we were interested in finding an approximate minmax strategy of M, we are now interested in finding an approximate maxmin strategy of M'.

Applying algorithm LW to M': We apply LW to the dual game matrix M'. The reduction proceeds as follows. On round t of boosting:
1. LW computes a distribution P_t over the rows of M' (that is, over X).
2. The boosting algorithm sets D_t = P_t and passes D_t to the weak learning algorithm.
3. The weak learning algorithm returns h_t with Pr_{x ~ D_t}[h_t(x) = c(x)] >= 1/2 + γ.
4. The weights maintained by LW are updated with Q_t defined to be the pure strategy h_t; that is, w_{t+1}(i) = w_t(i) β^{M'(i, h_t)} for all i.
5. Return the hypotheses h_1, ..., h_T.

Constructing an approximate maxmin strategy: According to the method for approximately solving a game (2.6), on each round t, Q_t may be a pure strategy h_t, which should be chosen to maximize

    M'(P_t, h_t) = sum_x P_t(x) M'(x, h_t) = Pr_{x ~ P_t}[h_t(x) = c(x)].

That is, h_t should have maximum accuracy with respect to the distribution P_t.
For that, we use the weak learner. Although it is not guaranteed to find the best h_t, finding one of accuracy at least 1/2 + γ turns out to be sufficient for our purposes.

Finally, this method suggests that Q̄ = (1/T) sum_{t=1}^T Q_t is an approximate maxmin strategy, and we showed (in 4.1) that the target c is equivalent to a majority of the hypotheses when weighted by a maxmin strategy of M'. Since each Q_t is a pure strategy (hypothesis) h_t, this leads us to choose a simple majority of h_1, ..., h_T:

    h_fin = majority(h_1, ..., h_T).

4.3 ANALYSIS

Our boosting procedure computes h_fin identical to c (for sufficiently large T). For all t:

    M'(P_t, h_t) = Pr_{x ~ P_t}[h_t(x) = c(x)] >= 1/2 + γ.

By Corollary 2 we have:

    1/2 + γ <= (1/T) sum_{t=1}^T M'(P_t, h_t) <= min_x (1/T) sum_{t=1}^T M'(x, h_t) + Δ_T.

Rearranging terms, for all x:

    (1/T) sum_{t=1}^T M'(x, h_t) >= 1/2 + γ - Δ_T > 1/2 for large T (when Δ_T < γ),

where Δ_T = O(sqrt(ln|X| / T)); we choose T = Ω(ln|X| / γ²) so that Δ_T < γ.

By the definition of M', sum_{t=1}^T M'(x, h_t) is exactly the number of hypotheses h_t that agree with c on x, and it is more than T/2. So, by the definition of h_fin, we get h_fin(x) = c(x) for all x.
Algorithm 2: Boosting algorithm

Input: instance space X, target function c, and a γ-weak learning algorithm.
1. Set T = (4/γ²) ln|X| (so that Δ_T < γ).
2. Set β = 1/(1 + sqrt(2 ln|X| / T)).
3. Set D_1(x) = 1/|X| for all x ∈ X.
4. For t = 1, ..., T:
   - Pass the distribution D_t to the weak learner; get back a hypothesis h_t such that Pr_{x ~ D_t}[h_t(x) ≠ c(x)] <= 1/2 - γ.
   - Update the weights: w_{t+1}(x) = w_t(x)·β if h_t(x) = c(x), and w_{t+1}(x) = w_t(x) otherwise, for all x.
   - Update D_t: D_{t+1}(x) = w_{t+1}(x) / sum_x w_{t+1}(x), for all x.
Output: final hypothesis h_fin = majority(h_1, ..., h_T).
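A runnable sketch of Algorithm 2 on a hypothetical toy instance (not from the notes): X = {0,1}^3, c = majority of the three bits, and a 1/6-weak learner implemented as exhaustive search over the three coordinate projections (the best projection always has error at most 1/3 under any distribution). By the analysis in 4.3, the final majority vote reproduces c exactly:

```python
import numpy as np
from itertools import product

X = list(product([0, 1], repeat=3))
H = [lambda x, i=i: x[i] for i in range(3)]      # coordinate projections
c = lambda x: int(sum(x) >= 2)                   # target: majority of the bits

def weak_learner(D):
    """Best hypothesis in H under D: a gamma = 1/6 weak learner here."""
    errs = [sum(D[k] for k, x in enumerate(X) if h(x) != c(x)) for h in H]
    return H[int(np.argmin(errs))]

gamma = 1.0 / 6.0
T = int(np.ceil(4.0 / gamma**2 * np.log(len(X))))       # T = (4/gamma^2) ln|X|
beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(X)) / T))

w = np.ones(len(X))                              # weights over instances
hs = []
for _ in range(T):
    D = w / w.sum()                              # D_t
    h_t = weak_learner(D)
    hs.append(h_t)
    # w_{t+1}(x) = w_t(x) * beta if h_t(x) = c(x), else unchanged.
    w = w * np.array([beta if h_t(x) == c(x) else 1.0 for x in X])

h_fin = lambda x: int(sum(h(x) for h in hs) > len(hs) / 2)
print(all(h_fin(x) == c(x) for x in X))          # True
```

Note that the update down-weights exactly the instances the weak hypothesis got right, forcing subsequent weak hypotheses to concentrate on the instances that are still hard, which is the dual of the LW update over hypotheses.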
More informationCS260: Machine Learning Theory Lecture 12: No Regret and the Minimax Theorem of Game Theory November 2, 2011
CS260: Machine Learning heory Lecture 2: No Regret and the Minimax heorem of Game heory November 2, 20 Lecturer: Jennifer Wortman Vaughan Regret Bound for Randomized Weighted Majority In the last class,
More informationOnline Learning with Experts & Multiplicative Weights Algorithms
Online Learning with Experts & Multiplicative Weights Algorithms CS 159 lecture #2 Stephan Zheng April 1, 2016 Caltech Table of contents 1. Online Learning with Experts With a perfect expert Without perfect
More informationBackground. Adaptive Filters and Machine Learning. Bootstrap. Combining models. Boosting and Bagging. Poltayev Rassulzhan
Adaptive Filters and Machine Learning Boosting and Bagging Background Poltayev Rassulzhan rasulzhan@gmail.com Resampling Bootstrap We are using training set and different subsets in order to validate results
More informationComputational Learning Theory
Computational Learning Theory Pardis Noorzad Department of Computer Engineering and IT Amirkabir University of Technology Ordibehesht 1390 Introduction For the analysis of data structures and algorithms
More informationClassification. Jordan Boyd-Graber University of Maryland WEIGHTED MAJORITY. Slides adapted from Mohri. Jordan Boyd-Graber UMD Classification 1 / 13
Classification Jordan Boyd-Graber University of Maryland WEIGHTED MAJORITY Slides adapted from Mohri Jordan Boyd-Graber UMD Classification 1 / 13 Beyond Binary Classification Before we ve talked about
More information1 Review of Winnow Algorithm
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # 17 Scribe: Xingyuan Fang, Ethan April 9th, 2013 1 Review of Winnow Algorithm We have studied Winnow algorithm in Algorithm 1. Algorithm
More informationLecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;
CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and
More informationQuestion 1. (p p) (x(p, w ) x(p, w)) 0. with strict inequality if x(p, w) x(p, w ).
University of California, Davis Date: August 24, 2017 Department of Economics Time: 5 hours Microeconomics Reading Time: 20 minutes PRELIMINARY EXAMINATION FOR THE Ph.D. DEGREE Please answer any three
More informationLearning with multiple models. Boosting.
CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models
More informationThe AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m
) Set W () i The AdaBoost algorithm =1/n for i =1,...,n 1) At the m th iteration we find (any) classifier h(x; ˆθ m ) for which the weighted classification error m m =.5 1 n W (m 1) i y i h(x i ; 2 ˆθ
More informationComputational Learning Theory
0. Computational Learning Theory Based on Machine Learning, T. Mitchell, McGRAW Hill, 1997, ch. 7 Acknowledgement: The present slides are an adaptation of slides drawn by T. Mitchell 1. Main Questions
More informationThe PAC Learning Framework -II
The PAC Learning Framework -II Prof. Dan A. Simovici UMB 1 / 1 Outline 1 Finite Hypothesis Space - The Inconsistent Case 2 Deterministic versus stochastic scenario 3 Bayes Error and Noise 2 / 1 Outline
More informationComputational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory
More informationMistake Bound Model, Halving Algorithm, Linear Classifiers, & Perceptron
Stat 928: Statistical Learning Theory Lecture: 18 Mistake Bound Model, Halving Algorithm, Linear Classifiers, & Perceptron Instructor: Sham Kakade 1 Introduction This course will be divided into 2 parts.
More informationThe Power of Random Counterexamples
Proceedings of Machine Learning Research 76:1 14, 2017 Algorithmic Learning Theory 2017 The Power of Random Counterexamples Dana Angluin DANA.ANGLUIN@YALE.EDU and Tyler Dohrn TYLER.DOHRN@YALE.EDU Department
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationCS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria
CS364A: Algorithmic Game Theory Lecture #13: Potential Games; A Hierarchy of Equilibria Tim Roughgarden November 4, 2013 Last lecture we proved that every pure Nash equilibrium of an atomic selfish routing
More informationAdvanced Machine Learning
Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal
More informationA simple algorithmic explanation for the concentration of measure phenomenon
A simple algorithmic explanation for the concentration of measure phenomenon Igor C. Oliveira October 10, 014 Abstract We give an elementary algorithmic argument that sheds light on the concentration of
More informationComputing Minmax; Dominance
Computing Minmax; Dominance CPSC 532A Lecture 5 Computing Minmax; Dominance CPSC 532A Lecture 5, Slide 1 Lecture Overview 1 Recap 2 Linear Programming 3 Computational Problems Involving Maxmin 4 Domination
More informationTopic: Lower Bounds on Randomized Algorithms Date: September 22, 2004 Scribe: Srinath Sridhar
15-859(M): Randomized Algorithms Lecturer: Anuam Guta Toic: Lower Bounds on Randomized Algorithms Date: Setember 22, 2004 Scribe: Srinath Sridhar 4.1 Introduction In this lecture, we will first consider
More informationLecture 6 Lecturer: Shaddin Dughmi Scribes: Omkar Thakoor & Umang Gupta
CSCI699: opics in Learning & Game heory Lecture 6 Lecturer: Shaddin Dughmi Scribes: Omkar hakoor & Umang Gupta No regret learning in zero-sum games Definition. A 2-player game of complete information is
More informationOnline Learning Summer School Copenhagen 2015 Lecture 1
Online Learning Summer School Copenhagen 2015 Lecture 1 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Online Learning Shai Shalev-Shwartz (Hebrew U) OLSS Lecture
More informationi=1 = H t 1 (x) + α t h t (x)
AdaBoost AdaBoost, which stands for ``Adaptive Boosting", is an ensemble learning algorithm that uses the boosting paradigm []. We will discuss AdaBoost for binary classification. That is, we assume that
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationComputational Learning Theory (COLT)
Computational Learning Theory (COLT) Goals: Theoretical characterization of 1 Difficulty of machine learning problems Under what conditions is learning possible and impossible? 2 Capabilities of machine
More informationTutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.
Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research
More informationAlgorithms, Games, and Networks January 17, Lecture 2
Algorithms, Games, and Networks January 17, 2013 Lecturer: Avrim Blum Lecture 2 Scribe: Aleksandr Kazachkov 1 Readings for today s lecture Today s topic is online learning, regret minimization, and minimax
More informationLecture 5: Efficient PAC Learning. 1 Consistent Learning: a Bound on Sample Complexity
Universität zu Lübeck Institut für Theoretische Informatik Lecture notes on Knowledge-Based and Learning Systems by Maciej Liśkiewicz Lecture 5: Efficient PAC Learning 1 Consistent Learning: a Bound on
More informationHybrid Machine Learning Algorithms
Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised
More informationDeterministic Calibration and Nash Equilibrium
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 2-2008 Deterministic Calibration and Nash Equilibrium Sham M. Kakade Dean P. Foster University of Pennsylvania Follow
More informationComputational and Statistical Learning Theory
Computational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 8: Boosting (and Compression Schemes) Boosting the Error If we have an efficient learning algorithm that for any distribution
More informationOn the Generalization Ability of Online Strongly Convex Programming Algorithms
On the Generalization Ability of Online Strongly Convex Programming Algorithms Sham M. Kakade I Chicago Chicago, IL 60637 sham@tti-c.org Ambuj ewari I Chicago Chicago, IL 60637 tewari@tti-c.org Abstract
More informationOptimal and Adaptive Algorithms for Online Boosting
Optimal and Adaptive Algorithms for Online Boosting Alina Beygelzimer 1 Satyen Kale 1 Haipeng Luo 2 1 Yahoo! Labs, NYC 2 Computer Science Department, Princeton University Jul 8, 2015 Boosting: An Example
More informationComputational Learning Theory
Computational Learning Theory Slides by and Nathalie Japkowicz (Reading: R&N AIMA 3 rd ed., Chapter 18.5) Computational Learning Theory Inductive learning: given the training set, a learning algorithm
More informationAlgorithmic Game Theory and Applications. Lecture 7: The LP Duality Theorem
Algorithmic Game Theory and Applications Lecture 7: The LP Duality Theorem Kousha Etessami recall LP s in Primal Form 1 Maximize c 1 x 1 + c 2 x 2 +... + c n x n a 1,1 x 1 + a 1,2 x 2 +... + a 1,n x n
More informationOptimization 4. GAME THEORY
Optimization GAME THEORY DPK Easter Term Saddle points of two-person zero-sum games We consider a game with two players Player I can choose one of m strategies, indexed by i =,, m and Player II can choose
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and
More informationA Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997
A Tutorial on Computational Learning Theory Presented at Genetic Programming 1997 Stanford University, July 1997 Vasant Honavar Artificial Intelligence Research Laboratory Department of Computer Science
More information16.1 L.P. Duality Applied to the Minimax Theorem
CS787: Advanced Algorithms Scribe: David Malec and Xiaoyong Chai Lecturer: Shuchi Chawla Topic: Minimax Theorem and Semi-Definite Programming Date: October 22 2007 In this lecture, we first conclude our
More informationIntroduction to Machine Learning (67577) Lecture 3
Introduction to Machine Learning (67577) Lecture 3 Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem General Learning Model and Bias-Complexity tradeoff Shai Shalev-Shwartz
More informationCS261: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm
CS61: A Second Course in Algorithms Lecture #11: Online Learning and the Multiplicative Weights Algorithm Tim Roughgarden February 9, 016 1 Online Algorithms This lecture begins the third module of the
More informationInterval values for strategic games in which players cooperate
Interval values for strategic games in which players cooperate Luisa Carpente 1 Balbina Casas-Méndez 2 Ignacio García-Jurado 2 Anne van den Nouweland 3 September 22, 2005 Abstract In this paper we propose
More informationComputational Learning Theory
1 Computational Learning Theory 2 Computational learning theory Introduction Is it possible to identify classes of learning problems that are inherently easy or difficult? Can we characterize the number
More informationVoting (Ensemble Methods)
1 2 Voting (Ensemble Methods) Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the data Output class: (Weighted) vote of each classifier Classifiers
More information