Lecture 2: Weighted Majority Algorithm

EECS 598-006: Prediction, Learning and Games (Fall 2013)
Lecturer: Jacob Abernethy                                        Scribe: Petter Nilsson

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

2.1 Course announcements

A GSI will (likely) be recruited for the course. For now, office hours are on Tuesday afternoons. Homework is about to be posted and is due 9/3. Group work on the homework is encouraged.

2.2 Predictive sorting revisited

Lecture 1 ended by studying the problem of predicting the outcomes of games played among k teams, where a game between two of these k teams is played in every round. Assume the outcomes y_s, s = 1, ..., t-1, of the first t-1 rounds have been observed for the pairs of teams (i_1, j_1), ..., (i_{t-1}, j_{t-1}). Further assume that there exists a perfect ranking π* ∈ S_k that determines the outcome y_s of each game; without loss of generality (WLOG), the pairs are labeled so that π*(i_s) < π*(j_s) for s = 1, ..., t-1. In round t, the pair (i_t, j_t) is observed. The Halving algorithm makes a prediction ŷ_t of the outcome y_t by comparing the sizes of the following two sets:

    C_t^< := {π ∈ S_k : π(i_s) < π(j_s) for all s = 1, ..., t-1, and π(i_t) < π(j_t)},        (2.1)
    C_t^> := {π ∈ S_k : π(i_s) < π(j_s) for all s = 1, ..., t-1, and π(i_t) > π(j_t)}.        (2.2)

The prediction follows the majority vote, i.e.

    ŷ_t = "<"  if |C_t^<| ≥ |C_t^>|,  and  ŷ_t = ">"  otherwise.                              (2.3)

The naive way to determine ŷ_t is to enumerate S_k and eliminate the permutations that are inconsistent with past outcomes. This is not efficient, however, since |S_k| = k!. The problem of computing the quantities |C_t^<| and |C_t^>| is in fact #P-hard: the outcomes can be recorded in a directed acyclic graph (DAG), and the problem is equivalent to counting the topological orderings (linear extensions) of that graph, which is known to be #P-complete [1].

(Challenge Problem) Continuation from last lecture: Is the problem of determining which of these two sets is the larger one also #P-complete, or is there an efficient algorithm?
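As a concrete (if hopelessly inefficient) illustration of the naive enumeration just described, the following Python sketch computes ŷ_t by brute force over all k! rankings. It is not part of the original notes; the function name predict_game and its interface are ours.

    from itertools import permutations

    def predict_game(k, history, i_t, j_t):
        """Naive Halving-style prediction for the sorting problem.

        history: list of pairs (i, j) recording that team i finished ahead of
        team j in a past game.  Returns '<' if the majority of rankings that
        are consistent with history place i_t ahead of j_t, and '>' otherwise.
        """
        count_lt = count_gt = 0                   # |C_t^<| and |C_t^>|
        for pi in permutations(range(k)):         # pi[team] = rank, smaller is better
            if all(pi[i] < pi[j] for (i, j) in history):   # consistent with the past
                if pi[i_t] < pi[j_t]:
                    count_lt += 1
                else:
                    count_gt += 1
        return '<' if count_lt >= count_gt else '>'

    # With teams 0, 1, 2 and past results "0 beat 1" and "1 beat 2", the only
    # consistent ranking is 0 < 1 < 2, so the prediction is that 0 beats 2.
    print(predict_game(3, [(0, 1), (1, 2)], 0, 2))    # prints '<'

Computing count_lt and count_gt exactly is precisely the linear-extension counting problem mentioned above, which is why this brute-force approach does not scale.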

2.3 Review of basics in Convex Analysis

Definition 2.1 A function f : R^n → R is convex if and only if any of the following conditions holds:

a) For all x, y ∈ R^n,

    (f(x) + f(y)) / 2 ≥ f((x + y) / 2).                                                        (2.4)

b) For all x, y ∈ R^n and α ∈ [0, 1],

    (1 − α) f(x) + α f(y) ≥ f((1 − α) x + α y).                                                (2.5)

c) For all random variables X, Jensen's inequality is satisfied:

    E[f(X)] ≥ f(E[X]).                                                                         (2.6)

Remark 2.2 As seen in (2.6), Jensen's inequality can actually be used to define convexity.

(Exercise) Prove that (2.4)-(2.6) are equivalent.

Proposition 2.3 A sufficient condition for f to be convex is that

    ∇²f ⪰ 0,                                                                                   (2.7)

where ⪰ 0 stands for positive semi-definiteness, i.e. ∇²f has non-negative eigenvalues.

Proposition 2.4 If f is convex and differentiable, then for all x, δ ∈ R^n,

    f(x + δ) ≥ f(x) + δ^T ∇f(x).                                                               (2.8)

Remark 2.5 Equation (2.8) could also be used to define convexity. However, in the case of non-differentiability the concept of a subgradient is required.

2.3.1 Approximations

The following inequalities will be used in the course. The functions are plotted in Figure 2.1 for a visual interpretation of these inequalities.

0) Linearizing e^x at 0 gives a lower bound on the exponential function:

    e^x ≥ 1 + x.                                                                               (2.9)

1) Applying the (monotone) log operator to (2.9) gives an upper bound on the logarithm:

    log(1 + x) ≤ x.                                                                            (2.10)

2) This inequality follows from the convexity of f(z) = e^{αz} and is obtained by using 0 and 1 as evaluation points in (2.5):

    e^{αx} ≤ 1 + (e^α − 1) x        for x ∈ [0, 1].                                            (2.11)

3) Finally, a harder inequality:

    −log(1 − x) ≤ x + x²            for x ∈ [0, 1/2].                                          (2.12)

(Challenge Problem) Logarithm inequality: prove inequality (2.12).

[Figure 2.1: Plots of the functions appearing in the inequalities (2.9)-(2.12): e^x together with 1 + x; log(1 + x) together with x; e^{αx} together with 1 + (e^α − 1)x; and −log(1 − x) together with x + x².]
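Beyond the plots, a quick numerical sanity check of (2.9)-(2.12) is easy to run. The short Python snippet below is not part of the original notes; it verifies the four inequalities on a grid of points, and the choice α = 0.7 is arbitrary.

    import math

    alpha = 0.7                               # arbitrary value for checking (2.11)
    for n in range(-200, 201):
        x = n / 100.0                         # grid over [-2, 2]
        assert math.exp(x) >= 1 + x                                        # (2.9)
        if x > -1:
            assert math.log(1 + x) <= x                                    # (2.10)
        if 0 <= x <= 1:
            assert math.exp(alpha * x) <= 1 + (math.exp(alpha) - 1) * x    # (2.11)
        if 0 <= x <= 0.5:
            assert -math.log(1 - x) <= x + x * x                           # (2.12)
    print("inequalities (2.9)-(2.12) hold on the sampled grid")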

2.4 Back to Online Learning

The drawback of the Halving algorithm presented last time is the strong assumption that a perfect expert exists. How can this assumption be removed? The answer is to assign weights to the experts based on their past performance. This leads to the Weighted Majority Algorithm (WMA) outlined below, first proposed in 1994 by Littlestone and Warmuth [2].

Algorithm 1: Weighted Majority Algorithm
    input: N experts predicting the outcomes, and a parameter ε > 0
    w_i^1 ← 1 for all i = 1, ..., N                    (weights initialized to 1 for all experts)
    for rounds t = 1, 2, ... do
        receive f_i^t ∈ {0, 1}                         (prediction of expert i, for i = 1, ..., N)
        ŷ_t ← round( Σ_i w_i^t f_i^t / Σ_i w_i^t )     (compute the prediction of WMA)
        the outcome y_t is revealed
        w_i^{t+1} ← w_i^t (1 − ε)^{1[f_i^t ≠ y_t]}     (expert weights are updated according to their predictions)
    end for

The function round : [0, 1] → {0, 1} above is defined as

    round(x) = 0 if x ∈ [0, 1/2],   round(x) = 1 if x ∈ (1/2, 1].                              (2.13)

Theorem 2.6 For all experts i, it holds that

    M_T(WMA) ≤ (2 log N) / ε + 2 (1 + ε) M_T(expert i),                                        (2.14)

where M_T(WMA) (resp. M_T(expert i)) is the number of mistakes made by the algorithm (resp. by expert i) up to round T.

Proof idea: The proof is similar to that of the mistake bound for the Halving algorithm, but instead of the number of remaining experts, the total weight of the experts is used as a potential.
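Before turning to the proof, here is a minimal Python sketch of Algorithm 1; it is not part of the original notes. The helper name weighted_majority and the toy experiment at the end (one good expert among random guessers) are ours, and the final line also prints the bound (2.14) for comparison.

    import math
    import random

    def weighted_majority(expert_preds, outcomes, eps):
        """expert_preds[t][i] in {0,1} is the prediction f_i^t of expert i in round t;
        outcomes[t] in {0,1} is the revealed outcome y_t.
        Returns the number of mistakes made by WMA over all rounds."""
        n_experts = len(expert_preds[0])
        w = [1.0] * n_experts                              # w_i^1 = 1 for all i
        mistakes = 0
        for preds, y in zip(expert_preds, outcomes):
            vote = sum(wi * fi for wi, fi in zip(w, preds)) / sum(w)
            y_hat = 1 if vote > 0.5 else 0                 # the round function (2.13)
            mistakes += (y_hat != y)
            # multiply the weight of every mistaken expert by (1 - eps)
            w = [wi * (1 - eps) if fi != y else wi for wi, fi in zip(w, preds)]
        return mistakes

    # Toy experiment: expert 0 is right 90% of the time, the other experts guess.
    random.seed(0)
    T, N, eps = 500, 10, 0.3
    outcomes = [random.randint(0, 1) for _ in range(T)]
    expert_preds = [[(y if random.random() < 0.9 else 1 - y) if i == 0
                     else random.randint(0, 1) for i in range(N)] for y in outcomes]
    m_best = sum(p[0] != y for p, y in zip(expert_preds, outcomes))
    m_wma = weighted_majority(expert_preds, outcomes, eps)
    bound = 2 * math.log(N) / eps + 2 * (1 + eps) * m_best
    print(m_wma, m_best, round(bound, 1))    # m_wma should not exceed the bound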

Proof: Define the total weight at time t as

    Φ_t = Σ_{i=1}^N w_i^t.                                                                     (2.15)

Evidently, we have that

    Φ_1 = N.                                                                                   (2.16)

Furthermore, for every expert i it holds that

    Φ_T ≥ w_i^T = (1 − ε)^{M_T(expert i)},                                                     (2.17)

since the weight of expert i has been decreased M_T(expert i) times. Suppose the algorithm makes a false prediction at time t. Then more than half of the weight Φ_t must have been on experts who made false predictions, and the weight of those experts is decreased by a factor (1 − ε). This leads to the following result.

Lemma 2.7 If the algorithm makes a mistake at time t, the following inequality holds:

    Φ_{t+1} ≤ Φ_t (1 − ε/2).                                                                   (2.18)

(Exercise) Prove this lemma rigorously.

By repeatedly applying the lemma,

    Φ_T ≤ Φ_1 (1 − ε/2)^{M_T(WMA)}.                                                            (2.19)

Using (2.16), (2.17) and (2.19), one then obtains

    (1 − ε)^{M_T(expert i)} ≤ N (1 − ε/2)^{M_T(WMA)}.                                          (2.20)

Apply the negative logarithm to get

    (−log(1 − ε)) M_T(expert i) ≥ −log(N) + (−log(1 − ε/2)) M_T(WMA),                          (2.21)

where −log(1 − ε) ≤ ε + ε² and −log(1 − ε/2) ≥ ε/2; these new inequalities result from using (2.12) and (2.10), respectively. (To apply inequality (2.12) here, ε must be less than 0.5; in fact the inequality holds for ε up to roughly 0.68.) Inserting these bounds into (2.21) gives (ε + ε²) M_T(expert i) ≥ −log(N) + (ε/2) M_T(WMA); rearranging the terms and multiplying by 2/ε yields the inequality stated in the theorem.

2.5 Next lecture

The Weighted Majority Algorithm will be discussed further in the next lecture; a variant of the algorithm will be presented which gets rid of the factor 2 in Theorem 2.6.

References

[1] Brightwell, G., and Winkler, P. Counting linear extensions is #P-complete. In Proceedings of the Twenty-Third Annual ACM Symposium on Theory of Computing (New York, NY, USA, 1991), STOC '91, ACM, pp. 175-181.

[2] Littlestone, N., and Warmuth, M. The weighted majority algorithm. Information and Computation 108, 2 (1994), 212-261.