Advanced Machine Learning
Follow-the-Perturbed Leader
MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH.

General Ideas

Linear loss: decomposition as a sum along substructures.
- sum of edge losses along a path.
- sum of edge losses in a tree.
- sum of the losses of other substructures in a discrete problem.
- includes the expert setting.

FPL

General linear decision problem (Kalai and Vempala, 2004): at every round,
- the player selects $w_t \in \mathcal{W} \subseteq \mathbb{R}^N$;
- the player receives $x_t \in \mathcal{X} \subseteq \mathbb{R}^N$;
- the player incurs loss $w_t \cdot x_t$.

Objective: minimize the cumulative loss or the regret.

Notation: $x_{1:t} = \sum_{s=1}^{t} x_s$; $M(x) = \operatorname{argmin}_{w \in \mathcal{W}} w \cdot x$; $\mathcal{X} \subseteq \{x \colon \|x\|_1 \le X_1\}$; $\ell_1\text{-diam}(\mathcal{W}) \le W_1$; $\sup_{w \in \mathcal{W},\, x \in \mathcal{X}} |w \cdot x| \le R$.

FTL

Follow the Leader (FTL): use $w_t = M(x_{1:t-1})$ at every round (aka fictitious play).

FTL problem: suppose $N = 2$ and consider a sequence starting with $(1/2, 0)$ and then alternating $(0, 1)$ and $(1, 0)$. Then, FTL incurs loss 1 at every round after the first, thus close to $T$ overall, while any single expert incurs loss at most $T/2$ overall.
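
This failure mode is easy to reproduce numerically. Below is a minimal simulation sketch in Python (my own illustration, assuming numpy; the loss sequence is the one from the slide), showing FTL paying at almost every round while either single expert pays about $T/2$:

```python
# FTL on the adversarial sequence: x_1 = (1/2, 0), then alternating (0, 1), (1, 0).
import numpy as np

T = 1000
losses = [np.array([0.5, 0.0])] + [
    np.array([0.0, 1.0]) if t % 2 == 1 else np.array([1.0, 0.0])
    for t in range(1, T)
]

cum = np.zeros(2)                    # cumulative loss vector x_{1:t-1}
ftl_loss = 0.0
for x in losses:
    leader = int(np.argmin(cum))     # w_t = M(x_{1:t-1}); ties go to expert 0
    ftl_loss += x[leader]            # FTL always picks the expert about to be hit
    cum += x

print(f"FTL loss:         {ftl_loss:.1f}")   # approx. T
print(f"best expert loss: {cum.min():.1f}")  # approx. T/2
```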

FPL Algorithms (Hannan, 1957; Kalai and Vempala, 2004)

Additive bound, Follow the Perturbed Leader FPL($\epsilon$): draw $p_t \sim U([0, 1/\epsilon]^N)$ and play
$w_t = \operatorname{argmin}_{w \in \mathcal{W}} \sum_{s=1}^{t-1} w \cdot x_s + w \cdot p_t = M(x_{1:t-1} + p_t)$.

Multiplicative bound, Follow the Perturbed Leader FPL*($\epsilon$): draw $p_t$ Laplacian with density $f(x) = \big(\frac{\epsilon}{2}\big)^N e^{-\epsilon \|x\|_1}$ and play
$w_t = \operatorname{argmin}_{w \in \mathcal{W}} \sum_{s=1}^{t-1} w \cdot x_s + w \cdot p_t = M(x_{1:t-1} + p_t)$.
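
The two algorithms differ only in the noise distribution. A minimal sketch of both schemes in Python (my own illustration, assuming numpy and specializing to the expert setting, where $\mathcal{W}$ is the set of standard basis vectors so that $M(x)$ reduces to an argmin over coordinates):

```python
import numpy as np

rng = np.random.default_rng(0)

def fpl_additive(cum_loss, epsilon):
    """Additive FPL(eps): p_t ~ U([0, 1/eps]^N), then follow the perturbed leader."""
    p = rng.uniform(0.0, 1.0 / epsilon, size=cum_loss.shape)
    return int(np.argmin(cum_loss + p))        # M(x_{1:t-1} + p_t)

def fpl_star(cum_loss, epsilon):
    """FPL*(eps): i.i.d. Laplacian perturbation with density (eps/2) e^{-eps|x|}."""
    p = rng.laplace(loc=0.0, scale=1.0 / epsilon, size=cum_loss.shape)
    return int(np.argmin(cum_loss + p))        # M(x_{1:t-1} + p_t)
```

For a general decision set $\mathcal{W}$, the coordinate argmin would be replaced by a call to a linear-optimization oracle over $\mathcal{W}$; one oracle call per round suffices, which is the computational appeal of FPL.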

FPL - Bound

Theorem: fix $\epsilon > 0$. Then, the expected cumulative loss of additive FPL($\epsilon$) is bounded as follows:
$E[L_T] \le L_T^{\min} + \epsilon R X_1 T + \frac{W_1}{\epsilon}$.
For $\epsilon = \sqrt{\frac{W_1}{R X_1 T}}$,
$E[L_T] \le L_T^{\min} + 2\sqrt{X_1 W_1 R T}$.

FPL* - Bound >0 heorem: fix and assume that. hen, the expected cumulative loss of (multiplicative) FPL*( /2X 1 ) is bounded as follows For =min q E[L ] apple L min +4 1/2X 1, L min q W, X R N + E[L ] apple (1 + )L min + 2X 1W 1 (1 + log N) W 1 (1 + log N)/X 1 L min X 1W 1 (1 + log N)+4X 1 W 1 (1 + log N).. page 7

Proof Outline

1. Be the perturbed leader (BPL): $w_t = M(x_{1:t} + p_t)$; bound on the regret of BPL: $E[R_T(\mathrm{BPL})] \le \frac{W_1}{\epsilon}$.
2. Bound on the difference of the regrets of FPL and BPL via the per-round differences $E[M(x_{1:t-1} + p_1) \cdot x_t] - E[M(x_{1:t} + p_1) \cdot x_t]$.
3. The difference of expectations is small because the two perturbed distributions are similar.

Proof: BTL Regret

Lemma 1: $\sum_{t=1}^{T} M(x_{1:t}) \cdot x_t \le M(x_{1:T}) \cdot x_{1:T}$.

Proof: the case $T = 1$ is clear. By induction,
$\sum_{t=1}^{T+1} M(x_{1:t}) \cdot x_t \le M(x_{1:T}) \cdot x_{1:T} + M(x_{1:T+1}) \cdot x_{T+1}$ (induction hypothesis)
$\le M(x_{1:T+1}) \cdot x_{1:T} + M(x_{1:T+1}) \cdot x_{T+1}$ (def. of $M(x_{1:T})$ as minimizer)
$= M(x_{1:T+1}) \cdot x_{1:T+1}$.

Proof: BPL Regret

Lemma 2: let $p_0 = 0$. Then, the following holds:
$\sum_{t=1}^{T} M(x_{1:t} + p_t) \cdot x_t \le M(x_{1:T}) \cdot x_{1:T} + W_1 \sum_{t=1}^{T} \|p_t - p_{t-1}\|_\infty$.

Proof: use Lemma 1 with $x'_t = x_t + p_t - p_{t-1}$, then
$\sum_{t=1}^{T} M(x_{1:t} + p_t) \cdot (x_t + p_t - p_{t-1}) \le M(x_{1:T} + p_T) \cdot (x_{1:T} + p_T) \le M(x_{1:T}) \cdot (x_{1:T} + p_T)$.
Thus,
$\sum_{t=1}^{T} M(x_{1:t} + p_t) \cdot x_t \le M(x_{1:T}) \cdot x_{1:T} + M(x_{1:T}) \cdot p_T - \sum_{t=1}^{T} M(x_{1:t} + p_t) \cdot (p_t - p_{t-1})$
$= M(x_{1:T}) \cdot x_{1:T} + \sum_{t=1}^{T} \big[ M(x_{1:T}) - M(x_{1:t} + p_t) \big] \cdot (p_t - p_{t-1})$
$\le M(x_{1:T}) \cdot x_{1:T} + W_1 \sum_{t=1}^{T} \|p_t - p_{t-1}\|_\infty$ (Hölder's inequality).

Proof: FPL vs. BPL Regrets

For the expected loss, we can choose $p_t = p_1$ for all $t > 0$, which yields:
$\sum_{t=1}^{T} M(x_{1:t} + p_1) \cdot x_t \le M(x_{1:T}) \cdot x_{1:T} + W_1 \|p_1\|_\infty$.
Thus,
$\sum_{t=1}^{T} E[M(x_{1:t-1} + p_1) \cdot x_t] = \sum_{t=1}^{T} \big( E[M(x_{1:t-1} + p_1) \cdot x_t] - E[M(x_{1:t} + p_1) \cdot x_t] \big) + \sum_{t=1}^{T} E[M(x_{1:t} + p_1) \cdot x_t]$
$\le \sum_{t=1}^{T} \big( E[M(x_{1:t-1} + p_1) \cdot x_t] - E[M(x_{1:t} + p_1) \cdot x_t] \big) + L_T^{\min} + W_1 E[\|p_1\|_\infty]$.

Proof: FPL

By definition of the perturbation, $\|p_1\|_\infty \le 1/\epsilon$. Now, $x_{1:t-1} + p_1$ and $x_{1:t} + p_1$ both follow a uniform distribution over a cube. Thus,
$E[M(x_{1:t-1} + p_1) \cdot x_t] - E[M(x_{1:t} + p_1) \cdot x_t] \le R\, (1 - \text{fraction of overlap})$.
Two cubes $[0, 1/\epsilon]^N$ and $v + [0, 1/\epsilon]^N$ (with $v \ge 0$) overlap over at least the fraction $(1 - \epsilon \|v\|_1)$: if $x \in [0, 1/\epsilon]^N$ but $x \notin v + [0, 1/\epsilon]^N$, then for at least one $i$, $x_i \notin [v_i, v_i + 1/\epsilon]$, that is $x_i \in [0, v_i)$, which has probability mass at most $\epsilon v_i$; the union bound over $i$ gives $\epsilon \|v\|_1$.
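
The overlap fraction can be sanity-checked by Monte Carlo; a short sketch (my own illustration, with arbitrary choices of $N$, $\epsilon$, and the shift $v$):

```python
# Check: the cubes [0, 1/eps]^N and v + [0, 1/eps]^N (v >= 0) share at least a
# (1 - eps * ||v||_1) fraction of the uniform mass on [0, 1/eps]^N.
import numpy as np

rng = np.random.default_rng(1)
N, eps = 5, 0.2
v = rng.uniform(0.0, 0.5, size=N)              # small nonnegative shift
pts = rng.uniform(0.0, 1.0 / eps, size=(100_000, N))
overlap = np.all((pts >= v) & (pts <= v + 1.0 / eps), axis=1).mean()
print(overlap, ">=", 1.0 - eps * np.abs(v).sum())
```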

Proof: FPL

Thus,
$E[M(x_{1:t-1} + p_1) \cdot x_t] - E[M(x_{1:t} + p_1) \cdot x_t] \le R\, \epsilon \|x_t\|_1 \le \epsilon R X_1$.
And,
$E[R_T] \le \epsilon R X_1 T + \frac{W_1}{\epsilon}$.

Proof: FPL*

Lemma 3: $E[M(x_{1:t-1} + p_1) \cdot x_t] \le e^{\epsilon X_1} E[M(x_{1:t} + p_1) \cdot x_t]$.

Proof:
$E[M(x_{1:t-1} + p_1) \cdot x_t] = \int_{\mathbb{R}^N} M(x_{1:t-1} + u) \cdot x_t \, d\mu(u)$
$= \int_{\mathbb{R}^N} M(x_{1:t} + v) \cdot x_t \, d\mu(x_t + v)$ (change of variable $v = u - x_t$)
$= \int_{\mathbb{R}^N} M(x_{1:t} + v) \cdot x_t \, \underbrace{e^{\epsilon (\|v\|_1 - \|x_t + v\|_1)}}_{\le\, e^{\epsilon X_1}} \, d\mu(v)$
$\le e^{\epsilon X_1} E[M(x_{1:t} + p_1) \cdot x_t]$.

Proof: FPL*

For $\epsilon \le 1/X_1$, $e^{\epsilon X_1} \le 1 + 2\epsilon X_1$, thus,
$\sum_{t=1}^{T} E[M(x_{1:t-1} + p_1) \cdot x_t] \le (1 + 2\epsilon X_1) \sum_{t=1}^{T} E[M(x_{1:t} + p_1) \cdot x_t] \le (1 + 2\epsilon X_1) \big( L_T^{\min} + W_1 E[\|p_1\|_\infty] \big)$.
Furthermore,
$E[\|p_1\|_\infty] = E\big[\max_{i \in [1,N]} |p_{1,i}|\big] = \int_0^{+\infty} \Pr\big[\max_{i} |p_{1,i}| > t\big] \, dt$
$= \int_0^{u} \Pr\big[\max_{i} |p_{1,i}| > t\big] \, dt + \int_u^{+\infty} \Pr\big[\max_{i} |p_{1,i}| > t\big] \, dt$
$\le u + N \int_u^{+\infty} \Pr\big[|p_{1,1}| > t\big] \, dt = u + \frac{N}{\epsilon} e^{-\epsilon u} \le \frac{2(1 + \log N)}{\epsilon}$ (best choice of $u$).
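
The bound on the expected maximum of the Laplacian perturbation is also easy to check empirically; a sketch (my own illustration, with arbitrary $N$ and $\epsilon$):

```python
# Check: E[||p_1||_inf] <= 2 (1 + log N) / eps for i.i.d. Laplacian coordinates
# with density (eps/2) e^{-eps |x|}, i.e. numpy scale parameter 1/eps.
import numpy as np

rng = np.random.default_rng(2)
N, eps = 50, 0.3
p = rng.laplace(scale=1.0 / eps, size=(100_000, N))
print(np.abs(p).max(axis=1).mean(), "<=", 2 * (1 + np.log(N)) / eps)
```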

Expert Setting

In the expert setting, $W_1 = 1$, $X_1 = N$, and $R = 1$; thus, for FPL*($\epsilon$),
$E[L_T] \le (1 + 2\epsilon N) L_T^{\min} + \frac{2(1 + \log N)}{\epsilon}$.

More favorable bound: replace each $x_t$ by the sequence $x_{t,1} e_1, \ldots, x_{t,N} e_N$, presented one coordinate per round (sketch below). The best cumulative loss is unchanged, $L_{\min}^{\text{new}} = L_{\min}^{\text{old}}$, and $E[L_T^{\text{old}}] \le E[L_{TN}^{\text{new}}]$, while now $X_1 = 1$. New guarantee: for FPL*($\epsilon$),
$E[L_T] \le (1 + 2\epsilon) L_T^{\min} + \frac{2(1 + \log N)}{\epsilon}$,
which gives $E[R_T] \le 2\sqrt{2 L_T^{\min} (1 + \log N)}$.
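
The splitting step feeds the coordinates of each loss vector one at a time; a sketch (my own illustration, the helper name is hypothetical). Each sub-round has $\|x\|_1 \le 1$, hence $X_1 = 1$, while every expert's cumulative loss, and therefore $L_T^{\min}$, is unchanged:

```python
import numpy as np

def split_rounds(x_t):
    """Yield the N one-hot loss vectors x_{t,1} e_1, ..., x_{t,N} e_N summing to x_t."""
    for i in range(len(x_t)):
        e = np.zeros(len(x_t))
        e[i] = x_t[i]
        yield e
```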

RWM = FPL

Let FPL($\epsilon$) be an instance of the general FPL algorithm with a perturbation defined by
$p_1 = \frac{1}{\epsilon} \big[ \log(-\log(u_1)), \ldots, \log(-\log(u_N)) \big]^\top$,
where each $u_j$ is drawn according to the uniform distribution over $[0, 1]$. Then, FPL($\epsilon$) and RWM($\epsilon$) coincide.
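
Indeed, $\operatorname{argmin}_i \big\{ x_i + \frac{1}{\epsilon} \log(-\log u_i) \big\} = \operatorname{argmin}_i \big\{ e^{\epsilon x_i} (-\log u_i) \big\}$, a minimum of independent exponential variables with rates $e^{-\epsilon x_i}$, which lands on $i$ with probability $e^{-\epsilon x_i} / \sum_j e^{-\epsilon x_j}$, exactly the exponential-weights (RWM) distribution. An empirical check of this claim (my own sketch, with arbitrary losses):

```python
import numpy as np

rng = np.random.default_rng(3)
eps = 0.7
x = np.array([0.3, 1.0, 2.5, 0.0])             # cumulative losses x_{1:t-1}

samples = 200_000
u = rng.uniform(size=(samples, x.size))        # u_j ~ U[0, 1]
picks = np.argmin(x + np.log(-np.log(u)) / eps, axis=1)

empirical = np.bincount(picks, minlength=x.size) / samples
rwm = np.exp(-eps * x) / np.exp(-eps * x).sum()
print(np.round(empirical, 3))                  # should match the line below
print(np.round(rwm, 3))
```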

References

Nicolò Cesa-Bianchi, Alex Conconi, Claudio Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050-2057, 2004.

Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

Yoav Freund and Robert Schapire. Large margin classification using the perceptron algorithm. In Proceedings of COLT 1998. ACM Press, 1998.

James Hannan. Approximation to Bayes risk in repeated play. In Contributions to the Theory of Games, volume 3, pages 97-139, 1957.

Adam T. Kalai, Santosh Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291-307, 2005.

Nick Littlestone. From on-line to batch learning. COLT 1989: 269-284.

Nick Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2:285-318, 1988.

References

Nick Littlestone, Manfred K. Warmuth. The weighted majority algorithm. FOCS 1989: 256-261.

Tom Mitchell. Machine Learning. McGraw Hill, 1997.

Albert B. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, pages 615-622. Polytechnic Institute of Brooklyn, 1962.