Learning with Large Number of Experts: Component Hedge Algorithm


1 Learning with Large Number of Experts: Component Hedge Algorithm
Giulia DeSalvo and Vitaly Kuznetsov
Courant Institute
March 24th

2 Learning with Large Number of Experts
- The regret of RWM (Randomized Weighted Majority) is $O(\sqrt{T \ln N})$, which is informative even for a very large number of experts $N$.
- What if there is overlap between experts?
  - RWM with path experts
  - FPL (Follow the Perturbed Leader) with path experts
- Can we do better?
[Littlestone and Warmuth, 1989; Kalai and Vempala, 2004]

3 Better Bounds in Structured Case?
- Can overlap between experts lead to better regret guarantees?
- What are the lower bounds in the structured setting?
- Are there computationally efficient solutions that realize these bounds?

4 Outline
- Learning Scenario
- Component Hedge Algorithm
- Regret Bounds
- Applications & Lower Bounds
- Conclusion & Open Problems

5 Learning Scenario
Assumptions:
- Structured concept class $\mathcal{C} \subseteq \{0, 1\}^d$, composed of components: $C^t$ indicates which components are used at trial $t$.
- An additive loss $\ell^t$ is incurred at each trial $t$.
- The loss of each concept $C$ is $C \cdot \ell^t$.
- $M := \max_{C \in \mathcal{C}} |C|$
Goal:
- Minimize the expected regret after $T$ trials:
  $R_T = \sum_{t=1}^{T} E[C^t] \cdot \ell^t - \min_{C \in \mathcal{C}} \sum_{t=1}^{T} C \cdot \ell^t$
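As a small illustration of the regret definition above, the following sketch computes the regret of one realized sequence of played concepts against the best concept in hindsight. The toy concept class (all 2-subsets of 4 components), the loss vectors, and the function names are assumptions made for illustration only; the expectation over the learner's randomization is replaced by a single fixed sequence.

```python
from itertools import combinations

def concept_loss(concept, loss):
    """Additive loss C . l of a 0/1 concept vector under a loss vector."""
    return sum(c * l for c, l in zip(concept, loss))

def regret(played, losses, concept_class):
    """Cumulative loss of the played concepts minus that of the best fixed concept."""
    learner = sum(concept_loss(c, l) for c, l in zip(played, losses))
    best = min(sum(concept_loss(c, l) for l in losses) for c in concept_class)
    return learner - best

# Hypothetical concept class: all subsets of size M = 2 out of d = 4 components.
d, M = 4, 2
concept_class = [tuple(1 if i in s else 0 for i in range(d))
                 for s in combinations(range(d), M)]

losses = [(1, 0, 0, 1), (1, 0, 1, 0), (0, 0, 1, 1)]   # l^1, l^2, l^3
played = [(1, 1, 0, 0), (1, 0, 1, 0), (0, 1, 0, 1)]   # C^1, C^2, C^3
print(regret(played, losses, concept_class))
```

Here the learner accumulates a loss of 4, while the best fixed 2-subset (any set containing the second component) accumulates 2.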

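6 Component Hedge Algorithm [Koolen, Warmuth, and Kivinen, 2010]
CH maintains weights $w^t \in \mathrm{conv}(\mathcal{C}) \subseteq [0, 1]^d$ over the components at each round $t$.
Update:
1. Weights: $\hat{w}^t_i = w^{t-1}_i e^{-\eta \ell^t_i}$
2. Relative entropy projection: $w^t := \mathrm{argmin}_{w \in \mathrm{conv}(\mathcal{C})} \Delta(w \| \hat{w}^t)$, where $\Delta(w \| v) = \sum_{i=1}^{d} \left( w_i \ln \frac{w_i}{v_i} + v_i - w_i \right)$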
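The two update steps can be sketched for the simplest case. The code below assumes the unstructured setting in which conv(C) is the probability simplex (M = 1); there the relative entropy projection reduces to a normalization. For structured classes the projection is a genuine convex problem and this shortcut does not apply. The function names and the numbers are illustrative.

```python
import math

def relative_entropy(w, v):
    """Unnormalized relative entropy: sum_i w_i ln(w_i / v_i) + v_i - w_i (0 ln 0 = 0)."""
    total = 0.0
    for wi, vi in zip(w, v):
        if wi > 0:
            total += wi * math.log(wi / vi)
        total += vi - wi
    return total

def ch_update_simplex(w_prev, loss, eta):
    """One CH round when conv(C) is the probability simplex (assumed, M = 1)."""
    # Step 1: multiplicative update of each component weight.
    w_hat = [wi * math.exp(-eta * li) for wi, li in zip(w_prev, loss)]
    # Step 2: relative entropy projection; on the simplex this is a normalization.
    z = sum(w_hat)
    return [wi / z for wi in w_hat], w_hat

w0 = [0.25, 0.25, 0.25, 0.25]
w1, w_hat = ch_update_simplex(w0, loss=[1.0, 0.0, 0.5, 0.0], eta=0.5)
print(w1, relative_entropy(w1, w_hat))
```

Components with larger loss end up with smaller weight, and the projected point is at least as close to the intermediate weights (in relative entropy) as any other point of the simplex, including the previous weight vector.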

7 Component Hedge Algorithm
Prediction:
1. Decomposition of weights: $w^t = \sum_{C \in \mathcal{C}} \alpha_C C$, where $\alpha$ is a distribution over $\mathcal{C}$
2. Sample $C^{t+1}$ according to $\alpha$

8 Efficiency
Need an efficient implementation of:
- Decomposition (not unique) of weights over the concepts
- Entropy projection step (convex problem)
Sufficient: $\mathrm{conv}(\mathcal{C})$ is described by a number of constraints polynomial in $d$

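9 Regret Bounds
Theorem: Regret Bounds for CH
Let $\ell^* = \min_{C \in \mathcal{C}} C \cdot (\ell^1 + \dots + \ell^T)$ be the loss of the best concept in hindsight. Then, by choosing $\eta = \sqrt{\frac{2M \ln(d/M)}{\ell^*}}$,
$R_T \le \sqrt{2 \ell^* M \ln(d/M)} + M \ln(d/M)$
- Since $\ell^* \le MT$, the regret satisfies $R_T \in O(M \sqrt{T \ln d})$.
- Matching lower bounds in applications.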
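To make the theorem concrete, the sketch below evaluates the learning rate and the bound numerically, reading the theorem as $\eta = \sqrt{2M\ln(d/M)/\ell^*}$ and $R_T \le \sqrt{2\ell^* M \ln(d/M)} + M\ln(d/M)$. The function names and the values of d, M and T are hypothetical, and $\ell^*$ is replaced by its worst case $MT$.

```python
import math

def ch_learning_rate(best_loss, M, d):
    """eta = sqrt(2 M ln(d/M) / l*); requires best_loss > 0."""
    return math.sqrt(2 * M * math.log(d / M) / best_loss)

def ch_regret_bound(best_loss, M, d):
    """sqrt(2 l* M ln(d/M)) + M ln(d/M)."""
    k = M * math.log(d / M)
    return math.sqrt(2 * best_loss * k) + k

d, M, T = 100, 5, 10_000
worst_case = ch_regret_bound(M * T, M, d)   # uses l* <= M T
print(ch_learning_rate(M * T, M, d), worst_case)
```

With $\ell^* = MT$ the first term becomes $M\sqrt{2T\ln(d/M)}$, which is where the $O(M\sqrt{T \ln d})$ rate comes from.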

10 Comparison of CH, RWM and FPL
1. CH has significantly better regret bounds:
   - CH: $R_T \in O(M \sqrt{T \ln d})$
   - RWM: $R_T \in O(M \sqrt{M T \ln d})$
   - FPL: $R_T \in O(M \sqrt{d T \ln d})$
2. CH is optimal w.r.t. regret bounds, while RWM and FPL are not.
3. Standard expert setting (no structure): CH, RWM and FPL reduce to the same algorithm.

11 Applications
- On-line shortest path problems
- On-line PCA (k-sets)
- On-line ranking (k-permutations)
- Spanning trees

12 On-line Shortest Path Problem (SPP)
- $G = (V, E)$ is a directed graph.
- $s$ is the source and $t$ is the destination.
- Each path from $s$ to $t$ is an expert.
- The loss is additive over edges.

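13 Unit Flow Polytope
- The convex hull of paths cannot, in general, be captured by few linear constraints.
- The unit flow polytope relaxation is used instead:
  - $w_{u,v} \ge 0, \quad \forall (u, v) \in E$
  - $\sum_{v \in V} w_{s,v} = 1$
  - $\sum_{v \in V} w_{v,u} = \sum_{v \in V} w_{u,v}, \quad \forall u \in V \setminus \{s, t\}$
- The relaxation does not hurt the regret bounds.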
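A membership test for the unit flow polytope can be sketched as follows. The edge-dictionary representation and the function name are assumptions, and flow conservation is checked only at intermediate nodes (every node except the source and the sink), which is how the constraint is read here.

```python
def in_unit_flow_polytope(w, source, sink, tol=1e-9):
    """Check the unit flow constraints for edge weights w: {(u, v): weight}."""
    if any(x < -tol for x in w.values()):
        return False                      # violates w_{u,v} >= 0
    nodes = {u for edge in w for u in edge}
    out_flow = {u: 0.0 for u in nodes}
    in_flow = {u: 0.0 for u in nodes}
    for (u, v), x in w.items():
        out_flow[u] += x
        in_flow[v] += x
    if abs(out_flow[source] - 1.0) > tol:
        return False                      # exactly one unit must leave the source
    # Flow conservation at every intermediate node.
    return all(abs(in_flow[u] - out_flow[u]) <= tol
               for u in nodes if u not in (source, sink))

single_path = {("s", "a"): 1.0, ("a", "t"): 1.0}
mixture = {("s", "a"): 0.5, ("s", "b"): 0.5, ("a", "t"): 0.5, ("b", "t"): 0.5}
leaky = {("s", "a"): 1.0, ("a", "t"): 0.4}
print(in_unit_flow_polytope(single_path, "s", "t"),
      in_unit_flow_polytope(mixture, "s", "t"),
      in_unit_flow_polytope(leaky, "s", "t"))
```

Indicator vectors of single paths and their convex combinations pass the test; a weight vector that loses flow at an intermediate node does not.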

14 Example of Unit Flow Polytope [figure]

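15 Entropy Projection on Unit Flow Polytope
$\min_{w} \sum_{(u,v) \in E} \left( w_{u,v} \ln \frac{w_{u,v}}{\hat{w}_{u,v}} + \hat{w}_{u,v} - w_{u,v} \right)$
subject to:
- $w_{u,v} \ge 0, \quad \forall (u, v) \in E$
- $\sum_{v \in V} w_{s,v} = 1$
- $\sum_{v \in V} w_{v,u} = \sum_{v \in V} w_{u,v}, \quad \forall u \in V \setminus \{s, t\}$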

16 Dual problem
$\max_{\lambda} \; \lambda_s - \sum_{(u,v) \in E} \hat{w}_{u,v} e^{\lambda_u - \lambda_v}$
- No constraints.
- Only $|V|$ variables.
- Primal solution: $w_{u,v} = \hat{w}_{u,v} e^{\lambda_u - \lambda_v}$
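One way to solve the dual numerically is exact coordinate ascent, sketched below. This is an illustrative choice of solver, not necessarily the method used by the authors. The sketch reads the dual objective as $\lambda_s - \sum_{(u,v) \in E} \hat{w}_{u,v} e^{\lambda_u - \lambda_v}$ and assumes that the sink's dual variable is fixed to 0, that no edge enters the source, and that every intermediate node has positive incoming and outgoing weight. The graph and the weights are hypothetical.

```python
import math

def project_onto_unit_flow(w_hat, source, sink, sweeps=500):
    """Relative entropy projection via exact coordinate ascent on the dual."""
    nodes = {u for edge in w_hat for u in edge}
    lam = {u: 0.0 for u in nodes}          # lam[sink] stays 0 throughout
    inner = sorted(nodes - {source, sink})
    for _ in range(sweeps):
        # Source coordinate: makes exactly one unit of flow leave the source.
        a = sum(x * math.exp(-lam[v]) for (u, v), x in w_hat.items() if u == source)
        lam[source] = -math.log(a)
        # Intermediate coordinates: balance incoming and outgoing flow.
        for n in inner:
            a = sum(x * math.exp(-lam[v]) for (u, v), x in w_hat.items() if u == n)
            b = sum(x * math.exp(lam[u]) for (u, v), x in w_hat.items() if v == n)
            lam[n] = 0.5 * math.log(b / a)
    # Primal solution: w_{u,v} = w_hat_{u,v} * exp(lam_u - lam_v).
    return {(u, v): x * math.exp(lam[u] - lam[v]) for (u, v), x in w_hat.items()}

w_hat = {("s", "a"): 0.5, ("s", "b"): 0.3, ("a", "b"): 0.2,
         ("a", "t"): 0.4, ("b", "t"): 0.6}
w = project_onto_unit_flow(w_hat, "s", "t")
print(w)
```

Each coordinate step has a closed form because the objective, restricted to a single $\lambda_u$, is a sum of one increasing and one decreasing exponential; setting its derivative to zero enforces the corresponding flow constraint exactly.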

17 Convex Decomposition
1. Find any path from $s$ to $t$ whose edges all have non-zero weight.
2. Subtract the smallest weight on that path from each of its edges.
3. Repeat until no such path is found.
$\Rightarrow$ At most $|E|$ iterations are needed.
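The greedy procedure above can be sketched as follows. The edge-dictionary representation, the depth-first path search, the tolerance `eps`, and the example flow are assumptions for illustration; any method of finding a positive-weight path would do.

```python
def find_path(w, source, sink, eps=1e-12):
    """Depth-first search for a source-sink path using only edges with weight > eps."""
    stack, seen = [(source, [])], {source}
    while stack:
        node, path = stack.pop()
        if node == sink:
            return path
        for (u, v), x in w.items():
            if u == node and x > eps and v not in seen:
                seen.add(v)
                stack.append((v, path + [(u, v)]))
    return None

def decompose(w, source, sink):
    """Greedily peel off paths; returns a list of (path, coefficient) pairs."""
    w = dict(w)                            # work on a copy
    mixture = []
    while (path := find_path(w, source, sink)) is not None:
        alpha = min(w[e] for e in path)    # smallest weight along the path
        for e in path:
            w[e] -= alpha                  # at least one edge drops to exactly 0
        mixture.append((path, alpha))
    return mixture

flow = {("s", "a"): 0.7, ("s", "b"): 0.3, ("a", "b"): 0.2,
        ("a", "t"): 0.5, ("b", "t"): 0.5}
mixture = decompose(flow, "s", "t")
for path, alpha in mixture:
    print(alpha, path)
```

Every iteration zeroes the minimum-weight edge of the chosen path, so the loop runs at most |E| times; the coefficients form the distribution over paths from which the next prediction is sampled.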

18-22 Example of Convex Decomposition [figures]

23 Regret Bounds for SPP
- The expected regret is bounded by $2\sqrt{\ell^* k \ln |V|} + 2k \ln |V| \in O(M \sqrt{T \ln |V|})$, where $k$ denotes the path length (so $M = k$).
- The bound holds for arbitrary graphs.

24 Lower Bounds
Any algorithm can be forced to have expected regret $\sqrt{\ell^* k \ln \frac{|V|}{k}}$.
Idea of the proof:
- Minimize the overlap.
- Create $|V|/k$ disjoint paths of length $k$.
- Apply lower bounds for the standard expert setting.

25 Conclusions
- The regret of CH is often better than that of RWM or FPL in the structured setting.
- The regret of CH often matches lower bounds in applications.
- Efficient solutions exist for a wide range of applications: on-line shortest path, on-line PCA, on-line ranking, spanning trees.

26 References
- Wouter Koolen, Manfred K. Warmuth, and Jyrki Kivinen. Hedging Structured Concepts. In COLT, 2010.
- Nick Littlestone and Manfred K. Warmuth. The Weighted Majority Algorithm. FOCS, 1989.
- Adam Kalai and Santosh Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 2003.
- Eiji Takimoto and Manfred K. Warmuth. Path kernels and multiplicative updates. JMLR, 4:773-818, 2003.
- T. van Erven, W. Kotłowski, and M. K. Warmuth. Follow the leader with dropout perturbations. In COLT, 2014.
- Nicolò Cesa-Bianchi and Gábor Lugosi. Combinatorial bandits. In Proceedings of the 22nd Annual Conference on Learning Theory.

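28 Proof of CH Regret Bound
1. Bound: $(1 - e^{-\eta})\, w^{t-1} \cdot \ell^t \le \Delta(C \| w^{t-1}) - \Delta(C \| w^t) + \eta\, C \cdot \ell^t$, using
   - $1 - e^{-\eta x} \ge (1 - e^{-\eta}) x$ for $x \in [0, 1]$
   - the Generalized Pythagorean Theorem
2. Sum over trials $t$: $(1 - e^{-\eta}) \sum_{t=1}^{T} w^{t-1} \cdot \ell^t \le \Delta(C \| w^0) - \Delta(C \| w^T) + \eta\, C \cdot \ell^{\le T}$, where $\ell^{\le T} = \ell^1 + \dots + \ell^T$.
3. Use $w^{t-1} \cdot \ell^t = E[C^t] \cdot \ell^t$: $\sum_{t=1}^{T} E[C^t] \cdot \ell^t \le \frac{\Delta(C \| w^0) - \Delta(C \| w^T) + \eta\, C \cdot \ell^{\le T}}{1 - e^{-\eta}}$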

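29 Proof of CH Regret Bound
4. $w^0$ assumes a uniform distribution over concepts: $w^0_i = \frac{M}{d} \Rightarrow \Delta(C \| w^0) = M \ln \frac{d}{M}$
5. Let $\ell^*$ be the loss of the best concept in hindsight; choosing $\eta = \sqrt{\frac{2M \ln(d/M)}{\ell^*}}$ $\Rightarrow$ regret bound on $R_T$.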
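Step 4 can be checked numerically. The sketch below assumes a concept that uses exactly M of the d components and the initial weights $w^0_i = M/d$; the values of d and M are arbitrary, and the convention $0 \ln 0 = 0$ is used.

```python
import math

def relative_entropy(w, v):
    """Unnormalized relative entropy: sum_i w_i ln(w_i / v_i) + v_i - w_i (0 ln 0 = 0)."""
    total = 0.0
    for wi, vi in zip(w, v):
        if wi > 0:
            total += wi * math.log(wi / vi)
        total += vi - wi
    return total

d, M = 10, 3
concept = [1] * M + [0] * (d - M)   # a concept with exactly M active components
w0 = [M / d] * d                    # initial weights w^0_i = M / d
print(relative_entropy(concept, w0), M * math.log(d / M))
```

The M active components each contribute $\ln(d/M) + M/d - 1$ and the remaining $d - M$ components each contribute $M/d$; the linear terms cancel, leaving $M \ln(d/M)$.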

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning with Large Expert Spaces MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Problem Learning guarantees: R T = O( p T log N). informative even for N very large.

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Follow-he-Perturbed Leader MEHRYAR MOHRI MOHRI@ COURAN INSIUE & GOOGLE RESEARCH. General Ideas Linear loss: decomposition as a sum along substructures. sum of edge losses in a

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Online Convex Optimization MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Online projected sub-gradient descent. Exponentiated Gradient (EG). Mirror descent.

More information

Hedging Structured Concepts

Hedging Structured Concepts Hedging Structured Concepts Wouter M. Koolen Advanced Systems Research Centrum Wiskunde en Informatica wmkoolen@cwi.nl Manfred K. Warmuth Department of Computer Science UC Santa Cruz manfred@cse.ucsc.edu

More information

Learning, Games, and Networks

Learning, Games, and Networks Learning, Games, and Networks Abhishek Sinha Laboratory for Information and Decision Systems MIT ML Talk Series @CNRG December 12, 2016 1 / 44 Outline 1 Prediction With Experts Advice 2 Application to

More information

Exponential Weights on the Hypercube in Polynomial Time

Exponential Weights on the Hypercube in Polynomial Time European Workshop on Reinforcement Learning 14 (2018) October 2018, Lille, France. Exponential Weights on the Hypercube in Polynomial Time College of Information and Computer Sciences University of Massachusetts

More information

A survey: The convex optimization approach to regret minimization

A survey: The convex optimization approach to regret minimization A survey: The convex optimization approach to regret minimization Elad Hazan September 10, 2009 WORKING DRAFT Abstract A well studied and general setting for prediction and decision making is regret minimization

More information

Online Dynamic Programming

Online Dynamic Programming Online Dynamic Programming Holakou Rahmanian Department of Computer Science University of California Santa Cruz Santa Cruz, CA 956 holakou@ucsc.edu S.V.N. Vishwanathan Department of Computer Science University

More information

Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension

Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension Journal of Machine Learning Research 9 (2008) 2287-2320 Submitted 9/07; Published 10/08 Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension Manfred K. Warmuth Dima

More information

Online Kernel PCA with Entropic Matrix Updates

Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth Computer Science Department, University of California - Santa Cruz dima@cse.ucsc.edu manfred@cse.ucsc.edu Abstract A number of updates for density matrices have been developed

More information

Learning Permutations with Exponential Weights

Learning Permutations with Exponential Weights Learning Permutations with Exponential Weights David P. Helmbold and Manfred K. Warmuth Computer Science Department University of California, Santa Cruz {dph manfred}@cse.ucsc.edu Abstract. We give an

More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning Editors: Suvrit Sra suvrit@gmail.com Max Planck Insitute for Biological Cybernetics 72076 Tübingen, Germany Sebastian Nowozin Microsoft Research Cambridge, CB3 0FB, United

More information

Unified Algorithms for Online Learning and Competitive Analysis

Unified Algorithms for Online Learning and Competitive Analysis JMLR: Workshop and Conference Proceedings vol 202 9 25th Annual Conference on Learning Theory Unified Algorithms for Online Learning and Competitive Analysis Niv Buchbinder Computer Science Department,

More information

Minimax Fixed-Design Linear Regression

Minimax Fixed-Design Linear Regression JMLR: Workshop and Conference Proceedings vol 40:1 14, 2015 Mini Fixed-Design Linear Regression Peter L. Bartlett University of California at Berkeley and Queensland University of Technology Wouter M.

More information

On-line Variance Minimization

On-line Variance Minimization On-line Variance Minimization Manfred Warmuth Dima Kuzmin University of California - Santa Cruz 19th Annual Conference on Learning Theory M. Warmuth, D. Kuzmin (UCSC) On-line Variance Minimization COLT06

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Microsoft Research 1 Microsoft Way, Redmond,

More information

Online Kernel PCA with Entropic Matrix Updates

Online Kernel PCA with Entropic Matrix Updates Online Kernel PCA with Entropic Matrix Updates Dima Kuzmin Manfred K. Warmuth University of California - Santa Cruz ICML 2007, Corvallis, Oregon April 23, 2008 D. Kuzmin, M. Warmuth (UCSC) Online Kernel

More information

On-Line Learning with Path Experts and Non-Additive Losses

On-Line Learning with Path Experts and Non-Additive Losses On-Line Learning with Path Experts and Non-Additive Losses Joint work with Corinna Cortes (Google Research) Vitaly Kuznetsov (Courant Institute) Manfred Warmuth (UC Santa Cruz) MEHRYAR MOHRI MOHRI@ COURANT

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301

More information

Online Forest Density Estimation

Online Forest Density Estimation Online Forest Density Estimation Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr UAI 16 1 Outline 1 Probabilistic Graphical Models 2 Online Density Estimation 3 Online Forest Density

More information

The Free Matrix Lunch

The Free Matrix Lunch The Free Matrix Lunch Wouter M. Koolen Wojciech Kot lowski Manfred K. Warmuth Tuesday 24 th April, 2012 Koolen, Kot lowski, Warmuth (RHUL) The Free Matrix Lunch Tuesday 24 th April, 2012 1 / 26 Introduction

More information

The convex optimization approach to regret minimization

The convex optimization approach to regret minimization The convex optimization approach to regret minimization Elad Hazan Technion - Israel Institute of Technology ehazan@ie.technion.ac.il Abstract A well studied and general setting for prediction and decision

More information

Move from Perturbed scheme to exponential weighting average

Move from Perturbed scheme to exponential weighting average Move from Perturbed scheme to exponential weighting average Chunyang Xiao Abstract In an online decision problem, one makes decisions often with a pool of decisions sequence called experts but without

More information

Online Learning for Time Series Prediction

Online Learning for Time Series Prediction Online Learning for Time Series Prediction Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values.

More information

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research

Foundations of Machine Learning On-Line Learning. Mehryar Mohri Courant Institute and Google Research Foundations of Machine Learning On-Line Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.

More information

The Online Approach to Machine Learning

The Online Approach to Machine Learning The Online Approach to Machine Learning Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Approach to ML 1 / 53 Summary 1 My beautiful regret 2 A supposedly fun game I

More information

Combinatorial Bandits

Combinatorial Bandits Combinatorial Bandits Nicolò Cesa-Bianchi Università degli Studi di Milano, Italy Gábor Lugosi 1 ICREA and Pompeu Fabra University, Spain Abstract We study sequential prediction problems in which, at each

More information

Bandits for Online Optimization

Bandits for Online Optimization Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each

More information

Online Learning and Online Convex Optimization

Online Learning and Online Convex Optimization Online Learning and Online Convex Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Online Learning 1 / 49 Summary 1 My beautiful regret 2 A supposedly fun game

More information

Time Series Prediction & Online Learning

Time Series Prediction & Online Learning Time Series Prediction & Online Learning Joint work with Vitaly Kuznetsov (Google Research) MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Motivation Time series prediction: stock values. earthquakes.

More information

Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits

Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits Journal of Machine Learning Research 17 (2016 1-21 Submitted 3/15; Revised 7/16; Published 8/16 Importance Weighting Without Importance Weights: An Efficient Algorithm for Combinatorial Semi-Bandits Gergely

More information

Minimax Policies for Combinatorial Prediction Games

Minimax Policies for Combinatorial Prediction Games Minimax Policies for Combinatorial Prediction Games Jean-Yves Audibert Imagine, Univ. Paris Est, and Sierra, CNRS/ENS/INRIA, Paris, France audibert@imagine.enpc.fr Sébastien Bubeck Centre de Recerca Matemàtica

More information

Minimax strategy for prediction with expert advice under stochastic assumptions

Minimax strategy for prediction with expert advice under stochastic assumptions Minimax strategy for prediction ith expert advice under stochastic assumptions Wojciech Kotłosi Poznań University of Technology, Poland otlosi@cs.put.poznan.pl Abstract We consider the setting of prediction

More information

Online Learning of Probabilistic Graphical Models

Online Learning of Probabilistic Graphical Models 1/34 Online Learning of Probabilistic Graphical Models Frédéric Koriche CRIL - CNRS UMR 8188, Univ. Artois koriche@cril.fr CRIL-U Nankin 2016 Probabilistic Graphical Models 2/34 Outline 1 Probabilistic

More information

From Bandits to Experts: A Tale of Domination and Independence

From Bandits to Experts: A Tale of Domination and Independence From Bandits to Experts: A Tale of Domination and Independence Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Domination and Independence 1 / 1 From Bandits to Experts: A

More information

Regret in Online Combinatorial Optimization

Regret in Online Combinatorial Optimization Regret in Online Combinatorial Optimization Jean-Yves Audibert Imagine, Université Paris Est, and Sierra, CNRS/ENS/INRIA audibert@imagine.enpc.fr Sébastien Bubeck Department of Operations Research and

More information

Efficient learning by implicit exploration in bandit problems with side observations

Efficient learning by implicit exploration in bandit problems with side observations Efficient learning by implicit exploration in bandit problems with side observations omáš Kocák Gergely Neu Michal Valko Rémi Munos SequeL team, INRIA Lille Nord Europe, France {tomas.kocak,gergely.neu,michal.valko,remi.munos}@inria.fr

More information

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning

Tutorial: PART 2. Online Convex Optimization, A Game- Theoretic Approach to Learning Tutorial: PART 2 Online Convex Optimization, A Game- Theoretic Approach to Learning Elad Hazan Princeton University Satyen Kale Yahoo Research Exploiting curvature: logarithmic regret Logarithmic regret

More information

Online Submodular Minimization

Online Submodular Minimization Online Submodular Minimization Elad Hazan IBM Almaden Research Center 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Yahoo! Research 4301 Great America Parkway, Santa Clara, CA 95054 skale@yahoo-inc.com

More information

Yevgeny Seldin. University of Copenhagen

Yevgeny Seldin. University of Copenhagen Yevgeny Seldin University of Copenhagen Classical (Batch) Machine Learning Collect Data Data Assumption The samples are independent identically distributed (i.i.d.) Machine Learning Prediction rule New

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow

More information

Exponentiated Gradient Descent

Exponentiated Gradient Descent CSE599s, Spring 01, Online Learning Lecture 10-04/6/01 Lecturer: Ofer Dekel Exponentiated Gradient Descent Scribe: Albert Yu 1 Introduction In this lecture we review norms, dual norms, strong convexity,

More information

Online Aggregation of Unbounded Signed Losses Using Shifting Experts

Online Aggregation of Unbounded Signed Losses Using Shifting Experts Proceedings of Machine Learning Research 60: 5, 207 Conformal and Probabilistic Prediction and Applications Online Aggregation of Unbounded Signed Losses Using Shifting Experts Vladimir V. V yugin Institute

More information

Online Submodular Minimization

Online Submodular Minimization Journal of Machine Learning Research 13 (2012) 2903-2922 Submitted 12/11; Revised 7/12; Published 10/12 Online Submodular Minimization Elad Hazan echnion - Israel Inst. of ech. echnion City Haifa, 32000,

More information

Perceptron Mistake Bounds

Perceptron Mistake Bounds Perceptron Mistake Bounds Mehryar Mohri, and Afshin Rostamizadeh Google Research Courant Institute of Mathematical Sciences Abstract. We present a brief survey of existing mistake bounds and introduce

More information

Putting Bayes to sleep

Putting Bayes to sleep Putting Bayes to sleep Wouter M. Koolen Dmitry Adamskiy Manfred Warmuth Thursday 30 th August, 2012 Menu How a beautiful and intriguing piece of technology provides new insights in existing methods and

More information

Applications of on-line prediction. in telecommunication problems

Applications of on-line prediction. in telecommunication problems Applications of on-line prediction in telecommunication problems Gábor Lugosi Pompeu Fabra University, Barcelona based on joint work with András György and Tamás Linder 1 Outline On-line prediction; Some

More information

Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes

Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes Improved Bounds for Online Learning Over the Permutahedron and Other Ranking Polytopes Nir Ailon Department of Computer Science, echnion II, Haifa, Israel nailon@cs.technion.ac.il Abstract Consider the

More information

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm

Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm Jacob Steinhardt Percy Liang Stanford University {jsteinhardt,pliang}@cs.stanford.edu Jun 11, 2013 J. Steinhardt & P. Liang (Stanford)

More information

No-regret algorithms for structured prediction problems

No-regret algorithms for structured prediction problems No-regret algorithms for structured prediction problems Geoffrey J. Gordon December 21, 2005 CMU-CALD-05-112 School of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213 Abstract No-regret

More information

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning.

Tutorial: PART 1. Online Convex Optimization, A Game- Theoretic Approach to Learning. Tutorial: PART 1 Online Convex Optimization, A Game- Theoretic Approach to Learning http://www.cs.princeton.edu/~ehazan/tutorial/tutorial.htm Elad Hazan Princeton University Satyen Kale Yahoo Research

More information

Near-Optimal Algorithms for Online Matrix Prediction

Near-Optimal Algorithms for Online Matrix Prediction JMLR: Workshop and Conference Proceedings vol 23 (2012) 38.1 38.13 25th Annual Conference on Learning Theory Near-Optimal Algorithms for Online Matrix Prediction Elad Hazan Technion - Israel Inst. of Tech.

More information

A Second-order Bound with Excess Losses

A Second-order Bound with Excess Losses A Second-order Bound with Excess Losses Pierre Gaillard 12 Gilles Stoltz 2 Tim van Erven 3 1 EDF R&D, Clamart, France 2 GREGHEC: HEC Paris CNRS, Jouy-en-Josas, France 3 Leiden University, the Netherlands

More information

Online Learning and Sequential Decision Making

Online Learning and Sequential Decision Making Online Learning and Sequential Decision Making Emilie Kaufmann CNRS & CRIStAL, Inria SequeL, emilie.kaufmann@univ-lille.fr Research School, ENS Lyon, Novembre 12-13th 2018 Emilie Kaufmann Online Learning

More information

Defensive forecasting for optimal prediction with expert advice

Defensive forecasting for optimal prediction with expert advice Defensive forecasting for optimal prediction with expert advice Vladimir Vovk $25 Peter Paul $0 $50 Peter $0 Paul $100 The Game-Theoretic Probability and Finance Project Working Paper #20 August 11, 2007

More information

Multitask Learning With Expert Advice

Multitask Learning With Expert Advice University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1-28-2007 Multitask Learning With Expert Advice Jacob D. Abernethy University of California - Berkeley Peter Bartlett

More information

Logarithmic Regret Algorithms for Online Convex Optimization

Logarithmic Regret Algorithms for Online Convex Optimization Logarithmic Regret Algorithms for Online Convex Optimization Elad Hazan 1, Adam Kalai 2, Satyen Kale 1, and Amit Agarwal 1 1 Princeton University {ehazan,satyen,aagarwal}@princeton.edu 2 TTI-Chicago kalai@tti-c.org

More information

The No-Regret Framework for Online Learning

The No-Regret Framework for Online Learning The No-Regret Framework for Online Learning A Tutorial Introduction Nahum Shimkin Technion Israel Institute of Technology Haifa, Israel Stochastic Processes in Engineering IIT Mumbai, March 2013 N. Shimkin,

More information

Full-information Online Learning

Full-information Online Learning Introduction Expert Advice OCO LM A DA NANJING UNIVERSITY Full-information Lijun Zhang Nanjing University, China June 2, 2017 Outline Introduction Expert Advice OCO 1 Introduction Definitions Regret 2

More information

CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality

CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality CS264: Beyond Worst-Case Analysis Lecture #20: From Unknown Input Distributions to Instance Optimality Tim Roughgarden December 3, 2014 1 Preamble This lecture closes the loop on the course s circle of

More information

CS281B/Stat241B. Statistical Learning Theory. Lecture 14.

CS281B/Stat241B. Statistical Learning Theory. Lecture 14. CS281B/Stat241B. Statistical Learning Theory. Lecture 14. Wouter M. Koolen Convex losses Exp-concave losses Mixable losses The gradient trick Specialists 1 Menu Today we solve new online learning problems

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Learning and Games MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Outline Normal form games Nash equilibrium von Neumann s minimax theorem Correlated equilibrium Internal

More information

Online Prediction: Bayes versus Experts

Online Prediction: Bayes versus Experts Marcus Hutter - 1 - Online Prediction Bayes versus Experts Online Prediction: Bayes versus Experts Marcus Hutter Istituto Dalle Molle di Studi sull Intelligenza Artificiale IDSIA, Galleria 2, CH-6928 Manno-Lugano,

More information

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora

Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm Lecturer: Sanjeev Arora Scribe: (Today s notes below are

More information

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 Learning from Experts. 2.1 Defining a meaningful benchmark. AM 221: Advanced Optimization Spring 2016 AM 1: Advanced Optimization Spring 016 Prof. Yaron Singer Lecture 11 March 3rd 1 Overview In this lecture we will introduce the notion of online convex optimization. This is an extremely useful framework

More information

arxiv: v2 [cs.lg] 19 Oct 2018

arxiv: v2 [cs.lg] 19 Oct 2018 Learning in Non-convex Games with an Optimization Oracle arxiv:1810.07362v2 [cs.lg] 19 Oct 2018 Alon Gonen Elad Hazan Abstract We consider adversarial online learning in a non-convex setting under the

More information

On Minimaxity of Follow the Leader Strategy in the Stochastic Setting

On Minimaxity of Follow the Leader Strategy in the Stochastic Setting On Minimaxity of Follow the Leader Strategy in the Stochastic Setting Wojciech Kot lowsi Poznań University of Technology, Poland wotlowsi@cs.put.poznan.pl Abstract. We consider the setting of prediction

More information

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm

The Algorithmic Foundations of Adaptive Data Analysis November, Lecture The Multiplicative Weights Algorithm he Algorithmic Foundations of Adaptive Data Analysis November, 207 Lecture 5-6 Lecturer: Aaron Roth Scribe: Aaron Roth he Multiplicative Weights Algorithm In this lecture, we define and analyze a classic,

More information

Adaptive Online Learning in Dynamic Environments

Adaptive Online Learning in Dynamic Environments Adaptive Online Learning in Dynamic Environments Lijun Zhang, Shiyin Lu, Zhi-Hua Zhou National Key Laboratory for Novel Software Technology Nanjing University, Nanjing 210023, China {zhanglj, lusy, zhouzh}@lamda.nju.edu.cn

More information

Regret to the Best vs. Regret to the Average

Regret to the Best vs. Regret to the Average Regret to the Best vs. Regret to the Average Eyal Even-Dar 1, Michael Kearns 1, Yishay Mansour 2, Jennifer Wortman 1 1 Department of Computer and Information Science, University of Pennsylvania 2 School

More information

UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations

UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations UC Santa Cruz UC Santa Cruz Electronic Theses and Dissertations Title Optimal Online Learning with Matrix Parameters Permalink https://escholarship.org/uc/item/7xf477q3 Author Nie, Jiazhong Publication

More information

Online Sparse Linear Regression

Online Sparse Linear Regression JMLR: Workshop and Conference Proceedings vol 49:1 11, 2016 Online Sparse Linear Regression Dean Foster Amazon DEAN@FOSTER.NET Satyen Kale Yahoo Research SATYEN@YAHOO-INC.COM Howard Karloff HOWARD@CC.GATECH.EDU

More information

Regret Minimization for Branching Experts

Regret Minimization for Branching Experts JMLR: Workshop and Conference Proceedings vol 30 (2013) 1 21 Regret Minimization for Branching Experts Eyal Gofer Tel Aviv University, Tel Aviv, Israel Nicolò Cesa-Bianchi Università degli Studi di Milano,

More information

Online Learning with Feedback Graphs

Online Learning with Feedback Graphs Online Learning with Feedback Graphs Claudio Gentile INRIA and Google NY clagentile@gmailcom NYC March 6th, 2018 1 Content of this lecture Regret analysis of sequential prediction problems lying between

More information

Optimal and Adaptive Online Learning

Optimal and Adaptive Online Learning Optimal and Adaptive Online Learning Haipeng Luo Advisor: Robert Schapire Computer Science Department Princeton University Examples of Online Learning (a) Spam detection 2 / 34 Examples of Online Learning

More information

New Algorithms for Contextual Bandits

New Algorithms for Contextual Bandits New Algorithms for Contextual Bandits Lev Reyzin Georgia Institute of Technology Work done at Yahoo! 1 S A. Beygelzimer, J. Langford, L. Li, L. Reyzin, R.E. Schapire Contextual Bandit Algorithms with Supervised

More information

The on-line shortest path problem under partial monitoring

The on-line shortest path problem under partial monitoring The on-line shortest path problem under partial monitoring András György Machine Learning Research Group Computer and Automation Research Institute Hungarian Academy of Sciences Kende u. 11-13, Budapest,

More information

A Drifting-Games Analysis for Online Learning and Applications to Boosting

A Drifting-Games Analysis for Online Learning and Applications to Boosting A Drifting-Games Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

CS675: Convex and Combinatorial Optimization Fall 2014 Combinatorial Problems as Linear Programs. Instructor: Shaddin Dughmi

Outline: 1 Introduction 2 Shortest Path 3 Algorithms for Single-Source Shortest

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Course Synopsis. A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

Bandit Online Convex Optimization

March 31, 2015. Outline: 1 OCO vs Bandit OCO 2 Gradient Estimates 3 Oblivious Adversary 4 Reshaping for Improved Rates 5 Adaptive Adversary 6 Concluding Remarks. Review of (Online) Convex Optimization: Set-up

Continuous Experts and the Binning Algorithm

Jacob Abernethy 1, John Langford 1, and Manfred K. Warmuth,3 1 Toyota Technological Institute, Chicago University of California at Santa Cruz 3 Supported by

Adaptive Online Prediction by Following the Perturbed Leader

Journal of Machine Learning Research 6 (2005) 639-660. Submitted 10/04; Revised 3/05; Published 4/05. Marcus Hutter, Jan Poland. IDSIA, Galleria

Littlestone's Dimension and Online Learnability

Shai Shalev-Shwartz, Toyota Technological Institute at Chicago, The Hebrew University. Talk at UCSD workshop, February 2009. Joint work with Shai Ben-David

Statistical Machine Learning from Data

Support Vector Machines. Samy Bengio, IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

Agnostic Online learnability

Technical Report TTIC-TR-2008-2, October 2008. Shai Shalev-Shwartz, Toyota Technological Institute Chicago, shai@tti-c.org. Abstract: We study a fundamental question. What classes

Online convex optimization in the bandit setting: gradient descent without a gradient

Abraham D. Flaxman, Adam Tauman Kalai, H. Brendan McMahan. November 30, 2004. Abstract: We study a general online convex

0.1 Motivating example: weighted majority algorithm

Princeton Univ. F'16 COS 521: Advanced Algorithm Design. Lecture 8: Decision-making under total uncertainty: the multiplicative weight algorithm. Lecturer: Sanjeev Arora. Scribe: Sanjeev Arora. (Today's notes

Accelerating Online Convex Optimization via Adaptive Prediction

Mehryar Mohri, Courant Institute and Google Research, New York, NY 10012, mohri@cims.nyu.edu. Scott Yang, Courant Institute, New York, NY 10012, yangs@cims.nyu.edu. Abstract: We present a powerful general framework

Online Learning with Experts & Multiplicative Weights Algorithms

CS 159 lecture #2. Stephan Zheng, April 1, 2016, Caltech. Table of contents: 1. Online Learning with Experts. With a perfect expert. Without perfect

Learning rotations with little regret

Elad Hazan, IBM Almaden, 650 Harry Road, San Jose, CA 95120, ehazan@cs.princeton.edu. Satyen Kale, Yahoo! Research, 4301 Great America Parkway, Santa Clara, CA 95054, skale@yahoo-inc.com

No-Regret Algorithms for Unconstrained Online Convex Optimization

Matthew Streeter, Duolingo, Inc., Pittsburgh, PA 153, matt@duolingo.com. H. Brendan McMahan, Google, Inc., Seattle, WA 98103, mcmahan@google.com

Better Algorithms for Benign Bandits

Elad Hazan, IBM Almaden, 650 Harry Rd, San Jose, CA 95120, hazan@us.ibm.com. Satyen Kale, Microsoft Research, One Microsoft Way, Redmond, WA 98052, satyen.kale@microsoft.com

Beyond the regret minimization barrier: an optimal algorithm for stochastic strongly-convex optimization

JMLR: Workshop and Conference Proceedings vol (2010) 1-16, 24th Annual Conference on Learning Theory

Prediction by random-walk perturbation

Luc Devroye, School of Computer Science, McGill University. Gábor Lugosi, ICREA and Department of Economics, Universitat Pompeu Fabra. lucdevroye@gmail.com gabor.lugosi@gmail.com

The Matching Polytope: General graphs

8.433 Combinatorial Optimization. September 8. Lecturer: Santosh Vempala. A matching M corresponds to a vector x^M = (0, 0, 1, 1, 0...0) where x^M_e is 1 iff e ∈ M and 0 if e

Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization

Jacob Abernethy, Computer Science Division, UC Berkeley, jake@cs.berkeley.edu (eligible for best student paper award). Elad Hazan

Classification. Jordan Boyd-Graber University of Maryland WEIGHTED MAJORITY. Slides adapted from Mohri. Jordan Boyd-Graber UMD Classification 1 / 13

Beyond Binary Classification. Before we've talked about

Online Learning of Non-stationary Sequences

Claire Monteleoni and Tommi Jaakkola, MIT Computer Science and Artificial Intelligence Laboratory, 200 Technology Square, Cambridge, MA 02139, {cmontel,tommi}@ai.mit.edu
