Adaptivity and Optimism: An Improved Exponentiated Gradient Algorithm


Jacob Steinhardt and Percy Liang (Stanford University)
June 11, 2013

Setup
The setting is learning from experts: n experts, T rounds.
For t = 1, ..., T:
- The learner chooses a distribution w_t ∈ Δ_n over the experts.
- Nature reveals losses z_t ∈ [−1, 1]^n of the experts.
- The learner suffers loss w_t · z_t.
Goal: minimize Regret = Σ_{t=1}^T w_t · z_t − Σ_{t=1}^T z_{t,i}, where i is the best fixed expert.
Typical algorithm: multiplicative weights (aka exponentiated gradient):
  w_{t+1,i} ∝ w_{t,i} exp(−η z_{t,i}).
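To make the protocol concrete, here is a minimal sketch (mine, not from the slides) of the experts setting with the standard multiplicative-weights update; the loss matrix and the learning rate eta are illustrative placeholders.

```python
import numpy as np

def multiplicative_weights(losses, eta=0.1):
    """Run MW1 on a (T, n) array of losses in [-1, 1]; return the learner's total loss."""
    T, n = losses.shape
    w = np.full(n, 1.0 / n)        # uniform initial distribution over experts
    learner_loss = 0.0
    for z in losses:               # z is the loss vector revealed at round t
        learner_loss += w @ z      # learner suffers w_t . z_t
        w = w * np.exp(-eta * z)   # MW1: w_{t+1,i} proportional to w_{t,i} exp(-eta z_{t,i})
        w /= w.sum()               # renormalize to a distribution
    return learner_loss

# Illustrative usage: regret against the best fixed expert
rng = np.random.default_rng(0)
Z = rng.uniform(-1, 1, size=(1000, 5))
print(multiplicative_weights(Z) - Z.sum(axis=0).min())
```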

Outline
- Compare two variants of the multiplicative weights (exponentiated gradient) algorithm.
- Understand the difference through the lens of adaptive mirror descent (Orabona et al., 2013).
- Combine with the machinery of optimistic updates (Rakhlin & Sridharan, 2012) to beat the best existing bounds.

Two Types of Updates
In the literature there are two similar but different updates (Kivinen & Warmuth, 1997; Cesa-Bianchi et al., 2007):
  w_{t+1,i} ∝ w_{t,i} exp(−η z_{t,i})   (MW1)
  w_{t+1,i} ∝ w_{t,i} (1 − η z_{t,i})   (MW2)
The regret is bounded as
  Regret ≤ log(n)/η + η Σ_{t=1}^T ||z_t||_∞²   (Regret:MW1)
  Regret ≤ log(n)/η + η Σ_{t=1}^T z_{t,i}²     (Regret:MW2)
If the best expert i has loss close to zero, then the second bound is better than the first. The gap can be Θ(√T), in actual performance and not just in the upper bounds.
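Relative to the loop sketched above, (MW2) only changes the multiplicative factor. A hedged one-step sketch (my own), assuming η|z_{t,i}| < 1 so all weights stay positive:

```python
import numpy as np

def mw2_update(w, z, eta=0.1):
    """One step of (MW2): w_{t+1,i} proportional to w_{t,i} (1 - eta z_{t,i}).

    Assumes eta * |z_i| < 1 for every i so the new weights remain positive.
    """
    w_new = w * (1 - eta * z)
    return w_new / w_new.sum()

# Example step with three experts
w = np.array([0.5, 0.3, 0.2])
z = np.array([0.4, -0.1, 0.9])
print(mw2_update(w, z))
```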

Two Types of Updates
Mirror descent is the gold standard meta-algorithm for online learning. How do (MW1) and (MW2) relate to it?
- (MW1) is mirror descent with regularizer (1/η) Σ_{i=1}^n w_i log(w_i).
- (MW2) is NOT mirror descent for any fixed regularizer.
- Unsettling: should we abandon mirror descent as a gold standard?
- No: we can cast (MW2) as adaptive mirror descent (Orabona et al., 2013).

Adaptive Mirror Descent to the Rescue
Recall that mirror descent is the (meta-)algorithm
  w_t = argmin_w [ ψ(w) + Σ_{s=1}^{t−1} w · z_s ].
For ψ(w) = (1/η) Σ_{i=1}^n w_i log(w_i), we recover (MW1).
Adaptive mirror descent (Orabona et al., 2013) is the meta-algorithm
  w_t = argmin_w [ ψ_t(w) + Σ_{s=1}^{t−1} w · z_s ].
For ψ_t(w) = (1/η) Σ_{i=1}^n w_i log(w_i) + η Σ_{i=1}^n Σ_{s=1}^{t−1} w_i z_{s,i}², we approximately recover (MW2). The induced update is
  w_{t+1,i} ∝ w_{t,i} exp(−η z_{t,i} − η² z_{t,i}²) ≈ w_{t,i} (1 − η z_{t,i}).
This is enough to achieve the better regret bound; (MW2) can be recovered exactly with a more complicated ψ_t.
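A minimal sketch (mine, not from the paper) of the update induced by the adaptive regularizer, i.e. one step of w_{t+1,i} ∝ w_{t,i} exp(−η z_{t,i} − η² z_{t,i}²):

```python
import numpy as np

def adaptive_mw_update(w, z, eta=0.1):
    """One step of the adaptive-mirror-descent update that approximately recovers (MW2):
    w_{t+1,i} proportional to w_{t,i} exp(-eta z_{t,i} - eta^2 z_{t,i}^2)."""
    w_new = w * np.exp(-eta * z - (eta ** 2) * z ** 2)
    return w_new / w_new.sum()

# For small eta, exp(-eta z - eta^2 z^2) is close to (1 - eta z), the (MW2) factor.
w = np.array([0.25, 0.25, 0.5])
z = np.array([0.8, -0.3, 0.1])
print(adaptive_mw_update(w, z))
```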

Advantages of Our Perspective
So far, we have cast (MW2) as adaptive mirror descent, with regularizer
  ψ_t(w) = Σ_{i=1}^n w_i [ (1/η) log(w_i) + η Σ_{s=1}^{t−1} z_{s,i}² ].
This explains the better regret bound while staying within the mirror descent framework, which is nice.
Our new perspective also allows us to apply lots of modern machinery:
- optimistic updates (Rakhlin & Sridharan, 2012)
- matrix multiplicative weights (Tsuda et al., 2005; Arora & Kale, 2007)
By turning the crank, we get results that beat the state of the art!

Beating State of the Art
Known regret bounds scale with one of the quantities below. Moving from S to V to D adds optimism; moving from the whole-vector quantity to max_i to the comparator coordinate i adds adaptivity.

  whole vector:   S  (Kivinen & Warmuth, 1997)      V                                D  (Chiang et al., 2012)
  max over i:     max_i S_i                         max_i V_i  (Hazan & Kale, 2008)  max_i D_i
  comparator i:   S_i  (Cesa-Bianchi et al., 2007)  V_i                              D_i  (this work)

In the above we let
  S = Σ_{t=1}^T ||z_t||_∞²,             S_i = Σ_{t=1}^T z_{t,i}²,
  V = Σ_{t=1}^T ||z_t − z̄||_∞²,         V_i = Σ_{t=1}^T (z_{t,i} − z̄_i)²,
  D = Σ_{t=1}^T ||z_t − z_{t−1}||_∞²,   D_i = Σ_{t=1}^T (z_{t,i} − z_{t−1,i})².
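A small sketch (not from the slides) that computes these quantities from a (T, n) loss matrix; it assumes the vector quantities use the sup-norm and that z_0 = 0 for the deviation terms.

```python
import numpy as np

def bound_quantities(Z):
    """Compute S, V, D and the per-expert S_i, V_i, D_i for a (T, n) loss matrix Z."""
    zbar = Z.mean(axis=0)                               # average loss vector z-bar
    Zprev = np.vstack([np.zeros(Z.shape[1]), Z[:-1]])   # z_{t-1}, with z_0 = 0
    S_i = (Z ** 2).sum(axis=0)
    V_i = ((Z - zbar) ** 2).sum(axis=0)
    D_i = ((Z - Zprev) ** 2).sum(axis=0)
    S = (np.abs(Z).max(axis=1) ** 2).sum()              # sum_t ||z_t||_inf^2
    V = (np.abs(Z - zbar).max(axis=1) ** 2).sum()
    D = (np.abs(Z - Zprev).max(axis=1) ** 2).sum()
    return S, V, D, S_i, V_i, D_i

Z = np.random.default_rng(1).uniform(-1, 1, size=(100, 4))
print(bound_quantities(Z)[:3])
```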

Optimistic Updates: A Brief Review
(Normal) mirror descent:
  w_t = argmin_w [ ψ(w) + w · Σ_{s=1}^{t−1} z_s ].
Optimistic mirror descent (Rakhlin & Sridharan, 2012) adds a hint m_t:
  w_t = argmin_w [ ψ(w) + w · (m_t + Σ_{s=1}^{t−1} z_s) ].
The hint m_t guesses the next term z_t in the cost function, and we pay regret in terms of z_t − m_t rather than z_t.
Examples: m_t = z_{t−1}, or m_t = (1/t) Σ_{s=1}^{t−1} z_s.
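For the entropic regularizer the argmin over the simplex has a closed form, w_t ∝ exp(−η(m_t + Σ_{s<t} z_s)). A minimal sketch (my own, with the hint m_t = z_{t−1} as an assumed choice):

```python
import numpy as np

def optimistic_mw(losses, eta=0.1):
    """Optimistic multiplicative weights with hint m_t = z_{t-1}:
    w_t proportional to exp(-eta * (m_t + sum_{s < t} z_s))."""
    n = losses.shape[1]
    cum = np.zeros(n)        # sum of past losses
    m = np.zeros(n)          # hint for the upcoming loss (z_{t-1}, initially 0)
    total = 0.0
    for z in losses:
        w = np.exp(-eta * (m + cum))
        w /= w.sum()
        total += w @ z
        cum += z
        m = z                # next round's hint is this round's loss
    return total

Z = np.random.default_rng(3).uniform(-1, 1, size=(500, 4))
print(optimistic_mw(Z))
```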

Multiplicative Weights with Optimism

  Name   Auxiliary update                                               Prediction (w_t)
  MW1    β_{t+1,i} = β_{t,i} − η z_{t,i}                                w_{t,i} ∝ exp(β_{t,i})
  MW2    β_{t+1,i} = β_{t,i} − η z_{t,i} − η² z_{t,i}²                  w_{t,i} ∝ exp(β_{t,i})
  MW3    β_{t+1,i} = β_{t,i} − η z_{t,i} − η² (z_{t,i} − z_{t−1,i})²    w_{t,i} ∝ exp(β_{t,i} − η z_{t−1,i})

Regret of MW3:
  Regret ≤ log(n)/η + η Σ_{t=1}^T (z_{t,i} − z_{t−1,i})²,  where i is the best fixed expert.
This dominates all existing bounds in this setting!
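Putting the pieces together, a minimal sketch (my own, not the authors' reference code) of MW3 as described in the table; the learning rate is a fixed placeholder rather than the tuned value a full analysis would use, and z_0 is taken to be the zero vector.

```python
import numpy as np

def mw3(losses, eta=0.1):
    """MW3: optimistic, adaptive multiplicative weights.
    Auxiliary update: beta_{t+1,i} = beta_{t,i} - eta z_{t,i} - eta^2 (z_{t,i} - z_{t-1,i})^2
    Prediction:       w_{t,i} proportional to exp(beta_{t,i} - eta z_{t-1,i})
    Returns the learner's total loss."""
    n = losses.shape[1]
    beta = np.zeros(n)
    z_prev = np.zeros(n)                 # treat z_0 as the zero vector
    total = 0.0
    for z in losses:
        w = np.exp(beta - eta * z_prev)  # optimistic prediction using last round's loss as hint
        w /= w.sum()
        total += w @ z
        beta = beta - eta * z - (eta ** 2) * (z - z_prev) ** 2
        z_prev = z
    return total

# Illustrative regret against the best fixed expert
Z = np.random.default_rng(2).uniform(-1, 1, size=(1000, 5))
print(mw3(Z) - Z.sum(axis=0).min())
```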

Summary
- Cast the multiplicative weights algorithm as adaptive mirror descent.
- Applied the machinery of optimistic updates to beat the best existing bounds.
Also in the paper:
- extension to general convex losses
- extension to matrices
- generalization of the FTRL lemma to convex cones
