Bandit Online Convex Optimization


1 March 31, 2015

2 Outline
1. OCO vs Bandit OCO
2. Gradient Estimates
3. Oblivious Adversary
4. Reshaping for Improved Rates
5. Adaptive Adversary
6. Concluding Remarks

3 Review of (Online) Convex Optimization
Set-up:
1. Sequence of convex functions {c_t}: S → R over a convex set S.
2. Learner chooses a point x_t ∈ S and receives c_t(x_t).
3. Goal: minimize the regret ∑_{t=1}^n c_t(x_t) − min_{x∈S} ∑_{t=1}^n c_t(x).
4. Update rule: x_{t+1} = x_t − η ∇c_t(x_t).
Scenarios:
1. Offline: c_t ≡ c, a fixed function.
2. Online stochastic: c_t(x) = c(x) + ε_t(x), a noisy estimate.
3. Online adversarial: c_t chosen adversarially.

4 Review of (Online) Convex Optimization
Same set-up and scenarios as the previous slide.
Key information: knowledge of the gradients ∇c_t!

5 Bandit Setting
Bandit scenario set-up:
1. Sequence of convex functions {c_t}: S → R over a convex set S.
2. Learner chooses a point x_t ∈ S and receives only the value c_t(x_t).
3. Goal: minimize the regret ∑_{t=1}^n c_t(x_t) − min_{x∈S} ∑_{t=1}^n c_t(x).
Question: can we perform gradient descent without gradients?

6 Estimating Gradients - Multi-point
Multi-point: use finite differences!
1. One dimension: f'(x) ≈ (1/h)(f(x + h) − f(x)).
2. d dimensions: ∇f(x) ≈ (1/h)(f(x + h e_1) − f(x), ..., f(x + h e_d) − f(x)).
Theorem (Agarwal et al., 2010). Querying d + 1 points per round is enough to recover the standard OCO bounds, and even querying 2 points gets within log(T) factors.
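To make the multi-point idea concrete, here is a plain forward-difference sketch in Python (illustration only, not the algorithm of Agarwal et al.; the test function and the step size h are arbitrary choices):

import numpy as np

def fd_gradient(f, x, h=1e-5):
    # Forward-difference gradient estimate of f at x using d+1 function queries.
    d = x.size
    fx = f(x)                      # one query at the base point
    g = np.empty(d)
    for i in range(d):             # one extra query per coordinate
        e = np.zeros(d)
        e[i] = 1.0
        g[i] = (f(x + h * e) - fx) / h
    return g

# Example: f(x) = ||x||^2 has gradient 2x.
f = lambda x: float(x @ x)
print(fd_gradient(f, np.array([1.0, -2.0, 0.5])))   # roughly [2, -4, 1]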

7 Estimating Gradients - One-point
One-point: approximate f by a smoothed version and estimate the gradient of that instead. Define
f̂(x) := E_{v∼B_1}[f(x + δv)] = (1/vol(δB_1)) ∫_{δB_1} f(x + v) dv ≈ f(x),
where B_1 is the unit ball and δ > 0 a small smoothing radius.
1. One dimension: f̂(x) = (1/(2δ)) ∫_{−δ}^{δ} f(x + v) dv, so f̂'(x) = (1/(2δ)) (f(x + δ) − f(x − δ)).
2. d dimensions: by Stokes' theorem,
∇f̂(x) = ∇ ∫_{δB_1} f(x + v) dv / vol(δB_1)
       = ∫_{∂(δB_1)} f(x + u) (u/‖u‖) du / vol(δB_1)
       = E_{u∼∂B_1}[f(x + δu) u] · vol(∂(δB_1)) / vol(δB_1)
       = (d/δ) E_{u∼∂B_1}[f(x + δu) u].
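A quick Monte Carlo sanity check of the identity ∇f̂(x) = (d/δ) E_{u∼∂B_1}[f(x + δu) u] (a sketch with arbitrary choices of test function, δ, and sample size; for a quadratic f the smoothed gradient coincides with the true gradient):

import numpy as np

rng = np.random.default_rng(0)

def one_point_grad_estimate(f, x, delta, n_samples=100000):
    # Average of (d/delta) * f(x + delta*u) * u over u drawn uniformly from the unit sphere.
    d = x.size
    u = rng.normal(size=(n_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)     # uniform directions on the sphere
    vals = np.array([f(x + delta * ui) for ui in u])
    return (d / delta) * (vals[:, None] * u).mean(axis=0)

f = lambda x: float(x @ x)                            # gradient is 2x
print(one_point_grad_estimate(f, np.array([1.0, -0.5]), delta=0.1))   # approximately [2, -1]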

8 Algorithm
Key identity: ∇f̂(x) = (d/δ) E_{u∼∂B_1}[f(x + δu) u], where u ∼ U(∂B_1) and f̂(x) = E_{v∼B_1}[f(x + δv)].
Algorithm [Bandit Gradient Descent - Flaxman et al., 2004]:
1. Draw u_t ∼ U(∂B_1).
2. Play x_t = y_t + δu_t and receive the value c_t(x_t).
3. Update: y_{t+1} = P_S(y_t − ν c_t(x_t) u_t).

9 Algorithm
Algorithm [Bandit Gradient Descent - Flaxman et al., 2004]:
1. Draw u_t ∼ U(∂B_1).
2. Play x_t = y_t + δu_t and receive the value c_t(x_t).
3. Update: y_{t+1} = P_S(y_t − ν c_t(x_t) u_t).
Subtle issue: y_t + δu_t might fall outside S, so project onto (1 − α)S for some α ∈ (0, 1) instead.
New update: y_{t+1} = P_{(1−α)S}(y_t − ν c_t(x_t) u_t).
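A minimal sketch of the BGD loop, assuming for simplicity that S is the Euclidean ball of radius R (so projecting onto (1 − α)S is just a rescaling); the parameters nu, delta, alpha and the toy cost sequence are placeholder choices, not the tuned values from the theorems:

import numpy as np

rng = np.random.default_rng(1)

def project_ball(y, radius):
    # Euclidean projection onto the centered ball of the given radius.
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def bgd(costs, d, R, nu, delta, alpha):
    # Bandit Gradient Descent (Flaxman et al.) specialized to S = ball of radius R.
    y = np.zeros(d)
    total = 0.0
    for c_t in costs:
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                 # u_t uniform on the unit sphere
        x = y + delta * u                      # played point x_t
        value = c_t(x)                         # only the cost value is observed
        total += value
        y = project_ball(y - nu * value * u, (1 - alpha) * R)
    return total

# Toy run: a fixed quadratic cost with minimum inside the ball.
z = np.array([0.5, -0.3])
costs = [lambda x, z=z: float(np.sum((x - z) ** 2)) for _ in range(2000)]
print(bgd(costs, d=2, R=1.0, nu=0.05, delta=0.1, alpha=0.1))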

10 Review of OCO Results
Theorem (Zinkevich 2003). Let S ⊆ B_R, {c_t}: S → R a sequence of convex functions, G = sup_t ‖∇c_t(x_t)‖, and x_{t+1} = P_S(x_t − η ∇c_t(x_t)). Then
∑_t c_t(x_t) − min_{x∈S} ∑_t c_t(x) ≤ R²/(2η) + nηG²/2.
Lemma (Randomized Zinkevich). Let S ⊆ B_R, {c_t}: S → R a sequence of convex functions, {g_t} random vectors with E[g_t | x_t] = ∇c_t(x_t), G = sup_t ‖g_t‖, and x_{t+1} = P_S(x_t − η g_t). Then
E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ R²/(2η) + nηG²/2.
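The step size used on the following slides comes from balancing the two terms of this bound, a standard computation not spelled out on the slide:

\[
\min_{\eta > 0}\left(\frac{R^2}{2\eta} + \frac{n\eta G^2}{2}\right) = RG\sqrt{n},
\qquad \text{attained at } \eta = \frac{R}{G\sqrt{n}},
\]

since the two terms are equal at that choice of η.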

11 Bandit OCO
Algorithm [Bandit Gradient Descent - Flaxman et al., 2004]:
1. Draw u_t ∼ U(∂B_1).
2. Play x_t = y_t + δu_t and receive the value c_t(x_t).
3. Update: y_{t+1} = P_{(1−α)S}(y_t − ν c_t(x_t) u_t).
Theorem (Flaxman et al. 2004). Assume that B_r ⊆ S ⊆ B_R and that |c_t| ≤ C uniformly. Then for sufficiently large n and suitable choices of ν, δ, and α, we have
E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ 3C n^{5/6} (dR/r)^{1/3}.

12 Extending Randomized Zinkevich
Need to show:
1. BGD's updates are valid for randomized Zinkevich:
   (a) x_t ∈ S for the chosen α and δ;
   (b) g_t = (d/δ) c_t(x_t) u_t are valid gradient estimates.
2. Upper bound on G for randomized Zinkevich.
3. The bound for ĉ_t can be extended to c_t.
4. The bound for (1 − α)S can be extended to S.

13 BGD's Updates Valid for Randomized Zinkevich
1. x_t ∈ S for the chosen α and δ: if δ ≤ αr, then x_t = y_t + δu_t ∈ S
   (because (1 − α)S + α B_r ⊆ (1 − α)S + αS = S by convexity).
2. g_t = (d/δ) c_t(x_t) u_t are valid gradient estimates:
E[g_t | y_t] = (d/δ) E_{u∼∂B_1}[c_t(y_t + δu) u] = ∇ E_{v∼B_1}[c_t(y_t + δv)] = ∇ĉ_t(y_t),
where ĉ_t(y) := E_{v∼B_1}[c_t(y + δv)] is the smoothed cost.

14 Upper Bounds on G for Zinkevich's Theorem
Fact: ‖g_t‖ = (d/δ) |c_t(x_t)| ‖u_t‖ ≤ dC/δ.
Thus, for α ∈ (0, 1), δ ≤ αr, η = νδ/d, and G = dC/δ, randomized Zinkevich implies
E[∑_t ĉ_t(y_t)] − min_{x∈(1−α)S} ∑_t ĉ_t(x) ≤ R²/(2η) + nηG²/2 = RG√n = RdC√n/δ
for η = R/(G√n).

15 Extending from ĉ_t to c_t
Lemma. For y ∈ (1 − α)S and x ∈ S,
|c_t(y) − c_t(x)| ≤ (2C/(αr)) ‖x − y‖,
and hence (since ‖x_t − y_t‖ ≤ δ and ĉ_t averages c_t over a ball of radius δ)
|ĉ_t(y_t) − c_t(y_t)| ≤ δ · 2C/(αr) and |ĉ_t(y_t) − c_t(x_t)| ≤ 2δ · 2C/(αr).
The previous regret bound therefore implies
E[∑_t (c_t(x_t) − 2δ · 2C/(αr))] ≤ min_{x∈(1−α)S} ∑_t (c_t(x) + δ · 2C/(αr)) + RdC√n/δ,
or
E[∑_t c_t(x_t)] − min_{x∈(1−α)S} ∑_t c_t(x) ≤ RdC√n/δ + 3δ(2C/(αr)) n.

16 Extending from (1 − α)S to S
Lemma. min_{x∈(1−α)S} ∑_t c_t(x) ≤ min_{x∈S} ∑_t c_t(x) + 2αCn.
Thus,
E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ RdC√n/δ + 3δ(2C/(αr)) n + 2αCn.
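A one-line sketch of why the lemma holds (using that 0 ∈ S, which follows from B_r ⊆ S, together with convexity and |c_t| ≤ C): if x* ∈ S minimizes ∑_t c_t, then (1 − α)x* ∈ (1 − α)S and

\[
c_t\bigl((1-\alpha)x^*\bigr) \le (1-\alpha)\,c_t(x^*) + \alpha\,c_t(0) \le c_t(x^*) + 2\alpha C,
\]

so summing over the n rounds costs at most 2αCn.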

17 Recap
1) E[∑_t ĉ_t(y_t)] − min_{x∈(1−α)S} ∑_t ĉ_t(x) ≤ RdC√n/δ
   (optimal η, and δ ≤ αr, α < 1, B_r ⊆ S, |c_t| ≤ C)
2) E[∑_t c_t(x_t)] − min_{x∈(1−α)S} ∑_t c_t(x) ≤ RdC√n/δ + 3δ(2C/(αr)) n
   (Lipschitz bound across (1 − α)S and S, B_r ⊆ S, |c_t| ≤ C)
3) E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ RdC√n/δ + 3δ(2C/(αr)) n + 2αCn
   (the minima are close, B_r ⊆ S, |c_t| ≤ C)

18 Remarks
1. Optimize over δ and α to get the theorem:
E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ 3C n^{5/6} (dR/r)^{1/3}.
2. If a Lipschitz constant Lip(c_t) ≤ L is known, the bound improves to O(n^{3/4}).
3. Preconditioning can improve the ratio R/r.
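A rough balancing argument (constants ignored; the exact tuned values differ slightly) shows where the n^{5/6} rate comes from: equating the three terms of bound 3) on the previous slide gives

\[
\frac{RdC\sqrt{n}}{\delta} \approx \frac{\delta C n}{\alpha r} \approx \alpha C n
\quad\Longrightarrow\quad
\delta \approx \frac{Rd}{\alpha\sqrt{n}}, \qquad \alpha^3 \approx \frac{Rd}{r\sqrt{n}},
\]

so the regret is of order αCn ≈ C n^{5/6} (dR/r)^{1/3}.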

19 Theorem for L-Lipschitz Cost Functions
Theorem (Flaxman et al. 2004, for L-Lipschitz cost functions). If each c_t is L-Lipschitz, then for n sufficiently large, ν = R/(C√n), δ = n^{−1/4} √(RdCr / (3(Lr + C))), and α = δ/r, we have
E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ 2n^{3/4} √(3RdC(L + C/r)).

20 Reshaping
Reshaping improves the performance of bandit gradient descent!
The regret bound above depends on R/r, which can be very large.
Idea: reshape the body S to make it more round, i.e. put it in (near-)isotropic position. This amounts to finding a suitable affine transformation T.

21 Isotropic Position and Algorithm
Putting S in isotropic position:
1. Estimate the covariance of random samples from S (this estimates r and R in B_r ⊆ S ⊆ B_R).
2. Find an affine transformation T so that the new covariance matrix is the identity.
3. Apply T to S ⊆ R^d, so that B_1 ⊆ T(S) ⊆ B_d.
4. Then R = d and r = 1.
Algorithm [Lovász and Vempala, 2003]:
- runs in time O(d⁴) · polylog(d, R/r), and
- puts the body in a nearly isotropic position: R = 1.01d and r = 1.
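A toy illustration of the whitening step only (not the Lovász-Vempala algorithm): estimate the covariance from samples of S and apply the affine map T(x) = Σ^{-1/2}(x − μ). The body S and its sampler below are hypothetical stand-ins:

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical body S: a very elongated ellipse, sampled by rejection from its bounding box.
axes = np.array([5.0, 0.5])
def sample_S(n):
    pts = []
    while len(pts) < n:
        p = rng.uniform(-axes, axes)
        if np.sum((p / axes) ** 2) <= 1.0:
            pts.append(p)
    return np.array(pts)

samples = sample_S(20000)
mu = samples.mean(axis=0)                  # estimated center
cov = np.cov(samples, rowvar=False)        # estimated covariance

# T(x) = cov^{-1/2} (x - mu): after the map, the sample covariance is (close to) the identity.
evals, evecs = np.linalg.eigh(cov)
cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
transformed = (samples - mu) @ cov_inv_sqrt.T
print(np.cov(transformed, rowvar=False))   # approximately the 2x2 identity matrix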

22 Reshaping Properties
Lemma (new Lipschitz constant L' = LR). Let c'_t(u) = c_t(T^{−1}(u)). Then c'_t is LR-Lipschitz.
Proof outline: let x_1, x_2 ∈ S and u_1 = T(x_1), u_2 = T(x_2). Then
|c'_t(u_1) − c'_t(u_2)| = |c_t(x_1) − c_t(x_2)| ≤ L ‖x_1 − x_2‖.
Using that T is affine and hence bounded, one shows by contradiction that ‖x_1 − x_2‖ ≤ R ‖u_1 − u_2‖, which gives the LR-Lipschitz condition on c'_t.

23 Reshaping and the BGD Algorithm
Corollary (Reshaping). For a set S of diameter D and c_t L-Lipschitz, after putting S into near-isotropic position, the BGD algorithm has expected regret
E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ 6n^{3/4} d (√(CLR) + C).
Without the L-Lipschitz condition:
E[∑_t c_t(x_t)] − min_{x∈S} ∑_t c_t(x) ≤ 6n^{5/6} dC.
Proof: use r = 1, R = 1.01d, L' = LR, and C' = C in the earlier theorems.

24 Oblivious Adversary
So far, we have analyzed the algorithm against an oblivious adversary who:
1. fixes the sequence of functions c_1, c_2, ... in advance,
2. knows the decision maker's algorithm,
3. but has no knowledge of the random decisions of the algorithm.

25 Adaptive Adversary
Consider an adaptive adversary who plays a game with the decision maker:
1. the decision maker knows x_1, c_1(x_1), x_2, c_2(x_2), ..., x_{t−1}, c_{t−1}(x_{t−1}),
2. the decision maker chooses x_t,
3. the adaptive adversary knows x_1, c_1, x_2, c_2, ..., x_{t−1}, c_{t−1},
4. the adaptive adversary chooses c_t.
Main takeaway: the theorems against an oblivious adversary all hold against an adaptive adversary, up to a change of the multiplicative constant by a factor of at most 3.

26 Adaptive Adversary - Regret
Fact: the bounds relating the costs c_t(x_t), c_t(y_t), ĉ_t(y_t) were all worst-case bounds, i.e. they hold for arbitrary c_t, regardless of whether the c_t are chosen adaptively or not.
Thus it suffices to bound the regret
E[∑_t ĉ_t(y_t) − min_{y∈S} ∑_t ĉ_t(y)].
Idea: show that the adversary's extra knowledge of {x_k}_{k=1}^t cannot help it maximize the above regret.

27 Extending Randomized Zinkevich for an Adaptive Adversary
Need to show:
1. BGD's updates are valid for randomized Zinkevich (next slides):
   (a) x_t ∈ S for the chosen α and δ;
   (b) g_t = (d/δ) c_t(x_t) u_t are valid gradient estimates.
2. Upper bound on G for randomized Zinkevich (from before).
3. The bound for ĉ_t can be extended to c_t (from before).
4. The bound for (1 − α)S can be extended to S (from before).

28 Lemma - BGD Updates for an Adaptive Adversary
Lemma (BGD updates for adaptive costs). Let S ⊆ B_R and {c_t}: S → R be a sequence of convex differentiable functions (with c_{t+1} possibly depending on z_1, z_2, ..., z_t), where z_1, z_2, ..., z_t ∈ S are defined by z_{t+1} = P_S(z_t − η g_t). Here {g_t} are vector-valued random variables such that E[g_t | z_1, c_1, z_2, c_2, ..., z_t, c_t] = ∇c_t(z_t) and G = sup_t ‖g_t‖. Then, for η = R/(G√n),
E[∑_t c_t(z_t) − min_{x∈S} ∑_t c_t(x)] ≤ 3(R²/(2η) + nηG²/2) = 3RG√n.

29 Proof of Adaptive Adversary Lemma
Proof: Let h_t(x) := c_t(x) + x·ξ_t, where ξ_t = g_t − ∇c_t(z_t). Observe that
∇h_t(z_t) = ∇c_t(z_t) + ξ_t = g_t and ‖ξ_t‖ ≤ ‖g_t‖ + ‖∇c_t(z_t)‖ ≤ 2G
(the second bound since ‖∇c_t(z_t)‖ = ‖E[g_t | history]‖ ≤ G).
By Zinkevich's theorem applied to the h_t (which are deterministic at this point of the game),
∑_t h_t(z_t) ≤ min_{x∈S} ∑_t h_t(x) + RG√n.
Since
E[h_t(z_t)] = E[c_t(z_t)] + E[ξ_t·z_t] = E[c_t(z_t)],
it suffices to show that
E[min_{x∈S} ∑_t h_t(x)] ≤ E[min_{x∈S} ∑_t c_t(x)] + 2RG√n.

30 Proof of Adaptive Adversary Lemma - Part 2
Proof continued: it is left to show that E[min_{x∈S} ∑_t h_t(x)] ≤ E[min_{x∈S} ∑_t c_t(x)] + 2RG√n.
By Cauchy-Schwarz, for every x ∈ S,
∑_t (h_t(x) − c_t(x)) = x · ∑_t ξ_t ≤ ‖x‖ ‖∑_t ξ_t‖ ≤ R ‖∑_t ξ_t‖,
and in particular this holds for the minimizer of ∑_t c_t. Taking expectations, it remains to bound E[‖∑_t ξ_t‖]. The ξ_t are martingale differences (E[ξ_t | ξ_1, ..., ξ_{t−1}] = 0), so the cross terms vanish and, recalling ‖ξ_t‖ ≤ 2G,
(E[‖∑_t ξ_t‖])² ≤ E[‖∑_t ξ_t‖²] = ∑_t E[‖ξ_t‖²] + 2 ∑_{1≤s<t≤n} E[ξ_s·ξ_t] = ∑_t E[‖ξ_t‖²] ≤ 4nG²,
hence E[‖∑_t ξ_t‖] ≤ 2G√n, which gives the extra 2RG√n.

31 Summary
The paper extends Zinkevich's gradient descent idea to a setting in which one does not have access to the gradient. Instead, the gradient is approximated from a single function value per round.
Interpretation: the approximation at each step is an unbiased estimate of the gradient of a smoothed-out version of that step's cost function.
The analysis applies to both oblivious and adaptive adversaries (the bounds change by a factor of at most 3).
Preconditioning ("reshaping") improves the bounds significantly.

32 Possible Extensions
Extension of BGD to Zinkevich's model with an adaptive step size.
Extension of BGD to Zinkevich's model with a non-stationary adversary, i.e. when the regret is of the form
∑_t c_t(x_t) − min_{w_1, w_2, ..., w_n ∈ S} ∑_t c_t(w_t).
Potential extension to minimizing any (convex?) function over a convex set by the updates
y_{t+1} := y_t − ν (c_t(x_t) − c_{t−1}(x_{t−1})) u_t,
even if we do not know the uniform bound on c_t.

33 References
Main paper: A. D. Flaxman, A. T. Kalai, H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient.
Zinkevich's paper: M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003.
Multi-point paper: A. Agarwal, O. Dekel, L. Xiao (2010). Optimal algorithms for online convex optimization with multi-point bandit feedback. In A. T. Kalai and M. Mohri (eds.), COLT, Omnipress.

34 Thank you!
