Bandit Convex Optimization

1 March 7, 2017

2 Table of Contents 1 Bandit Convex Optimization (BCO) 2 Projection Methods 3 Barrier Methods 4 Variance reduction 5 Other methods 6 Conclusion

3 Learning scenario Compact convex action set K ⊆ ℝ^d. For t = 1 to T : Predict x_t ∈ K. Receive convex loss function f_t : K → ℝ. Incur loss f_t(x_t). Bandit setting: only the loss f_t(x_t) is revealed, no other information. Regret of algorithm A: Reg_T(A) = Σ_{t=1}^T f_t(x_t) − min_{x∈K} Σ_{t=1}^T f_t(x).
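As a concrete illustration of the regret definition, here is a minimal numerical sketch; the quadratic losses, the fixed play at 0, and the grid discretization of K = [−1, 1] are all hypothetical choices for illustration:

```python
import numpy as np

def regret(play, losses, K_grid):
    """Regret of a sequence of plays against the best fixed point in a
    finite grid approximating K (hypothetical discretization)."""
    total = sum(f(x) for f, x in zip(losses, play))
    best = min(sum(f(x) for f in losses) for x in K_grid)
    return total - best

# Example: two quadratic losses on K = [-1, 1]; the player always plays 0,
# which here happens to also be the best fixed point in hindsight.
losses = [lambda x, c=c: (x - c) ** 2 for c in (0.5, -0.5)]
grid = np.linspace(-1.0, 1.0, 201)
r = regret([0.0, 0.0], losses, grid)
```

In the bandit setting the player would only ever observe the scalar values f_t(x_t), never the functions themselves; the grid is used here only to evaluate the comparator term.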

4 Related settings Online convex optimization: f_t (and maybe ∇f_t, ∇²f_t, ...) known at each round. Multi-armed bandit: K = {1, 2, ..., K} discrete. Zeroth-order optimization: f_t = f fixed. Stochastic bandit convex optimization: f_t(x) = f(x) + ε_t, with ε_t ∼ D a noisy estimate. Multi-point bandit convex optimization: query f_t at points (x_{t,i})_{i=1}^m, m ≥ 2.

5 Table of Contents 1 Bandit Convex Optimization (BCO) 2 Projection Methods 3 Barrier Methods 4 Variance reduction 5 Other methods 6 Conclusion

6 Gradient descent without a gradient Online gradient descent algorithm: x_{t+1} ← x_t − η∇f_t(x_t). BCO setting: ∇f_t(x_t) is not known! BCO idea: find ĝ_t such that ĝ_t ≈ ∇f_t(x_t) (in expectation). Update x_{t+1} ← x_t − ηĝ_t. Question: how do we pick ĝ_t?

7 Single-point gradient estimates (one dimension) By the fundamental theorem of calculus: d/dx [ (1/2δ) ∫_{−δ}^{δ} f(x + y) dy ] = [f(x + δ) − f(x − δ)] / (2δ) = E_{z∼D}[ f(x + z) · z/δ² ], where D is the distribution with z = δ w.p. 1/2 and z = −δ w.p. 1/2. With enough regularity (e.g. f Lipschitz), d/dx (1/2δ) ∫_{−δ}^{δ} f(x + y) dy = (1/2δ) ∫_{−δ}^{δ} f′(x + y) dy.
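The one-dimensional identity can be checked directly: averaging f(x + z)·z/δ² over the two values z = ±δ recovers the centered difference, which approximates f′(x). The cubic test function below is an arbitrary illustrative choice.

```python
def one_point_grad_1d(f, x, delta):
    # Exact expectation of f(x + z) * z / delta**2 over z = ±delta, each w.p. 1/2;
    # this equals the centered difference (f(x+delta) - f(x-delta)) / (2*delta).
    return 0.5 * (f(x + delta) * delta + f(x - delta) * (-delta)) / delta ** 2

f = lambda x: x ** 3          # f'(1.0) = 3
est = one_point_grad_1d(f, 1.0, 1e-4)
```

Note that each individual sample f(x + z)·z/δ² has magnitude of order |f|/δ even though the expectation is O(1): this is exactly the variance issue the later slides address.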

8 Single-point gradient estimates (higher dimensions) Let B₁ = {x ∈ ℝ^d : ‖x‖₂ ≤ 1} and S₁ = {x ∈ ℝ^d : ‖x‖₂ = 1}, and write |A| = ∫_A dy for the volume of a set A. By Stokes' theorem, ∇ (1/|δB₁|) ∫_{δB₁} f(x + y) dy = (1/|δB₁|) ∫_{δS₁} f(x + z) (z/‖z‖) dz = (|δS₁|/|δB₁|) · (1/|δS₁|) ∫_{δS₁} f(x + z) (z/‖z‖) dz = (|δS₁|/|δB₁|) E_{z∼U(S₁)}[f(x + δz) z] = (d/δ) E_{z∼U(S₁)}[f(x + δz) z]. With enough regularity on f, ∇ (1/|δB₁|) ∫_{δB₁} f(x + y) dy = (1/|δB₁|) ∫_{δB₁} ∇f(x + y) dy.
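The spherical identity can be sanity-checked by Monte Carlo. The quadratic test function, the baseline subtraction of f(x), and all constants below are illustrative choices; for a quadratic, smoothing only adds a constant, so the smoothed gradient coincides with ∇f.

```python
import numpy as np

rng = np.random.default_rng(0)
d, delta = 3, 0.05
f = lambda x: float(x @ x)           # f(x) = ||x||^2, so grad f(x) = 2x
x = np.array([1.0, 0.0, 0.0])

N = 100_000
z = rng.normal(size=(N, d))
z /= np.linalg.norm(z, axis=1, keepdims=True)    # uniform samples on the sphere S_1
# Subtracting f(x) leaves the expectation unchanged (E[z] = 0) but cuts variance.
vals = np.array([f(x + delta * zi) - f(x) for zi in z])
grad_est = (d / delta) * (vals[:, None] * z).mean(axis=0)   # should approach 2x
```

Without the baseline subtraction the same estimator is still unbiased but its variance scales like d·f(x)²/δ², which is why larger smoothing radii trade bias for variance in the algorithms that follow.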

9 Projection method [Flaxman et al., 2005] Let f̂(x) = (1/|δB₁|) ∫_{δB₁} f(x + y) dy. Estimate ∇f̂(x) by sampling on the sphere x + δS₁. Project the gradient descent update onto the shrunken set K_δ = (1 − δ)K to keep the samples inside K. BANDITPGD(T, η, δ): x₁ ← 0. For t = 1, 2, ..., T : u_t ← SAMPLE(S₁); y_t ← x_t + δu_t; PLAY(y_t); f_t(y_t) ← RECEIVELOSS(y_t); ĝ_t ← (d/δ) f_t(y_t) u_t; x_{t+1} ← Π_{K_δ}(x_t − ηĝ_t).
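A runnable sketch of BANDITPGD, assuming K is the unit Euclidean ball so that the projection Π_{K_δ} is a simple rescaling; the fixed quadratic loss, horizon, and step sizes are illustrative choices, not the tuned values from the theorem on the next slide.

```python
import numpy as np

def bandit_pgd(loss, d, T, eta, delta, rng):
    """Sketch of BANDITPGD over the unit Euclidean ball K, so that projecting
    onto K_delta = (1 - delta) K is just a rescaling (hypothetical setup)."""
    x = np.zeros(d)
    plays, xs = [], []
    for _ in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                 # uniform direction on S_1
        y = x + delta * u                      # play a perturbed point
        g_hat = (d / delta) * loss(y) * u      # one-point gradient estimate
        x = x - eta * g_hat
        r, n = 1.0 - delta, np.linalg.norm(x)
        if n > r:                              # project onto K_delta
            x *= r / n
        plays.append(y)
        xs.append(x.copy())
    return np.array(plays), np.array(xs)

rng = np.random.default_rng(0)
c = np.array([0.3, -0.2])                      # minimizer, interior of K
loss = lambda y: float((y - c) @ (y - c))
plays, xs = bandit_pgd(loss, d=2, T=5000, eta=0.005, delta=0.1, rng=rng)
```

Since x_t ∈ K_δ always, every played point satisfies ‖y_t‖ ≤ (1 − δ) + δ = 1, and on this easy fixed loss the average iterate drifts toward c.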

10 Analysis of BANDITPGD Theorem (Flaxman et al., 2005) Assume diam(K) ≤ D, |f_t| ≤ C, and ‖∇f_t‖ ≤ L. Then after T rounds, the (expected) regret of the BANDITPGD algorithm is bounded by: D²/(2η) + ηC²d²T/(2δ²) + δ(D + 2)LT. In particular, by setting η = δD/(Cd√T) and δ = (DCd/((D + 2)L))^{1/2} T^{−1/4}, the regret is upper bounded by: O(d^{1/2} T^{3/4}).

11 Proof of BANDITPGD regret For any x* ∈ K, let x*_δ = Π_{K_δ}(x*), and recall that ‖f̂_t − f_t‖_∞ ≤ δL. Then Σ_{t=1}^T E[f_t(y_t) − f_t(x*)] = Σ_{t=1}^T E[ f_t(y_t) − f_t(x_t) + f_t(x_t) − f̂_t(x_t) + f̂_t(x_t) − f̂_t(x*_δ) + f̂_t(x*_δ) − f_t(x*_δ) + f_t(x*_δ) − f_t(x*) ] ≤ Σ_{t=1}^T E[ f̂_t(x_t) − f̂_t(x*_δ) ] + [2δLT + δDLT] = Σ_{t=1}^T E[ f̂_t(x_t) − f̂_t(x*_δ) ] + δ(D + 2)LT.

12 Proof of BANDITPGD regret Recall that E_t[ĝ_t] = ∇f̂_t(x_t) and E[‖ĝ_t‖²] ≤ C²d²/δ². Thus, Σ_{t=1}^T E[ f̂_t(x_t) − f̂_t(x*_δ) ] ≤ Σ_{t=1}^T E[ ∇f̂_t(x_t)·(x_t − x*_δ) ] = Σ_{t=1}^T E[ ĝ_t·(x_t − x*_δ) ] = Σ_{t=1}^T (1/2η) E[ η²‖ĝ_t‖² + ‖x_t − x*_δ‖² − ‖x_t − ηĝ_t − x*_δ‖² ] ≤ Σ_{t=1}^T (1/2η) E[ η² C²d²/δ² + ‖x_t − x*_δ‖² − ‖x_{t+1} − x*_δ‖² ] ≤ (1/2η) E[ ‖x_1 − x*_δ‖² + Tη² C²d²/δ² ] ≤ (1/2η) [ D² + Tη² C²d²/δ² ].

13 Table of Contents 1 Bandit Convex Optimization (BCO) 2 Projection Methods 3 Barrier Methods 4 Variance reduction 5 Other methods 6 Conclusion

14 Revisiting projection Goal of projection: keep x_t ∈ K_δ so that y_t ∈ K. Total cost of projection: δDLT. Deficiency: the projection is completely separate from the gradient descent update. Question: is there a better way to ensure that y_t ∈ K?

15 Gradient Descent to Follow-the-Regularized-Leader Let ĝ_{1:t} = Σ_{s=1}^t ĝ_s. Proximal form of gradient descent: x_{t+1} ← x_t − ηĝ_t ⟺ x_{t+1} ← argmin_{x∈ℝ^d} ηĝ_{1:t}·x + (1/2)‖x‖². Follow-the-Regularized-Leader: for a regularizer R : ℝ^d → ℝ, x_{t+1} ← argmin_{x∈ℝ^d} ηĝ_{1:t}·x + R(x). BCO wishlist for R: Want to ensure that x_{t+1} stays inside K. Want enough room so that y_{t+1} ∈ K as well.
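The equivalence between the proximal form and gradient descent (started from x₁ = 0, with regularizer R(x) = ‖x‖²/2) can be checked directly; the random "gradient estimates" below are placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
eta, d, T = 0.1, 3, 20
gs = rng.normal(size=(T, d))        # arbitrary placeholder gradient estimates

# Gradient descent from x_1 = 0
x = np.zeros(d)
for g in gs:
    x = x - eta * g

# FTRL with R(x) = ||x||^2 / 2:
# argmin_x eta * g_{1:T} . x + ||x||^2 / 2 has the closed form -eta * g_{1:T}
x_ftrl = -eta * gs.sum(axis=0)
```

The point of the slide is that replacing the quadratic ‖x‖²/2 by a barrier R adapted to K keeps the same "minimize cumulated linear loss plus regularizer" template while confining x_{t+1} to K.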

16 Self-concordant barriers Definition (Self-concordant barrier (SCB)) Let ν ≥ 0. A C³ function R : int(K) → ℝ is a ν-self-concordant barrier for K if for any sequence (z_s)_{s≥1} ⊂ int(K) with z_s → ∂K, we have R(z_s) → ∞, and for all x ∈ int(K) and y ∈ ℝ^d, the following inequalities hold: |∇³R(x)[y, y, y]| ≤ 2‖y‖³_x and |∇R(x)·y| ≤ ν^{1/2}‖y‖_x, where ‖y‖²_x = ‖y‖²_{∇²R(x)} = y·∇²R(x)y.

17 Examples of barriers K = B₁: R(x) = −log(1 − ‖x‖²) is 1-self-concordant. K = {x : a_i·x ≤ b_i, i = 1, ..., m}: R(x) = −Σ_{i=1}^m log(b_i − a_i·x) is m-self-concordant. Existence of a universal barrier [Nesterov & Nemirovski, 1994]: every closed convex domain K ⊆ ℝ^d admits an O(d)-self-concordant barrier.
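Both defining inequalities can be verified numerically for the log barrier of the interval K = [0, 1], which has m = 2 linear constraints (x ≥ 0 and 1 − x ≥ 0), hence ν = 2; in one dimension the local norm is just ‖y‖_x = |y|·R″(x)^{1/2}. The grid and tolerances are illustrative choices.

```python
import numpy as np

# Barrier for K = [0, 1]: R(x) = -log(x) - log(1 - x), a 2-self-concordant barrier.
x = np.linspace(0.01, 0.99, 99)
R1 = -1.0 / x + 1.0 / (1.0 - x)            # R'(x)
R2 = 1.0 / x**2 + 1.0 / (1.0 - x)**2       # R''(x)  (positive: R is convex)
R3 = -2.0 / x**3 + 2.0 / (1.0 - x)**3      # R'''(x)

nu = 2.0
sc = np.abs(R3) <= 2.0 * R2**1.5 + 1e-9            # |R'''| <= 2 (R'')^{3/2}
barrier = np.abs(R1) <= np.sqrt(nu * R2) + 1e-9    # |R'| <= nu^{1/2} (R'')^{1/2}
```

Both conditions are additive in the constraints, which is why summing one log term per halfspace yields an m-self-concordant barrier for a polytope.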

18 Properties of self-concordant barriers Translation invariance: for any constant c ∈ ℝ, R + c is also a SCB (so wlog we assume min_{z∈K} R(z) = 0). Dikin ellipsoid contained in the interior: let E(x) = {y ∈ ℝ^d : ‖y − x‖_x < 1}. Then for any x ∈ int(K), E(x) ⊆ int(K). Logarithmic growth away from the boundary: for any ε ∈ (0, 1], let y* = argmin_{z∈K} R(z) and K_{y*,ε} = {y* + (1 − ε)(x − y*) : x ∈ K}. Then for all x ∈ K_{y*,ε}, R(x) ≤ ν log(1/ε). Proximity to the minimizer: if ‖∇R(x)‖_{x,*} ≤ 1/2, then ‖x − argmin_K R‖_x ≤ 2‖∇R(x)‖_{x,*}.

19 Adjusting to the local geometry Let A ≻ 0 be an SPD matrix. Sampling over A instead of the Euclidean ball: u ← SAMPLE(S₁), y ← x + δAu. Smoothing over A instead of the Euclidean ball: f̂(x) = E_{u∼U(B₁)}[f(x + δAu)]. One-point gradient estimate based on A: ĝ = (d/δ) f(x + δAu) A^{−1}u, with E_{u∼U(S₁)}[ĝ] = ∇f̂(x). Local norm bound: ‖ĝ‖²_{A²} ≤ (d²/δ²) C².

20 BANDITFTRL [Abernethy et al., 2008; Saha and Tewari, 2011] BANDITFTRL(R, δ, η, T, x₁): For t ← 1 to T : u_t ← SAMPLE(U(S₁)). y_t ← x_t + δ(∇²R(x_t))^{−1/2} u_t. PLAY(y_t). f_t(y_t) ← RECEIVELOSS(y_t). ĝ_t ← (d/δ) f_t(y_t) (∇²R(x_t))^{1/2} u_t. x_{t+1} ← argmin_{x∈ℝ^d} ηĝ_{1:t}·x + R(x).
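A one-dimensional sketch of BANDITFTRL with K = [−1, 1] and the barrier R(x) = −log(1 − x) − log(1 + x), for which the FTRL step has a closed form. The linear loss, η, δ, and T are illustrative choices, and the test only exercises the Dikin-ellipsoid guarantee that every play stays strictly inside K.

```python
import numpy as np

def ftrl_step(G, eta):
    # argmin_x eta*G*x - log(1-x) - log(1+x): set R'(x) = 2x/(1-x^2) = -eta*G,
    # i.e. solve c*x^2 + 2*x - c = 0 with c = -eta*G; take the root in (-1, 1).
    c = -eta * G
    if c == 0.0:
        return 0.0
    return (-1.0 + np.sqrt(1.0 + c * c)) / c

rng = np.random.default_rng(0)
eta, delta, T = 0.001, 0.1, 1000
loss = lambda y: 0.1 * y                 # linear loss (minimized at the boundary)
x, G, ys = 0.0, 0.0, []
for _ in range(T):
    u = rng.choice([-1.0, 1.0])                         # the "sphere" S_1 in 1D
    R2 = 1.0 / (1.0 - x) ** 2 + 1.0 / (1.0 + x) ** 2    # R''(x)
    y = x + delta * R2 ** -0.5 * u       # sample inside the Dikin ellipsoid
    g_hat = (1.0 / delta) * loss(y) * np.sqrt(R2) * u   # d = 1 estimate
    G += g_hat
    x = ftrl_step(G, eta)
    ys.append(y)
ys = np.array(ys)
```

Because ‖y_t − x_t‖_{x_t} = δ < 1, every y_t lies in the Dikin ellipsoid at x_t and hence in int(K): no explicit projection is ever needed.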

21 Analysis of BANDITFTRL Theorem (Abernethy et al., 2008; Saha and Tewari, 2011) Assume diam(K) ≤ D. Let R be a ν-self-concordant barrier for K, |f_t| ≤ C, and ‖∇f_t‖ ≤ L. Then the regret of BANDITFTRL is upper bounded as follows: If (f_t)_{t=1}^T are linear functions, then Reg_T(BANDITFTRL) = Õ(T^{1/2}). If (f_t)_{t=1}^T have Lipschitz gradients, then Reg_T(BANDITFTRL) = Õ(T^{2/3}).

22 Proof of BANDITFTRL regret: linear case Approximation error of smoothed losses: let x* = argmin_{x∈K} Σ_{t=1}^T f_t(x) and x*_ε = argmin_{y∈K : dist(y,∂K)>ε} ‖y − x*‖. Because the f_t are linear, Reg_T(BANDITFTRL) = E[ Σ_{t=1}^T f_t(y_t) − f_t(x*) ] = E[ Σ_{t=1}^T f_t(y_t) − f̂_t(y_t) + f̂_t(y_t) − f̂_t(x_t) + f̂_t(x_t) − f̂_t(x*_ε) + f̂_t(x*_ε) − f_t(x*_ε) + f_t(x*_ε) − f_t(x*) ] ≤ E[ Σ_{t=1}^T f̂_t(x_t) − f̂_t(x*_ε) ] + εLT = E[ Σ_{t=1}^T ĝ_t·(x_t − x*_ε) ] + εLT.

23 Proof of BANDITFTRL regret: linear case Claim: for any z ∈ K, Σ_{t=1}^T ĝ_t·(x_{t+1} − z) ≤ (1/η) R(z). The T = 1 case is true by definition of x₂. Assuming the statement is true for T − 1, and applying it at z = x_{T+1}: Σ_{t=1}^T ĝ_t·x_{t+1} = Σ_{t=1}^{T−1} ĝ_t·x_{t+1} + ĝ_T·x_{T+1} ≤ (1/η) R(x_{T+1}) + Σ_{t=1}^{T−1} ĝ_t·x_{T+1} + ĝ_T·x_{T+1} = (1/η) R(x_{T+1}) + ĝ_{1:T}·x_{T+1} ≤ (1/η) R(z) + ĝ_{1:T}·z for any z ∈ K, by definition of x_{T+1} as the FTRL minimizer.

24 Proof of BANDITFTRL regret: linear case Bounding E[ Σ_{t=1}^T f̂_t(x_t) − f̂_t(x*_ε) ] via proximity to the minimizer for SCBs. Recall: x_{t+1} = argmin_{x∈K} ηĝ_{1:t}·x + R(x), and F_t(x) = ηĝ_{1:t}·x + R(x) is itself a SCB (adding a linear function preserves self-concordance). Then E[ Σ_{t=1}^T f̂_t(x_t) − f̂_t(x_{t+1}) + f̂_t(x_{t+1}) − f̂_t(x*_ε) ] ≤ E[ Σ_{t=1}^T ‖ĝ_t‖_{x_t,*} ‖x_t − x_{t+1}‖_{x_t} ] + (1/η) R(x*_ε), using the claim from the previous slide for the second sum. Proximity bound (using ∇F_{t−1}(x_t) = 0): ‖x_t − x_{t+1}‖_{x_t} ≤ ‖∇F_t(x_t)‖_{x_t,*} = η‖ĝ_t‖_{x_t,*}.

25 Proof of BANDITFTRL regret: linear case Reg_T(BANDITFTRL) ≤ εLT + E[ Σ_{t=1}^T η‖ĝ_t‖²_{x_t,*} ] + (1/η) R(x*_ε). By the local norm bound: E[ η‖ĝ_t‖²_{x_t,*} ] ≤ η C²d²/δ². By the logarithmic growth of the SCB: (1/η) R(x*_ε) ≤ (ν/η) log(1/ε). Hence Reg_T(BANDITFTRL) ≤ εLT + ηT C²d²/δ² + (ν/η) log(1/ε).

26 Table of Contents 1 Bandit Convex Optimization (BCO) 2 Projection Methods 3 Barrier Methods 4 Variance reduction 5 Other methods 6 Conclusion

27 Issues with BANDITFTRL in the non-linear case Approximation error of f ≈ f̂: δT for C^{0,1} functions; δ²T for C^{1,1} functions. Variance of the gradient estimates: E[ η‖ĝ_t‖²_{x_t,*} ] ≤ η C²d²/δ². Resulting regret for non-linear loss functions: O(T^{3/4}) for C^{0,1} functions; O(T^{2/3}) for C^{1,1} functions. Question: can we reduce the variance of the gradient estimates to improve the regret?

28 Variance reduction Observation [Dekel et al., 2015]: if ḡ_t = (1/(k+1)) Σ_{i=0}^k ĝ_{t−i}, then ‖ḡ_t‖²_{x_t,*} = O( C²d² / (δ²(k+1)) ). Note: the averaged gradient ḡ_t is no longer an unbiased estimate of ∇f̂_t(x_t). Idea: if f_t is sufficiently regular, then the bias will still be manageable.
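The 1/(k+1) factor can be seen in a simplified simulation; as an idealization, the k + 1 estimates below are i.i.d. copies of the true gradient plus zero-mean noise, whereas in the algorithm they come from rounds t − k, ..., t and are only approximately exchangeable.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 5, 9, 20000

# Stand-in for the one-point estimates: true gradient (here 0) plus unit noise.
grad = np.zeros(d)
g_hat = grad + rng.normal(size=(trials, k + 1, d))   # "ĝ_{t-i}", i = 0..k
g_bar = g_hat.mean(axis=1)                            # averaged gradient ḡ_t

ms_single = (g_hat[:, 0, :] ** 2).sum(axis=1).mean()  # E ||ĝ||^2   (≈ d)
ms_avg = (g_bar ** 2).sum(axis=1).mean()              # E ||ḡ||^2   (≈ d/(k+1))
```

With k + 1 = 10 averaged copies the mean squared norm drops by roughly a factor of 10, mirroring the C²d²/(δ²(k+1)) bound; the price in the real algorithm is the bias from using stale rounds.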

29 Improving variance reduction via optimism Optimistic FTRL [Rakhlin and Sridharan, 2013]: x_{t+1} ← argmin_{x∈K} η(g_{1:t} + g̃_{t+1})·x + R(x), which guarantees Σ_{t=1}^T f_t(x_t) − f_t(x*) ≤ Σ_{t=1}^T η‖g_t − g̃_t‖²_{x_t,*} + (1/η) R(x*). By re-centering the averaged gradient at each step, we can further reduce the variance: g̃_t = (1/(k+1)) Σ_{i=1}^k ĝ_{t−i}. Variance of the re-centered averaged gradients: ‖ḡ_t − g̃_t‖²_{x_t,*} = (1/(k+1)²) ‖ĝ_t‖²_{x_t,*} = O( C²d² / (δ²(k+1)²) ).
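The re-centering identity ḡ_t − g̃_t = ĝ_t/(k+1), which drives the (k+1)² improvement, is pure bookkeeping and can be checked directly; the random placeholder estimates below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T = 4, 3, 10
g_hat = rng.normal(size=(T + k, d))   # placeholder estimates ĝ_1, ĝ_2, ...

devs = []
for t in range(k, T + k):
    g_bar = g_hat[t - k : t + 1].mean(axis=0)         # ḡ_t: average of ĝ_{t-k..t}
    g_tilde = g_hat[t - k : t].sum(axis=0) / (k + 1)  # g̃_t: same sum without ĝ_t
    # The hint g̃_t cancels all stale terms: ḡ_t - g̃_t = ĝ_t / (k+1)
    devs.append(np.linalg.norm(g_bar - g_tilde - g_hat[t] / (k + 1)))
max_dev = max(devs)
```

Only the most recent estimate survives the subtraction, so the optimistic regret bound pays ‖ĝ_t‖²/(k+1)² per round instead of ‖ḡ_t‖².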

30 BANDITFTRL-VR BANDITFTRL-VR(R, δ, η, k, T, x₁): For t ← 1 to T : u_t ← SAMPLE(U(S₁)). y_t ← x_t + δ(∇²R(x_t))^{−1/2} u_t. PLAY(y_t). f_t(y_t) ← RECEIVELOSS(y_t). ĝ_t ← (d/δ) f_t(y_t) (∇²R(x_t))^{1/2} u_t. ḡ_t ← (1/(k+1)) Σ_{i=0}^k ĝ_{t−i}. g̃_{t+1} ← (1/(k+1)) Σ_{i=1}^k ĝ_{t+1−i}. x_{t+1} ← argmin_{x∈ℝ^d} η(ḡ_{1:t} + g̃_{t+1})·x + R(x).

31 Analysis of BANDITFTRL-VR Theorem (Mohri & Yang, 2016) Assume diam(K) ≤ D. Let R be a ν-self-concordant barrier for K, |f_t| ≤ C, and ‖∇f_t‖ ≤ L. Then the regret of BANDITFTRL-VR is upper bounded as follows: If (f_t)_{t=1}^T are Lipschitz, then Reg_T(BANDITFTRL-VR) = Õ(T^{11/16}). If (f_t)_{t=1}^T have Lipschitz gradients, then Reg_T(BANDITFTRL-VR) = Õ(T^{8/13}).

32 Proof of BANDITFTRL-VR regret: Lipschitz case Approximation: real to smoothed losses. Relate the global optimum x* to the projected optimum x*_ε. Use the Lipschitz property of the losses to relate y_t to x_t and f̂_t to f_t. Reg_T(BANDITFTRL-VR) = E[ Σ_{t=1}^T f_t(y_t) − f_t(x*) ] ≤ εLT + 2LδDT + E[ Σ_{t=1}^T f̂_t(x_t) − f̂_t(x*_ε) ].

33 Proof of BANDITFTRL-VR regret: Lipschitz case Approximation: smoothed to averaged losses. E[ Σ_{t=1}^T f̂_t(x_t) − f̂_t(x*_ε) ] = E[ Σ_{t=1}^T ( f̂_t(x_t) − (1/(k+1)) Σ_{i=0}^k f̂_{t−i}(x_{t−i}) ) + Σ_{t=1}^T (1/(k+1)) Σ_{i=0}^k ( f̂_{t−i}(x_{t−i}) − f̂_t(x*_ε) ) ] ≤ 2Ck + LT sup_{t∈[1,T], i∈[0, k∧t]} E[ ‖x_{t−i} − x_t‖₂ ] + E[ Σ_{t=1}^T ḡ_t·(x_t − x*_ε) ].

34 Proof of BANDITFTRL-VR regret: Lipschitz case FTRL analysis on the averaged gradients with re-centering: E[ Σ_{t=1}^T ḡ_t·(x_t − x*_ε) ] ≤ 2C²d²ηT / (δ²(k+1)) + (1/η) R(x*_ε). Cumulative analysis: Reg_T(BANDITFTRL-VR) ≤ εLT + 2LδDT + 2Ck + 2C²d²ηT / (δ²(k+1)) + (1/η) R(x*_ε) + LT sup_{t∈[1,T], i∈[0, k∧t]} E[ ‖x_{t−i} − x_t‖₂ ].

35 Proof of BANDITFTRL-VR regret: Lipschitz case Stability estimate for the actions. Want to bound: sup_{t∈[1,T], i∈[0, k∧t]} E[ ‖x_{t−i} − x_t‖₂ ]. Fact: let D be the diameter of K. For any x ∈ K and z ∈ ℝ^d, D^{−1}‖z‖_{x,*} ≤ ‖z‖₂ ≤ D‖z‖_x. By the triangle inequality and this equivalence of norms, E[ ‖x_{t−i} − x_t‖₂ ] ≤ Σ_{s=t−i}^{t−1} E[ ‖x_s − x_{s+1}‖₂ ] ≤ D Σ_{s=t−i}^{t−1} E[ ‖x_s − x_{s+1}‖_{x_s} ] ≤ D Σ_{s=t−i}^{t−1} 2η E[ ‖ḡ_s + g̃_{s+1} − g̃_s‖_{x_s,*} ].

36 Proof of BANDITFTRL-VR regret: Lipschitz case Note that ḡ_s + g̃_{s+1} − g̃_s = (1/(k+1)) Σ_{i=0}^{k−1} ĝ_{s−i} + (1/(k+1)) ĝ_s. Splitting each ĝ_{s−i} into its conditional mean E_{s−i}[ĝ_{s−i}] and a martingale difference, E[ ‖ḡ_s + g̃_{s+1} − g̃_s‖²_{x_s,*} ] ≤ (3/(k+1)²) E[ ‖Σ_{i=0}^{k−1} E_{s−i}[ĝ_{s−i}]‖²_{x_s,*} ] + (3/(k+1)²) E[ ‖Σ_{i=0}^{k−1} ( ĝ_{s−i} − E_{s−i}[ĝ_{s−i}] )‖²_{x_s,*} ] + (3/(k+1)²) E[ ‖ĝ_s‖²_{x_s,*} ] ≤ 3D²L² + (3/(k+1)²) E[ ‖Σ_{i=0}^{k−1} ( ĝ_{s−i} − E_{s−i}[ĝ_{s−i}] )‖²_{x_s,*} ] + 3C²d² / (δ²(k+1)²), where the first term uses ‖E_{s−i}[ĝ_{s−i}]‖₂ = ‖∇f̂_{s−i}(x_{s−i})‖₂ ≤ L and the norm equivalence.

37 Proof of BANDITFTRL-VR regret: Lipschitz case Fact: for all 0 ≤ i ≤ k such that t − i ≥ 1, (1/2)‖z‖_{x_{t−i},*} ≤ ‖z‖_{x_t,*} ≤ 2‖z‖_{x_{t−i},*}. Because the terms in the sum form a martingale difference sequence, E[ ‖Σ_{i=0}^{k−1} ( ĝ_{s−i} − E_{s−i}[ĝ_{s−i}] )‖²_{x_s,*} ] ≤ 4 E[ ‖Σ_{i=0}^{k−1} ( ĝ_{s−i} − E_{s−i}[ĝ_{s−i}] )‖²_{x_{s−k},*} ] = 4 Σ_{i=0}^{k−1} E[ ‖ĝ_{s−i} − E_{s−i}[ĝ_{s−i}]‖²_{x_{s−k},*} ] ≤ 16 Σ_{i=0}^{k−1} E[ ‖ĝ_{s−i} − E_{s−i}[ĝ_{s−i}]‖²_{x_{s−i},*} ] ≤ 16 Σ_{i=0}^{k−1} E[ ‖ĝ_{s−i}‖²_{x_{s−i},*} ] ≤ 16k C²d²/δ².

38 Proof of BANDITFTRL-VR regret: Lipschitz case By combining the components of the stability estimate (and Jensen's inequality), E[ ‖x_{t−i} − x_t‖₂ ] ≤ 2ηDk ( 3D²L² + 51k C²d² / (δ²(k+1)²) )^{1/2}. By the previous calculations, Reg_T(BANDITFTRL-VR) ≤ εLT + 2LδDT + 2Ck + 2C²d²ηT / (δ²(k+1)) + (ν/η) log(1/ε) + LT·2ηDk ( 3D²L² + 51k C²d² / (δ²(k+1)²) )^{1/2}. Now set η = T^{−11/16} d^{−3/8}, δ = T^{−5/16} d^{3/8}, k = T^{1/8} d^{1/4}.

39 Discussion of BANDITFTRL-VR regret: Lipschitz gradient case Approximation of the real to the smoothed losses incurs an Hδ²D²T penalty instead of δDLT, where H is the Lipschitz constant of the gradients. The rest of the analysis also leads to some changes in constants. General regret bound: Reg_T(BANDITFTRL-VR) ≤ εLT + Hδ²D²T + Ck + 2C²d²ηT / (δ²(k+1)) + (ν/η) log(1/ε) + (TL + DHT)·2ηkD ( 3D²L² + 51k C²d² / (δ²(k+1)²) )^{1/2}. Now set η = T^{−8/13} d^{−5/6}, δ = T^{−5/26} d^{1/3}, k = T^{1/13} d^{5/3}.

40 Table of Contents 1 Bandit Convex Optimization (BCO) 2 Projection Methods 3 Barrier Methods 4 Variance reduction 5 Other methods 6 Conclusion

41 Other BCO methods Strongly convex loss functions: augment R in BANDITFTRL with additional regularization. C^{0,1} [Agarwal et al., 2010]: O(T^{2/3}) regret. C^{1,1} [Hazan & Levy, 2014]: O(T^{1/2}) regret. Other types of algorithms: Ellipsoid-method-based algorithm [Hazan and Li, 2016]: O(2^{d⁴} log(T)^{2d} T^{1/2}). Kernel-based algorithm [Bubeck et al., 2017]: O(d^{9.5} T^{1/2}).

42 Table of Contents 1 Bandit Convex Optimization (BCO) 2 Projection Methods 3 Barrier Methods 4 Variance reduction 5 Other methods 6 Conclusion

43 Conclusion BCO is a flexible framework for modeling learning problems with sequential data and very limited feedback. BCO generalizes many existing models of online learning and optimization. State-of-the-art algorithms leverage techniques from online convex optimization and interior-point methods. Efficient algorithms obtaining optimal guarantees in the C^{0,1} and C^{1,1} cases are still open.


More information

A survey: The convex optimization approach to regret minimization

A survey: The convex optimization approach to regret minimization A survey: The convex optimization approach to regret minimization Elad Hazan September 10, 2009 WORKING DRAFT Abstract A well studied and general setting for prediction and decision making is regret minimization

More information

Online Learning with Feedback Graphs

Online Learning with Feedback Graphs Online Learning with Feedback Graphs Claudio Gentile INRIA and Google NY clagentile@gmailcom NYC March 6th, 2018 1 Content of this lecture Regret analysis of sequential prediction problems lying between

More information

Online Passive-Aggressive Algorithms

Online Passive-Aggressive Algorithms Online Passive-Aggressive Algorithms Koby Crammer Ofer Dekel Shai Shalev-Shwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent

CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi. Gradient Descent CS 435, 2018 Lecture 3, Date: 8 March 2018 Instructor: Nisheeth Vishnoi Gradient Descent This lecture introduces Gradient Descent a meta-algorithm for unconstrained minimization. Under convexity of the

More information

Barrier Method. Javier Peña Convex Optimization /36-725

Barrier Method. Javier Peña Convex Optimization /36-725 Barrier Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: Newton s method For root-finding F (x) = 0 x + = x F (x) 1 F (x) For optimization x f(x) x + = x 2 f(x) 1 f(x) Assume f strongly

More information

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms Pooria Joulani 1 András György 2 Csaba Szepesvári 1 1 Department of Computing Science, University of Alberta,

More information

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints

A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints A Low Complexity Algorithm with O( T ) Regret and Finite Constraint Violations for Online Convex Optimization with Long Term Constraints Hao Yu and Michael J. Neely Department of Electrical Engineering

More information

Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016

Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016 Online Convex Optimization Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016 The General Setting The General Setting (Cover) Given only the above, learning isn't always possible Some Natural

More information

Better Algorithms for Benign Bandits

Better Algorithms for Benign Bandits Journal of Machine Learning Research 2 (20) 287-3 Submitted 8/09; Revised 0/2; Published 4/ Better Algorithms for Benign Bandits Elad Hazan Department of IE&M echnion Israel Institute of echnology Haifa

More information

I. The space C(K) Let K be a compact metric space, with metric d K. Let B(K) be the space of real valued bounded functions on K with the sup-norm

I. The space C(K) Let K be a compact metric space, with metric d K. Let B(K) be the space of real valued bounded functions on K with the sup-norm I. The space C(K) Let K be a compact metric space, with metric d K. Let B(K) be the space of real valued bounded functions on K with the sup-norm Proposition : B(K) is complete. f = sup f(x) x K Proof.

More information

Statistical Machine Learning II Spring 2017, Learning Theory, Lecture 4

Statistical Machine Learning II Spring 2017, Learning Theory, Lecture 4 Statistical Machine Learning II Spring 07, Learning Theory, Lecture 4 Jean Honorio jhonorio@purdue.edu Deterministic Optimization For brevity, everywhere differentiable functions will be called smooth.

More information

A Quantum Interior Point Method for LPs and SDPs

A Quantum Interior Point Method for LPs and SDPs A Quantum Interior Point Method for LPs and SDPs Iordanis Kerenidis 1 Anupam Prakash 1 1 CNRS, IRIF, Université Paris Diderot, Paris, France. September 26, 2018 Semi Definite Programs A Semidefinite Program

More information

IFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent

IFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent IFT 6085 - Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s):

More information

Online Submodular Minimization

Online Submodular Minimization Online Submodular Minimization Elad Hazan IBM Almaden Research Center 650 Harry Rd, San Jose, CA 95120 hazan@us.ibm.com Satyen Kale Yahoo! Research 4301 Great America Parkway, Santa Clara, CA 95054 skale@yahoo-inc.com

More information

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem

Lecture 19: UCB Algorithm and Adversarial Bandit Problem. Announcements Review on stochastic multi-armed bandit problem Lecture 9: UCB Algorithm and Adversarial Bandit Problem EECS598: Prediction and Learning: It s Only a Game Fall 03 Lecture 9: UCB Algorithm and Adversarial Bandit Problem Prof. Jacob Abernethy Scribe:

More information

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs

Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Extracting Certainty from Uncertainty: Regret Bounded by Variation in Costs Elad Hazan IBM Almaden Research Center 650 Harry Rd San Jose, CA 95120 ehazan@cs.princeton.edu Satyen Kale Yahoo! Research 4301

More information

Unconstrained minimization: assumptions

Unconstrained minimization: assumptions Unconstrained minimization I terminology and assumptions I gradient descent method I steepest descent method I Newton s method I self-concordant functions I implementation IOE 611: Nonlinear Programming,

More information

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations JMLR: Workshop and Conference Proceedings vol 35:1 20, 2014 Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations H. Brendan McMahan MCMAHAN@GOOGLE.COM Google,

More information

Sublinear Time Algorithms for Approximate Semidefinite Programming

Sublinear Time Algorithms for Approximate Semidefinite Programming Noname manuscript No. (will be inserted by the editor) Sublinear Time Algorithms for Approximate Semidefinite Programming Dan Garber Elad Hazan Received: date / Accepted: date Abstract We consider semidefinite

More information

Newton s Method. Javier Peña Convex Optimization /36-725

Newton s Method. Javier Peña Convex Optimization /36-725 Newton s Method Javier Peña Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, f ( (y) = max y T x f(x) ) x Properties and

More information

An Online Convex Optimization Approach to Blackwell s Approachability

An Online Convex Optimization Approach to Blackwell s Approachability Journal of Machine Learning Research 17 (2016) 1-23 Submitted 7/15; Revised 6/16; Published 8/16 An Online Convex Optimization Approach to Blackwell s Approachability Nahum Shimkin Faculty of Electrical

More information

ORIE 6326: Convex Optimization. Quasi-Newton Methods

ORIE 6326: Convex Optimization. Quasi-Newton Methods ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted

More information

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values

Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Stochastic Compositional Gradient Descent: Algorithms for Minimizing Nonlinear Functions of Expected Values Mengdi Wang Ethan X. Fang Han Liu Abstract Classical stochastic gradient methods are well suited

More information

Bandit Online Learning with Unknown Delays

Bandit Online Learning with Unknown Delays Bandit Online Learning with Unknown Delays Bingcong Li, Tianyi Chen, and Georgios B. Giannakis University of Minnesota - Twin Cities, Minneapolis, MN 55455, USA {lixx5599,chen387,georgios}@umn.edu November

More information

Advanced Machine Learning

Advanced Machine Learning Advanced Machine Learning Bandit Problems MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH. Multi-Armed Bandit Problem Problem: which arm of a K-slot machine should a gambler pull to maximize his

More information

Math 361: Homework 1 Solutions

Math 361: Homework 1 Solutions January 3, 4 Math 36: Homework Solutions. We say that two norms and on a vector space V are equivalent or comparable if the topology they define on V are the same, i.e., for any sequence of vectors {x

More information

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley

Learning Methods for Online Prediction Problems. Peter Bartlett Statistics and EECS UC Berkeley Learning Methods for Online Prediction Problems Peter Bartlett Statistics and EECS UC Berkeley Course Synopsis A finite comparison class: A = {1,..., m}. 1. Prediction with expert advice. 2. With perfect

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Online Optimization with Gradual Variations

Online Optimization with Gradual Variations JMLR: Workshop and Conference Proceedings vol (0) 0 Online Optimization with Gradual Variations Chao-Kai Chiang, Tianbao Yang 3 Chia-Jung Lee Mehrdad Mahdavi 3 Chi-Jen Lu Rong Jin 3 Shenghuo Zhu 4 Institute

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Minimax Policies for Combinatorial Prediction Games

Minimax Policies for Combinatorial Prediction Games Minimax Policies for Combinatorial Prediction Games Jean-Yves Audibert Imagine, Univ. Paris Est, and Sierra, CNRS/ENS/INRIA, Paris, France audibert@imagine.enpc.fr Sébastien Bubeck Centre de Recerca Matemàtica

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 08): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

Descent methods. min x. f(x)

Descent methods. min x. f(x) Gradient Descent Descent methods min x f(x) 5 / 34 Descent methods min x f(x) x k x k+1... x f(x ) = 0 5 / 34 Gradient methods Unconstrained optimization min f(x) x R n. 6 / 34 Gradient methods Unconstrained

More information

Dual Proximal Gradient Method

Dual Proximal Gradient Method Dual Proximal Gradient Method http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/19 1 proximal gradient method

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information