Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016


1 Online Convex Optimization Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016

2 The General Setting

3 The General Setting (Cover) Given only the above, learning isn't always possible

4 Some Natural Restrictions Realizability: there is some hypothesis that makes no mistakes; implies simple algorithms like Consistent and Halving.

5 Some Natural Restrictions Realizability: there is some hypothesis that makes no mistakes; implies simple algorithms like Consistent and Halving. Randomization: our predictions are made via a probability distribution, and the environment does not control the randomness; suggests Expected Regret as a performance metric. Recall: expected regret compares the expected cumulative loss E[Σ_t f_t(w_t)] with that of the best fixed hypothesis.

6 Some Natural Restrictions Realizability: there is some hypothesis that makes no mistakes; implies simple algorithms like Consistent and Halving. Randomization: our predictions are made via a probability distribution, and the environment does not control the randomness; suggests Expected Regret as a performance metric. Non-adversarial: loss functions are sampled iid from some distribution; this is the stochastic gradient descent setting.

7 Unifying Framework

8 Unifying Framework Randomization is just OCO over the probability simplex with a 'surrogate loss function' that upper-bounds the original loss function.

9 Unifying Framework Randomization is just OCO over the probability simplex with a 'surrogate loss function' that upper-bounds the original loss function. Realizability additionally assumes that some hypothesis achieves zero loss.

10 How to Learn in OCO

11 The Naïve Approach For quadratic optimization problems, in which f_t(w) = (1/2)‖w − z_t‖₂², FTL (Follow-the-Leader) has logarithmic regret.

12 The Naïve Approach For linear optimization problems, in which f_t(w) = ⟨w, z_t⟩, FTL has unbounded (linear in T) regret. Just imagine a 1-dimensional example with z_t alternating between -1 and 1 each round.
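To make the failure mode concrete, here is a tiny Python simulation (my own sketch, not from the slides) of FTL on 1-dimensional linear losses f_t(w) = z_t·w over S = [-1, 1], with z_t alternating as described; FTL's cumulative loss grows linearly with T while the fixed point u = 0 pays nothing.

```python
import numpy as np

def ftl_linear_1d(z):
    """Follow-the-Leader on S = [-1, 1] with linear losses f_t(w) = z_t * w.
    The leader minimizes w * sum(z_1..z_{t-1}), i.e. w_t = -sign(running sum)."""
    running_sum, cum_loss = 0.0, 0.0
    for z_t in z:
        w_t = 0.0 if running_sum == 0 else -np.sign(running_sum)
        cum_loss += w_t * z_t          # loss suffered this round
        running_sum += z_t
    return cum_loss

T = 1000
z = np.array([0.5] + [(-1.0) ** t for t in range(1, T)])  # 0.5, -1, 1, -1, ...
print(ftl_linear_1d(z))   # roughly T - 1, while the fixed action u = 0 has zero loss
```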

13 Follow-the-Regularized-Leader (FoReL) Regularization stabilizes the solution of Follow-the-Leader. Goal: eliminate the jumpiness! Regularization function R: S → ℝ. Update: w_t = argmin_{w∈S} [ Σ_{i=1}^{t−1} f_i(w) + R(w) ]

14 Special Case: Linear Loss Functions FoReL: w_t = argmin_{w∈S} [ Σ_{i=1}^{t−1} f_i(w) + R(w) ]. Let f_t(w) = ⟨w, z_t⟩, S = ℝ^d, R(w) = (1/2η)‖w‖₂². Then w_{t+1} = argmin_w [ Σ_{i=1}^{t} f_i(w) + R(w) ] = argmin_w [ Σ_{i=1}^{t} ⟨w, z_i⟩ + (1/2η)‖w‖₂² ]. Setting the derivative equal to zero: 0 = Σ_{i=1}^{t} z_i + (1/η) w, so w_{t+1} = −η Σ_{i=1}^{t} z_i.

15 Special Case: Linear Loss Functions w_{t+1} = −η Σ_{i=1}^{t} z_i = −η Σ_{i=1}^{t−1} z_i − η z_t = w_t − η z_t. Note that f_t(w) = ⟨w, z_t⟩, so ∇f_t(w_t) = z_t. This gives us the formula for gradient descent for linear functions: w_{t+1} = w_t − η ∇f_t(w_t).
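As a quick illustration (a sketch of my own, not from the slides), the derived update is easy to run directly: for linear losses the FoReL iterate is just a gradient step on each z_t.

```python
import numpy as np

def ogd_linear(zs, eta):
    """FoReL with R(w) = (1/(2*eta)) * ||w||^2 on linear losses f_t(w) = <w, z_t>,
    which reduces to the gradient-descent update w_{t+1} = w_t - eta * z_t."""
    w = np.zeros(len(zs[0]))
    iterates = [w.copy()]
    for z_t in zs:
        w = w - eta * z_t              # gradient of <w, z_t> with respect to w is z_t
        iterates.append(w.copy())
    return iterates

# usage: a short sequence of loss vectors in R^2
zs = [np.array([1.0, 0.0]), np.array([0.0, -1.0]), np.array([0.5, 0.5])]
ws = ogd_linear(zs, eta=0.1)
```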

16 Special Case: Linear Loss Functions How can we bound the regret? Regret_T(u) = Σ_{t=1}^{T} (f_t(w_t) − f_t(u)). Let f_t(w) = ⟨w, z_t⟩, S = ℝ^d, R(w) = (1/2η)‖w‖₂². For all u, we have the following regret bound: Regret_T(u) ≤ (1/2η)‖u‖₂² + η Σ_{t=1}^{T} ‖z_t‖₂².

17 Special Case: Linear Loss Functions How can we bound the regret? Regret_T(u) = Σ_{t=1}^{T} (f_t(w_t) − f_t(u)). Let f_t(w) = ⟨w, z_t⟩, S = ℝ^d, R(w) = (1/2η)‖w‖₂². For all u: Regret_T(u) ≤ (1/2η)‖u‖₂² (the regularization term) + η Σ_{t=1}^{T} ‖z_t‖₂² (which grows when the loss functions have greater magnitude).

18 Digression: Proof of Lemma 2.3 Define a function f_0 to be equal to the regularization function R. Then running FoReL on f_1, …, f_T is equivalent to running FTL on f_0, f_1, …, f_T. Apply Lemma 2.1, which states that for FTL and all u ∈ S: Σ_t (f_t(w_t) − f_t(u)) ≤ Σ_t (f_t(w_t) − f_t(w_{t+1})), where here the sums run over t = 0, …, T.

19 Digression: Proof of Lemma 2.3, continued Since f_0 = R, this is equivalent to: Σ_{t=1}^{T} (f_t(w_t) − f_t(u)) + R(w_0) − R(u) ≤ Σ_{t=1}^{T} (f_t(w_t) − f_t(w_{t+1})) + R(w_0) − R(w_1). Rearranging gives Lemma 2.3: Regret_T(u) ≤ R(u) − R(w_1) + Σ_{t=1}^{T} (f_t(w_t) − f_t(w_{t+1})).

20 Special Case: Linear Loss Functions (Thm. 2.4) From Lemma 2.3: Regret_T(u) ≤ R(u) − R(w_1) + Σ_{t=1}^{T} (f_t(w_t) − f_t(w_{t+1})). By the definitions of R and f_t: Regret_T(u) ≤ (1/2η)(‖u‖₂² − ‖w_1‖₂²) + Σ_{t=1}^{T} ⟨w_t − w_{t+1}, z_t⟩ ≤ (1/2η)‖u‖₂² + Σ_{t=1}^{T} ⟨w_t − w_{t+1}, z_t⟩. Using that w_{t+1} = w_t − η z_t: Σ_{t=1}^{T} ⟨w_t − w_{t+1}, z_t⟩ = Σ_{t=1}^{T} ⟨η z_t, z_t⟩ = η Σ_{t=1}^{T} ‖z_t‖₂². Thus: Regret_T(u) ≤ (1/2η)‖u‖₂² + η Σ_{t=1}^{T} ‖z_t‖₂².

21 Extending from Linear to Convex Functions: Subgradients

22 Extending from Linear to Convex Functions: Online Gradient Descent SGD is the special case where loss functions are sampled iid from a distribution
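A small sketch of this step (the assumptions are mine: absolute-error losses f_t(w) = |⟨w, x_t⟩ − y_t| and a stream of (x_t, y_t) pairs), showing online gradient descent run with subgradients; when the pairs are drawn iid from a distribution, this is exactly SGD.

```python
import numpy as np

def online_subgradient_descent(stream, d, eta):
    """Online gradient descent using a subgradient of the non-differentiable loss
    f_t(w) = |<w, x_t> - y_t|; a valid subgradient is sign(<w, x_t> - y_t) * x_t."""
    w = np.zeros(d)
    for x_t, y_t in stream:
        g = np.sign(w @ x_t - y_t) * x_t   # a subgradient of f_t at w
        w = w - eta * g
    return w

# with iid samples this is SGD for least-absolute-deviation regression
rng = np.random.default_rng(0)
stream = [(rng.standard_normal(3), rng.standard_normal()) for _ in range(200)]
w_hat = online_subgradient_descent(stream, d=3, eta=0.05)
```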

23 Extending from Linear to Convex Functions: (Unsatisfying) Regret Bound We want sublinear regret

24 Extending from Linear to Convex Functions: Lipschitz-ness to the rescue! A function f is L-Lipschitz if for all x, y in the domain of f, we have |f(x) − f(y)| ≤ L‖x − y‖

25 Extending from Linear to Convex Functions: Lipschitz-ness to the rescue! A function f is L-Lipschitz if for all x, y in the domain of f, we have |f(x) − f(y)| ≤ L‖x − y‖

26 Extending from Linear to Convex Functions: How to pick step size?
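The slide's answer was a formula; one standard tuning (my reconstruction from the bound on slide 20, assuming ‖z_t‖₂ ≤ L for all t and comparing against u with ‖u‖₂ ≤ B) balances the two terms of the regret bound:

```latex
\mathrm{Regret}_T(u) \;\le\; \frac{B^2}{2\eta} + \eta\, T L^2
\quad\Longrightarrow\quad
\eta^\star = \frac{B}{L\sqrt{2T}},
\qquad
\mathrm{Regret}_T(u) \;\le\; B L \sqrt{2T}.
```

If T is not known in advance, the doubling trick on the backup slides recovers the same rate up to a constant factor.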

27 Special Cases: Specific Convex Functions Logarithmic Regret Algorithms (Hazan et al.) If the loss functions have bounded first- and second-order derivatives, OGD has O(log T) regret. Other algorithms exist for weaker assumptions: Online Newton Step / Follow-The-Approximate-Leader (for bounded gradients and exp-concave losses) and Exponentially Weighted Online Optimization (for exp-concave losses).
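A hedged sketch of the logarithmic-regret case (the interface and parameter names are mine): for H-strongly convex losses with bounded gradients, OGD with the decreasing schedule η_t = 1/(H t) is the variant analyzed by Hazan et al.

```python
import numpy as np

def ogd_log_regret(grad_fns, d, H):
    """OGD with step sizes eta_t = 1/(H*t); for H-strongly convex losses with
    bounded gradients this schedule yields O(log T) regret (Hazan et al. 2007).
    grad_fns[t](w) is an assumed callable returning a gradient of loss t at w."""
    w = np.zeros(d)
    iterates = []
    for t, grad in enumerate(grad_fns, start=1):
        iterates.append(w.copy())
        w = w - (1.0 / (H * t)) * grad(w)
    return iterates
```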

28 Special Case: Smoothed OCO Andrew, L., Barman, S., Ligett, K., Lin, M., Meyerson, A., Roytman, A., & Wierman, A. (2013, June). A tale of two metrics: Simultaneous bounds on competitiveness and regret. In ACM SIGMETRICS Performance Evaluation Review (Vol. 41, No. 1, pp ). ACM. Chen, N., Agarwal, A., Wierman, A., Barman, S., & Andrew, L. L. (2015, June). Online convex optimization using predictions. In Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (pp ). ACM.

29 Special Case: Smoothed OCO The cost model: at each step, pay a hitting cost c_t(x_t) plus a switching cost for moving, proportional to ‖x_t − x_{t−1}‖. Applications: Geographical Load Balancing, Dynamic Capacity Provisioning

30 Special Case: Smoothed OCO Some performance metrics: regret (compare against the best fixed point in hindsight) and competitive ratio (compare against the dynamic offline optimal).

31 Special Case: Smoothed OCO Result from the first paper: in the worst case you can't have both; no online algorithm simultaneously achieves sublinear regret and a constant competitive ratio.

32 Special Case: Smoothed OCO Result from the second paper: but in practice, it doesn't matter! Most noise is not adversarial, and we often have access to noisy predictions, so the paper proposes an algorithm with good performance on both metrics.

33 Special Case: Smoothed OCO Averaging Fixed Horizon Control (AFHC) Fixed Horizon Control: given predictions over timesteps t, …, t + w, just play whatever is optimal over these steps. AFHC: average several FHC instances whose planning windows start at different time steps. A toy sketch follows below.
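The sketch below is illustrative only, under assumptions of my own (1-dimensional quadratic hitting costs, a squared switching cost so each window problem stays smooth, and perfect predictions taken from the true sequence); it is not the algorithm exactly as specified in the paper, and it uses scipy for the per-window minimization.

```python
import numpy as np
from scipy.optimize import minimize

def plan(y_pred, x_start, beta):
    """Minimize sum_t (x_t - y_t)^2 + beta * (x_t - x_{t-1})^2 over one window,
    starting from x_start (squared switching cost keeps the problem smooth)."""
    y_pred = np.asarray(y_pred, dtype=float)
    def cost(x):
        steps = np.diff(np.concatenate(([x_start], x)))
        return np.sum((x - y_pred) ** 2) + beta * np.sum(steps ** 2)
    return minimize(cost, x0=y_pred.copy()).x

def fhc(y, w, beta, offset=0):
    """Fixed Horizon Control: at the start of each length-(w+1) window, plan
    against the predictions for that window and commit to the whole plan."""
    T, x, x_prev = len(y), np.zeros(len(y)), 0.0
    blocks = ([(0, offset)] if offset > 0 else []) + \
             [(s, min(s + w + 1, T)) for s in range(offset, T, w + 1)]
    for s, e in blocks:
        seg = plan(y[s:e], x_prev, beta)
        x[s:e], x_prev = seg, seg[-1]
    return x

def afhc(y, w, beta):
    """Averaging FHC: average w+1 FHC trajectories with different phase offsets."""
    return np.mean([fhc(y, w, beta, offset=r) for r in range(w + 1)], axis=0)

# toy usage: track a noisy sinusoid, using the true values as (perfect) predictions
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0.0, 6.0, 60)) + 0.1 * rng.standard_normal(60)
x = afhc(y, w=4, beta=1.0)
```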

34 Special Case: Iterative Soft-Thresholding Algorithm Assume the loss L can be decomposed into differentiable and non-differentiable components: L(w) = G(w) + H(w). Example: L(w) = Σ_{(x_i, y_i)∈S} l(x_i, y_i, w) + λ‖w‖₁, where G(w) = Σ_{(x_i, y_i)∈S} l(x_i, y_i, w) is differentiable and H(w) = λ‖w‖₁ is non-differentiable.

35 Special Case: Iterative Soft-Thresholding Algorithm L(w) = G(w) + H(w) = Σ_{(x_i, y_i)∈S} l(x_i, y_i, w) + λ‖w‖₁. Solve the differentiable part G with a gradient step: v_t = w_{t−1} − η_t ∇_w G(w)|_{w = w_{t−1}}. For the non-differentiable part H, add a proximity term that keeps w close to v_t and perform the minimization: w_t = argmin_w [ (1/2)‖w − v_t‖₂² + η_t λ‖w‖₁ ], which is solved coordinate-wise by soft-thresholding v_t at level η_t λ.

36 Special Case: Iterative Soft-Thresholding Algorithm
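A minimal sketch of iterative soft-thresholding for the lasso-style example above (batch form; the design matrix X, targets y, and step size η here are illustrative assumptions):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||w||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, y, lam, eta, n_iters=500):
    """Minimize 0.5 * ||X w - y||^2 + lam * ||w||_1 by alternating a gradient step
    on the smooth part G with soft-thresholding for the l1 part H."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)            # gradient of the differentiable part G
        v = w - eta * grad                  # gradient-descent step
        w = soft_threshold(v, eta * lam)    # argmin_w (1/2)||w - v||^2 + eta*lam*||w||_1
    return w

# toy usage; eta must be <= 1 / (largest eigenvalue of X.T @ X) for ISTA to converge
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(50)
w_hat = ista(X, y, lam=0.1, eta=1.0 / np.linalg.norm(X, 2) ** 2)
```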

37 Generalizing OCO

38 Generalizations What if our iterates can break the rules (constraints) sometimes? If the weight vectors need only satisfy the convex constraints on average in the long run, lower-regret algorithms exist. (Jenatton, et al.)

39 Generalizations What if our decisions can't change too quickly? Not only is the regret worse under these 'ramp constraints', but learners must be designed so as not to constrain their future actions. (Badiei, Li, Wierman)

40 Generalizations What if our loss functions aren't convex? Often OCO techniques still work, especially when training deep learners. (Balduzzi)

41 Generalizations What if we, or our experts, don't have to play each round? Often OCO techniques still work. (Balduzzi)

42 Generalizations What if we have a whole network of online learners that can communicate? Even when communication is only local, low-regret algorithms exist. (Koppel, et al.)

43 Generalizations: Dynamically-Varying Environment What if the underlying environment varies over time? To improve performance, dynamically model the environment (Hall 2013, Hall 2014). Dynamic Mirror Descent (DMD): incorporates dynamical-model state updates. Dynamic Fixed Share: selects a dynamical model from a family of candidates at each time step. DMD tracking regret bound: with Φ_t the dynamical model and regret taken with respect to the comparator sequence {θ_t}, Regret_T({θ_t}) = O( √T [1 + Σ_t ‖θ_{t+1} − Φ_t(θ_t)‖] ). Note: the regret scales with the deviation of {θ_t} from the dynamical model Φ_t.
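A minimal Euclidean sketch of the DMD idea (my own simplification: the callables in grad_fns and dynamics are assumed interfaces, and the full algorithm of Hall and Willett uses a general Bregman divergence rather than the plain gradient step shown here):

```python
import numpy as np

def dynamic_mirror_descent(grad_fns, dynamics, theta0, eta):
    """Euclidean sketch of Dynamic Mirror Descent: take a gradient step on the
    observed loss, then push the iterate through the dynamical model Phi_t."""
    theta = np.asarray(theta0, dtype=float)
    iterates = [theta.copy()]
    for grad, phi in zip(grad_fns, dynamics):
        theta_tilde = theta - eta * grad(theta)   # mirror/gradient step
        theta = phi(theta_tilde)                  # dynamical-model state update
        iterates.append(theta.copy())
    return iterates
```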

44 Generalizations: Bandit Setting (Agarwal 2010) Bandit setting: at each time t, we only find out l_t(x_t), not all of l_t. Completely adaptive adversary: chooses l_t knowing x_1, …, x_t; regret is Ω(T), i.e. at least order T. Adaptive adversary: chooses l_t knowing x_1, …, x_{t−1}; regret is Ω(√T), and regret is O(√T) with linear loss functions (with high probability). Compare with regret for the full-information case: O(√T) for convex Lipschitz and smooth loss functions, O(log T) for strongly convex and smooth loss functions.

45 Generalizations: Multi-Point Bandit Setting Player queries each loss function at k randomized points. Take expected regret with respect to the player's randomness: Regret = E[ (1/k) Σ_{t=1}^{T} Σ_{i=1}^{k} l_t(y_{t,i}) ] − min_{x∈K} E[ Σ_{t=1}^{T} l_t(x) ]. Regret bounds (convex Lipschitz and smooth losses / strongly convex and smooth losses):
Multi-point, k = 2, adaptive adversary: O(√T) (with high probability) / O(log T) (expected).
Multi-point, k = d + 1, completely adaptive adversary: O(√T) (deterministic) / O(log T) (deterministic).
Full information: O(√T) (deterministic) / O(log T) (deterministic).
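To illustrate how multi-point feedback gets used, here is a simplified sketch in the spirit of the k = 2 case (the two-point gradient estimate fed into projected gradient descent); the parameter choices, ball projection, and interface are my assumptions, not the paper's exact construction.

```python
import numpy as np

def two_point_bandit_ogd(loss_fns, d, eta, delta, radius=1.0, seed=0):
    """Bandit OCO with k = 2 queries per round: estimate a gradient from two
    function values along a random direction, then take a projected gradient step."""
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    for loss in loss_fns:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                            # random unit direction
        g = (d / (2 * delta)) * (loss(x + delta * u) - loss(x - delta * u)) * u
        x = x - eta * g                                   # gradient step with the estimate
        nrm = np.linalg.norm(x)
        if nrm > radius:                                  # project back onto the ball
            x = x * (radius / nrm)
    return x
```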

46 Summary The online convex optimization framework captures a huge part of online learning. Lots of algorithms we already know are OCO, for instance SGD. Follow-the-Regularized-Leader, and its variants, all perform well in OCO. Generalizations of OCO are widespread, and rely on OCO algorithms.

47 Questions?

48 References
Agarwal, Alekh, Ofer Dekel, and Lin Xiao. "Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback." COLT 2010.
Andrew, L., Barman, S., Ligett, K., Lin, M., Meyerson, A., Roytman, A., and Wierman, A. "A Tale of Two Metrics: Simultaneous Bounds on Competitiveness and Regret." ACM SIGMETRICS Performance Evaluation Review, Vol. 41, No. 1, 2013.
Badiei, Masoud, Na Li, and Adam Wierman. "Online Convex Optimization with Ramp Constraints."
Balduzzi, David. "Deep Online Convex Optimization by Putting Forecaster to Sleep." arXiv preprint, 2015.
Chen, N., Agarwal, A., Wierman, A., Barman, S., and Andrew, L. L. "Online Convex Optimization Using Predictions." Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015.
Hall, Eric C., and Rebecca M. Willett. "Online Convex Optimization in Dynamic Environments." IEEE Journal of Selected Topics in Signal Processing 9.4, 2015.
Hall, Eric C., and Rebecca M. Willett. "Dynamical Models and Tracking Regret in Online Convex Programming." Proceedings of the 30th International Conference on Machine Learning, 2013.
Hazan, Elad, Amit Agarwal, and Satyen Kale. "Logarithmic Regret Algorithms for Online Convex Optimization." Machine Learning, 2007.
Jenatton, Rodolphe, Jim Huang, and Cédric Archambeau. "Adaptive Algorithms for Online Convex Optimization with Long-term Constraints." arXiv preprint, 2015.
Koppel, Alec, Felicia Y. Jakubiec, and Alejandro Ribeiro. "A Saddle Point Algorithm for Networked Online Convex Optimization." IEEE Transactions on Signal Processing, 2015.
Yue, Yisong. Advanced Optimization Notes. Course notes.

49 A Couple Back-Up Slides

50 Strongly Convex Function
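For reference, the standard definition (my addition, since the slide content itself was a figure): a differentiable function f is λ-strongly convex over S if

```latex
f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{\lambda}{2}\,\lVert y - x \rVert_2^2
\qquad \text{for all } x, y \in S.
```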

51 Eliminating the Time Horizon Dependence: The Doubling Trick For m = 0, 1, 2, …, ⌈log₂ T⌉: run the algorithm on the 2^m rounds t = 2^m, …, min(2^{m+1} − 1, T). m = 0: run for t = 1. m = 1: run for t = 2, 3. m = 2: run for t = 4, …, 7. Etc.
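A small sketch of the block schedule (the helper name is mine); the algorithm is restarted at the start of each block with horizon 2^m.

```python
def doubling_schedule(T):
    """Blocks used by the doubling trick: rounds 2^m through min(2^(m+1) - 1, T)."""
    blocks, m = [], 0
    while 2 ** m <= T:
        blocks.append((2 ** m, min(2 ** (m + 1) - 1, T)))
        m += 1
    return blocks

# doubling_schedule(10) -> [(1, 1), (2, 3), (4, 7), (8, 10)]
```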

52 Eliminating the Time Horizon Dependence: The Doubling Trick Assume the algorithm's regret on each block of 2^m rounds is bounded by α√(2^m). Then Regret_T(u) ≤ Σ_{m=0}^{⌈log₂ T⌉} α√(2^m) = α Σ_{m=0}^{⌈log₂ T⌉} (√2)^m = α ((√2)^{⌈log₂ T⌉ + 1} − 1)/(√2 − 1) ≤ α (√2/(√2 − 1)) √(2T) = (2/(√2 − 1)) α √T. So the regret bound only worsens by a constant multiplicative factor.
