Online Convex Optimization. Gautam Goel, Milan Cvitkovic, and Ellen Feldman. CS 159, 4/5/2016
1 Online Convex Optimization Gautam Goel, Milan Cvitkovic, and Ellen Feldman CS 159 4/5/2016
2 The General Setting
3 The General Setting (Cover) Given only the above, learning isn't always possible
4 Some Natural Restrictions Realizability: there is some hypothesis that makes no mistakes. Implies simple algorithms like Consistent and Halving.
5 Some Natural Restrictions Realizability: there is some hypothesis that makes no mistakes. Implies simple algorithms like Consistent and Halving. Randomization: our predictions are made via a probability distribution, and the environment does not control the randomness. Suggests Expected Regret as a performance metric. Recall: $\mathbb{E}[\mathrm{Regret}_T] = \mathbb{E}\big[\sum_{t=1}^{T} l_t(p_t)\big] - \min_{h} \sum_{t=1}^{T} l_t(h)$, where the expectation is over the learner's randomness.
6 Some Natural Restrictions Realizability: there is some hypothesis that makes no mistakes. Implies simple algorithms like Consistent and Halving. Randomization: our predictions are made via a probability distribution, and the environment does not control the randomness. Suggests Expected Regret as a performance metric. Non-adversarial: loss functions are sampled i.i.d. from some distribution. This is the setting of stochastic gradient descent.
7 Unifying Framework
8 Unifying Framework Randomization is just OCO over the probability simplex with a "surrogate loss function" that upper-bounds the original loss function.
9 Unifying Framework Randomization is just OCO over the probability simplex with a "surrogate loss function" that upper-bounds the original loss function. Realizability additionally assumes that the best hypothesis incurs zero loss, so a regret bound becomes a bound on the learner's total loss.
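To make the simplex picture concrete, here is a minimal sketch (not from the slides) of FoReL over the probability simplex with linear surrogate losses and an entropic regularizer, which works out to the exponential-weights (Hedge) update; the loss matrix and learning rate below are illustrative choices:

```python
import numpy as np

def hedge(loss_matrix, eta):
    """Exponential weights over the probability simplex.

    FoReL with linear surrogate losses f_t(w) = <w, l_t> and an entropic
    regularizer yields exactly these multiplicative-weight updates.
    """
    T, d = loss_matrix.shape
    cum = np.zeros(d)
    expected_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * cum)
        w /= w.sum()                      # w_t: distribution over experts
        expected_loss += w @ loss_matrix[t]
        cum += loss_matrix[t]
    best = loss_matrix.sum(axis=0).min()  # loss of the best fixed expert
    return expected_loss - best           # expected regret

rng = np.random.default_rng(0)
losses = rng.random((1000, 10))           # assumed losses l_t in [0, 1]^10
print(hedge(losses, eta=np.sqrt(np.log(10) / 1000)))
```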
10 How to Learn in OCO
11 The Naïve Approach For quadratic optimization problems, in which $f_t(w) = \frac{1}{2}\|w - z_t\|_2^2$, FTL has $O(\log T)$ regret.
12 The Naïve Approach For linear optimization problems, in which $f_t(w) = \langle w, z_t \rangle$, FTL has unbounded (linear-in-$T$) regret. Just imagine a 1-dimensional $z_t$ alternating between $-1$ and $1$ each round: FTL keeps jumping to the endpoint that the next loss punishes. A small demo follows.
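A numeric sketch of this failure mode (the $z_1 = -0.5$ first loss is a standard convention that forces the jumping; $S = [-1, 1]$ is an assumption for the demo):

```python
import numpy as np

T = 100
z = np.empty(T)
z[0] = -0.5
z[1::2] = 1.0    # z_2, z_4, ... = +1
z[2::2] = -1.0   # z_3, z_5, ... = -1

w, cum, ftl_loss = 0.0, 0.0, 0.0   # FTL plays w_1 = 0 (empty sum of losses)
for t in range(T):
    ftl_loss += w * z[t]           # linear loss f_t(w) = w * z_t
    cum += z[t]
    w = -np.sign(cum)              # FTL: argmin over [-1, 1] of w * cum
print(ftl_loss)                    # ~T - 1, while u = 0 pays 0: linear regret
```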
13 Follow-the-Regularized-Leader (FoReL) Regularization stabilizes the solution of follow-the-leader. Goal: eliminate the jumpiness! Regularization function $R: S \to \mathbb{R}$. FoReL update: $w_t = \operatorname{argmin}_{w \in S} \big[\sum_{i=1}^{t-1} f_i(w) + R(w)\big]$.
14 Special Case: Linear Loss Functions FoReL: $w_t = \operatorname{argmin}_{w \in S} \big[\sum_{i=1}^{t-1} f_i(w) + R(w)\big]$. Let $f_t(w) = \langle w, z_t \rangle$, $S = \mathbb{R}^d$, $R(w) = \frac{1}{2\eta}\|w\|_2^2$. Then $w_{t+1} = \operatorname{argmin}_w \sum_{i=1}^{t} f_i(w) + R(w) = \operatorname{argmin}_w \sum_{i=1}^{t} \langle w, z_i \rangle + \frac{1}{2\eta}\|w\|_2^2$. Setting the derivative equal to zero: $0 = \sum_{i=1}^{t} z_i + \frac{1}{\eta} w$, so $w_{t+1} = -\eta \sum_{i=1}^{t} z_i$.
15 Special Case: Linear Loss Functions $w_{t+1} = -\eta \sum_{i=1}^{t} z_i = -\eta \sum_{i=1}^{t-1} z_i - \eta z_t = w_t - \eta z_t$. Note that $f_t(w) = \langle w, z_t \rangle \Rightarrow \nabla f_t(w_t) = z_t$. This gives us the formula for gradient descent for linear functions: $w_{t+1} = w_t - \eta \nabla f_t(w_t)$.
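A quick sanity check (a sketch with randomly drawn $z_t$, dimensions and step size arbitrary) that the FoReL closed form and the incremental gradient update trace the same iterates:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 200, 0.1
zs = rng.normal(size=(T, d))

w_ogd = np.zeros(d)
for t in range(T):
    w_ogd = w_ogd - eta * zs[t]               # OGD: w_{t+1} = w_t - eta * z_t
    w_forel = -eta * zs[:t + 1].sum(axis=0)   # FoReL closed form
    assert np.allclose(w_ogd, w_forel)        # identical trajectories
```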
16 Special Case: Linear Loss Functions How can we bound the regret? $\mathrm{Regret}_T(u) = \sum_{t=1}^{T} \big(f_t(w_t) - f_t(u)\big)$. Let $f_t(w) = \langle w, z_t \rangle$, $S = \mathbb{R}^d$, $R(w) = \frac{1}{2\eta}\|w\|_2^2$. For all $u$, we have the following regret bound: $\mathrm{Regret}_T(u) \le \frac{1}{2\eta}\|u\|_2^2 + \eta \sum_{t=1}^{T} \|z_t\|_2^2$.
17 Special Case: Linear Loss Functions How can we bound the regret? $\mathrm{Regret}_T(u) = \sum_{t=1}^{T} \big(f_t(w_t) - f_t(u)\big)$. For all $u$: $\mathrm{Regret}_T(u) \le \frac{1}{2\eta}\|u\|_2^2 + \eta \sum_{t=1}^{T} \|z_t\|_2^2$. The first term is the regularization term; the second grows when the loss functions have greater magnitude.
18 Digression: Proof of Lemma 2.3 Define a function $f_0$ to be equal to the regularization function $R$. Then running FoReL on $f_1, \dots, f_T$ is equivalent to running FTL on $f_0, f_1, \dots, f_T$. Apply Lemma 2.1 (the FTL regret bound), which is: for every $u \in S$, $\sum_{t} \big(f_t(w_t) - f_t(u)\big) \le \sum_{t} \big(f_t(w_t) - f_t(w_{t+1})\big)$.
19 Digression: Proof of Lemma 2.3, continued Since $f_0 = R$, this is equivalent to: $\sum_{t=1}^{T} f_t(w_t) - \sum_{t=1}^{T} f_t(u) + R(w_0) - R(u) \le \sum_{t=1}^{T} \big(f_t(w_t) - f_t(w_{t+1})\big) + R(w_0) - R(w_1)$. Cancelling $R(w_0)$ and rearranging gives Lemma 2.3: $\mathrm{Regret}_T(u) \le R(u) - R(w_1) + \sum_{t=1}^{T} \big(f_t(w_t) - f_t(w_{t+1})\big)$.
20 Special Case: Linear Loss Functions (Thm. 2.4) From Lemma 2.3: $\mathrm{Regret}_T(u) \le R(u) - R(w_1) + \sum_{t=1}^{T} \big(f_t(w_t) - f_t(w_{t+1})\big)$. By the definitions of $R$ and $f_t$ (and $w_1 = 0$, so $R(w_1) = 0$): $\mathrm{Regret}_T(u) \le \frac{1}{2\eta}\|u\|_2^2 + \sum_{t=1}^{T} \langle w_t - w_{t+1}, z_t \rangle$. Using that $w_{t+1} = w_t - \eta z_t$: $\sum_{t=1}^{T} \langle w_t - w_{t+1}, z_t \rangle = \sum_{t=1}^{T} \langle \eta z_t, z_t \rangle = \eta \sum_{t=1}^{T} \|z_t\|_2^2$. Thus: $\mathrm{Regret}_T(u) \le \frac{1}{2\eta}\|u\|_2^2 + \eta \sum_{t=1}^{T} \|z_t\|_2^2$.
21 Extending from Linear to Convex Functions: Subgradients
22 Extending from Linear to Convex Functions: Online Gradient Descent SGD is the special case where loss functions are sampled iid from a distribution
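A minimal sketch of projected online (sub)gradient descent; the absolute-loss example and the projection onto $[-1, 1]$ are illustrative assumptions, not from the slides:

```python
import numpy as np

def online_gradient_descent(subgrad, project, w0, etas, T):
    """Projected OGD: w_{t+1} = Proj_S(w_t - eta_t * g_t), g_t a subgradient.

    SGD is the special case where each f_t is drawn i.i.d. from a fixed
    distribution, so the same regret bound controls stochastic excess risk.
    """
    w, iterates = w0, []
    for t in range(T):
        iterates.append(w)
        g = subgrad(t, w)                # any subgradient of f_t at w_t
        w = project(w - etas[t] * g)
    return iterates

# Example: absolute loss f_t(w) = |w - y_t| on S = [-1, 1].
ys = np.sin(np.arange(100))
subgrad = lambda t, w: np.sign(w - ys[t])
project = lambda w: np.clip(w, -1.0, 1.0)
print(online_gradient_descent(subgrad, project, 0.0, [0.1] * 100, 100)[-1])
```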
23 Extending from Linear to Convex Functions: (Unsatisfying) Regret Bound We want sublinear regret
24 Extending from Linear to Convex Functions: Lipschitz-ness to the rescue! A function $f$ is $L$-Lipschitz if for all $x, y$ in the domain of $f$, we have $|f(x) - f(y)| \le L \|x - y\|$.
25 Extending from Linear to Convex Functions: Lipschitz-ness to the rescue! A function $f$ is $L$-Lipschitz if for all $x, y$ in the domain of $f$, we have $|f(x) - f(y)| \le L \|x - y\|$. For $L$-Lipschitz convex losses the subgradients satisfy $\|z_t\|_2 \le L$, so the linear-loss bound applies with $\|z_t\|_2^2 \le L^2$.
26 Extending from Linear to Convex Functions: How to pick step size?
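As a worked instance (following the standard analysis, e.g. the Shalev-Shwartz survey; the bound $B$ on $\|u\|_2$ and the Lipschitz constant $L$ are assumed given, not from the slides), minimize the Theorem 2.4 bound over $\eta$:

$$\mathrm{Regret}_T(u) \le \frac{B^2}{2\eta} + \eta L^2 T, \qquad \eta = \frac{B}{L\sqrt{2T}} \;\Rightarrow\; \mathrm{Regret}_T(u) \le B L \sqrt{2T}.$$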
27 Special Cases: Specific Convex Functions Logarithmic Regret Algorithms (Hazan) If the loss functions are strongly convex with bounded gradients, OGD (with decaying step sizes) has $O(\log T)$ regret. Other algorithms exist for weaker assumptions: Online Newton Step / Follow-The-Approximate-Leader (for exp-concave losses with bounded gradient), Exponentially Weighted Online Optimization (for exp-concave losses).
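A minimal numeric sketch (assumptions mine: squared losses $f_t(w) = \frac{\lambda}{2}(w - y_t)^2$, which are $\lambda$-strongly convex, and step sizes $\eta_t = 1/(\lambda t)$ as in Hazan et al.):

```python
import numpy as np

lam, T = 0.5, 1000
ys = np.sin(0.1 * np.arange(T))
u = ys.mean()                  # best fixed comparator for squared loss
w, regret = 0.0, 0.0
for t in range(1, T + 1):
    regret += 0.5 * lam * ((w - ys[t - 1]) ** 2 - (u - ys[t - 1]) ** 2)
    w -= (1.0 / (lam * t)) * lam * (w - ys[t - 1])   # eta_t = 1/(lam * t)
print(regret)                  # grows like log T rather than sqrt(T)
```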
28 Special Case: Smoothed OCO Andrew, L., Barman, S., Ligett, K., Lin, M., Meyerson, A., Roytman, A., and Wierman, A. "A tale of two metrics: simultaneous bounds on competitiveness and regret." ACM SIGMETRICS Performance Evaluation Review, Vol. 41, No. 1, 2013. Chen, N., Agarwal, A., Wierman, A., Barman, S., and Andrew, L. L. "Online convex optimization using predictions." Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015.
29 Special Case: Smoothed OCO The cost model: minimize $\sum_{t=1}^{T} c_t(x_t) + \beta \|x_t - x_{t-1}\|$, an operating cost plus a switching cost for changing actions. Applications: Geographical Load Balancing, Dynamic Capacity Provisioning.
30 Special Case: Smoothed OCO Some performance metrics: regret (against the best fixed action in hindsight) and competitive ratio (against the dynamic offline optimal).
31 Special Case: Smoothed OCO Result from first paper: no algorithm can simultaneously achieve sublinear regret and a constant competitive ratio. You can't have both! (in the worst case)
32 Special Case: Smoothed OCO Result from second paper: but in practice, it doesn't matter! Most noise is not adversarial, and we often have access to noisy predictions. => They propose an algorithm with good performance in both metrics.
33 Special Case: Smoothed OCO Averaging Fixed Horizon Control Fixed Horizon Control (FHC): given predictions over timesteps $t, \dots, t + w$, just play whatever is optimal over these steps. AFHC: average several FHC algorithms whose planning windows start at different time steps (a toy sketch follows).
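A toy sketch of FHC/AFHC under simplifying assumptions of mine: 1-D actions, quadratic hit costs $(x - y_t)^2$, and a squared switching cost so that each window solve is a tridiagonal linear system (the papers allow general convex costs and norm switching costs):

```python
import numpy as np

def solve_window(x_prev, ys, beta):
    """Optimal trajectory for min sum_i (x_i - y_i)^2 + beta*(x_i - x_{i-1})^2
    over one prediction window, starting from x_prev (tridiagonal system)."""
    n = len(ys)
    A = np.zeros((n, n))
    b = np.array(ys, dtype=float)
    for i in range(n):
        A[i, i] = 1.0 + (2.0 * beta if i < n - 1 else beta)
        if i > 0:
            A[i, i - 1] = A[i - 1, i] = -beta
    b[0] += beta * x_prev
    return np.linalg.solve(A, b)

def fhc(ys, beta, w, k):
    """FHC with windows of length w+1 whose boundaries are offset by k:
    at each boundary, commit to the optimal plan for the next window."""
    T = len(ys)
    x = np.zeros(T)
    x_prev, start = 0.0, 0
    ends = ([k] if k > 0 else []) + list(range(k + w + 1, T + w + 1, w + 1))
    for end in ends:
        end = min(end, T)
        if start >= end:
            break
        plan = solve_window(x_prev, ys[start:end], beta)
        x[start:end] = plan
        x_prev, start = plan[-1], end
    return x

def afhc(ys, beta, w):
    """Averaging FHC: average the w+1 staggered FHC trajectories."""
    return np.mean([fhc(ys, beta, w, k) for k in range(w + 1)], axis=0)

ys = np.sin(0.3 * np.arange(50))   # here predictions are assumed exact
print(afhc(ys, beta=1.0, w=3)[:5])
```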
34 Special Case: Iterative Soft-Thresholding Algorithm Assume $L$ can be decomposed into differentiable and non-differentiable components: $L(w) = G(w) + H(w)$, with $G$ differentiable and $H$ non-differentiable. Example: $L(w) = \sum_{(x_i, y_i) \in S} l(x_i, y_i, w) + \lambda \|w\|_1$, where $G$ is the summed loss and $H$ is the $\ell_1$ penalty.
35 Special Case: Iterative Soft-Thresholding Algorithm $L(w) = G(w) + H(w) = \sum_{(x_i, y_i) \in S} l(x_i, y_i, w) + \lambda \|w\|_1$. Solve the differentiable part $G$ using gradient descent: $v_t = w_{t-1} - \eta_t \nabla_w G(w)\big|_{w = w_{t-1}}$. For the non-differentiable part $H$, add a proximity term and perform the minimization: $w_t = \operatorname{argmin}_w \frac{1}{2\eta_t}\|w - v_t\|_2^2 + H(w)$, which for $H = \lambda\|\cdot\|_1$ reduces to coordinate-wise soft-thresholding.
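A minimal ISTA sketch, assuming the differentiable part is the squared loss $G(w) = \frac{1}{2}\|Xw - y\|_2^2$ (my choice of $l$); `soft_threshold` is the proximal operator of the $\ell_1$ penalty:

```python
import numpy as np

def soft_threshold(v, tau):
    # prox of tau*||.||_1: shrink each coordinate toward zero by tau
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(X, y, lam, eta, iters=500):
    """ISTA for L(w) = 0.5*||Xw - y||^2 + lam*||w||_1:
    gradient step on the smooth part G, soft-threshold step for H."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        v = w - eta * X.T @ (X @ w - y)    # gradient step on G
        w = soft_threshold(v, eta * lam)   # argmin (1/2eta)||w-v||^2 + lam||w||_1
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
w_true = np.zeros(20); w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.01 * rng.normal(size=50)
eta = 1.0 / np.linalg.norm(X, 2) ** 2      # step <= 1 / Lipschitz(grad G)
print(ista(X, y, lam=0.1, eta=eta)[:5])    # recovers the sparse w_true
```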
36 Special Case: Iterative Soft-Thresholding Algorithm
37 Generalizing OCO
38 Generalizations What if our weight vectors can break the rules sometimes? If the weight vectors need only lie in the convex set on average in the long run, low-regret algorithms exist. (Jenatton et al.)
39 Generalizations What if our actions can't change too quickly? Not only is the regret worse under these "ramp constraints", but learners must be designed not to constrain their future actions. (Badiei, Li, Wierman)
40 Generalizations What if our loss functions aren't convex? Often OCO techniques still work, especially when training deep learners. (Balduzzi)
41 Generalizations What if we, or our experts, don't have to play each round? Often OCO techniques still work. (Balduzzi)
42 Generalizations What if we have a whole network of online learners that can communicate? Even when communication is local, low regret-algorithms exist. (Koppel, et al.)
43 Generalizations: Dynamically-Varying Environment What if the underlying environment varies over time? To improve performance, dynamically model the environment (Hall 2013, Hall 2014). Dynamic Mirror Descent: incorporates dynamical-model state updates. Dynamic Fixed Share: selects a dynamical model from a family of candidates at each time step. DMD tracking regret bound, where $\Phi_t$ is the dynamical model and regret is taken with respect to the comparator sequence $\{\theta_t\}$: $R_T(\{\theta_t\}) = O\big(\sqrt{T}\,\big[1 + \sum_{t} \|\theta_{t+1} - \Phi_t(\theta_t)\|\big]\big)$. Note: regret scales with the deviation of $\{\theta_t\}$ from the dynamical model $\Phi_t$.
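A sketch of Dynamic Mirror Descent specialized (my simplification) to the Euclidean mirror map: a gradient step on $l_t$ followed by an advance through an assumed-known dynamics $\Phi_t$:

```python
import numpy as np

def dynamic_mirror_descent(grad, Phi, theta0, eta, T):
    """DMD with a Euclidean mirror map: gradient step, then push the
    iterate through the dynamical model Phi_t (Hall & Willett style)."""
    theta, out = theta0, []
    for t in range(T):
        out.append(theta)
        theta_tilde = theta - eta * grad(t, theta)  # mirror-descent step
        theta = Phi(t, theta_tilde)                 # dynamical-model update
    return out

# Toy example: track a drifting target theta*_t = t*v under squared loss,
# with dynamics Phi_t(theta) = theta + v assumed known.
v = np.array([0.1, -0.05])
grad = lambda t, th: 2 * (th - t * v)
Phi = lambda t, th: th + v
print(dynamic_mirror_descent(grad, Phi, np.zeros(2), 0.3, 100)[-1])
```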
44 Generalizations: Bandit Setting (Agarwal 2010) Bandit setting: at each time $t$, we only find out $l_t(x_t)$, not all of $l_t$. Completely adaptive adversary: chooses $l_t$ knowing $x_1, \dots, x_t$. Regret is $\Omega(T)$: at least order $T$. Adaptive adversary: chooses $l_t$ knowing $x_1, \dots, x_{t-1}$. Regret is $\Omega(\sqrt{T})$, and with linear loss functions it is $\tilde{O}(\sqrt{T})$ (i.e., $O(\sqrt{T})$ up to log factors, with high probability). Compare with regret for the full-information case: $O(\sqrt{T})$ for convex Lipschitz and smooth loss functions, $O(\log T)$ for strongly convex and smooth loss functions.
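For intuition about why bandit feedback costs something, here is the standard one-point gradient estimator underlying bandit OCO algorithms (a sketch; the sphere sampling and the $\delta$ below are conventional choices, not specifics from this slide):

```python
import numpy as np

def one_point_grad(loss, x, delta, rng):
    """One-point bandit gradient estimate: (d/delta) * loss(x + delta*u) * u,
    with u uniform on the unit sphere, is an unbiased gradient of a
    delta-smoothed version of the loss. Its variance scales like 1/delta."""
    u = rng.normal(size=x.shape[0])
    u /= np.linalg.norm(u)
    return (x.shape[0] / delta) * loss(x + delta * u) * u

rng = np.random.default_rng(0)
loss = lambda z: float(z @ z)                 # f(x) = ||x||^2, grad = 2x
x = np.array([1.0, -2.0, 0.5])
est = np.mean([one_point_grad(loss, x, 0.1, rng) for _ in range(100000)], axis=0)
print(est, 2 * x)                             # mean approaches the gradient
```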
45 Generalizations: Multi-Point Bandit Setting Player queries each loss function at $k$ randomized points $y_{t,1}, \dots, y_{t,k}$. Take expected regret with respect to the player's randomness: $\mathrm{Regret} = \mathbb{E}\big[\frac{1}{k}\sum_{t=1}^{T}\sum_{i=1}^{k} l_t(y_{t,i})\big] - \min_{x \in K} \sum_{t=1}^{T} \mathbb{E}[l_t(x)]$.
Regret for convex Lipschitz and smooth losses / strongly convex and smooth losses:
Multi-point, $k = 2$, adaptive adversary: $O(\sqrt{T})$ (with high probability) / $O(\log T)$ (expected)
Multi-point, $k = d + 1$, completely adaptive adversary: $O(\sqrt{T})$ (deterministic) / $O(\log T)$ (deterministic)
Full information: $O(\sqrt{T})$ (deterministic) / $O(\log T)$ (deterministic)
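The multi-point setting helps because querying twice gives a gradient estimate whose variance stays bounded as $\delta \to 0$; a sketch of the $k = 2$ estimator (illustrative parameters, mirroring the one-point sketch above):

```python
import numpy as np

def two_point_grad(loss, x, delta, rng):
    """k = 2 bandit gradient estimate (Agarwal, Dekel, Xiao): querying at
    x +/- delta*u cancels the O(1/delta) variance of the one-point
    estimate, which is why two queries recover near-full-information rates."""
    u = rng.normal(size=x.shape[0])
    u /= np.linalg.norm(u)
    return (x.shape[0] / (2 * delta)) * (loss(x + delta * u) - loss(x - delta * u)) * u

rng = np.random.default_rng(0)
loss = lambda z: float(z @ z)
x = np.array([1.0, -2.0, 0.5])
est = np.mean([two_point_grad(loss, x, 0.01, rng) for _ in range(5000)], axis=0)
print(est, 2 * x)   # accurate even with tiny delta and few samples
```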
46 Summary Online convex optimization framework captures a huge part of online learning Lots of algorithms we already know are OCO, for instance SGD Follow-the-regularized leader, and variants, all perform well in OCO Generalizations of OCO are widespread, and rely on OCO algorithms
47 Questions?
48 References
Agarwal, Alekh, Ofer Dekel, and Lin Xiao. "Optimal algorithms for online convex optimization with multi-point bandit feedback." COLT 2010.
Andrew, L., Barman, S., Ligett, K., Lin, M., Meyerson, A., Roytman, A., and Wierman, A. "A tale of two metrics: simultaneous bounds on competitiveness and regret." ACM SIGMETRICS Performance Evaluation Review, Vol. 41, No. 1, 2013.
Badiei, Masoud, Na Li, and Adam Wierman. "Online convex optimization with ramp constraints."
Balduzzi, David. "Deep online convex optimization by putting forecaster to sleep." arXiv preprint, 2015.
Chen, N., Agarwal, A., Wierman, A., Barman, S., and Andrew, L. L. "Online convex optimization using predictions." Proceedings of the 2015 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2015.
Hall, Eric C., and Rebecca M. Willett. "Online convex optimization in dynamic environments." IEEE Journal of Selected Topics in Signal Processing 9.4, 2015.
Hall, Eric C., and Rebecca M. Willett. "Dynamical models and tracking regret in online convex programming." Proceedings of the 30th International Conference on Machine Learning, 2013.
Hazan, Elad, Amit Agarwal, and Satyen Kale. "Logarithmic regret algorithms for online convex optimization." Machine Learning, 2007.
Jenatton, Rodolphe, Jim Huang, and Cédric Archambeau. "Adaptive algorithms for online convex optimization with long-term constraints." arXiv preprint, 2015.
Koppel, Alec, Felicia Y. Jakubiec, and Alejandro Ribeiro. "A saddle point algorithm for networked online convex optimization." IEEE Transactions on Signal Processing, 2015.
Yue, Yisong. Advanced Optimization Notes. Course notes.
49 A Couple Back-Up Slides
50 Strongly Convex Function A function $f$ is $\lambda$-strongly convex if for all $x, y$ in its domain, $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\lambda}{2}\|y - x\|_2^2$.
51 Eliminating the Time Horizon Dependence: The Doubling Trick For $m = 0, 1, 2, \dots, \lceil \log_2 T \rceil$: run the algorithm on the $2^m$ rounds $t = 2^m, \dots, \min(2^{m+1} - 1, T)$. m = 0: run for t = 1. m = 1: run for t = 2, 3. m = 2: run for t = 4, ..., 7. Etc.
52 Eliminating the Time Horizon Dependence: The Doubling Trick Assume the algorithm's regret on each block of $2^m$ rounds is bounded by $\alpha \sqrt{2^m}$. Then $\mathrm{Regret}_T(u) \le \sum_{m=0}^{\lceil \log_2 T \rceil} \alpha \sqrt{2^m} = \alpha \sum_{m=0}^{\lceil \log_2 T \rceil} (\sqrt{2})^m = \alpha \cdot \frac{1 - (\sqrt{2})^{\lceil \log_2 T \rceil + 1}}{1 - \sqrt{2}} \le \alpha \cdot \frac{1 - \sqrt{2T}}{1 - \sqrt{2}} \le \frac{\sqrt{2}}{\sqrt{2} - 1}\, \alpha \sqrt{T}.$ So the regret bound only worsens by a constant multiplicative factor.
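A sketch of the doubling trick as a wrapper; the `OGDLearner` interface (predict/update, horizon-tuned step size) is a hypothetical stand-in for any horizon-dependent algorithm, not something from the slides:

```python
import math

class OGDLearner:
    """Toy 1-D learner tuned to a known horizon (illustrative only)."""
    def __init__(self, horizon, B=1.0, L=1.0):
        self.w = 0.0
        self.eta = B / (L * math.sqrt(horizon))   # eta = B / (L * sqrt(T))
    def predict(self):
        return self.w
    def update(self, grad):
        self.w = max(-1.0, min(1.0, self.w - self.eta * grad))

def doubling_trick(grads):
    """Run fresh horizon-tuned learners on epochs of doubling length:
    epoch m covers rounds 2^m .. min(2^{m+1} - 1, T)."""
    T = len(grads)
    xs = []
    for m in range(int(math.ceil(math.log2(T))) + 1):
        start, end = 2 ** m, min(2 ** (m + 1) - 1, T)
        if start > T:
            break
        learner = OGDLearner(horizon=end - start + 1)  # knows epoch length
        for t in range(start, end + 1):
            xs.append(learner.predict())
            learner.update(grads[t - 1])               # observe subgradient z_t
    return xs

print(doubling_trick([(-1.0) ** t for t in range(100)])[:5])
```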