Composite Objective Mirror Descent
1 Composite Objective Mirror Descent
John C. Duchi (1,3), Shai Shalev-Shwartz (2), Yoram Singer (3), Ambuj Tewari (4)
1 University of California, Berkeley; 2 Hebrew University of Jerusalem, Israel; 3 Google Research; 4 Toyota Technological Institute, Chicago
June 29, 2010
Duchi (UC Berkeley & Google), Composite Objective Mirror Descent, June 29, 2010, / 22
2 Large scale logistic regression
Problem: $n$ huge,
$\min_x \; \underbrace{\frac{1}{n}\sum_{i=1}^n \log\left(1+\exp(-\langle a_i, x\rangle)\right)}_{=f(x)} + \lambda\|x\|_1$
Usual approach: online gradient descent (Zinkevich 03). Let $g_t = \nabla \log(1+\exp(-\langle a_t, x_t\rangle))$,
$x_{t+1} = x_t - \eta_t g_t - \eta_t \lambda\,\mathrm{sign}(x_t)$
Then perform online-to-batch conversion.
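The baseline step above can be sketched in a few lines of NumPy. This is an illustration, not the authors' code: the function name `ogd_l1_logistic` is ours, and a constant step size stands in for $\eta_t$.

```python
import numpy as np

def ogd_l1_logistic(A, lam=0.1, eta=0.1, T=None):
    """One online pass over rows a_t of A for the loss
    log(1 + exp(-<a_t, x>)) + lam * ||x||_1, using the plain
    subgradient step x <- x - eta*g - eta*lam*sign(x)."""
    n, d = A.shape
    T = n if T is None else T
    x = np.zeros(d)
    for t in range(T):
        a = A[t % n]
        # gradient of log(1 + exp(-<a, x>)) with respect to x
        g = -a / (1.0 + np.exp(a @ x))
        # subgradient step: loss gradient plus the l1 subgradient lam*sign(x)
        x = x - eta * g - eta * lam * np.sign(x)
    return x
```

Note the `sign(x)` term: it is applied densely, which is exactly the source of the problems discussed on the next slide.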
3 Problems with usual approach
Regret bound/convergence rate: set $G = \max_t \|g_t + \lambda\,\mathrm{sign}(x_t)\|_2$. Then
$f(\bar{x}_T) + \lambda\|\bar{x}_T\|_1 \le f(x^\star) + \lambda\|x^\star\|_1 + O\!\left(\frac{\|x^\star\|_2\, G}{\sqrt{T}}\right)$
But $G = \Theta(\sqrt{d})$: an additional penalty from $\mathrm{sign}(x_t)$.
No sparsity in $\bar{x}_T$.
Why should we suffer from the $\ell_1$ term?
5 Online Gradient Descent
Let $g_t = \nabla \log(1+\exp(-\langle a_t, x_t\rangle)) + \lambda\,\mathrm{sign}(x_t)$. OGD step (Zinkevich 03):
$x_{t+1} = x_t - \eta g_t = \operatorname*{argmin}_x \left\{ \eta\langle g_t, x\rangle + \tfrac{1}{2}\|x - x_t\|_2^2 \right\}$
(Figure: the linearization $f(x_t) + \langle g_t, x - x_t\rangle$ of $f(x) + \lambda\|x\|_1$.)
Mirror-descent generalization: replace the squared Euclidean distance with a Bregman divergence, minimizing $\langle g_t, x\rangle + B_\psi(x, x_t)$.
7 Problems with Subgradient Methods
Subgradients are non-informative at singularities.
10 Composite Objective Approach
Let $g_t = \nabla \log(1+\exp(-\langle a_t, x_t\rangle))$. Truncated gradient (Langford et al. 08, Duchi & Singer 09):
$x_{t+1} = \operatorname*{argmin}_x \left\{ \tfrac{1}{2}\|x - x_t\|^2 + \eta\langle g_t, x\rangle + \eta\lambda\|x\|_1 \right\} = \mathrm{sign}(x_t - \eta g_t)\left[\,|x_t - \eta g_t| - \eta\lambda\,\right]_+$
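The closed form above is elementwise soft-thresholding after a gradient step. A minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def truncated_gradient_step(x, g, eta, lam):
    """Closed-form composite step for the Euclidean case with an l1 penalty:
    argmin_x (1/2)||x - x_t||^2 + eta*<g_t, x> + eta*lam*||x||_1
           = sign(x_t - eta*g_t) * [|x_t - eta*g_t| - eta*lam]_+ ."""
    z = x - eta * g                                   # plain gradient step
    return np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)
```

Entries whose magnitude after the gradient step is below $\eta\lambda$ are set exactly to zero, which is where the sparsity on the next slide comes from.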
11 Composite Objective Approach
Update is $x_{t+1} = \mathrm{sign}(x_t - \eta g_t)\left[\,|x_t - \eta g_t| - \eta\lambda\,\right]_+$. Two nice things:
- Sparsity from $[\cdot]_+$
- Convergence rate: let $G = \max_t \|g_t\|_2$, then
$f(\bar{x}_T) + \lambda\|\bar{x}_T\|_1 \le f(x^\star) + \lambda\|x^\star\|_1 + O\!\left(\frac{\|x^\star\|_2\, G}{\sqrt{T}}\right)$
No extra penalty from $\lambda\|x\|_1$!
12 Abstraction to Regularized Online Convex Optimization
Repeat:
- Learner plays point $x_t$
- Receive $f_t + \varphi$ ($\varphi$ known)
- Suffer loss $f_t(x_t) + \varphi(x_t)$
Goal: attain small regret
$R(T) := \sum_{t=1}^T \left[ f_t(x_t) + \varphi(x_t) \right] - \inf_{x \in \mathcal{X}} \sum_{t=1}^T \left[ f_t(x) + \varphi(x) \right]$
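The protocol on this slide is a simple loop. A generic sketch under our own naming conventions (`losses`, `phi`, `learner_step` are all hypothetical; any concrete algorithm plugs in as `learner_step`):

```python
import numpy as np

def play_regularized_game(losses, phi, learner_step, x0, T):
    """Regularized online convex optimization loop (sketch).
    losses[t](x) returns (f_t(x), gradient of f_t at x);
    phi is the known regularizer; learner_step is the update rule."""
    x = x0
    total = 0.0
    for t in range(T):
        f_val, g = losses[t](x)        # adversary reveals f_t
        total += f_val + phi(x)        # learner suffers f_t(x_t) + phi(x_t)
        x = learner_step(x, g)         # learner updates after the round
    return total, x
```

Regret is then `total` minus the best fixed comparator's cumulative loss.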
13 Composite Objective MIrror Descent
Let $g_t = f_t'(x_t)$. Comid step:
$x_{t+1} = \operatorname*{argmin}_{x \in \mathcal{X}} \left\{ B_\psi(x, x_t) + \eta\langle g_t, x\rangle + \eta\varphi(x) \right\}$
(Figure: $f(x) + \varphi(x)$ versus the model $B_\psi(x, x_t) + \langle g_t, x\rangle + \varphi(x)$: the loss is linearized, the regularizer kept exact.)
15 Convergence Results
Old (online gradient/mirror descent). Theorem: For any $x^\star \in \mathcal{X}$,
$\sum_{t=1}^T f_t(x_t) + \varphi(x_t) - f_t(x^\star) - \varphi(x^\star) \le \frac{1}{\eta} B_\psi(x^\star, x_1) + \frac{\eta}{2} \sum_{t=1}^T \left\| f_t'(x_t) + \varphi'(x_t) \right\|_*^2$
New (Comid). Theorem: For any $x^\star \in \mathcal{X}$,
$\sum_{t=1}^T f_t(x_t) + \varphi(x_t) - f_t(x^\star) - \varphi(x^\star) \le \frac{1}{\eta} B_\psi(x^\star, x_1) + \frac{\eta}{2} \sum_{t=1}^T \left\| f_t'(x_t) \right\|_*^2$
17 Derived Algorithms
- FOBOS (Duchi & Singer, 2009)
- p-norm divergences
- Mixed-norm regularization
- Matrix Comid
18 p-norms
Better $\ell_1$ algorithms: $\varphi(x) = \lambda\|x\|_1$.
Idea: non-Euclidean geometry (e.g. dense gradients, sparse $x^\star$).
Recall $\frac{1}{2(p-1)}\|x\|_p^2$ is strongly convex over $\mathbb{R}^d$ w.r.t. $\ell_p$, for $1 < p \le 2$.
Take $\psi(x) = \frac{1}{2}\|x\|_p^2$.
Corollary: When $\|f_t'(x_t)\|_\infty \le G_\infty$, take $p = 1 + 1/\log d$ to get
$R(T) = O\!\left( \|x^\star\|_1\, G_\infty \sqrt{T \log d} \right)$
20 Derived p-norm algorithms
SMIDAS (Shalev-Shwartz & Tewari 2009): take $\varphi(x) = \lambda\|x\|_1$. Assume $\mathrm{sign}([\nabla\psi(x)]_j) = \mathrm{sign}(x_j)$, and define $S_\lambda(z) = \mathrm{sign}(z)\left[\,|z| - \lambda\,\right]_+$. Then
$x_{t+1} = (\nabla\psi)^{-1}\!\left( S_{\eta\lambda}\!\left( \nabla\psi(x_t) - \eta f_t'(x_t) \right) \right)$
(Figure: entries of $x_{t+1/2}$ below the threshold $\lambda\eta_t$ are truncated to $0$ in $x_{t+1}$.)
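A NumPy sketch of this update for $\psi(x) = \frac{1}{2}\|x\|_p^2$. It relies on the standard fact that the inverse link $(\nabla\psi)^{-1}$ is the gradient of $\frac{1}{2}\|\cdot\|_q^2$ with $1/p + 1/q = 1$; function names are ours:

```python
import numpy as np

def grad_psi(x, p):
    """Gradient of psi(x) = (1/2)||x||_p^2 (the p-norm link function):
    sign(x) * |x|^(p-1) / ||x||_p^(p-2)."""
    norm = np.linalg.norm(x, p)
    if norm == 0.0:
        return np.zeros_like(x)
    return np.sign(x) * np.abs(x) ** (p - 1) / norm ** (p - 2)

def smidas_step(x, g, eta, lam, p):
    """One SMIDAS-style update (sketch): gradient step and soft-threshold
    in the dual, then map back with the conjugate link grad (1/2)||.||_q^2."""
    q = p / (p - 1.0)                                 # conjugate exponent
    theta = grad_psi(x, p) - eta * g                  # dual gradient step
    theta = np.sign(theta) * np.maximum(np.abs(theta) - eta * lam, 0.0)
    return grad_psi(theta, q)                         # inverse link
```

With $p = 2$ the link is the identity and this reduces to the truncated gradient step from slide 10.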
21 Comid with mixed norms
$\varphi(X) = \|X\|_{\ell_1/\ell_q} = \sum_{j=1}^d \|x^j\|_q$, where $x^j$ denotes the $j$-th row of $X$.
- Separable and solvable using previous methods
- Multitask and multiclass learning: $x^j$ associated with feature $j$; penalize $x^j$ once
22 Mixed-norm p-norm algorithms
Specialize the problem to
$\min_x \; \langle v, x\rangle + \tfrac{1}{2}\|x\|_p^2 + \lambda\|x\|_\infty$
Closed form? No.
Dual problem (with $x$ recovered from $v - \beta$):
$\min_\beta \; \|v - \beta\|_q \quad \text{subject to} \quad \|\beta\|_1 \le \lambda$
24 Mixed-norm p-norm algorithms
Problem: $\min_\beta \|v - \beta\|_q$ subject to $\|\beta\|_1 \le \lambda$
Observation: monotonicity of $\beta$, so $|v_i| \ge |v_j|$ implies $\beta_i \ge \beta_j$
Root-finding problem:
$\lambda = \sum_{i=1}^d \beta_i(\theta) = \sum_{i=1}^d \left[\, |v_i| - \theta^{1/(q-1)} \,\right]_+$
Solve with a median-like search.
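The root-finding step above can be sketched with plain bisection on $\theta$ (the slides use a faster median-like search; this version, with our own function name, assumes the same monotone formulation $\sum_i [\,|v_i| - \theta^{1/(q-1)}\,]_+ = \lambda$):

```python
import numpy as np

def solve_theta(v, lam, q, tol=1e-10):
    """Find theta >= 0 with sum_i [|v_i| - theta^(1/(q-1))]_+ = lam
    by bisection on the monotonically decreasing left-hand side."""
    a = np.abs(v)

    def total(theta):
        return float(np.sum(np.maximum(a - theta ** (1.0 / (q - 1.0)), 0.0)))

    if total(0.0) <= lam:                 # constraint inactive at theta = 0
        return 0.0
    lo, hi = 0.0, float(np.max(a)) ** (q - 1.0)   # total(hi) == 0 <= lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total(mid) > lam:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

Because `total` is piecewise linear and decreasing in $\theta^{1/(q-1)}$, a sort-based median search finds the exact breakpoint in $O(d \log d)$; bisection is just the shortest correct sketch.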
26 Matrix Comid
Idea: get sparsity in the spectrum of $X \in \mathbb{R}^{d_1 \times d_2}$. Take
$\varphi(X) = \|X\|_1 = \sum_{i=1}^{\min\{d_1, d_2\}} \sigma_i(X)$
Schatten p-norms: apply the $p$-norm to the singular values of $X$:
$\|X\|_p = \|\sigma(X)\|_p = \left( \sum_{i=1}^{\min\{d_1, d_2\}} \sigma_i(X)^p \right)^{1/p}$
Important fact: for $1 < p \le 2$, $\psi(X) = \frac{1}{2(p-1)}\|X\|_p^2$ is strongly convex w.r.t. $\|\cdot\|_p$ (Ball et al., 1994).
Consequence: take $p = 1 + 1/\log d$ and $G \ge \|f_t'(X_t)\|_\infty$ (spectral norm). Comid with the above $\psi$ has
$R(T) = O\!\left( G\, \|X^\star\|_1 \sqrt{T \log d} \right)$
29 Trace-norm Regularization
Idea: get sparsity in the spectrum; take $\varphi(X) = \|X\|_1 = \sum_i \sigma_i(X)$.
$X_{t+1} = \operatorname*{argmin}_{X \in \mathcal{X}} \; \eta\langle f_t'(X_t), X\rangle + B_\psi(X, X_t) + \eta\lambda\|X\|_1$
For $1 < p \le 2$, the update is:
1. Compute SVD: $X_t = U \operatorname{diag}(\sigma(X_t)) V^\top$
2. Gradient step: $X_{t+1/2} = U \operatorname{diag}(\nabla\psi(\sigma(X_t))) V^\top - \eta f_t'(X_t)$
3. Compute SVD: $X_{t+1/2} = \tilde{U} \operatorname{diag}(\sigma(X_{t+1/2})) \tilde{V}^\top$
4. Shrinkage: $X_{t+1} = \tilde{U} \operatorname{diag}\!\left[ (\nabla\psi)^{-1}\!\left( S_{\eta\lambda}(\sigma(X_{t+1/2})) \right) \right] \tilde{V}^\top$
30 Trace-norm Regularization Example
Proximal function: $\psi(X) = \frac{1}{2}\|X\|_2^2 = \frac{1}{2}\|X\|_{\mathrm{Fr}}^2$
Update: $X_{t+1/2} = X_t - \eta f_t'(X_t) \;\left(= U \Sigma_{t+1/2} V^\top\right)$
Shrinkage: $X_{t+1} = U \left[ \Sigma_{t+1/2} - \eta\lambda \right]_+ V^\top$
(Figure: singular components $u_i \sigma_i$ are shrunk to $u_i[\sigma_i - \lambda]_+$; small singular values are truncated to zero.)
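In this Frobenius case the update is singular value thresholding. A NumPy sketch (function name is ours):

```python
import numpy as np

def svd_shrinkage_step(X, G, eta, lam):
    """Frobenius-case trace-norm Comid step (sketch): gradient step,
    then soft-threshold the singular values of the intermediate matrix."""
    X_half = X - eta * G                              # X_{t+1/2}
    U, s, Vt = np.linalg.svd(X_half, full_matrices=False)
    s_shrunk = np.maximum(s - eta * lam, 0.0)         # [Sigma - eta*lam]_+
    return U @ np.diag(s_shrunk) @ Vt
```

Singular values below $\eta\lambda$ vanish, so the iterates have (numerically) low rank, which is the spectral sparsity the slide is after.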
32 Proof ideas for trace-norm
Idea: unitary invariance to reduce to the vector case (Lewis 1995):
$\nabla\psi(X) = U \operatorname{diag}\left[\nabla\psi(\sigma(X))\right] V^\top, \qquad \partial\|X\|_1 = U \operatorname{diag}\left(\partial\|\sigma(X)\|_1\right) V^\top$
Simply reduce to the vector case with $\ell_1$-regularization.
33 Conclusions and Related Work
All derivations apply to Regularized Dual Averaging (Xiao 2009):
$x_{t+1} = \operatorname*{argmin}_{x \in \mathcal{X}} \left\{ \eta \sum_{\tau=1}^t \langle g_\tau, x\rangle + \eta t \varphi(x) + \psi(x) \right\}$
- Analysis of online convex programming for regularized objectives
- Unify several previous algorithms (projected gradient, mirror descent, forward-backward splitting)
- Derived algorithms for several regularization functions
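For $\varphi(x) = \lambda\|x\|_1$ and $\psi(x) = \frac{1}{2}\|x\|_2^2$, the RDA step above has a closed form obtained by minimizing $\eta\langle \sum_\tau g_\tau, x\rangle + \eta t \lambda\|x\|_1 + \frac{1}{2}\|x\|_2^2$ coordinatewise. A sketch (function name is ours):

```python
import numpy as np

def rda_l1_step(grad_sum, t, eta, lam):
    """Closed-form RDA iterate for phi = lam*||.||_1, psi = (1/2)||.||_2^2:
    x_{t+1} = -sign(s) * [|s| - eta*t*lam]_+  with  s = eta * sum of gradients."""
    s = eta * grad_sum
    return -np.sign(s) * np.maximum(np.abs(s) - eta * t * lam, 0.0)
```

Unlike Comid, the threshold here grows with $t$, so coordinates whose average gradient stays below $\lambda$ remain exactly zero.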
More informationMachine Learning Lecture 6 Note
Machine Learning Lecture 6 Note Compiled by Abhi Ashutosh, Daniel Chen, and Yijun Xiao February 16, 2016 1 Pegasos Algorithm The Pegasos Algorithm looks very similar to the Perceptron Algorithm. In fact,
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationBasis Adaptation for Sparse Nonlinear Reinforcement Learning
Basis Adaptation for Sparse Nonlinear Reinforcement Learning Sridhar Mahadevan, Stephen Giguere, and Nicholas Jacek School of Computer Science University of Massachusetts, Amherst mahadeva@cs.umass.edu,
More informationIs the test error unbiased for these programs?
Is the test error unbiased for these programs? Xtrain avg N o Preprocessing by de meaning using whole TEST set 2017 Kevin Jamieson 1 Is the test error unbiased for this program? e Stott see non for f x
More informationA New Perspective on an Old Perceptron Algorithm
A New Perspective on an Old Perceptron Algorithm Shai Shalev-Shwartz 1,2 and Yoram Singer 1,2 1 School of Computer Sci & Eng, The Hebrew University, Jerusalem 91904, Israel 2 Google Inc, 1600 Amphitheater
More informationRandomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity
Randomized Coordinate Descent with Arbitrary Sampling: Algorithms and Complexity Zheng Qu University of Hong Kong CAM, 23-26 Aug 2016 Hong Kong based on joint work with Peter Richtarik and Dominique Cisba(University
More informationStochastic Subgradient Method
Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)
More informationLarge-scale Stochastic Optimization
Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation
More informationMachine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, Emily Fox 2014
Case Study 3: fmri Prediction Fused LASSO LARS Parallel LASSO Solvers Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox February 4 th, 2014 Emily Fox 2014 1 LASSO Regression
More informationStochastic and Adversarial Online Learning without Hyperparameters
Stochastic and Adversarial Online Learning without Hyperparameters Ashok Cutkosky Department of Computer Science Stanford University ashokc@cs.stanford.edu Kwabena Boahen Department of Bioengineering Stanford
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More information