How hard is this function to optimize?
How hard is this function to optimize? John Duchi. Based on joint work with Sabyasachi Chatterjee, John Lafferty, and Yuancheng Zhu. Stanford University. West Coast Optimization Rumble, October 2016.
Problem

minimize $f(x) := E[F(x;\xi)]$ subject to $x \in X$, (1)

where $x \mapsto F(x;\xi)$ is convex and $X$ is a closed convex set.

Two questions:
1. How hard is it to solve problem (1) for that particular $f$?
2. Can an algorithm (or algorithms) do as well as possible for each $f$?
Outline
Part 0: Complexity of problems
Part I: Complexity lower bounds (general lower bounds; super-efficiency)
Part II: Toward achievability?
Part III: Problem geometry and dimensionality
Problem Complexity

Large literature on guarantees of optimality:
- Wald 1939, Contributions to the theory of statistical estimation and testing hypotheses (minimax complexity)
- Nemirovski and Yudin 1983, Problem Complexity and Method Efficiency in Optimization (information-based complexity)

Three main considerations:
i. Information oracle (how we get information on the problem)
ii. Problem class (what problems the algorithm must solve)
iii. How to measure error
Minimax error and oracles

Information oracle: how the algorithm/procedure receives information about the problem.
- Optimization: function value $f(x)$ (zero-order), gradient $\nabla f(x)$ (first-order), Hessian $\nabla^2 f(x)$ (second-order)
- Statistics: observations $\xi_i$ from a probability distribution $P$

Minimax principle: develop an algorithm that has the best worst-case performance (risk) for a problem class $\mathcal{F}$:

$R_N(\mathcal{F}) := \inf_{A \in \mathcal{A}_N} \sup_{f \in \mathcal{F}} \mathrm{error}(A, f)$,

where $\mathcal{A}_N$ is the set of algorithms making $N$ queries and $\mathcal{F}$ is the problem class.
Example

- $\mathcal{F}$ consists of 1-Lipschitz convex functions on $R^d$
- Oracle returns $\nabla f(x) + \varepsilon$, where $E[\varepsilon] = 0$ and $\|\varepsilon\| \le 1$
- Domain $X$ contains an $\ell_\infty$-ball

Minimax rate: let $\hat{x}_N$ be the output of algorithm $A$ after $N$ steps; then

$\inf_{A \in \mathcal{A}_N} \sup_{f \in \mathcal{F}} E[f(\hat{x}_N) - f(x^*)] \gtrsim \sqrt{d/N}$

(Agarwal et al. 12, Nemirovski & Yudin 83).

More general optimality guarantee: for $L$-Lipschitz convex functions on sets $X$ with diameter $D$, the minimax rate is $LD/\sqrt{N}$.
An optimal? algorithm

Algorithm: at iteration $t$,
- Choose random $\xi_t$, set $g_t = \nabla F(x_t; \xi_t)$
- Update $x_{t+1} = x_t - \alpha_t g_t$

Theorem (Russians / Hungarians): Let $\bar{x}_N = \frac{1}{N} \sum_{t=1}^N x_t$ and assume $D \ge \|x^* - x_1\|_2$ and $L^2 \ge E[\|g_t\|_2^2]$. Then

$E[f(\bar{x}_N)] - f(x^*) \le \frac{LD}{\sqrt{N}}$.
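The averaged stochastic gradient method above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the quadratic objective, noise level, and $1/\sqrt{t}$ stepsize constant are illustrative choices, and a Euclidean projection is added to respect the constraint set.

```python
import random

def projected_sgd(grad_sample, project, x0, num_steps, step_size):
    """Averaged projected stochastic (sub)gradient descent.

    grad_sample(x) returns an unbiased stochastic gradient at x;
    project(x) is Euclidean projection onto the constraint set X;
    step_size(t) is the stepsize alpha_t. Returns the running average
    of the iterates (the theorem's x-bar).
    """
    x = x0
    avg = x0
    for t in range(1, num_steps + 1):
        g = grad_sample(x)
        x = project(x - step_size(t) * g)
        avg = avg + (x - avg) / t  # running mean of the iterates
    return avg

# Example: minimize f(x) = x^2 / 2 over [-1, 1] with N(0, 1) gradient noise.
random.seed(0)
grad = lambda x: x + random.gauss(0.0, 1.0)
proj = lambda x: max(-1.0, min(1.0, x))
x_bar = projected_sgd(grad, proj, x0=1.0, num_steps=20000,
                      step_size=lambda t: 1.0 / t ** 0.5)
```

With $N = 20000$ queries, the averaged iterate should land near the minimizer $x^* = 0$, consistent with the $LD/\sqrt{N}$ guarantee.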
Stochastic gradient descent

Let's use stochastic gradient descent to solve

minimize $f(x) = \frac{1}{2} x^2$ subject to $x \in [-1, 1]$.

[Figure: risk versus iteration $t$ for stochastic gradient descent with $1/\sqrt{t}$-steps and $1/t$-steps.]
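The comparison in the figure can be reproduced in simulation. A small sketch, with the caveat that the constants are my own: the $1/t$ stepsize below is deliberately mis-scaled (constant $0.05$) to illustrate the standard phenomenon that $1/\sqrt{t}$ steps are robust while $1/t$ steps stall badly when their constant is chosen wrong.

```python
import random

def sgd_avg(step_size, num_steps, sigma=0.5, x0=1.0, seed=0):
    """Averaged SGD on f(x) = x^2/2 over [-1, 1] with Gaussian gradient noise."""
    rng = random.Random(seed)
    x, avg = x0, 0.0
    for t in range(1, num_steps + 1):
        g = x + rng.gauss(0.0, sigma)            # noisy gradient of x^2/2
        x = max(-1.0, min(1.0, x - step_size(t) * g))
        avg += (x - avg) / t                     # running mean of iterates
    return avg

err_sqrt = abs(sgd_avg(lambda t: 1.0 / t ** 0.5, 5000))  # robust 1/sqrt(t) steps
err_poly = abs(sgd_avg(lambda t: 0.05 / t, 5000))        # mis-scaled 1/t steps
```

On this run the mis-scaled $1/t$ schedule barely moves the iterate from $x_0 = 1$, while the $1/\sqrt{t}$ schedule converges; this is the adaptivity gap the talk is probing.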
A local notion of complexity

Minimax complexity:

$R_N(\mathcal{F}) := \inf_{A \in \mathcal{A}_N} \sup_{f \in \mathcal{F}} \mathrm{error}(A, f)$

Local minimax complexity: fix a function $f$, and look for the hardest local alternative:

$R_N(f; \mathcal{F}) := \sup_{g \in \mathcal{F}} \inf_{A \in \mathcal{A}_N} \max\{\mathrm{error}(A, f), \mathrm{error}(A, g)\}$

(Related ideas in statistics: Donoho & Liu 1987, 1991; Cai & Low 2015.)
Local minimax complexity for stochastic optimization

- Noisy subgradient oracle: $\xi \sim N(0, \sigma^2 I_{d \times d})$; return $f'(x) + \xi$
- Function class $\mathcal{F}$: convex functions (can restrict)
- Algorithms $\mathcal{A}_N$: all algorithms making $N$ noisy subgradient queries
- Error metric $\mathrm{err} : X \times \mathcal{F} \to R$

Local minimax complexity ($\hat{x}_A$ is the output of algorithm $A$):

$R_N(f; \mathcal{F}) := \sup_{g \in \mathcal{F}} \inf_{A \in \mathcal{A}_N} \max\{E_f[\mathrm{err}(\hat{x}_A, f)],\, E_g[\mathrm{err}(\hat{x}_A, g)]\}$.
Distances on functions and moduli of continuity

Distance for solutions: the error must satisfy the exclusion inequality

$\mathrm{err}(x, f) < d(f, g)$ implies $\mathrm{err}(x, g) \ge d(f, g)$.

Example: how well one or the other function can be optimized, with error $\mathrm{err}(x, f) = f(x) - f^*$:

$d(f_0, f_1) := \sup\bigl\{\delta \ge 0 : f_1(x) \le f_1^* + \delta \text{ implies } f_0(x) \ge f_0^* + \delta, \text{ and } f_0(x) \le f_0^* + \delta \text{ implies } f_1(x) \ge f_1^* + \delta\bigr\}$.
Separation

[Figure: functions $f_0$ and $f_1$ with separation $\delta = d(f_0, f_1)$; the near-optimal set $\{x : f_1(x) \le f_1^* + \delta\}$ for $f_1$ is $\delta$-suboptimal for $f_0$.]
Other error metrics

Distance for solutions: the error must satisfy the exclusion inequality

$\mathrm{err}(x, f) < d(f, g)$ implies $\mathrm{err}(x, g) \ge d(f, g)$.

Example: distance to optimality. With $X^*_f := \mathrm{argmin}_{x \in X} f(x)$ and $X^*_g := \mathrm{argmin}_{x \in X} g(x)$, take

$d(f, g) = \mathrm{dist}(X^*_f, X^*_g)$.
Distances on functions

We study first-order methods, so define

$\kappa(f, g) := \sup_{x \in X} \|f'(x) - g'(x)\|$.
Modulus of continuity

Given a solution metric $d$ and a function metric $\kappa$, the modulus of continuity of $d$ with respect to $\kappa$ at $f$ is

$\omega_f(\epsilon) := \sup_g \{d(f, g) : \kappa(f, g) \le \epsilon\}$.

Examples, with $d(f, g) = |x^*_f - x^*_g|$:
- Quadratic $f(x) = \frac{1}{2} x^2$: $\omega_f(\epsilon) = \epsilon$
- Power $f(x) = \frac{1}{k} |x|^k$: $\omega_f(\epsilon) = \epsilon^{1/(k-1)}$
Illustration of the modulus of continuity

If $X \subseteq R$,

$\omega_f(\epsilon) = \sup\{|x - x^*_f| : |f'(x)| \le \epsilon\}$.
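The sup-over-a-flat-set formula makes the power-function moduli above easy to check numerically. A small sketch, assuming minimizer $x^* = 0$; the grid and tolerance are illustrative choices:

```python
def modulus(f_prime, eps, xs):
    """omega_f(eps) = sup{|x - x*| : |f'(x)| <= eps}, with minimizer x* = 0."""
    return max((abs(x) for x in xs if abs(f_prime(x)) <= eps), default=0.0)

grid = [i / 10000.0 for i in range(-10000, 10001)]  # grid on X = [-1, 1]
results = {}
for k in (2, 3, 4):
    # f(x) = |x|^k / k, so f'(x) = sign(x) * |x|^(k-1)
    fp = lambda x, k=k: (1.0 if x >= 0 else -1.0) * abs(x) ** (k - 1)
    results[k] = modulus(fp, 0.01, grid)
```

The grid values should match the closed form $\omega_f(\epsilon) = \epsilon^{1/(k-1)}$: for $\epsilon = 0.01$ this gives $0.01$, $0.1$, and $0.01^{1/3} \approx 0.215$ for $k = 2, 3, 4$. (The `k=k` default argument pins each exponent inside the loop, avoiding Python's late-binding closure pitfall.)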
Main Theorem I

Theorem (Chatterjee, D., Lafferty, Zhu): Under Gaussian $\sigma^2$-noise, for (almost) any $f$,

$c\, \omega_f\!\left(\frac{\sigma}{\sqrt{N}}\right) \le R_N(f; \mathcal{F}) \le C\, \omega_f\!\left(\frac{\sigma}{\sqrt{N}}\right)$.

The modulus of continuity precisely determines the rate of convergence.
Proof intuition

If $P_f$ is the distribution of the observations when $f$ is true and $P_g$ the distribution when $g$ is true, then

$\max\{E_f[\mathrm{err}(\hat{x}, f)], E_g[\mathrm{err}(\hat{x}, g)]\} \ge \frac{1}{2} E_f[\mathrm{err}(\hat{x}, f)] + \frac{1}{2} E_g[\mathrm{err}(\hat{x}, g)]$

$\ge \frac{d(f, g)}{2}\bigl[1 + P_g(\mathrm{err}(\hat{x}, g) \ge d(f, g)) - P_f(\mathrm{err}(\hat{x}, g) \ge d(f, g))\bigr]$,

where the second step uses Markov's inequality together with the exclusion inequality.
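A hedged sketch of how this intuition is usually completed (standard Le Cam two-point reasoning with constants suppressed; these steps are implicit in the slide): the bracket is lowered to a total-variation distance, total variation is controlled by KL divergence via Pinsker's inequality, and the $N$ Gaussian gradient queries bound the KL divergence in terms of $\kappa(f, g)$:

```latex
\max\{E_f[\mathrm{err}(\hat{x},f)],\,E_g[\mathrm{err}(\hat{x},g)]\}
  \ge \frac{d(f,g)}{2}\bigl[1 - \|P_f - P_g\|_{\mathrm{TV}}\bigr]
  \ge \frac{d(f,g)}{2}\Bigl[1 - \sqrt{\tfrac{1}{2} D_{\mathrm{kl}}(P_f \,\|\, P_g)}\Bigr],
\qquad
D_{\mathrm{kl}}(P_f \,\|\, P_g) \le \frac{N \, \kappa(f,g)^2}{2\sigma^2}.
```

Choosing an alternative $g$ with $\kappa(f, g) \asymp \sigma/\sqrt{N}$ keeps the bracket bounded below by a constant, and maximizing $d(f, g)$ subject to that constraint yields the lower bound $c\, \omega_f(\sigma/\sqrt{N})$.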
Is this real?

$c\, \omega_f\!\left(\frac{\sigma}{\sqrt{N}}\right) \le R_N(f; \mathcal{F}) \le C\, \omega_f\!\left(\frac{\sigma}{\sqrt{N}}\right)$.

- Is it possible to have a faster algorithm? (Were we too adversarial?)
- Is it too easy? (Were we not adversarial enough?)
You probably cannot be faster

Theorem (Chatterjee, D., Lafferty, Zhu): Let an algorithm $\hat{x}_N$ satisfy

$E_f[\mathrm{err}(\hat{x}_N, f)] \le \delta\, \omega_f\!\left(\frac{\sigma}{\sqrt{N}}\right)$.

Then for $\epsilon_N = \frac{\sigma \sqrt{\log\frac{1}{\delta}}}{\sqrt{N}}$ and $g$ one of the tilted functions $g_{-1}(x) = f(x) - \epsilon_N x$ and $g_{+1}(x) = f(x) + \epsilon_N x$,

$E_g[\mathrm{err}(\hat{x}_N, g)] \ge c\, \omega_g\!\left(\frac{\sigma \sqrt{\log\frac{1}{\delta}}}{\sqrt{N}}\right)$.
Is the local problem too easy? Not in 1 dimension!

Sign-based binary search: from the interval $[a_0, b_0]$, for $k = 1, 2, \ldots$:
- Query $T$ points at $x_k = \frac{1}{2}(a + b)$ for gradients $G_1, \ldots, G_T$
- If $\sum_{t=1}^T G_t > 0$, set $b = x_k$; otherwise set $a = x_k$

Proposition (Chatterjee, D., Lafferty, Zhu): After $k$ epochs, define

$I_k := \Bigl\{x : |f'(x)| \le \sigma \sqrt{\tfrac{\log(k/\delta)}{T}}\Bigr\}$.

Then with probability $1 - \delta$, $\mathrm{dist}(x_k, I_k) \le 2^{-k}(b_0 - a_0)$, and the procedure attains error $\tilde{O}(1)\, \omega_f\!\left(\frac{\sigma}{\sqrt{N}}\right)$.
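The sign-based binary search is a few lines of code. A minimal sketch, assuming a quadratic test objective and noise level of my own choosing (the talk's guarantee says the output lands near the flat set $I_k$, here a neighborhood of the minimizer):

```python
import random

def sign_binary_search(grad_sample, a, b, epochs, queries_per_epoch):
    """Bisect on the sign of the averaged noisy gradient at the midpoint."""
    for _ in range(epochs):
        mid = 0.5 * (a + b)
        total = sum(grad_sample(mid) for _ in range(queries_per_epoch))
        if total > 0:
            b = mid  # positive gradient: the minimizer lies to the left
        else:
            a = mid  # otherwise it lies to the right
    return 0.5 * (a + b)

# f(x) = x^2/2 on [-1, 1], gradient corrupted by N(0, 1) noise (sigma = 1).
random.seed(1)
grad = lambda x: x + random.gauss(0.0, 1.0)
x_hat = sign_binary_search(grad, -1.0, 1.0, epochs=20, queries_per_epoch=2000)
```

Once the bracket is inside the flat set $\{x : |f'(x)| \lesssim \sigma/\sqrt{T}\}$, the averaged gradient sign becomes uninformative, but by then every point in the bracket is already near-optimal; so the returned midpoint sits close to $x^* = 0$.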
Intuition

Recall the flat set $\{x : |f'(x)| \le \epsilon\}$.
Simulations

[Figure: simulated risk (roughly 5e-06 to 5e-03, log scale) versus iteration $t$, for objectives with power growth exponents $(k_l, k_r)$, e.g. $k_l = 1.5, k_r = 2$; $k_l = 1.5, k_r = 3$; $k_l = 2$.]
More achievability

Problem: in higher dimensions, we have some issues.

Note: an epoch-based stochastic gradient descent procedure of Juditsky and Nesterov (2014) is nearly adaptive (result coming soon).

Intuition: for $k = 1, 2, \ldots$:
- Perform stochastic gradient descent for $T_k \approx 2^k$ iterations with constant stepsize $\alpha_k$
- Halve the stepsize, double the iteration length ($T_{k+1} = 2 T_k$), and run again

One of these will be the correct stepsize, and everything later will not allow any movement anyway.
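The epoch-doubling intuition can be sketched directly. This is an illustrative toy, not the Juditsky–Nesterov procedure itself: within each epoch it runs constant-stepsize SGD, then restarts from the epoch average with the stepsize halved and the epoch length doubled; the one-dimensional objective, noise level, and initial stepsize are my own choices.

```python
import random

def epoch_sgd(grad_sample, project, x0, alpha0, num_epochs, t0=100):
    """Epoch SGD: constant stepsize within an epoch; between epochs, halve
    the stepsize, double the epoch length, restart from the epoch average."""
    x, alpha, T = x0, alpha0, t0
    for _ in range(num_epochs):
        avg, cur = 0.0, x
        for t in range(1, T + 1):
            cur = project(cur - alpha * grad_sample(cur))
            avg += (cur - avg) / t   # running mean over this epoch
        x = avg                      # restart from the epoch average
        alpha *= 0.5
        T *= 2
    return x

# f(x) = x^2/2 on [-1, 1] with N(0, 1) gradient noise.
random.seed(2)
grad = lambda x: x + random.gauss(0.0, 1.0)
proj = lambda x: max(-1.0, min(1.0, x))
x_hat = epoch_sgd(grad, proj, x0=1.0, alpha0=0.5, num_epochs=8)
```

The point of the schedule: whichever epoch happens to use the "correct" stepsize does the real work, and the later, smaller stepsizes cannot move the iterate far from the answer it has found.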
Part III: Problem Geometry

Coming soon... just a handwritten teaser if time.
Summary

- Local notions of minimax risk: worst single alternative; shrinking neighborhoods (suitably or poorly defined)
- Some optimal algorithms, but work to be done!

Some here; more soon...
Reference: Sabyasachi Chatterjee, John Duchi, John Lafferty, and Yuancheng Zhu. Local Minimax Complexity of Stochastic Convex Optimization. arXiv:1605.07596 [stat.ML], 2016.
More informationADMM and Fast Gradient Methods for Distributed Optimization
ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More information6. Proximal gradient method
L. Vandenberghe EE236C (Spring 2016) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping
More informationStatistical Sparse Online Regression: A Diffusion Approximation Perspective
Statistical Sparse Online Regression: A Diffusion Approximation Perspective Jianqing Fan Wenyan Gong Chris Junchi Li Qiang Sun Princeton University Princeton University Princeton University University
More information10. Unconstrained minimization
Convex Optimization Boyd & Vandenberghe 10. Unconstrained minimization terminology and assumptions gradient descent method steepest descent method Newton s method self-concordant functions implementation
More informationConstrained Optimization and Lagrangian Duality
CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may
More informationGradient Sliding for Composite Optimization
Noname manuscript No. (will be inserted by the editor) Gradient Sliding for Composite Optimization Guanghui Lan the date of receipt and acceptance should be inserted later Abstract We consider in this
More informationA New Look at the Performance Analysis of First-Order Methods
A New Look at the Performance Analysis of First-Order Methods Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Yoel Drori, Google s R&D Center, Tel Aviv Optimization without
More informationIFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent
IFT 6085 - Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s):
More informationOnline Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems
216 IEEE 55th Conference on Decision and Control (CDC) ARIA Resort & Casino December 12-14, 216, Las Vegas, USA Online Optimization in Dynamic Environments: Improved Regret Rates for Strongly Convex Problems
More informationDesign and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall Nov 2 Dec 2016
Design and Analysis of Algorithms Lecture Notes on Convex Optimization CS 6820, Fall 206 2 Nov 2 Dec 206 Let D be a convex subset of R n. A function f : D R is convex if it satisfies f(tx + ( t)y) tf(x)
More informationORIE 6326: Convex Optimization. Quasi-Newton Methods
ORIE 6326: Convex Optimization Quasi-Newton Methods Professor Udell Operations Research and Information Engineering Cornell April 10, 2017 Slides on steepest descent and analysis of Newton s method adapted
More informationOracle Complexity of Second-Order Methods for Smooth Convex Optimization
racle Complexity of Second-rder Methods for Smooth Convex ptimization Yossi Arjevani had Shamir Ron Shiff Weizmann Institute of Science Rehovot 7610001 Israel Abstract yossi.arjevani@weizmann.ac.il ohad.shamir@weizmann.ac.il
More information