Minimax rates for Batched Stochastic Optimization
Minimax rates for Batched Stochastic Optimization. John Duchi, based on joint work with Feng Ruan and Chulhee Yun. Stanford University.
Tradeoffs

A major problem in theoretical statistics: how do we characterize statistical optimality for problems with constraints?
- Computational [Berthet & Rigollet 13, Ma & Wu 15, Brennan et al. 18, Feldman et al. 18]
- Privacy [Dwork et al. 06, Hardt & Talwar 09, Duchi et al. 13]
- Robustness [Huber 81, Hardt & Moitra 13, Diakonikolas et al. 16]
- Memory / communication [Duchi et al. 14, Braverman et al. 15, Steinhardt & Duchi 16]

[Figure: curves of statistical performance against data set size n, at varying levels of some other resource: computation, energy use, or disclosure risk (privacy).]
Problem Setting

Minimize f(x), where f is convex, given mean-zero noisy gradient information g = ∇f(x) + ε. What is the computational complexity for these problems?
Stochastic Gradient Methods

Iterate (for k = 1, 2, ...):
  g_k = ∇f(x_k) + ε_k
  x_{k+1} = x_k − α_k g_k

[Figure: a sequential chain x_1 → x_2 → ... → x_9, each step consuming one noisy gradient g_k.]

Theorem (Nemirovski & Yudin 83; Nemirovski et al. 09; Agarwal et al. 11). After k iterations, we have the (optimal) convergence E[f(x_k)] − f* ≲ 1/√k.

Parallelization and interactivity?

The sequential scheme requires many iterations and lots of interaction, and admits no parallelism. [Figure: the same nine gradients, queried in parallel batches at only three points x_1, x_2, x_3.] Can we trade off breadth for depth?
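The sequential update above is easy to simulate. A minimal sketch (not from the talk; the quadratic objective, step-size constant, and averaging scheme are illustrative choices of mine) showing the error of averaged SGD shrinking as the iteration count grows:

```python
import random

def sgd_average(grad, x0, steps, noise_sd=1.0, seed=0):
    """SGD with step sizes a_k = 0.1/sqrt(k) and Polyak-Ruppert
    averaging: x_{k+1} = x_k - a_k (grad(x_k) + noise); return the
    running average of the iterates."""
    rng = random.Random(seed)
    x, avg = x0, 0.0
    for k in range(1, steps + 1):
        g = grad(x) + rng.gauss(0.0, noise_sd)   # noisy gradient oracle
        x -= (0.1 / k ** 0.5) * g
        avg += (x - avg) / k                     # running iterate average
    return avg

# f(x) = x^2 / 2 with minimizer x* = 0 and f* = 0
err = lambda steps: 0.5 * sgd_average(lambda x: x, 5.0, steps) ** 2
```

With 100x more iterations the optimality gap drops by orders of magnitude, consistent with the 1/√k rate for the distance and hence 1/k-to-1/√k behavior for the function gap, depending on curvature.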
Batched optimization?

Medical trials [Perchet et al. 16, Hardwick & Stout 02, Stein 45]. Ideal: get a patient, give a treatment, observe the outcome, adapt, repeat. [Figure: a timeline of one-year observation periods stacked sequentially, totaling 100s of years.] Reality: [figure].
19 Local Differential Privacy
Problem Statement

Problem: given M rounds of adaptation and n gradient computations, what is the optimal error in optimization?

A_{M,n} = algorithms with M rounds of computation and n (noisy) gradient computations
F = function class of interest

Study the minimax optimization error
  M_{M,n}(F) := inf_{x̂ ∈ A_{M,n}} sup_{f ∈ F} { E[f(x̂)] − f* }.
Background and hopes

[Perchet et al. 18] Batched bandits: for the 2-armed bandit, optimal regret is achievable with M = O(log log n) rounds.

[Nemirovski et al. 09, Ghadimi & Lan 12] Stochastic strongly convex optimization:
  f(x̂) − f* ≲ ‖x_0 − x*‖² exp(−M/√Cond(f)) + Var(ε)/n,
or M ≳ √Cond(f) · log n rounds.

[Smith et al. 17] To solve convex optimization to accuracy ε, need M ≳ log(1/ε) rounds.
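The strongly convex bound above already pins down the round count: the optimization term exp(−M/√Cond(f)) must be driven below the sampling term Var(ε)/n. A small sketch under those assumptions (the function name and default constants are mine, not the talk's):

```python
import math

def rounds_needed(cond, n, e0=1.0, var=1.0):
    """Smallest M with e0 * exp(-M / sqrt(cond)) <= var / n, i.e. the
    number of rounds until the optimization term stops dominating the
    sampling term: M = ceil(sqrt(cond) * log(e0 * n / var))."""
    return math.ceil(math.sqrt(cond) * math.log(e0 * n / var))

# condition number 100, n = 10**6 gradient computations
M = rounds_needed(100, 10**6)   # ~ sqrt(Cond(f)) * log n rounds
```

Even a moderately conditioned problem needs M growing like log n, which is what the batched theory below improves to log log n.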
Main Results

F_{H,λ} := { λ-strongly convex, H-smooth f }

Theorem (D., Ruan, Yun 18). M_{M,n}(F_{H,λ}) ≥ C(d,n) · n^(−1), where C(d,n) ≈ poly(n)^((d/(d+2))^M).

Theorem (D., Ruan, Yun 18). If M ≤ (d/2) log log n, then there is an algorithm such that
  P( f(x̂) − f* ≥ C n^(−(1−(d/(d+2))^M)) log n ) → 0.
Achievability

For convex functions f, x* ∈ { y : ⟨∇f(x), y − x⟩ ≤ 0 }.
Achievability: maintain a feasible box B_t = c_t + [−r_t, r_t]^d with center c_t. At round t, take points x_i ∈ B_t, i = 1, ..., m, and get parallel (noisy) gradients ∇̂f(x_1), ..., ∇̂f(x_m). With high probability,
  x* ∈ { y : ⟨∇̂f(x_i), y − x_i⟩ ≤ ε‖y − x_i‖ }.
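In one dimension, the feasible-region idea collapses to interval bisection with batched gradients. A caricature (my own sketch, not the paper's algorithm; d = 1, with one query point per round rather than a grid, and all parameter values illustrative):

```python
import random

def batched_bisect(grad, lo, hi, rounds, m, noise_sd=0.5, seed=0):
    """Each round spends m parallel noisy gradient queries at the
    interval midpoint, then keeps the half-interval the averaged
    gradient indicates (for convex f, the minimizer lies on the side
    the gradient points away from)."""
    rng = random.Random(seed)
    for _ in range(rounds):
        c = (lo + hi) / 2.0
        gbar = sum(grad(c) + rng.gauss(0.0, noise_sd) for _ in range(m)) / m
        if gbar > 0:
            hi = c          # positive slope: minimizer to the left
        else:
            lo = c          # nonpositive slope: minimizer to the right
    return (lo + hi) / 2.0

# f(x) = (x - 0.3)^2 on [-1, 1], so grad(x) = 2 (x - 0.3)
x_hat = batched_bisect(lambda x: 2 * (x - 0.3), -1.0, 1.0, rounds=12, m=500)
```

Averaging m noisy gradients per round makes the sign of the gradient reliable, so the feasible interval shrinks geometrically even though only 12 rounds of interaction are used.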
Recursive shrinking

The box radius decreases as r_t ≤ δ · r_{t−1}^κ, with κ = d/(d+2) and, for us, δ = n^(−1/(d+2)). Recursively (taking r_0 = 1),
  r_t ≤ δ^(Σ_{j=0}^{t−1} κ^j) = n^(−(1−κ^t)/2).
[Figure: nested boxes of shrinking radii r_1 > r_2 > r_3.]
The squared radius satisfies r_t² ≲ 1/n iff κ^t ≲ 1/log n. Solution:
  t ≳ log log n / log(1/κ) = log log n / log(1 + 2/d).
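The log log n round count can be checked numerically. A sketch iterating the radius recursion under the assumptions above (the stopping constant 2 is an arbitrary choice of mine):

```python
import math

def rounds_until_optimal(n, d, factor=2.0):
    """Iterate r_t = n**(-1/(d+2)) * r_{t-1}**(d/(d+2)) from r_0 = 1
    and count rounds until r_t is within `factor` of the fully
    interactive radius n**(-1/2)."""
    kappa = d / (d + 2)
    r, t = 1.0, 0
    while r > factor * n ** -0.5:
        r = n ** (-1.0 / (d + 2)) * r ** kappa
        t += 1
    return t

t_star = rounds_until_optimal(n=10**8, d=4)
# closed form r_t = n**(-(1 - kappa**t)/2) predicts
# t ~ log log n / log(1/kappa) = log log n / log(1 + 2/d) ~ 7.2 here
```

For n = 10^8 and d = 4 the iteration stops after 7 rounds, matching the log log n / log(1 + 2/d) prediction.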
Main Results (recap of the two theorems above).
How do we prove lower bounds?

1. Pick two functions f_0, f_1, separated by dist(f_0, f_1), where optimizing one means not optimizing the other [Agarwal et al. 13, Duchi 17].
2. Make the information available to the algorithm to distinguish them small.

Then
  inf_{x̂} max_{v ∈ {0,1}} E[f_v(x̂) − f_v*] ≥ (dist(f_0, f_1)/2) · inf_{Alg A} P(A fails to distinguish f_0, f_1),
where the infimum over algorithms equals 1 − ‖P_0 − P_1‖_TV.
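The distinguishing probability is controlled by total variation. A toy computation (my own illustration, not from the talk): the TV distance between n flips of a fair coin and a 0.55-coin, the quantity that governs the best test's error in a two-point problem like the one above:

```python
import math

def binom_pmf(n, p, k):
    """Binomial(n, p) pmf at k, computed in log space via lgamma to
    avoid overflow for large n."""
    logp = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(logp)

def tv_binomial(n, p0, p1):
    """Total variation between Binomial(n, p0) and Binomial(n, p1).
    Small TV means every algorithm fails to distinguish the two
    hypotheses with large probability."""
    return 0.5 * sum(abs(binom_pmf(n, p0, k) - binom_pmf(n, p1, k))
                     for k in range(n + 1))

tv_small = tv_binomial(20, 0.5, 0.55)     # few samples: nearly indistinguishable
tv_large = tv_binomial(2000, 0.5, 0.55)   # many samples: TV approaches 1
```

With 20 samples the TV distance is small, so the two-point bound forces a large optimization error; with 2000 samples the hypotheses become nearly separable and the bound weakens, exactly the sample-size dependence the recursion below exploits.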
Lower bound: recursive packing

[Figure: a tree of nested packings, depth M.]
U^(1) = δ_1-packing of the initial set; U_u^(t) = δ_t-packing of the ball centered at u.
Idea: 1. define functions recursively on the balls; 2. optimizing well means identifying the right ball.
Function constructions

Index functions by a path down the tree, u_{1:M} = (u_1, ..., u_M):
  f^(1)_{u_{1:M}}(x) = f^(0)_{u_{1:M}}(x) for x ∉ u_M + δ_M B,
  f^(±1)_{u_{1:M}}(x) = f^(±1)_{u_{1:t}, ũ_{t+1:M}}(x) for x ∉ u_t + δ_t B,
that is, functions whose paths agree through level t coincide outside the level-t ball. Optimizing well means identifying the sequence u_1, u_2, u_3, ... defining the function.
Function construction

Recurse:
  f_{u_1}(x) = (1/2)‖x − u_1‖²
  f_{u_{1:t}}(x) = SmoothMax{ f_{u_{1:t−1}}(x), ‖x − u_t‖² + b_t }
[Figure: f(x) as a smoothed max{f_1(x), f_2(x)} of two quadratics f_1, f_2.]
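A standard way to realize a SmoothMax is a scaled log-sum-exp; a sketch under that assumption (the talk does not specify the smoothing, and the temperature, centers, and offset below are illustrative stand-ins for the packing points and b_t):

```python
import math

def smooth_max(values, tau=0.05):
    """Scaled log-sum-exp soft maximum: it satisfies
    max(values) <= smooth_max(values) <= max(values) + tau*log(len(values))
    and is smooth in its arguments."""
    m = max(values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))

def f_pair(x, u1=0.0, u2=1.0, b=0.2):
    """Caricature of one recursion step in one dimension: a smoothed
    max of the base quadratic centered at u1 and a shifted quadratic
    centered at u2 with offset b."""
    return smooth_max([0.5 * (x - u1) ** 2, (x - u2) ** 2 + b])
```

The tau*log(2) approximation gap is what lets the construction keep the pieces smooth while the hard max would introduce a kink.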
Information recursion

At each round t (of M):
  D_KL( ∇f_{u_{1:M}}(x) + ε ‖ ∇f_{u_{1:t}, ũ_{t+1:M}}(x) + ε ) ≲ δ_t².
Choose the radius so there is constant information per round:
  δ_t² · n · (# in packing)^(−1) ≍ 1, i.e. δ_t² n δ_t^d δ_{t−1}^(−d) ≍ 1.
Solving the recursion δ_t^(d+2) ≍ δ_{t−1}^d / n gives, for the lower bound,
  δ_M ≍ n^(−(1/(d+2)) Σ_{j=0}^{M−1} (d/(d+2))^j) = n^(−(1/2)(1−(d/(d+2))^M)),
so the minimax error is ≳ δ_M² = n^(−(1−(d/(d+2))^M)).
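The recursion and its closed-form solution can be sanity-checked numerically; a short sketch (the parameter values are arbitrary choices of mine):

```python
def delta_iter(n, d, M):
    """Iterate delta_t = (delta_{t-1}**d / n)**(1/(d+2)) from delta_0 = 1,
    the packing-radius recursion from the information argument."""
    delta = 1.0
    for _ in range(M):
        delta = (delta ** d / n) ** (1.0 / (d + 2))
    return delta

def delta_closed(n, d, M):
    """Closed form n**(-(1 - (d/(d+2))**M) / 2), obtained by summing
    the geometric series in the exponent."""
    kappa = d / (d + 2)
    return n ** (-0.5 * (1 - kappa ** M))
```

The iterate matches the closed form, and the squared radius delta_M**2 tends to the fully interactive rate 1/n as M grows, which is the n^(−(1−(d/(d+2))^M)) lower bound.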
Unconstrained minimization I terminology and assumptions I gradient descent method I steepest descent method I Newton s method I self-concordant functions I implementation IOE 611: Nonlinear Programming,
More informationStochastic Approximation: Mini-Batches, Optimistic Rates and Acceleration
Stochastic Approximation: Mini-Batches, Optimistic Rates and Acceleration Nati Srebro Toyota Technological Institute at Chicago a philanthropically endowed academic computer science institute dedicated
More informationand the polynomial-time Turing p reduction from approximate CVP to SVP given in [10], the present authors obtained a n=2-approximation algorithm that
Sampling short lattice vectors and the closest lattice vector problem Miklos Ajtai Ravi Kumar D. Sivakumar IBM Almaden Research Center 650 Harry Road, San Jose, CA 95120. fajtai, ravi, sivag@almaden.ibm.com
More informationA recursive model-based trust-region method for derivative-free bound-constrained optimization.
A recursive model-based trust-region method for derivative-free bound-constrained optimization. ANKE TRÖLTZSCH [CERFACS, TOULOUSE, FRANCE] JOINT WORK WITH: SERGE GRATTON [ENSEEIHT, TOULOUSE, FRANCE] PHILIPPE
More informationDeterminant maximization with linear. S. Boyd, L. Vandenberghe, S.-P. Wu. Information Systems Laboratory. Stanford University
Determinant maximization with linear matrix inequality constraints S. Boyd, L. Vandenberghe, S.-P. Wu Information Systems Laboratory Stanford University SCCM Seminar 5 February 1996 1 MAXDET problem denition
More informationBandits : optimality in exponential families
Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities
More information18.657: Mathematics of Machine Learning
8.657: Mathematics of Machine Learning Lecturer: Philippe Rigollet Lecture 3 Scribe: Mina Karzand Oct., 05 Previously, we analyzed the convergence of the projected gradient descent algorithm. We proved
More informationWhat Can We Learn Privately?
What Can We Learn Privately? Sofya Raskhodnikova Penn State University Joint work with Shiva Kasiviswanathan Homin Lee Kobbi Nissim Adam Smith Los Alamos UT Austin Ben Gurion Penn State To appear in SICOMP
More informationWHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????
DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0
More informationThe geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan
The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm
More informationLOWELL WEEKLY JOURNAL. ^Jberxy and (Jmott Oao M d Ccmsparftble. %m >ai ruv GEEAT INDUSTRIES
? (») /»» 9 F ( ) / ) /»F»»»»»# F??»»» Q ( ( »»» < 3»» /» > > } > Q ( Q > Z F 5
More informationFundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation
Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation Ohad Shamir Weizmann Institute of Science ohad.shamir@weizmann.ac.il Abstract Many machine learning approaches
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationarxiv: v2 [math.oc] 8 Oct 2011
Stochastic convex optimization with bandit feedback Alekh Agarwal Dean P. Foster Daniel Hsu Sham M. Kakade, Alexander Rakhlin Department of EECS Department of Statistics Microsoft Research University of
More informationUncertainty. Jayakrishnan Unnikrishnan. CSL June PhD Defense ECE Department
Decision-Making under Statistical Uncertainty Jayakrishnan Unnikrishnan PhD Defense ECE Department University of Illinois at Urbana-Champaign CSL 141 12 June 2010 Statistical Decision-Making Relevant in
More informationAn Evolving Gradient Resampling Method for Machine Learning. Jorge Nocedal
An Evolving Gradient Resampling Method for Machine Learning Jorge Nocedal Northwestern University NIPS, Montreal 2015 1 Collaborators Figen Oztoprak Stefan Solntsev Richard Byrd 2 Outline 1. How to improve
More informationBandits for Online Optimization
Bandits for Online Optimization Nicolò Cesa-Bianchi Università degli Studi di Milano N. Cesa-Bianchi (UNIMI) Bandits for Online Optimization 1 / 16 The multiarmed bandit problem... K slot machines Each
More informationPrivate Empirical Risk Minimization, Revisited
Private Empirical Risk Minimization, Revisited Raef Bassily Adam Smith Abhradeep Thakurta April 10, 2014 Abstract In this paper, we initiate a systematic investigation of differentially private algorithms
More information10. Ellipsoid method
10. Ellipsoid method EE236C (Spring 2008-09) ellipsoid method convergence proof inequality constraints 10 1 Ellipsoid method history developed by Shor, Nemirovski, Yudin in 1970s used in 1979 by Khachian
More informationOptimization, Learning, and Games with Predictable Sequences
Optimization, Learning, and Games with Predictable Sequences Alexander Rakhlin University of Pennsylvania Karthik Sridharan University of Pennsylvania Abstract We provide several applications of Optimistic
More informationStatistical learning theory, Support vector machines, and Bioinformatics
1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.
More information