Stochastic Composition Optimization


1 Stochastic Composition Optimization: Algorithms and Sample Complexities
Mengdi Wang
Joint work with Ethan X. Fang, Han Liu, and Ji Liu
ICCOPT, Tokyo, August 8-11, 2016

2 Collaborators
M. Wang, X. Fang, and H. Liu. Stochastic Compositional Gradient Descent: Algorithms for Minimizing Compositions of Expected-Value Functions. Mathematical Programming, submitted in 2014, to appear.
M. Wang and J. Liu. Accelerating Stochastic Composition Optimization.
M. Wang and J. Liu. A Stochastic Compositional Subgradient Method Using Markov Samples.

3 Outline
1. Background: Why is SGD a good method?
2. A New Problem: Stochastic Composition Optimization
3. Stochastic Composition Algorithms: Convergence and Sample Complexity
4. Acceleration via Smoothing-Extrapolation

4 Background: Why is SGD a good method?

5 Background
Machine learning is optimization.
Learning from batch data:   min_{x ∈ R^d} (1/n) Σ_{i=1}^n l(x; A_i, b_i) + ρ(x)
Learning from online data:  min_{x ∈ R^d} E_{A,b}[l(x; A, b)] + ρ(x)
Both problems can be formulated as stochastic convex optimization:
    min_x E[f(x, ξ)]   (expectation over a batch data set or an unknown distribution)
A general framework that encompasses likelihood estimation, online learning, empirical risk minimization, multi-arm bandit, and online MDP.
Stochastic gradient descent (SGD) updates by taking sample gradients:
    x_{k+1} = x_k − α ∇f(x_k, ξ_k)
A special case of stochastic approximation, with a long history (Robbins and Monro, Kushner and Yin, Polyak and Juditsky, Benveniste et al., Ruszczyński, Borkar, Bertsekas and Tsitsiklis, and many more).
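The SGD update above can be sketched on a synthetic least-squares instance. This is a minimal illustration, not part of the talk; the data model, stepsize schedule, and all constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic instance (illustrative): l(x; a, b) = (a @ x - b)^2 / 2
d, n = 5, 2000
x_true = rng.normal(size=d)
A = rng.normal(size=(n, d))
b = A @ x_true + 0.1 * rng.normal(size=n)

x = np.zeros(d)
for k in range(1, 20_001):
    i = rng.integers(n)                  # draw one sample xi_k from the batch
    grad = (A[i] @ x - b[i]) * A[i]      # sample gradient of f(x, xi_k)
    x = x - (1.0 / (k + 100)) * grad     # diminishing stepsize alpha_k
```

Each iteration touches a single data point, which is what makes the method scalable to batch or streaming data.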

6 Background: Stochastic first-order methods
Stochastic gradient descent (SGD) updates by taking sample gradients:
    x_{k+1} = x_k − α ∇f(x_k, ξ_k)
(1,410,000 results on Google Scholar, and 24,400 since 2016!)
Why is SGD a good method in practice? When processing either batch or online data, a scalable algorithm needs to update using partial information (a small subset of all data).
Answer: We have no other choice.
Why is SGD a good method beyond practical reasons? SGD achieves optimal convergence after processing k samples:
    E[F(x_k) − F*] = O(1/√k) for convex minimization
    E[F(x_k) − F*] = O(1/k) for strongly convex minimization
(Nemirovski and Yudin 1983, Agarwal et al. 2012, Rakhlin et al. 2012, Ghadimi and Lan 2012, 2013, Shamir and Zhang 2013, and many more)
Beyond convexity: nearly optimal online PCA (Li, Wang, Liu, Zhang 2015).
Answer: Strong theoretical guarantees for data-driven problems.

7 A New Problem: Stochastic Composition Optimization

8 Stochastic Composition Optimization
Consider the problem
    min_{x ∈ X} F(x) := (f ∘ g)(x) = f(g(x)),
where the outer and inner functions are expected-value functions,
    f(y) = E[f_v(y)],   g(x) = E[g_w(x)],
with f: R^m → R, g: R^n → R^m, and X a closed and convex set in R^n.
We focus on the case where the overall problem is convex (for now).
No structural assumptions on f, g (they may be nonconvex/nonmonotone/nondifferentiable).
We may not know the distributions of v, w.

9 Expectation Minimization vs. Stochastic Composition Optimization
Recall the classical problem:
    min_{x ∈ X} E[f(x, ξ)]   (linear w.r.t. the distribution of ξ)
In stochastic composition optimization, the objective is no longer a linear functional of the (v, w) distribution:
    min_{x ∈ X} E[f_v(E[g_w(x)])]   (nonlinear w.r.t. the distribution of (v, w))
In the classical problem, nice properties come from linearity w.r.t. the data distribution. In stochastic composition optimization, they are all lost.
A little nonlinearity goes a long way.
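The loss of linearity can be seen numerically. For a hypothetical inner map g_w(x) = x + w with E[w] = 0 and outer f(y) = y², the plug-in value E[f(g_w(x))] differs from the compositional value f(E[g_w(x)]) by exactly the inner noise variance, which is why naively plugging inner samples into f gives a biased objective (and gradient):

```python
import numpy as np

rng = np.random.default_rng(1)
x = 2.0
w = rng.normal(size=200_000)       # inner noise samples, Var(w) = 1
g_samples = x + w                  # g_w(x) = x + w, so g(x) = E[g_w(x)] = x

f = lambda y: y ** 2               # outer function (deterministic here)

plug_in = np.mean(f(g_samples))    # E[f(g_w(x))]  -> x^2 + Var(w) = 5
composed = f(np.mean(g_samples))   # f(E[g_w(x)])  -> x^2          = 4

bias = plug_in - composed          # ~ 1, the variance of the inner noise
```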

10 Motivating Example: High-Dimensional Nonparametric Estimation
Sparse additive model (SpAM):
    y_i = Σ_{j=1}^d h_j(x_ij) + ε_i.
High-dimensional feature space with relatively few data samples: the number of features d far exceeds the sample size n (n ≪ d).
Optimization model for SpAM [1], an instance of min_x E[f_v(E[g_w(x)])]:
    min_{h_j ∈ H_j} E[ (Y − Σ_{j=1}^d h_j(X_j))^2 ] + λ Σ_{j=1}^d √(E[h_j^2(X_j)])
The term λ Σ_{j=1}^d √(E[h_j^2(X_j)]) induces sparsity in the feature space.
[1] P. Ravikumar, J. Lafferty, H. Liu, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society: Series B, 71(5), 2009.

11 Motivating Example: Risk-Averse Learning
Consider the mean-variance minimization problem
    min_x E_{a,b}[l(x; a, b)] + λ Var_{a,b}[l(x; a, b)].
Its batch version is
    min_x (1/N) Σ_{i=1}^N l(x; a_i, b_i) + (λ/N) Σ_{i=1}^N ( l(x; a_i, b_i) − (1/N) Σ_{j=1}^N l(x; a_j, b_j) )^2.
The variance Var[Z] = E[(Z − E[Z])^2] is a composition of two expected-value functions.
Many other risk functions are equivalent to compositions of multiple expected-value functions (Shapiro, Dentcheva, Ruszczyński 2014).
A central limit theorem for compositions of multiple smooth functions has been established for risk metrics (Dentcheva, Penev, Ruszczyński 2016).
There is no good way to optimize a risk-averse objective while learning from online data.
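A quick numerical check that the mean-variance objective really is a composition: take the (hypothetical) inner map g_w = (l_w, l_w²) and outer f(y₁, y₂) = y₁ + λ(y₂ − y₁²), since Var[Z] = E[Z²] − (E[Z])². The stand-in losses below are arbitrary illustrative data.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 0.5
losses = rng.exponential(size=100_000)   # stand-in for l(x; a_i, b_i) at a fixed x

# Inner map g_w = (l_w, l_w^2); outer f(y1, y2) = y1 + lam * (y2 - y1^2)
y = np.array([losses.mean(), (losses ** 2).mean()])   # g(x) = E[g_w(x)]
risk_composed = y[0] + lam * (y[1] - y[0] ** 2)

# Direct mean + lam * variance, for comparison
risk_direct = losses.mean() + lam * losses.var()
```

The two values agree exactly, confirming that the risk-averse objective fits the f(E[g_w(x)]) template.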

12 Motivating Example: Reinforcement Learning
On-policy reinforcement learning is to learn the value-per-state of a stochastic system. We want to solve a (huge) Bellman equation
    γ P^π V^π + r^π = V^π,
where P^π is the transition probability matrix and r^π is the reward vector, both unknown.
On-policy learning aims to solve the Bellman equation via blackbox simulation. It becomes a special stochastic composition optimization problem:
    min_x E[f_v(E[g_w(x)])]   ⇔   min_{x ∈ R^S} ‖E[A]x − E[b]‖^2,
where E[A] = I − γ P^π and E[b] = r^π.
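The reformulation can be sanity-checked on a small synthetic MDP (the transition matrix and rewards below are randomly generated for illustration): the minimizer of ‖E[A]x − E[b]‖² with E[A] = I − γP and E[b] = r is exactly the value function solving the Bellman equation.

```python
import numpy as np

rng = np.random.default_rng(3)
S, gamma = 4, 0.9
P = rng.dirichlet(np.ones(S), size=S)   # random transition matrix (rows sum to 1)
r = rng.uniform(size=S)                 # expected per-state rewards

A, b = np.eye(S) - gamma * P, r         # E[A] = I - gamma * P, E[b] = r

# The residual ||A x - b||^2 is minimized (at value 0) by the value function
V = np.linalg.solve(A, b)

assert np.allclose(gamma * P @ V + r, V)   # Bellman equation holds
```

In the stochastic composition setting, only noisy samples of A and b are available, so V must be found by a stochastic algorithm rather than a direct solve.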

13 Stochastic Composition Algorithms: Convergence and Sample Complexity

14 Problem Formulation
    min_{x ∈ X} { F(x) := E[f_v(E[g_w(x)])] }   (nonlinear w.r.t. the distribution of (v, w))
Sampling Oracle (SO). Upon query (x, y), the oracle returns:
- a noisy inner sample g_w(x) and its noisy subgradient ∇g_w(x);
- a noisy outer gradient ∇f_v(y).
Challenges:
- The stochastic gradient descent (SGD) method does not work, since an unbiased sample of the gradient ∇g(x_k)^T ∇f(g(x_k)) is not available.
- The Fenchel dual does not work except under rare conditions.
- Sample average approximation (SAA) is subject to the curse of dimensionality, and its sample complexity is unclear.

15 Basic Idea
Approximate the ideal update
    x_{k+1} = Π_X { x_k − α_k ∇g(x_k)^T ∇f(g(x_k)) }
by a quasi-gradient iteration using estimates of g(x_k).

Algorithm 1: Stochastic Compositional Gradient Descent (SCGD)
Require: x_0, z_0 ∈ R^n, y_0 ∈ R^m, SO, K, stepsizes {α_k} and {β_k}.
Ensure: {x_k}
for k = 1, ..., K do
    Query the SO and obtain g_{w_k}(x_k), ∇g_{w_k}(x_k), ∇f_{v_k}(y_{k+1}).
    Update by
        y_{k+1} = (1 − β_k) y_k + β_k g_{w_k}(x_k),
        x_{k+1} = Π_X { x_k − α_k ∇g_{w_k}(x_k)^T ∇f_{v_k}(y_{k+1}) }.
end for

Remarks:
- Each iteration makes simple updates by interacting with the SO.
- Scalable with large-scale batch data; can process streaming data points online.
- Considered for the first time by Ermoliev (1976) as a stochastic approximation method, without rate analysis.
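A minimal runnable sketch of SCGD on a one-dimensional toy instance: inner g_w(x) = x + w with E[w] = 0 (so g(x) = x, ∇g_w = 1) and outer f(y) = y², so F(x) = x² with minimizer x* = 0. The instance, the stepsize offset, and the constants are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)

x, y = 1.0, 0.0                      # x_0 and the auxiliary estimate y_0
for k in range(1, 50_001):
    # Strongly convex stepsizes alpha_k ~ 1/k, beta_k ~ k^(-2/3),
    # offset by 10 (illustrative) to tame the first few iterations
    alpha, beta = 1.0 / (k + 10), (k + 10) ** (-2.0 / 3.0)
    w = rng.normal()
    y = (1 - beta) * y + beta * (x + w)   # running estimate of g(x_k)
    x = x - alpha * (1.0 * 2.0 * y)       # grad g_w(x_k)^T grad f(y_{k+1}); X = R, no projection
```

The iterate drifts to a small neighborhood of x* = 0 even though no unbiased gradient of F is ever computed.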

16 Sample Complexity (Wang et al., 2016)
Under suitable conditions (inner function nonsmooth, outer function smooth) and X bounded, let the stepsizes be
    α_k = k^{−3/4},  β_k = k^{−1/2}.
Then, for k large enough,
    E[ F( (2/k) Σ_{t=k/2+1}^k x_t ) − F* ] = O(k^{−1/4}).
(Optimal rate: it matches the lower bound for stochastic programming.)

Sample Complexity in the Strongly Convex Case (Wang et al., 2016)
Under suitable conditions (inner function nonsmooth, outer function smooth), suppose that the compositional function F(·) is strongly convex, and let the stepsizes be
    α_k = 1/k,  β_k = k^{−2/3}.
Then, for k sufficiently large,
    E[ ‖x_k − x*‖^2 ] = O(k^{−2/3}).

17 Outline of Analysis
The auxiliary variable y_k is a running estimate of g(x_k) at biased query points x_0, ..., x_k:
    y_{k+1} = Σ_{t=0}^k β_t ( Π_{s=t+1}^k (1 − β_s) ) g_{w_t}(x_t).
Two entangled stochastic sequences:
    ε_k = ‖y_{k+1} − g(x_k)‖^2,   ξ_k = ‖x_k − x*‖^2.
Coupled supermartingale analysis:
    E[ε_{k+1} | F_k] ≤ (1 − β_k) ε_k + O(β_k^2) + O(‖x_{k+1} − x_k‖^2 / β_k),
    E[ξ_{k+1} | F_k] ≤ (1 + α_k^2) ξ_k − α_k (F(x_k) − F*) + O(β_k) ε_k.
Almost sure convergence follows from a Coupled Supermartingale Convergence Theorem (Wang and Bertsekas, 2013).
Convergence rates follow by optimizing over the stepsizes and balancing the noise-bias tradeoff.
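The unrolled weighted-average form of y_{k+1} can be checked numerically against the recursion y_{k+1} = (1 − β_k) y_k + β_k g_{w_k}(x_k) with y_0 = 0; the stepsizes and sample values below are arbitrary illustrative numbers.

```python
import numpy as np

beta = [(k + 1) ** -0.5 for k in range(6)]   # any stepsizes in (0, 1]; beta_0 = 1
g = [float(t) for t in range(6)]             # stand-in samples g_{w_t}(x_t)

# Recursive form
y = 0.0
for bk, gk in zip(beta, g):
    y = (1 - bk) * y + bk * gk

# Closed form: sum_t beta_t * prod_{s > t} (1 - beta_s) * g_t
K = len(beta)
closed = sum(beta[t] * np.prod([1 - beta[s] for s in range(t + 1, K)]) * g[t]
             for t in range(K))

assert abs(y - closed) < 1e-12
```

This identity is what makes y_{k+1} a weighted average of past inner samples, and hence a biased estimate of g(x_k) whenever the query points x_t move.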

18 Acceleration via Smoothing-Extrapolation

19 Acceleration
When the function g(·) is smooth, can the algorithms be accelerated? Yes!

Algorithm 2: Accelerated SCGD
Require: x_0, z_0 ∈ R^n, y_0 ∈ R^m, SO, K, stepsizes {α_k} and {β_k}.
Ensure: {x_k}
for k = 1, ..., K do
    Query the SO and obtain ∇f_{v_k}(y_k), ∇g_{w_k}(x_k).
    Update by
        x_{k+1} = x_k − α_k ∇g_{w_k}(x_k)^T ∇f_{v_k}(y_k).
    Update the auxiliary variables via extrapolation-smoothing:
        z_{k+1} = (1 − 1/β_k) x_k + (1/β_k) x_{k+1},
        y_{k+1} = (1 − β_k) y_k + β_k g_{w_{k+1}}(z_{k+1}),
    where the sample g_{w_{k+1}}(z_{k+1}) is obtained by querying the SO.
end for

Key to the Acceleration: bias reduction by averaging over extrapolated points (extrapolation-smoothing):
    y_{k+1} = Σ_{t=0}^k β_t ( Π_{s=t+1}^k (1 − β_s) ) g_{w_t}(z_t),
while
    g(x_k) = g( Σ_{t=0}^k β_t ( Π_{s=t+1}^k (1 − β_s) ) z_t ).
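A sketch of the accelerated update on the same toy instance as before (g_w(x) = x + w, f(y) = y², so F(x) = x² with x* = 0); the stepsize offset and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

x, y = 1.0, 0.0
for k in range(1, 50_001):
    # Strongly convex stepsizes alpha_k ~ 1/k, beta_k ~ k^(-4/5), offset by 10
    alpha, beta = 1.0 / (k + 10), (k + 10) ** -0.8
    x_new = x - alpha * (1.0 * 2.0 * y)            # gradient step uses y_k
    z = (1 - 1 / beta) * x + (1 / beta) * x_new    # extrapolated query point
    y = (1 - beta) * y + beta * (z + rng.normal()) # smoothed inner estimate
    x = x_new
```

Querying the inner samples at the extrapolated points z rather than at the iterates themselves is what cancels the leading bias term when g is smooth.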

20 Accelerated Sample Complexity (Wang et al., 2016)
Under suitable conditions (inner function smooth, outer function smooth), if the stepsizes are chosen as
    α_k = k^{−5/7},  β_k = k^{−4/7},
we have
    E[ F(x̂_k) − F* ] = O(k^{−2/7}),  where x̂_k = (2/k) Σ_{t=k/2+1}^k x_t.

Strongly Convex Case (Wang et al., 2016)
Under suitable conditions (inner function smooth, outer function smooth), assume that F is strongly convex. If the stepsizes are chosen as
    α_k = 1/k,  β_k = k^{−4/5},
we have
    E[ ‖x_k − x*‖^2 ] = O(k^{−4/5}).

21 Regularized Stochastic Composition Optimization (Wang and Liu, 2016)
    min_{x ∈ R^n} (E_v f_v ∘ E_w g_w)(x) + R(x)
The penalty term R(x) is convex and nonsmooth.

Algorithm 3: Accelerated Stochastic Compositional Proximal Gradient (ASC-PG)
Require: x_0, z_0 ∈ R^n, y_0 ∈ R^m, SO, K, stepsizes {α_k} and {β_k}.
Ensure: {x_k}
for k = 1, ..., K do
    Query the SO and obtain ∇f_{v_k}(y_k), ∇g_{w_k}(x_k).
    Update the main iterate by
        x_{k+1} = prox_{α_k R(·)} ( x_k − α_k ∇g_{w_k}(x_k)^T ∇f_{v_k}(y_k) ).
    Update the auxiliary iterates by an extrapolation-smoothing scheme:
        z_{k+1} = (1 − 1/β_k) x_k + (1/β_k) x_{k+1},
        y_{k+1} = (1 − β_k) y_k + β_k g_{w_{k+1}}(z_{k+1}),
    where the sample g_{w_{k+1}}(z_{k+1}) is obtained by querying the SO.
end for
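A sketch of ASC-PG on a toy ℓ1-regularized instance, using the closed-form soft-thresholding prox for R(x) = λ|x|. The instance, stepsize offset, and constants are illustrative assumptions: with g_w(x) = x + w (E[w] = 0) and f(y) = (y − 1)², the regularized objective is (x − 1)² + λ|x|, minimized at x* = 1 − λ/2 = 0.75 for λ = 0.5.

```python
import numpy as np

rng = np.random.default_rng(6)

def soft_threshold(u, t):
    # prox of t * |.|, the closed-form proximal step for R(x) = lam * |x|
    return np.sign(u) * max(abs(u) - t, 0.0)

lam = 0.5
x, y = 0.0, 0.0
for k in range(1, 50_001):
    alpha, beta = 1.0 / (k + 10), (k + 10) ** -0.8   # illustrative stepsizes
    grad = 1.0 * 2.0 * (y - 1.0)                     # grad g_w(x_k)^T grad f(y_k)
    x_new = soft_threshold(x - alpha * grad, alpha * lam)
    z = (1 - 1 / beta) * x + (1 / beta) * x_new      # extrapolation
    y = (1 - beta) * y + beta * (z + rng.normal())   # smoothing
    x = x_new
```

The nonsmooth penalty is handled entirely by the prox operator; the compositional part is handled by the same extrapolation-smoothing scheme as in Algorithm 2.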

22 Sample Complexity for Smooth Optimization (Wang and Liu, 2016)
Under suitable conditions (inner function nonsmooth, outer function smooth) and X bounded, with properly chosen stepsizes and k large enough,
    E[ F( (2/k) Σ_{t=k/2+1}^k x_t ) − F* ] = O(k^{−4/9}).
If either the outer or the inner function is linear, the rate improves to the optimal
    E[ F( (2/k) Σ_{t=k/2+1}^k x_t ) − F* ] = O(k^{−1/2}).

Sample Complexity in the Strongly Convex Case (Wang and Liu, 2016)
Suppose that the compositional function F(·) is strongly convex. With properly chosen stepsizes and k sufficiently large,
    E[ ‖x_k − x*‖^2 ] = O(k^{−4/5}).
If either the outer or the inner function is linear, the rate improves to the optimal
    E[ ‖x_k − x*‖^2 ] = O(1/k).

23 When there are two nested uncertainties:
- A class of two-timescale algorithms that update using first-order samples.
- Analyzing convergence becomes harder: two coupled stochastic processes, and a smoothness-noise interplay.
- Convergence rates of stochastic algorithms establish sample complexity upper bounds for the new problem class.

Table: Summary of best known sample complexities
                                                General Convex    Strongly Convex
    Outer smooth, inner nonsmooth               O(k^{−1/4})       O(k^{−2/3})?
    Outer and inner smooth                      O(k^{−4/9})?      O(k^{−4/5})?
    Special case: min_x E[f(x; ξ)]              O(k^{−1/2})       O(k^{−1})
    Special case: min_x E[f(E[A]x − E[b]; ξ)]   O(k^{−1/2})       O(k^{−1})

Applications and computations:
- First scalable algorithm for sparse nonparametric estimation.
- Optimal algorithm for on-policy reinforcement learning.

24 Summary
Stochastic composition optimization is a new and rich problem class:
    min_x E[f_v(E[g_w(x)])]   (nonlinear w.r.t. the distribution of (v, w))
Applications in risk, data analysis, machine learning, and real-time intelligent systems.
A class of stochastic compositional gradient methods with convergence guarantees.
Basic sample complexities being developed:
                                                General Convex    Strongly Convex
    Outer smooth, inner nonsmooth               O(k^{−1/4})       O(k^{−2/3})?
    Outer and inner smooth                      O(k^{−4/9})?      O(k^{−4/5})?
    Special case: min_x E[f(x; ξ)]              O(k^{−1/2})       O(k^{−1})
    Special case: min_x E[f(E[A]x − E[b]; ξ)]   O(k^{−1/2})       O(k^{−1})
Many open questions remain and more work is needed!
Thank you very much!


More information

Optimization for Machine Learning

Optimization for Machine Learning Optimization for Machine Learning (Lecture 3-A - Convex) SUVRIT SRA Massachusetts Institute of Technology Special thanks: Francis Bach (INRIA, ENS) (for sharing this material, and permitting its use) MPI-IS

More information

Modern Stochastic Methods. Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization

Modern Stochastic Methods. Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization Modern Stochastic Methods Ryan Tibshirani (notes by Sashank Reddi and Ryan Tibshirani) Convex Optimization 10-725 Last time: conditional gradient method For the problem min x f(x) subject to x C where

More information

arxiv: v1 [stat.ml] 12 Nov 2015

arxiv: v1 [stat.ml] 12 Nov 2015 Random Multi-Constraint Projection: Stochastic Gradient Methods for Convex Optimization with Many Constraints Mengdi Wang, Yichen Chen, Jialin Liu, Yuantao Gu arxiv:5.03760v [stat.ml] Nov 05 November 3,

More information

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method

More information

Linear Convergence under the Polyak-Łojasiewicz Inequality

Linear Convergence under the Polyak-Łojasiewicz Inequality Linear Convergence under the Polyak-Łojasiewicz Inequality Hamed Karimi, Julie Nutini and Mark Schmidt The University of British Columbia LCI Forum February 28 th, 2017 1 / 17 Linear Convergence of Gradient-Based

More information

Approximate Dynamic Programming

Approximate Dynamic Programming Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course Value Iteration: the Idea 1. Let V 0 be any vector in R N A. LAZARIC Reinforcement

More information

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big

More information

Large-scale Stochastic Optimization

Large-scale Stochastic Optimization Large-scale Stochastic Optimization 11-741/641/441 (Spring 2016) Hanxiao Liu hanxiaol@cs.cmu.edu March 24, 2016 1 / 22 Outline 1. Gradient Descent (GD) 2. Stochastic Gradient Descent (SGD) Formulation

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

On the fast convergence of random perturbations of the gradient flow.

On the fast convergence of random perturbations of the gradient flow. On the fast convergence of random perturbations of the gradient flow. Wenqing Hu. 1 (Joint work with Chris Junchi Li 2.) 1. Department of Mathematics and Statistics, Missouri S&T. 2. Department of Operations

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade

Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Optimization in the Big Data Regime 2: SVRG & Tradeoffs in Large Scale Learning. Sham M. Kakade Machine Learning for Big Data CSE547/STAT548 University of Washington S. M. Kakade (UW) Optimization for

More information

Accelerated Proximal Gradient Methods for Convex Optimization

Accelerated Proximal Gradient Methods for Convex Optimization Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS

More information

Nesterov s Acceleration

Nesterov s Acceleration Nesterov s Acceleration Nesterov Accelerated Gradient min X f(x)+ (X) f -smooth. Set s 1 = 1 and = 1. Set y 0. Iterate by increasing t: g t 2 @f(y t ) s t+1 = 1+p 1+4s 2 t 2 y t = x t + s t 1 s t+1 (x

More information

Stochastic Gradient Descent with Only One Projection

Stochastic Gradient Descent with Only One Projection Stochastic Gradient Descent with Only One Projection Mehrdad Mahdavi, ianbao Yang, Rong Jin, Shenghuo Zhu, and Jinfeng Yi Dept. of Computer Science and Engineering, Michigan State University, MI, USA Machine

More information

Stochastic Analogues to Deterministic Optimizers

Stochastic Analogues to Deterministic Optimizers Stochastic Analogues to Deterministic Optimizers ISMP 2018 Bordeaux, France Vivak Patel Presented by: Mihai Anitescu July 6, 2018 1 Apology I apologize for not being here to give this talk myself. I injured

More information

Stochastic Optimization Part I: Convex analysis and online stochastic optimization

Stochastic Optimization Part I: Convex analysis and online stochastic optimization Stochastic Optimization Part I: Convex analysis and online stochastic optimization Taiji Suzuki Tokyo Institute of Technology Graduate School of Information Science and Engineering Department of Mathematical

More information

Expanding the reach of optimal methods

Expanding the reach of optimal methods Expanding the reach of optimal methods Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with C. Kempton (UW), M. Fazel (UW), A.S. Lewis (Cornell), and S. Roy (UW) BURKAPALOOZA! WCOM

More information

Approximate Dynamic Programming

Approximate Dynamic Programming Master MVA: Reinforcement Learning Lecture: 5 Approximate Dynamic Programming Lecturer: Alessandro Lazaric http://researchers.lille.inria.fr/ lazaric/webpage/teaching.html Objectives of the lecture 1.

More information

Proximal Gradient Temporal Difference Learning Algorithms

Proximal Gradient Temporal Difference Learning Algorithms Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16) Proximal Gradient Temporal Difference Learning Algorithms Bo Liu, Ji Liu, Mohammad Ghavamzadeh, Sridhar

More information

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method

ORIE 4741: Learning with Big Messy Data. Proximal Gradient Method ORIE 4741: Learning with Big Messy Data Proximal Gradient Method Professor Udell Operations Research and Information Engineering Cornell November 13, 2017 1 / 31 Announcements Be a TA for CS/ORIE 1380:

More information

Linear classifiers: Overfitting and regularization

Linear classifiers: Overfitting and regularization Linear classifiers: Overfitting and regularization Emily Fox University of Washington January 25, 2017 Logistic regression recap 1 . Thus far, we focused on decision boundaries Score(x i ) = w 0 h 0 (x

More information

MATH 680 Fall November 27, Homework 3

MATH 680 Fall November 27, Homework 3 MATH 680 Fall 208 November 27, 208 Homework 3 This homework is due on December 9 at :59pm. Provide both pdf, R files. Make an individual R file with proper comments for each sub-problem. Subgradients and

More information

Stochastic First-Order Methods with Random Constraint Projection

Stochastic First-Order Methods with Random Constraint Projection Stochastic First-Order Methods with Random Constraint Projection The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Reinforcement Learning. Yishay Mansour Tel-Aviv University

Reinforcement Learning. Yishay Mansour Tel-Aviv University Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak

More information

Ad Placement Strategies

Ad Placement Strategies Case Study : Estimating Click Probabilities Intro Logistic Regression Gradient Descent + SGD AdaGrad Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox January 7 th, 04 Ad

More information

ARock: an algorithmic framework for asynchronous parallel coordinate updates

ARock: an algorithmic framework for asynchronous parallel coordinate updates ARock: an algorithmic framework for asynchronous parallel coordinate updates Zhimin Peng, Yangyang Xu, Ming Yan, Wotao Yin ( UCLA Math, U.Waterloo DCO) UCLA CAM Report 15-37 ShanghaiTech SSDS 15 June 25,

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control

ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control ECE276B: Planning & Learning in Robotics Lecture 16: Model-free Control Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Tianyu Wang: tiw161@eng.ucsd.edu Yongxi Lu: yol070@eng.ucsd.edu

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 8: Optimization Cho-Jui Hsieh UC Davis May 9, 2017 Optimization Numerical Optimization Numerical Optimization: min X f (X ) Can be applied

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

Lecture 17: October 27

Lecture 17: October 27 0-725/36-725: Convex Optimiation Fall 205 Lecturer: Ryan Tibshirani Lecture 7: October 27 Scribes: Brandon Amos, Gines Hidalgo Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Composite nonlinear models at scale

Composite nonlinear models at scale Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)

More information

Reinforcement Learning as Variational Inference: Two Recent Approaches

Reinforcement Learning as Variational Inference: Two Recent Approaches Reinforcement Learning as Variational Inference: Two Recent Approaches Rohith Kuditipudi Duke University 11 August 2017 Outline 1 Background 2 Stein Variational Policy Gradient 3 Soft Q-Learning 4 Closing

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure

Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure Alberto Bietti Julien Mairal Inria Grenoble (Thoth) March 21, 2017 Alberto Bietti Stochastic MISO March 21,

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra

Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra Linear and Sublinear Linear Algebra Algorithms: Preconditioning Stochastic Gradient Algorithms with Randomized Linear Algebra Michael W. Mahoney ICSI and Dept of Statistics, UC Berkeley ( For more info,

More information

Non-Smooth, Non-Finite, and Non-Convex Optimization

Non-Smooth, Non-Finite, and Non-Convex Optimization Non-Smooth, Non-Finite, and Non-Convex Optimization Deep Learning Summer School Mark Schmidt University of British Columbia August 2015 Complex-Step Derivative Using complex number to compute directional

More information

J. Sadeghi E. Patelli M. de Angelis

J. Sadeghi E. Patelli M. de Angelis J. Sadeghi E. Patelli Institute for Risk and, Department of Engineering, University of Liverpool, United Kingdom 8th International Workshop on Reliable Computing, Computing with Confidence University of

More information

Static Parameter Estimation using Kalman Filtering and Proximal Operators Vivak Patel

Static Parameter Estimation using Kalman Filtering and Proximal Operators Vivak Patel Static Parameter Estimation using Kalman Filtering and Proximal Operators Viva Patel Department of Statistics, University of Chicago December 2, 2015 Acnowledge Mihai Anitescu Senior Computational Mathematician

More information

Asynchronous Non-Convex Optimization For Separable Problem

Asynchronous Non-Convex Optimization For Separable Problem Asynchronous Non-Convex Optimization For Separable Problem Sandeep Kumar and Ketan Rajawat Dept. of Electrical Engineering, IIT Kanpur Uttar Pradesh, India Distributed Optimization A general multi-agent

More information