automating the analysis and design of large-scale optimization algorithms


1 Automating the analysis and design of large-scale optimization algorithms. Laurent Lessard, University of Wisconsin-Madison. Joint work with Ben Recht, Andy Packard, Bin Hu, Bryan Van Scoy, and Saman Cyrus. October 2018.

2 Unconstrained optimization: minimize $f(x)$ subject to $x \in \mathbb{R}^N$. We need algorithms that are fast and simple; the currently favored family is first-order methods.

3 Gradient method: $x_{k+1} = x_k - \alpha \nabla f(x_k)$.
Heavy ball method: $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta(x_k - x_{k-1})$.
Nesterov's accelerated method: $y_k = x_k + \beta(x_k - x_{k-1})$, $x_{k+1} = y_k - \alpha \nabla f(y_k)$.
[Figure: error of the iterates $x_0, x_1, \dots$ plotted over the contours of a quadratic $f(x)$.]
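For concreteness, here is a minimal sketch of the three updates in Python/NumPy, applied to a small hypothetical quadratic test problem; the stepsize and momentum values are illustrative, not the tuned values discussed later.

```python
import numpy as np

def gradient_method(grad, x0, alpha, iters):
    # x_{k+1} = x_k - alpha * grad(x_k)
    x = x0.copy()
    for _ in range(iters):
        x = x - alpha * grad(x)
    return x

def heavy_ball(grad, x0, alpha, beta, iters):
    # x_{k+1} = x_k - alpha * grad(x_k) + beta * (x_k - x_{k-1})
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x

def nesterov(grad, x0, alpha, beta, iters):
    # y_k = x_k + beta * (x_k - x_{k-1});  x_{k+1} = y_k - alpha * grad(y_k)
    x_prev, x = x0.copy(), x0.copy()
    for _ in range(iters):
        y = x + beta * (x - x_prev)
        x, x_prev = y - alpha * grad(y), x
    return x

# Hypothetical test problem: f(x) = 0.5 * x^T Q x, so grad f(x) = Q x.
Q = np.diag([1.0, 10.0])
grad = lambda x: Q @ x
x0 = np.array([1.0, 1.0])
print(gradient_method(grad, x0, alpha=0.1, iters=100))       # -> near 0
print(heavy_ball(grad, x0, alpha=0.1, beta=0.5, iters=100))  # -> near 0
print(nesterov(grad, x0, alpha=0.1, beta=0.5, iters=100))    # -> near 0
```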

4 1. Many algorithms can be viewed as dynamical systems with feedback (control systems!): algorithm convergence corresponds to system stability. 2. By solving a small convex program, we can recover state-of-the-art convergence results for these algorithms, automatically and efficiently. 3. The ultimate goal: to move from analysis to design.

5 Robust algorithm selection. $G \in \mathcal{G}$: algorithm to use; $f \in S$: function to minimize.
$$G_{\text{opt}} = \arg\min_{G \in \mathcal{G}} \; \max_{f \in S} \; \text{cost}(f, G)$$
A similar problem for a finite number of iterations: Drori, Teboulle (2012); Taylor, Hendrickx, Glineur (2016).

6 The candidate algorithms $G \in \mathcal{G}$:
Gradient method: $x_{k+1} = x_k - \alpha \nabla f(x_k)$
Heavy ball method: $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta(x_k - x_{k-1})$
Nesterov's accelerated method: $x_{k+1} = x_k - \alpha \nabla f\big(x_k + \beta(x_k - x_{k-1})\big) + \beta(x_k - x_{k-1})$
Analytically solvable case $f \in S$: quadratic functions $f(x) = \frac{1}{2} x^T Q x - p^T x$ with the constraint $m \le \lambda(Q) \le L$.

7 [Figure: convergence rate $\rho$ vs. condition ratio $L/m$ on quadratic functions for the Gradient, Nesterov, and Heavy ball methods, together with the corresponding iterations to convergence.]
Convergence rate $\rho$: $\|x_k - x_\star\| \le C \rho^k \|x_0 - x_\star\|$. Iterations to convergence: proportional to $\frac{1}{-\log \rho}$.
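These quantities are easy to check numerically. Below is a small sketch on a hypothetical 10-dimensional quadratic: we estimate $\rho$ from the observed error decay and compare with the classical rate $(L/m - 1)/(L/m + 1)$ for the gradient method with $\alpha = 2/(L+m)$ on this class.

```python
import numpy as np

# Estimate rho from the error decay of the gradient method on a quadratic,
# then convert to iterations-to-convergence ~ 1 / (-log rho).
m, L = 1.0, 100.0
Q = np.diag(np.linspace(m, L, 10))       # eigenvalues span [m, L]
x = np.ones(10)                          # x_star = 0 for f(x) = 0.5 x^T Q x
alpha = 2 / (L + m)
errs = []
for _ in range(200):
    x = x - alpha * (Q @ x)
    errs.append(np.linalg.norm(x))
rho_hat = (errs[-1] / errs[99]) ** (1 / 100)  # geometric decay over 100 steps
print(rho_hat, (L/m - 1) / (L/m + 1))         # both ~ 0.9802
print(1 / -np.log(rho_hat))                   # iterations per e-fold of error
```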

8 Robust algorithm selection. $G \in \mathcal{G}$: algorithm to use; $f \in S$: function to minimize.
$$G_{\text{opt}} = \arg\min_{G \in \mathcal{G}} \; \max_{f \in S} \; \text{cost}(f, G)$$
Roadmap: 1. a mathematical representation for $G$; 2. a mathematical representation for $S$; 3. the main robustness result.

9 Dynamical system interpretation. Heavy ball: $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta(x_k - x_{k-1})$. Define $u_k := \nabla f(x_k)$ and $p_k := x_{k-1}$. The algorithm block is linear, known, and decoupled:
$$\begin{bmatrix} x_{k+1} \\ p_{k+1} \end{bmatrix} = \begin{bmatrix} (1+\beta)I & -\beta I \\ I & 0 \end{bmatrix} \begin{bmatrix} x_k \\ p_k \end{bmatrix} + \begin{bmatrix} -\alpha I \\ 0 \end{bmatrix} u_k, \qquad y_k = \begin{bmatrix} I & 0 \end{bmatrix} \begin{bmatrix} x_k \\ p_k \end{bmatrix}$$
in feedback with the function block $u_k = \nabla f(y_k)$, which is nonlinear, uncertain, and coupled.

10 Dynamical system interpretation (per coordinate). Because the algorithm block is decoupled, the same interconnection can be written coordinate-wise for $i = 1, \dots, N$:
$$\begin{bmatrix} (x_{k+1})_i \\ (p_{k+1})_i \end{bmatrix} = \begin{bmatrix} 1+\beta & -\beta \\ 1 & 0 \end{bmatrix} \begin{bmatrix} (x_k)_i \\ (p_k)_i \end{bmatrix} + \begin{bmatrix} -\alpha \\ 0 \end{bmatrix} (u_k)_i, \qquad (y_k)_i = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} (x_k)_i \\ (p_k)_i \end{bmatrix}$$
with the nonlinear, uncertain, coupled feedback $u_k = \nabla f(y_k)$.

11 General form: the algorithm is an LTI system $G$ in feedback with the gradient,
$$\xi_{k+1} = A\xi_k + B u_k, \qquad y_k = C\xi_k, \qquad u_k = \nabla f(y_k),$$
with (per coordinate) $\begin{bmatrix} A & B \\ C & 0 \end{bmatrix}$ given by
Gradient: $\begin{bmatrix} 1 & -\alpha \\ 1 & 0 \end{bmatrix}$; Heavy ball: $\begin{bmatrix} 1+\beta & -\beta & -\alpha \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$; Nesterov: $\begin{bmatrix} 1+\beta & -\beta & -\alpha \\ 1 & 0 & 0 \\ 1+\beta & -\beta & 0 \end{bmatrix}$.
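A sketch constructing these $(A, B, C)$ triples and simulating the feedback interconnection on a scalar quadratic (all numerical values illustrative):

```python
import numpy as np

def algo_matrices(name, alpha, beta=0.0):
    """Per-coordinate (A, B, C) for each method's LTI representation."""
    if name == "gradient":
        return np.array([[1.0]]), np.array([[-alpha]]), np.array([[1.0]])
    A = np.array([[1 + beta, -beta], [1.0, 0.0]])  # shared by both momentum methods
    B = np.array([[-alpha], [0.0]])
    if name == "heavy_ball":
        C = np.array([[1.0, 0.0]])
    else:  # "nesterov"
        C = np.array([[1 + beta, -beta]])
    return A, B, C

# Closed loop: xi_{k+1} = A xi_k + B u_k, y_k = C xi_k, u_k = f'(y_k),
# here with the scalar quadratic f(y) = q y^2 / 2.
A, B, C = algo_matrices("heavy_ball", alpha=0.3, beta=0.4)
q = 2.0
xi = np.array([[1.0], [1.0]])   # state (x_k, x_{k-1})
for _ in range(100):
    y = C @ xi
    u = q * y                   # gradient of the quadratic
    xi = A @ xi + B @ u
print(xi.ravel())               # -> near 0: the loop is stable
```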

12 Function classes and the corresponding constraints on $\nabla f$: $f$ quadratic $\leftrightarrow$ $\nabla f$ linear; $f$ strongly convex + Lipschitz gradients $\leftrightarrow$ $\nabla f$ sector bounded + slope restricted; $f$ radially quasiconvex $\leftrightarrow$ $\nabla f$ sector bounded.
[Figure: representative plots of $\nabla f(x)$ vs. $x$ for each class.]

13 Representing function classes: express them as quadratic constraints on the signals $(y, u)$. Example: $\nabla f$ is a passive function: $y_k^T u_k \ge 0$ for all $k$. [Figure: the graph of $u_k$ vs. $y_k$ confined to a sector.]

14 Representing function classes: express them as quadratic constraints on $(y, u)$. Example: $\nabla f$ is sector-bounded on $[m, L]$:
$$\begin{bmatrix} y_k \\ u_k \end{bmatrix}^T \begin{bmatrix} -2mL & m+L \\ m+L & -2 \end{bmatrix} \begin{bmatrix} y_k \\ u_k \end{bmatrix} \ge 0$$
[Figure: the graph of $u_k$ vs. $y_k$ lies between lines of slope $m$ and $L$.]
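A quick numerical check of this constraint: for the hypothetical scalar function $f(y) = \frac{q}{2} y^2$ with $m \le q \le L$, the gradient $u = qy$ gives $z^T M z = -2y^2(q-m)(q-L) \ge 0$, with equality exactly at the sector boundaries.

```python
import numpy as np

m, L = 1.0, 10.0
M = np.array([[-2 * m * L, m + L],
              [m + L,      -2.0]])

rng = np.random.default_rng(0)
for _ in range(5):
    q = rng.uniform(m, L)        # gradient slope inside the sector [m, L]
    y = rng.normal()
    u = q * y                    # u_k = grad f(y_k) for f(y) = q y^2 / 2
    z = np.array([y, u])
    print(z @ M @ z >= 0)        # sector constraint holds: True
```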

15 Representing function classes: express them as quadratic constraints on $(y, u)$. If $\nabla f$ is sector-bounded and slope-restricted, the constraint on $(y_k, u_k)$ depends on the history $(y_0, \dots, y_{k-1}, u_0, \dots, u_{k-1})$.

16 Introduce extra dynamics: design a filter $\Psi$, mapping $(y, u)$ to an auxiliary signal $z$ with internal state $\zeta$, together with a multiplier matrix $M$. Instead of using a static quadratic $q(y_k, u_k)$, use $z_k^T M z_k$. There is a systematic way of doing this for strong convexity (Zames & Falb, 1968); the general theory is that of Integral Quadratic Constraints (Megretski & Rantzer, 1997).

17 Summary of the setup: the algorithm $G$ (Gradient, Heavy ball, or Nesterov) is encoded by its state-space matrices $\begin{bmatrix} A & B \\ C & 0 \end{bmatrix}$, and the assumption on the function ($f$ quadratic, strongly convex, or quasiconvex) is encoded by a pair $(\Psi, M)$.

18 Main result. Problem data: $G$ (the algorithm) and $\Psi$ (what we know about $f$). Auxiliary quantities: compute the matrices $(\hat{A}, \hat{B}, \hat{C}, \hat{D})$ from $(G, \Psi)$ and choose a candidate rate $0 < \rho < 1$. If there exists $P \succ 0$ such that
$$\begin{bmatrix} \hat{A}^T P \hat{A} - \rho^2 P & \hat{A}^T P \hat{B} \\ \hat{B}^T P \hat{A} & \hat{B}^T P \hat{B} \end{bmatrix} + \begin{bmatrix} \hat{C} & \hat{D} \end{bmatrix}^T M \begin{bmatrix} \hat{C} & \hat{D} \end{bmatrix} \preceq 0$$
then $\|x_k - x_\star\| \le \sqrt{\mathrm{cond}(P)}\, \rho^k \|x_0 - x_\star\|$ for all $k$. The size of the LMI does not grow with the problem dimension! e.g. $P \in \mathbb{S}^{3\times 3}$, LMI $\in \mathbb{S}^{4\times 4}$.
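As a concrete illustration, here is a sketch of the LMI feasibility test in Python with CVXPY, specialized to the simplest case of a static sector IQC (so $z_k = (y_k, u_k)$ and $[\hat{C}\ \hat{D}]$ just stacks $C$ with the identity), with a nonnegative multiplier $\lambda$ scaling $M$ as in the SIOPT 16 theorem. Bisecting on $\rho$ for the gradient method with $\alpha = 2/(L+m)$ should recover approximately $(L-m)/(L+m)$; the values $m = 1$, $L = 10$ are illustrative.

```python
import numpy as np
import cvxpy as cp

def lmi_feasible(A, B, C, M, rho):
    """Test the rate LMI for a static (sector) IQC with multiplier lam >= 0."""
    n, p = A.shape[0], B.shape[1]
    P = cp.Variable((n, n), symmetric=True)
    lam = cp.Variable(nonneg=True)
    AB = np.hstack([A, B])                        # maps (xi_k, u_k) to xi_{k+1}
    E = np.hstack([np.eye(n), np.zeros((n, p))])  # selects xi_k
    CD = np.block([[C, np.zeros((C.shape[0], p))],
                   [np.zeros((p, n)), np.eye(p)]])  # z_k = (y_k, u_k)
    lmi = AB.T @ P @ AB - rho**2 * (E.T @ P @ E) + lam * (CD.T @ M @ CD)
    lmi = 0.5 * (lmi + lmi.T)                     # enforce symmetry for CVXPY
    prob = cp.Problem(cp.Minimize(0), [P >> np.eye(n), lmi << 0])
    prob.solve(solver=cp.SCS)
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE)

def best_certified_rate(A, B, C, M, tol=1e-3):
    lo, hi = 0.0, 1.0                             # bisect on the candidate rate
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if lmi_feasible(A, B, C, M, mid) else (mid, hi)
    return hi

m, L = 1.0, 10.0
alpha = 2 / (L + m)
A, B, C = np.array([[1.0]]), np.array([[-alpha]]), np.array([[1.0]])
M = np.array([[-2 * m * L, m + L], [m + L, -2.0]])
print(best_certified_rate(A, B, C, M))            # ~ (L-m)/(L+m) = 0.8182
```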

19 main results: analytic and numerical

20 Gradient method: $x_{k+1} = x_k - \alpha \nabla f(x_k)$.
[Figure: convergence rate $\rho$ and iterations to convergence vs. condition ratio $L/m$ for Gradient (all functions), Nesterov (quadratic), and Heavy ball (quadratic).]
Analytic solution! The same rate holds for quadratics, strongly convex functions, and quasiconvex functions.

21 Nesterov's method: $x_{k+1} = x_k - \alpha \nabla f\big(x_k + \beta(x_k - x_{k-1})\big) + \beta(x_k - x_{k-1})$.
[Figure: Nesterov rate bounds and iterations to convergence vs. condition ratio $L/m$ for IQC (quasiconvex), IQC (strongly convex), Nesterov (strongly convex), Nesterov (quadratic), and Heavy ball (quadratic).]
We cannot certify stability for quasiconvex functions, but the IQC bound improves upon the best known bound!

22 Heavy ball method: $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta(x_k - x_{k-1})$.
[Figure: Heavy ball rate bounds and iterations to convergence vs. condition ratio $L/m$ for IQC (quasiconvex), IQC (strongly convex), Nesterov (quadratic), and Heavy ball (quadratic).]
We cannot certify stability for quasiconvex functions, nor for strongly convex functions.

23 The heavy ball method is not stable! Counterexample:
$$f(x) = \begin{cases} \frac{25}{2}x^2 & x < 1 \\ \frac{1}{2}x^2 + 24x - 12 & 1 \le x < 2 \\ \frac{25}{2}x^2 - 24x + 36 & x \ge 2 \end{cases}$$
so that $L/m = 25$. Starting the heavy ball iteration at $x_0 = x_1 \in [3.07, 3.46]$, the iterates converge to a limit cycle. This is a simple counterexample to the Aizerman (1949) and Kalman (1957) conjectures.
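A simulation sketch of this counterexample. We assume the standard heavy-ball tuning $\alpha = 4/(\sqrt{L}+\sqrt{m})^2$, $\beta = \big((\sqrt{L}-\sqrt{m})/(\sqrt{L}+\sqrt{m})\big)^2$, which for $L/m = 25$ gives $\alpha = 1/9$, $\beta = 4/9$; the tail of the trajectory repeats instead of converging to $x_\star = 0$.

```python
import numpy as np

def grad_f(x):
    # Gradient of the piecewise-quadratic counterexample (m = 1, L = 25).
    if x < 1:
        return 25.0 * x
    elif x < 2:
        return x + 24.0
    return 25.0 * x - 24.0

L_, m_ = 25.0, 1.0
alpha = 4 / (np.sqrt(L_) + np.sqrt(m_))**2                             # 1/9
beta = ((np.sqrt(L_) - np.sqrt(m_)) / (np.sqrt(L_) + np.sqrt(m_)))**2  # 4/9

x_prev = x = 3.3          # x_0 = x_1 inside [3.07, 3.46]
tail = []
for k in range(2000):
    x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x
    if k >= 1994:
        tail.append(round(x, 4))
print(tail)               # a repeating pattern: a limit cycle, not x* = 0
```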

24 uncharted territory: noise robustness and algorithm design

25 Noise robustness. Insert an uncertain block $\delta$ between the function and the algorithm: the algorithm receives $w_k$ in place of the true gradient $u_k = \nabla f(y_k)$, with the multiplicative noise bound $\|u_k - w_k\| \le \delta \|w_k\|$. How does an algorithm perform in the presence of noise?
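One simple realization of this noise model on a quadratic, as a sketch: take $w_k = u_k/(1 + \delta \varepsilon_k)$ with $\varepsilon_k \in [-1, 1]$, so that $\|u_k - w_k\| = \delta|\varepsilon_k|\,\|w_k\| \le \delta\|w_k\|$. This random realization is benign (the plots that follow consider the worst case over the noise); all numerical values are illustrative.

```python
import numpy as np

def noisy_gradient_descent(alpha, delta, iters=300, seed=0):
    # The algorithm sees w_k = u_k / (1 + delta * eps_k) in place of the
    # true gradient u_k, so that ||u_k - w_k|| <= delta * ||w_k||.
    rng = np.random.default_rng(seed)
    Q = np.diag(np.linspace(1.0, 10.0, 5))   # m = 1, L = 10
    x = np.ones(5)
    for _ in range(iters):
        u = Q @ x
        w = u / (1 + delta * rng.uniform(-1, 1))
        x = x - alpha * w
    return np.linalg.norm(x)                 # distance to x* = 0

m, L = 1.0, 10.0
for delta in [0.0, 0.2, 0.5]:
    print(delta,
          noisy_gradient_descent(alpha=2 / (L + m), delta=delta),
          noisy_gradient_descent(alpha=1 / L, delta=delta))
```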

26 Gradient method with $\alpha = \frac{2}{L+m}$ (optimal stepsize with no noise): [Figure: convergence rates and iterations to convergence vs. condition ratio for $\delta \in \{.01, .02, .05, .1, .2, .5\}$.]
Gradient method with $\alpha = \frac{1}{L}$ (more conservative stepsize): [Figure: the same plots for the conservative stepsize.]

27 Nesterov's method (strongly convex $f$, with noise): [Figure: convergence rates and iterations to convergence vs. condition ratio $L/m$ for $\delta \in \{.05, .1, .2, .3, .4, .5\}$, with Nesterov (quadratic) for reference.]
Nesterov's method is not robust to noise. Can we have it all? (robustness AND performance)

28 Design approach: parameterize all proper $G$ of degree 2 in terms of $(\alpha, \beta, \eta)$:
$$x_{k+1} = x_k - \alpha \nabla f(y_k) + \beta(x_k - x_{k-1}), \qquad y_k = x_k + \eta(x_k - x_{k-1})$$
Special cases: $(\alpha, \beta, \eta) = (\alpha, 0, 0)$ is Gradient, $(\alpha, \beta, 0)$ is Heavy ball, and $(\alpha, \beta, \beta)$ is Nesterov.

29 Robust Momentum Method (ACC 18):
$$x_{k+1} = x_k - \alpha \nabla f(y_k) + \beta(x_k - x_{k-1}), \qquad y_k = x_k + \eta(x_k - x_{k-1})$$
Parameters designed via a root locus technique:
$$\alpha = \frac{\kappa(1-\rho)^2(1+\rho)}{L}, \qquad \beta = \frac{\kappa\rho^3}{\kappa-1}, \qquad \eta = \frac{\rho^3}{(\kappa-1)(1-\rho)^2(1+\rho)}$$
with tuning parameter $1 - \frac{1}{\sqrt{\kappa}}$ (fast + fragile) $\le \rho \le 1 - \frac{1}{\kappa}$ (slow + robust). When $\rho = 1 - \frac{1}{\kappa}$, we recover Gradient with $\alpha = \frac{1}{L}$; when $\rho = 1 - \frac{1}{\sqrt{\kappa}}$, we recover the Triple Momentum Method with optimal tuning (Van Scoy et al, 2018).
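A sketch of the Robust Momentum Method using the parameter formulas above, run at the fast end of the tuning range on an illustrative quadratic:

```python
import numpy as np

def rmm_params(kappa, L, rho):
    # Robust Momentum Method parameters (ACC 18 formulas).
    alpha = kappa * (1 - rho)**2 * (1 + rho) / L
    beta = kappa * rho**3 / (kappa - 1)
    eta = rho**3 / ((kappa - 1) * (1 - rho)**2 * (1 + rho))
    return alpha, beta, eta

m, L = 1.0, 100.0
kappa = L / m
rho = 1 - 1 / np.sqrt(kappa)        # fast + fragile end (Triple Momentum)
alpha, beta, eta = rmm_params(kappa, L, rho)

# x_{k+1} = x_k - alpha * grad f(y_k) + beta * (x_k - x_{k-1}),
# y_k = x_k + eta * (x_k - x_{k-1}), on f(x) = 0.5 x^T Q x.
Q = np.diag(np.linspace(m, L, 10))
x_prev = x = np.ones(10)
for _ in range(200):
    y = x + eta * (x - x_prev)
    x, x_prev = x - alpha * (Q @ y) + beta * (x - x_prev), x
print(np.linalg.norm(x))            # error decays roughly like rho^k
```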

30 Trade-off: performance vs. robustness. [Figure: convergence rate $\rho$ vs. noise strength $\delta$ for Gradient ($\alpha = 2/(L+m)$), Gradient ($\alpha = 1/L$), Gradient (min over $\alpha$), Nesterov's method, RMM ($\rho = 1 - 1/\sqrt{\kappa}$), two intermediate RMM tunings, and RMM (min over $\rho$); the rate axis is marked at $1 - 1/\sqrt{\kappa}$ and $1 - 1/\kappa$.]

31 Related works.
In this talk: unified analysis framework (SIOPT 16); robust momentum method (ACC 18).
Other families of algorithms: operator-splitting methods, e.g. ADMM (ICML 15); weakly convex functions ($1/k$ and $1/k^2$ rates) (ICML 17); distributed optimization algorithms (Allerton 17); stochastic variance reduction, e.g. SVRG (ICML 18).

32 Thank you! Manuscripts + code available:
