automating the analysis and design of large-scale optimization algorithms
|
|
- Scot Flynn
- 5 years ago
- Views:
Transcription
1 automating the analysis and design of large-scale optimization algorithms Laurent Lessard University of Wisconsin Madison Joint work with Ben Recht, Andy Packard, Bin Hu, Bryan Van Scoy, Saman Cyrus October, 2018
2 2 Unconstrained optimization: minimize subject to f(x) x R N need algorithms that are fast and simple currently favored family: first-order methods
3 Gradient method x k+1 = x k α f(x k ) Heavy ball method x k+1 = x k α f(x k ) + β(x k x k 1 ) Error Nesterov s accelerated method y k = x k + β(x k x k 1 ) x k+1 = y k α f(y k ) x 0 x 1 contours of f(x) (quadratic) 3
4 4 1. Many algorithms can be viewed as dynamical systems with feedback (control systems!). algorithm convergence system stability 2. By solving a small convex program, we can recover state-of-the-art convergence results for these algorithms, automatically and efficiently. 3. The ultimate goal: to move from analysis to design.
5 5 Robust algorithm selection G G : f S : algorithm to use function to minimize ( ) G opt = arg min max cost(f, G) G G f S Similar problem for a finite number of iterations: Drori, Teboulle (2012) Taylor, Hendrickx, Glineur (2016)
6 6 Gradient method x k+1 = x k α f(x k ) Heavy ball method G G x k+1 = x k α f(x k ) + β(x k x k 1 ) Nesterov s accelerated method x k+1 = x k α f ( x k + β(x k x k 1 ) ) + β(x k x k 1 ) Analytically solvable: f S Quadratic functions: f(x) = 1 2 xt Qx p T x with the constraint: m λ(q) L
7 7 Convergence rate ρ Convergence rate on quadratic functions Gradient (quadratic) 0.2 Nesterov (quadratic) Heavy ball (quadratic) Condition ratio L/m Iterations to convergence Iterations to convergence for Gradient method Condition ratio L/m 1 1/2 Convergence rate : x k x Cρ k x 0 x Iterations to convergence 1 log ρ
8 8 Robust algorithm selection G G : f S : algorithm to use function to minimize ( ) G opt = arg min max cost(f, G) G G f S 1. mathematical representation for G 2. mathematical representation for S 3. main robustness result
9 9 Dynamical system interpretation Heavy ball: x k+1 = x k α f(x k ) + β(x k x k 1 ) Define u k := f(x k ) and p k := x k 1 u [ xk+1 p k+1 algorithm (linear, known, decoupled) ] [ ] [ ] (1 + β)i βi xk = + I 0 p k y k = [ I 0 ] [ ] x k p k [ αi 0 ] u k y u k = f(y k ) function (nonlinear, uncertain, coupled)
10 10 Dynamical system interpretation Heavy ball: x k+1 = x k α f(x k ) + β(x k x k 1 ) Define u k := f(x k ) and p k := x k 1 u algorithm (linear, known, decoupled) [ [ ] [ ] [ ] [ ] (xk+1 [ ) i ] 1 β β (xk (xk+1 ) ) i α i ] [ [ ] 1 β β ] [ [ ] (xk ) i ] [ [ ] α ] (u (p (xk+1 k+1 ) ) i i 1 + β 0 β (p (xk k ) ) i i 0 α (u k ) i (p k+1 ) i = 1 (y k ) i [ 1 0 ] [ 0 (p ] k ) i + 0 (u k ) i (x k ) i (y k ) i [ 1 0 ] [ ] (p k+1 ) i 1 (x k ) i 0 (p k ) i 0 k ) i (p (p k ) (y k ) i = [ 1 0 ] [ ] (x k ) i i (p k ) i k ) i i = 1,..., N y u k = f(y k ) function (nonlinear, uncertain, coupled)
11 11 u G ξ y ξ k+1 = Aξ k + Bu k y k = Cξ k f u k = f(y k ) [ A B C 0 [ ] 1 α 1 0 ] 1 + β β α = β β α β β 0 Gradient Heavy ball Nesterov
12 u f y {}}{ f(x) f(x) f(x) f(x) : x x x linear sector bounded + slope restricted sector bounded f(x) f(x) f(x) f(x) : x x x quadratic strongly convex + Lipschitz gradients radially quasiconvex 12
13 13 u f y Representing function classes express as quadratic constraints on (y, u) u k f is a passive function: y k u k y k 0 sector bounded
14 14 u f y Representing function classes express as quadratic constraints on (y, u) u k L m y k [ yk u k f is sector-bounded: ] T [ ] [ ] 2mL m + L yk 0 m + L 2 u k sector bounded
15 15 u f y Representing function classes express as quadratic constraints on (y, u) u k L m y k f is sector-bounded + slope-restricted: constraint on (y k, u k ) depends on history (y 0,..., y k 1, u 0,..., u k 1 ). sector bounded + slope restricted
16 16 u f y Ψ ζ z Introduce extra dynamics Design dynamics Ψ and multiplier matrix M. Instead of using q(u k, y k ), use z T k Mz k. Systematic way of doing this for strong convexity (Zames & Falb, 1968) General theory: Integral Quadratic Constraints (Megretski & Rantzer, 1997)
17 17 u G f y [ 1 ] α 1 0 [ 1 + β β α [ 1 + β β α β β 0 ] ] Gradient Heavy ball Nesterov [ A B C 0 ] f(x) f(x) f(x) x x x f is quadratic f is strongly convex f is quasiconvex } {{ } (Ψ, M)
18 Main result 18 Problem data: G (the algorithm) Ψ (what we know about f) Auxiliary quantities: Compute (Â, ˆB, Ĉ, ˆD) matrices from (G, Ψ) Choose a candidate rate 0 < ρ < 1. Size of LMI does not grow with problem dimension! e.g. P S 3 3, LMI S 4 4 If there exists P 0 such that [ÂT P Â ρ2 P Â T P ˆB ] ˆB T P Â ˆBT P ˆB + [ Ĉ ˆD ] T [Ĉ M ˆD] 0 then x k x cond(p ) ρ k x 0 x for all k.
19 main results: analytic and numerical 19
20 Gradient method 20 x k+1 = x k α f(x k ) Convergence rate ρ Convergence rate for Gradient method Gradient (all functions) 0.2 Nesterov (quadratic) Heavy ball (quadratic) Condition ratio L/m Iterations to convergence Iterations to convergence for Gradient method / Condition ratio L/m analytic solution! Same rate for: quadratics, strongly convex, or quasiconvex functions.
21 Nesterov s method 21 x k+1 = x k α f(x k + β(x k x k 1 )) + β(x k x k 1 ) 1 Nesterov rate bounds 10 3 Nesterov iterations Convergence rate ρ IQC (quasiconvex) 0.4 IQC (strongly convex) Nesterov (strongly convex) 0.2 Nesterov (quadratic) Heavy ball (quadratic) Condition ratio L/m Iterations to convergence Condition ratio L/m Cannot certify stability for quasiconvex functions IQC bound improves upon best known bound!
22 Heavy ball method 22 x k+1 = x k α f(x k ) + β(x k x k 1 ) 1 Heavy ball rate bounds 10 3 Heavy ball iterations Convergence rate ρ IQC (quasiconvex) IQC (strongly convex) 0.2 Nesterov (quadratic) Heavy ball (quadratic) Condition ratio L/m Iterations to convergence Condition ratio L/m Cannot certify stability for quasiconvex functions Cannot certify stability for strongly convex functions
23 The heavy ball method is not stable! 25 2 x2 x < 1 1 counterexample: f(x) = 2 x2 + 24x 12 1 x < x2 24x + 36 x 2 and start the heavy ball iteration at x 0 = x 1 [3.07, 3.46] f(x) L/m = 25 heavy ball iterations converge to a limit cycle simple counterexample to the Aizerman (1949) and Kalman (1957) conjectures 23
24 uncharted territory: noise robustness and algorithm design 24
25 Noise robustness 25 δ u G ξ y The δ block is uncertain multiplicative noise: u k w k δ w k w f How does an algorithm perform in the presence of noise?
26 26 Gradient method, α = 2 L+m (optimal stepsize with no noise) Convergence rate ρ Rates for different δ 0.2 δ {.01,.02,.05,.1,.2,.5} Gradient method Iterations to convergence Iterations for different δ δ {.01,.02,.05,.1,.2,.5} Gradient method Gradient method, α = 1 L 1 (more conservative stepsize) 10 3 Convergence rate ρ Rates for different δ 0.2 δ {.01,.02,.05,.1,.2,.5} Gradient method Iterations to convergence Iterations for different δ δ {.01,.02,.05,.1,.2,.5} Gradient method
27 27 Nesterov s method (strongly convex f, with noise) Convergence rate ρ Rates for different δ 0.2 δ {.05,.1,.2,.3,.4,.5} Nesterov (quadratic) Iterations to convergence Iterations for different δ δ {.05,.1,.2,.3,.4,.5} Nesterov (quadratic) Condition ratio L/m Condition ratio L/m Nesterov s method is not robust to noise. can we have it all? (robustness AND performance)
28 28 Design approach parameterize all proper G of degree 2 parameterization in terms of (α, β, η): x k+1 = x k α f ( ) y k + β(xk x k 1 ) y k = x k + η(x k x k 1 ) Special cases: (α, 0, 0) (α, β, η) = (α, β, 0) (α, β, β) Gradient Heavy ball Nesterov
29 29 Robust Momentum Method (ACC 18) x k+1 = x k α f ( y k ) + β(xk x k 1 ) y k = x k + η(x k x k 1 ) Parameters designed via root locus technique: α = κ(1 ρ)2 (1+ρ) L, β = κρ3 κ 1, η = ρ 3 (κ 1)(1 ρ) 2 (1+ρ) tuning parameter: 1 1 κ }{{} fast + fragile ρ 1 1 κ }{{} slow + robust When ρ = 1 1 κ, recover Gradient with α = 1 L When ρ = 1 1 κ, recover Triple Momentum Method with optimal tuning (Van Scoy et al, 2018)
30 30 Trade-off: performance vs robustness 1 Convergence rate (ρ) 1 1 κ κ 1 κ κ Noise strength (δ) Gradient (α = 2/(L + m)) Gradient (α = 1/L) Gradient (min over α) Nesterov method RMM (ρ = 1 1/ κ) RMM (intermediate 1) RMM (intermediate 2) RMM (min over ρ)
31 Related works 31 In this talk Unified analysis framework (SIOPT 16) Robust momentum method (ACC 18) Other families of algorithms Operator-splitting methods, e.g. ADMM (ICML 15) Weakly convex functions (1/k and 1/k 2 rates) (ICML 17) Distributed optimization algorithms (ALLER 17) Stochastic variance reduction, e.g. SVRG (ICML 18)
32 32 Thank you! Manuscripts + code available:
Dissipativity Theory for Nesterov s Accelerated Method
Bin Hu Laurent Lessard Abstract In this paper, we adapt the control theoretic concept of dissipativity theory to provide a natural understanding of Nesterov s accelerated method. Our theory ties rigorous
More informationGradient Methods Using Momentum and Memory
Chapter 3 Gradient Methods Using Momentum and Memory The steepest descent method described in Chapter always steps in the negative gradient direction, which is orthogonal to the boundary of the level set
More informationarxiv: v1 [math.oc] 24 Sep 2018
Unified Necessary and Sufficient Conditions for the Robust Stability of Interconnected Sector-Bounded Systems Saman Cyrus Laurent Lessard arxiv:8090874v mathoc 4 Sep 08 Abstract Classical conditions for
More informationNOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS. 1. Introduction. We consider first-order methods for smooth, unconstrained
NOTES ON FIRST-ORDER METHODS FOR MINIMIZING SMOOTH FUNCTIONS 1. Introduction. We consider first-order methods for smooth, unconstrained optimization: (1.1) minimize f(x), x R n where f : R n R. We assume
More informationOptimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison
Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big
More informationIFT Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent
IFT 6085 - Lecture 6 Nesterov s Accelerated Gradient, Stochastic Gradient Descent This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s):
More informationYou should be able to...
Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set
More informationAnalytical Convergence Regions of Accelerated First-Order Methods in Nonconvex Optimization under Regularity Condition
Analytical Convergence Regions of Accelerated First-Order Methods in Nonconvex Optimization under Regularity Condition Huaqing Xiong Yuejie Chi Bin Hu Wei Zhang October 2018 arxiv:1810.03229v1 [math.oc
More informationAnalysis of Approximate Stochastic Gradient Using Quadratic Constraints and Sequential Semidefinite Programs
Analysis of Approximate Stochastic Gradient Using Quadratic Constraints and Sequential Semidefinite Programs Bin Hu Peter Seiler Laurent Lessard November, 017 Abstract We present convergence rate analysis
More informationAccelerated primal-dual methods for linearly constrained convex problems
Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize
More informationOptimized first-order minimization methods
Optimized first-order minimization methods Donghwan Kim & Jeffrey A. Fessler EECS Dept., BME Dept., Dept. of Radiology University of Michigan web.eecs.umich.edu/~fessler UM AIM Seminar 2014-10-03 1 Disclosure
More informationInterpolation-Based Trust-Region Methods for DFO
Interpolation-Based Trust-Region Methods for DFO Luis Nunes Vicente University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg) July 27, 2010 ICCOPT, Santiago http//www.mat.uc.pt/~lnv
More informationJournal Club. A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) March 8th, CMAP, Ecole Polytechnique 1/19
Journal Club A Universal Catalyst for First-Order Optimization (H. Lin, J. Mairal and Z. Harchaoui) CMAP, Ecole Polytechnique March 8th, 2018 1/19 Plan 1 Motivations 2 Existing Acceleration Methods 3 Universal
More informationAccelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)
Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan
More informationBig Data Analytics: Optimization and Randomization
Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.
More informationAdaptive Restarting for First Order Optimization Methods
Adaptive Restarting for First Order Optimization Methods Nesterov method for smooth convex optimization adpative restarting schemes step-size insensitivity extension to non-smooth optimization continuation
More informationA Conservation Law Method in Optimization
A Conservation Law Method in Optimization Bin Shi Florida International University Tao Li Florida International University Sundaraja S. Iyengar Florida International University Abstract bshi1@cs.fiu.edu
More informationarxiv: v1 [math.oc] 3 Nov 2017
Analysis of Approximate Stochastic Gradient Using Quadratic Constraints and Sequential Semidefinite Programs arxiv:1711.00987v1 [math.oc] 3 Nov 2017 Bin Hu Peter Seiler Laurent Lessard November 6, 2017
More informationExponential Decay Rate Conditions for Uncertain Linear Systems Using Integral Quadratic Constraints
Exponential Decay Rate Conditions for Uncertain Linear Systems Using Integral Quadratic Constraints 1 Bin Hu and Peter Seiler Abstract This paper develops linear matrix inequality (LMI) conditions to test
More informationBlock stochastic gradient update method
Block stochastic gradient update method Yangyang Xu and Wotao Yin IMA, University of Minnesota Department of Mathematics, UCLA November 1, 2015 This work was done while in Rice University 1 / 26 Stochastic
More informationGRADIENT = STEEPEST DESCENT
GRADIENT METHODS GRADIENT = STEEPEST DESCENT Convex Function Iso-contours gradient 0.5 0.4 4 2 0 8 0.3 0.2 0. 0 0. negative gradient 6 0.2 4 0.3 2.5 0.5 0 0.5 0.5 0 0.5 0.4 0.5.5 0.5 0 0.5 GRADIENT DESCENT
More informationAgenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples
Agenda Fast proximal gradient methods 1 Accelerated first-order methods 2 Auxiliary sequences 3 Convergence analysis 4 Numerical examples 5 Optimality of Nesterov s scheme Last time Proximal gradient method
More informationAccelerated Proximal Gradient Methods for Convex Optimization
Accelerated Proximal Gradient Methods for Convex Optimization Paul Tseng Mathematics, University of Washington Seattle MOPTA, University of Guelph August 18, 2008 ACCELERATED PROXIMAL GRADIENT METHODS
More informationTutorial: PART 2. Optimization for Machine Learning. Elad Hazan Princeton University. + help from Sanjeev Arora & Yoram Singer
Tutorial: PART 2 Optimization for Machine Learning Elad Hazan Princeton University + help from Sanjeev Arora & Yoram Singer Agenda 1. Learning as mathematical optimization Stochastic optimization, ERM,
More informationDistributed Consensus Optimization
Distributed Consensus Optimization Ming Yan Michigan State University, CMSE/Mathematics September 14, 2018 Decentralized-1 Backgroundwhy andwe motivation need decentralized optimization? I Decentralized
More information4. Convex optimization problems
Convex Optimization Boyd & Vandenberghe 4. Convex optimization problems optimization problem in standard form convex optimization problems quasiconvex optimization linear optimization quadratic optimization
More informationECE 680 Modern Automatic Control. Gradient and Newton s Methods A Review
ECE 680Modern Automatic Control p. 1/1 ECE 680 Modern Automatic Control Gradient and Newton s Methods A Review Stan Żak October 25, 2011 ECE 680Modern Automatic Control p. 2/1 Review of the Gradient Properties
More informationAn Evolving Gradient Resampling Method for Machine Learning. Jorge Nocedal
An Evolving Gradient Resampling Method for Machine Learning Jorge Nocedal Northwestern University NIPS, Montreal 2015 1 Collaborators Figen Oztoprak Stefan Solntsev Richard Byrd 2 Outline 1. How to improve
More informationComposite nonlinear models at scale
Composite nonlinear models at scale Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with D. Davis (Cornell), M. Fazel (UW), A.S. Lewis (Cornell) C. Paquette (Lehigh), and S. Roy (UW)
More informationIncremental Gradient, Subgradient, and Proximal Methods for Convex Optimization
Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization Dimitri P. Bertsekas Laboratory for Information and Decision Systems Massachusetts Institute of Technology February 2014
More informationStochastic and online algorithms
Stochastic and online algorithms stochastic gradient method online optimization and dual averaging method minimizing finite average Stochastic and online optimization 6 1 Stochastic optimization problem
More informationEstimating the region of attraction of uncertain systems with Integral Quadratic Constraints
Estimating the region of attraction of uncertain systems with Integral Quadratic Constraints Andrea Iannelli, Peter Seiler and Andrés Marcos Abstract A general framework for Region of Attraction (ROA)
More information1. Gradient method. gradient method, first-order methods. quadratic bounds on convex functions. analysis of gradient method
L. Vandenberghe EE236C (Spring 2016) 1. Gradient method gradient method, first-order methods quadratic bounds on convex functions analysis of gradient method 1-1 Approximate course outline First-order
More informationInexact Alternating Direction Method of Multipliers for Separable Convex Optimization
Inexact Alternating Direction Method of Multipliers for Separable Convex Optimization Hongchao Zhang hozhang@math.lsu.edu Department of Mathematics Center for Computation and Technology Louisiana State
More informationConvex optimization problems. Optimization problem in standard form
Convex optimization problems optimization problem in standard form convex optimization problems linear optimization quadratic optimization geometric programming quasiconvex optimization generalized inequality
More informationLINEAR QUADRATIC OPTIMAL CONTROL BASED ON DYNAMIC COMPENSATION. Received October 2010; revised March 2011
International Journal of Innovative Computing, Information and Control ICIC International c 22 ISSN 349-498 Volume 8, Number 5(B), May 22 pp. 3743 3754 LINEAR QUADRATIC OPTIMAL CONTROL BASED ON DYNAMIC
More informationOn the Influence of Momentum Acceleration on Online Learning
Journal of Machine Learning Research 17 016) 1-66 Submitted 3/16; Revised 8/16; Published 10/16 On the Influence of Momentum Acceleration on Online Learning Kun Yuan Bicheng Ying Ali H. Sayed Department
More informationI P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION
I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1
More informationDecentralized LQG Control of Systems with a Broadcast Architecture
Decentralized LQG Control of Systems with a Broadcast Architecture Laurent Lessard 1,2 IEEE Conference on Decision and Control, pp. 6241 6246, 212 Abstract In this paper, we consider dynamical subsystems
More informationNon-convex optimization. Issam Laradji
Non-convex optimization Issam Laradji Strongly Convex Objective function f(x) x Strongly Convex Objective function Assumptions Gradient Lipschitz continuous f(x) Strongly convex x Strongly Convex Objective
More informationHigher-Order Methods
Higher-Order Methods Stephen J. Wright 1 2 Computer Sciences Department, University of Wisconsin-Madison. PCMI, July 2016 Stephen Wright (UW-Madison) Higher-Order Methods PCMI, July 2016 1 / 25 Smooth
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationAccelerated Block-Coordinate Relaxation for Regularized Optimization
Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth
More informationECE521 lecture 4: 19 January Optimization, MLE, regularization
ECE521 lecture 4: 19 January 2017 Optimization, MLE, regularization First four lectures Lectures 1 and 2: Intro to ML Probability review Types of loss functions and algorithms Lecture 3: KNN Convexity
More informationGradient Descent. Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh. Convex Optimization /36-725
Gradient Descent Lecturer: Pradeep Ravikumar Co-instructor: Aarti Singh Convex Optimization 10-725/36-725 Based on slides from Vandenberghe, Tibshirani Gradient Descent Consider unconstrained, smooth convex
More informationALADIN An Algorithm for Distributed Non-Convex Optimization and Control
ALADIN An Algorithm for Distributed Non-Convex Optimization and Control Boris Houska, Yuning Jiang, Janick Frasch, Rien Quirynen, Dimitris Kouzoupis, Moritz Diehl ShanghaiTech University, University of
More informationStability Multipliers for MIMO Monotone Nonlinearities
Stability Multipliers for MIMO Monotone Nonlinearities Ricardo Mancera and Michael G Safonov Abstract For block structured monotone or incrementally positive n-dimensional nonlinearities, the largest class
More informationNonlinear Model Predictive Control Tools (NMPC Tools)
Nonlinear Model Predictive Control Tools (NMPC Tools) Rishi Amrit, James B. Rawlings April 5, 2008 1 Formulation We consider a control system composed of three parts([2]). Estimator Target calculator Regulator
More informationNetwork Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology (China)
Network Newton Aryan Mokhtari, Qing Ling and Alejandro Ribeiro University of Pennsylvania, University of Science and Technology (China) aryanm@seas.upenn.edu, qingling@mail.ustc.edu.cn, aribeiro@seas.upenn.edu
More informationSIAM Conference on Imaging Science, Bologna, Italy, Adaptive FISTA. Peter Ochs Saarland University
SIAM Conference on Imaging Science, Bologna, Italy, 2018 Adaptive FISTA Peter Ochs Saarland University 07.06.2018 joint work with Thomas Pock, TU Graz, Austria c 2018 Peter Ochs Adaptive FISTA 1 / 16 Some
More informationSparsity Regularization
Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation
More informationGLOBAL ANALYSIS OF PIECEWISE LINEAR SYSTEMS USING IMPACT MAPS AND QUADRATIC SURFACE LYAPUNOV FUNCTIONS
GLOBAL ANALYSIS OF PIECEWISE LINEAR SYSTEMS USING IMPACT MAPS AND QUADRATIC SURFACE LYAPUNOV FUNCTIONS Jorge M. Gonçalves, Alexandre Megretski y, Munther A. Dahleh y California Institute of Technology
More informationConvex Optimization Lecture 16
Convex Optimization Lecture 16 Today: Projected Gradient Descent Conditional Gradient Descent Stochastic Gradient Descent Random Coordinate Descent Recall: Gradient Descent (Steepest Descent w.r.t Euclidean
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More informationA New Look at the Performance Analysis of First-Order Methods
A New Look at the Performance Analysis of First-Order Methods Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Yoel Drori, Google s R&D Center, Tel Aviv Optimization without
More informationStochastic dynamical modeling:
Stochastic dynamical modeling: Structured matrix completion of partially available statistics Armin Zare www-bcf.usc.edu/ arminzar Joint work with: Yongxin Chen Mihailo R. Jovanovic Tryphon T. Georgiou
More informationWorst-Case Complexity Guarantees and Nonconvex Smooth Optimization
Worst-Case Complexity Guarantees and Nonconvex Smooth Optimization Frank E. Curtis, Lehigh University Beyond Convexity Workshop, Oaxaca, Mexico 26 October 2017 Worst-Case Complexity Guarantees and Nonconvex
More informationMin-Max Output Integral Sliding Mode Control for Multiplant Linear Uncertain Systems
Proceedings of the 27 American Control Conference Marriott Marquis Hotel at Times Square New York City, USA, July -3, 27 FrC.4 Min-Max Output Integral Sliding Mode Control for Multiplant Linear Uncertain
More informationOne Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties
One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,
More informationOptimal control and estimation
Automatic Control 2 Optimal control and estimation Prof. Alberto Bemporad University of Trento Academic year 2010-2011 Prof. Alberto Bemporad (University of Trento) Automatic Control 2 Academic year 2010-2011
More informationH 2 Adaptive Control. Tansel Yucelen, Anthony J. Calise, and Rajeev Chandramohan. WeA03.4
1 American Control Conference Marriott Waterfront, Baltimore, MD, USA June 3-July, 1 WeA3. H Adaptive Control Tansel Yucelen, Anthony J. Calise, and Rajeev Chandramohan Abstract Model reference adaptive
More informationOutput Feedback Control of Linear Systems with Input and Output Quantization
Output Feedback Control of Linear Systems with Input and Output Quantization Daniel Coutinho Minyue Fu Carlos E. de Souza Abstract A considerable amount of research has been done on the use of logarithmic
More informationAn Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods
An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This
More informationStochastic Optimization Algorithms Beyond SG
Stochastic Optimization Algorithms Beyond SG Frank E. Curtis 1, Lehigh University involving joint work with Léon Bottou, Facebook AI Research Jorge Nocedal, Northwestern University Optimization Methods
More informationOn Nesterov s Random Coordinate Descent Algorithms - Continued
On Nesterov s Random Coordinate Descent Algorithms - Continued Zheng Xu University of Texas At Arlington February 20, 2015 1 Revisit Random Coordinate Descent The Random Coordinate Descent Upper and Lower
More informationContents. 1 Introduction. 1.1 History of Optimization ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016
ALG-ML SEMINAR LISSA: LINEAR TIME SECOND-ORDER STOCHASTIC ALGORITHM FEBRUARY 23, 2016 LECTURERS: NAMAN AGARWAL AND BRIAN BULLINS SCRIBE: KIRAN VODRAHALLI Contents 1 Introduction 1 1.1 History of Optimization.....................................
More informationNon-negative Matrix Factorization via accelerated Projected Gradient Descent
Non-negative Matrix Factorization via accelerated Projected Gradient Descent Andersen Ang Mathématique et recherche opérationnelle UMONS, Belgium Email: manshun.ang@umons.ac.be Homepage: angms.science
More informationarxiv: v2 [math.oc] 28 Nov 2017
Adaptive Restart of the Optimized Gradient Method for Convex Optimization Donghwan Kim Jeffrey A. Fessler arxiv:703.0464v [math.oc] 8 Nov 07 Date of current version: November 9, 07 Abstract First-order
More informationLecture: Convex Optimization Problems
1/36 Lecture: Convex Optimization Problems http://bicmr.pku.edu.cn/~wenzw/opt-2015-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/36 optimization
More informationIQC analysis of linear constrained MPC
QC analysis of linear constrained MPC W. P. Heath, G. Li, A. G. Wills and B. Lennox A version of this technical report was presented at the EEE sponsored Colloquium on Predictive Control, Sheffield, April
More informationA Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization
A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.
More informationADMM and Fast Gradient Methods for Distributed Optimization
ADMM and Fast Gradient Methods for Distributed Optimization João Xavier Instituto Sistemas e Robótica (ISR), Instituto Superior Técnico (IST) European Control Conference, ECC 13 July 16, 013 Joint work
More informationarxiv: v3 [math.oc] 8 Jan 2019
Why Random Reshuffling Beats Stochastic Gradient Descent Mert Gürbüzbalaban, Asuman Ozdaglar, Pablo Parrilo arxiv:1510.08560v3 [math.oc] 8 Jan 2019 January 9, 2019 Abstract We analyze the convergence rate
More informationFINITE HORIZON ROBUST MODEL PREDICTIVE CONTROL USING LINEAR MATRIX INEQUALITIES. Danlei Chu, Tongwen Chen, Horacio J. Marquez
FINITE HORIZON ROBUST MODEL PREDICTIVE CONTROL USING LINEAR MATRIX INEQUALITIES Danlei Chu Tongwen Chen Horacio J Marquez Department of Electrical and Computer Engineering University of Alberta Edmonton
More informationStochastic Gradient Descent with Variance Reduction
Stochastic Gradient Descent with Variance Reduction Rie Johnson, Tong Zhang Presenter: Jiawen Yao March 17, 2015 Rie Johnson, Tong Zhang Presenter: JiawenStochastic Yao Gradient Descent with Variance Reduction
More informationFast proximal gradient methods
L. Vandenberghe EE236C (Spring 2013-14) Fast proximal gradient methods fast proximal gradient method (FISTA) FISTA with line search FISTA as descent method Nesterov s second method 1 Fast (proximal) gradient
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationExpanding the reach of optimal methods
Expanding the reach of optimal methods Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with C. Kempton (UW), M. Fazel (UW), A.S. Lewis (Cornell), and S. Roy (UW) BURKAPALOOZA! WCOM
More informationBeyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory
Beyond Heuristics: Applying Alternating Direction Method of Multipliers in Nonconvex Territory Xin Liu(4Ð) State Key Laboratory of Scientific and Engineering Computing Institute of Computational Mathematics
More informationMini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization
Noname manuscript No. (will be inserted by the editor) Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization Saeed Ghadimi Guanghui Lan Hongchao Zhang the date of
More informationAccelerate Subgradient Methods
Accelerate Subgradient Methods Tianbao Yang Department of Computer Science The University of Iowa Contributors: students Yi Xu, Yan Yan and colleague Qihang Lin Yang (CS@Uiowa) Accelerate Subgradient Methods
More informationLinear Regression. S. Sumitra
Linear Regression S Sumitra Notations: x i : ith data point; x T : transpose of x; x ij : ith data point s jth attribute Let {(x 1, y 1 ), (x, y )(x N, y N )} be the given data, x i D and y i Y Here D
More informationModel reduction for linear systems by balancing
Model reduction for linear systems by balancing Bart Besselink Jan C. Willems Center for Systems and Control Johann Bernoulli Institute for Mathematics and Computer Science University of Groningen, Groningen,
More informationComplexity analysis of second-order algorithms based on line search for smooth nonconvex optimization
Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization Clément Royer - University of Wisconsin-Madison Joint work with Stephen J. Wright MOPTA, Bethlehem,
More informationSelected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 2018
Selected Methods for Modern Optimization in Data Analysis Department of Statistics and Operations Research UNC-Chapel Hill Fall 08 Instructor: Quoc Tran-Dinh Scriber: Quoc Tran-Dinh Lecture 4: Selected
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationA Brief Overview of Practical Optimization Algorithms in the Context of Relaxation
A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:
More informationLecture 5: September 15
10-725/36-725: Convex Optimization Fall 2015 Lecture 5: September 15 Lecturer: Lecturer: Ryan Tibshirani Scribes: Scribes: Di Jin, Mengdi Wang, Bin Deng Note: LaTeX template courtesy of UC Berkeley EECS
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationCONSTRAINED MODEL PREDICTIVE CONTROL ON CONVEX POLYHEDRON STOCHASTIC LINEAR PARAMETER VARYING SYSTEMS. Received October 2012; revised February 2013
International Journal of Innovative Computing, Information and Control ICIC International c 2013 ISSN 1349-4198 Volume 9, Number 10, October 2013 pp 4193 4204 CONSTRAINED MODEL PREDICTIVE CONTROL ON CONVEX
More informationOptimisation non convexe avec garanties de complexité via Newton+gradient conjugué
Optimisation non convexe avec garanties de complexité via Newton+gradient conjugué Clément Royer (Université du Wisconsin-Madison, États-Unis) Toulouse, 8 janvier 2019 Nonconvex optimization via Newton-CG
More informationGeneralized Power Method for Sparse Principal Component Analysis
Generalized Power Method for Sparse Principal Component Analysis Peter Richtárik CORE/INMA Catholic University of Louvain Belgium VOCAL 2008, Veszprém, Hungary CORE Discussion Paper #2008/70 joint work
More informationOptimization Methods for Machine Learning
Optimization Methods for Machine Learning Sathiya Keerthi Microsoft Talks given at UC Santa Cruz February 21-23, 2017 The slides for the talks will be made available at: http://www.keerthis.com/ Introduction
More information8 Numerical methods for unconstrained problems
8 Numerical methods for unconstrained problems Optimization is one of the important fields in numerical computation, beside solving differential equations and linear systems. We can see that these fields
More informationarxiv: v4 [math.oc] 1 Apr 2018
Generalizing the optimized gradient method for smooth convex minimization Donghwan Kim Jeffrey A. Fessler arxiv:607.06764v4 [math.oc] Apr 08 Date of current version: April 3, 08 Abstract This paper generalizes
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE7C (Spring 018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationNumerical Approximation of L 1 Monge-Kantorovich Problems
Rome March 30, 2007 p. 1 Numerical Approximation of L 1 Monge-Kantorovich Problems Leonid Prigozhin Ben-Gurion University, Israel joint work with J.W. Barrett Imperial College, UK Rome March 30, 2007 p.
More informationMultilevel Optimization Methods: Convergence and Problem Structure
Multilevel Optimization Methods: Convergence and Problem Structure Chin Pang Ho, Panos Parpas Department of Computing, Imperial College London, United Kingdom October 8, 206 Abstract Building upon multigrid
More informationFrom Convex Optimization to Linear Matrix Inequalities
Dep. of Information Engineering University of Pisa (Italy) From Convex Optimization to Linear Matrix Inequalities eng. Sergio Grammatico grammatico.sergio@gmail.com Class of Identification of Uncertain
More information