Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives
1 Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives. Madeleine Udell, Operations Research and Information Engineering, Cornell University. Based on joint work with Song Zhou (Cornell) and Swati Gupta (Georgia Tech). Rutgers, 11/9
2 My old 2013 MacBook Pro: 16 GB RAM
3 Gonna buy a new model with more RAM...
4 Gonna buy a new model with more RAM... nope! RAM in the 13-inch MacBook Pro: 16 GB.
5 Ok, so RAM isn't smaller. Is it cheaper?
6 Ok, so RAM isn't smaller. Is it cheaper? Nope!
7 RIP Moore's Law for RAM (circa 2013). [Plot: cost of RAM ($/MB) vs. year]
8 Why low-memory convex optimization?
- low memory: Moore's law is running out; low-memory algorithms are often fast
- convex optimization: robust convergence; elegant analysis
9 Memory plateau and the conditional gradient method
- An algorithm for quadratic programming (Frank & Wolfe 1956)
- Conditional gradient method (Levitin & Poljak 1966)
- Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm (Clarkson 2010)
- Revisiting Frank-Wolfe: projection-free sparse convex optimization (Jaggi 2013)
[Plot: cost of RAM ($/MB) vs. year]
10 Example: smooth minimization over the ℓ₁ ball
for g : Rⁿ → R smooth and α > 0, find an iterative method to solve
   minimize g(w) subject to ‖w‖₁ ≤ α
what kinds of subproblems are easy?
- projection onto the ℓ₁ ball is complicated (Duchi et al. 2008)
- linear optimization is easy: −sign(xᵢ) α eᵢ = argmin { xᵀw : ‖w‖₁ ≤ α }, where i = indmax(|x|)
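As a concrete illustration (a minimal numpy sketch of my own, not from the slides), the linear-minimization oracle over the ℓ₁ ball just reads off the largest-magnitude coordinate:

```python
import numpy as np

def l1_ball_lmo(x, alpha):
    """Return argmin { x^T w : ||w||_1 <= alpha }: a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(x))           # coordinate with the largest |x_i|
    w = np.zeros_like(x, dtype=float)
    w[i] = -alpha * np.sign(x[i])      # move opposite to the sign of x_i
    return w

x = np.array([0.3, -2.0, 1.1])
print(l1_ball_lmo(x, alpha=1.0))       # [0. 1. 0.]
```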
11 Conditional gradient method (Frank-Wolfe): minimize g(w) subject to w ∈ P. [Figure, animated over slides 11–16: iterates w(0), w(1) and vertices v1,…,v5 of the polytope P, with gradients ∇g(w(0)), ∇g(w(1))]
17 What's wrong with CGM?
- slow
- complexity of the iterate grows with the number of iterations
18 Outline: Limited Memory Kelley's Method
19 Submodularity primer
- ground set V = {1,…, n}; identify subsets of V with Boolean vectors in {0, 1}ⁿ
- F : {0, 1}ⁿ → R is submodular if F(A ∪ {v}) − F(A) ≥ F(B ∪ {v}) − F(B) for all A ⊆ B and v ∈ V \ B
- linear functions are (sub)modular: for w ∈ Rⁿ, define w(A) = Σ_{i∈A} wᵢ
20 Submodular function: example. [Figure]
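The example on this slide is an image and is not reproduced here; as a hypothetical stand-in, the sketch below checks the diminishing-returns inequality for a small coverage function, a textbook example of a submodular function.

```python
from itertools import combinations

# Each ground-set element covers a few items; F(A) counts the items covered by A.
covers = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}}

def F(A):
    """Coverage function: number of items covered by the subset A (submodular)."""
    covered = set()
    for i in A:
        covered |= covers[i]
    return len(covered)

V = set(covers)
# Diminishing returns: F(A + v) - F(A) >= F(B + v) - F(B) for all A ⊆ B and v ∉ B.
ok = all(
    F(A | {v}) - F(A) >= F(B | {v}) - F(B)
    for r in range(len(V) + 1) for B in map(set, combinations(V, r))
    for s in range(len(B) + 1) for A in map(set, combinations(B, s))
    for v in V - B
)
print("diminishing returns holds:", ok)   # True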
21 Submodular polyhedra
for a submodular function F, define
- submodular polyhedron P(F) = {w ∈ Rⁿ : w(A) ≤ F(A) for all A ⊆ V}
- base polytope B(F) = {w ∈ P(F) : w(V) = F(V)}
both (generically) have exponentially many facets!
22 Lovász extension
define the (piecewise-linear) Lovász extension as
   f(x) = max_{w ∈ B(F)} wᵀx = sup{ wᵀx : w(A) ≤ F(A) for all A ⊆ V }
the Lovász extension is the convex envelope of F
examples:
   F(A)                            f(x)                        f(|x|)
   |A|                             1ᵀx                         ‖x‖₁
   min(|A|, 1)                     max(x)                      ‖x‖∞
   Σ_{i=1}^{j} min(|A ∩ Sᵢ|, 1)    Σ_{i=1}^{j} max(x_{Sᵢ})     Σ_{i=1}^{j} ‖x_{Sᵢ}‖∞
23 Linear optimization on B(F) is easy
the (piecewise-linear) Lovász extension f(x) = max_{w ∈ B(F)} xᵀw and the indicator 1_{B(F)} are Fenchel duals: f*(x) = 1_{B(F)}(x)
linear optimization over B(F) is O(n log n) (Edmonds 1970): define a permutation π so that x_{π₁} ≥ … ≥ x_{πₙ}; then
   max_{w ∈ B(F)} xᵀw = Σ_{k=1}^{n} x_{πₖ} [F({π₁, π₂,…, πₖ}) − F({π₁, π₂,…, π_{k−1}})]
computing subgradients of f requires only O(n log n) too:
   ∂f(x) = argmax_{w ∈ B(F)} xᵀw
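A minimal Python sketch of Edmonds' greedy algorithm (assuming F is available as a function on lists of indices); the returned vector maximizes xᵀw over B(F), so it is also a subgradient of the Lovász extension at x:

```python
import numpy as np

def greedy_lmo(x, F):
    """argmax_{w in B(F)} x^T w via Edmonds' greedy algorithm: one sort, then n calls to F."""
    order = np.argsort(-x)             # permutation pi with x[pi_1] >= ... >= x[pi_n]
    w = np.zeros(len(x))
    prefix, prev = [], 0.0
    for k in order:
        prefix.append(k)
        val = F(prefix)
        w[k] = val - prev              # marginal gain F({pi_1..pi_k}) - F({pi_1..pi_{k-1}})
        prev = val
    return w                           # a vertex of B(F) and a subgradient of f at x

# Example with the cardinality-based F(A) = |A|(2n - |A| + 1)/2 used in the experiments later.
n = 4
F = lambda A: len(A) * (2 * n - len(A) + 1) / 2
print(greedy_lmo(np.array([0.5, -1.0, 2.0, 0.0]), F))   # [3. 1. 4. 2.]
```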
24 Primal problem
   minimize g(x) + f(x)    (P)
- g : Rⁿ → R strongly convex
- f : Rⁿ → R the Lovász extension of a submodular F: piecewise linear, homogeneous, (generically) exponentially many pieces, but subgradients are easy (O(n log n))
25 Primal problem: example. [Figure]
26 Original Simplicial Method (OSM) (Bach 2013)
Algorithm 1: OSM (to minimize g(x) + f(x))
initialize V; repeat:
1. define f̂(x) = max_{w ∈ V} wᵀx
2. solve the subproblem x ∈ argmin g(x) + f̂(x)
3. compute v ∈ ∂f(x) = argmax_{w ∈ B(F)} xᵀw
4. V ← V ∪ {v}
problem: V keeps growing! no known rate of convergence (Bach 2013)
27 Limited Memory Kelley's Method (L-KM)
Algorithm 2: L-KM (to minimize g(x) + f(x))
initialize V; repeat:
1. define f̂(x) = max_{w ∈ V} wᵀx
2. solve the subproblem x ∈ argmin g(x) + f̂(x)
3. compute v ∈ ∂f(x) = argmax_{w ∈ B(F)} xᵀw
4. V ← {w ∈ V : wᵀx = f̂(x)} ∪ {v}
does it converge or cycle? how large could V grow?
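To make the loop concrete, here is a rough sketch of one way L-KM could be implemented for a quadratic g (illustrative code of my own, not the authors' implementation); it assumes the greedy_lmo helper from the sketch above is in scope and uses cvxpy for the piecewise-linear subproblem.

```python
import numpy as np
import cvxpy as cp

def l_km(P, q, F, n, iters=50, tol=1e-9):
    """Sketch of L-KM for minimizing g(x) + f(x), with g(x) = x'Px + q'x and P positive definite."""
    V = [greedy_lmo(np.zeros(n), F)]                  # start with a single vertex of B(F)
    for _ in range(iters):
        W = np.vstack(V)                              # rows are the vertices currently kept
        x = cp.Variable(n)
        # steps 1-2: x = argmin g(x) + max_{w in V} w'x
        model = cp.quad_form(x, P) + q @ x + cp.max(W @ x)
        cp.Problem(cp.Minimize(model)).solve()
        xv = x.value
        v = greedy_lmo(xv, F)                         # step 3: new vertex v in the subdifferential of f at x
        vals = W @ xv
        V = [w for w, s in zip(V, vals) if s >= vals.max() - tol] + [v]   # step 4: keep active vertices, add v
    return xv, V
```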
28 L-KM: intuition. [Figure: successive piecewise-linear models g + f̂(1),…, g + f̂(4) and iterates x(1),…, x(4), z(0),…, z(3) approaching the minimizer of g + f]
29 L-KM converges linearly with bounded memory
Theorem (Zhou, Gupta, Udell 2018)
- L-KM has bounded memory: |V| ≤ n + 1
- L-KM converges when g is strongly convex
- L-KM converges linearly when g is smooth and strongly convex
(Corollary: OSM converges linearly, too.)
30 Dual problem
   minimize g*(−w) subject to w ∈ B(F)    (D)
- g* : Rⁿ → R smooth (conjugate of the strongly convex g)
- B(F): base polytope of the submodular F; (generically) exponentially many facets; linear optimization over B(F) is easy, O(n log n)
31 Conditional gradient methods for the dual
linear optimization over the constraint set is easy, so use a conditional gradient method!
- away-step FW, pairwise FW, and fully corrective FW (FCFW) all converge linearly (Lacoste-Julien & Jaggi 2015)
- FCFW has limited memory
- (Garber & Hazan 2015) gives linear convergence with one gradient + one linear optimization per iteration
32 Dual to primal
suppose g is α-strongly convex and β-smooth
- solve a dual subproblem inexactly to obtain ŷ ∈ B(F) with g*(−ŷ) − g*(−y*) ≤ ε
- g* is 1/β-strongly convex, so ‖ŷ − y*‖² ≤ 2βε
- define x̂ = ∇g*(−ŷ) = argmin_x g(x) + ŷᵀx
- since g* is 1/α-smooth, we have ‖x̂ − x*‖² ≤ (1/α²) ‖ŷ − y*‖² ≤ 2βε/α²
hence if the dual iterates converge linearly, so do the primal iterates
33 Fully corrective Frank-Wolfe: minimize g(w) subject to w ∈ P. [Figure, animated over slides 33–42: iterates w(0), w(1), w(2) and vertices v1,…,v5 of P, with gradients ∇g(w(0)), ∇g(w(1))]
43 Limited-memory Fully Corrective Frank-Wolfe (L-FCFW)
Algorithm 3: L-FCFW (to minimize g*(−y) over y ∈ B(F))
initialize V; repeat:
1. solve the subproblem: minimize g*(−y) subject to y ∈ Conv(V); write the solution as y = Σ_{w∈V} λ_w w with λ_w > 0 and Σ_{w∈V} λ_w = 1
2. compute x = ∇g*(−y) (so −x is the gradient of the dual objective)
3. solve the linear optimization v = argmax_{w ∈ B(F)} xᵀw
4. V ← {w ∈ V : λ_w > 0} ∪ {v}
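For concreteness, here is a hedged sketch of the L-FCFW subproblem and gradient step for a quadratic g(x) = xᵀPx + qᵀx, for which g*(z) = (1/4)(z − q)ᵀP⁻¹(z − q); this is illustrative code of my own, not the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def l_fcfw_subproblem(V, P, q):
    """Steps 1-2 of L-FCFW for g(x) = x'Px + q'x with P positive definite:
    minimize g*(-y) over y in Conv(V), then recover the primal point x = grad g*(-y)."""
    W = np.vstack(V)                        # rows are the current vertices of B(F)
    Pinv = np.linalg.inv(P)
    Pinv = 0.5 * (Pinv + Pinv.T)            # symmetrize for quad_form
    lam = cp.Variable(len(V), nonneg=True)
    y = W.T @ lam
    # g*(-y) = (1/4) (y + q)' P^{-1} (y + q)
    objective = 0.25 * cp.quad_form(y + q, Pinv)
    cp.Problem(cp.Minimize(objective), [cp.sum(lam) == 1]).solve()
    y_val = W.T @ lam.value
    x_hat = -0.5 * Pinv @ (y_val + q)       # x = grad g*(-y) = argmin_x g(x) + y'x
    return lam.value, y_val, x_hat
```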
44 Fully corrective Frank-Wolfe (FCFW): properties
- bounded memory: by Carathéodory, we can choose λ so that |{w ∈ V : λ_w > 0}| ≤ n + 1
- finite convergence: the active set changes at each iteration, and a vertex that exits the active set is never added again
- converges linearly for smooth, strongly convex objectives
- useful if linear optimization over B(F) is hard (so the convex subproblem is comparatively cheap)
- ok to solve the subproblem inexactly (Lacoste-Julien & Jaggi 2015)
compare to vanilla CGM:
- memory cost is similar: one w ∈ Rⁿ vs. n (very simple) vertices
- solving the subproblems is not much harder than evaluating g
45 Dual subproblems
the FCFW subproblem
   maximize −g*(−y) subject to y = Σ_{w∈V} λ_w w, 1ᵀλ = 1, λ ≥ 0
has dual
   minimize_x g(x) + max_{w ∈ V} wᵀx
which is our primal subproblem!
first-order optimality conditions show the active sets match: λ_w > 0 ⟹ wᵀx = max_{w′∈V} w′ᵀx
hence FCFW has a corresponding primal algorithm: L-KM!
46 L-KM: numerical experiment
- g(x) = xᵀAx + bᵀx + n‖x‖² for x ∈ Rⁿ
- f is the Lovász extension of F(A) = |A|(2n − |A| + 1)/2
- entries of A ∈ Rⁿˣⁿ sampled uniformly from [−1, 1]
- entries of b ∈ Rⁿ sampled uniformly from [0, n]
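A small sketch of how this experimental setup could be generated; the transcribed objective is garbled, and reading it as g(x) = xᵀAx + bᵀx + n‖x‖² is an assumption on my part.

```python
import numpy as np

n = 100
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(n, n))            # entries of A uniform on [-1, 1]
b = rng.uniform(0.0, n, size=n)                    # entries of b uniform on [0, n]

def g(x):
    """Assumed objective x'Ax + b'x + n||x||^2; for this n the spectral norm of A is far
    below n, so the n||x||^2 term makes g strongly convex."""
    return x @ A @ x + b @ x + n * x @ x

F = lambda S: len(S) * (2 * n - len(S) + 1) / 2    # cardinality-based submodular F(A) = |A|(2n - |A| + 1)/2
```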
47 L-KM: numerical experiment. [Plots: dimension n = 10 in the upper-left panel, n = 100 in the others]
48 Conclusion: L-KM gives a new algorithm for composite convex and submodular optimization with bounded (O(n)) storage, built from two old ideas: duality and Carathéodory.
49 References: S. Zhou, S. Gupta, and M. Udell. Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives. NIPS 2018.