A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation


A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation. Zhouchen Lin, Peking University, April 22, 2018.

Too Many Opt. Problems!

Too Many Opt. Algorithms! Zeroth-order algorithms (using f(x_k) only): Interpolation Methods, Pattern Search Methods. First-order algorithms (using f(x_k) and ∇f(x_k) only): Coordinate Descent, (Stochastic) Gradient/Subgradient Descent, Conjugate Gradient, Quasi-Newton/Hessian-Free Methods, (Augmented) Lagrangian Method of Multipliers.

Too Many Opt. Algorithms! Second-order algorithms (using f(x_k), ∇f(x_k), and the Hessian H_f(x_k)): Newton's Method, Sequential Quadratic Programming, Interior Point Methods.

Questions: Can we master optimization algorithms more systematically? Is there a methodology to tweak existing algorithms (without proofs)?

Model Optimization Problem & Algorithm: relaxing the objective function, relaxing the constraints, relaxing the update process.

Relaxing the Objective Function: Majorization Minimization. The surrogate may be globally majorant or only locally majorant.

Relaxing the Objective Function: Majorization Minimization. How to choose the majorant function? Quadratic majorants of the smooth part recover gradient descent and projected gradient descent.

Relaxing the Objective Function: Majorization Minimization. How to choose the stepsize α? Locally majorant: backtracking; globally majorant: Armijo's rule.
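To make the majorant and stepsize choices concrete, here is a minimal sketch (an illustration, not taken from the slides) of gradient descent viewed as majorization minimization: each step minimizes the quadratic surrogate f(x_k) + ∇f(x_k)^T(x − x_k) + ||x − x_k||^2 / (2α), and backtracking shrinks α until the surrogate indeed majorizes f at the new point. The least-squares instance is an assumed toy problem.

```python
import numpy as np

def mm_gradient_descent(f, grad, x0, alpha0=1.0, beta=0.5, iters=100):
    """Gradient descent viewed as majorization-minimization.

    Each step minimizes the quadratic surrogate
        g(x) = f(x_k) + grad(x_k)^T (x - x_k) + ||x - x_k||^2 / (2 * alpha),
    whose minimizer is the usual gradient step x_k - alpha * grad(x_k).
    Backtracking shrinks alpha until the surrogate majorizes f at the new point.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        alpha = alpha0
        while True:
            x_new = x - alpha * g
            d = x_new - x
            surrogate = f(x) + g @ d + d @ d / (2 * alpha)
            if f(x_new) <= surrogate:      # majorant (sufficient decrease) condition
                break
            alpha *= beta                  # shrink the stepsize and retry
        x = x_new
    return x

# Toy usage: least squares f(x) = 0.5 * ||Ax - b||^2 (assumed example)
A, b = np.random.randn(20, 5), np.random.randn(20)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
x_star = mm_gradient_descent(f, grad, np.zeros(5))
```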

Relaxing the Objective Function: Majorization Minimization. How to choose the majorant function? Relaxed Majorization Minimization allows the surrogate to be only locally (rather than globally) majorant, under an asymptotic smoothness condition. C. Xu, Z. Lin, and H. Zha, Relaxed Majorization-Minimization for Non-smooth and Non-convex Optimization, AAAI 2016. Zhouchen Lin, Chen Xu, and Hongbin Zha, Robust Matrix Factorization by Majorization-Minimization, IEEE TPAMI, 2018.

Relaxing the Objective Function: Majorization Minimization. How to choose the majorant function? The convex-concave procedure (CCCP); low-rankness regularizers. A. L. Yuille, A. Rangarajan, The Concave-Convex Procedure, Neural Computation 15(4): 915-936 (2003). C. Lu, J. Tang, S. Yan, Z. Lin, Generalized nonconvex nonsmooth low-rank minimization, CVPR 2014. C. Lu, J. Tang, S. Yan, Z. Lin, Nonconvex nonsmooth low-rank minimization via iteratively reweighted nuclear norm, IEEE TIP 2016. Canyi Lu, Yunchao Wei, Zhouchen Lin, and Shuicheng Yan, Proximal Iteratively Reweighted Algorithm with Multiple Splitting for Nonconvex Sparsity Optimization, AAAI 2014.
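A minimal sketch of the CCCP idea cited above, under illustrative assumptions: for f(x) = u(x) + v(x) with u convex and v smooth concave, each iteration majorizes v by its tangent at the current iterate and minimizes the resulting convex surrogate. The helper names and the toy problem below are hypothetical, not taken from the papers.

```python
import numpy as np

def cccp(min_convex_surrogate, grad_v, x0, iters=50, tol=1e-8):
    """Convex-concave procedure for f(x) = u(x) + v(x), u convex, v concave.

    The concave part is majorized by its tangent at x_k, so each iteration
    solves the convex problem argmin_x u(x) + grad_v(x_k)^T x, which the
    caller supplies via min_convex_surrogate(s) = argmin_x u(x) + s^T x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_new = min_convex_surrogate(grad_v(x))
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy DC objective (assumed): f(x) = 0.5*||x - a||^2 - 0.5*mu*||x||^2, 0 < mu < 1
a, mu = np.array([1.0, -2.0, 3.0]), 0.3
min_convex_surrogate = lambda s: a - s       # argmin_x 0.5*||x - a||^2 + s^T x
grad_v = lambda x: -mu * x                   # gradient of the concave part
x_star = cccp(min_convex_surrogate, grad_v, np.zeros(3))   # converges to a / (1 - mu)
```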

Relaxing the Objective Function: Majorization Minimization. How to choose the majorant function? Chen Xu, Zhouchen Lin, and Hongbin Zha, A Unified Convex Surrogate for the Schatten-p Norm, AAAI 2017. Fanhua Shang, Yuanyuan Liu, James Cheng, Zhi-Quan Luo, and Zhouchen Lin, Bilinear Factor Matrix Norm Minimization for Robust PCA: Algorithms and Applications, IEEE TPAMI, 2018.

Relaxing the Objective Function: Majorization Minimization. How to choose the majorant function? For the ℓ_p-norm (sparsity regularizer) and the Schatten-p norm (low-rankness regularizer): Iteratively Reweighted Least Squares. E. Candès, M. B. Wakin, S. P. Boyd, Enhancing sparsity by reweighted ℓ_1 minimization, Journal of Fourier Analysis and Applications 14(5-6) (2008) 877-905. C. Lu, Z. Lin, S. Yan, Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization, IEEE TIP, 2015.
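An illustrative IRLS sketch in the spirit of the reweighting references above (a simplified assumption, not the papers' exact algorithms): for the smoothed ℓ_p regularized least-squares problem min_x 0.5||Ax − b||^2 + λ Σ_i (x_i^2 + ε)^{p/2}, the concave penalty is majorized by a quadratic at the current iterate, so each step is a weighted ridge regression.

```python
import numpy as np

def irls_lp(A, b, lam=0.1, p=0.5, eps=1e-6, iters=50):
    """Iteratively Reweighted Least Squares for a smoothed l_p penalty.

    Approximately solves min_x 0.5*||Ax - b||^2 + lam * sum_i (x_i^2 + eps)^(p/2).
    The penalty is concave in x_i^2, so its tangent gives a quadratic majorant;
    each iteration then reduces to a weighted ridge regression.
    """
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        w = lam * p * (x ** 2 + eps) ** (p / 2 - 1)         # weights from the previous iterate
        x = np.linalg.solve(A.T @ A + np.diag(w), A.T @ b)  # weighted ridge step
    return x

# Toy usage (assumed data)
A, b = np.random.randn(30, 10), np.random.randn(30)
x_sparse = irls_lp(A, b, lam=0.5, p=0.5)
```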

Relaxing the Objective Function: Majorization Minimization. How to choose the majorant function? Other choices: Julien Mairal, Incremental majorization-minimization optimization with application to large-scale machine learning, SIAM Journal on Optimization, 25(2):829-855, 2015.

Relaxing the Constraints. What if there is no relaxation? Handling the constraint directly only works for simple constraints.

Relaxing the Constraints: Method of Lagrange Multipliers. Drawbacks: the subproblem may not have a solution, and the parameters are not easy to choose.

Relaxing the Constraints: Method of Lagrange Multipliers, with the subproblem relaxed by a majorant surrogate. Zhouchen Lin, Risheng Liu, and Huan Li, Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.

Relaxing the Constraints: Method of Lagrange Multipliers. The subproblem can be further relaxed if f is still complex: it then reduces to a proximal operator (or a projection operator), which is relatively easy to solve. Zhouchen Lin, Risheng Liu, and Huan Li, Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015.
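As a concrete illustration of the proximal and projection operators just mentioned (a sketch under assumed names, not the LADMPSAP algorithm itself): soft thresholding is the proximal operator of the ℓ_1 norm, clipping to a box is the projection onto it, and a linearized step only needs one such cheap evaluation.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: elementwise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def project_box(v, lo, hi):
    """Projection onto the box [lo, hi], i.e. the prox of its indicator function."""
    return np.clip(v, lo, hi)

def linearized_prox_step(x, grad_smooth, lam, tau):
    """One linearized proximal step: the smooth part is linearized at x and the
    l1 term is kept exact, so the update is a single soft-thresholding."""
    return soft_threshold(x - tau * grad_smooth(x), tau * lam)
```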

Relaxing the Constraints: Method of Lagrange Multipliers. More investigations: Zhouchen Lin, Risheng Liu, and Huan Li, Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning, Machine Learning, 2015. Canyi Lu, Jiashi Feng, Shuicheng Yan, and Zhouchen Lin, A Unified Alternating Direction Method of Multipliers by Majorization Minimization, IEEE Trans. Pattern Analysis and Machine Intelligence, 2018.

Relaxing the Update Process: relaxing the location to compute the gradient. Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), Soviet Mathematics Doklady 27(2) (1983) 372-376. A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences 2(1) (2009) 183-202.
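A minimal FISTA sketch for the lasso in the spirit of the Beck-Teboulle reference above: the gradient is evaluated at an extrapolated point y_k rather than at x_k (the relaxed location), which yields the O(1/k^2) rate. The lasso instance and parameter choices are illustrative assumptions.

```python
import numpy as np

def fista_lasso(A, b, lam, iters=200):
    """FISTA for min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    Gradients are computed at the extrapolated point y rather than at x,
    which is what accelerates the plain proximal gradient method."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(iters):
        g = A.T @ (A @ y - b)                      # gradient at the extrapolated point
        z = y - g / L
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # prox of the l1 term
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)                # momentum / extrapolation
        x, t = x_new, t_new
    return x

# Toy usage (assumed data)
A, b = np.random.randn(50, 20), np.random.randn(50)
x_hat = fista_lasso(A, b, lam=0.1)
```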

Relaxing the Update Process: relaxing the location to compute the gradient, with a monitor/corrector step for the nonconvex case. Canyi Lu, Huan Li, Zhouchen Lin, and Shuicheng Yan, Fast Proximal Linearized Alternating Direction Method of Multiplier with Parallel Splitting, pp. 739-745, AAAI 2016. Huan Li and Zhouchen Lin, Accelerated Proximal Gradient Methods for Nonconvex Programming, NIPS 2015.

Relaxing the Update Process: relaxing the evaluation of the gradient. Stochastic gradient descent (SGD) with variance reduction: a full-gradient snapshot vector allows a fixed stepsize. R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, NIPS 2013.
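A compact SVRG sketch in the spirit of the Johnson-Zhang reference: a full-gradient snapshot is taken once per epoch, and each inner stochastic step uses the variance-reduced direction grad_i(x) − grad_i(x_snap) + full_grad, which permits a fixed stepsize. The interface (grad_i, epoch and inner-loop counts) is assumed for illustration.

```python
import numpy as np

def svrg(grad_i, n, x0, lr=0.1, epochs=10, inner=None):
    """SVRG sketch for min_x (1/n) * sum_i f_i(x).

    grad_i(x, i) returns the gradient of the i-th component f_i at x.
    The snapshot's full gradient corrects each stochastic gradient, so the
    variance of the search direction shrinks and the stepsize can stay fixed."""
    x = np.asarray(x0, dtype=float)
    inner = inner or 2 * n
    for _ in range(epochs):
        x_snap = x.copy()                                            # snapshot vector
        full_grad = np.mean([grad_i(x_snap, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = np.random.randint(n)
            v = grad_i(x, i) - grad_i(x_snap, i) + full_grad         # variance-reduced direction
            x = x - lr * v
    return x
```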

Relaxing the Update Process: relaxing the update of the variable, via asynchronous updates. R. Johnson and T. Zhang, Accelerating stochastic gradient descent using predictive variance reduction, NIPS 2013. Cong Fang and Zhouchen Lin, Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization, AAAI 2017.

Relaxing the Update Process: relaxing the update of the variable, via asynchronous updates. Workers with local memory access a shared memory, either locked/unlocked or with atomic/wild (lock-free) writes. Cong Fang and Zhouchen Lin, Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization, AAAI 2017.
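A toy illustration (not Fang & Lin's parallel ASVRG) of the locked vs. lock-free ("wild") shared-memory update styles, using Python threads under assumed interfaces; it only shows the synchronization difference, not realistic speedups.

```python
import threading
import numpy as np

def async_sgd(grad_i, n, dim, n_threads=4, lr=0.01, steps=1000, use_lock=True):
    """Toy asynchronous SGD on a shared parameter vector.

    use_lock=True serializes writes (the "locked" shared-memory model);
    use_lock=False lets workers write without synchronization (the "wild",
    lock-free model), trading read/write consistency for speed."""
    x = np.zeros(dim)
    lock = threading.Lock()

    def worker():
        rng = np.random.default_rng()
        for _ in range(steps):
            i = int(rng.integers(n))
            g = grad_i(x, i)               # may read a partially updated x
            if use_lock:
                with lock:
                    x[:] = x - lr * g      # serialized update
            else:
                x[:] = x - lr * g          # unsynchronized ("wild") update

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```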

Conclusions: Relaxation is good, and even necessary, for optimization. The same holds for life!

Thanks! zlin@pku.edu.cn, http://www.cis.pku.edu.cn/faculty/vision/zlin/zlin.htm. Recruitment: PostDocs (540K RMB/year) and faculty positions in machine learning related areas. Please Google me and visit my webpage!