Proximal Methods for Optimization with Sparsity-inducing Norms
1 Proximal Methods for Optimization with Sparsity-inducing Norms. Group Learning Presentation. Xiaowei Zhou, Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology.
2 Outline: 1. Background; 2. Proximal Methods (Overview, Convergence, Computation of the Proximal Operator); 3. The ProxFlow algorithm for group sparsity; 4. Discussion.
3 Problem Setting. We consider
$$\min_{w \in \mathbb{R}^p} \underbrace{f(w)}_{\text{data fitting}} + \lambda \underbrace{\Omega(w)}_{\text{sparsity inducing}}. \quad (1)$$
Here $f : \mathbb{R}^p \to \mathbb{R}$ is convex and differentiable, and $\Omega : \mathbb{R}^p \to \mathbb{R}$ is convex but nonsmooth. Examples of $\Omega$: the $\ell_1$-norm $\|w\|_1 = \sum_{j=1}^p |w_j|$; the $\ell_1/\ell_q$-norm $\sum_{g \in \mathcal{G}} \eta_g \|w_g\|_q$; the nuclear norm $\sum_{j=1}^r \sigma_j(W)$.
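To make these regularizers concrete, here is a minimal numpy sketch; the vector, the group structure, and the weights $\eta_g$ are illustrative assumptions, not from the slides.

import numpy as np

w = np.array([0.5, -1.2, 0.0, 3.0, -0.1, 0.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]     # G
eta = [1.0, 1.0, 1.0]                                               # eta_g

l1 = np.sum(np.abs(w))                                              # ||w||_1
l1_l2 = sum(e * np.linalg.norm(w[g]) for e, g in zip(eta, groups))  # l1/l2-norm
W = w.reshape(2, 3)                                                 # matrix case
nuclear = np.sum(np.linalg.svd(W, compute_uv=False))                # sum of singular values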
4 Generic Methods. Generic convex optimization methods: (i) subgradient descent, $w^{t+1} = w^t - \alpha(s + \lambda s')$, where $s \in \partial f(w^t)$ and $s' \in \partial \Omega(w^t)$; (ii) reformulation as a standard solvable convex program (LP, QP, SOCP, SDP). For example, the Lasso can be formulated as
$$\min_{w^+, w^- \in \mathbb{R}^p_+} \frac{1}{2}\|y - Xw^+ + Xw^-\|^2 + \lambda(\mathbf{1}^T w^+ + \mathbf{1}^T w^-), \quad (2)$$
so general-purpose toolboxes (e.g. CVX) can be used. These methods are blind to problem structure and tend to be slow and memory-consuming, as the sketch below illustrates.
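A minimal sketch of subgradient descent for the Lasso on synthetic data; the step size $1/(L\sqrt{t})$ is a standard choice we assume here. It illustrates why such methods are slow: the rate is only $O(1/\sqrt{T})$, and the iterates are never exactly sparse.

import numpy as np

rng = np.random.default_rng(0)
X, y, lam = rng.standard_normal((50, 100)), rng.standard_normal(50), 0.1
L = np.linalg.norm(X, 2) ** 2            # scale steps by the Lipschitz constant
w = np.zeros(100)
for t in range(1, 1001):
    s = X.T @ (X @ w - y)                # gradient of f(w) = 0.5*||y - Xw||^2
    s_prime = np.sign(w)                 # a subgradient of ||w||_1 (0 at w_j = 0)
    w -= (s + lam * s_prime) / (L * np.sqrt(t))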
5 Outline (recap); next: 2. Proximal Methods — Overview.
6 Proximal methods. In proximal methods, (1) is optimized by iteratively solving
$$\min_{w \in \mathbb{R}^p} \underbrace{f(w_0) + (w - w_0)^T \nabla f(w_0)}_{\text{linearization of } f(w)} + \lambda\Omega(w) + \underbrace{\frac{L}{2}\|w - w_0\|^2}_{\text{augmented term}}. \quad (3)$$
The first two terms are the linear expansion of $f$ at $w_0$; the last term keeps the solution in a neighborhood where the current linear approximation holds. $L > 0$ is an upper bound on the Lipschitz constant of $\nabla f$.
7 Proximal Operator. Definition (proximal operator): the proximal operator associated with the regularization $\lambda\Omega$ maps a vector $u \in \mathbb{R}^p$ to the unique solution of
$$\min_{v \in \mathbb{R}^p} \frac{1}{2}\|u - v\|^2 + \lambda\Omega(v). \quad (4)$$
The operator is usually denoted $\mathrm{Prox}_{\lambda\Omega}(u)$. Example: when $\Omega$ is the $\ell_1$-norm, the proximal operator is the well-known elementwise soft-thresholding operator: for $j \in [1, \dots, p]$,
$$u_j \mapsto \mathrm{sign}(u_j)(|u_j| - \lambda)_+ = \begin{cases} 0 & \text{if } |u_j| \le \lambda, \\ \mathrm{sign}(u_j)(|u_j| - \lambda) & \text{otherwise.} \end{cases}$$
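A minimal sketch of this soft-thresholding operator; the helper name prox_l1 is ours, not from any library.

import numpy as np

def prox_l1(u, lam):
    # solves min_v 0.5*||u - v||^2 + lam*||v||_1, elementwise
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

print(prox_l1(np.array([3.0, -0.5, 1.2]), 1.0))   # [2., 0., 0.2] (up to signed zero)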
8 Algorithm. Since (3) can be rewritten as
$$\min_{w \in \mathbb{R}^p} \frac{1}{2}\left\|w - \left(w_0 - \frac{1}{L}\nabla f(w_0)\right)\right\|^2 + \frac{\lambda}{L}\Omega(w), \quad (5)$$
the proximal method can be written as Algorithm 1 (proximal method to solve (1)): repeat $w \leftarrow \mathrm{Prox}_{\frac{\lambda}{L}\Omega}\!\left(w - \frac{1}{L}\nabla f(w)\right)$ until convergence. Remark: if $\lambda = 0$, this reduces to gradient descent. Two questions remain: (1) how to compute the proximal operator? (2) does the algorithm converge, and how fast? A sketch of the iteration for the Lasso follows.
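A minimal sketch of Algorithm 1 for the Lasso (this instance is ISTA), reusing prox_l1 from above; the choice $L = \|X\|_2^2$ is the Lipschitz constant of the gradient of the quadratic data-fitting term.

import numpy as np

def ista(X, y, lam, n_iter=500):
    L = np.linalg.norm(X, 2) ** 2              # largest eigenvalue of X^T X
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = prox_l1(w - grad / L, lam / L)     # w <- Prox_{(lam/L) Omega}(w - grad/L)
    return w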
9 Outline (recap); next: 2. Proximal Methods — Convergence.
10 Assumption. We assume that $\nabla f$ is Lipschitz continuous:
$$\|\nabla f(w) - \nabla f(v)\| \le L\|w - v\|. \quad (6)$$
In words, the gradient of $f$ cannot change arbitrarily fast. Lemma: for convex $f$, the above condition is equivalent to
$$f(w) \le f(v) + \langle \nabla f(v), w - v \rangle + \frac{L}{2}\|w - v\|^2. \quad (7)$$
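For the quadratic data-fitting term $f(w) = \frac{1}{2}\|y - Xw\|^2$ we have $\nabla f(w) = X^T(Xw - y)$, so the smallest valid $L$ in (6) is $\lambda_{\max}(X^T X)$; a quick numerical sanity check on synthetic data:

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 20))
L = np.linalg.eigvalsh(X.T @ X).max()          # lambda_max(X^T X)
w, v = rng.standard_normal(20), rng.standard_normal(20)
lhs = np.linalg.norm(X.T @ X @ (w - v))        # ||grad f(w) - grad f(v)||
assert lhs <= L * np.linalg.norm(w - v) + 1e-9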
11 Two lemmas. The following two lemmas are central to the proof of convergence. Lemma (sandwich): if $F(w; v)$ is the partial linearization of $F(w) = f(w) + \lambda\Omega(w)$ at $v$, linearizing only $f$, i.e. $F(w; v) = f(v) + \langle \nabla f(v), w - v \rangle + \lambda\Omega(w)$, then
$$F(w) \le F(w; v) + \frac{L}{2}\|w - v\|^2 \le F(w) + \frac{L}{2}\|w - v\|^2. \quad (8)$$
Lemma (3-point property): if $\hat{w} = \mathrm{argmin}_w \frac{1}{2}\|w - w_0\|^2 + \varphi(w)$ for convex $\varphi$, then for any $w$:
$$\varphi(\hat{w}) + \frac{1}{2}\|\hat{w} - w_0\|^2 \le \varphi(w) + \frac{1}{2}\|w - w_0\|^2 - \frac{1}{2}\|w - \hat{w}\|^2. \quad (9)$$
12 Proof of the convergence rate I. First, $F(w_t)$ is monotone non-increasing for $t = 0, \dots, T$:
$$F(w_{t+1}) \le F(w_{t+1}; w_t) + \frac{L}{2}\|w_{t+1} - w_t\|^2 \quad \text{(sandwich, left)}$$
$$\le F(w_t; w_t) + \frac{L}{2}\|w_t - w_t\|^2 = F(w_t) \quad \text{(definition of } w_{t+1} \text{ as minimizer)}.$$
Also, with $w^\star$ a minimizer of $F$, we have
$$F(w_{t+1}) \le F(w_{t+1}; w_t) + \frac{L}{2}\|w_{t+1} - w_t\|^2 \quad \text{(sandwich, left)}$$
$$\le F(w^\star; w_t) + \frac{L}{2}\|w^\star - w_t\|^2 - \frac{L}{2}\|w^\star - w_{t+1}\|^2 \quad \text{(3-point property)}$$
$$\le F(w^\star) + \frac{L}{2}\|w^\star - w_t\|^2 - \frac{L}{2}\|w^\star - w_{t+1}\|^2 \quad \text{(sandwich, right)}.$$
13 Proof of the convergence rate II. Define $\epsilon_t = F(w_t) - F(w^\star)$, so that
$$\epsilon_{t+1} \le \frac{L}{2}\|w^\star - w_t\|^2 - \frac{L}{2}\|w^\star - w_{t+1}\|^2.$$
Summing over $t = 0, \dots, T-1$ telescopes:
$$\sum_{t=0}^{T-1} \epsilon_{t+1} \le \frac{L}{2}\|w^\star - w_0\|^2 - \frac{L}{2}\|w^\star - w_T\|^2 \le \frac{L}{2}\|w^\star - w_0\|^2.$$
Since the $\epsilon_t$ are non-increasing, $T\epsilon_T \le \sum_{t=0}^{T-1} \epsilon_{t+1}$, hence
$$\epsilon_T \le \frac{L\|w^\star - w_0\|^2}{2T}. \quad (10)$$
At iteration $T$, Algorithm 1 therefore yields a solution $w_T$ satisfying $F(w_T) - F(w^\star) \le \frac{L\|w^\star - w_0\|^2}{2T}$: Algorithm 1 has a convergence rate of $O(1/T)$.
14 Outline (recap); next: 2. Proximal Methods — Computation of the Proximal Operator.
15 Dual norm. Definition (dual norm): the dual norm $\Omega^*$ of the norm $\Omega$ is defined by
$$\Omega^*(z) = \max_{w \in \mathbb{R}^p} z^T w \quad \text{s.t. } \Omega(w) \le 1. \quad (11)$$
Examples: the $\ell_2$-norm is self-dual; the $\ell_1$-norm is dual to the $\ell_\infty$-norm.
16 Dual proximal operator. In the case where $\Omega$ is a norm, the following problem is dual to the proximal problem in (4):
$$\max_{v \in \mathbb{R}^p} -\frac{1}{2}\left(\|v - u\|^2 - \|u\|^2\right) \quad \text{s.t. } \Omega^*(v) \le \lambda. \quad (12)$$
Let $\mathrm{Proj}_{\Omega^* \le \lambda}$ be the projector onto the ball of radius $\lambda$ associated with $\Omega^*$; then $\mathrm{Proj}_{\Omega^* \le \lambda}(u)$ is the unique solution of (12), and it is related to $\mathrm{Prox}_{\lambda\Omega}$ by the Moreau decomposition
$$\mathrm{Prox}_{\lambda\Omega} = \mathrm{Id} - \mathrm{Proj}_{\Omega^* \le \lambda}. \quad (13)$$
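A numerical check of (13) for $\Omega = \ell_1$, whose dual ball is the $\ell_\infty$-ball, reusing prox_l1 from above; the helper name proj_linf is ours.

import numpy as np

def proj_linf(u, lam):
    return np.clip(u, -lam, lam)     # projection onto {v : ||v||_inf <= lam}

u, lam = np.array([3.0, -0.5, 1.2]), 1.0
assert np.allclose(prox_l1(u, lam), u - proj_linf(u, lam))   # Prox = Id - Proj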
17 Proof: conic duality. Definition (dual cone): for a convex cone $K$, the dual cone $K^*$ is defined as $K^* = \{z \in \mathbb{R}^p : z^T x \ge 0, \ \forall x \in K\}$. Primal: $\min_x f(x)$ s.t. $Ax + b \succeq_K 0$. Lagrangian: $\mathcal{L}(x, \lambda) = f(x) - \lambda^T(Ax + b)$ with $\lambda \succeq_{K^*} 0$. Dual: $\max_\lambda \inf_x \mathcal{L}(x, \lambda)$ s.t. $\lambda \succeq_{K^*} 0$.
18 Typical examples. When $\Omega$ is the $\ell_1$-norm, the solution is $u \mapsto u - \mathrm{Proj}_{\|\cdot\|_\infty \le \lambda}(u)$, which gives the elementwise soft-thresholding $u_j \mapsto \mathrm{sign}(u_j)(|u_j| - \lambda)_+$ for $j \in [1, \dots, p]$. When $\Omega$ is the $\ell_1/\ell_2$-norm with nonoverlapping groups, the proximal problem is separable over groups, and the solution is a group-thresholding operator: for $g \in \mathcal{G}$,
$$u_g \mapsto u_g - \mathrm{Proj}_{\|\cdot\|_2 \le \lambda}(u_g) = \begin{cases} 0 & \text{if } \|u_g\|_2 \le \lambda, \\ \frac{\|u_g\|_2 - \lambda}{\|u_g\|_2} u_g & \text{otherwise.} \end{cases}$$
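A minimal sketch of the group-thresholding operator, assuming the groups partition the coordinates; the helper name and the group indices are illustrative.

import numpy as np

def prox_group_l2(u, groups, lam):
    # assumes the groups partition the coordinates of u
    v = np.zeros_like(u)
    for g in groups:
        ng = np.linalg.norm(u[g])
        if ng > lam:
            v[g] = (1.0 - lam / ng) * u[g]    # shrink the whole block
    return v                                   # blocks with ||u_g||_2 <= lam stay 0

u = np.array([3.0, 4.0, 0.1, 0.2])
print(prox_group_l2(u, [np.array([0, 1]), np.array([2, 3])], 1.0))  # [2.4 3.2 0. 0.]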
19 Outline (recap); next: 3. The ProxFlow algorithm for group sparsity.
20 Duality of the proximal problem with group sparsity. The following two problems are dual:
$$P: \quad \min_{v \in \mathbb{R}^p} \frac{1}{2}\|u - v\|_2^2 + \lambda \sum_{g \in \mathcal{G}} \eta_g \|v_g\|, \quad (14)$$
$$D: \quad \max_{\xi \in \mathbb{R}^{p \times |\mathcal{G}|}} -\frac{1}{2}\left(\Big\|u - \sum_{g \in \mathcal{G}} \xi^g\Big\|_2^2 - \|u\|_2^2\right) \quad (15)$$
$$\text{s.t. } \forall g \in \mathcal{G}, \ \|\xi^g\|_* \le \lambda\eta_g \ \text{and} \ \xi^g_j = 0 \text{ if } j \notin g,$$
where $\|\cdot\|$ and $\|\cdot\|_*$ are two norms dual to each other, and $\xi = (\xi^g)_{g \in \mathcal{G}}$ with $\xi^g \in \mathbb{R}^p$ the dual variable associated with $v_g$.
21 Block Coordinate Descent. BCD algorithm to solve the proximal problem in (14): initialize $\xi = 0$; repeat, for each $g \in \mathcal{G}$, $\xi^g \leftarrow \mathrm{Proj}_{\|\cdot\|_* \le \lambda\eta_g}\!\left(\left[u - \sum_{h \ne g} \xi^h\right]_g\right)$, until convergence; output $v \leftarrow u - \sum_{g \in \mathcal{G}} \xi^g$. In general, this algorithm is not guaranteed to converge in a finite number of iterations. However, when $\|\cdot\|$ is the $\ell_1$- or $\ell_\infty$-norm and the groups in $\mathcal{G}$ form a hierarchical structure, a single pass suffices to reach convergence [Jenatton, R., Mairal, J., Obozinski, G., and Bach, F. Proximal Methods for Hierarchical Sparse Coding. Journal of Machine Learning Research 12 (2011)]. A sketch of the loop follows.
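A minimal sketch of this BCD loop for the $\ell_1/\ell_2$ case, where the dual norm is again $\ell_2$ and each update projects onto an $\ell_2$-ball; overlapping groups are allowed, the helper names are ours, and the fixed iteration count stands in for a proper convergence test.

import numpy as np

def proj_l2_ball(x, r):
    n = np.linalg.norm(x)
    return x if n <= r else (r / n) * x

def prox_overlapping_group(u, groups, lam, eta, n_iter=100):
    xi = [np.zeros_like(u) for _ in groups]            # dual variables xi^g
    for _ in range(n_iter):
        for i, g in enumerate(groups):
            rest = u - sum(xi[j] for j in range(len(groups)) if j != i)
            xi[i][:] = 0.0                             # xi^g is supported on g
            xi[i][g] = proj_l2_ball(rest[g], lam * eta[i])
    return u - sum(xi)                                 # primal solution v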
22 Quadratic Min-Cost Flow Formulation. How can the group-sparsity proximal problem be solved efficiently in general? If the $\ell_1/\ell_\infty$-norm is used, the dual problem (15) becomes
$$\min_{\xi \in \mathbb{R}^{p \times |\mathcal{G}|}} \frac{1}{2}\Big\|u - \sum_{g \in \mathcal{G}} \xi^g\Big\|_2^2 \quad \text{s.t. } \forall g \in \mathcal{G}, \ \|\xi^g\|_1 \le \lambda\eta_g \ \text{and} \ \xi^g_j = 0 \text{ if } j \notin g.$$
This problem can be formulated on a graph as a quadratic min-cost flow problem [Mairal, J., Jenatton, R., Obozinski, G., and Bach, F. Convex and Network Flow Optimization for Structured Sparsity. Journal of Machine Learning Research 12 (2011)].
23 Figure: graph representation with different group structures.
24 Speed comparison. Figure: comparison of proximal + network flow (ProxFlow), quadratic programming (QP), conic programming (CP), and subgradient descent (SG).
25 Outline (recap); next: 4. Discussion.
26 Toolbox SPAMS (SPArse Modeling Software), developed at INRIA. It addresses various machine learning and signal processing problems: dictionary learning and matrix factorization (NMF, sparse PCA, ...); sparse decomposition problems with LARS, coordinate descent, OMP, SOMP, and proximal methods; structured sparse decomposition problems.
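A hypothetical usage sketch of the SPAMS python bindings; the function name proximalFlat and its keyword arguments are written from memory of the SPAMS documentation and should be checked against the installed version.

import numpy as np
import spams   # pip install spams (availability varies by platform)

U = np.asfortranarray(np.random.randn(100, 1))        # one signal per column
V = spams.proximalFlat(U, regul='l1', lambda1=0.1)    # soft-thresholding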
27 Convex formulation of DECOLOR? Replace the MRF penalty in DECOLOR with a group-sparsity-inducing norm:
$$\min_{L,S} \frac{1}{2}\|M - L - S\|_F^2 + \mu\|L\|_* + \lambda\Omega(S), \quad \text{where } \Omega(S) := \sum_{g \in \mathcal{G}} \eta_g \|s_g\|.$$
Stable PCP with group sparsity by BCD: initialize $S_0 = 0$; repeat $L_{k+1} = D_\mu(M - S_k)$ (singular value thresholding), then $S_{k+1} = \mathrm{Prox}_{\lambda\Omega}(M - L_{k+1})$; until convergence. Output: $L, S$. A sketch follows.
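A minimal sketch of this BCD scheme, assuming $D_\mu$ is singular value thresholding, uniform weights $\eta_g = 1$, and groups given on the flattened index set; it reuses prox_group_l2 from above and is a sketch under those assumptions, not DECOLOR's reference implementation.

import numpy as np

def svt(A, mu):
    # singular value thresholding: D_mu(A)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - mu, 0.0)) @ Vt

def stable_pcp_group(M, groups, mu, lam, n_iter=100):
    S = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S, mu)                                        # low-rank step
        S = prox_group_l2((M - L).ravel(), groups, lam).reshape(M.shape)
    return L, S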
28 Key References.
Mairal, J., Jenatton, R., Obozinski, G., and Bach, F. Convex and Network Flow Optimization for Structured Sparsity. Journal of Machine Learning Research 12 (2011).
Bach, F., Jenatton, R., Mairal, J., and Obozinski, G. Convex Optimization with Sparsity-Inducing Norms. In: Optimization for Machine Learning, MIT Press, to appear.
Baldassarre, L. Proximal Methods (lecture notes).
Jenatton, R., Mairal, J., Obozinski, G., and Bach, F. Proximal Methods for Hierarchical Sparse Coding. Journal of Machine Learning Research 12 (2011).