A Tutorial on Primal-Dual Algorithm

1 A Tutorial on Primal-Dual Algorithm. Shenlong Wang, University of Toronto. March 31.

2 Energy minimization: MAP inference for MRFs. Typical energies consist of a regularization term and a data term, and are used for a wide range of problems: min_x { E(x) = R(x) + D(x) }. The minimizer provides the best configuration for the problem. The energy is related to the posterior probability via a Gibbs distribution: p(x) = (1/Z) exp(−E(x)).

3 Energy minimization: discrete vs. continuous MRF settings. Depending on the output space, we have: Discrete setting: each variable y_i takes a label from a discrete label set U ⊂ Z. Continuous setting: each variable y_i is a continuous value from U ⊂ R. We focus on the continuous setting today.

4 Why not black-box convex optimization? A generic iterative algorithm for convex optimization:
1. Pick any initial vector x_0 ∈ R^n, set k = 0.
2. Compute a search direction d_k ∈ R^n.
3. Choose a step size α_k such that f(x_k + α_k d_k) < f(x_k).
4. Set x_{k+1} = x_k + α_k d_k and k = k + 1.
5. Stop if converged, else go to step 2.
Plug-and-play, with lots of choices: steepest descent, conjugate gradient, Newton, quasi-Newton (e.g. L-BFGS). These methods do not exploit the structure of the problem, so they may not be the most efficient choice. And what if it is difficult to compute d_k at all? (Image from Cremers (2014).)

5 Notation. Variational imaging people use a different notation: min_u ∫_Ω |∇u| + (λ/2)(u − f)^2 dx, where u : Ω → R is a function over the continuous image domain Ω ⊂ R^n and the objective is a functional of u. Keep in mind what it looks like in our language: min_x ‖Gx‖_1 + (λ/2) ‖Kx − f‖_2^2.

6 Warm-up: semi-continuity. A function f(x) is called lower (upper) semi-continuous at a point x_0 if function values for arguments near x_0 are either close to f(x_0) or greater than (less than) f(x_0). The floor function ⌊x⌋ is upper semi-continuous; the ceiling function ⌈x⌉ is lower semi-continuous. The indicator function of an open set is lower semi-continuous; the indicator function of a closed set is upper semi-continuous. Semi-continuity is used to convert inequality constraints into the objective function. (See a textbook for a formal definition.)

7 Warm-up: Lipschitz continuity. A function f(x) is called Lipschitz continuous on B^n if |f(x) − f(y)| ≤ L ‖x − y‖ for all x, y ∈ B^n, where the constant L is an upper bound on the maximum steepness of f(x). This is stronger than continuity and weaker than continuous differentiability. For example, f(x) = |x| is Lipschitz continuous but not continuously differentiable.

8 Warm-up: convex conjugate. The convex conjugate f*(y) of a function f(x) is defined as: f*(y) = sup_{x ∈ dom f} ⟨x, y⟩ − f(x). Each pair (y, −f*(y)) gives the slope and intercept of a supporting (tangent) line of the function. Example, f(x) = |x|: f*(y) = sup_x ⟨x, y⟩ − |x| = 0 if |y| ≤ 1, and +∞ otherwise. (Image from Wikipedia.)
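
Since the slides contain no code, here is a small illustrative sketch (Python/NumPy, with helper names of our own) that approximates the conjugate of |x| numerically by maximizing ⟨x, y⟩ − f(x) over a grid, matching the closed form above:

    import numpy as np

    def conjugate_1d(f, ys, xs=np.linspace(-50.0, 50.0, 100001)):
        """Numerically approximate f*(y) = sup_x (x*y - f(x)) over a finite grid of x values."""
        return np.array([np.max(xs * y - f(xs)) for y in ys])

    ys = np.array([-1.5, -0.5, 0.0, 0.5, 0.99])
    print(conjugate_1d(np.abs, ys))
    # roughly [25, 0, 0, 0, 0]: zero for |y| <= 1, and growing with the grid
    # radius for |y| > 1 (it is +infinity in the limit), as the formula predicts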

9 Warm-up: proximal operator. The proximal operator (or proximal mapping) of a convex function f is: prox_f(x) = argmin_u ( f(u) + (1/2) ‖u − x‖_2^2 ). f can be nonsmooth, have embedded constraints, and so on. Evaluating prox_f involves solving a convex optimization problem, but it often has an analytic solution or a simple linear-time algorithm.

10 Warm-up: proximal operator examples.
Quadratic function f(x) = (1/2) x^T P x + q^T x + r (P ⪰ 0): prox_f(x) = (I + P)^{−1}(x − q).
l1 norm f(x) = ‖x‖_1: (prox_f(x))_i = x_i − 1 if x_i ≥ 1; 0 if |x_i| ≤ 1; x_i + 1 if x_i ≤ −1 (soft thresholding).
Logarithmic barrier f(x) = −Σ_i log x_i: (prox_f(x))_i = ( x_i + sqrt(x_i^2 + 4) ) / 2.
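
These closed forms are easy to check numerically; the following is a minimal sketch of the three proximal operators above (Python/NumPy is assumed, and the function names are ours, not from the slides):

    import numpy as np

    def prox_quadratic(x, P, q):
        # prox of f(x) = 0.5 x'Px + q'x + r (r is irrelevant): solve (I + P) u = x - q
        return np.linalg.solve(np.eye(len(x)) + P, x - q)

    def prox_l1(x):
        # prox of f(x) = ||x||_1: elementwise soft thresholding with threshold 1
        return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

    def prox_neglog(x):
        # prox of f(x) = -sum_i log x_i: positive root of u^2 - x*u - 1 = 0
        return 0.5 * (x + np.sqrt(x ** 2 + 4.0))

    if __name__ == "__main__":
        x = np.array([-2.5, -0.3, 0.0, 0.7, 3.0])
        print(prox_l1(x))                  # [-1.5  0.   0.   0.   2. ]
        print(prox_neglog(np.abs(x)))      # strictly positive outputs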

11 Warm-up Useful properties of proximal operator If f is closed and convex then prox f exists and is unique for all x. seperable sum: if f(x) = N i=1 f i (x i ), then (prox f (x)) i = prox fi (x i ) fixed point: the point x minimizes f if and only if x is a fixed point: x = prox f (x ) scaling and translation: define h(x) = f(tx + b) with t 0: conjugate: prox h (x) = 1 t (prox t 2 f (tx + b) b) prox tf (x) = x tprox f/t (x/t) keys to design parrallel optimization method 11 / 34
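
The conjugate rule is the (extended) Moreau decomposition, and it can be verified numerically; a throwaway sketch for f = ‖·‖_1, whose conjugate's prox is the projection onto the unit l-infinity ball (names and library choice are our own):

    import numpy as np

    def prox_l1(x, t):
        # prox of t*||.||_1: soft thresholding with threshold t
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def prox_l1_conj(x, t):
        # prox of t*f* where f = ||.||_1, i.e. projection onto {|y_i| <= 1}
        # (independent of t, since f* is an indicator function)
        return np.clip(x, -1.0, 1.0)

    rng = np.random.default_rng(0)
    x = rng.normal(size=5)
    t = 0.7
    # Moreau identity: prox_{t f*}(x) = x - t * prox_{f/t}(x/t)
    lhs = prox_l1_conj(x, t)
    rhs = x - t * prox_l1(x / t, 1.0 / t)
    print(np.allclose(lhs, rhs))   # True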

12 Proximal gradient method: problem. min_x f(x) = g(x) + h(x), where g is convex and differentiable with ∇g Lipschitz continuous with constant L, and h is convex, possibly nondifferentiable, with an inexpensive prox_h. The nonsmooth h rules out many methods, e.g. conjugate gradient. Example, lasso: min_x ‖x‖_1 + (λ/2) ‖Ax − b‖_2^2.

13 Proximal gradient method. min_x f(x) = g(x) + h(x), with g convex and differentiable, ∇g Lipschitz continuous with constant L, and h convex, possibly nondifferentiable, with inexpensive prox_h. Proximal gradient algorithm: x^(t) = prox_{τ_t h}( x^(t−1) − τ_t ∇g(x^(t−1)) ). O(1/N) convergence rate, i.e. to get f(x^(k)) − f(x*) ≤ ε we need O(1/ε) iterations.
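
For the lasso example, the proximal gradient iteration specializes to iterative soft thresholding (ISTA); a minimal sketch with a constant step size τ = 1/L, assuming the objective ‖x‖_1 + (λ/2)‖Ax − b‖_2^2 as above:

    import numpy as np

    def ista(A, b, lam, n_iter=500):
        """Proximal gradient (ISTA) for min_x ||x||_1 + (lam/2)||Ax - b||_2^2."""
        L = lam * np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
        tau = 1.0 / L                              # constant step size
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = lam * A.T @ (A @ x - b)         # gradient of the smooth term
            z = x - tau * grad                     # gradient step
            x = np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)   # prox of tau*||.||_1
        return x

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = rng.normal(size=(40, 100))
        x_true = np.zeros(100); x_true[:5] = rng.normal(size=5)
        b = A @ x_true + 0.01 * rng.normal(size=40)
        print(ista(A, b, lam=10.0)[:8])            # first entries of the estimate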

14 Accelerated proximal gradient method. Same assumptions: g convex and differentiable with L-Lipschitz gradient, h convex, possibly nondifferentiable, with inexpensive prox_h. Accelerated proximal gradient algorithm: x^(t) = prox_{τ_t h}( y^(t−1) − τ_t ∇g(y^(t−1)) ), y^(t) = x^(t) + ((t − 1)/(t + 2)) ( x^(t) − x^(t−1) ). O(1/N^2) convergence rate for a first-order method, i.e. to get f(x^(k)) − f(x*) ≤ ε we only need O(1/√ε) iterations. (Nesterov (2004), Beck and Teboulle (2009).)
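
Adding the momentum sequence turns the same lasso sketch into the accelerated variant; again this is an illustrative sketch, not code from the slides:

    import numpy as np

    def fista(A, b, lam, n_iter=500):
        """Accelerated proximal gradient for min_x ||x||_1 + (lam/2)||Ax - b||_2^2."""
        L = lam * np.linalg.norm(A, 2) ** 2
        tau = 1.0 / L
        x_prev = np.zeros(A.shape[1])
        y = x_prev.copy()
        for t in range(1, n_iter + 1):
            grad = lam * A.T @ (A @ y - b)
            z = y - tau * grad
            x = np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)    # prox step at y
            y = x + (t - 1.0) / (t + 2.0) * (x - x_prev)         # momentum / extrapolation
            x_prev = x
        return x_prev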

15 Proximal gradient method: experiments. Proximal gradient vs. accelerated proximal gradient. (Image from Gordon & Tibshirani (2012).)

16 Motivation: energy minimization. min_{x ∈ X} f(Kx) + g(x), where K is a linear and continuous operator, X is a Hilbert space, and f : Y → R ∪ {+∞}, g : X → R ∪ {+∞} are convex, not necessarily continuous or differentiable, with inexpensive proximal operators.

17 Motivation: examples. Many problems lie in this subclass.
Lasso: min_x ‖x‖_1 + (λ/2) ‖Ax − b‖_2^2.
Low-level vision: min_{u,v} ‖∇u‖_1 + ‖∇v‖_1 + λ ‖ρ(u, v)‖_1.
Linear programming: min_x ⟨c, x⟩ subject to Ax = b, x ≥ 0.
SVM: min_{w,b} ‖w‖_2^2 + Σ_{i=1}^n max(0, 1 − y_i(⟨w, x_i⟩ + b)).

18 Primal-dual algorithm: problem formulation.
min_{x ∈ X} f(Kx) + g(x)   (primal)
Recalling the convex conjugate, f(Kx) = max_y ⟨Kx, y⟩ − f*(y), we have:
min_{x ∈ X} max_{y ∈ Y} ⟨Kx, y⟩ + g(x) − f*(y)   (primal-dual)
max_{y ∈ Y} −( f*(y) + g*(−K* y) )   (dual)
Primal-dual gap: f(Kx) + g(x) + f*(y) + g*(−K* y). For convex problems the primal-dual gap is 0 at the optimal solution.

19 Primal-dual algorithm: optimality condition. Today we focus on the primal-dual problem: min_{x ∈ X} max_{y ∈ Y} ⟨Kx, y⟩ + g(x) − f*(y). Why primal-dual? The proximal operator of f(Kx) is not trivial, but we can evaluate the proximal operators of f*(y) and g(x) easily. A saddle point (x̂, ŷ) ∈ X × Y of this min-max function satisfies: K x̂ − ∂f*(ŷ) ∋ 0 and K* ŷ + ∂g(x̂) ∋ 0. We iterate according to this condition.

20 Primal-dual algorithm. min_{x ∈ X} max_{y ∈ Y} ⟨Kx, y⟩ + g(x) − f*(y). Choose step sizes σ > 0 and τ > 0 so that στ L^2 < 1, where L = ‖K‖, and θ ∈ [0, 1]. Choose an initialization (x^0, y^0) and set x̄^0 = x^0. For each iteration:
y^(n+1) = prox_{σ f*}( y^(n) + σ K x̄^(n) )   (dual proximal step)
x^(n+1) = prox_{τ g}( x^(n) − τ K* y^(n+1) )   (primal proximal step)
x̄^(n+1) = x^(n+1) + θ ( x^(n+1) − x^(n) )   (extrapolation)
Essentially, this alternates a proximal gradient ascent step in y and a proximal gradient descent step in x. (Chambolle and Pock (2011); Pock, Cremers, Bischof, Chambolle (2009).)
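
In code, one iteration only needs K, its adjoint, and the two proximal maps; a generic sketch of the loop (with K stored as a dense matrix for simplicity, and the prox maps passed as callables with a signature of our own choosing) is:

    import numpy as np

    def primal_dual(K, prox_tau_g, prox_sigma_fstar, x0, n_iter=200, theta=1.0):
        """Chambolle-Pock style iterations for min_x f(Kx) + g(x).

        prox_tau_g(v, tau)        : proximal map of tau * g
        prox_sigma_fstar(v, sigma): proximal map of sigma * f*
        """
        L = np.linalg.norm(K, 2)               # operator norm ||K||
        tau = sigma = 0.99 / L                 # so that sigma * tau * L^2 < 1
        x = x_bar = x0.copy()
        y = np.zeros(K.shape[0])
        for _ in range(n_iter):
            y = prox_sigma_fstar(y + sigma * (K @ x_bar), sigma)   # dual proximal step
            x_new = prox_tau_g(x - tau * (K.T @ y), tau)           # primal proximal step
            x_bar = x_new + theta * (x_new - x)                    # extrapolation
            x = x_new
        return x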

21 Primal-dual algorithm: convergence. The algorithm's convergence rate depends on the type of problem (see Chambolle and Pock (2011) for detailed proofs):
Completely non-smooth problem: O(1/N) for the duality gap.
Sum of a smooth and a non-smooth term: O(1/N^2) for ‖x − x*‖^2.
Completely smooth problem: linear rate O(ω^N), ω < 1, for ‖x − x*‖^2.

22 Discussion: parallel implementation.
y^(n+1) = prox_{σ f*}( y^(n) + σ K x̄^(n) )   (dual proximal step)
x^(n+1) = prox_{τ g}( x^(n) − τ K* y^(n+1) )   (primal proximal step)
x̄^(n+1) = x^(n+1) + θ ( x^(n+1) − x^(n) )   (extrapolation)
In the problems we usually have in vision: x and y are defined on a regular grid; f and g are usually separable sums; and only a small number of variables are involved in the gradient part Kx (high-order potentials). This is perfect for GPU parallel computing!

23 Discussion: Arrow-Hurwicz method (θ = 0).
y^(n+1) = prox_{σ f*}( y^(n) + σ K x^(n) )   (dual proximal step)
x^(n+1) = prox_{τ g}( x^(n) − τ K* y^(n+1) )   (primal proximal step)
It also tackles the primal-dual problem, but without the momentum (extrapolation) step. It theoretically loses the O(1/N^2) convergence guarantee (or at least no one has proved it yet), but in practice it is still fast for some problems. (Arrow, Hurwicz, Uzawa (1958).)

24 Discussion: alternating direction method of multipliers (ADMM).
min_{x ∈ X} f(Kx) + g(x)   (primal)
We can decompose: min_{x, y} f(y) + g(x) subject to Kx − y = 0, and solve the augmented Lagrangian problem:
L_τ(x, y, z) = f(y) + g(x) + z^T (Kx − y) + (τ/2) ‖Kx − y‖_2^2
y^(n+1) = argmin_y f(y) − ⟨y, z^(n)⟩ + (τ/2) ‖Kx^(n) − y‖_2^2   (primal)
x^(n+1) = argmin_x g(x) + ⟨Kx, z^(n)⟩ + (τ/2) ‖Kx − y^(n+1)‖_2^2   (primal)
z^(n+1) = z^(n) + τ ( Kx^(n+1) − y^(n+1) )   (dual)
The primal-dual method is equivalent to ADMM when K = I. In the general case primal-dual is usually faster, since the ADMM subproblems are harder to solve.
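
For comparison, an ADMM sketch for the same f(Kx) + g(x) splitting, specialized here (purely as an illustration) to f = ‖·‖_1 and g(x) = (λ/2)‖x − u‖_2^2, so that the y-update is a soft threshold and the x-update is a linear solve:

    import numpy as np

    def admm_tv_like(K, u, lam, tau=1.0, n_iter=200):
        """ADMM for min_x ||Kx||_1 + (lam/2)||x - u||_2^2 via the split Kx = y."""
        n, m = K.shape
        x = np.zeros(m)
        y = np.zeros(n)
        z = np.zeros(n)                      # Lagrange multiplier (unscaled form)
        KtK = K.T @ K
        for _ in range(n_iter):
            # y-update: soft threshold of Kx + z/tau with threshold 1/tau
            v = K @ x + z / tau
            y = np.sign(v) * np.maximum(np.abs(v) - 1.0 / tau, 0.0)
            # x-update: quadratic subproblem, solved via its normal equations
            rhs = lam * u + K.T @ (tau * y - z)
            x = np.linalg.solve(lam * np.eye(m) + tau * KtK, rhs)
            # dual update
            z = z + tau * (K @ x - y)
        return x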

25 Discussion: experimental results. ROF model, grayscale image, error tolerance ε.
AHZC: Arrow-Hurwicz primal-dual method.
FISTA: fast iterative shrinkage-thresholding algorithm, O(1/N^2) convergence rate.
NEST: Nesterov's method on the dual problem, O(1/N^2) convergence rate.
ADMM: alternating direction method of multipliers.
PGM: proximal gradient method, O(1/N) convergence rate.
CFP: fixed-point method on the dual.
(Table from Chambolle and Pock (2010).)

26 Example: image denoising. The ROF model for image denoising: min_x ‖∇x‖_1 + (λ/2) ‖x − u‖_2^2, written as f(Kx) + g(x) with K = ∇ and f(y) = Σ_i ‖y_i‖, where y_i is the two-dimensional intensity gradient vector at image pixel i. The conjugate is the indicator of a convex set: f*(y) = δ_P(y) = 0 if y ∈ P, +∞ if y ∉ P, with P = {p : ‖p_i‖ ≤ 1 for all i}. The proximal operator of the indicator of a convex set is just the Euclidean projection onto the feasible closed set P, thus: (prox_{f*}(y))_i = y_i / max(‖y_i‖, 1). The proximal operator of g(x) = (λ/2) ‖x − u‖_2^2 is in closed form: prox_{τ g}(x) = (x + λτ u)/(1 + λτ). (Rudin, Osher, Fatemi (1992).)
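
Putting the two proximal maps together with a forward-difference gradient operator gives a compact denoiser; the following sketch of the primal-dual ROF iteration makes its own discretization choices (Neumann boundaries, the bound ‖∇‖^2 ≤ 8), so treat it as illustrative rather than the authors' implementation:

    import numpy as np

    def grad(u):
        """Forward differences with Neumann boundary: returns (gx, gy)."""
        gx = np.zeros_like(u); gy = np.zeros_like(u)
        gx[:, :-1] = u[:, 1:] - u[:, :-1]
        gy[:-1, :] = u[1:, :] - u[:-1, :]
        return gx, gy

    def div(px, py):
        """Negative adjoint of grad, so that <grad u, p> = -<u, div p>."""
        dx = np.zeros_like(px); dy = np.zeros_like(py)
        dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]; dx[:, -1] = -px[:, -2]
        dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]; dy[-1, :] = -py[-2, :]
        return dx + dy

    def rof_denoise(f, lam, n_iter=300):
        """Primal-dual iterations for min_x ||grad x||_1 + (lam/2)||x - f||_2^2."""
        L2 = 8.0                                    # ||grad||^2 <= 8 for this discretization
        tau = sigma = 0.99 / np.sqrt(L2)            # so that sigma * tau * L^2 < 1
        x = x_bar = f.copy()
        px = np.zeros_like(f); py = np.zeros_like(f)
        for _ in range(n_iter):
            gx, gy = grad(x_bar)
            px, py = px + sigma * gx, py + sigma * gy
            norm = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2))   # project onto ||p_i|| <= 1
            px, py = px / norm, py / norm
            x_new = (x + tau * div(px, py) + tau * lam * f) / (1.0 + tau * lam)
            x_bar = 2.0 * x_new - x                  # extrapolation with theta = 1
            x = x_new
        return x

    # usage: x_denoised = rof_denoise(noisy_image, lam=8.0)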

27 Example: TV-L1 optical flow. min_{u,v} ‖∇u‖_1 + ‖∇v‖_1 + λ ‖ρ(u, v)‖_1, where u, v are the horizontal and vertical motion fields and ρ(u, v) is the first-order Taylor approximation of the photometric error. (Zach, Pock, Bischof (2007).)

28 Example: linear programming. min_x ⟨c, x⟩ subject to Ax = b, x ≥ 0. Introducing Lagrange multipliers y: min_x max_y ⟨Ax − b, y⟩ + ⟨c, x⟩ subject to x ≥ 0. Applying the primal-dual algorithm:
y^(n+1) = y^(n) + σ ( A x̄^(n) − b )
x^(n+1) = proj_{[0,∞)}( x^(n) − τ ( A^T y^(n+1) + c ) )
x̄^(n+1) = x^(n+1) + θ ( x^(n+1) − x^(n) )
where proj_{[0,∞)} is simply elementwise truncation at zero.
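
A toy run of these updates on a small LP; the instance, step sizes, and iteration count are illustrative choices of ours:

    import numpy as np

    def pd_linprog(c, A, b, n_iter=20000, theta=1.0):
        """Primal-dual iterations for min <c, x> s.t. Ax = b, x >= 0."""
        L = np.linalg.norm(A, 2)
        sigma = tau = 0.9 / L
        x = x_bar = np.zeros(len(c))
        y = np.zeros(len(b))
        for _ in range(n_iter):
            y = y + sigma * (A @ x_bar - b)                    # dual ascent step
            x_new = np.maximum(x - tau * (A.T @ y + c), 0.0)   # primal step + projection
            x_bar = x_new + theta * (x_new - x)
            x = x_new
        return x

    if __name__ == "__main__":
        # min -x1 - 2*x2  s.t.  x1 + x2 + s = 4, all variables >= 0 (optimum at x2 = 4)
        c = np.array([-1.0, -2.0, 0.0])
        A = np.array([[1.0, 1.0, 1.0]])
        b = np.array([4.0])
        print(pd_linprog(c, A, b))   # should approach [0, 4, 0]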

29 Example: discrete MRF inference. The MAP problem min_x Σ_i θ_i(x_i) + Σ_f θ_f(x_f) is relaxed to the LP:
min_μ Σ_{i, x_i ∈ L_i} θ_i(x_i) μ_i(x_i) + Σ_{f, x_f} θ_f(x_f) μ_f(x_f)
subject to Σ_{x_i} μ_i(x_i) = 1 for all i; Σ_{x_f} μ_f(x_f) = 1 for all f; Σ_{x_f \ x_i} μ_f(x_f) = μ_i(x_i) for all f, i, x_i; μ ≥ 0.

30 Example: binary labeling. The Potts model over a 2D grid can be written as the following convex relaxation: min_x ‖Dx‖_1 + ⟨x, w⟩ subject to 0 ≤ x_i ≤ 1 for all i. Preconditioned primal-dual algorithm, with diagonal step-size matrices Σ and T:
y^(n+1) = proj_{[−1,1]}( y^(n) + Σ D x̄^(n) )
x^(n+1) = proj_{[0,1]}( x^(n) − T ( D^T y^(n+1) + w ) )
x̄^(n+1) = x^(n+1) + θ ( x^(n+1) − x^(n) )
(Image from Cremers (2014).)
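
The diagonal step-size matrices Σ and T are not spelled out on the slide; one common choice (an assumption here, following the diagonal preconditioning rule of Pock and Chambolle rather than anything stated above) scales each dual step by the corresponding row sum of |K| and each primal step by the column sum:

    import numpy as np

    def diagonal_preconditioners(K, eps=1e-12):
        """Row/column-sum step sizes for a preconditioned primal-dual method.

        sigma_i = 1 / sum_j |K_ij|   (one dual step size per row of K)
        tau_j   = 1 / sum_i |K_ij|   (one primal step size per column of K)
        """
        sigma = 1.0 / np.maximum(np.abs(K).sum(axis=1), eps)
        tau = 1.0 / np.maximum(np.abs(K).sum(axis=0), eps)
        return sigma, tau

    # usage inside the iteration (elementwise products instead of scalar steps):
    #   y = proj_dual(y + sigma * (K @ x_bar))
    #   x = proj_primal(x - tau * (K.T @ y))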

31 Example: multi-view stereo reconstruction. x: inverse depth estimate. min_x λ(x) ‖∇x‖_ε + C(x), where λ(x) is an element-wise weighting function, ‖·‖_ε is the robust Huber loss, and C(x) is a non-convex matching cost. (Newcombe et al. (2011).)

32 Example: multi-view stereo reconstruction (continued). Introducing an auxiliary variable u: min_{x,u} λ(x) ‖∇x‖_ε + C(u) + (1/(2θ)) ‖x − u‖_2^2. Minimize C(u) + (1/(2θ)) ‖x − u‖_2^2 with respect to u by a smart brute-force search; minimize λ(x) ‖∇x‖_ε + (1/(2θ)) ‖x − u‖_2^2 with respect to x by primal-dual; then reduce θ and repeat. (Newcombe et al. (2011).)

33 Summary. A first-order primal-dual algorithm for a class of structured convex optimization problems. The objective function can be non-differentiable. It is easy to implement (we just need to derive the proximal operators), and it achieves optimal convergence rates on multiple sub-classes.

34 References.
Chambolle, Antonin, and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40.1 (2011): 120-145.
Nesterov, Yurii. Introductory lectures on convex optimization: A basic course. Vol. 87. Springer Science & Business Media, 2004.
Nesterov, Yurii. Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007/76. UCL, 2007.
Parikh, Neal, and Stephen P. Boyd. Proximal algorithms. Foundations and Trends in Optimization 1.3 (2014): 127-239.
Newcombe, Richard A., Steven J. Lovegrove, and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. ICCV 2011.
Pock, Thomas, et al. Global solutions of variational models with convex regularization. SIAM Journal on Imaging Sciences 3.4 (2010).
Nieuwenhuis, Claudia, Eno Töppe, and Daniel Cremers. A survey and comparison of discrete and continuous multi-label optimization approaches for the Potts model. International Journal of Computer Vision (2013).
Zach, Christopher, Thomas Pock, and Horst Bischof. A duality based approach for realtime TV-L1 optical flow. Pattern Recognition. Springer Berlin Heidelberg, 2007.
