Optimization for Learning and Big Data


1 Optimization for Learning and Big Data Donald Goldfarb Department of IEOR Columbia University Department of Mathematics Distinguished Lecture Series May 17-19, 2016.

2 Lecture 1. First-Order Methods for Convex Optimization Lecture 2. Stochastic Quasi-Newton Methods for Machine Learning Lecture 3. Optimization for Tensor Models

3 In whatever happens in the world, one can find the concept of maximum or minimum; hence there is no doubt that all phenomena in nature can be explained via the maximum and minimum method... Euler, Leonhard (1744)

4 Lec 1. Unsupervised Learning Background - Foreground Separation in Videos

5 Lec 2. Supervised Learning Classification

6 Lec 3 Unsupervised Learning Topic Model

7 First-Order Methods for Convex Optimization Convex Functions - Basic Definitions Proximal Algorithms Augmented Lagrangian Method (of Multipliers) Alternating Direction Method of Multipliers (ADMM) Conditional Gradient (Frank-Wolfe) Method

8 Convex Functions
f : R^n → R ∪ {+∞}, dom(f) = {x ∈ R^n : f(x) < +∞}
f is proper: dom(f) ≠ ∅
f is (strictly) convex if f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y) for all λ ∈ [0, 1] (with strict inequality for λ ∈ (0, 1))
f is µ-strongly convex if for every λ ∈ [0, 1]
    f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y) − (µ/2) λ(1−λ) ‖x − y‖²₂
If f ∈ C² and µI ≼ ∇²f(x) ≼ LI, then f is µ-strongly convex and f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (µ/2) ‖y − x‖²

9 Non-smooth Convex Functions
For convex functions, subgradients take the place of gradients.
v is a subgradient of f at x if f(y) ≥ f(x) + vᵀ(y − x)
Recall for f ∈ C¹, f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)
Subdifferential: ∂f(x) = {all subgradients of f at x}
[Figure: a convex f lying above its supporting line f(x) + ⟨v, y − x⟩ of slope v at x]

10 Optimality for Non-smooth Convex Functions
∂f is a set-valued function.
Example: f(x) = x² if x < 0, and f(x) = x if x ≥ 0; then
    ∂f(x) = {2x} if x < 0, [0, 1] if x = 0, {1} if x > 0
x* minimizes f(x) ⟺ 0 ∈ ∂f(x*).

11 Moreau Proximal Envelopes
History: Moreau and Yosida (1960s)
Moreau Envelope: f_γ(x) = min_y { f(y) + (1/2γ) ‖y − x‖² }
f_γ(x) ≤ f(x); f_γ(x) is a regularized version of f
f_γ(x) has the same set of minimizers as f(x)
[Figure: Moreau envelope illustration for a fixed γ]
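
As a quick numerical illustration of these properties (an example I added, not from the slides), the sketch below evaluates the Moreau envelope of f(x) = |x| by minimizing over a fine grid and checks it against the known closed form, the Huber function, as well as the bound f_γ ≤ f:

    import numpy as np

    gamma = 0.5
    xs = np.linspace(-3.0, 3.0, 13)          # query points
    ys = np.linspace(-5.0, 5.0, 200001)      # fine grid for the inner minimization

    def moreau_env_abs(x, gamma):
        # f_gamma(x) = min_y |y| + (1/(2*gamma)) * (y - x)^2, approximated on a grid
        return np.min(np.abs(ys) + (ys - x) ** 2 / (2 * gamma))

    def huber(x, gamma):
        # known closed form of the Moreau envelope of |.|
        return np.where(np.abs(x) <= gamma, x ** 2 / (2 * gamma), np.abs(x) - gamma / 2)

    for x in xs:
        env = moreau_env_abs(x, gamma)
        assert abs(env - huber(x, gamma)) < 1e-4   # envelope matches the Huber function
        assert env <= abs(x) + 1e-12               # f_gamma(x) <= f(x)
    print("Moreau envelope of |x| agrees with the Huber function")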

12 Moreau Proximity Operator
Proximity Operator: prox_γf : R^n → R^n of γf, where γ > 0 is a scale factor in
    prox_γf(x) = argmin_y { f(y) + (1/2γ) ‖y − x‖² }    (1)
The function in {·} in (1) is strongly convex and hence has a unique minimizer for every x.
prox_γf(·) is closer to the minimizers of f(·) (and f_γ(·)) than x.
If f is replaced by its linearization at x, f(y) ≈ f(x) + ∇f(x)ᵀ(y − x), then prox_γf(x) = x − γ∇f(x), i.e., a gradient descent step with step size γ.

13 Proximity Operators: Examples
f = I_C(x), the indicator function of the convex set C ⊆ R^n:
    I_C(x) = 0 if x ∈ C, +∞ if x ∉ C
    prox_f(x) = argmin_{y ∈ C} ‖y − x‖²  (projection of x onto C)
f = ‖x‖₁:
    prox_γf(x) = soft(x, γ) = sgn(x) max(|x| − γ, 0)  (componentwise soft thresholding)
Nuclear (trace) norm ‖X‖_* = sum of singular values of X:
    Let the SVD of X be UΛVᵀ; then prox_{γ‖·‖_*}(X) = U Λ̄ Vᵀ with Λ̄_ii = soft(Λ_ii, γ)
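
A minimal numerical sketch of these three proximity operators (an illustration I added, using numpy; the function and variable names are mine):

    import numpy as np

    def prox_l1(x, gamma):
        # soft thresholding: prox of gamma*||x||_1
        return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

    def project_box(x, lo, hi):
        # prox of the indicator of the box C = {lo <= x <= hi}: Euclidean projection onto C
        return np.clip(x, lo, hi)

    def prox_nuclear(X, gamma):
        # prox of gamma*||X||_* : soft-threshold the singular values
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - gamma, 0.0)) @ Vt

    x = np.array([1.5, -0.2, 0.7])
    print(prox_l1(x, 0.5))            # -> [1., -0., 0.2]
    print(project_box(x, 0.0, 1.0))   # -> [1., 0., 0.7]
    print(prox_nuclear(np.outer(x, x), 0.5).round(3))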

14 Proximal Minimization
    x^{k+1} ← prox_γf(x^k)    (2)
A minimizer x* of γf is a fixed point of prox_γf, i.e. x* = prox_γf(x*).
prox_γf(x) = x − γ∇f_γ(x), i.e. a steepest descent step with step length γ for minimizing the Moreau envelope.
W.r.t. f, if f ∈ C¹, prox_γf is equivalent to an implicit gradient (backward Euler) step.
Iteration (2) converges to the set of minimizers of f.

15 Proximal Gradient Method
Consider: minimize f(x) + g(x), where f : R^n → R with f ∈ C¹ and g : R^n → R ∪ {+∞} are both closed and proper convex functions.
Proximal gradient method:
    x^{k+1} ← prox_{α_k g}(x^k − α_k ∇f(x^k))
Re-discovered in optimization, convex analysis, machine learning, signal processing, PDE, etc.:
- Fixed-Point Continuation (FPC)
- Iterative Shrinkage Thresholding (IST)
- Forward-Backward Splitting (FBS)
If f(x) is replaced by its linearization f(x^k) + ∇f(x^k)ᵀ(x − x^k), then
    x^{k+1} ← prox_{α_k g}(prox_{α_k f}(x^k))
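
An illustrative sketch of the iteration (not from the slides; grad_f, prox_g, and the fixed step size alpha are caller-supplied assumptions), applied to a small lasso instance:

    import numpy as np

    def proximal_gradient(grad_f, prox_g, x0, alpha, iters=500):
        # x_{k+1} = prox_{alpha*g}(x_k - alpha*grad_f(x_k))
        x = x0.copy()
        for _ in range(iters):
            x = prox_g(x - alpha * grad_f(x), alpha)
        return x

    # example: lasso, f(x) = 0.5*||Dx - b||^2, g(x) = lam*||x||_1
    rng = np.random.default_rng(0)
    D, b, lam = rng.standard_normal((20, 50)), rng.standard_normal(20), 0.1
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    x_hat = proximal_gradient(grad_f=lambda x: D.T @ (D @ x - b),
                              prox_g=lambda v, a: soft(v, a * lam),
                              x0=np.zeros(50),
                              alpha=1.0 / np.linalg.norm(D, 2) ** 2)
    print("nonzeros:", np.count_nonzero(np.round(x_hat, 6)))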

16 Unsupervised Learning: Proximal Gradient Method Recommendation Systems: Netflix problem 17,000 movies, 500,000 customers, 100,000,000 ratings objective function value: $1,000,000

17 Unsupervised Learning: Proximal Gradient Method
Netflix Problem (Matrix Completion):
    min_X { rank(X) : P_Ω(X − M) = 0 }
Convex relaxation (Candes and Recht, 2009; Candes and Tao, 2009) and prox gradient method (Ma, G, Chen, 2009):
    min_X  µ‖X‖_* + (1/2)‖P_Ω(X − M)‖²_F
    Y^k ← X^k − τ g(X^k),   X^{k+1} ← S_{τµ}(Y^k)
where g(X) := gradient of (1/2)‖P_Ω(X − M)‖²_F and S_ν(Y) := matrix shrinkage operator
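
An illustrative sketch of this iteration on a small random low-rank matrix (my example, not from the slides; a full SVD is used in the shrinkage step, whereas a large-scale code would use a partial SVD):

    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 60, 3
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-3 ground truth
    Omega = rng.random((n, n)) < 0.5                                # observe ~50% of entries

    def svd_shrink(Y, t):
        # S_t(Y): soft-threshold the singular values (prox of t*||.||_*)
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

    X, tau, mu = np.zeros((n, n)), 1.0, 1.0
    for _ in range(300):
        G = np.where(Omega, X - M, 0.0)      # gradient of 0.5*||P_Omega(X - M)||_F^2
        X = svd_shrink(X - tau * G, tau * mu)
    print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))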

18 Augmented Lagrangian Methods
Consider the linearly constrained problem
    minimize f(x)  subject to Ax = b
where f is a proper, lower semi-continuous, convex function.
Augmented Lagrangian with penalty parameter ρ > 0:
    L(x, λ; ρ) := f(x) + λᵀ(Ax − b)  [Lagrangian]  + (ρ/2)‖Ax − b‖²₂  [augmentation]
Augmented Lagrangian method (method of multipliers) (Hestenes, Powell):
    x^k = argmin_x L(x, λ^{k−1}; ρ),
    λ^k = λ^{k−1} + ρ(Ax^k − b).
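
A compact sketch of the method of multipliers (an illustration I added) on a toy problem where the x-subproblem has a closed form, namely f(x) = (1/2)‖x − c‖² subject to Ax = b:

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 5, 12
    A, b, c = rng.standard_normal((m, n)), rng.standard_normal(m), rng.standard_normal(n)

    rho, lam = 10.0, np.zeros(m)
    for _ in range(100):
        # x-step: argmin_x 0.5*||x - c||^2 + lam'(Ax - b) + (rho/2)*||Ax - b||^2
        x = np.linalg.solve(np.eye(n) + rho * A.T @ A, c - A.T @ lam + rho * A.T @ b)
        # multiplier step
        lam = lam + rho * (A @ x - b)
    print("constraint violation:", np.linalg.norm(A @ x - b))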

19 A Non-standard Derivation
    min_x f(x) s.t. Ax = b   ⟺   min_x max_λ { f(x) + λᵀ(Ax − b) }
To smooth max_λ { f(x) + λᵀ(Ax − b) }, add a proximal term given an estimate λ̄:
    φ̂(x) := max_λ { f(x) + λᵀ(Ax − b) − (1/2ρ)‖λ − λ̄‖² }
Maximizing w.r.t. λ yields λ̂ = λ̄ + ρ(Ax − b) and
    min_x { f(x) + λ̄ᵀ(Ax − b) + (ρ/2)‖Ax − b‖² } = min_x L(x, λ̄; ρ).
Extends immediately to nonlinear constraints c(x) = 0 or c(x) ≤ 0, and explicit constraints min_{x ∈ Ω} L(x, λ, ρ).

20 Another Non-standard Derivation
Consider a penalty method approach:
    min_x f(x) + (ρ/2)‖Ax − b‖²₂
The Bregman distance for convex f(·) between points u and v, with p ∈ ∂f(v), is
    D_f^p(u, v) := f(u) − f(v) − ⟨p, u − v⟩

21 Another Non-standard Derivation (Cont.) (ρ = 1)
Bregman iteration: set x^0 ← 0, p^0 ← 0
    x^{k+1} ← argmin_x D_f^{p^k}(x, x^k) + (1/2)‖Ax − b‖²₂
    p^{k+1} ← p^k − Aᵀ(Ax^{k+1} − b)
Augmented Lagrangian method: set x^0 ← 0, λ^0 ← 0
    x^{k+1} ← argmin_x f(x) + ⟨λ^k, Ax − b⟩ + (1/2)‖Ax − b‖²₂
    λ^{k+1} ← λ^k + Ax^{k+1} − b
Augmented Lagrangian ≡ Bregman, with {p^k = −Aᵀλ^k}

22 Alternating Direction Method of Multipliers (ADMM)
Long history: goes back to Gabay and Mercier, Glowinski and Marrocco, Lions and Mercier, Passty, and others.
- Variational problems in partial differential equations
- Maximal monotone operators
- Variational inequalities
- Nonlinear convex optimization
- Linear programming
- Nonsmooth ℓ₁-minimization, compressive sensing
Split-Bregman (Goldstein & Osher, 2009): 2139 citations; (Gabay & Mercier, 1976): 970 citations

23 Alternating Direction Method of Multipliers (ADMM)
Consider problems with a separable objective of the form
    min_{(x,z)} f(x) + h(z)  s.t. Ax + Bz = c.
The standard augmented Lagrangian method minimizes
    L(x, z, λ; ρ) := f(x) + h(z) + λᵀ(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖²₂
w.r.t. (x, z) jointly. In ADMM, minimize over x and z separately and sequentially:
    x^k = argmin_x L(x, z^{k−1}, λ^{k−1}; ρ_k);
    z^k = argmin_z L(x^k, z, λ^{k−1}; ρ_k);
    λ^k = λ^{k−1} + ρ_k(Ax^k + Bz^k − c).

24 ADMM: A Simpler Form
Consider the simpler problem min_x f(x) + h(Ax), i.e., min_{(x,z)} f(x) + h(z) s.t. Ax = z.
In this case, the ADMM can be written as
    x^k = argmin_x f(x) + (ρ/2)‖Ax − z^{k−1} − d^{k−1}‖²
    z^k = argmin_z h(z) + (ρ/2)‖Ax^k − z − d^{k−1}‖²
    d^k = d^{k−1} − (Ax^k − z^k)
sometimes called the scaled version of ADMM.
Note z^k = prox_{h/ρ}(Ax^k − d^{k−1}) and is usually easy to compute.
Updating x^k may be hard: if f is not quadratic, it may be as hard as the original problem.
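
An illustrative sketch of this scaled form on the lasso (my example, not from the slides; here A = I, f(x) = (1/2)‖Dx − b‖² and h = λ‖·‖₁, so the x-update is a linear solve and the z-update is soft thresholding):

    import numpy as np

    rng = np.random.default_rng(0)
    D, b, lam, rho = rng.standard_normal((30, 80)), rng.standard_normal(30), 0.1, 1.0
    n = D.shape[1]
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    x, z, d = np.zeros(n), np.zeros(n), np.zeros(n)
    DtD, Dtb = D.T @ D, D.T @ b
    for _ in range(300):
        # x-step: argmin_x 0.5*||Dx - b||^2 + (rho/2)*||x - z - d||^2
        x = np.linalg.solve(DtD + rho * np.eye(n), Dtb + rho * (z + d))
        # z-step: prox_{h/rho}(x - d) = soft(x - d, lam/rho)
        z = soft(x - d, lam / rho)
        # scaled multiplier step
        d = d - (x - z)
    print("||x - z|| =", np.linalg.norm(x - z))

Since D is fixed, a practical code would factor DtD + rho*I once and reuse it in every x-step.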

25 Examples
    min F(x) ≡ f(x) + g(x)
Compressed sensing (Lasso): min ρ‖x‖₁ + (1/2)‖Ax − b‖²₂
Matrix Rank Min: min ρ‖X‖_* + (1/2)‖A(X) − b‖²₂
Robust PCA: min_{X,Y} ‖X‖_* + ρ‖Y‖₁ : X + Y = M
Sparse Inverse Covariance Selection: min −log det(X) + ⟨Σ, X⟩ + ρ‖X‖₁
Group Lasso: min ρ‖x‖_{1,2} + (1/2)‖Ax − b‖²₂

26 Variable Splitting
    min f(x) + g(x)   ⟺   min f(x) + g(y) s.t. x = y
Augmented Lagrangian function:
    L(x, y; λ) := f(x) + g(y) − ⟨λ, x − y⟩ + (1/2µ)‖x − y‖²
ADMM:
    x^{k+1} := argmin_x L(x, y^k; λ^k)
    y^{k+1} := argmin_y L(x^{k+1}, y; λ^k)
    λ^{k+1} := λ^k − (x^{k+1} − y^{k+1})/µ

27 Symmetric ADMM (Alternating Linearization Method)
Symmetric version:
    x^{k+1} := argmin_x L(x, y^k; λ^k)
    λ^{k+1/2} := λ^k − (x^{k+1} − y^k)/µ
    y^{k+1} := argmin_y L(x^{k+1}, y; λ^{k+1/2})
    λ^{k+1} := λ^{k+1/2} − (x^{k+1} − y^{k+1})/µ
Optimality conditions lead to (assuming f and g are smooth)
    λ^{k+1/2} = ∇f(x^{k+1}),   λ^{k+1} = −∇g(y^{k+1})
Alternating Linearization Method (ALM):
    x^{k+1} = argmin_x f(x) + g(y^k) + ⟨∇g(y^k), x − y^k⟩ + (1/2µ)‖x − y^k‖²
    y^{k+1} = argmin_y f(x^{k+1}) + ⟨∇f(x^{k+1}), y − x^{k+1}⟩ + (1/2µ)‖x^{k+1} − y‖² + g(y)
A Gauss-Seidel like algorithm.
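
Read this way, each ALM half-step is a prox-gradient step on one function using the gradient of the other. A generic sketch under that reading (an illustration I added; grad_f, prox_f, grad_g, prox_g, and the toy quadratic instance are my assumptions):

    import numpy as np

    def alternating_linearization(grad_f, prox_f, grad_g, prox_g, x0, mu, iters=200):
        x = y = x0.copy()
        for _ in range(iters):
            # x-step: argmin_x f(x) + <grad g(y), x - y> + ||x - y||^2/(2*mu)
            x = prox_f(y - mu * grad_g(y), mu)
            # y-step: argmin_y <grad f(x), y - x> + ||y - x||^2/(2*mu) + g(y)
            y = prox_g(x - mu * grad_f(x), mu)
        return y

    # toy instance: f(x) = 0.5*||x - a||^2, g(x) = 0.5*||x - b||^2 (minimizer is (a + b)/2)
    a, b = np.array([1.0, 0.0]), np.array([0.0, 2.0])
    y = alternating_linearization(grad_f=lambda x: x - a,
                                  prox_f=lambda v, m: (v + m * a) / (1 + m),
                                  grad_g=lambda x: x - b,
                                  prox_g=lambda v, m: (v + m * b) / (1 + m),
                                  x0=np.zeros(2), mu=0.5)
    print(y)   # close to [0.5, 1.0]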

28 Complexity Bound for ALM
Theorem (G, Ma and Scheinberg, 2013). Assume ∇f and ∇g are Lipschitz continuous with constants L(f) and L(g). For µ ≤ 1/max{L(f), L(g)}, ALM satisfies
    F(y^k) − F(x*) ≤ ‖x^0 − x*‖² / (4µk)
i.e., O(1/ɛ) iterations for an ɛ-optimal solution (F(x) − F(x*) ≤ ɛ).
Can we improve the complexity? Can we extend this result to ADMM?

29 Optimal Gradient Methods
Lipschitz continuous ∇f.
Classical gradient method, complexity O(1/ɛ):
    x^k = x^{k−1} − τ_k ∇f(x^{k−1})
Nesterov's acceleration technique (1983), complexity O(1/√ɛ):
    x^k := y^{k−1} − τ_k ∇f(y^{k−1})
    y^k := x^k + ((k − 1)/(k + 2)) (x^k − x^{k−1})
Optimal first-order method; the best one can get.

30 ISTA and FISTA (Beck and Teboulle, 2009)
    min F(x) ≡ f(x) + g(x),   assume g is smooth
ISTA (proximal gradient method), complexity O(1/ɛ):
    x^{k+1} := argmin_x Q_g(x, x^k)
or equivalently
    x^{k+1} := argmin_x τ f(x) + (1/2)‖x − (x^k − τ∇g(x^k))‖²
Never minimizes g.
Fast ISTA (FISTA), complexity O(1/√ɛ):
    x^k := argmin_x τ f(x) + (1/2)‖x − (y^k − τ∇g(y^k))‖²
    t_{k+1} := (1 + √(1 + 4t_k²))/2
    y^{k+1} := x^k + ((t_k − 1)/t_{k+1}) (x^k − x^{k−1})
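
An illustrative FISTA sketch on the lasso (not from the slides; following the slide's notation, g is the smooth least-squares term, f = λ‖·‖₁, and τ = 1/L(g)):

    import numpy as np

    rng = np.random.default_rng(0)
    D, b, lam = rng.standard_normal((40, 100)), rng.standard_normal(40), 0.1
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    tau = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of grad g

    x_prev = x = y = np.zeros(100)
    t = 1.0
    for _ in range(300):
        # x-step: prox of tau*f at the gradient step on g
        x = soft(y - tau * D.T @ (D @ y - b), tau * lam)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    print("objective:", 0.5 * np.linalg.norm(D @ x - b) ** 2 + lam * np.abs(x).sum())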

31 Fast Alternating Linearization Method (FALM)
ALM (symmetric ADMM):
    x^{k+1} := argmin_x Q_g(x, y^k)
    y^{k+1} := argmin_y Q_f(x^{k+1}, y)
Accelerate ALM in the same way as FISTA.
Fast Alternating Linearization Method (FALM):
    x^k := argmin_x Q_g(x, z^k)
    y^k := argmin_y Q_f(x^k, y)
    w^k := (x^k + y^k)/2
    t_{k+1} := (1 + √(1 + 4t_k²))/2
    z^{k+1} := w^k + (1/t_{k+1}) (t_k(y^k − w^{k−1}) − (w^k − w^{k−1}))
Computational effort at each iteration is almost unchanged.
Both f and g must be smooth; however, both are minimized.

32 FALM (cont.)
Theorem (G, Ma and Scheinberg, 2013). Assume ∇f and ∇g are Lipschitz continuous with constants L(f) and L(g). For µ ≤ 1/max{L(f), L(g)}, FALM satisfies
    F(y^k) − F(x*) ≤ ‖x^0 − x*‖² / (µ(k + 1)²)
i.e., O(1/√ɛ) iterations for an ɛ-optimal solution; hence, an optimal first-order method.
Applied to Total Variation denoising, it outperforms split Bregman (Qin, G, Ma, 2013).

33 ALM with skipping steps
At the k-th iteration of ALM-S:
    x^{k+1} := argmin_x L_µ(x, y^k; λ^k)
    If F(x^{k+1}) > L_µ(x^{k+1}, y^k; λ^k), then x^{k+1} := y^k
    y^{k+1} := argmin_y Q_f(y, x^{k+1})
    λ^{k+1} := ∇f(x^{k+1}) − (x^{k+1} − y^{k+1})/µ
Note that only f is required to be smooth.
If µ ≤ 1/L(f), complexity O(1/ɛ); if L(f) is not known, use a backtracking line search (Scheinberg, G, Bai 2014).
The FALM version has complexity O(1/√ɛ).
Applied to solve Sparse Inverse Covariance Selection (Scheinberg, Ma, G, 2010) and Group Lasso (structured sparsity for breast cancer gene expression) (Qin, G, 2012).

34 Multiple Splitting Algorithm (MSA)
Generalization from 2 to K convex functions is possible, but non-convergence of ADMM for K ≥ 3 has been shown.
Consider min F(x) ≡ f(x) + g(x) + h(x), and define
    Q_gh(u, v, w) := f(u) + g(v) + ⟨∇g(v), u − v⟩ + ‖u − v‖²/2µ + h(w) + ⟨∇h(w), u − w⟩ + ‖u − w‖²/2µ.
ALM (symmetric ADMM):
    x^{k+1} := argmin Q_gh(x, y^k, z^k)
    y^{k+1} := argmin Q_fh(x^{k+1}, y, z^k)
    z^{k+1} := argmin Q_fg(x^{k+1}, y^{k+1}, z)
A Gauss-Seidel like algorithm! Convergence?

35 Multiple Splitting Algorithm (MSA) (cont.)
Jacobi type algorithm:
    x^{k+1} := argmin Q_gh(x, w^k, w^k)
    y^{k+1} := argmin Q_fh(w^k, y, w^k)
    z^{k+1} := argmin Q_fg(w^k, w^k, z)
    w^{k+1} := (x^{k+1} + y^{k+1} + z^{k+1})/3
Convergent, with complexity O(1/ɛ) (G and Ma, 2012).

36 Fast Multiple Splitting Algorithm (FaMSA)
O(1/√ɛ) complexity (G and Ma, 2012):
    x^k := argmin Q_gh(x, w_x^k, w_x^k)
    y^k := argmin Q_fh(w_y^k, y, w_y^k)
    z^k := argmin Q_fg(w_z^k, w_z^k, z)
    w^k := (x^k + y^k + z^k)/3
    t_{k+1} := (1 + √(1 + 4t_k²))/2
    w_x^{k+1} := w^k + (1/t_{k+1}) [t_k(x^k − w^k) − (w^k − w^{k−1})]
    w_y^{k+1} := w^k + (1/t_{k+1}) [t_k(y^k − w^k) − (w^k − w^{k−1})]
    w_z^{k+1} := w^k + (1/t_{k+1}) [t_k(z^k − w^k) − (w^k − w^{k−1})]

37 The Frank-Wolfe Algorithm
Discovered in 1956, the Frank-Wolfe (also known as conditional gradient) algorithm is the earliest algorithm to solve
    minimize f(x) subject to x ∈ D
where f(x) is a convex function and D ⊆ R^p is a compact and convex set.

38 The Frank-Wolfe Algorithm
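
A minimal sketch of the classic conditional gradient iteration referred to here (an illustration I added; the linear minimization oracle lmo over D is a caller-supplied assumption), with a least-squares-over-ℓ₁-ball example:

    import numpy as np

    def frank_wolfe(grad_f, lmo, x0, iters=200):
        # s_k = argmin_{s in D} <grad f(x_k), s>;  x_{k+1} = x_k + gamma_k (s_k - x_k)
        x = x0.copy()
        for k in range(iters):
            s = lmo(grad_f(x))
            gamma = 2.0 / (k + 2.0)              # standard open-loop step size
            x = x + gamma * (s - x)
        return x

    # example: least squares over the l1 ball of radius beta
    rng = np.random.default_rng(0)
    D, b, beta = rng.standard_normal((30, 60)), rng.standard_normal(30), 5.0
    def lmo_l1(g):
        i = np.argmax(np.abs(g))
        s = np.zeros_like(g)
        s[i] = -beta * np.sign(g[i])             # vertex of the l1 ball minimizing <g, s>
        return s
    x_hat = frank_wolfe(lambda x: D.T @ (D @ x - b), lmo_l1, np.zeros(60))
    print("||x_hat||_1 =", np.abs(x_hat).sum())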

39 Application: Signal Processing
Recover a sparse signal x from noisy measurements b.
Convex relaxation: exact recovery with high probability (Candes, Romberg and Tao, 2006; Donoho, 2006).
Consider min ‖Ax − b‖² s.t. ‖x‖₁ ≤ 1.
Frank-Wolfe (may select the same vertex at each step) ⟷ Matching Pursuit
Fully corrective Frank-Wolfe ⟷ Orthogonal Matching Pursuit (Tropp & Gilbert, 2007)

40 Application: Robust and Stable Principal Component Pursuit (RPCP and SPCP)
    M = L_0 + S_0 + N_0,   L_0 low-rank, S_0 sparse, N_0 small, dense noise
Given M, approximately and efficiently recover L_0 and S_0.
Convex approach:
    SPCP: min_{L,S} ‖L‖_* + λ‖S‖₁  s.t. ‖L + S − M‖_F ≤ δ
    RPCP: min_{L,S} ‖L‖_* + λ‖S‖₁  s.t. L + S = M

41 Algorithms for RPCP and SPCP
Many first-order methods have been developed.
Most exploit the closed-form expression for the proximal operator of the nuclear norm, i.e. matrix shrinkage
    min_L (1/2)‖L − Z‖² + λ‖L‖_*
using a full or partial SVD, thus limiting their applicability to large-scale problems.
They also use the closed-form expression for the proximal operator of the ℓ₁-norm, i.e. vector shrinkage, to compute S.

42 Frank-Wolfe for Norm-Constrained SPCP
Solve
    min_{L,S} (1/2)‖P_Ω(L + S − M)‖²_F  s.t. ‖L‖_* ≤ β₁, ‖S‖₁ ≤ β₂
Frank-Wolfe algorithm for SPCP: [algorithm shown on slide]
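
For this constraint set the linear minimization oracle splits over (L, S): over the nuclear-norm ball it returns a rank-one matrix built from the top singular pair of the gradient, and over the ℓ₁ ball a single-entry matrix. A sketch of both oracles and one Frank-Wolfe step (an illustration I added; a full SVD is used for clarity, while a large-scale code would compute only the leading singular pair):

    import numpy as np

    def lmo_nuclear_ball(G, beta1):
        # argmin_{||L||_* <= beta1} <G, L> = -beta1 * u1 v1^T, (u1, v1) top singular pair of G
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        return -beta1 * np.outer(U[:, 0], Vt[0, :])

    def lmo_l1_ball(G, beta2):
        # argmin_{||S||_1 <= beta2} <G, S>: put all the mass on the largest-magnitude entry of G
        i, j = np.unravel_index(np.argmax(np.abs(G)), G.shape)
        S = np.zeros_like(G)
        S[i, j] = -beta2 * np.sign(G[i, j])
        return S

    # one Frank-Wolfe step for the SPCP objective above
    rng = np.random.default_rng(0)
    M = rng.standard_normal((40, 40))
    Omega = rng.random((40, 40)) < 0.8
    L = S = np.zeros_like(M)
    G = np.where(Omega, L + S - M, 0.0)          # gradient w.r.t. both L and S
    gamma = 2.0 / (0 + 2.0)
    L = L + gamma * (lmo_nuclear_ball(G, 10.0) - L)
    S = S + gamma * (lmo_l1_ball(G, 5.0) - S)
    print("rank(L) =", np.linalg.matrix_rank(L), " nnz(S) =", np.count_nonzero(S))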

43 Inefficiency of the FW algorithm
[Figure: slow convergence on synthetic data]
Inefficient in updating S:
    S^{k+1} = (k/(k+2)) S^k ± (2β₂/(k+2)) e_i^k (e_j^k)ᵀ   ⟹   ‖S^{k+1}‖₀ ≤ ‖S^k‖₀ + 1

44 Frank-Wolfe/Prox Gradient (FW-P) Algorithm Key idea: Add a prox gradient step to update S after each F-W step

45 FW-P Algorithm for SPCP
Solve
    min_{L,S} (1/2)‖P_Ω[L + S − M]‖²_F + λ₁‖L‖_* + λ₂‖S‖₁
Domain unbounded ⟹ use an epigraph formulation!

46 FW-P Algorithm for SPCP
[Figure: convergence on synthetic data (Red: F-W, Blue: UFA)]
Theorem (Mu, Wright, G. 14). For {(L^k, S^k)} produced by the FW-P method, we have
    f(L^k, S^k) − f(L*, S*) ≤ 16(β₁² + β₂²)/(k + 2)

47 FW-P Algorithm for SPCP Comparison with other algorithms

48 FW-P for Matrix SPCP
Background and foreground extraction from greyscale surveillance videos M; 96 seconds using a laptop!

49 FW-P for Tensor SPCP
Convex program:
    min_{L,S} (1/2)‖P_Ω[X − L − S]‖²_F + λ₁‖L‖_* + λ₂‖S‖₁

50 FW-P for Tensor SPCP
Background segmentation for color videos: background modelling (50% missing entries).
Data size: 18.4M entries; running time: 34 secs.
