
1  Alternating Direction Method of Multipliers
Ryan Tibshirani
Convex Optimization

2  Last time: dual ascent

Consider the problem
    min_x f(x)  subject to Ax = b
where f is strictly convex and closed. Denote the Lagrangian
    L(x, u) = f(x) + u^T (Ax - b)
Dual gradient ascent repeats, for k = 1, 2, 3, ...
    x^{(k)} = argmin_x L(x, u^{(k-1)})
    u^{(k)} = u^{(k-1)} + t_k (Ax^{(k)} - b)
Good: the x update decomposes when f does.
Bad: requires stringent assumptions (strong convexity of f) to ensure convergence

3  Augmented Lagrangian method

Also called the method of multipliers: considers the modified problem, for a parameter ρ > 0,
    min_x f(x) + (ρ/2) ||Ax - b||_2^2  subject to Ax = b
uses the modified Lagrangian
    L_ρ(x, u) = f(x) + u^T (Ax - b) + (ρ/2) ||Ax - b||_2^2
and repeats, for k = 1, 2, 3, ...
    x^{(k)} = argmin_x L_ρ(x, u^{(k-1)})
    u^{(k)} = u^{(k-1)} + ρ (Ax^{(k)} - b)
Good: better convergence properties.
Bad: we lose decomposability

4  Alternating direction method of multipliers

Alternating direction method of multipliers, or ADMM: combines the best of both methods. Consider a problem of the form:
    min_{x,z} f(x) + g(z)  subject to Ax + Bz = c
We define the augmented Lagrangian, for a parameter ρ > 0,
    L_ρ(x, z, u) = f(x) + g(z) + u^T (Ax + Bz - c) + (ρ/2) ||Ax + Bz - c||_2^2
We repeat, for k = 1, 2, 3, ...
    x^{(k)} = argmin_x L_ρ(x, z^{(k-1)}, u^{(k-1)})
    z^{(k)} = argmin_z L_ρ(x^{(k)}, z, u^{(k-1)})
    u^{(k)} = u^{(k-1)} + ρ (Ax^{(k)} + Bz^{(k)} - c)

5  Convergence guarantees

Under modest assumptions on f, g (these do not require A, B to be full rank), the ADMM iterates satisfy, for any ρ > 0:
- Residual convergence: r^{(k)} = Ax^{(k)} + Bz^{(k)} - c → 0 as k → ∞, i.e., primal iterates approach feasibility
- Objective convergence: f(x^{(k)}) + g(z^{(k)}) → f* + g*, where f* + g* is the optimal primal objective value
- Dual convergence: u^{(k)} → u*, where u* is a dual solution
For details, see Boyd et al. (2010). Note that we do not generically get primal convergence, but this is true under more assumptions.
Convergence rate: roughly, ADMM behaves like a first-order method. Theory is still being developed; see, e.g., Hong and Luo (2012), Deng and Yin (2012), Iutzeler et al. (2014), Nishihara et al. (2015)

6  Scaled form ADMM

Scaled form: denote w = u/ρ, so the augmented Lagrangian becomes
    L_ρ(x, z, w) = f(x) + g(z) + (ρ/2) ||Ax + Bz - c + w||_2^2 - (ρ/2) ||w||_2^2
and the ADMM updates become
    x^{(k)} = argmin_x f(x) + (ρ/2) ||Ax + Bz^{(k-1)} - c + w^{(k-1)}||_2^2
    z^{(k)} = argmin_z g(z) + (ρ/2) ||Ax^{(k)} + Bz - c + w^{(k-1)}||_2^2
    w^{(k)} = w^{(k-1)} + Ax^{(k)} + Bz^{(k)} - c
Note that here the kth iterate w^{(k)} is just a running sum of residuals:
    w^{(k)} = w^{(0)} + Σ_{i=1}^{k} (Ax^{(i)} + Bz^{(i)} - c)

7  Outline

Today:
- Examples, practicalities
- Consensus ADMM
- Special decompositions

8  Connection to proximal operators

Consider
    min_x f(x) + g(x),  or equivalently  min_{x,z} f(x) + g(z)  subject to x = z
ADMM steps (equivalent to Douglas-Rachford, here):
    x^{(k)} = prox_{f, 1/ρ}(z^{(k-1)} - w^{(k-1)})
    z^{(k)} = prox_{g, 1/ρ}(x^{(k)} + w^{(k-1)})
    w^{(k)} = w^{(k-1)} + x^{(k)} - z^{(k)}
where prox_{f, 1/ρ} is the proximal operator for f at parameter 1/ρ, and similarly for prox_{g, 1/ρ}.
In general, the update for a block of variables reduces to a prox update whenever the corresponding linear transformation is the identity
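To make these prox-form steps concrete, here is a minimal sketch (mine, not from the slides) that takes the two proximal operators as user-supplied callables. The example at the bottom plugs in f(x) = (1/2)||x - a||_2^2 and g(x) = λ||x||_1, whose prox maps at parameter 1/ρ have the closed forms shown in the comments.

```python
import numpy as np

def admm_prox(prox_f, prox_g, n, iters=100):
    """ADMM for min f(x) + g(x), split as f(x) + g(z) subject to x = z.

    prox_f, prox_g: callables evaluating prox_{f,1/rho} and prox_{g,1/rho}
    (the parameter rho is baked into the callables).
    """
    x = z = w = np.zeros(n)
    for _ in range(iters):
        x = prox_f(z - w)   # x-update: prox of f at z - w
        z = prox_g(x + w)   # z-update: prox of g at x + w
        w = w + x - z       # scaled dual update (running sum of residuals)
    return x, z

# Example: f(x) = 0.5||x - a||_2^2, g(x) = lam * ||x||_1
a = np.array([3.0, -0.2, 0.5, -4.0])
lam, rho = 1.0, 1.0
prox_f = lambda v: (rho * v + a) / (rho + 1.0)                        # prox of 0.5||.-a||^2 at 1/rho
prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0)  # soft-thresholding
x, z = admm_prox(prox_f, prox_g, n=4)
```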

9  Example: lasso regression

Given y ∈ R^n, X ∈ R^{n×p}, recall the lasso problem:
    min_β (1/2) ||y - Xβ||_2^2 + λ ||β||_1
We can rewrite this as:
    min_{β,α} (1/2) ||y - Xβ||_2^2 + λ ||α||_1  subject to β - α = 0
ADMM steps:
    β^{(k)} = (X^T X + ρI)^{-1} (X^T y + ρ(α^{(k-1)} - w^{(k-1)}))
    α^{(k)} = S_{λ/ρ}(β^{(k)} + w^{(k-1)})
    w^{(k)} = w^{(k-1)} + β^{(k)} - α^{(k)}

10  Notes:
- The matrix X^T X + ρI is always invertible, regardless of X
- If we compute a factorization (say Cholesky) in O(p^3) flops, then each β update takes O(p^2) flops
- The α update applies the soft-thresholding operator S_t, which recall is defined, for j = 1, ..., p, as
      [S_t(x)]_j = x_j - t   if x_j > t
                 = 0         if -t ≤ x_j ≤ t
                 = x_j + t   if x_j < -t
- ADMM steps are almost like repeated soft-thresholding of ridge regression coefficients
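A sketch of these lasso steps in NumPy (mine, not from the slides): the Cholesky factor of X^T X + ρI is computed once and reused, so each β update reduces to a pair of triangular solves.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def soft_threshold(x, t):
    """Elementwise soft-thresholding operator S_t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_lasso(X, y, lam, rho=1.0, iters=500):
    """ADMM for the lasso: min 0.5||y - X beta||_2^2 + lam ||beta||_1."""
    n, p = X.shape
    chol = cho_factor(X.T @ X + rho * np.eye(p))   # factor once, O(p^3)
    Xty = X.T @ y
    alpha = w = np.zeros(p)
    for _ in range(iters):
        beta = cho_solve(chol, Xty + rho * (alpha - w))   # ridge-like update, O(p^2)
        alpha = soft_threshold(beta + w, lam / rho)       # soft-thresholding
        w = w + beta - alpha                              # scaled dual update
    return alpha

# Usage on random data:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X[:, :5] @ rng.standard_normal(5) + rng.standard_normal(200)
beta_hat = admm_lasso(X, y, lam=5.0)
```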

11  Comparison of various algorithms for lasso regression: 100 random instances with n = 200, p = 50
[Figure: suboptimality f^{(k)} - f* versus iteration k, for coordinate descent, proximal gradient, accelerated proximal gradient, and ADMM with ρ = 50, 100, 200]

12  Practicalities

In practice, ADMM usually obtains a relatively accurate solution in a handful of iterations, but it requires a large number of iterations for a highly accurate solution (like a first-order method).
The choice of ρ can greatly influence practical convergence of ADMM:
- ρ too large: not enough emphasis on minimizing f + g
- ρ too small: not enough emphasis on feasibility
Boyd et al. (2010) give a strategy for varying ρ; it can work well in practice, but does not have convergence guarantees.
Like deriving duals, transforming a problem into one that ADMM can handle is sometimes a bit subtle, since different forms can lead to different algorithms

13  Example: group lasso regression

Given y ∈ R^n, X ∈ R^{n×p}, recall the group lasso problem:
    min_β (1/2) ||y - Xβ||_2^2 + λ Σ_{g=1}^{G} c_g ||β_g||_2
Rewrite as:
    min_{β,α} (1/2) ||y - Xβ||_2^2 + λ Σ_{g=1}^{G} c_g ||α_g||_2  subject to β - α = 0
ADMM steps:
    β^{(k)} = (X^T X + ρI)^{-1} (X^T y + ρ(α^{(k-1)} - w^{(k-1)}))
    α_g^{(k)} = R_{c_g λ/ρ}(β_g^{(k)} + w_g^{(k-1)}),  g = 1, ..., G
    w^{(k)} = w^{(k-1)} + β^{(k)} - α^{(k)}

14  Notes:
- The matrix X^T X + ρI is always invertible, regardless of X
- If we compute a factorization (say Cholesky) in O(p^3) flops, then each β update takes O(p^2) flops
- The α update applies the group soft-thresholding operator R_t, which recall is defined as
      R_t(x) = (1 - t/||x||_2)_+ x
- Similar ADMM steps follow for a sum of arbitrary norms as regularizer, provided we know the prox operator of each norm
- The ADMM algorithm can be rederived when groups have overlap (a hard problem to optimize in general!). See Boyd et al. (2010)
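A short sketch of the group-wise α update (my own illustration): R_t scales each group's subvector toward zero, and the update applies it group by group. The index list `groups` and the weights `c` are assumed inputs, giving the index set and weight c_g of each group.

```python
import numpy as np

def group_soft_threshold(x, t):
    """Group soft-thresholding R_t(x) = (1 - t/||x||_2)_+ * x."""
    nrm = np.linalg.norm(x)
    return np.maximum(1.0 - t / nrm, 0.0) * x if nrm > 0 else x

def alpha_update(beta, w, groups, c, lam, rho):
    """Apply R_{c_g * lam / rho} to each group of beta + w.

    groups: list of index arrays, one per group (assumed given).
    c: per-group weights c_g (assumed given).
    """
    v = beta + w
    alpha = np.zeros_like(v)
    for g, idx in enumerate(groups):
        alpha[idx] = group_soft_threshold(v[idx], c[g] * lam / rho)
    return alpha
```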

15  Example: sparse subspace estimation

Given S ∈ S^p (typically S ⪰ 0 is a covariance matrix), consider the sparse subspace estimation problem (Vu et al., 2013):
    max_Y tr(SY) - λ ||Y||_1  subject to Y ∈ F_k
where F_k is the Fantope of order k, namely
    F_k = {Y ∈ S^p : 0 ⪯ Y ⪯ I, tr(Y) = k}
Note that when λ = 0, the above problem is equivalent to ordinary principal component analysis (PCA).
This is an SDP and in principle solvable with interior point methods, though these can be complicated to implement and quite slow for large problem sizes

16  Rewrite as:
    min_{Y,Z} -tr(SY) + I_{F_k}(Y) + λ ||Z||_1  subject to Y = Z
ADMM steps are:
    Y^{(k)} = P_{F_k}(Z^{(k-1)} - W^{(k-1)} + S/ρ)
    Z^{(k)} = S_{λ/ρ}(Y^{(k)} + W^{(k-1)})
    W^{(k)} = W^{(k-1)} + Y^{(k)} - Z^{(k)}
Here P_{F_k} is the Fantope projection operator, computed by clipping the eigendecomposition A = UΣU^T, Σ = diag(σ_1, ..., σ_p):
    P_{F_k}(A) = UΣ_θ U^T,  Σ_θ = diag(σ_1(θ), ..., σ_p(θ))
where each σ_i(θ) = min{max{σ_i - θ, 0}, 1}, and θ is chosen so that Σ_{i=1}^p σ_i(θ) = k
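A sketch of the Fantope projection (mine, not from the slides): since the sum of clipped eigenvalues is nonincreasing in θ, a value of θ with Σ_i σ_i(θ) = k can be found by bisection.

```python
import numpy as np

def fantope_projection(A, k, tol=1e-10):
    """Project a symmetric matrix A onto F_k = {Y : 0 <= Y <= I, tr(Y) = k}."""
    sigma, U = np.linalg.eigh(A)   # eigendecomposition A = U diag(sigma) U^T

    def clipped_sum(theta):
        return np.clip(sigma - theta, 0.0, 1.0).sum()

    # clipped_sum is nonincreasing in theta; bracket the root and bisect.
    lo, hi = sigma.min() - 1.0, sigma.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if clipped_sum(mid) > k:
            lo = mid
        else:
            hi = mid
    sigma_theta = np.clip(sigma - 0.5 * (lo + hi), 0.0, 1.0)
    return (U * sigma_theta) @ U.T   # U diag(sigma_theta) U^T
```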

17  Example: sparse + low rank decomposition

Given M ∈ R^{n×m}, consider the sparse plus low rank decomposition problem (Candes et al., 2009):
    min_{L,S} ||L||_tr + λ ||S||_1  subject to L + S = M
ADMM steps:
    L^{(k)} = S^{tr}_{1/ρ}(M - S^{(k-1)} + W^{(k-1)})
    S^{(k)} = S^{l1}_{λ/ρ}(M - L^{(k)} + W^{(k-1)})
    W^{(k)} = W^{(k-1)} + M - L^{(k)} - S^{(k)}
where, to distinguish them, we use S^{tr}_{1/ρ} for matrix soft-thresholding and S^{l1}_{λ/ρ} for elementwise soft-thresholding
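An illustrative sketch (assumptions mine): matrix soft-thresholding shrinks the singular values, elementwise soft-thresholding shrinks the entries, and the ADMM loop follows the three steps above.

```python
import numpy as np

def soft_threshold(X, t):
    """Elementwise soft-thresholding S^l1_t."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def matrix_soft_threshold(X, t):
    """Matrix soft-thresholding S^tr_t: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def admm_sparse_low_rank(M, lam, rho=1.0, iters=200):
    """ADMM for min ||L||_tr + lam ||S||_1 subject to L + S = M."""
    S = np.zeros_like(M)
    W = np.zeros_like(M)
    for _ in range(iters):
        L = matrix_soft_threshold(M - S + W, 1.0 / rho)
        S = soft_threshold(M - L + W, lam / rho)
        W = W + M - L - S
    return L, S
```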

18  Example from Candes et al. (2009):
[Figure: (a) original frames, (b) low-rank L-hat, (c) sparse S-hat]

19  Consensus ADMM

Consider a problem of the form:
    min_x Σ_{i=1}^{B} f_i(x)
The consensus ADMM approach begins by reparametrizing:
    min_{x_1, ..., x_B, x} Σ_{i=1}^{B} f_i(x_i)  subject to x_i = x, i = 1, ..., B
This yields the decomposable ADMM steps:
    x_i^{(k)} = argmin_{x_i} f_i(x_i) + (ρ/2) ||x_i - x^{(k-1)} + w_i^{(k-1)}||_2^2,  i = 1, ..., B
    x^{(k)} = (1/B) Σ_{i=1}^{B} (x_i^{(k)} + w_i^{(k-1)})
    w_i^{(k)} = w_i^{(k-1)} + x_i^{(k)} - x^{(k)},  i = 1, ..., B

20  Write x̄ = (1/B) Σ_{i=1}^{B} x_i, and similarly for other variables. It is not hard to see that w̄^{(k)} = 0 for all iterations k ≥ 1.
Hence the ADMM steps can be simplified, by taking x^{(k)} = x̄^{(k)}:
    x_i^{(k)} = argmin_{x_i} f_i(x_i) + (ρ/2) ||x_i - x̄^{(k-1)} + w_i^{(k-1)}||_2^2,  i = 1, ..., B
    w_i^{(k)} = w_i^{(k-1)} + x_i^{(k)} - x̄^{(k)},  i = 1, ..., B
To reiterate, the x_i, i = 1, ..., B updates here are done in parallel.
Intuition:
- Try to minimize each f_i(x_i), using (squared) l2 regularization to pull each x_i towards the average x̄
- If a variable x_i is bigger than the average, then w_i is increased
- So the regularization in the next step pulls x_i even closer
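A minimal sketch of the simplified consensus iteration (my own, with ρ absorbed into the local solvers): each `local_min[i](v)` is assumed to return argmin_{x_i} f_i(x_i) + (ρ/2)||x_i - v||_2^2; in a real implementation these B updates would run in parallel.

```python
import numpy as np

def consensus_admm(local_min, n, B, iters=100):
    """Simplified consensus ADMM.

    local_min: list of B callables (assumed given); local_min[i](v) returns
               argmin_{x_i} f_i(x_i) + (rho/2)||x_i - v||_2^2.
    """
    x = np.zeros((B, n))
    w = np.zeros((B, n))
    xbar = np.zeros(n)
    for _ in range(iters):
        for i in range(B):                 # independent local updates (parallelizable)
            x[i] = local_min[i](xbar - w[i])
        xbar = x.mean(axis=0)              # consensus (average) step
        w = w + x - xbar                   # per-block scaled dual updates
    return xbar
```

For instance, with quadratic losses f_i(x) = (1/2)||A_i x - b_i||_2^2, each local_min[i](v) is the ridge-type solve (A_i^T A_i + ρI)^{-1}(A_i^T b_i + ρv).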

21  General consensus ADMM

Consider a problem of the form:
    min_x Σ_{i=1}^{B} f_i(a_i^T x + b_i) + g(x)
For consensus ADMM, we again reparametrize:
    min_{x_1, ..., x_B, x} Σ_{i=1}^{B} f_i(a_i^T x_i + b_i) + g(x)  subject to x_i = x, i = 1, ..., B
This yields the decomposable ADMM updates:
    x_i^{(k)} = argmin_{x_i} f_i(a_i^T x_i + b_i) + (ρ/2) ||x_i - x^{(k-1)} + w_i^{(k-1)}||_2^2,  i = 1, ..., B
    x^{(k)} = argmin_x (Bρ/2) ||x - x̄^{(k)} - w̄^{(k-1)}||_2^2 + g(x)
    w_i^{(k)} = w_i^{(k-1)} + x_i^{(k)} - x^{(k)},  i = 1, ..., B

22  Notes:
- It is no longer true that w̄^{(k)} = 0 at a general iteration k, so the ADMM steps do not simplify as before
- To reiterate, the x_i, i = 1, ..., B updates are done in parallel
- Each x_i update can be thought of as a loss minimization on part of the data, with l2 regularization
- The x update is a proximal operation in the regularizer g
- The w update drives the individual variables into consensus
- A different initial reparametrization will give rise to a different ADMM algorithm
See Boyd et al. (2010), Parikh and Boyd (2013) for more details on consensus ADMM, strategies for splitting up into subproblems, and implementation tips

23  Special decompositions

ADMM can exhibit much faster convergence than usual when we parametrize the subproblems in a special way.
ADMM updates relate closely to block coordinate descent, in which we optimize a criterion in an alternating fashion across blocks of variables.
With this in mind, we get the fastest convergence when minimizing over blocks of variables leads to updates in nearly orthogonal directions.
This suggests we should design the ADMM form (auxiliary constraints) so that the primal updates de-correlate as much as possible.
This is done in, e.g., Ramdas and Tibshirani (2014), Wytock et al. (2014), Barbero and Sra (2014)

24  Example: 2d fused lasso

Given an image Y ∈ R^{d×d}, equivalently written as y ∈ R^n, recall the 2d fused lasso or 2d total variation denoising problem:
    min_Θ (1/2) ||Y - Θ||_F^2 + λ Σ_{i,j} (|Θ_{i,j} - Θ_{i+1,j}| + |Θ_{i,j} - Θ_{i,j+1}|)
or equivalently
    min_θ (1/2) ||y - θ||_2^2 + λ ||Dθ||_1
Here D ∈ R^{m×n} is a 2d difference operator giving the appropriate differences (across horizontally and vertically adjacent positions)
[Figure 5.2: variables are black dots; the partitions P and Q are in orange and cyan]

25  First way to rewrite:
    min_{θ,z} (1/2) ||y - θ||_2^2 + λ ||z||_1  subject to z = Dθ
Leads to ADMM steps:
    θ^{(k)} = (I + ρD^T D)^{-1} (y + ρD^T (z^{(k-1)} + w^{(k-1)}))
    z^{(k)} = S_{λ/ρ}(Dθ^{(k)} - w^{(k-1)})
    w^{(k)} = w^{(k-1)} + z^{(k)} - Dθ^{(k)}
Notes:
- The θ update solves a linear system in I + ρL, with L = D^T D the graph Laplacian matrix of the 2d grid, so this can be done efficiently, in roughly O(n) operations
- The z update applies the soft-thresholding operator S_t
- Hence one entire ADMM cycle uses roughly O(n) operations
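A sketch of this first parametrization (mine, not from the slides), using SciPy sparse matrices: `grid_difference_operator` builds D for a d x d image, and the θ update reuses a sparse factorization of I + ρD^T D. A generic sparse LU is used here for simplicity; solvers that exploit the grid Laplacian structure would be closer to the O(n) claim.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def grid_difference_operator(d):
    """2d difference operator D for a d x d grid (horizontal + vertical differences)."""
    D1 = sp.diags([-np.ones(d - 1), np.ones(d - 1)], [0, 1], shape=(d - 1, d))  # 1d differences
    I = sp.eye(d)
    return sp.vstack([sp.kron(I, D1), sp.kron(D1, I)]).tocsc()

def admm_2d_fused_lasso(y, d, lam, rho=1.0, iters=200):
    """ADMM for min 0.5||y - theta||_2^2 + lam ||D theta||_1; y is the flattened d x d image."""
    D = grid_difference_operator(d)
    n, m = d * d, D.shape[0]
    lu = splu((sp.eye(n) + rho * (D.T @ D)).tocsc())   # factor I + rho D^T D once
    z = np.zeros(m)
    w = np.zeros(m)
    for _ in range(iters):
        theta = lu.solve(y + rho * (D.T @ (z + w)))
        Dtheta = D @ theta
        z = np.sign(Dtheta - w) * np.maximum(np.abs(Dtheta - w) - lam / rho, 0.0)
        w = w + z - Dtheta
    return theta.reshape(d, d)
```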

26  Second way to rewrite:
    min_{H,V} (1/2) ||Y - H||_F^2 + λ Σ_{i,j} (|H_{i,j} - H_{i+1,j}| + |V_{i,j} - V_{i,j+1}|)  subject to H = V
Leads to ADMM steps:
    H^{(k)}_{.,j} = FL^{1d}_{λ/(1+ρ)} ((Y_{.,j} + ρ(V^{(k-1)}_{.,j} - W^{(k-1)}_{.,j})) / (1 + ρ)),  j = 1, ..., d
    V^{(k)}_{i,.} = FL^{1d}_{λ/ρ} (H^{(k)}_{i,.} + W^{(k-1)}_{i,.}),  i = 1, ..., d
    W^{(k)} = W^{(k-1)} + H^{(k)} - V^{(k)}
Notes:
- Both the H and V updates solve a (sequence of) 1d fused lassos, where we write
      FL^{1d}_τ(a) = argmin_x (1/2) ||a - x||_2^2 + τ Σ_{i=1}^{d-1} |x_i - x_{i+1}|

27  Critical: each 1d fused lasso solution can be computed exactly in O(d) operations with specialized algorithms (e.g., Johnson, 2013; Davies and Kovac, 2001)
Hence one entire ADMM cycle again uses O(n) operations
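A structural sketch of the H/V splitting (my own): `fl1d(a, tau)` is an assumed, user-supplied exact 1d fused lasso solver (e.g., one of the O(d) algorithms cited above); only the ADMM bookkeeping is shown.

```python
import numpy as np

def admm_2d_fused_lasso_split(Y, lam, fl1d, rho=1.0, iters=100):
    """ADMM for the H/V splitting of the 2d fused lasso.

    fl1d(a, tau): assumed exact solver of
        argmin_x 0.5 ||a - x||_2^2 + tau * sum_i |x_i - x_{i+1}|.
    """
    d = Y.shape[0]
    H = np.zeros_like(Y)
    V = np.zeros_like(Y)
    W = np.zeros_like(Y)
    for _ in range(iters):
        for j in range(d):   # column-wise 1d fused lassos (parallelizable)
            H[:, j] = fl1d((Y[:, j] + rho * (V[:, j] - W[:, j])) / (1 + rho), lam / (1 + rho))
        for i in range(d):   # row-wise 1d fused lassos (parallelizable)
            V[i, :] = fl1d(H[i, :] + W[i, :], lam / rho)
        W = W + H - V
    return V
```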

28  Comparison of 2d fused lasso algorithms, on an image with n = 60,000 pixels
[Figure: data and exact solution]

29  Two ADMM algorithms, (say) standard and specialized ADMM:
[Figure: suboptimality f^{(k)} - f* versus iteration k, for standard and specialized ADMM]

30-33  ADMM iterates visualized after k = 10, 30, 50, 100 iterations:
[Figures: standard ADMM vs. specialized ADMM, shown after 10, 30, 50, and 100 iterations]

34  References

- A. Barbero and S. Sra (2014), "Modular proximal optimization for multidimensional total-variation regularization"
- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein (2010), "Distributed optimization and statistical learning via the alternating direction method of multipliers"
- E. Candes, X. Li, Y. Ma, and J. Wright (2009), "Robust principal component analysis?"
- N. Parikh and S. Boyd (2013), "Proximal algorithms"
- A. Ramdas and R. Tibshirani (2014), "Fast and flexible ADMM algorithms for trend filtering"
- V. Vu, J. Cho, J. Lei, and K. Rohe (2013), "Fantope projection and selection: a near-optimal convex relaxation of sparse PCA"
- M. Wytock, S. Sra, and Z. Kolter (2014), "Fast Newton methods for the group fused lasso"
