Low Complexity Regularization

1 Low Complexity Regularization

2 Low-dimensional signal models. Information level: pixels vs. large wavelet coefficients (blue = 0). Examples: sparse signals, low-rank matrices, nonlinear models.

3 Sparse representations. Sparse signal: only K out of N coordinates are nonzero. Sparse representation: the transform coefficients are sparse. A fundamental impact:

4

5 Recommendation systems

6 Recommendation systems Machine learning competition with a $1 million prize

7 Background extraction

8 Basis pursuit
$\min_x \{\|x\|_1 : Ax = b\}$
Find the least $\ell_1$-norm point on the affine plane $\{x : Ax = b\}$; this tends to return a sparse point (sometimes the sparsest). Geometrically, the $\ell_1$ ball touches the affine plane at a sparse point.

9 Basis pursuit denoising, LASSO
$\min_x \{\|Ax - b\|_2 : \|x\|_1 \le \tau\}$, (1a)
$\min_x \|x\|_1 + \frac{\mu}{2}\|Ax - b\|_2^2$, (1b)
$\min_x \{\|x\|_1 : \|Ax - b\|_2 \le \sigma\}$. (1c)
All three models allow $Ax \ne b$.

10 Basis pursuit denoising, LASSO
$\min_x \{\|Ax - b\|_2 : \|x\|_1 \le \tau\}$, (2a)
$\min_x \|x\|_1 + \frac{\mu}{2}\|Ax - b\|_2^2$, (2b)
$\min_x \{\|x\|_1 : \|Ax - b\|_2 \le \sigma\}$. (2c)
The $\ell_2$ norm is the most common error measure, but it can be generalized to a loss function $L$.
(2a) seeks a least-squares solution with bounded sparsity.
(2b) is known as the LASSO (least absolute shrinkage and selection operator); it seeks a balance between sparsity and data fitting.
(2c) is referred to as BPDN (basis pursuit denoising); it seeks a sparse solution in the tube-like set $\{x : \|Ax - b\|_2 \le \sigma\}$.
The three models are equivalent (see later slides). In terms of regression, they select a (sparse) set of features (i.e., columns of $A$) to linearly express the observation $b$.
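As a concrete, hedged illustration, the penalized form (2b) can be prototyped with the cvxpy modeling package; the problem sizes, sparsity level, and value of $\mu$ below are illustrative assumptions, not part of the slides:

```python
# Minimal sketch: the penalized LASSO form (2b) solved with cvxpy on synthetic data.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 40, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(m)

mu = 10.0  # weight on the data-fidelity term, as in (2b)
x = cp.Variable(n)
objective = cp.Minimize(cp.norm1(x) + (mu / 2) * cp.sum_squares(A @ x - b))
cp.Problem(objective).solve()
print("nonzeros in solution:", np.sum(np.abs(x.value) > 1e-4))
```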

11 Sparse under basis Ψ / $\ell_1$-synthesis model
$\min_s \{\|s\|_1 : A\Psi s = b\}$ (3)
The signal $x$ is sparsely synthesized by atoms from $\Psi$, so the vector $s$ is sparse. $\Psi$ is referred to as the dictionary. Commonly used dictionaries include both analytic and trained ones. Analytic examples: identity, DCT, wavelets, curvelets, Gabor, etc., and their combinations; they have analytic properties and are often easy to apply (for example, multiplying a vector takes $O(n\log n)$ instead of $O(n^2)$). $\Psi$ can also be learned numerically from training data or partial signals. Dictionaries can be orthogonal, frames, or general.

12 Sparse under basis Ψ / $\ell_1$-synthesis model
If $\Psi$ is orthogonal, problem (3) is equivalent to
$\min_x \{\|\Psi^* x\|_1 : Ax = b\}$ (4)
by the change of variable $x = \Psi s$, equivalently $s = \Psi^* x$.
Related models for noise and approximate sparsity:
$\min_x \{\|Ax - b\|_2 : \|\Psi^* x\|_1 \le \tau\}$,
$\min_x \|\Psi^* x\|_1 + \frac{\mu}{2}\|Ax - b\|_2^2$,
$\min_x \{\|\Psi^* x\|_1 : \|Ax - b\|_2 \le \sigma\}$.

13 Sparse after transform / $\ell_1$-analysis model
$\min_x \{\|\Psi^* x\|_1 : Ax = b\}$ (5)
The signal $x$ becomes sparse under the transform $\Psi$ (which may not be orthogonal).
Examples of $\Psi$: DCT, wavelets, curvelets, ridgelets, ..., tight frames, Gabor, ..., (weighted) total variation.
When $\Psi$ is not orthogonal, the analysis is more difficult.

14 Joint/group sparsity
Joint sparse recovery model:
$\min_X \{\|X\|_{2,1} : \mathcal{A}(X) = b\}$ (6)
where
$\|X\|_{2,1} := \sum_{i=1}^m \|[x_{i1}\ x_{i2}\ \cdots\ x_{in}]\|_2$.
The $\ell_2$ norm is applied to each row of $X$. The $\ell_{2,1}$-norm ball has sharp boundaries across different rows, which tend to be touched by $\{X : \mathcal{A}(X) = b\}$, so the solution tends to be row-sparse. One can also use $\|X\|_{p,q}$ for $1 < p \le \infty$, which affects the magnitudes of entries on the same row. Complex-valued signals are a special case.
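As a small illustration of the row-wise $\ell_{2,1}$ norm in (6), here is a numpy sketch (the matrix is a made-up example, not from the slides):

```python
# Row-wise l2,1 norm: l2 norm of each row, then summed.
import numpy as np

X = np.array([[3.0, 4.0, 0.0],   # nonzero row, l2 norm 5
              [0.0, 0.0, 0.0],   # zero row contributes nothing
              [1.0, 2.0, 2.0]])  # nonzero row, l2 norm 3

row_norms = np.linalg.norm(X, axis=1)   # l2 norm applied to each row
l21 = row_norms.sum()                   # the l2,1 norm
print(row_norms, l21)                   # [5. 0. 3.] 8.0
```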

15 Joint/group sparsity
Decompose $\{1,\dots,n\} = G_1 \cup G_2 \cup \dots \cup G_S$.
Non-overlapping groups: $G_i \cap G_j = \emptyset$ for $i \ne j$; otherwise, groups may overlap (modeling many interesting structures).
Group-sparse recovery model:
$\min_x \{\|x\|_{G,2,1} : Ax = b\}$ (7)
where
$\|x\|_{G,2,1} = \sum_{s=1}^S w_s \|x_{G_s}\|_2$.

16 Auxiliary constraints
Auxiliary constraints introduce additional structure of the underlying signal into its recovery, which can significantly improve recovery quality:
nonnegativity: $x \ge 0$;
bound (box) constraints: $l \le x \le u$;
general inequalities: $Qx \le q$.
They can be very effective in practice. They also generate corners.

17 Nuclear-norm minimization
$\min_X \{\|X\|_* : \mathcal{A}(X) = b\}$ (8)
We can also model
$\min_X \{\|X\|_* : \|\mathcal{A}(X) - b\|_F \le \sigma\}$,
$\min_X \{\|\mathcal{A}(X) - b\|_F : \|X\|_* \le \tau\}$,
$\min_X\ \mu\|X\|_* + \frac{1}{2}\|\mathcal{A}(X) - b\|_F^2$.

18 Questions
1. Can we trust these models to return low-complexity solutions?
2. When will the solution be unique?
3. Will the solution be robust to noise?
4. How to compute it?
5. How to quantify uncertainty?
6. How to understand low complexity in a unified framework? Including: sparse, low-rank, sparse + low-rank, sign vectors, vectors from a list, permutation matrices, matrices constrained by eigenvalues, orthogonal matrices, measures.

19 Linear representation of low-dimensional models. A key notion in sparse representation: synthesis of the signal using a few vectors. A slightly different mathematical formalism for generalization. Synthesis model: a linear (positive) combination of elements from an atomic set.

20 Linear representation of low-dimensional models. A key notion in sparse representation: synthesis of the signal using a few vectors. Sparse representations via the atomic formulation. Example:

21 Linear representation of low-dimensional models Basic definitions on low-dimensional atomic representations

22 Linear representation of low-dimensional models Basic definitions on low-dimensional atomic representations : convex hull of atoms in A

23 Linear representation of low-dimensional models Basic definitions on low-dimensional atomic representations : convex hull of atoms in A atomic ball

24 Linear representation of low-dimensional models Basic definitions on low-dimensional atomic representations : convex hull of atoms in A : atomic norm* *: requires A to be centrally symmetric

25 Linear representation of low-dimensional models Basic definitions on low-dimensional atomic representations : convex hull of atoms in A : atomic norm* *: requires A to be centrally symmetric

26 Linear representation of low-dimensional models Basic definitions on low-dimensional atomic representations : convex hull of atoms in A : atomic norm* Alternative: *: requires A to be centrally symmetric

27 Linear representation of low-dimensional models. Examples with easy forms: sparse vectors, low-rank matrices*, binary vectors. (*symmetric matrices)

28 Linear representation of low-dimensional models
Examples with easy forms: sparse vectors, low-rank matrices, binary vectors.
Examples with not-so-easy forms:
A: infinite set of unit-norm rank-one tensors;
A: finite (but large) set of permutation matrices;
A: infinite set of orthogonal matrices;
A: infinite set of matrices constrained by eigenvalues;
A: infinite set of measures;
A: finite (but large) set of cut matrices.

29 A Geometrical Approach: let's turn to the blackboard!

30 A geometric perspective Other key concepts:

31 A geometric perspective Other key concepts:

32 A geometric perspective Other key concepts: Tangent cone is the set of descent directions where you do not increase the atomic norm.

33 A geometric perspective Other key concepts: Tangent cone is the set of descent directions where you do not increase the atomic norm.

34 A geometric perspective

35 A geometric perspective

36 A geometric perspective Consider the criteria:

37 A geometric perspective Consider the criteria:

38 A geometric perspective Consider the criteria:

39 A geometric perspective Consider the criteria:

40 A geometric perspective Key observation:

41 A geometric perspective How about noise?

42 A geometric perspective How about noise? Stability assumption:

43 A geometric perspective How about noise? Stability assumption:

44 A geometric perspective How about noise? Stability assumption: want epsilon large to minimize overlap between and For this 2D example: Matlab notation

45 A geometric perspective How about noise? Stability assumption:

46 A geometric perspective Can we guarantee the following?* *without knowing

47 A geometric perspective Can we guarantee the following?* YES: with randomized measurements! Gordon's Minimum Restricted Singular Values Theorem has a probabilistic characterization [Gordon 1988]. (Figure: probabilistic vs. deterministic.) *without knowing

48 A geometric perspective Can we guarantee the following?* Gordon's Minimum Restricted Singular Values Theorem has a probabilistic characterization. Key concept: width of the tangent cone! *without knowing

49 A geometric perspective Can we guarantee the following?* Gordon's Minimum Restricted Singular Values Theorem has a probabilistic characterization. *without knowing

50 A geometric perspective Can we guarantee the following?* Gordon's Minimum Restricted Singular Values Theorem has a probabilistic characterization. *without knowing

51 A geometric perspective Can we guarantee the following?* Gordon's Minimum Restricted Singular Values Theorem has a probabilistic characterization. *without knowing

52 A geometric perspective Key observation:

53 A geometric perspective How about noise? Stability assumption:

54 A geometric perspective Can we guarantee the following?* Gordon's Minimum Restricted Singular Values Theorem has a probabilistic characterization. *without knowing

55 A geometric perspective Can we guarantee the following?* *without knowing 1-sparse and 1-random measurement

56 A geometric perspective Can we guarantee the following?* *without knowing 1-sparse and 1-random measurement

57 A geometric perspective Can we guarantee the following?* *without knowing 1-sparse and 1-random measurement

58 A geometric perspective Can we guarantee the following?* A projected 6D hypercube with 64 vertices Blessing-of-dimensionality!

59 A geometric perspective Pop-quiz: What is the probability that we can determine a 2-sparse x* with 1-random measurement?

60 A geometric perspective Pop-answer: Tangent cone is too wide! Need at least 2 measurements!

61 Take-home messages
convex polytope <> atomic norm: geometry (and algebra) of representations in high dimensions;
geometric perspective <> convex criteria: convex optimization algorithms in high dimensions;
tangent cone width <> # of randomized samples: probabilistic concentration-of-measure in high dimensions.

62

63 three and do this fast with theoretical guarantees

64 Convex optimization and proximal algorithms
$\hat{x} \in \arg\min_{x\in\mathbb{R}^N} f_1(x) + f_2(x)$
$f_1 : \mathbb{R}^N \to \mathbb{R}$ is the data-fidelity term; convex, smooth; typically $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$.
$f_2 : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R}\cup\{+\infty\}$ is a convex regularizer (maybe non-smooth, e.g. $\ell_1$); non-convex regularizers come later.
Difficulties: non-smoothness and large dimension ($N \gg 1$).

65 Convex and strictly convex sets
$S$ is convex if $x, x' \in S \Rightarrow \forall \lambda \in [0,1],\ \lambda x + (1-\lambda)x' \in S$.
$S$ is strictly convex if $x, x' \in S \Rightarrow \forall \lambda \in (0,1),\ \lambda x + (1-\lambda)x' \in \mathrm{int}(S)$.
(Figures: examples of convex, non-convex, strictly convex, and convex-but-not-strictly-convex sets.)

66 Convex and strictly convex functions
Extended real-valued function: $f : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R}\cup\{+\infty\}$.
Domain of a function: $\mathrm{dom}(f) = \{x : f(x) \ne +\infty\}$.
$f$ is a convex function if $\forall \lambda \in [0,1],\ x, x' \in \mathrm{dom}(f):\ f(\lambda x + (1-\lambda)x') \le \lambda f(x) + (1-\lambda)f(x')$.
$f$ is a strictly convex function if $\forall \lambda \in (0,1),\ x \ne x' \in \mathrm{dom}(f):\ f(\lambda x + (1-\lambda)x') < \lambda f(x) + (1-\lambda)f(x')$.
(Figures: non-convex, convex, strictly convex, and convex-but-not-strictly examples.)

67 Convexity, coercivity, and minima
$f : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R}\cup\{+\infty\}$ is coercive if $\lim_{\|x\|\to+\infty} f(x) = +\infty$.
Let $G = \arg\min_x f(x)$. If $f$ is coercive, then $G$ is a non-empty set; if $f$ is strictly convex, then $G$ has at most one element.
(Figures: coercive and strictly convex ($G = \{x^*\}$); coercive, not strictly convex; convex, not coercive ($G = \emptyset$).)

68 Euclidean projections on convex sets
Our problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_1(x) + f_2(x)$.
Consider $f_2(x) = \iota_S(x) = 0$ if $x \in S$, $+\infty$ if $x \notin S$ (convex if $S$ is convex), and $f_1(x) = \frac{1}{2}\|u - x\|_2^2$ (strictly convex).
Then $\hat{x} = \arg\min_{x\in\mathbb{R}^n} f_1(x) + f_2(x) = \arg\min_{x\in S} \|u - x\|_2^2 = P_S(u)$, the Euclidean projection.
(Figure: a convex set $S$ with $z = P_S(z)$ for $z \in S$ and $P_S(u)$ for $u \notin S$.)

69 Projected gradient algorithm
Our problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_1(x) + f_2(x)$, with $f_2(x) = \iota_S(x)$ ($S$ is a convex set) and $f_1$ some smooth function, e.g., $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$.
Projected gradient algorithm: $x_{k+1} = P_S\big(x_k - \alpha_k \nabla f_1(x_k)\big)$, where $\alpha_k$ is a step size.
If $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$, then $x_{k+1} = P_S\big(x_k - \alpha_k A^T(Ax_k - u)\big)$.
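Below is a minimal numpy sketch of this iteration for $f_1(x) = \frac{1}{2}\|Ax-u\|_2^2$ with $S$ taken as the nonnegative orthant, so that $P_S$ is a componentwise maximum with zero; the data, iteration count, and step-size choice are illustrative assumptions:

```python
# Projected gradient for min 0.5*||A x - u||^2 subject to x >= 0.
import numpy as np

def projected_gradient_nonneg(A, u, n_iter=200):
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    alpha = 1.0 / L                            # step size alpha < 2/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - u)               # gradient of f1
        x = np.maximum(x - alpha * grad, 0.0)  # projection onto {x >= 0}
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 20))
u = rng.standard_normal(30)
print(projected_gradient_nonneg(A, u)[:5])
```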

70 Detour: majorization-minimization (MM)
Problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f(x)$.
$Q(x; x_k)$ is a majorizer of $f$ at $x_k$: $Q(x; x_k) \ge f(x)$ and $Q(x_k; x_k) = f(x_k)$.
MM algorithm: $x_{k+1} = \arg\min_x Q(x; x_k)$.
Monotonicity: $f(x_{k+1}) \le Q(x_{k+1}; x_k) \le Q(x_k; x_k) = f(x_k)$.
(Figure: $f(x)$ with successive majorizers $Q(x; x_k)$, $Q(x; x_{k+1})$ and iterates $x_k, x_{k+1}, x_{k+2}$.)

71 Projected gradient from majorization-minimization
Our problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_1(x) + f_2(x)$, with $f_2(x) = \iota_S(x)$ ($S$ is a convex set) and $f_1$ has an $L$-Lipschitz gradient; e.g., $f_1(x) = \frac{1}{2}\|Ax-u\|_2^2 \Rightarrow L = \lambda_{\max}(A^TA) = \|A\|_2^2$.
A separable approximation of $f_1$ (replacing the Hessian of $f_1$ by $\frac{1}{\alpha_k}I$):
$Q(x; x_k) = f_1(x_k) + (x - x_k)^T\nabla f_1(x_k) + \frac{1}{2\alpha_k}\|x - x_k\|_2^2$.

72 Projected gradient from majorization-minimization
Our problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_1(x) + \iota_S(x)$.
Separable approximation of $f_1$: $Q(x; x_k) = f_1(x_k) + (x - x_k)^T\nabla f_1(x_k) + \frac{1}{2\alpha_k}\|x - x_k\|_2^2$.
$Q(x; x_k)$ is a majorizer of $f_1$ if $\alpha_k < 1/L$; hence $Q(x; x_k) + \iota_S(x)$ is a majorizer of $f_1(x) + \iota_S(x)$.
MM algorithm:
$x_{k+1} = \arg\min_x Q(x; x_k) + \iota_S(x) = \arg\min_x \frac{1}{2}\|x - x_k + \alpha_k\nabla f_1(x_k)\|_2^2 + \iota_S(x) = P_S\big(x_k - \alpha_k\nabla f_1(x_k)\big)$,
the projected gradient step.

73 Proximity operators
Our problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_1(x) + f_2(x)$, with $f_2$ a convex function and $f_1(x) = \frac{1}{2}\|u - x\|_2^2$ (strictly convex).
$\hat{x} = \arg\min_{x\in\mathbb{R}^n} \frac{1}{2}\|u - x\|_2^2 + f_2(x) =: \mathrm{prox}_{f_2}(u)$,
the proximity operator [Moreau 62], [Combettes 01]; it generalizes the notion of Euclidean projection.

74 Proximity operators (linear)
$\mathrm{prox}_f(u) = \arg\min_{x\in\mathbb{R}^n} \frac{1}{2}\|u - x\|_2^2 + f(x)$ (a map $\mathbb{R}^N \to \mathbb{R}^N$).
Classical cases:
Squared $\ell_2$ regularizer, $f(x) = \frac{\lambda}{2}\|x\|_2^2$: $\mathrm{prox}_f(u) = \arg\min_x \frac{1}{2}\|u - x\|_2^2 + \frac{\lambda}{2}\|x\|_2^2 = \frac{u}{1+\lambda}$.
Squared $\ell_2$ regularizer with analysis operator $D$, $f(x) = \frac{\lambda}{2}\|Dx\|_2^2$: $\mathrm{prox}_f(u) = \arg\min_x \frac{1}{2}\|u - x\|_2^2 + \frac{\lambda}{2}\|Dx\|_2^2 = (I + \lambda D^TD)^{-1}u$.
If $D$ is a circulant matrix, this costs $O(N\log N)$ using the FFT.

75 Proximity operator of the $\ell_1$ norm
$\mathrm{prox}_{\lambda\|\cdot\|_1}(u) = \arg\min_{x\in\mathbb{R}^n} \frac{1}{2}\|u - x\|_2^2 + \lambda\|x\|_1$.
Separable: solve w.r.t. each component: $\min_x \lambda|x| + 0.5(x - u)^2$.
Possible approach: write $\lambda|x| = \max_{|z|\le 1} \lambda zx$, so
$\min_x \max_{|z|\le 1} \lambda zx + 0.5(x - u)^2 = \max_{|z|\le 1} \min_x \lambda zx + 0.5(x - u)^2 = \max_{|z|\le 1} -0.5\lambda^2 z^2 + \lambda zu$ (attained at $x = u - \lambda z$),
and the maximizer is $z^* = u/\lambda$ if $|u| \le \lambda$, $z^* = 1$ if $u > \lambda$, $z^* = -1$ if $u < -\lambda$.

76 Proximity operator of the $\ell_1$ norm: soft thresholding
$\mathrm{soft}(u; \lambda) = \mathrm{sign}(u)\max\{0, |u| - \lambda\}$, i.e., $\mathrm{soft}(u; \lambda) = \mathrm{prox}_{\lambda|\cdot|}(u)$ (for vectors, $\mathrm{soft}(u; \lambda)$ is applied component-wise).
Closed-form prox also exists for $p$-th powers of $\ell_p$ norms, $\|x\|_p^p = \sum_i |[x]_i|^p$, for $p \in \{1, 4/3, 3/2, 2, 3, 4\}$ [Combettes, Wajs, 2005].
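A direct numpy transcription of the soft-thresholding formula above (applied componentwise):

```python
# soft(u, lam) = sign(u) * max(|u| - lam, 0), applied componentwise.
import numpy as np

def soft(u, lam):
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

print(soft(np.array([-3.0, -0.2, 0.5, 2.0]), 1.0))  # [-2. -0.  0.  1.]
```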

77 Dual norms, proximity operators, and projections
Dual norm: given some norm $\|\cdot\| : \mathbb{R}^N \to \mathbb{R}_+$, its dual norm is $\|x\|_* = \max_{\|z\|\le 1}\langle x, z\rangle$.
The dual norm of $\|\cdot\|_p$ is $\|\cdot\|_q$, where $\frac{1}{p} + \frac{1}{q} = 1$ (Hölder conjugates); this is a simple corollary of Hölder's inequality. Examples of Hölder conjugate pairs: $(2,2)$, $(1,+\infty)$, $(3/2, 3)$, ...
These concepts are related through: $\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x:\|x\|_*\le\lambda\}}(u)$ [Combettes, Wajs, 2005].

78 Dual norms, proximity operators, and projections
$\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x:\|x\|_*\le\lambda\}}(u)$.
This relation underlies our earlier derivation of $\mathrm{prox}_{\lambda\|\cdot\|_1}$:
$\mathrm{prox}_{\lambda\|\cdot\|_1}(u) = u - P_{\{x:\|x\|_\infty\le\lambda\}}(u)$, since the dual of $\ell_1$ is $\ell_\infty$, $\|x\|_\infty = \max_i\{|[x]_i|\}$.
It's all separable: $\mathrm{prox}_{\lambda|\cdot|}(u) = u - P_{\{x:|x|\le\lambda\}}(u) = \mathrm{soft}(u; \lambda)$.

79 Dual norms, proximity operators, and projections
$\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x:\|x\|_*\le\lambda\}}(u)$.
This relation allows deriving $\mathrm{prox}_{\lambda\|\cdot\|_\infty}$ and $\mathrm{prox}_{\lambda\|\cdot\|_2}$:
$\mathrm{prox}_{\lambda\|\cdot\|_\infty}(u) = u - P_{\{x:\|x\|_1\le\lambda\}}(u)$ (projection on the $\ell_1$ ball of radius $\lambda$, $O(n\log n)$ cost);
$\mathrm{prox}_{\lambda\|\cdot\|_2}(u) = u - P_{\{x:\|x\|_2\le\lambda\}}(u) = \frac{u}{\|u\|_2}\max\{0, \|u\|_2 - \lambda\}$ (vector soft thresholding).
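A numpy sketch of both operators: the projection onto the $\ell_1$ ball via the standard sort-based $O(n\log n)$ method, and the vector soft threshold $\mathrm{prox}_{\lambda\|\cdot\|_2}$; the test vector is an arbitrary example:

```python
import numpy as np

def project_l1_ball(u, radius):
    """Euclidean projection of u onto {x : ||x||_1 <= radius} (sort-based)."""
    if np.abs(u).sum() <= radius:
        return u.copy()
    a = np.sort(np.abs(u))[::-1]                     # sorted magnitudes, descending
    cumsum = np.cumsum(a)
    rho = np.nonzero(a * np.arange(1, len(u) + 1) > (cumsum - radius))[0][-1]
    theta = (cumsum[rho] - radius) / (rho + 1.0)     # soft-threshold level
    return np.sign(u) * np.maximum(np.abs(u) - theta, 0.0)

def prox_l2_norm(u, lam):
    """prox of lam*||.||_2: shrink the whole vector toward zero (vector soft threshold)."""
    norm = np.linalg.norm(u)
    if norm == 0.0:
        return u.copy()
    return (u / norm) * max(norm - lam, 0.0)

u = np.array([2.0, -1.0, 0.5])
print(project_l1_ball(u, 1.0))   # lies on the l1 ball of radius 1
print(prox_l2_norm(u, 1.0))      # same direction as u, shorter length
```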

80 Proximity operators of atomic norms
$\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x:\|x\|_*\le\lambda\}}(u)$.
This relation allows deriving prox operators of atomic norms: $\|x\|_\mathcal{A} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}$.
The dual of an atomic norm: $\|x\|_{\mathcal{A}*} = \max_{\|z\|_\mathcal{A}\le 1}\langle z, x\rangle = \max_{z\in\mathrm{conv}(\mathcal{A})}\langle z, x\rangle = \max\{\langle a, x\rangle;\ a\in\mathcal{A}\}$.
Hence $P_{\{x:\|x\|_{\mathcal{A}*}\le\lambda\}}(u) = \arg\min_{\langle a, x\rangle\le\lambda,\ \forall a\in\mathcal{A}} \|u - x\|_2^2$ and $\mathrm{prox}_{\lambda\|\cdot\|_\mathcal{A}}(u) = u - \arg\min_{\langle a, x\rangle\le\lambda,\ \forall a\in\mathcal{A}} \|u - x\|_2^2$.

81 Proximity operators of atomic norms: $\ell_1$
Deriving $\mathrm{prox}_{\lambda\|\cdot\|_1}$ from the atomic norm view:
$\|x\|_1 = \|x\|_\mathcal{A}$ with $\mathcal{A} = \{e_1, e_2, \dots, e_N, -e_1, \dots, -e_N\}$, $|\mathcal{A}| = 2N$.
Dual: $\|x\|_{\mathcal{A}*} = \max\{\langle a, x\rangle;\ a\in\mathcal{A}\} = \max_i\{|[x]_i|\} = \|x\|_\infty$.
$\mathrm{prox}_{\lambda\|\cdot\|_1}(u) = u - P_{\{x:\|x\|_\infty\le\lambda\}}(u) = \mathrm{soft}(u; \lambda)$.

82 Proximity operators of atomic norms: $\ell_\infty$
Deriving $\mathrm{prox}_{\lambda\|\cdot\|_\infty}$ from the atomic norm view:
$\|x\|_\infty = \|x\|_\mathcal{A}$ with $\mathcal{A} = \{-1, +1\}^N$, $|\mathcal{A}| = 2^N$.
Dual: $\|x\|_{\mathcal{A}*} = \max\{\langle a, x\rangle;\ a\in\mathcal{A}\} = \sum_{i=1}^N |[x]_i| = \|x\|_1$.
$\mathrm{prox}_{\lambda\|\cdot\|_\infty}(u) = u - P_{\{x:\|x\|_1\le\lambda\}}(u)$.

83 Proximity of atomic norms: matrix nuclear norm
Matrix nuclear norm: $\|X\|_* = \sum_i \sigma_i(X) = \sum_i \sqrt{\lambda_i(X^TX)}$.
$\|X\|_* = \|X\|_\mathcal{A}$ with $\mathcal{A} = \{Z : \mathrm{rank}(Z) = 1,\ \|Z\|_F = 1\}$, where $\mathrm{rank}(Z) = |\{\sigma_i(Z)\ne 0\}|$ and $\|Z\|_F^2 = \sum_{ij}[Z]_{ij}^2 = \sum_i \sigma_i^2(Z)$ is the Frobenius norm.
Dual: $\|X\|_{\mathcal{A}*} = \max\{\langle Z, X\rangle;\ Z\in\mathcal{A}\} = \max\{\sum_i \sigma_i(Z)\sigma_i(X) : \mathrm{rank}(Z) = 1,\ \sum_i \sigma_i^2(Z) = 1\} = \sigma_{\max}(X) = \|X\|_2$, the spectral norm.

84 Proximity of atomic norms: matrix nuclear norm
Euclidean matrix projection: $P_S(X) = \arg\min_{Z\in S}\|Z - X\|_F^2$.
Note: for any unitary matrix $U$ ($U^TU = I$, $UU^T = I$), $\|UM\|_F^2 = \mathrm{trace}(M^TU^TUM) = \mathrm{trace}(M^TM) = \|M\|_F^2$.
$\mathrm{prox}_{\lambda\|\cdot\|_*}(X) = X - P_{\{Z:\|Z\|_2\le\lambda\}}(X)$
$= U\Sigma V^T - P_{\{Z:\sigma_{\max}(Z)\le\lambda\}}(U\Sigma V^T)$, where $X = U\Sigma V^T$ and $\Sigma$ is the diagonal matrix of singular values [Lewis, Malick, 2009]
$= U\,\mathrm{diag}\big(\mathrm{diag}(\Sigma) - P_{\{x:\|x\|_\infty\le\lambda\}}(\mathrm{diag}(\Sigma))\big)V^T = U\,\mathrm{soft}(\Sigma; \lambda)\,V^T$,
singular value thresholding (SVT).
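A numpy sketch of singular value thresholding as derived above (the test matrix is an arbitrary low-rank-plus-noise example):

```python
# Singular value thresholding: prox of lam*||.||_* via soft-thresholding the singular values.
import numpy as np

def svt(X, lam):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_thresh = np.maximum(s - lam, 0.0)          # soft-threshold the singular values
    return (U * s_thresh) @ Vt                   # U diag(s_thresh) V^T

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))  # rank-3 matrix
X_noisy = X + 0.1 * rng.standard_normal((8, 8))
Y = svt(X_noisy, 1.0)
print(np.linalg.matrix_rank(Y, tol=1e-8), np.linalg.norm(X - Y))
```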

85 Proximity of atomic norms: matrix spectral norm
Matrix spectral norm: $\|X\|_2 = \sigma_{\max}(X)$.
$\|X\|_2 = \|X\|_\mathcal{A}$ with $\mathcal{A} = \{Z : Z^TZ = I\} = \{Z : \sigma_i(Z) = 1,\ \forall i\}$, the orthogonal matrices.
Dual: $\|X\|_{\mathcal{A}*} = \max\{\langle Z, X\rangle;\ Z\in\mathcal{A}\} = \max\{\sum_i \sigma_i(Z)\sigma_i(X) : \sigma_i(Z) = 1,\ \forall i\} = \sum_i \sigma_i(X) = \|X\|_*$, the nuclear norm.

86 Proximity of atomic norms: matrix spectral norm
$\mathrm{prox}_{\lambda\|\cdot\|_2}(X) = X - P_{\{Z:\|Z\|_*\le\lambda\}}(X)$
$= U\Sigma V^T - P_{\{Z:\|Z\|_*\le\lambda\}}(U\Sigma V^T)$, where $\Sigma$ is the diagonal matrix of singular values,
$= U\big(\Sigma - P_{\{Z:\sum_i\sigma_i(Z)\le\lambda\}}(\Sigma)\big)V^T = U\,\mathrm{diag}\big(\mathrm{diag}(\Sigma) - P_{\{x:\|x\|_1\le\lambda\}}(\mathrm{diag}(\Sigma))\big)V^T$,
the residual of the projection of the singular values on an $\ell_1$ ball of radius $\lambda$.

87 Proximity and atomic sets: vectors vs. matrices
Vectors:
$\ell_1$, $\|x\|_1$: prox = component-wise soft thresholding; atomic set $\mathcal{A} = \{\pm e_i\}$, $|\mathcal{A}| = 2N$.
$\ell_\infty$, $\|x\|_\infty$: prox = residual of projection on an $\ell_1$ ball; atomic set $\mathcal{A} = \{\pm 1\}^N$, $|\mathcal{A}| = 2^N$.
$\ell_2$, $\|x\|_2$: prox = vector soft thresholding; atomic set $\mathcal{A}$ = set of all unit-norm vectors, $|\mathcal{A}| = \infty$.
Matrices:
nuclear, $\|X\|_*$: prox = singular value thresholding; $\mathcal{A}$ = set of all rank-1, unit-norm matrices.
spectral, $\|X\|_2$: prox = residual of singular-value projection on an $\ell_1$ ball; $\mathcal{A}$ = set of all orthogonal matrices.
Frobenius, $\|X\|_F$: prox = matrix soft thresholding; $\mathcal{A}$ = all matrices of unit Frobenius norm.

88 Proximal algorithms
Back to the problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_1(x) + f_2(x)$, with $f_2$ a proper convex function and $f_1$ having an $L$-Lipschitz gradient; e.g., $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$ with $L = \lambda_{\max}(A^TA)$.
Separable majorizer ($\alpha_k < 1/L$): $Q(x; x_k) = f_1(x_k) + (x - x_k)^T\nabla f_1(x_k) + \frac{1}{2\alpha_k}\|x - x_k\|_2^2$.
Majorization-minimization algorithm:
$x_{k+1} = \arg\min_x Q(x; x_k) + f_2(x) = \arg\min_x \frac{1}{2}\|x - x_k + \alpha_k\nabla f_1(x_k)\|_2^2 + \alpha_k f_2(x)$
$\Rightarrow x_{k+1} = \mathrm{prox}_{\alpha_k f_2}\big(x_k - \alpha_k\nabla f_1(x_k)\big)$.

89 Proximal algorithms: convergence
Problem: $\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f(x) = f_1(x) + f_2(x)$, where $f_1$ has an $L$-Lipschitz gradient; e.g., $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$, $L = \lambda_{\max}(A^TA)$.
Iterative shrinkage/thresholding (IST) (or forward-backward): $x_{k+1} = \mathrm{prox}_{\alpha_k f_2}\big(x_k - \alpha_k\nabla f_1(x_k)\big)$.
If $\alpha_k < 1/L$, IST is a majorization-minimization algorithm, thus $f(x_{k+1}) \le f(x_k)$; since $f(x) \ge 0$, the sequence $(f(x_1), f(x_2), \dots, f(x_k), \dots)$ converges.
Attention: this does not imply convergence of $(x_1, \dots, x_k, \dots)$.

90 Proximal algorithms: convergence
$\hat{x} \in G = \arg\min_{x\in\mathbb{R}^n} f_1(x) + f_2(x)$.
IST algorithm: $x_{k+1} = \mathrm{prox}_{\alpha_k f_2}\big(x_k - \alpha_k\nabla f_1(x_k)\big)$.
If $0 < \alpha_k < 2/L$, then $(x_1, x_2, \dots, x_k, \dots)$ converges to a point in $G$.
Inexact version (with errors): $x_{k+1} = \mathrm{prox}_{\alpha_k f_2}\big(x_k - (\alpha_k\nabla f_1(x_k) + b_k)\big) + a_k$; convergence is still guaranteed if $\sum_{k=1}^\infty \|a_k\| < \infty$ and $\sum_{k=1}^\infty \|b_k\| < \infty$.
Results and proofs in [Combettes and Wajs, 2005].
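A compact numpy sketch of the IST iteration for the $\ell_1$-regularized least-squares problem; the synthetic data, fixed step size, and iteration count are illustrative assumptions:

```python
# IST / ISTA for min_x 0.5*||A x - u||^2 + lam*||x||_1.
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ist(A, u, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
    alpha = 1.0 / L                    # step size alpha < 2/L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft(x - alpha * A.T @ (A @ x - u), alpha * lam)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 200))
x_true = np.zeros(200); x_true[:5] = 1.0
u = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = ist(A, u, lam=0.1)
print(np.count_nonzero(np.abs(x_hat) > 1e-3))
```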

91 Proximal algorithms: convergence
Convergence of function values: $(f(x_1), \dots, f(x_k), \dots) \to f(\hat{x})$.
Convergence of iterates: $(x_1, x_2, \dots, x_k, \dots) \to \hat{x}$.
Convergence rates for function values [Beck, Teboulle, 2009]: IST achieves $f(x_k) - f(\hat{x}) = O(1/k)$.
Convergence rates for the iterates require further assumptions on $f$.

92 Proximal algorithms: convergence of iterates
$\hat{x} = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + f_2(x)$.
With $L = \lambda_{\max}(A^TA)$ and $l = \lambda_{\min}(A^TA) > 0$ (condition number $L/l$), $G = \{\hat{x}\}$ (unique minimizer).
Under-relaxed ($\gamma < 1$) or over-relaxed ($\gamma > 1$) IST: $x_{k+1} = (1-\gamma)x_k + \gamma\,\mathrm{prox}_{f_2}\big(x_k - A^T(Ax_k - u)\big)$.
With the optimal choice $\gamma = \frac{2}{L+l}$: Q-linear convergence. Small $l \Rightarrow$ contraction factor $\rho \lesssim 1 \Rightarrow$ slow convergence! [F, Bioucas-Dias, 2007]

93 Proximal algorithms: convergence of iterates
With $\hat{x} \in G = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$ and $L = \lambda_{\max}(A^TA)$, using a step size $\alpha < 2/L$:
$x_{k+1} = \mathrm{soft}\big(x_k - \alpha A^T(Ax_k - u);\ \alpha\lambda\big)$.
There is a set $Z \subseteq \{1, 2, \dots, n\}$ such that $\hat{x} \in G \Rightarrow [\hat{x}]_Z = 0$. Then, after a finite number of iterations, $[x_k]_Z = [\hat{x}]_Z = 0$.
After this, Q-linear convergence, with optimal choice $\alpha = \frac{2}{L+l}$, where $l = \lambda_{\min}(A_{\bar{Z}}^T A_{\bar{Z}}) > 0$ [Hale, Yin, Zhang, 2008].

94 Slowness and acceleration of IST
Problem: $\hat{x} \in G = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$.
IST algorithm: $x_{k+1} = \mathrm{soft}\big(x_k - \alpha A^T(Ax_k - u);\ \alpha\lambda\big)$.
IST is slow if $A$ is very ill-conditioned and/or $\lambda$ is very small!
Several proposals for accelerated variants of IST: methods with memory (TwIST, FISTA); quasi-Newton methods (SpaRSA); continuation, i.e., using a varying $\lambda$ (FPC, SpaRSA).

95 Memory-based variants of IST: FISTA
Fast IST algorithm (FISTA); based on Nesterov's work (1980s) [Beck, Teboulle, 2009].
FISTA:
$t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}$
$z_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}(x_k - x_{k-1})$
$x_{k+1} = \mathrm{soft}\big(z_{k+1} - \alpha A^T(Az_{k+1} - u);\ \alpha\lambda\big)$
(Figure: convergence of IST vs. FISTA.)
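A numpy sketch of the FISTA recursion above for the same $\ell_1$ problem (synthetic data; fixed step size $1/L$):

```python
# FISTA for min_x 0.5*||A x - u||^2 + lam*||x||_1.
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, u, lam, n_iter=300):
    L = np.linalg.norm(A, 2) ** 2
    alpha = 1.0 / L
    x = x_prev = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)           # momentum step
        x_prev = x
        x = soft(z - alpha * A.T @ (A @ z - u), alpha * lam)  # prox-gradient at z
        t = t_next
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 200))
u = A @ np.r_[np.ones(5), np.zeros(195)] + 0.01 * rng.standard_normal(50)
print(np.count_nonzero(np.abs(fista(A, u, 0.1)) > 1e-3))
```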

96 Memory-based variants of IST: TwIST
Inspired by two-step methods for linear systems [Frankel, 1950], [Axelsson, 1996].
TwIST (two-step IST) [Bioucas-Dias, F, 2007]:
$x_{k+1} = (\alpha - \beta)x_k + (1 - \alpha)x_{k-1} + \beta\,\mathrm{prox}_{f_2}\big(x_k - A^T(Ax_k - u)\big)$
Q-linear convergence.
(Figure: convergence of TwIST vs. IST.)

97 Memory-based variants of IST: TwIST
Image deblurring example: original, blurred (9×9 blur, 40 dB noise), and restored images, with
$\hat{x} \in \arg\min_{x\in\mathbb{R}^n} \frac{1}{2}\|B\Psi x - u\|_2^2 + \lambda\|x\|_1$,
where $x$ are the representation coefficients and $\Psi$ is the dictionary (e.g., wavelet basis, frame, ...).
(Figures: objective function and SNR vs. iterations for IST, over-relaxed IST, second-order/full TwIST, and TwIST.)

98 Quasi-Newton acceleration of IST: SpaRSA
IST: $x_{k+1} = \mathrm{prox}_{\alpha_k f_2}\big(x_k - \alpha_k\nabla f_1(x_k)\big)$.
A Newton step (instead of gradient descent) would be $x_{k+1} = \mathrm{prox}_{\alpha_k f_2}\big(x_k - [H(x_k)]^{-1}\nabla f_1(x_k)\big)$, where $H(x_k)$ is the Hessian (matrix of second derivatives) ... computationally too expensive!
Barzilai-Borwein approach [Barzilai-Borwein, 1988], [Wright, Nowak, F, 2009]: choose $\alpha_k$ so that $\frac{1}{\alpha_k}I \simeq H(x_k)$, i.e.,
$\frac{1}{\alpha_k} = \arg\min_\eta \|\eta(x_k - x_{k-1}) - (\nabla f(x_k) - \nabla f(x_{k-1}))\|_2^2$.
If $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$, then $\alpha_k = \frac{\|x_k - x_{k-1}\|_2^2}{\|A(x_k - x_{k-1})\|_2^2}$.
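A numpy sketch of IST with Barzilai-Borwein step sizes in the spirit of SpaRSA; note that the actual SpaRSA method includes safeguards (e.g., an acceptance test) that are omitted here, and the data are synthetic:

```python
# IST with Barzilai-Borwein step sizes (SpaRSA-style, no safeguards).
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ist_bb(A, u, lam, n_iter=200):
    x = np.zeros(A.shape[1])
    grad = A.T @ (A @ x - u)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2     # initial step size
    for _ in range(n_iter):
        x_new = soft(x - alpha * grad, alpha * lam)
        grad_new = A.T @ (A @ x_new - u)
        s, y = x_new - x, grad_new - grad       # differences of iterates and gradients
        if s @ y > 1e-12:
            alpha = (s @ s) / (s @ y)           # BB step: 1/alpha approximates the Hessian
        x, grad = x_new, grad_new
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 200))
u = A @ np.r_[np.ones(5), np.zeros(195)]
print(np.count_nonzero(np.abs(ist_bb(A, u, 0.1)) > 1e-3))
```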

99 Acceleration via continuation
IST: $x_{k+1} = \mathrm{soft}\big(x_k - \alpha A^T(Ax_k - u);\ \alpha\lambda\big)$; slow if $\lambda$ is small.
Observation: IST (as SpaRSA) benefits from warm-starting (being initialized close to the minimizer).
Continuation: start with a large $\lambda$ and slowly decrease it while tracking the solution [F, Nowak, Wright, 2007], [Hale, Yin, Zhang, 2007].
IST + continuation = fixed-point continuation (FPC) [Hale, Yin, Zhang, 2007].

100 Acceleration via continuation
$\hat{x} \in G = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$, with data $u = A\bar{x} + n$.
Let $\lambda_{\max} = \|A^T u\|_\infty$; then $\lambda \ge \lambda_{\max} \Rightarrow \hat{x} = 0$.
(Figure: behavior as $\lambda$ is decreased from $\lambda_{\max}$.)
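A sketch of continuation with warm starts: run a few IST iterations for a decreasing sequence of $\lambda$ values starting below $\lambda_{\max} = \|A^Tu\|_\infty$ and reuse each solution to initialize the next stage; the schedule and inner iteration counts are illustrative assumptions:

```python
# Continuation: solve a sequence of problems with decreasing lambda, warm-starting each one.
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ist(A, u, lam, x0, n_iter=50):
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2
    x = x0.copy()
    for _ in range(n_iter):
        x = soft(x - alpha * A.T @ (A @ x - u), alpha * lam)
    return x

def ist_continuation(A, u, lam_target, n_stages=6):
    lam_max = np.abs(A.T @ u).max()            # above this, the solution is exactly zero
    lams = np.geomspace(0.5 * lam_max, lam_target, n_stages)
    x = np.zeros(A.shape[1])
    for lam in lams:                           # warm-start each stage at the previous solution
        x = ist(A, u, lam, x)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 200))
u = A @ np.r_[np.ones(5), np.zeros(195)]
print(np.count_nonzero(np.abs(ist_continuation(A, u, 0.05)) > 1e-3))
```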

101 Some speed comparisons, from [Lorenz, 2011]
$\hat{x} = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$, with $A = [I\ U\ R]$ and a solution with 120 non-zeros.
(Figure: running-time comparison, including IST.)

102 Proximal algorithms for matrices
$\hat{M} \in \arg\min_{M\in\mathbb{R}^{n\times n}} \frac{1}{2}\|\mathcal{A}(M) - U\|_F^2 + \mu\|M\|_*$, where $\mathcal{A}$ is a linear operator with adjoint $\mathcal{A}^*$.
The proximal algorithm (IST) is as before: $X_{k+1} = \mathrm{svt}_{\mu\alpha_k}\big(X_k - \alpha_k\mathcal{A}^*(\mathcal{A}(X_k) - U)\big)$.
Matrix completion: $\mathcal{A}(X) = X_\Omega$ (a subset of entries).
(Figure, from [Toh, Yun, 2009]: IST vs. APG (FISTA) vs. FPC (continuation) vs. APG + continuation ... the importance of acceleration!)

103 Another class of methods: augmented Lagrangian
The problem: $\min_x f(x)$ s.t. $Ax = u$.
The augmented Lagrangian (AL), with penalty parameter $\mu$: $L_\mu(x, \theta) = f(x) + \theta^T(Ax - u) + \frac{\mu}{2}\|Ax - u\|_2^2$.
The AL method (ALM), a.k.a. the method of multipliers [Hestenes, Powell, 1969]:
$x_{k+1} = \arg\min_x L_\mu(x, \theta_k)$, $\quad \theta_{k+1} = \theta_k + \mu(Ax_{k+1} - u)$.
It can be written (in scaled form) as:
$x_{k+1} = \arg\min_x f(x) + \frac{\mu}{2}\|Ax - u - d_k\|_2^2$, $\quad d_{k+1} = d_k - (Ax_{k+1} - u)$.
Similar to the Bregman method [Osher, Burger, Goldfarb, Xu, Yin, 2005], [Yin, Osher, Goldfarb, Darbon, 2008].

104 Augmented Lagrangian for variable splitting
The problem: $\min_x f_1(Ax) + f_2(x)$.
Equivalent constrained formulation: $\min_{x,z} f_1(z) + f_2(x)$ s.t. $Ax - z = 0$; it can be written as $\min_y f(y)$ s.t. $Gy = 0$, with $y = (x, z)$ and $G = [A\ \ -I]$.
ALM:
$(x_{k+1}, z_{k+1}) = \arg\min_{x,z} f_1(z) + f_2(x) + \frac{\mu}{2}\|Ax - z - d_k\|_2^2$,
$d_{k+1} = d_k - (Ax_{k+1} - z_{k+1})$.

105 Augmented Lagrangian for variable splitting
It may be hard to solve $(x_{k+1}, z_{k+1}) = \arg\min_{x,z} f_1(z) + f_2(x) + \frac{\mu}{2}\|Ax - z - d_k\|_2^2$ jointly.
Alternative: alternate the minimizations,
$x_{k+1} = \arg\min_x f_2(x) + \frac{\mu}{2}\|Ax - z_k - d_k\|_2^2$,
$z_{k+1} = \arg\min_z f_1(z) + \frac{\mu}{2}\|Ax_{k+1} - z - d_k\|_2^2$,
$d_{k+1} = d_k - (Ax_{k+1} - z_{k+1})$.
This is the alternating direction method of multipliers (ADMM) [Glowinski, Marrocco, 1975], [Gabay, Mercier, 1976], [Eckstein, Bertsekas, 1992].
When applied to $\hat{x} = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$, it yields the split augmented Lagrangian shrinkage algorithm (SALSA) [F, Bioucas-Dias, Afonso, 2009].
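A numpy sketch of ADMM applied to $\min_x \frac{1}{2}\|Ax-u\|_2^2 + \lambda\|x\|_1$ using the common $x = z$ splitting (a standard ADMM-for-LASSO variant in the spirit of SALSA, not a line-by-line transcription of the slides); the penalty $\mu$ and data are illustrative assumptions:

```python
# ADMM for min_x 0.5*||A x - u||^2 + lam*||x||_1, with the splitting x = z.
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso(A, u, lam, mu=1.0, n_iter=200):
    n = A.shape[1]
    AtA, Atu = A.T @ A, A.T @ u
    # The x-update solves a ridge-type linear system; factor it once.
    chol = np.linalg.cholesky(AtA + mu * np.eye(n))
    z = np.zeros(n)
    d = np.zeros(n)
    for _ in range(n_iter):
        rhs = Atu + mu * (z - d)
        x = np.linalg.solve(chol.T, np.linalg.solve(chol, rhs))  # (A^T A + mu I) x = rhs
        z = soft(x + d, lam / mu)            # prox of (lam/mu)*||.||_1
        d = d + x - z                        # scaled dual update
    return z

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 200))
u = A @ np.r_[np.ones(5), np.zeros(195)]
print(np.count_nonzero(np.abs(admm_lasso(A, u, 0.1)) > 1e-3))
```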

106 Augmented Lagrangian for variable splitting
Testing ADMM/SALSA on a typical image deblurring problem:
$\hat{x} \in \arg\min_{x\in\mathbb{R}^n} \frac{1}{2}\|B\Psi x - u\|_2^2 + \lambda\|x\|_1$.
(Figures: blurred and restored images; objective function $0.5\|y - Ax\|^2 + \lambda\,\mathrm{TV}(x)$ vs. CPU time in seconds for TwIST, FISTA, SpaRSA, and SALSA.)

107 Handling more than two functions
$\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_0(x) + f_1(x) + \dots + f_n(x)$, where $f_0$ has an $L$-Lipschitz gradient and $f_1, \dots, f_n$ are convex.
Possible uses: multiple regularizers, positivity constraints, ...
Generalized forward-backward algorithm [Raguet, Fadili, Peyré, 2011]:
Parameters: $\omega_1, \dots, \omega_n \in (0,1)$ s.t. $\sum_j \omega_j = 1$.
Initialization: $k = 0$; $z_0^1, \dots, z_0^n$; $x_0 = \sum_{j=1}^n \omega_j z_0^j$.
Repeat until convergence:
for $i = 1:n$: $z_{k+1}^i = z_k^i + \mathrm{prox}_{\alpha_k f_i/\omega_i}\big(2x_k - z_k^i - \alpha_k\nabla f_0(x_k)\big) - x_k$;
$x_{k+1} = \sum_{i=1}^n \omega_i z_{k+1}^i$; $k \leftarrow k+1$.

108 Handling more than two functions
$\hat{x} \in \arg\min_{x\in\mathbb{R}^n} f_1(x) + \dots + f_n(x)$, with $f_1, \dots, f_n$ arbitrary convex functions.
ADMM-based method [F and Bioucas-Dias, 2009], [Setzer, Steidl, Teuber, 2009]:
Parameter: $\mu$. Initialization: $k = 0$; $z_0^1, \dots, z_0^n$; $y_0^1, \dots, y_0^n$.
Repeat until convergence:
$x_{k+1} = \frac{1}{n}\sum_{i=1}^n (y_k^i - z_k^i)$;
for $i = 1:n$: $y_{k+1}^i = \mathrm{prox}_{f_i/\mu}\big(x_{k+1} + z_k^i\big)$; $z_{k+1}^i = z_k^i + x_{k+1} - y_{k+1}^i$;
$k \leftarrow k+1$.
