Winter Conference in Statistics Compressed Sensing. LECTURE #7-8 Algorithms for low-dimensional models


1 Winter Conference in Statistics 2013 Compressed Sensing LECTURE #7-8 Algorithms for low-dimensional models Prof. Dr. Volkan Cevher LIONS/Laboratory for Information and Inference Systems

2 Convex Algorithms for Low-Dimensional Models And, this is how you solve huge-dimensional problems

3 The classical problem templates. Criteria seen above have the form $\|x\|_1 = \sum_{i=1}^N |[x]_i|$. Basis pursuit (BP) [Chen, Donoho, Saunders, 1998]: $\min_x \|x\|_1$ s.t. $Ax = u$. BP denoising (BPDN) [Chen, Donoho, Saunders, 1998]: $\min_x \tfrac{1}{2}\|Ax-u\|_2^2 + \lambda\|x\|_1$. Also well known: the LASSO (least absolute shrinkage and selection operator) [Tibshirani, 1996]: $\min_x \|Ax-u\|_2^2$ s.t. $\|x\|_1 \le \tau$. All can be written as $\hat{x} \in \arg\min_{x\in\mathbb{R}^N} f_1(x) + f_2(x)$.

4 Convex optimization and proximal algorithms. $\hat{x} \in \arg\min_{x\in\mathbb{R}^N} f_1(x) + f_2(x)$, where $f_1 : \mathbb{R}^N \to \mathbb{R}$ is the data-fidelity term (convex, smooth; typically $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2$) and $f_2 : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R}\cup\{+\infty\}$ is a convex regularizer (possibly non-smooth, e.g. $\ell_1$; non-convex regularizers come later). Difficulties: non-smoothness and large dimension ($N \gg 1$).

5 Constrained vs unconstrained formulations. Constrained optimization formulations can be written as $\hat{x} \in \arg\min_{x\in\mathbb{R}^N} f_1(x) + f_2(x)$ using indicator functions: $\iota_S(x) = 0$ if $x\in S$, $+\infty$ if $x\notin S$. Example: $\min_x f_1(x)$ s.t. $g(x)\le\nu$ is the same as $\hat{x}\in\arg\min_{x\in\mathbb{R}^N} f_1(x) + \iota_{\{x : g(x)\le\nu\}}(x)$. Classical example: the LASSO.

6 Convex and strictly convex sets. $S$ is convex if $x, x' \in S \Rightarrow \forall\,\theta\in[0,1],\ \theta x + (1-\theta)x' \in S$. $S$ is strictly convex if $x, x' \in S \Rightarrow \forall\,\theta\in(0,1),\ \theta x + (1-\theta)x' \in \mathrm{int}(S)$. (Illustrations: a convex vs. a non-convex set; a set that is convex but not strictly convex vs. a strictly convex set.)

7 Convex and strictly convex functions. Extended real-valued function: $f : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R}\cup\{+\infty\}$. Domain of a function: $\mathrm{dom}(f) = \{x : f(x) \ne +\infty\}$. $f$ is a convex function if $\forall\,\theta\in[0,1]$ and $x,x'\in\mathrm{dom}(f)$: $f(\theta x + (1-\theta)x') \le \theta f(x) + (1-\theta)f(x')$; $f$ is strictly convex if, for $\theta\in(0,1)$ and $x\ne x'$, $f(\theta x + (1-\theta)x') < \theta f(x) + (1-\theta)f(x')$. (Illustrations: non-convex, convex, strictly convex, and convex-but-not-strictly-convex functions.)

8 Convexity, coercivity, and minima. $f : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R}\cup\{+\infty\}$ is coercive if $\lim_{\|x\|\to+\infty} f(x) = +\infty$. Let $G = \arg\min_x f(x)$. If $f$ is coercive, then $G$ is a non-empty set; if $f$ is strictly convex, then $G$ has at most one element. (Illustrations: coercive and strictly convex, $G=\{x^*\}$; coercive but not strictly convex; convex but not coercive, $G=\emptyset$.)

9 Euclidean projections on convex sets. Our problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_1(x)+f_2(x)$. Consider $f_2(x)=\iota_S(x)$, equal to $0$ if $x\in S$ and $+\infty$ if $x\notin S$ (convex if $S$ is convex), and $f_1(x)=\tfrac{1}{2}\|u-x\|_2^2$ (strictly convex). Then $\hat{x}=\arg\min_x f_1(x)+f_2(x)=\arg\min_{x\in S}\|u-x\|_2^2=P_S(u)$, the Euclidean projection of $u$ onto $S$.

10 Projected gradient algorithm. Our problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_1(x)+f_2(x)$, with $f_2(x)=\iota_S(x)$ ($S$ a convex set) and $f_1$ some smooth function, e.g. $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2$. Projected gradient algorithm: $x_{k+1}=P_S\!\big(x_k-\gamma_k\nabla f_1(x_k)\big)$ with step size $\gamma_k$; if $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2$, then $x_{k+1}=P_S\!\big(x_k-\gamma_k A^T(Ax_k-u)\big)$.
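For concreteness, here is a minimal NumPy sketch of this projected-gradient iteration for $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2$; the test problem, the choice of $S$ as the non-negative orthant, and the step size are illustrative assumptions, not part of the slides.

    import numpy as np

    def projected_gradient(A, u, project, gamma, iters=200):
        """Minimize 0.5*||Ax - u||^2 over the set S given by its projector."""
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            grad = A.T @ (A @ x - u)          # gradient of the quadratic data term
            x = project(x - gamma * grad)     # gradient step, then projection onto S
        return x

    # Usage: S = non-negative orthant on a random test problem.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    u = A @ np.maximum(rng.standard_normal(100), 0)
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # step size below 1/L, with L = ||A||_2^2
    x_hat = projected_gradient(A, u, lambda z: np.maximum(z, 0), gamma)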

11 Detour: majorization-minimization (MM). Problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f(x)$. $Q(x;x_k)$ is a majorizer of $f$ at $x_k$ if $Q(x;x_k)\ge f(x)$ for all $x$ and $Q(x_k;x_k)=f(x_k)$. MM algorithm: $x_{k+1}=\arg\min_x Q(x;x_k)$. Monotonicity: $f(x_{k+1})\le Q(x_{k+1};x_k)\le Q(x_k;x_k)=f(x_k)$. (Figure: $f$ and the successive majorizers $Q(\cdot;x_k)$, $Q(\cdot;x_{k+1})$ at iterates $x_k$, $x_{k+1}$, $x_{k+2}$.)

12 Projected gradient from majorization-minimization. Our problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_1(x)+f_2(x)$, with $f_2(x)=\iota_S(x)$ ($S$ a convex set) and $f_1$ having an $L$-Lipschitz gradient; e.g. $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2\Rightarrow L=\lambda_{\max}(A^TA)=\|A\|_2^2$. A separable approximation of $f_1$ (replacing the Hessian of $f_1$): $Q(x;x_k)=f_1(x_k)+(x-x_k)^T\nabla f_1(x_k)+\tfrac{1}{2\gamma_k}\|x-x_k\|_2^2$.

13 Projected gradient from majorization-minimization. Our problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_1(x)+\iota_S(x)$. Separable approximation of $f_1$: $Q(x;x_k)=f_1(x_k)+(x-x_k)^T\nabla f_1(x_k)+\tfrac{1}{2\gamma_k}\|x-x_k\|_2^2$. $Q(x;x_k)$ is a majorizer of $f_1$ if $\gamma_k<1/L$, hence $Q(x;x_k)+\iota_S(x)$ is a majorizer of $f_1(x)+\iota_S(x)$. MM algorithm: $x_{k+1}=\arg\min_x Q(x;x_k)+\iota_S(x)=\arg\min_x\tfrac{1}{2}\|x-(x_k-\gamma_k\nabla f_1(x_k))\|_2^2+\gamma_k\iota_S(x)=P_S\!\big(x_k-\gamma_k\nabla f_1(x_k)\big)$: projected gradient.

14 Proximity operators. Our problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_1(x)+f_2(x)$, with $f_2$ a convex function and $f_1(x)=\tfrac{1}{2}\|u-x\|_2^2$ (strictly convex). Then $\hat{x}=\arg\min_{x\in\mathbb{R}^n}\tfrac{1}{2}\|u-x\|_2^2+f_2(x)=\mathrm{prox}_{f_2}(u)$, the proximity operator [Moreau 62], [Combettes 01]. It generalizes the notion of Euclidean projection.

15 Proximity operators (linear). $\mathrm{prox}_f(u)=\arg\min_{x\in\mathbb{R}^n}\tfrac{1}{2}\|u-x\|_2^2+f(x)$, a map $\mathbb{R}^N\to\mathbb{R}^N$. Classical cases. Squared $\ell_2$ regularizer, $f(x)=\tfrac{\lambda}{2}\|x\|_2^2$: $\mathrm{prox}_f(u)=\arg\min_x\tfrac{1}{2}\|u-x\|_2^2+\tfrac{\lambda}{2}\|x\|_2^2=\tfrac{u}{1+\lambda}$. Squared $\ell_2$ regularizer with analysis operator $D$, $f(x)=\tfrac{\lambda}{2}\|Dx\|_2^2$: $\mathrm{prox}_f(u)=\arg\min_x\tfrac{1}{2}\|u-x\|_2^2+\tfrac{\lambda}{2}\|Dx\|_2^2=(I+\lambda D^TD)^{-1}u$; if $D$ is a circulant matrix, $O(N\log N)$ cost using the FFT.

16 Proximity operator of the $\ell_1$ norm. $\mathrm{prox}_{\lambda\|\cdot\|_1}(u)=\arg\min_{x\in\mathbb{R}^n}\tfrac{1}{2}\|u-x\|_2^2+\lambda\|x\|_1$. Separable: solve w.r.t. each component, $\min_x \lambda|x|+0.5(x-u)^2$. Possible approach: write $|x|=\max_{|z|\le 1} zx$, so $\min_x\max_{|z|\le 1}\lambda zx+0.5(x-u)^2=\max_{|z|\le 1}\min_x\lambda zx+0.5(x-u)^2$; the inner minimization gives $x=u-\lambda z$, leaving $\arg\max_{|z|\le 1}-0.5\lambda^2z^2+\lambda zu$, whose solution is $z=u/\lambda$ if $|u|\le\lambda$, $z=1$ if $u>\lambda$, and $z=-1$ if $u<-\lambda$.

17 Proximity operator of the $\ell_1$ norm: soft thresholding. $\mathrm{soft}(u,\lambda)=\mathrm{sign}(u)\max\{0,|u|-\lambda\}$, i.e. $\mathrm{soft}(\cdot,\lambda)=\mathrm{prox}_{\lambda|\cdot|}$ (for vectors, $\mathrm{soft}(u,\lambda)$ is applied component-wise). Closed-form prox also exists for $p$-th powers of $\ell_p$ norms, $\|x\|_p^p=\sum_i|[x]_i|^p$, for $p\in\{1,\,4/3,\,3/2,\,2,\,3,\,4\}$ [Combettes, Wajs, 2005].
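A one-line NumPy version of the soft-thresholding operator above, applied element-wise to vectors (a sketch, with $\lambda$ passed as lam):

    import numpy as np

    def soft(u, lam):
        """Prox of lam*||.||_1: sign(u) * max(0, |u| - lam), element-wise."""
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)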

18 Dual norms, proximity operators, and projections. Dual norm: given a norm $\|\cdot\|:\mathbb{R}^N\to\mathbb{R}_+$, its dual norm is $\|x\|_*=\max_{\|z\|\le 1}\langle x,z\rangle$. The dual norm of $\|\cdot\|_p$ is $\|\cdot\|_q$, where $\tfrac{1}{p}+\tfrac{1}{q}=1$ (Hölder conjugates); this is a simple corollary of Hölder's inequality. Examples of Hölder conjugates: $(2,2)$, $(1,+\infty)$, $(3/2,3)$, ... These concepts are related through $\mathrm{prox}_{\lambda\|\cdot\|}(u)=u-P_{\{x:\|x\|_*\le\lambda\}}(u)$ [Combettes, Wajs, 2005].

19 Dual norms, proximity operators, and projections. $\mathrm{prox}_{\lambda\|\cdot\|}(u)=u-P_{\{x:\|x\|_*\le\lambda\}}(u)$. This relation underlies our earlier derivation of $\mathrm{prox}_{\lambda\|\cdot\|_1}$: $\mathrm{prox}_{\lambda\|\cdot\|_1}(u)=u-P_{\{x:\|x\|_\infty\le\lambda\}}(u)$, since the dual of $\|x\|_1$ is $\|x\|_\infty=\max_i|[x]_i|$. It is all separable: $\mathrm{prox}_{\lambda|\cdot|}(u)=u-P_{\{x:|x|\le\lambda\}}(u)=\mathrm{soft}(u,\lambda)$.

20 Dual norms, proximity operators, and projections. $\mathrm{prox}_{\lambda\|\cdot\|}(u)=u-P_{\{x:\|x\|_*\le\lambda\}}(u)$. This relation allows deriving $\mathrm{prox}_{\lambda\|\cdot\|_\infty}$ and $\mathrm{prox}_{\lambda\|\cdot\|_2}$: $\mathrm{prox}_{\lambda\|\cdot\|_\infty}(u)=u-P_{\{x:\|x\|_1\le\lambda\}}(u)$, the residual of the projection on the $\ell_1$ ball of radius $\lambda$ ($O(n\log n)$ cost); $\mathrm{prox}_{\lambda\|\cdot\|_2}(u)=u-P_{\{x:\|x\|_2\le\lambda\}}(u)=\tfrac{u}{\|u\|_2}\max\{0,\|u\|_2-\lambda\}$: vector soft thresholding.
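For the $\ell_2$ norm this prox is vector soft thresholding; a minimal NumPy sketch (the function name is just illustrative):

    import numpy as np

    def vector_soft(u, lam):
        """Prox of lam*||.||_2: shrink the whole vector toward zero by lam."""
        norm_u = np.linalg.norm(u)
        if norm_u <= lam:
            return np.zeros_like(u)
        return (1.0 - lam / norm_u) * u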

21 Proximity operators of atomic norms. $\mathrm{prox}_{\lambda\|\cdot\|}(u)=u-P_{\{x:\|x\|_*\le\lambda\}}(u)$. This relation allows deriving prox operators of atomic norms: $\|x\|_\mathcal{A}=\inf\{t>0 : x\in t\,\mathrm{conv}(\mathcal{A})\}$. The dual of an atomic norm: $\|x\|_\mathcal{A}^*=\max_{\|z\|_\mathcal{A}\le 1}\langle z,x\rangle=\max_{z\in\mathrm{conv}(\mathcal{A})}\langle z,x\rangle=\max\{\langle a,x\rangle : a\in\mathcal{A}\}$. Hence $P_{\{x:\|x\|_\mathcal{A}^*\le\lambda\}}(u)=\arg\min_{\{x:\langle a,x\rangle\le\lambda,\ \forall a\in\mathcal{A}\}}\|u-x\|_2^2$ and $\mathrm{prox}_{\lambda\|\cdot\|_\mathcal{A}}(u)=u-\arg\min_{\{x:\langle a,x\rangle\le\lambda,\ \forall a\in\mathcal{A}\}}\|u-x\|_2^2$.

22 Proximity operators of atomic norms: $\ell_1$. Deriving $\mathrm{prox}_{\lambda\|\cdot\|_1}$ from the atomic-norm view: $\|x\|_1=\|x\|_\mathcal{A}$ with $\mathcal{A}=\{e_1,-e_1,e_2,-e_2,\dots,e_N,-e_N\}$, $|\mathcal{A}|=2N$. The dual atomic norm is $\|x\|_\mathcal{A}^*=\max\{\langle a,x\rangle : a\in\mathcal{A}\}=\max_i|[x]_i|=\|x\|_\infty$, so $\mathrm{prox}_{\lambda\|\cdot\|_1}(u)=u-P_{\{x:\|x\|_\infty\le\lambda\}}(u)=\mathrm{soft}(u,\lambda)$.

23 Proximity operators of atomic norms: $\ell_\infty$. Deriving $\mathrm{prox}_{\lambda\|\cdot\|_\infty}$ from the atomic-norm view: $\|x\|_\infty=\|x\|_\mathcal{A}$ with $\mathcal{A}=\{-1,+1\}^N$, $|\mathcal{A}|=2^N$. The dual atomic norm is $\|x\|_\mathcal{A}^*=\max\{\langle a,x\rangle : a\in\mathcal{A}\}=\sum_{i=1}^N|[x]_i|=\|x\|_1$, so $\mathrm{prox}_{\lambda\|\cdot\|_\infty}(u)=u-P_{\{x:\|x\|_1\le\lambda\}}(u)$.

24 Proximity of atomic norms: matrix nuclear norm. Matrix nuclear norm: $\|X\|_*=\sum_i\sigma_i(X)=\sum_i\sqrt{\lambda_i(X^TX)}$. $\|X\|_*=\|X\|_\mathcal{A}$ with $\mathcal{A}=\{Z:\mathrm{rank}(Z)=1,\ \|Z\|_F=1\}$, where $\mathrm{rank}(Z)=|\{i:\sigma_i(Z)\ne 0\}|$ and the Frobenius norm is $\|Z\|_F^2=\sum_{ij}[Z]_{ij}^2=\sum_i\sigma_i^2(Z)$. The dual atomic norm: $\|X\|_\mathcal{A}^*=\max\{\langle Z,X\rangle:Z\in\mathcal{A}\}=\max\{\sum_i\sigma_i(Z)\sigma_i(X):\mathrm{rank}(Z)=1,\ \sum_i\sigma_i^2(Z)=1\}=\sigma_{\max}(X)=\|X\|_2$, the spectral norm.

25 Proximity of atomic norms: matrix nuclear norm. Euclidean matrix projection: $P_S(X)=\arg\min_{Z\in S}\|Z-X\|_F^2$. Note: for any unitary matrix $U$ ($U^TU=I$, $UU^T=I$), $\|UM\|_F^2=\mathrm{trace}(M^TU^TUM)=\mathrm{trace}(M^TM)=\|M\|_F^2$. Writing the SVD $X=U\Lambda V^T$ ($\Lambda$ the diagonal matrix of singular values): $\mathrm{prox}_{\lambda\|\cdot\|_*}(X)=X-P_{\{Z:\|Z\|_2\le\lambda\}}(X)=U\Lambda V^T-P_{\{Z:\sigma_{\max}(Z)\le\lambda\}}(U\Lambda V^T)$ [Lewis, Malick, 2009] $=U\,\mathrm{diag}\big(\mathrm{diag}(\Lambda)-P_{\{x:\|x\|_\infty\le\lambda\}}(\mathrm{diag}(\Lambda))\big)V^T=U\,\mathrm{soft}(\Lambda,\lambda)V^T$: singular value thresholding (SVT).
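A minimal NumPy sketch of singular value thresholding, i.e. the prox of $\lambda\|\cdot\|_*$ as derived above:

    import numpy as np

    def svt(X, lam):
        """Prox of lam*||.||_* : soft-threshold the singular values of X."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s_thr = np.maximum(s - lam, 0.0)   # soft(s, lam); singular values are non-negative
        return (U * s_thr) @ Vt            # reassemble U diag(s_thr) V^T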

26 Proximity of atomic norms: matrix spectral norm. Matrix spectral norm: $\|X\|_2=\sigma_{\max}(X)$. $\|X\|_2=\|X\|_\mathcal{A}$ with $\mathcal{A}=\{Z:Z^TZ=I\}=\{Z:\sigma_i(Z)=1,\ \forall i\}$, the set of orthogonal matrices. The dual atomic norm: $\|X\|_\mathcal{A}^*=\max\{\langle Z,X\rangle:Z\in\mathcal{A}\}=\max\{\sum_i\sigma_i(Z)\sigma_i(X):\sigma_i(Z)=1,\ \forall i\}=\sum_i\sigma_i(X)=\|X\|_*$, the nuclear norm.

27 Proximity of atomic norms: matrix spectral norm. $\mathrm{prox}_{\lambda\|\cdot\|_2}(X)=X-P_{\{Z:\|Z\|_*\le\lambda\}}(X)=U\Lambda V^T-P_{\{Z:\sum_i\sigma_i(Z)\le\lambda\}}(U\Lambda V^T)$ ($\Lambda$ the diagonal matrix of singular values) $=U\,\mathrm{diag}\big(\mathrm{diag}(\Lambda)-P_{\{x:\|x\|_1\le\lambda\}}(\mathrm{diag}(\Lambda))\big)V^T$: the residual of the projection of the singular values on an $\ell_1$ ball of radius $\lambda$.

28 Proximity and atomic sets: vectors vs matrices.
- $\ell_1$ norm $\|x\|_1$: prox = component-wise soft thresholding; atomic set $\mathcal{A}=\{\pm e_i\}$, $|\mathcal{A}|=2N$. Matrix counterpart: nuclear norm $\|X\|_*$, prox = singular value thresholding; atomic set = all rank-1, unit-norm matrices.
- $\ell_\infty$ norm $\|x\|_\infty$: prox = residual of projection on the $\ell_1$ ball; atomic set $\mathcal{A}=\{-1,+1\}^N$, $|\mathcal{A}|=2^N$. Matrix counterpart: spectral norm $\|X\|_2$, prox = residual of singular-value projection on the $\ell_1$ ball; atomic set = all orthogonal matrices.
- $\ell_2$ norm $\|x\|_2$: prox = vector soft thresholding; atomic set = all unit-norm vectors, $|\mathcal{A}|=\infty$. Matrix counterpart: Frobenius norm $\|X\|_F$, prox = matrix soft thresholding; atomic set = all matrices of unit Frobenius norm.

29 Proximal algorithms. Back to the problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_1(x)+f_2(x)$, with $f_2$ a proper convex function and $f_1$ having an $L$-Lipschitz gradient; e.g. $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2$ with $L=\lambda_{\max}(A^TA)$. Separable majorizer ($\gamma_k<1/L$): $Q(x;x_k)=f_1(x_k)+(x-x_k)^T\nabla f_1(x_k)+\tfrac{1}{2\gamma_k}\|x-x_k\|_2^2$. Majorization-minimization algorithm: $x_{k+1}=\arg\min_x Q(x;x_k)+f_2(x)=\arg\min_x\tfrac{1}{2}\|x-(x_k-\gamma_k\nabla f_1(x_k))\|_2^2+\gamma_k f_2(x)$, i.e. $x_{k+1}=\mathrm{prox}_{\gamma_k f_2}\!\big(x_k-\gamma_k\nabla f_1(x_k)\big)$.

30 Proximal algorithms: convergence. Problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f(x)$, $f=f_1+f_2$, where $f_1$ has an $L$-Lipschitz gradient; e.g. $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2$, $L=\lambda_{\max}(A^TA)$. Iterative shrinkage/thresholding (IST), or forward-backward: $x_{k+1}=\mathrm{prox}_{\gamma_k f_2}\!\big(x_k-\gamma_k\nabla f_1(x_k)\big)$. If $\gamma_k<1/L$, IST is a majorization-minimization algorithm, thus $f(x_{k+1})\le f(x_k)$; since $f(x)\ge 0$, the sequence $(f(x_1),f(x_2),\dots,f(x_k),\dots)$ converges. Attention: this does not imply convergence of $(x_1,\dots,x_k,\dots)$.
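A compact NumPy sketch of IST for $\min_x \tfrac{1}{2}\|Ax-u\|_2^2+\lambda\|x\|_1$; the fixed step size and iteration count are illustrative choices:

    import numpy as np

    def ist(A, u, lam, iters=500):
        """Iterative shrinkage/thresholding for 0.5*||Ax - u||^2 + lam*||x||_1."""
        L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
        gamma = 1.0 / L                            # step size gamma < 2/L
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            v = x - gamma * A.T @ (A @ x - u)      # gradient step
            x = np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)   # soft(v, gamma*lam)
        return x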

31 Proximal algorithms: convergence. $\hat{x}\in G=\arg\min_{x\in\mathbb{R}^n} f_1(x)+f_2(x)$. IST algorithm: $x_{k+1}=\mathrm{prox}_{\gamma_k f_2}\!\big(x_k-\gamma_k\nabla f_1(x_k)\big)$. If $0<\gamma_k<2/L$, then $(x_1,x_2,\dots,x_k,\dots)$ converges to a point in $G$. Inexact version, with errors $a_k$, $b_k$: $x_{k+1}=\mathrm{prox}_{\gamma_k f_2}\!\big(x_k-(\gamma_k\nabla f_1(x_k)+b_k)\big)+a_k$; convergence is still guaranteed if $\sum_{k=1}^{\infty}\|a_k\|<\infty$ and $\sum_{k=1}^{\infty}\|b_k\|<\infty$. Results and proofs in [Combettes and Wajs, 2005].

32 Proximal algorithms: convergence. Convergence of function values: $(f(x_1),\dots,f(x_k),\dots)\to f(\hat{x})$. Convergence of iterates: $(x_1,x_2,\dots,x_k,\dots)\to\hat{x}$. Convergence rates (for function values) [Beck, Teboulle, 2009]: $f(x_k)-f(\hat{x})=O(1/k)$ for IST and $O(1/k^2)$ for its accelerated variant. Convergence rates for the iterates require further assumptions on $f$.

33 Proximal algorithms: convergence of iterates. $\hat{x}=\arg\min_x\tfrac{1}{2}\|Ax-u\|_2^2+f_2(x)$. With $L=\lambda_{\max}(A^TA)$ and $l=\lambda_{\min}(A^TA)>0$ (condition number $L/l$), $G=\{\hat{x}\}$ (unique minimizer). Under-relaxed ($\beta<1$) or over-relaxed ($\beta>1$) IST: $x_{k+1}=(1-\beta)x_k+\beta\,\mathrm{prox}_{f_2}\!\big(x_k-A^T(Ax_k-u)\big)$. The optimal choice $\beta=\tfrac{2}{L+l}$ gives Q-linear convergence; small $l$ means a contraction factor $\rho\lesssim 1$, i.e. slow convergence! [F, Bioucas-Dias, 2007]

34 Proximal algorithms: convergence of iterates. With $\hat{x}\in G=\arg\min_x\tfrac{1}{2}\|Ax-u\|_2^2+\lambda\|x\|_1$ and $L=\lambda_{\max}(A^TA)$, using a step size $\gamma<2/L$: $x_{k+1}=\mathrm{soft}\!\big(x_k-\gamma A^T(Ax_k-u),\,\gamma\lambda\big)$. Let $Z\subseteq\{1,2,\dots,n\}$ be such that $\hat{x}\in G\Rightarrow[\hat{x}]_Z=0$. Then, after a finite number of iterations, $[x_k]_Z=[\hat{x}]_Z=0$. After this, Q-linear convergence, with optimal choice $\gamma=\tfrac{2}{L+l}$, where $l=\lambda_{\min}(A_{\bar{Z}}^TA_{\bar{Z}})>0$ [Hale, Yin, Zhang, 2008].

35 Slowness and acceleration of IST. Problem: $\hat{x}\in G=\arg\min_x\tfrac{1}{2}\|Ax-u\|_2^2+\lambda\|x\|_1$. IST algorithm: $x_{k+1}=\mathrm{soft}\!\big(x_k-\gamma A^T(Ax_k-u),\,\gamma\lambda\big)$. IST is slow if $A$ is very ill-conditioned and/or $\lambda$ is very small! Several proposals for accelerated variants of IST: methods with memory (TwIST, FISTA); quasi-Newton methods (SpaRSA); continuation, i.e., using a varying $\lambda$ (FPC, SpaRSA).

36 Memory-based variants of IST: FISTA. Fast IST algorithm (FISTA), based on Nesterov's work from the 1980s [Beck, Teboulle, 2009]: $t_{k+1}=\tfrac{1+\sqrt{1+4t_k^2}}{2}$; $z_{k+1}=x_k+\tfrac{t_k-1}{t_{k+1}}(x_k-x_{k-1})$; $x_{k+1}=\mathrm{soft}\!\big(z_{k+1}-\gamma A^T(Az_{k+1}-u),\,\gamma\lambda\big)$. (Plot: objective vs. iterations for IST and FISTA.)
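A sketch of FISTA for the same $\ell_1$ problem, following the updates above (soft thresholding is inlined so the snippet is self-contained):

    import numpy as np

    def fista(A, u, lam, iters=500):
        """FISTA for 0.5*||Ax - u||^2 + lam*||x||_1 [Beck, Teboulle, 2009]."""
        soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
        gamma = 1.0 / np.linalg.norm(A, 2) ** 2
        x = x_prev = np.zeros(A.shape[1])
        t = 1.0
        for _ in range(iters):
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = x + ((t - 1.0) / t_next) * (x - x_prev)   # extrapolation (memory) step
            x_prev, t = x, t_next
            x = soft(z - gamma * A.T @ (A @ z - u), gamma * lam)
        return x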

37 Memory-based variants of IST: TwIST. Inspired by two-step methods for linear systems [Frankel, 1950], [Axelsson, 1996]. TwIST (two-step IST) [Bioucas-Dias, F, 2007]: $x_{k+1}=(\alpha-\beta)x_k+(1-\alpha)x_{k-1}+\beta\,\mathrm{prox}_{f_2}\!\big(x_k-A^T(Ax_k-u)\big)$. TwIST has Q-linear convergence, typically much faster than IST. (Plot: objective vs. iterations, TwIST vs. IST.)

38 Memory-based variants of IST: TwIST. Example: image deblurring. Blurred image (9x9 blur, 40 dB noise) and restored image; restoration via $\hat{x}\in\arg\min_{x\in\mathbb{R}^n}\tfrac{1}{2}\|B\Psi x-u\|_2^2+\lambda\|x\|_1$, where $x$ are representation coefficients and $\Psi$ is a dictionary (e.g., wavelet basis, frame, ...). (Plots: objective function and SNR vs. iterations for IST, over-relaxed IST, second-order/full TwIST; TwIST converges fastest.)

39 Quasi-Newton acceleration of IST: SpaRSA. IST: $x_{k+1}=\mathrm{prox}_{\gamma_k f_2}\!\big(x_k-\gamma_k\nabla f_1(x_k)\big)$. A Newton step (instead of gradient descent) would be $x_{k+1}=\mathrm{prox}_{\gamma_k f_2}\!\big(x_k-[H(x_k)]^{-1}\nabla f_1(x_k)\big)$, with $H$ the Hessian (matrix of second derivatives): computationally too expensive! Barzilai-Borwein approach [Barzilai, Borwein, 1988], [Wright, Nowak, F, 2009]: $\tfrac{1}{\gamma_k}=\arg\min_\eta\|\eta(x_k-x_{k-1})-(\nabla f(x_k)-\nabla f(x_{k-1}))\|_2^2$, so that $\tfrac{1}{\gamma_k}I\simeq H(x_k)$. If $f_1(x)=\tfrac{1}{2}\|Ax-u\|_2^2$, then $\gamma_k=\tfrac{\|x_k-x_{k-1}\|_2^2}{\|A(x_k-x_{k-1})\|_2^2}$.
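A small sketch of the Barzilai-Borwein step-size rule used by SpaRSA for the quadratic data term above (only the step computation, as a hedged illustration; the fallback value is an assumption):

    import numpy as np

    def bb_step(A, x, x_prev, fallback=1.0):
        """Barzilai-Borwein step: gamma_k = ||dx||^2 / ||A dx||^2 for f1 = 0.5*||Ax - u||^2."""
        dx = x - x_prev
        denom = np.linalg.norm(A @ dx) ** 2
        return float(np.dot(dx, dx) / denom) if denom > 0 else fallback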

40 Acceleration via continuation. IST: $x_{k+1}=\mathrm{soft}\!\big(x_k-\gamma A^T(Ax_k-u),\,\gamma\lambda\big)$. Slow if $\lambda$ is small. Observation: IST (as SpaRSA) benefits from warm-starting (being initialized close to the minimizer). Continuation: start with a large $\lambda$ and slowly decrease it while tracking the solution [F, Nowak, Wright, 2007], [Hale, Yin, Zhang, 2007]. IST + continuation = fixed-point continuation (FPC) [Hale, Yin, Zhang, 2007].

41 Acceleration via continuation. $\hat{x}\in G=\arg\min_x\tfrac{1}{2}\|Ax-u\|_2^2+\lambda\|x\|_1$, with data $u=Ax^*+n$. For $\lambda\ge\lambda_{\max}=\|A^Tu\|_\infty$, the solution is $\hat{x}=0$; continuation decreases $\lambda$ from $\lambda_{\max}$ towards the target value.
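A sketch of warm-started IST with continuation in the spirit of FPC/SpaRSA; the geometric decrease of $\lambda$ and the inner iteration count are illustrative assumptions:

    import numpy as np

    def ist_continuation(A, u, lam_target, n_stages=5, inner_iters=100):
        """Warm-started IST: start at lam_max = ||A^T u||_inf and shrink lambda toward lam_target."""
        soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
        gamma = 1.0 / np.linalg.norm(A, 2) ** 2
        lam_max = np.max(np.abs(A.T @ u))      # for lam >= lam_max the solution is x = 0
        x = np.zeros(A.shape[1])
        for lam in np.geomspace(lam_max, lam_target, n_stages):
            for _ in range(inner_iters):       # inner IST loop, warm-started at the previous x
                x = soft(x - gamma * A.T @ (A @ x - u), gamma * lam)
        return x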

42 Some speed comparisons from [Lorenz, 2011]. $A=[I\ U\ R]$, $\hat{x}=\arg\min_x\tfrac{1}{2}\|Ax-u\|_2^2+\lambda\|x\|_1$, with $\hat{x}$ having 120 non-zeros. (Plot: error vs. time for IST and accelerated variants.)

43 Proximal algorithms for matrices. $\hat{M}\in\arg\min_{M\in\mathbb{R}^{n\times n}}\tfrac{1}{2}\|\mathcal{A}(M)-U\|_F^2+\mu\|M\|_*$, where $\mathcal{A}$ is a linear operator and $\mathcal{A}^*$ its adjoint. The proximal algorithm (IST) is as before: $X_{k+1}=\mathrm{svt}_{\mu\gamma_k}\!\big(X_k-\gamma_k\,\mathcal{A}^*(\mathcal{A}(X_k)-U)\big)$. Matrix completion: $\mathcal{A}(X)=X_\Omega$ (a subset of entries). (Plot from [Toh, Yun, 2009]: IST, APG (FISTA), FPC (continuation), APG + continuation; the importance of acceleration!)

44 Another class of methods: augmented Lagrangian. The problem: $\min_x f(x)$ s.t. $Ax=u$. The augmented Lagrangian (AL), with penalty parameter $\mu$: $L_\mu(x,\theta)=f(x)+\theta^T(Ax-u)+\tfrac{\mu}{2}\|Ax-u\|_2^2$. The AL method (ALM), a.k.a. method of multipliers [Hestenes, Powell, 1969]: $x_{k+1}=\arg\min_x L_\mu(x,\theta_k)$; $\theta_{k+1}=\theta_k+\mu(Ax_{k+1}-u)$. It can be written as $x_{k+1}=\arg\min_x f(x)+\tfrac{\mu}{2}\|Ax-u-d_k\|_2^2$; $d_{k+1}=d_k-(Ax_{k+1}-u)$. Similar to the Bregman method [Osher, Burger, Goldfarb, Xu, Yin, 2005], [Yin, Osher, Goldfarb, Darbon, 2008].

45 Augmented Lagrangian for variable splitting. The problem: $\min_x f_1(Ax)+f_2(x)$. Equivalent constrained formulation: $\min_{x,z} f_1(z)+f_2(x)$ s.t. $Ax-z=0$, which can be written as $\min_y f(y)$ s.t. $\Psi y=0$, with $y=[x;z]$ and $\Psi=[A\ \ {-I}]$. ALM: $(x_{k+1},z_{k+1})=\arg\min_{x,z} f_1(z)+f_2(x)+\tfrac{\mu}{2}\|Ax-z-d_k\|_2^2$; $d_{k+1}=d_k-(Ax_{k+1}-z_{k+1})$.

46 Augmented Lagrangian for variable splitting. It may be hard to solve $(x_{k+1},z_{k+1})=\arg\min_{x,z} f_1(z)+f_2(x)+\tfrac{\mu}{2}\|Ax-z-d_k\|_2^2$ jointly. Alternative: $x_{k+1}=\arg\min_x f_2(x)+\tfrac{\mu}{2}\|Ax-z_k-d_k\|_2^2$; $z_{k+1}=\arg\min_z f_1(z)+\tfrac{\mu}{2}\|Ax_{k+1}-z-d_k\|_2^2$; $d_{k+1}=d_k-(Ax_{k+1}-z_{k+1})$: the alternating direction method of multipliers (ADMM) [Glowinsky, Marrocco, 1975], [Gabay, Mercier, 1976], [Eckstein, Bertsekas, 1992]. When applied to $\hat{x}=\arg\min_x\tfrac{1}{2}\|Ax-u\|_2^2+\lambda\|x\|_1$, this yields the split augmented Lagrangian shrinkage algorithm (SALSA) [F, Bioucas-Dias, Afonso, 2009].
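A compact NumPy sketch of ADMM for $\min_x\tfrac{1}{2}\|Ax-u\|_2^2+\lambda\|x\|_1$ with the simple consensus splitting $z=x$ (a quadratic $x$-update and a soft-thresholding $z$-update); this is one standard splitting used here for illustration, not necessarily the exact splitting of the SALSA paper:

    import numpy as np

    def admm_l1(A, u, lam, mu=1.0, iters=200):
        """ADMM for 0.5*||Ax - u||^2 + lam*||z||_1 subject to x = z."""
        soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
        n = A.shape[1]
        chol = np.linalg.cholesky(A.T @ A + mu * np.eye(n))   # factorize (A^T A + mu I) once
        Atu = A.T @ u
        z = np.zeros(n)
        d = np.zeros(n)
        for _ in range(iters):
            rhs = Atu + mu * (z + d)
            x = np.linalg.solve(chol.T, np.linalg.solve(chol, rhs))   # quadratic x-update
            z = soft(x - d, lam / mu)                                 # shrinkage z-update
            d = d + (z - x)                                           # scaled multiplier update
        return z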

47 Augmented Lagrangian for variable splitting. Testing ADMM/SALSA on a typical image deblurring problem: $\hat{x}\in\arg\min_{x\in\mathbb{R}^n}\tfrac{1}{2}\|B\Psi x-u\|_2^2+\lambda\|x\|_1$. (Plots: blurred and restored images; objective function $\tfrac{1}{2}\|y-Ax\|_2^2+\lambda\,\mathrm{TV}(x)$ vs. CPU time in seconds for TwIST, FISTA, SpaRSA, and SALSA.)

48 Handling more than two functions. $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_0(x)+f_1(x)+\dots+f_n(x)$, where $f_0$ has an $L$-Lipschitz gradient and $f_1,\dots,f_n$ are convex. Possible uses: multiple regularizers, positivity constraints, ... Generalized forward-backward algorithm [Raguet, Fadili, Peyré, 2011]. Parameters: $\omega_1,\dots,\omega_n\in(0,1)$ s.t. $\sum_j\omega_j=1$. Initialization: $k=0$; $z_0^1,\dots,z_0^n$; $x_0=\sum_{j=1}^n\omega_j z_0^j$. Repeat until convergence: for $i=1,\dots,n$: $z_{k+1}^i=z_k^i+\mathrm{prox}_{\gamma_k f_i/\omega_i}\!\big(2x_k-z_k^i-\gamma_k\nabla f_0(x_k)\big)-x_k$; then $x_{k+1}=\sum_{i=1}^n\omega_i z_{k+1}^i$; $k\leftarrow k+1$.
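A generic NumPy sketch of this generalized forward-backward iteration; grad_f0, prox_list (each prox taking a point and a scale), weights, and x0 are user-supplied placeholders:

    import numpy as np

    def generalized_forward_backward(grad_f0, prox_list, weights, x0, gamma, iters=100):
        """min f0(x) + sum_i f_i(x): one auxiliary variable z_i per non-smooth term f_i."""
        x = x0.copy()
        z = [x0.copy() for _ in prox_list]
        for _ in range(iters):
            g = grad_f0(x)
            for i, (prox_i, w_i) in enumerate(zip(prox_list, weights)):
                # z_i <- z_i + prox_{gamma f_i / w_i}(2x - z_i - gamma * grad f0(x)) - x
                z[i] = z[i] + prox_i(2.0 * x - z[i] - gamma * g, gamma / w_i) - x
            x = sum(w_i * z_i for w_i, z_i in zip(weights, z))   # weighted average of the z_i
        return x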

49 Handling more than two functions. $\hat{x}\in\arg\min_{x\in\mathbb{R}^n} f_1(x)+\dots+f_n(x)$, with $f_1,\dots,f_n$ arbitrary convex functions. ADMM-based method [F and Bioucas-Dias, 2009], [Setzer, Steidl, Teuber, 2009]. Parameter: $\mu$. Initialization: $k=0$; $z_0^1,\dots,z_0^n$; $y_0^1,\dots,y_0^n$. Repeat until convergence: $x_{k+1}=\tfrac{1}{n}\sum_{i=1}^n(y_k^i-z_k^i)$; for $i=1,\dots,n$: $y_{k+1}^i=\mathrm{prox}_{f_i/\mu}(x_{k+1}+z_k^i)$; $z_{k+1}^i=z_k^i+x_{k+1}-y_{k+1}^i$; $k\leftarrow k+1$.

50 Non-Convex Algorithms for Low-Dimensional Models

51 Motivation Discrete descriptions of low-dimensional models Example: reflectivity of Lambertian surfaces [Basri and Jacobs 2001]

52 Motivation. Discrete descriptions of low-dimensional models. Example: graphical model selection, where estimating the edge weights around the i-th vertex of a Gauss-Markov graph <> a sparse linear regression [Meinshausen and Buhlman 2006].

53 Motivation Discrete descriptions of structure in low-dimensional models Typical of wavelet transforms of natural signals and images (piecewise smooth) [Baraniuk, C, Duarte, Hegde 2010]

54 Motivation. Non-convex criteria beyond atomic norms: 1-bit compressive sensing optimization criteria [Boufounos and Baraniuk 2008]; compressible signals in weak $\ell_p$ balls and the corresponding non-convex $\ell_p$ ($p<1$) optimization criteria [Chartrand and Yin 2008].

55 Non-convexity in this tutorial. "Anything not convex" <> too big to cover; convexity is in general a rare condition. Active research topic with great depth; key lesson: convergence of the projected gradient-descent algorithm [Attouch et al. 2010]. This tutorial <> a special subset: a smooth convex data term plus a non-convex regularizer or constraint set. Assumptions: 1. access to the gradient of the convex term; 2. tractable/approximate prox of the non-convex term.

56 Can we project onto non-convex sets? Running examples. Sparse signal: only K out of N coordinates nonzero; model: union of all K-dimensional subspaces aligned with the coordinate axes. Structured sparse signal (or model-sparse): reduced set of subspaces; model: a particular union of subspaces, e.g. clustered or dispersed sparse patterns. (Figure: coefficient magnitudes vs. sorted index.)


58 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets: the support of the solution <> a modular approximation problem over the indexing set $\{1,\dots,N\}$ (select the support in the model that maximizes the modular objective given by the squared coefficient magnitudes, then keep those entries).

62 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets: the underlying optimization problem <> an integer linear program over binary support indicator variables [Kyrillidis and C, 2011].

63 Can we project onto non-convex sets? The class of problems we can tractably solve: PMAP, the polynomial-time modular epsilon-approximation property [Kyrillidis and C, 2011].

64 Can we project onto non-convex sets? PMAP-0: matroid-structured sparse models. Definition of a matroid: non-emptiness, heredity, and exchange properties [Nemhauser and Wolsey, 1999].

65 Can we project onto non-convex sets? PMAP-0: matroid-structured sparse models (non-emptiness, heredity, exchange). Let the collection $\{\{1,2\},\{3,4\}\}$ be given. What is the smallest matroid that contains it? [Nemhauser and Wolsey, 1999]

66 Can we project onto non-convex sets? PMAP-0: matroid-structured sparse models (non-emptiness, heredity, exchange). Let the collection $\{\{1,2\},\{3,4\}\}$ be given. The smallest matroid that contains it is $\{\ \emptyset$ (by the non-emptiness property), $\{1\},\{2\},\{3\},\{4\},\{1,2\},\{3,4\}$ (by the heredity property), $\{1,3\},\{1,4\},\{2,3\},\{2,4\}$ (by the exchange property) $\}$ [Nemhauser and Wolsey, 1999].

67 Can we project onto non-convex sets? PMAP-0: matroid-structured sparse models. A greedy basis algorithm efficiently solves the modular approximation problem over a matroid: sort the N indices in decreasing order by weight; start with the empty set; then, while keeping the order, add each index whose inclusion keeps the selected set inside the matroid.

68 Can we project onto non-convex sets? PMAP-0: matroid-structured sparse models. The greedy basis algorithm efficiently solves matroid-constrained problems. Example: 1. the uniform matroid (all supports of size at most K): hard thresholding! (Figure: sorted coefficient magnitudes; keep the K largest.)

69 Can we project onto non-convex sets? PMAP-0: matroid-structured sparse models. The greedy basis algorithm efficiently solves matroid-constrained problems. Examples [Kyrillidis and C, 2011]: 1. uniform matroid <> simple sparsity, and its intersection with the following matroids (the result is still a matroid!*): 2. partition matroid <> distributed sparsity; 3. graphic matroid <> spanning-tree sparsity; 4. matching matroid <> graph-matching sparsity. *: in general, the intersection of two matroids is not a matroid.

70 Can we project onto non-convex sets? PMAP-0: linear support constraints. Definition: the support indicator variables satisfy linear constraints with integral $A$ and $b$; the first row of $A$ <> all 1's; the first entry of $b$ <> K (the sparsity budget) [Kyrillidis and C, 2011].

71 Can we project onto non-convex sets? PMAP-0: linear support constraints. Example: a neuronal spike model (a refractory period imposes a minimum spacing between nonzero coefficients).

72 Can we project onto non-convex sets? PMAP-0: linear support constraints. We can use an LP relaxation of the linear-support-constrained ILPs: but when is the result binary?

73 Can we project onto non-convex sets? PMAP-0: linear support constraints. The LP relaxation exactly solves the linear-support-constrained ILPs when $A$ is totally unimodular (TU)*, i.e., the determinant of each square submatrix is in $\{-1,0,1\}$ [Nemhauser and Wolsey, 1999]. Examples: interval matrices, perfect matrices, network matrices. *: if we want the LP relaxation to work for all $b$, TU is a necessary condition.

74 Can we project onto non-convex sets? PMAP-0: linear support constraints. Example: the neuronal spike model; its constraint matrix is TU [Hegde, Duarte, and C, 2009].

75 Can we project onto non-convex sets? PMAP-0: prox-sparse models. Definition: define the model algorithmically! [Kyrillidis and C, 2011]

76 Can we project onto non-convex sets? PMAP-0: prox-sparse models. Definition: define the model algorithmically! Example: clustered sparsity models; tree-sparse <> dynamic program; clustered sparse <> dynamic program [Baraniuk, C, Wakin 2010; Baraniuk, C, Duarte, Hegde 2010].

77 Can we project onto non-convex sets? Pop quiz: a prox with convex and non-convex terms. Let us consider the prox of $\lambda\|\cdot\|_1$ combined with a K-sparsity constraint. Is it PMAP-0?

78 Can we project onto non-convex sets? Pop answer: a prox with convex and non-convex terms. For the prox of $\lambda\|\cdot\|_1$ combined with a K-sparsity constraint: hard thresholding followed by soft thresholding! YES: certified PMAP-0.

79 Can we project onto non-convex sets? PMAP-epsilon: knapsack, multi-knapsack, weighted multi-knapsack, and quadratically-constrained support constraints over selector variables. Example: nested group-sparse problems. Define algorithmically! Approximate solutions for computational reasons [Kyrillidis and C, 2011].

80 Can we project onto non-convex sets? PMAP-epsilon (continued): pairwise overlapping groups <> a quadratic binary program with a cardinality constraint; we can only approximate it, and epsilon is large!

81 Can we project onto non-convex sets? PMAP-epsilon (continued): pairwise overlapping groups <> quadratic binary with a cardinality constraint; multi-knapsack + multi-matroids [Lee et al., 2009]; we can only approximate, and epsilon is large!

82 Can we project onto non-convex sets? Matrix examples! Rank-constrained projections: $P_{\{\mathrm{rank}\le r\}}(X)=\arg\min_{\mathrm{rank}(Z)\le r}\|X-Z\|_F^2$.

83 Can we project onto non-convex sets? Matrix examples! Rank-constrained projections: start from the singular value decomposition $X=U\Sigma V^T$.

84 Can we project onto non-convex sets? Matrix examples! Rank-constrained projections: by invariance of the Frobenius norm to unitary transforms, the problem reduces to one on the singular values.

85 Can we project onto non-convex sets? Matrix examples! Rank-constrained projections: a sparse approximation problem on the singular values!

86 Can we project onto non-convex sets? Matrix examples! Rank-constrained projections: the solution is singular value (hard) thresholding, i.e., keep the r largest singular values.

87 Can we project onto non-convex sets? Matrix examples! Rank-constrained projections: singular value (hard) thresholding. Non-convex spectral projections <> sets described by their eigenvalue properties; exact projections >> basic operations on the eigenvalues [Lewis and Malick 2008].

88 Can we project onto non-convex sets? Matrix examples! Rank-constrained projections. Non-convex spectral projections <> sets described by their eigenvalue properties; epsilon-approximate projections (note the difference with PMAP). Two highlights: structure from randomness / power methods [Halko, Martinsson, Tropp, 2010]; column subset selection approaches [Boutsidis, Mahoney, Drineas, 2010].

89 Recovery algorithms for low-dimensional models. Now that we have projections. Encoding: non-convex (combinatorial / manifolds), convex (atomic norm / convex relaxation), probabilistic (compressible / sparse priors). A common criterion covering a broad set of applications: affine rank minimization, matrix completion, robust PCA [Candes and Recht 2009; Waters, Sankaranarayanan, Baraniuk, 2011]. A common algorithm: projected gradient.

90 Recovery algorithms for low-dimensional models. To highlight the salient differences, we will consider compressive sensing recovery. Encoding: non-convex (combinatorial / manifolds), convex (atomic norm / convex relaxation), probabilistic (compressible / sparse priors). A common algorithm: projected gradient.

91 A tale of two algorithms Soft thresholding

92 A tale of two algorithms Soft thresholding Structure in optimization:

93 A tale of two algorithms Soft thresholding majorization-minimization Key actor: least absolute shrinkage

94 A tale of two algorithms Soft thresholding slower

95 A tale of two algorithms. Soft thresholding: is $x^*$ what we are looking for? Local unverifiable assumptions: ERC/URC/RSC conditions; coherence-based conditions (local <> global / dual certification) [Buhlmann and van de Geer 2011].

96 A tale of two algorithms Hard thresholding Key actor: hard thresholding ALGO: sort and pick the largest K
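A minimal NumPy sketch of the hard-thresholding operator and the resulting iterative hard thresholding (IHT) recursion; the step size and iteration count are illustrative choices:

    import numpy as np

    def hard_threshold(v, K):
        """Keep the K largest-magnitude entries of v, zero out the rest."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -K)[-K:]   # indices of the K largest |v_i|
        out[idx] = v[idx]
        return out

    def iht(A, u, K, iters=300):
        """Iterative hard thresholding for min ||Ax - u||^2 s.t. ||x||_0 <= K."""
        gamma = 1.0 / np.linalg.norm(A, 2) ** 2
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            x = hard_threshold(x - gamma * A.T @ (A @ x - u), K)
        return x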

97 A tale of two algorithms. Hard thresholding: percolations. What could possibly go wrong with this naïve approach? [C, 2011]

98 A tale of two algorithms. Hard thresholding. Global unverifiable assumption: the RIP condition; with it, we can tiptoe among percolations! Another variant has ...

99 Restricted Isometry Property. Model: K-sparse coefficients. RIP: a stable embedding of the union of K-planes, $(1-\delta_K)\|x\|_2^2\le\|\Phi x\|_2^2\le(1+\delta_K)\|x\|_2^2$ for all K-sparse $x$; e.g., IID sub-Gaussian matrices satisfy it. Remark: the RIP implies convergence of convex relaxations also [Candes 2008; Baraniuk et al. 2008].


101 Restricted Isometry Property for matrices! Model: rank-r matrices. Remark: a bi-Lipschitz embedding of low-rank matrices. RIP: a stable embedding of the set of rank-r matrices; e.g., IID sub-Gaussian measurement operators satisfy it [Plan 2011].

102 Projected gradient method for non-convex sets. Model-based hard thresholding. Global unverifiable assumption: model-RIP [Baraniuk, C, Duarte, Hegde 2010]. Key actor: the non-convex projector.

103 Example: tree-sparse recovery. Model: K-sparse coefficients + significant coefficients lie on a rooted subtree. Sparse approximation: find the best set of K coefficients — sorting and hard thresholding. Tree-sparse approximation: find the best rooted subtree of coefficients — condensing sort and select [Baraniuk], dynamic programming [Donoho] [Baraniuk, C, Duarte, Hegde 2010].

104 Example: tree-sparse recovery. Model: K-sparse coefficients. RIP: a stable embedding of the union of all K-planes; IID sub-Gaussian matrices with $M=O(K\log(N/K))$ rows suffice.

105 Example: tree-sparse recovery. Model: K-sparse coefficients + significant coefficients lie on a rooted subtree. Tree-RIP: a stable embedding of the much smaller union of tree-sparse K-planes; IID sub-Gaussian matrices with $M=O(K)$ rows suffice.

106 Example: tree-sparse recovery. Number of samples for correct recovery of piecewise-cubic signals in wavelets. Models/algorithms: compressible (CoSaMP) vs. tree-compressible (tree-CoSaMP) [Baraniuk, C, Duarte, Hegde 2010]. (Plot: recovery performance vs. number of measurements.)

107 Recovery algorithms for low-dimensional models: the CLASH operator. Encoding and example algorithms: non-convex (combinatorial / manifolds) — IHT, CoSaMP, SP, ALPS, OMP; convex (atomic norm / convex relaxation) — basis pursuit, LASSO, basis pursuit denoising; probabilistic (compressible / sparse priors) — variational Bayes, EP, approximate message passing (AMP) [Kyrillidis and C, 2011].

108 Recovery algorithms for low-dimensional models

109 Recovery algorithms for low-dimensional models A subtle issue 2 solutions! Which one is correct?

110 Recovery algorithms for low-dimensional models A subtle issue 2 solutions! Greed is good. Joel Tropp 2004

111 Recovery algorithms for low-dimensional models A subtle issue 2 solutions! Which one is correct?

112 The CLASH algorithm combinatorial selection + least absolute shrinkage

113 The CLASH algorithm combinatorial selection + least absolute shrinkage combinatorial origami

114 Recovery algorithms for low-dimensional models: the CLASH operator. Encoding and example algorithms: non-convex (combinatorial / manifolds) — IHT, CoSaMP, SP, ALPS, OMP; convex (atomic norm / convex relaxation) — basis pursuit, LASSO, basis pursuit denoising; probabilistic (compressible / sparse priors) — variational Bayes, EP, approximate message passing (AMP). The idea is much more general [Kyrillidis, Puy, and C, 2012].

115 Recovery algorithms for low-dimensional models. Using projected gradient with exact non-convex projections, with RIP/ERC/URC/RSC. Exact low-dimensional model: noise-free measurements give exact recovery; noisy measurements give stable recovery. Approximately low-dimensional model: recovery as good as the best K-model-sparse approximation, i.e., recovery error $\lesssim$ the signal's K-term model approximation error + the noise level [Baraniuk, C, Duarte, Hegde 2010].

116 Recovery algorithms for low-dimensional models. Using projected gradient with exact non-convex projections, with RIP/ERC/URC/RSC. Exact low-dimensional model: noise-free measurements give exact recovery; noisy measurements give stable recovery. Approximately low-dimensional model: recovery error $\lesssim$ the signal's K-term model approximation error + the noise level. The bound remains qualitatively the same for other models!!!

117 Recovery algorithms for low-dimensional models. Projected gradient with (non)exact non-convex projections, without RIP/ERC/URC/RSC: not much can be guaranteed! Convergence to a stationary point under the Kurdyka-Lojasiewicz property [Attouch et al., 2010].

118 Examples: CLASH with structured sparsity and norm constraints [Kyrillidis, Puy, and C, 2012].

119 Examples Coded Aperture Snapshot Spectral Imager

120 Examples: total variation; TwIST results.

121 Examples TV-CLASH TV + sparsity in wavelets [Kyrillidis, Puy, and C, 2012]

122 Acceleration of non-convex algorithms. Several approaches: step-size selection; memory-based methods similar to Nesterov acceleration / double over-relaxation; non-convex splitting; (adaptive) block coordinate descent; epsilon-approximate projections [Kyrillidis and C, 2011].

123 Acceleration of non-convex algorithms. Several approaches: step-size selection; memory-based methods similar to Nesterov acceleration / double over-relaxation; non-convex splitting; (adaptive) block coordinate descent; epsilon-approximate projections. (Example timings: 34.8 s vs. 15.8 s.) [Zhou and Tao 2011; Kyrillidis and C, 2012]

124 Final remarks. Non-convex algorithms <> a low-dimensional scaffold: possible performance gains; non-convexifiable priors; matching the prox operator with optimal space/time bounds; complexity of structured approximation. Non-convex vs. convex algorithms: no clear winner, scenario dependent; decades of research in both.

125 References
M. Afonso, J. Bioucas-Dias, M. Figueiredo, Fast image recovery using variable splitting and constrained optimization, IEEE Transactions on Image Processing, vol. 19.
H. Attouch, J. Bolte, P. Redont, A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality, Math. Oper. Research, 2010.
O. Axelsson, Iterative Solution Methods, Cambridge University Press.
R. Baraniuk, V. Cevher, M. Duarte, C. Hegde, Model-based compressive sensing, IEEE Transactions on Information Theory, vol. 56.
R. Baraniuk, M. Davenport, R. de Vore, M. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation.
R. Baraniuk, V. Cevher, M. Wakin, Low-dimensional models for dimensionality reduction and signal recovery: a geometric perspective, Proceedings of the IEEE, vol. 98.
J. Barzilai and J. Borwein, Two point step size gradient methods, IMA Journal of Numer. Anal., vol. 8.
R. Basri, D. Jacobs, Lambertian reflectance and linear subspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25.
A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Science, vol. 2.
J. Bioucas-Dias, M. Figueiredo, A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration, IEEE Transactions on Image Processing, vol. 16.
P. Boufounos, R. Baraniuk, 1-bit compressed sensing, Proceedings of the Conference on Information Science and Systems, Princeton.
C. Boutsidis, M. Mahoney, P. Drineas, An improved approximation algorithm for the column subset selection problem, Proc. 20th Annual ACM/SIAM Symposium on Discrete Algorithms, New York, NY, 2008.

126 References
P. Bühlmann, S. van de Geer, Statistics for High-Dimensional Data, Springer.
E. Candès, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, vol. 346.
E. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, vol. 9.
L. Carin, R. Baraniuk, V. Cevher, D. Dunson, M. Jordan, G. Sapiro, M. Wakin, Learning low-dimensional signal models, IEEE Signal Processing Magazine, vol. 28.
V. Cevher, An ALPS view of sparse recovery, Proc. ICASSP.
V. Chandrasekaran, B. Recht, P. Parrilo, A. Willsky, The convex geometry of linear inverse problems, submitted.
R. Chartrand, W. Yin, Iteratively reweighted algorithms for compressive sensing, Proc. ICASSP, 2008.
S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Review, vol. 43.
P. Combettes, V. Wajs, Signal recovery by proximal forward-backward splitting, SIAM Journal on Multiscale Modeling and Simulation, vol. 4.
J. Eckstein, D. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, vol. 5.
M. Figueiredo and J. Bioucas-Dias, Restoration of Poissonian images using alternating direction optimization, IEEE Transactions on Image Processing, vol. 19.
M. Figueiredo, R. Nowak, S. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE Journal of Selected Topics in Signal Processing, vol. 1, 2007.

127 References
D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Computers and Mathematics with Applications, vol. 2.
R. Glowinski, A. Marroco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires, Revue Française d'Automatique.
Y. Gordon, On Milman's inequality and random subspaces which escape through a mesh in R^n, in Geometric Aspects of Functional Analysis, Springer.
E. Hale, W. Yin, Y. Zhang, Fixed-point continuation for l1-minimization: methodology and convergence, SIAM Journal on Optimization, vol. 19.
N. Halko, P.-G. Martinsson, J. Tropp, Finding structure with randomness: stochastic algorithms for constructing approximate matrix decompositions, SIAM Review, vol. 53.
C. Hegde, M. Duarte, V. Cevher, Compressive sensing recovery of spike trains using a structured sparsity model, Proceedings of SPARS'09, Saint-Malo, France.
M. Hestenes, Multiplier and gradient methods, Journal of Optimization Theory and Applications, vol. 4.
A. Kyrillidis, V. Cevher, Recipes for hard thresholding methods, Tech. Rep., EPFL.
J. Lee, V. Mirrokni, V. Nagarajan, M. Sviridenko, Non-monotone submodular maximization under matroid and knapsack constraints, Proc. 41st Annual ACM Symposium on Theory of Computing, Bethesda, MD.
A. Lewis, J. Malick, Alternating projections on manifolds, Math. of Operations Research, vol. 33.
D. Lorenz, Constructing test instances for basis pursuit denoising, submitted.
N. Meinshausen, P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, vol. 34.
J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Mathématiques de France, vol. 93, 1965.

128 References
G. Nemhauser, L. Wolsey, Integer and Combinatorial Optimization, Wiley.
S. Osher, M. Burger, D. Goldfarb, J. Xu, W. Yin, An iterative regularization method for total variation-based image restoration, SIAM Journal on Multiscale Modeling and Simulation, vol. 4.
Y. Plan, Compressed sensing, sparse approximation, low-rank matrix estimation, PhD Thesis, Caltech, 2011.
M. Powell, A method for nonlinear constraints in minimization problems, in Optimization, Academic Press.
H. Raguet, J. Fadili, G. Peyré, Generalized forward-backward splitting, Tech. report, Hal.
S. Setzer, G. Steidl, T. Teuber, Deblurring Poissonian images by split Bregman techniques, Journal of Visual Communication and Image Representation.
K.-C. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems, Pacific Journal of Optimization, vol. 6.
A. Waters, A. Sankaranarayanan, R. Baraniuk, SpaRCS: recovering low-rank and sparse matrices from compressive measurements, Neural Information Processing Systems.
S. Wright, R. Nowak, M. Figueiredo, Sparse reconstruction by separable approximation, IEEE Transactions on Signal Processing, vol. 57.
W. Yin, S. Osher, D. Goldfarb, J. Darbon, Bregman iterative algorithms for l1-minimization with applications to compressed sensing, SIAM Journal on Imaging Science, vol. 1.
T. Zhou, D. Tao, GoDec: randomized low-rank & sparse matrix decomposition in noisy case, Proc. International Conference on Machine Learning, Bellevue, WA, 2011.


More information

ParNes: A rapidly convergent algorithm for accurate recovery of sparse and approximately sparse signals

ParNes: A rapidly convergent algorithm for accurate recovery of sparse and approximately sparse signals Preprint manuscript No. (will be inserted by the editor) ParNes: A rapidly convergent algorithm for accurate recovery of sparse and approximately sparse signals Ming Gu ek-heng im Cinna Julie Wu Received:

More information

Sparse and Regularized Optimization

Sparse and Regularized Optimization Sparse and Regularized Optimization In many applications, we seek not an exact minimizer of the underlying objective, but rather an approximate minimizer that satisfies certain desirable properties: sparsity

More information

arxiv: v2 [cs.cv] 27 Sep 2013

arxiv: v2 [cs.cv] 27 Sep 2013 Solving OSCAR regularization problems by proximal splitting algorithms Xiangrong Zeng a,, Mário A. T. Figueiredo a, a Instituto de Telecomunicações, Instituto Superior Técnico, 1049-001, Lisboa, Portugal.

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

The Analysis Cosparse Model for Signals and Images

The Analysis Cosparse Model for Signals and Images The Analysis Cosparse Model for Signals and Images Raja Giryes Computer Science Department, Technion. The research leading to these results has received funding from the European Research Council under

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Denoising of NIRS Measured Biomedical Signals

Denoising of NIRS Measured Biomedical Signals Denoising of NIRS Measured Biomedical Signals Y V Rami reddy 1, Dr.D.VishnuVardhan 2 1 M. Tech, Dept of ECE, JNTUA College of Engineering, Pulivendula, A.P, India 2 Assistant Professor, Dept of ECE, JNTUA

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Linear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1

Linear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1 Linear Regression with Strongly Correlated Designs Using Ordered Weigthed l 1 ( OWL ) Regularization Mário A. T. Figueiredo Instituto de Telecomunicações and Instituto Superior Técnico, Universidade de

More information

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS Martin Kleinsteuber and Simon Hawe Department of Electrical Engineering and Information Technology, Technische Universität München, München, Arcistraße

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

A tutorial on sparse modeling. Outline:

A tutorial on sparse modeling. Outline: A tutorial on sparse modeling. Outline: 1. Why? 2. What? 3. How. 4. no really, why? Sparse modeling is a component in many state of the art signal processing and machine learning tasks. image processing

More information

Learning MMSE Optimal Thresholds for FISTA

Learning MMSE Optimal Thresholds for FISTA MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Learning MMSE Optimal Thresholds for FISTA Kamilov, U.; Mansour, H. TR2016-111 August 2016 Abstract Fast iterative shrinkage/thresholding algorithm

More information

Sparse Optimization: Algorithms and Applications. Formulating Sparse Optimization. Motivation. Stephen Wright. Caltech, 21 April 2007

Sparse Optimization: Algorithms and Applications. Formulating Sparse Optimization. Motivation. Stephen Wright. Caltech, 21 April 2007 Sparse Optimization: Algorithms and Applications Stephen Wright 1 Motivation and Introduction 2 Compressed Sensing Algorithms University of Wisconsin-Madison Caltech, 21 April 2007 3 Image Processing +Mario

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via convex relaxations Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Learning with stochastic proximal gradient

Learning with stochastic proximal gradient Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and

More information

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.

More information

Compressed Sensing and Neural Networks

Compressed Sensing and Neural Networks and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications

More information

The convex algebraic geometry of rank minimization

The convex algebraic geometry of rank minimization The convex algebraic geometry of rank minimization Pablo A. Parrilo Laboratory for Information and Decision Systems Massachusetts Institute of Technology International Symposium on Mathematical Programming

More information

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng ABSTRACT Recovering Data with Group Sparsity by Alternating Direction Methods by Wei Deng Group sparsity reveals underlying sparsity patterns and contains rich structural information in data. Hence, exploiting

More information

ADMM in Imaging Inverse Problems: Non-Periodic and Blind Deconvolution

ADMM in Imaging Inverse Problems: Non-Periodic and Blind Deconvolution ADMM in Imaging Inverse Problems: Non-Periodic and Blind Deconvolution Mário A. T. Figueiredo Instituto Superior Técnico, University of Lisbon, Portugal Instituto de Telecomunicações Lisbon, Portugal Joint

More information

A Dykstra-like algorithm for two monotone operators

A Dykstra-like algorithm for two monotone operators A Dykstra-like algorithm for two monotone operators Heinz H. Bauschke and Patrick L. Combettes Abstract Dykstra s algorithm employs the projectors onto two closed convex sets in a Hilbert space to construct

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Topographic Dictionary Learning with Structured Sparsity

Topographic Dictionary Learning with Structured Sparsity Topographic Dictionary Learning with Structured Sparsity Julien Mairal 1 Rodolphe Jenatton 2 Guillaume Obozinski 2 Francis Bach 2 1 UC Berkeley 2 INRIA - SIERRA Project-Team San Diego, Wavelets and Sparsity

More information

arxiv: v1 [math.oc] 1 Jul 2016

arxiv: v1 [math.oc] 1 Jul 2016 Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the

More information

Multipath Matching Pursuit

Multipath Matching Pursuit Multipath Matching Pursuit Submitted to IEEE trans. on Information theory Authors: S. Kwon, J. Wang, and B. Shim Presenter: Hwanchol Jang Multipath is investigated rather than a single path for a greedy

More information

Frist order optimization methods for sparse inverse covariance selection

Frist order optimization methods for sparse inverse covariance selection Frist order optimization methods for sparse inverse covariance selection Katya Scheinberg Lehigh University ISE Department (joint work with D. Goldfarb, Sh. Ma, I. Rish) Introduction l l l l l l The field

More information

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016

Random projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016 Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

Structured matrix factorizations. Example: Eigenfaces

Structured matrix factorizations. Example: Eigenfaces Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix

More information

Alternating Direction Optimization for Imaging Inverse Problems

Alternating Direction Optimization for Imaging Inverse Problems Alternating Direction Optimization for Imaging Inverse Problems Mário A. T. Figueiredo Instituto Superior Técnico, and Instituto de Telecomunicações Technical University of Lisbon PORTUGAL Joint work with:

More information

Preconditioning via Diagonal Scaling

Preconditioning via Diagonal Scaling Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.

More information

The Pros and Cons of Compressive Sensing

The Pros and Cons of Compressive Sensing The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal

More information

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

Greedy Sparsity-Constrained Optimization

Greedy Sparsity-Constrained Optimization Greedy Sparsity-Constrained Optimization Sohail Bahmani, Petros Boufounos, and Bhiksha Raj 3 sbahmani@andrew.cmu.edu petrosb@merl.com 3 bhiksha@cs.cmu.edu Department of Electrical and Computer Engineering,

More information

WHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????

WHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ???? DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0

More information

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis

More information

Constrained optimization

Constrained optimization Constrained optimization DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Compressed sensing Convex constrained

More information