Winter Conference in Statistics Compressed Sensing. LECTURE #7-8 Algorithms for low-dimensional models
1 Winter Conference in Statistics 2013 Compressed Sensing LECTURE #7-8 Algorithms for low-dimensional models Prof. Dr. Volkan Cevher LIONS/Laboratory for Information and Inference Systems
2 Convex Algorithms for Low-Dimensional Models And, this is how you solve huge-dimensional problems
3 The classical problem templates. Criteria seen above have the form $\|x\|_1 = \sum_{i=1}^{N} |[x]_i|$. Basis pursuit (BP) [Chen, Donoho, Saunders, 1998]: $\min_x \|x\|_1$ s.t. $Ax = u$. BP denoising (BPDN) [Chen, Donoho, Saunders, 1998]: $\min_x \|x\|_1$ s.t. $\|Ax - u\|_2 \le \varepsilon$. Also well known: the LASSO (least absolute shrinkage and selection operator) [Tibshirani, 1996]: $\min_x \|Ax - u\|_2^2$ s.t. $\|x\|_1 \le \tau$. All can be written as $\hat{x} \in \arg\min_{x \in \mathbb{R}^N} f_1(x) + f_2(x)$.
4 Convex optimization and proximal algorithms. $\hat{x} \in \arg\min_{x \in \mathbb{R}^N} f_1(x) + f_2(x)$, where $f_1 : \mathbb{R}^N \to \mathbb{R}$ is the data-fidelity term (convex, smooth; typically $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$) and $f_2 : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$ is a convex regularizer (maybe non-smooth, e.g. $\ell_1$; the non-convex case comes later). Difficulties: non-smoothness and large dimension ($N \gg 1$).
5 Constrained vs unconstrained formulations. Constrained optimization formulations can be written as $\hat{x} \in \arg\min_{x \in \mathbb{R}^N} f_1(x) + f_2(x)$ using indicator functions: $\iota_S(x) = 0$ if $x \in S$, $+\infty$ if $x \notin S$. Example: $\hat{x} \in \arg\min_{x \in \mathbb{R}^N} f_1(x) + \iota_{\{x : g(x) \le \nu\}}(x)$ is the same as minimizing $f_1(x)$ subject to $g(x) \le \nu$. Classical example: the LASSO.
6 Convex and strictly convex sets. $S$ is convex if $x, x' \in S \Rightarrow \theta x + (1-\theta)x' \in S$ for all $\theta \in [0,1]$. $S$ is strictly convex if $x, x' \in S \Rightarrow \theta x + (1-\theta)x' \in \mathrm{int}(S)$ for all $\theta \in (0,1)$. (Illustrations: convex vs. non-convex sets; a convex-but-not-strictly-convex set vs. a strictly convex set.)
7 Convex and strictly convex functions. Extended real valued function: $f : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$. Domain of a function: $\mathrm{dom}(f) = \{x : f(x) \ne +\infty\}$. $f$ is a convex function if $f(\theta x + (1-\theta)x') \le \theta f(x) + (1-\theta) f(x')$ for all $\theta \in [0,1]$ and $x, x' \in \mathrm{dom}(f)$; $f$ is strictly convex if the inequality is strict for $\theta \in (0,1)$ and $x \ne x'$. (Illustrations: non-convex, convex, strictly convex, and convex-but-not-strictly-convex functions.)
8 Convexity, coercivity, and minima. $f : \mathbb{R}^N \to \bar{\mathbb{R}} = \mathbb{R} \cup \{+\infty\}$ is coercive if $\lim_{\|x\| \to +\infty} f(x) = +\infty$. Let $G = \arg\min_x f(x)$. If $f$ is coercive, then $G$ is a non-empty set; if $f$ is strictly convex, then $G$ has at most one element. (Illustrations: coercive and strictly convex, $G = \{x^*\}$; coercive, not strictly convex; convex, not coercive, $G = \emptyset$.)
9 Euclidean projections on convex sets. Our problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x)$; consider $f_2(x) = \iota_S(x)$ (convex if $S$ is convex) and $f_1(x) = \frac{1}{2}\|u - x\|_2^2$ (strictly convex). Then $\hat{x} = \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x) = \arg\min_{x \in S} \|u - x\|_2^2 \equiv P_S(u)$, the Euclidean projection of $u$ onto $S$.
10 Projected gradient algorithm. Our problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x)$ with $f_2(x) = \iota_S(x)$ ($S$ a convex set) and $f_1$ some function, e.g. $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$. Projected gradient algorithm: $x_{k+1} = P_S\big(x_k - \gamma_k \nabla f_1(x_k)\big)$, where $\gamma_k$ is the step size; if $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$, then $x_{k+1} = P_S\big(x_k - \gamma_k A^T(Ax_k - u)\big)$.
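A minimal numerical sketch (not from the slides) of this projected-gradient iteration in Python, for $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$ with $S$ taken to be the nonnegative orthant; the random problem data and the step-size choice are illustrative assumptions.

    import numpy as np

    def projected_gradient(A, u, project, num_iters=200, gamma=None):
        """Minimize 0.5*||A x - u||_2^2 over a set S given by its projector `project`."""
        if gamma is None:
            gamma = 0.99 / np.linalg.norm(A, 2) ** 2   # step size below 1/L, L = ||A||_2^2
        x = np.zeros(A.shape[1])
        for _ in range(num_iters):
            grad = A.T @ (A @ x - u)                   # gradient of the smooth term
            x = project(x - gamma * grad)              # gradient step, then projection P_S
        return x

    # Example: S = nonnegative orthant, whose projection is a componentwise max with 0.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 20))
    u = A @ np.abs(rng.standard_normal(20))
    x_hat = projected_gradient(A, u, project=lambda z: np.maximum(z, 0.0))
    print(x_hat.min() >= 0.0)   # True: every iterate stays in S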
11 Detour: majorization-minimization (MM). Problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f(x)$. $Q(x; x_k)$ is a majorizer of $f$ at $x_k$ if $Q(x; x_k) \ge f(x)$ for all $x$ and $Q(x_k; x_k) = f(x_k)$. MM algorithm: $x_{k+1} = \arg\min_x Q(x; x_k)$. Monotonicity: $f(x_{k+1}) \le Q(x_{k+1}; x_k) \le Q(x_k; x_k) = f(x_k)$.
12 Projected gradient from majorization-minimization. Our problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x)$ with $f_2(x) = \iota_S(x)$ ($S$ a convex set) and $f_1$ having an $L$-Lipschitz gradient; e.g. $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2 \Rightarrow L = \lambda_{\max}(A^T A) = \|A\|_2^2$ (the Hessian of $f_1$ is $A^T A$). A separable approximation of $f_1$: $Q(x; x_k) = f_1(x_k) + (x - x_k)^T \nabla f_1(x_k) + \frac{1}{2\gamma_k}\|x - x_k\|_2^2$.
13 Projected gradient from majorization-minimization. Our problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + \iota_S(x)$. Separable approximation of $f_1$: $Q(x; x_k) = f_1(x_k) + (x - x_k)^T \nabla f_1(x_k) + \frac{1}{2\gamma_k}\|x - x_k\|_2^2$. $Q(x; x_k)$ is a majorizer of $f_1$ if $\gamma_k < 1/L$, so $Q(x; x_k) + \iota_S(x)$ is a majorizer of $f_1(x) + \iota_S(x)$. MM algorithm: $x_{k+1} = \arg\min_x Q(x; x_k) + \iota_S(x) = \arg\min_x \frac{1}{2}\|x - (x_k - \gamma_k \nabla f_1(x_k))\|_2^2 + \iota_S(x) = P_S\big(x_k - \gamma_k \nabla f_1(x_k)\big)$: projected gradient.
14 Proximity operators. Our problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x)$ with $f_2$ a convex function and $f_1(x) = \frac{1}{2}\|u - x\|_2^2$ (strictly convex). Then $\hat{x} = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2}\|u - x\|_2^2 + f_2(x) \equiv \mathrm{prox}_{f_2}(u)$: the proximity operator [Moreau 62], [Combettes 01]. It generalizes the notion of Euclidean projection.
15 Proximity operators (linear). $\mathrm{prox}_f(u) = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2}\|u - x\|_2^2 + f(x)$ (a map $\mathbb{R}^N \to \mathbb{R}^N$). Classical cases: squared $\ell_2$ regularizer $f(x) = \frac{\lambda}{2}\|x\|_2^2$: $\mathrm{prox}_f(u) = \arg\min_x \frac{1}{2}\|u - x\|_2^2 + \frac{\lambda}{2}\|x\|_2^2 = \frac{u}{1 + \lambda}$; squared $\ell_2$ regularizer with analysis operator $D$, $f(x) = \frac{\lambda}{2}\|Dx\|_2^2$: $\mathrm{prox}_f(u) = \arg\min_x \frac{1}{2}\|u - x\|_2^2 + \frac{\lambda}{2}\|Dx\|_2^2 = (I + \lambda D^T D)^{-1}u$; if $D$ is a circulant matrix, $O(N \log N)$ cost using the FFT.
16 Proximity operator of the $\ell_1$ norm. $\mathrm{prox}_{\lambda\|\cdot\|_1}(u) = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2}\|u - x\|_2^2 + \lambda\|x\|_1$. Separable: solve w.r.t. each component, $\min_x \lambda|x| + 0.5(x - u)^2$. Possible approach: write $|x| = \max_{|z| \le 1} zx$, so $\min_x \max_{|z| \le 1} \lambda z x + 0.5(x - u)^2 = \max_{|z| \le 1} \min_x \lambda z x + 0.5(x - u)^2$; the inner minimizer is $x = u - \lambda z$, leaving $\max_{|z| \le 1} -0.5\lambda^2 z^2 + \lambda z u$, whose solution is $z = u/\lambda$ if $|u| \le \lambda$, $z = 1$ if $u > \lambda$, and $z = -1$ if $u < -\lambda$.
17 Proximity operator of the $\ell_1$ norm: soft thresholding. $\mathrm{soft}(u, \lambda) = \mathrm{sign}(u)\max\{0, |u| - \lambda\}$, and $\mathrm{soft}(\cdot, \lambda) = \mathrm{prox}_{\lambda|\cdot|}$ (for vectors, $\mathrm{soft}(u, \lambda)$ is applied component-wise). The $p$-th power of the $\ell_p$ norm, $\|x\|_p^p = \sum_i |[x]_i|^p$, has a closed-form prox for $p \in \{1, 2, 4/3, 3/2, 3, 4\}$ [Combettes, Wajs, 2005].
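A direct Python implementation of the soft-thresholding operator defined above; it operates componentwise on NumPy arrays, as the slide notes.

    import numpy as np

    def soft(u, lam):
        """Soft thresholding, sign(u) * max(0, |u| - lam): the prox of lam*|.|, componentwise."""
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    print(soft(np.array([-3.0, -0.5, 0.2, 2.0]), 1.0))   # -> [-2. -0.  0.  1.]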
18 Dual norms, proximity operators, and projections. Dual norm: for some norm $\|\cdot\| : \mathbb{R}^N \to \mathbb{R}_+$, its dual norm is $\|x\|_* = \max_{\|z\| \le 1} \langle x, z \rangle$. The dual norm of $\|\cdot\|_p$ is $\|\cdot\|_q$, where $\frac{1}{p} + \frac{1}{q} = 1$ (Hölder conjugates); this is a simple corollary of Hölder's inequality. Examples of Hölder conjugates: $(2, 2), (1, +\infty), (3/2, 3), \ldots$ These concepts are related through $\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x : \|x\|_* \le \lambda\}}(u)$ [Combettes, Wajs, 2005].
19 Dual norms, proximity operators, and projections. $\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x : \|x\|_* \le \lambda\}}(u)$. This relation underlies our earlier derivation of $\mathrm{prox}_{\lambda\|\cdot\|_1}$: $\mathrm{prox}_{\lambda\|\cdot\|_1}(u) = u - P_{\{x : \|x\|_\infty \le \lambda\}}(u)$, where $\|x\|_\infty = \max_i |[x]_i|$. It is all separable: $\mathrm{prox}_{\lambda|\cdot|}(u) = u - P_{\{x : |x| \le \lambda\}}(u) = \mathrm{soft}(u, \lambda)$.
20 Dual norms, proximity operators, and projections. $\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x : \|x\|_* \le \lambda\}}(u)$. This relation allows deriving $\mathrm{prox}_{\lambda\|\cdot\|_\infty}$ and $\mathrm{prox}_{\lambda\|\cdot\|_2}$: $\mathrm{prox}_{\lambda\|\cdot\|_\infty}(u) = u - P_{\{x : \|x\|_1 \le \lambda\}}(u)$ (projection on the $\ell_1$ ball of radius $\lambda$, $O(n \log n)$ cost); $\mathrm{prox}_{\lambda\|\cdot\|_2}(u) = u - P_{\{x : \|x\|_2 \le \lambda\}}(u) = \frac{u}{\|u\|_2}\max\{0, \|u\|_2 - \lambda\}$: vector soft thresholding.
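A short Python sketch of the vector soft-thresholding formula above (the prox of $\lambda\|\cdot\|_2$); the example values are illustrative.

    import numpy as np

    def prox_l2_norm(u, lam):
        """Prox of lam*||.||_2 (vector soft thresholding): shrink the whole vector toward 0."""
        norm_u = np.linalg.norm(u)
        if norm_u <= lam:
            return np.zeros_like(u)
        return (1.0 - lam / norm_u) * u   # equals (u / ||u||_2) * max(0, ||u||_2 - lam)

    print(prox_l2_norm(np.array([3.0, 4.0]), lam=1.0))   # norm 5 shrinks to norm 4 -> [2.4 3.2]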
21 Proximity operators of atomic norms. $\mathrm{prox}_{\lambda\|\cdot\|}(u) = u - P_{\{x : \|x\|_* \le \lambda\}}(u)$. This relation allows deriving prox operators of atomic norms, $\|x\|_\mathcal{A} = \inf\{t > 0 : x \in t\,\mathrm{conv}(\mathcal{A})\}$. The dual of an atomic norm: $\|x\|_\mathcal{A}^* = \max_{\|z\|_\mathcal{A} \le 1} \langle z, x \rangle = \max_{z \in \mathrm{conv}(\mathcal{A})} \langle z, x \rangle = \max\{\langle a, x \rangle : a \in \mathcal{A}\}$. Hence $P_{\{x : \|x\|_\mathcal{A}^* \le \lambda\}}(u) = \arg\min_{\langle a, x \rangle \le \lambda,\ \forall a \in \mathcal{A}} \|u - x\|_2^2$ and $\mathrm{prox}_{\lambda\|\cdot\|_\mathcal{A}}(u) = u - \arg\min_{\langle a, x \rangle \le \lambda,\ \forall a \in \mathcal{A}} \|u - x\|_2^2$.
22 Proximity operators of atomic norms: $\ell_1$. Deriving $\mathrm{prox}_{\lambda\|\cdot\|_1}$ from the atomic norm view: $\|x\|_1 = \|x\|_\mathcal{A}$ with $\mathcal{A} = \{e_1, e_2, \ldots, e_N, -e_1, \ldots, -e_N\}$, $|\mathcal{A}| = 2N$. $\|x\|_\mathcal{A}^* = \max\{\langle a, x \rangle : a \in \mathcal{A}\} = \max_i |[x]_i| = \|x\|_\infty$, hence $\mathrm{prox}_{\lambda\|\cdot\|_1}(u) = u - P_{\{x : \|x\|_\infty \le \lambda\}}(u) = \mathrm{soft}(u, \lambda)$.
23 Proximity operators of atomic norms: $\ell_\infty$. Deriving $\mathrm{prox}_{\lambda\|\cdot\|_\infty}$ from the atomic norm view: $\|x\|_\infty = \|x\|_\mathcal{A}$ with $\mathcal{A} = \{-1, +1\}^N$, $|\mathcal{A}| = 2^N$. $\|x\|_\mathcal{A}^* = \max\{\langle a, x \rangle : a \in \mathcal{A}\} = \sum_{i=1}^{N} |[x]_i| = \|x\|_1$, hence $\mathrm{prox}_{\lambda\|\cdot\|_\infty}(u) = u - P_{\{x : \|x\|_1 \le \lambda\}}(u)$.
24 Proximity of atomic norms: matrix nuclear norm. Matrix nuclear norm: $\|X\|_* = \sum_i \sigma_i(X) = \sum_i \sqrt{\lambda_i(X^T X)}$. $\|X\|_* = \|X\|_\mathcal{A}$ with $\mathcal{A} = \{Z : \mathrm{rank}(Z) = 1, \|Z\|_F = 1\}$, where $\mathrm{rank}(Z) = |\{\sigma_i(Z) \ne 0\}|$ and the Frobenius norm is $\|Z\|_F^2 = \sum_{ij}[Z]_{ij}^2 = \sum_i \sigma_i^2(Z)$. The dual: $\|X\|_\mathcal{A}^* = \max\{\langle Z, X \rangle : Z \in \mathcal{A}\} = \max\{\sum_i \sigma_i(Z)\sigma_i(X) : \mathrm{rank}(Z) = 1, \sum_i \sigma_i^2(Z) = 1\} = \sigma_{\max}(X) = \|X\|_2$, the spectral norm.
25 Proximity of atomic norms: matrix nuclear norm. Euclidean matrix projection: $P_S(X) = \arg\min_{Z \in S} \|Z - X\|_F^2$. Note: for any unitary matrix $U$ ($U^T U = I$, $UU^T = I$), $\|UM\|_F^2 = \mathrm{trace}(M^T U^T U M) = \mathrm{trace}(M^T M) = \|M\|_F^2$. With the SVD $X = U\Sigma V^T$ ($\Sigma$ = singular value diagonal matrix): $\mathrm{prox}_{\lambda\|\cdot\|_*}(X) = X - P_{\{Z : \|Z\|_2 \le \lambda\}}(X) = U\Sigma V^T - P_{\{Z : \sigma_{\max}(Z) \le \lambda\}}(U\Sigma V^T)$ [Lewis, Malick, 2009] $= U\,\mathrm{diag}\big(\mathrm{diag}(\Sigma) - P_{\{x : \|x\|_\infty \le \lambda\}}(\mathrm{diag}(\Sigma))\big)V^T = U\,\mathrm{soft}(\Sigma, \lambda)\,V^T$: singular value thresholding (SVT).
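A compact Python sketch of singular value thresholding as derived above: soft-threshold the singular values and rebuild the matrix; the test matrix is an illustrative assumption.

    import numpy as np

    def svt(X, lam):
        """Singular value thresholding: the prox of lam*||.||_* (nuclear norm)."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)     # soft-threshold the singular values
        return (U * s_shrunk) @ Vt              # same as U @ diag(s_shrunk) @ Vt

    rng = np.random.default_rng(0)
    X = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))   # rank <= 3
    # Thresholding never increases the rank of the result.
    print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(svt(X, lam=2.0)))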
26 Proximity of atomic norms: matrix spectral norm. Matrix spectral norm: $\|X\|_2 = \sigma_{\max}(X)$. $\|X\|_2 = \|X\|_\mathcal{A}$ with $\mathcal{A} = \{Z : Z^T Z = I\} = \{Z : \sigma_i(Z) = 1, \forall i\}$ (orthogonal matrices). $\|X\|_\mathcal{A}^* = \max\{\langle Z, X \rangle : Z \in \mathcal{A}\} = \max\{\sum_i \sigma_i(Z)\sigma_i(X) : \sigma_i(Z) = 1, \forall i\} = \sum_i \sigma_i(X) = \|X\|_*$, the nuclear norm.
27 Proximity of atomic norms: matrix spectral norm. With the SVD $X = U\Sigma V^T$ ($\Sigma$ = singular value diagonal matrix): $\mathrm{prox}_{\lambda\|\cdot\|_2}(X) = X - P_{\{Z : \|Z\|_* \le \lambda\}}(X) = U\Sigma V^T - P_{\{Z : \sum_i \sigma_i(Z) \le \lambda\}}(U\Sigma V^T) = U\,\mathrm{diag}\big(\mathrm{diag}(\Sigma) - P_{\{x : \|x\|_1 \le \lambda\}}(\mathrm{diag}(\Sigma))\big)V^T$: the residual of the projection of the singular values on an $\ell_1$ ball of radius $\lambda$.
28 Proximity and atomic sets: vectors vs matrices. Vectors: $\ell_1$ ($\|x\|_1$) <> component-wise soft thresholding, $\mathcal{A} = \{\pm e_i\}$, $|\mathcal{A}| = 2N$; $\ell_\infty$ ($\|x\|_\infty$) <> residual of projection on the $\ell_1$ ball, $\mathcal{A} = \{\pm 1\}^N$, $|\mathcal{A}| = 2^N$; $\ell_2$ ($\|x\|_2$) <> vector soft thresholding, $\mathcal{A}$ = set of all unit-norm vectors, $|\mathcal{A}| = \infty$. Matrices: nuclear ($\|X\|_*$) <> singular value thresholding, $\mathcal{A}$ = set of all rank-1, unit-Frobenius-norm matrices; spectral ($\|X\|_2$) <> residual of singular-value projection on the $\ell_1$ ball, $\mathcal{A}$ = set of all orthogonal matrices; Frobenius ($\|X\|_F$) <> matrix soft thresholding, $\mathcal{A}$ = all matrices of unit Frobenius norm.
29 Proximal algorithms. Back to the problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x)$ with $f_2$ a proper convex function and $f_1$ having an $L$-Lipschitz gradient; e.g. $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$ with $L = \lambda_{\max}(A^T A)$. Separable majorizer ($\gamma_k < 1/L$): $Q(x; x_k) = f_1(x_k) + (x - x_k)^T \nabla f_1(x_k) + \frac{1}{2\gamma_k}\|x - x_k\|_2^2$. Majorization-minimization algorithm: $x_{k+1} = \arg\min_x Q(x; x_k) + f_2(x) = \arg\min_x \frac{1}{2}\|x - (x_k - \gamma_k \nabla f_1(x_k))\|_2^2 + \gamma_k f_2(x)$, i.e. $x_{k+1} = \mathrm{prox}_{\gamma_k f_2}\big(x_k - \gamma_k \nabla f_1(x_k)\big)$.
30 Proximal algorithms: convergence. Problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x) \equiv f(x)$; $f_1$ has an $L$-Lipschitz gradient, e.g. $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$ with $L = \lambda_{\max}(A^T A)$. Iterative shrinkage/thresholding (IST) (or forward-backward): $x_{k+1} = \mathrm{prox}_{\gamma_k f_2}\big(x_k - \gamma_k \nabla f_1(x_k)\big)$. If $\gamma_k < 1/L$, IST is a majorization-minimization algorithm, thus $f(x_{k+1}) \le f(x_k)$; since $f(x) \ge 0$, the sequence $(f(x_1), f(x_2), \ldots, f(x_k), \ldots)$ converges. Attention: this does not imply convergence of $(x_1, \ldots, x_k, \ldots)$.
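A minimal Python sketch of the IST / forward-backward iteration for $\frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$, with a constant step size $1/L$; the synthetic data, regularization weight, and iteration count are illustrative assumptions, not part of the lecture.

    import numpy as np

    def soft(u, lam):
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    def ista(A, u, lam, num_iters=500):
        """IST / forward-backward for 0.5*||A x - u||_2^2 + lam*||x||_1."""
        gamma = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1/L, L = ||A||_2^2
        x = np.zeros(A.shape[1])
        for _ in range(num_iters):
            x = soft(x - gamma * (A.T @ (A @ x - u)), gamma * lam)   # gradient step + prox
        return x

    rng = np.random.default_rng(1)
    A = rng.standard_normal((60, 200))
    x_true = np.zeros(200); x_true[:5] = 3.0
    u = A @ x_true + 0.01 * rng.standard_normal(60)
    print(np.count_nonzero(np.abs(ista(A, u, lam=0.5)) > 1e-6))   # number of nonzero entries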
31 Proximal algorithms: convergence. $\hat{x} \in G = \arg\min_{x \in \mathbb{R}^n} f_1(x) + f_2(x)$. IST algorithm: $x_{k+1} = \mathrm{prox}_{\gamma_k f_2}\big(x_k - \gamma_k \nabla f_1(x_k)\big)$. If $0 < \gamma_k < 2/L$, then $(x_1, x_2, \ldots, x_k, \ldots)$ converges to a point in $G$. Inexact version (with errors $a_k, b_k$): $x_{k+1} = \mathrm{prox}_{\gamma_k f_2}\big(x_k - \gamma_k(\nabla f_1(x_k) + b_k)\big) + a_k$; convergence is still guaranteed if $\sum_{k=1}^{\infty}\|a_k\| < \infty$ and $\sum_{k=1}^{\infty}\|b_k\| < \infty$. Results and proofs in [Combettes and Wajs, 2005].
32 Proximal algorithms: convergence. Convergence of function values: $(f(x_1), \ldots, f(x_k), \ldots) \to f(\hat{x})$. Convergence of iterates: $(x_1, x_2, \ldots, x_k, \ldots) \to \hat{x}$. Convergence rates (for function values) [Beck, Teboulle, 2009]: $f(x_k) - f(\hat{x}) = O(1/k)$ for IST and $O(1/k^2)$ for FISTA. Convergence rates for the iterates require further assumptions on $f$.
33 Proximal algorithms: convergence of iterates. $\hat{x} = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + f_2(x)$. With $L = \lambda_{\max}(A^T A)$ and $l = \lambda_{\min}(A^T A) > 0$, $\kappa = l/L$ (condition number), and $G = \{\hat{x}\}$ (unique minimizer). Under-relaxed ($\beta < 1$) or over-relaxed ($\beta > 1$) IST: $x_{k+1} = (1 - \beta)x_k + \beta\,\mathrm{prox}_{\gamma f_2}\big(x_k - \gamma A^T(Ax_k - u)\big)$. Optimal choice $\gamma = \frac{2}{L + l}$: Q-linear convergence. Small $l$ $\Rightarrow$ convergence factor close to 1 $\Rightarrow$ slow convergence! [Figueiredo, Bioucas-Dias, 2007]
34 Proximal algorithms: convergence of iterates. With $\hat{x} \in G = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$ and $L = \lambda_{\max}(A^T A)$, using a step size $\gamma < 2/L$: $x_{k+1} = \mathrm{soft}\big(x_k - \gamma A^T(Ax_k - u), \gamma\lambda\big)$. Let $Z \subseteq \{1, 2, \ldots, n\}$ be such that $\hat{x} \in G \Rightarrow [\hat{x}]_Z = 0$. Then, after a finite number of iterations, $[x_k]_Z = [\hat{x}]_Z = 0$. After this, Q-linear convergence, with optimal choice $\gamma = \frac{2}{L + l}$, where $l = \lambda_{\min}(A_{\bar{Z}}^T A_{\bar{Z}}) > 0$ and $\bar{Z}$ is the complement of $Z$. [Hale, Yin, Zhang, 2008]
35 Slowness and acceleration of IST. Problem: $\hat{x} \in G = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$. IST algorithm: $x_{k+1} = \mathrm{soft}\big(x_k - \gamma A^T(Ax_k - u), \gamma\lambda\big)$. IST is slow if $A$ is very ill-conditioned and/or $\lambda$ is very small! Several proposals for accelerated variants of IST: methods with memory (TwIST, FISTA); quasi-Newton methods (SpaRSA); continuation, i.e., use a varying $\lambda$ (FPC, SpaRSA).
36 Memory-based variants of IST: FISTA. Fast IST algorithm (FISTA), based on Nesterov's work (1980s) [Beck, Teboulle, 2009]: $t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}$; $z_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}(x_k - x_{k-1})$; $x_{k+1} = \mathrm{soft}\big(z_{k+1} - \gamma A^T(Az_{k+1} - u), \gamma\lambda\big)$. (Plot: objective vs. iterations for IST and FISTA.)
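A minimal Python sketch of the FISTA recursion above for the $\ell_1$-regularized least-squares problem; the fixed iteration count is an illustrative choice.

    import numpy as np

    def soft(u, lam):
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    def fista(A, u, lam, num_iters=200):
        """FISTA for 0.5*||A x - u||_2^2 + lam*||x||_1 [Beck, Teboulle, 2009]."""
        gamma = 1.0 / np.linalg.norm(A, 2) ** 2
        x_prev = x = np.zeros(A.shape[1])
        t = 1.0
        for _ in range(num_iters):
            t_next = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
            z = x + ((t - 1.0) / t_next) * (x - x_prev)              # memory / extrapolation step
            x_prev, x = x, soft(z - gamma * (A.T @ (A @ z - u)), gamma * lam)
            t = t_next
        return x

    # Usage (with A, u defined as in the IST example): x_hat = fista(A, u, lam=0.5)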
37 Memory-based variants of IST: TwIST. Inspired by two-step methods for linear systems [Frankel, 1950], [Axelsson, 1996]. TwIST (two-step IST) [Bioucas-Dias, Figueiredo, 2007]: $x_{k+1} = (\alpha - \beta)x_k + (1 - \alpha)x_{k-1} + \beta\,\mathrm{prox}_{f_2}\big(x_k - A^T(Ax_k - u)\big)$. (Plot: Q-linear convergence, TwIST vs. IST.)
38 Memory-based variants of IST: TwIST. Wavelet-based image deblurring example: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} \frac{1}{2}\|B\Psi x - u\|_2^2 + \lambda\|x\|_1$, where $x$ are the representation coefficients and $\Psi$ is the dictionary (e.g., wavelet basis, frame, ...). (Figures: original, blurred ($9{\times}9$ blur, 40 dB noise), and restored images; plots of objective function and SNR vs. iterations for IST, over-relaxed IST, TwIST, and the second-order "full TwIST" variant.)
39 Quasi-Newton acceleration of IST: SpaRSA. IST: $x_{k+1} = \mathrm{prox}_{\gamma_k f_2}\big(x_k - \gamma_k \nabla f_1(x_k)\big)$. A Newton step (instead of gradient descent) would be $x_{k+1} = \mathrm{prox}_{\gamma_k f_2}\big(x_k - [H(x_k)]^{-1}\nabla f_1(x_k)\big)$, with $H$ the Hessian (matrix of second derivatives)... computationally too expensive! Barzilai-Borwein approach [Barzilai, Borwein, 1988], [Wright, Nowak, Figueiredo, 2009]: choose $\frac{1}{\gamma_k} = \arg\min_{\alpha}\|\alpha(x_k - x_{k-1}) - (\nabla f(x_k) - \nabla f(x_{k-1}))\|_2^2$, so that $\frac{1}{\gamma_k}I \simeq H(x_k)$. If $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$, then $\gamma_k = \frac{\|x_k - x_{k-1}\|_2^2}{\|A(x_k - x_{k-1})\|_2^2}$.
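A small Python sketch of the Barzilai-Borwein step-size rule above for $f_1(x) = \frac{1}{2}\|Ax - u\|_2^2$; the min/max safeguards on the step are illustrative assumptions (SpaRSA-type methods typically add further safeguards, such as an acceptance test).

    import numpy as np

    def bb_step(A, x, x_prev, gamma_min=1e-8, gamma_max=1e8):
        """Barzilai-Borwein step for f_1(x) = 0.5*||A x - u||_2^2:
        gamma_k = ||x_k - x_{k-1}||^2 / ||A(x_k - x_{k-1})||^2, so that (1/gamma_k) I ~ Hessian."""
        s = x - x_prev
        As = A @ s
        denom = float(As @ As)
        if denom == 0.0:
            return gamma_max          # no curvature information: fall back to the largest step
        return float(np.clip((s @ s) / denom, gamma_min, gamma_max))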
40 Acceleration via continuation. IST: $x_{k+1} = \mathrm{soft}\big(x_k - \gamma A^T(Ax_k - u), \gamma\lambda\big)$. Slow if $\lambda$ is small. Observation: IST (as SpaRSA) benefits from warm-starting (being initialized close to the minimizer). Continuation: start with a large $\lambda$ and slowly decrease it while tracking the solution [Figueiredo, Nowak, Wright, 2007], [Hale, Yin, Zhang, 2007]. IST + continuation = fixed point continuation (FPC) [Hale, Yin, Zhang, 2007].
41 Acceleration via continuation. $\hat{x} \in G = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$, with $u = A\bar{x} + n$ and $\lambda_{\max} = \|A^T u\|_\infty$ (for $\lambda \ge \lambda_{\max}$, $\hat{x} = 0$). (Plot: effect of continuation as $\lambda$ is decreased toward its target value.)
42 Some speed comparisons, from [Lorenz, 2011]: $\hat{x} = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$ with $A = [I\ U\ R]$ and $\hat{x}$ having 120 non-zeros. (Plot: IST and accelerated variants.)
43 Proximal algorithms for matrices. $\hat{M} \in \arg\min_{M \in \mathbb{R}^{n \times n}} \frac{1}{2}\|\mathcal{A}(M) - U\|_F^2 + \mu\|M\|_*$, where $\mathcal{A}$ is a linear operator and $\mathcal{A}^*$ its adjoint. The proximal algorithm (IST) is as before: $X_{k+1} = \mathrm{svt}\big(X_k - \gamma_k \mathcal{A}^*(\mathcal{A}(X_k) - U), \gamma_k\mu\big)$. Matrix completion: $\mathcal{A}(X)$ = a subset of the entries of $X$. (Plot, from [Toh, Yun, 2009]: IST, APG (FISTA), FPC (continuation), and APG + continuation... the importance of acceleration!)
44 Another class of methods: augmented Lagrangian. The problem: $\min_x f(x)$ s.t. $Ax = u$. The augmented Lagrangian (AL), with penalty parameter $\mu$: $\mathcal{L}_\mu(x, \theta) = f(x) + \theta^T(Ax - u) + \frac{\mu}{2}\|Ax - u\|_2^2$. The AL method (ALM), a.k.a. method of multipliers [Hestenes, Powell, 1969]: $x_{k+1} = \arg\min_x \mathcal{L}_\mu(x, \theta_k)$; $\theta_{k+1} = \theta_k + \mu(Ax_{k+1} - u)$. Can be written as: $x_{k+1} = \arg\min_x f(x) + \frac{\mu}{2}\|Ax - u - d_k\|_2^2$; $d_{k+1} = d_k - (Ax_{k+1} - u)$. Similar to the Bregman method [Osher, Burger, Goldfarb, Xu, Yin, 2005], [Yin, Osher, Goldfarb, Darbon, 2008].
45 Augmented Lagrangian for variable splitting. The problem: $\min_x f_1(Ax) + f_2(x)$. Equivalent constrained formulation: $\min_{x,z} f_1(z) + f_2(x)$ s.t. $Ax - z = 0$; can be written as $\min_y f(y)$ s.t. $\bar{\Psi}y = 0$ with $y = [x; z]$ and $\bar{\Psi} = [A\ \ {-I}]$. ALM: $(x_{k+1}, z_{k+1}) = \arg\min_{x,z} f_1(z) + f_2(x) + \frac{\mu}{2}\|Ax - z - d_k\|_2^2$; $d_{k+1} = d_k - (Ax_{k+1} - z_{k+1})$.
46 Augmented Lagrangian for variable splitting. It may be hard to solve $(x_{k+1}, z_{k+1}) = \arg\min_{x,z} f_1(z) + f_2(x) + \frac{\mu}{2}\|Ax - z - d_k\|_2^2$ jointly. Alternative: $x_{k+1} = \arg\min_x f_2(x) + \frac{\mu}{2}\|Ax - z_k - d_k\|_2^2$; $z_{k+1} = \arg\min_z f_1(z) + \frac{\mu}{2}\|Ax_{k+1} - z - d_k\|_2^2$; $d_{k+1} = d_k - (Ax_{k+1} - z_{k+1})$: the alternating direction method of multipliers (ADMM) [Glowinski, Marrocco, 1975], [Gabay, Mercier, 1976], [Eckstein, Bertsekas, 1992]. When applied to $\hat{x} = \arg\min_x \frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$: the split augmented Lagrangian shrinkage algorithm (SALSA) [Figueiredo, Bioucas-Dias, Afonso, 2009].
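A minimal Python sketch of an ADMM/SALSA-style solver for $\frac{1}{2}\|Ax - u\|_2^2 + \lambda\|x\|_1$, using the simple splitting $x = z$ so that both subproblems are easy (a ridge-type linear solve and a shrinkage); the direct solve and the value of $\mu$ are illustrative assumptions, not the lecture's exact variable choice.

    import numpy as np

    def soft(u, lam):
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    def admm_l1(A, u, lam, mu=1.0, num_iters=100):
        """ADMM / SALSA-style iterations for 0.5*||A x - u||_2^2 + lam*||x||_1,
        with the splitting x = z: a linear solve for x and a shrinkage for z."""
        n = A.shape[1]
        M = A.T @ A + mu * np.eye(n)                     # the x-subproblem is a ridge-type system
        Atu = A.T @ u
        z = np.zeros(n)
        d = np.zeros(n)
        for _ in range(num_iters):
            x = np.linalg.solve(M, Atu + mu * (z + d))   # quadratic subproblem
            z = soft(x - d, lam / mu)                    # shrinkage subproblem
            d = d - (x - z)                              # (scaled) multiplier / Bregman update
        return z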
47 Augmented Lagrangian for variable splitting. Testing ADMM/SALSA on a typical image deblurring problem: $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} \frac{1}{2}\|B\Psi x - u\|_2^2 + \lambda\|x\|_1$. (Figures: blurred and restored images; plot of the objective function $0.5\|y - Ax\|^2 + \lambda\,\mathrm{TV}(x)$ vs. CPU time in seconds for TwIST, FISTA, SpaRSA, and SALSA.)
48 Handling more than two functions. $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_0(x) + f_1(x) + \cdots + f_n(x)$, where $f_0$ has an $L$-Lipschitz gradient and $f_1, \ldots, f_n$ are convex. Possible uses: multiple regularizers, positivity constraints, ... Generalized forward-backward algorithm [Raguet, Fadili, Peyré, 2011]. Parameters: $\omega_1, \ldots, \omega_n \in (0,1)$ s.t. $\sum_j \omega_j = 1$. Initialization: $k = 0$; $z_0^1, \ldots, z_0^n$; $x_0 = \sum_{j=1}^n \omega_j z_0^j$. Repeat until convergence: for $i = 1{:}n$, $z_{k+1}^i = z_k^i + \mathrm{prox}_{\gamma_k f_i/\omega_i}\big(2x_k - z_k^i - \gamma_k \nabla f_0(x_k)\big) - x_k$; then $x_{k+1} = \sum_{i=1}^n \omega_i z_{k+1}^i$; $k \leftarrow k + 1$.
49 Handling more than two functions. $\hat{x} \in \arg\min_{x \in \mathbb{R}^n} f_1(x) + \cdots + f_n(x)$, with $f_1, \ldots, f_n$ arbitrary convex functions. ADMM-based method [Figueiredo and Bioucas-Dias, 2009], [Setzer, Steidl, Teuber, 2009]. Parameter: $\mu$. Initialization: $k = 0$; $z_0^1, \ldots, z_0^n$; $y_0^1, \ldots, y_0^n$. Repeat until convergence: $x_{k+1} = \frac{1}{n}\sum_{i=1}^n (y_k^i - z_k^i)$; for $i = 1{:}n$, $y_{k+1}^i = \mathrm{prox}_{f_i/\mu}\big(x_{k+1} + z_k^i\big)$ and $z_{k+1}^i = z_k^i + x_{k+1} - y_{k+1}^i$; $k \leftarrow k + 1$.
50 Non-Convex Algorithms for Low-Dimensional Models
51 Motivation Discrete descriptions of low-dimensional models Example: reflectivity of Lambertian surfaces [Basri and Jacobs 2001]
52 Motivation Discrete descriptions of low-dimensional models Example: graphical model selection vertex weights of the i-th edge Gauss-Markov graph <> linear regression [Meinshausen and Buhlman 2006]
53 Motivation Discrete descriptions of structure in low-dimensional models Typical of wavelet transforms of natural signals and images (piecewise smooth) [Baraniuk, C, Duarte, Hegde 2010]
54 Motivation. Non-convex criteria beyond atomic norms: 1-bit compressive sensing optimization criteria [Boufounos and Baraniuk 2008]; compressible signals in weak $\ell_p$ balls, $\ell_p$ ($p < 1$) optimization criteria [Chartrand and Yin 2008].
55 Non-convexity in this tutorial. Anything not convex <> too big to cover; convexity is in general a rare condition. Active research topic with great depth. Key lesson [Attouch et al. 2010]: convergence of the projected gradient-descent algorithm. This tutorial <> a special subset: $\min_x f_1(x) + f_2(x)$ with $f_2$ non-convex. Assumptions: 1. access to the gradient of the convex $f_1$; 2. tractable/approximate prox of the non-convex $f_2$.
56 Can we project onto non-convex sets? Running examples. Sparse signal: only K out of N coordinates nonzero; model: union of all K-dimensional subspaces aligned w/ coordinate axes. Structured sparse signal (or model-sparse): reduced set of subspaces; model: a particular union of subspaces; ex: clustered or dispersed sparse patterns. (Plot: coefficient magnitude vs. sorted index.)
57 Can we project onto non-convex sets? Running examples (repeat of the previous slide).
58 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets support of the solution <> modular approximation problem indexing set
59 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets support of the solution <> modular approximation problem
60 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets support of the solution <> modular approximation problem
61 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets support of the solution <> modular approximation problem where
62 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets support of the solution <> modular approximation problem underlying optimization problem <> integer linear program : support indicator variables [Kyrillidis and C, 2011]
63 Can we project onto non-convex sets? Analysis of the prox for structured sparse sets support of the solution <> modular approximation problem underlying optimization problem <> integer linear program Class of problems we can tractably solve: PMAP Polynomial time modular epsilon-approximation property [Kyrillidis and C, 2011]
64 Can we project onto non-convex sets? PMAP-0: Matroid structured sparse models: Definition: non-emptiness heredity exchange [Nemhauser and Wolsey, 1999]
65 Can we project onto non-convex sets? PMAP-0: Matroid structured sparse models. Definition: non-emptiness, heredity, exchange. Let $\mathcal{M} = \{\{1,2\},\{3,4\}\}$ (an example collection of supports). The smallest matroid that contains $\mathcal{M}$ is ??? [Nemhauser and Wolsey, 1999]
66 Can we project onto non-convex sets? PMAP-0: Matroid structured sparse models. Definition: non-emptiness, heredity, exchange. Let $\mathcal{M} = \{\{1,2\},\{3,4\}\}$. The smallest matroid that contains $\mathcal{M}$: $\{\,\emptyset$ (by the non-emptiness property); $\{1\}, \{2\}, \{3\}, \{4\}, \{1,2\}, \{3,4\}$ (by the heredity property); $\{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}$ (by the exchange property)$\,\}$.
67 Can we project onto non-convex sets? PMAP-0: Matroid structured sparse models. Definition: non-emptiness, heredity, exchange. Greedy basis algorithm efficiently solves matroid-constrained problems: sort the N weights in decreasing order; start with the empty set; then, while keeping the order, 1. take the next index, 2. add it to the current set if the result is still in the matroid, 3. repeat.
68 Can we project onto non-convex sets? PMAP-0: Matroid structured sparse models. Definition: non-emptiness, heredity, exchange. Greedy basis algorithm efficiently solves matroid constrained problems. Examples: 1. uniform matroid: hard thresholding! (Plot: coefficient magnitude vs. sorted index.)
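A small Python sketch of the uniform-matroid projection named above, i.e. hard thresholding: keep the K largest-magnitude entries and zero out the rest; the example vector is illustrative.

    import numpy as np

    def hard_threshold(u, K):
        """Projection onto K-sparse vectors: keep the K largest-magnitude entries, zero the rest."""
        x = np.zeros_like(u)
        if K > 0:
            idx = np.argpartition(np.abs(u), -K)[-K:]   # indices of the K largest |u_i|
            x[idx] = u[idx]
        return x

    print(hard_threshold(np.array([0.1, -2.0, 0.5, 3.0, -0.2]), K=2))   # keeps -2.0 and 3.0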
69 Can we project onto non-convex sets? PMAP-0: Matroid structured sparse models: Definition: Greedy basis algorithm efficiently solves matroid constrained problems Examples: [Kyrillidis and C, 2011] 1. uniform matroid <> simple sparsity intersection with the following matroids (result is still a matroid!*) 2. partition matroid <> distributed sparsity 3. graphic matroid <> spanning tree sparsity 4. matching matroid <> graph matching sparsity *: in general, the intersection of two matroids is not a matroid.
70 Can we project onto non-convex sets? PMAP-0: Linear support constraints: Definition: A and b <> integral first row of A <> all 1 s first entry of b <> K [Kyrillidis and C, 2011]
71 Can we project onto non-convex sets? PMAP-0: Linear support constraints: Definition: Example: neuronal spike model
72 Can we project onto non-convex sets? PMAP-0: Linear support constraints. Definition: We can use LP to relax the LS-constrained ILPs, but when is the result binary?
73 Can we project onto non-convex sets? PMAP-0: Linear support constraints. Definition: LP can exactly solve the LS-constrained ILPs when A is totally unimodular (TU)*: the determinant of each square submatrix is in {-1, 0, 1} [Nemhauser and Wolsey, 1999]. Examples: interval matrices, perfect matrices, network matrices. *: if we want the LP relaxation to work for all b, TU is a necessary condition.
74 Can we project onto non-convex sets? PMAP-0: Linear support constraints: Definition: Example: neuronal spike model TU [Hegde, Duarte, and C, 2009]
75 Can we project onto non-convex sets? PMAP-0: prox-sparse models Definition: define algorithmically! [Kyrillidis and C, 2011]
76 Can we project onto non-convex sets? PMAP-0: prox-sparse models Definition: define algorithmically! Example: clustered sparsity models tree-sparse <> dynamic program clustered sparse <> dynamic program [Baraniuk, C, Wakin 2010; Baraniuk, C, Duarte, Hegde 2010]
77 Can we project onto non-convex sets? Pop-quiz: A prox with convex and non-convex terms Let us consider Is it PMAP-0?
78 Can we project onto non-convex sets? Pop-answer: A prox with convex and non-convex terms. Let us consider... Hard thresholding followed by soft thresholding! YES: certified PMAP-0. (Plot: the composite thresholding rule, y vs. w.)
79 Can we project onto non-convex sets? PMAP-epsilon: Knapsack multi-knapsack constraints weighted multi-knapsack Ex: Nested group sparse problems quadratically-constrained Define algorithmically! approximate solutions for computational reasons selector variables [Kyrillidis and C, 2011]
80 Can we project onto non-convex sets? PMAP-epsilon: Knapsack: multi-knapsack constraints, weighted multi-knapsack, quadratically-constrained; ex: nested group sparse problems. Define algorithmically! Approximate solutions for computational reasons; selector variables. Pairwise overlapping groups <> quadratic binary w/ cardinality constraints: we can only approximate, and epsilon is large!
81 Can we project onto non-convex sets? PMAP-epsilon: Knapsack multi-knapsack constraints weighted multi-knapsack Ex: Nested group sparse problems quadratically-constrained Define algorithmically! approximate solutions for computational reasons selector variables Pairwise overlapping groups <> quadratic binary w/ cardinality cons. Multi-knapsack + multi-matroids [Lee et al., 2009] we can only approximate and epsilon is large!
82 Can we project onto non-convex sets? Matrix examples! Rank constrained projections
83 Can we project onto non-convex sets? Matrix examples! Rank constrained projections singular value decomposition
84 Can we project onto non-convex sets? Matrix examples! Rank constrained projections invariance to unitary transform
85 Can we project onto non-convex sets? Matrix examples! Rank constrained projections sparse approximation problem!
86 Can we project onto non-convex sets? Matrix examples! Rank constrained projections singular value (hard) thresholding
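A short Python sketch (not from the slides) of this rank-constrained projection via singular value hard thresholding; the random test matrix is an illustrative assumption.

    import numpy as np

    def project_rank(X, r):
        """Projection onto matrices of rank <= r: keep the r largest singular values."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s[r:] = 0.0                      # zero out all but the r largest singular values
        return (U * s) @ Vt

    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 6))
    print(np.linalg.matrix_rank(project_rank(X, r=2)))   # -> 2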
87 Can we project onto non-convex sets? Matrix examples! Rank constrained projections singular value (hard) thresholding Non-convex spectral projections <> sets described by their eigenvalue properties exact projections >> basic operations on eigenvalues [Lewis and Malick 2008]
88 Can we project onto non-convex sets? Matrix examples! Rank constrained projections Non-convex spectral projections <> sets described by their eigenvalue properties epsilon-approximate projections (note the difference with PMAP) Two highlights: structure from randomness/power methods [Halko, Martinsson, Tropp, 2010] column subset selection approaches [Boutsidis, Mahoney, Drineas, 2010]
89 Recovery algorithms for low-dimensional models. Now that we have projections... Encoding: non-convex <> combinatorial / manifolds; convex <> atomic norm / convex relaxation; probabilistic <> compressible / sparse priors. A common criterion covering a broad set of applications: affine rank minimization, matrix completion, robust PCA [Candes and Recht 2009; Waters, Sankaranarayanan, Baraniuk, 2011]. A common algorithm: projected gradient.
90 Recovery algorithms for low-dimensional models. To highlight the salient differences, we will consider compressive sensing recovery. Encoding: non-convex <> combinatorial / manifolds; convex <> atomic norm / convex relaxation; probabilistic <> compressible / sparse priors. A common algorithm: projected gradient.
91 A tale of two algorithms Soft thresholding
92 A tale of two algorithms Soft thresholding Structure in optimization:
93 A tale of two algorithms Soft thresholding majorization-minimization Key actor: least absolute shrinkage
94 A tale of two algorithms Soft thresholding slower
95 A tale of two algorithms Soft thresholding Is x* what we are looking for? local unverifiable assumptions: ERC/URC/RSC condition coherence based conditions (local global / dual certification) [Buhlmann and van de Geer 2011]
96 A tale of two algorithms Hard thresholding Key actor: hard thresholding ALGO: sort and pick the largest K
97 A tale of two algorithms Hard thresholding percolations What could possibly go wrong with this naïve approach? [C, 2011]
98 A tale of two algorithms Hard thresholding Global unverifiable assumption: RIP condition we can tiptoe among percolations! another variant has
99 Restricted Isometry Property. Model: K-sparse coefficients. RIP: stable embedding of the union of K-planes, $(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2$ for all K-sparse $x$, where $\Phi$ is the measurement matrix; holds w.h.p. for IID sub-Gaussian matrices. Remark: it also implies convergence of convex relaxations, e.g. basis pursuit. [Candes 2008; Baraniuk et al. 2008]
100 Restricted Isometry Property Model: K-sparse coefficients Remark: implies convergence of convex relaxations also e.g., RIP: stable embedding IID sub-gaussian matrices K-planes
101 Restricted Isometry Property for Matrices! Model: rank-r matrices Remark: bi-lipschitz embedding of low-rank matrices RIP: stable embedding IID sub-gaussian matrices [Plan 2011] K-planes
102 Projected gradient method for non-convex sets Model-based hard thresholding Global unverifiable assumption: [Baraniuk, C, Duarte, Hegde 2010] Key actor: non-convex projector
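A minimal Python sketch of the projected-gradient (IHT-style) recursion with a plug-in non-convex projector; here the projector is plain hard thresholding, but a model-based projector (e.g. tree-sparse) could be substituted. The Gaussian measurement matrix, step size, and sparsity level are illustrative assumptions.

    import numpy as np

    def hard_threshold(u, K):
        x = np.zeros_like(u)
        idx = np.argpartition(np.abs(u), -K)[-K:]
        x[idx] = u[idx]
        return x

    def projected_gradient_nonconvex(A, u, project, num_iters=100, gamma=None):
        """Projected gradient for 0.5*||A x - u||_2^2 with a (possibly non-convex) projector,
        e.g. hard thresholding (IHT) or a model-based projector for structured sparsity."""
        if gamma is None:
            gamma = 1.0 / np.linalg.norm(A, 2) ** 2
        x = np.zeros(A.shape[1])
        for _ in range(num_iters):
            x = project(x - gamma * (A.T @ (A @ x - u)))
        return x

    rng = np.random.default_rng(2)
    A = rng.standard_normal((40, 100)) / np.sqrt(40)    # measurement matrix (illustrative)
    x_true = np.zeros(100); x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
    x_hat = projected_gradient_nonconvex(A, A @ x_true, project=lambda z: hard_threshold(z, K=3))
    print(np.linalg.norm(x_hat - x_true))   # small when A satisfies a RIP-like condition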
103 Example: tree-sparse recovery Model: K-sparse coefficients + significant coefficients lie on a rooted subtree Sparse approx: find best set of coefficients sorting hard thresholding Tree-sparse approx: find best rooted subtree of coefficients condensing sort and select [Baraniuk] dynamic programming [Donoho] [Baraniuk, C, Duarte, Hegde 2010]
104 Example: tree-sparse recovery Model: K-sparse coefficients RIP: stable embedding IID sub-gaussian matrices K-planes
105 Example: tree-sparse recovery Model: K-sparse coefficients + significant coefficients lie on a rooted subtree Tree-RIP: stable embedding IID sub-gaussian matrices K-planes
106 Example: tree-sparse recovery. Number of samples for correct recovery: piecewise cubic signals + wavelets. Models/algorithms: compressible (CoSaMP), tree-compressible (tree-CoSaMP). [Baraniuk, C, Duarte, Hegde 2010]
107 Recovery algorithms for low-dimensional models: the CLASH operator. Encoding: non-convex <> combinatorial / manifolds; convex <> atomic norm / convex relaxation; probabilistic <> compressible / sparse priors. Example algorithms: non-convex <> IHT, CoSaMP, SP, ALPS, OMP; convex <> basis pursuit, Lasso, basis pursuit denoising; probabilistic <> variational Bayes, EP, approximate message passing (AMP). [Kyrillidis and C, 2011]
108 Recovery algorithms for low-dimensional models
109 Recovery algorithms for low-dimensional models A subtle issue 2 solutions! Which one is correct?
110 Recovery algorithms for low-dimensional models A subtle issue 2 solutions! Greed is good. Joel Tropp 2004
111 Recovery algorithms for low-dimensional models A subtle issue 2 solutions! Which one is correct?
112 The CLASH algorithm combinatorial selection + least absolute shrinkage
113 The CLASH algorithm combinatorial selection + least absolute shrinkage combinatorial origami
114 Recovery algorithms for low-dimensional models: the CLASH operator. Encoding: non-convex <> combinatorial / manifolds; convex <> atomic norm / convex relaxation; probabilistic <> compressible / sparse priors. Example algorithms: non-convex <> IHT, CoSaMP, SP, ALPS, OMP; convex <> basis pursuit, Lasso, basis pursuit denoising; probabilistic <> variational Bayes, EP, approximate message passing (AMP). The idea is much more general. [Kyrillidis, Puy, and C, 2012]
115 Recovery algorithms for low-dimensional models. Using projected gradient with exact non-convex projections, with RIP/ERC/URC/RSC: exact low-dimensional model — noise-free measurements give exact recovery, noisy measurements give stable recovery. Approximately low-dimensional model — recovery as good as the K-model-sparse approximation: recovery error $\lesssim$ the signal's K-term model approximation error + the noise. [Baraniuk, C, Duarte, Hegde 2010]
116 Recovery algorithms for low-dimensional models. Using projected gradient with exact non-convex projections, with RIP/ERC/URC/RSC: exact low-dimensional model — noise-free measurements give exact recovery, noisy measurements give stable recovery. Approximately low-dimensional model — recovery as good as the K-model-sparse approximation: recovery error $\lesssim$ the signal's K-term model approximation error + the noise. The bound remains qualitatively the same for other models!!!
117 Recovery algorithms for low-dimensional models Projected gradient with (non)exact non-convex projections without RIP/ERC/URC/RSC Not much! convergence to stationary point with Kurdyka-Lojasiewicz [Attouch et al., 2010]
118 Examples CLASH Structured Sparsity Norm Constraints [Kyrillidis, Puy, and C, 2012]
119 Examples Coded Aperture Snapshot Spectral Imager
120 Examples. Total variation: TwIST results.
121 Examples TV-CLASH TV + sparsity in wavelets [Kyrillidis, Puy, and C, 2012]
122 Acceleration of non-convex algorithms. Several approaches: step-size selection; memory based methods similar to Nesterov acceleration / double overrelaxation; non-convex splitting; (adaptive) block coordinate descent; epsilon-approximate projections. [Kyrillidis and C, 2011]
123 Acceleration of non-convex algorithms. Several approaches: step-size selection; memory based methods similar to Nesterov acceleration / double overrelaxation; non-convex splitting; (adaptive) block coordinate descent; epsilon-approximate projections. (Example timings from the referenced comparison: 34.8 s vs. 15.8 s.) [Zhou and Tao 2011; Kyrillidis and C, 2012]
124 Final remarks. Non-convex algorithms <> low-dimensional scaffold: possible performance gains; non-convexifiable priors; matching prox operator with optimal space/time bounds; complexity of structured approximation. Non-convex algorithms vs. convex algorithms: no clear winner / scenario dependent; decades of research in both.
125 References M. Afonso, J. Bioucas-Dias, M. Figueiredo, Fast image recovery using variable splitting and constrained optimization, IEEE Transactions on Image Processing, vol. 19, H. Attouch, J. Bolte, P. Redont, A. Soubeyran, Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Lojasiewicz inequality, Math. Oper. Research, 2010 O. Axelsson, Iterative Solution Methods, Cambridge University Press, R. Baraniuk, V. Cevher, M. Duarte, C. Hegde, Model-based compressive sensing, IEEE Transactions on Information Theory, vol. 56, R. Baraniuk, M. Davenport, R. de Vore, M. Wakin, A Simple Proof of the Restricted Isometry Property for Random Matrices, Constructive Approximation, R. Baraniuk, V. Cevher, M. Wakin, Low-dimensional models for dimensionality reduction and signal recovery: A geometric perspective, Proceedings of the IEEE, vol. 98, J. Barzilai and J. Borwein, Two point step size gradient methods, IMA Journal of Numer. Anal., vol. 8, R. Basri, D. Jacobs, Lambertian Reflectance and Linear Subspaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Science, vol. 2, J. Bioucas-Dias, M. Figueiredo, A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration, IEEE Transactions on Image Processing, vol. 16, P. Boufounos, R. Baraniuk, 1-bit compressed sensing, Proceedings of the Conference on Information Science and Systems, Princeton, C. Boutsidis, M. Mahoney, P. Drineas, An improved approximation algorithm for the column subset selection problem, Proc. 20th Annual ACM/SIAM Symposium on Discrete Algorithms, New York, NY, 2008.
126 References P. Bühlmann, S. van der Geer, Statistics for High-Dimensional Data, Springer, E. Candès, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathematique, vol. 346, E. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, vol. 9, L. Carin, R. Baraniuk, V. Cevher, D. Dunson, M. Jordan, G. Sapiro, M. Wakin, Learning low-dimensional signal models, IEEE Signal Processing Magazine, vol. 28, V. Cevher, An ALPS view of sparse recovery, Proc. ICASSP, V. Chandrasekaran, B. Recht, P. Parrilo, A. Willsky, The convex geometry of linear inverse problems, submitted, R. Chartrand, W. Yin, Iteratively reweighted algorithms for compressive sensing, Proc. ICASSP, 2008 S. Chen, D. Donoho, M. Saunders, Atomic decomposition by Basis Pursuit, SIAM Review, vol. 43, P. Combettes, V. Wajs, Signal recovery by proximal forward-backward splitting, SIAM Journal Multiscale Modeling and Simulation, vol. 4, J. Eckstein, D. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, vol. 5, M. Figueiredo and J. Bioucas-Dias, Restoration of Poissonian images using alternating direction optimization, IEEE Transactions on Image Processing, vol. 19, M. Figueiredo, R. Nowak, S. Wright, Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems, IEEE Journal of Selected Topics in Signal Processing, vol. 1, 2007.
127 References D. Gabay, B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite-element approximations, Computers and Mathematics with Application, vol. 2, R. Glowinski, A. Marroco, Sur l approximation, par elements finis d ordre un, et la resolution, par penalisation-dualité d une classe de problemes de Dirichlet non lineares, Rev. Française d Automatique, Y. Gordon, On Milman's inequality and random subspaces which escape through a mesh in R n, in Geometric Aspects of Functional Analysis, Springer, E. Hale, W. Yin, Y. Zhang, Fixed-point continuation for l1-minimization: Methodology and convergence, SIAM Journal on Optimization, vol. 19, N. Halko, P.-G. Martinsson, J. Tropp, Finding structure with randomness: stochastic algorithms for constructing approximate matrix decompositions, SIAM Review, vol. 53, C. Hegde, M. Duarte, V. Cevher, Compressive sensing recovery of spike trains using a structured sparsity model, Proceedings of SPARS'09, Saint-Malo, France, M. Hestenes, Multiplier and gradient methods, Journal of Optimazion Theory andapplications, vol. 4, A. Kyrillidis, V. Cevher, Recipes for hard thresholding methods, Tech. Rep., EPFL, J. Lee, V. Mirrokni, V. Nagarajan, M. Sviridenko, Non-monotone submodular maximization under matroid and knapsack constraints, Proc. 41st Annual ACM Symposium on Theory of Computing, Bethesda, MD, A. Lewis, J. Malick, Alternating projections on manifolds, Math. of Operations Research, vol. 33, D. Lorenz, Constructing test instances for basis pursuit denoising, submitted, N. Meinshausen, P. Bühlmann, High-dimensional graphs and variable selection with the lasso, The Annals of Statistics, vol. 34, pp , J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Mathematiques de France, vol. 93, 1965.
128 References G. Nemhauser, L. Wolsey, Integer and combinatorial optimization, Wiley, S. Osher, M. Burger, D. Goldfarb, J. Xu, W. Yin, An iterative regularization method for total variation-based image restoration, SIAM Journal on Multiscale Modeling and Simulation, vol. 4, Y. Plan, Compressed sensing, sparse approximation, low-rank matrix estimation, PhD Thesis, Caltech, 2011 M. Powell, A method for nonlinear constraints in minimization problems, in Optimization, Academic Press, H. Raguet, J. Fadili, G. Peyré, Generalized Forward-Backward splitting, Tech. report, Hal , S. Setzer, G. Steidl, T. Teuber, Deblurring Poissonian images by split Bregman techniques, Journal of Visual Communication and Image Representation, K.-C. Toh, S. Yun, An accelerated proximal gradient algorithm for nuclear norm regularized least squares Problems, Pacific Journal of Optimization, vol. 6, A. Waters, A. Sankaranarayanan, R. Baraniuk, SpaRCS: recovering low-rank and sparse matrices from compressive measurements, Neural Information Processing Systems, S. Wright, R. Nowak, M. Figueiredo, Sparse reconstruction by separable approximation, IEEE Transactions on Signal Processing, vol. 57, W. Yin, S. Osher, D. Goldfarb, J. Darbon, Bregman iterative algorithms for l1-minimization with applications to compressed sensing, SIAM Journal on Imaging Science, vol. 1, T. Zhou, D. Tao, Godec: randomized low-rank & sparse matrix decomposition in noisy case, Proc. International Conference on Machine Learning, Bellevue, WA, 2011.
More informationRobust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds
Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.
More informationCompressed Sensing and Neural Networks
and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications
More informationThe convex algebraic geometry of rank minimization
The convex algebraic geometry of rank minimization Pablo A. Parrilo Laboratory for Information and Decision Systems Massachusetts Institute of Technology International Symposium on Mathematical Programming
More informationABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng
ABSTRACT Recovering Data with Group Sparsity by Alternating Direction Methods by Wei Deng Group sparsity reveals underlying sparsity patterns and contains rich structural information in data. Hence, exploiting
More informationADMM in Imaging Inverse Problems: Non-Periodic and Blind Deconvolution
ADMM in Imaging Inverse Problems: Non-Periodic and Blind Deconvolution Mário A. T. Figueiredo Instituto Superior Técnico, University of Lisbon, Portugal Instituto de Telecomunicações Lisbon, Portugal Joint
More informationA Dykstra-like algorithm for two monotone operators
A Dykstra-like algorithm for two monotone operators Heinz H. Bauschke and Patrick L. Combettes Abstract Dykstra s algorithm employs the projectors onto two closed convex sets in a Hilbert space to construct
More informationLeast Sparsity of p-norm based Optimization Problems with p > 1
Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from
More informationTopographic Dictionary Learning with Structured Sparsity
Topographic Dictionary Learning with Structured Sparsity Julien Mairal 1 Rodolphe Jenatton 2 Guillaume Obozinski 2 Francis Bach 2 1 UC Berkeley 2 INRIA - SIERRA Project-Team San Diego, Wavelets and Sparsity
More informationarxiv: v1 [math.oc] 1 Jul 2016
Convergence Rate of Frank-Wolfe for Non-Convex Objectives Simon Lacoste-Julien INRIA - SIERRA team ENS, Paris June 8, 016 Abstract arxiv:1607.00345v1 [math.oc] 1 Jul 016 We give a simple proof that the
More informationMultipath Matching Pursuit
Multipath Matching Pursuit Submitted to IEEE trans. on Information theory Authors: S. Kwon, J. Wang, and B. Shim Presenter: Hwanchol Jang Multipath is investigated rather than a single path for a greedy
More informationFrist order optimization methods for sparse inverse covariance selection
Frist order optimization methods for sparse inverse covariance selection Katya Scheinberg Lehigh University ISE Department (joint work with D. Goldfarb, Sh. Ma, I. Rish) Introduction l l l l l l The field
More informationRandom projections. 1 Introduction. 2 Dimensionality reduction. Lecture notes 5 February 29, 2016
Lecture notes 5 February 9, 016 1 Introduction Random projections Random projections are a useful tool in the analysis and processing of high-dimensional data. We will analyze two applications that use
More informationLecture 5 : Projections
Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More informationAlternating Direction Optimization for Imaging Inverse Problems
Alternating Direction Optimization for Imaging Inverse Problems Mário A. T. Figueiredo Instituto Superior Técnico, and Instituto de Telecomunicações Technical University of Lisbon PORTUGAL Joint work with:
More informationPreconditioning via Diagonal Scaling
Preconditioning via Diagonal Scaling Reza Takapoui Hamid Javadi June 4, 2014 1 Introduction Interior point methods solve small to medium sized problems to high accuracy in a reasonable amount of time.
More informationThe Pros and Cons of Compressive Sensing
The Pros and Cons of Compressive Sensing Mark A. Davenport Stanford University Department of Statistics Compressive Sensing Replace samples with general linear measurements measurements sampled signal
More informationMinimizing the Difference of L 1 and L 2 Norms with Applications
1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:
More informationGreedy Sparsity-Constrained Optimization
Greedy Sparsity-Constrained Optimization Sohail Bahmani, Petros Boufounos, and Bhiksha Raj 3 sbahmani@andrew.cmu.edu petrosb@merl.com 3 bhiksha@cs.cmu.edu Department of Electrical and Computer Engineering,
More informationWHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????
DUALITY WHY DUALITY? No constraints f(x) Non-differentiable f(x) Gradient descent Newton s method Quasi-newton Conjugate gradients etc???? Constrained problems? f(x) subject to g(x) apple 0???? h(x) =0
More informationRanking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization
Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization Jialin Dong ShanghaiTech University 1 Outline Introduction FourVignettes: System Model and Problem Formulation Problem Analysis
More informationConstrained optimization
Constrained optimization DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Compressed sensing Convex constrained
More information