Penalty Decomposition Methods for Rank and $\ell_0$-Norm Minimization


1 IPAM, October 12, 2010 p. 1/54
Penalty Decomposition Methods for Rank and $\ell_0$-Norm Minimization
Zhaosong Lu, Simon Fraser University
Joint work with Yong Zhang (Simon Fraser)

2 IPAM, October 12, 2010 p. 2/54
Outline of Talk
- Rank and $\ell_0$-norm minimization problems
- Technical preliminaries
- PD methods for rank minimization
- PD methods for $\ell_0$-norm minimization
- Numerical results

3 IPAM, October 12, 2010 p. 3/54
Rank minimization
Rank minimization:
$\min_X \{ f(X) : \mathrm{rank}(X) \le r,\ X \in \mathcal{X} \cap \Omega \}$,
$\min_X \{ f(X) + \nu\,\mathrm{rank}(X) : X \in \mathcal{X} \cap \Omega \}$
for some $r, \nu \ge 0$, where $\mathcal{X}$ is a closed convex set, $\Omega$ is a closed unitarily invariant set in $\mathbb{R}^{m \times n}$, and $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a continuously differentiable function.
Applications: combinatorial optimization; nonconvex QP; image recovery; nearest low-rank correlation matrix; etc.

4 IPAM, October 12, 2010 p. 4/54
$\ell_0$-norm minimization
$\ell_0$-norm minimization:
$\min_x \{ f(x) : \|x_J\|_0 \le r,\ x \in \mathcal{X} \}$,
$\min_x \{ f(x) + \nu \|x_J\|_0 : x \in \mathcal{X} \}$
for some integer $r \ge 0$ and $\nu \ge 0$ controlling the sparsity of the solution, where $\mathcal{X}$ is a closed convex set in $\mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, and $\|x_J\|_0$ denotes the cardinality of the subvector formed by the entries of $x$ indexed by $J$.
Applications: compressed sensing; sparse logistic regression; sparse inverse covariance selection; etc.

5 IPAM, October 12, 2010 p. 5/54
Technical preliminaries
Proposition: Let $\|\cdot\|$ be a unitarily invariant norm on $\mathbb{R}^{m \times n}$, and let $F : \mathbb{R}^{m \times n} \to \mathbb{R}$ be a unitarily invariant function. Suppose that $\mathcal{X} \subseteq \mathbb{R}^{m \times n}$ is a unitarily invariant set. Let $A \in \mathbb{R}^{m \times n}$ be given, $q = \min(m,n)$, and let $\phi$ be a non-decreasing function on $[0,\infty)$. Suppose that $U \Sigma(A) V^T$ is the singular value decomposition of $A$. Then $X^* = U D(x^*) V^T$ is an optimal solution of the problem
$\min\ F(X) + \phi(\|X - A\|)$ s.t. $X \in \mathcal{X}$,
where $x^* \in \mathbb{R}^q$ is an optimal solution of the problem
$\min\ F(D(x)) + \phi(\|D(x) - \Sigma(A)\|)$ s.t. $D(x) \in \mathcal{X}$.

6 IPAM, October 12, 2010 p. 6/54
Technical preliminaries (cont'd)
Corollary 1: Let $\nu \ge 0$ and $A \in \mathbb{R}^{m \times n}$ be given, and let $q = \min(m,n)$. Suppose that $\mathcal{X} \subseteq \mathbb{R}^{m \times n}$ is a unitarily invariant set, and $U \Sigma(A) V^T$ is the singular value decomposition of $A$. Then $X^* = U D(x^*) V^T$ is an optimal solution of the problem
$\min\{ \nu\,\mathrm{rank}(X) + \tfrac12 \|X - A\|_F^2 : X \in \mathcal{X} \}$,
where $x^* \in \mathbb{R}^q$ is an optimal solution of the problem
$\min\{ \nu \|x\|_0 + \tfrac12 \|x - \sigma(A)\|_2^2 : D(x) \in \mathcal{X} \}$.
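In the unconstrained case $\mathcal{X} = \mathbb{R}^{m \times n}$, Corollary 1 amounts to hard-thresholding the singular values of $A$ at $\sqrt{2\nu}$. A minimal numerical sketch of that special case (an illustration added here, not code from the talk):

```python
import numpy as np

def rank_prox(A, nu):
    """Corollary 1 with X = R^{m x n}: minimize nu*rank(X) + 0.5*||X - A||_F^2
    by hard-thresholding the singular values of A at sqrt(2*nu)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_kept = np.where(0.5 * s**2 > nu, s, 0.0)   # keep sigma_i only if it beats the rank penalty
    return (U * s_kept) @ Vt

A = np.random.randn(8, 5)
print(np.linalg.matrix_rank(rank_prox(A, nu=0.5)))
```

Closed-form subproblem solvers of this kind are what the PD methods below rely on for their $Y$-updates.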

7 IPAM, October 12, 2010 p. 7/54
Technical preliminaries (cont'd)
Corollary 2: Let $r \ge 0$ and $A \in \mathbb{R}^{m \times n}$ be given, and let $q = \min(m,n)$. Suppose that $\mathcal{X} \subseteq \mathbb{R}^{m \times n}$ is a unitarily invariant set, and $U \Sigma(A) V^T$ is the singular value decomposition of $A$. Then $X^* = U D(x^*) V^T$ is an optimal solution of the problem
$\min\{ \|X - A\|_F : \mathrm{rank}(X) \le r,\ X \in \mathcal{X} \}$,
where $x^* \in \mathbb{R}^q$ is an optimal solution of the problem
$\min\{ \|x - \sigma(A)\|_2 : \|x\|_0 \le r,\ D(x) \in \mathcal{X} \}$.

8 IPAM, October 12, 2010 p. 8/54
Technical preliminaries (cont'd)
Corollary 3: Let $\nu \ge 0$ and $A \in \mathbb{R}^{m \times n}$ be given, and let $q = \min(m,n)$. Suppose that $U \Sigma(A) V^T$ is the singular value decomposition of $A$. Then $X^* = U D(x^*) V^T$ is an optimal solution of the problem
$\min_X\ \nu \|X\|_* + \tfrac12 \|X - A\|_F^2$,
where $x^* \in \mathbb{R}^q$ is an optimal solution of the problem
$\min_x\ \nu \|x\|_1 + \tfrac12 \|x - \sigma(A)\|_2^2$.

9 IPAM, October 12, 2010 p. 9/54
Technical preliminaries (cont'd)
Corollary 4: Let $r \ge 0$ and $A \in \mathbb{R}^{m \times n}$ be given, and let $q = \min(m,n)$. Suppose that $U \Sigma(A) V^T$ is the singular value decomposition of $A$. Then $X^* = U D(x^*) V^T$ is an optimal solution of the problem
$\min\{ \|X - A\|_F : \|X\|_* \le r \}$,
where $x^* \in \mathbb{R}^q$ is an optimal solution of the problem
$\min\{ \|x - \sigma(A)\|_2 : \|x\|_1 \le r \}$.

10 IPAM, October 12, 2010 p. 10/54
Technical preliminaries (cont'd)
Proposition: Let $\mathcal{X}_i \subseteq \mathbb{R}$ and $\phi_i : \mathbb{R} \to \mathbb{R}$ for $i = 1,\dots,n$ be given. Suppose that $r$ is a positive integer and $0 \in \mathcal{X}_i$ for all $i$. Consider the following $\ell_0$-norm minimization problem:
$\min\big\{ \phi(x) := \sum_{i=1}^n \phi_i(x_i) : \|x\|_0 \le r,\ x \in \mathcal{X}_1 \times \cdots \times \mathcal{X}_n \big\}$.  (1)
Let $\tilde{x}_i \in \mathrm{Argmin}\{ \phi_i(x_i) : x_i \in \mathcal{X}_i \}$ and let $I^* \subseteq \{1,\dots,n\}$ be the index set corresponding to the $r$ largest values of $\{v_i^*\}_{i=1}^n$, where $v_i^* = \phi_i(0) - \phi_i(\tilde{x}_i)$ for $i = 1,\dots,n$. Then $x^*$ is an optimal solution of problem (1), where $x^*$ is defined as follows:
$x_i^* = \tilde{x}_i$ if $i \in I^*$; $x_i^* = 0$ otherwise, for $i = 1,\dots,n$.
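A short code sketch of this proposition (illustration only; the quadratic $\phi_i$ in the demo is just one admissible choice, with $0 \in \mathcal{X}_i = \mathbb{R}$):

```python
import numpy as np

def l0_constrained_separable(phi, x_tilde, r):
    """Slide-10 proposition: phi[i] is the i-th separable term (a callable), x_tilde[i] a
    minimizer of phi[i] over X_i; keep the r components with the largest savings
    v_i = phi_i(0) - phi_i(x_tilde_i) and set the rest to zero."""
    v = np.array([phi[i](0.0) - phi[i](x_tilde[i]) for i in range(len(x_tilde))])
    keep = np.argsort(-v)[:r]              # indices of the r largest savings
    x_star = np.zeros(len(x_tilde))
    x_star[keep] = np.asarray(x_tilde)[keep]
    return x_star

# demo with phi_i(t) = 0.5*(t - a_i)^2 and X_i = R, so x_tilde_i = a_i
a = np.array([3.0, -0.2, 1.5, 0.1, -2.5])
phi = [lambda t, ai=ai: 0.5 * (t - ai) ** 2 for ai in a]
print(l0_constrained_separable(phi, a, r=2))   # keeps the two entries of largest magnitude
```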

11 IPAM, October 12, 2010 p. 11/54
Technical preliminaries (cont'd)
Proposition: Let $\mathcal{X}_i \subseteq \mathbb{R}$ and $\phi_i : \mathbb{R} \to \mathbb{R}$ for $i = 1,\dots,n$ be given. Suppose that $\nu \ge 0$ and $0 \in \mathcal{X}_i$ for all $i$. Consider the following $\ell_0$-norm minimization problem:
$\min\big\{ \nu \|x\|_0 + \sum_{i=1}^n \phi_i(x_i) : x \in \mathcal{X}_1 \times \cdots \times \mathcal{X}_n \big\}$.  (2)
Let $\tilde{x}_i \in \mathrm{Argmin}\{ \phi_i(x_i) : x_i \in \mathcal{X}_i \}$ and $v_i^* = \phi_i(0) - \nu - \phi_i(\tilde{x}_i)$ for $i = 1,\dots,n$. Then $x^*$ is an optimal solution of problem (2), where $x^*$ is defined as follows:
$x_i^* = \tilde{x}_i$ if $v_i^* \ge 0$; $x_i^* = 0$ otherwise, for $i = 1,\dots,n$.
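The penalized counterpart reduces to the same componentwise test; a sketch under the same illustrative quadratic $\phi_i$:

```python
import numpy as np

def l0_penalized_separable(phi, x_tilde, nu):
    """Slide-11 proposition: keep x_tilde_i whenever it pays for the l0 penalty,
    i.e. whenever v_i = phi_i(0) - nu - phi_i(x_tilde_i) >= 0."""
    v = np.array([phi[i](0.0) - nu - phi[i](x_tilde[i]) for i in range(len(x_tilde))])
    return np.where(v >= 0, np.asarray(x_tilde, dtype=float), 0.0)

a = np.array([3.0, -0.2, 1.5, 0.1, -2.5])
phi = [lambda t, ai=ai: 0.5 * (t - ai) ** 2 for ai in a]
print(l0_penalized_separable(phi, a, nu=1.0))  # here this is hard thresholding at |a_i| >= sqrt(2*nu)
```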

12 IPAM, October 12, 2010 p. 12/54
PD methods for rank minimization
Consider
$\min_X \{ f(X) : \mathrm{rank}(X) \le r,\ X \in \mathcal{X} \cap \Omega \}$,  (3)
$\min_X \{ f(X) + \nu\,\mathrm{rank}(X) : X \in \mathcal{X} \cap \Omega \}$  (4)
for some $r, \nu \ge 0$, where $\mathcal{X}$ is a closed convex set, $\Omega$ is a closed unitarily invariant set in $\mathbb{R}^{m \times n}$, and $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a continuously differentiable function.
Assumption: Problems (3) and (4) are feasible, and moreover, at least one feasible solution, denoted by $X^{\mathrm{feas}}$, is known.

13 IPAM, October 12, 2010 p. 13/54
PD methods for rank minimization
$\min_X \{ f(X) : \mathrm{rank}(X) \le r,\ X \in \mathcal{X} \cap \Omega \}$
$\Longleftrightarrow\ \min_{X,Y} \{ f(X) : X - Y = 0,\ X \in \mathcal{X},\ Y \in \mathcal{Y} \}$,  (5)
where $\mathcal{Y} := \{ Y \in \Omega : \mathrm{rank}(Y) \le r \}$.
Given $\varrho > 0$, define:
$Q_\varrho(X,Y) := f(X) + \tfrac{\varrho}{2} \|X - Y\|_F^2$,
$\tilde{Q}_\varrho(X,U,V) := Q_\varrho(X, UV)$ for all $X \in \mathbb{R}^{m \times n}$, $U \in \mathbb{R}^{m \times r}$, $V \in \mathbb{R}^{r \times n}$.

14 IPAM, October 12, 2010 p. 14/54
PD method for (5) (asymmetric matrices): Let $\{\epsilon_k\}$ be a positive decreasing sequence. Let $\varrho_0 > 0$, $\sigma > 1$ be given. Choose an arbitrary $Y_0^0 \in \mathcal{Y}$ and a constant $\Upsilon \ge \max\{ f(X^{\mathrm{feas}}),\ \min_{X \in \mathcal{X}} Q_{\varrho_0}(X, Y_0^0) \}$. Set $k = 0$.
1) Set $l = 0$ and apply the BCD method to find an approximate solution $(X^k, Y^k) \in \mathcal{X} \times \mathcal{Y}$ for the penalty subproblem
$\min\{ Q_{\varrho_k}(X,Y) : X \in \mathcal{X},\ Y \in \mathcal{Y} \}$  (6)
by performing steps 1a)-1d):
1a) Solve $X_{l+1}^k \in \mathrm{Arg}\min_{X \in \mathcal{X}} Q_{\varrho_k}(X, Y_l^k)$.
1b) Solve $Y_{l+1}^k \in \mathrm{Arg}\min_{Y \in \mathcal{Y}} Q_{\varrho_k}(X_{l+1}^k, Y)$.
1c) Set $(X^k, Y^k) := (X_{l+1}^k, Y_{l+1}^k)$. If $(X^k, Y^k)$ satisfies
$\mathrm{dist}\big( -\nabla_X Q_{\varrho_k}(X^k, Y^k),\ N_{\mathcal{X}}(X^k) \big) \le \epsilon_k$,  (7)
$\| \nabla_U \tilde{Q}_{\varrho_k}(X^k, U^k, V^k) + Z_Y^k (V^k)^T \|_F \le \epsilon_k$,  (8)
$\| \nabla_V \tilde{Q}_{\varrho_k}(X^k, U^k, V^k) + (U^k)^T Z_Y^k \|_F \le \epsilon_k$  (9)

15 IPAM, October 12, 2010 p. 15/54
for some $Z_Y^k \in N_\Omega(Y^k)$, $U^k \in \mathbb{R}^{m \times r}$, $V^k \in \mathbb{R}^{r \times n}$ such that
$(U^k)^T U^k = I$, $Y^k = U^k V^k$,  (10)
then go to step 2).
1d) Set $l \leftarrow l + 1$ and go to step 1a).
2) Set $\varrho_{k+1} := \sigma \varrho_k$.
3) If $\min_{X \in \mathcal{X}} Q_{\varrho_{k+1}}(X, Y^k) > \Upsilon$, set $Y_0^{k+1} := X^{\mathrm{feas}}$. Otherwise, set $Y_0^{k+1} := Y^k$.
4) Set $k \leftarrow k + 1$ and go to step 1).
end
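To make the scheme concrete, the following sketch instantiates steps 1a)-1b) and 2) for a matrix-completion-type problem, with the illustrative choices $f(X) = \tfrac12\|P_\Theta(X - M)\|_F^2$ (a least-squares fit to the observed entries rather than the talk's hard equality constraints), $\mathcal{X} = \Omega = \mathbb{R}^{m \times n}$, and $\mathcal{Y} = \{Y : \mathrm{rank}(Y) \le r\}$; fixed iteration counts stand in for the $\epsilon_k$-based tests, so this is not the talk's exact implementation:

```python
import numpy as np

def pd_rank_constrained(M, mask, r, rho=0.1, sigma=10.0, outer=15, inner=30):
    """Sketch of steps 1a)-1b) and 2) for f(X) = 0.5*||P_Theta(X - M)||_F^2,
    X = Omega = R^{m x n}, Y = {Y : rank(Y) <= r}; mask is the 0/1 indicator of Theta."""
    Y = mask * M                                 # start from the observed entries, zeros elsewhere
    for _ in range(outer):
        for _ in range(inner):
            # 1a) X-step: entrywise minimizer of 0.5*mask*(X - M)^2 + (rho/2)*(X - Y)^2
            X = (mask * M + rho * Y) / (mask + rho)
            # 1b) Y-step: project X onto {rank <= r} by a truncated SVD
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            Y = (U[:, :r] * s[:r]) @ Vt[:r]
        rho *= sigma                             # 2) increase the penalty parameter
    return Y

m = n = 40
M = np.random.randn(m, 5) @ np.random.randn(5, n)        # rank-5 ground truth
mask = (np.random.rand(m, n) < 0.5).astype(float)        # roughly 50% of the entries observed
X_rec = pd_rank_constrained(M, mask, r=5)
print(np.linalg.norm(X_rec - M) / np.linalg.norm(M))     # relative recovery error
```

Both block updates have closed forms here: the $X$-step is an entrywise weighted average, and the $Y$-step is a truncated SVD, as in Corollary 2 with $\mathcal{X} = \mathbb{R}^{m \times n}$.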

16 IPAM, October 12, 2010 p. 16/54
Theorem (outer iterations): Assume that $\epsilon_k \to 0$. Let $\{(X^k, Y^k)\}$ be generated by the above PD method, and let $\{(U^k, V^k, Z_Y^k)\}$ be the associated sequence satisfying (7)-(10). Suppose that the level set $\mathcal{X}_\Upsilon := \{ X \in \mathcal{X} : f(X) \le \Upsilon \}$ is compact. Then:
(a) $\{(X^k, Y^k, U^k, V^k)\}$ is bounded;
(b) Suppose that $\{(X^k, Y^k, U^k, V^k)\}_{k \in K}$ converges to $(X^*, Y^*, U^*, V^*)$. Then $(X^*, Y^*)$ is a feasible point of problem (5). Moreover, if
$\big\{ \big( d_X - d_U V^* - U^* d_V,\ d_U V^* + U^* d_V - d_Y \big) : d_X \in T_{\mathcal{X}}(X^*),\ d_U \in \mathbb{R}^{m \times r},\ d_V \in \mathbb{R}^{r \times n},\ d_Y \in T_\Omega(X^*) \big\} = \mathbb{R}^{m \times n} \times \mathbb{R}^{m \times n}$
holds, then $\{(Z_X^k, Z_Y^k)\}_{k \in K}$ is bounded, where $Z_X^k := \varrho_k (X^k - Y^k)$, and each cluster point $(Z_X^*, Z_Y^*)$ of $\{(Z_X^k, Z_Y^k)\}_{k \in K}$ together with $(X^*, U^*, V^*)$ satisfies
$-\nabla f(X^*) - Z_X^* \in N_{\mathcal{X}}(X^*)$, $(Z_X^* - Z_Y^*)(V^*)^T = 0$, $(U^*)^T (Z_X^* - Z_Y^*) = 0$, $X^* - U^* V^* = 0$, $Z_Y^* \in N_\Omega(X^*)$.

17 IPAM, October 12, 2010 p. 17/54
PD methods for rank minimization (cont'd)
Remark: The above cluster point $(X^*, U^*, V^*, Z_X^*, Z_Y^*)$ satisfies the KKT conditions of the following reformulation of (5) (or, equivalently, (3)):
$\min_{X,U,V} \{ f(X) : X - UV = 0,\ UV \in \Omega,\ X \in \mathcal{X},\ U \in \mathbb{R}^{m \times r},\ V \in \mathbb{R}^{r \times n} \}$.
Theorem (inner iterations): Suppose that the following condition holds for any $\bar{U} \in \mathbb{R}^{m \times r}$, $\bar{V} \in \mathbb{R}^{r \times n}$ such that $\bar{U}^T \bar{U} = I$ and $\bar{Y} = \bar{U} \bar{V} \in \Omega$:
$\{ d_U \bar{V} + \bar{U} d_V - d_Y : d_U \in \mathbb{R}^{m \times r},\ d_V \in \mathbb{R}^{r \times n},\ d_Y \in T_\Omega(\bar{Y}) \} = \mathbb{R}^{m \times n}$.
Then the approximate solution $(X^k, Y^k) \in \mathcal{X} \times \mathcal{Y}$ for problem (6) satisfying (7)-(10) can be found by the BCD method described in steps 1a)-1d) within a finite number of iterations.

18 IPAM, October 12, 2010 p. 18/54
PD methods for rank minimization (cont'd)
Penalty decomposition method for (4): Let $\varrho_0 > 0$, $\sigma > 1$ be given, and let $P_\varrho(X,Y) := f(X) + \nu\,\mathrm{rank}(Y) + \tfrac{\varrho}{2}\|X - Y\|_F^2$. Choose an arbitrary $Y_0^0 \in \Omega$ and a constant $\Upsilon$ such that
$\Upsilon \ge \max\{ f(X^{\mathrm{feas}}) + \nu\,\mathrm{rank}(X^{\mathrm{feas}}),\ \min_{X \in \mathcal{X}} P_{\varrho_0}(X, Y_0^0) \}$. Set $k = 0$.
1) Set $l = 0$ and apply the BCD method to find an approximate solution $(X^k, Y^k) \in \mathcal{X} \times \Omega$ for the penalty subproblem
$\min\{ P_{\varrho_k}(X,Y) : X \in \mathcal{X},\ Y \in \Omega \}$
by performing steps 1a)-1c):
1a) Solve $X_{l+1}^k \in \mathrm{Arg}\min_{X \in \mathcal{X}} P_{\varrho_k}(X, Y_l^k)$.
1b) Solve $Y_{l+1}^k \in \mathrm{Arg}\min_{Y \in \Omega} P_{\varrho_k}(X_{l+1}^k, Y)$.
1c) Set $l \leftarrow l + 1$ and go to step 1a).
2) Set $\varrho_{k+1} := \sigma \varrho_k$.
3) If $\min_{X \in \mathcal{X}} P_{\varrho_{k+1}}(X, Y^k) > \Upsilon$, set $Y_0^{k+1} := X^{\mathrm{feas}}$. Otherwise, set $Y_0^{k+1} := Y^k$.
4) Set $k \leftarrow k + 1$ and go to step 1).

19 IPAM, October 12, 2010 p. 19/54
PD methods for $\ell_0$-norm minimization
Consider the $\ell_0$-norm minimization problems:
$\min_x \{ f(x) : \|x_J\|_0 \le r,\ x \in \mathcal{X} \}$,  (11)
$\min_x \{ f(x) + \nu \|x_J\|_0 : x \in \mathcal{X} \}$  (12)
for some integer $r \ge 0$ and $\nu \ge 0$ controlling the sparsity of the solution, where $\mathcal{X}$ is a closed convex set in $\mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, and $\|x_J\|_0$ denotes the cardinality of the subvector formed by the entries of $x$ indexed by $J$.
Assumption: Problems (11) and (12) are feasible, and moreover, at least one feasible solution, denoted by $x^{\mathrm{feas}}$, is known.

20 IPAM, October 12, 2010 p. 20/54
PD methods for $\ell_0$-norm minimization
For simplicity, assume $J = \{1,2,\dots,n\}$. Define
$\mathcal{X}_M = \{ D(x) : x \in \mathcal{X} \}$, $f_M(X) = f(d(X))$ for all $X \in \mathcal{D}^n$,
where $\mathcal{D}^n$ denotes the set of $n \times n$ diagonal matrices and $d(X)$ the vector of diagonal entries of $X$. Observe:
$\min_x \{ f(x) : \|x\|_0 \le r,\ x \in \mathcal{X} \}\ \Longleftrightarrow\ \min_X \{ f_M(X) : \mathrm{rank}(X) \le r,\ X \in \mathcal{X}_M \}$,  (13)
which can be suitably solved by the above PD method. Define
$\mathcal{Y}_M := \{ Y \in S^n : \mathrm{rank}(Y) \le r \}$,
$Q_\varrho(X,Y) := f_M(X) + \tfrac{\varrho}{2} \|X - Y\|_F^2$,
$\tilde{Q}_\varrho(X,U,D) := Q_\varrho(X, U D U^T)$ for all $X \in \mathcal{D}^n$, $U \in \mathbb{R}^{n \times r}$, $D \in \mathcal{D}^r$.

21 IPAM, October 12, 2010 p. 21/54
Penalty decomposition method for (13): Let $\{\epsilon_k\}$ be a positive decreasing sequence. Let $\varrho_0 > 0$, $\sigma > 1$ be given. Choose an arbitrary $Y_0^0 \in \mathcal{Y}_M$ and a constant $\Upsilon \ge \max\{ f(x^{\mathrm{feas}}),\ \min_{X \in \mathcal{X}_M} Q_{\varrho_0}(X, Y_0^0) \}$. Set $k = 0$.
1) Set $l = 0$ and apply the BCD method to find an approximate solution $(X^k, Y^k) \in \mathcal{X}_M \times \mathcal{Y}_M$ for the penalty subproblem
$\min\{ Q_{\varrho_k}(X,Y) : X \in \mathcal{X}_M,\ Y \in \mathcal{Y}_M \}$
by performing steps 1a)-1d):
1a) Solve $X_{l+1}^k \in \mathrm{Arg}\min_{X \in \mathcal{X}_M} Q_{\varrho_k}(X, Y_l^k)$.
1b) Solve $Y_{l+1}^k \in \mathrm{Arg}\min_{Y \in \mathcal{Y}_M} Q_{\varrho_k}(X_{l+1}^k, Y)$.
1c) Set $(X^k, Y^k) := (X_{l+1}^k, Y_{l+1}^k)$. If $(X^k, Y^k)$ satisfies
$\mathrm{dist}\big( -\nabla_X Q_{\varrho_k}(X^k, Y^k),\ N_{\mathcal{X}_M}(X^k) \big) \le \epsilon_k$,
$\| \nabla_U \tilde{Q}_{\varrho_k}(X^k, U^k, D^k) \|_F \le \epsilon_k$,  (14)
$\| \nabla_D \tilde{Q}_{\varrho_k}(X^k, U^k, D^k) \|_F \le \epsilon_k$

22 IPAM, October 12, 2010 p. 22/54
for some $U^k \in \mathbb{R}^{n \times r}$, $D^k \in \mathcal{D}^r$ such that
$(U^k)^T U^k = I$, $Y^k = U^k D^k (U^k)^T$,  (15)
then go to step 2).
1d) Set $l \leftarrow l + 1$ and go to step 1a).
2) Set $\varrho_{k+1} := \sigma \varrho_k$.
3) If $\min_{X \in \mathcal{X}_M} Q_{\varrho_{k+1}}(X, Y^k) > \Upsilon$, set $Y_0^{k+1} := D(x^{\mathrm{feas}})$. Otherwise, set $Y_0^{k+1} := Y^k$.
4) Set $k \leftarrow k + 1$ and go to step 1).
end

23 IPAM, October 12, 2010 p. 23/54
Theorem: Assume that $\epsilon_k \to 0$. Let $\{(X^k, Y^k, U^k, D^k)\}$ be generated by the above PD method satisfying (14) and (15). Suppose that $\mathcal{X}_\Upsilon := \{ X \in \mathcal{X}_M : f_M(X) \le \Upsilon \}$ is compact. Then:
(a) $\{(X^k, Y^k, U^k, D^k)\}$ is bounded;
(b) Suppose that $\{(X^k, Y^k, U^k, D^k)\}_{k \in K}$ converges to $(X^*, Y^*, U^*, D^*)$. Then $X^* = Y^*$ and $X^*$ is a feasible point of problem (13). Moreover, if the following condition holds:
$\{ d_X - d_U D^* (U^*)^T - U^* d_D (U^*)^T - U^* D^* d_U^T : d_X \in T_{\mathcal{X}_M}(X^*),\ d_U \in \mathbb{R}^{n \times r},\ d_D \in \mathcal{D}^r \} = S^n$,
then $\{Z^k\}_{k \in K}$ is bounded, where $Z^k := \varrho_k (X^k - Y^k)$, and each cluster point $Z^*$ of $\{Z^k\}_{k \in K}$ together with $(X^*, U^*, D^*)$ satisfies
$-\nabla f_M(X^*) - Z^* \in N_{\mathcal{X}_M}(X^*)$, $Z^* U^* D^* = 0$, $D\big( (U^*)^T Z^* U^* \big) = 0$, $X^* - U^* D^* (U^*)^T = 0$.

24 IPAM, October 12, 2010 p. 24/54
PD methods for $\ell_0$-norm minimization
Remark: The above cluster point $(X^*, U^*, D^*, Z^*)$ satisfies the KKT conditions of the following reformulation of (13) (or, equivalently, (11)):
$\min_{X,U,D} \{ f_M(X) : X - U D U^T = 0,\ X \in \mathcal{X}_M,\ U \in \mathbb{R}^{n \times r},\ D \in \mathcal{D}^r \}$.
Goal: Transform the above PD method into one involving vector operations only. Define
$\mathcal{Y} = \{ y \in \mathbb{R}^n : \|y\|_0 \le r \}$,
$q_\varrho(x,y) = f(x) + \tfrac{\varrho}{2} \|x - y\|_2^2$ for all $x, y \in \mathbb{R}^n$.

25 IPAM, October 12, 2010 p. 25/54
Penalty decomposition method for (11): Let $\{\epsilon_k\}$ be a positive decreasing sequence. Let $\varrho_0 > 0$, $\sigma > 1$ be given. Choose an arbitrary $y_0^0 \in \mathcal{Y}$ and a constant $\Upsilon \ge \max\{ f(x^{\mathrm{feas}}),\ \min_{x \in \mathcal{X}} q_{\varrho_0}(x, y_0^0) \}$. Set $k = 0$.
1) Set $l = 0$ and apply the BCD method to find an approximate solution $(x^k, y^k) \in \mathcal{X} \times \mathcal{Y}$ for the penalty subproblem
$\min\{ q_{\varrho_k}(x,y) : x \in \mathcal{X},\ y \in \mathcal{Y} \}$
by performing steps 1a)-1d):
1a) Solve $x_{l+1}^k \in \mathrm{Arg}\min_{x \in \mathcal{X}} q_{\varrho_k}(x, y_l^k)$.
1b) Solve $y_{l+1}^k \in \mathrm{Arg}\min_{y \in \mathcal{Y}} q_{\varrho_k}(x_{l+1}^k, y)$.
1c) Set $(x^k, y^k) := (x_{l+1}^k, y_{l+1}^k)$. If $(x^k, y^k)$ satisfies
$\mathrm{dist}\big( -\nabla_x q_{\varrho_k}(x^k, y^k),\ N_{\mathcal{X}}(x^k) \big) \le \epsilon_k$,
then go to step 2).
1d) Set $l \leftarrow l + 1$ and go to step 1a).

26 IPAM, October 12, 2010 p. 26/54
2) Set $\varrho_{k+1} := \sigma \varrho_k$.
3) If $\min_{x \in \mathcal{X}} q_{\varrho_{k+1}}(x, y^k) > \Upsilon$, set $y_0^{k+1} := x^{\mathrm{feas}}$. Otherwise, set $y_0^{k+1} := y^k$.
4) Set $k \leftarrow k + 1$ and go to step 1).
end
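A sketch of this vector PD scheme for the illustrative instance $f(x) = \tfrac12\|Ax - b\|_2^2$ with $\mathcal{X} = \mathbb{R}^n$ (the talk allows any smooth $f$ and closed convex $\mathcal{X}$; this $f$ and the fixed iteration counts are assumptions made only for the demo):

```python
import numpy as np

def pd_l0_least_squares(A, b, r, rho=0.1, sigma=10.0, outer=15, inner=30):
    """Sketch of the vector PD method for (11) with f(x) = 0.5*||Ax - b||_2^2 and X = R^n.
    x-step: ridge-type linear solve; y-step: keep the r largest-magnitude entries of x."""
    n = A.shape[1]
    y = np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(outer):
        for _ in range(inner):
            # 1a) x-step: minimize 0.5*||Ax - b||^2 + (rho/2)*||x - y||^2
            x = np.linalg.solve(AtA + rho * np.eye(n), Atb + rho * y)
            # 1b) y-step: project x onto {y : ||y||_0 <= r}
            y = np.zeros(n)
            keep = np.argsort(-np.abs(x))[:r]
            y[keep] = x[keep]
        rho *= sigma                             # 2) increase the penalty parameter
    return y

A = np.random.randn(60, 100)
x_true = np.zeros(100)
x_true[:5] = np.random.randn(5)
b = A @ x_true
print(np.linalg.norm(pd_l0_least_squares(A, b, r=5) - x_true))
```

The $y$-step is just the projection onto $\{y : \|y\|_0 \le r\}$, i.e. keeping the $r$ largest-magnitude entries of $x$, which is the slide-10 proposition with $\phi_i(y_i) = \tfrac{\varrho}{2}(y_i - x_i)^2$.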

27 IPAM, October 12, 2010 p. 27/54
Theorem: Assume that $\epsilon_k \to 0$. Let $\{(x^k, y^k)\}$ be generated by the above PD method, and let $J_k = \{j_1^k, \dots, j_r^k\}$ be a set of $r$ distinct indices such that $(y^k)_j = 0$ for all $j \notin J_k$. Suppose that $\mathcal{X}_\Upsilon := \{ x \in \mathcal{X} : f(x) \le \Upsilon \}$ is compact. Then:
(a) $\{(x^k, y^k)\}$ is bounded;
(b) Suppose $(x^*, y^*)$ is a cluster point of $\{(x^k, y^k)\}$. Then $x^* = y^*$ and $x^*$ is a feasible point of problem (11). Moreover, there exists a subsequence $K$ such that $\{(x^k, y^k)\}_{k \in K} \to (x^*, y^*)$ and $J_k = J$ for some index set $J$ when $k \in K$ is sufficiently large. Furthermore, if
$\{ d_x + d_d : d_x \in T_{\mathcal{X}}(x^*),\ d_d \in \mathbb{R}^n,\ (d_d)_j = 0\ \forall j \notin J \} = \mathbb{R}^n$
holds, then $\{ z^k := \varrho_k (x^k - y^k) \}_{k \in K}$ is bounded and each cluster point $z^*$ of $\{z^k\}_{k \in K}$ together with $x^*$ satisfies
$-\nabla f(x^*) - z^* \in N_{\mathcal{X}}(x^*)$, $z_j^* = 0\ \forall j \in J$.  (16)

28 IPAM, October 12, 2010 p. 28/54
Remark: The optimality condition (16) is generally stronger than a natural optimality condition for (11). Let $I^* = \{ j : x_j^* \ne 0 \}$. Suppose that $x^*$ is a local minimum of (11). Then $x^*$ is clearly a local minimum of
$\min_{x \in \mathcal{X}} \{ f(x) : x_j = 0\ \forall j \notin I^* \}$.
Assume that the constraint qualification
$\{ d_x + d_d : d_x \in T_{\mathcal{X}}(x^*),\ (d_d)_j = 0\ \forall j \notin I^* \} = \mathbb{R}^n$
holds at $x^*$. Then there exists $z^* \in \mathbb{R}^n$ such that
$-\nabla f(x^*) - z^* \in N_{\mathcal{X}}(x^*)$, $z_j^* = 0\ \forall j \in I^*$.  (17)
Clearly, when $I^* \subseteq J$, (16) is generally stronger than (17). For example, when $r = n$ and $J = \{1,\dots,n\}$, problem (11) reduces to $\min_x \{ f(x) : x \in \mathcal{X} \}$ and (16) becomes $-\nabla f(x^*) \in N_{\mathcal{X}}(x^*)$. But (17) clearly cannot guarantee this when $I^* \ne \{1,\dots,n\}$.

29 IPAM, October 12, 2010 p. 29/54
Matrix completion
Test I: recover a low-rank matrix $M \in \mathbb{R}^{m \times n}$ based on a subset of entries of $M$:
$\min_{X \in \mathbb{R}^{m \times n}} \mathrm{rank}(X)$ s.t. $X_{ij} = M_{ij}$, $(i,j) \in \Theta$,
where $\Theta$ is a subset of index pairs $(i,j)$.
Instances: randomly generate 50 copies of $M$ and $\Theta$ with $m = n = 40$, $p = 800$ for each $r = 1,\dots,10$ by the same procedure as described by Ma, Goldfarb and Chen (2008).

30 IPAM, October 12, 2010 p. 30/54
Matrix completion
Initialization: $X^{\mathrm{feas}} \in \mathbb{R}^{m \times n}$ such that $X^{\mathrm{feas}}_{ij} = M_{ij}$ for all $(i,j) \in \Theta$ and $X^{\mathrm{feas}}_{ij} = 0$ for all $(i,j) \notin \Theta$; $Y_0^0 = X^{\mathrm{feas}}$; $\varrho_0 = 0.1$ and $\sigma = 10$.
Inner termination criterion:
$\dfrac{|Q_{\varrho_k}(X_l^k, Y_l^k) - Q_{\varrho_k}(X_{l-1}^k, Y_{l-1}^k)|}{\max(|Q_{\varrho_k}(X_l^k, Y_l^k)|, 1)}$ below a prescribed tolerance.
Outer termination criterion: $\max_{ij} |X_{ij}^k - Y_{ij}^k| \le 10^{-5}$.

31 IPAM, October 12, 2010 p. 31/54
Matrix completion
Relative error:
$\mathrm{rel\_err} := \dfrac{\|X - M\|_F}{\|M\|_F}$,
where $X$ is an approximate recovery of $M$. As in Recht, Fazel and Parrilo (2007) and Candès and Recht (2008), we say a matrix $M$ is successfully recovered by $X$ if the relative error is below a prescribed threshold.
Remark: The recoverability of $M$ depends on two ratios:
SR = $p/(mn)$ (fraction of sampled entries),
FR = $r(m+n-r)/p$ (degrees of freedom of a rank-$r$ matrix per sample).
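The three quantities on this slide as code (illustration only):

```python
import numpy as np

def completion_ratios(X, M, p, r):
    """rel_err, SR and FR exactly as defined on this slide."""
    m, n = M.shape
    rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
    SR = p / (m * n)                  # fraction of entries that are sampled
    FR = r * (m + n - r) / p          # degrees of freedom of a rank-r matrix per sample
    return rel_err, SR, FR
```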

32 IPAM, October 12, 2010 p. 32/54
Matrix completion
Table 1: Numerical results for m = 40, n = 40 and p = 800.
Columns: Rank and FR for each problem; NS, rel_err and Time for each of FPCA, LMaFit and PD.

33 IPAM, October 12, 2010 p. 33/54
Matrix completion
Test II: recover a high-rank matrix $M \in \mathbb{R}^{n \times n}$, most of whose singular values are nearly zero, by a low-rank matrix based on a subset of entries $\{M_{ij}\}_{(i,j) \in \Theta}$.
Instances: randomly generate 50 instances for each sampling ratio varying from 0.5 to 0.9, with the singular values given by $\sigma_i = i^{-4}$ for all $i$.

34 IPAM, October 12, 2010 p. 34/54
Matrix completion
Table 2: Numerical results for n = 40 and $\sigma_i = i^{-4}$.
Columns: SR and Rank for each problem; Rank and rel_err for each of FPCA, LMaFit and PD.

35 IPAM, October 12, 2010 p. 35/54
Matrix completion
Test III (grayscale image inpainting problem): fill in the missing pixel values of the image at given pixel locations.
(a) original image; (b) rank-40 image; (c) 50% masked original image; (d) recovered image by PD.

36 IPAM, October 12, 2010 p. 36/54
Matrix completion
Test III (grayscale image inpainting problem, cont'd):
(e) 50% masked rank-40 image; (f) recovered image by PD; (g) 6.3% masked rank-40 image; (h) recovered image by PD.

37 IPAM, October 12, 2010 p. 37/54
Nearest low-rank correlation
Nearest low-rank correlation problem:
$\min_{X \in S^n} \tfrac12 \|X - C\|_F^2$ s.t. $\mathrm{diag}(X) = e$, $\mathrm{rank}(X) \le r$, $X \succeq 0$,
for some correlation matrix $C \in S_+^n$ and some integer $r \in [1, n]$.
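One way the PD splitting could be instantiated for this problem (a sketch under assumptions of my own: the $X$-block carries the objective and the diagonal constraint, the $Y$-block carries the PSD and rank constraints; the talk does not spell out these subproblem formulas):

```python
import numpy as np

def pd_low_rank_correlation(C, r, rho=0.1, sigma=10.0, outer=15, inner=30):
    """Sketch of a PD splitting for the nearest low-rank correlation problem:
    the X-block keeps the objective and diag(X) = e, the Y-block keeps {Y psd, rank(Y) <= r}."""
    n = C.shape[0]
    Y = np.eye(n)
    for _ in range(outer):
        for _ in range(inner):
            # X-step: entrywise minimizer of 0.5*||X - C||^2 + (rho/2)*||X - Y||^2 with unit diagonal
            X = (C + rho * Y) / (1.0 + rho)
            np.fill_diagonal(X, 1.0)
            # Y-step: nearest PSD matrix of rank at most r (keep the r largest nonnegative eigenvalues)
            w, Q = np.linalg.eigh((X + X.T) / 2)
            w = np.maximum(w, 0.0)
            w[:-r] = 0.0                          # eigh sorts ascending: zero all but the r largest
            Y = (Q * w) @ Q.T
        rho *= sigma
    return Y

B = np.random.randn(20, 20)
C = B @ B.T
d = np.sqrt(np.diag(C))
C = C / np.outer(d, d)                            # a random correlation matrix
print(np.linalg.matrix_rank(pd_low_rank_correlation(C, r=3), tol=1e-8))
```

Both subproblems are cheap here: the $X$-step is an entrywise average with the diagonal reset to one, and the $Y$-step is an eigenvalue truncation, in the spirit of the technical preliminaries above.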

38 IPAM, October 12, 2010 p. 38/54
Nearest low-rank correlation
Table 3: Comparison of Major and PD (Iter, Obj, Time for each method) on problems P1 with n = 100 and n = 500, for several rank bounds r.

39 IPAM, October 12, 2010 p. 39/54
Nearest low-rank correlation
Table 4: Comparison of Major and PD (Iter, Obj, Time for each method) on problems P2 with n = 100 and n = 500, for several rank bounds r.

40 IPAM, October 12, 2010 p. 40/54
Nearest low-rank correlation
Table 5: Comparison of Major and PD (Iter, Obj, Time for each method) on problems P3 with n = 100 and n = 500, for several rank bounds r.

41 IPAM, October 12, 2010 p. 41/54
Nearest low-rank correlation
Table 6: Comparison of Major and PD (Iter, Obj, Time for each method) on problems P4 with n = 100 and n = 500, for several rank bounds r.

42 IPAM, October 12, 2010 p. 42/54
Sparse logistic regression
Given $n$ samples $z_i$ with $p$ features and $n$ binary outcomes $b_i$, let $a_i = b_i z_i$ for $i = 1,\dots,n$. The average logistic loss function is
$l_{\mathrm{avg}}(v,w) := \sum_{i=1}^n \theta(w^T a_i + v b_i)/n$ for $v \in \mathbb{R}$ and $w \in \mathbb{R}^p$,
where $\theta$ is the logistic loss function $\theta(t) := \log(1 + \exp(-t))$.
The sparse logistic regression problem:
$\min_{v,w} \{ l_{\mathrm{avg}}(v,w) : \|w\|_0 \le r \}$,
where the integer $r \in [1, p]$ controls the sparsity of the solution.
The $\ell_1$-norm regularization problem:
$\min_{v,w}\ l_{\mathrm{avg}}(v,w) + \lambda \|w\|_1$,
where $\lambda \ge 0$ is a regularization parameter.
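A small sketch of the average logistic loss as defined on this slide (illustration only):

```python
import numpy as np

def l_avg(v, w, Z, b):
    """Average logistic loss from this slide: a_i = b_i * z_i and
    l_avg(v, w) = (1/n) * sum_i log(1 + exp(-(w^T a_i + v*b_i)))."""
    a = Z * b[:, None]                       # rows of Z are the samples z_i
    t = a @ w + v * b
    return np.mean(np.logaddexp(0.0, -t))    # numerically stable log(1 + exp(-t))

Z = np.random.randn(200, 10)
b = np.where(np.random.rand(200) < 0.5, 1.0, -1.0)
print(l_avg(0.0, np.zeros(10), Z, b))        # equals log(2) when (v, w) = 0
```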

43 IPAM, October 12, 2010 p. 43/54
Sparse logistic regression
Given any model variables $(v,w)$ and a sample vector $z \in \mathbb{R}^p$, the outcome predicted by $(v,w)$ for $z$ is given by
$\phi(z) = \mathrm{sgn}(w^T z + v)$, where $\mathrm{sgn}(t) = +1$ if $t > 0$ and $-1$ otherwise.
The error rate of $(v,w)$ for predicting the outcomes $b_1,\dots,b_n$:
$\mathrm{Error} := \Big( \sum_{i=1}^n \|\phi(z_i) - b_i\|_0 / n \Big) \times 100\%$.
Goal: compare the quality of solutions of similar sparsity obtained by our PD method ($\ell_0$) and the IPM method ($\ell_1$) of Kim et al. (2007).
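And the corresponding prediction rule and error rate (illustration only):

```python
import numpy as np

def error_rate(v, w, Z, b):
    """Error rate from this slide: phi(z) = sgn(w^T z + v) with sgn(t) = +1 if t > 0, -1 otherwise;
    return the percentage of samples whose predicted sign differs from b_i."""
    pred = np.where(Z @ w + v > 0, 1.0, -1.0)
    return 100.0 * np.mean(pred != b)
```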

44 IPAM, October 12, 2010 p. 44/54
Sparse logistic regression
Table 7: Computational results on three real data sets (Colon, Ionosphere, Advertisements).
Columns: Features p, Samples n, $\lambda/\lambda_{\max}$, r; $l_{\mathrm{avg}}$ and Error (%) for each of IPM and PD.

45 IPAM, October 12, 2010 p. 45/54
Sparse logistic regression
Table 8: Average computational time on six random problems.
Columns: size (n, p); time for r = 0.1p, 0.3p, 0.5p, 0.7p, 0.9p.

46 IPAM, October 12, 2010 p. 46/54
Sparse inverse covariance selection
Sparse inverse covariance selection:
$\max_{X \succ 0}\ \log\det X - \langle \Sigma, X \rangle$
s.t. $\sum_{(i,j) \in \bar{\Omega}} \|X_{ij}\|_0 \le r$, $X_{ij} = 0$, $(i,j) \in \Omega$,  (18)
where $\bar{\Omega} = \{ (i,j) : (i,j) \notin \Omega,\ i \ne j \}$, and $r \in [1, |\bar{\Omega}|]$ is an integer controlling the sparsity of the solution.
The $\ell_1$-norm regularization:
$\max_{X \succ 0}\ \log\det X - \langle \Sigma, X \rangle - \sum_{(i,j) \in \bar{\Omega}} \rho_{ij} |X_{ij}|$
s.t. $X_{ij} = 0$, $(i,j) \in \Omega$,  (19)
where $\{\rho_{ij}\}_{(i,j) \in \bar{\Omega}}$ is a set of regularization parameters.
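In a PD treatment of (18), one block update is a Euclidean projection onto the sparsity constraints; a sketch of just that projection step (an illustration with $\Omega$ and $\bar{\Omega}$ supplied as boolean masks, and with both the fixed zero pattern and the cardinality budget placed in this step for the purpose of the demo; not code from the talk):

```python
import numpy as np

def project_sparsity_pattern(X, Omega, Omega_bar, r):
    """Frobenius projection onto {Y : Y_ij = 0 on Omega, at most r nonzeros on Omega_bar}:
    zero the Omega entries, keep the r largest-magnitude entries indexed by Omega_bar,
    and leave the remaining (diagonal) entries untouched.  Omega and Omega_bar are boolean
    masks, assumed symmetric; r is assumed even so symmetric pairs are kept or dropped together."""
    Y = X.copy()
    Y[Omega] = 0.0
    idx = np.flatnonzero(Omega_bar)                 # flat indices of the free off-diagonal entries
    if idx.size > r:
        order = np.argsort(-np.abs(X.flat[idx]))    # sort those entries by decreasing magnitude
        Y.flat[idx[order[r:]]] = 0.0                # zero all but the r largest
    return Y
```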

47 IPAM, October 12, 2010 p. 47/54
Sparse inverse covariance selection
Test I (random data): compare the solution quality of the $\ell_0$ problem and its $\ell_1$ regularization.
Data: randomly generate $\Sigma$ and $\Omega$ in the same manner as in d'Aspremont et al. (2007) and L. (2008).
Normalized entropy loss:
$\mathrm{Loss} := \tfrac{1}{p} \big( \langle \Sigma_t, X \rangle - \log\det(\Sigma_t X) - p \big)$.
Apply the PPA (Wang, Sun and Toh (2009)) to solve the $\ell_1$ regularization problem with $\rho_{\bar{\Omega}} = 0.01$, $0.05$, $0.1$ and $0.5$ and obtain an approximate solution $X$; set $r = \sum_{(i,j) \in \bar{\Omega}} \|X_{ij}\|_0$ for the $\ell_0$-norm problem so that the solution given by the PD method is at least as sparse as $X$.
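The normalized entropy loss as code (illustration only):

```python
import numpy as np

def entropy_loss(Sigma_t, X):
    """Normalized entropy loss from this slide: (1/p) * ( <Sigma_t, X> - logdet(Sigma_t X) - p ),
    with <Sigma_t, X> = trace(Sigma_t X) computed as an elementwise sum (both matrices symmetric)."""
    p = X.shape[0]
    _, logdet = np.linalg.slogdet(Sigma_t @ X)
    return (np.sum(Sigma_t * X) - logdet - p) / p

S = np.random.randn(5, 5)
Sigma_t = S @ S.T + 5 * np.eye(5)
print(entropy_loss(Sigma_t, np.linalg.inv(Sigma_t)))   # approximately 0 at X = Sigma_t^{-1}
```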

48 IPAM, October 12, 2010 p. 48/54
Sparse inverse covariance selection
Table 9: Computational results for δ = 10%.
Columns: problem data (p, $\bar{\Omega}$, $\rho_{\bar{\Omega}}$, r); Likelihood, Loss and Time for each of PPA and PD.

49 IPAM, October 12, 2010 p. 49/54
Sparse inverse covariance selection
Table 10: Computational results for δ = 50%.
Columns: problem data (p, $\bar{\Omega}$, $\rho_{\bar{\Omega}}$, r); Likelihood, Loss and Time for each of PPA and PD.

50 IPAM, October 12, 2010 p. 50/54
Sparse inverse covariance selection
Table 11: Computational results for δ = 100%.
Columns: problem data (p, $\bar{\Omega}$, $\rho_{\bar{\Omega}}$, r); Likelihood, Loss and Time for each of PPA and PD.

51 IPAM, October 12, 2010 p. 51/54
Sparse inverse covariance selection
Test II (random data): compare the sparsity recovery ability of the $\ell_0$ problem and its $\ell_1$ regularization.
Data: set $p = 30$ and randomly generate the true and sample covariance matrices $\Sigma_t$ and $\Sigma$ in the same manner as in d'Aspremont (2007);
set $\Omega = \{ (i,j) : (\Sigma_t)^{-1}_{ij} = 0,\ |i - j| \ge 15 \}$;
set $\rho_{ij} = \rho_{\bar{\Omega}}$ for all $(i,j) \in \bar{\Omega}$, where $\rho_{\bar{\Omega}}$ is the smallest value such that the total number of nonzero off-diagonal entries of the approximate solution obtained by the PPA when applied to the $\ell_1$ problem equals $\sum_{(i,j) \in \bar{\Omega}} \|(\Sigma_t)^{-1}_{ij}\|_0$;
set $r = \sum_{(i,j) \in \bar{\Omega}} \|(\Sigma_t)^{-1}_{ij}\|_0$ for the $\ell_0$ problem.

52 IPAM, October 12, 2010 p. 52/54
Sparse inverse covariance selection
(a) Original inverse $(\Sigma_t)^{-1}$; (b) Noisy inverse $\Sigma^{-1}$; (c) Approximate solution of (19); (d) Approximate solution of (18).

53 IPAM, October 12, 2010 p. 53/54
Sparse inverse covariance selection
Test III (real data): compare the solution quality of the $\ell_0$ problem and its $\ell_1$ regularization.
Data: pre-process two gene expression data sets by the same procedure as described by Li and Toh (2010) to obtain $\Sigma$;
set $\Omega = \emptyset$ and $\rho_{ij} = \rho_{\bar{\Omega}}$ for some $\rho_{\bar{\Omega}} > 0$;
choose $r$ so that the solution given by the PD method when applied to the $\ell_0$ problem is at least as sparse as the one obtained by the PPA when applied to the $\ell_1$ problem.

54 IPAM, October 12, 2010 p. 54/54
Sparse inverse covariance selection
Table 12: Computational results on two real data sets (Lymph, Leukemia).
Columns: Genes p, Samples n, $\rho_{\bar{\Omega}}$, r; Likelihood, Loss and Time for each of PPA and PD.
