Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova


Mila Nikolova
CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France

SIAM Minisymposium on Trends in the Mathematics of Signal Processing and Imaging, organized by Willi Freeden and Zuhair Nashed
Joint Mathematics Meetings, January 6-9, 2016, Seattle

Outline

1. Two optimization problems involving the $\ell_0$ pseudo-norm
2. Joint optimality conditions for $(C_k)$ and $(R_\beta)$
3. Parameter values for equality between the optimal sets of $(C_k)$ and $(R_\beta)$
4. Equivalences between the global minimizers of $(C_k)$ and $(R_\beta)$: partial equivalence; quasi-complete equivalence
5. On the optimal values of $(C_k)$ and $(R_\beta)$
6. Cardinality of the optimal sets of $(C_k)$ and of $(R_\beta)$; uniqueness of the optimal solutions of $(C_k)$ and of $(R_\beta)$
7. Numerical tests
8. Conclusions and future directions

1. Two optimization problems involving the $\ell_0$ pseudo-norm

$A = (A_1, \dots, A_N) \in \mathbb{R}^{M \times N}$ (matrix), $N > M$; $d \in \mathbb{R}^M \setminus \{0\}$ (data).

A vector $\hat u \in \mathbb{R}^N$ is $k$-sparse if $\|\hat u\|_0 := \#\{i : \hat u[i] \neq 0\} \le k$.

One looks for a sparse vector $\hat u$ such that $A \hat u \approx d$. Two desirable optimization problems to find a sparse $\hat u$:

$(C_k)$: $\min_{u \in \mathbb{R}^N} \|Au - d\|_2^2$ subject to $\|u\|_0 \le k$ (constrained)

$(R_\beta)$: $\min_{u \in \mathbb{R}^N} F_\beta(u) = \|Au - d\|_2^2 + \beta \|u\|_0$, $\beta > 0$ (regularized)

These are NP-hard (combinatorial) nonconvex problems.

Our goal: clarify the relationship between the global minimizers of $(R_\beta)$ and $(C_k)$ [M. N., ACHA 2016].
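On toy instances, both problems can be solved exactly by enumerating supports. The following is a minimal sketch (not from the talk; the helper names `solve_Ck` and `solve_Rbeta` and the use of NumPy are my assumptions), in the spirit of the exhaustive combinatorial search used for the numerical tests of Section 7:

```python
import itertools
import numpy as np

def solve_Ck(A, d, k):
    # Global minimizer of (C_k): best least-squares fit over all supports of
    # size <= k.  Exponential in N -- for tiny illustrative problems only.
    M, N = A.shape
    best_val, best_u = float(np.linalg.norm(d) ** 2), np.zeros(N)  # u = 0 is feasible
    for size in range(1, min(k, N) + 1):
        for omega in itertools.combinations(range(N), size):
            u_om, *_ = np.linalg.lstsq(A[:, list(omega)], d, rcond=None)
            u = np.zeros(N)
            u[list(omega)] = u_om
            val = float(np.linalg.norm(A @ u - d) ** 2)
            if val < best_val:
                best_val, best_u = val, u
    return best_u, best_val        # (a global minimizer, c_k)

def solve_Rbeta(A, d, beta):
    # Global minimizer of (R_beta): F_beta(u) = ||Au - d||^2 + beta * ||u||_0,
    # again by enumerating every support.
    N = A.shape[1]
    best_val, best_u = float(np.linalg.norm(d) ** 2), np.zeros(N)  # F_beta(0) = ||d||^2
    for size in range(1, N + 1):
        for omega in itertools.combinations(range(N), size):
            u_om, *_ = np.linalg.lstsq(A[:, list(omega)], d, rcond=None)
            u = np.zeros(N)
            u[list(omega)] = u_om
            val = float(np.linalg.norm(A @ u - d) ** 2) + beta * np.count_nonzero(u)
            if val < best_val:
                best_val, best_u = val, u
    return best_u, best_val        # (a global minimizer, r_beta)
```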

Applications: signal and image processing, sparse coding, compression, dictionary building, compressive sensing, machine learning, model selection, classification...

$\|\cdot\|_0$ has served as a regularizer or as a penalty for a long time:
- Markov random fields, MAP: $F_\beta(u) = \|Au - d\|_2^2 + \beta \|Du\|_0$. Geman & Geman (1984), Besag (1986): labeled images, stochastic algorithms. Robini & Reissman (2012): global convergence / computation speed (!)
- Subset selection via $(R_\beta)$: numerous algorithms; cf. the textbook by Miller (2002).
- $(C_k)$: the natural sparse-coding constraint; also the best $k$-term approximation [DeVore 1998].

Sparse-land, $M < N$: strong assumptions on $A$ (RIP, spark, etc.) / various approximations. A huge number of papers with approximating algorithms, e.g. Haupt & Nowak (06), Blumensath & Davies (08), Tropp (10), Zhang et al. (12), Beck & Eldar (14). Typical assumptions: RIP or conditions relating $K$ to $\mathrm{spark}(A)$, plus others (e.g. bounds on $A$, etc.)

Important progress in solving problems $(C_k)$ and $(R_\beta)$. The numerical schemes share common points $\Rightarrow$ explore the relationship between their optimal sets.

The optimal values / the optimal solution sets of problems $(C_k)$ and $(R_\beta)$:

$(C_k)$: $c_k := \inf\{\|Au - d\|^2 : u \in \mathbb{R}^N,\ \|u\|_0 \le k\}$ and $\hat C_k := \{u \in \mathbb{R}^N : \|u\|_0 \le k,\ \|Au - d\|^2 = c_k\}$

$(R_\beta)$: $r_\beta := \inf\{F_\beta(u) : u \in \mathbb{R}^N\}$ and $\hat R_\beta := \{u \in \mathbb{R}^N : F_\beta(u) = r_\beta\}$

Theorem 1. For any $d \in \mathbb{R}^M$: $\hat C_k \neq \emptyset$ for all $k$ and $\hat R_\beta \neq \emptyset$ for all $\beta > 0$.

Assumption H1: $\mathrm{rank}(A) = M < N$ (assumed throughout; no further reminder).

How to evaluate the extent of assumption-dependent properties?

Definition 1. A property is generic on $\mathbb{R}^M$ if it holds on a subset $\mathbb{R}^M \setminus S$, where $S$ is closed in $\mathbb{R}^M$ and its Lebesgue measure in $\mathbb{R}^M$ is null.

A generic property is stronger than a property that holds only with probability one.

$\mathbb{I}_n := (\{1, \dots, n\}, <)$ and $\mathbb{I}_n^0 := (\{0, 1, \dots, n\}, <)$ (totally strictly ordered).

$L := \min\{k \in \mathbb{I}_N : c_k = 0\}$ (uniquely defined); generically $L = M$.

Main results

- There is a strictly decreasing sequence $\{\beta_{J_k}\}$, indexed by a set $J \subseteq \mathbb{I}_L^0$, such that: $\hat u$ is a global minimizer of $F_\beta$ for $\beta \in (\beta_{J_k}, \beta_{J_{k-1}})$ $\iff$ $\hat u$ is a global minimizer of $(C_{J_k})$. Equivalently, $\{\hat R_\beta : \beta \in (\beta_{J_k}, \beta_{J_{k-1}})\} = \hat C_{J_k}$ for all $J_k \in J$.
- In a generic sense, $\hat R_{\beta_{J_k}} = \hat C_{J_k} \cup \hat C_{J_{k+1}}$.
- All the $\beta_{J_k}$'s are obtained from the optimal values $c_k$ of the problems $(C_k)$, $k \in \mathbb{I}_L^0$.
- The global minimizers of problems $(C_k)$ and $(R_\beta)$ are strict and generically unique.
- $J$ is always nonempty.
- For any $n \in \mathbb{I}_L^0 \setminus J$, the global minimizers of $(C_n)$ are not global minimizers of $(R_\beta)$ for any $\beta$.
- When $J = \mathbb{I}_L^0$, problems $(C_k)$ and $(R_\beta)$ are quasi-completely equivalent: $\{\hat R_\beta : \beta \in (\beta_k, \beta_{k-1})\} = \hat C_k$ with $\beta_k = c_k - c_{k+1}$ for all $k$.

Notation

$\|\cdot\| := \|\cdot\|_2$. $\mathrm{supp}(u) := \{i \in \mathbb{I}_N : u[i] \neq 0\}$.

For any $\omega \subseteq \mathbb{I}_N$: $A_\omega := (A_{\omega_1}, \dots, A_{\omega_{|\omega|}}) \in \mathbb{R}^{M \times |\omega|}$, $A_\omega^T$ is the transpose of $A_\omega$, and $u_\omega := (u_{\omega_1}, \dots, u_{\omega_{|\omega|}})^T \in \mathbb{R}^{|\omega|}$.

Definition 2. Let $f : \mathbb{R}^N \to \mathbb{R}$ and $S \subseteq \mathbb{R}^N$. Consider the problem $\min\{f(u) : u \in S\}$.
- $\hat u$ is a strict minimizer if there is a neighborhood $O \subseteq S$ with $\hat u \in O$ such that $f(u) > f(\hat u)$ for all $u \in O \setminus \{\hat u\}$.
- $\hat u$ is an isolated (local) minimizer if $\hat u$ is the only (local) minimizer in an open neighborhood $O \ni \hat u$.


2. Common optimality conditions for $(C_k)$ and $(R_\beta)$

Goal: derive tests relating the optimal solutions of $(C_k)$ and $(R_\beta)$.

2.1 Preliminaries

A constrained quadratic optimization problem: given $d \in \mathbb{R}^M$ and $\omega \subseteq \mathbb{I}_N$,

$(P_\omega)$: $\min_{u \in \mathbb{R}^N} \|Au - d\|^2$ subject to $u[i] = 0$ for all $i \in \mathbb{I}_N \setminus \omega$.

The convex problem $(P_\omega)$ always has solutions, for any $\omega$ and any $d \in \mathbb{R}^M$.

Some useful facts on the relation of $(P_\omega)$ to $(C_k)$ and $(R_\beta)$ [M. N., SIIMS 2013]:
- $(R_\beta)$: $\hat u$ solves $(P_\omega)$ for some $\omega$ $\iff$ $\hat u$ is a (local) minimizer of $F_\beta$, for all $\beta > 0$; $\hat u$ solves $(P_\omega)$ for $\omega$ with $\mathrm{rank}(A_\omega) = |\omega|$ $\iff$ $\hat u$ is a strict (local) minimizer of $F_\beta$, for all $\beta$.
- $(C_k)$: $\hat u$ solves $(P_\omega)$ for $\omega$ with $|\omega| = k$ $\Rightarrow$ $\hat u$ is a (local) minimizer of $(C_k)$.

Remark 1. For any $\omega \subseteq \mathbb{I}_N$ with $\mathrm{rank}(A_\omega) = |\omega|$, the minimizer $\hat u$ of $(P_\omega)$ is isolated.
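Concretely, $(P_\omega)$ is just least squares on the columns $A_\omega$. A minimal sketch, assuming NumPy (`solve_P_omega` is a hypothetical helper name):

```python
import numpy as np

def solve_P_omega(A, d, omega):
    # Least squares restricted to support omega:
    #   min ||Au - d||^2  s.t.  u[i] = 0 for i outside omega.
    # Returns a solution embedded back into R^N.
    omega = sorted(omega)
    u_om, _, rank, _ = np.linalg.lstsq(A[:, omega], d, rcond=None)
    u = np.zeros(A.shape[1])
    u[omega] = u_om
    # rank(A_omega) == |omega| is the condition under which the solution is
    # unique and, per Remark 1, an isolated (hence strict) local minimizer.
    return u, rank == len(omega)
```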

2.2 On the optimal solution sets of problem $(C_k)$

Lemma 1. $c_0 = \|d\|^2$ and $\{c_k\}_{k \ge 0}$ is decreasing, with $c_k = 0$ for all $k \ge M$.

Lemma 2. For $k \in \mathbb{I}_M$, let $(C_k)$ have a global minimizer $\hat u$ obeying $\|\hat u\|_0 = k - n$ for some $n \ge 1$. Then $A \hat u = d$. Furthermore, $\hat u \in \hat C_m$ and $c_m = 0$ for all $m \ge k - n$.

$L := \min\{k \in \mathbb{I}_M : c_k = 0\}$.

Example 1. One has $L \le M - 1$ if $d = Au$ for some $u$ with $\|u\|_0 \le M - 1$.

Theorem 2. Let $\hat u \in \hat C_k$ for $k \in \mathbb{I}_L^0$. Then $\|\hat u\|_0 = k = \mathrm{rank}(A_\sigma)$ for $\sigma := \mathrm{supp}(\hat u)$, so $\hat u$ is a strict global minimizer of $(C_k)$. Moreover, $k \ge L + 1 \Rightarrow \hat C_L \subseteq \hat C_k$.

Corollary 1. $\hat C_k \cap \hat C_n = \emptyset$ for all $(k, n) \in (\mathbb{I}_L^0)^2$ such that $k \neq n$.

Example 2.
- For a certain $A \in \mathbb{R}^{2 \times 3}$ and $d = (1, 1)^T$: $\hat u = (0, 0, 1)^T$, $\hat C_1 = \{\hat u\}$ (strict, $\mathrm{rank}(A_{\{3\}}) = \|\hat u\|_0$) $\Rightarrow$ $c_1 = 0$ $\Rightarrow$ $L = 1$. $\hat u = (1, 1, 0)^T$ is a strict global minimizer of $(C_2)$ because $\mathrm{rank}(A_{\mathrm{supp}(\hat u)}) = 2$ and $c_2 = 0$.
- For a certain $A \in \mathbb{R}^{2 \times 4}$ and $d = (1, 0)^T$: $\hat C_1 = \{(1, 0, 0, 0)^T, (0, 0, 1, 0)^T\}$ (strict minimizers) $\Rightarrow$ $c_1 = 0$ $\Rightarrow$ $L = 1$. For $k \ge 2$, all optimal solutions outside $\hat C_1$ are nonstrict and have the form $\hat u = (x, y, 1 - x, y)^T$, $x \in \mathbb{R} \setminus \{0, 1\}$. If $y = 0$ then $\|\hat u\|_0 = 2$, and otherwise $\|\hat u\|_0 = 4$.

Remark 2. By Theorem 2, the optimal value $c_k$ of problem $(C_k)$ for any $k \in \mathbb{I}_L^0$ obeys
$c_k = \min\{\|A \tilde u - d\|^2 : \tilde u \text{ solves } (P_\omega),\ \omega \in \Omega_k\}$,
where $\Omega_k := \{\omega \subseteq \mathbb{I}_N : |\omega| = k = \mathrm{rank}(A_\omega)\}$.
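Remark 2 gives a direct way to compute $c_0, \dots, c_L$ on small problems. A hedged sketch (the name `optimal_values` and the $10^{-10}$ zero tolerance are my choices, not the talk's):

```python
import itertools
import numpy as np

def optimal_values(A, d):
    # c_k for k = 0, 1, ... until c_k == 0 (i.e. k == L), via Remark 2:
    # enumerate all omega with |omega| = k = rank(A_omega).  Exponential in N.
    M, N = A.shape
    c = [float(np.linalg.norm(d) ** 2)]        # c_0 = ||d||^2
    k = 0
    while c[-1] > 1e-10:                       # numerically, c_L = 0
        k += 1
        best = np.inf
        for omega in itertools.combinations(range(N), k):
            A_om = A[:, list(omega)]
            if np.linalg.matrix_rank(A_om) < k:
                continue                       # omega is not in Omega_k
            u_om, *_ = np.linalg.lstsq(A_om, d, rcond=None)
            best = min(best, float(np.linalg.norm(A_om @ u_om - d) ** 2))
        c.append(best)
    return c                                   # [c_0, ..., c_L], len = L + 1
```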

2.3 Necessary and sufficient conditions

Proposition 1. $\hat u \in \hat R_\beta$ $\Rightarrow$ $\hat u \in \hat C_k$ and $\hat C_k \subseteq \hat R_\beta$, where $k := \|\hat u\|_0 \in \mathbb{I}_L^0$.

The global minimizers of $F_\beta$ are composed of some optimal sets $\hat C_k$ for $k \le L$.

Let $\hat C$ be the collection of all optimal solutions $\hat C_k$ of problems $(C_k)$ for all $k \in \mathbb{I}_L^0$, and $\hat R$ the set of all global minimizers $\hat R_\beta$ of $F_\beta$ for all $\beta > 0$:
$\hat C := \bigcup_{k=0}^{L} \hat C_k$ and $\hat R := \bigcup_{\beta > 0} \hat R_\beta$.

Theorem 3. $\hat R \subseteq \hat C$. When $\beta$ ranges over $(0, +\infty)$, $F_\beta$ can have at most $L + 1$ different sets of global minimizers, which are optimal solutions of $(C_k)$ for $k \in \{0, \dots, L\}$.

Theorem 4. For any $k \in \mathbb{I}_L^0$ one has:
- $\hat C_k \subseteq \hat R_\beta$ if and only if $F_\beta(u) - F_\beta(\hat u) \ge 0$ for all $\hat u \in \hat C_k$ and all $u \in \hat C$;
- $\hat C_k = \hat R_\beta$ if and only if $F_\beta(u) - F_\beta(\hat u) > 0$ for all $\hat u \in \hat C_k$ and all $u \in \hat C \setminus \hat C_k$.


3. Parameter values for equality between optimal sets

3.1 The entire list of parameter values

Definition 3 (Critical parameter values).

$\beta_k := \max_n \left\{ \dfrac{c_k - c_{k+n}}{n} : n \in \{1, \dots, L - k\} \right\}$ for $k \in \mathbb{I}_{L-1}^0$, and $\beta_L := 0$;

$\beta_k^U := \min_n \left\{ \dfrac{c_{k-n} - c_k}{n} : n \in \{1, \dots, k\} \right\}$ for $k \in \mathbb{I}_L$, and $\beta_0^U = \beta_{-1} := +\infty$.

We have $\beta_L = 0 < \beta_L^U$ and $\beta_0 < \beta_0^U$. The cases where $\beta_k < \beta_k^U$ will be of particular interest.

Proposition 2. There is a finite union $S$ of vector subspaces of dimension $M - 1$ such that $d \in \mathbb{R}^M \setminus S \Rightarrow \beta_k \neq \beta_k^U$ for all $k \in \mathbb{I}_L^0$. Thus $\beta_k \neq \beta_k^U$ for all $k \in \mathbb{I}_L^0$ is a generic property.
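Definition 3 translates directly into code. The pure-Python sketch below (`critical_betas` is an assumed name) computes both sequences from a list `c = [c_0, ..., c_L]`; its output is checked against Example 3 later:

```python
def critical_betas(c):
    # beta_k and beta_k^U of Definition 3 from the optimal values c_0, ..., c_L.
    L = len(c) - 1
    beta = [max((c[k] - c[k + n]) / n for n in range(1, L - k + 1))
            for k in range(L)]
    beta.append(0.0)                           # beta_L := 0
    betaU = [float('inf')]                     # beta_0^U := +infinity
    betaU += [min((c[k - n] - c[k]) / n for n in range(1, k + 1))
              for k in range(1, L + 1)]
    return beta, betaU
```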

3.2 Conditions for agreement between the optimal sets of $(C_k)$ and $(R_\beta)$

Theorem 5. For $k \in \mathbb{I}_L^0$:

$\hat C_k \subseteq \hat R_\beta$ if and only if
- $\beta_0 \le \beta < \beta_0^U$ for $k = 0$;
- $\beta_k \le \beta \le \beta_k^U$ for $k \in \{1, \dots, L - 1\}$;
- $\beta_L < \beta \le \beta_L^U$ for $k = L$.

$\hat C_k = \hat R_\beta$ if and only if $\beta_k < \beta < \beta_k^U$.

The proof is based on Theorem 4. To exploit Theorem 5 we have to clarify the links between $(\beta_k, \beta_k^U)$ and the $c_k$'s.

3.3 The effective parameter values

The global minimizers of $F_\beta$ are always in $\hat C$ (Theorem 3), so we are interested in the indexes $k$ for which there exist values of $\beta$ such that $\hat C_k \subseteq \hat R_\beta$. Their set is obtained from Theorem 5.

Definition 4 (The effective index sets $J$ and $J^E$).
$J := \{k \in \mathbb{I}_L^0 : \beta_k < \beta_k^U\}$ and $J^E := \{m \in \mathbb{I}_L^0 : \beta_m = \beta_m^U\}$.

Ordering: $J = \{J_0, J_1, \dots, J_p\}$, where $p := \#J - 1$ and $J_{k-1} < J_k$ for all $k$. Further: $J_0 = 0$, $J_p = L$, $\#J \ge 2$, with $\beta_{J_{-1}} := \beta_{J_0}^U = \beta_0^U = +\infty$ and $\beta_{J_p} = \beta_L = 0$.

The set $J$ is always nonempty.

Lemma 3. $\hat R \cap \hat C_k = \emptyset$ if and only if $k \in \mathbb{I}_L^0 \setminus (J \cup J^E)$.

Reminder of Definition 3:
$\beta_k := \max_n \left\{ \dfrac{c_k - c_{k+n}}{n} : n \in \{1, \dots, L - k\} \right\}$, $k \in \mathbb{I}_{L-1}^0$, and $\beta_L := 0$;
$\beta_k^U := \min_n \left\{ \dfrac{c_{k-n} - c_k}{n} : n \in \{1, \dots, k\} \right\}$, $k \in \mathbb{I}_L$, and $\beta_0^U := +\infty$.

Simplification of $\{\beta_k, \beta_k^U\}$ for $k \in J \cup J^E$:

Proposition 3. Let $\{\beta_k, \beta_k^U\}$ and $J$ be as in Definition 3 and Definition 4, respectively. Then:
(a) $\beta_{J_k} < \beta_{J_k}^U = \beta_{J_{k-1}}$ for all $J_k \in J \setminus \{J_0\}$, and $\beta_{J_0}^U = \beta_{J_{-1}} = +\infty$.
(b) $\beta_{J_k} = \dfrac{c_{J_k} - c_{J_{k+1}}}{J_{k+1} - J_k}$ for all $J_k \in J \setminus \{J_p\}$, and $\beta_{J_p} = \beta_L = 0$.
(c) $\{\beta_m : m \in J^E\} \subseteq \{\beta_{J_k} : J_k \in J \setminus \{J_p\}\}$.

$\{\beta_k\}_{k \in J}$ is strictly decreasing and its first entry is $\beta_0$.

Example 3. Let $\{c_k\}_{k=0}^{L}$ for $L = 7$ read as

$c_0 = 48,\ c_1 = 40,\ c_2 = 30,\ c_3 = 22,\ c_4 = 14,\ c_5 = 10,\ c_6 = 4,\ c_7 = 0$.

By Definition 3, the sequences $\{\beta_k, \beta_k^U\}_{k=0}^{7}$ are given by

$\beta_0 = 9,\ \beta_1 = 10,\ \beta_2 = 8,\ \beta_3 = 8,\ \beta_4 = 5,\ \beta_5 = 6,\ \beta_6 = 4,\ \beta_7 = 0$;
$\beta_0^U = +\infty,\ \beta_1^U = 8,\ \beta_2^U = 9,\ \beta_3^U = 8,\ \beta_4^U = 8,\ \beta_5^U = 4,\ \beta_6^U = 5,\ \beta_7^U = 4$.

From Definition 4, $J = \{J_0 = 0,\ J_1 = 2,\ J_2 = 4,\ J_3 = 6,\ J_4 = 7\}$ and $J^E = \{3\}$.
- One has $\beta_{J_k} = \beta_{J_{k+1}}^U$ for any $J_k \in J \setminus \{J_p\}$ (Proposition 3(a)). The formula in Proposition 3(b) holds.
- $\{\beta_m : m \in J^E\} = \{\beta_3\}$ with $\beta_3 = \beta_{J_1} = 8$, so $\{\beta_m : m \in J^E\} \subseteq \{\beta_{J_k} : J_k \in J \setminus \{J_4\}\}$ (Proposition 3(c)).
- $J^E_{\beta_{J_1}} := \{m \in J^E : \beta_m = \beta_{J_1}\} = \{3\}$, where indeed $J_1 < 3 < J_2$; see Lemma 4.
- $J$ has the smallest indexes, so that $\{\beta_k\}_{k \in J} = \{9, 8, 5, 4, 0\}$ is the longest strictly decreasing subsequence of $\{\beta_k\}_{k=0}^{7}$ containing $\beta_0$; see Proposition 4. One has $\{\beta_k\}_{k \in J'} = \{\beta_k\}_{k \in J}$ for $J' := \{0, 3, 4, 6, 7\}$; however, the indexes of $J'$ come later (e.g. $J'_1 = 3 > 2 = J_1$).

The location of $\{\beta_m : m \in J^E\}$ is given by the (possibly empty) subsets
$J^E_{\beta_{J_k}} := \{m \in J^E : \beta_m = \beta_{J_k}\}$.

Lemma 4. The sets $J^E_{\beta_{J_k}}$ fulfill $J^E_{\beta_{J_k}} = \emptyset$ for $k = p$, and for any $k \le p - 1$, $J^E_{\beta_{J_k}} = \{m \in J^E : J_k < m < J_{k+1}\}$.

$J$ and $\{\beta_k\}_{k \in J}$ are characterized next.

Proposition 4. Let $\{\beta_k\}_{k=0}^{L}$ read as in Definition 3 and $J$ as in Definition 4. Then $0 \in J$, and $J$ contains the smallest indexes such that $\{\beta_k\}_{k \in J}$ is the longest strictly decreasing subsequence of $\{\beta_k\}_{k=0}^{L}$ containing $\beta_0$.

In order to find the effective $J$ and $\{\beta_k\}_{k \in J}$, we need only $\{\beta_k\}_{k=0}^{L}$ from Definition 3.
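Definition 4 and Example 3 can be checked mechanically. The sketch below (`effective_index_set` is an assumed name; the exact float comparison is only safe for integer-valued data like Example 3) reuses `critical_betas` from the sketch after Definition 3:

```python
def effective_index_set(beta, betaU):
    # J and J^E of Definition 4.  Exact equality is fine for Example 3's
    # integer data; real (noisy) values would need a tolerance.
    J = [k for k in range(len(beta)) if beta[k] < betaU[k]]
    JE = [k for k in range(len(beta)) if beta[k] == betaU[k]]
    return J, JE

c = [48, 40, 30, 22, 14, 10, 4, 0]
beta, betaU = critical_betas(c)     # beta  = [9, 10, 8, 8, 5, 6, 4, 0]
                                    # betaU = [inf, 8, 9, 8, 8, 4, 5, 4]
J, JE = effective_index_set(beta, betaU)
assert J == [0, 2, 4, 6, 7] and JE == [3]   # exactly as in Example 3
```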


4. Equivalence relations between the optimal sets of $(C_k)$ and $(R_\beta)$

4.1 Partial equivalence

Theorem 6. Let $\{\beta_k\}$ be as in Definition 3 and $J$ as in Definition 4. Then:
$\{\hat R_\beta : \beta \in (\beta_{J_k}, \beta_{J_{k-1}})\} = \hat C_{J_k}$ for all $J_k \in J$, and
$\left( \bigcup_{n=1}^{p} [\beta_{J_n}, \beta_{J_{n-1}}] \right) \cup [\beta_{J_0}, \beta_{J_{-1}}) = [0, +\infty)$.

The values $\{\beta_{J_0}, \dots, \beta_{J_{p-1}}\}$ in Definition 4 partition $(0, +\infty)$ into $\#J$ proper intervals. For any $\beta \in (\beta_{J_k}, \beta_{J_{k-1}})$, the optimal sets of $(R_\beta)$ and of $(C_n)$ for $n = J_k$ coincide. If $\mathbb{I}_L^0 \setminus J \neq \emptyset$, the optimal sets of $(C_k)$ for $k \in \mathbb{I}_L^0 \setminus J$ cannot be optimal solutions of $(R_\beta)$ for any $\beta > 0$; hence a partial equivalence.

$\{\beta_{J_k}\}$ is a finite set of isolated values, hence $\beta \neq \beta_{J_k}$ generically.

The optimal sets of problem $(R_\beta)$ for $\beta = \beta_k$, $k \in J$:

Theorem 7. Let H1 hold. Let $\{\beta_k\}$ be as in Definition 3 and $J$ as in Definition 4. Then
$\hat R_{\beta_{J_k}} = \hat C_{J_k} \cup \hat C_{J_{k+1}} \cup \left( \bigcup_{m \in J^E_{\beta_{J_k}}} \hat C_m \right)$ for all $J_k \in J \setminus \{J_p\}$,
where $J^E_{\beta_{J_k}} = \{m \in J^E : J_k < m < J_{k+1}\}$ and $\hat C_k \cap \hat C_n = \emptyset$ for all $(k, n) \in (J \cup J^E)^2$, $k \neq n$.

Example 4 [Example 3, continued]. We had $J = \{0, 2, 4, 6, 7\}$, $(\beta_{J_0} = 9, \beta_{J_1} = 8, \beta_{J_2} = 5, \beta_{J_3} = 4, \beta_{J_4} = 0)$, and $J^E_{\beta_{J_1}} = \{3\}$ with $\beta_3 = 8$, while $J^E_{\beta_{J_k}} = \emptyset$ otherwise. By Theorems 6 and 7:

$\{\hat R_\beta : \beta > 9\} = \hat C_0$, $\{\hat R_\beta : \beta \in (8, 9)\} = \hat C_2$, $\{\hat R_\beta : \beta \in (5, 8)\} = \hat C_4$, $\{\hat R_\beta : \beta \in (4, 5)\} = \hat C_6$, $\{\hat R_\beta : \beta \in (0, 4)\} = \hat C_7$;

$\hat R_{\beta=9} = \hat C_0 \cup \hat C_2$, $\hat R_{\beta=8} = \hat C_2 \cup \hat C_3 \cup \hat C_4$, $\hat R_{\beta=5} = \hat C_4 \cup \hat C_6$, $\hat R_{\beta=4} = \hat C_6 \cup \hat C_7$.

A partial equivalence between problems $(C_k)$ and $(R_\beta)$ always exists. For the $\#J - 1$ isolated values $\{\beta_k : k \in J \setminus \{L\}\}$, problem $(R_\beta)$ normally has two optimal sets (Proposition 2).

4.2 Quasi-complete equivalence

Lemma 5. Let $J$ be as in Definition 4. Then the following hold:
(a) If the sequence $\{\beta_k\}_{k=0}^{L}$ in Definition 3 is strictly decreasing, then its entries read as
$\beta_k = c_k - c_{k+1}$ for $k \in \mathbb{I}_{L-1}^0$, with $\beta_L = 0$ and $\beta_{-1} := \beta_0^U = +\infty$. (1)
(b) If the sequence $\{\beta_k\}_{k=0}^{L}$ in (1) is strictly decreasing, then $J = \mathbb{I}_L^0$.

Theorem 8. Let $\{\beta_k\}_{k=0}^{L}$ in (1) be strictly decreasing. Then
$\{\hat R_\beta : \beta \in (\beta_k, \beta_{k-1})\} = \hat C_k$ for all $k \in \mathbb{I}_L^0$, and
$\hat R_{\beta_k} = \hat C_k \cup \hat C_{k+1}$ with $\hat C_k \cap \hat C_{k+1} = \emptyset$ for all $k \in \mathbb{I}_{L-1}^0$.

$\{\beta_k\}_{k=0}^{L}$ in (1) being strictly decreasing means that $c_{k-1} - c_k > c_k - c_{k+1}$ for all $k \in \mathbb{I}_{L-1}$. Let $\underline u$, $\hat u$ and $\tilde u$ be optimal solutions of problems $(C_{k-1})$, $(C_k)$ and $(C_{k+1})$, respectively, and denote $\underline\sigma := \mathrm{supp}(\underline u)$, $\hat\sigma := \mathrm{supp}(\hat u)$ and $\tilde\sigma := \mathrm{supp}(\tilde u)$. The condition on the $c_k$'s reads as
$d^T \left( \mathrm{Proj}(A_{\hat\sigma}) - \mathrm{Proj}(A_{\underline\sigma}) \right) d > d^T \left( \mathrm{Proj}(A_{\tilde\sigma}) - \mathrm{Proj}(A_{\hat\sigma}) \right) d > 0.$

Mid-way scenarios...
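The strict-decrease hypothesis of Lemma 5(a) / Theorem 8 is a one-liner to test. A pure-Python sketch (`quasi_complete` is an assumed name):

```python
def quasi_complete(c):
    # Theorem 8's hypothesis: beta_k = c_k - c_{k+1} strictly decreasing,
    # i.e. c_{k-1} - c_k > c_k - c_{k+1} > 0 for all k (with beta_L := 0).
    beta = [c[k] - c[k + 1] for k in range(len(c) - 1)] + [0.0]
    return all(b0 > b1 for b0, b1 in zip(beta, beta[1:]))

assert not quasi_complete([48, 40, 30, 22, 14, 10, 4, 0])  # Example 3: J != I_L^0
```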


5. On the optimal values of $(C_k)$ and $(R_\beta)$

$\Omega_k := \{\omega \subseteq \mathbb{I}_N : |\omega| = k = \mathrm{rank}(A_\omega)\}$. $E_k := \bigcup_{\omega \in \Omega_k} \big(\mathrm{range}(A_\omega)\big)^\perp$ and $G_k := \bigcup_{\omega \in \Omega_k} \mathrm{range}(A_\omega)$. By H1, $E_0 = G_M = \mathbb{R}^M$ and $E_M = G_0 = \{0\}$.

Proposition 5. Let $L \le M$ be arbitrarily fixed. Then:
- $c_k > 0$ for all $k \le L - 1$ $\iff$ $d \in \mathbb{R}^M \setminus G_{L-1}$;
- $d \in \mathbb{R}^M \setminus (E_2 \cup G_{L-1})$ $\Rightarrow$ $c_{k-1} > c_k$ for all $k \in \mathbb{I}_L$.

$E_2$ and $G_{M-1}$ are finite unions of vector subspaces of dimensions $M - 2$ and $M - 1$, respectively. Hence, $d \in \mathbb{R}^M \setminus (E_2 \cup G_{M-1})$ is a generic property. $\Rightarrow$ $\{c_k\}_{k=0}^{M}$ is strictly decreasing and $L = M$, generically.

Proposition 6. For $k \in \mathbb{I}_L^0$: $F_\beta(\hat u) = c_k + \beta k$ for all $\hat u \in \hat C_k$.

By Theorem 3, $\hat R_\beta \subseteq \hat C = \bigcup_{k=0}^{L} \hat C_k$, so the optimal value of problem $(R_\beta)$ reads as
$r_\beta = \min\{c_k + \beta k : k \in \mathbb{I}_L^0\}$.

Corollary 2. The mapping $\beta \mapsto r_\beta : (0, +\infty) \to \mathbb{R}$ fulfills

$r_\beta = c_{J_k} + \beta J_k = F_\beta(\hat u)$ for all $\hat u \in \hat C_{J_k}$ if and only if $\beta \in$
- $[\beta_{J_0}, +\infty)$ for $J_0 = 0$;
- $[\beta_{J_k}, \beta_{J_{k-1}}]$ for $J_k \in J \setminus \{0, L\}$;
- $(0, \beta_{J_{p-1}}]$ for $J_p = L$.

$\beta \mapsto r_\beta$ is continuous and concave. Moreover, $r_{\beta_{J_{k-1}}} > r_{\beta_{J_k}}$ for all $J_k \in J$, and $r_{\beta_{J_0}} = c_{J_0} = r_\beta$ for all $\beta \ge \beta_{J_0}$, while $r_{\beta_{J_0}} > r_\beta$ for all $\beta < \beta_{J_0}$.

$\beta \mapsto r_\beta$ is affine increasing on each interval $(\beta_{J_k}, \beta_{J_{k-1}})$, with upward kinks at the $\beta_{J_k}$ for $J_k \in J \setminus \{L\}$, and bounded by $c_0$.

Example 5 [Example 3, continued]. From Corollary 2, $\beta \mapsto r_\beta$ is given by the affine expressions:
- $\beta \in (0, 4]$: $r_\beta = c_7 + 7\beta = 7\beta$; $r_{\beta=4} = 28$;
- $\beta \in [4, 5]$: $r_\beta = c_6 + 6\beta = 4 + 6\beta$; $r_{\beta=5} = 34$;
- $\beta \in [5, 8]$: $r_\beta = c_4 + 4\beta = 14 + 4\beta$; $r_{\beta=8} = 46$;
- $\beta \in [8, 9]$: $r_\beta = c_2 + 2\beta = 30 + 2\beta$; $r_{\beta=9} = 48$;
- $\beta \in [9, +\infty)$: $r_\beta = c_0 + 0 \cdot \beta = 48$.
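Proposition 6 makes $r_\beta$ trivial to evaluate once the $c_k$ are known. A small self-contained check against Example 5 (`r_beta` is an assumed helper name):

```python
def r_beta(c, beta):
    # Optimal value of (R_beta) via Proposition 6: r_beta = min_k (c_k + beta * k).
    return min(ck + beta * k for k, ck in enumerate(c))

c = [48, 40, 30, 22, 14, 10, 4, 0]
assert r_beta(c, 4) == 28 and r_beta(c, 5) == 34   # kink values of Example 5
assert r_beta(c, 8) == 46 and r_beta(c, 9) == 48
```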


6. Cardinality of the optimal sets of $(C_k)$ and of $(R_\beta)$

For any $\beta > 0$ and $k \in \mathbb{I}_L^0$, the optimal sets of problems $(C_k)$ and $(R_\beta)$ are composed of a certain finite number of isolated (hence strict) minimizers.

6.1 Uniqueness of the optimal solutions of $(C_k)$ and of $(R_\beta)$

Let $k \le \min\{L, M - 1\}$ and $(\hat u, \tilde u) \in (\hat C_k)^2$ with $\hat u \neq \tilde u$. Then $(\hat\sigma, \tilde\sigma) := (\mathrm{supp}(\hat u), \mathrm{supp}(\tilde u)) \in (\Omega_k)^2$ and
$c_k = \|A_{\hat\sigma} \hat u_{\hat\sigma} - d\|^2 = \|A_{\tilde\sigma} \tilde u_{\tilde\sigma} - d\|^2$, where $\hat\sigma \neq \tilde\sigma$.
With $\Pi_\omega$ the orthogonal projector onto $\mathrm{range}(A_\omega)$:
$\|A_{\hat\sigma} \hat u_{\hat\sigma} - d\|^2 - \|A_{\tilde\sigma} \tilde u_{\tilde\sigma} - d\|^2 = d^T (\Pi_{\tilde\sigma} - \Pi_{\hat\sigma}) d = 0$.

Assumption H. For $K \le \min\{M - 1, L\}$ fixed, $A \in \mathbb{R}^{M \times N}$ obeys $\Pi_\omega \neq \Pi_{\tilde\omega}$ for all $(\omega, \tilde\omega) \in (\Omega_k)^2$ with $\omega \neq \tilde\omega$, for all $k \in \mathbb{I}_K$. H is a generic property of all matrices in $\mathbb{R}^{M \times N}$ [M. N., SIIMS 2013].

Set $\mathcal{K} := \bigcup_{k=1}^{K} \bigcup_{(\omega, \tilde\omega) \in (\Omega_k)^2} \{g \in \mathbb{R}^M : \omega \neq \tilde\omega \text{ and } g \in \ker(\Pi_\omega - \Pi_{\tilde\omega})\}$; under H, each $\ker(\Pi_\omega - \Pi_{\tilde\omega})$ has dimension $\le M - 1$, hence $d \in \mathbb{R}^M \setminus \mathcal{K}$ generically.

H and $d \in \mathbb{R}^M \setminus \mathcal{K}$ $\Rightarrow$ $(C_k)$ for $k \in \mathbb{I}_K$ has a unique optimal solution. With $\overline K := \max\{k \in J : k \le K\}$, $(R_\beta)$ has a unique global minimizer for all $\beta \in (\beta_{\overline K}, +\infty) \setminus \{\beta_k\}_{k \in J}$.


7. Numerical tests

Two kinds of experiments using matrices $A \in \mathbb{R}^{M \times N}$ for $(M, N) = (5, 10)$, original vectors $u_o \in \mathbb{R}^N$ and data samples $d = Au_o$ (+ noise), with two different goals:
- to roughly see the behaviour of the parameters $\beta_k$ in Definition 3;
- to verify and illustrate our theoretical findings.

All results were calculated using an exhaustive combinatorial search.

7.1 Monte Carlo experiments on $\{\beta_k\}$ with $10^5$ tests for $(M, N) = (5, 10)$

Two experiments, each composed of $10^5$ trials with $A \in \mathbb{R}^{M \times N}$ for $(M, N) = (5, 10)$. In each trial:
- an original $u_o \in \mathbb{R}^N$ with random support on $\{1, \dots, N\}$ and $\|u_o\|_0 \le M - 1 = 4$;
- the coefficients of $A$ and of $u_o$ on $\mathrm{supp}(u_o)$ are i.i.d.;
- $d = Au_o$ + i.i.d. centered Gaussian noise;
- compute the optimal values $\{c_k\}$ and then $(\beta_k, \beta_k^U)$ by Definition 3.
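A single trial might look as follows; this is a hedged sketch reusing `optimal_values`, `critical_betas` and `effective_index_set` from above (the variance-10 Gaussian scale and the noise level are illustrative assumptions, not the talk's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, kmax = 5, 10, 4

# one trial, loosely following Experiment N(0, 10)
A = rng.normal(0.0, np.sqrt(10.0), size=(M, N))
u_o = np.zeros(N)
supp = rng.choice(N, size=rng.integers(1, kmax + 1), replace=False)
u_o[supp] = rng.normal(0.0, np.sqrt(10.0), size=len(supp))
d = A @ u_o + 0.1 * rng.standard_normal(M)   # noise level is illustrative

c = optimal_values(A, d)                     # exhaustive search, Remark 2
beta, betaU = critical_betas(c)              # Definition 3
J, JE = effective_index_set(beta, betaU)     # Definition 4
print(len(c) - 1, J, JE)                     # generically L = M and JE = []
```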

Two different distributions for $A$ and for $u_o$ on $\mathrm{supp}(u_o)$:
- Experiment $\mathcal{N}(0, 10)$: $A(i, j)$ and $u_o$ on $\mathrm{supp}(u_o)$ drawn from $\mathcal{N}(0, 10)$. Support length $\#\mathrm{supp}(u_o) \in \{1, \dots, 4\}$, mean $= 3.8$. SNR of $d$ in $[10.1, 61.1]$ dB, mean $= \dots$ dB.
- Experiment $\mathrm{Uni}[0, 10]$: $A(i, j)$ and $u_o$ on $\mathrm{supp}(u_o)$ uniform on $[0, 10]$. $\#\mathrm{supp}(u_o) \in \{1, \dots, 4\}$, mean $= 3.8$. SNR of $d$ in $[20, 55]$ dB, mean $= \dots$ dB.

$N_k := \#\{k \in \mathbb{I}_M^0 : \beta_k > \beta_{k-1}\}$ (the number of increases of $\{\beta_k\}$ within a trial). No trial produced $N_k \ge 3$.

Table 1. Results on the behaviour of $\{\beta_k\}$ in Definition 3 for the two experiments, each composed of $10^5$ random trials. Columns: percentage of trials with $\beta_k < \beta_{k-1}$ for all $k \in \mathbb{I}_M^0$; with $N_k = 1$; with $N_k = 2$; mean(SNR); mean$(\|u_o\|_0)$. Rows: $\mathcal{N}(0, 10)$ and $\mathrm{Uni}[0, 10]$. (Percentages: …)

Observations:
- $L = M$ in each trial (reminder: $L := \min\{k \in \mathbb{I}_N : c_k = 0\}$);
- $\{c_k\}_{k=0}^{M}$ was always strictly decreasing (see Proposition 5);
- $\beta_k \neq \beta_k^U$ in each trial (see Proposition 2), so $J^E = \emptyset$;
- For every $A$ there were data $d$ such that $\{\beta_k := c_k - c_{k+1}\}_{k=0}^{L}$ was strictly decreasing.

7.2 Tests on quasi-equivalence with a selected matrix and selected data

$A \in \mathbb{R}^{5 \times 10}$ and $u_o \in \mathbb{R}^{10}$ are fixed (entries: …); $A(i, j)$ follows a nearly normal distribution with variance 10, and $\mathrm{rank}(A) = M = 5$.

Problem $(C_M)$ has $\#\Omega_M = 252$ optimal solutions; none of them is shown. Since $\beta_0 < \beta_0^U = +\infty$, in all cases $\hat C_0 = \hat R_\beta$ for all $\beta > \beta_0$, by Theorem 5.

We selected the couple $(A, u_o)$ so that the $\beta_k$ are seldom strictly decreasing, compared with Table 1.

Table 2. $10^5$ trials where $d = Au_o +$ i.i.d. centered Gaussian noise; same columns as Table 1, for the selected $u_o$. The $N_k = 2$ column reads $0\%$; the other percentages: …

Noise-free data: $d = Au_o$ (entries: …). $\hat u = u_o$ is an optimal solution to problems $(C_k)$, with $c_k = 0$ for $k \in \{3, 4, 5\}$ and $L = 3$.

$\beta_3 = 0 < \beta_3^U$, $\beta_1 < \beta_1^U$ and $\beta_0 < \beta_0^U$, while $\beta_2 = 3968 > \beta_2^U$ $\Rightarrow$ $J = \{0, 1, 3\}$.

For each $k$, $\hat C_k$ (the optimal solution of $(C_k)$) is a singleton, and $\hat C_k = \hat R_\beta$ exactly for:
- $k = 3$: $\beta \in (\beta_3, \beta_1)$;
- $k = 2$: no $\beta$ ($2 \notin J$);
- $k = 1$: $\beta \in (\beta_1, \beta_0)$;
- $k = 0$: $\beta > \beta_0$.

Noisy data 1: nearly normal, centered, i.i.d. noise, SNR $= \dots$ dB.

$\beta_5 = 0 < \beta_5^U$, $\beta_4 < \beta_4^U$, $\beta_3 < \beta_3^U$, $\beta_1 < \beta_1^U$ and $\beta_0 = 63154$, while $\beta_2 > \beta_2^U$. Hence $J = \mathbb{I}_5^0 \setminus \{2\}$.

For each $k$, $\hat C_k$ (the optimal solution of $(C_k)$) is a singleton, and $\hat C_k = \hat R_\beta$ exactly for:
- $k = 5$: $\beta \in (\beta_5, \beta_4)$;
- $k = 4$: $\beta \in (\beta_4, \beta_3)$;
- $k = 3$: $\beta \in (\beta_3, \beta_1)$;
- $k = 2$: no $\beta$ ($2 \notin J$);
- $k = 1$: $\beta \in (\beta_1, \beta_0)$;
- $k = 0$: $\beta > \beta_0$.

Noisy data 2: the noise is nearly normal, centered, i.i.d., SNR $= \dots$ dB.

$\beta_0 > \beta_1 = 3825 > \beta_2 > \beta_3 > \beta_4 > \beta_5 = 0$: $\{\beta_k\}$ is strictly decreasing, hence $\beta_k = c_k - c_{k+1}$, and $(C_k)$ and $(R_\beta)$ are quasi-completely equivalent.

For each $k$, $\hat C_k$ (the optimal solution of $(C_k)$) is a singleton, and $\hat C_k = \hat R_\beta$ exactly for:
- $k = 5$: $\beta \in (\beta_5, \beta_4)$;
- $k = 4$: $\beta \in (\beta_4, \beta_3)$;
- $k = 3$: $\beta \in (\beta_3, \beta_2)$;
- $k = 2$: $\beta \in (\beta_2, \beta_1)$;
- $k = 1$: $\beta \in (\beta_1, \beta_0)$;
- $k = 0$: $\beta > \beta_0$.


8. Conclusions and open questions

The main equivalence result in a nutshell: on the partition
$0 = \beta_L < \dots < \beta_{J_{k+1}} < \beta_{J_k} < \beta_{J_{k-1}} < \dots < \beta_{J_0} < +\infty$,
the optimal set of $(R_\beta)$ reads $\hat R_\beta = \hat C_L$ on the leftmost interval, then successively $\hat C_{J_{k+1}}$, $\hat C_{J_k}$, $\dots$, up to $\hat R_\beta = \hat C_0$ on $(\beta_{J_0}, +\infty)$.

- The agreement between the optimal sets of problems $(C_k)$ and $(R_\beta)$ is driven by the critical parameters $\{\beta_k\}_{k=0}^{L}$, which depend only on the optimal values $c_k$ of problem $(C_k)$.
- Our comparative results clarify a proper choice between models $(C_k)$ and $(R_\beta)$ in applications. If one needs solutions with a fixed number of nonzero entries, $(C_k)$ is the best choice. If only information on the perturbations is available, $(R_\beta)$ is a more flexible model. If one can solve problem $(C_k)$ for all $k$, the global minimizers of problem $(R_\beta)$ are immediate.
- Our detailed results can give rise to innovative algorithms.
- The degree of partial equivalence depends on the distribution of the coefficients of $A$ and $d$.

By specifying a class of matrices $A$ and assumptions on the data $d$, one can infer statistical knowledge on the optimal values $c_k$ of problems $(C_k)$, and thus on the critical parameters $\{\beta_k\}$. Promising theoretical and practical results can be expected.

A related open question is whether the optimal solutions of $(R_\beta)$ are able to eliminate some meaningless solutions of $(C_k)$.

Extensions to analysis-type penalties $\|Du\|_0$, to low-rank matrix recovery, etc., are important. Other important results concern algorithms that are known to converge to local minimizers.

Remark 3. Problem $(R_\beta)$ (for some $\beta > 0$) and problems $(C_k)$ for $k \in \{0, 1, \dots, M\}$ have the same sets of (strict) local minimizers.

Thank you for your attention. Thanks to Zuhair Nashed for the invitation.

References

M. Nikolova, Relationship between the optimal solutions of least squares regularized with $\ell_0$-norm and constrained by $k$-sparsity, Applied and Computational Harmonic Analysis, 41(1), July 2016, pp. 237-265. http://dx.doi.org/10.1016/j.acha.2015.10.010

M. Nikolova, Description of the minimizers of least squares regularized with $\ell_0$-norm. Uniqueness of the global minimizer, SIAM Journal on Imaging Sciences, 2013.
