On the Minimization Over Sparse Symmetric Sets: Projections, Optimality Conditions and Algorithms

1 On the Minimization Over Sparse Symmetric Sets: Projections, Optimality Conditions and Algorithms Amir Beck Technion - Israel Institute of Technology Haifa, Israel Based on joint work with Nadav Hallak and Yakov Vaisbourd OPT2014: Workshop on Optimization for Machine Learning (NIPS 2014), Montreal, December 12, 2014

2 Problem Formulation The sparse optimization problem (P): min f(x) s.t. x ∈ C_s ∩ B, where (1) f is continuously differentiable, (2) B is closed and convex, and C_s = {x ∈ R^n : ‖x‖_0 ≤ s}. Difficulties: (a) C_s ∩ B is non-convex; (b) C_s ∩ B induces a combinatorial constraint. No global optimality conditions; solution methods are heuristic in nature.

3 Example - Compressed Sensing Linear CS: recover a sparse signal x from a sampling matrix A and measurements b. (CS) min ‖Ax - b‖_2^2 s.t. x ∈ C_s ∩ R^n

4 Literature - CS Linear: 1 Conditions for reconstruction: RIP (Candes and Tao 05), SRIP (Beck and Teboulle 10), spark (Donoho and Elad 03; Gorodnitsky and Rao 97), mutual coherence (Donoho et al. 03; Donoho and Huo 99; Mallat and Zhang 93). 2 Reviews: Bruckstein et al. 09, Davenport et al. 11, Tropp and Wright. 3 Iterative algorithms: IHT (Blumensath and Davies 08, 09, 12; Beck and Teboulle 10), CoSaMP (Needell and Tropp 09). Nonlinear: 1 Phase retrieval: Shechtman et al. 13; Ohlsson and Eldar 13; Eldar and Mendelson 13; Eldar et al. 13; Hurt. 2 Nonlinear: optimality conditions (Beck and Eldar 13), GraSP (Bahmani et al. 13)

5 Example - Sparse Index Tracking Track an index b with at most s assets, with return matrix A. (IT) min ‖Ax - b‖_2^2 s.t. x ∈ C_s ∩ Δ_n. Example: Finance - track the S&P500 with a small number of assets (Takeda et al. 12)

6 Example - Sparse Principal Component Analysis Find the first principal eigenvector of a matrix A. (PCA) max x^T Ax s.t. x ∈ C_s ∩ {y ∈ R^n : ‖y‖_2 ≤ 1}. Example: Finance - identify the group which explains most of the variance in the S&P500. Sample of works: Moghaddam, Weiss, Avidan 06; d'Aspremont, Bach, El Ghaoui 08; d'Aspremont, El Ghaoui, Jordan, Lanckriet 07; recent review: Luss and Teboulle 13

7 Objectives The sparse optimization problem (P): min f(x) s.t. x ∈ C_s ∩ B, B closed and convex. Main Objectives: define necessary optimality conditions; develop corresponding algorithms; establish a hierarchy between algorithms and conditions. The case B = R^n: Beck, Eldar 13

8 Objectives The sparse optimization problem (P): min f(x) s.t. x ∈ C_s ∩ B, B closed and convex. Main Objectives: define necessary optimality conditions; develop corresponding algorithms; establish a hierarchy between algorithms and conditions. The case B = R^n: Beck, Eldar 13. However, we will also need to study and compute orthogonal projections onto B ∩ C_s.

9 Recap of Necessary First Order Opt. Conditions over Convex Sets: Stationarity (*) min{f(x) : x ∈ S}, S closed and convex, f continuously differentiable. Equivalent definitions of stationarity: x* is a stationary point iff Projection form: x* = P_S(x* - (1/L)∇f(x*)) for some L > 0; Variational form: ⟨∇f(x*), x - x*⟩ ≥ 0 for all x ∈ S

10 Recap of Necessary First Order Opt. Conditions over Convex Sets: Stationarity (*) min{f(x) : x ∈ S}, S closed and convex, f continuously differentiable. Equivalent definitions of stationarity: x* is a stationary point iff Projection form: x* = P_S(x* - (1/L)∇f(x*)) for some L > 0; Variational form: ⟨∇f(x*), x - x*⟩ ≥ 0 for all x ∈ S. The conditions are equivalent and independent of L. Most algorithms that use first order information converge to stationary points. The projection-form condition relies on the properties/computability of P_S(·), where P_S(y) = argmin{‖y - x‖_2 : x ∈ S}.

11 Why Study Orthogonal Projections? P_{C_s∩B}(x) = argmin{‖z - x‖_2^2 : z ∈ C_s ∩ B}. To define optimality conditions, we need to compute and analyze properties of the orthogonal projection P_{C_s∩B}. Computing P_{C_s∩B} is in general a difficult task, but in fact tractable under symmetry assumptions on B

12 Why Study Orthogonal Projections? P_{C_s∩B}(x) = argmin{‖z - x‖_2^2 : z ∈ C_s ∩ B}. To define optimality conditions, we need to compute and analyze properties of the orthogonal projection P_{C_s∩B}. Computing P_{C_s∩B} is in general a difficult task, but in fact tractable under symmetry assumptions on B. Revised Layout: Projections, Optimality Conditions, Algorithms

13 Projection Onto Symmetric Sets

14 Definitions - Basics Σ_n = the permutation group of [n]; x_σ = the reordering of x according to σ ∈ Σ_n, (x_σ)_i = x_{σ(i)}. Example (permutation): for σ(1) = 3, σ(2) = 1, σ(3) = 2, we get x_σ = (x_3, x_1, x_2)^T.

15 Definitions - sorting permutations σ ∈ Σ_n is a sorting permutation of x if x_{σ(1)} ≥ x_{σ(2)} ≥ ... ≥ x_{σ(n-1)} ≥ x_{σ(n)}. Σ(x) is the set of all the sorting permutations of x. Example (sorting permutation): if x_2 = x_4 ≥ x_3 ≥ x_1, then σ with σ(1) = 2, σ(2) = 4, σ(3) = 3, σ(4) = 1 is a sorting permutation of x, and so is σ' with σ'(1) = 4, σ'(2) = 2, σ'(3) = 3, σ'(4) = 1; both belong to Σ(x).
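A minimal numpy sketch (function name and example vector are mine) of picking one sorting permutation σ ∈ Σ(x) via argsort; with ties, several permutations qualify, echoing the example above. Indices are 0-based here, unlike the slides' 1-based notation.

```python
import numpy as np

def sorting_permutation(x):
    # One sigma in Sigma(x): indices that order x in nonincreasing fashion.
    # np.argsort is ascending, so we sort -x; kind="stable" fixes the tie-break.
    return np.argsort(-np.asarray(x, dtype=float), kind="stable")

x = np.array([1.0, 5.0, 3.0, 5.0])      # entries 2 and 4 (1-based) are tied
sigma = sorting_permutation(x)
print(sigma, x[sigma])                   # [1 3 2 0] and [5. 5. 3. 1.]; [3 1 2 0] would also be valid
```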

16 Definitions - Type-1 Symmetric Set D is a type-1 symmetric set if x ∈ D ⟹ x_σ ∈ D for every σ ∈ Σ_n.
set | description | symmetry
Δ̄_n = {x ∈ R^n : 1^T x = 1} | unit sum | type-1
[l, u]^n (l < u) | box | type-1

17 Definitions - Nonnegative Type-1 Symmetric Set D is nonnegative if x ∈ D ⟹ x ≥ 0.
set | description | symmetry
R^n_+ | nonnegative orthant | nonnegative type-1
Δ_n | unit simplex | nonnegative type-1

18 Definitions - Type-2 Symmetric Set D is a type-2 symmetric set if it is type-1 symmetric and for any x ∈ D and y ∈ {-1, 1}^n, x ⊙ y ≡ (x_i y_i)_{i=1}^n ∈ D.
set | description | symmetry
R^n | entire space | type-2
B_p[0, 1] (p ≥ 1) | p-ball | type-2

19 Summary of symmetry properties of simple sets
set | description | type-1 | nonneg. type-1 | type-2
R^n | entire space | ✓ | | ✓
R^n_+ | nonnegative orthant | ✓ | ✓ |
Δ_n | unit simplex | ✓ | ✓ |
Δ̄_n | unit sum | ✓ | |
B_p[0, 1] (p ≥ 1) | p-ball | ✓ | | ✓
[l, u]^n (l < u) | box | ✓ | |

20 Symmetric Projection Monotonicity Lemma. Let D be a type-1 symmetric set, x ∈ R^n, and y ∈ P_D(x). Then (y_i - y_j)(x_i - x_j) ≥ 0 for any i, j ∈ [n].

21 Order Preservation Property Theorem. Let D be a type-1 symmetric set and σ ∈ Σ(x). Suppose that P_D(x) ≠ ∅. Then there exists y ∈ P_D(x) s.t. σ ∈ Σ(y). That is, x_{σ(1)} ≥ x_{σ(2)} ≥ ... ≥ x_{σ(n)} and y_{σ(1)} ≥ y_{σ(2)} ≥ ... ≥ y_{σ(n)}. Example: D = C_2, P_D((3, 2, 2, 0)) = {(3, 2, 0, 0), (3, 0, 2, 0)}.

22 Sparse Projection Onto Symmetric Sets

23 The Sparse Projection Problem Find an element of the orthogonal projection of x ∈ R^n onto B ∩ C_s: P_{C_s∩B}(x) = argmin{‖z - x‖_2^2 : z ∈ C_s ∩ B}. (1) B ∩ C_s closed ⟹ P_{C_s∩B}(x) ≠ ∅. (2) B ∩ C_s nonconvex ⟹ P_{C_s∩B}(x) may contain more than one element. A DIFFICULT NONCONVEX PROBLEM IN GENERAL (known for B = R^n, R^n_+, Δ_n)

24 Supports, Super Supports Let x ∈ R^n, s ∈ [n] = {1,..., n}. 1 Support of x: I_1(x) ≡ {i ∈ [n] : x_i ≠ 0}. 2 Super support of x: any set T s.t. I_1(x) ⊆ T and |T| = s. 3 x has full support if ‖x‖_0 = |I_1(x)| = s. 4 Off-support of x: I_0(x) ≡ {i ∈ [n] : x_i = 0}. Example: s = 3, n = 5 and x = (-3, 4, 0, 0, 0)^T. 1 Support: I_1(x) = {1, 2}. 2 Super supports: T ∈ {{1, 2, 3}, {1, 2, 4}, {1, 2, 5}}. 3 Incomplete support: ‖x‖_0 < s. 4 Off-support: I_0(x) = {3, 4, 5}
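A small numpy sketch of these definitions (helper names are mine; indices are 0-based, so the slide's {1, 2} becomes {0, 1}), reproducing the example above.

```python
import numpy as np
from itertools import combinations

def support(x):
    # I_1(x): indices of the nonzero entries.
    return set(np.flatnonzero(x))

def super_supports(x, s):
    # All super supports of x: index sets T with I_1(x) a subset of T and |T| = s.
    I1 = support(x)
    rest = [i for i in range(len(x)) if i not in I1]
    return [I1 | set(extra) for extra in combinations(rest, s - len(I1))]

x = np.array([-3.0, 4.0, 0.0, 0.0, 0.0])
print(support(x))            # {0, 1}
print(super_supports(x, 3))  # [{0, 1, 2}, {0, 1, 3}, {0, 1, 4}]
```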

25 Restriction on Index Sets x ∈ R^n, T ⊆ [n] an index set. 1 x_T ∈ R^|T| is the restriction of x to T. 2 U_T is the submatrix of I_n constructed from the columns in T. 3 B_T = {x ∈ R^|T| : U_T x ∈ B} is the restriction of B to T. 4 ∇_T f(x) = U_T^T ∇f(x) is the restriction of ∇f(x) to T. Example: x = (8, 7, 6, 5)^T ⟹ x_{{1,3}} = (8, 6)^T. B = {(x_1, x_2, x_3, x_4) : x_1 + 2x_2 + 3x_3 + 4x_4 = 1} ⟹ B_{{1,2}} = {(x_1, x_2)^T : x_1 + 2x_2 = 1}. f(x) = x_1 x_2 + x_3^3 ⟹ ∇_{{1,3}} f(x) = (x_2, 3x_3^2)^T.

26 The Order Set For any permutation σ ∈ Σ_n, the order set S^σ_[j_1,j_2] is defined as S^σ_[j_1,j_2] = {σ(j_1), σ(j_1 + 1),..., σ(j_2)} if 0 < j_1 ≤ j_2 ≤ n, and S^σ_[j_1,j_2] = ∅ otherwise. Example (order set): for σ(1) = 4, σ(2) = 1, σ(3) = 3, σ(4) = 2: S^σ_[1,2] = {σ(1), σ(2)} = {4, 1}, S^σ_[3,4] = {σ(3), σ(4)} = {3, 2}

27 Phases in Computing the Projection To find y ∈ P_{C_s∩B}(x): (1) find its super support S; (2) compute y_S = P_{B_S}(x_S), y_{S^c} = 0. Naive approach: go over all (n choose s) possible super supports, compute the corresponding projections, and find the sparse projection vector. TOO EXPENSIVE

28 Type-1 Symmetric Sparse Projection Theorem Type-1 Symmetric Projection Theorem. Let B be a type-1 symmetric set and σ ∈ Σ(x). Then there exist y ∈ P_{C_s∩B}(x) and k ∈ {0,..., s} for which I_1(y) ⊆ S^σ_[1,k] ∪ S^σ_[n+k-(s-1), n]. Result: to find the super support, find k ∈ {0,..., s} splitting the sorted entries x_{σ(1)} ≥ ... ≥ x_{σ(k)} ≥ x_{σ(k+1)} ≥ ... ≥ x_{σ(n+k-s)} ≥ x_{σ(n+k-(s-1))} ≥ ... ≥ x_{σ(n)}, so that y is supported on the first k and the last s - k sorted positions (y_{σ(1)},..., y_{σ(k)} and y_{σ(n+k-(s-1))},..., y_{σ(n)}) and is zero in between.

29 Type-1 Symmetric Sparse Projection Algorithm Algorithm 1: Projection onto a type-1 symmetric sparse set. Input: x ∈ R^n. Output: u ∈ P_{C_s∩B}(x). 1 Find σ ∈ Σ(x). 2 For k = s, s-1,..., 0: 2.1 set T_k = S^σ_[1,k] ∪ S^σ_[n+k-(s-1), n]; 2.2 compute g^k = P_{B_{T_k}}(x_{T_k}) and define z^k = U_{T_k} g^k. 3 Return u ∈ argmin{‖z - x‖_2 : z ∈ {z^k : k = s, s-1,..., 0}}
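A sketch of Algorithm 1 for one concrete type-1 symmetric choice, the box B = [l, u]^n, where each restricted projection P_{B_{T_k}} is just a clip; the function name and the test vector are my own, and indices are 0-based.

```python
import numpy as np

def sparse_box_projection(x, s, l, u):
    # Algorithm 1 sketch for B = [l, u]^n: try every split k = s,...,0 and keep the best candidate.
    n = len(x)
    sigma = np.argsort(-x, kind="stable")            # a sorting permutation of x
    best, best_dist = None, np.inf
    for k in range(s, -1, -1):
        # T_k: the k largest entries of x together with the s - k smallest entries
        T = np.concatenate([sigma[:k], sigma[n - (s - k):]])
        z = np.zeros(n)
        z[T] = np.clip(x[T], l, u)                   # P_{B_{T_k}} for a box is a clip
        dist = np.linalg.norm(z - x)
        if dist < best_dist:
            best, best_dist = z, dist
    return best

x = np.array([0.9, -2.0, 0.2, 1.5])
print(sparse_box_projection(x, s=2, l=-1.0, u=1.0))  # [ 0. -1.  0.  1.]
```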

30 Nonnegative Type-1 Symmetric Sparse Projection Theorem Nonnegative Type-1 Symmetric Sparse Projection Theorem. Let B be nonnegative type-1 symmetric and σ ∈ Σ(x). Then there exists y ∈ P_{C_s∩B}(x) s.t. I_1(y) ⊆ S^σ_[1,s]. Example: B = Δ_4, s = 2, x = (1, 0.5, -0.5, -1)^T, S^σ_[1,2] = {1, 2}, y = (0.75, 0.25, 0, 0)^T. The super support can be found by a simple sort operation

31 Type-2 Symmetric Sparse Projection Theorem Type-2 Symmetric Sparse Projection Theorem. Let B be type-2 symmetric and σ ∈ Σ(|x|). Then there exists y ∈ P_{C_s∩B}(x) s.t. I_1(y) ⊆ S^σ_[1,s]. Example: B = B_2[0, 1], s = 2, x = (3, 0.5, 0.7, 4)^T, S^σ_[1,2] = {4, 1}, y = (0.6, 0, 0, 0.8)^T. The super support can be found by a simple sort operation

32 Unified Symmetric Projection Theorem The symmetry function p : R^n → R^n: p(x) ≡ x if B is nonnegative type-1, p(x) ≡ |x| if B is type-2 symmetric. Unified Symmetric Projection Theorem. Let B be closed, convex, and either nonnegative type-1 or type-2 symmetric, and let σ ∈ Σ(p(x)). Then there exists y ∈ P_{C_s∩B}(x) s.t. I_1(y) ⊆ S^σ_[1,s]

33 Unified Symmetric Sparse Projection Algorithm Algorithm 2: Unified symmetric sparse projection algorithm. Input: x ∈ R^n. Output: u ∈ P_{B∩C_s}(x). 1 Compute T = S^σ_[1,s] for some σ ∈ Σ(p(x)). 2 Return u = U_T P_{B_T}(x_T).
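A sketch of Algorithm 2 for the nonnegative type-1 case B = Δ_n (so p(x) = x): keep the s largest entries of x and project them onto the s-dimensional unit simplex. The simplex projection below is the standard sort-based routine; all names are mine.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the unit simplex {z >= 0, sum(z) = 1} (sort-based method).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def sparse_simplex_projection(x, s):
    # Algorithm 2 for B = Delta_n: T = s largest entries of x, then project x_T onto Delta_s.
    T = np.argsort(-x, kind="stable")[:s]
    z = np.zeros_like(x, dtype=float)
    z[T] = project_simplex(x[T])
    return z

x = np.array([1.0, 0.5, -0.5, -1.0])
print(sparse_simplex_projection(x, s=2))   # [0.75 0.25 0.   0.  ], the slide 30 example
```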

34 Super supports for sparse projection onto simple sets
B | P_{B∩C_s}(x) | super support set(s)
R^n | U_T x_T | T = S^σ_[1,s], σ ∈ Σ(|x|); B_T = R^s
R^n_+ | U_T [x_T]_+ | T = S^σ_[1,s], σ ∈ Σ(x); B_T = R^s_+
Δ_n | U_T P_{B_T}(x_T) | T = S^σ_[1,s], σ ∈ Σ(x); B_T = Δ_s
Δ̄_n | U_{T_k} P_{B_{T_k}}(x_{T_k}) | T_k = S^σ_[1,k] ∪ S^σ_[n+k-(s-1),n], k = 0, 1,..., s, σ ∈ Σ(x); B_{T_k} = Δ̄_s
B_p^n[0, 1] (p ≥ 1) | U_T P_{B_T}(x_T) | T = S^σ_[1,s], σ ∈ Σ(|x|); B_T = B_p^s[0, 1]
[l, u]^n (l < u) | U_{T_k} P_{B_{T_k}}(x_{T_k}) | T_k = S^σ_[1,k] ∪ S^σ_[n+k-(s-1),n], k = 0, 1,..., s, σ ∈ Σ(x); B_{T_k} = [l, u]^s

35 Optimality Conditions and Algorithms

36 Back to the Sparse Optimization Problem The sparse optimization problem (P): min f(x) s.t. x ∈ C_s ∩ B, C_s = {x ∈ R^n : ‖x‖_0 ≤ s}. Assumptions: [A] f : R^n → R is lower bounded and continuously differentiable. [B] B is a closed and convex set. In some cases also [C] f ∈ C^{1,1}_{L(f)} (∇f is Lipschitz continuous with constant L(f)).

37 Road Map of Optimality Conditions (P) min f(x) s.t. x ∈ C_s ∩ B. Basic Feasibility - optimality over the support. L-Stationarity - extension of stationarity over convex sets. CW-optimality

38 Basic Feasibility (BF) x ∈ C_s ∩ B is a basic feasible (BF) point of (P) if for any super support set S of x and some L > 0: x_S = P_{B_S}(x_S - (1/L)∇_S f(x)). Remarks: (a) If |I_1(x)| = s, then the only super support set is the support itself. (b) If |I_1(x)| = k < s, then there are (n-k choose s-k) possible super supports. (c) x is BF ⟺ x_T is a stationary point of f over B_T for any super support T of x. (d) Optimality ⟹ BF

39 Basic Feasibility - Examples
B | BF conditions (full support)
R^n | ∇_{I_1(x*)} f(x*) = 0
R^n_+ | ∇_{I_1(x*)} f(x*) = 0
Δ_n | ∃ µ ∈ R : ∇_i f(x*) = µ for all i ∈ I_1(x*)
Δ̄_n | ∃ µ ∈ R : ∇_i f(x*) = µ for all i ∈ I_1(x*)
B_2^n[0, 1] | ∇_{I_1(x*)} f(x*) = 0, or ‖x*‖ = 1 and ∃ λ ≤ 0 : ∇_{I_1(x*)} f(x*) = λ x*_{I_1(x*)}
[l, u]^n (l < u) | for all i ∈ I_1(x*): ∂f/∂x_i(x*) = 0 if l < x*_i < u, ≥ 0 if x*_i = l, ≤ 0 if x*_i = u
What if |I_1(x)| < s?

40 Basic Feasibility - Examples
B | BF conditions (full support)
R^n | ∇_{I_1(x*)} f(x*) = 0
R^n_+ | ∇_{I_1(x*)} f(x*) = 0
Δ_n | ∃ µ ∈ R : ∇_i f(x*) = µ for all i ∈ I_1(x*)
Δ̄_n | ∃ µ ∈ R : ∇_i f(x*) = µ for all i ∈ I_1(x*)
B_2^n[0, 1] | ∇_{I_1(x*)} f(x*) = 0, or ‖x*‖ = 1 and ∃ λ ≤ 0 : ∇_{I_1(x*)} f(x*) = λ x*_{I_1(x*)}
[l, u]^n (l < u) | for all i ∈ I_1(x*): ∂f/∂x_i(x*) = 0 if l < x*_i < u, ≥ 0 if x*_i = l, ≤ 0 if x*_i = u
What if |I_1(x)| < s? In all of the above cases, basic feasibility is exactly like stationarity over B

41 Characterization of BF points Theorem. x ∈ C_s ∩ B is a BF point if and only if x_T = P_{B_T}(x_T - (1/L)∇_T f(x)), where 1 Symmetry: B is nonnegative type-1 or type-2 symmetric; 2 Fill the support: i ∈ [n + 1] is s.t. |S^σ_[i,n] ∪ I_1(x)| = s, with σ ∈ Σ(-p(∇f(x))); 3 Into a super support: T = I_1(x) ∪ S^σ_[i,n]. Important: when |I_1(x)| < s, only one super support needs to be checked

42 Basic Feasible Search - Find a BF point Algorithm 3: BAsic Feasible Search (BAFS). Initialization: x^0 ∈ C_s ∩ B, k = 0. Output: u ∈ C_s ∩ B which is a basic feasible point. Repeat: 1 k ← k + 1; 2 let σ ∈ Σ(-p(∇f(x^{k-1}))); 3 set i ∈ {1,..., n + 1} such that |S^σ_[i,n] ∪ I_1(x^{k-1})| = s; 4 set T_k = I_1(x^{k-1}) ∪ S^σ_[i,n]; 5 take x^k ∈ argmin{f(y) : y ∈ B, I_1(y) ⊆ T_k}; until f(x^{k-1}) ≤ f(x^k). Set u = x^{k-1}. Finite termination. Requires the ability to minimize f over B restricted to a support set.
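A minimal sketch of BAFS, assuming f(x) = ‖Ax - b‖^2 and B = R^n (so p = |·| and the restricted minimization is a small least-squares solve); function names are mine and x0 is assumed to have at most s nonzeros.

```python
import numpy as np

def restricted_ls(A, b, T, n):
    # argmin_y { ||Ay - b||^2 : I_1(y) subset of T }, solved on the columns indexed by T.
    y = np.zeros(n)
    y[T], *_ = np.linalg.lstsq(A[:, T], b, rcond=None)
    return y

def bafs_ls(A, b, s, x0, max_iter=100):
    # BAFS sketch: fill the support with the off-support indices of largest |grad f|,
    # minimize over that super support, repeat until no decrease.
    n = A.shape[1]
    x = x0.copy()
    for _ in range(max_iter):
        f_old = np.linalg.norm(A @ x - b) ** 2
        grad = 2 * A.T @ (A @ x - b)
        I1 = list(np.flatnonzero(x))
        fill = [i for i in np.argsort(np.abs(grad))[::-1] if i not in I1][: s - len(I1)]
        T = I1 + fill
        x = restricted_ls(A, b, T, n)
        if np.linalg.norm(A @ x - b) ** 2 >= f_old - 1e-12:
            break
    return x
```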

43 Road Map of Optimality Conditions (P) min f(x) s.t. x ∈ C_s ∩ B. Basic Feasibility - optimality over the support. L-Stationarity - extension of stationarity over convex sets. CW-optimality

44 L-Stationarity Unfortunately, the variational form ⟨∇f(x*), x - x*⟩ ≥ 0 for all x ∈ B ∩ C_s is not a necessary optimality condition (in general...)

45 L-Stationarity Unfortunately, the variational form ⟨∇f(x*), x - x*⟩ ≥ 0 for all x ∈ B ∩ C_s is not a necessary optimality condition (in general...). Let L > 0. A vector x* ∈ C_s ∩ B is an L-stationary point of (P) if x* ∈ P_{C_s∩B}(x* - (1/L)∇f(x*)). L-Stationarity in the hierarchy: 1 L-stationarity ⟹ BF. 2 If f ∈ C^{1,1}_{L(f)}, optimality ⟹ L-stationarity for any L > L(f). The condition depends on L and is more restrictive as L gets smaller

46 Gradient Projection Method x^{k+1} ∈ P_{C_s∩B}(x^k - (1/L)∇f(x^k)). For B = R^n this is the Iterative Hard Thresholding (IHT) method (Blumensath and Davies 08, 09, 12). Makes sense only when f ∈ C^{1,1}. Only guarantees convergence to an L-stationary point for L > L(f).
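A sketch of this iteration for B = R^n, where P_{C_s∩R^n} is the hard-thresholding operator H_s (keep the s largest-magnitude entries); f is assumed to be the least-squares objective ‖Ax - b‖^2 and L should exceed L(f) = 2λ_max(A^T A). Names are mine.

```python
import numpy as np

def hard_threshold(x, s):
    # H_s: keep the s largest-magnitude entries of x, zero out the rest.
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    z[keep] = x[keep]
    return z

def iht_ls(A, b, s, L, n_iter=200):
    # Gradient projection / IHT for min ||Ax - b||^2 s.t. ||x||_0 <= s, B = R^n.
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ x - b)
        x = hard_threshold(x - grad / L, s)
    return x
```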

47 L-stationarity characterization Theorem. Let B be a nonnegative type-1 or type-2 symmetric set. Then a BF point x* is an L-stationary point if and only if min_{i∈I_1(x*)} p(Lx*_i - ∇_i f(x*)) ≥ max_{j∈I_0(x*)} p(Lx*_j - ∇_j f(x*)). Example (B = R^n): let σ ∈ Σ(|x*|). Then x* is an L-stationary point of (P) if and only if |∇_i f(x*)| ≤ L |x*_{σ(s)}| for i ∈ I_0(x*) and ∇_i f(x*) = 0 for i ∈ I_1(x*). [Beck, A. & Eldar, Y. C., SIOPT, 2013]
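A small checker for the B = R^n case of this characterization (a hypothetical helper with 0-based numpy indexing).

```python
import numpy as np

def is_L_stationary_Rn(x, grad, L, s, tol=1e-8):
    # B = R^n test: grad = 0 on the support, |grad_i| <= L * (s-th largest |x|) off the support.
    xs = np.sort(np.abs(x))[::-1][s - 1]
    on = np.flatnonzero(x)
    off = np.setdiff1d(np.arange(len(x)), on)
    return bool(np.all(np.abs(grad[on]) <= tol) and np.all(np.abs(grad[off]) <= L * xs + tol))

x = np.array([0.0, 2.0, 0.0, -1.0])
grad = np.array([1.5, 0.0, -0.5, 0.0])
print(is_L_stationary_Rn(x, grad, L=2.0, s=2))   # True
```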

48 Back to the Variational Form of Stationarity In general, the variational form is not a necessary optimality condition. However, when f is concave, it is in fact a necessary optimality condition.

49 Back to the Variational Form of Stationarity In general, the variational form is not a necessary optimality condition. However, when f is concave, it is in fact a necessary optimality condition. Theorem. Suppose that f is concave and continuously differentiable. If x* is an optimal solution of (P), then ⟨∇f(x*), x - x*⟩ ≥ 0 for all x ∈ B ∩ C_s. (A direct consequence of Krein-Milman plus attainment of the optimal solution at extreme points.)

50 Back to the Variational Form of Stationarity In general, the variational form is not a necessary optimality condition. However, when f is concave, it is in fact a necessary optimality condition. Theorem. Suppose that f is concave and continuously differentiable. If x* is an optimal solution of (P), then ⟨∇f(x*), x - x*⟩ ≥ 0 for all x ∈ B ∩ C_s. (A direct consequence of Krein-Milman plus attainment of the optimal solution at extreme points.) We will call this type of stationarity co-stationarity

51 co-stationarity ⟹ L-stationarity Theorem. For any L > 0: x* co-stationary ⟹ x* is L-stationary

52 co-stationarity ⟹ L-stationarity Theorem. For any L > 0: x* co-stationary ⟹ x* is L-stationary. Consequence: for concave f, L-stationarity is a necessary optimality condition for any L > 0.

53 co-stationarity ⟹ L-stationarity Theorem. For any L > 0: x* co-stationary ⟹ x* is L-stationary. Consequence: for concave f, L-stationarity is a necessary optimality condition for any L > 0. Sparse PCA: max{x^T Ax : ‖x‖_2 ≤ 1, ‖x‖_0 ≤ s} (A ⪰ 0). Most algorithms converge to a co-stationary point, e.g., the conditional gradient method: x^{k+1} = y^k/‖y^k‖_2, y^k = H_s(Ax^k), where H_s keeps the s largest-magnitude entries. Gradient projection is never employed (since L-stationarity is a weak condition). The above is only correct for concave f.
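A sketch of this conditional gradient iteration for sparse PCA, assuming A is symmetric PSD; the hard-thresholding helper and the random initialization are my own choices.

```python
import numpy as np

def hard_threshold(x, s):
    # H_s: keep the s largest-magnitude entries, zero out the rest.
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    z[keep] = x[keep]
    return z

def cond_grad_sparse_pca(A, s, n_iter=100, seed=0):
    # x^{k+1} = y^k / ||y^k||_2 with y^k = H_s(A x^k), as quoted on the slide.
    rng = np.random.default_rng(seed)
    x = hard_threshold(rng.standard_normal(A.shape[0]), s)
    x /= np.linalg.norm(x)
    for _ in range(n_iter):
        y = hard_threshold(A @ x, s)
        nrm = np.linalg.norm(y)
        if nrm == 0:
            break
        x = y / nrm
    return x
```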

54 Back to L-stationarity - Example min{f(x_1, x_2) : ‖(x_1, x_2)^T‖_0 ≤ 1}, with f a convex quadratic (terms 12x_1^2 and 32x_2^2 plus lower-order terms). Two BF vectors: (0, 9/16) - the optimal solution; (-1/12, 0) - non-optimal, yet L-stationary for sufficiently large L. [Figure: the two BF points illustrated for L = 196 and L = 250.]

55 Main Questions Are there stronger (more restrictive) optimality conditions? Can we define algorithms that (1) do not depend on the Lipschitz property and (2) converge to better optimality conditions?

56 Main Questions Are there stronger (more restrictive) optimality conditions? Can we define algorithms that (1) do not depend on the Lipschitz property and (2) converge to better optimality conditions? The answer is YES

57 Road Map of Optimality Conditions (P) min f(x) s.t. x ∈ C_s ∩ B. Basic Feasibility - optimality over the support. L-Stationarity - extension of stationarity over convex sets. CW-optimality

58 Simple-CW optimality Let B be nonnegative type-1 or type-2 symmetric. A BF point x is a simple-CW point of (P) if f(x) ≤ f(x - x_i e_i ± x_i e_j) when B is type-2, and f(x) ≤ f(x - x_i e_i + x_i e_j) when B is nonnegative type-1, where i ∈ argmin_{l∈D(x)} p(∇_l f(x)) with D(x) = argmin_{k∈I_1(x)} p(x_k), and j ∈ argmax_{l∈I_0(x)} p(∇_l f(x))

59 Simple-CW optimality in the Hierarchy Results: if B is a nonnegative type-1 or a type-2 symmetric set, 1 Optimality ⟹ Simple-CW. 2 If f ∈ C^{1,1}_{L(f)}, then Simple-CW ⟹ L-stationarity for any L ≥ L_2(f). L_2(f) - the Lipschitz constant of ∇f restricted to two coordinates - is smaller than L(f), so Simple-CW is more restrictive than L(f)-stationarity

60 Local Lipschitz Constant Under the Lipschitz assumption, for any i ≠ j there exists L_{i,j}(f) for which ‖∇_{i,j} f(x) - ∇_{i,j} f(x + d)‖ ≤ L_{i,j}(f) ‖d‖ for any d ∈ R^n satisfying d_k = 0 for all k ∉ {i, j}.

61 Local Lipschitz Constant Under the Lipschitz assumption, for any i ≠ j there exists L_{i,j}(f) for which ‖∇_{i,j} f(x) - ∇_{i,j} f(x + d)‖ ≤ L_{i,j}(f) ‖d‖ for any d ∈ R^n satisfying d_k = 0 for all k ∉ {i, j}. The local Lipschitz constant is L_2(f) = max_{i≠j} L_{i,j}(f)

62 Local Lipschitz Constant Under the Lipschitz assumption, for any i ≠ j there exists L_{i,j}(f) for which ‖∇_{i,j} f(x) - ∇_{i,j} f(x + d)‖ ≤ L_{i,j}(f) ‖d‖ for any d ∈ R^n satisfying d_k = 0 for all k ∉ {i, j}. The local Lipschitz constant is L_2(f) = max_{i≠j} L_{i,j}(f), and L_2(f) ≤ L(f). Example: f(x) = x^T Qx + 2b^T x with Q_n = I_n + J_n (I_n - identity, J_n - all ones)

63 Local Lipschitz Constant Under the Lipschitz assumption, for any i ≠ j there exists L_{i,j}(f) for which ‖∇_{i,j} f(x) - ∇_{i,j} f(x + d)‖ ≤ L_{i,j}(f) ‖d‖ for any d ∈ R^n satisfying d_k = 0 for all k ∉ {i, j}. The local Lipschitz constant is L_2(f) = max_{i≠j} L_{i,j}(f), and L_2(f) ≤ L(f). Example: f(x) = x^T Qx + 2b^T x with Q_n = I_n + J_n (I_n - identity, J_n - all ones). L(f) = 2λ_max(Q_n) = 2(n + 1). On the other hand, L_{i,j}(f) = 2λ_max([[2, 1], [1, 2]]) = 6

64 Local Lipschitz Constant Under the Lipschitz assumption, for any i ≠ j there exists L_{i,j}(f) for which ‖∇_{i,j} f(x) - ∇_{i,j} f(x + d)‖ ≤ L_{i,j}(f) ‖d‖ for any d ∈ R^n satisfying d_k = 0 for all k ∉ {i, j}. The local Lipschitz constant is L_2(f) = max_{i≠j} L_{i,j}(f), and L_2(f) ≤ L(f). Example: f(x) = x^T Qx + 2b^T x with Q_n = I_n + J_n (I_n - identity, J_n - all ones). L(f) = 2λ_max(Q_n) = 2(n + 1). On the other hand, L_{i,j}(f) = 2λ_max([[2, 1], [1, 2]]) = 6. We get: L(f) = 2(n + 1), L_2(f) = 6
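A quick numeric check of this example with numpy (n = 8 is my choice):

```python
import numpy as np

n = 8
Q = np.eye(n) + np.ones((n, n))                  # Q_n = I_n + J_n
L_f = 2 * np.linalg.eigvalsh(Q)[-1]              # 2 * lambda_max(Q_n) = 2(n + 1)
L2_f = 2 * np.linalg.eigvalsh(Q[:2, :2])[-1]     # any 2x2 principal submatrix is [[2, 1], [1, 2]]
print(round(L_f, 6), round(L2_f, 6))             # 18.0 6.0
```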

65 Zero-CW Optimality Assumption: the function can be minimized over any super support; B nonnegative type-1 or type-2. x is called a zero-CW optimal point if f(x) ≤ min{f(y) : y ∈ B, I_1(y) ⊆ T}, where T = (I_1(x) ∪ {j}) \ {i} (with i, j as in the simple-CW definition) if ‖x‖_0 = s; otherwise additional (specific) indices are added

66 Full-CW Optimality Assumption: the function can be minimized over any super support; B nonnegative type-1 or type-2. x is a full-CW optimal point if for all i ∈ I_1(x), j ∈ I_0(x), it holds that f(x) ≤ min{f(y) : y ∈ B, I_1(y) ⊆ T_{i,j}}, where T_{i,j} = (I_1(x) ∪ {j}) \ {i}. In the non-full-support case, additional indices are added to T_{i,j}.

67 Hierarchy - Summary Full-CW ⟹ Zero-CW ⟹ Simple-CW ⟹ L_2(f)-Stationarity ⟹ Basic Feasibility

68 Concave f x is called a CW-minimum point if f(x) ≤ f(z) for any z ∈ S_2(x) ≡ {z : ‖z - x‖_0 ≤ 2, z ∈ C_s ∩ B}.

69 Concave f x is called a CW-minimum point if f(x) ≤ f(z) for any z ∈ S_2(x) ≡ {z : ‖z - x‖_0 ≤ 2, z ∈ C_s ∩ B}. CW-optimality is a necessary optimality condition (of a combinatorial flavor...). Theorem. If f is concave and B = {x : ‖x‖_2 ≤ 1}, then any CW-minimum point is a co-stationary point. Quite difficult to prove since co-stationarity is a rather strong condition.

70 pit-prop data 13 variables measured on 180 pitprops. 715 BF points, 28 co-stationary points, 3 CW-optimal points. Supports of the co-stationary points (CW-optimal ones marked *): {1,2,9,10}*, {1,2,7,10}, {1,2,7,9}, {1,2,8,9}, {1,2,8,10}, {1,2,6,7}, {2,7,9,10}, {2,6,7,10}, {1,6,7,10}, {1,2,3,4}*, {7,8,9,10}, {6,7,9,10}, {6,7,10,13}, {6,7,8,10}, {5,6,7,10}, {7,8,10,12}, {7,8,10,13}, {5,6,7,13}, {3,4,6,7}, {4,5,6,7}, {7,10,12,13}, {3,4,8,12}, {3,4,10,12}, {3,10,11,12}, {3,5,12,13}, {1,5,12,13}, {2,5,12,13}, {3,5,11,13} (objective value 1.382).

71 LS over sparse l_1-ball min ‖Ax - b‖_2^2 s.t. x ∈ C_2 ∩ B_1^4[0, 1]. The six candidate supports {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4} are compared by objective value and by which conditions hold at the corresponding points: BF, L(f)-stationary, simple-CW, zero-CW, full-CW.

72 Zero-CW search method Algorithm 4: Zero-CW search method (ZCWS). Initialization: x^0 ∈ C_s ∩ B a BF point, k = 0. Output: u ∈ C_s ∩ B which is a zero-CW point. General Step (k = 0, 1, 2,...): 1 D(x^k) = argmin_{l∈I_1(x^k)} p(x^k_l); 2 i ∈ argmin_{l∈D(x^k)} p(∇_l f(x^k)); 3 j ∈ argmax_{l∈I_0(x^k)} p(∇_l f(x^k))

73 Zero-CW search method - Find a Zero-CW point Algorithm 5: Zero-CW search method (ZCWS), continued. 4 Let σ ∈ Σ(-p(∇f(x^k))) and let l be such that |(S^σ_[l,n] ∪ I_1(x^k) ∪ {j}) \ {i}| = s. 5 Define T_k = (S^σ_[l,n] ∪ I_1(x^k) ∪ {j}) \ {i}. 6 Set x ∈ argmin{f(y) : y ∈ B, I_1(y) ⊆ T_k}. 7 x^{k+1} = BAFS(x). 8 If f(x^k) ≤ f(x^{k+1}), then STOP and output u = x^k; otherwise k ← k + 1 and go back to step 1. ZCWS generates a point that is a zero-CW point, and thus converges to better points than gradient projection/IHT.

74 Full-CW search method - Find a Full-CW point Algorithm 6: Full-CW search method. Initialization: x^0 ∈ C_s ∩ B - a basic feasible point, k = 0. Output: u ∈ C_s ∩ B which is a full-CW point. General Step (k = 0, 1, 2,...): 1 x^{k+1} = ZCWS(x^k) and set k ← k + 1. 2 Let σ ∈ Σ(-p(∇f(x^k))), and for any i ∈ I_1(x^k), j ∈ I_0(x^k), let l_{i,j} be such that |(S^σ_[l_{i,j},n] ∪ I_1(x^k) ∪ {j}) \ {i}| = s. 3 Define T^{i,j}_k = (S^σ_[l_{i,j},n] ∪ I_1(x^k) ∪ {j}) \ {i}

75 Full-CW search method Algorithm 7: Full-CW search method, continued. 4 Take z^{i,j} ∈ argmin{f(y) : y ∈ B, I_1(y) ⊆ T^{i,j}_k}. 5 Set (i_0, j_0) ∈ argmin{f(z^{i,j}) : i ∈ I_1(x^k), j ∈ I_0(x^k)}. 6 Define x^{k+1} = BAFS(z^{i_0,j_0}). 7 If f(x^k) ≤ f(x^{k+1}), then STOP and output u = x^k; otherwise, k ← k + 1 and go back to step 1. The full-CW search method obviously finds a full-CW point in a finite number of steps.

76 Hierarchy of Algorithms (Best to Worst) Full-CW search; ZCWS; Gradient projection/IHT; BAFS

77 Numerical Experiments

78 Compared Method - TGA Greedily build the super support. Algorithm 8: TGA. Initialization: x = 0_n, S = ∅. Output: x ∈ C_s ∩ B. 1 While |S| < s do: 1.1 (j, x) ∈ argmin_{l∈S^c, z∈B} {f(z) : I_1(z) ⊆ S ∪ {l}}; 1.2 set S ← S ∪ {j}. 2 Return x
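A sketch of TGA under the same least-squares / B = R^n assumptions used in the earlier sketches (so each candidate evaluation is a restricted least-squares solve); names are mine.

```python
import numpy as np

def tga_ls(A, b, s):
    # Greedily grow the super support: at each step add the single index whose
    # restricted least-squares fit gives the smallest objective value.
    n = A.shape[1]
    S, x = [], np.zeros(n)
    for _ in range(s):
        best = None
        for l in range(n):
            if l in S:
                continue
            T = S + [l]
            z = np.zeros(n)
            z[T], *_ = np.linalg.lstsq(A[:, T], b, rcond=None)
            val = np.linalg.norm(A @ z - b) ** 2
            if best is None or val < best[0]:
                best = (val, l, z)
        S.append(best[1])
        x = best[2]
    return x
```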

79 Numerical Experiment - Sparse index tracking Objective: track an index with a small number of assets, using a weights vector x and a return matrix A. Problem: min ‖Ax - b‖_2^2 s.t. x ∈ C_s ∩ Δ_n. A ∈ R^{72×54} - daily returns matrix; x - weights vector; b ∈ R^{72} - S&P500 daily returns vector. Data: 180 random sets of stocks from the NYSE, 60 for each sparsity level

80 Numerical Experiment - Sparse index tracking [Table: number of instances in which one method (the improver: FCWS or ZCWS) found a better point than another (the improved: ZCWS, FCWS, IHT, TGA), for s = 9, 18, 27 and in total.] 1 The IHT never reached a zero-CW or full-CW point. 2 The ZCWS reached a full-CW point in 66% of the instances. 3 The TGA was improved in most of the instances when s > 9.

81 sparse PCA - gene expression data (GeneChip oncology) 20 data sets. [Figure: probability of obtaining the best solution as a function of the sparsity level and the number of variables; methods in the legend: PCW, cont. PCW, CGU, EM, agr, Trsh.]

82 THANK YOU FOR YOUR ATTENTION References: Beck and Hallak, On the minimization over sparse symmetric sets. Beck and Vaisbourd, Optimization methods for solving the sparse PCA problem.
