Linear dimensionality reduction for data analysis

Nicolas Gillis. Joint work with Robert Luce, François Glineur, Stephen Vavasis, Robert Plemmons, and Gabriella Casalino.

The setup: dimensionality reduction for data analysis

Given a set of $n$ data points $m_j$ ($j = 1, 2, \ldots, n$), we would like to understand the underlying structure of this data. A fundamental and powerful tool is linear dimensionality reduction: find a set of $r$ basis vectors $u_k$ ($1 \le k \le r$) so that

$m_j \approx \sum_{k=1}^r u_k v_{kj}$ for all $j$,

for some weights $v_{kj}$. This is equivalent to a low-rank approximation of the matrix $M$:

$M = [m_1 \, m_2 \, \ldots \, m_n] \approx [u_1 \, u_2 \, \ldots \, u_r] \, [v_1 \, v_2 \, \ldots \, v_n] = UV.$
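As an illustration of the unconstrained case, here is a minimal sketch of the best rank-$r$ approximation in the Frobenius norm, obtained from the truncated SVD (Eckart-Young). The function name and the synthetic data are illustrative, not from the talk.

```python
# Minimal sketch: best rank-r approximation of M in the Frobenius norm via truncated SVD.
import numpy as np

def truncated_svd_approx(M, r):
    """Return U (m x r) and V (r x n) such that U @ V is the best rank-r approximation of M."""
    Um, s, Vt = np.linalg.svd(M, full_matrices=False)
    U = Um[:, :r] * s[:r]          # absorb the singular values into the basis vectors
    V = Vt[:r]                     # weights of each data point in that basis
    return U, V

rng = np.random.default_rng(0)
M = rng.standard_normal((100, 50))
U, V = truncated_svd_approx(M, r=5)
print(np.linalg.norm(M - U @ V, "fro"))   # approximation error ||M - UV||_F
```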

Constrained Low-Rank Matrix Approximations

How do we measure the error $M - UV$? For example, PCA/truncated SVD uses $\|X\|_F^2 = \sum_{i,j} X_{ij}^2$.

Which constraints should the factors satisfy, $U \in \Omega_U$ and $V \in \Omega_V$? For example, PCA imposes no constraints, while NMF requires $U \ge 0$ and $V \ge 0$.

Goal of this presentation: show some applications, present several models, and discuss some algorithms.

Recommender systems

In some cases, some entries are missing/unknown. For example, we would like to predict how much someone is going to like a movie based on their movie preferences (e.g., 1 to 5 stars). The data form a partially observed users-by-movies rating matrix, with "?" marking the unknown entries.

Huge potential in electronic commerce sites (movies, books, music, ...): good recommendations increase the propensity of a purchase.

Low-rank matrix approximations

The behavior of users is modeled as a linear combination of "feature users" (related to age, sex, culture, etc.):

$\underbrace{M(:, j)}_{\text{user } j} \approx \sum_{k=1}^r \underbrace{U(:, k)}_{\text{feature user } k} \underbrace{V(k, j)}_{\text{weights}}.$

Equivalently, movie ratings are modeled as linear combinations of "feature movies" (related to the genres: child oriented, serious vs. escapist, thriller, romantic, actors, etc.):

$\underbrace{M(i, :)}_{\text{movie } i} \approx \sum_{k=1}^r \underbrace{U(i, k)}_{\text{weights}} \underbrace{V(k, :)}_{\text{genre } k}.$

For example, using a rank-2 factorization of the Netflix dataset, female vs. male and serious vs. escapist behaviors were extracted. Koren, Bell, Volinsky, Matrix Factorization Techniques for Recommender Systems, IEEE Computer, 2009. Winners of the Netflix prize ($1,000,000).

PCA with weights and missing data

$\inf_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{n \times r}} \|M - UV^T\|_W^2 = \sum_{i,j} W_{ij} (M - UV^T)_{ij}^2,$  (WLRA)

where $W \ge 0$ is a weight matrix.

NP-hard in general (G., Glineur 2011). Ill-posed, e.g., $M = \begin{pmatrix} 1 & ? \\ 0 & 1 \end{pmatrix}$.

Convexification using the nuclear norm, $\min_X \|X\|_* + \|M - X\|_W^2$, recovers the solution from $O(nr \log^2 n)$ observations (Fazel 2002; Candès and Recht 2009), assuming incoherence.

Alternating/local minimization can lead to optimal solutions under similar assumptions (Keshavan, Montanari, Oh 2009; Jain, Netrapalli, Sanghavi 2013; Bhojanapalli, Neyshabur, Srebro 2016; Ge, Lee, Ma 2016). Riemannian optimization techniques (Boumal, Absil 2011).

See the YouTube video "Linear Inverse Problems" by Ankur Moitra (MIT).
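As an illustration, here is a minimal alternating least squares sketch for (WLRA) in the special case of a 0/1 weight matrix $W$ (i.e., matrix completion). It is not one of the cited algorithms; the function name, iteration count, and synthetic data are illustrative choices.

```python
# Minimal sketch: alternating least squares for weighted low-rank approximation (WLRA)
# with a 0/1 weight matrix W marking the observed entries of M.
import numpy as np

def wlra_als(M, W, r, n_iter=100, seed=0):
    """Heuristic for min_{U,V} sum_ij W_ij (M - U V^T)_ij^2 with W in {0,1}."""
    m, n = M.shape
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((m, r))
    V = rng.standard_normal((n, r))
    for _ in range(n_iter):
        # For each column j, fit V[j] by least squares on the observed entries of M(:, j).
        for j in range(n):
            obs = W[:, j] > 0
            if obs.any():
                V[j] = np.linalg.lstsq(U[obs], M[obs, j], rcond=None)[0]
        # Symmetrically, fit each row of U on the observed entries of M(i, :).
        for i in range(m):
            obs = W[i, :] > 0
            if obs.any():
                U[i] = np.linalg.lstsq(V[obs], M[i, obs], rcond=None)[0]
    return U, V

# Usage on a synthetic rank-2 matrix with roughly half of the entries observed.
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
W = (rng.random(M.shape) < 0.5).astype(float)
U, V = wlra_als(M, W, r=2)
print(np.linalg.norm(M - U @ V.T) / np.linalg.norm(M))   # relative error on all entries
```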

Background subtraction in a video sequence

Robust PCA

$\min_{X \in \mathbb{R}^{m \times n},\, U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{r \times n}} \|M - X\|_0 + \gamma \|X - UV^T\|.$  (RPCA)

Very similar developments as for PCA with missing data:

Convexification using the $\ell_1$ and nuclear norms, $\min_X \|X\|_* + \|M - X\|_1$ (Chandrasekaran, Sanghavi, Parrilo, Willsky 2011; Candès, Li, Ma, Wright 2011), provably recovers the solutions if the model holds ($M$ is low-rank + sparse + some noise).

Alternating minimization can lead to optimal solutions under similar assumptions (Netrapalli, Niranjan, Sanghavi, Anandkumar 2014).
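As an illustration of the low-rank + sparse model, here is a minimal alternating sketch: it is not one of the cited algorithms (in particular, not principal component pursuit); the rank r, the threshold tau, and the function name are illustrative choices.

```python
# Minimal alternating sketch: split M into a rank-r part L and a sparse part S by
# alternating a truncated SVD with entrywise hard-thresholding of the residual.
import numpy as np

def robust_pca_alternate(M, r, tau, n_iter=50):
    """Heuristic decomposition M ~ L + S with rank(L) <= r and S sparse."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Low-rank step: best rank-r approximation of M - S (truncated SVD).
        Um, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (Um[:, :r] * s[:r]) @ Vt[:r]
        # Sparse step: keep only the entries of the residual larger than tau.
        R = M - L
        S = np.where(np.abs(R) > tau, R, 0.0)
    return L, S
```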

Complexity of Robust PCA

$\min_{u \in \mathbb{R}^m,\, v \in \mathbb{R}^n} \|M - uv^T\|_1 = \sum_{i,j} |M_{ij} - u_i v_j|.$  (rank-one RPCA)

NP-hard problem (G. and Vavasis 2018).

If $M$ is binary, $M \in \{0,1\}^{m \times n}$, any optimal solution $(u^*, v^*)$ can be assumed to be binary, that is, $(u^*, v^*) \in \{0,1\}^m \times \{0,1\}^n$.

MAX-CUT can be reduced to the binary $\ell_1$-norm rank-one approximation problem.

Blind hyperspectral unmixing

Figure: Urban hyperspectral image, 162 spectral bands and 307-by-307 pixels.

Problem. Identify the materials and classify the pixels.
Model. Linear mixing model.

Linear mixing model

Blind hyperspectral unmixing

Basis elements allow us to recover the different endmembers: $U \ge 0$.
Abundances of the endmembers in each pixel: $V \ge 0$.

Urban hyperspectral image

Figure: Decomposition of the Urban dataset.

Nonnegative Matrix Factorization (NMF)

Given a matrix $M \in \mathbb{R}_+^{p \times n}$ and a factorization rank $r < \min(p, n)$, find $U \in \mathbb{R}^{p \times r}$ and $V \in \mathbb{R}^{r \times n}$ such that

$\min_{U \ge 0,\, V \ge 0} \|M - UV\|_F^2 = \sum_{i,j} (M - UV)_{ij}^2.$  (NMF)

NMF is a linear dimensionality reduction technique for nonnegative data:

$\underbrace{M(:, i)}_{\ge 0} \approx \sum_{k=1}^r \underbrace{U(:, k)}_{\ge 0} \underbrace{V(k, i)}_{\ge 0}$ for all $i$.

Why nonnegativity?
Interpretability: nonnegativity constraints lead to easily interpretable factors (and a sparse and part-based representation).
Many applications: image processing, text mining, hyperspectral unmixing, community detection, clustering, etc.

Application 2: topic recovery and document classification

Basis elements allow us to recover the different topics;
Weights allow us to assign each text to its corresponding topics.

Standard NMF Algorithms

We would like to solve

$\min_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{r \times n}} \|M - UV\|_F^2$ such that $U \ge 0$, $V \ge 0$,  (NMF)

which is NP-hard in general (Vavasis, 2009).

Most NMF algorithms use a two-block coordinate descent scheme, since the subproblems are convex nonnegative least squares (NNLS) problems (a minimal sketch of one such scheme is given below):
(0) Select initial matrices $(U, V)$. Then repeat the following two steps:
(i) Fix $V$: find a new $U \ge 0$ such that $\|M - UV\|_F^2$ is reduced.
(ii) Fix $U$: find a new $V \ge 0$ such that $\|M - UV\|_F^2$ is reduced.

So far, there are no optimality guarantees for alternating optimization applied to NMF (under suitable conditions). Is there another way?
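Here is a minimal sketch of such a two-block scheme using the classical multiplicative updates of Lee and Seung (one of many possible NNLS-type updates, not necessarily the one used in the talk); the small constant eps and the iteration count are illustrative choices.

```python
# Minimal sketch: two-block coordinate descent for NMF via multiplicative updates.
# M is assumed nonnegative (entrywise).
import numpy as np

def nmf_multiplicative(M, r, n_iter=200, eps=1e-9, seed=0):
    """Return U >= 0, V >= 0 with M ~ U @ V."""
    m, n = M.shape
    rng = np.random.default_rng(seed)
    U = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(n_iter):
        # (i) Fix V, decrease ||M - UV||_F^2 over U >= 0.
        U *= (M @ V.T) / (U @ V @ V.T + eps)
        # (ii) Fix U, decrease ||M - UV||_F^2 over V >= 0.
        V *= (U.T @ M) / (U.T @ U @ V + eps)
    return U, V
```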

Separability Assumption

Separability of $M$: there exists an index set $K$ with $|K| = r$ and $V \ge 0$ such that $M = \underbrace{M(:, K)}_{U} V$.

[AGKM12] Arora, Ge, Kannan, Moitra, Computing a Nonnegative Matrix Factorization — Provably, STOC 2012.

Applications

In hyperspectral imaging, this is the pure-pixel assumption: for each material, there is a pure pixel containing only that material.
[M+14] Ma et al., A Signal Processing Perspective on Hyperspectral Unmixing: Insights from Remote Sensing, IEEE Signal Processing Magazine 31(1):67-81, 2014.

In document classification: for each topic, there is a pure word used only by that topic (an anchor word).
[A+13] Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013.

Time-resolved Raman spectra analysis: each substance has a peak in its spectrum while the other spectra are (close to) zero.
[L+16] Luce et al., Using Separable Nonnegative Matrix Factorization for the Analysis of Time-Resolved Raman Spectra, 2016.

Others: video summarization, foreground-background separation.
[ESV12] Elhamifar, Sapiro, Vidal, See All by Looking at a Few: Sparse Modeling for Finding Representative Objects, CVPR 2012.
[KSK13] Kumar, Sindhwani, Near-separable Non-negative Matrix Factorization with l1- and Bregman Loss Functions, SIAM Data Mining.

Geometric Interpretation

The columns of $U$ are the vertices of the convex hull of the columns of $M$:

$M(:, j) = \sum_{k=1}^r U(:, k) V(k, j)$ for all $j$, where $\sum_{k=1}^r V(k, j) = 1$ and $V \ge 0$.

With noise, the equality only holds approximately: $M(:, j) \approx \sum_{k=1}^r U(:, k) V(k, j)$.

Successive Projection Algorithm (SPA)

0: Initially $K = \emptyset$.
For $i = 1 : r$
  1: Find $j^* = \arg\max_j \|M(:, j)\|$.
  2: $K = K \cup \{j^*\}$.
  3: $M \leftarrow (I - uu^T) M$ where $u = M(:, j^*) / \|M(:, j^*)\|_2$.
end
This is modified Gram-Schmidt with column pivoting.

Theorem. If $\epsilon \le O\!\left(\frac{\sigma_{\min}(U)}{\sqrt{r}\,\kappa^2(U)}\right)$, SPA satisfies

$\max_{1 \le k \le r} \|U(:, k) - M(:, K(k))\| \le O\!\left(\epsilon\, \kappa^2(U)\right).$

Advantages. Extremely fast, no parameter.
Drawbacks. Requires $U$ to be full rank; the bound is weak.

[GV14] G., Vavasis, Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization, IEEE Trans. Pattern Anal. Mach. Intell. 36(4), 2014.
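Here is a minimal sketch of the SPA steps listed above, together with a small synthetic usage example; the data construction and variable names are illustrative, not from the talk.

```python
# Minimal sketch of SPA: greedily pick the column of largest norm, then project all
# columns onto the orthogonal complement of the selected one.
import numpy as np

def spa(M, r):
    """Return r column indices K such that M(:, K) approximates the separable basis U."""
    R = M.astype(float).copy()
    K = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))   # step 1: column of largest norm
        K.append(j)                                      # step 2: add it to the index set
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                          # step 3: project onto orthogonal complement
    return K

# Usage on a synthetic separable matrix M = [U, U V'] (the columns of U are hidden among those of M).
rng = np.random.default_rng(0)
U = rng.random((50, 5))
Vp = rng.random((5, 40)); Vp /= Vp.sum(axis=0)           # convex combinations of the columns of U
M = np.hstack([U, U @ Vp])
print(sorted(spa(M, 5)))                                  # should recover the indices 0..4
```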

Numerical results for the Urban HSI

Minimum-volume NMF: Relaxing separability

Separable NMF can be written as

$\min_{K,\, V \ge 0} \|M - M(:, K) V\|_F^2$ such that $|K| = r$.

Minimum-volume NMF relaxes this to

$\min_{U \ge 0,\, V \ge 0} \mathrm{vol}(U)$ such that $\|M - UV\|_F^2 \le \epsilon$,

where $\mathrm{vol}(U) = \det(U^T U)$ and $V(:, j) \in \Delta^r$ for all $j$.

Open problems: efficient algorithms for min-vol NMF, robustness to noise.

Fu, Huang, Sidiropoulos, Ma, Nonnegative Matrix Factorization for Signal and Data Analytics: Identifiability, Algorithms, and Applications, IEEE Signal Processing Magazine, 2019.

Sequential NMF with underapproximations

It is possible to solve NMF sequentially, solving at each step

$\min_{u \ge 0,\, v \ge 0} \|M - uv^T\|_F^2$ such that $uv^T \le M$ (i.e., $M - uv^T \ge 0$).

NMU is yet another linear dimensionality reduction technique:
Like PCA/SVD, it is sequential and well-posed.
Like NMF, it leads to a separation by parts; moreover, the additional underapproximation constraints enhance this property.
In the presence of pure pixels, the NMU recursion is able to detect materials individually.

G., Glineur, Using Underapproximations for Sparse Nonnegative Matrix Factorization, Pattern Recognition, 2010.
G., Plemmons, Dimensionality Reduction, Classification, and Spectral Mixture Analysis using Nonnegative Underapproximation, Optical Engineering, 2011.
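As an illustration, here is a minimal sketch of one rank-one underapproximation step. It is not the Lagrangian algorithm of the cited papers: it simply alternates exact coordinate-wise updates, exploiting that for fixed $v \ge 0$ each $u_i$ minimizes a 1-D quadratic over the interval $[0, \min_j M_{ij}/v_j]$, so it equals the least-squares value projected onto that interval. The function name, initialization, and iteration count are illustrative.

```python
# Minimal sketch: rank-one nonnegative underapproximation (NMU), M assumed nonnegative.
import numpy as np

def nmu_rank_one(M, n_iter=200):
    """Return u, v >= 0 with u v^T <= M (entrywise), approximating the rank-one NMU step."""
    m, n = M.shape
    v = np.ones(n)
    u = np.zeros(m)
    for _ in range(n_iter):
        # Exact update of u for fixed v: project the least-squares value onto [0, bound].
        pos = v > 0
        bound = np.min(M[:, pos] / v[pos], axis=1) if pos.any() else np.full(m, np.inf)
        u = np.clip((M @ v) / max(v @ v, 1e-12), 0.0, bound)
        # Symmetric update of v for fixed u.
        pos = u > 0
        bound = np.min(M[pos, :] / u[pos, None], axis=0) if pos.any() else np.full(n, np.inf)
        v = np.clip((u @ M) / max(u @ u, 1e-12), 0.0, bound)
    return u, v

# Sequential use: factor out r rank-one terms from the nonnegative residual.
# R = M.copy()
# for k in range(r):
#     u, v = nmu_rank_one(R)
#     R = R - np.outer(u, v)   # stays nonnegative thanks to the underapproximation constraint
```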

Urban hyperspectral image

Sparse low-rank matrix approximations

Decompose a low-rank matrix with known coefficient sparsity:

$M = UV$, $\mathrm{rank}(M) = \mathrm{rank}(U) = r$, $\|V(:, j)\|_0 \le k = r - s < r$ for all $j$.

Many existing theoretical results (see, e.g., [Gribonval 16]) and algorithms (dictionary learning). But:
Not many results specific to the low-rank case.
Only two deterministic identifiability results [Elad 06, Georgiev 05].
Not much in the NMF case except $\ell_1$ regularization.

Identifiability with sparsity: example

Example: $p = 3$, $r = 3$, sparsity $s = 1$, $n = 9$.
(Figure: the same data points admit a first and a second decomposition.)

Identifiability results

Theorem. Let $M = UV$ where $\mathrm{rank}(U) = \mathrm{rank}(M) = r$ and each column of $V$ has at least $s$ zeros. The factorization $(U, V)$ is essentially unique if, on each hyperplane spanned by all but one column of $U$, there are $\frac{r(r-2)}{s} + 1$ data points with spark $r$.

[CG18] Cohen, G., Identifiability of Low-Rank Sparse Component Analysis, arXiv, 2018.

Geometric intuition

Example: $p = 3$, $r = 3$, sparsity $s = 1$, $n = 9$.
(Figure: data points and the unique decomposition.)

Sparsity in action

Spectral unmixing, $r = 6$, $s = 4$.

Sparsity is another way to obtain identifiability for matrix decompositions, but it leads to hard combinatorial problems to solve...

Conclusion

1 Low-rank matrix approximations are useful and widely used linear models in data analysis and machine learning.
2 Except for PCA, most of these models lead to difficult non-convex optimization problems.
3 However, under appropriate assumptions, solutions with optimality guarantees can be recovered (using convexification, standard optimization schemes, or dedicated algorithms).
4 This is a very active area of research, with many open questions (e.g., uniqueness issues, drawing the line between easy and difficult instances), new models, and applications.

Thank you for your attention! Code and papers available on https://sites.google.com/site/nicolasgillis/
