Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy
1 Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy. Caroline Chaux, joint work with X. Vu, N. Thirion-Moreau and S. Maire (LSIS, Toulon). Aix-Marseille Univ., I2M. IHP, Feb.
2 Outline:
- Introduction: 3D fluorescence spectroscopy, tensor definition, goal
- Proximal tools: criterion formulation, proximity operator, proximal algorithm
- Application to CPD: minimization problem, algorithm
- Numerical simulations: synthetic case, real case (water monitoring)
- Conclusion and future work
3 3D fluorescence spectroscopy. [Figure: normalized excitation spectra, normalized emission spectra, and normalized concentrations across experiments.]
4 Compounds characterisation. [Figure: FEEM together with the normalized excitation and emission spectra.]
5 Tensor. What is a tensor? An Nth-order tensor is represented by an N-way array in a chosen basis. Examples: N = 1: a vector; N = 2: a matrix.
6 Third-order tensors. A special case: nonnegative third-order tensors (N = 3), T̄ = (t̄_{i_1 i_2 i_3}) ∈ R_+^{I_1 × I_2 × I_3}. The Canonical Polyadic (CP) decomposition:
T̄ = Σ_{r=1}^{R̄} ā_r^{(1)} ∘ ā_r^{(2)} ∘ ā_r^{(3)} = [Ā^{(1)}, Ā^{(2)}, Ā^{(3)}],
where R̄ is the tensor rank, ∘ is the outer product, the loading vectors satisfy ā_r^{(n)} ∈ R_+^{I_n} for all n ∈ {1, 2, 3}, and the loading matrices are Ā^{(n)} ∈ R^{I_n × R̄}. Entry-wise:
t̄_{i_1 i_2 i_3} = Σ_{r=1}^{R̄} ā^{(1)}_{i_1 r} ā^{(2)}_{i_2 r} ā^{(3)}_{i_3 r}, for all (i_1, i_2, i_3).
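The CP model above can be sketched numerically. A minimal reconstruction of a third-order tensor from its loading matrices (illustrative code, not from the talk; the function name is mine):

```python
import numpy as np

def cp_reconstruct(A1, A2, A3):
    """Sum of R rank-1 terms: T[i,j,k] = sum_r A1[i,r]*A2[j,r]*A3[k,r]."""
    T = np.zeros((A1.shape[0], A2.shape[0], A3.shape[0]))
    for r in range(A1.shape[1]):
        # outer product of the r-th loading vectors of the three modes
        T += np.einsum('i,j,k->ijk', A1[:, r], A2[:, r], A3[:, r])
    return T
```

Each term of the loop is one rank-1 tensor ā_r^{(1)} ∘ ā_r^{(2)} ∘ ā_r^{(3)}.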
7 Standard operations.
- Outer product: let u ∈ R^I, v ∈ R^J; u ∘ v = u v^T ∈ R^{I×J}.
- Khatri-Rao product: let U = [u_1, u_2, ..., u_J] ∈ R^{I×J} and V = [v_1, v_2, ..., v_J] ∈ R^{K×J}; U ⊙ V = [u_1 ⊗ v_1, u_2 ⊗ v_2, ..., u_J ⊗ v_J] ∈ R^{IK×J}, where u ⊗ v = [u_1 v; ...; u_I v] ∈ R^{IK} is the Kronecker product.
- Hadamard division: let U, V ∈ R^{I×J}; U ⊘ V = (u_{ij}/v_{ij})_{i,j} ∈ R^{I×J}.
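The Khatri-Rao product defined above is just a column-wise Kronecker product, which a broadcasting trick computes in one line (illustrative sketch, checked against `np.kron` column by column):

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: (I x J), (K x J) -> (I*K x J)."""
    I, J = U.shape
    K, J2 = V.shape
    assert J == J2, "U and V must have the same number of columns"
    # column j of the result is kron(U[:, j], V[:, j])
    return (U[:, None, :] * V[None, :, :]).reshape(I * K, J)
```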
8 Tensor flattening: example. Objective: to handle matrices instead of tensors. [Figure: a tensor and its mode-1, mode-2 and mode-3 unfoldings.]
9 3D fluorescence spectroscopy and tensors. T̄ = Σ_{r=1}^{R̄} ā_r^{(1)} ∘ ā_r^{(2)} ∘ ā_r^{(3)}. [Figure: the three factors correspond to normalized excitation spectra, normalized emission spectra, and normalized concentrations across experiments.]
10 Objective: tensor decomposition. Input: observed tensor T, an observation of an original (unknown) tensor T̄, possibly degraded (noise). Output: estimated loading matrices Â^{(n)} for all n ∈ {1, 2, 3}. Difficulty: the rank R̄ is unknown (i.e. R ≥ R̄); it is generally i) estimated or ii) overestimated. Proposed approach: formulate the problem in a variational framework.
11 Minimization problem. Standard problem:
minimize_{x ∈ R^L} F(x) + R(x),
where F is the fidelity term and R the regularization. Several regularizations (J terms) can be taken into account: R(x) = Σ_{j=1}^J R_j(x). For large-size problems, or for other reasons, it can be interesting to work on data blocks x^{(j)} of size L_j, with x = (x^{(j)})_{1 ≤ j ≤ J}:
R(x) = Σ_{j=1}^J R_j(x^{(j)}).
Technical assumptions: F, R and R_j are proper lower semi-continuous functions; F is differentiable with a β-Lipschitz gradient; R_j is assumed to be bounded from below by an affine function, and its restriction to its domain is continuous.
12 Proximity operator.
- Let ϕ : R → ]−∞, +∞] be a proper lower semi-continuous function. The proximity operator is defined as prox_ϕ : R → R : v ↦ argmin_{u ∈ R} (1/2)(u − v)^2 + ϕ(u).
- Let ϕ : R^L → ]−∞, +∞] be a proper lower semi-continuous function. The proximity operator associated with a Symmetric Positive Definite (SPD) matrix P is defined as prox_{P,ϕ} : R^L → R^L : v ↦ argmin_{u ∈ R^L} (1/2)||u − v||_P^2 + ϕ(u), where ||x||_P^2 = ⟨x, Px⟩ for all x ∈ R^L and ⟨·,·⟩ is the inner product.
Remark: if P reduces to the identity matrix, the two definitions coincide.
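As a concrete instance of the definition (with P equal to the identity), the prox of ϕ = γ||·||_1 has a closed form: the problem is separable and each coordinate is solved by soft-thresholding. This is a standard textbook example, not specific to the talk:

```python
import numpy as np

def prox_l1(v, gamma):
    """prox of gamma*||.||_1: argmin_u 0.5*||u - v||^2 + gamma*||u||_1.

    Separable, with closed-form solution sign(v)*max(|v| - gamma, 0)
    in each coordinate (soft-thresholding).
    """
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)
```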
13 Criterion to be minimized: minimize_{x ∈ R^L} F(x) + Σ_{j=1}^J R_j(x^{(j)}).
Some solutions (non-exhaustive list, CPD oriented):
- Proximal Alternating Linearized Minimization (PALM) [Bolte et al., 2014]
- A Block Coordinate Descent method for both CPD and Tucker decomposition [Xu and Yin, 2013]
- An accelerated projected gradient based algorithm [Zhang et al., 2016]
- Block-Coordinate Variable Metric Forward-Backward (BC-VMFB) algorithm [Chouzenoux et al., 2016]
Advantages of BC-VMFB: flexible, stable, integrates preconditioning, relatively fast.
14 Block coordinate proximal algorithm.
1: Let x_0 ∈ dom R and γ_k ∈ ]0, +∞[ // initialization step
2: for k = 0, 1, ... do // k-th iteration of the algorithm
3: Let j_k ∈ {1, ..., J} // block chosen, here, according to a quasi-cyclic rule
4: Let P_{j_k}(x_k) be an SPD matrix // construction of the preconditioner
5: Compute ∇_{j_k} F(x_k) // partial gradient
6: x̃_k^{(j_k)} = x_k^{(j_k)} − γ_k P_{j_k}(x_k)^{-1} ∇_{j_k} F(x_k) // updating of block j_k: gradient step
7: x_{k+1}^{(j_k)} ∈ prox_{γ_k^{-1} P_{j_k}(x_k), R_{j_k}}(x̃_k^{(j_k)}) // updating of block j_k: proximal step
8: x_{k+1}^{(j̄)} = x_k^{(j̄)} for all j̄ ∈ {1, ..., J} \ {j_k} // other blocks are kept unchanged
9: end for
15 Prox for CP decomposition. CP decomposition: decompose a tensor into a (minimal) sum of rank-1 terms. Order 3:
T̄ = Σ_{r=1}^{R̄} ā_r^{(1)} ∘ ā_r^{(2)} ∘ ā_r^{(3)} = [Ā^{(1)}, Ā^{(2)}, Ā^{(3)}]. (1)
Tensor structure: naturally leads to considering 3 blocks corresponding to the loading matrices A^{(1)}, A^{(2)} and A^{(3)}. Proposed optimization problem:
minimize_{A^{(n)} ∈ R^{I_n × R}, n ∈ {1,2,3}} F(A^{(1)}, A^{(2)}, A^{(3)}) + R_1(A^{(1)}) + R_2(A^{(2)}) + R_3(A^{(3)}).
Some of the fastest classical approaches: fast HALS [Phan et al., 2013] and N-way [Bro, 1997].
16 Tensor matricization. Let T̄^{(n)} ∈ R_+^{I_n × Ī_n} denote the matrix obtained by unfolding the tensor T̄ in the n-th mode, where Ī_n = I_1 I_2 I_3 / I_n. The tensor can then be expressed in matrix form as T̄^{(n)} = Ā^{(n)} (Z̄^{(-n)})^T for all n ∈ {1, 2, 3}, where
Z̄^{(-1)} = Ā^{(3)} ⊙ Ā^{(2)} ∈ R_+^{Ī_1 × R̄},
Z̄^{(-2)} = Ā^{(3)} ⊙ Ā^{(1)} ∈ R_+^{Ī_2 × R̄},
Z̄^{(-3)} = Ā^{(2)} ⊙ Ā^{(1)} ∈ R_+^{Ī_3 × R̄}.
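The matricization identity T^{(n)} = A^{(n)} (Z^{(-n)})^T can be checked numerically, provided the unfolding orders its columns the same way the Khatri-Rao product orders its rows. A sketch of a consistent mode-n unfolding (one convention among several; axis ordering is my choice, verified against the identity below):

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding of a third-order tensor.

    Moves mode n to the front and flattens the remaining modes in
    reversed order, so that column j of the result matches row j of
    the Khatri-Rao product A_p (kr) A_q with p > q.
    """
    axes = [n] + [ax for ax in reversed(range(3)) if ax != n]
    return T.transpose(axes).reshape(T.shape[n], -1)
```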
17 Function choice.
F(A^{(1)}, A^{(2)}, A^{(3)}): quadratic data fidelity term,
F(A^{(1)}, A^{(2)}, A^{(3)}) = (1/2) ||T − [A^{(1)}, A^{(2)}, A^{(3)}]||_F^2 = (1/2) ||T^{(n)} − A^{(n)} (Z^{(-n)})^T||_F^2.
R_n(A^{(n)}): block-dependent penalty terms enforcing sparsity and nonnegativity,
R_n(A^{(n)}) = Σ_{i_n=1}^{I_n} Σ_{r=1}^{R} ρ_n(a^{(n)}_{i_n r}), n ∈ {1, 2, 3},
where the loading matrices are defined element-wise as A^{(n)} = (a^{(n)}_{i_n r}) for (i_n, r) ∈ {1,...,I_n} × {1,...,R}, and
ρ_n(ω) = α^{(n)} ω^{π^{(n)}} if η^{(n)}_min ≤ ω ≤ η^{(n)}_max, +∞ otherwise,
with α^{(n)} ∈ ]0, +∞[, π^{(n)} ∈ N, η^{(n)}_min ∈ [0, +∞[ and η^{(n)}_max ∈ [η^{(n)}_min, +∞]. The regularization parameters are block dependent but constant within a block.
18 Preconditioning. The preconditioning is similar to the one used in NMF [Lee and Seung, 2001]. The metric for the n-th block can be defined, for all n ∈ {1, 2, 3}, as
P^{(n)}(A^{(1)}, A^{(2)}, A^{(3)}) = (A^{(n)} (Z^{(-n)})^T Z^{(-n)}) ⊘ A^{(n)}.
Remark: for all n ∈ {1, 2, 3}, the entries of A^{(n)} must be nonzero.
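A small sketch of why this metric is the NMF-style one: with the entrywise preconditioner above (reconstructed here from the Hadamard-division notation; an assumption on my part) and unit stepsize, one preconditioned gradient step on the quadratic fit reduces exactly to the classical Lee-Seung multiplicative update.

```python
import numpy as np

def precond_gradient_step(A, Tn, Z, gamma=1.0):
    """One preconditioned gradient step on F = 0.5*||Tn - A Z^T||_F^2.

    Uses the entrywise metric P = (A (Z^T Z)) / A. At gamma = 1 the step
    reduces to the multiplicative update A * (Tn Z) / (A Z^T Z).
    All entries of A must be strictly positive.
    """
    AZZ = A @ (Z.T @ Z)
    G = AZZ - Tn @ Z           # gradient of F with respect to A
    P = AZZ / A                # entrywise diagonal preconditioner
    return A - gamma * G / P
```

Note that for nonnegative Tn and Z the update preserves positivity, which is why the remark about nonzero entries of A^{(n)} matters.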
19 Gradient and proximity operator. The gradient matrices of F with respect to A^{(n)}, for all n = 1, ..., 3, are defined as
∇_n F(A^{(1)}, A^{(2)}, A^{(3)}) = (A^{(n)} (Z^{(-n)})^T − T^{(n)}) Z^{(-n)}.
The proximity operator is given, for y = (y^{(i)})_{i ∈ {1,...,R I_n}} ∈ R^{R I_n}, by
prox_{γ[k]^{-1} P^{(n)}[k], R_n}(y) = ( prox_{γ[k]^{-1} p_i^{(n)}[k], ρ_n}(y^{(i)}) )_{i ∈ {1,...,R I_n}},
where, for all i ∈ {1, ..., R I_n} and υ ∈ R,
prox_{γ[k]^{-1} p_i^{(n)}, ρ_n}(υ) = min{ η^{(n)}_max, max{ η^{(n)}_min, prox_{γ[k] α^{(n)} (p_i^{(n)}[k])^{-1} |·|^{π^{(n)}}}(υ) } }.
(Separable structure, diagonal preconditioning matrices, componentwise calculation.)
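In the simplest case π^{(n)} = 1 with η^{(n)}_min ≥ 0, the componentwise formula above becomes fully explicit: the inner prox of the weighted absolute value is a shift by γα/p toward zero, and the min/max composition reduces to a clip onto the box. A sketch under those assumptions (function name and defaults are mine):

```python
import numpy as np

def prox_penalty(v, gamma, alpha, p, eta_min=0.0, eta_max=np.inf):
    """Componentwise prox of rho (case pi = 1) in the diagonal metric p.

    Assumes pi = 1 and eta_min >= 0: prox of gamma*alpha*(1/p)*|.| at v
    is a shift by gamma*alpha/p, then min/max with the box bounds is a clip.
    """
    return np.clip(v - gamma * alpha / p, eta_min, eta_max)
```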
20 Proximal algorithm for tensor decomposition. Inputs: 1) initial loading matrices A^{(n)}[0]; 2) stepsize choice γ; 3) iteration counter k = 0. At each iteration: choose a block n ∈ {1, 2, 3} to be updated; build the preconditioner P^{(n)}[k] and the partial gradient ∇_n[k]; gradient step Ã^{(n)}[k] = A^{(n)}[k] − γ ∇_n[k] ⊘ P^{(n)}[k]; proximal step prox_{γ^{-1} P^{(n)}[k], R_n}(Ã^{(n)}[k]); other blocks unchanged at iteration k; if the stopping criterion is not reached, set k ← k + 1 and loop. Outputs: estimated loading matrices Â^{(n)}. Figure: BC-VMFB algorithm for CPD.
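Putting the pieces together, the BC-VMFB loop for nonnegative CPD can be sketched end to end. This is an illustrative toy implementation, not the talk's code: the stepsize, the π = 1 penalty, the cyclic block rule and the small damping constant are all my assumptions.

```python
import numpy as np

def kr(U, V):
    """Khatri-Rao product: column-wise Kronecker, (I x R),(K x R) -> (IK x R)."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def unfold(T, n):
    """Mode-n unfolding whose columns match the rows of kr(A_p, A_q), p > q."""
    axes = [n] + [ax for ax in reversed(range(3)) if ax != n]
    return T.transpose(axes).reshape(T.shape[n], -1)

def bc_vmfb_cpd(T, R, alpha=0.0, gamma=1.0, n_iter=300, seed=0):
    """Toy BC-VMFB iteration for nonnegative CPD (pi = 1 penalty).

    Cycles over the three loading-matrix blocks; each update is one
    preconditioned gradient step (Lee-Seung-style diagonal metric)
    followed by the prox of the penalty: shift by gamma*alpha in the
    metric, then projection onto the nonnegative orthant.
    """
    rng = np.random.default_rng(seed)
    A = [rng.random((T.shape[n], R)) + 0.1 for n in range(3)]
    for k in range(3 * n_iter):
        n = k % 3                                # quasi-cyclic block rule
        q, p = [m for m in range(3) if m != n]   # the two other modes, q < p
        Z = kr(A[p], A[q])                       # Z^(-n)
        Tn = unfold(T, n)
        AZZ = A[n] @ (Z.T @ Z)                   # appears in gradient and metric
        G = AZZ - Tn @ Z                         # gradient of the fit term
        step = gamma * A[n] / (AZZ + 1e-12)      # gamma * P^(-1), entrywise
        A[n] = np.maximum(A[n] - step * (G + alpha), 0.0)  # gradient + prox
    return A
```

With alpha = 0 and gamma = 1 each block update collapses to the multiplicative update, so the loop behaves like a nonnegative CP variant of Lee-Seung; the penalty term alpha adds the sparsity-promoting shift.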
21 Computer simulation: simulated spectroscopy-like data. Simulated tensor T̄: (uni- or bimodal) emission and excitation spectra, random concentrations, with R̄ = 5. Simulated observed tensor: T = T̄ + B, where B stands for additive white Gaussian noise. Two cases are considered:
1. Perturbed case (noiseless): no noise added and R = 6 (rank overestimation).
2. Perturbed case (noisy): B fixed such that SNR = 17.6 dB and R = 6 (rank overestimation).
Error measures:
1. Signal-to-noise ratio: SNR = 20 log10( ||T̄||_F / ||T − T̄||_F ).
2. Relative reconstruction error: RRE = 20 log10( ||T̂ − T||_F / ||T||_F ).
3. Estimation error: E_1 = (1/3) Σ_{n=1}^{3} 20 log10( ||Â^{(n)}(1:R̄) − Ā^{(n)}|| / ||Ā^{(n)}|| ).
4. Over-factoring error: E_2 = 20 log10( ||Σ_{r=R̄+1}^{R} â_r^{(1)} ∘ â_r^{(2)} ∘ â_r^{(3)}|| ).
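The two dB-scale measures are one-liners; a sketch matching the definitions above (as reconstructed, so the exact normalizations are assumptions):

```python
import numpy as np

def snr_db(T_bar, T):
    """SNR = 20*log10(||T_bar||_F / ||T - T_bar||_F), in dB."""
    return 20.0 * np.log10(np.linalg.norm(T_bar) / np.linalg.norm(T - T_bar))

def rre_db(T_hat, T):
    """RRE = 20*log10(||T_hat - T||_F / ||T||_F), in dB."""
    return 20.0 * np.log10(np.linalg.norm(T_hat - T) / np.linalg.norm(T))
```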
22 Numerical results. Comparison of BC-VMFB (with or without penalty) against N-way [Bro, 1997] and fast HALS [Phan et al., 2013], using the same initial value, in the noiseless and noisy cases (elapsed time and iterations to reach the stopping conditions).
Columns: BC-VMFB without penalty | BC-VMFB with penalty | N-way | fast HALS.
Noisy case, actual number of iterations: (485) | (365) | (43) | (1856).
Noisy case, (SNR, E_1, E_2) in dB: (31.3, -12.5, 3.6) | (32.7, -11.2, -49) | (31.3, -12.5, 3.6) | (31.3, -12.5, 3.6).
Noiseless case, actual number of iterations: (1) | (365) | (838) | (38).
Noiseless case, (RRE, E_1, E_2) in dB: (-75.1, -12.4, 25.6) | (-44.7, -15, -49) | (-127.9, -8.7, 31.7) | (-63.9, -6.1, 31.7).
23 Influence of the initialization. [Figure: error index E_1 and over-factoring index E_2 (in dB) versus different initializations, noisy overestimated case, for BC-VMFB with penalty, BC-VMFB without penalty, N-way and fast HALS.]
24 Visual results: noiseless case. Figure: FEEM of reference (left); FEEM reconstructed using BC-VMFB without regularization (middle) and with regularization α = 0.5 (right).
25 Visual results: noiseless case. Figure: R = 6; reference excitation spectra, emission spectra and concentrations / BC-VMFB without penalty / BC-VMFB with penalty.
26 Visual results: noisy case. Figure: FEEM of reference (left); FEEM reconstructed using BC-VMFB without regularization (middle) and with regularization α = 0.5 (right).
27 Visual results: noisy case. Figure: R = 6; reference excitation spectra, emission spectra and concentrations / BC-VMFB without penalty / BC-VMFB with penalty.
28 Computer simulation: real experimental data, water monitoring to detect pollutants. Data were acquired automatically every 30 minutes, during a 10-day monitoring campaign performed on water extracted from an urban river. The excitation wavelengths range from 225 nm to 400 nm with a 5 nm bandwidth, whereas the emission wavelengths range from 280 nm to 500 nm with a 2 nm bandwidth. The FEEM have been pre-processed using Zepp's method (negative values were set to 0). Contamination: during this experiment, a contamination with diesel oil appeared 7 days after the beginning of the monitoring.
29 Results: what about the rank? [Figure: estimated FEEM of components (1)-(4), penalized BC-VMFB algorithm (left) versus Bro's N-way algorithm (right); case R = 4.]
30 Results: what about the rank? [Figure: estimated FEEM of components (1)-(6), penalized BC-VMFB algorithm (left) versus Bro's N-way algorithm (right); case R = 6.]
31 Results: concentrations. [Figure: normalized concentrations of components (1)-(4) across experiments, penalized BC-VMFB algorithm versus Bro's N-way algorithm; case R = 4.]
32 Concentrations estimated by BC-VMFB. [Figure: normalized concentrations of components (1)-(4) across experiments; case R = 4.]
34 [Figure: normalized concentrations of components (1)-(6) across experiments, penalized BC-VMFB algorithm versus Bro's N-way algorithm; case R = 6.]
35 Concentrations estimated by BC-VMFB. [Figure: normalized concentrations of components (1)-(6) across experiments; case R = 6.]
37 Conclusion:
- a clear theoretical and mathematical framework for CPD;
- interesting properties of the proposed approach: reliability, robustness to noise and to overestimation of the rank, good performance despite model errors, and relative speed;
- promising results on simulated and real data.
Perspectives:
- extension to higher-order tensors (order N; LVA-ICA, Grenoble, Feb. 2017);
- possibility of considering missing data;
- study of other preconditioning strategies.
Thank you! Questions?
Regularized Multiplicative Algorithms for Nonnegative Matrix Factorization Christine De Mol (joint work with Loïc Lecharlier) Université Libre de Bruxelles Dept Math. and ECARES MAHI 2013 Workshop Methodological
More informationSparseness Constraints on Nonnegative Tensor Decomposition
Sparseness Constraints on Nonnegative Tensor Decomposition Na Li nali@clarksonedu Carmeliza Navasca cnavasca@clarksonedu Department of Mathematics Clarkson University Potsdam, New York 3699, USA Department
More informationCOMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017
COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University TOPIC MODELING MODELS FOR TEXT DATA
More informationMulti-stage convex relaxation approach for low-rank structured PSD matrix recovery
Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Department of Mathematics & Risk Management Institute National University of Singapore (Based on a joint work with Shujun
More informationInverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology
Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x
More informationGeneralized greedy algorithms.
Generalized greedy algorithms. François-Xavier Dupé & Sandrine Anthoine LIF & I2M Aix-Marseille Université - CNRS - Ecole Centrale Marseille, Marseille ANR Greta Séminaire Parisien des Mathématiques Appliquées
More informationData Mining and Matrices
Data Mining and Matrices 6 Non-Negative Matrix Factorization Rainer Gemulla, Pauli Miettinen May 23, 23 Non-Negative Datasets Some datasets are intrinsically non-negative: Counters (e.g., no. occurrences
More informationLine Search Methods for Unconstrained Optimisation
Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic
More informationA FLEXIBLE MODELING FRAMEWORK FOR COUPLED MATRIX AND TENSOR FACTORIZATIONS
A FLEXIBLE MODELING FRAMEWORK FOR COUPLED MATRIX AND TENSOR FACTORIZATIONS Evrim Acar, Mathias Nilsson, Michael Saunders University of Copenhagen, Faculty of Science, Frederiksberg C, Denmark University
More informationI P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION
I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1
More informationFactor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE Artificial Intelligence Grad Project Dr. Debasis Mitra
Factor Analysis (FA) Non-negative Matrix Factorization (NMF) CSE 5290 - Artificial Intelligence Grad Project Dr. Debasis Mitra Group 6 Taher Patanwala Zubin Kadva Factor Analysis (FA) 1. Introduction Factor
More informationNatural Evolution Strategies for Direct Search
Tobias Glasmachers Natural Evolution Strategies for Direct Search 1 Natural Evolution Strategies for Direct Search PGMO-COPI 2014 Recent Advances on Continuous Randomized black-box optimization Thursday
More informationLinear Regression. Volker Tresp 2018
Linear Regression Volker Tresp 2018 1 Learning Machine: The Linear Model / ADALINE As with the Perceptron we start with an activation functions that is a linearly weighted sum of the inputs h = M j=0 w
More informationA Multi-Affine Model for Tensor Decomposition
Yiqing Yang UW Madison breakds@cs.wisc.edu A Multi-Affine Model for Tensor Decomposition Hongrui Jiang UW Madison hongrui@engr.wisc.edu Li Zhang UW Madison lizhang@cs.wisc.edu Chris J. Murphy UC Davis
More informationAutomated Unmixing of Comprehensive Two-Dimensional Chemical Separations with Mass Spectrometry. 1 Introduction. 2 System modeling
Automated Unmixing of Comprehensive Two-Dimensional Chemical Separations with Mass Spectrometry Min Chen Stephen E. Reichenbach Jiazheng Shi Computer Science and Engineering Department University of Nebraska
More informationMatrix-Vector Nonnegative Tensor Factorization for Blind Unmixing of Hyperspectral Imagery
Matrix-Vector Nonnegative Tensor Factorization for Blind Unmixing of Hyperspectral Imagery Yuntao Qian, Member, IEEE, Fengchao Xiong, Shan Zeng, Jun Zhou, Senior Member, IEEE, and Yuan Yan Tang, Fellow,
More informationr=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J
7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured
More informationA Block-Jacobi Algorithm for Non-Symmetric Joint Diagonalization of Matrices
A Block-Jacobi Algorithm for Non-Symmetric Joint Diagonalization of Matrices ao Shen and Martin Kleinsteuber Department of Electrical and Computer Engineering Technische Universität München, Germany {hao.shen,kleinsteuber}@tum.de
More information1. Structured representation of high-order tensors revisited. 2. Multi-linear algebra (MLA) with Kronecker-product data.
Lect. 4. Toward MLA in tensor-product formats B. Khoromskij, Leipzig 2007(L4) 1 Contents of Lecture 4 1. Structured representation of high-order tensors revisited. - Tucker model. - Canonical (PARAFAC)
More informationCoupled Matrix/Tensor Decompositions:
Coupled Matrix/Tensor Decompositions: An Introduction Laurent Sorber Mikael Sørensen Marc Van Barel Lieven De Lathauwer KU Leuven Belgium Lieven.DeLathauwer@kuleuven-kulak.be 1 Canonical Polyadic Decomposition
More informationFundamentals of Multilinear Subspace Learning
Chapter 3 Fundamentals of Multilinear Subspace Learning The previous chapter covered background materials on linear subspace learning. From this chapter on, we shall proceed to multiple dimensions with
More informationA Study of Numerical Algorithms for Regularized Poisson ML Image Reconstruction
A Study of Numerical Algorithms for Regularized Poisson ML Image Reconstruction Yao Xie Project Report for EE 391 Stanford University, Summer 2006-07 September 1, 2007 Abstract In this report we solved
More informationNetwork Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology (China)
Network Newton Aryan Mokhtari, Qing Ling and Alejandro Ribeiro University of Pennsylvania, University of Science and Technology (China) aryanm@seas.upenn.edu, qingling@mail.ustc.edu.cn, aribeiro@seas.upenn.edu
More informationA Limited Memory, Quasi-Newton Preconditioner. for Nonnegatively Constrained Image. Reconstruction
A Limited Memory, Quasi-Newton Preconditioner for Nonnegatively Constrained Image Reconstruction Johnathan M. Bardsley Department of Mathematical Sciences, The University of Montana, Missoula, MT 59812-864
More informationLow Rank Approximation Lecture 7. Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL
Low Rank Approximation Lecture 7 Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch 1 Alternating least-squares / linear scheme General setting:
More informationA Primal-dual Three-operator Splitting Scheme
Noname manuscript No. (will be inserted by the editor) A Primal-dual Three-operator Splitting Scheme Ming Yan Received: date / Accepted: date Abstract In this paper, we propose a new primal-dual algorithm
More information