MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 2017-2018
1 MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. 2017-2018 Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL'INFORMAZIONE POLITECNICO DI BARI Pietro Guccione, Assistant Professor in Signal Processing (pietro.guccione@poliba.it)
2 Lecture 10 - Summary Further focus on Dimensionality Reduction: Non-Negative Matrix Factorization; Optimum Constrained Component Rotation; Applications; Summary
3 From PCA to NNMF Non-Negative Matrix Factorization is a way to decompose a matrix A into a product of two non-negative matrices, W and H: A = W H. NNMF overcomes some limitations of PCA: input/output data are all positive; it does not need normalization. Main drawback: non-unique solution. Inner dimension k: can be chosen based on problem considerations. The problem is solved by using a least squares solution with the positivity constraint applied. Two methods: Alternating Least Squares; Multiplicative Update Algorithm.
4 NNMF Algorithms Multiplicative Update Algorithm:
W = rand(m, k);  H = rand(k, n)
for i = 1:maxiter
    H <- H .* (W'A) ./ (W'W H)
    W <- W .* (A H') ./ (W H H')
endfor
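As an illustration, a minimal MATLAB sketch of the multiplicative update rules above (the function name, the small constant in the denominators and the fixed iteration count are assumptions, not part of the lecture):

    function [W, H] = nnmf_mu(A, k, maxiter)
        % Multiplicative update NNMF: A (m x n) ~ W (m x k) * H (k x n), all >= 0
        [m, n] = size(A);
        W = rand(m, k);                              % random non-negative start
        H = rand(k, n);
        d = 1e-9;                                    % guards against division by zero
        for i = 1:maxiter
            H = H .* (W' * A) ./ (W' * W * H + d);   % update H with W fixed
            W = W .* (A * H') ./ (W * H * H' + d);   % update W with H fixed
        end
    end

Because the updates are multiplicative, W and H stay non-negative as long as they are initialized with non-negative values.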
5 NNMF Algorithms Alternating Least Squares Algorithm:
W = rand(m, k)
for i = 1:maxiter
    H = (W'W)^(-1) W'A;   H(H < 0) = 0
    W = A H' (H H')^(-1);   W(W < 0) = 0
endfor
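A corresponding MATLAB sketch of the alternating least squares variant; the projection to zero after each unconstrained solve follows the slide, while the function name and loop control are assumptions:

    function [W, H] = nnmf_als(A, k, maxiter)
        % Alternating least squares NNMF with clipping to non-negative values
        m = size(A, 1);
        W = rand(m, k);                  % only W needs to be initialized
        for i = 1:maxiter
            H = (W' * W) \ (W' * A);     % unconstrained LS solution for H
            H(H < 0) = 0;                % enforce H >= 0
            W = (A * H') / (H * H');     % unconstrained LS solution for W
            W(W < 0) = 0;                % enforce W >= 0
        end
    end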
6 NNMF Initialization! Critical problems: the choice of the initial matrices (the final solution is sensitive to it); the choice of the number of iterations (on which the precision of the solution depends). The problem can be circumvented by computing a first (possibly negative) solution. Some examples are: fill H and/or W with random values with a fixed generator seed (to get a stable solution); use some decomposition method (PCA or ICA) to get an initial solution (it is not constrained to be positive): A ≈ S L', W = S(:,1:k), H = L(:,1:k)'. Use the PCA and retain the first k components.
7 NNMF Initialization! Critical problems: the choice of the initial matrices (the final solution is sensitive to it); the choice of the number of iterations (on which the precision of the solution depends). Alternative solutions can be found, according to detailed problem constraints, as sketched below: first use the PCA, A ≈ S L', W = S(:,1:k); solve for H as in ALS, H = (W'W)^(-1) W'A, H(H < 0) = 0; solve the LS problem for W, W = A H' (H H')^(-1), then apply any other constraint, if needed: W = f(W).
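A short MATLAB sketch of this PCA-based initialization followed by one constrained LS refinement; the use of the economy-size SVD for the PCA step and the names A and k (data matrix and number of retained components) are assumptions:

    [Us, Sv, Vs] = svd(A, 'econ');    % PCA via SVD: A = (Us*Sv)*Vs', i.e. A = S*L'
    W = Us(:, 1:k) * Sv(1:k, 1:k);    % first k (possibly negative) components
    H = pinv(W' * W) * (W' * A);      % solve for H as in ALS
    H(H < 0) = 0;                     % positivity constraint on H
    W = A * H' * pinv(H * H');        % LS solution for W
    % any further constraint W = f(W) can be applied here before iterating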
8 Example: again XPD /1 The set of X-ray Powder Diffraction patterns (an example already presented in previous lectures), decomposed by using NNMF. NNMF has been modified according to some constraints: the spectra are all positive (but not the time profiles), one of the components is the square of another. When applied to the set of XPD patterns, PCA can be interpreted as follows: a powder diffraction profile (sample) can be seen as a data point of an N-dimensional space, where N is the number of θ (diffraction angle) values (the variables), while the coordinates of the point in a reference system of this space are the values of intensity associated with each θ value.
9 Example: again XPD /2 The set of X-ray Powder Diffraction patterns (an example already presented in previous lectures), decomposed by using NNMF. NNMF has been modified according to some constraints: the spectra are all positive (but not the time profiles), one of the components is the square of another. [Figure, simulated case (SIM CASE): component spectra #1 and #2 vs. diffraction angle [deg], and component time profiles vs. test number]
10 Example: again XPD /3 Comparison of the first component vs. the second component, to verify the square relation. [Figure: 1st vs. 2nd components (scores), and second-component intensity vs. theta [deg]] Comparison of the second component with a reference spectrum (pure Cu): ρ=
11 Example: again XPD /4 The set of X-ray Powder Diffraction patterns (an example already presented in previous lectures), decomposed by using NNMF. NNMF has been modified according to some constraints: the spectra are all positive (but not the time profiles), one of the components is the square of another. [Figure, real case (REAL CASE): component spectra #1 and #2 vs. diffraction angle [deg], and component time profiles vs. test number]
12 Example: again XPD /5 The square relation between the 1st and 2nd components still holds, even for the real data. [Figure: 1st vs. 2nd components (scores), and second-component intensity vs. theta [deg]] Comparison of the second component with a reference spectrum (pure Cu): ρ= (real data has some negative components, in this case)
13 Example: again XPD /6 Different kinds of normalization have been applied to the data using PCA. [Table: ρ and residual values for C4 (simulated data) and C2 (real data), sinusoidal stimulus, comparing PCA with no score normalization, PCA with Z-score on spectra, PCA with Z-score on stimuli, and the modified NNMF] Modified NNMF: 1. Perform Z-score on data along stimuli. 2. Make PCA and save the first components (usually 3 components correspond to 99% of the saved variance). 3. Make NNMF; initialize W with the components of the previous PCA step. 4. Compute the polynomial fit of such components [coeff = polyfit(W(:,1), W(:,2), 2)]. 5. Perform the LS for H [H = pinv(W'*W, tol)*W'*A]. 6. Put H(H<0) = 0. 7. Perform the LS for W [W = (pinv(H*H', tol)*H*A')']. 8. Impose the quadratic condition W(:,2) = coeff(1)*W(:,1).^2. 9. Compute the cost function [0.5*norm(A-W*H, 'fro') / norm(A, 'fro')]. 10. Repeat the loop over steps 5-9 until the minimum of the cost function is reached or a maximum number of iterations is exceeded.
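The ten steps above can be assembled into a single MATLAB loop; the following sketch assumes the data matrix A, k = 3 retained components, a column-wise z-score, and an illustrative convergence threshold:

    Az = (A - mean(A)) ./ std(A);              % 1. z-score (direction along stimuli assumed column-wise)
    [U, Sv, V] = svd(Az, 'econ');  k = 3;      % 2. PCA, retain the first components
    W = U(:, 1:k) * Sv(1:k, 1:k);              % 3. initialize W with the PCA components
    coeff = polyfit(W(:,1), W(:,2), 2);        % 4. quadratic fit between components
    tol = 1e-10;  costOld = inf;
    for it = 1:200                             % 10. loop over steps 5-9
        H = pinv(W' * W, tol) * (W' * Az);     % 5. LS for H
        H(H < 0) = 0;                          % 6. positivity on H
        W = (pinv(H * H', tol) * (H * Az'))';  % 7. LS for W
        W(:,2) = coeff(1) * W(:,1).^2;         % 8. impose the quadratic condition
        cost = 0.5 * norm(Az - W*H, 'fro') / norm(Az, 'fro');   % 9. cost function
        if abs(costOld - cost) < 1e-8, break, end
        costOld = cost;
    end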
14 NNMF/constrained PCA or MCR? Multivariate Curve Resolution (MCR) is a multivariate data-driven analysis algorithm first proposed in the chemometrics research field. MCR decomposes the dataset to recover the pure response profiles (spectra, compound pH profiles, time profiles) of the chemical constituents or species of an unresolved mixture obtained in chemical processes (Lawton & Sylvestre, 1971; Sylvestre et al., 1974), starting from PCA. MCR tries to refine the solution by determining a decomposition into two matrices (the concentration profiles C and the spectra profiles S of the individual components, corresponding to the scores and loadings in PCA, respectively), which are both non-negative. The two matrices are found by solving a constrained minimum mean square problem, starting from the reduced data matrix obtained after applying an initial PCA to the original data:
X = C S' + E
{C_opt, S_opt} = argmin over (C^, S^) of || X_PCA − C^ S^' ||^2,  s.t. C ≥ 0, S ≥ 0
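A compact MATLAB sketch of an MCR-ALS style refinement on the PCA-reduced data; U, W and k are assumed to hold the PCA scores, loadings and number of retained components, and the clipping-based non-negativity and fixed iteration count are assumptions:

    Xpca = U(:,1:k) * W(:,1:k)';          % data reproduced by the first k PCs
    C = max(U(:,1:k), 0);                 % initial concentration profiles
    for it = 1:100
        S = max((C \ Xpca)', 0);          % LS solve for the spectra, then clip to >= 0
        C = max(Xpca / S', 0);            % LS solve for the concentrations, clip to >= 0
    end
    E = Xpca - C * S';                    % residual term of X = C*S' + E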
15 From PCA to a constrained PCA In PCA the data matrix is decomposed into a number of principal components (PCs) that maximize the explained variance in the data on each successive component, under the constraint of being orthogonal to the previous PCs:
X = U W',   X(n,m) = Σ_{l=1..N} U(n,l) W(m,l) ≈ Σ_{l=1..k} U(n,l) W(m,l)
where the transformation is defined by a set of N-dimensional vectors of loadings W(:,n) (this notation addresses the n-th column vector of W) that map each row vector of X to a new vector of principal components (or scores) U(:,n) (U has size MxN, as the matrix X). The loadings are calculated as the eigenvectors of the covariance matrix of the data, X'X; the magnitude of the corresponding eigenvalues represents the variance of the data along the eigenvector directions.
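For reference, a minimal MATLAB sketch of the computation described above; the centering step and the name k for the number of retained components are assumptions added for the illustration:

    Xc = X - mean(X);                          % center the data (rows = observations)
    C  = (Xc' * Xc) / (size(X,1) - 1);         % sample covariance matrix
    [W, D] = eig(C);                           % loadings = eigenvectors of the covariance
    [lambda, idx] = sort(diag(D), 'descend');  % eigenvalues = variance along each direction
    W = W(:, idx);                             % order loadings by explained variance
    U = Xc * W;                                % scores (principal components)
    Xk = U(:,1:k) * W(:,1:k)';                 % rank-k approximation of the centered data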
16 Optimum Constrained PCA In some problems, the need may occur to impose some external constraints on the components, since it is supposed that the «sources» from which the data derive may be related to each other through a set of equations (the constraints):
X = U W',   with   f_1(U, W) = 0, …, f_K(U, W) = 0
Here the f are a set of equations that impose the constraints on the loadings, on the scores, or on both of them at the same time. Such constraints transform a problem of component extraction (which is a linear problem) into a constrained problem, which can be computationally intensive, since it is nonlinear, and may be solved, when possible, only by using optimization methods.
17 Optimum Constrained PCA The problem of principal component decomposition, which is a linear problem of computational complexity O(NM^2 + N^3) (the Singular Value Decomposition is needed to decompose the sample covariance matrix of X), becomes a nonlinear problem, according to the general formulation:
{U, W} = argmin_{U,W} || X − U W' ||^2,   s.t. f(U, W) = 0
that is, a nonlinear constrained optimization problem. According to the difficulty of expanding and accounting for the function(s) f, the problem can be solved using optimization methods such as trust-region-reflective, active-set or interior-point.
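In MATLAB the interior-point, active-set and trust-region-reflective algorithms mentioned above are available through fmincon (Optimization Toolbox). The following hedged sketch shows one possible way to pose the problem; the packing of U and W into a single parameter vector and the constraint function myConstraints are hypothetical, not part of the lecture:

    [M, N] = size(X);  k = 2;
    unpack = @(p) deal(reshape(p(1:M*k), M, k), reshape(p(M*k+1:end), N, k));
    obj = @(p) norm(X - reshape(p(1:M*k), M, k) * reshape(p(M*k+1:end), N, k)', 'fro')^2;
    nonlcon = @(p) deal([], myConstraints(p));     % ceq = f(U,W) = 0 (hypothetical helper)
    p0 = randn((M + N) * k, 1);                    % or a PCA-based starting point
    opts = optimoptions('fmincon', 'Algorithm', 'interior-point');
    p = fmincon(obj, p0, [], [], [], [], [], [], nonlcon, opts);
    [U, W] = unpack(p);                            % constrained scores and loadings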
18 Optimum Constrained Component Rotation Let us apply the previous formulation to the specific problem of X-ray Powder Diffraction spectra. The XPD spectrum can be properly modelled as follows:
A(θ, t) = b(θ, t) + R(θ) g(t) + S(θ) g^2(t) + ζ(θ)
where A(θ,t) are the data, b(θ,t) a possible bias, R(θ) represents the diffraction profile as determined by the averaged crystallographic parameters of the active atoms; S(θ) the diffraction profile as determined by the interaction between the active and spectator (or silent) sub-lattices; the third term, ζ(θ), has a contribution from the part of the structure factors which does not vary with time. The quantity A(θ, t) − b(θ, t) can be arranged as a matrix X(m,n) of size MxN, where the columns are the variables (the diffraction angles θ) and the rows are the diffraction profiles taken at different times.
19 Optimum Constrained Component Rotation The matrix can be seen as a data point set of an N-dimensional space, where N is the number of θ values (variables), while the coordinates of the points in a reference system of this space are the values of intensity associated with each θ value. PCA can reduce the dimensionality of this representation, by using a reference system with only k orthogonal axes that represent the directions of maximum variability of the data. The coordinates of the data point in this new reference system are the scores, while the loadings are the coefficients which define the new directions with respect to the original reference system.
20 Optimum Constrained Component Rotation The decomposition of A(θ,t) − b(θ,t) is:
A(θ, t) − b(θ, t) = R(θ) g_1(t) + S(θ) g_2(t) + ζ(θ)
With constraints: R(θ), S(θ), ζ(θ) ≥ 0 [spectra (components) all positive] and g_2(t) = g_1^2(t) [second score (dependence on time) is expected to be the square of the first one].
21 Optimum Constrained Component Rotation Main rationale: the scores are no longer constrained to be orthogonal to each other (they may be partially correlated), so as to allow the constraints to be applied. Since orthogonality is not required by the previous problem, we allow the score axes to change their direction, by exploring the k-dimensional space (already reduced to the principal components) driven by a properly defined cost function. The idea is that we are able to detect the optimal rotated axes of a low-dimensional space (where the data still have a meaningful representation) by minimizing an objective function, provided that the conditions (after X = U W'):
U(:,2) ≈ γ U(:,1).^2   and   W(:,2) ≥ 0
are satisfied. The axes are no longer orthogonal.
22 Optimum Constrained Component Rotation A hypothetical powder diffraction profile (P) constituted by N = 3 intensity values (I_1, I_2, I_3) for respective θ values (θ_1, θ_2, θ_3). When projected onto the space of the principal component directions PC1 and PC2, it can be described by only two values: Score 1 and Score 2.
23 Optimum Constrained Component Rotation Problem formalization for the case k = 2 (X has size MxN):
X = U W' ≈ X^(2) = U^(2) W^(2)' = U^(2) Θ Θ^(-1) W^(2)'
New scores: Û^(2) = U^(2) Θ, with
Θ = [ cos φ   −sin ψ
      sin φ    cos ψ ]
where φ and ψ are two independent parameters defining the change in direction of the axes:
û(m,1) = u(m,1) cos φ + u(m,2) sin φ
û(m,2) = −u(m,1) sin ψ + u(m,2) cos ψ,   m = 1, …, M
24 Optimum Constrained Component Rotation New components: Ŵ^(2)' = Θ^(-1) W^(2)', that is
ŵ(n,1) = [ w(n,1) cos ψ + w(n,2) sin ψ ] / cos(φ − ψ)
ŵ(n,2) = [ −w(n,1) sin φ + w(n,2) cos φ ] / cos(φ − ψ),   n = 1, …, N
Matrix Θ: the energies associated with the first and second scores (i.e. the variance of the data explained by them) do not change in such a transformation (the columns of Θ have norm 1); the changes in the direction of the two scores are independent (φ ≠ ψ).
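A small MATLAB sketch of the k = 2 rotation for a given pair of angles (phi, psi), following the relations above; U and W are assumed to hold the first two PCA scores and loadings:

    Theta = [cos(phi), -sin(psi);
             sin(phi),  cos(psi)];       % both columns have unit norm
    Uhat  = U(:,1:2) * Theta;            % rotated (no longer orthogonal) scores
    What  = (Theta \ W(:,1:2)')';        % rotated loadings: inv(Theta) * W'
    Xk    = Uhat * What';                % the rank-2 reconstruction is unchanged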
25 Optimum Constrained Component Rotation The figures of merit are the transformation of the constraints into equations. The objective is to identify the scores that give the maximum value of the FOM. 1. Pearson correlation coefficient between the second (rotated) score and the square of the first (rotated) score:
FOM_scores = | Σ_{m=1..M} (û(m,1)^2 − m_1)(û(m,2) − m_2) | / sqrt( Σ_{m=1..M} (û(m,1)^2 − m_1)^2 · Σ_{m=1..M} (û(m,2) − m_2)^2 )
where m_1 and m_2 are the sample means of û(:,1).^2 and û(:,2). This figure of merit requires that the mean square of the residual ε in U(:,2) = γ U(:,1).^2 + ε is minimum, regardless of the proportionality term γ. The absolute value at the numerator accounts for the sign ambiguity of the PCA scores.
26 Optimum Constrained Component Rotation The figures of merit are the transformation of the constraints into equations. The objective is to identify the loadings that give the maximum value of the FOM. 2. The normalized difference between the positive and the negative part of the area underlying the second loading:
FOM_loadings = [ Σ_{n: ŵ(n,2) > σ_w} ŵ(n,2) + Σ_{n: ŵ(n,2) < −σ_w} ŵ(n,2) ] / [ Σ_{n: ŵ(n,2) > σ_w} ŵ(n,2) − Σ_{n: ŵ(n,2) < −σ_w} ŵ(n,2) ]
where ŵ(n,2) is the intensity of the rotated second loading at the angle θ_n, and σ_w is the standard deviation of ŵ(:,2). This cost function measures the positive-negative asymmetry of the second (rotated) loading; its definition is dictated by the fact that the overall sign of the PCA loadings is arbitrary.
27 Optimum Constrained Component Rotation Both figures of merit have 1 as the highest and best value. The idea is to find the optimal combination of (φ, ψ) that maximizes FOM_scores (as defined above) subject to FOM_loadings (as defined above) also being maximized. [Figure: possible optimization search path of the algorithm]
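One simple way to search for the optimal pair (phi, psi) is a grid evaluation of both figures of merit. In this MATLAB sketch the grid resolution and the product used to combine the two FOMs are assumptions, and U and W again hold the first two PCA scores and loadings:

    best = -inf;
    for phi = linspace(-pi/2, pi/2, 181)
      for psi = linspace(-pi/2, pi/2, 181)
        Theta = [cos(phi), -sin(psi); sin(phi), cos(psi)];
        Uhat = U(:,1:2) * Theta;  What = (Theta \ W(:,1:2)')';
        r = corrcoef(Uhat(:,1).^2, Uhat(:,2));           % FOM on the scores
        fomS = abs(r(1,2));
        w2 = What(:,2);  sw = std(w2);                   % FOM on the loadings
        pos = sum(w2(w2 > sw));  neg = sum(w2(w2 < -sw));
        fomL = (pos + neg) / (pos - neg + eps);
        if fomS * fomL > best                            % assumed combination of the FOMs
            best = fomS * fomL;  phiOpt = phi;  psiOpt = psi;
        end
      end
    end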
28 Optimum Constrained Component Rotation Problem formalization for the generic case k > 2:
X = U W' ≈ X^(k) = U^(k) W^(k)' = U^(k) Θ Θ^+ W^(k)'
where Θ^+ is the Moore-Penrose generalized inverse of Θ and X has size MxN. Θ now has size [k x 2], with columns α = (α_1, …, α_k)' and β = (β_1, …, β_k)'. The degrees of freedom in Θ are now 2(k−1), since, to preserve the energy of the scores, we impose the unit-norm conditions given in the next slide.
29 Optimum Constrained Component Rotation
{α̂, β̂} = argmax_{α, β} FOM(U, W, α, β)
New scores (Û^(k) = U^(k) Θ):
û(m,1) = Σ_{i=1..k} α_i u(m,i)
û(m,2) = Σ_{i=1..k} β_i u(m,i),   m = 1, …, M
New components computed according to the relation: Ŵ^(k)' = Θ^+ W^(k)', subject to Σ_{i=1..k} α_i^2 = 1 and Σ_{i=1..k} β_i^2 = 1.
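For k > 2 the search runs over the two unit-norm combination vectors α and β. A derivative-free MATLAB sketch follows, where the helper fomBoth (evaluating the combined figures of merit) and the normalization-based parameterization are hypothetical, and U (M x k scores) and W (N x k loadings) are assumed given:

    k = size(U, 2);
    obj = @(p) -fomBoth(U, W, p(1:k)/norm(p(1:k)), p(k+1:end)/norm(p(k+1:end)));
    p0  = [1; zeros(k-1,1); 0; 1; zeros(k-2,1)];   % start from the first two PCA axes
    p   = fminsearch(obj, p0);                     % maximize the combined FOM
    alpha = p(1:k)/norm(p(1:k));  beta = p(k+1:end)/norm(p(k+1:end));
    Uhat = [U*alpha, U*beta];                      % the two new scores
    What = (pinv([alpha, beta]) * W')';            % new loadings via the pseudo-inverse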
30 Summary NNMF is a data-driven matrix decomposition that applies non-negativity as a constraint. It is useful in problems where negative solutions are meaningless. Multivariate Curve Resolution is a decomposition method similar to NNMF, but it starts from a PCA as the initial solution. OCCR can be seen as a possible generalization of PCA to problems where the sources (loadings) or the components (scores) are subjected to some conditions. The problem may not always be solvable. OCCR differs from MCR since: in OCCR the solution is obtained without solving a least squares problem (a general optimization method is used instead); OCCR does not impose the constraint that both matrices are positive, as MCR does. NNMF (and, to a limited extent, also MCR, which basically is a modified version of NNMF) has some limitations: an NNMF solution is very sensitive to initial conditions; in some cases, we do not need both matrices of the decomposition to be positive.