Nonnegativity Constraints in Numerical Analysis
1 Nonnegativity Constraints in Numerical Analysis Bob Plemmons, Wake Forest University. Collaborator: D. Chen, WFU. Birth of Numerical Analysis, Leuven, Belgium, Oct. 2007. Related Papers at:
2 Outline Some Personal Thoughts on my Early Days in NA; Nonnegativity: Perron-Frobenius Theory, M-matrices, Regular Splittings, Classical Iterative Methods, R. Varga, D. Young; Historical Developments in Nonnegative Least Squares (NNLS) Computations, from Lawson and Hanson (1974); Natural Extensions to (Approximate) Nonnegative Matrix and Tensor Factorization (NMF and NTF); Algorithms for NMF and NTF, 1990s to present; Survey of Applications: NNLS, NMF, NTF; Some of our Work: Computations using Data from the AF Maui Space Center & NASA
3 Looking Ahead: The creation and observation of a reflectance spectrum. [Figure: sunlight reflects off a satellite and the reflected spectrum passes through the atmosphere to a ground observer at zenith angle Z; day/night geometry labeled. Maui Research and Technology Center, 590 Lipoa Parkway, Ste. 264, Kihei, HI 96]
4 Some Personal Thoughts on the Birth of NA Alston Householder, my mentor at ORNL (ret. 1969) and UTK, introduced me to Numerical Linear Algebra, Fortran coding, accuracy and stability of numerical algorithms, QR factorization via Householder transformations (HT), SIAM J. Appl. Math., and the Gatlinburg Meetings.
5 Preliminaries Nonnegativity
6 Classical Perron-Frobenius Theorem Perron (1907) Frobenius (1917) Formed the theoretical basis for (classical) iterative methods in Numerical Analysis. 6
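As a concrete illustration (not from the slides), the Perron-Frobenius guarantee for a positive matrix — a real dominant eigenvalue equal to the spectral radius, with a nonnegative eigenvector — can be recovered by plain power iteration. The 2x2 matrix, iteration count, and normalization below are arbitrary choices for this sketch.

```python
# Illustrative sketch: power iteration for the Perron root and a
# nonnegative Perron vector of a (small, positive) matrix.

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def perron(A, iters=200):
    """Approximate the Perron root and a nonnegative Perron vector."""
    x = [1.0] * len(A)          # positive start keeps iterates nonnegative
    lam = 0.0
    for _ in range(iters):
        y = mat_vec(A, x)
        lam = max(y)            # infinity-norm estimate of the dominant eigenvalue
        x = [v / lam for v in y]
    return lam, x

# A = [[2,1],[1,2]] has eigenvalues 3 and 1; the Perron root is 3,
# with Perron vector proportional to (1, 1).
lam, v = perron([[2.0, 1.0], [1.0, 2.0]])
print(round(lam, 6))   # → 3.0
```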
7 Classical Iterative Methods for Linear Systems Ax = b Time frame: 1950s, 1960s, 1970s. People: Garrett Birkhoff, who in 1948 started work with his student David Young. Some other principal players: Sir Richard Southwell, L. F. Richardson, Richard Varga, Gene Golub. Early locations: Harvard University, Aberdeen Proving Ground, NBS, Argonne National Lab., Caltech JPL, Stanford.
8 Jacobi, Gauss-Seidel, SOR, Etc. Convergence based on regular splittings: A = M − N is a regular splitting if M is nonsingular with M⁻¹ and N nonnegative. M-matrices (Minkowski 1900): A = (a_ij), with a_ii > 0, a_ij ≤ 0 for i ≠ j, and A⁻¹ nonnegative. Arise, e.g., in finite difference methods for elliptic PDEs. Many generalizations in Numerical Analysis, 1950s to present. Nonnegativity plays a fundamental role. Books: Varga (1963), Young (1971), Berman & Plemmons (1979), etc. Preconditioners for CG, van der Vorst (1977), incomplete LU, based on M-matrix theory. (Will move on to LS)
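The Jacobi iteration above can be sketched for a tiny system. This is an assumed example (the matrix, right-hand side, and iteration count are made up): it uses the regular splitting A = M − N with M = diag(A), applied to a strictly diagonally dominant M-matrix, so the iteration matrix M⁻¹N is nonnegative and the iteration converges.

```python
# Minimal Jacobi iteration for Ax = b via the splitting A = M - N,
# M = diag(A): x_new[i] = (b[i] - sum_{j != i} A[i][j] x[j]) / A[i][i].

def jacobi(A, b, iters=100):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# Strictly diagonally dominant M-matrix: positive diagonal,
# nonpositive off-diagonal entries, nonnegative inverse.
A = [[4.0, -1.0], [-1.0, 4.0]]
b = [3.0, 3.0]
x = jacobi(A, b)
print([round(v, 6) for v in x])   # → [1.0, 1.0]
```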
9 Linear Least Squares 9
10 Nonnegative Least Squares (NNLS) 10
11 Will Survey the Following Topics 11
12 Lawson and Hanson's Algorithm C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall, 1974. J. Longley, Least Squares Problems Using Orthogonal Methods, Dekker. L. Shure, Brief History of Nonnegative Least Squares in MATLAB, blog post on lsqnonneg in MATLAB.
13 lsqnonneg In their landmark 1974 text, Lawson and Hanson gave the standard algorithm for NNLS, an active-set method. MathWorks modified the algorithm, originally named NNLS, which was ultimately renamed lsqnonneg.
14 Active Set Methods Active-set methods are based on the observation that usually only a small subset of the constraints is active (i.e., satisfied exactly) at the solution. There are n inequality constraints in the NNLS problem. The i-th constraint is said to be active if the i-th regression coefficient is negative; otherwise the constraint is passive. lsqnonneg uses the fact that if the true active set is known, the solution of the constrained problem is the unconstrained least squares solution computed using only the variables in the passive set, with the regression coefficients of the active set fixed at zero.
15
16 lsqnonneg, cont. 16
17 Comments on L&H's lsqnonneg They show that the NNLS algorithm is finite. Given sufficient time, the algorithm reaches a point where the Kuhn-Tucker conditions are satisfied, and it terminates. In that sense it is a direct algorithm. There is no good way of telling in advance how many iterations it will require in practice. lsqnonneg can be quite time-consuming if n is large.
18 Bro and de Jong's Fast NNLS (fnnls) R. Bro and S. de Jong, A fast non-negativity-constrained least squares algorithm, Journal of Chemometrics. Fast NNLS (fnnls) often speeds up the basic algorithm, especially in the presence of multiple right-hand sides, by avoiding unnecessary recomputations. Application: nonnegative tensor factorization for 3-D data analysis (more later). Designed for use in multiway decomposition methods for tensor arrays such as PARAFAC and N-mode PCA.
19 Bro and de Jong's fnnls, cont. In an extension to their NNLS algorithm characterized as being for "advanced users", they retain information about the previous iteration's solution and extract further improvements in ALS applications that employ NNLS. These innovations can lead to a substantial performance improvement when analyzing large multivariate, multiway array data sets. Extension: M. H. van Benthem and M. R. Keenan, Fast algorithm for the solution of large-scale non-negativity-constrained least squares problems, Journal of Chemometrics.
20 Algorithms Based on Iterative Methods Active-set methods typically deal with only one active constraint at each iteration. The main advantage of this class of iterative algorithms is that, by using information from a projected gradient step along with a good guess of the active set, one can handle multiple active constraints per iteration. Some of the iterative methods are based on the solution of a corresponding linear complementarity problem (LCP). In contrast to an active-set approach, iterative methods such as gradient projection thus enable the incorporation of multiple active constraints at each iteration.
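A minimal sketch of the gradient-projection idea for NNLS: min ||Ax − b||² subject to x ≥ 0, taking a step along the negative gradient 2Aᵀ(Ax − b) and projecting onto the nonnegative orthant, so several constraints can change activity in one iteration. The matrix, fixed step size, and iteration count below are ad hoc demo choices; a production solver would add a line search or Newton-type acceleration.

```python
# Hedged sketch (example data made up): projected gradient for NNLS.

def nnls_pg(A, b, step=0.05, iters=2000):
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # residual Ax - b
        g = [2.0 * sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]   # gradient 2 A^T r
        x = [max(0.0, x[j] - step * g[j]) for j in range(n)]                  # project onto x >= 0
    return x

# Fitting b ~ c0 + c1*t at t = 1, 2, 3: unconstrained least squares gives
# c1 = -1 < 0, so NNLS clamps c1 to zero (an active constraint) and c0 = 2.
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
b = [3.0, 2.0, 1.0]
x = nnls_pg(A, b)
print([round(c, 4) for c in x])   # → [2.0, 0.0]
```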
21 Recent Iterative Algorithms Franc, et al. Sequential Coordinate-wise Algorithm for NNLS, Czech Tech. U., Bellavia, et al., Interior Point Newton-like Algorithm for NNLS, NLA, Dhillon et al., UTA, Projective Quasi-Newton Method for NNLS, Dhillon et al., UTA, Fast Newton-type Algorithm for NNLS, SIAM Data Mining Conf., These methods have made solving large NNLS problems feasible. 21
22 Nonnegative Matrix Factorization (NMF): a Natural Extension of Large-Scale NNLS. The problem is convex in W or in H separately, but not in both.
23 Necessary Conditions for W & H to Solve the NMF Problem Lee & Seung, Nature, 1999.
24 NMF: an SVD-Type Concept. X = [X_1, X_2, ..., X_n], column vectors (data). Approximately factor X ≈ WH = Σ_{j=1}^{k} w^(j) ∘ h^(j), where ∘ denotes the outer product, w^(j) is the j-th column of W, and h^(j) is the j-th column of H^T.
25 Approximate NMF in Data Analysis Utilize the constraint that the values in a data matrix X are nonnegative. Apply nonnegativity-constrained low-rank approximation for blind source separation, dimension reduction, and/or unsupervised unmixing. Low-rank approximation to the data matrix X: X ≈ WH, W ≥ 0, H ≥ 0. Columns of W are initial basis vectors for the database; we may want smoothness and statistical independence in W. Columns of H represent mixing coefficients or weights; we often desire statistical sparsity in H to force essential uniqueness in W.
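As a hedged sketch of approximate NMF, here are the Lee-Seung multiplicative updates (one popular alternative to the ALS-NNLS scheme on these slides) for min ||X − WH||_F with W, H ≥ 0. The rank-1 test matrix, the all-ones initialization, and the iteration count are made-up demo choices.

```python
# Lee-Seung multiplicative updates for X ~ W H, W >= 0, H >= 0:
#   H <- H .* (W^T X) ./ (W^T W H),   W <- W .* (X H^T) ./ (W H H^T)
# (elementwise), which keep the factors nonnegative by construction.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, r, iters=500, eps=1e-12):
    m, n = len(X), len(X[0])
    W = [[1.0 for _ in range(r)] for _ in range(m)]   # nonnegative init
    H = [[1.0 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        Wt = transpose(W)
        num = matmul(Wt, X)                           # W^T X
        den = matmul(matmul(Wt, W), H)                # W^T W H
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n)] for i in range(r)]
        Ht = transpose(H)
        num = matmul(X, Ht)                           # X H^T
        den = matmul(W, matmul(H, Ht))                # W H H^T
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(r)] for i in range(m)]
    return W, H

X = [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0]]   # rank-1 nonnegative matrix
W, H = nmf(X, r=1)
R = matmul(W, H)
err = sum((X[i][j] - R[i][j]) ** 2 for i in range(2) for j in range(3))
```

Since X here is exactly rank 1 and nonnegative, the rank-1 approximation recovers it and the squared Frobenius error `err` is essentially zero.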
26 - Brief Overview - Principal Component Analysis (PCA) Independent Component Analysis (ICA) 26
27 PCA Based on the eigen-decomposition of the covariance matrix XX^T for X = [X_1, X_2, ..., X_m], a training set of column vectors, scaled and centered (or on the SVD of X itself). Low-rank approximation. In the PCA context each column of W represents an eigenvector (hidden component), and H represents the eigenprojections. Principal components correspond to the largest eigenvalues. (The components are called eigenfaces in face recognition applications.) Used for: orthogonal representation, dimension reduction, and clustering into principal components, all computed by simple linear algebra.
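The PCA computation on this slide can be sketched end to end on made-up 2-D data: center the points, form the 2x2 covariance matrix, and recover the first principal component by power iteration. All data values and iteration counts below are arbitrary demo choices.

```python
# PCA sketch: leading eigenvector of the covariance of centered data.

pts = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]   # roughly on y = x
n = len(pts)
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n
centered = [(x - mx, y - my) for x, y in pts]

# 2x2 covariance matrix of the centered data
cxx = sum(x * x for x, _ in centered) / n
cyy = sum(y * y for _, y in centered) / n
cxy = sum(x * y for x, y in centered) / n
C = [[cxx, cxy], [cxy, cyy]]

# Power iteration for the dominant eigenvector = first principal component
v = [1.0, 0.0]
for _ in range(100):
    w = [C[0][0] * v[0] + C[0][1] * v[1], C[1][0] * v[0] + C[1][1] * v[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    v = [w[0] / norm, w[1] / norm]   # unit-length principal direction
```

Because the points lie near the line y = x, the recovered component is close to (0.71, 0.71).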
28 Characteristics of PCA May not enforce nonnegativity in W and H, where X WH May not enforce statistical independence of basis vectors in W May not enforce statistical sparsity in H (separation by parts). This led to work beginning in late 1990s to present on ICA and on NMF. 28
29 ICA Based on neural computation and unsupervised learning. Often identified with blind source separation (BSS), feature extraction, and finding hidden components. Most research is based on the equality X = WH, which is not necessary. Statistical independence of the components in W is a guiding principle, but it seldom holds in practical situations. The data in X are assumed to have a non-Gaussian PDF; find hidden components that are as independent as possible, i.e., the mutual information content of different components c_i, c_j is (near) zero, or p(c_i, c_j) ≈ p(c_i)p(c_j).
30 Some Related References Lee and Seung, Learning the Parts of Objects by Non-Negative Matrix Factorization, Nature. Hoyer, Non-Negative Sparse Coding, Neural Networks for Signal Proc. Hyvärinen and Hoyer, Emergence of Phase and Shift Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces, Neural Computation. Donoho and Stodden, When does Nonnegative Matrix Factorization give a Correct Decomposition into Parts?, preprint, Dept. Stat., Stanford. Berman and Plemmons, Non-Negative Matrices in the Mathematical Sciences, SIAM Press. Sajda, Du, and Parra, Recovery of Constituent Spectra using Non-negative Matrix Factorization, Tech. Rept., Columbia U. & Sarnoff Corp. Cooper and Foote, Summarizing Video using Non-Negative Similarity Matrix Factorization, Tech. Rept., FX Palo Alto Lab. Szu and Kopriva, Deterministic Blind Source Separation for Space Variant Imaging, 4th Inter. Conf. Independent Component Anal., Nara, Japan. Umeyama, Blind Deconvolution of Images using Gabor Filters and Independent Component Analysis, 4th Inter. Conf. Independent Component Anal., Nara, Japan. Our papers on web page:
31 Nonnegative Matrix Factorization (NMF) Will extend concept to 3D arrays - tensors. 31
32 ALS-NNLS for Approximate Solution to NMF Initialize W using recursive application of Perron-Frobenius theory to the SVD. 32
33 ALS-NNLS for Approximate Solution to NMF 33
34 Tensors Multi-way Arrays in Multilinear Algebra 3D Viewpoint Tensor as a Box
35 Extend NMF: Nonnegative Tensor Factorization (NTF) Our interest: 3-D data. 2-D images stacked into 3-D Array 35
36 Multilinear NMF = Nonnegative Tensor Factorization (NTF) 36
37 NTF (for 3-D Arrays) 37
38 Datasets of images modeled as tensors Goal: Extract features from a tensor dataset (naively, a dataset subscripted by multiple indices). m x n x p tensor A. Tensor (outer product) rank can be defined as the minimum number of rank-1 factors.
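The rank-1 building block mentioned above can be sketched directly: a third-order outer product a ∘ b ∘ c with entries T[i][j][k] = a[i]·b[j]·c[k]; the tensor rank is the minimum number of such terms summing to the tensor. The vectors below are arbitrary demo values.

```python
# Rank-1 third-order tensor as an outer product of three vectors.

def outer3(a, b, c):
    """Return the tensor T with T[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]

a, b, c = [1.0, 2.0], [1.0, 3.0], [2.0, 5.0]
T = outer3(a, b, c)          # a 2 x 2 x 2 rank-1 tensor
print(T[1][1][0])            # → 2 * 3 * 2 = 12.0
```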
39 Properties of Matrix Rank The rank of A in R^{m x n} is easy to determine (Gaussian elimination). An optimal rank-r approximation to A always exists, and is easy to find (SVD). Pick A in R^{m x n} at random; then A has full rank with probability 1, i.e., rank(A) = min{m, n}. The rank of A is the same whether we consider A as an element of R^{m x n} or of C^{m x n}. Every statement above is false for order-k tensors, k > 2.
40 NMF/NTF Applications to Data Analysis 40
41 NMF Allows only additive, not subtractive, combinations of the original data, in comparison to orthogonal decomposition methods, e.g., PCA. NMF was used by Lee and Seung (MIT) in Nature, 1999, in biometrics, preceded and followed by numerous papers related to applications. Earlier work by Paatero et al. Matlab Toolbox: NMFLAB. Historical Perspectives: Problem 73-14, Rank Factorization of Nonnegative Matrices, by Berman and Plemmons, SIAM Review 15 (1973), p. 655. (Also in the Berman/Plemmons book.)
42 Historical Perspectives, Cont. Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl. De Lathauwer L., De Moor B., Vandewalle J., Independent component analysis and (simultaneous) third-order tensor diagonalization, IEEE Transactions on Signal Processing. De Lathauwer L., Vandewalle J., Dimensionality reduction in higher-order signal processing and rank-(R_1, R_2, ..., R_N) reduction in multilinear algebra, Linear Algebra and its Applications, Special Issue on Linear Algebra in Signal and Image Processing.
43 Some General Applications of NMF and NTF Source separation in acoustics, speech, video, image processing, and computer vision. EEG in medicine, electric potentials. Spectroscopy in chemistry. Molecular pattern discovery in genomics. Surveillance. Document clustering in text data mining. Atmospheric pollution source identification. Hyperspectral sensor data compression. Spectroscopy for space applications: spectral data mining, identifying object surface materials and substances.
44 Outline of SOI Application Space Object Identification from spectral data (SOI) Nonnegative Matrix Factorization (NMF) Methods for Spectral Unmixing Nonnegative Tensor Factorization (NTF) for SOI 3-D Tensor Factorization: Applications to SOI, using Hyperspectral Data Compression Material identification Spectral Abundances Another Application (Medical) 44
45 Blind Source Separation for Finding Hidden Components (Endmembers) Mixing of sources: basic physics often leads to linear mixing. X = [X_1, X_2, ..., X_n], column vectors (1-D spectral scans). Approximately factor X ≈ WH = Σ_{j=1}^{k} w^(j) ∘ h^(j), where ∘ denotes the outer product, w^(j) is the j-th column of W, and h^(j) is the j-th column of H^T. Here X holds the sensor readings (mixed components, observed data), W the separated components (feature basis matrix, unknown, low rank), and H the mixing coefficients.
46 Simple Analog Illustration Hidden components in light separated by a prism. Our purpose: finding hidden components by data analysis.
47 Space Object Identification and Characterization from Spectral Reflectance Data More than 20,000 known objects in orbit: various types of military and commercial satellites, rocket bodies, residual parts, and debris. Need for space object database mining, object identification, clustering, classification, etc.
48 Maui Space Surveillance Site 48
49 The creation and observation of a reflectance spectrum. [Figure repeated from slide 3: sunlight reflects off a satellite and the reflected spectrum passes through the atmosphere to a ground observer at zenith angle Z; day/night geometry labeled.]
50 Typical Scan 50
51 Some Laboratory Electromagnetic Spectral Signatures [Four panels: reflectance vs. wavelength (µm) for Mylar, White Paint, Aluminum, and Solar Cell.]
52 Hyperspectral Imaging of Space Objects Current operational capability for spectral imaging of space objects: panchromatic images, non-imaging spectra. [Plot: reflectance vs. wavelength (microns) for aluminum alloys al7075, al6061, al2219, al1100, and other materials; data credit Jorgensen.]
53 Recall NTF (for 3-D Arrays) Optimal solution exists (Lim 2006, Stanford MMDS workshop) Issues: initialization, efficient optimization algorithms, avoid local minima. 53
54 Hyperspectral Imaging Hyperspectral remote sensing technology allows one to capture images of objects (multiple pixels), using spectra from the ultraviolet and visible to the infrared. Multiple images of a scene or object are created using light from different parts of the spectrum. Hyperspectral images can be used to: identify surface minerals, objects, buildings, etc. from space, and enable space object identification from the ground.
55 Hyperspectral Data Cube for Hubble Telescope 55
56
57 NTF Algorithm using Alternating NNLS 57
58 Simulated Data 177 x 193 x 100 data cube from a 3-D model of the Hubble satellite. Assign each pixel a certain spectral signature from lab data supplied by NASA; 8 materials used. Bands of spectra ranging from 0.4 μm to 2.5 μm, with 100 evenly distributed spectral values. Spatial blurring followed by Gaussian and Poisson noise, applied over the spectral bands.
59 Materials Assigned to Pixels 59
60 Reconstruction of Blurred & Noisy Data at wavelengths 0.42, 0.68, 1.55, 2.29 μm. Top row: original images. Bottom row: reconstructions. Compression factor 60
61 Some Spectral Scan Matching Graphs Red is the original scan; blue is the matched endmember scan recovered from the Z-factors. The black rubber edge (8%) is not as well matched. Blurred and noisy data cube.
62 Telescope images taken at the Air Force Maui Space Center of the shuttle Columbia on its final orbit before disintegration on re-entry, Feb. Below are 3 of 64 video images.
63 Separation by Parts using NTF 63
64 Biomedical Application using Spectral Imaging & Nonnegative Tensor Factorization Burn and tissue wound assessment (joint with the WFU Hospital Burn Center) 1 st degree burn 3 rd degree burn 2 nd degree burn 64
65 Summary Some Personal Thoughts on the Birth of NA; Nonnegativity and Classical Iterative Methods; Historical Developments in Nonnegative Least Squares (NNLS) Computations, from Lawson and Hanson (1974) to recent work; Natural Extensions to (Approximate) Nonnegative Matrix and Tensor Factorization (NMF and NTF); Algorithms for NMF and NTF, 1990s to present; Survey of Applications: NNLS, NMF, NTF; Some of our Work: Computations using Data from the AF Maui Space Center & NASA, and a Biomedical Application
Interaction on spectral data with: Kira Abercromby (NASA-Houston) Related Papers at:
More informationExample: Face Detection
Announcements HW1 returned New attendance policy Face Recognition: Dimensionality Reduction On time: 1 point Five minutes or more late: 0.5 points Absent: 0 points Biometrics CSE 190 Lecture 14 CSE190,
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationStructured matrix factorizations. Example: Eigenfaces
Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix
More informationDeriving Principal Component Analysis (PCA)
-0 Mathematical Foundations for Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deriving Principal Component Analysis (PCA) Matt Gormley Lecture 11 Oct.
More informationCS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery
CS168: The Modern Algorithmic Toolbox Lecture #10: Tensors, and Low-Rank Tensor Recovery Tim Roughgarden & Gregory Valiant May 3, 2017 Last lecture discussed singular value decomposition (SVD), and we
More informationPrincipal Component Analysis
Principal Component Analysis Introduction Consider a zero mean random vector R n with autocorrelation matri R = E( T ). R has eigenvectors q(1),,q(n) and associated eigenvalues λ(1) λ(n). Let Q = [ q(1)
More informationAlgebraic Multigrid Preconditioners for Computing Stationary Distributions of Markov Processes
Algebraic Multigrid Preconditioners for Computing Stationary Distributions of Markov Processes Elena Virnik, TU BERLIN Algebraic Multigrid Preconditioners for Computing Stationary Distributions of Markov
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationLinear Subspace Models
Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,
More informationJordan Journal of Mathematics and Statistics (JJMS) 5(3), 2012, pp A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS
Jordan Journal of Mathematics and Statistics JJMS) 53), 2012, pp.169-184 A NEW ITERATIVE METHOD FOR SOLVING LINEAR SYSTEMS OF EQUATIONS ADEL H. AL-RABTAH Abstract. The Jacobi and Gauss-Seidel iterative
More informationFaloutsos, Tong ICDE, 2009
Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity
More informationIterative methods for Linear System of Equations. Joint Advanced Student School (JASS-2009)
Iterative methods for Linear System of Equations Joint Advanced Student School (JASS-2009) Course #2: Numerical Simulation - from Models to Software Introduction In numerical simulation, Partial Differential
More informationON THE LIMITING PROBABILITY DISTRIBUTION OF A TRANSITION PROBABILITY TENSOR
ON THE LIMITING PROBABILITY DISTRIBUTION OF A TRANSITION PROBABILITY TENSOR WEN LI AND MICHAEL K. NG Abstract. In this paper we propose and develop an iterative method to calculate a limiting probability
More informationOutline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St
Structured Lower Rank Approximation by Moody T. Chu (NCSU) joint with Robert E. Funderlic (NCSU) and Robert J. Plemmons (Wake Forest) March 5, 1998 Outline Introduction: Problem Description Diculties Algebraic
More informationPOLYNOMIAL SINGULAR VALUES FOR NUMBER OF WIDEBAND SOURCES ESTIMATION AND PRINCIPAL COMPONENT ANALYSIS
POLYNOMIAL SINGULAR VALUES FOR NUMBER OF WIDEBAND SOURCES ESTIMATION AND PRINCIPAL COMPONENT ANALYSIS Russell H. Lambert RF and Advanced Mixed Signal Unit Broadcom Pasadena, CA USA russ@broadcom.com Marcel
More informationFitting a Tensor Decomposition is a Nonlinear Optimization Problem
Fitting a Tensor Decomposition is a Nonlinear Optimization Problem Evrim Acar, Daniel M. Dunlavy, and Tamara G. Kolda* Sandia National Laboratories Sandia is a multiprogram laboratory operated by Sandia
More informationOptimal solutions to non-negative PARAFAC/multilinear NMF always exist
Optimal solutions to non-negative PARAFAC/multilinear NMF always exist Lek-Heng Lim Stanford University Workshop on Tensor Decompositions and Applications CIRM, Luminy, France August 29 September 2, 2005
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More information1 Non-negative Matrix Factorization (NMF)
2018-06-21 1 Non-negative Matrix Factorization NMF) In the last lecture, we considered low rank approximations to data matrices. We started with the optimal rank k approximation to A R m n via the SVD,
More informationPreface to Second Edition... vii. Preface to First Edition...
Contents Preface to Second Edition..................................... vii Preface to First Edition....................................... ix Part I Linear Algebra 1 Basic Vector/Matrix Structure and
More informationA Coupled Helmholtz Machine for PCA
A Coupled Helmholtz Machine for PCA Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 3 Hyoja-dong, Nam-gu Pohang 79-784, Korea seungjin@postech.ac.kr August
More informationLecture Notes 10: Matrix Factorization
Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To
More informationDimensionality reduction
Dimensionality Reduction PCA continued Machine Learning CSE446 Carlos Guestrin University of Washington May 22, 2013 Carlos Guestrin 2005-2013 1 Dimensionality reduction n Input data may have thousands
More informationConstrained Projection Approximation Algorithms for Principal Component Analysis
Constrained Projection Approximation Algorithms for Principal Component Analysis Seungjin Choi, Jong-Hoon Ahn, Andrzej Cichocki Department of Computer Science, Pohang University of Science and Technology,
More informationAlgebra C Numerical Linear Algebra Sample Exam Problems
Algebra C Numerical Linear Algebra Sample Exam Problems Notation. Denote by V a finite-dimensional Hilbert space with inner product (, ) and corresponding norm. The abbreviation SPD is used for symmetric
More informationMULTIVARIATE STATISTICAL ANALYSIS OF SPECTROSCOPIC DATA. Haisheng Lin, Ognjen Marjanovic, Barry Lennox
MULTIVARIATE STATISTICAL ANALYSIS OF SPECTROSCOPIC DATA Haisheng Lin, Ognjen Marjanovic, Barry Lennox Control Systems Centre, School of Electrical and Electronic Engineering, University of Manchester Abstract:
More informationCP DECOMPOSITION AND ITS APPLICATION IN NOISE REDUCTION AND MULTIPLE SOURCES IDENTIFICATION
International Conference on Computer Science and Intelligent Communication (CSIC ) CP DECOMPOSITION AND ITS APPLICATION IN NOISE REDUCTION AND MULTIPLE SOURCES IDENTIFICATION Xuefeng LIU, Yuping FENG,
More informationCPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018
CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2018 Last Time: PCA with Orthogonal/Sequential Basis When k = 1, PCA has a scaling problem. When k > 1, have scaling, rotation,
More informationClustering. SVD and NMF
Clustering with the SVD and NMF Amy Langville Mathematics Department College of Charleston Dagstuhl 2/14/2007 Outline Fielder Method Extended Fielder Method and SVD Clustering with SVD vs. NMF Demos with
More informationCS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)
CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions
More informationMatrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY
Matrix Decomposition in Privacy-Preserving Data Mining JUN ZHANG DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF KENTUCKY OUTLINE Why We Need Matrix Decomposition SVD (Singular Value Decomposition) NMF (Nonnegative
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More informationUsing Matrix Decompositions in Formal Concept Analysis
Using Matrix Decompositions in Formal Concept Analysis Vaclav Snasel 1, Petr Gajdos 1, Hussam M. Dahwa Abdulla 1, Martin Polovincak 1 1 Dept. of Computer Science, Faculty of Electrical Engineering and
More informationGI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil
GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection
More informationOn Spectral Basis Selection for Single Channel Polyphonic Music Separation
On Spectral Basis Selection for Single Channel Polyphonic Music Separation Minje Kim and Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 31 Hyoja-dong, Nam-gu
More informationMULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION
MULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION Ehsan Hosseini Asl 1, Jacek M. Zurada 1,2 1 Department of Electrical and Computer Engineering University of Louisville, Louisville,
More informationarxiv: v1 [math.ra] 13 Jan 2009
A CONCISE PROOF OF KRUSKAL S THEOREM ON TENSOR DECOMPOSITION arxiv:0901.1796v1 [math.ra] 13 Jan 2009 JOHN A. RHODES Abstract. A theorem of J. Kruskal from 1977, motivated by a latent-class statistical
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More informationNonnegative Matrix Factorization. Data Mining
The Nonnegative Matrix Factorization in Data Mining Amy Langville langvillea@cofc.edu Mathematics Department College of Charleston Charleston, SC Yahoo! Research 10/18/2005 Outline Part 1: Historical Developments
More information