Data dependent operators for the spatial-spectral fusion problem


Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012

Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A. Halevy, V. Rajapakse Naval Research Laboratory: D. Gillis


Data Organization and Manifold Learning
There are many techniques for data organization and manifold learning, e.g., Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Isomap, genetic algorithms, and neural networks. We are interested in a subfamily of these techniques known as kernel eigenmap methods. These include Kernel PCA, LLE, Hessian LLE (HLLE), and Laplacian Eigenmaps. Kernel eigenmap methods require two steps. Given a data space X of N vectors in R^D:
1. Construct an N × N symmetric, positive semi-definite kernel K from these N data points in R^D.
2. Diagonalize K, and then choose d ≪ D significant eigenmaps of K. These become our new coordinates and accomplish dimensionality reduction.
We are particularly interested in diffusion kernels K, which are defined by means of transition matrices.

Kernel Eigenmap Methods for Dimension Reduction - Kernel Construction
Kernel eigenmap methods were introduced to address complexities not resolvable by linear methods. The idea behind kernel methods is to express correlations or similarities between vectors in the data space X in terms of a symmetric, positive semi-definite kernel function K : X × X → R. Generally, there exists a Hilbert space K and a mapping Φ : X → K such that K(x, y) = ⟨Φ(x), Φ(y)⟩. Then, diagonalize by the spectral theorem and choose significant eigenmaps to obtain dimensionality reduction. Kernels can be constructed by many kernel eigenmap methods, including Kernel PCA, LLE, HLLE, and Laplacian Eigenmaps.

Kernel Eigenmap Methods for Dimension Reduction - Kernel Diagonalization
The second step in kernel eigenmap methods is the diagonalization of the kernel. Let e_j, j = 1, ..., N, be the set of eigenvectors of the kernel matrix K, with eigenvalues λ_j. Order the eigenvalues monotonically. Choose the top d ≪ D significant eigenvectors to map the original data points x_i ∈ R^D to (e_1(i), ..., e_d(i)) ∈ R^d, i = 1, ..., N.
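For concreteness, here is a minimal NumPy sketch of this two-step recipe, assuming a Gaussian kernel on synthetic data; the kernel choice, the bandwidth, and the name kernel_eigenmap_embedding are illustrative assumptions, not the construction of any particular method.

```python
import numpy as np

def kernel_eigenmap_embedding(K_mat, d):
    """Embed N points into R^d from an N x N symmetric PSD kernel matrix."""
    # Symmetric eigendecomposition; eigenvalues are returned in ascending order.
    eigvals, eigvecs = np.linalg.eigh(K_mat)
    # Keep the d eigenvectors with the largest eigenvalues.
    idx = np.argsort(eigvals)[::-1][:d]
    # Row i of the result gives the new d-dimensional coordinates of x_i.
    return eigvecs[:, idx]

# Illustrative usage with a Gaussian kernel on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                     # N = 100 points in R^20
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_mat = np.exp(-sq_dists / sq_dists.mean())        # symmetric PSD kernel
Y = kernel_eigenmap_embedding(K_mat, d=3)          # N x 3 embedding
```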

Laplacian Eigenmaps - Theory (M. Belkin and P. Niyogi, 2003)
Points close on the manifold should remain close in R^d:
$|f(x) - f(y)| \le \|\nabla f(x)\|\,\|x - y\| + o(\|x - y\|)$
$\arg\min_{\|f\|_{L^2(M)} = 1} \int_M \|\nabla f(x)\|^2 = \arg\min_{\|f\|_{L^2(M)} = 1} \int_M \Delta_M(f)\, f$
Find eigenfunctions of the Laplace-Beltrami operator Δ_M. Use a discrete approximation of the Laplace-Beltrami operator. Proven convergence (Belkin and Niyogi, 2003, 2008).

Laplacian Eigenmaps - Implementation
1. Put an edge between nodes i and j if x_i and x_j are close. Precisely, given a parameter k ∈ N, put an edge between nodes i and j if x_i is among the k nearest neighbors of x_j or vice versa.
2. Given a parameter t > 0, if nodes i and j are connected, set $W_{i,j} = e^{-\|x_i - x_j\|^2 / t}$.
3. Set $D_{i,i} = \sum_j W_{i,j}$ and let L = D - W. Solve the generalized eigenproblem Lf = λDf under the constraint $y^T D y = I$. Let f_0, f_1, ..., f_d be the d + 1 eigenvector solutions corresponding to the smallest eigenvalues 0 = λ_0 ≤ λ_1 ≤ ... ≤ λ_d. Discard f_0 and use the next d eigenvectors to embed into d-dimensional Euclidean space via the map x_i ↦ (f_1(i), f_2(i), ..., f_d(i)).
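A minimal dense-matrix sketch of these three steps, assuming a data array X of shape (N, D); a practical implementation would use sparse matrices and an iterative eigensolver, but the sequence (symmetrized kNN graph, heat-kernel weights, generalized eigenproblem Lf = λDf) follows the steps above.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, d, k=10, t=1.0):
    # Step 1: symmetric kNN adjacency (edge if i is among the k NN of j or vice versa).
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)

    # Step 2: heat-kernel weights on the edges.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = A * np.exp(-sq / t)

    # Step 3: generalized eigenproblem L f = lambda D f with L = D - W.
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = eigh(L, D)          # eigenvalues in ascending order
    # Discard f_0 (constant eigenvector, lambda_0 = 0); embed with f_1, ..., f_d.
    return eigvecs[:, 1:d + 1]

# Usage: Y = laplacian_eigenmaps(X, d=2, k=8, t=0.5) maps each x_i to a point in R^2.
```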

Swiss Roll
Figure: (a) Original, (b) PCA, (c)-(f) LE. J. Shen et al., Neurocomputing, Volume 87, 2012.

From Laplacian to Schroedinger Eigenmaps
Consider the following minimization problem over embeddings y with rows y_i ∈ R^d:
$\min_{y^T D y = I} \frac{1}{2} \sum_{i,j} \|y_i - y_j\|^2\, W_{i,j} = \min_{y^T D y = I} \operatorname{tr}(y^T L y).$
Its solution is given by the d minimal non-zero eigenvalue solutions of Lf = λDf under the constraint $y^T D y = I$. Similarly, for diagonal V, consider the problem
$\min_{y^T D y = I} \frac{1}{2} \sum_{i,j} \|y_i - y_j\|^2\, W_{i,j} + \sum_i \|y_i\|^2\, V_{i,i} = \min_{y^T D y = I} \operatorname{tr}(y^T (L + V) y), \qquad (1)$
which leads to solving the equation (L + V)f = λDf.
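In code, the only change relative to Laplacian Eigenmaps is adding the potential before solving the generalized eigenproblem; a sketch assuming the weight matrix W and a nonnegative vector v of diagonal potential values are already available (names are illustrative).

```python
import numpy as np
from scipy.linalg import eigh

def schroedinger_eigenmaps(W, v, d, alpha=1.0):
    """Solve (L + alpha*V) f = lambda D f for a diagonal potential V = diag(v)."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    V = np.diag(v)                          # diagonal, positive semi-definite
    eigvals, eigvecs = eigh(L + alpha * V, D)
    # Keep the d eigenvectors with the smallest eigenvalues
    # (discard a leading constant eigenvector if the potential annihilates constants).
    return eigvecs[:, :d]
```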

Properties of Schroedinger Eigenmaps
Let the data graph be connected and let V be a symmetric positive semi-definite matrix. If 1 ∈ Null(V), then λ = 0 is a simple eigenvalue of L + V with eigenvector 1.
Theorem (with M. Ehler)
Let the data graph be connected, let V = diag(v_1, ..., v_N) be symmetric positive semi-definite, and let n ≤ dim(Null(V)). Let y^(α) be a minimizer of the Schroedinger eigen-problem (1) with potential αV, α > 0. Then
$\|y^{(\alpha)}\|_V^2 = \operatorname{trace}\big(y^{(\alpha)T} V y^{(\alpha)}\big) \le C\, \frac{1}{\alpha}$
and
$v_i\, \|y_i^{(\alpha)}\|^2 \le \sum_{i=1}^N v_i\, \|y_i^{(\alpha)}\|^2 \le C\, \frac{1}{\alpha}, \quad \text{for all } i = 1, \ldots, N.$

Pointwise Convergence of SE
Given n data points x_1, x_2, ..., x_n sampled independently from a uniform distribution on a smooth, compact, d-dimensional manifold M ⊂ R^D, define the operator L̂_{t,n} : C(M) → C(M) by
$\hat{L}_{t,n}(f)(x) = \frac{1}{(4\pi t)^{d/2}}\,\frac{1}{t}\left(\frac{1}{n}\sum_j f(x)\,e^{-\frac{\|x - x_j\|^2}{4t}} - \frac{1}{n}\sum_j f(x_j)\,e^{-\frac{\|x - x_j\|^2}{4t}}\right).$
Let v ∈ C(M) be a potential. For x ∈ M, let $y_n(x) = \arg\min_{x_1, x_2, \ldots, x_n} \|x - x_i\|$ and define V_n : C(M) → C(M) by V_n f(x) = v(y_n(x)) f(x).
Theorem (Pointwise Convergence, with A. Halevy)
Let α > 0, and set $t_n = (1/n)^{1/(d+2+\alpha)}$. For f ∈ C^∞(M),
$\lim_{n\to\infty} \hat{L}_{t_n,n} f(x) + V_n f(x) = C\,\Delta_M f(x) + v(x) f(x)$ in probability.
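As a reading aid for the formula, here is a direct (unoptimized) transcription of L̂_{t,n} and V_n as functions of a point x, assuming an array xs of sample points, a scalar-valued function f, a potential function v, and the intrinsic dimension d supplied as an input; all names are illustrative.

```python
import numpy as np

def L_hat(f, x, xs, t, d):
    """Evaluate the discrete operator L_hat_{t,n} f at x, for samples xs on a d-dimensional manifold."""
    w = np.exp(-np.sum((xs - x) ** 2, axis=1) / (4 * t))       # heat-kernel weights
    diffs = f(x) - np.array([f(xj) for xj in xs])              # f(x) - f(x_j)
    return diffs.dot(w) / (len(xs) * (4 * np.pi * t) ** (d / 2) * t)

def V_n(f, x, xs, v):
    """Potential term v(y_n(x)) f(x), where y_n(x) is the sample point nearest to x."""
    j = np.argmin(np.sum((xs - x) ** 2, axis=1))
    return v(xs[j]) * f(x)
```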

Spectral Convergence of SE - Theorem
Let L_{t,n} be the unnormalized discrete Laplacian.
Theorem (Spectral Convergence of SE, with A. Halevy)
Let λ^i_{t,n} and e^i_{t,n} be the ith eigenvalue and corresponding eigenfunction of L_{t,n} + V_n. Let λ^i and e^i be the ith eigenvalue and corresponding eigenfunction of Δ_M + V. Then there exists a sequence t_n → 0 such that, in probability,
$\lim_{n\to\infty} \lambda^i_{t_n,n} = \lambda^i \quad\text{and}\quad \lim_{n\to\infty} \|e^i_{t_n,n} - e^i\| = 0.$

SE as Semisupervised Method
Figure: Schroedinger Eigenmaps with diagonal potential V = diag(0, ..., 0, 1, 0, ..., 0) acting only at one point y_{i_0} in the middle of the arc, for α = 0.05, 0.1, 0.5, 5. This point is pushed to zero.

SE as Semisupervised Method
Figure: Applying the potential to the end points of the arc, for α = 0.01, 0.05, 0.1, 1, allows us to control the dimension reduction so that we obtain an almost perfect circle.


Computational Bottleneck
If N is the ambient dimension and n is the number of points, the time complexity of constructing an adjacency graph is O(Nn²). What can we do about N? What can we do about the exponent 2? What can we do about n? What can we do about the computational complexity of the eigendecomposition?

Numerical acceleration
- Data compression via incoherent random projections
- Fast approximate k-nearest-neighbor algorithms, e.g., via divide-and-conquer strategies
- Quantization
- Landmarking
- Randomized low-rank SVD decompositions

Setting for data compression
- Dataset {x_1, x_2, ..., x_n} in R^N, sampled from a compact K-dimensional Riemannian manifold
- Assume ‖x_i − x_j‖ ≤ A for all i, j and some A > 0
- Let 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_K be the first K nonzero eigenvalues computed by LE, assumed simple, with $r = \min_{i \ne j} |\lambda_i - \lambda_j|$, and let f_j be a normalized eigenvector corresponding to λ_j
- Use a random orthogonal projector Φ to map the points to R^M. Let f̂_j be the jth eigenvector computed by LE for the projected data set

Laplacian Eigenmaps with random projections
Theorem (with A. Halevy)
Fix 0 < α < 1 and 0 < ρ < 1. If
$M \ge \frac{4 + 2\ln(1/\rho)}{\epsilon^2/200 + \epsilon^3/3000}\, K \ln(CKn/\epsilon), \qquad \text{where } \epsilon = \frac{r\alpha}{4An(n-1)},$
then, with probability at least 1 − ρ, ‖f_j − f̂_j‖ < α. The constant C depends on properties of the manifold. Precisely, C = 1900RV/τ^{1/3}, where R, V, and 1/τ are the geodesic covering regularity, volume, and condition number, respectively.
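The compression step the theorem analyzes can be sketched as follows: draw a random matrix with orthonormal columns, project the data onto its column space, and run LE on the projected points. The helper name and the reuse of the laplacian_eigenmaps sketch from earlier are assumptions for illustration.

```python
import numpy as np

def random_orthogonal_projection(X, M, seed=0):
    """Project the rows of X (n x N) onto a random M-dimensional subspace of R^N."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(X.shape[1], M))
    Q, _ = np.linalg.qr(G)          # N x M matrix with orthonormal columns
    return X @ Q                    # n x M compressed data

# Hypothetical usage: Y_hat = laplacian_eigenmaps(random_orthogonal_projection(X, M=50), d=10).
# The theorem bounds how far the resulting eigenvectors f_hat_j can drift from the f_j
# computed on the original, uncompressed data.
```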


Hyperspectral Satellite Imagery Figure : Hyperspectral Satellite Image, courtesy of NASA

Data: Pavia University
Figure: (a) Pavia University with bands [68,30,2], (b) Ground truth map, (c) Class map for purely spectral LE, (d) Class map for purely spatial LE

Data: Indian Pines
Figure: (a) Indian Pines with bands [29,15,12], (b) Ground truth map, (c) Class map for purely spectral LE, (d) Class map for purely spatial LE

Pure Spectral and Pure Spatial LE

Table: Classification results for spectral LE
σ                  0.2     0.4     0.6     0.8     1
Indian Pines       57.10   57.74   57.70   57.67   57.63
Pavia University   79.74   80.01   80.02   80.02   80.02

Table: Classification results for spatial LE
η                  0.2     0.4     0.6     0.8     1
Indian Pines       85.88   85.99   86.03   86.04   86.03
Pavia University   77.56   77.68   77.74   77.77   77.77

LE and Data Integration
Classification of stacked spatial/spectral embeddings:
Indian Pines: 46 top spatial and 4 top spectral Laplacian eigenvectors, OA 90.82%.
Pavia University: 20 top spatial and 5 top spectral Laplacian eigenvectors, OA 96.46%.
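The stacking itself is just concatenation of embedding coordinates; a sketch assuming spatial and spectral LE embeddings Y_spat and Y_spec have been computed per pixel, with labeled training pixels, and using a linear SVM purely as a placeholder classifier (the slides do not specify which classifier produced these overall accuracies).

```python
import numpy as np
from sklearn.svm import LinearSVC

def classify_stacked(Y_spat, Y_spec, labels, train_idx, test_idx):
    Z = np.hstack([Y_spat, Y_spec])        # e.g. 46 spatial + 4 spectral eigenvectors per pixel
    clf = LinearSVC().fit(Z[train_idx], labels[train_idx])
    return clf.score(Z[test_idx], labels[test_idx])   # overall accuracy on held-out pixels
```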

Mixing Spatial/Spectral Graphs and Weights
Let G be the kNN-graph computed using the spectral metric. Let L_f and L_s denote, respectively, the spectral and spatial Laplace operators on G. Fuse them into a single Laplace operator L in the following ways:
L_1(σ, η) = L_f(σ) ∘ L_s(η), where ∘ denotes entry-wise multiplication,
L_2(σ, η) = L_f(σ) + L_s(η),
L_3(σ, η) = G ∘ (L_f(σ) L_s(η)), i.e., we take the usual matrix product of the two Laplacians and then overlay the sparsity structure determined by the adjacency matrix.
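Written out with NumPy for dense spectral and spatial Laplacians Lf and Ls and the 0/1 adjacency matrix G of the kNN graph, the three fusion rules are one line each (a sketch; sparse implementations would follow the same pattern).

```python
import numpy as np

def fuse_laplacians(Lf, Ls, G):
    L1 = Lf * Ls                 # entry-wise (Hadamard) product
    L2 = Lf + Ls                 # sum of the two Laplacians
    L3 = G * (Lf @ Ls)           # matrix product, masked by the sparsity pattern of G
    return L1, L2, L3
```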

Figure: Fusion operator class maps for Indian Pines: L_1 (σ = 0.8, η = 0.2, 62.73%), L_2 (σ = 0.4, η = 0.2, 59.01%), L_3 (σ = 0.6, η = 0.2, 58.12%)

Figure: Fusion operator class maps for Pavia University: L_1 (σ = 0.4, η = 0.2, 79.19%), L_2 (σ = 0.8, η = 0.2, 78.73%), L_3 (σ = 0.2, η = 0.2, 79.22%)

Fusion metric
Adjust the norm used to determine the kNN-graph and the weight matrix. For γ > 0, we define
$\|x_i - x_j\|_\gamma^2 = \|x_i - x_j\|_f^2 + \gamma\, \|x_i - x_j\|_s^2.$
Let L_γ be the Laplacian operator defined using the kNN-graph and weight matrix determined by γ.
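A sketch of the fused squared distance, assuming each pixel is described by a spectral feature vector (rows of Xf) and a spatial coordinate vector (rows of Xs); the resulting matrix can feed any kNN-graph construction, e.g., the one sketched for Laplacian Eigenmaps above.

```python
import numpy as np

def fused_sq_distances(Xf, Xs, gamma):
    """||x_i - x_j||_gamma^2 = ||x_i - x_j||_f^2 + gamma * ||x_i - x_j||_s^2."""
    df = ((Xf[:, None, :] - Xf[None, :, :]) ** 2).sum(-1)   # spectral part
    ds = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)   # spatial part
    return df + gamma * ds
```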

Fused Spatial-Spectral Classification

Table: Classification results for the fusion operators acting on the fusion graph, Indian Pines, η = 0.2, γ = 450
σ      L_1     L_2     L_3
0.2    94.02   93.77   93.80
0.4    93.83   93.75   93.80
0.6    93.81   93.67   93.77
0.8    93.82   93.62   93.78

Table: Classification results for the fusion operators acting on the fusion graph, Pavia University, η = 0.25, γ = 34
σ      L_1     L_2     L_3
0.2    95.92   96.17   96.49
0.4    96.17   96.19   96.48
0.6    96.17   96.20   96.48
0.8    96.17   96.20   96.48

Fused Spatial-Spectral Classification
Fusion operators on fusion metric graphs: Indian Pines, L_1, L_2, L_3.

Fused Spatial-Spectral Classification
Fusion operators on fusion metric graphs: Pavia University, L_1, L_2, L_3.

Cluster Potentials
Let ι = {i_1, ..., i_m} be a collection of nodes. A cluster potential over ι is defined by:
$V[\iota, \iota] := \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}.$
For such a V, the minimization problem is equivalent to:
$\min_{y^T D y = I} \frac{1}{2} \sum_{i,j} \|y_i - y_j\|^2\, W_{i,j} + \alpha \sum_{k=1}^{m-1} \|y_{i_k} - y_{i_{k+1}}\|^2.$
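Concretely, V[ι, ι] is the Laplacian of a path through the cluster nodes; a sketch that embeds this block into an N × N potential, with the node list iota assumed given.

```python
import numpy as np

def cluster_potential(iota, N):
    """N x N potential whose block V[iota, iota] is the path Laplacian over the cluster."""
    V = np.zeros((N, N))
    for a, b in zip(iota[:-1], iota[1:]):    # consecutive pairs (i_k, i_{k+1})
        V[a, a] += 1.0
        V[b, b] += 1.0
        V[a, b] -= 1.0
        V[b, a] -= 1.0
    return V
```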

Schroedinger Eigenmaps with Cluster Potentials
Use k-means clustering to cluster the hyperspectral image. The clustering was initialized randomly, so that the construction of the potential is completely unsupervised, and the Euclidean metric was used. Define a cluster potential V_k over each of the k = 1, ..., K clusters; the order of the pixels in each cluster is determined by their index (smallest to largest). Aggregate into one potential by summing the individual cluster potentials:
$V_K = \sum_{k=1}^{K} V_k.$
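A hedged end-to-end sketch of this unsupervised construction: run k-means on the pixel features X, order each cluster's indices, build one path-type cluster potential per cluster, and sum them. The specific k-means settings below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregated_cluster_potential(X, K, seed=0):
    """Sum of path-Laplacian cluster potentials, one per k-means cluster (unsupervised)."""
    N = X.shape[0]
    labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(X)
    V_K = np.zeros((N, N))
    for k in range(K):
        iota = np.sort(np.where(labels == k)[0])   # cluster indices, smallest to largest
        for a, b in zip(iota[:-1], iota[1:]):      # path through the cluster
            V_K[a, a] += 1.0
            V_K[b, b] += 1.0
            V_K[a, b] -= 1.0
            V_K[b, a] -= 1.0
    return V_K

# V_K, scaled by alpha, then enters the Schroedinger eigenproblem (L + alpha * V_K) f = lambda D f.
```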

Fused Spatial-Spectral Classification

Table: Results for the Cluster Potential on Pavia University
K \ α   100     500     1000    5000
15      91.23   93.09   92.92   92.08
25      85.55   85.70   85.38   89.69
50      85.71   85.35   85.22   84.86
100     85.47   85.31   85.13   84.96

Table: Results for the Cluster Potential on Indian Pines
K \ α   100     500     1000
16      67.45   67.99   67.92
25      66.39   67.29   67.78
50      63.11   64.24   64.23
100     60.67   60.43   60.41

Fused Spatial-Spectral Classification
Pavia University: dimensions 4 and 5 of the SE embedding using cluster potential V_16 for classes 2, 3, and 7.

Fused Spatial-Spectral Classification
Indian Pines: dimensions 17 and 22 of the SE embedding using cluster potential V_16 for classes 2, 3, and 10.