Data dependent operators for the spatial-spectral fusion problem


Data dependent operators for the spatial-spectral fusion problem Wien, December 3, 2012

Joint work with: University of Maryland: J. J. Benedetto, J. A. Dobrosotskaya, T. Doster, K. W. Duke, M. Ehler, A. Halevy, V. Rajapakse Naval Research Laboratory: D. Gillis


Data Organization and Manifold Learning
There are many techniques for data organization and manifold learning, e.g., Principal Component Analysis (PCA), Locally Linear Embedding (LLE), Isomap, genetic algorithms, and neural networks. We are interested in a subfamily of these techniques known as kernel eigenmap methods. These include Kernel PCA, LLE, Hessian LLE (HLLE), and Laplacian Eigenmaps. Kernel eigenmap methods require two steps. Given a data space X of N vectors in R^D:
1. Construct an N × N symmetric, positive semi-definite kernel K from these N data points in R^D.
2. Diagonalize K, and then choose d ≪ D significant eigenmaps of K. These become our new coordinates and accomplish dimensionality reduction.
We are particularly interested in diffusion kernels K, which are defined by means of transition matrices.

Kernel Eigenmap Methods for Dimension Reduction - Kernel Construction
Kernel eigenmap methods were introduced to address complexities not resolvable by linear methods. The idea behind kernel methods is to express correlations or similarities between vectors in the data space X in terms of a symmetric, positive semi-definite kernel function K : X × X → R. Generally, there exists a Hilbert space K and a mapping Φ : X → K such that K(x, y) = ⟨Φ(x), Φ(y)⟩. Then, diagonalize by the spectral theorem and choose significant eigenmaps to obtain dimensionality reduction. Kernels can be constructed by many kernel eigenmap methods, including Kernel PCA, LLE, HLLE, and Laplacian Eigenmaps.

Kernel Eigenmap Methods for Dimension Reduction - Kernel Diagonalization
The second step in kernel eigenmap methods is the diagonalization of the kernel. Let e_j, j = 1, ..., N, be the set of eigenvectors of the kernel matrix K, with eigenvalues λ_j. Order the eigenvalues monotonically. Choose the top d ≪ D significant eigenvectors to map the original data points x_i ∈ R^D to (e_1(i), ..., e_d(i)) ∈ R^d, i = 1, ..., N.
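For concreteness, here is a minimal NumPy sketch of this two-step recipe, assuming a Gaussian kernel on synthetic data; the kernel choice, the bandwidth, and the name kernel_eigenmap_embedding are illustrative assumptions, not the construction of any particular method.

```python
import numpy as np

def kernel_eigenmap_embedding(K_mat, d):
    """Embed N points into R^d from an N x N symmetric PSD kernel matrix."""
    # Symmetric eigendecomposition; eigenvalues are returned in ascending order.
    eigvals, eigvecs = np.linalg.eigh(K_mat)
    # Keep the d eigenvectors with the largest eigenvalues.
    idx = np.argsort(eigvals)[::-1][:d]
    # Row i of the result gives the new d-dimensional coordinates of x_i.
    return eigvecs[:, idx]

# Illustrative usage with a Gaussian kernel on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                     # N = 100 points in R^20
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_mat = np.exp(-sq_dists / sq_dists.mean())        # symmetric PSD kernel
Y = kernel_eigenmap_embedding(K_mat, d=3)          # N x 3 embedding
```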

Laplacian Eigenmaps - Theory (M. Belkin and P. Niyogi, 2003)
Points close on the manifold should remain close in R^d:
$|f(x) - f(y)| \le \|\nabla f(x)\|\,\|x - y\| + o(\|x - y\|)$
$\arg\min_{\|f\|_{L^2(M)} = 1} \int_M \|\nabla f(x)\|^2 = \arg\min_{\|f\|_{L^2(M)} = 1} \int_M \Delta_M(f)\, f$
Find eigenfunctions of the Laplace-Beltrami operator Δ_M. Use a discrete approximation of the Laplace-Beltrami operator. Proven convergence (Belkin and Niyogi, 2003, 2008).

Laplacian Eigenmaps - Implementation
1. Put an edge between nodes i and j if x_i and x_j are close. Precisely, given a parameter k ∈ N, put an edge between nodes i and j if x_i is among the k nearest neighbors of x_j or vice versa.
2. Given a parameter t > 0, if nodes i and j are connected, set $W_{i,j} = e^{-\|x_i - x_j\|^2 / t}$.
3. Set $D_{i,i} = \sum_j W_{i,j}$ and let L = D - W. Solve the generalized eigenproblem Lf = λDf under the constraint $y^T D y = I$. Let f_0, f_1, ..., f_d be the d + 1 eigenvector solutions corresponding to the smallest eigenvalues 0 = λ_0 ≤ λ_1 ≤ ... ≤ λ_d. Discard f_0 and use the next d eigenvectors to embed into d-dimensional Euclidean space via the map x_i ↦ (f_1(i), f_2(i), ..., f_d(i)).
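A minimal dense-matrix sketch of these three steps, assuming a data array X of shape (N, D); a practical implementation would use sparse matrices and an iterative eigensolver, but the sequence (symmetrized kNN graph, heat-kernel weights, generalized eigenproblem Lf = λDf) follows the steps above.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, d, k=10, t=1.0):
    # Step 1: symmetric kNN adjacency (edge if i is among the k NN of j or vice versa).
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)

    # Step 2: heat-kernel weights on the edges.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = A * np.exp(-sq / t)

    # Step 3: generalized eigenproblem L f = lambda D f with L = D - W.
    D = np.diag(W.sum(axis=1))
    L = D - W
    eigvals, eigvecs = eigh(L, D)          # eigenvalues in ascending order
    # Discard f_0 (constant eigenvector, lambda_0 = 0); embed with f_1, ..., f_d.
    return eigvecs[:, 1:d + 1]

# Usage: Y = laplacian_eigenmaps(X, d=2, k=8, t=0.5) maps each x_i to a point in R^2.
```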

Swiss Roll
Figure: (a) Original, (b) PCA, (c)-(f) LE. J. Shen et al., Neurocomputing, Volume 87, 2012.

From Laplacian to Schroedinger Eigenmaps
Consider the following minimization problem over embeddings y with rows y_i ∈ R^d:
$\min_{y^T D y = I} \frac{1}{2} \sum_{i,j} \|y_i - y_j\|^2\, W_{i,j} = \min_{y^T D y = I} \operatorname{tr}(y^T L y).$
Its solution is given by the d minimal non-zero eigenvalue solutions of Lf = λDf under the constraint $y^T D y = I$. Similarly, for diagonal V, consider the problem
$\min_{y^T D y = I} \frac{1}{2} \sum_{i,j} \|y_i - y_j\|^2\, W_{i,j} + \sum_i \|y_i\|^2\, V_{i,i} = \min_{y^T D y = I} \operatorname{tr}(y^T (L + V) y), \qquad (1)$
which leads to solving the equation (L + V)f = λDf.
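In code, the only change relative to Laplacian Eigenmaps is adding the potential before solving the generalized eigenproblem; a sketch assuming the weight matrix W and a nonnegative vector v of diagonal potential values are already available (names are illustrative).

```python
import numpy as np
from scipy.linalg import eigh

def schroedinger_eigenmaps(W, v, d, alpha=1.0):
    """Solve (L + alpha*V) f = lambda D f for a diagonal potential V = diag(v)."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    V = np.diag(v)                          # diagonal, positive semi-definite
    eigvals, eigvecs = eigh(L + alpha * V, D)
    # Keep the d eigenvectors with the smallest eigenvalues
    # (discard a leading constant eigenvector if the potential annihilates constants).
    return eigvecs[:, :d]
```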

Properties of Schroedinger Eigenmaps
Let the data graph be connected and let V be a symmetric positive semi-definite matrix. If 1 ∈ Null(V), then λ = 0 is a simple eigenvalue of L + V with eigenvector 1.
Theorem (with M. Ehler)
Let the data graph be connected, let V = diag(v_1, ..., v_N) be symmetric positive semi-definite, and let n ≤ dim(Null(V)). Let y^(α) be a minimizer of the Schroedinger eigen-problem (1) with potential αV, α > 0. Then
$\|y^{(\alpha)}\|_V^2 = \operatorname{trace}\big(y^{(\alpha)T} V y^{(\alpha)}\big) \le C\, \frac{1}{\alpha}$
and
$v_i\, \|y_i^{(\alpha)}\|^2 \le \sum_{i=1}^N v_i\, \|y_i^{(\alpha)}\|^2 \le C\, \frac{1}{\alpha}, \quad \text{for all } i = 1, \ldots, N.$

Pointwise Convergence of SE
Given n data points x_1, x_2, ..., x_n sampled independently from a uniform distribution on a smooth, compact, d-dimensional manifold M ⊂ R^D, define the operator L̂_{t,n} : C(M) → C(M) by
$\hat{L}_{t,n}(f)(x) = \frac{1}{(4\pi t)^{d/2}}\,\frac{1}{t}\left(\frac{1}{n}\sum_j f(x)\,e^{-\frac{\|x - x_j\|^2}{4t}} - \frac{1}{n}\sum_j f(x_j)\,e^{-\frac{\|x - x_j\|^2}{4t}}\right).$
Let v ∈ C(M) be a potential. For x ∈ M, let $y_n(x) = \arg\min_{x_1, x_2, \ldots, x_n} \|x - x_i\|$ and define V_n : C(M) → C(M) by V_n f(x) = v(y_n(x)) f(x).
Theorem (Pointwise Convergence, with A. Halevy)
Let α > 0, and set $t_n = (1/n)^{1/(d+2+\alpha)}$. For f ∈ C^∞(M),
$\lim_{n\to\infty} \hat{L}_{t_n,n} f(x) + V_n f(x) = C\,\Delta_M f(x) + v(x) f(x)$ in probability.
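As a reading aid for the formula, here is a direct (unoptimized) transcription of L̂_{t,n} and V_n as functions of a point x, assuming an array xs of sample points, a scalar-valued function f, a potential function v, and the intrinsic dimension d supplied as an input; all names are illustrative.

```python
import numpy as np

def L_hat(f, x, xs, t, d):
    """Evaluate the discrete operator L_hat_{t,n} f at x, for samples xs on a d-dimensional manifold."""
    w = np.exp(-np.sum((xs - x) ** 2, axis=1) / (4 * t))       # heat-kernel weights
    diffs = f(x) - np.array([f(xj) for xj in xs])              # f(x) - f(x_j)
    return diffs.dot(w) / (len(xs) * (4 * np.pi * t) ** (d / 2) * t)

def V_n(f, x, xs, v):
    """Potential term v(y_n(x)) f(x), where y_n(x) is the sample point nearest to x."""
    j = np.argmin(np.sum((xs - x) ** 2, axis=1))
    return v(xs[j]) * f(x)
```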

Spectral Convergence of SE - Theorem
Let L_{t,n} be the unnormalized discrete Laplacian.
Theorem (Spectral Convergence of SE, with A. Halevy)
Let λ^i_{t,n} and e^i_{t,n} be the ith eigenvalue and corresponding eigenfunction of L_{t,n} + V_n. Let λ^i and e^i be the ith eigenvalue and corresponding eigenfunction of Δ_M + V. Then there exists a sequence t_n → 0 such that, in probability,
$\lim_{n\to\infty} \lambda^i_{t_n,n} = \lambda^i \quad\text{and}\quad \lim_{n\to\infty} \|e^i_{t_n,n} - e^i\| = 0.$

SE as Semisupervised Method
Figure: Schroedinger Eigenmaps with diagonal potential V = diag(0, ..., 0, 1, 0, ..., 0) acting only at one point y_{i_0} in the middle of the arc, for α = 0.05, 0.1, 0.5, 5. This point is pushed to zero.

SE as Semisupervised Method
Figure: Applying the potential to the end points of the arc, for α = 0.01, 0.05, 0.1, 1, allows us to control the dimension reduction so that we obtain an almost perfect circle.


Computational Bottleneck
If N is the ambient dimension and n is the number of points, the time complexity of constructing an adjacency graph is O(Nn²). What can we do about N? What can we do about the exponent 2? What can we do about n? What can we do about the computational complexity of the eigendecomposition?

Numerical acceleration
- Data compression via incoherent random projections
- Fast approximate k-nearest-neighbor algorithms, e.g., via divide-and-conquer strategies
- Quantization
- Landmarking
- Randomized low-rank SVD decompositions

Setting for data compression
- Dataset {x_1, x_2, ..., x_n} in R^N, sampled from a compact K-dimensional Riemannian manifold
- Assume ‖x_i − x_j‖ ≤ A for all i, j and some A > 0
- Let 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_K be the first K nonzero eigenvalues computed by LE, assumed simple, with $r = \min_{i \ne j} |\lambda_i - \lambda_j|$, and let f_j be a normalized eigenvector corresponding to λ_j
- Use a random orthogonal projector Φ to map the points to R^M. Let f̂_j be the jth eigenvector computed by LE for the projected data set

Laplacian Eigenmaps with random projections
Theorem (with A. Halevy)
Fix 0 < α < 1 and 0 < ρ < 1. If
$M \ge \frac{4 + 2\ln(1/\rho)}{\epsilon^2/200 + \epsilon^3/3000}\, K \ln(CKn/\epsilon), \qquad \text{where } \epsilon = \frac{r\alpha}{4An(n-1)},$
then, with probability at least 1 − ρ, ‖f_j − f̂_j‖ < α. The constant C depends on properties of the manifold. Precisely, C = 1900RV/τ^{1/3}, where R, V, and 1/τ are the geodesic covering regularity, volume, and condition number, respectively.
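The compression step the theorem analyzes can be sketched as follows: draw a random matrix with orthonormal columns, project the data onto its column space, and run LE on the projected points. The helper name and the reuse of the laplacian_eigenmaps sketch from earlier are assumptions for illustration.

```python
import numpy as np

def random_orthogonal_projection(X, M, seed=0):
    """Project the rows of X (n x N) onto a random M-dimensional subspace of R^N."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(X.shape[1], M))
    Q, _ = np.linalg.qr(G)          # N x M matrix with orthonormal columns
    return X @ Q                    # n x M compressed data

# Hypothetical usage: Y_hat = laplacian_eigenmaps(random_orthogonal_projection(X, M=50), d=10).
# The theorem bounds how far the resulting eigenvectors f_hat_j can drift from the f_j
# computed on the original, uncompressed data.
```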


Hyperspectral Satellite Imagery Figure : Hyperspectral Satellite Image, courtesy of NASA

Data: Pavia University
Figure: (a) Pavia University with bands [68,30,2], (b) Ground truth map, (c) Class map for purely spectral LE, (d) Class map for purely spatial LE

Data: Indian Pines
Figure: (a) Indian Pines with bands [29,15,12], (b) Ground truth map, (c) Class map for purely spectral LE, (d) Class map for purely spatial LE

Pure Spectral and Pure Spatial LE

Table: Classification results for spectral LE
σ                  0.2     0.4     0.6     0.8     1
Indian Pines       57.10   57.74   57.70   57.67   57.63
Pavia University   79.74   80.01   80.02   80.02   80.02

Table: Classification results for spatial LE
η                  0.2     0.4     0.6     0.8     1
Indian Pines       85.88   85.99   86.03   86.04   86.03
Pavia University   77.56   77.68   77.74   77.77   77.77

LE and Data Integration
Classification of stacked spatial/spectral embeddings:
Indian Pines: 46 top spatial and 4 top spectral Laplacian eigenvectors, OA 90.82%.
Pavia University: 20 top spatial and 5 top spectral Laplacian eigenvectors, OA 96.46%.
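The stacking itself is just concatenation of embedding coordinates; a sketch assuming spatial and spectral LE embeddings Y_spat and Y_spec have been computed per pixel, with labeled training pixels, and using a linear SVM purely as a placeholder classifier (the slides do not specify which classifier produced these overall accuracies).

```python
import numpy as np
from sklearn.svm import LinearSVC

def classify_stacked(Y_spat, Y_spec, labels, train_idx, test_idx):
    Z = np.hstack([Y_spat, Y_spec])        # e.g. 46 spatial + 4 spectral eigenvectors per pixel
    clf = LinearSVC().fit(Z[train_idx], labels[train_idx])
    return clf.score(Z[test_idx], labels[test_idx])   # overall accuracy on held-out pixels
```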

Mixing Spatial/Spectral Graphs and Weights
Let G be the kNN-graph computed using the spectral metric. Let L_f and L_s denote, respectively, the spectral and spatial Laplace operators on G. Fuse them into a single Laplace operator L in the following ways:
L_1(σ, η) = L_f(σ) ∘ L_s(η), where ∘ denotes entry-wise multiplication,
L_2(σ, η) = L_f(σ) + L_s(η),
L_3(σ, η) = G ∘ (L_f(σ) L_s(η)), i.e., we take the usual matrix product of the two Laplacians and then overlay the sparsity structure determined by the adjacency matrix.
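Written out with NumPy for dense spectral and spatial Laplacians Lf and Ls and the 0/1 adjacency matrix G of the kNN graph, the three fusion rules are one line each (a sketch; sparse implementations would follow the same pattern).

```python
import numpy as np

def fuse_laplacians(Lf, Ls, G):
    L1 = Lf * Ls                 # entry-wise (Hadamard) product
    L2 = Lf + Ls                 # sum of the two Laplacians
    L3 = G * (Lf @ Ls)           # matrix product, masked by the sparsity pattern of G
    return L1, L2, L3
```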

Figure: Fusion operator class maps for Indian Pines: L_1 (σ = 0.8, η = 0.2, 62.73%), L_2 (σ = 0.4, η = 0.2, 59.01%), L_3 (σ = 0.6, η = 0.2, 58.12%)

Figure: Fusion operator class maps for Pavia University: L_1 (σ = 0.4, η = 0.2, 79.19%), L_2 (σ = 0.8, η = 0.2, 78.73%), L_3 (σ = 0.2, η = 0.2, 79.22%)

Fusion metric
Adjust the norm used to determine the kNN-graph and the weight matrix. For γ > 0, we define
$\|x_i - x_j\|_\gamma^2 = \|x_i - x_j\|_f^2 + \gamma\, \|x_i - x_j\|_s^2.$
Let L_γ be the Laplacian operator defined using the kNN-graph and weight matrix determined by γ.
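A sketch of the fused squared distance, assuming each pixel is described by a spectral feature vector (rows of Xf) and a spatial coordinate vector (rows of Xs); the resulting matrix can feed any kNN-graph construction, e.g., the one sketched for Laplacian Eigenmaps above.

```python
import numpy as np

def fused_sq_distances(Xf, Xs, gamma):
    """||x_i - x_j||_gamma^2 = ||x_i - x_j||_f^2 + gamma * ||x_i - x_j||_s^2."""
    df = ((Xf[:, None, :] - Xf[None, :, :]) ** 2).sum(-1)   # spectral part
    ds = ((Xs[:, None, :] - Xs[None, :, :]) ** 2).sum(-1)   # spatial part
    return df + gamma * ds
```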

Fused Spatial-Spectral Classification

Table: Classification results for the fusion operators acting on the fusion graph, Indian Pines, η = 0.2, γ = 450
σ      L_1     L_2     L_3
0.2    94.02   93.77   93.80
0.4    93.83   93.75   93.80
0.6    93.81   93.67   93.77
0.8    93.82   93.62   93.78

Table: Classification results for the fusion operators acting on the fusion graph, Pavia University, η = 0.25, γ = 34
σ      L_1     L_2     L_3
0.2    95.92   96.17   96.49
0.4    96.17   96.19   96.48
0.6    96.17   96.20   96.48
0.8    96.17   96.20   96.48

Fused Spatial-Spectral Classification
Fusion operators on fusion metric graphs: Indian Pines, L_1, L_2, L_3.

Fused Spatial-Spectral Classification
Fusion operators on fusion metric graphs: Pavia University, L_1, L_2, L_3.

Cluster Potentials
Let ι = {i_1, ..., i_m} be a collection of nodes. A cluster potential over ι is defined by:
$V[\iota, \iota] := \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}.$
For such a V, the minimization problem is equivalent to:
$\min_{y^T D y = I} \frac{1}{2} \sum_{i,j} \|y_i - y_j\|^2\, W_{i,j} + \alpha \sum_{k=1}^{m-1} \|y_{i_k} - y_{i_{k+1}}\|^2.$
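Concretely, V[ι, ι] is the Laplacian of a path through the cluster nodes; a sketch that embeds this block into an N × N potential, with the node list iota assumed given.

```python
import numpy as np

def cluster_potential(iota, N):
    """N x N potential whose block V[iota, iota] is the path Laplacian over the cluster."""
    V = np.zeros((N, N))
    for a, b in zip(iota[:-1], iota[1:]):    # consecutive pairs (i_k, i_{k+1})
        V[a, a] += 1.0
        V[b, b] += 1.0
        V[a, b] -= 1.0
        V[b, a] -= 1.0
    return V
```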

Schroedinger Eigenmaps with Cluster Potentials
Use k-means clustering to cluster the hyperspectral image. The clustering was initialized randomly, so that the construction of the potential is completely unsupervised, and the Euclidean metric was used. Define a cluster potential V_k over each of the k = 1, ..., K clusters; the order of the pixels in each cluster is determined by their index (smallest to largest). Aggregate into one potential by summing the individual cluster potentials:
$V_K = \sum_{k=1}^{K} V_k.$
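A hedged end-to-end sketch of this unsupervised construction: run k-means on the pixel features X, order each cluster's indices, build one path-type cluster potential per cluster, and sum them. The specific k-means settings below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregated_cluster_potential(X, K, seed=0):
    """Sum of path-Laplacian cluster potentials, one per k-means cluster (unsupervised)."""
    N = X.shape[0]
    labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(X)
    V_K = np.zeros((N, N))
    for k in range(K):
        iota = np.sort(np.where(labels == k)[0])   # cluster indices, smallest to largest
        for a, b in zip(iota[:-1], iota[1:]):      # path through the cluster
            V_K[a, a] += 1.0
            V_K[b, b] += 1.0
            V_K[a, b] -= 1.0
            V_K[b, a] -= 1.0
    return V_K

# V_K, scaled by alpha, then enters the Schroedinger eigenproblem (L + alpha * V_K) f = lambda D f.
```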

Fused Spatial-Spectral Classification

Table: Results for the Cluster Potential on Pavia University
K \ α   100     500     1000    5000
15      91.23   93.09   92.92   92.08
25      85.55   85.70   85.38   89.69
50      85.71   85.35   85.22   84.86
100     85.47   85.31   85.13   84.96

Table: Results for the Cluster Potential on Indian Pines
K \ α   100     500     1000
16      67.45   67.99   67.92
25      66.39   67.29   67.78
50      63.11   64.24   64.23
100     60.67   60.43   60.41

Fused Spatial-Spectral Classification
Pavia University: dimensions 4 and 5 of the SE embedding using cluster potential V_16 for classes 2, 3, and 7.

Fused Spatial-Spectral Classification
Indian Pines: dimensions 17 and 22 of the SE embedding using cluster potential V_16 for classes 2, 3, and 10.