CSE 291. Assignment 3
Out: Wed May 23    Due: Wed Jun 13

3.1 Spectral clustering versus k-means

Download the rings data set for this problem from the course web site. The data is stored in MATLAB format as a matrix X with D rows and N columns, where D = 2 is the input dimensionality and N is the number of inputs. Submit your source code as part of your solutions.

[Figure: the rings data set plotted in the plane, with both axes running from -4 to 4.]

(a) Spectral clustering

Compute a binary partition of the data using spectral clustering with affinities $K_{ij} = e^{-\|x_i - x_j\|^2/\sigma^2}$. For what values of the kernel width $\sigma$ does the method separate the two rings? Visualize your results in the plane.

(b) k-means clustering

Use your best results from spectral clustering to initialize a k-means algorithm with k = 2. Does k-means clustering preserve or modify your previous solution? For better or for worse?
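As a starting point only, the MATLAB fragment below is a minimal sketch of one common recipe for part (a): build the affinity matrix, form the normalized Laplacian, and split on the sign of its second eigenvector. The file name rings.mat, the value of sigma, and the sign-based split are assumptions, not requirements of the assignment.

    % Minimal sketch (assumption: rings.mat defines X as a D x N matrix, D = 2).
    load rings.mat
    sigma = 0.2;                                   % kernel width to vary in part (a)
    sq = sum(X.^2, 1);                             % 1 x N squared norms
    D2 = sq' + sq - 2*(X'*X);                      % N x N squared distances (implicit expansion)
    K  = exp(-D2 / sigma^2);                       % affinities K_ij
    d  = sum(K, 2);                                % node degrees
    L  = eye(size(K)) - diag(1./sqrt(d)) * K * diag(1./sqrt(d));   % normalized Laplacian
    [V, E] = eig(L);
    [~, idx] = sort(diag(E), 'ascend');
    v2 = V(:, idx(2));                             % eigenvector of the 2nd-smallest eigenvalue
    labels = 1 + (v2 > 0);                         % binary partition by sign
    scatter(X(1,:), X(2,:), 20, labels, 'filled'); axis equal;
    % Part (b), if the Statistics Toolbox is available: seed k-means with the cluster means.
    % C0 = [mean(X(:,labels==1), 2)'; mean(X(:,labels==2), 2)'];
    % newlabels = kmeans(X', 2, 'Start', C0);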

3.2 Dimensionality reduction of images

Download the teapot data set for this problem from the course web site. The images are stored in MATLAB format as a matrix X with D rows and N columns, where D = 23028 is the number of pixels per image and N = 400 is the number of images. The ith image in the data set can be displayed using the command:

    imagesc(reshape(X(:, i), 76, 101, 3))

In this problem you will compare the results from linear and nonlinear methods for dimensionality reduction. For the latter, you may download and make use of publicly available code for any of the algorithms we discussed in class.

(a) Subspace methods

Center the data by subtracting out the mean image, then compute the Gram matrix of the centered images. Plot the eigenvalues of the Gram matrix in descending order. How many leading dimensions of the data are needed to account for 95% of the data's total variance? Submit your source code as part of your solutions.

(b) Manifold learning

Use a manifold learning algorithm of your choice to compute a two dimensional embedding of these images. By visualizing your results in the plane, compare the two dimensional embeddings obtained from subspace methods versus manifold learning.
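One possible MATLAB sketch for part (a) follows; the file name teapots.mat is an assumption, and eig is used rather than svd purely for brevity. The 95% threshold is read off from the cumulative sum of the eigenvalues of the centered Gram matrix.

    % Sketch for part (a): spectrum of the Gram matrix of the centered images.
    load teapots.mat                        % assumed to define X as a D x N matrix
    Xc = X - mean(X, 2);                    % subtract out the mean image (implicit expansion)
    G  = Xc' * Xc;                          % N x N Gram matrix of the centered images
    lambda = sort(eig(G), 'descend');       % eigenvalues in descending order
    plot(lambda, 'o-'); xlabel('rank'); ylabel('eigenvalue');
    frac = cumsum(lambda) / sum(lambda);    % cumulative fraction of the total variance
    d95  = find(frac >= 0.95, 1)            % leading dimensions needed for 95% of the variance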

3.3 Sparse matrices on ring graphs

Consider a ring graph with N nodes and uniform weights on its edges. In particular, let $W_{ij} = \frac{1}{2}$ if $|j - i| = 1$ or if $|j - i| = N - 1$; otherwise $W_{ij} = 0$.

(a) Graph Laplacian

Show that the normalized graph Laplacian is a circulant matrix, and use this property to compute all of its eigenvalues and eigenvectors.

(b) Locally linear embedding (LLE)

Consider the matrix $(I - W)^\top (I - W)$ diagonalized by LLE. How are its eigenvalues and eigenvectors related to those of the normalized graph Laplacian?
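A numerical sanity check (not a substitute for the derivations in parts (a) and (b)) may be useful: the MATLAB sketch below builds W and the normalized Laplacian for a small ring and verifies that the discrete Fourier basis diagonalizes it, which is the circulant property the problem asks you to exploit.

    % Sanity check on a small ring graph with the weights defined above.
    N = 8;
    W = 0.5 * (circshift(eye(N), 1) + circshift(eye(N), -1));   % W_ij = 1/2 for ring neighbors
    d = sum(W, 2);                                               % degrees (all equal to 1)
    L = eye(N) - diag(1./sqrt(d)) * W * diag(1./sqrt(d));        % normalized graph Laplacian
    eigL = sort(eig(L))                                  % compare with your closed-form answer
    F = fft(eye(N)) / sqrt(N);                           % unitary DFT matrix
    M = F' * L * F;                                      % should be (numerically) diagonal
    offdiag = norm(M - diag(diag(M)), 'fro')             % ~0 confirms the circulant structure
    % For part (b): compare eig((eye(N) - W)' * (eye(N) - W)) with eigL.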

3.4 Isomap on a ring (*)

Consider N data points that are uniformly sampled along the unit circle in the plane. In this problem, you will study the behavior of the Isomap algorithm in the continuum limit $N \to \infty$. In this limit of large sample size, it is possible to compute an analytical solution. This solution reveals how Isomap breaks down when its convexity assumption is violated.

(a) Inner products

Applied to a finite data set, Isomap estimates the geodesic distances $D_{ij}$ between the ith and jth inputs, then constructs the Gram matrix with elements:

$$G_{ij} \;=\; \frac{1}{2}\left[\,\frac{1}{N}\sum_k \left(D_{ik}^2 + D_{kj}^2\right) \;-\; \frac{1}{N^2}\sum_{kl} D_{kl}^2 \;-\; D_{ij}^2\right].$$

In the continuum limit $N \to \infty$, a data point exists at every angle $\theta \in [0, 2\pi]$ on the unit circle. The geodesic distance between two points at angles $\theta$ and $\theta'$ is given by:

$$D(\theta, \theta') \;=\; \min\bigl(|\theta - \theta'|,\; 2\pi - |\theta - \theta'|\bigr).$$

In this limit, the Gram matrix normally constructed by Isomap is replaced by a kernel function $g(\theta, \theta')$. The kernel function can be computed from:

$$g(\theta, \theta') \;=\; \frac{1}{2}\left[\int_0^{2\pi}\!\frac{d\phi}{2\pi}\left(D^2(\theta, \phi) + D^2(\phi, \theta')\right) \;-\; \int_0^{2\pi}\!\frac{d\phi}{2\pi}\int_0^{2\pi}\!\frac{d\psi}{2\pi}\, D^2(\phi, \psi) \;-\; D^2(\theta, \theta')\right].$$

Evaluate these integrals to derive an expression for the kernel function $g(\theta, \theta')$. In particular, show that in the continuum limit:

$$g(\theta, \theta') \;=\; \frac{\pi^2}{6} \;-\; \frac{1}{2}\, D^2(\theta, \theta').$$

Plot the kernel function $g(\theta, \theta')$ as a function of the difference in angle $\theta - \theta'$. On the same figure, plot $k(\theta, \theta') = \cos(\theta - \theta')$, which measures the true inner products of points on the unit circle. How are these plots different?

(b) Eigenvalues

Applied to a finite data set, Isomap computes the eigenvalues and eigenvectors of the Gram matrix estimated from the first equation in part (a). In the continuum limit $N \to \infty$, Isomap's output is encoded by the eigenvalues $\lambda_n$ and eigenfunctions $u_n(\theta)$ of the kernel function $g(\theta, \theta')$. These eigenvalues and eigenfunctions satisfy:

$$\int_0^{2\pi}\! d\phi\; g(\theta, \phi)\, u_n(\phi) \;=\; \lambda_n\, u_n(\theta).$$

Show that $\sin(r\theta)$ and $\cos(r\theta)$ are eigenfunctions of the kernel function $g(\theta, \theta')$ for integer-valued $r \in \{0, 1, 2, \ldots\}$. (Ignore the degenerate case $\sin(r\theta)$ for $r = 0$.) What are the eigenvalues of these eigenfunctions? How many of them are non-zero? Is there a gap in the eigenvalue spectrum that reveals the intrinsic dimensionality of the ring?
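For the plot requested in part (a) above, assuming the closed form you derive agrees with the expression given in the problem, a brief MATLAB sketch is:

    % Continuum Isomap kernel versus the true inner products on the unit circle.
    t  = linspace(-pi, pi, 500);              % difference in angle, theta - theta'
    Dg = min(abs(t), 2*pi - abs(t));          % geodesic distance on the unit circle
    g  = pi^2/6 - 0.5 * Dg.^2;                % Isomap kernel g(theta, theta')
    k  = cos(t);                              % true inner products k(theta, theta')
    plot(t, g, t, k, '--');
    xlabel('\theta - \theta'''); legend('g', 'cos(\theta - \theta'')');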

3.5 Readings

J. Shi and J. Malik (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 22(8): 888-905.

S. T. Roweis and L. K. Saul (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 290: 2323-2326.

J. B. Tenenbaum, V. de Silva, and J. C. Langford (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290: 2319-2323.

A. Ng, M. Jordan, and Y. Weiss (2002). On spectral clustering: analysis and an algorithm. In T. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, pages 849-856. MIT Press: Cambridge, MA.

M. Belkin and P. Niyogi (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6): 1373-1396.

K. Q. Weinberger and L. K. Saul (2006). An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), pages 1683-1686. Boston, MA.