A graph-based approach to semi-supervised learning (1 Feb 2011)
Two papers:
M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 1-48, 2006.
M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. Journal of Computer and System Sciences, 2007.
What is semi-supervised learning? Prediction, but with the help of unlabeled examples.
Why semi-supervised learning?
Practical reasons: unlabeled data is cheap.
A more natural model of human learning.
An example
Semi-supervised learning framework 1
$l$ labeled examples $(x, y)$ generated by a distribution $P$. $u$ unlabeled examples drawn from the marginal $P_X$. Mercer kernel $K$.
$f^* = \operatorname{argmin}_{f \in H_K} \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma \|f\|_K^2$
Semi-supervised learning framework 2
Classical representer theorem: $f^*(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x)$
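As a concrete instance of this framework, here is a minimal numpy sketch of kernel regularized least squares (squared loss), where the representer coefficients have the closed form $\alpha = (K + \lambda l I)^{-1} Y$. The RBF kernel, the toy data, and the parameter values below are illustrative assumptions, not from the slides:

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    # Gaussian (RBF) kernel matrix: K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Toy 1-D regression data (illustrative, not from the slides)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(20, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(20)

l, lam = len(X), 0.01
K = rbf_kernel(X, X)
# Closed-form representer coefficients: alpha = (K + lambda * l * I)^{-1} Y
alpha = np.linalg.solve(K + lam * l * np.eye(l), y)

# Representer theorem: f*(x) = sum_i alpha_i K(x_i, x)
f = lambda Xnew: rbf_kernel(Xnew, X) @ alpha
```

The learned function is evaluated anywhere by summing kernels centered at the labeled points, exactly the expansion the representer theorem guarantees.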
Manifold regularization: assumptions
Assumptions: $P_X$ is supported on a manifold $M$; $P(y \mid x)$ varies smoothly along geodesics in the intrinsic geometry of $P_X$.
Modified objective:
$f^* = \operatorname{argmin}_{f \in H_K} \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2$
Manifold regularization: known marginal
Theorem. If $P_X$ is known and $M$ is a smooth Riemannian manifold, then
$f^*(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + \int_M \alpha(z) K(x, z) \, dP_X(z)$
Manifold regularization: unknown marginal
Need to estimate the marginal and $\|f\|_I$. This only requires unlabeled data.
Natural choice: $\|f\|_I^2 = \int_M \|\nabla_M f\|^2 \, dP_X$
Approximate $M$ with a graph.
Manifold regularization: building the graph
Single-linkage clustering; nearest-neighbor methods.
Use the graph Laplacian instead of the manifold Laplacian.
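A minimal numpy sketch of one such construction: a k-nearest-neighbor graph with exponential (heat kernel) edge weights, returning the unnormalized graph Laplacian $L = D - W$. The choice of $k$, the weight parameter $t$, and the function name are my assumptions, not from the slides:

```python
import numpy as np

def graph_laplacian(X, k=5, t=1.0):
    """kNN graph with exponential edge weights:
    W[i, j] = exp(-||x_i - x_j||^2 / t) if j is among the k nearest
    neighbors of i (symmetrized), else 0.  Returns L = D - W."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]   # k nearest neighbors, skipping self
        W[i, idx] = np.exp(-d2[i, idx] / t)
    W = np.maximum(W, W.T)                 # symmetrize the adjacency
    D = np.diag(W.sum(axis=1))             # degree matrix
    return D - W
```

The resulting $L$ is symmetric and positive semi-definite, which is what makes it usable as a smoothness penalty in the regularized objective.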
Manifold regularization: using the graph
Theorem. By choosing exponential weights for the edges, the graph Laplacian converges to the manifold Laplacian in probability.
$f^* = \operatorname{argmin}_{f \in H_K} \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(u+l)^2} f^T L f$, where $L = D - W$.
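The penalty $f^T L f$ is a discrete smoothness measure: for $L = D - W$ with symmetric weights it equals $\frac{1}{2} \sum_{i,j} W_{ij} (f_i - f_j)^2$, so it is small exactly when $f$ varies little across heavily weighted edges. A quick numerical check of this identity (the toy weight matrix is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.random((n, n))
W = (A + A.T) / 2            # symmetric nonnegative edge weights (toy stand-in)
np.fill_diagonal(W, 0.0)

D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian, as on the slide

f = rng.standard_normal(n)
quad = f @ L @ f
pairwise = 0.5 * ((f[:, None] - f[None, :]) ** 2 * W).sum()
```

The two quantities agree to floating-point precision, and both are nonnegative, confirming that $L$ is positive semi-definite.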
Main result
Theorem. $f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x)$
Regularized least squares
Classical RLS: $\operatorname{argmin}_{f \in H_K} \frac{1}{l} \sum_{i=1}^{l} (y_i - f(x_i))^2 + \lambda \|f\|_K^2$
Solution: $f^*(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x)$, with $\alpha = (K + \lambda l I)^{-1} Y$
Laplacian RLS: $\operatorname{argmin}_{f \in H_K} \frac{1}{l} \sum_{i=1}^{l} (y_i - f(x_i))^2 + \lambda_A \|f\|_K^2 + \frac{\lambda_I}{(u+l)^2} f^T L f$
Solution: $f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x)$, with $\alpha = (J K + \lambda_A l I + \frac{\lambda_I l}{(u+l)^2} L K)^{-1} Y$
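A minimal numpy sketch of the Laplacian RLS closed form, where $J = \operatorname{diag}(1,\dots,1,0,\dots,0)$ selects the $l$ labeled points and $Y$ pads the labels with zeros for the $u$ unlabeled points. The function name and argument layout are my assumptions:

```python
import numpy as np

def lap_rls(K, L, y_labeled, l, lam_A, lam_I):
    """Closed-form Laplacian RLS coefficients:
    alpha = (J K + lam_A l I + lam_I l / (u+l)^2 * L K)^{-1} Y.
    K: (l+u)x(l+u) kernel matrix over labeled + unlabeled points,
    L: graph Laplacian, y_labeled: labels of the first l points."""
    n = K.shape[0]                                     # n = l + u
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])    # selects labeled points
    Y = np.r_[y_labeled, np.zeros(n - l)]              # zero-padded labels
    M = J @ K + lam_A * l * np.eye(n) + (lam_I * l / n ** 2) * (L @ K)
    return np.linalg.solve(M, Y)
```

As a sanity check, with $\lambda_I = 0$ and $u = 0$ the matrix $J$ is the identity and the formula reduces to the classical RLS solution $\alpha = (K + \lambda l I)^{-1} Y$.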
Support vector machines
As with regularized least squares, there is a Laplacian version of the SVM: the Laplacian SVM.
Two moons dataset
Wisconsin breast cancer data
683 samples. Benign or malignant?
Features: clump thickness, uniformity of cell size and shape, etc.
Wisconsin breast cancer data: results
Longer term stuff
Besides geometric structure, what else can we use? Invariance?
Learning the manifold: a simplicial complex instead of a graph? Homology. Nice example in natural image statistics (Mumford et al., 2003).
Longer term stuff 2
Hickernell, Song, and Zhang. Reproducing kernel Banach spaces with the $\ell^1$ norm. Preprint.
Reproducing kernel Banach spaces with the $\ell^1$ norm II: error analysis for regularized least squares regression. Preprint.