Learning on Graphs and Manifolds. CMPSCI 689, Sridhar Mahadevan, UMass Amherst


1 Learning on Graphs and Manifolds. CMPSCI 689, Sridhar Mahadevan, UMass Amherst

2 Outline. Manifold learning is a relatively new area of machine learning (2000 to now). Main idea: model the underlying geometry of the data as a graph, then construct an embedding of the graph. Applications: clustering, semi-supervised learning, regression, and reinforcement learning.

3 Semi-Supervised Learning. In many applications, unlabeled examples are plentiful, but labeled ones are in limited supply. Is it possible to exploit unlabeled data to improve classification? Use the geometry of the space of unlabeled data. Crucial assumption: the label function is smooth on the manifold.

4 Semi-Supervised Learning on Graphs. [Figure: a graph with a few labeled vertices and many unlabeled ones, together with its random walk matrix; note the matrix is non-symmetric.]

5 Label Propagation (Zhu and Ghahramani, 2002). Compute the affinity matrix W. Form the row sums D_ii = Σ_j W_ij. Initialize Y^0 = (y_1, ..., y_l, y_{l+1}, ..., y_n). Iterate: Y^{t+1} = D^{-1} W Y^t, then clamp the observed labels, Y_l^{t+1} = Y_l. Assign labels by the sign of Y. [Figure: the two-moons problem with one "+" and one "-" labeled example.]
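To make the update concrete, here is a minimal NumPy sketch of the iteration above; the Gaussian affinity, the value of sigma, and the iteration count are illustrative choices rather than anything fixed by the slides.

```python
import numpy as np

def label_propagation(X, y, sigma=0.5, n_iter=200):
    """X: (n, d) points; y: (n,) labels in {-1, +1}, with 0 for unlabeled."""
    # Affinity matrix W(i, j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    D_inv = 1.0 / W.sum(axis=1)          # inverse row sums, D_ii = sum_j W_ij
    labeled = y != 0
    Y = y.astype(float)
    for _ in range(n_iter):
        Y = D_inv * (W @ Y)              # Y^{t+1} = D^{-1} W Y^t
        Y[labeled] = y[labeled]          # clamp the observed labels
    return np.sign(Y)
```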

6 Label Propagation. [Figure: convergence of the iteration on the two-moons data; final labels shown.]

7 Nonlinear dimensionality reduction (ISOMAP, LLE, Laplacian Eigenmaps, Diffusion Maps, MVU, ...). [Figure: the Swiss roll and its embedding.] The embedding should preserve locality.

8 Locally Linear Embedding [Roweis and Saul, Science 2000]. Learn the weight matrix W: ε(W) = Σ_i ||X_i - Σ_j W_ij X_j||^2. Learn the embedding Y: Φ(Y) = Σ_i ||Y_i - Σ_j W_ij Y_j||^2.
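As a concrete run, here is a short sketch using scikit-learn's LocallyLinearEmbedding on a synthetic Swiss roll; the neighbor count and sample size are arbitrary illustrative choices.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Sample the Swiss roll from slide 7 and unroll it to 2 dimensions.
X, color = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)
# W is fit from each point's nearest neighbors; the embedding Y then
# minimizes Phi(Y) = sum_i ||Y_i - sum_j W_ij Y_j||^2.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
Y = lle.fit_transform(X)    # (1500, 2) coordinates, locality preserved
```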

9 [Roweis, LLE]

10 [Roweis, LLE]

11 [Figure]

12 Constructing the Similarity Matrix. Manifold methods take as input a neighborhood similarity matrix W over the data. Gaussian kernel: W(i, j) = exp(-||x_i - x_j||^2 / (2σ^2)). k-NN kernel: W(i, j) = 1 if x_i is among the k nearest neighbors of x_j. The matrix is then normalized and diagonalized (or dilated). Basis functions: the columns (eigenvectors). Embeddings: the rows (sorted in increasing or decreasing order of the eigenvalues).
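A small sketch of the two kernel constructions, assuming NumPy and scikit-learn; the point cloud, sigma, and k are placeholder values.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.rand(200, 3)                       # toy point cloud
# Gaussian kernel: W(i, j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
sigma = 0.3
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W_gauss = np.exp(-d2 / (2 * sigma ** 2))
# k-NN kernel: W(i, j) = 1 if x_j is among the k nearest neighbors of x_i,
# symmetrized so the resulting graph is undirected.
A = kneighbors_graph(X, n_neighbors=8, mode="connectivity").toarray()
W_knn = np.maximum(A, A.T)
```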

13 Undirected Graph Embedding. Optimization problem of embedding a graph so as to preserve local geometry: min_y Σ_{i,j} (y_i - y_j)^2 w_ij subject to y^T D y = 1, where y_i ∈ R is the embedding of the i-th vertex and D is a diagonal matrix of the row sums of W. [Figure: vertices x_1, x_2, x_3 with edge weight w_13, mapped to embeddings y_1, y_2, y_3.]

14 Graph Embedding. The best mapping is found by solving the generalized eigenvector problem W φ = λ D φ, where D is a diagonal matrix of the row sums of W. If the graph is connected (so that D is invertible), this can be written as D^{-1} W φ = λ φ.

15 Introducing the Laplacian. The graph embedding problem can be written as min_y Σ_{i,j} (y_i - y_j)^2 w_ij = min_y Σ_{i,j} (y_i^2 + y_j^2 - 2 y_i y_j) w_ij = min_y y^T L y, where L = D - W is the combinatorial Laplacian.
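Tying slides 13-15 together, here is a minimal sketch that builds L = D - W on a toy chain graph and solves the generalized eigenproblem with SciPy; the chain graph itself is an arbitrary example.

```python
import numpy as np
from scipy.linalg import eigh

n = 10
W = np.zeros((n, n))                  # 10-vertex chain: w_{i,i+1} = 1
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
D = np.diag(W.sum(axis=1))            # diagonal matrix of row sums
L = D - W                             # combinatorial Laplacian
# Minimizing y^T L y subject to y^T D y = 1 yields L phi = lambda D phi.
evals, evecs = eigh(L, D)             # generalized problem, eigenvalues ascending
y = evecs[:, 1]                       # first nontrivial eigenvector: a 1-D embedding
```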

16 Properties of the Laplacian. The Laplacian L is positive semidefinite. For the two-vertex graph with a single unit-weight edge, the Laplacian is L = [[1, -1], [-1, 1]]. Note that <f, Lf> = f^T L f = (f_1 - f_2)^2, hence <f, Lf> >= 0 for any f ≠ 0, and all the eigenvalues of L are non-negative. The combinatorial Laplacian L = D - W acts on f by (L f)(i) = Σ_{i~j} (f_i - f_j) w_ij.

17 Combinatorial Graph Laplacian (Fiedler, 1973). [Figure: eigenvectors of the combinatorial Laplacian L = D - W on a closed chain (cycle graph).]

18 Normalized Graph Laplacian. The normalized graph Laplacian is defined as L = D^{-1/2} (D - W) D^{-1/2}. Note that D^{-1} W = D^{-1/2} (D^{-1/2} W D^{-1/2}) D^{1/2} = D^{-1/2} (I - L) D^{1/2}. The random walk matrix D^{-1} W is therefore similar to I - L: its eigenvalues are exactly 1 - λ for the eigenvalues λ of the normalized Laplacian, with eigenvectors related by D^{-1/2}.
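A quick numerical check of this similarity relation; the random 6-vertex graph is an arbitrary test case, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 6))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)                          # random symmetric affinities
d = W.sum(axis=1)
D_isqrt = np.diag(d ** -0.5)
L_norm = np.eye(6) - D_isqrt @ W @ D_isqrt        # normalized Laplacian
P = np.diag(1.0 / d) @ W                          # random walk matrix D^{-1} W
lam_L = np.linalg.eigvalsh(L_norm)                # ascending eigenvalues
lam_P = np.sort(np.linalg.eigvals(P).real)
assert np.allclose(np.sort(1.0 - lam_L), lam_P)   # spectra match via lambda -> 1 - lambda
```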

19 Faculty Collaboration Graph: What hidden structure does this contain?

20 Spectral Clustering. 1. Given a set X of data points to cluster. 2. Form the normalized matrix M = D^{-1/2} W D^{-1/2}. 3. Compute its k largest eigenvectors. 4. Arrange the eigenvectors as the columns of a matrix Y. 5. Each point is embedded in R^k, given by its row of the matrix Y. 6. Run k-means on the new embedding.
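A minimal sketch of the six steps; note that the row normalization of Y before k-means is an extra step taken from the Ng-Jordan-Weiss variant (slide 24), not listed above.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    d = W.sum(axis=1)
    D_isqrt = np.diag(d ** -0.5)
    M = D_isqrt @ W @ D_isqrt                 # step 2: M = D^{-1/2} W D^{-1/2}
    evals, evecs = eigh(M)                    # symmetric eigenproblem, ascending
    Y = evecs[:, -k:]                         # steps 3-4: k largest eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)        # Ng-Jordan-Weiss row scaling
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)  # steps 5-6
```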

21 Spectral Clustering using the Graph Laplacian. Embedding using the 2nd and 3rd eigenvectors of the graph Laplacian. Cluster 1: Adler, Barrington, Immerman, Kurose, Rosenberg, Shenoy, Sitaraman, Towsley. Cluster 2: Adrion, Allan, Avrunin, Barto, Brock, Clarke, Cohen, Croft, Grupen, Hanson, Jensen, Lehnert, Lesser, Levine, Mahadevan, Manmatha, McCallum, Moll, Moss, Osterweil, Riseman, Rissland, Schultz, Utgoff, Woolf, Zilberstein, Weems.

22 Spectral Clustering. Embedding using the 2nd and 3rd eigenvectors of the graph Laplacian. Cluster 1: Barto, Brock, Grupen, Hanson, Mahadevan, Moll, Moss, Riseman, Schultz, Utgoff, Allan, Avrunin, Clarke, Cohen, Croft, Jensen, Lehnert, Lesser, Levine, Manmatha, McCallum, Osterweil, Rissland, Woolf, Zilberstein. Cluster 2: Adler, Barrington, Immerman, Kurose, Rosenberg, Shenoy, Sitaraman, Towsley, Weems. Cluster 3: Adrion.

23 Spectral Clustering using the Graph Laplacian. [Table: the faculty graph partitioned into six clusters over Adrion, Adler, Barto, Avrunin, Allan, Barrington, Brock, Rosenberg, Grupen, Clarke, Croft, Immerman, Cohen, Sitaraman, Hanson, Lesser, Jensen, Kurose, Lehnert, Weems, Mahadevan, Osterweil, Levine, Shenoy, Rissland, Moll, Manmatha, Towsley, Utgoff, Moss, McCallum, Woolf, Riseman, Zilberstein, Schultz.]

24 [Figure: spectral clustering examples from Ng, Jordan, and Weiss, NIPS.]

25 Regularization Perspective. The combinatorial Laplacian L = D - W acts on f by (L f)(i) = Σ_{i~j} (f_i - f_j) w_ij. We can express <f, Lf> as a Dirichlet sum: <f, Lf> = Σ_{(i,j)} (f_i - f_j)^2 w_ij. The pseudo-inverse of the Laplacian, L^+, defines a reproducing kernel Hilbert space (RKHS).
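A small numerical check that the quadratic form equals the Dirichlet sum; summing over ordered pairs counts each edge twice, hence the factor 1/2 in the sketch below.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)                       # toy symmetric affinity matrix
L = np.diag(W.sum(axis=1)) - W                 # combinatorial Laplacian
f = rng.standard_normal(5)
dirichlet = 0.5 * ((f[:, None] - f[None, :]) ** 2 * W).sum()
assert np.isclose(f @ L @ f, dirichlet)        # <f, Lf> equals the Dirichlet sum
L_plus = np.linalg.pinv(L)                     # pseudo-inverse: the kernel of the RKHS
```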

26 Laplacian Eigenmaps (Belkin and Niyogi). Given a set of instances, form the affinity matrix W (e.g., using a Gaussian kernel). Form the combinatorial Laplacian L = D - W. Compute its k lowest eigenvectors, L φ_i = λ_i φ_i. These can be used to smoothly approximate any function on the graph.
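A short sketch of this recipe on a toy chain graph, approximating a smooth function by projecting it onto the k lowest eigenvectors; the graph, k, and target function are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh

n, k = 50, 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0     # chain-graph affinities
L = np.diag(W.sum(axis=1)) - W          # combinatorial Laplacian L = D - W
evals, evecs = eigh(L)                  # L phi_i = lambda_i phi_i, ascending
Phi = evecs[:, :k]                      # the k lowest eigenvectors
f = np.sin(np.linspace(0.0, np.pi, n))  # a smooth function on the graph
f_hat = Phi @ (Phi.T @ f)               # least-squares fit in the span of Phi
```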

27 Least-Squares Reinforcement Learning (Boyan; Bradtke and Barto; Bertsekas and Nedic; Lagoudakis and Parr). Approximate the value function with a linear basis Φ: V̂(s) = Σ_i w_i φ_i(s). The Bellman backup is T^π(V̂(s)) = E^π[R + γ V̂].
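One member of this family is LSTD (Bradtke and Barto); the compact sketch below is illustrative only, with a hypothetical phi feature map supplied by the caller.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.95):
    """transitions: list of (s, r, s') samples; phi maps a state to a feature vector."""
    k = len(phi(transitions[0][0]))
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)   # accumulates E[phi (phi - gamma phi')^T]
        b += r * f                             # accumulates E[R phi]
    w = np.linalg.solve(A, b)                  # fixed point: V_hat(s) = sum_i w_i phi_i(s)
    return w
```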

28 Learning Representation and Control in Markov Decision Processes (Mahadevan and Maggioni, JMLR 2007). [Figure: a two-room grid world with goal G and its optimal value function; the rooms are connected by bottleneck states.] A polynomial basis does poorly here. How can a good basis be found automatically?

29 Representation Policy Iteration (Mahadevan, UAI 2005). [Diagram: an actor-critic loop. The critic performs policy evaluation using the basis Φ, the actor performs greedy policy improvement, and a representation learner constructs new bases from the sampled trajectories.]

30 Out-of-Sample Extension (Baker, 1976; Williams and Seeger, NIPS 2001). How do we compute the embeddings of new points? The Nystrom extension is a classical interpolation method developed for the solution of integral equations: φ_m(x) = (1/λ_m) Σ_j w_j k(x, s_j) φ_m(s_j). [Figure: the Mountain Car MDP.]
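A sketch of the Nystrom formula above, assuming a Gaussian kernel and uniform quadrature weights w_j (absorbed into the sum); the function name and defaults are illustrative.

```python
import numpy as np

def nystrom_extend(x_new, samples, evecs, evals, sigma=0.5):
    """Embed a new point from eigenvectors/eigenvalues computed on sampled points."""
    # Kernel values k(x, s_j) against the sample set.
    kx = np.exp(-((samples - x_new) ** 2).sum(axis=1) / (2 * sigma ** 2))
    # phi_m(x) = (1 / lambda_m) * sum_j k(x, s_j) phi_m(s_j)
    return (kx @ evecs) / evals
```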

31 RPI in Continuous Domains: Inverted Pendulum [Mahadevan and Maggioni, JMLR 2007]. [Plot: number of balancing steps vs. number of training episodes on the inverted pendulum task, comparing 25 proto-value functions (eigenvectors of the normalized Laplacian, a machine-generated representation) against the radial basis functions of Lagoudakis and Parr, JMLR 2003 (a human-designed representation).]

32 RPI in Continuous Domains: Acrobot Task (4-dimensional state space). [Plot: learning curves comparing the machine-generated representation with a human-designed TD + CMAC baseline; the machine-generated representation is roughly 40x faster.]

33 Other Methods in Manifold Learning. Multiscale diffusion wavelets: instead of localizing bases to a particular eigenvalue, find bases over a frequency band; the bases are local and multiscale, not global. Semi-definite embedding: learn a kernel matrix from the data that preserves local geometry and global variance. ISOMAP: learn an embedding that preserves global distances on the graph.

34 Multiscale Diffusion Wavelets (Coifman and Maggioni, ACHA 2006). [Figure: diffusion wavelet basis functions at levels 1, 3, 4, 6, 7, and 9; the finest levels resemble unit vectors and the coarsest levels resemble global eigenvectors.]

35 Compression in 3D Graphics (Karni and Gotsman, SIGGRAPH 2000). A mesh with ~20,000 vertices takes ~1.5 MB; the idea is to extend JPEG-style transform coding to 3D. [Figure: an object split into topology (the mesh connectivity) and geometry (the X, Y, and Z coordinate functions).]

36 3D Mesh Compression using Diffusion Wavelets (Mahadevan, ICML 2007). [Figure: multiscale diffusion wavelet bases on a mesh, shown at levels 4, 5, 8, 9, and 10.]

37 Compressing Large 3D Objects. Elephant object (file ea4.obj, ~20,000 vertices). [Plot: geometric plus Laplacian reconstruction error vs. the number of bases (in multiples of 10), comparing Laplacian bases against diffusion wavelet (DWT) bases, along with their construction times in seconds.]

38 Summary. Learning on graphs and manifolds exploits the non-Euclidean geometry of the underlying space. Label propagation: semi-supervised learning on graphs. Spectral clustering: uses the eigenvectors of the Laplacian as a new representation. Laplacian eigenmaps: uses the eigenvectors to approximate functions. Diffusion wavelets: a multiscale approach.
