Manifold Regularization


Manifold Regularization
Vikas Sindhwani, Department of Computer Science, University of Chicago.
Joint work with Mikhail Belkin and Partha Niyogi.
TTI-C Talk, September 14, 2004.

The Problem of Learning
Data is drawn from an unknown probability distribution. A learning algorithm maps the data to an element of a hypothesis space of functions; that function should provide good labels for future examples. Regularization: choose a simple function that agrees with the data.

The Problem of Learning
Notions of simplicity are the key to successful learning. Here is a simple function that agrees with the data.

Learning and Prior Knowledge
But simplicity is a relative concept. Prior knowledge of the marginal distribution can modify our notions of simplicity.

Motivation
How can we exploit prior knowledge of the marginal distribution? More practically, how can we use unlabeled examples drawn from it? Why is this important? Natural data has structure to exploit; natural learning is largely semi-supervised; labels are expensive, while unlabeled data is cheap and plentiful.

Contributions
A data-dependent, geometric regularization framework for learning from examples. Representer theorems provide solutions. Extensions of SVM and RLS for semi-supervised learning. Regularized spectral clustering and dimensionality reduction. The problem of out-of-sample extensions in graph methods is resolved. Good empirical performance.

Regularization with RKHS
Learning in a Reproducing Kernel Hilbert Space:

$f^* = \arg\min_{f \in \mathcal{H}_K} \; \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma \|f\|_K^2$

Regularized Least Squares (RLS): $V(x, y, f) = (y - f(x))^2$.
Support Vector Machine (SVM): $V(x, y, f) = \max(0, 1 - y f(x))$.
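As a concrete illustration, a minimal Python sketch of the two losses above and of the regularized objective evaluated through a kernel expansion; the function names are illustrative, not from the talk.

```python
# Minimal sketch of the two losses that instantiate the RKHS objective;
# names are illustrative. With f = sum_i alpha_i K(x_i, .), the RKHS
# norm is ||f||_K^2 = alpha' K alpha.
import numpy as np

def squared_loss(y, fx):
    # RLS loss: V(x, y, f) = (y - f(x))^2
    return (y - fx) ** 2

def hinge_loss(y, fx):
    # SVM loss: V(x, y, f) = max(0, 1 - y f(x)), labels y in {-1, +1}
    return np.maximum(0.0, 1.0 - y * fx)

def regularized_objective(alpha, K, y, gamma, loss):
    # (1/l) sum_i V(x_i, y_i, f) + gamma ||f||_K^2
    fx = K @ alpha
    return loss(y, fx).mean() + gamma * alpha @ K @ alpha
```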

What are RKHS?
Hilbert spaces with a nice property: if two functions are close in the distance derived from the inner product, their values are close at every point. Reproducing property: the evaluation functional $f \mapsto f(x)$ is linear and continuous, so by the Riesz representation theorem there is a representer $K_x \in \mathcal{H}$ with $f(x) = \langle f, K_x \rangle$. The kernel function is $K(x, z) = \langle K_x, K_z \rangle$.

Why RKHS?
Rich function spaces with complexity control, e.g. the Gaussian kernel $K(x, z) = \exp(-\|x - z\|^2 / 2\sigma^2)$. Representer theorems show that the minimizer has the form $f^*(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x)$, and therefore the problem reduces to finding the coefficients $\alpha_i$. Motivates kernelization (KPCA, KFD, etc.). Good empirical performance.
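A small sketch of the Gaussian Gram matrix and of evaluating the representer expansion at new points (illustrative helper names; not code from the talk):

```python
# Sketch: Gaussian Gram matrix and evaluation of the representer
# expansion f(x) = sum_i alpha_i K(x_i, x); illustrative helper names.
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2)), computed blockwise
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def evaluate_expansion(alpha, X_train, X_new, sigma=1.0):
    # f(x) = sum_i alpha_i K(x_i, x) for each row x of X_new
    return gaussian_gram(X_new, X_train, sigma) @ alpha
```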

Known Marginal
If the marginal distribution $P_X$ is known, solve:

$f^* = \arg\min_{f \in \mathcal{H}_K} \; \frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \|f\|_I^2$

Extrinsic and intrinsic regularization: $\gamma_A$ controls complexity in the ambient space; $\gamma_I$ controls complexity in the intrinsic geometry of $P_X$.

Continuous Representer Theorem
Assume that the penalty term is sufficiently smooth with respect to the RKHS norm. Then the solution to the optimization problem exists and admits the representation

$f^*(x) = \int_{\mathcal{M}} \alpha(z) K(x, z) \, dP_X(z)$,

where $\mathcal{M}$ is the support of the marginal.

A Manifold Regularizer
If $\mathcal{M}$, the support of the marginal, is a compact submanifold, it seems natural to choose

$\|f\|_I^2 = \int_{\mathcal{M}} \|\nabla_{\mathcal{M}} f\|^2 \, dP_X$

and to find the $f$ that minimizes:

$\frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \gamma_I \int_{\mathcal{M}} \|\nabla_{\mathcal{M}} f\|^2 \, dP_X$

Laplace-Beltrami Operator
The intrinsic regularizer is a quadratic form involving the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ on the manifold:

$\int_{\mathcal{M}} \|\nabla_{\mathcal{M}} f\|^2 = \int_{\mathcal{M}} f \, \Delta_{\mathcal{M}} f$,

because some calculus on manifolds (integration by parts via the divergence theorem) establishes that for any vector field $X$, $\int_{\mathcal{M}} \langle \nabla_{\mathcal{M}} f, X \rangle = -\int_{\mathcal{M}} f \, \mathrm{div}(X)$.

Passage to the Discrete
In reality, $P_X$ is unknown and sampled only via examples. Labels are not required for empirical estimates of $\|f\|_I^2$. The manifold $\mathcal{M}$ is replaced by a graph $G$ on the data, and the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$ by the graph Laplacian $L$:

$\|f\|_I^2 \approx \frac{1}{(u+l)^2} \, \mathbf{f}^\top L \mathbf{f}$, where $\mathbf{f}^\top L \mathbf{f} = \frac{1}{2} \sum_{i,j} W_{ij} (f(x_i) - f(x_j))^2$,

with $\mathbf{f} = (f(x_1), \ldots, f(x_{l+u}))$ and edge weights $W_{ij}$.
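A sketch of the discrete estimate, under the assumption of a symmetrized k-nearest-neighbor graph with heat-kernel weights (a common choice; the slide does not fix these details):

```python
# Sketch of the empirical intrinsic regularizer f' L f, assuming a
# symmetrized kNN graph with heat-kernel edge weights.
import numpy as np
from scipy.spatial.distance import cdist

def graph_laplacian(X, k=6, sigma=1.0):
    D2 = cdist(X, X, 'sqeuclidean')
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]            # skip self-edge
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                           # symmetrize
    return np.diag(W.sum(axis=1)) - W                # L = D - W

def intrinsic_norm(f_vals, L):
    # f' L f = (1/2) sum_ij W_ij (f(x_i) - f(x_j))^2
    return f_vals @ L @ f_vals
```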

Algorithms
We have motivated the following optimization problem: find the $f \in \mathcal{H}_K$ that minimizes

$\frac{1}{l} \sum_{i=1}^{l} V(x_i, y_i, f) + \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(u+l)^2} \mathbf{f}^\top L \mathbf{f}$

Laplacian RLS uses the squared loss $V = (y_i - f(x_i))^2$; Laplacian SVM uses the hinge loss $V = \max(0, 1 - y_i f(x_i))$.

Empirical Representer Theorem
The minimizer admits an expansion over both labeled and unlabeled points:

$f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x_i, x)$

Proof sketch: write any $f \in \mathcal{H}_K$ as $f = \sum_{i=1}^{l+u} \alpha_i K(x_i, \cdot) + f_\perp$, with $f_\perp$ orthogonal to the span of the $K(x_i, \cdot)$. The component $f_\perp$ leaves every value $f(x_j)$ unchanged but increases the norm, so the minimizer has $f_\perp = 0$.
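Spelling the argument out (a standard orthogonality computation, reconstructed here rather than copied from the slide):

```latex
% Orthogonality argument behind the empirical representer theorem.
% Decompose an arbitrary f in H_K against the span of the K(x_i, .):
\[
  f \;=\; \sum_{i=1}^{l+u} \alpha_i K(x_i,\cdot) \;+\; f_{\perp},
  \qquad \langle f_{\perp},\, K(x_i,\cdot)\rangle_K = 0 \quad \forall i.
\]
% By the reproducing property, f(x_i) = <f, K(x_i, .)>_K is unchanged
% by f_perp, so the loss and intrinsic terms (which depend on f only
% through point evaluations) ignore f_perp, while by the Pythagorean
% theorem
\[
  \|f\|_K^2
  \;=\; \Big\| \sum_i \alpha_i K(x_i,\cdot) \Big\|_K^2 + \|f_{\perp}\|_K^2 .
\]
% Dropping f_perp never increases the objective, so the minimizer
% satisfies f_perp = 0 and has the stated finite expansion.
```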

Laplacian RLS
By the representer theorem, the problem becomes finite dimensional. For Laplacian RLS, we find the $\alpha \in \mathbb{R}^{l+u}$ that minimizes

$\frac{1}{l} (Y - J K \alpha)^\top (Y - J K \alpha) + \gamma_A \alpha^\top K \alpha + \frac{\gamma_I}{(u+l)^2} \alpha^\top K L K \alpha$

The solution is:

$\alpha^* = \left( J K + \gamma_A l \, I + \frac{\gamma_I l}{(u+l)^2} L K \right)^{-1} Y$

where $K$ is the $(l+u) \times (l+u)$ Gram matrix, $L$ is the graph Laplacian, $Y = (y_1, \ldots, y_l, 0, \ldots, 0)$, and $J = \mathrm{diag}(1, \ldots, 1, 0, \ldots, 0)$ with $l$ ones.
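A minimal LapRLS sketch following the closed form above (assumes $K$ and $L$ are precomputed with the labeled points first; helper name is illustrative):

```python
# Minimal LapRLS sketch following the closed form above; assumes the
# first l rows/columns of K and L correspond to the labeled points.
import numpy as np

def laprls(K, L, y_labeled, gamma_A, gamma_I):
    n = K.shape[0]                  # n = l + u
    l = len(y_labeled)
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])
    Y = np.r_[y_labeled, np.zeros(n - l)]
    M = (J @ K
         + gamma_A * l * np.eye(n)
         + (gamma_I * l / n ** 2) * (L @ K))
    return np.linalg.solve(M, Y)    # alpha*; f*(x) = sum_i alpha_i K(x_i, x)
```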

Laplacian SVM
For Laplacian SVMs, we solve a QP in $\beta \in \mathbb{R}^{l}$:

$\beta^* = \arg\max_{\beta} \; \sum_{i=1}^{l} \beta_i - \frac{1}{2} \beta^\top Q \beta$

subject to $\sum_{i=1}^{l} \beta_i y_i = 0$ and $0 \le \beta_i \le \frac{1}{l}$, where

$Q = Y J K \left( 2 \gamma_A I + 2 \frac{\gamma_I}{(u+l)^2} L K \right)^{-1} J^\top Y$

with $Y = \mathrm{diag}(y_1, \ldots, y_l)$ and $J = [I \; 0]$ the $l \times (l+u)$ matrix selecting the labeled points; we then invert a linear system:

$\alpha^* = \left( 2 \gamma_A I + 2 \frac{\gamma_I}{(u+l)^2} L K \right)^{-1} J^\top Y \beta^*$
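A hedged sketch of this dual, using a generic constrained optimizer in place of a dedicated QP/SMO solver (illustrative only; a production implementation would use a proper QP solver):

```python
# Hedged LapSVM sketch: solve the dual QP above with a generic
# constrained optimizer, then recover alpha via the linear system.
import numpy as np
from scipy.optimize import minimize

def lapsvm(K, L, y_labeled, gamma_A, gamma_I):
    n, l = K.shape[0], len(y_labeled)
    J = np.zeros((l, n)); J[:, :l] = np.eye(l)    # selects labeled rows
    Ydg = np.diag(y_labeled)
    A = 2 * gamma_A * np.eye(n) + 2 * (gamma_I / n ** 2) * (L @ K)
    AinvJtY = np.linalg.solve(A, J.T @ Ydg)       # A^{-1} J' Y
    Q = Ydg @ J @ K @ AinvJtY
    # max sum(beta) - 0.5 beta' Q beta  <=>  min 0.5 beta' Q beta - sum(beta)
    obj = lambda b: 0.5 * b @ Q @ b - b.sum()
    jac = lambda b: Q @ b - np.ones(l)
    cons = [{'type': 'eq', 'fun': lambda b: b @ y_labeled}]
    res = minimize(obj, np.full(l, 0.5 / l), jac=jac, method='SLSQP',
                   bounds=[(0.0, 1.0 / l)] * l, constraints=cons)
    return AinvJtY @ res.x                        # alpha*
```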

Manifold Regularization (Algorithm)
Input: $l$ labeled and $u$ unlabeled examples. Output: $f^*$.
Algorithm: construct the adjacency graph; compute the Laplacian $L$; choose a kernel and compute the Gram matrix $K$; choose $\gamma_A$, $\gamma_I$ (?); compute $\alpha^*$; output $f^*(x) = \sum_{i=1}^{l+u} \alpha^*_i K(x_i, x)$. A usage sketch follows below.
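Putting the steps together on synthetic data, reusing the illustrative helpers sketched earlier (gaussian_gram, graph_laplacian, laprls); none of this is code from the talk:

```python
# Usage sketch of the algorithm's steps on synthetic data; assumes the
# illustrative helpers gaussian_gram, graph_laplacian, and laprls from
# the earlier sketches are in scope.
import numpy as np

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(10, 2)); y_lab = np.sign(X_lab[:, 0])
X_unl = rng.normal(size=(90, 2))
X = np.vstack([X_lab, X_unl])                 # labeled points first

L = graph_laplacian(X, k=6, sigma=1.0)        # steps 1-2: graph, Laplacian
K = gaussian_gram(X, X, sigma=1.0)            # step 3: Gram matrix
alpha = laprls(K, L, y_lab, gamma_A=1e-2, gamma_I=1e-2)  # steps 4-5

X_test = rng.normal(size=(5, 2))              # out-of-sample evaluation
f_test = gaussian_gram(X_test, X, sigma=1.0) @ alpha
labels = np.sign(f_test)
```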

Unity of Learning
[Diagram: the framework connects supervised, partially supervised, and unsupervised learning. Supervised: SVM/RLS. Partially supervised: graph regularization and graph mincut, with out-of-sample extension. Unsupervised: spectral clustering and regularized spectral clustering, with out-of-sample extension.]

Regularized Spectral Clustering
Unsupervised manifold regularization:

$\min_{f \in \mathcal{H}_K} \; \gamma_A \|f\|_K^2 + \frac{\gamma_I}{(u+l)^2} \sum_{i,j} W_{ij} (f(x_i) - f(x_j))^2$

subject to centering and normalization constraints on $\mathbf{f}$. The representer theorem leads to an eigenvalue problem: the solution is the smallest-eigenvalue eigenvector, where $P$ projects orthogonal to the constant function.
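For intuition, a sketch of the graph-only limiting case: as the ambient term vanishes, the problem reduces to classical spectral bipartitioning by the second-smallest ("Fiedler") eigenvector of $L$. This is the special case, not the full regularized method:

```python
# Graph-only limiting case of regularized spectral clustering:
# minimize f' L f subject to sum(f) = 0, ||f|| = 1, whose solution is
# the second-smallest eigenvector of L (the smallest is the constant
# vector, for a connected graph).
import numpy as np
from scipy.linalg import eigh

def fiedler_bipartition(L):
    eigvals, eigvecs = eigh(L)      # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]         # skip the constant eigenvector
    return np.sign(fiedler)         # two-way cluster assignment
```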

Experiments: Synthetic
[Figure: six panels comparing decision boundaries of SVM and Laplacian SVM on synthetic two-dimensional data, for different settings of the ambient ($\gamma_A$) and intrinsic ($\gamma_I$) regularization parameters.]

Related Algorithms
Transductive SVMs [Joachims; Vapnik]: optimize the margin jointly over the unknown labels of the unlabeled examples. Semi-supervised SVMs [Bennett; Fung et al.]: a related formulation over the unlabeled labels. Measure-based regularization [Bousquet et al.]: penalizes gradients weighted by the data density.

Experiments: Synthetic Data
[Figure: decision boundaries of SVM, Transductive SVM, and Laplacian SVM on a synthetic two-dimensional dataset.]

Experiments: Digits
[Figure: error rates of RLS vs. LapRLS, SVM vs. LapSVM, and TSVM vs. LapSVM across 45 binary classification problems; out-of-sample extension plots comparing test-set vs. unlabeled-set error for LapRLS and LapSVM; performance deviation of SVM/TSVM vs. LapSVM.]

Experiments: Digits
[Figure: average error rate vs. number of labeled examples (2 to 128) for RLS vs. LapRLS and SVM vs. LapSVM, on both test (T) and unlabeled (U) sets.]

Experiments: Speech
[Figure: error rates on the unlabeled and test sets for RLS vs. LapRLS and for SVM vs. TSVM vs. LapSVM, across labeled-speaker settings.]

Experiments: Speech
[Figure: scatter plots of test-set error vs. unlabeled-set error for RLS, LapRLS, SVM, and LapSVM across two experiments.]

Experiments: Text

Method        PRBEP          Error
k-NN          73.2           13.3
SGT           86.2            6.2
Naive-Bayes    --            12.9
Cotraining     --             6.2
SVM           76.39 (5.6)    10.41 (2.5)
TSVM          88.15 (1.0)     5.22 (0.5)
LapSVM        87.73 (2.3)     5.41 (1.0)
RLS           73.49 (6.2)    11.68 (2.7)
LapRLS        86.37 (3.1)     5.99 (1.4)

(Standard deviations in parentheses.)

Experiments: Text
[Figure: PRBEP vs. number of labeled examples (2 to 64) for RLS vs. LapRLS and SVM vs. LapSVM, on unlabeled (U) and test (T) sets; LapSVM performance on both sets as the unlabeled-set size varies (U = 779 - l and two smaller subsets).]

Future Work
Generalization as a function of labeled and unlabeled examples. Additional structure: structured outputs, invariances. Active learning, feature selection. Efficient algorithms: linear methods, sparse solutions. Applications: bioinformatics, text, speech, vision, ...