
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking

Dengyong Zhou (zhou@tuebingen.mpg.de)
Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany

Learning from Examples

Input space $X$ and output space $Y = \{1, -1\}$.
Training set $S = \{z_1 = (x_1, y_1), \ldots, z_l = (x_l, y_l)\}$ in $Z = X \times Y$, drawn i.i.d. from some unknown distribution.
Classifier $f: X \to Y$.

Transductive Setting

Input space $X = \{x_1, \ldots, x_n\}$ and output space $Y = \{1, -1\}$.
Training set $S = \{z_1 = (x_1, y_1), \ldots, z_l = (x_l, y_l)\}$.
Classifier $f: X \to Y$.

Intuition about Classification: Manifold

Local consistency: nearby points are likely to have the same label.
Global consistency: points on the same structure (typically referred to as a cluster or manifold) are likely to have the same label.

A Toy Dataset (Two Moons)

[Figure: four panels on the two-moons data, with unlabeled points and one labeled point per class: (a) Toy Data (Two Moons), (b) SVM (RBF Kernel), (c) k-NN, (d) Ideal Classification.]

Algorithm

1. Form the affinity matrix $W$ defined by $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ if $i \neq j$ and $W_{ii} = 0$.
2. Construct the matrix $S = D^{-1/2} W D^{-1/2}$, in which $D$ is a diagonal matrix with its $(i,i)$-element equal to the sum of the $i$-th row of $W$.
3. Iterate $f(t+1) = \alpha S f(t) + (1-\alpha) y$ until convergence, where $\alpha$ is a parameter in $(0, 1)$.
4. Let $f^*$ denote the limit of the sequence $\{f(t)\}$. Label each point $x_i$ as $y_i = \operatorname{sgn}(f^*_i)$.
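A minimal NumPy sketch of these four steps (not the authors' implementation): it assumes a data matrix X of shape (n, d) and a label vector y with entries +1 or -1 for labeled points and 0 for unlabeled ones; sigma, alpha, and the fixed iteration count are illustrative choices.

```python
import numpy as np

def consistency_method(X, y, sigma=0.5, alpha=0.99, n_iter=1000):
    """Sketch of the local-and-global-consistency iteration on a Gaussian affinity graph."""
    # Step 1: Gaussian affinity matrix with zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 2: symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt

    # Step 3: propagate labels, pulling back toward the initial assignment y.
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f = alpha * S @ f + (1 - alpha) * y

    # Step 4: predicted labels are the signs of the limit function.
    return np.sign(f), f
```

For a multi-class problem y becomes an n-by-c indicator matrix and an argmax over columns replaces the sign; the sketch keeps the binary case of the slide.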

Convergence

Theorem. The sequence $\{f(t)\}$ converges to $f^* = \beta (I - \alpha S)^{-1} y$, where $\beta = 1 - \alpha$.

Proof. Suppose $f(0) = y$. By the iteration equation, we have
$$f(t) = (\alpha S)^{t} y + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha S)^{i} y.$$
Since $0 < \alpha < 1$ and the eigenvalues of $S$ lie in $[-1, 1]$,
$$\lim_{t \to \infty} (\alpha S)^{t} = 0 \quad \text{and} \quad \lim_{t \to \infty} \sum_{i=0}^{t-1} (\alpha S)^{i} = (I - \alpha S)^{-1}.$$
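A quick numerical sanity check of this closed form on hypothetical toy data, reusing the consistency_method sketch above (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.zeros(40)
y[0], y[20] = 1.0, -1.0  # one labeled point per class

labels, f_iter = consistency_method(X, y, sigma=0.5, alpha=0.9, n_iter=2000)

# Closed form f* = (1 - alpha)(I - alpha S)^{-1} y, rebuilding S as in the sketch.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (2 * 0.5 ** 2)); np.fill_diagonal(W, 0.0)
D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
S = D_inv_sqrt @ W @ D_inv_sqrt
f_closed = (1 - 0.9) * np.linalg.solve(np.eye(40) - 0.9 * S, y)

print(np.allclose(f_iter, f_closed, atol=1e-6))  # expected: True
```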

Regularization Framework

Cost function
$$Q(f) = \frac{1}{2} \left[ \sum_{i,j=1}^{n} W_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^2 + \mu \sum_{i=1}^{n} (f_i - y_i)^2 \right]$$

Smoothness term: measures the changes between nearby points.
Fitting term: measures the changes from the initial label assignments.
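This cost translates directly into NumPy; the sketch below assumes W and y as before, with mu a positive trade-off parameter (the function name cost_Q is illustrative):

```python
import numpy as np

def cost_Q(f, W, y, mu):
    """Smoothness term plus fitting term of the regularization framework."""
    d = W.sum(axis=1)
    g = f / np.sqrt(d)                                    # g_i = f_i / sqrt(D_ii)
    smoothness = np.sum(W * (g[:, None] - g[None, :]) ** 2)
    fitting = mu * np.sum((f - y) ** 2)
    return 0.5 * (smoothness + fitting)
```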

Regularization Framework

Theorem. $f^* = \arg\min_{f \in \mathcal{F}} Q(f)$.

Proof. Differentiating $Q(f)$ with respect to $f$, we have
$$\left. \frac{\partial Q}{\partial f} \right|_{f = f^*} = f^* - S f^* + \mu (f^* - y) = 0,$$
which can be transformed into
$$f^* - \frac{1}{1+\mu} S f^* - \frac{\mu}{1+\mu} y = 0.$$
Let $\alpha = 1/(1+\mu)$ and $\beta = \mu/(1+\mu)$. Then
$$(I - \alpha S) f^* = \beta y.$$

Two Variants

Substitute $P = D^{-1} W$ for $S$ in the iteration equation. Then $f^* = (I - \alpha P)^{-1} y$.
Replace $S$ with $P^T$, the transpose of $P$. Then $f^* = (I - \alpha P^T)^{-1} y$, which is equivalent to $f^* = (D - \alpha W)^{-1} y$.
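A sketch of both closed-form variants, assuming W is an affinity matrix with zero diagonal and y the initial label vector; the function name and alpha are illustrative:

```python
import numpy as np

def variant_solutions(W, y, alpha=0.9):
    """Closed-form solutions for the two random-walk variants of the framework."""
    n = W.shape[0]
    d = W.sum(axis=1)
    D = np.diag(d)
    P = W / d[:, None]                                   # P = D^{-1} W (row-normalized)

    f_var1 = np.linalg.solve(np.eye(n) - alpha * P, y)   # variant 1: f* = (I - alpha P)^{-1} y
    f_var2 = np.linalg.solve(D - alpha * W, y)           # variant 2: f* = (D - alpha W)^{-1} y
    return f_var1, f_var2
```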

Toy Problem

[Figure: the classification function on the two-moons data as the iteration proceeds, shown at (a) t = 10, (b) t = 50, (c) t = 100, and (d) t = 400.]

Toy Problem

[Figure only.]

Handwritten Digit Recognition (USPS)

[Figure: test error vs. number of labeled points (10 to 100) for k-NN (k = 1), SVM (RBF kernel), the consistency method, variant (1), and variant (2).]

Dimension: 16x16. Size: 9298. ($\alpha = 0.95$)

Handwritten Digit Recognition (USPS)

[Figure: test error vs. the parameter $\alpha$ (0.7 to 0.99) for the consistency method, variant (1), and variant (2).]

Size of labeled data: $l = 50$.

Text Classification (20-newsgroups)

[Figure: test error vs. number of labeled points (10 to 100) for k-NN (k = 1), SVM (RBF kernel), the consistency method, variant (1), and variant (2).]

Dimension: 8014. Size: 3970. ($\alpha = 0.95$)

Text Classification (20-newsgroups)

[Figure: test error vs. the parameter $\alpha$ (0.7 to 0.99) for the consistency method, variant (1), and variant (2).]

Size of labeled data: $l = 50$.

Spectral Graph Theory

Normalized graph Laplacian: $\Delta = D^{-1/2} (D - W) D^{-1/2}$, a linear operator on the space of functions defined on the graph.

Theorem. $\displaystyle \frac{1}{2} \sum_{i,j} W_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^2 = \langle f, \Delta f \rangle$.

Discrete analogue of the Laplace-Beltrami operator on a Riemannian manifold $M$, which satisfies $\int_M \|\nabla f\|^2 = \int_M (\Delta f) f$.

Discrete Laplace equation $\Delta f = y$. Green's function $G = \Delta^{-1}$.
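A small numerical check of the quadratic-form identity on an arbitrary symmetric affinity matrix (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.random((n, n)); W = (A + A.T) / 2; np.fill_diagonal(W, 0.0)  # symmetric affinity
d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
Delta = np.eye(n) - D_inv_sqrt @ W @ D_inv_sqrt  # normalized Laplacian D^{-1/2}(D - W)D^{-1/2}

f = rng.normal(size=n)
lhs = 0.5 * sum(W[i, j] * (f[i] / np.sqrt(d[i]) - f[j] / np.sqrt(d[j])) ** 2
                for i in range(n) for j in range(n))
rhs = f @ Delta @ f
print(np.isclose(lhs, rhs))  # expected: True
```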

Reversible Markov Chains

Lazy random walk defined by the transition probability matrix $P = (1-\alpha) I + \alpha D^{-1} W$, with $\alpha \in (0, 1)$.

Hitting time: $H_{ij} = E\{\text{number of steps required for a random walk starting at } x_i \text{ to reach } x_j\}$.
Commute time: $C_{ij} = H_{ij} + H_{ji}$.

Theorem. Let $L = (D - \alpha W)^{-1}$. Then $C_{ij} \propto L_{ii} + L_{jj} - L_{ij} - L_{ji}$.

Ranking Problem

Problem setting. Given a set of points $X = \{x_1, \ldots, x_q, x_{q+1}, \ldots, x_n\} \subset R^m$, where the first $q$ points are the queries, the task is to rank the remaining points according to their relevance to the queries.

Examples: images, documents, movies, books, proteins ("killer application"), ...

Intuition of Ranking: Manifold

[Figure: (a) the two-moons ranking problem with a single query point; (b) the ideal ranking.]

The relevance of the points in the upper moon to the query should decrease along the moon shape. All points in the upper moon should be more relevant to the query than the points in the lower moon.

Toy Ranking

[Figure: (a) connected graph over the toy data.]

Simply ranking the data according to shortest paths on the graph does not work well. A more robust solution is to assemble all paths between two points: $f^* = \sum_i \alpha^i S^i y$, as sketched below.
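A sketch of this path-assembling ranking, assuming S is the symmetrically normalized affinity matrix from the algorithm slide and queries is a list of query indices (names are illustrative); the geometric series sums in closed form to $(I - \alpha S)^{-1} y$:

```python
import numpy as np

def manifold_ranking(S, queries, alpha=0.99):
    """Rank all points by relevance to the query set using f* = sum_i (alpha S)^i y."""
    n = S.shape[0]
    y = np.zeros(n)
    y[queries] = 1.0                                   # query indicator vector
    f = np.linalg.solve(np.eye(n) - alpha * S, y)      # closed form of the series
    order = np.argsort(-f)                             # most relevant first (queries rank at the top)
    return order, f
```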

Toy Ranking

[Figure only.]

Connection to Google

Theorem. For the task of ranking data represented by a connected and undirected graph without queries, $f^*$ and PageRank yield the same ranking list.

Personalized Google: a Variant

The ranking scores given by PageRank:
$$\pi(t+1) = \alpha P^T \pi(t).$$
Add a query term on the right-hand side for query-based ranking:
$$\pi(t+1) = \alpha P^T \pi(t) + (1-\alpha) y.$$
This can be viewed as a personalized version of PageRank.
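A minimal power-iteration sketch of this personalized variant, assuming P is the row-stochastic matrix $D^{-1} W$ and y the query indicator vector; the function name, starting distribution, and iteration count are illustrative:

```python
import numpy as np

def personalized_pagerank(P, y, alpha=0.85, n_iter=100):
    """Iterate pi(t+1) = alpha P^T pi(t) + (1 - alpha) y and return the ranking scores."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])  # start from the uniform distribution
    for _ in range(n_iter):
        pi = alpha * P.T @ pi + (1 - alpha) * y
    return pi
```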

Image Ranking

[Figure.] The top-left digit in each panel is the query. The left panel shows the top 99 results returned by our method; the right panel shows the top 99 results returned by the Euclidean distance.

Document Ranking

[Figure: four scatter plots comparing the inner-product baseline (vertical axis) with manifold ranking (horizontal axis) on the 20-newsgroups categories (a) autos, (b) motorcycles, (c) baseball, (d) hockey.]

Related Work

Graph/diffusion/cluster kernels (Kondor et al. 2002; Chapelle et al. 2002; Smola et al. 2003).
Spectral clustering (Shi et al. 1997; Ng et al. 2001).
Manifold learning (nonlinear dimensionality reduction) (Tenenbaum et al. 2000; Roweis et al. 2000).

Related Work

Random walks (Szummer et al. 2001).
Graph min-cuts (Blum et al. 2001).
Learning on manifolds (Belkin et al. 2001).
Gaussian random fields (Zhu et al. 2003).

Conclusion

Proposed a general semi-supervised learning algorithm.
Proposed a general example-based ranking algorithm.

Next Work

Model selection.
Active learning.
Generalization theory of learning from labeled and unlabeled data.
Specific problems & large-scale problems.

References

1. Zhou, D., Bousquet, O., Lal, T.N., Weston, J., and Schölkopf, B.: Learning with Local and Global Consistency. NIPS, 2003.
2. Zhou, D., Weston, J., Gretton, A., Bousquet, O., and Schölkopf, B.: Ranking on Data Manifolds. NIPS, 2003.
3. Weston, J., Leslie, C., Zhou, D., Elisseeff, A., and Noble, W.S.: Semi-Supervised Protein Classification Using Cluster Kernels. NIPS, 2003.