What is semi-supervised learning?

Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking

What is semi-supervised learning? In many practical learning domains (e.g., text processing, video indexing, bioinformatics), there is a large supply of unlabeled data but only limited labeled data, which can be expensive to generate. Semi-supervised learning is learning from a combination of both labeled and unlabeled data.

Comparing the paradigms: Supervised learning algorithms require enough labeled training data to learn reasonably accurate classifiers. Unsupervised learning methods are employed to discover structure in unlabeled data. Semi-supervised learning allows taking advantage of the strengths of both.

Why should it be useful? Unlabeled data can help in two different ways. First, it can identify data structure: a first unsupervised learning step can find a meaningful representation of complicated high-dimensional data. Second, it supports the cluster assumption, which can be stated in two equivalent ways: two points that can be connected by a high-density path (i.e., that lie in the same cluster) are likely to have the same label; equivalently, the decision boundary should lie in a low-density region.

A Toy Dataset (Two Moons)
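As a concrete illustration, here is a minimal sketch for generating a two-moons toy dataset of this kind. The use of scikit-learn's make_moons, the noise level, and the choice of one labeled point per class are assumptions for illustration, not the exact data behind the slides.

```python
# Minimal sketch of a two-moons toy dataset (assumed parameters):
# many unlabeled points, one labeled point per class.
import numpy as np
from sklearn.datasets import make_moons

X, y_true = make_moons(n_samples=200, noise=0.1, random_state=0)

y = np.full(len(X), -1)                     # -1 marks "unlabeled"
y[np.where(y_true == 0)[0][0]] = 0          # reveal one label from moon 0
y[np.where(y_true == 1)[0][0]] = 1          # reveal one label from moon 1
```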

Learning from Examples. Input space $X$ and output space $Y = \{+1, -1\}$. Training set $S = \{z_1 = (x_1, y_1), \ldots, z_l = (x_l, y_l)\}$ in $Z = X \times Y$, drawn i.i.d. from some unknown distribution. Classifier $f : X \to Y$.

Transductive Setting. Input space $X = \{x_1, \ldots, x_n\}$ and output space $Y = \{+1, -1\}$. Training set $S = \{z_1 = (x_1, y_1), \ldots, z_l = (x_l, y_l)\}$; the remaining points $x_{l+1}, \ldots, x_n$ are unlabeled. Classifier $f : X \to Y$.

Intuition about classification on a manifold. Local consistency: nearby points are likely to have the same label. Global consistency: points on the same structure (typically referred to as a cluster or manifold) are likely to have the same label.

Algorithm
1. Form the affinity matrix $W$ defined by $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ if $i \neq j$ and $W_{ii} = 0$.
2. Construct the matrix $S = D^{-1/2} W D^{-1/2}$, in which $D$ is the diagonal matrix whose $(i, i)$-element equals the sum of the $i$-th row of $W$.
3. Iterate $f(t + 1) = \alpha S f(t) + (1 - \alpha) y$ until convergence, where $\alpha$ is a parameter in $(0, 1)$.
4. Let $f^{*}$ denote the limit of the sequence $\{f(t)\}$. Label each point $x_i$ as $y_i = \mathrm{sgn}(f^{*}_i)$.
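The following is a minimal NumPy sketch of the four steps above, written for a general label-matrix formulation (one indicator column per class) rather than the $\pm 1$ vector used on the slides; the hyperparameter values and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def consistency_method(X, y, alpha=0.99, sigma=0.5, n_iter=1000):
    """Sketch of the iteration f(t+1) = alpha*S*f(t) + (1-alpha)*Y.
    y holds class indices for labeled points and -1 for unlabeled points."""
    n = len(X)
    classes = np.unique(y[y >= 0])

    # Step 1: RBF affinity matrix with zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 2: S = D^{-1/2} W D^{-1/2}, with D_ii the row sums of W.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Step 3: initial label matrix Y, one indicator column per class.
    Y = np.zeros((n, len(classes)))
    for k, c in enumerate(classes):
        Y[y == c, k] = 1.0

    # Step 4: iterate to (approximate) convergence, then take the argmax
    # over classes (the sign rule of the slides in the binary case).
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y
    return classes[np.argmax(F, axis=1)]
```

For the two-moons data sketched earlier, `consistency_method(X, y)` returns a label for every point; how faithfully the labels follow the two clusters depends on the choice of sigma.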

Convergence Theorem. The sequence $\{f(t)\}$ converges to $f^{*} = \beta (I - \alpha S)^{-1} y$, where $\beta = 1 - \alpha$.
Proof. Suppose $f(0) = y$. By the iteration equation, we have
$$ f(t) = (\alpha S)^{t-1} y + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha S)^{i} y. \quad (1) $$
Since $0 < \alpha < 1$ and the eigenvalues of $S$ lie in $[-1, 1]$,
$$ \lim_{t \to \infty} (\alpha S)^{t-1} = 0, \qquad \lim_{t \to \infty} \sum_{i=0}^{t-1} (\alpha S)^{i} = (I - \alpha S)^{-1}. \quad (2) $$
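The theorem also gives a direct, non-iterative way to compute the limit. A small sketch follows, assuming `S` and `Y` are built as in the algorithm above:

```python
import numpy as np

def consistency_closed_form(S, Y, alpha=0.99):
    """Closed form f* = beta * (I - alpha*S)^{-1} * Y with beta = 1 - alpha."""
    n = S.shape[0]
    beta = 1.0 - alpha
    return beta * np.linalg.solve(np.eye(n) - alpha * S, Y)

# For large n_iter the iterate F of consistency_method converges to this
# value (up to numerical tolerance); the predicted labels are identical.
```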

Regularization Framework. Cost function
$$ Q(f) = \frac{1}{2} \left[ \sum_{i,j=1}^{n} W_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^{2} + \mu \sum_{i=1}^{n} (f_i - y_i)^{2} \right] $$
Smoothness term: measures the changes between nearby points. Fitting term: measures the changes from the initial label assignments.
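For concreteness, here is a small sketch of how this cost could be evaluated in NumPy (the function name is mine; $W$, $f$, $y$, $\mu$ follow the notation of the slide):

```python
import numpy as np

def cost_Q(W, f, y, mu):
    """Q(f) = 1/2 [ sum_ij W_ij (f_i/sqrt(D_ii) - f_j/sqrt(D_jj))^2
                    + mu * sum_i (f_i - y_i)^2 ]."""
    d = W.sum(axis=1)                        # D_ii = row sums of W
    g = f / np.sqrt(d)                       # f_i / sqrt(D_ii)
    smoothness = np.sum(W * (g[:, None] - g[None, :]) ** 2)
    fitting = mu * np.sum((f - y) ** 2)
    return 0.5 * (smoothness + fitting)
```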

Regularization Framework. Theorem: $f^{*} = \arg\min_{f \in \mathcal{F}} Q(f)$.
Proof. Differentiating $Q(f)$ with respect to $f$, we have
$$ \left. \frac{\partial Q}{\partial f} \right|_{f = f^{*}} = f^{*} - S f^{*} + \mu (f^{*} - y) = 0, \quad (1) $$
which can be transformed into
$$ f^{*} - \frac{1}{1 + \mu} S f^{*} - \frac{\mu}{1 + \mu} y = 0. \quad (2) $$
Let $\alpha = 1/(1 + \mu)$ and $\beta = \mu/(1 + \mu)$. Then
$$ (I - \alpha S) f^{*} = \beta y. \quad (3) $$
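One small step connects (3) back to the convergence theorem: since the eigenvalues of $S$ lie in $[-1, 1]$ and $\alpha \in (0, 1)$, the matrix $I - \alpha S$ is invertible, so (3) can be solved explicitly,
$$ f^{*} = \beta (I - \alpha S)^{-1} y, \qquad \alpha = \frac{1}{1 + \mu}, \quad \beta = \frac{\mu}{1 + \mu}, $$
which is exactly the limit of the iteration given by the convergence theorem.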

Two Variants. Substitute $P = D^{-1} W$ for $S$ in the iteration equation; then $f^{*} = (I - \alpha P)^{-1} y$. Alternatively, replace $S$ with $P^{T}$, the transpose of $P$; then $f^{*} = (I - \alpha P^{T})^{-1} y$, which is equivalent to $f^{*} = (D - \alpha W)^{-1} y$.
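A hedged sketch of the two closed forms (the function names are mine; `W`, `Y`, `alpha` as above):

```python
import numpy as np

def variant_1(W, Y, alpha=0.99):
    """Variant with P = D^{-1} W:  f* = (I - alpha*P)^{-1} Y."""
    P = W / W.sum(axis=1, keepdims=True)
    return np.linalg.solve(np.eye(len(W)) - alpha * P, Y)

def variant_2(W, Y, alpha=0.99):
    """Variant with P^T:  f* = (I - alpha*P^T)^{-1} Y, which the slide
    states is equivalent to f* = (D - alpha*W)^{-1} Y."""
    P = W / W.sum(axis=1, keepdims=True)
    return np.linalg.solve(np.eye(len(W)) - alpha * P.T, Y)
```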

Toy Problem. [Figure: four snapshots of the iteration on the toy data, at (a) t = 10, (b) t = 50, (c) t = 100, and (d) t = 400.]

Toy Problem (continued). [Figure.]

Handwritten Digit Recognition (USPS). [Figure: test error vs. number of labeled points (10 to 100) for k-NN (k = 1), SVM (RBF kernel), the consistency method, variant (1), and variant (2).] Dimension: 16x16. Size: 9298. (α = 0.95)

Handwritten Digit Recognition (USPS). [Figure: test error vs. values of the parameter α (0.7 to 0.99) for the consistency method, variant (1), and variant (2).] Size of labeled data: l = 50.

Text Classification (20-newsgroups). [Figure: test error vs. number of labeled points (10 to 100) for k-NN (k = 1), SVM (RBF kernel), the consistency method, variant (1), and variant (2).] Dimension: 8014. Size: 3970. (α = 0.95)

Text Classification (20-newsgroups). [Figure: test error vs. values of the parameter α (0.7 to 0.99) for the consistency method, variant (1), and variant (2).] Size of labeled data: l = 50.