What is semi-supervised learning?


1 What is semi-supervised learning?
In many practical learning domains (e.g., text processing, video indexing, bioinformatics), there is a large supply of unlabeled data but only limited labeled data, which can be expensive to generate.
Semi-supervised learning: learning from a combination of both labeled and unlabeled data.

2 Comparing
Supervised learning algorithms require enough labeled training data to learn reasonably accurate classifiers.
Unsupervised learning methods are employed to discover structure in unlabeled data.
Semi-supervised learning takes advantage of the strengths of both.

3 Why should it be useful?
Unlabeled data can help in two different ways.
Identify data structure: find a meaningful representation of complicated high-dimensional data through a first unsupervised learning step.
Cluster assumption, which can be stated in two equivalent ways:
(a) Two points that can be connected by a high-density path (i.e., that lie in the same cluster) are likely to have the same label.
(b) The decision boundary should lie in a low-density region.

4 A Toy Dataset (Two Moons)
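A dataset of this shape can be generated in a few lines. The sketch below is only an illustration: the function name `make_two_moons`, the noise level, the number of points, and the choice of one labeled point per moon are all assumptions, not values taken from the slides.

```python
import numpy as np

def make_two_moons(n_per_moon=100, noise=0.05, seed=0):
    """Return (X, y, true_labels) for two interleaved half-circles.

    Hypothetical helper: y holds +1 / -1 for the two labeled points and
    0 for every unlabeled point (an assumed encoding).
    """
    rng = np.random.default_rng(seed)
    # Upper moon: half-circle of radius 1 centered at the origin.
    t_upper = rng.uniform(0.0, np.pi, size=n_per_moon)
    upper = np.column_stack([np.cos(t_upper), np.sin(t_upper)])
    # Lower moon: flipped half-circle, shifted so the two moons interleave.
    t_lower = rng.uniform(0.0, np.pi, size=n_per_moon)
    lower = np.column_stack([1.0 - np.cos(t_lower), 0.5 - np.sin(t_lower)])
    X = np.vstack([upper, lower])
    X = X + rng.normal(scale=noise, size=X.shape)
    true_labels = np.concatenate([np.ones(n_per_moon), -np.ones(n_per_moon)])
    # Reveal a single label per moon; everything else stays unlabeled (0).
    y = np.zeros(2 * n_per_moon)
    y[0] = 1.0
    y[n_per_moon] = -1.0
    return X, y, true_labels

X, y, true_labels = make_two_moons()
```

With only two labels revealed, a purely supervised nearest-neighbor rule has little to work with, which is the situation the rest of the slides address.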

5 Learning from Examples
Input space $X$, and output space $Y = \{-1, 1\}$.
Training set $S = \{z_1 = (x_1, y_1), \ldots, z_l = (x_l, y_l)\}$ in $Z = X \times Y$, drawn i.i.d. from some unknown distribution.
Classifier $f : X \to Y$.
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking p. 2/31

6 Transductive Setting
Input space $X = \{x_1, \ldots, x_n\}$, and output space $Y = \{-1, 1\}$.
Training set $S = \{z_1 = (x_1, y_1), \ldots, z_l = (x_l, y_l)\}$.
Classifier $f : X \to Y$.

7 Intuition about classification: Manifold
Local consistency: nearby points are likely to have the same label.
Global consistency: points on the same structure (typically referred to as a cluster or manifold) are likely to have the same label.

8 Algorithm
1. Form the affinity matrix $W$ defined by $W_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ if $i \neq j$ and $W_{ii} = 0$.
2. Construct the matrix $S = D^{-1/2} W D^{-1/2}$, in which $D$ is a diagonal matrix with its $(i, i)$-element equal to the sum of the $i$-th row of $W$.
3. Iterate $f(t + 1) = \alpha S f(t) + (1 - \alpha) y$ until convergence, where $\alpha$ is a parameter in $(0, 1)$.
4. Let $f^*$ denote the limit of the sequence $\{f(t)\}$. Label each point $x_i$ as $y_i = \operatorname{sgn}(f^*_i)$.
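The four steps can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `propagate_labels`, the stopping tolerance, and the convention that unlabeled points carry `0` in `y` are assumptions made here, and the six-point dataset is invented for the example.

```python
import numpy as np

def propagate_labels(X, y, sigma=1.0, alpha=0.95, tol=1e-9, max_iter=10000):
    """X: (n, d) points. y: (n,) with +1/-1 for labeled points and 0 for
    unlabeled ones (assumed encoding). Returns (labels, scores)."""
    # Step 1: Gaussian affinity matrix with a zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Step 2: symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Step 3: iterate f <- alpha * S f + (1 - alpha) * y until it stops moving.
    f = y.astype(float).copy()
    for _ in range(max_iter):
        f_next = alpha * (S @ f) + (1.0 - alpha) * y
        converged = np.max(np.abs(f_next - f)) < tol
        f = f_next
        if converged:
            break
    # Step 4: the sign of the limit gives the predicted label.
    return np.sign(f), f

# Two well-separated groups of three points, one labeled point in each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.2, 3.1]])
y = np.array([1, 0, 0, -1, 0, 0])
labels, scores = propagate_labels(X, y, sigma=0.5)
```

The dense pairwise-distance computation uses O(n^2) memory, which is fine for a toy example; a sparse neighborhood graph would be a natural substitute for larger data.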

9 Convergence
Theorem. The sequence $\{f(t)\}$ converges to $f^* = \beta (I - \alpha S)^{-1} y$, where $\beta = 1 - \alpha$.
Proof. Suppose $f(0) = y$. By the iteration equation, we have
$$f(t) = (\alpha S)^{t} y + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha S)^{i} y. \quad (1)$$
Since $0 < \alpha < 1$ and the eigenvalues of $S$ lie in $[-1, 1]$,
$$\lim_{t \to \infty} (\alpha S)^{t} = 0, \quad \text{and} \quad \lim_{t \to \infty} \sum_{i=0}^{t-1} (\alpha S)^{i} = (I - \alpha S)^{-1}. \quad (2)$$
Hence $f^* = \lim_{t \to \infty} f(t) = (1 - \alpha)(I - \alpha S)^{-1} y$.
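The theorem can be checked numerically on a small example. The sketch below builds a random symmetric affinity matrix (an assumption made purely for illustration, as are the matrix size, the seed, and the value of `alpha`), runs the iteration, and compares the result with the closed form $(1 - \alpha)(I - \alpha S)^{-1} y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Random symmetric non-negative affinities with a zero diagonal (illustrative).
A = rng.random((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)

# S = D^{-1/2} W D^{-1/2}, where D holds the row sums of W.
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

alpha = 0.9
y = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0])  # 0 marks unlabeled points (assumed)

# Run the iteration f(t + 1) = alpha * S f(t) + (1 - alpha) * y from f(0) = y.
f = y.copy()
for _ in range(500):
    f = alpha * (S @ f) + (1.0 - alpha) * y

# Closed form from the theorem, solved without forming the inverse explicitly.
f_closed = (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)

max_gap = float(np.max(np.abs(f - f_closed)))
eigvals = np.linalg.eigvalsh(S)  # S is symmetric, so eigvalsh applies
```

Here `max_gap` measures how far the iterate is from the closed form, and `eigvals` lets one confirm that the spectrum of `S` stays inside $[-1, 1]$, which is the property the proof relies on.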

10 Regularization Framework
Cost function
$$Q(f) = \frac{1}{2} \left[ \sum_{i,j=1}^{n} W_{ij} \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^{2} + \mu \sum_{i=1}^{n} (f_i - y_i)^{2} \right]$$
Smoothness term: measures the changes between nearby points.
Fitting term: measures the changes from the initial label assignments.
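The smoothness term can be rewritten in matrix form, which makes its link to $S$ explicit. The short derivation below is a sketch that assumes $W$ is symmetric and $D_{ii} = \sum_j W_{ij}$, as in the algorithm; the vector notation $f = (f_1, \ldots, f_n)^\top$ is introduced here for convenience.

```latex
\begin{align*}
\sum_{i,j=1}^{n} W_{ij}
  \left( \frac{f_i}{\sqrt{D_{ii}}} - \frac{f_j}{\sqrt{D_{jj}}} \right)^{2}
 &= \sum_{i} \frac{f_i^{2}}{D_{ii}} \sum_{j} W_{ij}
  + \sum_{j} \frac{f_j^{2}}{D_{jj}} \sum_{i} W_{ij}
  - 2 \sum_{i,j} \frac{W_{ij}\, f_i f_j}{\sqrt{D_{ii} D_{jj}}} \\
 &= f^\top f + f^\top f - 2 f^\top S f \\
 &= 2 f^\top (I - S) f .
\end{align*}
```

With the leading factor $1/2$ applied, the smoothness part of $Q(f)$ is therefore $f^\top (I - S) f$, which is non-negative because the eigenvalues of $S$ lie in $[-1, 1]$.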

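11 Regularization Framework
Theorem. $f^* = \arg\min_{f \in \mathcal{F}} Q(f)$.
Proof. Differentiating $Q(f)$ with respect to $f$ and setting the derivative to zero (a constant factor on the smoothness term is absorbed into $\mu$), we have
$$\left. \frac{\partial Q}{\partial f} \right|_{f = f^*} = f^* - S f^* + \mu (f^* - y) = 0, \quad (1)$$
which can be transformed into
$$f^* - \frac{1}{1 + \mu} S f^* - \frac{\mu}{1 + \mu} y = 0. \quad (2)$$
Let $\alpha = 1/(1 + \mu)$ and $\beta = \mu/(1 + \mu)$. Then
$$(I - \alpha S) f^* = \beta y. \quad (3)$$
Since $I - \alpha S$ is invertible, $f^* = \beta (I - \alpha S)^{-1} y$, which is the limit in the convergence theorem because $\beta = 1 - \alpha$.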
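The stationarity condition can also be verified numerically. The sketch below uses the cost $\tfrac{1}{2} f^\top (I - S) f + \tfrac{\mu}{2} \|f - y\|^2$, a constant convention chosen here so that its gradient is exactly $f - Sf + \mu(f - y)$ as in equation (1). The random affinity matrix, the value of `mu`, and the helper names `cost` and `gradient` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Random symmetric affinities with a zero diagonal (illustrative data).
A = rng.random((n, n))
W = (A + A.T) / 2.0
np.fill_diagonal(W, 0.0)
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

y = np.array([1.0, -1.0, 0.0, 0.0, 0.0])  # 0 marks unlabeled points (assumed)
mu = 0.25
I = np.eye(n)

def cost(f):
    # Smoothness term plus fitting term, in the convention stated above.
    return 0.5 * (f @ (I - S) @ f) + 0.5 * mu * np.sum((f - y) ** 2)

def gradient(f):
    # Gradient of cost(f); matches the left-hand side of equation (1).
    return f - S @ f + mu * (f - y)

alpha = 1.0 / (1.0 + mu)
beta = mu / (1.0 + mu)
f_star = beta * np.linalg.solve(I - alpha * S, y)

grad_norm = float(np.linalg.norm(gradient(f_star)))
best_cost = float(cost(f_star))
# Random perturbations of f_star should never achieve a lower cost.
perturbed_costs = [float(cost(f_star + 0.1 * rng.standard_normal(n)))
                   for _ in range(20)]
```

Because $(I - S) + \mu I$ is positive definite for $\mu > 0$, the cost is strictly convex, so a vanishing gradient at `f_star` identifies the unique minimizer.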

12 Two Variants
Variant (1): substitute $P = D^{-1} W$ for $S$ in the iteration equation. Then $f^* = (I - \alpha P)^{-1} y$.
Variant (2): replace $S$ with $P^{T}$, the transpose of $P$. Then $f^* = (I - \alpha P^{T})^{-1} y$, which is equivalent for classification to $f^* = (D - \alpha W)^{-1} y$ (the two differ only by the positive diagonal factor $D$, which does not change the sign of any entry).
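Both variants can be computed directly from their closed forms. The sketch below is an illustration on an invented six-point dataset; the values of `sigma` and `alpha` and the `0`-for-unlabeled encoding of `y` are assumptions. It also checks the relation $(I - \alpha P^{T})^{-1} y = D (D - \alpha W)^{-1} y$ behind the equivalence stated for variant (2).

```python
import numpy as np

# Two groups of three points, one labeled point per group (illustrative data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1],
              [3.0, 3.0], [3.1, 3.0], [3.2, 3.1]])
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])  # 0 marks unlabeled points (assumed)
sigma, alpha = 1.0, 0.9
n = X.shape[0]

# Gaussian affinity matrix with a zero diagonal.
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq_dists / (2.0 * sigma ** 2))
np.fill_diagonal(W, 0.0)

row_sums = W.sum(axis=1)
D = np.diag(row_sums)
P = W / row_sums[:, None]  # P = D^{-1} W, a row-stochastic matrix
I = np.eye(n)

# Variant (1): f* = (I - alpha P)^{-1} y
f_v1 = np.linalg.solve(I - alpha * P, y)
# Variant (2): f* = (I - alpha P^T)^{-1} y
f_v2 = np.linalg.solve(I - alpha * P.T, y)
# Rescaled form of variant (2): (D - alpha W)^{-1} y
f_v2_rescaled = np.linalg.solve(D - alpha * W, y)

labels_v1 = np.sign(f_v1)
labels_v2 = np.sign(f_v2)
labels_v2_rescaled = np.sign(f_v2_rescaled)
```

Since `D` has strictly positive diagonal entries, `labels_v2` and `labels_v2_rescaled` agree entry by entry even though the underlying scores differ in scale.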

13 Toy Problem
[Figure: four panels, (a) to (d), each labeled with a value of t.]

14 Toy Problem

15 Handwritten Digit Recognition (USPS)
[Figure: test error versus number of labeled points for k-NN (k = 1), SVM (RBF kernel), consistency, variant (1), and variant (2).]
Dimension: 16x16. Size: (α = 0.95)

16 Handwritten Digit Recognition (USPS)
[Figure: test error versus values of parameter α for consistency, variant (1), and variant (2).]
Size of labeled data: l = 50.

17 Text Classification (20-newsgroups)
[Figure: test error versus number of labeled points for k-NN (k = 1), SVM (RBF kernel), consistency, variant (1), and variant (2).]
Dimension: Size: (α = 0.95)

18 Text Classification (20-newsgroups)
[Figure: test error versus values of parameter α for consistency, variant (1), and variant (2).]
Size of labeled data: l = 50.

Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking
Dengyong Zhou, zhou@tuebingen.mpg.de
Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany