Automatic Feature Decomposition for Single View Co-training


Slide 1: Automatic Feature Decomposition for Single View Co-training
Minmin Chen, Kilian Weinberger, Yixin Chen
Computer Science and Engineering, Washington University in Saint Louis

Slide 2: Motivation
What if your classifier could search the web, and use the results to improve its accuracy?

Slide 3: Caltech-256 object recognition
Example categories: American flag, basketball hoop, hot-air balloon, AK-47, frog, cake, beer mug, Eiffel Tower, hawksbill.
Problem: manual labeling is expensive!

Slide 4: Weakly labeled web data
Tons of images are available online and can be retrieved free of charge by querying the category names (American flag, basketball hoop, hot-air balloon, AK-47, frog, cake, beer mug, Eiffel Tower, hawksbill).

Slide 5: Free, but noisy
Web-retrieved images are both visually and semantically less coherent.
[Figure: Caltech-256 versus Bing image-search results for American flag, Eiffel Tower, beer mug, and hawksbill.]

Slide 6: Naive combination
The weakly labeled images are noisy enough to be harmful.
[Figure: accuracy (%) versus number of target training images on Caltech-256 with weakly labeled web images. SVM_t: SVM trained on Caltech-256 only (Bergamo, NIPS 2010); SVM_t+s: SVM trained on Caltech-256 plus web images (Bergamo, NIPS 2010).]

Slide 7: Cherry-pick the good ones
What if we could cherry-pick the good ones?
[Figure: Caltech-256 versus Bing images for American flag, Eiffel Tower, beer mug, and hawksbill.]

Slide 8: Co-training
One of the most successful semi-supervised learning algorithms is co-training (Blum & Mitchell, 1998) for multi-view data. It won the 10-year best paper award at ICML 2008, and has been applied across computer science and beyond (Collins & Singer, 1999; Nigam & Ghani, 2000; Ghani, 2001; Levin et al., 2003; Brefeld & Scheffer, 2004; Chan et al., 2004).
Three conditions for co-training to work:
1. a class-conditionally independent multi-view representation;
2. two good classifiers;
3. two classifiers confident on different inputs.

Slide 9: Limitations
Most real datasets only have ONE view. Current state of the art:
- manual feature splitting (Blum & Mitchell, 1998; Brefeld & Scheffer, 2004);
- random feature splitting (Nigam & Ghani, 2000; Chan et al., 2004);
- greedy algorithms (Abney, 2002; Zhang & Zheng, 2009).

Slides 10-11: Method
How can we use co-training on single-view data? Artificially create a multi-view representation for co-training by solving a single optimization problem.

Slides 12-13: Three conditions for co-training to work
Condition 1: both classifiers are good. Make sure both classifiers suffer low loss by minimizing the maximum of the two losses,
\min_{u,v} \max[\ell(u;L), \ell(v;L)].
Execution: softmax approximation,
\min_{u,v} \log(e^{\ell(u;L)} + e^{\ell(v;L)}).
The formulation is agnostic to the specific choice of loss; for logistic regression,
\ell(u;L) = \sum_{(x,y) \in L} \log(1 + e^{-y u^\top x}).
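As a concrete illustration of Condition 1, here is a minimal NumPy sketch of the smoothed min-max objective for the logistic loss; the function and variable names (X, y for the labeled inputs and their {-1, +1} labels) are ours, not the paper's.

```python
import numpy as np

def logistic_loss(w, X, y):
    # l(w; L) = sum over (x, y) in L of log(1 + exp(-y * w^T x))
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins))  # numerically stable log(1 + e^{-m})

def softmax_of_losses(u, v, X, y):
    # smooth surrogate for max[l(u; L), l(v; L)]: log(e^{l(u)} + e^{l(v)})
    return np.logaddexp(logistic_loss(u, X, y), logistic_loss(v, X, y))
```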

Slides 14-15: Three conditions for co-training to work (cont.)
Condition 2: both classifiers use different features. The d-dimensional feature space X is split into X^{(1)} and X^{(2)} through the weight vectors u and v. Make sure each feature is used by exactly one classifier,
u_i v_i = 0, \quad i = 1, \dots, d.
Execution: square and sum over all features,
\sum_{i=1}^d u_i^2 v_i^2 = 0.
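Condition 2 then reduces to a single scalar that a solver can drive to zero; a one-line sketch reusing the NumPy import above:

```python
def feature_split_violation(u, v):
    # sum_i u_i^2 v_i^2: zero exactly when no feature carries weight in both u and v
    return np.sum((u ** 2) * (v ** 2))
```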

Slide 16: Class-conditionally independent?
Class-conditional independence is stringent; the weaker ε-expanding condition (Balcan et al., 2004) suffices.
ε-expanding: let X = X_1 \times X_2. The distribution D is ε-expanding with respect to the hypothesis class H_1 \times H_2 if for any two classifiers h_1 \in H_1, h_2 \in H_2,
\Pr(S_1 \wedge \bar{S}_2) + \Pr(\bar{S}_1 \wedge S_2) \ge \epsilon \min[\Pr(S_1 \wedge S_2), \Pr(\bar{S}_1 \wedge \bar{S}_2)],
where S_1 denotes the event that a sample x = (x_1, x_2) \sim D falls into the confident set of h_1, and similarly for S_2.
If the expanding assumption holds, co-training will succeed given appropriately strong PAC-learning algorithms on each feature set.

Slides 17-18: Three conditions for co-training to work (cont.)
Condition 3: both classifiers make different confident predictions. Require D to be ε-expanding with respect to h_u, h_v:
\Pr(S_u \wedge \bar{S}_v) + \Pr(\bar{S}_u \wedge S_v) \ge \epsilon \min[\Pr(S_u \wedge S_v), \Pr(\bar{S}_u \wedge \bar{S}_v)].
Execution: define a confidence indicator
c_u(x) = 1 if \Pr(h_u(x) \mid x; u) > \tau, and 0 otherwise,
and, writing \bar{c}_u(x) = 1 - c_u(x), approximate ε-expandability on the unlabeled set U by
\sum_{x \in U} [\bar{c}_u(x) c_v(x) + c_u(x) \bar{c}_v(x)] \ge \epsilon \min\Big[\sum_{x \in U} c_u(x) c_v(x), \sum_{x \in U} \bar{c}_u(x) \bar{c}_v(x)\Big],
where the left-hand side estimates \Pr(S_u \wedge \bar{S}_v) + \Pr(\bar{S}_u \wedge S_v), and the two terms inside the min estimate \Pr(S_u \wedge S_v) and \Pr(\bar{S}_u \wedge \bar{S}_v).
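A sketch of the confidence indicator and the empirical expansion constraint for a linear logistic model; the threshold tau, the default values, and the slack formulation are our illustrative choices.

```python
def confidence(w, X, tau=0.9):
    # c_w(x) = 1 if the predicted class probability exceeds tau, else 0
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # P(y = +1 | x; w)
    return (np.maximum(p, 1.0 - p) > tau).astype(float)

def expansion_slack(u, v, X_u, eps=0.1, tau=0.9):
    # empirical epsilon-expansion on the unlabeled set: non-negative means satisfied
    cu, cv = confidence(u, X_u, tau), confidence(v, X_u, tau)
    disagree = np.sum(cu * (1 - cv) + (1 - cu) * cv)  # one view confident, the other not
    both = np.sum(cu * cv)                            # both views confident
    neither = np.sum((1 - cu) * (1 - cv))             # neither view confident
    return disagree - eps * min(both, neither)
```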

Slide 19: Pseudo multi-view decomposition (PMD)
Optimization problem:
\min_{u,v} \log(e^{\ell(u;L)} + e^{\ell(v;L)})
subject to:
(1) \sum_{i=1}^d u_i^2 v_i^2 = 0;
(2) \sum_{x \in U} [\bar{c}_u(x) c_v(x) + c_u(x) \bar{c}_v(x)] \ge \epsilon \min[\sum_{x \in U} c_u(x) c_v(x), \sum_{x \in U} \bar{c}_u(x) \bar{c}_v(x)].
Optimized with the augmented Lagrangian method (Bertsekas, 1999).
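Putting the three pieces together, here is a crude penalty-method stand-in for the PMD program. This is only a sketch: the paper uses a proper augmented Lagrangian (Bertsekas, 1999) with multiplier updates, and the hard 0/1 confidences above would need a smooth surrogate before a gradient-based solver could be applied.

```python
def pmd_objective(theta, X_l, y_l, X_u, rho=10.0, eps=0.1, tau=0.9):
    # theta stacks u and v; minimize the smoothed loss plus constraint penalties
    d = X_l.shape[1]
    u, v = theta[:d], theta[d:]
    loss = softmax_of_losses(u, v, X_l, y_l)
    split = feature_split_violation(u, v)           # equality constraint: want 0
    slack = expansion_slack(u, v, X_u, eps, tau)    # inequality constraint: want >= 0
    return loss + rho * split + rho * max(0.0, -slack) ** 2
```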

Slides 20-24: Pseudo multi-view co-training (PMC)
Input: labeled set L and unlabeled set U.
Repeat:
1. Find u, v by solving PMD on L and U.
2. Form h_u(x) = sign(x^\top u) and h_v(x) = sign(x^\top v), and apply both to all elements of U.
3. If confident predictions exist, move up to l confident inputs from U to L; otherwise stop.
Output the final classifier h_{u+v}(x) = sign(x^\top (u + v)).
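A skeleton of the PMC loop under the same assumed helpers; solve_pmd is a hypothetical stand-in for minimizing the PMD objective above (solver not shown), and the batch size grow plays the role of the slide's parameter l. Pseudo-labeling each confident input with the view that is confident on it is our reading of the flow chart.

```python
def pmc(X_l, y_l, X_u, max_iter=20, grow=10, tau=0.9):
    for _ in range(max_iter):
        u, v = solve_pmd(X_l, y_l, X_u)              # hypothetical PMD solver
        cu, cv = confidence(u, X_u, tau), confidence(v, X_u, tau)
        idx = np.flatnonzero((cu + cv) > 0)[:grow]   # up to `grow` confident inputs
        if idx.size == 0:                            # no confident predictions: stop
            break
        # pseudo-label each confident input with the view that is confident on it
        pseudo = np.where(cu[idx] > 0,
                          np.sign(X_u[idx] @ u),
                          np.sign(X_u[idx] @ v))
        X_l = np.vstack([X_l, X_u[idx]])             # move them from U to L
        y_l = np.concatenate([y_l, pseudo])
        X_u = np.delete(X_u, idx, axis=0)
    return u + v                                     # final classifier: h_{u+v}
```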

Slides 25-26: Extension to the multi-class setting
Condition 4: the same decomposition across different classes.
Multi-class: Y = \{1, 2, \dots, K\}, with weight matrices U = [u_1, \dots, u_K] \in R^{d \times K} and V = [v_1, \dots, v_K] \in R^{d \times K}; row i of U collects the weights of feature i across all K classes.
Execution: add L_{2,1} regularization,
\min_{U,V} \log(e^{\ell(U;L)} + e^{\ell(V;L)}) + \lambda(\|U\|_{2,1} + \|V\|_{2,1}),
where \|U\|_{2,1} = \sum_{i=1}^d \sqrt{\sum_{k=1}^K U_{ik}^2}, subject to the same constraints as before. The row-wise norm drives entire feature rows of U (or V) to zero, so each feature is assigned to the same view for every class.
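The L_{2,1} norm in this objective is just a row-wise sum of L2 norms; a two-line sketch:

```python
def l21_norm(W):
    # ||W||_{2,1} = sum over feature rows i of sqrt(sum_k W[i, k]^2);
    # penalizing it zeroes out whole rows, tying each feature to one view across classes
    return np.sum(np.linalg.norm(W, axis=1))
```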

Slides 27-28: Experimental results
Two benchmarks:
- Toy dataset: paired handwritten digit set (class-conditionally independent views);
- Caltech-256 with weakly labeled web images.

Slide 29: Sanity check on PMD
Paired digit set with conditionally independent views. We solve PMD for u and v, starting from a random initialization; their non-zero weights are divided almost exactly into the two class-conditionally independent feature sets.
[Figure: learned weight vectors u* and v* on the paired digits, with final training losses \ell(u*; X, y) = 5.543e-11 and \ell(v*; X, y) = 1.898e-11.]

Slide 30: Sanity check on PMC
As the confident set expands, the automatically found feature splits vary between PMC iterations and gradually approximate the class-conditional feature split.
[Figure: evolution of the weight vectors u and v across PMC iterations.]

Slide 31: Experimental results of PMC on the paired digit set
[Table: mean and standard deviation of test error (%) for Baseline, RFS, ICA-RFS, and PMC.]
[Figure: test error (%) versus PMC iteration for h_u, h_v, h_{u+v}, and the baseline.]
Baseline: a logistic regression trained exclusively on L.
RFS: co-training with a random feature split.
ICA-RFS: co-training with a random feature split on independent components.

Slide 32: Exploiting weakly labeled web data to improve object recognition
[Figure: Caltech-256 versus Bing images for American flag, Eiffel Tower, beer mug, and hawksbill.]

Slide 33: Experimental results of PMC on Caltech-256
PMC outperforms all other algorithms by a visible margin across all training set sizes.
[Figure: accuracy (%) versus number of target training images on Caltech-256 with weakly labeled web images, comparing McPMC, McLR_t, SVM_t+s (Bergamo), RFS, SVM_t (Bergamo), DWSVM (Bergamo), and TSVM (Bergamo).]

Slide 34: Image re-ranking
PMC can potentially be used for image re-ranking in search engines.
[Figure: target training examples, positive examples, and negative examples.]

Slide 35: Conclusion
Introduced PMC, a framework for co-training on single-view data:
- incorporated the three conditions for co-training to succeed explicitly into a single optimization problem;
- solved the optimization problem to discover the decomposition;
- combined it with co-training to exploit unlabeled data and improve performance.
Demonstrated the efficacy of the method on the challenging Caltech-256 object recognition task, and showed its potential for improving web search ranking.
