Automatic Feature Decomposition for Single View Co-training


Slide 1: Automatic Feature Decomposition for Single View Co-training
Minmin Chen, Kilian Weinberger, Yixin Chen
Computer Science and Engineering, Washington University in Saint Louis

Slide 2: Motivation
What if your classifier could search the web, and use the results to improve its accuracy?

Slide 3: Caltech-256 object recognition
Example categories: American flag, basketball hoop, hot-air balloon, AK-47, frog, cake, beer mug, Eiffel Tower, hawksbill.
Problem: manual labeling is expensive!

Slide 4: Weakly labeled web data
Tons of images are available online and can be retrieved free of charge by querying the category names (American flag, basketball hoop, hot-air balloon, AK-47, frog, cake, beer mug, Eiffel Tower, hawksbill).

Slide 5: Free, but noisy
Web-retrieved images are both visually and semantically less coherent.
[Figure: Caltech-256 versus Bing image-search results for American flag, Eiffel Tower, beer mug, and hawksbill.]

Slide 6: Naive combination
The weakly labeled images are noisy enough to be harmful.
[Figure: accuracy (%) versus number of target training images on Caltech-256 with weakly labeled web images. SVM_t: SVM trained on Caltech-256 only (Bergamo, NIPS 2010); SVM_t+s: SVM trained on Caltech-256 plus web images (Bergamo, NIPS 2010).]

Slide 7: Cherry-pick the good ones
What if we could cherry-pick the good ones?
[Figure: Caltech-256 versus Bing images for American flag, Eiffel Tower, beer mug, and hawksbill.]

Slide 8: Co-training
One of the most successful semi-supervised learning algorithms is co-training (Blum & Mitchell, 1998) for multi-view data. It won the 10-year best paper award at ICML 2008, and has been applied across computer science and beyond (Collins & Singer, 1999; Nigam & Ghani, 2000; Ghani, 2001; Levin et al., 2003; Brefeld & Scheffer, 2004; Chan et al., 2004).
Three conditions for co-training to work:
1. a class-conditionally independent multi-view representation;
2. two good classifiers;
3. two classifiers confident on different inputs.

Slide 9: Limitations
Most real datasets only have ONE view. Current state of the art:
- manual feature splitting (Blum & Mitchell, 1998; Brefeld & Scheffer, 2004);
- random feature splitting (Nigam & Ghani, 2000; Chan et al., 2004);
- greedy algorithms (Abney, 2002; Zhang & Zheng, 2009).

Slides 10-11: Method
How can we use co-training on single-view data? Artificially create a multi-view representation for co-training by solving a single optimization problem.

Slides 12-13: Three conditions for co-training to work
Condition 1: both classifiers are good. Make sure both classifiers suffer low loss by minimizing the maximum of the two losses,
\min_{u,v} \max[\ell(u;L), \ell(v;L)].
Execution: softmax approximation,
\min_{u,v} \log(e^{\ell(u;L)} + e^{\ell(v;L)}).
The formulation is agnostic to the specific choice of loss; for logistic regression,
\ell(u;L) = \sum_{(x,y) \in L} \log(1 + e^{-y u^\top x}).
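As a concrete illustration of Condition 1, here is a minimal NumPy sketch of the smoothed min-max objective for the logistic loss; the function and variable names (X, y for the labeled inputs and their {-1, +1} labels) are ours, not the paper's.

```python
import numpy as np

def logistic_loss(w, X, y):
    # l(w; L) = sum over (x, y) in L of log(1 + exp(-y * w^T x))
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins))  # numerically stable log(1 + e^{-m})

def softmax_of_losses(u, v, X, y):
    # smooth surrogate for max[l(u; L), l(v; L)]: log(e^{l(u)} + e^{l(v)})
    return np.logaddexp(logistic_loss(u, X, y), logistic_loss(v, X, y))
```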

Slides 14-15: Three conditions for co-training to work (cont.)
Condition 2: both classifiers use different features. The d-dimensional feature space X is split into X^{(1)} and X^{(2)} through the weight vectors u and v. Make sure each feature is used by exactly one classifier,
u_i v_i = 0, \quad i = 1, \dots, d.
Execution: square and sum over all features,
\sum_{i=1}^d u_i^2 v_i^2 = 0.
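Condition 2 then reduces to a single scalar that a solver can drive to zero; a one-line sketch reusing the NumPy import above:

```python
def feature_split_violation(u, v):
    # sum_i u_i^2 v_i^2: zero exactly when no feature carries weight in both u and v
    return np.sum((u ** 2) * (v ** 2))
```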

Slide 16: Class-conditionally independent?
Class-conditional independence is stringent; the weaker ε-expanding condition (Balcan et al., 2004) suffices.
ε-expanding: let X = X_1 \times X_2. The distribution D is ε-expanding with respect to the hypothesis class H_1 \times H_2 if for any two classifiers h_1 \in H_1, h_2 \in H_2,
\Pr(S_1 \wedge \bar{S}_2) + \Pr(\bar{S}_1 \wedge S_2) \ge \epsilon \min[\Pr(S_1 \wedge S_2), \Pr(\bar{S}_1 \wedge \bar{S}_2)],
where S_1 denotes the event that a sample x = (x_1, x_2) \sim D falls into the confident set of h_1, and similarly for S_2.
If the expanding assumption holds, co-training will succeed given appropriately strong PAC-learning algorithms on each feature set.

Slides 17-18: Three conditions for co-training to work (cont.)
Condition 3: both classifiers make different confident predictions. Require D to be ε-expanding with respect to h_u, h_v:
\Pr(S_u \wedge \bar{S}_v) + \Pr(\bar{S}_u \wedge S_v) \ge \epsilon \min[\Pr(S_u \wedge S_v), \Pr(\bar{S}_u \wedge \bar{S}_v)].
Execution: define a confidence indicator
c_u(x) = 1 if \Pr(h_u(x) \mid x; u) > \tau, and 0 otherwise,
and, writing \bar{c}_u(x) = 1 - c_u(x), approximate ε-expandability on the unlabeled set U by
\sum_{x \in U} [\bar{c}_u(x) c_v(x) + c_u(x) \bar{c}_v(x)] \ge \epsilon \min\Big[\sum_{x \in U} c_u(x) c_v(x), \sum_{x \in U} \bar{c}_u(x) \bar{c}_v(x)\Big],
where the left-hand side estimates \Pr(S_u \wedge \bar{S}_v) + \Pr(\bar{S}_u \wedge S_v), and the two terms inside the min estimate \Pr(S_u \wedge S_v) and \Pr(\bar{S}_u \wedge \bar{S}_v).
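A sketch of the confidence indicator and the empirical expansion constraint for a linear logistic model; the threshold tau, the default values, and the slack formulation are our illustrative choices.

```python
def confidence(w, X, tau=0.9):
    # c_w(x) = 1 if the predicted class probability exceeds tau, else 0
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # P(y = +1 | x; w)
    return (np.maximum(p, 1.0 - p) > tau).astype(float)

def expansion_slack(u, v, X_u, eps=0.1, tau=0.9):
    # empirical epsilon-expansion on the unlabeled set: non-negative means satisfied
    cu, cv = confidence(u, X_u, tau), confidence(v, X_u, tau)
    disagree = np.sum(cu * (1 - cv) + (1 - cu) * cv)  # one view confident, the other not
    both = np.sum(cu * cv)                            # both views confident
    neither = np.sum((1 - cu) * (1 - cv))             # neither view confident
    return disagree - eps * min(both, neither)
```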

Slide 19: Pseudo multi-view decomposition (PMD)
Optimization problem:
\min_{u,v} \log(e^{\ell(u;L)} + e^{\ell(v;L)})
subject to:
(1) \sum_{i=1}^d u_i^2 v_i^2 = 0;
(2) \sum_{x \in U} [\bar{c}_u(x) c_v(x) + c_u(x) \bar{c}_v(x)] \ge \epsilon \min[\sum_{x \in U} c_u(x) c_v(x), \sum_{x \in U} \bar{c}_u(x) \bar{c}_v(x)].
Optimized with the augmented Lagrangian method (Bertsekas, 1999).
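Putting the three pieces together, here is a crude penalty-method stand-in for the PMD program. This is only a sketch: the paper uses a proper augmented Lagrangian (Bertsekas, 1999) with multiplier updates, and the hard 0/1 confidences above would need a smooth surrogate before a gradient-based solver could be applied.

```python
def pmd_objective(theta, X_l, y_l, X_u, rho=10.0, eps=0.1, tau=0.9):
    # theta stacks u and v; minimize the smoothed loss plus constraint penalties
    d = X_l.shape[1]
    u, v = theta[:d], theta[d:]
    loss = softmax_of_losses(u, v, X_l, y_l)
    split = feature_split_violation(u, v)           # equality constraint: want 0
    slack = expansion_slack(u, v, X_u, eps, tau)    # inequality constraint: want >= 0
    return loss + rho * split + rho * max(0.0, -slack) ** 2
```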

Slides 20-24: Pseudo multi-view co-training (PMC)
Input: labeled set L and unlabeled set U.
Repeat:
1. Find u, v by solving PMD on L and U.
2. Form h_u(x) = sign(x^\top u) and h_v(x) = sign(x^\top v), and apply both to all elements of U.
3. If confident predictions exist, move up to l confident inputs from U to L; otherwise stop.
Output the final classifier h_{u+v}(x) = sign(x^\top (u + v)).
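A skeleton of the PMC loop under the same assumed helpers; solve_pmd is a hypothetical stand-in for minimizing the PMD objective above (solver not shown), and the batch size grow plays the role of the slide's parameter l. Pseudo-labeling each confident input with the view that is confident on it is our reading of the flow chart.

```python
def pmc(X_l, y_l, X_u, max_iter=20, grow=10, tau=0.9):
    for _ in range(max_iter):
        u, v = solve_pmd(X_l, y_l, X_u)              # hypothetical PMD solver
        cu, cv = confidence(u, X_u, tau), confidence(v, X_u, tau)
        idx = np.flatnonzero((cu + cv) > 0)[:grow]   # up to `grow` confident inputs
        if idx.size == 0:                            # no confident predictions: stop
            break
        # pseudo-label each confident input with the view that is confident on it
        pseudo = np.where(cu[idx] > 0,
                          np.sign(X_u[idx] @ u),
                          np.sign(X_u[idx] @ v))
        X_l = np.vstack([X_l, X_u[idx]])             # move them from U to L
        y_l = np.concatenate([y_l, pseudo])
        X_u = np.delete(X_u, idx, axis=0)
    return u + v                                     # final classifier: h_{u+v}
```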

Slides 25-26: Extension to the multi-class setting
Condition 4: the same decomposition across different classes.
Multi-class: Y = \{1, 2, \dots, K\}, with weight matrices U = [u_1, \dots, u_K] \in R^{d \times K} and V = [v_1, \dots, v_K] \in R^{d \times K}; row i of U collects the weights of feature i across all K classes.
Execution: add L_{2,1} regularization,
\min_{U,V} \log(e^{\ell(U;L)} + e^{\ell(V;L)}) + \lambda(\|U\|_{2,1} + \|V\|_{2,1}),
where \|U\|_{2,1} = \sum_{i=1}^d \sqrt{\sum_{k=1}^K U_{ik}^2}, subject to the same constraints as before. The row-wise norm drives entire feature rows of U (or V) to zero, so each feature is assigned to the same view for every class.
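The L_{2,1} norm in this objective is just a row-wise sum of L2 norms; a two-line sketch:

```python
def l21_norm(W):
    # ||W||_{2,1} = sum over feature rows i of sqrt(sum_k W[i, k]^2);
    # penalizing it zeroes out whole rows, tying each feature to one view across classes
    return np.sum(np.linalg.norm(W, axis=1))
```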

Slides 27-28: Experimental results
Two benchmarks:
- Toy dataset: paired handwritten digit set (class-conditionally independent views);
- Caltech-256 with weakly labeled web images.

Slide 29: Sanity check on PMD
Paired digit set with conditionally independent views. We solve PMD for u and v, starting from a random initialization; their non-zero weights are divided almost exactly into the two class-conditionally independent feature sets.
[Figure: learned weight vectors u* and v* on the paired digits, with final training losses \ell(u*; X, y) = 5.543e-11 and \ell(v*; X, y) = 1.898e-11.]

Slide 30: Sanity check on PMC
As the confident set expands, the automatically found feature splits vary between PMC iterations and gradually approximate the class-conditional feature split.
[Figure: evolution of the weight vectors u and v across PMC iterations.]

Slide 31: Experimental results of PMC on the paired digit set
[Table: mean and standard deviation of test error (%) for Baseline, RFS, ICA-RFS, and PMC.]
[Figure: test error (%) versus PMC iteration for h_u, h_v, h_{u+v}, and the baseline.]
Baseline: a logistic regression trained exclusively on L.
RFS: co-training with a random feature split.
ICA-RFS: co-training with a random feature split on independent components.

Slide 32: Exploiting weakly labeled web data to improve object recognition
[Figure: Caltech-256 versus Bing images for American flag, Eiffel Tower, beer mug, and hawksbill.]

Slide 33: Experimental results of PMC on Caltech-256
PMC outperforms all other algorithms by a visible margin across all training set sizes.
[Figure: accuracy (%) versus number of target training images on Caltech-256 with weakly labeled web images, comparing McPMC, McLR_t, SVM_t+s (Bergamo), RFS, SVM_t (Bergamo), DWSVM (Bergamo), and TSVM (Bergamo).]

Slide 34: Image re-ranking
PMC can potentially be used for image re-ranking in search engines.
[Figure: target training examples, positive examples, and negative examples.]

Slide 35: Conclusion
Introduced PMC, a framework for co-training on single-view data:
- incorporated the three conditions for co-training to succeed explicitly into a single optimization problem;
- solved the optimization problem to discover the decomposition;
- combined it with co-training to exploit unlabeled data and improve performance.
Demonstrated the efficacy of the method on the challenging Caltech-256 object recognition task, and showed its potential for improving web search ranking.
