Automatic Feature Decomposition for Single View Co-training
1 Automatic Feature Decomposition for Single View Co-training
Minmin Chen, Kilian Weinberger, Yixin Chen
Computer Science and Engineering, Washington University in Saint Louis
2 Motivation
What if your classifier could search the web, and use the results to improve its accuracy?
3 Caltech-256 Object Recognition
Example categories: American flag, basketball hoop, hot-air balloon, AK-47, frog, cake, beer mug, Eiffel tower, hawksbill.
Problem: manual labeling is expensive!
4 Weakly labeled web data
Tons of images are available online, and can be retrieved free of charge (e.g., web image search results for "American flag", "basketball hoop", "hot-air balloon", "AK-47", "beer mug", "Eiffel tower", "hawksbill").
5 Free, but noisy
Web-retrieved images are both visually and semantically less coherent (compare Caltech-256 images with Bing images for American flag, Eiffel tower, beer mug, hawksbill).
6 Naive combination
The weakly-labeled images are noisy enough to be harmful.
[Figure: accuracy (%) vs. number of target training images on Caltech-256 with weakly labeled web images. SVM_t: SVM trained on Caltech-256 only (Bergamo & Torresani, NIPS 2010); SVM_{t+s}: SVM trained on Caltech-256 plus web images (Bergamo & Torresani, NIPS 2010). Adding the web images naively hurts accuracy.]
7 Cherry-pick the good ones
What if we could cherry-pick the good ones (Caltech-256 vs. Bing images: American flag, Eiffel tower, beer mug, hawksbill)?
8 Co-training
One of the most successful semi-supervised learning algorithms is co-training (Blum & Mitchell, 1998) for multi-view data. It won the 10-year best paper award at ICML 2008 and has been applied to many applications across computer science and beyond (Collins & Singer, 1999; Nigam & Ghani, 2000; Ghani, 2001; Levin et al., 2003; Brefeld & Scheffer, 2004; Chan et al., 2004).
Three conditions for co-training to work:
- a class-conditionally independent multi-view representation;
- two good classifiers;
- two classifiers confident on different inputs.
9 Limitations
Most real datasets only have ONE view. Current state of the art:
- manual feature splitting (Blum & Mitchell, 1998; Brefeld & Scheffer, 2004);
- random feature splitting (Nigam & Ghani, 2000; Chan et al., 2004);
- greedy algorithms (Abney, 2002; Zhang & Zheng, 2009).
10 Method
How can we use co-training on single-view data? Artificially create multiple views for co-training by solving a single optimization problem.
11 Three conditions for co-training to work
Condition 1: both classifiers are good. Make sure both classifiers suffer low loss by minimizing the maximum of the two losses:

    min_{u,v} max[ ℓ(u; L), ℓ(v; L) ].

Execution: softmax (log-sum-exp) approximation,

    min_{u,v} log( e^{ℓ(u;L)} + e^{ℓ(v;L)} ),

agnostic to the specific choice of loss, e.g. logistic regression:

    ℓ(u; L) = Σ_{(x,y)∈L} log( 1 + e^{−y u^T x} ).
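To make the smoothing concrete, here is a minimal numpy sketch of the log-sum-exp surrogate for the max of two logistic losses (illustrative only, not the authors' code; X is the labeled design matrix and y the ±1 labels):

```python
import numpy as np

def logistic_loss(w, X, y):
    """l(w; L) = sum over (x, y) in L of log(1 + exp(-y * w^T x))."""
    margins = y * (X @ w)
    return np.logaddexp(0.0, -margins).sum()   # numerically stable log(1 + e^-m)

def softmax_of_losses(u, v, X, y):
    """Smooth surrogate for max[l(u;L), l(v;L)]: log(e^{l(u)} + e^{l(v)})."""
    return np.logaddexp(logistic_loss(u, X, y), logistic_loss(v, X, y))
```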
12 Three conditions for co-training to work (cont.)
Condition 2: both classifiers use different features. Splitting the d features into views X^(1) (used by u) and X^(2) (used by v), make sure each feature is used by exactly one classifier:

    u_i · v_i = 0 for i = 1, ..., d.

Execution: square and sum over all features,

    Σ_{i=1}^d u_i^2 v_i^2 = 0.
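This constraint is cheap to evaluate; a one-line sketch of the penalty (again an illustration, not the paper's code):

```python
import numpy as np

def disjointness_penalty(u, v):
    """Sum_i u_i^2 v_i^2: zero exactly when every feature has nonzero
    weight in at most one of the two classifiers."""
    return np.sum(u**2 * v**2)

# disjointness_penalty(np.array([1., 0., 2.]), np.array([0., 3., 0.])) == 0.0
```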
13 Class-conditionally independent?
Class-conditional independence is stringent; it can be relaxed to ε-expansion (Balcan et al., 2004).
ε-expanding: with X = X_1 × X_2, a distribution D is ε-expanding with respect to the hypothesis class H_1 × H_2 if, for any two classifiers h_1 ∈ H_1, h_2 ∈ H_2,

    Pr(S_1 ∧ ¬S_2) + Pr(¬S_1 ∧ S_2) ≥ ε · min[ Pr(S_1 ∧ S_2), Pr(¬S_1 ∧ ¬S_2) ],

where S_1 denotes the event that a sample x = (x_1, x_2) ~ D falls into the confident set of h_1, and similarly for S_2. If the expansion assumption holds, co-training will succeed given appropriately strong PAC-learning algorithms on each feature set.
14 Three conditions for co-training to work (cont.)
Condition 3: both classifiers make different confident predictions. Require D to be ε-expanding w.r.t. h_u, h_v:

    Pr(S_u ∧ ¬S_v) + Pr(¬S_u ∧ S_v) ≥ ε · min[ Pr(S_u ∧ S_v), Pr(¬S_u ∧ ¬S_v) ].

Execution: define a confidence indicator

    c_u(x) = 1 if Pr(h_u(x) | x; u) > τ, and 0 otherwise,

and approximate ε-expandability empirically over the unlabeled set U:

    Σ_{x∈U} [ c_u(x)(1 − c_v(x)) + (1 − c_u(x)) c_v(x) ]
        ≥ ε · min[ Σ_{x∈U} c_u(x) c_v(x), Σ_{x∈U} (1 − c_u(x))(1 − c_v(x)) ],

where the left side counts inputs on which exactly one classifier is confident, and the two sums on the right count inputs on which both or neither is confident.
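A small numpy sketch of these empirical quantities for linear logistic classifiers (the threshold value τ and the sigmoid confidence model are my assumptions for illustration):

```python
import numpy as np

def confidence(w, X, tau=0.8):
    """c_w(x) = 1 if the predicted class's probability exceeds tau."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # P(y = +1 | x; w)
    return (np.maximum(p, 1.0 - p) > tau).astype(float)

def expansion_slack(u, v, X_u, eps=0.1, tau=0.8):
    """LHS - eps * RHS of the empirical expansion constraint (>= 0 if satisfied)."""
    cu, cv = confidence(u, X_u, tau), confidence(v, X_u, tau)
    exactly_one = np.sum(cu * (1 - cv) + (1 - cu) * cv)
    both, neither = np.sum(cu * cv), np.sum((1 - cu) * (1 - cv))
    return exactly_one - eps * min(both, neither)
```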
15 Pseudo multi-view decomposition (PMD)
Optimization problem:

    min_{u,v} log( e^{ℓ(u;L)} + e^{ℓ(v;L)} )

subject to:
    (1) Σ_{i=1}^d u_i^2 v_i^2 = 0;
    (2) Σ_{x∈U} [ c_u(x)(1 − c_v(x)) + (1 − c_u(x)) c_v(x) ]
            ≥ ε · min[ Σ_{x∈U} c_u(x) c_v(x), Σ_{x∈U} (1 − c_u(x))(1 − c_v(x)) ].

Optimized with the augmented Lagrangian method (Bertsekas, 1999).
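As a rough sketch of how an augmented Lagrangian could handle the equality constraint (the expansion inequality is omitted for brevity; the solver details here are my assumptions, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def pmd_solve(X_l, y_l, rho=10.0, outer=10):
    """Augmented-Lagrangian sketch for PMD, with only the constraint
    g(u, v) = sum_i u_i^2 v_i^2 = 0."""
    d = X_l.shape[1]
    w = 0.01 * np.random.randn(2 * d)      # stacked [u; v]
    lam = 0.0                              # multiplier estimate for g = 0
    for _ in range(outer):
        def objective(w):
            u, v = w[:d], w[d:]
            lu = np.logaddexp(0.0, -y_l * (X_l @ u)).sum()
            lv = np.logaddexp(0.0, -y_l * (X_l @ v)).sum()
            g = np.sum(u**2 * v**2)
            return np.logaddexp(lu, lv) + lam * g + 0.5 * rho * g**2
        w = minimize(objective, w, method="L-BFGS-B").x
        u, v = w[:d], w[d:]
        lam += rho * np.sum(u**2 * v**2)   # standard multiplier update
    return u, v
```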
16 Pseudo multi-view co-training (PMC)
Input: labeled set L, unlabeled set U.
Repeat:
1. Find u, v by solving PMD on L and U.
2. Form h_u(x) = sign(x^T u) and h_v(x) = sign(x^T v), and apply them to all elements of U.
3. While confident predictions exist, move up to l confident inputs from U to L and return to step 1.
Output the final classifier h_{u+v}(x).
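Putting the pieces together, a compact sketch of the PMC loop (it reuses the `confidence` and `pmd_solve` sketches above; the pseudo-labeling rule and the combined classifier h_{u+v}(x) = sign(x^T (u + v)) are my reading of the slides, not verified against the paper's code):

```python
import numpy as np

def pmc(X_l, y_l, X_u, rounds=20, grow=10, tau=0.8):
    """Pseudo multi-view co-training loop (sketch)."""
    for _ in range(rounds):
        u, v = pmd_solve(X_l, y_l)                 # decompose into two views
        cu, cv = confidence(u, X_u, tau), confidence(v, X_u, tau)
        conf = np.where((cu > 0) | (cv > 0))[0][:grow]
        if conf.size == 0:
            break
        # pseudo-label each confident input with the classifier confident on it
        pseudo = np.where(cu[conf] > 0,
                          np.sign(X_u[conf] @ u),
                          np.sign(X_u[conf] @ v))
        X_l = np.vstack([X_l, X_u[conf]])
        y_l = np.concatenate([y_l, pseudo])
        X_u = np.delete(X_u, conf, axis=0)
    return u + v                                    # h(x) = sign(x^T (u + v))
```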
17 Extension to multi-class settings
Condition 4: the same decomposition across different classes.
Multi-class: Y = {1, 2, ..., K}, with weight matrices U = [u_1, ..., u_K] ∈ R^{d×K} and V = [v_1, ..., v_K] ∈ R^{d×K} (d features, K classes).
Execution: add L_{2,1} regularization,

    min_{U,V} log( e^{ℓ(U;L)} + e^{ℓ(V;L)} ) + λ( ||U||_{2,1} + ||V||_{2,1} ),

where ||U||_{2,1} = Σ_{i=1}^d sqrt( Σ_{k=1}^K U_{ik}^2 ), subject to the same constraints as before. The L_{2,1} norm drives entire rows (features) of U or V to zero, so each feature is assigned to the same view for all K classes.
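The L_{2,1} norm itself is a one-liner; a small sketch (illustrative only):

```python
import numpy as np

def l21_norm(U):
    """||U||_{2,1}: sum over the d feature rows of each row's Euclidean norm.
    Penalizing it zeroes whole rows, i.e. removes a feature from all K classes."""
    return np.sqrt((U**2).sum(axis=1)).sum()
```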
18 Experimental results
- Toy dataset: paired handwritten digit set (class-conditionally independent views).
- Caltech-256 with weakly labeled web images.
19 Sanity check on PMD
Paired digit set: conditionally independent views. We solve PMD for u and v, starting from a random initialization; their non-zero weights are divided almost exactly into the two class-conditionally independent feature sets.
[Figure: learned weight vectors u* and v* by class label; both training losses are driven to essentially zero (ℓ(u*; X, y) ≈ 5.5e−11, ℓ(v*; X, y) ≈ 1.9e−11).]
20 Sanity check on PMC
Paired digit set: conditionally independent views. As the confident set expands, the automatically found feature splits vary between PMC iterations and gradually approximate the class-conditional feature split.
[Figure: evolution of the weight vectors u and v across PMC iterations.]
21 Experimental results of PMC on the paired digit set
[Table: mean and standard deviation of test error (%) for Baseline, RFS, ICA-RFS, and PMC. Figure: test error (%) vs. iteration for h_u, h_v, h_{u+v}, and the baseline.]
- Baseline: a logistic regression trained exclusively on L;
- RFS: co-training with a random feature split;
- ICA-RFS: co-training with a random feature split on independent components.
22 Exploit weakly labeled web data to improve object recognition
Caltech-256 target images vs. weakly labeled Bing images (American flag, Eiffel tower, beer mug, hawksbill).
23 Experimental results of PMC on Caltech-256
PMC outperforms all other algorithms by a visible margin across all training set sizes.
[Figure: accuracy (%) vs. number of target training images on Caltech-256 with weakly labeled web images; methods compared: McPMC, McLR_t, SVM_{t+s} (Bergamo), RFS, SVM_t (Bergamo), DWSVM (Bergamo), TSVM (Bergamo).]
24 Image re-ranking
PMC can potentially be used for image re-ranking in search engines.
[Figure: target training examples, with web images separated into positive and negative examples.]
25 Conclusions
- Introduced pseudo multi-view co-training (PMC), a framework for co-training on single-view data:
  - incorporated the three conditions for co-training to succeed explicitly into a single optimization problem;
  - solved the optimization problem to discover the feature decomposition;
  - combined it with co-training to exploit unlabeled data and improve performance.
- Demonstrated the efficacy of the method on the challenging Caltech-256 object recognition task.
- Showed potential for improving web search ranking.