Domain-Adversarial Neural Networks
|
|
- Shon Wells
- 5 years ago
- Views:
Transcription
1 Domain-Adversarial Neural Networks Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand Département d informatique et de génie logiciel, Université Laval, Québec, Canada Département d informatique, Université de Sherbrooke, Québec, Canada Groupe de recherche en apprentissage automatique de l Université Laval (GRAAL) December, Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
2 Outline Domain Adaptation Setting Theoretical Foundations Neural Network for Domain Adaptation Empirical Results Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
3 Our Domain Adaptation Setting Binary classification tasks Input space: R d Labels: {, } Two different data distributions Source domain: D S Target domain: D T A domain adaptation learning algorithm is provided with a labeled source sample S = {(x s i, y s i )} m i= (D S ) m, an unlabeled target sample T = {x t i } m i= (D T ) m. The goal is to build a classifier η : R d {, } with a low target risk R DT (η) def = Pr (x t,y t ) D T [η(x t ) y t ]. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
4 Divergence between source and target domains Definition (Ben David et al., 6) Given two domain distributions D S and D T, and a hypothesis class H, the H-divergence between D S and D T is def [ d H (D S, D T ) = sup Pr η(x s ) = ] [ Pr η(x t ) = ] η H x s D S x t D T. [ = sup Pr η(x s ) = ] [ + Pr η(x t ) = ] η H x s D S x t D T. The H-divergence measures the ability of an hypothesis class H to discriminate between source D S and target D T distributions. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
5 Bound on the target risk Theorem (Ben David et al., 6) Let H be a hypothesis class of VC-dimension d. With probability δ over the choice of samples S (D S ) m and T (D T ) m, for every η H: R DT (η) R S (η)+ m with β inf η H [R D S (η )+R DT (η )]. Empirical risk on the source sample: d log e m d + log δ +ˆd H (S, T )+ m d log m d + log δ +β R S (η) Empirical H-divergence: [ ˆd H (S, T ) def = max η H m def = m I [η(x s i ) y s i ]. i= I [η(x s i ) = ] + m i= ] I [η(x t i ) = ]. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 5 / i=
6 Bound on the target risk Theorem (Ben David et al., 6) Let H be a hypothesis class of VC-dimension d. With probability δ over the choice of samples S (D S ) m and T (D T ) m, for every η H: R DT (η) R S (η)+ m with β inf η H [R D S (η )+R DT (η )]. d log e m d + log δ +ˆd H (S, T )+ m d log m d + log δ +β Target risk R DT (η) is low if, given S and T, R S (η) is small, i.e., η H is good on and ˆd H(S, T ) is small, i.e., all η H are bad on Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 6 /
7 Standard Neural Network Let consider a neural network architecture with one hidden layer h(x) = sigm(b + Wx), and f(h(x)) = softmax(c + Vh(x)). min W,V,b,c [ m ( ) ] log y s i f(h(x s i )). i= } {{ } source loss Given a source sample S = {(x s i, y s i )}m i= (D S) m,. Pick a x s S. Update V towards f(h(x s )) = y s. Update W towards f(h(x s )) = y s f(h(x)) h(x)... h(x) j... x... x j... V x d h(x) n W The hidden layer learns a representation h( ) from which linear hypothesis f( ) can classify source examples. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 7 /
8 Domain-Adversarial Neural Network (DANN) Empirical H-divergence [ ˆd H (S, T ) def = max η H m I [η(x s i ) = ] + m i= ] I [η(x t i ) = ]. i= We estimate the H-divergence by a logistic regressor that model the probability that a given input (either x s or x t ) is from the source domain: o(h(x)) def = sigm(d + w h(x)). Given a representation output by the hidden layer h( ) : [ ) ˆd H (h(s), h(t ) max log ( o(h(x s i )) ) + log ( o(h(x t i )) ) ]. w,d m m i= i= Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 8 /
9 Domain-Adversarial Neural Network (DANN) min W,V,b,c [ m log ( y s i f(h(x s i )) ) ( +λ max i= } {{ } source loss w,d m log ( o(h(x s i )) ) + log ( o(h(x t m i )) )) ], i= i= } {{ } adaptation regularizer where λ > weights the domain adaptation regularization term. Given a source sample S = {(x s i, y s i )}m i= (D S) m, and a target sample T = {(x t i )}m i= (D T ) m,. Pick a x s S and x t T. Update V towards f(h(x s )) = y s. Update W towards f(h(x s )) = y s. Update w towards o(h(x s )) = and o(h(x t )) = 5. Update W towards o(h(x s )) = and o(h(x t )) = f(h(x)) DANN finds a representation h( ) that are good on S; but unable to discriminate between S and T. V h(x)... h(x) j... x... x j... o(h(x)) w h(x) n x d W Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 9 /
10 Toy Dataset Standard Neural Network (NN) f(h(x)) Trained to classify source Trained to classify domains V h(x)... h(x) j... h(x) n W x x j x d Domain-Adversarial Neural Networks (DANN) f(h(x)) o(h(x)) Classification output: f(h(x)) Domain output: o(h(x)) V w h(x)... h(x) j... h(x) n W x x j x d Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
11 Amazon Reviews Input: product review (bag of words) Output: positive or negative rating. Dataset DANN NN books dvd..99 books electronics.6.5 books kitchen..5 dvd books.7.6 dvd electronics.7.56 dvd kitchen.7.7 electronics books.8.8 electronics dvd.7.77 electronics kitchen.8.9 kitchen books.8.88 kitchen dvd.6.6 kitchen electronics.6.6 Note: We use a small labeled subset of target examples to select the hyperparameters. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
12 Marginalized Stacked Denoising Autoencoders (msda) Question Does DANN can be combined with other representation learning techniques for domain adaptation? The autoencoders msda (Chen et al. ) provides a new common representation for source and target (unsupervised) With msda+svm, Chen et al. () obtained state-of-the-art results on Amazon Reviews: Train a linear SVM on msda source representations. We try msda+dann: Train DANN on source representations and target representations. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
13 Amazon Reviews Input: product review (bag of words) Output: positive or negative rating. Dataset msda+dann msda+svm books dvd books electronics.97. books kitchen.69.7 dvd books dvd electronics.8. dvd kitchen.5.78 electronics books.7.9 electronics dvd.6.6 electronics kitchen.8.7 kitchen books.. kitchen dvd.8.9 kitchen electronics..8 Note: We use a small labeled subset of target examples to select the hyperparameters. The noise parameter of msda representations is fixed to 5%. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
14 Future Work Several paths to explore: Deeper neural networks architectures. Multiclass / Multilabels problems. Multisource domain adaptation. Thank you! Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /
Domain-Adversarial Neural Networks
Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada
More informationPAC-Bayesian Learning and Domain Adaptation
PAC-Bayesian Learning and Domain Adaptation Pascal Germain 1 François Laviolette 1 Amaury Habrard 2 Emilie Morvant 3 1 GRAAL Machine Learning Research Group Département d informatique et de génie logiciel
More informationGeneralization of the PAC-Bayesian Theory
Generalization of the PACBayesian Theory and Applications to SemiSupervised Learning Pascal Germain INRIA Paris (SIERRA Team) Modal Seminar INRIA Lille January 24, 2017 Dans la vie, l essentiel est de
More informationA Pseudo-Boolean Set Covering Machine
A Pseudo-Boolean Set Covering Machine Pascal Germain, Sébastien Giguère, Jean-Francis Roy, Brice Zirakiza, François Laviolette, and Claude-Guy Quimper GRAAL (Université Laval, Québec city) October 9, 2012
More informationA Pseudo-Boolean Set Covering Machine
A Pseudo-Boolean Set Covering Machine Pascal Germain, Sébastien Giguère, Jean-Francis Roy, Brice Zirakiza, François Laviolette, and Claude-Guy Quimper Département d informatique et de génie logiciel, Université
More informationFew-shot learning with KRR
Few-shot learning with KRR Prudencio Tossou Groupe de Recherche en Apprentissage Automatique Départment d informatique et de génie logiciel Université Laval April 6, 2018 Prudencio Tossou (UL) Few-shot
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationIFT Lecture 7 Elements of statistical learning theory
IFT 6085 - Lecture 7 Elements of statistical learning theory This version of the notes has not yet been thoroughly checked. Please report any bugs to the scribes or instructor. Scribe(s): Brady Neal and
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationWHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,
WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu
More informationLearning with multiple models. Boosting.
CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationPAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers
PAC-Bayes Ris Bounds for Sample-Compressed Gibbs Classifiers François Laviolette Francois.Laviolette@ift.ulaval.ca Mario Marchand Mario.Marchand@ift.ulaval.ca Département d informatique et de génie logiciel,
More informationMachine Learning. Linear Models. Fabio Vandin October 10, 2017
Machine Learning Linear Models Fabio Vandin October 10, 2017 1 Linear Predictors and Affine Functions Consider X = R d Affine functions: L d = {h w,b : w R d, b R} where ( d ) h w,b (x) = w, x + b = w
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationDenoising Autoencoders
Denoising Autoencoders Oliver Worm, Daniel Leinfelder 20.11.2013 Oliver Worm, Daniel Leinfelder Denoising Autoencoders 20.11.2013 1 / 11 Introduction Poor initialisation can lead to local minima 1986 -
More informationDeep Learning: a gentle introduction
Deep Learning: a gentle introduction Jamal Atif jamal.atif@dauphine.fr PSL, Université Paris-Dauphine, LAMSADE February 8, 206 Jamal Atif (Université Paris-Dauphine) Deep Learning February 8, 206 / Why
More informationLogistic Regression Trained with Different Loss Functions. Discussion
Logistic Regression Trained with Different Loss Functions Discussion CS640 Notations We restrict our discussions to the binary case. g(z) = g (z) = g(z) z h w (x) = g(wx) = + e z = g(z)( g(z)) + e wx =
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationarxiv: v2 [cs.lg] 27 Oct 2017
MULTIPLE SOURCE DOMAIN ADAPTATION WITH ADVERSARIAL LEARNING Multiple Source Domain Adaptation with Adversarial Learning arxiv:1705.09684v2 [cs.lg] 27 Oct 2017 Han Zhao Shanghang Zhang Guanhang Wu João
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationSample complexity bounds for differentially private learning
Sample complexity bounds for differentially private learning Kamalika Chaudhuri University of California, San Diego Daniel Hsu Microsoft Research Outline 1. Learning and privacy model 2. Our results: sample
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
More informationVariations sur la borne PAC-bayésienne
Variations sur la borne PAC-bayésienne Pascal Germain INRIA Paris Équipe SIRRA Séminaires du département d informatique et de génie logiciel Université Laval 11 juillet 2016 Pascal Germain INRIA/SIRRA
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationModel Averaging With Holdout Estimation of the Posterior Distribution
Model Averaging With Holdout stimation of the Posterior Distribution Alexandre Lacoste alexandre.lacoste.1@ulaval.ca François Laviolette francois.laviolette@ift.ulaval.ca Mario Marchand mario.marchand@ift.ulaval.ca
More informationMachine Learning Basics Lecture 7: Multiclass Classification. Princeton University COS 495 Instructor: Yingyu Liang
Machine Learning Basics Lecture 7: Multiclass Classification Princeton University COS 495 Instructor: Yingyu Liang Example: image classification indoor Indoor outdoor Example: image classification (multiclass)
More informationarxiv: v1 [cs.lg] 4 Feb 2014
arxiv:140796v1 [cs.lg] 4 Feb 2014 Alexandre Lacoste ALEXANDRE.LACOSTE.1@ULAVAL.CA Département d informatique et de génie logiciel, Université Laval, Québec, Canada, G1K-7P4 Hugo Larochelle HUGO.LAROCHELLE@USHERBROOKE.CA
More informationTowards a Data-driven Approach to Exploring Galaxy Evolution via Generative Adversarial Networks
Towards a Data-driven Approach to Exploring Galaxy Evolution via Generative Adversarial Networks Tian Li tian.li@pku.edu.cn EECS, Peking University Abstract Since laboratory experiments for exploring astrophysical
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationTwo transfer learning approaches for domain adaptation
Two transfer learning approaches for domain adaptation Amaury Habrard amaury.habrard@univ-st-etienne.fr Laboratoire Hubert Curien, UMR CNRS 5516 University Jean Monnet - Saint-Étienne (France) Seminar,
More informationFrom PAC-Bayes Bounds to Quadratic Programs for Majority Votes
François Laviolette FrancoisLaviolette@iftulavalca Mario Marchand MarioMarchand@iftulavalca Jean-Francis Roy Jean-FrancisRoy1@ulavalca Département d informatique et de génie logiciel, Université Laval,
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationDomain Adaptation of Majority Votes via Perturbed Variation-based Label Transfer
Domain Adaptation of Majority Votes via Perturbed Variation-based Label Transfer Emilie Morvant To cite this version: Emilie Morvant. Domain Adaptation of Majority Votes via Perturbed Variation-based Label
More informationLa théorie PAC-Bayes en apprentissage supervisé
La théorie PAC-Bayes en apprentissage supervisé Présentation au LRI de l université Paris XI François Laviolette, Laboratoire du GRAAL, Université Laval, Québec, Canada 14 dcembre 2010 Summary Aujourd
More informationMachine Learning Basics
Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationThe sample complexity of agnostic learning with deterministic labels
The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College
More informationGeneralization Bounds in Machine Learning. Presented by: Afshin Rostamizadeh
Generalization Bounds in Machine Learning Presented by: Afshin Rostamizadeh Outline Introduction to generalization bounds. Examples: VC-bounds Covering Number bounds Rademacher bounds Stability bounds
More informationMachine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Computational Learning Theory Le Song Lecture 11, September 20, 2012 Based on Slides from Eric Xing, CMU Reading: Chap. 7 T.M book 1 Complexity of Learning
More informationMaximum Margin Interval Trees
Maximum Margin Interval Trees Alexandre Drouin Département d informatique et de génie logiciel Université Laval, Québec, Canada alexandre.drouin.8@ulaval.ca Toby Dylan Hocking McGill Genome Center McGill
More informationarxiv: v1 [stat.ml] 3 Nov 2015
Under review as a conference paper at ICLR 6 THE VARIATIONAL FAIR AUTO ENCODER Christos Louizos, Kevin Swersky, Yujia Li, Ma Welling, Richard Zemel Machine Learning Group, University of Amsterdam Department
More informationSTATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION
STATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION Tong Zhang The Annals of Statistics, 2004 Outline Motivation Approximation error under convex risk minimization
More informationMIRA, SVM, k-nn. Lirong Xia
MIRA, SVM, k-nn Lirong Xia Linear Classifiers (perceptrons) Inputs are feature values Each feature has a weight Sum is the activation activation w If the activation is: Positive: output +1 Negative, output
More informationCS 188: Artificial Intelligence. Outline
CS 188: Artificial Intelligence Lecture 21: Perceptrons Pieter Abbeel UC Berkeley Many slides adapted from Dan Klein. Outline Generative vs. Discriminative Binary Linear Classifiers Perceptron Multi-class
More informationFeature Design. Feature Design. Feature Design. & Deep Learning
Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationAdaBoost. Lecturer: Authors: Center for Machine Perception Czech Technical University, Prague
AdaBoost Lecturer: Jan Šochman Authors: Jan Šochman, Jiří Matas Center for Machine Perception Czech Technical University, Prague http://cmp.felk.cvut.cz Motivation Presentation 2/17 AdaBoost with trees
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationSelective Prediction. Binary classifications. Rong Zhou November 8, 2017
Selective Prediction Binary classifications Rong Zhou November 8, 2017 Table of contents 1. What are selective classifiers? 2. The Realizable Setting 3. The Noisy Setting 1 What are selective classifiers?
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationSparse Domain Adaptation in a Good Similarity-Based Projection Space
Sparse Domain Adaptation in a Good Similarity-Based Projection Space Emilie Morvant, Amaury Habrard, Stéphane Ayache To cite this version: Emilie Morvant, Amaury Habrard, Stéphane Ayache. Sparse Domain
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationGreedy Biomarker Discovery in the Genome with Applications to Antibiotic Resistance
Greedy Biomarker Discovery in the Genome with Applications to Antibiotic Resistance Alexandre Drouin, Sébastien Giguère, Maxime Déraspe, François Laviolette, Mario Marchand, Jacques Corbeil Department
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 3: Linear Models I (LFD 3.2, 3.3) Cho-Jui Hsieh UC Davis Jan 17, 2018 Linear Regression (LFD 3.2) Regression Classification: Customer record Yes/No Regression: predicting
More informationTowards Lifelong Machine Learning Multi-Task and Lifelong Learning with Unlabeled Tasks Christoph Lampert
Towards Lifelong Machine Learning Multi-Task and Lifelong Learning with Unlabeled Tasks Christoph Lampert HSE Computer Science Colloquium September 6, 2016 IST Austria (Institute of Science and Technology
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationCS7267 MACHINE LEARNING
CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning
More informationTUTORIAL PART 1 Unsupervised Learning
TUTORIAL PART 1 Unsupervised Learning Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew
More informationLinear classifiers Lecture 3
Linear classifiers Lecture 3 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin ML Methodology Data: labeled instances, e.g. emails marked spam/ham
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationManual for a computer class in ML
Manual for a computer class in ML November 3, 2015 Abstract This document describes a tour of Machine Learning (ML) techniques using tools in MATLAB. We point to the standard implementations, give example
More informationRestricted Boltzmann Machines
Restricted Boltzmann Machines http://deeplearning4.org/rbm-mnist-tutorial.html Slides from Hugo Larochelle, Geoffrey Hinton, and Yoshua Bengio CSC321: Intro to Machine Learning and Neural Networks, Winter
More informationLogistic Regression. Machine Learning Fall 2018
Logistic Regression Machine Learning Fall 2018 1 Where are e? We have seen the folloing ideas Linear models Learning as loss minimization Bayesian learning criteria (MAP and MLE estimation) The Naïve Bayes
More informationMachine Learning and Related Disciplines
Machine Learning and Related Disciplines The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 8-12 (Mon.-Fri.) Yung-Kyun Noh Machine Learning Interdisciplinary
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationComputational Learning Theory. CS534 - Machine Learning
Computational Learning Theory CS534 Machine Learning Introduction Computational learning theory Provides a theoretical analysis of learning Shows when a learning algorithm can be expected to succeed Shows
More informationNumerical Learning Algorithms
Numerical Learning Algorithms Example SVM for Separable Examples.......................... Example SVM for Nonseparable Examples....................... 4 Example Gaussian Kernel SVM...............................
More informationBagging and Other Ensemble Methods
Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationThe PAC Learning Framework -II
The PAC Learning Framework -II Prof. Dan A. Simovici UMB 1 / 1 Outline 1 Finite Hypothesis Space - The Inconsistent Case 2 Deterministic versus stochastic scenario 3 Bayes Error and Noise 2 / 1 Outline
More informationarxiv: v1 [stat.ml] 17 Jul 2017
PACBayes and Domain Adaptation arxiv:1707.05712v1 [stat.ml] 17 Jul 2017 Pascal Germain pascal.germain@inria.fr Département d informatique de l ENS, École normale supérieure, CNRS, PSL Research University,
More informationUnsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA
Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA with Hiroshi Morioka Dept of Computer Science University of Helsinki, Finland Facebook AI Summit, 13th June 2016 Abstract
More informationPAC Learning Introduction to Machine Learning. Matt Gormley Lecture 14 March 5, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University PAC Learning Matt Gormley Lecture 14 March 5, 2018 1 ML Big Picture Learning Paradigms:
More informationReducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning
More informationActive Learning and Optimized Information Gathering
Active Learning and Optimized Information Gathering Lecture 7 Learning Theory CS 101.2 Andreas Krause Announcements Project proposal: Due tomorrow 1/27 Homework 1: Due Thursday 1/29 Any time is ok. Office
More informationLecture 14: Deep Generative Learning
Generative Modeling CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 14: Deep Generative Learning Density estimation Reconstructing probability density function using samples Bohyung Han
More informationClassification objectives COMS 4771
Classification objectives COMS 4771 1. Recap: binary classification Scoring functions Consider binary classification problems with Y = { 1, +1}. 1 / 22 Scoring functions Consider binary classification
More informationCS446: Machine Learning Fall Final Exam. December 6 th, 2016
CS446: Machine Learning Fall 2016 Final Exam December 6 th, 2016 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationHow to do backpropagation in a brain
How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep
More informationEnsembles. Léon Bottou COS 424 4/8/2010
Ensembles Léon Bottou COS 424 4/8/2010 Readings T. G. Dietterich (2000) Ensemble Methods in Machine Learning. R. E. Schapire (2003): The Boosting Approach to Machine Learning. Sections 1,2,3,4,6. Léon
More informationThe Perceptron algorithm
The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following
More information10-701/ Machine Learning - Midterm Exam, Fall 2010
10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam
More informationRisk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm
Journal of Machine Learning Research 16 2015 787-860 Submitted 5/13; Revised 9/14; Published 4/15 Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm Pascal Germain
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationStatistical and Computational Learning Theory
Statistical and Computational Learning Theory Fundamental Question: Predict Error Rates Given: Find: The space H of hypotheses The number and distribution of the training examples S The complexity of the
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationPAC-Bayesian Generalization Bound for Multi-class Learning
PAC-Bayesian Generalization Bound for Multi-class Learning Loubna BENABBOU Department of Industrial Engineering Ecole Mohammadia d Ingènieurs Mohammed V University in Rabat, Morocco Benabbou@emi.ac.ma
More informationEmpirical Risk Minimization
Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space
More information