Two transfer learning approaches for domain adaptation


1 Two transfer learning approaches for domain adaptation
Amaury Habrard
Laboratoire Hubert Curien, UMR CNRS 5516
University Jean Monnet - Saint-Étienne (France)
Seminar, University of Alicante, 15/05/2014

2 Introduction and Motivation
When do we need Domain Adaptation (DA)? When the learning distribution is different from the testing distribution.
An example of a DA task: we have labeled images from a Web image corpus; is there a person in unlabeled images from a video corpus?
How can we learn, from one distribution, a low-error classifier on another distribution?

3 Outline
1 Domain Adaptation - Introduction
2 What the theory says
3 Unsupervised Visual Domain Adaptation Using Subspace Alignment - ICCV 2013 (joint work with B. Fernando and T. Tuytelaars (K.U. Leuven), and M. Sebban)
4 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - ICML 2013 (joint work with P. Germain and F. Laviolette (U. Laval, Canada), and E. Morvant (LIF, Marseille, now post-doc at IST Austria))
5 Conclusion

4 Outline
1 Domain Adaptation - Introduction
2 What the theory says
3 Unsupervised Visual Domain Adaptation Using Subspace Alignment - ICCV 2013 (joint work with B. Fernando and T. Tuytelaars (K.U. Leuven), and M. Sebban)
4 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - ICML 2013 (joint work with P. Germain and F. Laviolette (U. Laval, Canada), and E. Morvant (LIF, Marseille, now post-doc at IST Austria))
5 Conclusion

5 Traditional Machine Learning vs. Transfer Learning
Intuition and motivation from a computer vision perspective (I'm not an expert).
(Figures adapted from Pan: in traditional machine learning, training and test data are from the same domain; in transfer learning, training and test data are from different domains.)
Can we train classifiers with Flickr photos, as they have already been collected and annotated, and hope the classifiers still work well on mobile camera images? No [Gong et al., CVPR 2012].
Object classifiers optimized on a benchmark dataset often exhibit significant degradation in recognition accuracy when evaluated on another one [Gong et al., ICML 2013; Torralba et al., CVPR 2011; Perronnin et al., CVPR 2010].
Important topic in many areas: tutorials at ICML 2010, CVPR 2012, Interspeech 2012; workshops at ICCV 2013, NIPS. Sessions dedicated to transfer/domain adaptation in many top conferences. Hot topic.

6 Hard to predict what will change in the new domain
Dilemma: it is hard to predict what will change in the new domain.
(Figure: example domain shifts - high quality vs. low quality, daylight vs. sunset, posed photos, art, surveillance images.)

7 Solution: work with data representations in feature space!
(Figure: images from different domains, e.g. a digital SLR vs. a webcam, come with different dimensions and representations.)

8 Problems in NLP - sentiment analysis
Example (figure labels translated from French): a learning algorithm is trained on reviews of one product type (e.g. movie reviews labeled +1/-1) and outputs a classifier; this classifier must then predict the sentiment of reviews of another type (e.g. book reviews), whose labels are unknown.

9 Task and notations
Binary classification task: $X$ input space; $Y = \{-1, +1\}$ output space.
Supervised classification:
- $P_S$ source domain: distribution over $X \times Y$; $D_S$ marginal distribution over $X$
- $S = \{(x_i^s, y_i^s)\}_{i=1}^{m_s} \sim (P_S)^{m_s}$ a labeled source sample
- Objective: find a classifier $h \in H$ with a low source error $R_{P_S}(h) = \mathbb{E}_{(x^s, y^s) \sim P_S}\, \mathrm{I}\left[ h(x^s) \neq y^s \right]$
(Figure: supervised classification learns a model from a labeled sample drawn from $P_S$.)

10 Task and notations
Binary classification task: $X$ input space; $Y = \{-1, +1\}$ output space.
Supervised classification:
- $P_S$ source domain: distribution over $X \times Y$; $D_S$ marginal distribution over $X$
- $S = \{(x_i^s, y_i^s)\}_{i=1}^{m_s} \sim (P_S)^{m_s}$ a labeled source sample
- Objective: find a classifier $h \in H$ with a low source error $R_{P_S}(h) = \mathbb{E}_{(x^s, y^s) \sim P_S}\, \mathrm{I}\left[ h(x^s) \neq y^s \right]$
Domain adaptation:
- $P_T$ target domain: distribution over $X \times Y$; $D_T$ marginal distribution over $X$
- $T = \{x_j^t\}_{j=1}^{m_t} \sim (D_T)^{m_t}$ an unlabeled target sample
- Objective: find a classifier $h \in H$ with a low target error $R_{P_T}(h) = \mathbb{E}_{(x^t, y^t) \sim P_T}\, \mathrm{I}\left[ h(x^t) \neq y^t \right]$
(Figure: the learning model uses the labeled sample from $P_S$ and the unlabeled sample from $D_T$, whose distribution differs from $P_S$.)

11 Outline
1 Domain Adaptation - Introduction
2 What the theory says
3 Unsupervised Visual Domain Adaptation Using Subspace Alignment - ICCV 2013 (joint work with B. Fernando and T. Tuytelaars (K.U. Leuven), and M. Sebban)
4 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - ICML 2013 (joint work with P. Germain and F. Laviolette (U. Laval, Canada), and E. Morvant (LIF, Marseille, now post-doc at IST Austria))
5 Conclusion

12 A first insight
$R_{P_T}(h) = \mathbb{E}_{(x^t, y^t) \sim P_T}\, \mathrm{I}\left[ h(x^t) \neq y^t \right]
= \sum_{(x^t, y^t)} P_T(x^t, y^t)\, \frac{P_S(x^t, y^t)}{P_S(x^t, y^t)}\, \mathrm{I}\left[ h(x^t) \neq y^t \right]
= \mathbb{E}_{(x^t, y^t) \sim P_S}\, \frac{P_T(x^t, y^t)}{P_S(x^t, y^t)}\, \mathrm{I}\left[ h(x^t) \neq y^t \right]$
If the tasks are similar, i.e. $P_S(y^t \mid x^t) = P_T(y^t \mid x^t)$ (covariate shift assumption), then
$R_{P_T}(h) = \mathbb{E}_{(x^t, y^t) \sim P_S}\, \frac{D_T(x^t)\, P_T(y^t \mid x^t)}{D_S(x^t)\, P_S(y^t \mid x^t)}\, \mathrm{I}\left[ h(x^t) \neq y^t \right]
= \mathbb{E}_{(x^t, y^t) \sim P_S}\, \frac{D_T(x^t)}{D_S(x^t)}\, \mathrm{I}\left[ h(x^t) \neq y^t \right]$
Idea: learn an estimate of $\frac{D_T(x^t)}{D_S(x^t)}$, then learn a classifier on the reweighted source data, but:
- the tasks are not similar in general;
- this analysis does not take into account the hypothesis space considered.
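As a concrete illustration of this reweighting idea (my sketch, not part of the original slides): the ratio $D_T(x)/D_S(x)$ can be estimated with a logistic-regression domain discriminator, and a source classifier can then be trained with those importance weights. It assumes scikit-learn and hypothetical arrays Xs, ys (labeled source) and Xt (unlabeled target).

import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_density_ratio(Xs, Xt):
    # Train a discriminator to separate source (label 0) from target (label 1);
    # by Bayes' rule, p(target|x)/p(source|x) is proportional to D_T(x)/D_S(x).
    X = np.vstack([Xs, Xt])
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
    disc = LogisticRegression(max_iter=1000).fit(X, d)
    p = disc.predict_proba(Xs)[:, 1]               # p(target | x) on source points
    return (p / (1.0 - p)) * (len(Xs) / len(Xt))   # correct for sample-size imbalance

def covariate_shift_classifier(Xs, ys, Xt):
    # Importance-weighted source learning under the covariate shift assumption.
    w = estimate_density_ratio(Xs, Xt)
    return LogisticRegression(max_iter=1000).fit(Xs, ys, sample_weight=w)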

13 Domain Adaptation Theory - Necessity of a Domain Divergence
Labeled sample $S = \{(x_i^s, y_i^s)\}_{i=1}^{m_s}$ drawn i.i.d. from $P_S$; unlabeled sample $T = \{x_i^t\}_{i=1}^{m_t}$ drawn i.i.d. from $D_T$.
If $h$ is learned from the source domain, how does it perform on the target domain?
If the domains are close, then a low source error classifier could be a low target error classifier.

14 Domain Adaptation Theory - S. Ben-David et al.'s Result
Theorem [Ben-David et al., MLJ 10, NIPS 06]. Let $H$ be a symmetric hypothesis space. If $D_S$ and $D_T$ are respectively the marginal distributions of source and target instances, then for all $\delta \in (0, 1]$, with probability at least $1-\delta$:
$\forall h \in H, \quad R_{P_T}(h) \le R_{P_S}(h) + \frac{1}{2}\, d_{H \Delta H}(D_S, D_T) + \nu$

15 Domain Adaptation Theory - S. Ben-David et al.'s Result
Theorem [Ben-David et al., MLJ 10, NIPS 06]. Let $H$ be a symmetric hypothesis space. If $D_S$ and $D_T$ are respectively the marginal distributions of source and target instances, then for all $\delta \in (0, 1]$, with probability at least $1-\delta$:
$\forall h \in H, \quad R_{P_T}(h) \le R_{P_S}(h) + \frac{1}{2}\, d_{H \Delta H}(D_S, D_T) + \nu$
$R_{P_S}(h)$: classical expected error on the source domain.
$d_{H \Delta H}(D_S, D_T)$: the $H \Delta H$-divergence,
$\frac{1}{2}\, d_{H \Delta H}(D_S, D_T) = \sup_{(h, h') \in H^2} \left| R_{D_T}(h, h') - R_{D_S}(h, h') \right| = \sup_{(h, h') \in H^2} \left| \mathbb{E}_{x^t \sim D_T}\, \mathrm{I}\left[ h(x^t) \neq h'(x^t) \right] - \mathbb{E}_{x^s \sim D_S}\, \mathrm{I}\left[ h(x^s) \neq h'(x^s) \right] \right|$
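To make the divergence term concrete (my illustration, not from the slides): a common empirical proxy trains a classifier to distinguish source from target samples and converts its cross-validated error into the "proxy A-distance"; a small value suggests the two marginal distributions are hard to tell apart. The sketch assumes scikit-learn and hypothetical arrays Xs, Xt.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def proxy_a_distance(Xs, Xt, folds=5):
    # Proxy A-distance: 2 * (1 - 2 * err), where err is the cross-validated error
    # of a classifier trained to separate source instances from target instances.
    X = np.vstack([Xs, Xt])
    d = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
    acc = cross_val_score(LinearSVC(dual=False), X, d, cv=folds).mean()
    err = 1.0 - acc
    return 2.0 * (1.0 - 2.0 * err)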

16 Domain Adaptation Theory - S. Ben-David et al.'s Result
Theorem [Ben-David et al., MLJ 10, NIPS 06]. Let $H$ be a symmetric hypothesis space. If $D_S$ and $D_T$ are respectively the marginal distributions of source and target instances, then for all $\delta \in (0, 1]$, with probability at least $1-\delta$:
$\forall h \in H, \quad R_{P_T}(h) \le R_{P_S}(h) + \frac{1}{2}\, d_{H \Delta H}(D_S, D_T) + \nu$
$\nu = \inf_{h^* \in H} \left[ R_{P_S}(h^*) + R_{P_T}(h^*) \right]$: error of the joint optimal classifier;
or $\nu = R_{P_T}(h_T^*) + R_{P_T}(h_T^*, h_S^*)$, where $h_D^*$ is the best hypothesis on domain $D$ [Mohri et al., 2009].
(Figure: illustration of situations with small $\nu$ and large $\nu$.)

17 Domain Adaptation Theory - S. Ben-David et al.'s Result
Theorem [Ben-David et al., MLJ 10, NIPS 06]. Let $H$ be a symmetric hypothesis space. If $D_S$ and $D_T$ are respectively the marginal distributions of source and target instances, then for all $\delta \in (0, 1]$, with probability at least $1-\delta$:
$\forall h \in H, \quad R_{P_T}(h) \le R_{P_S}(h) + \frac{1}{2}\, d_{H \Delta H}(D_S, D_T) + \nu$
Idea: minimize the bound, e.g. via reweighting methods, a new projection space, or a new feature-based representation.

18 Illustration of the main methods
(Figure: map of the main DA method families - instance weighting (sample bias, covariate shift); feature representation (domain-invariant features, latent features, metric learning, subspace alignment, latent pattern mining); iterative models (EM-based methods, self-training, boosting-based models); supporting theory (PAC-Bayesian theory, statistical learning theory).)

19 Outline
1 Domain Adaptation - Introduction
2 What the theory says
3 Unsupervised Visual Domain Adaptation Using Subspace Alignment - ICCV 2013 (joint work with B. Fernando and T. Tuytelaars (K.U. Leuven), and M. Sebban)
4 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - ICML 2013 (joint work with P. Germain and F. Laviolette (U. Laval, Canada), and E. Morvant (LIF, Marseille, now post-doc at IST Austria))
5 Conclusion

20 Related work - look for intermediate representations
[Gopalan et al., ICCV 11]: project the data into a common subspace on the geodesic path between the source and target subspaces.

21 An algorithmic approach - Align the two subspaces - Principle
Very simple method, totally unsupervised.

22 Algorithm
Algorithm 1: Subspace alignment DA algorithm
Data: source data S, target data T, source labels L_S, subspace dimension d. Result: predicted target labels L_T.
1. $X_S \leftarrow \mathrm{PCA}(S, d)$ (source subspace defined by the first d eigenvectors)
2. $X_T \leftarrow \mathrm{PCA}(T, d)$ (target subspace defined by the first d eigenvectors)
3. $X_a \leftarrow X_S X_S^\top X_T$ (operator for aligning the source subspace to the target one)
4. $S_a = S X_a$ (new source data in the aligned space)
5. $T_T = T X_T$ (new target data in the aligned space)
6. $L_T \leftarrow \mathrm{Classifier}(S_a, T_T, L_S)$
The term $M = X_S^\top X_T$ corresponds to the subspace alignment matrix: $M = \mathrm{argmin}_M \left\| X_S M - X_T \right\|_F^2$.
$X_a = X_S X_S^\top X_T = X_S M$ projects the source data onto the target subspace.
A natural similarity: $\mathrm{Sim}(x_s, x_t) = x_s X_S M X_T^\top x_t^\top = x_s A x_t^\top$.
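A compact reference implementation of this algorithm (my sketch, assuming scikit-learn; Xs, ys, Xt are hypothetical source features, source labels and target features):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

def subspace_alignment_predict(Xs, ys, Xt, d=20):
    # Source and target subspaces: the first d PCA eigenvectors (columns of X_S, X_T).
    XS = PCA(n_components=d).fit(Xs).components_.T   # shape (D, d)
    XT = PCA(n_components=d).fit(Xt).components_.T   # shape (D, d)
    M = XS.T @ XT                                    # alignment matrix, argmin_M ||X_S M - X_T||_F
    Xa = XS @ M                                      # aligns the source axes with the target subspace
    Sa = Xs @ Xa                                     # source data in the aligned space
    Tt = Xt @ XT                                     # target data in its own subspace
    clf = KNeighborsClassifier(n_neighbors=1).fit(Sa, ys)
    return clf.predict(Tt)                           # predicted target labels L_T

The 1-NN classifier mirrors the nearest-neighbour evaluation used on the slides; any classifier taking S_a and L_S could be plugged in instead.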

23 Some results
Adaptation on the Office/Caltech-10 datasets (four domains selected to adapt) and on the ImageNet, LabelMe and Caltech-256 datasets (one used as source and one as target).
(Figure 1: classifying ImageNet images using Caltech-256 images as the source domain; the first row shows an ImageNet query image, the second row the nearest-neighbour image selected by our method.)
Comparisons:
- Baseline 1: projection on the source subspace
- Baseline 2: projection on the target subspace
- Two related methods: GFK [Gong et al., CVPR 12] and GFS [Gopalan et al., ICCV 11]
We use $\mathrm{Sim}(y_S, y_T)$ directly to perform a k-nearest-neighbour classification task (A encodes the relative contributions of the different components of the vectors in their original space). On the other hand, since $\mathrm{Sim}(y_S, y_T)$ is not PSD, we cannot make use of it to learn an SVM directly.

24 Result tables
(Tables 2-5: recognition accuracy with unsupervised DA using a NN classifier and an SVM classifier, on the Office/Caltech-10 datasets (domains A, C, D, W) and on the ImageNet (I), LabelMe (L) and Caltech-256 (C) datasets, comparing no adaptation (NA), Baseline 1, Baseline 2, GFS [8], GFK [7] and our method. Table 1: several distribution discrepancy measures (TDAS, HΔH) averaged over the DA problems on the Office dataset. Figure 2: finding a stable solution and a subspace dimensionality using the consistency theorem.)
Compared to the other baselines, our method obtains the highest TDAS value and the lowest HΔH measure; both GFK and our method have lower HΔH values, meaning that these methods are more likely to perform well.
The results for the 12 DA problems in the unsupervised setting using a NN classifier are shown in Table 2: in 9 out of the 12 DA problems our method outperforms the other ones. The results obtained in the semi-supervised DA setting (see supplementary material) confirm this behavior; there our method outperforms the others in 10 DA problems. The results obtained with an SVM classifier in the unsupervised DA case are shown in Table 3, where our method again outperforms the others.

25 Unsupervised Subspace Alignment - Conclusion
- Very simple and intuitive method
- Totally unsupervised
- Theoretical results for dimensionality detection
- Good results on computer vision datasets
- Can be combined with supervised information (future work)
Subspace alignment offers theoretical and practical perspectives.

26 Outline
1 Domain Adaptation - Introduction
2 What the theory says
3 Unsupervised Visual Domain Adaptation Using Subspace Alignment - ICCV 2013 (joint work with B. Fernando and T. Tuytelaars (K.U. Leuven), and M. Sebban)
4 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - ICML 2013 (joint work with P. Germain and F. Laviolette (U. Laval, Canada), and E. Morvant (LIF, Marseille, now post-doc at IST Austria))
5 Conclusion

27 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - PAC-Bayesian Theory (1/3)
Objective: offer generalization guarantees for majority-vote classifiers (Bayesian inference, boosting, ...), especially for the ρ-weighted majority vote
$B_\rho(x) = \mathrm{sign}\left[ \mathbb{E}_{h \sim \rho}\, h(x) \right]$
where ρ is the posterior distribution over $H$, learned from the prior distribution π over $H$, such that $R_{P_S}(B_\rho)$ is as small as possible.
We work with the Gibbs classifier: $R_{P_S}(G_\rho) = \mathbb{E}_{h \sim \rho}\, R_{P_S}(h)$, and we have $R_{P_S}(B_\rho) \le 2\, R_{P_S}(G_\rho)$.
Generalization bound:
$R_{P_S}(G_\rho) \le R_S(G_\rho) + \sqrt{ \frac{1}{2m} \left[ \mathrm{KL}(\rho \| \pi) + \ln \frac{8 \sqrt{m}}{\delta} \right] }$
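To fix ideas (my illustration, not from the slides), here is a tiny numpy sketch of a ρ-weighted majority vote and of the empirical Gibbs risk over a finite set of hypotheses; hs is a hypothetical list of ±1 classifiers and rho their weights.

import numpy as np

def majority_vote(hs, rho, X):
    # B_rho(x) = sign( sum_h rho(h) * h(x) ) for a finite hypothesis set.
    votes = np.array([h(X) for h in hs])          # shape (n_hyp, n_samples), values in {-1, +1}
    return np.sign(rho @ votes)

def gibbs_risk(hs, rho, X, y):
    # R(G_rho) = E_{h~rho} R(h): rho-weighted average of the individual 0/1 errors.
    errors = np.array([np.mean(h(X) != y) for h in hs])
    return float(rho @ errors)

# Example with two decision stumps on 2-d data (hypothetical):
# hs = [lambda X: np.where(X[:, 0] > 0, 1, -1), lambda X: np.where(X[:, 1] > 0, 1, -1)]
# rho = np.array([0.7, 0.3])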

28 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - A Domain Divergence suitable for PAC-Bayes
Definition. Let $H$ be a hypothesis class. For any marginal distributions $D_S$ and $D_T$ over $X$ and any distribution ρ on $H$, the domain disagreement $\mathrm{dis}_\rho(D_S, D_T)$ between $D_S$ and $D_T$ is
$\mathrm{dis}_\rho(D_S, D_T) = \left| \mathbb{E}_{(h, h') \sim \rho^2} \left[ R_{D_T}(h, h') - R_{D_S}(h, h') \right] \right|$
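For a finite hypothesis set, this disagreement can be estimated directly from unlabeled samples; a minimal numpy sketch (mine, under the same hypothetical hs/rho conventions as above, with Xs and Xt unlabeled source and target samples):

import numpy as np

def domain_disagreement(hs, rho, Xs, Xt):
    # dis_rho(D_S, D_T) ~ | E_{(h,h')~rho^2} [ R_T(h,h') - R_S(h,h') ] |, estimated on samples.
    Vs = np.array([h(Xs) for h in hs])                        # (n_hyp, n_s), values in {-1, +1}
    Vt = np.array([h(Xt) for h in hs])                        # (n_hyp, n_t)
    Ds = np.mean(Vs[:, None, :] != Vs[None, :, :], axis=2)    # matrix of pairwise R_S(h, h')
    Dt = np.mean(Vt[:, None, :] != Vt[None, :, :], axis=2)    # matrix of pairwise R_T(h, h')
    return abs(rho @ (Dt - Ds) @ rho)                         # rho^2-weighted average difference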

29 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Domain Adaptation Bound for the Gibbs Classifier
Theorem. Let $H$ be a hypothesis class. For every distribution ρ on $H$, we have
$R_{P_T}(G_\rho) \le R_{P_S}(G_\rho) + \mathrm{dis}_\rho(D_S, D_T) + \nu_\rho$
where $\rho_T^* = \mathrm{argmin}_\rho R_{P_T}(G_\rho)$ is the best target posterior and
$\nu_\rho = R_{P_T}(G_{\rho_T^*}) + \mathbb{E}_{h \sim \rho}\, \mathbb{E}_{h' \sim \rho_T^*} \left[ R_{D_T}(h, h') + R_{D_S}(h, h') \right]$
Comparison between $\mathrm{dis}_\rho(D_S, D_T)$ and $\frac{1}{2} d_{H \Delta H}(D_S, D_T)$:
- $\frac{1}{2} d_{H \Delta H}(D_S, D_T)$ is a worst-case divergence
- $\mathrm{dis}_\rho(D_S, D_T)$ is specific to the considered $G_\rho$
- we have $\mathrm{dis}_\rho(D_S, D_T) \le \frac{1}{2} d_{H \Delta H}(D_S, D_T)$
One solution for PAC-Bayesian DA: jointly minimize $R_{P_S}(G_\rho)$ and $\mathrm{dis}_\rho(D_S, D_T)$, with theoretical justification.

30 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Domain Adaptation Bound for the Gibbs Classifier: Consistency Bound
Theorem (PAC-Bayesian generalization bound, McAllester's style). For any domains $P_S$ and $P_T$ (respectively with marginals $D_S$ and $D_T$) over $X \times Y$, for any set $H$ of hypotheses, for any prior distribution π over $H$ and any $\delta \in (0, 1]$, with probability at least $1-\delta$ over the choice of $S_1 \sim (P_S)^{m_1}$, $S_2 \sim (D_S)^{m_2}$ and $T \sim (D_T)^{m'}$, for every ρ over $H$ we have
$R_{P_T}(G_\rho) \le R_S(G_\rho) + \mathrm{dis}_\rho(S, T) + \nu_\rho + 3 \sqrt{ \frac{1}{2 m^*} \left[ \mathrm{KL}(\rho \| \pi) + \ln \frac{8 \sqrt{m^*}}{\delta} \right] }$
where $m^* = \max\{m_1, m_2, m'\}$ and $\nu_\rho = R_{P_T}(G_{\rho_T^*}) + R_{D_T}(G_\rho, G_{\rho_T^*}) + R_{D_S}(G_\rho, G_{\rho_T^*})$.

31 Algorithm: learn ρ - optimization problem
To derive an algorithm from the previous theorem, we use PAC-Bayesian theory specialized to linear classifiers, with ρ and π isotropic Gaussians centered on $w$ and $0$ respectively:
- $R_S(G_{\rho_w}) = \mathbb{E}_{(x^s, y^s) \sim S}\, \Phi\!\left( y^s\, \frac{w \cdot x^s}{\|x^s\|} \right)$, with the sigmoidal (probit) loss $\Phi(a) = \int_a^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$
- $\mathrm{KL}(\rho_w \| \pi_0) = \frac{1}{2} \|w\|^2$: regularizer
- $\mathrm{dis}_{\rho_w}(S, T) = \left| \mathbb{E}_{x^s \sim S}\, \Phi_{\mathrm{dis}}\!\left( \frac{w \cdot x^s}{\|x^s\|} \right) - \mathbb{E}_{x^t \sim T}\, \Phi_{\mathrm{dis}}\!\left( \frac{w \cdot x^t}{\|x^t\|} \right) \right|$, with $\Phi_{\mathrm{dis}}(a) = 2\, \Phi(a)\, \Phi(-a)$
The optimization problem is similar to learning a linear classifier:
$\mathrm{argmin}_w\; R_S(G_{\rho_w}) + C\, \mathrm{dis}_{\rho_w}(S, T) + A\, \|w\|^2$
with A and C being parameters to tune.
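A rough sketch of this optimization (mine, under the assumptions above, not the authors' released implementation): the probit loss is written with scipy's erf and the objective R_S + C·dis + A·||w||² is minimized with a generic quasi-Newton routine; Xs, ys, Xt are hypothetical source features, source labels in {-1, +1} and target features.

import numpy as np
from scipy.optimize import minimize
from scipy.special import erf

def probit_loss(a):
    # Phi(a) = Pr_{z ~ N(0,1)}[z >= a] = 0.5 * (1 - erf(a / sqrt(2)))
    return 0.5 * (1.0 - erf(a / np.sqrt(2.0)))

def pbda_objective(w, Xs, ys, Xt, A=0.01, C=1.0):
    src_margin = (Xs @ w) / (np.linalg.norm(Xs, axis=1) + 1e-12)   # w.x / ||x|| on source
    tgt_margin = (Xt @ w) / (np.linalg.norm(Xt, axis=1) + 1e-12)   # w.x / ||x|| on target
    risk = np.mean(probit_loss(ys * src_margin))                   # R_S(G_rho_w)
    phi_dis = lambda a: 2.0 * probit_loss(a) * probit_loss(-a)
    dis = abs(np.mean(phi_dis(src_margin)) - np.mean(phi_dis(tgt_margin)))  # dis_rho_w(S, T)
    return risk + C * dis + A * np.dot(w, w)                       # full objective

def learn_pbda(Xs, ys, Xt, A=0.01, C=1.0):
    w0 = np.zeros(Xs.shape[1])
    res = minimize(pbda_objective, w0, args=(Xs, ys, Xt, A, C), method="L-BFGS-B")
    return res.x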

32 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Experiments: Setup
Comparison with SVM, DASVM [Bruzzone et al., PAMI 10] and CODA [Chen et al., NIPS 12] (PBDA is about 5 times faster than DASVM and CODA).
1. Toy problem: inter-twinning moons
- source domain and 7 different target domains obtained by 7 rotation angles
- 10 draws for each angle
- performance measured on a test set of 1,500 target instances
- Gaussian kernel
2. Sentiment analysis: Amazon reviews dataset (text reviews on Amazon products)
- 4 types of products: Books, DVDs, Electronics, Kitchen
- data dimension: 40,
- DA tasks: adaptation from one type to another (e.g. books to kitchen)
- source domain: 2,000 labeled examples; target domain: 2,000 unlabeled examples
- performance measured on a target test set of between 3,000 and 6,000 examples
- linear kernel
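To reproduce the spirit of the toy setup (my sketch, assuming scikit-learn; the exact data generation used in the talk may differ): the source domain is the two-moons dataset and each target domain is the same distribution rotated by a given angle.

import numpy as np
from sklearn.datasets import make_moons

def rotated_moons(n=300, angle_deg=30.0, noise=0.05, seed=0):
    # Source: standard inter-twinning moons; target: the same distribution rotated by angle_deg.
    Xs, ys = make_moons(n_samples=n, noise=noise, random_state=seed)
    theta = np.deg2rad(angle_deg)
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Xt_raw, yt = make_moons(n_samples=n, noise=noise, random_state=seed + 1)
    Xt = Xt_raw @ R.T                     # rotate the target points
    return Xs, ys, Xt, yt                 # yt kept only for evaluation, not for training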

33 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Experiments: Inter-twinning moons (2/2)
(Figure: results on the inter-twinning moons problem for target rotation angles of (a) 10, (b) 20, (c) 30, (d) 40, (e) 50 and (f) 70 degrees.)

34 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Experiments: Sentiment Analysis
(Table: accuracies of PBGD, SVM, DASVM, CODA and PBDA on the 12 Amazon DA tasks, each of Books, DVDs, Electronics and Kitchen adapted to the three other product types, together with the average.)

35 A PAC-Bayesian Approach for Domain Adaptation - Conclusion
- The first PAC-Bayesian analysis for domain adaptation, expressed as a ρ-average over a class of hypotheses
- A divergence depending on ρ, which has the advantage of being directly optimizable (with theoretical justification)
- A first algorithm specialized to linear classifiers, with promising results
- Opens the door to tackling DA tasks with all the PAC-Bayesian tools

36 Outline
1 Domain Adaptation - Introduction
2 What the theory says
3 Unsupervised Visual Domain Adaptation Using Subspace Alignment - ICCV 2013 (joint work with B. Fernando and T. Tuytelaars (K.U. Leuven), and M. Sebban)
4 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - ICML 2013 (joint work with P. Germain and F. Laviolette (U. Laval, Canada), and E. Morvant (LIF, Marseille, now post-doc at IST Austria))
5 Conclusion

37 Conclusion and Perspectives
- Domain adaptation / transfer learning is a hot topic. Many application domains need such technology: image classification, computer vision, multimedia indexing, speech recognition, natural language processing, ...
- The field moves fast; many methods exist.
- The first theories were good for understanding how adaptation can work, but some settings are still not fully understood (importance of the distance).
- Changes of representation space: lots of possible directions.
- Approaches taking into account multi-source and multi-task settings are becoming more and more popular.
- Classifier combination approaches, combined with new representation-learning methods (w.r.t. an appropriate distance), are promising.
- Controlling negative transfer is an important issue.

39 A word about parameter tuning: an open problem
Problem: no labels on the target domain.
Solution: a kind of reverse validation [Zhong et al., ECML 10], using the reverse classifier $h_l^r$:
1. Learn $h_l$ from LS ∪ TS (labeled source sample LS and unlabeled target sample TS)
2. Auto-label TS with $h_l$
3. Learn the reverse classifier $h_l^r$ from the auto-labeled TS
4. Evaluate $h_l^r$ on LS by cross-validation
If the two domains are related, $h_l^r$ performs well on the source domain [Bruzzone et al., PAMI 10].
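A minimal sketch of this reverse-validation idea for picking a hyper-parameter (mine, assuming scikit-learn and a hypothetical factory make_da_model(param) whose fit takes labeled source and unlabeled target data; it also simplifies step 4 by scoring on the full labeled source sample instead of cross-validating over source splits):

import numpy as np
from sklearn.linear_model import LogisticRegression

def reverse_validation_score(make_da_model, param, Xs, ys, Xt):
    # 1-2. Train the DA model with the candidate parameter and auto-label the target sample.
    model = make_da_model(param).fit(Xs, ys, Xt)   # hypothetical DA estimator API
    yt_auto = model.predict(Xt)
    # 3. Learn the reverse classifier from the auto-labeled target data only.
    reverse_clf = LogisticRegression(max_iter=1000).fit(Xt, yt_auto)
    # 4. Evaluate the reverse classifier on the labeled source sample
    #    (the original procedure does this by cross-validation over LS).
    return reverse_clf.score(Xs, ys)

def select_param(make_da_model, params, Xs, ys, Xt):
    scores = [reverse_validation_score(make_da_model, p, Xs, ys, Xt) for p in params]
    return params[int(np.argmax(scores))]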

40 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Domain Adaptation Bound for the Gibbs Classifier, from which the algorithm is designed
Theorem (PAC-Bayesian generalization bound, Catoni's style). For any domains $P_S$ and $P_T$ (resp. with marginals $D_S$ and $D_T$) over $X \times Y$, any set of hypotheses $H$, any prior distribution π over $H$, any $\delta \in (0, 1]$ and any real numbers $\alpha > 0$ and $c > 0$, with probability at least $1-\delta$ over the choice of $S \times T \sim (P_S \times D_T)^m$, we have, for all ρ on $H$,
$R_{P_T}(G_\rho) \le \nu_\rho + c'\, R_S(G_\rho) + \alpha'\, \mathrm{dis}_\rho(S, T) + \left( \frac{c'}{c} + \frac{2 \alpha'}{\alpha} \right) \frac{\mathrm{KL}(\rho \| \pi) + \ln \frac{3}{\delta}}{m}$
where $\nu_\rho = R_{P_T}(G_{\rho_T^*}) + R_{D_T}(G_\rho, G_{\rho_T^*}) + R_{D_S}(G_\rho, G_{\rho_T^*})$, $c' = \frac{c}{1 - e^{-c}}$, and $\alpha' = \frac{2\alpha}{1 - e^{-2\alpha}}$.
Similarly to PBGD, we have specialized it to a set of linear classifiers (PBDA).
