Two transfer learning approaches for domain adaptation
1 Two transfer learning approaches for domain adaptation
Amaury Habrard, Laboratoire Hubert Curien, UMR CNRS 5516, University Jean Monnet - Saint-Étienne (France). Seminar, University of Alicante, 15/05/2014
2 Introduction and Motivation
When do we need Domain Adaptation (DA)? When the learning distribution is different from the testing distribution. An example of a DA task: we have labeled images ("person" / "no person") from a Web image corpus; is there a person in unlabeled images from a video corpus? How can we learn, from one distribution, a low-error classifier on another distribution?
3 Outline
1 Domain Adaptation - Introduction
2 What the theory says
3 Unsupervised Visual Domain Adaptation Using Subspace Alignment - ICCV 2013 (joint work with B. Fernando and T. Tuytelaars (K.U. Leuven), and M. Sebban)
4 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - ICML 13 (joint work with P. Germain and F. Laviolette (U. Laval, Canada), and E. Morvant (LIF, Marseille - now post-doc IST Austria))
5 Conclusion
5 Traditional Machine Learning vs. Transfer Learning
Intuition and motivation from a CV perspective (I'm not an expert). In traditional machine learning, training and test data are from the same domain; in transfer learning, training and test data are from different domains. Can we train classifiers with Flickr photos, as they have already been collected and annotated, and hope the classifiers still work well on mobile-camera images? No [Gong et al., CVPR 2012] (figures adapted from Pan): object classifiers optimized on one benchmark dataset often exhibit significant degradation in recognition accuracy when evaluated on another one [Gong et al., ICML 2013; Torralba et al., CVPR 2011; Perronnin et al., CVPR 2010].
Important topic in many areas: tutorials at ICML 2010, CVPR 2012 and Interspeech 2012; workshops at ICCV 2013 and NIPS; sessions dedicated to transfer/domain adaptation in many top conferences. Hot topic.
6 Hard to predict what will change in the new domain
Dilemma: it is hard to predict what will change in the new domain. Examples of shifts: high quality vs. low quality; daylight vs. sunset; posed photos vs. art or surveillance images.
7 Solution: work with data representations in feature space!
Example: images from a digital SLR and from a webcam may live in different dimensions and feature spaces.
8 Problems in NLP - sentiment analysis
Example: a classifier is learned by a learning algorithm from film reviews labeled +1/-1, then applied to book reviews whose labels are unknown.
9 Task and notations
Binary classification task: X input space; Y = {-1, +1} output space.
Supervised classification: $P_S$ source domain, a distribution over $X \times Y$; $D_S$ the marginal distribution over X; $S = \{(x_i^s, y_i^s)\}_{i=1}^{m_s} \sim (P_S)^{m_s}$ a labeled source sample. Objective: find a classifier $h \in H$ with a low source error
$R_{P_S}(h) = \mathbb{E}_{(x^s,y^s)\sim P_S}\,\mathbb{I}\big[h(x^s) \neq y^s\big]$
10 Task and notations (continued)
Domain adaptation: $P_T$ target domain, a distribution over $X \times Y$; $D_T$ the marginal distribution over X; $T = \{x_j^t\}_{j=1}^{m_t} \sim (D_T)^{m_t}$ an unlabeled target sample. Objective: find a classifier $h \in H$ with a low target error
$R_{P_T}(h) = \mathbb{E}_{(x^t,y^t)\sim P_T}\,\mathbb{I}\big[h(x^t) \neq y^t\big]$
Compared with supervised classification, the learning model now receives a labeled sample from $P_S$ and an unlabeled sample from a different distribution $D_T$.
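As a concrete anchor for the notation, the empirical counterpart of $R_{P_S}(h)$ (and of $R_{P_T}(h)$, when target labels are available for evaluation) is just a misclassification rate. A minimal sketch; the linear classifier and all variable names are illustrative, not from the slides:

```python
import numpy as np

def empirical_risk(h, X, y):
    """Empirical 0-1 risk: fraction of points where h(x) != y."""
    preds = np.array([h(x) for x in X])
    return float(np.mean(preds != y))

# An illustrative fixed linear classifier h(x) = sign(w . x).
w = np.array([1.0, -1.0])
h = lambda x: 1 if w @ x >= 0 else -1
```

In DA, `empirical_risk` is observable on the labeled source sample, while the target risk is exactly the quantity we cannot measure directly (the target sample is unlabeled).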
12 A first insight
$R_{P_T}(h) = \mathbb{E}_{(x^t,y^t)\sim P_T}\,\mathbb{I}[h(x^t)\neq y^t] = \mathbb{E}_{(x^t,y^t)\sim P_T}\,\frac{P_S(x^t,y^t)}{P_S(x^t,y^t)}\,\mathbb{I}[h(x^t)\neq y^t] = \sum_{(x^t,y^t)} P_T(x^t,y^t)\,\frac{P_S(x^t,y^t)}{P_S(x^t,y^t)}\,\mathbb{I}[h(x^t)\neq y^t] = \mathbb{E}_{(x^t,y^t)\sim P_S}\,\frac{P_T(x^t,y^t)}{P_S(x^t,y^t)}\,\mathbb{I}[h(x^t)\neq y^t]$
If the tasks are similar, i.e. $P_S(y^t|x^t) = P_T(y^t|x^t)$ (covariate shift assumption), then
$R_{P_T}(h) = \mathbb{E}_{(x^t,y^t)\sim P_S}\,\frac{D_T(x^t)\,P_T(y^t|x^t)}{D_S(x^t)\,P_S(y^t|x^t)}\,\mathbb{I}[h(x^t)\neq y^t] = \mathbb{E}_{(x^t,y^t)\sim P_S}\,\frac{D_T(x^t)}{D_S(x^t)}\,\mathbb{I}[h(x^t)\neq y^t]$
Idea: learn an estimate of $\frac{D_T(x^t)}{D_S(x^t)}$, then learn a classifier on reweighted source data. But: the tasks are not similar in general, and this analysis does not take into account the hypothesis space considered.
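The reweighting idea above can be sketched directly, assuming the density ratio $D_T(x)/D_S(x)$ is given (estimating it is the hard part in practice; the function names here are illustrative):

```python
import numpy as np

def weighted_source_risk(h, X_src, y_src, ratio):
    """Estimate R_PT(h) as E_{P_S}[ (D_T(x)/D_S(x)) * I[h(x) != y] ]
    using only labeled source points and a (given) density ratio."""
    w = np.asarray([ratio(x) for x in X_src])
    errs = np.asarray([h(x) != y for x, y in zip(X_src, y_src)], dtype=float)
    return float(np.mean(w * errs))
```

With `ratio` identically 1 (no shift), this reduces to the plain empirical source risk, as expected from the derivation.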
13 Domain Adaptation Theory - Necessity of a Domain Divergence
Labeled sample $S = \{(x_i^s, y_i^s)\}_{i=1}^{m_s}$ drawn i.i.d. from $P_S$; unlabeled sample $T = \{x_i^t\}_{i=1}^{m_t}$ drawn i.i.d. from $D_T$. If h is learned from the source domain, how does it perform on the target domain? ⇒ If the domains are close, then a low source-error classifier could be a low target-error classifier.
14 Domain Adaptation Theory - S. Ben-David et al.'s result
Theorem [Ben-David et al., MLJ 10, NIPS 06]: let H be a symmetric hypothesis space. If $D_S$ and $D_T$ are respectively the marginal distributions of source and target instances, then for all $\delta \in (0, 1]$, with probability at least $1-\delta$:
$\forall h \in H,\quad R_{P_T}(h) \leq R_{P_S}(h) + \frac{1}{2}\, d_{H\Delta H}(D_S, D_T) + \nu$
15 Domain Adaptation Theory - S. Ben-David et al.'s result (continued)
$R_{P_S}(h)$: classical expected error on the source domain. $d_{H\Delta H}(D_S, D_T)$: the $H\Delta H$-divergence,
$\frac{1}{2}\, d_{H\Delta H}(D_S, D_T) = \sup_{(h,h')\in H^2} \big| R_{D_T}(h,h') - R_{D_S}(h,h') \big| = \sup_{(h,h')\in H^2} \Big| \mathbb{E}_{x^t\sim D_T}\,\mathbb{I}[h(x^t)\neq h'(x^t)] - \mathbb{E}_{x^s\sim D_S}\,\mathbb{I}[h(x^s)\neq h'(x^s)] \Big|$
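The $H\Delta H$-divergence involves a sup over pairs of hypotheses and is rarely computable exactly; a common empirical stand-in is the "proxy A-distance": train any classifier to separate source points from target points, and convert its domain-classification error into a divergence score. A minimal numpy sketch, assuming a nearest-centroid domain classifier (my choice, purely illustrative):

```python
import numpy as np

def proxy_a_distance(X_src, X_tgt):
    """2 * (1 - 2 * eps), where eps is the error of a classifier trained to
    tell source (label 0) from target (label 1). Near 0: domains overlap;
    near 2: domains are easily separable, so adaptation is needed."""
    mu_s, mu_t = X_src.mean(axis=0), X_tgt.mean(axis=0)
    X = np.vstack([X_src, X_tgt])
    dom = np.r_[np.zeros(len(X_src)), np.ones(len(X_tgt))]
    # Nearest-centroid domain classifier: predict "target" if closer to mu_t.
    pred = (np.linalg.norm(X - mu_t, axis=1)
            < np.linalg.norm(X - mu_s, axis=1)).astype(float)
    eps = float(np.mean(pred != dom))  # domain-classification error
    return 2.0 * (1.0 - 2.0 * eps)
```

Identical domains give eps ≈ 0.5 (no classifier can separate them), hence a distance near 0; well-separated domains give eps ≈ 0 and a distance near 2.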
16 Domain Adaptation Theory - S. Ben-David et al.'s result (continued)
$\nu = \inf_{h^*\in H}\big[ R_{P_S}(h^*) + R_{P_T}(h^*) \big]$: error of the joint optimal classifier; alternatively, $\nu = R_{P_T}(h_T^*) + R_{P_T}(h_T^*, h_S^*)$, where $h_D^*$ is the best hypothesis on domain D [Mohri et al., 2009]. (Illustration: a small-ν case vs. a large-ν case.)
17 Domain Adaptation Theory - S. Ben-David et al.'s result (continued)
Idea: minimize the bound - reweighting methods, new projection spaces, new feature-based representations.
18 Illustration of the main methods
- Instance weighting: sample bias, covariate shift
- Theory: PAC-Bayesian theory, statistical learning theory
- Feature representation: metric learning, subspace alignment, latent pattern mining, domain-invariant features, latent features
- Iterative models: EM-based methods, self-training, boosting-based models
20 Related work - look for intermediate representations
Look for common domain features via some type of transformation [Gopalan et al., ICCV 11]: project the data into a common subspace on the geodesic path between the source and target subspaces.
21 An algorithmic approach - Align the two subspaces
Principle: a very simple method, totally unsupervised.
22 Algorithm
Algorithm 1: Subspace alignment DA algorithm
Data: source data S, target data T, source labels $L_S$, subspace dimension d. Result: predicted target labels $L_T$.
1. $X_S \leftarrow PCA(S, d)$ (source subspace defined by the first d eigenvectors)
2. $X_T \leftarrow PCA(T, d)$ (target subspace defined by the first d eigenvectors)
3. $X_a \leftarrow X_S X_S^\top X_T$ (operator for aligning the source subspace to the target one)
4. $S_a = S X_a$ (new source data in the aligned space)
5. $T_T = T X_T$ (new target data in the aligned space)
6. $L_T \leftarrow Classifier(S_a, T_T, L_S)$
The term $M = X_S^\top X_T$ corresponds to the subspace alignment matrix: $M^* = \operatorname{argmin}_M \|X_S M - X_T\|_F$, and $X_a = X_S X_S^\top X_T = X_S M^*$ projects the source data onto the target subspace. A natural similarity: $Sim(x_s, x_t) = x_s X_S M^* X_T^\top x_t^\top = x_s A x_t^\top$.
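Algorithm 1 is short enough to sketch end-to-end. A minimal numpy version, assuming PCA via SVD and a 1-NN classifier as the final learner (both are illustrative choices; the slide leaves the classifier open):

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of X, as a (D, d) matrix of columns."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def subspace_alignment(S, T, d):
    Xs, Xt = pca_basis(S, d), pca_basis(T, d)
    M = Xs.T @ Xt            # alignment matrix, argmin_M ||Xs M - Xt||_F
    Sa = S @ (Xs @ M)        # source data moved into the aligned subspace
    Ta = T @ Xt              # target data projected onto its own subspace
    return Sa, Ta

def predict_1nn(Sa, Ta, labels_src):
    """Label each aligned target point by its nearest aligned source point."""
    d2 = ((Ta[:, None, :] - Sa[None, :, :]) ** 2).sum(-1)
    return labels_src[d2.argmin(axis=1)]
```

Note that the closed form $M = X_S^\top X_T$ makes the whole method a couple of matrix products on top of two PCAs, which is what "very simple, totally unsupervised" refers to.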
23 Some results
Figure 1: classifying ImageNet images using Caltech-256 images as the source domain - for each ImageNet query image (first row), the nearest-neighbour image selected by our method is shown (second row). Experiments: adaptation on the Office/Caltech-10 datasets (four domains A, C, D, W, giving 12 DA problems) and between the ImageNet (I), LabelMe (L) and Caltech-256 (C) datasets (one used as source, one as target). Comparisons: NA (no adaptation), Baseline 1 (projection on the source subspace), Baseline 2 (projection on the target subspace), and two related methods, GFK [Gong et al., CVPR 12] and GFS [Gopalan et al., ICCV 11]. $Sim(y_S, y_T)$ is used directly to perform a k-nearest-neighbour classification task; since it is not PSD, it cannot be used as such to learn a SVM.
24 Result tables (numeric entries omitted)
Table 1: several distribution-discrepancy measures (TDAS, HΔH) averaged over the Office DA problems; compared to the other baselines, both GFK and our method have lower HΔH values, meaning these methods are more likely to perform well. Tables 2 and 3: recognition accuracy with unsupervised DA on Office + Caltech-10 using a NN classifier and a SVM classifier; with the NN classifier our method outperforms the others in 9 of the 12 DA problems, and the results obtained in the semi-supervised setting (see supplementary material) confirm this behavior. Tables 4 and 5: recognition accuracy with unsupervised DA on ImageNet, LabelMe and Caltech-256 with NN and SVM classifiers; our method outperforms the others in 10 DA problems. Figure 2: finding a stable solution and a subspace dimensionality using the consistency theorem.
25 Unsupervised Subspace Alignment - Conclusion
Very simple and intuitive method; totally unsupervised; theoretical results for dimensionality detection; good results on computer-vision datasets; can be combined with supervised information (future work). Subspace alignment offers theoretical and practical perspectives.
27 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - PAC-Bayesian Theory (1/3)
Objective: to offer generalization guarantees for majority-vote classifiers (Bayesian inference, boosting, ...), especially for the ρ-weighted majority vote
$B_\rho(x) = \operatorname{sign}\Big[\sum_{h\in H} \rho(h)\,h(x)\Big]$
where ρ is the posterior distribution over H, learned from the prior distribution π over H, such that $R_{P_S}(B_\rho)$ is as small as possible.
We work with the Gibbs classifier: $R_{P_S}(G_\rho) = \mathbb{E}_{h\sim\rho}\, R_{P_S}(h)$, and we have $R_{P_S}(B_\rho) \leq 2\, R_{P_S}(G_\rho)$.
Generalization bound: $R_{P_S}(G_\rho) \leq R_S(G_\rho) + \sqrt{\frac{2}{m}\Big[\mathrm{KL}(\rho\|\pi) + \ln\frac{8\sqrt{m}}{\delta}\Big]}$
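The factor-2 relation between the majority vote $B_\rho$ and the Gibbs classifier $G_\rho$ is easy to check empirically for a finite set of voters. A small numpy sketch (the voter matrix and weights are illustrative data, not from the slides):

```python
import numpy as np

def gibbs_risk(votes, rho, y):
    """votes[i, j] = h_i(x_j) in {-1, +1}. R(G_rho) = E_{h~rho} R(h):
    the rho-weighted average of the individual voters' error rates."""
    per_voter = (votes != y[None, :]).mean(axis=1)
    return float(rho @ per_voter)

def majority_vote_risk(votes, rho, y):
    """Error of B_rho(x) = sign(sum_h rho(h) h(x))."""
    agg = np.sign(rho @ votes)
    return float(np.mean(agg != y))
```

The Gibbs risk averages errors before thresholding, the majority vote thresholds first; the bound $R(B_\rho) \leq 2 R(G_\rho)$ says the two can differ by at most a factor of two.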
28 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - A Domain Divergence suitable for PAC-Bayes
Definition: let H be a hypothesis class. For any marginal distributions $D_S$ and $D_T$ over X and any distribution ρ on H, the domain disagreement $dis_\rho(D_S, D_T)$ between $D_S$ and $D_T$ is
$dis_\rho(D_S, D_T) = \Big| \mathbb{E}_{h,h'\sim\rho^2} \big[ R_{D_T}(h,h') - R_{D_S}(h,h') \big] \Big|$
29 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Domain Adaptation Bound for the Gibbs Classifier
Theorem: let H be a hypothesis class. For every distribution ρ on H, we have
$R_{P_T}(G_\rho) \leq R_{P_S}(G_\rho) + dis_\rho(D_S, D_T) + \nu_\rho$
with $\rho_T^* = \operatorname{argmin}_\rho R_{P_T}(G_\rho)$ the best target posterior and
$\nu_\rho = R_{P_T}(G_{\rho_T^*}) + \mathbb{E}_{h\sim\rho}\,\mathbb{E}_{h'\sim\rho_T^*} \big[ R_{D_T}(h,h') + R_{D_S}(h,h') \big]$
Comparison between $dis_\rho(D_S, D_T)$ and $\frac{1}{2} d_{H\Delta H}(D_S, D_T)$: $\frac{1}{2} d_{H\Delta H}(D_S, D_T)$ is a worst-case divergence, while $dis_\rho(D_S, D_T)$ is specific to the considered $G_\rho$; we have $\frac{1}{2} d_{H\Delta H}(D_S, D_T) \geq dis_\rho(D_S, D_T)$.
One solution for PAC-Bayesian DA: jointly minimize $R_{P_S}(G_\rho)$ and $dis_\rho(D_S, D_T)$, with theoretical justification.
30 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Consistency bound
Theorem (PAC-Bayesian generalization bound, McAllester's style): for any domains $P_S$ and $P_T$ (respectively with marginals $D_S$ and $D_T$) over $X \times Y$, for any set H of hypotheses, for any prior distribution π over H and any $\delta \in (0, 1]$, with probability at least $1-\delta$ over the choice of $S_1 \sim (P_S)^{m_1}$, $S_2 \sim (D_S)^{m_2}$ and $T \sim (D_T)^{m'}$, for every ρ over H we have
$R_{P_T}(G_\rho) \leq R_{S}(G_\rho) + dis_\rho(S, T) + \nu_\rho + 3\sqrt{\frac{2}{m}\Big[\mathrm{KL}(\rho\|\pi) + \ln\frac{8\sqrt{m}}{\delta}\Big]}$
where $m = \max\{m_1, m_2, m'\}$ and $\nu_\rho = R_{P_T}(G_{\rho_T^*}) + R_{D_T}(G_\rho, G_{\rho_T^*}) + R_{D_S}(G_\rho, G_{\rho_T^*})$.
31 Algorithm: Learn ρ - optimization problem
We use PAC-Bayesian theory specialized to linear classifiers, with ρ and π isotropic Gaussians centered on w and 0:
$R_S(G_{\rho_w}) = \mathbb{E}_{(x^s,y^s)\sim S}\, \Phi\!\Big(\frac{y^s\, w\cdot x^s}{\|x^s\|}\Big)$ with the probit loss $\Phi(a) = \int_a^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz$,
$\mathrm{KL}(\rho_w \| \pi_0) = \frac{1}{2}\|w\|^2$ (the regularizer),
$dis_{\rho_w}(S, T) = \Big| \mathbb{E}_{x^s\sim S}\, \Phi_{dis}\!\Big(\frac{w\cdot x^s}{\|x^s\|}\Big) - \mathbb{E}_{x^t\sim T}\, \Phi_{dis}\!\Big(\frac{w\cdot x^t}{\|x^t\|}\Big) \Big|$ with $\Phi_{dis}(a) = 2\,\Phi(a)\,\Phi(-a)$.
The optimization problem is similar to learning a linear classifier:
$\operatorname{argmin}_w\; R_S(G_{\rho_w}) + C\, dis_{\rho_w}(S, T) + A\, \|w\|^2$
A and C being parameters to tune.
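A rough sketch of the resulting trade-off, with a logistic sigmoid standing in for the probit loss Φ and a crude finite-difference descent standing in for an actual solver; C, A, the learning rate and the disagreement surrogate are all illustrative assumptions here, not the authors' choices:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def pbda_objective(w, Xs, ys, Xt, C=1.0, A=0.01):
    # Surrogate of R_S(G_rho_w): smooth loss on source margins.
    src_loss = np.mean(sigmoid(-ys * (Xs @ w)))
    # Surrogate of dis_rho_w(S, T): gap between the two domains'
    # average "margin disagreement" terms.
    dis_s = np.mean(sigmoid(-np.abs(Xs @ w)))
    dis_t = np.mean(sigmoid(-np.abs(Xt @ w)))
    return src_loss + C * abs(dis_s - dis_t) + A * (w @ w)

def minimize(Xs, ys, Xt, steps=200, lr=0.5, eps=1e-4):
    """Central-difference gradient descent; fine for a 2-D illustration."""
    w = np.zeros(Xs.shape[1])
    for _ in range(steps):
        g = np.array([(pbda_objective(w + eps * e, Xs, ys, Xt)
                       - pbda_objective(w - eps * e, Xs, ys, Xt)) / (2 * eps)
                      for e in np.eye(len(w))])
        w -= lr * g
    return w
```

The point of the sketch is the shape of the objective: a source-risk term, a ρ-specific divergence term weighted by C, and the $\|w\|^2$ regularizer coming from the KL term, weighted by A.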
32 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Experimentations: Setup
Comparison with SVM, DASVM [Bruzzone et al., PAMI 10] and CODA [Chen et al., NIPS 12]; PBDA is 5 times faster than DASVM and CODA.
1. Toy problem: inter-twinning moons. One source domain and 7 different target domains according to 7 rotation angles; 10 draws for each angle; performance measured on a test set of 1,500 target instances; Gaussian kernel.
2. Amazon reviews: Sentiment Analysis Dataset (text reviews on Amazon products) with 4 types of products (Books, DVDs, Electronics, Kitchen); data dimension: 40,000; DA tasks: adaptation from one type to another (e.g. books → kitchen); source domain: 2,000 labeled examples; target domain: 2,000 unlabeled examples; performance on a target test set of between 3,000 and 6,000 examples; linear kernel.
33 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Experimentations: Inter-twinning moons (2/2)
[Figure: results on the inter-twinning moons for target rotation angles of (a) 10°, (b) 20°, (c) 30°, (d) 40°, (e) 50° and (f) 70°.]
34 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Experimentations: Sentiment Analysis
[Table: performance of PBGD, SVM, DASVM, CODA and PBDA on the 12 adaptation tasks between Books, DVDs, Electronics and Kitchen, plus the average; numeric entries omitted.]
35 A PAC-Bayesian Approach for Domain Adaptation - Conclusion
The first PAC-Bayesian analysis for domain adaptation: it is expressed as a ρ-average over a class of hypotheses, and its divergence, which depends on ρ, has the advantage of being directly optimizable (with theoretical justification). A first algorithm specialized to linear classifiers gives promising results, and this opens the door to tackling DA tasks with all the PAC-Bayesian tools.
37 Conclusion and Perspectives
Domain adaptation/transfer learning is a hot topic; many application domains need such technology: image classification, computer vision, multimedia indexing, speech recognition, natural language processing, ... The field moves fast and many methods exist. The first theories were good for understanding how adaptation can work, but some settings are still not fully understood (importance of the distance). Changes of representation space offer lots of possible directions. Approaches taking into account multi-source and multi-task settings are becoming more and more popular. Classifier-combination approaches, combined with new representation-learning methods (w.r.t. an appropriate distance), are promising. Controlling negative transfer is an important issue.
39 A word about parameter tuning: open problem
Problem: no labels on the target domain. Solution: a kind of reverse validation [Zhong et al., ECML 10], using the reverse classifier $h_l^r$:
1. Learn $h_l$ from LS ∪ TS (labeled source set LS, unlabeled target set TS).
2. Auto-label TS with $h_l$.
3. Learn the reverse classifier $h_l^r$ from the auto-labeled TS.
4. Evaluate $h_l^r$ on LS by cross-validation.
If the two domains are related, $h_l^r$ performs well on the source domain [Bruzzone et al., PAMI 10].
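The reverse-validation loop above can be sketched in a few lines, with 1-NN standing in for the learner (any DA algorithm could be plugged in at both "learn" steps; the names are illustrative):

```python
import numpy as np

def nn1_fit_predict(X_train, y_train, X_test):
    """1-NN classifier used as a stand-in learner at both steps."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return y_train[d2.argmin(axis=1)]

def reverse_validation_error(X_src, y_src, X_tgt):
    # 1) learn h_l from labeled source + unlabeled target (here: source only)
    # 2) auto-label the target sample with h_l
    y_tgt_auto = nn1_fit_predict(X_src, y_src, X_tgt)
    # 3) learn the reverse classifier h_l^r from the auto-labeled target
    # 4) evaluate h_l^r back on the labeled source sample
    y_src_back = nn1_fit_predict(X_tgt, y_tgt_auto, X_src)
    return float(np.mean(y_src_back != y_src))
```

A hyper-parameter setting can then be selected by minimizing `reverse_validation_error`, without ever using target labels.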
40 A PAC-Bayesian Approach for Domain Adaptation (PBDA) - Domain Adaptation Bound for the Gibbs Classifier, from which the algorithm is designed
Theorem (PAC-Bayesian generalization bound, Catoni's style): for any domains $P_S$ and $P_T$ (resp. with marginals $D_S$ and $D_T$) over $X \times Y$, any set of hypotheses H, any prior distribution π over H, any $\delta \in (0, 1]$ and any real numbers $\alpha > 0$ and $c > 0$, with probability at least $1-\delta$ over the choice of $S \times T \sim (P_S \times D_T)^m$, we have, for all ρ on H,
$R_{P_T}(G_\rho) \leq \nu_\rho + c'\, R_S(G_\rho) + \alpha'\, dis_\rho(S, T) + \Big(\frac{c'}{c} + \frac{2\alpha'}{\alpha}\Big)\, \frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{3}{\delta}}{m}$
where $\nu_\rho = R_{P_T}(G_{\rho_T^*}) + R_{D_T}(G_\rho, G_{\rho_T^*}) + R_{D_S}(G_\rho, G_{\rho_T^*})$, $c' = \frac{c}{1-e^{-c}}$, and $\alpha' = \frac{2\alpha}{1-e^{-2\alpha}}$.
⇒ Similarly to PBGD, we have specialized it to a set of linear classifiers (PBDA).
More informationUnsupervised Domain Adaptation with Distribution Matching Machines
Unsupervised Domain Adaptation with Distribution Matching Machines Yue Cao, Mingsheng Long, Jianmin Wang KLiss, MOE; NEL-BDS; TNList; School of Software, Tsinghua University, China caoyue1@gmail.com mingsheng@tsinghua.edu.cn
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationMetric Embedding of Task-Specific Similarity. joint work with Trevor Darrell (MIT)
Metric Embedding of Task-Specific Similarity Greg Shakhnarovich Brown University joint work with Trevor Darrell (MIT) August 9, 2006 Task-specific similarity A toy example: Task-specific similarity A toy
More informationLarge-Margin Thresholded Ensembles for Ordinal Regression
Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin and Ling Li Learning Systems Group, California Institute of Technology, U.S.A. Conf. on Algorithmic Learning Theory, October 9,
More informationSample and Computationally Efficient Active Learning. Maria-Florina Balcan Carnegie Mellon University
Sample and Computationally Efficient Active Learning Maria-Florina Balcan Carnegie Mellon University Machine Learning is Shaping the World Highly successful discipline with lots of applications. Computational
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationAgnostic Domain Adaptation
Agnostic Domain Adaptation Alexander Vezhnevets Joachim M. Buhmann ETH Zurich 8092 Zurich, Switzerland {alexander.vezhnevets,jbuhmann}@inf.ethz.ch Abstract. The supervised learning paradigm assumes in
More informationReconnaissance d objetsd et vision artificielle
Reconnaissance d objetsd et vision artificielle http://www.di.ens.fr/willow/teaching/recvis09 Lecture 6 Face recognition Face detection Neural nets Attention! Troisième exercice de programmation du le
More informationLearning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking p. 1/31
Learning from Labeled and Unlabeled Data: Semi-supervised Learning and Ranking Dengyong Zhou zhou@tuebingen.mpg.de Dept. Schölkopf, Max Planck Institute for Biological Cybernetics, Germany Learning from
More informationGlobal vs. Multiscale Approaches
Harmonic Analysis on Graphs Global vs. Multiscale Approaches Weizmann Institute of Science, Rehovot, Israel July 2011 Joint work with Matan Gavish (WIS/Stanford), Ronald Coifman (Yale), ICML 10' Challenge:
More informationSmart PCA. Yi Zhang Machine Learning Department Carnegie Mellon University
Smart PCA Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Abstract PCA can be smarter and makes more sensible projections. In this paper, we propose smart PCA, an extension
More informationUnsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih
Unsupervised Learning of Hierarchical Models Marc'Aurelio Ranzato Geoff Hinton in collaboration with Josh Susskind and Vlad Mnih Advanced Machine Learning, 9 March 2011 Example: facial expression recognition
More informationMore on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013
More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative
More informationRiemannian Metric Learning for Symmetric Positive Definite Matrices
CMSC 88J: Linear Subspaces and Manifolds for Computer Vision and Machine Learning Riemannian Metric Learning for Symmetric Positive Definite Matrices Raviteja Vemulapalli Guide: Professor David W. Jacobs
More informationLaconic: Label Consistency for Image Categorization
1 Laconic: Label Consistency for Image Categorization Samy Bengio, Google with Jeff Dean, Eugene Ie, Dumitru Erhan, Quoc Le, Andrew Rabinovich, Jon Shlens, and Yoram Singer 2 Motivation WHAT IS THE OCCLUDED
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationOverview of Statistical Tools. Statistical Inference. Bayesian Framework. Modeling. Very simple case. Things are usually more complicated
Fall 3 Computer Vision Overview of Statistical Tools Statistical Inference Haibin Ling Observation inference Decision Prior knowledge http://www.dabi.temple.edu/~hbling/teaching/3f_5543/index.html Bayesian
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationA Unified Framework for Domain Adaptation using Metric Learning on Manifolds
A Unified Framework for Domain Adaptation using Metric Learning on Manifolds Sridhar Mahadevan 1, Bamdev Mishra 2, and Shalini Ghosh 3 1 University of Massachusetts. Amherst, MA 01003, and Department of
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More informationIntroduction to Machine Learning. Introduction to ML - TAU 2016/7 1
Introduction to Machine Learning Introduction to ML - TAU 2016/7 1 Course Administration Lecturers: Amir Globerson (gamir@post.tau.ac.il) Yishay Mansour (Mansour@tau.ac.il) Teaching Assistance: Regev Schweiger
More informationUnsupervised Learning
CS 3750 Advanced Machine Learning hkc6@pitt.edu Unsupervised Learning Data: Just data, no labels Goal: Learn some underlying hidden structure of the data P(, ) P( ) Principle Component Analysis (Dimensionality
More informationTUTORIAL PART 1 Unsupervised Learning
TUTORIAL PART 1 Unsupervised Learning Marc'Aurelio Ranzato Department of Computer Science Univ. of Toronto ranzato@cs.toronto.edu Co-organizers: Honglak Lee, Yoshua Bengio, Geoff Hinton, Yann LeCun, Andrew
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More informationarxiv: v2 [cs.lg] 17 Nov 2016
Approximating Wisdom of Crowds using K-RBMs Abhay Gupta Microsoft India R&D Pvt. Ltd. abhgup@microsoft.com arxiv:1611.05340v2 [cs.lg] 17 Nov 2016 Abstract An important way to make large training sets is
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationAuto-Encoding Variational Bayes
Auto-Encoding Variational Bayes Diederik P Kingma, Max Welling June 18, 2018 Diederik P Kingma, Max Welling Auto-Encoding Variational Bayes June 18, 2018 1 / 39 Outline 1 Introduction 2 Variational Lower
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationLearning a Multiview Weighted Majority Vote Classifier: Using PAC-Bayesian Theory and Boosting
Learning a Multiview Weighted Majority Vote Classifier: Using PAC-Bayesian Theory and Boosting Anil Goyal To cite this version: Anil Goyal. Learning a Multiview Weighted Majority Vote Classifier: Using
More informationFisher Vector image representation
Fisher Vector image representation Machine Learning and Category Representation 2014-2015 Jakob Verbeek, January 9, 2015 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15 A brief recap on kernel
More informationJoint distribution optimal transportation for domain adaptation
Joint distribution optimal transportation for domain adaptation Changhuang Wan Mechanical and Aerospace Engineering Department The Ohio State University March 8 th, 2018 Joint distribution optimal transportation
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More informationVariational Autoencoders
Variational Autoencoders Recap: Story so far A classification MLP actually comprises two components A feature extraction network that converts the inputs into linearly separable features Or nearly linearly
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationMachine Learning, Fall 2009: Midterm
10-601 Machine Learning, Fall 009: Midterm Monday, November nd hours 1. Personal info: Name: Andrew account: E-mail address:. You are permitted two pages of notes and a calculator. Please turn off all
More informationFace Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition
ace Recognition Identify person based on the appearance of face CSED441:Introduction to Computer Vision (2017) Lecture10: Subspace Methods and ace Recognition Bohyung Han CSE, POSTECH bhhan@postech.ac.kr
More informationCSE 546 Final Exam, Autumn 2013
CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,
More informationEfficient and Principled Online Classification Algorithms for Lifelon
Efficient and Principled Online Classification Algorithms for Lifelong Learning Toyota Technological Institute at Chicago Chicago, IL USA Talk @ Lifelong Learning for Mobile Robotics Applications Workshop,
More informationFoundations For Learning in the Age of Big Data. Maria-Florina Balcan
Foundations For Learning in the Age of Big Data Maria-Florina Balcan Modern Machine Learning New applications Explosion of data Classic Paradigm Insufficient Nowadays Modern applications: massive amounts
More informationGI01/M055: Supervised Learning
GI01/M055: Supervised Learning 1. Introduction to Supervised Learning October 5, 2009 John Shawe-Taylor 1 Course information 1. When: Mondays, 14:00 17:00 Where: Room 1.20, Engineering Building, Malet
More informationA PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers
A PAC-Bayesian Approach for Doain Adaptation with Specialization to Linear Classifiers Pascal Gerain Aaury Habrard François Laviolette ilie Morvant o cite this version: Pascal Gerain Aaury Habrard François
More informationSupport Vector Machines (SVMs).
Support Vector Machines (SVMs). SemiSupervised Learning. SemiSupervised SVMs. MariaFlorina Balcan 3/25/215 Support Vector Machines (SVMs). One of the most theoretically well motivated and practically most
More informationInstance-based Domain Adaptation via Multi-clustering Logistic Approximation
Instance-based Domain Adaptation via Multi-clustering Logistic Approximation FENG U, Nanjing University of Science and Technology JIANFEI YU, Singapore Management University RUI IA, Nanjing University
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationDomain Adaptation Can Quantity Compensate for Quality?
Domain Adaptation Can Quantity Compensate for Quality? hai Ben-David David R. Cheriton chool of Computer cience University of Waterloo Waterloo, ON N2L 3G1 CANADA shai@cs.uwaterloo.ca hai halev-hwartz
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationStatistical Learning Reading Assignments
Statistical Learning Reading Assignments S. Gong et al. Dynamic Vision: From Images to Face Recognition, Imperial College Press, 2001 (Chapt. 3, hard copy). T. Evgeniou, M. Pontil, and T. Poggio, "Statistical
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationGaussian Models
Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density
More informationLarge-Margin Thresholded Ensembles for Ordinal Regression
Large-Margin Thresholded Ensembles for Ordinal Regression Hsuan-Tien Lin (accepted by ALT 06, joint work with Ling Li) Learning Systems Group, Caltech Workshop Talk in MLSS 2006, Taipei, Taiwan, 07/25/2006
More informationLa théorie PAC-Bayes en apprentissage supervisé
La théorie PAC-Bayes en apprentissage supervisé Présentation au LRI de l université Paris XI François Laviolette, Laboratoire du GRAAL, Université Laval, Québec, Canada 14 dcembre 2010 Summary Aujourd
More informationIntroduction to SVM and RVM
Introduction to SVM and RVM Machine Learning Seminar HUS HVL UIB Yushu Li, UIB Overview Support vector machine SVM First introduced by Vapnik, et al. 1992 Several literature and wide applications Relevance
More informationA Unified Framework for Metric Transfer Learning
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 1.119/TKDE.217.2669193,
More informationNotes on the framework of Ando and Zhang (2005) 1 Beyond learning good functions: learning good spaces
Notes on the framework of Ando and Zhang (2005 Karl Stratos 1 Beyond learning good functions: learning good spaces 1.1 A single binary classification problem Let X denote the problem domain. Suppose we
More informationTopic Models and Applications to Short Documents
Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationTopics in Natural Language Processing
Topics in Natural Language Processing Shay Cohen Institute for Language, Cognition and Computation University of Edinburgh Lecture 9 Administrativia Next class will be a summary Please email me questions
More information