PAC-Bayesian Learning and Domain Adaptation

Size: px

Start display at page:

Download "PAC-Bayesian Learning and Domain Adaptation"

Patricia Andrews
5 years ago
Views:

Habrard 2 Emilie Morvant 3 1 GRAAL Machine

informatique et de génie logiciel Université

fr 3 Laboratoire d Informatique Fondamentale

1 PAC-Bayesian Learning and Domain Adaptation Pascal Germain 1 François Laviolette 1 Amaury Habrard 2 Emilie Morvant 3 1 GRAAL Machine Learning Research Group Département d informatique et de génie logiciel Université Laval, Québec, Canada {pascal.germain,francois.laviolette}@ift.ulaval.ca 2 Laboratoire Hubert Curien Saint-Étienne University, France amaury.habrard@univ-st-etienne.fr 3 Laboratoire d Informatique Fondamentale QARMA Group Aix-Marseille University, France emilie.morvant@lif.univ-mrs.fr NIPS 2012 Workshop: Multi-Trade-offs in Machine Learning, December 7, 2012

2 PAC-Bayesian Learning and Domain Adaptation Outline 1 Domain Adaptation Problem Description A Classical Domain Adaptation Bound A New Domain Adaptation Bound 2 PAC-Bayesian Learning PAC-Bayesian Learning of Linear Classifier PAC-Bayesian Domain Adaptation Learning of Linear Classifiers 3 Preliminary Experimental Results Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

Domain Adaptation DA) : Problem Description When we need DA

image corpus Is there a Person in unlabeled images from a

How to learn, from the source domain, a low-error classifier

3 Domain Adaptation DA) : Problem Description When we need DA The Learning distribution is different from the Testing distribution. An example of a DA problem We have labeled images from a Web image corpus Is there a Person in unlabeled images from a Video corpus?? Person no Person Is there a Person? How to learn, from the source domain, a low-error classifier on the target one? Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

4 Domain Adaptation DA) : Problem Description Supervised Classification We consider binary classification task: X input space, Y = { 1, 1} label set P S source domain: distribution over X Y ; D S marginal distribution over X S P S ) m a labeled source sample = Objective: Find a classifier h H with a low source risk R PS h). Domain Adaptation P T target domain: distribution over X Y ; D T marginal distribution over X T D T ) m a unlabeled target sample = Objective: Find a classifier h H with a low target risk R PT h). Supervised Classification Different Distribution P T Dom ain Adaptation Distribution P S Distribution P S Unlabeled Sample From D T Labeled Sample From P S Learning Model Labeled Sample From P S Learning Model Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

5 A Classical Domain Adaptation Bound VC-dim approach) Let H be an hypothesis space. Theorem [Ben-David et al., 2010] For every h H and for all δ ]0, 1], with probability at least 1 δ : R PT h) R PS h) d H HD S, D T ) + λ, with λ = min RPS h ) + R h PT h ) ). H Trade-off between: R PS h) is the classical expected error on the source domain d H H D S,D T ) is the H H-distance between source and target domains d H H D S,D T ) = 2 sup Pr hx) h x)) Pr hx) h x)) x D S x D T h,h H H Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

6 A New Domain Adaptation Bound PAC-Bayesian approach) Let H be an hypothesis space. Given a weight distribution ρ H, we study the ρ-average errors: Theorem R PS G ρ ) = E R PS h), R PT G ρ ) = E R PT h). h ρ h ρ For all δ ]0, 1], with probability at least 1 δ, for every posterior distribution ρ: E R P T h) h ρ E R PS h) + dis ρ D S, D T ) + λ ρ, h ρ with λ ρ = R PS h ) + R PT h ), and h = argmin h H { E h ρ RDT h, h ) R DS h, h ) )}. [ ] Domain disagreement: dis ρd S,D T ) = E Pr hx) h x)) Pr hx) h x)). h 1,h 2 ρ 2 x D S x D T Given empirical observations S P S ) m and T D S ) m, We want to minimize : B P S,T G ρ ) def = R PS G ρ ) + dis ρ D S, D T ), where P S,T denotes the joint distribution over P S D T. Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

7 PAC-Bayesian Learning of Linear Classifier [Germain, Lacasse, Laviolette and Marchand, 2009] Let H be a set of linear classifiers h vx) def = sgn v x) Consider a prior π 0 and a posterior ρ w defined as isotropic Gaussians respectively centered on vectors 0 and w. w Theorem [Langford and Shawe-Taylor, 2002] For any domain P S R d Y and any δ 0, 1], we have, w R d : kl R S G ρw ) ) R PS G ρw ) 1 m Pr S P S ) m [ KLρ w π 0 ) + ln ξm) δ ]) 1 δ. Trade-off between: R S G ρw ) = E Φ x,y) P S y w x x ) is the sigmoidal loss KLρ w π 0 ) = 1 2 w 2 is a regularizer Φa) a Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

8 PAC-Bayesian Domain Adaptation Learning of Linear Classifier Given empirical observations S P S ) m and T D S ) m, We want to minimize : B P S,T G ρ ) def = R PS G ρ ) + dis ρ D S, D T ), Theorem where P S,T denotes the joint distribution over P S D T. For any domain P S,T R d Y R d and any δ 0, 1], we have, Pr w R d : kl S,T P S,T ) m B S,T ) B P S,T 1 m where B P S,T G ρw ) = R PS G ρw ) + dis ρw D S, D T ). [ 2KLρ w π 0 ) + ln ξm) δ ]) 1 δ, Trade-off between: R PS G ρw ) = E Φ x s,y s ) P S y s w xs x s dis ρw D S, D T ) = E Φ dis x s,y s ) P S KLρ w π 0 ) = 1 2 w 2 ) w x s x s ) E x t D T Φ dis w x t x t ) Φa) Φ dis a) Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10 a

Preliminary Experimental Results Bound minimization by gradient descent Illustration of the decision boundary on 4 rotations angles: 20 30 40 50 Rotation angle 20 30 40 50 0.8 0.7 PBGD 99.5 89.8 78.

9 Preliminary Experimental Results Bound minimization by gradient descent Illustration of the decision boundary on 4 rotations angles: Rotation angle PBGD source error 0.6 target error SVM TSVM DASVM DASF DA-PBGD Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

10 Thank you! See you at our poster. Germain et al. GRAAL, LIF & LaHC) PAC-Bayesian Learning and Domain Adaptation December 7, / 10

Two transfer learning approaches for domain adaptation

Two transfer learning approaches for domain adaptation Amaury Habrard amaury.habrard@univ-st-etienne.fr Laboratoire Hubert Curien, UMR CNRS 5516 University Jean Monnet - Saint-Étienne (France) Seminar,