Domain-Adversarial Neural Networks

Size: px

Start display at page:

Download "Domain-Adversarial Neural Networks"

Shon Wells
5 years ago
Views:

1 Domain-Adversarial Neural Networks Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand Département d informatique et de génie logiciel, Université Laval, Québec, Canada Département d informatique, Université de Sherbrooke, Québec, Canada Groupe de recherche en apprentissage automatique de l Université Laval (GRAAL) December, Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

2 Outline Domain Adaptation Setting Theoretical Foundations Neural Network for Domain Adaptation Empirical Results Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

3 Our Domain Adaptation Setting Binary classification tasks Input space: R d Labels: {, } Two different data distributions Source domain: D S Target domain: D T A domain adaptation learning algorithm is provided with a labeled source sample S = {(x s i, y s i )} m i= (D S ) m, an unlabeled target sample T = {x t i } m i= (D T ) m. The goal is to build a classifier η : R d {, } with a low target risk R DT (η) def = Pr (x t,y t ) D T [η(x t ) y t ]. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

4 Divergence between source and target domains Definition (Ben David et al., 6) Given two domain distributions D S and D T, and a hypothesis class H, the H-divergence between D S and D T is def [ d H (D S, D T ) = sup Pr η(x s ) = ] [ Pr η(x t ) = ] η H x s D S x t D T. [ = sup Pr η(x s ) = ] [ + Pr η(x t ) = ] η H x s D S x t D T. The H-divergence measures the ability of an hypothesis class H to discriminate between source D S and target D T distributions. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

5 Bound on the target risk Theorem (Ben David et al., 6) Let H be a hypothesis class of VC-dimension d. With probability δ over the choice of samples S (D S ) m and T (D T ) m, for every η H: R DT (η) R S (η)+ m with β inf η H [R D S (η )+R DT (η )]. Empirical risk on the source sample: d log e m d + log δ +ˆd H (S, T )+ m d log m d + log δ +β R S (η) Empirical H-divergence: [ ˆd H (S, T ) def = max η H m def = m I [η(x s i ) y s i ]. i= I [η(x s i ) = ] + m i= ] I [η(x t i ) = ]. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 5 / i=

6 Bound on the target risk Theorem (Ben David et al., 6) Let H be a hypothesis class of VC-dimension d. With probability δ over the choice of samples S (D S ) m and T (D T ) m, for every η H: R DT (η) R S (η)+ m with β inf η H [R D S (η )+R DT (η )]. d log e m d + log δ +ˆd H (S, T )+ m d log m d + log δ +β Target risk R DT (η) is low if, given S and T, R S (η) is small, i.e., η H is good on and ˆd H(S, T ) is small, i.e., all η H are bad on Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 6 /

7 Standard Neural Network Let consider a neural network architecture with one hidden layer h(x) = sigm(b + Wx), and f(h(x)) = softmax(c + Vh(x)). min W,V,b,c [ m ( ) ] log y s i f(h(x s i )). i= } {{ } source loss Given a source sample S = {(x s i, y s i )}m i= (D S) m,. Pick a x s S. Update V towards f(h(x s )) = y s. Update W towards f(h(x s )) = y s f(h(x)) h(x)... h(x) j... x... x j... V x d h(x) n W The hidden layer learns a representation h( ) from which linear hypothesis f( ) can classify source examples. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 7 /

8 Domain-Adversarial Neural Network (DANN) Empirical H-divergence [ ˆd H (S, T ) def = max η H m I [η(x s i ) = ] + m i= ] I [η(x t i ) = ]. i= We estimate the H-divergence by a logistic regressor that model the probability that a given input (either x s or x t ) is from the source domain: o(h(x)) def = sigm(d + w h(x)). Given a representation output by the hidden layer h( ) : [ ) ˆd H (h(s), h(t ) max log ( o(h(x s i )) ) + log ( o(h(x t i )) ) ]. w,d m m i= i= Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 8 /

9 Domain-Adversarial Neural Network (DANN) min W,V,b,c [ m log ( y s i f(h(x s i )) ) ( +λ max i= } {{ } source loss w,d m log ( o(h(x s i )) ) + log ( o(h(x t m i )) )) ], i= i= } {{ } adaptation regularizer where λ > weights the domain adaptation regularization term. Given a source sample S = {(x s i, y s i )}m i= (D S) m, and a target sample T = {(x t i )}m i= (D T ) m,. Pick a x s S and x t T. Update V towards f(h(x s )) = y s. Update W towards f(h(x s )) = y s. Update w towards o(h(x s )) = and o(h(x t )) = 5. Update W towards o(h(x s )) = and o(h(x t )) = f(h(x)) DANN finds a representation h( ) that are good on S; but unable to discriminate between S and T. V h(x)... h(x) j... x... x j... o(h(x)) w h(x) n x d W Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, 9 /

10 Toy Dataset Standard Neural Network (NN) f(h(x)) Trained to classify source Trained to classify domains V h(x)... h(x) j... h(x) n W x x j x d Domain-Adversarial Neural Networks (DANN) f(h(x)) o(h(x)) Classification output: f(h(x)) Domain output: o(h(x)) V w h(x)... h(x) j... h(x) n W x x j x d Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

11 Amazon Reviews Input: product review (bag of words) Output: positive or negative rating. Dataset DANN NN books dvd..99 books electronics.6.5 books kitchen..5 dvd books.7.6 dvd electronics.7.56 dvd kitchen.7.7 electronics books.8.8 electronics dvd.7.77 electronics kitchen.8.9 kitchen books.8.88 kitchen dvd.6.6 kitchen electronics.6.6 Note: We use a small labeled subset of target examples to select the hyperparameters. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

12 Marginalized Stacked Denoising Autoencoders (msda) Question Does DANN can be combined with other representation learning techniques for domain adaptation? The autoencoders msda (Chen et al. ) provides a new common representation for source and target (unsupervised) With msda+svm, Chen et al. () obtained state-of-the-art results on Amazon Reviews: Train a linear SVM on msda source representations. We try msda+dann: Train DANN on source representations and target representations. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

13 Amazon Reviews Input: product review (bag of words) Output: positive or negative rating. Dataset msda+dann msda+svm books dvd books electronics.97. books kitchen.69.7 dvd books dvd electronics.8. dvd kitchen.5.78 electronics books.7.9 electronics dvd.6.6 electronics kitchen.8.7 kitchen books.. kitchen dvd.8.9 kitchen electronics..8 Note: We use a small labeled subset of target examples to select the hyperparameters. The noise parameter of msda representations is fixed to 5%. Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

14 Future Work Several paths to explore: Deeper neural networks architectures. Multiclass / Multilabels problems. Multisource domain adaptation. Thank you! Pascal Germain (GRAAL) Domain-Adversarial Neural Networks December, /

Domain-Adversarial Neural Networks

Domain-Adversarial Neural Networks Doain-Adversarial Neural Networks Hana Ajakan, Pascal Gerain 2, Hugo Larochelle 3, François Laviolette 2, Mario Marchand 2,2 Départeent d inforatique et de génie logiciel, Université Laval, Québec, Canada