Joint distribution optimal transportation for domain adaptation

1 Optimal Transport for Domain Adaptation (TPAMI 2016); Joint distribution optimal transportation for domain adaptation (NIPS 2017)

Joint distribution optimal transportation for domain adaptation
Nicolas Courty, Rémi Flamary, Amaury Habrard, Alain Rakotomamonjy
Université de Bretagne Sud, Université Côte d'Azur, Univ Lyon, Normandie Université
NIPS 2017

Presented by Yulai Cong, September 7, 2018

2 Table of Contents

1 Optimal Transport for Domain Adaptation (TPAMI 2016)
    Domain Adaptation Problem
    Mathematical Modelings
2 Joint distribution optimal transportation for domain adaptation (NIPS 2017)
    Mathematical Modelings
    Experiments

4 Domain adaptation problem

Source domain: training features $X_s = \{x_i^s\}_{i=1}^{N_s} \subset \Omega_s$, training labels $Y_s = \{y_i^s\}_{i=1}^{N_s} \subset \mathcal{C}$
Target domain: testing features $X_t = \{x_i^t\}_{i=1}^{N_t} \subset \Omega_t$, unknown testing labels $Y_t \subset \mathcal{C}$

Naive assumption: $(X_s, Y_s)$ and $(X_t, Y_t)$ follow the same distribution $P(x, y)$.

But in practice the two distributions differ:
Computer Vision: different lighting conditions, acquisition devices, backgrounds, pre-processing, or compression
Natural Language Processing: different background noise, tone, or gender

5 Domain adaptation problem: illustration

Domain adaptation: $(X_s, Y_s)$ and $(X_t, Y_t)$ are drawn from different underlying distributions but carry very similar information.

6 Mathematical Modelings (Optimal Transport for Domain Adaptation, TPAMI 2016)

1 The Monge problem: optimal transport map

$$T_0 = \operatorname*{argmin}_{T} \int_{\Omega_s} d(x^s, T(x^s))\, d\mu_s(x^s), \quad \text{s.t. } T\#\mu_s = \mu_t \tag{1}$$

where $\mu_s$ is the source measure, $\mu_t$ the target measure, and $T\#\mu_s(A_t) = \mu_s(T^{-1}(A_t))$ for every Borel set $A_t \subset \Omega_t$.

2 The Kantorovitch problem: optimal transport plan

$$\gamma_0 = \operatorname*{argmin}_{\gamma \in \Pi(\mu_s, \mu_t)} \int_{\Omega_s \times \Omega_t} d(x^s, x^t)\, d\gamma(x^s, x^t) \tag{2}$$

where $\Pi(\mu_s, \mu_t) = \{\gamma \mid p^+\#\gamma = \mu_s,\ p^-\#\gamma = \mu_t\}$, with $p^+$ and $p^-$ the projections onto $\Omega_s$ and $\Omega_t$.
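As a concrete instance of the Monge problem (1), consider samples on the real line. With equally many, equally weighted source and target points and the squared distance as cost (both are assumptions of this sketch, not stated on the slide), the optimal map is known to send the i-th smallest source point to the i-th smallest target point. The function name `monge_map_1d` is hypothetical.

```python
import numpy as np


def monge_map_1d(x_s, x_t):
    """Return T with T[i] = image of x_s[i] under the monotone (sorted) matching.

    Assumes 1D samples, len(x_s) == len(x_t), uniform weights, convex cost.
    """
    x_s = np.asarray(x_s, dtype=float)
    x_t = np.asarray(x_t, dtype=float)
    if x_s.shape != x_t.shape or x_s.ndim != 1:
        raise ValueError("expected two 1D arrays of equal length")
    order_s = np.argsort(x_s)   # ranks of the source points
    x_t_sorted = np.sort(x_t)   # target points in increasing order
    T = np.empty_like(x_s)
    T[order_s] = x_t_sorted     # k-th smallest source -> k-th smallest target
    return T


# Small usage example: the smallest source point (1.0) is sent to the
# smallest target point (10.0), and so on.
x_s = np.array([3.0, 1.0, 2.0])
x_t = np.array([10.0, 30.0, 20.0])
T = monge_map_1d(x_s, x_t)
```

The push-forward constraint $T\#\mu_s = \mu_t$ holds here because `T` is a permutation of the target samples.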

7 Practical solution for the Kantorovitch problem

Use discrete data samples $\{x_i^s\}$, $\{x_i^t\}$ to represent the measures:

$$\mu_s = \sum_{i=1}^{N_s} p_i^s \delta_{x_i^s}, \qquad \mu_t = \sum_{i=1}^{N_t} p_i^t \delta_{x_i^t}, \tag{3}$$

where $\sum_{i=1}^{N_s} p_i^s = \sum_{i=1}^{N_t} p_i^t = 1$.

The Kantorovitch formulation becomes a linear programming problem:

$$\gamma_0 = \operatorname*{argmin}_{\gamma \in \mathcal{B}} \langle \gamma, C \rangle_F \tag{4}$$

where $\mathcal{B} = \{\gamma \in (\mathbb{R}^+)^{N_s \times N_t} \mid \gamma 1_{N_t} = \mu_s,\ \gamma^T 1_{N_s} = \mu_t\}$, $\langle \cdot, \cdot \rangle_F$ is the Frobenius dot product, and $C(i, j) = d(x_i^s, x_j^t)$. In the constraints, $\mu_s$ and $\mu_t$ stand for the weight vectors $(p_i^s)_i$ and $(p_i^t)_i$.

Regularizations: entropy, group sparsity, graph Laplacian, ...
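The linear program (4) can be handed to a generic LP solver. The following is a minimal sketch using SciPy's `linprog`; the function name `kantorovich_lp`, the toy data, and the choice of squared distance for $d$ are assumptions made for illustration. Dedicated OT solvers are much faster on large problems.

```python
import numpy as np
from scipy.optimize import linprog


def kantorovich_lp(p_s, p_t, C):
    """Solve min_gamma <gamma, C>_F  s.t.  gamma 1 = p_s, gamma^T 1 = p_t, gamma >= 0."""
    n_s, n_t = C.shape
    # gamma is flattened row by row, so entry (i, j) sits at index i * n_t + j.
    # Row-sum constraints: sum_j gamma[i, j] = p_s[i].
    A_rows = np.kron(np.eye(n_s), np.ones((1, n_t)))
    # Column-sum constraints: sum_i gamma[i, j] = p_t[j].
    A_cols = np.kron(np.ones((1, n_s)), np.eye(n_t))
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([p_s, p_t])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n_s, n_t), res.fun


# Toy 1D example (assumed data): three source and three target points,
# uniform weights, squared distance as the ground cost d.
X_s = np.array([[0.0], [1.0], [2.0]])
X_t = np.array([[0.5], [1.5], [2.5]])
C = (X_s - X_t.T) ** 2            # C[i, j] = d(x_i^s, x_j^t)
p_s = np.full(3, 1.0 / 3.0)
p_t = np.full(3, 1.0 / 3.0)
gamma, cost = kantorovich_lp(p_s, p_t, C)
```

In this toy case every source point is matched to its right-hand neighbour, so the plan is diagonal and the transport cost is 0.25.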

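9 Joint distribution Optimal Transport (JDOT)

The Kantorovitch problem: optimal transport plan

$$\gamma_0 = \operatorname*{argmin}_{\gamma \in \Pi(\mu_s, \mu_t)} \int_{\Omega_s \times \Omega_t} d(x^s, x^t)\, d\gamma(x^s, x^t) \tag{5}$$

Main idea of joint distribution Optimal Transport (JDOT): transport the joint distributions of features and labels,

$$\gamma_0 = \operatorname*{argmin}_{\gamma \in \Pi(P_s, P_t)} \int_{(\Omega \times \mathcal{C})^2} D(x^s, y^s; x^t, y^t)\, d\gamma(x^s, y^s; x^t, y^t) \tag{6}$$

where $D(x^s, y^s; x^t, y^t) = \alpha\, d(x^s, x^t) + L(y^s, y^t)$, and $L(y^s, y^t)$ is the MSE loss or the hinge loss.

Problem: the target labels $y^t$ are not available in practice, so the prediction $f(x^t)$ stands in for $y^t$.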
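To make the joint cost $D$ concrete, the sketch below builds the matrix of pairwise costs between source samples and target samples whose labels are replaced by predictions. It assumes the squared Euclidean distance for $d$ and the squared error for $L$; the function name `jdot_cost` and the toy numbers are hypothetical.

```python
import numpy as np


def jdot_cost(X_s, y_s, X_t, y_t_pred, alpha=1.0):
    """Return D with D[i, j] = alpha * d(x_i^s, x_j^t) + L(y_i^s, f(x_j^t)).

    Assumptions: d is the squared Euclidean distance, L is the squared error.
    """
    # Feature term: pairwise squared Euclidean distances, shape (N_s, N_t).
    diff = X_s[:, None, :] - X_t[None, :, :]
    feat = np.sum(diff ** 2, axis=2)
    # Label term: squared error between source labels and target predictions.
    lab = (y_s[:, None] - y_t_pred[None, :]) ** 2
    return alpha * feat + lab


# Toy example: two source points, one target point with prediction 0.5.
X_s = np.array([[0.0], [1.0]])
y_s = np.array([0.0, 1.0])
X_t = np.array([[1.0]])
y_t_pred = np.array([0.5])
D = jdot_cost(X_s, y_s, X_t, y_t_pred, alpha=2.0)
```

Here `D[0, 0]` is 2 * 1 + 0.25 = 2.25 and `D[1, 0]` is 2 * 0 + 0.25 = 0.25: the parameter `alpha` trades off how much feature distance matters relative to label disagreement.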

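10 Empirical JDOT

Empirical distributions:

$$\hat{P}_s = \frac{1}{N_s} \sum_{i=1}^{N_s} \delta_{x_i^s, y_i^s}, \qquad \hat{P}_t^f = \frac{1}{N_t} \sum_{i=1}^{N_t} \delta_{x_i^t, f(x_i^t)} \tag{7}$$

Empirical solution:

$$\min_{f, \gamma} \sum_{i,j} D(x_i^s, y_i^s; x_j^t, f(x_j^t))\, \gamma_{ij} = \min_f W_1(\hat{P}_s, \hat{P}_t^f) \tag{8}$$

where $\gamma$ ranges over the couplings of $\hat{P}_s$ and $\hat{P}_t^f$, and $W_1$ is the 1-Wasserstein distance.

Compared to previous Optimal Transport methods:
JDOT considers the joint distributions $\hat{P}_s(x^s, y^s)$ and $\hat{P}_t^f(x^t, f(x^t))$.
The authors state that JDOT directly fuses labels from source to target.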
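One way to attack the joint minimization in (8) is to alternate between the plan $\gamma$ (an OT problem for fixed $f$) and the predictor $f$ (a regression problem for fixed $\gamma$). The sketch below does this for a linear $f$ with squared costs. The alternating scheme, the linear model, the feature-only initialization, and the names `solve_ot` and `fit_jdot_linear` are all assumptions of this example, not details given on the slides. With the squared loss and uniform target weights, the $f$-step reduces to least squares on the barycentric labels $\bar{y}_j = N_t \sum_i \gamma_{ij} y_i^s$.

```python
import numpy as np
from scipy.optimize import linprog


def solve_ot(C):
    """Exact OT plan between uniform marginals for cost matrix C (linear program)."""
    n_s, n_t = C.shape
    A_eq = np.vstack([
        np.kron(np.eye(n_s), np.ones((1, n_t))),   # row sums of gamma
        np.kron(np.ones((1, n_s)), np.eye(n_t)),   # column sums of gamma
    ])
    b_eq = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, 1.0 / n_t)])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n_s, n_t)


def fit_jdot_linear(X_s, y_s, X_t, alpha=1.0, n_iter=10):
    """Alternate between gamma and a linear predictor f(x) = w[:-1] @ x + w[-1]."""
    n_t = X_t.shape[0]
    feat = ((X_s[:, None, :] - X_t[None, :, :]) ** 2).sum(axis=2)
    A_t = np.hstack([X_t, np.ones((n_t, 1))])      # design matrix with bias column
    gamma = solve_ot(alpha * feat)                 # assumed init: features only
    for _ in range(n_iter):
        # f-step: least squares onto the labels transported by gamma.
        y_bar = n_t * (gamma.T @ y_s)
        w, *_ = np.linalg.lstsq(A_t, y_bar, rcond=None)
        # gamma-step: OT with the joint cost alpha * d + L for the current f.
        pred = A_t @ w
        D = alpha * feat + (y_s[:, None] - pred[None, :]) ** 2
        gamma = solve_ot(D)
    return w, gamma


# Toy regression task (assumed data): the target inputs are the source inputs
# shifted by 0.5, and each target point keeps the label of its source twin.
x = np.linspace(0.0, 1.0, 8)
X_s = x[:, None]
y_s = 2.0 * x + 1.0
X_t = X_s + 0.5
w, gamma = fit_jdot_linear(X_s, y_s, X_t, alpha=1.0, n_iter=5)
```

On this toy task the shifted target function is $y = 2x$, and the fit recovers slope 2 and intercept 0 without ever seeing a target label.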

11 Illustration of JDOT on a 1D regression problem

12 Expected target error for a function $f$

$$\operatorname{err}_T(f) \stackrel{\text{def}}{=} \mathbb{E}_{(x,y) \sim P_t}\, L(y, f(x)) \tag{9}$$
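When labeled target samples are held out for evaluation, the expectation in (9) can be estimated by a sample mean. This is a minimal sketch; the function name `empirical_target_error`, the squared-error default for $L$, and the toy data are assumptions.

```python
import numpy as np


def empirical_target_error(f, X_t, y_t, loss=None):
    """Monte Carlo estimate of err_T(f): mean of L(y, f(x)) over target samples."""
    if loss is None:
        # Assumed default: squared error.
        loss = lambda y, pred: (y - pred) ** 2
    return float(np.mean(loss(y_t, f(X_t))))


# Toy check: f(x) = 2x is exact on two of the three points and off by 1 on the last.
f = lambda X: 2.0 * X[:, 0]
X_eval = np.array([[0.0], [1.0], [2.0]])
y_eval = np.array([0.0, 2.0, 5.0])
err = empirical_target_error(f, X_eval, y_eval)
```

The estimate here is (0 + 0 + 1) / 3, about 0.333.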

13 Experiments: Caltech-Office classification dataset

Images: Amazon, Caltech-256, Webcam and DSLR
x: features from the FC layer of the DeCAF CNN
Different backgrounds, lighting conditions, image quality

14 Experiments: Caltech-Office classification dataset

15 Experiments: Amazon review classification dataset

x: product reviews (bag of words); y: positive or negative
Different categories use different words, hence domain adaptation

16 Experiments: Wifi localization regression dataset

Goal: detect the location of a device in a hallway.
x: signals collected from several access points (sensors)
y: the location of the device
Adaptation: different time periods, different devices
