Joint distribution optimal transportation for domain adaptation



1 Joint distribution optimal transportation for domain adaptation. Changhuang Wan, Mechanical and Aerospace Engineering Department, The Ohio State University. March 8th, 2018

2 Joint distribution optimal transportation for domain adaptation. OUTLINE: Problem Statement; Assumption and Notations; Joint Distribution Optimal Transport; Bound on the Target Error; Learning with Joint Distribution OT; Examples

4 Problem Statement In the domain adaptation (DA) problem, we study two different (but related) distributions $D_S$ and $D_T$ on $X \times Y$. The DA task consists of transferring knowledge from the source domain $D_S$ to the target domain $D_T$. The objective is to learn a function $f$ (from labeled or unlabeled samples of the two domains) that commits as small an error as possible on the target domain $D_T$.

6 Assumption and Notations
Assumption: there exists a nonlinear transformation between the label space distributions of the two domains, $P_s$ and $P_t$, that can be estimated with optimal transport.
Notations:
$X_s = \{x_i^s\}_{i=1}^{N_s}$ : a set of data from the source domain
$X_t = \{x_i^t\}_{i=1}^{N_t}$ : a set of data from the target domain
$Y_s = \{y_i^s\}_{i=1}^{N_s}$ : a set of class label information associated with $X_s$
$Y_t = \{y_i^t\}_{i=1}^{N_t}$ : a set of class label information associated with $X_t$
$\Omega \subset \mathbb{R}^d$ : compact input measurable space of dimension $d$
$\mathcal{C}$ : label space
$\mathcal{P}(\Omega)$ : all probability measures over $\Omega$
$P_s(X, Y)$ : joint probability distribution in $D_S$
$P_t(X, Y)$ : joint probability distribution in $D_T$

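8 Joint Distribution Optimal Transport
Optimal transport in domain adaptation: seek a transport plan (or, equivalently, a joint probability distribution) $\gamma \in \mathcal{P}(\Omega \times \Omega)$ such that
$$\gamma_0 = \arg\min_{\gamma \in \Pi(\mu_s, \mu_t)} \int_{\Omega \times \Omega} d(x_1, x_2)\, d\gamma(x_1, x_2),$$
where $\Pi(\mu_s, \mu_t) = \{\gamma \in \mathcal{P}(\Omega \times \Omega) \mid p^{+}\#\gamma = \mu_s,\ p^{-}\#\gamma = \mu_t\}$, $p^{+}$ and $p^{-}$ denote the two marginal projections of $\Omega \times \Omega$ to $\Omega$, and $p\#\gamma$ is the image measure of $\gamma$ by $p$.
Joint distribution optimal transport loss in DA, to handle a change in both marginal and conditional distributions:
$$\gamma_0 = \arg\min_{\gamma \in \Pi(P_s, P_t)} \int_{(\Omega \times \mathcal{C})^2} \mathcal{D}(x_1, y_1; x_2, y_2)\, d\gamma(x_1, y_1; x_2, y_2),$$
where $\mathcal{D}(x_1, y_1; x_2, y_2) = \alpha\, d(x_1, x_2) + \mathcal{L}(y_1, y_2)$ is a joint cost measure combining both the distance $d$ and a loss function $\mathcal{L}$ measuring the discrepancy between $y_1$ and $y_2$.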
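The joint cost on this slide can be sketched in a few lines of Python. This is only an illustration: the Euclidean distance for d and the absolute difference for the label loss are assumptions chosen here for concreteness, and the helper names (`euclidean`, `abs_loss`, `joint_cost`, `cost_matrix`) are hypothetical.

```python
import math


def euclidean(x1, x2):
    """Distance d(x1, x2) between two feature vectors (assumed Euclidean)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def abs_loss(y1, y2):
    """Label loss L(y1, y2); the absolute difference is an illustrative choice."""
    return abs(y1 - y2)


def joint_cost(x1, y1, x2, y2, alpha=1.0):
    """Joint cost D(x1, y1; x2, y2) = alpha * d(x1, x2) + L(y1, y2)."""
    return alpha * euclidean(x1, x2) + abs_loss(y1, y2)


def cost_matrix(Xs, Ys, Xt, Yt, alpha=1.0):
    """Matrix of joint costs between every source pair and every target pair."""
    return [[joint_cost(xs, ys, xt, yt, alpha) for xt, yt in zip(Xt, Yt)]
            for xs, ys in zip(Xs, Ys)]
```

The weight `alpha` trades off feature distance against label discrepancy; with `alpha = 0` only the labels matter, and a large `alpha` makes the cost close to a plain feature-space transport cost.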

9 Joint Distribution Optimal Transport
Joint distribution optimal transport loss in DA, to handle a change in both marginal and conditional distributions.
In the unsupervised DA problem, one does not have access to labels in the target domain, and as such it is not possible to find the optimal coupling. Since our goal is to find a function on the target domain, $f : \Omega \to \mathcal{C}$, define the following joint distribution, which uses a given function $f$ as a proxy for $y$ in the target domain:
$$P_t^f = (x, f(x))_{x \sim \mu_t}.$$
In practice we consider empirical versions of $P_s$ and $P_t^f$, i.e.
$$\hat{P}_s = \frac{1}{N_s} \sum_{i=1}^{N_s} \delta_{x_i^s,\, y_i^s}, \qquad \hat{P}_t^f = \frac{1}{N_t} \sum_{i=1}^{N_t} \delta_{x_i^t,\, f(x_i^t)}.$$
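As a rough illustration of the two empirical distributions, each can be stored as a list of (atom, weight) pairs with uniform weights. The function names and the threshold classifier used as `f` below are hypothetical and only serve the example.

```python
def empirical_source(Xs, Ys):
    """Empirical source distribution: one atom (x_i, y_i) per sample, weight 1/N_s."""
    n = len(Xs)
    return [((x, y), 1.0 / n) for x, y in zip(Xs, Ys)]


def empirical_target_proxy(Xt, f):
    """Empirical proxy target distribution: the unknown label is replaced by f(x)."""
    n = len(Xt)
    return [((x, f(x)), 1.0 / n) for x in Xt]


# Hypothetical 1D data and a hypothetical threshold classifier used as f.
Xs, Ys = [-1.0, 0.5, 2.0], [0, 1, 1]
Xt = [-0.5, 1.5]
f = lambda x: 1 if x > 0 else 0

P_s_hat = empirical_source(Xs, Ys)
P_t_hat = empirical_target_proxy(Xt, f)
```

Only the target atoms depend on `f`, which is why the transport cost between the two lists can later be minimized with respect to `f`.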

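10 Joint Distribution Optimal Transport
Joint distribution optimal transport loss in DA, to handle a change in both marginal and conditional distributions.
JDOT: with $f : \Omega \to \mathcal{C}$, $P_t^f = (x, f(x))_{x \sim \mu_t}$, $\hat{P}_s = \frac{1}{N_s} \sum_{i=1}^{N_s} \delta_{x_i^s,\, y_i^s}$ and $\hat{P}_t^f = \frac{1}{N_t} \sum_{i=1}^{N_t} \delta_{x_i^t,\, f(x_i^t)}$, solve
$$\min_{f} W_1(\hat{P}_s, \hat{P}_t^f) = \min_{f,\ \gamma \in \Delta} \sum_{i,j} \mathcal{D}(x_i^s, y_i^s; x_j^t, f(x_j^t))\, \gamma_{ij},$$
where $W_1$ is the 1-Wasserstein distance for the loss $\mathcal{D}$ and $\Delta$ is the set of couplings between the two empirical distributions.
Remark: the function $f$ we retrieve comes with a theoretical bound on the target error.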
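For a fixed f, the 1-Wasserstein distance between the two empirical distributions is a small linear program. The sketch below sidesteps a solver by assuming equal sample sizes and uniform weights, in which case an optimal coupling can be taken to be a permutation, and it simply enumerates all permutations. That is only practical for a handful of points. The Euclidean distance, the absolute label loss and the name `jdot_objective` are assumptions made for the example.

```python
import itertools
import math


def jdot_objective(Xs, Ys, Xt, f, alpha=1.0):
    """W_1 between the empirical source and proxy target distributions.

    Assumes len(Xs) == len(Xt) and uniform weights, so that an optimal
    coupling is a permutation; brute force, suitable for tiny examples only.
    Returns (objective value, best permutation source index -> target index).
    """
    n = len(Xs)
    if len(Xt) != n:
        raise ValueError("this sketch needs equal sample sizes")

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    # cost[i][j] = alpha * d(x_i^s, x_j^t) + |y_i^s - f(x_j^t)|
    cost = [[alpha * dist(xs, xt) + abs(ys - f(xt)) for xt in Xt]
            for xs, ys in zip(Xs, Ys)]
    best = min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    value = sum(cost[i][best[i]] for i in range(n)) / n
    return value, best


Xs, Ys = [(0.0,), (1.0,)], [0, 1]
Xt = [(1.5,), (0.5,)]
f = lambda x: 1 if x[0] > 1.0 else 0
value, perm = jdot_objective(Xs, Ys, Xt, f)
```

Minimizing this value over a family of candidate functions f is the JDOT problem; for realistic sample sizes a dedicated optimal transport solver would replace the enumeration.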

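12 A Bound on the Target Error
Define the expected loss in the target domain, $\mathrm{err}_T(f)$:
$$\mathrm{err}_T(f) = \mathbb{E}_{(x, y) \sim P_t}\, \mathcal{L}(y, f(x)).$$
Similarly,
$$\mathrm{err}_S(f) = \mathbb{E}_{(x, y) \sim P_s}\, \mathcal{L}(y, f(x)).$$
Assume the loss function $\mathcal{L}$ to be bounded, symmetric, $k$-Lipschitz and satisfying the triangle inequality.
Symmetric: $\mathcal{L}(y_1, y_2) = \mathcal{L}(y_2, y_1)$ for all $y_1, y_2$.
$k$-Lipschitz: there exists $k$ such that $|\mathcal{L}(y_1, y_2) - \mathcal{L}(y_1, y_3)| \le k\, |y_2 - y_3|$ for all $y_1, y_2, y_3$.
Triangle inequality: $\mathcal{L}(y_1, y_3) \le \mathcal{L}(y_1, y_2) + \mathcal{L}(y_2, y_3)$ for all $y_1, y_2, y_3$.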
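The three assumptions on the loss can be checked numerically on a finite set of label values. The sketch below is a sanity check, not a proof; `check_loss` and the two candidate losses are hypothetical names introduced for the example.

```python
from itertools import product


def abs_loss(y1, y2):
    return abs(y1 - y2)


def squared_loss(y1, y2):
    return (y1 - y2) ** 2


def check_loss(loss, labels, k, tol=1e-12):
    """Test symmetry, k-Lipschitzness (second argument) and the triangle
    inequality for `loss` on every combination of the given label values."""
    symmetric = all(abs(loss(a, b) - loss(b, a)) <= tol
                    for a, b in product(labels, repeat=2))
    lipschitz = all(abs(loss(a, b) - loss(a, c)) <= k * abs(b - c) + tol
                    for a, b, c in product(labels, repeat=3))
    triangle = all(loss(a, c) <= loss(a, b) + loss(b, c) + tol
                   for a, b, c in product(labels, repeat=3))
    return {"symmetric": symmetric, "k_lipschitz": lipschitz, "triangle": triangle}


abs_report = check_loss(abs_loss, [-1.0, 0.0, 0.5, 2.0], k=1.0)
sq_report = check_loss(squared_loss, [0.0, 1.0, 2.0], k=1.0)
```

On these label sets the absolute loss passes all three checks with k = 1, while the squared loss fails the triangle inequality (for labels 0, 1, 2: 4 > 1 + 1), so it falls outside the assumptions used for the bound.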

13 A Bound on the Target Error PTL (probabilistic transfer Lipschitzness). Note: given a deterministic labeling function $f$ and a coupling $\Pi$, the PTL property bounds the probability of finding pairs of source-target instances that are labelled differently in a $(1/\lambda)$-ball with respect to $\Pi$.

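14 A Bound on the Target Error
With probability at least $1 - \delta$:
$$\mathrm{err}_T(f) \le W_1(\hat{P}_s, \hat{P}_t^f) + \sqrt{\frac{2}{c'} \log\frac{2}{\delta}} \left( \frac{1}{\sqrt{N_s}} + \frac{1}{\sqrt{N_t}} \right) + \mathrm{err}_S(f^*) + \mathrm{err}_T(f^*) + kM\phi(\lambda).$$
$W_1(\hat{P}_s, \hat{P}_t^f)$ corresponds to the objective function.
$\mathrm{err}_S(f^*) + \mathrm{err}_T(f^*)$ corresponds to the joint error minimizer $f^*$, illustrating that domain adaptation can work only if we can predict well in both domains.
$kM\phi(\lambda)$ assesses the probability under which the PTL does not hold.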

15 A Bound on the Target Error
PTL condition: for all $\lambda > 0$,
$$\Pr_{(x_1, x_2) \sim \Pi(\mu_s, \mu_t)} \left[\, |f(x_1) - f(x_2)| > \lambda\, d(x_1, x_2) \,\right] \le \phi(\lambda).$$
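The probability on this slide can be estimated empirically from pairs sampled from a coupling. The sketch below assumes the pairs are equally weighted draws from the coupling and uses a 1D absolute distance; `ptl_violation_rate` and the toy data are hypothetical.

```python
def ptl_violation_rate(pairs, f, lam, dist):
    """Fraction of coupled pairs (x1, x2) with |f(x1) - f(x2)| > lam * d(x1, x2).

    `pairs` is assumed to be a list of equally weighted samples from the coupling.
    """
    violations = sum(1 for x1, x2 in pairs
                     if abs(f(x1) - f(x2)) > lam * dist(x1, x2))
    return violations / len(pairs)


# Hypothetical 1D example: a threshold labeling function and four coupled pairs.
f = lambda x: 1 if x > 0 else 0
dist = lambda a, b: abs(a - b)
pairs = [(-1.0, -0.5), (-0.1, 0.1), (1.0, 2.0), (-2.0, 2.0)]

rate_lam_1 = ptl_violation_rate(pairs, f, 1.0, dist)
rate_lam_02 = ptl_violation_rate(pairs, f, 0.2, dist)
```

The rate can only decrease as `lam` grows, which matches the role of the function of lambda on the right-hand side of the inequality: a larger lambda tolerates larger label changes per unit of distance.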

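16 A Bound on the Target Error
Proof of
$$\mathrm{err}_T(f) \le W_1(\hat{P}_s, \hat{P}_t^f) + \sqrt{\frac{2}{c'} \log\frac{2}{\delta}} \left( \frac{1}{\sqrt{N_s}} + \frac{1}{\sqrt{N_t}} \right) + \mathrm{err}_S(f^*) + \mathrm{err}_T(f^*) + kM\phi(\lambda).$$
Steps: triangle inequality; definition of $\mathrm{err}_T(f)$; symmetry of $\mathcal{L}$; "Since ..., then ...".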

17 A Bound on the Target Error Proof (continued). Steps: conditional probability definition; dual form of the Kantorovich-Rubinstein theorem; triangle inequality.

18 A Bound on the Target Error Proof (continued). Steps: $k$-Lipschitz inequality; PTL; triangle inequality, with $\alpha = k\lambda$.

19 A Bound on the Target Error Proof (continued). Steps: using the triangle inequality of the $W_1$ distance; using a result from Bolley's paper.

20 A Bound on the Target Error Proof (continued). Step: using a result from Bolley's paper.

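21 A Bound on the Target Error
Proof (continued). Concentration of the empirical distributions:
$$\Pr\left[ W_1(P_s, \hat{P}_s) > \varepsilon \right] \le \exp\left( -\frac{c'}{2} N_s \varepsilon^2 \right), \qquad \Pr\left[ W_1(P_t^f, \hat{P}_t^f) > \varepsilon \right] \le \exp\left( -\frac{c'}{2} N_t \varepsilon^2 \right).$$
Setting each right-hand side to $\delta/2$ gives
$$W_1(P_s, \hat{P}_s) + W_1(P_t^f, \hat{P}_t^f) \le \sqrt{\frac{2}{c'} \log\frac{2}{\delta}} \left( \frac{1}{\sqrt{N_s}} + \frac{1}{\sqrt{N_t}} \right)$$
with probability at least $1 - \delta$.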
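To get a feel for how the sampling term of the bound, sqrt((2/c') log(2/delta)) (1/sqrt(N_s) + 1/sqrt(N_t)), shrinks with the sample sizes, it can be evaluated directly. The constant c' is not specified on the slides, so the value passed below is a placeholder assumption, and `sampling_term` is a hypothetical helper name.

```python
import math


def sampling_term(n_s, n_t, delta, c_prime):
    """sqrt((2 / c') * log(2 / delta)) * (1 / sqrt(N_s) + 1 / sqrt(N_t)).

    c_prime stands for the unspecified constant c' of the concentration
    result; any numeric value used here is an assumption.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    scale = math.sqrt(2.0 / c_prime * math.log(2.0 / delta))
    return scale * (1.0 / math.sqrt(n_s) + 1.0 / math.sqrt(n_t))


small = sampling_term(10, 10, delta=0.05, c_prime=1.0)
large = sampling_term(1000, 1000, delta=0.05, c_prime=1.0)
```

The term decays like one over the square root of the sample sizes, so multiplying both sample sizes by 100 divides it by 10.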

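23 Learning with Joint Distribution OT
Optimization using BCD (block coordinate descent)
Assume that the function space $\mathcal{H}$ to which $f$ belongs is either an RKHS (reproducing kernel Hilbert space) or a function space parametrized by some parameters $w \in \mathbb{R}^p$. The problem is
$$\min_{f \in \mathcal{H},\ \gamma \in \Delta} \sum_{i,j} \gamma_{ij} \left( \alpha\, d(x_i^s, x_j^t) + \mathcal{L}(y_i^s, f(x_j^t)) \right) + \lambda\, \Omega(f),$$
where the loss function $\mathcal{L}$ is continuous and differentiable with respect to its second variable. $\Omega(f)$ is the regularization term, either a non-decreasing function of the squared norm (RKHS case) or a squared norm on the parameter vector. $\Omega(f)$ is continuously differentiable.
The optimization problem with fixed $\gamma$ leads to a new learning problem expressed as
$$\min_{f \in \mathcal{H}} \sum_{i,j} \gamma_{ij}\, \mathcal{L}(y_i^s, f(x_j^t)) + \lambda\, \Omega(f).$$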
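The alternating scheme can be sketched on a toy problem: fix f and solve for the coupling, then fix the coupling and refit f. This is a minimal illustration under strong assumptions, not the implementation behind the slides: features are scalars, f(x) = w * x with a single weight, the loss is the squared loss, the regularizer is w squared, sample sizes are equal with uniform weights, and the coupling step enumerates permutations. All function names are hypothetical.

```python
import itertools


def solve_coupling(cost):
    """Coupling step for equal sizes and uniform weights: best permutation
    (brute force, tiny problems only)."""
    n = len(cost)
    return min(itertools.permutations(range(n)),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))


def fit_weight(x_matched, ys, lam):
    """Function step: minimize (1/n) * sum_i (y_i - w * x_i)^2 + lam * w^2
    over the scalar w, using the closed-form ridge solution."""
    n = len(ys)
    num = sum(x * y for x, y in zip(x_matched, ys))
    den = sum(x * x for x in x_matched) + n * lam
    return num / den


def jdot_bcd(Xs, Ys, Xt, alpha=1.0, lam=1e-3, n_iter=10):
    """Alternate between the coupling (f fixed) and f (coupling fixed)."""
    w, perm = 0.0, None  # start from f(x) = 0
    for _ in range(n_iter):
        cost = [[alpha * abs(xs - xt) + (ys - w * xt) ** 2 for xt in Xt]
                for xs, ys in zip(Xs, Ys)]
        new_perm = solve_coupling(cost)
        if new_perm == perm:  # coupling unchanged: stop
            break
        perm = new_perm
        w = fit_weight([Xt[j] for j in perm], Ys, lam)
    return w, perm


# Hypothetical data: the target features are the source features scaled by 2.
Xs, Ys = [-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]
Xt = [2.0, -2.0, 0.0]
w, perm = jdot_bcd(Xs, Ys, Xt)
```

With a fixed coupling the function step is an ordinary regularized regression in which each target point inherits the label of the source point it is matched to, which is why standard learners can be reused for that step.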

24 Learning with Joint Distribution OT Optimization using BCD (slide equations not recoverable; surviving symbols: $y = f(x)$, $k$, $\Omega(f)$)

26 Examples 3-class toy example. Source domain samples (+): drawn from three different 2D Gaussian distributions with different centers and standard deviations. Target domain samples (o): obtained by rotating the source distribution by $\pi/4$ radians. Two types of kernel are considered: linear and RBF.
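A dataset of this kind can be generated with the standard library alone. The centers, standard deviations, sample counts and function names below are made-up values for illustration, not those used for the slides, and the target set is obtained here by rotating the source samples themselves, which is one possible reading of "rotating the source distribution".

```python
import math
import random


def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians about the origin."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def make_toy(n_per_class=20, seed=0):
    """Three isotropic 2D Gaussian classes (source) and their rotated copy (target)."""
    rng = random.Random(seed)
    centers = [(-2.0, 0.0), (0.0, 2.0), (2.0, 0.0)]  # hypothetical centers
    stds = [0.3, 0.5, 0.4]                           # hypothetical std devs
    Xs, Ys = [], []
    for label, (center, std) in enumerate(zip(centers, stds)):
        for _ in range(n_per_class):
            Xs.append((rng.gauss(center[0], std), rng.gauss(center[1], std)))
            Ys.append(label)
    Xt = [rotate(p, math.pi / 4) for p in Xs]  # target labels stay hidden in DA
    return Xs, Ys, Xt


Xs, Ys, Xt = make_toy()
```

Because a rotation preserves distances to the origin, the classes keep their shapes in the target domain and only their positions relative to the source decision boundaries change.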

27 QUESTION Thank you!

28 Problem Statement
Distinction between the usual machine learning setting and transfer learning, and positioning of domain adaptation.
The different types of domain adaptation:
Unsupervised domain adaptation: the learning sample contains a set of labeled source examples, a set of unlabeled source examples and an unlabeled set of target examples.
Semi-supervised domain adaptation: consider a "small" set of labeled target examples.
Supervised domain adaptation: all the examples considered are supposed to be labeled.
