Authors: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira and Jennifer Wortman (University of Pennsylvania)

1 Learning Bounds for Domain Adaptation Authors: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira and Jennifer Wortman (University of Pennsylvania) Presentation by: Afshin Rostamizadeh (New York University)

2 Classical Scenario Generalization guarantees are based on empirical error and complexity. Assumes that the training set and test set are drawn from the same distribution, D.

3 Classical Scenario Generalization guarantees are based on empirical error and complexity. Assumes that the training set and test set are drawn from the same distribution, D. [Figure: the Learner (H) outputs a hypothesis h; the Real World supplies both the training set and the test set, drawn from D.]

4 Domain Adaptation Scenario The test set and training set are drawn from different distributions. [Figure: the Learner (H) outputs a hypothesis h; the Real World supplies the training set from D_S and the test set from D_T.]

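5 Domain Adaptation Scenario The test set and training set are drawn from different distributions. [Figure: the Learner (H) outputs a hypothesis h; the Real World supplies the training set from D_S and the test set from D_T, with a question mark between D_S and D_T.] Intuitively, we will need to bound the difference between D_S and D_T.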

6 Some Notation In general, we measure errors as follows: Notice that if f_D is the true labeling function, then: To keep notation consistent, we write:

7 Distance Between Distributions One natural distance is the ℓ1 distance: In general, it is hard to compute from finite samples. Other natural distances: relative entropy, ℓ∞, or any other ℓp distance.
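As a small illustration of the ℓ1 distance and of why it is hard to estimate from samples, the following Python sketch computes it for discrete distributions. The helper names (`l1_distance`, `empirical_distribution`) and the toy distributions are assumptions made for this example, not taken from the slides.

```python
def l1_distance(p, q):
    """L1 distance between two discrete distributions.

    p and q are dicts mapping outcome -> probability (hypothetical format).
    """
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def empirical_distribution(sample):
    """Plug-in estimate: relative frequency of each observed outcome."""
    counts = {}
    for x in sample:
        counts[x] = counts.get(x, 0) + 1
    n = len(sample)
    return {x: c / n for x, c in counts.items()}


# Two known distributions that overlap on half of their mass.
d_s = {"a": 0.5, "b": 0.5}
d_t = {"a": 0.25, "b": 0.25, "c": 0.5}
exact = l1_distance(d_s, d_t)

# Two small samples that happen to share no points.
sample_1 = [1, 2, 3, 4]
sample_2 = [5, 6, 7, 8]
plug_in = l1_distance(empirical_distribution(sample_1),
                      empirical_distribution(sample_2))
print(exact, plug_in)
```

Two small samples with no points in common receive the maximal plug-in distance of 2, even though they could have come from the same distribution over a large domain. This is one way to see the finite-sample difficulty.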

8 A Simple Bound Let, If the loss function is bounded by M, then,
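One way such a bound can arise is sketched below. The notation is an assumption rather than the slide's own: both domains share a labeling function f, the error is $\epsilon_D(h) = \mathbb{E}_{x \sim D}[L(h(x), f(x))]$ with $0 \le L \le M$, and $d_{\ell_1}(D_S, D_T) = \int |D_S(x) - D_T(x)|\,dx$ for densities $D_S$ and $D_T$.

```latex
\begin{align*}
|\epsilon_T(h) - \epsilon_S(h)|
  &= \left| \int L(h(x), f(x)) \, \bigl(D_T(x) - D_S(x)\bigr) \, dx \right| \\
  &\le \int L(h(x), f(x)) \, \bigl|D_T(x) - D_S(x)\bigr| \, dx \\
  &\le M \int \bigl|D_T(x) - D_S(x)\bigr| \, dx
   = M \, d_{\ell_1}(D_S, D_T).
\end{align*}
```

Under these assumptions, the target error exceeds the source error by at most M times the ℓ1 distance between the two distributions.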

9 The A Distance Introduce a new distance that only cares about important regions.

10 The A Distance Introduce a new distance that only cares about important regions. The set A contains regions of importance. Consider the following set,

11 The A Distance Introduce a new distance that only cares about important regions. The set A contains regions of importance. Consider the following set, Then,
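For reference, the A distance is commonly defined in the domain adaptation literature as below. The symbols are assumptions about notation, since the slide's own formulas are not reproduced here: $\mathcal{A}$ is a collection of measurable subsets, H is a class of binary hypotheses, and $\epsilon_D(h, h') = \Pr_{x \sim D}[h(x) \neq h'(x)]$.

```latex
% A distance: largest disagreement in probability mass over regions in \mathcal{A}
d_{\mathcal{A}}(D, D') = 2 \sup_{A \in \mathcal{A}} \left| \Pr_{D}[A] - \Pr_{D'}[A] \right|

% A natural choice of regions for a hypothesis class H: disagreement sets
\mathcal{A}_{H \Delta H} = \bigl\{ \{ x : h(x) \neq h'(x) \} : h, h' \in H \bigr\}

% Consequence: disagreement rates of any two hypotheses differ little across domains
\left| \epsilon_S(h, h') - \epsilon_T(h, h') \right|
  \le \tfrac{1}{2} \, d_{H \Delta H}(D_S, D_T)
```

The last inequality follows directly from the definition, because each disagreement set $\{x : h(x) \neq h'(x)\}$ is itself a member of $\mathcal{A}_{H \Delta H}$.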

12 The A Distance If A has finite VC dimension, d, then: Where D̂ is the empirical estimate of D, based on m (unlabeled) samples. Thus, if A has finite VC dimension, d_A can be estimated from the empirical A distance.

13 The A Distance If A has finite VC dimension, d, then: Where D̂ is the empirical estimate of D, based on m (unlabeled) samples. Thus, if A has finite VC dimension, d_A can be estimated from the empirical A distance. What about computing the empirical A distance?
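For a small, explicitly enumerable family of regions, the empirical A distance can be computed by brute force. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes the definition 2 · sup over regions of the difference in empirical mass, uses one-dimensional half-lines {x : x <= t} as the family (VC dimension 1), and all function names and sample values are hypothetical.

```python
def empirical_a_distance(sample_p, sample_q, regions):
    """2 * max over regions of |mass of sample_p in A - mass of sample_q in A|."""
    def mass(sample, region):
        # Fraction of the sample that falls inside the region.
        return sum(1 for x in sample if region(x)) / len(sample)

    return 2.0 * max(abs(mass(sample_p, a) - mass(sample_q, a))
                     for a in regions)


def threshold_regions(points):
    """Half-lines {x : x <= t}, one per distinct observed point t."""
    return [lambda x, t=t: x <= t for t in sorted(set(points))]


# Unlabeled samples from a hypothetical source and target domain.
source = [0.1, 0.2, 0.3, 0.4]
target = [0.3, 0.4, 0.5, 0.6]

regions = threshold_regions(source + target)
print(empirical_a_distance(source, target, regions))
```

Only thresholds at observed points need checking, because the empirical mass of a half-line changes only when the threshold crosses a sample point. For richer families such as the disagreement sets of linear classifiers, this enumeration is not feasible, which is why the question on the slide matters.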

14 Ideal Hypothesis The authors define an ideal hypothesis as follows, Similarly, define the ideal combined risk,

15 Ideal Hypothesis The authors define an ideal hypothesis as follows, Similarly, define the ideal combined risk, The ideal hypothesis is meant to embody the notion of adaptability. If the ideal hypothesis performs poorly, then one cannot hope to generalize by minimizing source error.

16 Domain Adaptation Bound Given the ideal combined risk and the A distance, we can give the following bound,
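The general shape of such a bound can be written as follows. This is a sketch in assumed notation ($\epsilon_S$, $\epsilon_T$ for source and target error, $h^*$ for the ideal hypothesis, $\lambda$ for the ideal combined risk, $d_{H \Delta H}$ for the A distance over disagreement sets of H) and is stated at the population level, without the finite-sample terms.

```latex
% Ideal hypothesis and ideal combined risk
h^* = \operatorname*{arg\,min}_{h \in H} \bigl( \epsilon_S(h) + \epsilon_T(h) \bigr),
\qquad
\lambda = \epsilon_S(h^*) + \epsilon_T(h^*)

% Target error is controlled by source error, domain divergence, and adaptability
\epsilon_T(h) \le \epsilon_S(h) + \tfrac{1}{2} \, d_{H \Delta H}(D_S, D_T) + \lambda
\qquad \text{for every } h \in H
```

A finite-sample version replaces the true divergence with its empirical estimate from unlabeled data and adds a complexity term that depends on the VC dimension of H, the unlabeled sample size, and the confidence level.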

17 An Extended Scenario Suppose now that we also have some labeled examples from the target distribution. We then define a mixed error rate,

18 An Extended Scenario Suppose now that we also have some labeled examples from the target distribution. We then define a mixed error rate, What is the best mixture to use?
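A mixed error rate of this kind can be sketched in a few lines of Python. The sketch assumes the mixture is a convex combination, α times the empirical target error plus (1 - α) times the empirical source error; the function names, the toy classifier, and the data are hypothetical.

```python
def error_rate(h, examples):
    """Fraction of labeled examples (x, y) that hypothesis h misclassifies."""
    return sum(1 for x, y in examples if h(x) != y) / len(examples)


def mixed_error(h, source, target, alpha):
    """alpha * (target error) + (1 - alpha) * (source error)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * error_rate(h, target) + (1 - alpha) * error_rate(h, source)


def h(x):
    # Toy threshold classifier: predict 1 when x >= 0.5.
    return int(x >= 0.5)


# Many labeled source examples, few labeled target examples.
source = [(0.2, 0), (0.7, 1), (0.6, 0), (0.9, 1)]
target = [(0.4, 1), (0.8, 1)]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, mixed_error(h, source, target, alpha))
```

Setting alpha to 0 trains on source data alone and setting it to 1 uses target data alone; intermediate values trade the larger source sample against its mismatch with the target.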

19 Supporting Lemmas Lemma 1 relates D_α and D_T, Lemma 2 relates the empirical error and the risk, and holds with probability at least 1 - δ,
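Lemmas of this type can be stated as below. This is a sketch in assumed notation, not a transcription of the slide: $\epsilon_\alpha(h) = \alpha\,\epsilon_T(h) + (1-\alpha)\,\epsilon_S(h)$ is the mixed risk, $\hat{\epsilon}_\alpha(h)$ its empirical counterpart, m the total number of labeled examples, and $\beta$ the fraction of them drawn from the target.

```latex
% Lemma 1 (mixed risk vs. target risk)
\left| \epsilon_\alpha(h) - \epsilon_T(h) \right|
  \le (1 - \alpha) \left( \tfrac{1}{2} \, d_{H \Delta H}(D_S, D_T) + \lambda \right)

% Lemma 2 (empirical vs. true mixed risk, for a fixed h),
% with probability at least 1 - \delta
\left| \hat{\epsilon}_\alpha(h) - \epsilon_\alpha(h) \right|
  \le \sqrt{ \left( \frac{\alpha^2}{\beta} + \frac{(1 - \alpha)^2}{1 - \beta} \right)
             \frac{\ln(2/\delta)}{2m} }
```

The first term shrinks as α approaches 1, while the second is smallest when α equals β, so the two lemmas pull the choice of mixture in different directions.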

20 Extended Scenario Bound The bound presents a trade-off in the choice of mixture.

21 Extended Scenario Bound The bound presents a trade-off in the choice of mixture. Is this bound computable?

22 Experimental Results Note that the bound of Theorem 2 is not tractably computable; instead, the authors approximate it with the following, The zeta function approximates the A distance as (1 - hinge loss) of a linear separator that classifies points from each domain.

23 Experimental Results Note the "phase shift" in the bound: After about 3,000 points from the target distribution, it is best to ignore any number of points from the source.
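A phase shift of this kind can be reproduced with a small numerical sketch. It assumes the approximate bound takes the form f(α) = sqrt((C/m)(α²/β + (1 - α)²/(1 - β))) + (1 - α)·ζ, where m is the total number of labeled points, β the target fraction, ζ the divergence estimate, and C a constant. That form, the function names, and every numeric value below are assumptions chosen for illustration, not figures from the experiments.

```python
import math


def approx_bound(alpha, m_source, m_target, zeta, c):
    """Assumed approximation: complexity term plus (1 - alpha) * zeta."""
    m = m_source + m_target
    beta = m_target / m  # fraction of labeled points from the target
    complexity = math.sqrt(
        (c / m) * (alpha ** 2 / beta + (1 - alpha) ** 2 / (1 - beta))
    )
    return complexity + (1 - alpha) * zeta


def best_alpha(m_source, m_target, zeta, c, steps=100):
    """Grid search for the mixture weight minimizing the approximate bound."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: approx_bound(a, m_source, m_target, zeta, c))


# Hypothetical constants: C = 1600, zeta = 0.8, 2000 labeled source points.
C, ZETA, M_SOURCE = 1600.0, 0.8, 2000

few_target = best_alpha(M_SOURCE, 100, ZETA, C)
many_target = best_alpha(M_SOURCE, 5000, ZETA, C)
print(few_target, many_target)
```

Under this assumed form, the minimizer moves to α = 1 once the number of target points exceeds roughly C/ζ² (2,500 with the values above), independent of how many source points are available.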

24 Experimental Results A qualitative study of the bound:
