Comparing Dissimilarity Representations

2 Comparing Dissimilarity Representations. Adam Cardinal-Stakenas, Carey E. Priebe, Zhiliang Ma, Youngser Park. Department of Applied Mathematics & Statistics, Johns Hopkins University. Interface, Durham, NC.

3 Introduction: The Statistical Inference Problem; Dissimilarities. Approaches: Framework. Multiple Dissimilarities: Motivation. Examples: Image/Caption Fusion.

6 The Problem: Classification. Setup: (X_i, Y_i) i.i.d. ∼ F_XY, with X_i : Ω → Ξ and Y_i : Ω → {1, ..., J}. Examples: image analysis, text analysis, graph data, computational anatomy. Goal: to obtain superior performance in the exploitation task using comparisons of the data.

9 Dissimilarities. Definition: A dissimilarity measure is a function δ : Ξ × Ξ → R_+ ∪ {0} with: 1. Positivity: δ(x_1, x_2) ≥ 0; 2. Identifiability: δ(x_1, x_2) = 0 ⟺ x_1 = x_2; 3. Reflexivity: δ(x, x) = 0; 4. Symmetry: δ(x_1, x_2) = δ(x_2, x_1). A dissimilarity representation for a set of n objects is expressed as a symmetric and hollow matrix with positive off-diagonal entries. We may observe numerous dissimilarities; that is, for a given Ξ we may have available, or be able to calculate, δ_1, δ_2, ..., δ_k, where δ_i : Ξ × Ξ → R is a dissimilarity function for i ∈ {1, ..., k}.
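
The properties above translate directly into checks on the n × n matrix. Below is a minimal sketch (not from the slides) that verifies a candidate matrix is square, symmetric, hollow, and nonnegative off the diagonal; the three-point Euclidean example is purely illustrative.

```python
import numpy as np

def is_dissimilarity_representation(D, tol=1e-12):
    """Check the matrix properties from the definition: square, symmetric,
    hollow (zero diagonal), and nonnegative off-diagonal entries."""
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        return False
    symmetric = np.allclose(D, D.T, atol=tol)
    hollow = np.allclose(np.diag(D), 0.0, atol=tol)
    nonnegative = bool(np.all(D >= -tol))
    return symmetric and hollow and nonnegative

# Illustrative data: pairwise Euclidean dissimilarities of three points in R^2
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print(is_dissimilarity_representation(D))  # True
```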

10 Example: R^2. [Figure: three points a, b, c in the plane, with a table of δ(a, b), δ(b, c), δ(a, c) under each of the distances below.] Manhattan distance: δ_Man = |x_1 − x_2| + |y_1 − y_2|. Euclidean distance: δ_Euc = √((x_1 − x_2)^2 + (y_1 − y_2)^2). Minkowski distance: δ_Min,p = (|x_1 − x_2|^p + |y_1 − y_2|^p)^(1/p). Supremum distance: δ_Sup = max{|x_1 − x_2|, |y_1 − y_2|}.
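
As a concrete illustration of these four distances, here is a small sketch; the coordinates of a, b, c are hypothetical stand-ins for the points in the figure, not the slide's actual values.

```python
import numpy as np

def manhattan(p, q):
    return float(np.abs(np.asarray(p) - np.asarray(q)).sum())

def euclidean(p, q):
    return float(np.sqrt(((np.asarray(p) - np.asarray(q)) ** 2).sum()))

def minkowski(p, q, power):
    diff = np.abs(np.asarray(p) - np.asarray(q))
    return float((diff ** power).sum() ** (1.0 / power))

def supremum(p, q):
    return float(np.abs(np.asarray(p) - np.asarray(q)).max())

# Hypothetical coordinates standing in for the points a, b, c in the figure
a, b, c = (0.0, 0.0), (3.0, 0.0), (3.0, 4.0)
for name, dist in [("Manhattan", manhattan), ("Euclidean", euclidean),
                   ("Minkowski p=3", lambda u, v: minkowski(u, v, 3)),
                   ("Supremum", supremum)]:
    print(f"{name}: d(a,b)={dist(a, b):.3f}, d(b,c)={dist(b, c):.3f}, d(a,c)={dist(a, c):.3f}")
```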

11 Introduction: The Statistical Inference Problem; Dissimilarities. Approaches: Framework. Multiple Dissimilarities: Motivation. Examples: Image/Caption Fusion.

15 Framework. Three dissimilarity classification approaches. [Diagram: an n × n dissimilarity matrix feeding each of the three routes — neighborhood-based, dissimilarity space-based, and embedding into an n × d feature matrix.] 1. The neighborhood-based approach interprets dissimilarities as neighborhood relations. 2. The dissimilarity space approach defines a representation set R = {p_1, ..., p_r} and interprets the dissimilarities from a point to each element of the representation set as features of that point (see the sketch below). 3. The embedding approach embeds the dissimilarities into R^d in such a way that the configuration's interpoint distances approximate the original dissimilarities.
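
To make approach 2 concrete, here is a minimal sketch of the dissimilarity space idea: each object is represented by its dissimilarities to a representation set, and any vector-space classifier can then be applied. The Gaussian data, the choice of representation set, and the 1-NN rule are illustrative assumptions, not part of the slides.

```python
import numpy as np

def dissimilarity_space_features(delta, objects, R):
    """Represent each object by its dissimilarities to the representation set R."""
    return np.array([[delta(x, p) for p in R] for x in objects])

def one_nn_predict(F_train, y_train, F_test):
    """Plain 1-nearest-neighbor classifier applied in the dissimilarity space."""
    preds = []
    for f in F_test:
        idx = np.argmin(((F_train - f) ** 2).sum(axis=1))
        preds.append(y_train[idx])
    return np.array(preds)

# Toy two-class data (hypothetical) and a Euclidean dissimilarity
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)
X_test = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y_test = np.array([0] * 20 + [1] * 20)

delta = lambda u, v: np.linalg.norm(np.asarray(u) - np.asarray(v))
R = X_train[::10]  # representation set: every 10th training object
F_train = dissimilarity_space_features(delta, X_train, R)
F_test = dissimilarity_space_features(delta, X_test, R)
print("accuracy:", (one_nn_predict(F_train, y_train, F_test) == y_test).mean())
```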

16 [Diagram: an observation X ∈ Ξ is either mapped by a feature-extraction procedure F into some R^d and classified by g, or compared directly via a dissimilarity δ and classified by a dissimilarity-based classifier g_δ; both routes produce a predicted label L̂.] Key: F (in its several variants): feature extraction procedures; δ ∈ {δ_1, ..., δ_k}: dissimilarity function; g: classifier; g_δ: dissimilarity-based classifier.

21 Comparing Dissimilarities. To choose a dissimilarity, a classifier-independent methodology for evaluating a dissimilarity representation is needed. To achieve this, we measure the congruence of an observed dissimilarity matrix to Y, the ideal dissimilarity matrix (from the training data). We define congruence using the multidimensional scaling criterion function [Borg & Groenen]: μ : D × D → [0, 1], μ(d(X), δ) = Σ_{i<j} d(X_i, X_j) δ_ij / √( Σ_{i<j} d(X_i, X_j)^2 · Σ_{i<j} δ_ij^2 ). We calculate c(·) = μ(·, Y). Larger values indicate stronger correlation with the ideal dissimilarity.
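
A minimal sketch of the congruence coefficient on a toy problem is below. The 0/1 within-class/between-class form of the ideal dissimilarity Y, and the Gaussian example data, are assumptions for illustration; the slides only specify that Y is built from the training labels.

```python
import numpy as np

def congruence(D, Y):
    """Congruence coefficient of Borg & Groenen:
    sum_{i<j} D_ij Y_ij / sqrt(sum_{i<j} D_ij^2 * sum_{i<j} Y_ij^2)."""
    iu = np.triu_indices_from(D, k=1)  # index pairs with i < j
    num = (D[iu] * Y[iu]).sum()
    den = np.sqrt((D[iu] ** 2).sum() * (Y[iu] ** 2).sum())
    return num / den

def ideal_dissimilarity(labels):
    """Assumed 0/1 'ideal' dissimilarity: 0 within a class, 1 between classes."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)

# Toy example (hypothetical data): two well-separated classes in R^2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print("c(D) =", congruence(D, ideal_dissimilarity(y)))  # closer to 1 = more congruent
```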

26 Motivation. In the style of Devroye, Györfi, and Lugosi, the typical comparison measures for a classification problem involve calculating some quantity based on the posterior probabilities. For example: L* = E{min(η(X), 1 − η(X))}, L_NN = E{2 η(X)(1 − η(X))}, ρ = E{√(η(X)(1 − η(X)))}. The congruence coefficient is classifier-independent in the sense that it depends only on the data; however, it is not (obviously) asymptotically related to the posteriors.
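
For intuition, these posterior-based quantities can be estimated by Monte Carlo once η(x) is known. The sketch below uses a toy Gaussian mixture with a closed-form posterior; the mixture itself is an illustrative assumption, not an example from the slides.

```python
import numpy as np

def eta(x, mu0=0.0, mu1=2.0, sigma=1.0, prior1=0.5):
    """Posterior P(Y = 1 | X = x) for the equal-variance two-Gaussian mixture."""
    f0 = np.exp(-0.5 * ((x - mu0) / sigma) ** 2)
    f1 = np.exp(-0.5 * ((x - mu1) / sigma) ** 2)
    return prior1 * f1 / (prior1 * f1 + (1 - prior1) * f0)

rng = np.random.default_rng(2)
n = 200_000
labels = rng.random(n) < 0.5
x = np.where(labels, rng.normal(2.0, 1.0, n), rng.normal(0.0, 1.0, n))
e = eta(x)

L_star = np.mean(np.minimum(e, 1 - e))  # Bayes risk  L* = E[min(eta, 1 - eta)]
L_nn = np.mean(2 * e * (1 - e))         # asymptotic 1-NN risk
rho = np.mean(np.sqrt(e * (1 - e)))     # Matushita-type measure
print(L_star, L_nn, rho)
```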

27 Introduction: The Statistical Inference Problem; Dissimilarities. Approaches: Framework. Multiple Dissimilarities: Motivation. Examples: Image/Caption Fusion.

28 Image/Caption Fusion: Motivation. Example: Image/Caption Fusion. Images and text are most naturally considered marginally. The geometries of image spaces and caption spaces are not well understood. Our fusion methodologies allow for subsequent joint analyses. [Figure: example image/caption pairs (Images I, Captions C), with captions such as "This undated picture shows Sri Lankan troops looking at weapons seized ...", "Tiger Woods lets go of his club after hitting a wayward shot on the 7th tee ...", "Detroit Tigers shortstop Carlos Guillen throws the ball to first in a double play ...", and "A two-month-old Indochinese tiger cub is seen inside its cage at the ...".]

29 Image/Caption Fusion: The data. 140,577 images & captions were collected from the Yahoo! Photos website. 1,600 pairs were selected using the query word "tiger" on captions. They were labeled manually, based only on captions:

  label                                   #
  animal tiger                          148
  Detroit Tigers (baseball team)        145
  Tiger Woods (the golfer)              897
  Tamil Tigers (soldiers of Sri Lanka)  330
  Leicester Tigers (rugby team)          48
  others                                 32

Two-class problem: Tiger Woods vs. Tamil Tigers.

30 [Diagram: the caption data C ∈ Ξ are turned into word-count (WC) and mutual-information (MI) feature representations X_1 ∈ R^{d_1}, X_2 ∈ R^{d_2}, X_3 ∈ R^{d_3}; the dissimilarities δ_1, δ_2, δ_3 computed on these representations are embedded by multidimensional scaling into low-dimensional configurations such as X_4 ∈ R^{d_4}.] Key: WC: word count histogram; MI: mutual information; δ_1 = Euclidean distance; δ_2 = cosine distance; δ_3 = random forest proximity; m.d.s. = multidimensional scaling.
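
Two pieces of this pipeline — a cosine-distance dissimilarity on word-count features and a multidimensional scaling embedding — can be sketched directly. The classical (Torgerson) form of MDS and the random word-count data below are assumptions for illustration; the slides do not specify which MDS variant was used.

```python
import numpy as np

def cosine_dissimilarity(X):
    """Pairwise cosine distance 1 - <x_i, x_j> / (||x_i|| ||x_j||) for the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - (X @ X.T) / (norms * norms.T)
    np.fill_diagonal(D, 0.0)
    return D

def classical_mds(D, d):
    """Embed an n x n dissimilarity matrix into R^d by classical MDS
    (double-centering plus truncated eigendecomposition)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                 # doubly centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:d]          # keep the d largest eigenvalues
    vals, vecs = np.clip(vals[order], 0, None), vecs[:, order]
    return vecs * np.sqrt(vals)

# Hypothetical stand-in for word-count histograms of a few captions
rng = np.random.default_rng(3)
word_counts = rng.poisson(1.0, size=(6, 50)).astype(float)
D2 = cosine_dissimilarity(word_counts)
X4 = classical_mds(D2, d=2)  # 2-D configuration whose interpoint distances approximate D2
print(X4.shape)              # (6, 2)
```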

31 Comparing Dissimilarities. [Figure: density plot of the c-function, comparing various caption dissimilarities: cosine distance embedding, random forest embedding, word count, M.I. with discretization, cosine distance, random forest.]

32 Comparing Dissimilarities. [Figure: density plot of the estimated probability of misclassification for the caption dissimilarities: cosine distance, random forest, word count, M.I. with discretization.]

33 Conclusion and Future Work. We can evaluate how closely a particular dissimilarity representation matches the dissimilarities between the class labels. This measure relates to classifier performance, but it does not depend on the use of a particular classifier. More work is needed to investigate how the congruence coefficient changes with the dimensionality of the data. To improve the performance of the congruence coefficient for predicting classifier performance, we wish to investigate tailoring the ideal dissimilarity to the observed data.

34 Questions? Adam Cardinal-Stakenas, Department of Applied Mathematics and Statistics, Johns Hopkins University. adam
