Distance Preservation - Part I

October 2, 2007

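Outline
1 Introduction
2 Multidimensional scaling (MDS): scalar product, equivalence with PCA, Euclidean distance
3 Nonlinear mapping (NLM)
4 Curvilinear component analysis (CCA)
5 Summary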

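Spatial distances
Only the coordinates of the points affect the distances.
$L_p$ norm: $\|a\|_p = \left( \sum_{k=1}^{D} |a_k|^p \right)^{1/p}$
Minkowski distance: $d(a,b) = \|a - b\|_p$
  Maximum ($p = \infty$): $\|a - b\|_\infty = \max_{1 \le k \le D} |a_k - b_k|$
  City-block ($p = 1$): $\|a - b\|_1 = \sum_{k=1}^{D} |a_k - b_k|$
  Euclidean ($p = 2$): $\|a - b\|_2 = \sqrt{\sum_{k=1}^{D} (a_k - b_k)^2}$
Mahalanobis norm: $\|a\|_{\mathrm{Mahalanobis}} = \sqrt{a^T M^{-1} a}$
  Usually $M = C_{aa} = E\{a a^T\}$
  $\|a\|_{\mathrm{Mahalanobis}} = \|a\|_2$ when $M = I$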
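A minimal sketch of the Minkowski family listed above, using only the standard library. The function name `minkowski` and the example points are illustrative choices, not part of the slides.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance ||a - b||_p; pass p = math.inf for the maximum norm."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

a, b = [0.0, 0.0], [3.0, 4.0]
city_block = minkowski(a, b, 1)        # sum of absolute coordinate differences
euclidean = minkowski(a, b, 2)         # square root of the sum of squares
maximum = minkowski(a, b, math.inf)    # largest absolute coordinate difference
```

The three special cases differ only in the value of p, which is why a single function covers them.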

Classical metric multidimensional scaling (MDS)
Preserves pairwise scalar products instead of distances.
Assumes a simple generative model as with PCA: $y = Wx$
  observed variables $y$, uncorrelated latent variables $x$
  variables are assumed to be centered
  orthogonal $D$-by-$P$ matrix $W$: $W^T W = I_P$
$N$ points in matrix form: $Y = [y(1), \ldots, y(N)]$
Scalar products are known: $s_y(i,j) = \langle y(i), y(j) \rangle = y(i)^T y(j)$
Then $S = [s_y(i,j)]_{1 \le i,j \le N} = Y^T Y = X^T W^T W X = X^T X$
Usually $Y$ and $X$ are unknown

MDS: Finding latent variables
The eigenvalue decomposition of the Gram matrix $S$: $S = U \Lambda U^T = (\Lambda^{1/2} U^T)^T (\Lambda^{1/2} U^T)$
Eigenvalues sorted in descending order
$P$-dimensional latent variables: $\hat{X} = I_{P \times N} \Lambda^{1/2} U^T$
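The eigendecomposition step can be sketched with NumPy as follows. This is an illustration under stated assumptions: the helper name `mds_from_gram`, the synthetic data and the clipping of tiny negative eigenvalues are not from the slides.

```python
import numpy as np

def mds_from_gram(S, P):
    """Embed N points in P dimensions from their N-by-N Gram matrix S."""
    eigvals, eigvecs = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:P]          # indices of the P largest eigenvalues
    lam = np.clip(eigvals[order], 0.0, None)       # guard against round-off negatives
    return np.sqrt(lam)[:, None] * eigvecs[:, order].T   # P-by-N, i.e. Lambda^{1/2} U^T

# Synthetic data following the generative model y = W x (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 50))
X -= X.mean(axis=1, keepdims=True)                 # centered latent variables
W, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # orthonormal columns: W^T W = I
Y = W @ X
S = Y.T @ Y                                        # Gram matrix of the observations
X_hat = mds_from_gram(S, 2)
```

Because S has rank 2 here, the scalar products of the recovered points reproduce S up to round-off. The recovered coordinates match the true latent variables only up to a rotation or reflection.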

Equivalence of PCA and MDS
PCA and MDS give the same projection.
SVD: $Y = V \Sigma U^T$
Then $\hat{C}_{yy} \propto Y Y^T = V \Lambda_{PCA} V^T$ and $S = Y^T Y = U \Lambda_{MDS} U^T$, where $\Lambda_{PCA} = \Sigma \Sigma^T$ and $\Lambda_{MDS} = \Sigma^T \Sigma$
We get: $\hat{X}_{MDS} = I_{P \times N} \Lambda_{MDS}^{1/2} U^T = I_{P \times D} \Sigma U^T = I_{P \times D} V^T V \Sigma U^T = I_{P \times D} V^T Y = \hat{X}_{PCA}$
Thus MDS minimizes the criterion: $E_{MDS} = \sum_{i,j=1}^{N} \left( s_y(i,j) - s_{\hat{x}}(i,j) \right)^2$

Three ways to calculate PCA
The $D$-by-$N$ matrix $Y$ is known.
1. $D \ll N$: $\hat{C}_{yy} \propto Y Y^T = V \Lambda_{PCA} V^T$, then $\hat{X}_{PCA} = I_{P \times D} V^T Y$
2. $D \gg N$: $S = Y^T Y = U \Lambda_{MDS} U^T$, then $\hat{X}_{PCA} = \hat{X}_{MDS} = I_{P \times N} \Lambda_{MDS}^{1/2} U^T$
3. $D \approx N$: $Y = V \Sigma U^T$, then $\hat{X}_{PCA} = I_{P \times D} V^T Y$
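The three routes can be checked against each other numerically. The sketch below assumes random centered data and compares the resulting coordinates up to the sign of each component, since eigenvectors are only defined up to sign. The variable names and the helper `same_up_to_sign` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, P = 4, 30, 2
Y = rng.standard_normal((D, N))
Y -= Y.mean(axis=1, keepdims=True)                 # center the data

# Route 1: eigenvectors of Y Y^T (proportional to the covariance), then project.
w, V = np.linalg.eigh(Y @ Y.T)
V = V[:, np.argsort(w)[::-1][:P]]
X_cov = V.T @ Y

# Route 2: eigendecomposition of the Gram matrix Y^T Y (classical MDS).
lam, U = np.linalg.eigh(Y.T @ Y)
idx = np.argsort(lam)[::-1][:P]
X_gram = np.sqrt(lam[idx])[:, None] * U[:, idx].T

# Route 3: singular value decomposition of Y itself.
V_svd, sigma, Ut = np.linalg.svd(Y, full_matrices=False)
X_svd = V_svd[:, :P].T @ Y

def same_up_to_sign(A, B):
    """True if every row of A equals the matching row of B or its negative."""
    return all(np.allclose(a, b) or np.allclose(a, -b) for a, b in zip(A, B))

agree_gram = same_up_to_sign(X_cov, X_gram)
agree_svd = same_up_to_sign(X_cov, X_svd)
```

Which route is cheapest depends on the matrix being decomposed: Y Y^T is D-by-D, while Y^T Y is N-by-N.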

MDS with Euclidean distances
Instead of scalar products, pairwise distances are known: $D = [d_y^2(i,j)]_{1 \le i,j \le N}$
Solution: transform distances to scalar products
$d_y^2(i,j) = \langle y(i) - y(j), y(i) - y(j) \rangle = s_y(i,i) - 2 s_y(i,j) + s_y(j,j)$
$s_y(i,j) = -\frac{1}{2} \left( d_y^2(i,j) - s_y(i,i) - s_y(j,j) \right)$

Double centering of D
Calculate the mean:
$\mu_j(d_y^2(i,j)) = \mu_j(\langle y(i) - y(j), y(i) - y(j) \rangle) = \langle y(i), y(i) \rangle - 2 \langle y(i), \mu_j(y(j)) \rangle + \mu_j(\langle y(j), y(j) \rangle) = s_y(i,i) + \mu_j(s_y(j,j))$
$\mu_i(d_y^2(i,j)) = \mu_i(s_y(i,i)) + s_y(j,j)$
$\mu_{i,j}(d_y^2(i,j)) = \mu_i(s_y(i,i)) + \mu_j(s_y(j,j))$
$s_y(i,j) = -\frac{1}{2} \left( d_y^2(i,j) - \mu_j(d_y^2(i,j)) - \mu_i(d_y^2(i,j)) + \mu_{i,j}(d_y^2(i,j)) \right)$
$S = -\frac{1}{2} \left( D - \frac{1}{N} D 1_N 1_N^T - \frac{1}{N} 1_N 1_N^T D + \frac{1}{N^2} 1_N 1_N^T D 1_N 1_N^T \right)$
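Double centering amounts to subtracting row means and column means and adding back the grand mean. The following sketch, with an assumed helper name `double_center` and random centered data, checks that the result equals the Gram matrix Y^T Y.

```python
import numpy as np

def double_center(D2):
    """Turn an N-by-N matrix of squared Euclidean distances into a Gram matrix."""
    row_mean = D2.mean(axis=1, keepdims=True)      # mu_j d^2(i, j), one value per row i
    col_mean = D2.mean(axis=0, keepdims=True)      # mu_i d^2(i, j), one value per column j
    return -0.5 * (D2 - row_mean - col_mean + D2.mean())

rng = np.random.default_rng(2)
Y = rng.standard_normal((3, 20))
Y -= Y.mean(axis=1, keepdims=True)                 # the derivation assumes centered data

sq = (Y ** 2).sum(axis=0)                          # s_y(i, i) for every point
D2 = sq[:, None] - 2 * Y.T @ Y + sq[None, :]       # squared pairwise distances
S = double_center(D2)
```

The identity holds only because the points are centered, which makes the cross term with the mean vanish.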

MDS: Algorithm
1. If data is $Y$, center it, compute $S = Y^T Y$ and go to step 3
2. If pairwise distances $D$ are given, transform them to scalar products $S$ by double centering
3. EVD: $S = U \Lambda U^T$
4. $\hat{X} = I_{P \times N} \Lambda^{1/2} U^T$

Embedding of test set
Test set as coordinates:
  Test point $y$
  $\hat{x} = I_{P \times D} V^T y$
Test set as scalar products:
  Test point $s = Y^T y$
  $\hat{x} = I_{P \times N} \Lambda^{-1/2} U^T s$
Test set as distances:
  Test point $d = [\langle y(i) - y, y(i) - y \rangle]_{1 \le i \le N}$
  $s = -\frac{1}{2} \left( d - \frac{1}{N} 1_N 1_N^T d - \frac{1}{N} D 1_N + \frac{1}{N^2} 1_N 1_N^T D 1_N \right)$
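A sketch of the full pipeline from squared distances to a test-point embedding, assuming NumPy and random data. It runs the distance branch of the MDS algorithm on the training set and then places one new point from its squared distances to the training points. All names (`sq_dists`, `D_dim`, `y_new`) are illustrative.

```python
import numpy as np

def sq_dists(A, B):
    """Squared Euclidean distances between the columns of A and the columns of B."""
    return ((A[:, :, None] - B[:, None, :]) ** 2).sum(axis=0)

rng = np.random.default_rng(3)
D_dim, N, P = 3, 40, 2
Y = rng.standard_normal((D_dim, N))
mean = Y.mean(axis=1, keepdims=True)
Yc = Y - mean                                      # centered training points
y_new = rng.standard_normal((D_dim, 1)) - mean     # test point in the same centered frame

D2 = sq_dists(Yc, Yc)                              # N-by-N training distances
d = sq_dists(Yc, y_new)                            # N-by-1 distances to the test point

# Training embedding: double centering, eigendecomposition, scaling.
S = -0.5 * (D2 - D2.mean(axis=1, keepdims=True)
            - D2.mean(axis=0, keepdims=True) + D2.mean())
lam, U = np.linalg.eigh(S)
idx = np.argsort(lam)[::-1][:P]
lam, U = lam[idx], U[:, idx]
X_hat = np.sqrt(lam)[:, None] * U.T                # P-by-N

# Test point: distances -> scalar products -> coordinates.
s = -0.5 * (d - d.mean() - D2.mean(axis=1, keepdims=True) + D2.mean())
x_new = (U.T @ s) / np.sqrt(lam)[:, None]          # Lambda^{-1/2} U^T s, P-by-1
```

The recovered vector s matches the scalar products between the centered training points and the test point, so the distance route and the scalar-product route give the same embedding.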

[Figure: embeddings with MDS]

MDS variants
Classical metric MDS preserves only the pairwise scalar products
Variants try to preserve the pairwise distances directly by minimizing the stress function of metric MDS:
$E_{mMDS} = \frac{1}{2} \sum_{i,j=1}^{N} w_{ij} \left( d_y(i,j) - d_x(i,j) \right)^2$
Variants do not depend on any generative model

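Nonlinear mapping (NLM)
Sammon's stress function:
$E_{NLM} = \frac{1}{c} \sum_{\substack{i,j=1 \\ i<j}}^{N} \frac{\left( d_y(i,j) - d_x(i,j) \right)^2}{d_y(i,j)}$, where $c = \sum_{\substack{i,j=1 \\ i<j}}^{N} d_y(i,j)$
Minimizes it iteratively with quasi-Newton optimization:
$x_k(i) \leftarrow x_k(i) - \alpha \, \frac{\partial E_{NLM} / \partial x_k(i)}{\left| \partial^2 E_{NLM} / \partial x_k(i)^2 \right|}$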
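Sammon's stress can be evaluated directly from two point sets. The sketch below uses only the standard library and assumes that both sums run over pairs i < j and that no two high-dimensional points coincide (otherwise a division by zero occurs). The function name and the toy points are illustrative.

```python
import math
from itertools import combinations

def sammon_stress(Y, X):
    """Sammon's stress between points Y and their embedding X (sequences of tuples)."""
    pairs = list(combinations(range(len(Y)), 2))   # all index pairs with i < j
    d_y = [math.dist(Y[i], Y[j]) for i, j in pairs]
    d_x = [math.dist(X[i], X[j]) for i, j in pairs]
    c = sum(d_y)                                   # normalizing constant
    return sum((dy - dx) ** 2 / dy for dy, dx in zip(d_y, d_x)) / c

Y = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
perfect = sammon_stress(Y, Y)                                # identical distances
flattened = sammon_stress(Y, [(y[0], y[1]) for y in Y])      # third coordinate dropped
```

Dividing each squared error by d_y gives small original distances more weight, which is the main difference from the plain metric MDS stress.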

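NLM: Derivation
Direct calculation gives
$\frac{\partial E_{NLM}}{\partial x_k(i)} = \sum_{j} \frac{\partial E_{NLM}}{\partial d_x(i,j)} \frac{\partial d_x(i,j)}{\partial x_k(i)} = -\frac{2}{c} \sum_{\substack{j=1 \\ j \ne i}}^{N} \frac{d_y(i,j) - d_x(i,j)}{d_y(i,j) \, d_x(i,j)} \left( x_k(i) - x_k(j) \right)$
$\frac{\partial^2 E_{NLM}}{\partial x_k(i)^2} = -\frac{2}{c} \sum_{\substack{j=1 \\ j \ne i}}^{N} \left( \frac{d_y(i,j) - d_x(i,j)}{d_y(i,j) \, d_x(i,j)} - \frac{\left( x_k(i) - x_k(j) \right)^2}{d_x^3(i,j)} \right)$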

NLM: Algorithm
1. Compute pairwise distances $d_y(i,j)$
2. Initialize points $x(i)$ randomly or by PCA
3. Calculate the quasi-Newton update for each point
4. Update the coordinates of all points $x(i)$
5. Return to step 3 until convergence
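The steps above can be sketched with NumPy as follows. This is a minimal illustration, not a tuned implementation. Assumptions: PCA initialization, a fixed number of iterations instead of a convergence test, a step size alpha = 0.3, sums over pairs i < j in the stress, and a safeguard that returns the lowest-stress configuration seen, because the diagonal quasi-Newton step is not guaranteed to decrease the stress.

```python
import numpy as np

def pairwise(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_stress(d_y, d_x):
    iu = np.triu_indices_from(d_y, k=1)            # pairs with i < j
    return ((d_y[iu] - d_x[iu]) ** 2 / d_y[iu]).sum() / d_y[iu].sum()

def nlm(Y, P=2, alpha=0.3, n_iter=100, eps=1e-9):
    """Sammon's NLM for the rows of Y; returns (best embedding, stress history)."""
    N = Y.shape[0]
    d_y = pairwise(Y)                              # step 1
    c = d_y[np.triu_indices(N, k=1)].sum()
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    X = Yc @ Vt[:P].T                              # step 2: PCA initialization
    off = ~np.eye(N, dtype=bool)                   # mask selecting j != i
    dy = np.where(off, d_y, 1.0)                   # dummy value on the diagonal
    best_X, history = X, []
    for _ in range(n_iter):
        d_x = pairwise(X)
        history.append(sammon_stress(d_y, d_x))
        if history[-1] <= min(history):            # safeguard: remember the best state
            best_X = X
        dx = np.where(off, np.maximum(d_x, eps), 1.0)
        ratio = np.where(off, (dy - dx) / (dy * dx), 0.0)
        diff = X[:, None, :] - X[None, :, :]       # x_k(i) - x_k(j)
        grad = -2.0 / c * (ratio[:, :, None] * diff).sum(axis=1)
        hess = -2.0 / c * (ratio[:, :, None]
                           - diff ** 2 / dx[:, :, None] ** 3).sum(axis=1)
        X = X - alpha * grad / np.maximum(np.abs(hess), eps)   # steps 3 and 4
    return best_X, history

rng = np.random.default_rng(0)
Y = rng.standard_normal((30, 3))
X, history = nlm(Y)
```

All points are updated simultaneously from the same set of distances, matching steps 3 and 4. The cost per iteration is quadratic in the number of points.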

Embedding of test set
There is no easy way to generalize the embedding to new points. Options:
  Updating only the new point with quasi-Newton
  Interpolation procedure of curvilinear component analysis
  Neural variants of NLM, like SAMANN

[Figure: embeddings with NLM]

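Curvilinear component analysis (CCA)
Minimizes stress function:
$E_{CCA} = \frac{1}{2} \sum_{i,j=1}^{N} \left( d_y(i,j) - d_x(i,j) \right)^2 F_\lambda(d_x(i,j))$
Typically $F_\lambda$ is monotonically decreasing:
  $F_\lambda(d_x) = \exp\left( -\frac{d_x}{\lambda} \right)$
  $F_\lambda(d_x) = H(\lambda - d_x)$, where $H(u) = 0$ if $u \le 0$ and $1$ otherwise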
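A small sketch of the CCA stress with both weighting functions, assuming NumPy. The function names and the toy distance matrices (three points on a line) are illustrative.

```python
import numpy as np

def f_exp(d_x, lam):
    """Exponentially decreasing weight exp(-d_x / lambda)."""
    return np.exp(-d_x / lam)

def f_step(d_x, lam):
    """Step weight H(lambda - d_x): 1 where d_x < lambda, 0 elsewhere."""
    return (d_x < lam).astype(float)

def cca_stress(d_y, d_x, lam, F=f_exp):
    """CCA stress: squared distance errors weighted by F of the embedded distance."""
    return 0.5 * np.sum((d_y - d_x) ** 2 * F(d_x, lam))

# Original points at 0, 1, 5 on a line; embedded points at 0, 1, 2.
d_y = np.array([[0.0, 1.0, 5.0], [1.0, 0.0, 4.0], [5.0, 4.0, 0.0]])
d_x = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])

narrow = cca_stress(d_y, d_x, lam=1.5, F=f_step)   # ignores the pair with d_x = 2
wide = cca_stress(d_y, d_x, lam=3.0, F=f_step)     # counts every pair
```

The weight depends on the distance in the embedding, not in the original space, so shrinking lambda makes the criterion increasingly local in the output space.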

CCA: Derivation
Minimization by gradient descent: $x(i) \leftarrow x(i) - \alpha \nabla_{x(i)} E_{CCA}$
Direct calculation gives:
$\nabla_{x(i)} E_{CCA} = \sum_{j=1}^{N} (d_y - d_x) \left( 2 F_\lambda(d_x) - (d_y - d_x) F'_\lambda(d_x) \right) \frac{x(j) - x(i)}{d_x}$,
where $d_y = d_y(i,j)$ and $d_x = d_x(i,j)$

Condition for λ
The condition $2 F_\lambda(d_x) > (d_y - d_x) F'_\lambda(d_x)$ guarantees that distances change reasonably
  $F_\lambda(d_x) = \exp\left( -\frac{d_x}{\lambda} \right)$: $\lambda > \frac{1}{2} (d_x - d_y)$
  $F_\lambda(d_x) = H(\lambda - d_x)$: the condition is always fulfilled
The parameters $\alpha$ and $\lambda$ can be decreased during the convergence
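The bound for the exponential weighting follows in a few lines. The worked step below assumes $\lambda > 0$ and uses $F_\lambda > 0$ to divide both sides.

```latex
\begin{align*}
F_\lambda(d_x) &= \exp(-d_x/\lambda)
  \quad\Rightarrow\quad F'_\lambda(d_x) = -\tfrac{1}{\lambda} F_\lambda(d_x), \\
2 F_\lambda(d_x) &> (d_y - d_x)\, F'_\lambda(d_x)
  = \tfrac{1}{\lambda} (d_x - d_y)\, F_\lambda(d_x), \\
2 \lambda &> d_x - d_y
  \quad\Longleftrightarrow\quad \lambda > \tfrac{1}{2} (d_x - d_y).
\end{align*}
```

For the step function, $F'_\lambda = 0$ away from the jump at $d_x = \lambda$, so the condition reduces to $2 F_\lambda(d_x) > 0$, which holds wherever the weight is active.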

CCA: Problem with traditional gradient descent
Gradient descent can get stuck in a local minimum.
Better solution: stochastic gradient descent

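CCA: Stochastic gradient descent
Decompose $E_{CCA}$:
$E_{CCA} = \sum_{i=1}^{N} E^i_{CCA}$, where $E^i_{CCA} = \frac{1}{2} \sum_{j=1}^{N} \left( d_y(i,j) - d_x(i,j) \right)^2 F_\lambda(d_x(i,j))$
Separate optimization:
$x(j) \leftarrow x(j) - \alpha \nabla_{x(j)} E^i_{CCA}$
$x(j) \leftarrow x(j) - \alpha \beta(i,j) \frac{x(i) - x(j)}{d_x}$,
where $\beta(i,j) = (d_y - d_x) \left( 2 F_\lambda(d_x) - (d_y - d_x) F'_\lambda(d_x) \right)$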

CCA: Algorithm
1. Perform vector quantization for size reduction
2. Compute pairwise distances $d_y(i,j)$
3. Initialize points $x(i)$ randomly or by PCA
4. Give learning rate $\alpha$ and neighborhood width $\lambda$
5. Select a point $x(i)$ and update all others
6. Return to step 5 until all points $x(i)$ have been selected in this epoch
7. If not converged, return to step 4
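The stochastic procedure can be sketched with NumPy as below. Several choices are assumptions made for illustration and do not come from the slides: vector quantization (step 1) is skipped, the step function is used as the weight so that its derivative drops out of the update, initialization is by PCA, alpha decays linearly and lambda geometrically, and a fixed number of epochs replaces the convergence test.

```python
import numpy as np

def pairwise(Z):
    """Euclidean distance matrix between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def cca(Y, P=2, n_epochs=30, alpha0=0.25, seed=0):
    """Stochastic CCA for the rows of Y with the step-function weight."""
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    d_y = pairwise(Y)                                  # step 2
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    X = Yc @ Vt[:P].T                                  # step 3: PCA initialization
    lam0 = d_y.max()                                   # start with a global neighborhood
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        alpha = alpha0 * (1.0 - frac)                  # step 4: assumed linear decay
        lam = lam0 * 0.05 ** frac                      # step 4: assumed geometric decay
        for i in rng.permutation(N):                   # steps 5 and 6: one epoch
            diff = X[i] - X                            # x(i) - x(j) for every j
            d_x = np.maximum(np.sqrt((diff ** 2).sum(axis=1)), 1e-12)
            F = (d_x < lam).astype(float)              # H(lambda - d_x), derivative 0
            beta = (d_y[i] - d_x) * 2.0 * F
            # Point i itself does not move because diff[i] is the zero vector.
            X = X - alpha * beta[:, None] * diff / d_x[:, None]
    return X

rng = np.random.default_rng(0)
Y = rng.standard_normal((40, 3))
X = cca(Y)
```

With alpha0 = 0.25 and the factor 2 in beta, each update moves x(j) at most halfway toward the position where its distance to x(i) would match the original distance.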

Embedding of test set
Original points are fixed
For each test point, the update rule is applied to move it to the right position

[Figure: embeddings with CCA]

Three dimensionality reduction methods based on distance preservation: multidimensional scaling, Sammon's nonlinear mapping, curvilinear component analysis
MDS is a generalization of PCA to pairwise scalar products and distances
NLM and CCA preserve distances directly by minimizing a corresponding stress function
