Dimensionality Reduction
1 Dimensionality Reduction. Neil D. Lawrence. Mathematics for Data Modelling, University of Sheffield, January 23rd 2008. Neil Lawrence () Dimensionality Reduction Data Modelling School 1 / 7
2 Outline. 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions
3 Online Resources. All source code and slides are available online. This talk is available from my home page (see the talks link on the left hand side). MATLAB examples are in the dimred toolbox (vrs.1). MATLAB commands used for examples are given in typewriter font.
4 Outline. 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions
5 High Dimensional Data. USPS Data Set: Handwritten Digit. 3648 dimensions (64 rows by 57 columns). The space contains more than just this digit. Even if we sample every nanosecond from now until the end of the universe, you won't see the original six!
9 Simple Model of Digit: Rotate a Prototype.
18 MATLAB Demo: demdigitsmanifold([1 2], 'all')
20 MATLAB Demo: demdigitsmanifold([1 2], 'sixnine'). (Figure: projection onto PC no 1 versus PC no 2.)
21 Low Dimensional Manifolds. Pure rotation is too simple. In practice the data may undergo several distortions, e.g. digits undergo thinning, translation and rotation. For data with structure: we expect fewer distortions than dimensions; we therefore expect the data to live on a lower dimensional manifold. Conclusion: deal with high dimensional data by looking for a lower dimensional non-linear embedding.
22 Outline. 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions
23 Notation. q: dimension of latent/embedded space. D: dimension of data space. N: number of data points. Data matrix: Y = [y_{1,:}, ..., y_{N,:}]ᵀ = [y_{:,1}, ..., y_{:,D}] ∈ ℝ^{N×D}. Latent variables: X = [x_{1,:}, ..., x_{N,:}]ᵀ = [x_{:,1}, ..., x_{:,q}] ∈ ℝ^{N×q}. Mapping matrix: W ∈ ℝ^{D×q}. Centering matrix: H = I_N − N⁻¹11ᵀ ∈ ℝ^{N×N}.
24 Reading Notation. a_{i,:} is a vector from the ith row of a given matrix A. a_{:,j} is a vector from the jth column of a given matrix A. X and Y are design matrices. Centred data matrix given by Ŷ = HY. Sample covariance given by S = N⁻¹ŶᵀŶ. Centred inner product matrix given by K = ŶŶᵀ.
25 Outline. 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions
26 Data Representation. Classical statistical approach: represent via proximities [Mardia, 1972]. Proximity data: similarities or dissimilarities. Example of a dissimilarity matrix: a distance matrix, d_{i,j} = ‖y_{i,:} − y_{j,:}‖₂ = ((y_{i,:} − y_{j,:})ᵀ(y_{i,:} − y_{j,:}))^{1/2}. For a data set we can display these distances as a matrix.
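The pairwise distance d_{i,j} can be vectorised via the expansion ‖y_i − y_j‖² = ‖y_i‖² + ‖y_j‖² − 2 y_iᵀy_j. The talk's own examples use the MATLAB dimred toolbox; the NumPy sketch below is my own translation of the idea:

```python
import numpy as np

def distance_matrix(Y):
    """Pairwise Euclidean distances d_ij = ||y_i,: - y_j,:||_2 between rows of Y."""
    sq = np.sum(Y ** 2, axis=1)                       # squared norm of each row
    D2 = sq[:, None] + sq[None, :] - 2.0 * (Y @ Y.T)  # squared distances
    return np.sqrt(np.maximum(D2, 0.0))               # clip round-off negatives

# three points on a line, so the distances are easy to check by eye
Y = np.array([[0.0], [3.0], [7.0]])
D = distance_matrix(Y)
```

The resulting matrix is symmetric with a zero diagonal, which is what the interpoint-distance figures in the following slides visualise.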
27 Interpoint Distances for Rotated Sixes. Figure: Interpoint distances for the rotated digits data.
28 Multidimensional Scaling. Find a configuration of points, X, such that each δ_{i,j} = ‖x_{i,:} − x_{j,:}‖₂ closely matches the corresponding d_{i,j} in the distance matrix. Need an objective function for matching Δ = (δ_{i,j})_{i,j} to D = (d_{i,j})_{i,j}.
29 Feature Selection. An entrywise L₁ norm on the difference between squared distances: E(X) = Σᵢ₌₁ᴺ Σⱼ₌₁ᴺ |d²_{ij} − δ²_{ij}|. Reduce dimension by selecting features from the data set: select for X, in turn, the column from Y that most reduces this error, until we have the desired q. To minimise E(X) we compose X by extracting the columns of Y which have the largest variance.
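The selection rule above (keep the q most-variable columns) can be sketched in a few lines. This is my own NumPy illustration, not the toolbox code, and the toy data below is invented for the example:

```python
import numpy as np

def select_features(Y, q):
    """Feature selection sketch: keep the q columns of Y with the largest
    sample variance, the columns that minimise the squared-distance
    mismatch E(X) described above."""
    variances = np.var(Y, axis=0)          # per-column sample variance
    idx = np.argsort(variances)[::-1][:q]  # indices of q most variable columns
    return Y[:, idx], idx

# toy data: column 1 has by far the largest variance, column 0 the smallest
rng = np.random.default_rng(0)
Y = np.column_stack([rng.normal(0.0, s, 200) for s in (0.1, 2.0, 0.5)])
X, idx = select_features(Y, 2)
```

With these scales, the selected columns are the high-variance ones (1 and 2), and the near-constant column 0 is discarded.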
30 Reconstruction from Latent Space. Left: distances reconstructed with two dimensions. Right: distances reconstructed with 1 dimensions.
31 Reconstruction from Latent Space. Left: distances reconstructed with 1 dimensions. Right: distances reconstructed with 1 dimensions.
32 Feature Selection. Figure: demrotationdist. Feature selection via distance preservation.
35 Feature Extraction. Figure: demrotationdist. Rotation preserves interpoint distances.
40 Feature Extraction. Figure: demrotationdist. Rotation preserves interpoint distances. Residuals are much reduced.
42 Which Rotation? We need the rotation that will minimise the residual error. We already derived an algorithm: keep the directions of maximum variance and discard the rest. The error is then given by the sum of the residual variances, E(X) = 2N² Σ_{k=q+1}^{D} σ²_k. Rotations of the data matrix do not affect this analysis.
43 Rotation Reconstruction from Latent Space. Left: distances reconstructed with two dimensions. Right: distances reconstructed with 1 dimensions.
44 Rotation Reconstruction from Latent Space. Left: distances reconstructed with 1 dimensions. Right: distances reconstructed with 36 dimensions.
45 Reminder: Principal Component Analysis. How do we find these directions? Find the directions in the data with maximal variance. That's what PCA does! PCA: rotate the data to extract these directions. PCA: work on the sample covariance matrix S = N⁻¹ŶᵀŶ.
46 Principal Component Analysis. Find a direction in the data, x_{:,1} = Ŷr₁, for which the variance is maximised: r₁ = argmax_{r₁} var(Ŷr₁) subject to r₁ᵀr₁ = 1. Can rewrite in terms of the sample covariance: var(x_{:,1}) = N⁻¹(Ŷr₁)ᵀ(Ŷr₁) = r₁ᵀ(N⁻¹ŶᵀŶ)r₁ = r₁ᵀSr₁, where S is the sample covariance.
47 Lagrangian. Solution via constrained optimisation: L(r₁, λ₁) = r₁ᵀSr₁ + λ₁(1 − r₁ᵀr₁). Gradient with respect to r₁: dL(r₁, λ₁)/dr₁ = 2Sr₁ − 2λ₁r₁. Setting this to zero and rearranging gives Sr₁ = λ₁r₁, which is recognised as an eigenvalue problem.
48 Lagrange Multiplier. Recall the gradient, dL(r₁, λ₁)/dr₁ = 2Sr₁ − 2λ₁r₁. (1) To find λ₁, premultiply (1) by r₁ᵀ, set to zero and rearrange, giving λ₁ = r₁ᵀSr₁. The maximum variance is therefore necessarily the maximum eigenvalue of S. This is the first principal component.
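This result (variance of the projection onto the top eigenvector equals the top eigenvalue of S) is easy to check numerically. A NumPy sketch of my own, on invented anisotropic toy data:

```python
import numpy as np

# anisotropic toy data: most of the variance lies along the first axis
rng = np.random.default_rng(1)
Y = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
Yhat = Y - Y.mean(axis=0)            # centred data, Yhat = H Y
S = Yhat.T @ Yhat / len(Y)           # sample covariance S = N^{-1} Yhat^T Yhat
eigval, eigvec = np.linalg.eigh(S)   # eigh: eigenvalues in ascending order
lam1 = eigval[-1]                    # largest eigenvalue
r1 = eigvec[:, -1]                   # first principal direction
proj_var = float(r1 @ S @ r1)        # variance of the projection, r1^T S r1
```

Here `proj_var` matches `lam1`, and the recovered direction is dominated by the first axis, as the scaling of the toy data dictates.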
49 Further Directions. Find directions, orthogonal to those already extracted, with maximal variance. Orthogonality constraints: for j < k we have r_jᵀr_k = 0 and r_kᵀr_k = 1. Lagrangian: L(r_k, λ_k, γ) = r_kᵀSr_k + λ_k(1 − r_kᵀr_k) + Σ_{j=1}^{k−1} γ_j r_jᵀr_k, with gradient dL(r_k, λ_k)/dr_k = 2Sr_k − 2λ_k r_k + Σ_{j=1}^{k−1} γ_j r_j.
50 Further Eigenvectors. Gradient of the Lagrangian: dL(r_k, λ_k)/dr_k = 2Sr_k − 2λ_k r_k + Σ_{j=1}^{k−1} γ_j r_j. (2) Premultiplying (2) by r_iᵀ with i < k (and setting to zero) implies γ_i = 0, which allows us to write Sr_k = λ_k r_k. Premultiplying (2) by r_kᵀ implies λ_k = r_kᵀSr_k. This is the kth principal component.
51 Principal Coordinates Analysis. The rotation which finds the directions of maximum variance is given by the eigenvectors of the covariance matrix; the variance in each direction is given by the eigenvalues. Problem: working directly with the sample covariance, S, may be impossible. For example: perhaps we are given distances between data points, but not absolute locations. With no access to absolute positions, we cannot compute the original sample covariance.
52 An Alternative Formalism. Matrix representation of the eigenvalue problem for the first q eigenvectors: ŶᵀŶR_q = R_qΛ_q, with R_q ∈ ℝ^{D×q}. (3) Premultiply by Ŷ: ŶŶᵀŶR_q = ŶR_qΛ_q. Postmultiply by Λ_q^{−1/2}: ŶŶᵀŶR_qΛ_q^{−1/2} = ŶR_qΛ_q^{−1/2}Λ_q, so that ŶŶᵀU_q = U_qΛ_q, where U_q = ŶR_qΛ_q^{−1/2}.
55 U_q Diagonalizes the Inner Product Matrix. Need to prove that U_q are eigenvectors of the inner product matrix. Substituting the definition of U_q: U_qᵀŶŶᵀU_q = Λ_q^{−1/2}R_qᵀ(ŶᵀŶ)²R_qΛ_q^{−1/2}. The full eigendecomposition of the sample covariance, ŶᵀŶ = RΛRᵀ, implies that (ŶᵀŶ)² = RΛRᵀRΛRᵀ = RΛ²Rᵀ. The product of the first q eigenvectors with the rest is RᵀR_q = [I_q; 0] ∈ ℝ^{D×q}, where I_q denotes a q×q identity matrix. Premultiplying by the eigenvalues gives ΛRᵀR_q = [Λ_q; 0], and multiplying this by its own transpose gives R_qᵀRΛ²RᵀR_q = Λ_q². Therefore U_qᵀŶŶᵀU_q = Λ_q^{−1/2}Λ_q²Λ_q^{−1/2} = Λ_q, i.e. ŶŶᵀU_q = U_qΛ_q.
68 Equivalent Eigenvalue Problems. The two eigenvalue problems are equivalent: one solves for the rotation, the other for the location of the rotated points. When D < N it is easier to solve for the rotation, R_q. But when D > N we solve for the embedding (principal coordinate analysis). In MDS we may not know Y, so we cannot compute ŶᵀŶ from the distance matrix. Can we compute ŶŶᵀ instead?
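The equivalence is easy to verify numerically: the D×D matrix ŶᵀŶ and the N×N matrix ŶŶᵀ share their non-zero eigenvalues. A small NumPy sketch of my own, on random toy data:

```python
import numpy as np

# the two eigenvalue problems share their non-zero eigenvalues:
# Yhat^T Yhat (D x D, rotation) versus Yhat Yhat^T (N x N, embedding)
rng = np.random.default_rng(2)
Y = rng.normal(size=(5, 3))          # N = 5 points in D = 3 dimensions
Yhat = Y - Y.mean(axis=0)            # centred data
ev_cov = np.sort(np.linalg.eigvalsh(Yhat.T @ Yhat))[::-1]    # D eigenvalues
ev_inner = np.sort(np.linalg.eigvalsh(Yhat @ Yhat.T))[::-1]  # N eigenvalues
```

The top D eigenvalues of the inner product matrix match those of the covariance-side matrix; the remaining N − D are (numerically) zero.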
69 The Covariance Interpretation. N⁻¹ŶᵀŶ is the data covariance. ŶŶᵀ is a centred inner product matrix; it also has an interpretation as a covariance matrix (Gaussian processes). It expresses correlation and anti-correlation between data points, where the standard covariance expresses correlation and anti-correlation between data dimensions.
70 Distance to Similarity: A Gaussian Covariance Interpretation. Translate between covariance and distance. Consider a vector sampled from a zero mean Gaussian distribution, z ∼ N(0, K). The expected square distance between two elements of this vector is d²_{i,j} = ⟨(z_i − z_j)²⟩ = ⟨z_i²⟩ + ⟨z_j²⟩ − 2⟨z_i z_j⟩; under a zero mean Gaussian with covariance given by K this is d²_{i,j} = k_{i,i} + k_{j,j} − 2k_{i,j}. Take the distance to be the square root of this, d_{i,j} = (k_{i,i} + k_{j,j} − 2k_{i,j})^{1/2}.
71 Standard Transformation. This transformation is known as the standard transformation between a similarity and a distance [Mardia et al., 1979, pg 42]. If the covariance is of the form K = ŶŶᵀ then k_{i,j} = y_{i,:}ᵀy_{j,:} and d_{i,j} = (y_{i,:}ᵀy_{i,:} + y_{j,:}ᵀy_{j,:} − 2y_{i,:}ᵀy_{j,:})^{1/2} = ‖y_{i,:} − y_{j,:}‖₂. For other distance matrices this gives us an approach to convert to a similarity matrix, or kernel matrix, so that we can perform classical MDS.
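Going from distances back to a similarity and then to an embedding is classical MDS. The sketch below is my own NumPy version, using the usual double-centring construction K = −(1/2)HD²H (the standard transformation run in reverse) followed by the eigendecomposition; the four-point configuration is invented for the example:

```python
import numpy as np

def classical_mds(D, q):
    """Classical MDS sketch: distances -> centred similarity -> embedding
    X = U_q Lambda_q^{1/2} from the top-q eigenvectors."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N   # centering matrix
    K = -0.5 * H @ (D ** 2) @ H           # centred inner-product matrix
    eigval, eigvec = np.linalg.eigh(K)
    order = np.argsort(eigval)[::-1][:q]  # largest eigenvalues first
    lam = np.maximum(eigval[order], 0.0)  # beware negative eigenvalues
    return eigvec[:, order] * np.sqrt(lam)

# points on a plane; their Euclidean distance matrix is reproduced exactly
Y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
X = classical_mds(D, 2)
Dx = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

For genuinely Euclidean distances the embedding reproduces the distance matrix exactly (up to rotation and reflection of the configuration), which is the PCA equivalence discussed later.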
72 Example: Road Distances with Classical MDS. Classical example: redraw a map from road distances (see e.g. Mardia et al., 1979). Here we use distances across Europe: between each pair of cities we have the road distance. Enter these in a distance matrix, convert to a similarity matrix using the covariance interpretation, and perform the eigendecomposition. (See the talk's online resources for the data we used.)
73 Distance Matrix. Convert distances to similarities using the covariance interpretation. Figure: Left: road distances between European cities visualised as a matrix. Right: similarity matrix derived from these distances. If this matrix is a covariance matrix, then the expected distance between samples from this covariance is given on the left.
74 Example: Road Distances with Classical MDS. Figure: demcmdsroaddata. Reconstructed locations projected onto the true map using Procrustes rotations. (Map of European cities from Lisboa and Dublin across to Moskva and Istanbul.)
75 Beware Negative Eigenvalues. Figure: Eigenvalues of the similarity matrix are negative in this case.
76 European Cities Distance Matrices. Figure: Left: the original distance matrix. Right: the reconstructed distance matrix.
77 Other Distance Similarity Measures. Can use a similarity/distance of your choice. Beware though! The similarity must be positive semi-definite for the distance to be Euclidean. Why? We can immediately see that positive definiteness is sufficient from the covariance interpretation. For more details see [Mardia et al., 1979, Theorem ].
78 A Class of Similarities for Vector Data. All Mercer kernels are positive semi-definite. Example: the squared exponential (also known as RBF or Gaussian) kernel, k_{i,j} = exp(−‖y_{i,:} − y_{j,:}‖² / (2ℓ²)). This leads to a kernel eigenvalue problem. This is known as kernel PCA (Schölkopf et al.).
79 Implied Distance Matrix. What is the equivalent distance d_{i,j}? d_{i,j} = (k_{i,i} + k_{j,j} − 2k_{i,j})^{1/2}. If the point separation is large, k_{i,j} → 0. Since k_{i,i} = 1 and k_{j,j} = 1, d_{i,j} → √2. Kernel PCA with the RBF kernel projects along axes and can produce poor results: it uses many dimensions to keep dissimilar objects a constant amount apart.
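The saturation of the implied distance is easy to see numerically: since k_{i,i} = k_{j,j} = 1 for the RBF kernel and k_{i,j} → 0 for well-separated points, d_{i,j} → √2. A tiny NumPy sketch of my own:

```python
import numpy as np

def rbf(yi, yj, ell=1.0):
    """Squared exponential (RBF) kernel k_ij = exp(-||yi - yj||^2 / (2 ell^2))."""
    return np.exp(-np.sum((yi - yj) ** 2) / (2.0 * ell ** 2))

def implied_distance(yi, yj, ell=1.0):
    """d_ij = sqrt(k_ii + k_jj - 2 k_ij); note k_ii = k_jj = 1 for the RBF kernel."""
    return np.sqrt(rbf(yi, yi, ell) + rbf(yj, yj, ell) - 2.0 * rbf(yi, yj, ell))

near = implied_distance(np.array([0.0]), np.array([0.1]))   # similar points
far = implied_distance(np.array([0.0]), np.array([100.0]))  # well separated
```

Nearby points get a small implied distance; any pair of well-separated points sits at essentially √2 regardless of how far apart they actually are, which is the behaviour the following figures illustrate.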
80 Implied Distances on Rotated Sixes. Figure: Left: similarity matrix for the RBF kernel on the rotated sixes. Right: implied distance matrix for the kernel on the rotated sixes. Note that most of the distances are set to √2.
81 Kernel PCA on Rotated Sixes. Figure: demsixkpca. The fifth, sixth and seventh dimensions of the latent space for kernel PCA. Points spread out along axes so that dissimilar points are always √2 apart.
87 MDS Conclusions. Multidimensional scaling: preserve a distance matrix. Classical MDS: a particular objective function, for which distance matching is equivalent to maximum variance, i.e. spectral decomposition of the similarity matrix. For Euclidean distances in Y space, classical MDS is equivalent to PCA; this is known as principal coordinate analysis (PCO). We haven't discussed the choice of distance matrix.
88 Outline. 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions
89 Data. (a) plane (b) swissroll (c) trefoil. Figure: Illustrative data sets for the talk. Each data set is generated by calling generatemanifolddata(datatype). The datatype argument is given below each plot.
90 Isomap [Tenenbaum et al., 2000]. MDS finds a geometric configuration preserving distances. Isomap is MDS applied to the manifold distance. Geodesic distance = manifold distance. We cannot compute the geodesic distance without knowing the manifold.
91 Isomap. Isomap: define neighbours and compute distances between neighbours. The geodesic distance is approximated by the shortest path through the adjacency matrix. (Figure: panels reproduced from Tenenbaum et al., Science, December 2000: the residual variance of PCA, MDS and Isomap on four data sets, and the Swiss roll data set illustrating how Isomap exploits geodesic paths for nonlinear dimensionality reduction.)
92 Isomap Examples. demisomap. Data generation: Carl Henrik Ek.
95 Isomap: Summary. MDS on a shortest-path approximation of the manifold distance. + Simple. + Intrinsic dimension from the eigenspectrum. − Solves a very large eigenvalue problem. − Cannot handle holes or non-convex manifolds. − Sensitive to short circuits.
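The two Isomap steps (neighbourhood graph, then MDS on graph shortest paths) can be sketched compactly. This is my own NumPy illustration of the idea, not the demisomap code; it uses a Floyd-Warshall shortest-path loop and a circular-arc toy data set invented for the example:

```python
import numpy as np

def isomap(Y, k, q):
    """Isomap sketch: k-nearest-neighbour graph, Floyd-Warshall shortest
    paths as the geodesic approximation, then classical MDS."""
    N = Y.shape[0]
    D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
    G = np.full((N, N), np.inf)           # graph distances; inf = no edge
    np.fill_diagonal(G, 0.0)
    for i in range(N):
        nbrs = np.argsort(D[i])[1:k + 1]  # k nearest neighbours of point i
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]           # symmetrise the graph
    for m in range(N):                    # Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    H = np.eye(N) - np.ones((N, N)) / N   # classical MDS on geodesics
    K = -0.5 * H @ (G ** 2) @ H
    eigval, eigvec = np.linalg.eigh(K)
    order = np.argsort(eigval)[::-1][:q]
    return eigvec[:, order] * np.sqrt(np.maximum(eigval[order], 0.0))

# noiseless arc of a circle: the geodesic (arc length) unrolls to one dimension
t = np.linspace(0.0, np.pi, 60)
Y = np.column_stack([np.cos(t), np.sin(t)])
X = isomap(Y, k=3, q=1)
```

On this arc, the one-dimensional embedding tracks the arc-length parameter t (up to sign), whereas straight-line distances through the ambient space would not. The Floyd-Warshall loop is O(N³), which reflects the "very large eigenvalue/graph problem" caveat above.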
96 Inverse Covariance. From the covariance interpretation we think of the similarity matrix as a covariance. Each element of the covariance is a function of two data points. Another option is to specify the inverse covariance: if the inverse covariance between two points is zero, those points are independent given all other points. This is a conditional independence; it describes how points are connected. Laplacian Eigenmaps and LLE can both be seen as specifying the inverse covariance.
97 LLE Examples. demlle. Neighbours used; no playing with settings.
100 Generative. Observed data have been sampled from the manifold. Spectral methods start at the wrong end: it's a lot easier to make a mess than to clean it up! Things break or disappear. How do we model observation generation?
104 Outline 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions Neil Lawrence () Dimensionality Reduction Data Modelling School 56 / 70
105 Model Selection Observed data have been sampled from a low-dimensional manifold, y = f(x). Idea: model f and rank embeddings according to 1 the data fit of f 2 the complexity of f. How to model f? 1 Make as few assumptions about f as possible. 2 Allow f from as rich a class as possible. Neil Lawrence () Dimensionality Reduction Data Modelling School 57 / 70
106 Gaussian Processes Generalisation of the Gaussian distribution over infinite index sets. Can be used to specify distributions over functions. Regression: $y = f(x) + \epsilon$, $p(\mathbf{y}|\mathbf{X}, \Phi) = \int p(\mathbf{y}|\mathbf{f}, \mathbf{X}, \Phi)\, p(\mathbf{f}|\mathbf{X}, \Phi)\, \mathrm{d}\mathbf{f}$, $p(\mathbf{f}|\mathbf{X}, \Phi) = \mathcal{N}(\mathbf{0}, \mathbf{K})$, $\hat{\Phi} = \operatorname{argmax}_{\Phi}\, p(\mathbf{y}|\mathbf{X}, \Phi)$. Neil Lawrence () Dimensionality Reduction Data Modelling School 58 / 70
107 Gaussian Processes: log likelihood as a function of the length scale. $\log p(\mathbf{Y}|\mathbf{X}) = \underbrace{-\tfrac{1}{2}\mathbf{Y}^{\top}(\mathbf{K} + \beta^{-1}\mathbf{I})^{-1}\mathbf{Y}}_{\text{data fit}} \underbrace{-\tfrac{1}{2}\log\det(\mathbf{K} + \beta^{-1}\mathbf{I})}_{\text{complexity}} - \tfrac{N}{2}\log 2\pi$. [Plots omitted.] Images: N. D. Lawrence. Neil Lawrence () Dimensionality Reduction Data Modelling School 59 / 70
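The data-fit/complexity split in the log likelihood above can be computed directly. A Python/NumPy sketch, assuming an RBF kernel with a single output and the function name chosen here for illustration:

```python
import numpy as np

def gp_log_marginal(y, X, lengthscale=1.0, variance=1.0, beta=100.0):
    """GP log marginal likelihood, split into the slide's data-fit and
    complexity terms (RBF kernel assumed; beta is the noise precision)."""
    N = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = variance * np.exp(-0.5 * sq / lengthscale ** 2)
    C = K + np.eye(N) / beta
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(C, y)
    data_fit = -0.5 * float(y @ alpha)                       # -1/2 y^T C^{-1} y
    complexity = -np.sum(np.log(np.diag(L))) \
                 - 0.5 * N * np.log(2 * np.pi)               # -1/2 log det C - N/2 log 2pi
    return data_fit, complexity
```

Sweeping the length scale and plotting the two terms reproduces the trade-off in the slide's figures: a short length scale improves the data fit but is penalised by the complexity term.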
116 Gaussian Process Latent Variable Models The GP-LVM models the sampling process: $y = f(x) + \epsilon$, $p(\mathbf{Y}|\mathbf{X}, \Phi) = \int p(\mathbf{Y}|\mathbf{f}, \mathbf{X}, \Phi)\, p(\mathbf{f}|\mathbf{X}, \Phi)\, \mathrm{d}\mathbf{f}$, $p(\mathbf{f}|\mathbf{X}, \Phi) = \mathcal{N}(\mathbf{0}, \mathbf{K})$, $\{\hat{\mathbf{X}}, \hat{\Phi}\} = \operatorname{argmax}_{\mathbf{X}, \Phi}\, p(\mathbf{Y}|\mathbf{X}, \Phi)$. Linear kernel: closed-form solution. Non-linear kernel: gradient-based solution. Neil Lawrence () Dimensionality Reduction Data Modelling School 60 / 70
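The gradient-based solution for the non-linear case can be sketched by optimising the latent points against the GP marginal likelihood. A toy Python/NumPy version with an RBF kernel, unit hyperparameters, and numerical gradients; a real implementation would also optimise the kernel parameters, use analytic gradients, and (as the next slide suggests) initialise with a spectral embedding rather than at random:

```python
import numpy as np
from scipy.optimize import minimize

def gplvm_objective(xflat, Y, q, beta=50.0):
    """Negative GP-LVM log likelihood (RBF kernel, unit hyperparameters)."""
    N, D = Y.shape
    X = xflat.reshape(N, q)
    sq = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    C = np.exp(-0.5 * sq) + np.eye(N) / beta
    _, logdet = np.linalg.slogdet(C)
    Cinv_Y = np.linalg.solve(C, Y)
    ll = -0.5 * np.sum(Y * Cinv_Y) - 0.5 * D * logdet \
         - 0.5 * N * D * np.log(2 * np.pi)
    return -ll

def fit_gplvm(Y, q=2, iters=30, seed=0):
    """Maximise the marginal likelihood over X with L-BFGS."""
    N = Y.shape[0]
    X0 = np.random.default_rng(seed).normal(scale=0.1, size=(N, q))
    res = minimize(gplvm_objective, X0.ravel(), args=(Y, q),
                   method='L-BFGS-B', options={'maxiter': iters})
    return res.x.reshape(N, q), -res.fun
```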
117 Model Selection Lawrence (2003) suggested using spectral algorithms to initialise the latent space X. Harmeling (2007) evaluated the use of the GP-LVM objective for model selection. Comparisons between the Procrustes score against ground truth and the GP-LVM objective. Neil Lawrence () Dimensionality Reduction Data Modelling School 61 / 70
118-126 Model Selection: Results [Plots omitted: true embeddings and GP-LVM objective values alongside Isomap, MVU, LLE and Laplacian Eigenmaps embeddings on several data sets.] Model selection results kindly provided by Carl Henrik Ek. Neil Lawrence () Dimensionality Reduction Data Modelling School 62 / 70
127 Conclusion Spectral methods assume that local structure contains enough characteristics to unravel the global structure. + Intuitive - Hard to set parameters without knowing the manifold - Learn embeddings, not mappings, i.e. visualisations - Model the problem the wrong way around - Sensitive to noise + Currently the best strategy for initialising generative models Neil Lawrence () Dimensionality Reduction Data Modelling School 63 / 70
128 References I K. V. Mardia. Statistics of Directional Data. Academic Press, London, 1972. K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, 1979. B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10:1299-1319, 1998. J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000. doi: 10.1126/science.290.5500.2319. Neil Lawrence () Dimensionality Reduction Data Modelling School 64 / 70
129 Material Acknowledgement: Carl Henrik Ek for the GP log likelihood examples. My examples given here and this talk are available online. Neil Lawrence () Dimensionality Reduction Data Modelling School 65 / 70
130 Outline: Distance Matching (appendix) Neil Lawrence () Dimensionality Reduction Data Modelling School 66 / 70
131 Centering Matrix If $\hat{\mathbf{Y}}$ is a version of $\mathbf{Y}$ with the mean removed, then $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$: $\hat{\mathbf{Y}} = \left(\mathbf{I} - N^{-1}\mathbf{1}\mathbf{1}^{\top}\right)\mathbf{Y} = \mathbf{Y} - \mathbf{1}\left(N^{-1}\mathbf{1}^{\top}\mathbf{Y}\right) = \mathbf{Y} - \mathbf{1}\left(\frac{1}{N}\sum_{i=1}^{N}\mathbf{y}_{i,:}\right)^{\top} = \mathbf{Y} - \begin{bmatrix} \bar{\mathbf{y}}_{:}^{\top} \\ \vdots \\ \bar{\mathbf{y}}_{:}^{\top} \end{bmatrix}$. Neil Lawrence () Dimensionality Reduction Data Modelling School 67 / 70
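The identity above is easy to verify numerically: multiplying by the centering matrix subtracts the column means. A short Python/NumPy check:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 3))
N = Y.shape[0]

H = np.eye(N) - np.ones((N, N)) / N   # centering matrix H = I - 11^T / N
Yhat = H @ Y

# HY subtracts the column means of Y, so the centred data has zero mean
assert np.allclose(Yhat, Y - Y.mean(axis=0))
assert np.allclose(Yhat.mean(axis=0), 0.0)
# H is also idempotent: centring twice changes nothing
assert np.allclose(H @ H, H)
```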
132 Feature Selection Derivation The squared distance can be re-expressed as $d_{ij}^{2} = \sum_{k=1}^{D}(y_{i,k} - y_{j,k})^{2}$. We can re-order the columns of $\mathbf{Y}$ without affecting the distances. Choose the ordering so that the first $q$ columns of $\mathbf{Y}$ are those that best represent the distance matrix. Substitute $\mathbf{x}_{:,k} = \mathbf{y}_{:,k}$ for $k = 1, \dots, q$. The distance in the latent space is then $\delta_{ij}^{2} = \sum_{k=1}^{q}(x_{i,k} - x_{j,k})^{2} = \sum_{k=1}^{q}(y_{i,k} - y_{j,k})^{2}$. Neil Lawrence () Dimensionality Reduction Data Modelling School 68 / 70
133 Feature Selection Derivation II The error can be rewritten as $E(\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\left(d_{ij}^{2} - \delta_{ij}^{2}\right) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=q+1}^{D}(y_{i,k} - y_{j,k})^{2}$. Introducing the mean of each dimension, $\bar{y}_{k} = \frac{1}{N}\sum_{i=1}^{N} y_{i,k}$, gives $E(\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=q+1}^{D}\left((y_{i,k} - \bar{y}_{k}) - (y_{j,k} - \bar{y}_{k})\right)^{2}$. Expanding the brackets: $E(\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=q+1}^{D}\left[(y_{i,k} - \bar{y}_{k})^{2} + (y_{j,k} - \bar{y}_{k})^{2} - 2(y_{j,k} - \bar{y}_{k})(y_{i,k} - \bar{y}_{k})\right]$. Neil Lawrence () Dimensionality Reduction Data Modelling School 69 / 70
134 Feature Selection Derivation III Bringing the sums in: $E(\mathbf{X}) = \sum_{k=q+1}^{D}\left[N\sum_{i=1}^{N}(y_{i,k} - \bar{y}_{k})^{2} + N\sum_{j=1}^{N}(y_{j,k} - \bar{y}_{k})^{2} - 2\sum_{j=1}^{N}(y_{j,k} - \bar{y}_{k})\sum_{i=1}^{N}(y_{i,k} - \bar{y}_{k})\right]$. The cross term vanishes (each of its factors sums to zero), and we recognise the sum of the variances of the discarded columns of $\mathbf{Y}$: $E(\mathbf{X}) = 2N^{2}\sum_{k=q+1}^{D}\sigma_{k}^{2}$. We should therefore compose $\mathbf{X}$ by extracting the columns of $\mathbf{Y}$ with the largest variance. Neil Lawrence () Dimensionality Reduction Data Modelling School 70 / 70
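The final identity, $E(\mathbf{X}) = 2N^{2}\sum_{k>q}\sigma_{k}^{2}$, can be verified directly on random data; a small Python/NumPy check (biased, $1/N$, variances, matching the derivation):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, q = 50, 5, 2
# columns with decreasing scale, so the first q carry most variance
Y = rng.normal(size=(N, D)) * np.array([3.0, 2.0, 1.5, 1.0, 0.5])

# pairwise squared distances in the full space and in the first q columns
d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
delta2 = np.sum((Y[:, None, :q] - Y[None, :, :q]) ** 2, axis=-1)
E = np.sum(d2 - delta2)

# biased (1/N) variances of the discarded columns
sigma2 = Y[:, q:].var(axis=0)
assert np.allclose(E, 2 * N**2 * sigma2.sum())
```

This confirms the conclusion of the derivation: the distance-matching error depends only on the total variance of the columns we throw away.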
Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationCS 143 Linear Algebra Review
CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method
CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method Tim Roughgarden & Gregory Valiant April 15, 015 This lecture began with an extended recap of Lecture 7. Recall that
More informationMachine Learning - MT Clustering
Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:
More informationSPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS
SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS VIKAS CHANDRAKANT RAYKAR DECEMBER 5, 24 Abstract. We interpret spectral clustering algorithms in the light of unsupervised
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) Kernel-PCA Fisher Linear Discriminant Analysis t-sne 2 PCA: Motivation
More informationMachine Learning. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Machine Learning Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1395 1 / 47 Table of contents 1 Introduction
More informationNeuroscience Introduction
Neuroscience Introduction The brain As humans, we can identify galaxies light years away, we can study particles smaller than an atom. But we still haven t unlocked the mystery of the three pounds of matter
More informationGWAS V: Gaussian processes
GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011
More informationPRINCIPAL COMPONENTS ANALYSIS
121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two
More information