Dimensionality Reduction

1 Dimensionality Reduction. Neil D. Lawrence. Mathematics for Data Modelling, University of Sheffield. January 23rd 2008.

2 Outline: 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions

3 Online Resources. All source code and slides are available online. This talk is available from my home page (see the talks link on the left hand side). MATLAB examples are in the dimred toolbox (vrs 0.1). MATLAB commands used for examples are given in typewriter font.

4 Outline: 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions

5 High Dimensional Data. USPS Data Set: a handwritten digit with 3648 dimensions (64 rows by 57 columns). The space contains more than just this digit. Even if we sample every nanosecond from now until the end of the universe, we won't see the original six!

9 Simple Model of Digit: Rotate a Prototype.

18 MATLAB Demo: demdigitsmanifold([1 2], 'all')

19 MATLAB Demo: demdigitsmanifold([1 2], 'all'). Figure: the digits data projected onto the first two principal components (PC no 1 versus PC no 2).

20 MATLAB Demo: demdigitsmanifold([1 2], 'sixnine'). Figure: the sixes and nines projected onto the first two principal components (PC no 1 versus PC no 2).

21 Low Dimensional Manifolds. Pure rotation is too simple: in practice the data may undergo several distortions, e.g. digits undergo thinning, translation and rotation. For data with structure we expect fewer distortions than dimensions; we therefore expect the data to live on a lower dimensional manifold. Conclusion: deal with high dimensional data by looking for a lower dimensional non-linear embedding.

22 Outline: 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions

23 Notation
q: dimension of latent/embedded space; D: dimension of data space; N: number of data points.
Data matrix, $\mathbf{Y} = [\mathbf{y}_{1,:}, \dots, \mathbf{y}_{N,:}]^{\top} = [\mathbf{y}_{:,1}, \dots, \mathbf{y}_{:,D}] \in \mathbb{R}^{N \times D}$.
Latent variables, $\mathbf{X} = [\mathbf{x}_{1,:}, \dots, \mathbf{x}_{N,:}]^{\top} = [\mathbf{x}_{:,1}, \dots, \mathbf{x}_{:,q}] \in \mathbb{R}^{N \times q}$.
Mapping matrix, $\mathbf{W} \in \mathbb{R}^{D \times q}$.
Centering matrix, $\mathbf{H} = \mathbf{I} - N^{-1}\mathbf{1}\mathbf{1}^{\top} \in \mathbb{R}^{N \times N}$.

24 Reading Notation
$\mathbf{a}_{i,:}$ is a vector from the ith row of a given matrix $\mathbf{A}$; $\mathbf{a}_{:,j}$ is a vector from the jth column of a given matrix $\mathbf{A}$. $\mathbf{X}$ and $\mathbf{Y}$ are design matrices.
Centred data matrix given by $\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y}$.
Sample covariance given by $\mathbf{S} = N^{-1}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}$.
Centred inner product matrix given by $\mathbf{K} = \hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}$.
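
As a hedged illustration (not from the original slides), these quantities can be computed for a data matrix in a few lines of MATLAB; the sizes and variable names below are arbitrary example choices.

  N = 100; D = 5;               % example sizes (arbitrary)
  Y = randn(N, D);              % a data matrix standing in for real data
  H = eye(N) - ones(N, N)/N;    % centering matrix H = I - (1/N) 1 1'
  Yhat = H*Y;                   % centred data matrix
  S = Yhat'*Yhat/N;             % sample covariance, D x D
  K = Yhat*Yhat';               % centred inner product matrix, N x N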

25 Outline: 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions

26 Data Representation
Classical statistical approach: represent via proximities [Mardia, 1972]. Proximity data: similarities or dissimilarities. Example of a dissimilarity matrix: a distance matrix,
$d_{i,j}^{2} = \|\mathbf{y}_{i,:} - \mathbf{y}_{j,:}\|_{2}^{2} = (\mathbf{y}_{i,:} - \mathbf{y}_{j,:})^{\top}(\mathbf{y}_{i,:} - \mathbf{y}_{j,:}).$
For a data set this can be displayed as a matrix (a sketch of its computation follows below).
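
A small MATLAB sketch (illustrative, not the slides' code) of building the distance matrix for the rows of Y, reusing the variables from the sketch above.

  sqY = sum(Y.^2, 2);                                       % squared row norms
  D2 = repmat(sqY, 1, N) + repmat(sqY', N, 1) - 2*(Y*Y');   % d_ij^2 = |y_i|^2 + |y_j|^2 - 2 y_i' y_j
  Dmat = sqrt(max(D2, 0));                                  % distance matrix, guarding rounding error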

27 Interpoint Distances for Rotated Sixes. Figure: interpoint distances for the rotated digits data.

28 Multidimensional Scaling
Find a configuration of points, $\mathbf{X}$, such that each $\delta_{i,j} = \|\mathbf{x}_{i,:} - \mathbf{x}_{j,:}\|_{2}$ closely matches the corresponding $d_{i,j}$ in the distance matrix. Need an objective function for matching $\Delta = (\delta_{i,j})_{i,j}$ to $\mathbf{D} = (d_{i,j})_{i,j}$.

29 Feature Selection
Use an entrywise L1 norm on the difference between squared distances,
$E(\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\left|d_{ij}^{2} - \delta_{ij}^{2}\right|.$
Reduce dimension by selecting features from the data set: select for $\mathbf{X}$, in turn, the column from $\mathbf{Y}$ that most reduces this error, until we have the desired q. To minimise $E(\mathbf{X})$ we compose $\mathbf{X}$ by extracting the columns of $\mathbf{Y}$ which have the largest variance (Derive Algorithm: see the derivation slides at the end; a code sketch follows below).
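
A minimal MATLAB sketch of the selection rule (an illustration, not the dimred toolbox implementation): keep the q columns of the centred data with the largest variance, reusing Yhat from the earlier sketch.

  q = 2;                                  % desired latent dimension (example value)
  v = var(Yhat);                          % variance of each column of the centred data
  [vSorted, order] = sort(v, 'descend');  % columns ranked by variance
  X = Yhat(:, order(1:q));                % latent representation by feature selection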

30 Reconstruction from Latent Space. Left: distances reconstructed with two dimensions. Right: distances reconstructed with 10 dimensions.

31 Reconstruction from Latent Space. Left: distances reconstructed with 10 dimensions. Right: distances reconstructed with 100 dimensions.

32 Feature Selection. Figure: demrotationdist. Feature selection via distance preservation.

35 Feature Extraction. Figure: demrotationdist. Rotation preserves interpoint distances.

40 Feature Extraction. Figure: demrotationdist. Rotation preserves interpoint distances. Residuals are much reduced.

42 Which Rotation?
We need the rotation that will minimise the residual error. We already derived an algorithm for discarding directions: discard the directions with minimum variance (equivalently, retain those with maximum variance). The error is then given by the sum of the residual variances,
$E(\mathbf{X}) = 2N^{2}\sum_{k=q+1}^{D}\sigma_{k}^{2}.$
Rotations of the data matrix do not affect this analysis.

43 Rotation Reconstruction from Latent Space. Left: distances reconstructed with two dimensions. Right: distances reconstructed with 10 dimensions.

44 Rotation Reconstruction from Latent Space. Left: distances reconstructed with 10 dimensions. Right: distances reconstructed with 360 dimensions.

45 Reminder: Principal Component Analysis
How do we find these directions? Find the directions in the data with maximal variance. That's what PCA does! PCA: rotate the data to extract these directions. PCA: work on the sample covariance matrix $\mathbf{S} = N^{-1}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}$.

46 Principal Component Analysis
Find a direction in the data, $\mathbf{x}_{:,1} = \hat{\mathbf{Y}}\mathbf{r}_{1}$, for which the variance is maximised:
$\mathbf{r}_{1} = \arg\max_{\mathbf{r}_{1}} \operatorname{var}(\hat{\mathbf{Y}}\mathbf{r}_{1}) \quad \text{subject to} \quad \mathbf{r}_{1}^{\top}\mathbf{r}_{1} = 1.$
Can rewrite in terms of the sample covariance:
$\operatorname{var}(\mathbf{x}_{:,1}) = N^{-1}(\hat{\mathbf{Y}}\mathbf{r}_{1})^{\top}(\hat{\mathbf{Y}}\mathbf{r}_{1}) = \mathbf{r}_{1}^{\top}\underbrace{N^{-1}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}}_{\text{sample covariance}}\,\mathbf{r}_{1} = \mathbf{r}_{1}^{\top}\mathbf{S}\mathbf{r}_{1}.$

47 Lagrangian
Solution via constrained optimisation:
$L(\mathbf{r}_{1}, \lambda_{1}) = \mathbf{r}_{1}^{\top}\mathbf{S}\mathbf{r}_{1} + \lambda_{1}(1 - \mathbf{r}_{1}^{\top}\mathbf{r}_{1}).$
The gradient with respect to $\mathbf{r}_{1}$ is
$\frac{dL(\mathbf{r}_{1}, \lambda_{1})}{d\mathbf{r}_{1}} = 2\mathbf{S}\mathbf{r}_{1} - 2\lambda_{1}\mathbf{r}_{1},$
which, set to zero, rearranges to $\mathbf{S}\mathbf{r}_{1} = \lambda_{1}\mathbf{r}_{1}$. This is recognised as an eigenvalue problem.

48 Lagrange Multiplier
Recall the gradient,
$\frac{dL(\mathbf{r}_{1}, \lambda_{1})}{d\mathbf{r}_{1}} = 2\mathbf{S}\mathbf{r}_{1} - 2\lambda_{1}\mathbf{r}_{1}. \quad (1)$
To find $\lambda_{1}$, premultiply (1) by $\mathbf{r}_{1}^{\top}$ and rearrange, giving $\lambda_{1} = \mathbf{r}_{1}^{\top}\mathbf{S}\mathbf{r}_{1}$. The maximum variance is therefore necessarily the maximum eigenvalue of $\mathbf{S}$. This is the first principal component. A PCA sketch follows below.
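
A minimal MATLAB sketch of PCA as derived above (illustrative, not the toolbox code; it reuses S, Yhat and q from the earlier sketches): eigendecompose the sample covariance and project the centred data onto the leading eigenvectors.

  [R, Lambda] = eig(S);                            % eigenvectors and eigenvalues of S
  [lambda, order] = sort(diag(Lambda), 'descend'); % order by decreasing variance
  Rq = R(:, order(1:q));                           % first q principal directions
  X = Yhat*Rq;                                     % principal component scores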

49 Further Directions
Find directions orthogonal to the earlier extracted directions, with maximal variance. Orthogonality constraints: for $j < k$ we have $\mathbf{r}_{j}^{\top}\mathbf{r}_{k} = 0$ and $\mathbf{r}_{k}^{\top}\mathbf{r}_{k} = 1$. The Lagrangian is
$L(\mathbf{r}_{k}, \lambda_{k}, \boldsymbol{\gamma}) = \mathbf{r}_{k}^{\top}\mathbf{S}\mathbf{r}_{k} + \lambda_{k}\left(1 - \mathbf{r}_{k}^{\top}\mathbf{r}_{k}\right) + \sum_{j=1}^{k-1}\gamma_{j}\mathbf{r}_{j}^{\top}\mathbf{r}_{k},$
$\frac{dL(\mathbf{r}_{k}, \lambda_{k})}{d\mathbf{r}_{k}} = 2\mathbf{S}\mathbf{r}_{k} - 2\lambda_{k}\mathbf{r}_{k} + \sum_{j=1}^{k-1}\gamma_{j}\mathbf{r}_{j}.$

50 Further Eigenvectors
Gradient of the Lagrangian:
$\frac{dL(\mathbf{r}_{k}, \lambda_{k})}{d\mathbf{r}_{k}} = 2\mathbf{S}\mathbf{r}_{k} - 2\lambda_{k}\mathbf{r}_{k} + \sum_{j=1}^{k-1}\gamma_{j}\mathbf{r}_{j}. \quad (2)$
Premultiplying (2) by $\mathbf{r}_{i}^{\top}$ with $i < k$ implies $\gamma_{i} = 0$, which allows us to write $\mathbf{S}\mathbf{r}_{k} = \lambda_{k}\mathbf{r}_{k}$. Premultiplying (2) by $\mathbf{r}_{k}^{\top}$ implies $\lambda_{k} = \mathbf{r}_{k}^{\top}\mathbf{S}\mathbf{r}_{k}$. This is the kth principal component.

51 Principal Coordinates Analysis
The rotation which finds the directions of maximum variance is given by the eigenvectors of the covariance matrix; the variance in each direction is given by the eigenvalues. Problem: working directly with the sample covariance, $\mathbf{S}$, may be impossible. For example: perhaps we are given distances between data points, but not absolute locations. With no access to absolute positions we cannot compute the original sample covariance.

52 An Alternative Formalism
Matrix representation of the eigenvalue problem for the first q eigenvectors:
$\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\mathbf{R}_{q} = \mathbf{R}_{q}\boldsymbol{\Lambda}_{q}, \qquad \mathbf{R}_{q} \in \mathbb{R}^{D \times q}. \quad (3)$
Premultiply by $\hat{\mathbf{Y}}$:
$\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\mathbf{R}_{q} = \hat{\mathbf{Y}}\mathbf{R}_{q}\boldsymbol{\Lambda}_{q}.$
Postmultiply by $\boldsymbol{\Lambda}_{q}^{-\frac{1}{2}}$:
$\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\mathbf{R}_{q}\boldsymbol{\Lambda}_{q}^{-\frac{1}{2}} = \hat{\mathbf{Y}}\mathbf{R}_{q}\boldsymbol{\Lambda}_{q}^{-\frac{1}{2}}\boldsymbol{\Lambda}_{q},$
i.e. $\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}\mathbf{U}_{q} = \mathbf{U}_{q}\boldsymbol{\Lambda}_{q}$ with $\mathbf{U}_{q} = \hat{\mathbf{Y}}\mathbf{R}_{q}\boldsymbol{\Lambda}_{q}^{-\frac{1}{2}}$.

55 U_q Diagonalizes the Inner Product Matrix
Need to prove that the columns of $\mathbf{U}_{q}$ are eigenvectors of the inner product matrix:
$\mathbf{U}_{q}^{\top}\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}\mathbf{U}_{q} = \boldsymbol{\Lambda}_{q}^{-\frac{1}{2}}\mathbf{R}_{q}^{\top}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\mathbf{R}_{q}\boldsymbol{\Lambda}_{q}^{-\frac{1}{2}} = \boldsymbol{\Lambda}_{q}^{-\frac{1}{2}}\mathbf{R}_{q}^{\top}\left(\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\right)^{2}\mathbf{R}_{q}\boldsymbol{\Lambda}_{q}^{-\frac{1}{2}}.$
The full eigendecomposition of the sample covariance is $\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}} = \mathbf{R}\boldsymbol{\Lambda}\mathbf{R}^{\top}$, which implies
$\left(\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}\right)^{2} = \mathbf{R}\boldsymbol{\Lambda}\mathbf{R}^{\top}\mathbf{R}\boldsymbol{\Lambda}\mathbf{R}^{\top} = \mathbf{R}\boldsymbol{\Lambda}^{2}\mathbf{R}^{\top}.$
The product of the first q eigenvectors with the rest is
$\mathbf{R}^{\top}\mathbf{R}_{q} = \begin{bmatrix}\mathbf{I}_{q}\\ \mathbf{0}\end{bmatrix} \in \mathbb{R}^{D \times q},$
where $\mathbf{I}_{q}$ denotes a q x q identity matrix. Premultiplying by the eigenvalues gives
$\boldsymbol{\Lambda}\mathbf{R}^{\top}\mathbf{R}_{q} = \begin{bmatrix}\boldsymbol{\Lambda}_{q}\\ \mathbf{0}\end{bmatrix},$
and multiplying by its own transpose gives $\mathbf{R}_{q}^{\top}\mathbf{R}\boldsymbol{\Lambda}^{2}\mathbf{R}^{\top}\mathbf{R}_{q} = \boldsymbol{\Lambda}_{q}^{2}$. Therefore
$\mathbf{U}_{q}^{\top}\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}\mathbf{U}_{q} = \boldsymbol{\Lambda}_{q}^{-\frac{1}{2}}\boldsymbol{\Lambda}_{q}^{2}\boldsymbol{\Lambda}_{q}^{-\frac{1}{2}} = \boldsymbol{\Lambda}_{q}, \quad \text{i.e.} \quad \hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}\mathbf{U}_{q} = \mathbf{U}_{q}\boldsymbol{\Lambda}_{q}.$

68 Equivalent Eigenvalue Problems
The two eigenvalue problems are equivalent: one solves for the rotation, the other solves for the location of the rotated points. When D < N it is easier to solve for the rotation, $\mathbf{R}_{q}$; but when D > N we solve for the embedding (principal coordinate analysis). In MDS we may not know $\mathbf{Y}$, so we cannot compute $\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}$ from a distance matrix. Can we compute $\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}$ instead?
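
A small MATLAB sketch of the equivalence (illustrative, under the notation above, reusing Yhat and q): the embedding from the D x D eigenproblem matches the one from the N x N inner product eigenproblem, up to sign flips of the columns.

  [R, LamD] = eig(Yhat'*Yhat);                   % D x D eigenproblem (rotation)
  [lamD, iD] = sort(diag(LamD), 'descend');
  Xrot = Yhat*R(:, iD(1:q));                     % embedding via the rotation R_q

  [U, LamN] = eig(Yhat*Yhat');                   % N x N eigenproblem (coordinates)
  [lamN, iN] = sort(diag(LamN), 'descend');
  Xpco = U(:, iN(1:q))*diag(sqrt(lamN(1:q)));    % embedding via U_q Lambda_q^(1/2)
  % columns of Xrot and Xpco agree up to sign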

69 The Covariance Interpretation. $N^{-1}\hat{\mathbf{Y}}^{\top}\hat{\mathbf{Y}}$ is the data covariance; $\hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}$ is a centred inner product matrix. The latter also has an interpretation as a covariance matrix (Gaussian processes): it expresses correlation and anti-correlation between data points, whereas the standard covariance expresses correlation and anti-correlation between data dimensions.

70 Distance to Similarity: A Gaussian Covariance Interpretation
Translate between covariance and distance. Consider a vector sampled from a zero mean Gaussian distribution, $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{K})$. The expected square distance between two elements of this vector is
$d_{i,j}^{2} = \left\langle (z_{i} - z_{j})^{2} \right\rangle = \left\langle z_{i}^{2} \right\rangle + \left\langle z_{j}^{2} \right\rangle - 2\left\langle z_{i} z_{j} \right\rangle;$
under a zero mean Gaussian with covariance given by $\mathbf{K}$ this is
$d_{i,j}^{2} = k_{i,i} + k_{j,j} - 2k_{i,j}.$
Take the distance to be the square root of this, $d_{i,j} = (k_{i,i} + k_{j,j} - 2k_{i,j})^{\frac{1}{2}}$.

71 Standard Transformation
This transformation is known as the standard transformation between a similarity and a distance [Mardia et al., 1979, pg 42]. If the covariance is of the form $\mathbf{K} = \hat{\mathbf{Y}}\hat{\mathbf{Y}}^{\top}$ then $k_{i,j} = \mathbf{y}_{i,:}^{\top}\mathbf{y}_{j,:}$ and
$d_{i,j} = \left(\mathbf{y}_{i,:}^{\top}\mathbf{y}_{i,:} + \mathbf{y}_{j,:}^{\top}\mathbf{y}_{j,:} - 2\mathbf{y}_{i,:}^{\top}\mathbf{y}_{j,:}\right)^{\frac{1}{2}} = \|\mathbf{y}_{i,:} - \mathbf{y}_{j,:}\|_{2}.$
For other distance matrices this gives us an approach to convert to a similarity matrix, or kernel matrix, so we can perform classical MDS.

72 Example: Road Distances with Classical MDS. Classical example: redraw a map from road distances (see e.g. Mardia et al. [1979]). Here we use distances across Europe: between each pair of cities we have the road distance. Enter these in a distance matrix, convert to a similarity matrix using the covariance interpretation and perform an eigendecomposition (a classical MDS sketch follows below). The data we used is available online.
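
A classical MDS sketch in MATLAB (illustrative; the double-centering step is the standard construction implied by the covariance interpretation, not code taken from the slides). Dmat is any N x N distance matrix, q the target dimension.

  N = size(Dmat, 1);
  H = eye(N) - ones(N, N)/N;                     % centering matrix
  K = -0.5 * H * (Dmat.^2) * H;                  % similarity matrix from squared distances
  [U, Lam] = eig((K + K')/2);                    % symmetrise for numerical safety
  [lam, order] = sort(diag(Lam), 'descend');
  X = U(:, order(1:q)) * diag(sqrt(lam(1:q)));   % classical MDS embedding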

73 Distance Matrix. Convert distances to similarities using the covariance interpretation. Figure: Left: road distances between European cities visualised as a matrix. Right: the similarity matrix derived from these distances. If this matrix is a covariance matrix, then the expected distance between samples from this covariance is given on the left.

74 Example: Road Distances with Classical MDS. Figure: demcmdsroaddata. Reconstructed city locations projected onto the true map using Procrustes rotations.

75 Beware Negative Eigenvalues. Figure: the eigenvalues of the similarity matrix are negative in this case.

76 European Cities Distance Matrices. Figure: Left: the original distance matrix. Right: the reconstructed distance matrix.

77 Other Distance/Similarity Measures. Can use a similarity/distance of your choice. Beware though! The similarity must be positive semi-definite for the distance to be Euclidean. Why? We can immediately see that positive definiteness is sufficient from the covariance interpretation. For more details see Mardia et al. [1979].

78 A Class of Similarities for Vector Data
All Mercer kernels are positive semi-definite. Example: the squared exponential (also known as the RBF or Gaussian kernel),
$k_{i,j} = \exp\left(-\frac{\|\mathbf{y}_{i,:} - \mathbf{y}_{j,:}\|_{2}^{2}}{2\ell^{2}}\right).$
This leads to a kernel eigenvalue problem, known as Kernel PCA [Schölkopf et al., 1998].
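
A minimal kernel PCA sketch in MATLAB (an illustration of the idea, not the demsixkpca code; the length scale value is an arbitrary assumption): build the squared exponential kernel, centre it and eigendecompose.

  ell = 1.0;                                               % length scale (example value)
  N = size(Y, 1);
  sqY = sum(Y.^2, 2);
  D2 = repmat(sqY, 1, N) + repmat(sqY', N, 1) - 2*(Y*Y');
  Kmat = exp(-D2/(2*ell^2));                               % RBF (squared exponential) kernel
  H = eye(N) - ones(N, N)/N;
  Kc = H*Kmat*H;                                           % centred kernel matrix
  [U, Lam] = eig((Kc + Kc')/2);
  [lam, order] = sort(diag(Lam), 'descend');
  X = U(:, order(1:q)) * diag(sqrt(lam(1:q)));             % kernel PCA embedding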

79 Implied Distance Matrix
What is the equivalent distance $d_{i,j}$? $d_{i,j} = \sqrt{k_{i,i} + k_{j,j} - 2k_{i,j}}$. If the point separation is large, $k_{i,j} \to 0$, while $k_{i,i} = 1$ and $k_{j,j} = 1$, so $d_{i,j} \to \sqrt{2}$. Kernel PCA with the RBF kernel therefore projects along axes and can produce poor results: it uses many dimensions to keep dissimilar objects a constant amount apart.

80 Implied Distances on Rotated Sixes. Figure: Left: similarity matrix for the RBF kernel on the rotated sixes. Right: implied distance matrix for the kernel on the rotated sixes. Note that most of the distances are set to $\sqrt{2}$.

81 Kernel PCA on Rotated Sixes. Figure: demsixkpca. The fifth, sixth and seventh dimensions of the latent space for kernel PCA. Points spread out along the axes so that dissimilar points are always $\sqrt{2}$ apart.

87 MDS Conclusions
Multidimensional scaling: preserve a distance matrix. Classical MDS uses a particular objective function for which distance matching is equivalent to maximum variance, i.e. a spectral decomposition of the similarity matrix. For Euclidean distances in $\mathbf{Y}$ space, classical MDS is equivalent to PCA; this is known as principal coordinate analysis (PCO). We haven't discussed the choice of distance matrix.

88 Outline: 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions

89 Data. Figure: illustrative data sets for the talk: (a) plane, (b) swissroll, (c) trefoil. Each data set is generated by calling generatemanifolddata(datatype); the datatype argument is given below each plot.

90 Isomap [Tenenbaum et al., 2000]. MDS finds a geometric configuration preserving distances. Isomap is MDS applied to the manifold distance (geodesic distance = manifold distance). We cannot compute the geodesic distance without knowing the manifold.

91 Isomap. Isomap: define neighbours and compute distances between neighbours. The geodesic distance is approximated by the shortest path through the adjacency matrix. Figure: Swiss roll and residual variance figures reproduced from Tenenbaum et al. [2000], illustrating geodesic versus Euclidean distance and the neighbourhood graph.

92 Isomap Examples: demisomap. Data generation: Carl Henrik Ek.

95 Isomap: Summary
MDS on a shortest-path approximation of the manifold distance.
+ Simple.
+ Intrinsic dimension from the eigenspectrum.
- Solves a very large eigenvalue problem.
- Cannot handle holes or non-convex manifolds.
- Sensitive to short circuits.
A code sketch of the Isomap pipeline follows below.
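
A minimal Isomap sketch in MATLAB (an illustration of the three steps, not the demisomap code; assumes the neighbourhood graph is connected, and k is an arbitrary example value).

  k = 7;                                         % number of neighbours
  N = size(Y, 1);
  sqY = sum(Y.^2, 2);
  Dmat = sqrt(max(repmat(sqY, 1, N) + repmat(sqY', N, 1) - 2*(Y*Y'), 0));

  G = inf(N);                                    % graph distances, inf = not connected
  for i = 1:N
    [dSort, idx] = sort(Dmat(i, :));
    G(i, idx(1:k+1)) = dSort(1:k+1);             % keep the k nearest neighbours (plus self)
  end
  G = min(G, G');                                % symmetrise the neighbourhood graph

  for m = 1:N                                    % Floyd-Warshall shortest paths
    G = min(G, repmat(G(:, m), 1, N) + repmat(G(m, :), N, 1));
  end

  H = eye(N) - ones(N, N)/N;                     % classical MDS on the geodesic distances
  K = -0.5 * H * (G.^2) * H;
  [U, Lam] = eig((K + K')/2);
  [lam, order] = sort(diag(Lam), 'descend');
  X = U(:, order(1:q)) * diag(sqrt(lam(1:q)));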

96 Inverse Covariance. From the covariance interpretation we think of the similarity matrix as a covariance. Each element of the covariance is a function of two data points. Another option is to specify the inverse covariance: if the inverse covariance between two points is zero, those points are independent given all other points, which is a conditional independence. This describes how points are connected. Laplacian Eigenmaps and LLE can both be seen as specifying the inverse covariance.

97 LLE Examples: demlle. Neighbours used as set in the demo; no playing with settings.

100 Generative. Observed data have been sampled from a manifold. Spectral methods start at the wrong end: it's a lot easier to make a mess than to clean it up! Things break or disappear. How do we model the observation generation?

104 Outline: 1 Motivation 2 Background 3 Distance Matching 4 Distances along the Manifold 5 Model Selection 6 Conclusions

105 Model Selection
Observed data have been sampled from a low dimensional manifold, $\mathbf{y} = f(\mathbf{x})$. Idea: model f and rank embeddings according to 1 the data fit of f and 2 the complexity of f. How to model f? 1 Making as few assumptions about f as possible? 2 Allowing f to come from as rich a class as possible?

106 Gaussian Processes
Generalisation of the Gaussian distribution over infinite index sets; can be used to specify distributions over functions. Regression: $\mathbf{y} = f(\mathbf{x}) + \epsilon$,
$p(\mathbf{y}|\mathbf{X}, \Phi) = \int p(\mathbf{y}|\mathbf{f}, \mathbf{X}, \Phi)\, p(\mathbf{f}|\mathbf{X}, \Phi)\, d\mathbf{f}, \qquad p(\mathbf{f}|\mathbf{X}, \Phi) = \mathcal{N}(\mathbf{0}, \mathbf{K}),$
$\hat{\Phi} = \arg\max_{\Phi}\, p(\mathbf{y}|\mathbf{X}, \Phi).$

107 Gaussian Processes: log likelihood as a function of the length scale
$\log p(\mathbf{y}|\mathbf{X}) = \underbrace{-\tfrac{1}{2}\mathbf{y}^{\top}(\mathbf{K} + \beta^{-1}\mathbf{I})^{-1}\mathbf{y}}_{\text{data fit}}\; \underbrace{-\tfrac{1}{2}\log\det(\mathbf{K} + \beta^{-1}\mathbf{I})}_{\text{complexity}} - \tfrac{N}{2}\log 2\pi.$
Images: N. D. Lawrence.
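
A MATLAB sketch of evaluating this log likelihood (illustrative assumptions: one-dimensional inputs x, a single output vector y, a squared exponential covariance with length scale ell and noise precision beta; none of these names come from the slides).

  ell = 1.0; beta = 100;                           % example hyperparameter values
  N = length(y);
  K = exp(-(repmat(x, 1, N) - repmat(x', N, 1)).^2 / (2*ell^2));   % RBF covariance of the inputs
  C = K + eye(N)/beta;                             % K + beta^{-1} I
  L = chol(C, 'lower');                            % Cholesky factor for stable solves
  alpha = L'\(L\y);                                % (K + beta^{-1} I)^{-1} y
  logLik = -0.5*(y'*alpha) ...                     % data fit term
           - sum(log(diag(L))) ...                 % complexity term, -0.5 log det(...)
           - 0.5*N*log(2*pi);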

116 Gaussian Process Latent Variable Models
The GP-LVM models the sampling process $\mathbf{y} = f(\mathbf{x}) + \epsilon$:
$p(\mathbf{Y}|\mathbf{X}, \Phi) = \int p(\mathbf{Y}|\mathbf{f}, \mathbf{X}, \Phi)\, p(\mathbf{f}|\mathbf{X}, \Phi)\, d\mathbf{f}, \qquad p(\mathbf{f}|\mathbf{X}, \Phi) = \mathcal{N}(\mathbf{0}, \mathbf{K}),$
$\{\hat{\mathbf{X}}, \hat{\Phi}\} = \arg\max_{\mathbf{X}, \Phi}\, p(\mathbf{Y}|\mathbf{X}, \Phi).$
Linear kernel: closed form solution. Non-linear kernel: gradient based solution.
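
A sketch of the GP-LVM objective as a function of a candidate latent configuration X (illustrative only: an RBF kernel over X, fixed hyperparameters, no gradients; the variable names are assumptions, not toolbox identifiers).

  ell = 1.0; beta = 100;                           % assumed kernel and noise settings
  [N, D] = size(Y);
  sqX = sum(X.^2, 2);
  K = exp(-(repmat(sqX, 1, N) + repmat(sqX', N, 1) - 2*(X*X')) / (2*ell^2));
  C = K + eye(N)/beta;
  L = chol(C, 'lower');
  invCY = L'\(L\Y);                                % (K + beta^{-1} I)^{-1} Y
  objective = -0.5*sum(sum(Y.*invCY)) ...          % data fit over all D output dimensions
              - D*sum(log(diag(L))) ...            % complexity, -(D/2) log det(...)
              - 0.5*D*N*log(2*pi);
  % maximising this objective with respect to X (e.g. by gradient ascent,
  % initialised from a spectral embedding) gives the GP-LVM latent space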

117 Model Selection. Lawrence [2003] suggested the use of spectral algorithms to initialise the latent space X. Harmeling [2007] evaluated the use of the GP-LVM objective for model selection, comparing the Procrustes score against ground truth with the GP-LVM objective.

118 Model Selection: Results. Figures: for each of the illustrative data sets, the ground truth embedding is shown alongside the embeddings recovered by Isomap, MVU, LLE and Laplacian eigenmaps, ranked both by Procrustes score against ground truth and by the GP-LVM objective. Model selection results kindly provided by Carl Henrik Ek.

127 Conclusion
Assumption: local structure contains enough characteristics to unravel the global structure.
+ Intuitive.
- Hard to set parameters without knowing the manifold.
- Learns embeddings, not mappings, i.e. visualisations.
- Models the problem the wrong way around.
- Sensitive to noise.
+ Currently the best strategy to initialise generative models.

128 References I
K. V. Mardia. Statistics of Directional Data. Academic Press, London, 1972.
K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, 1979.
B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000. doi: 10.1126/science.290.5500.2319.

129 Material. Acknowledgement: Carl Henrik Ek for the GP log likelihood examples. My examples and this talk are available online.

130 Outline: Distance Matching

131 Centering Matrix
If $\hat{\mathbf{Y}}$ is a version of $\mathbf{Y}$ with the mean removed, then
$\hat{\mathbf{Y}} = \mathbf{H}\mathbf{Y} = \left(\mathbf{I} - N^{-1}\mathbf{1}\mathbf{1}^{\top}\right)\mathbf{Y} = \mathbf{Y} - \mathbf{1}\left(N^{-1}\mathbf{1}^{\top}\mathbf{Y}\right) = \mathbf{Y} - \mathbf{1}\left(N^{-1}\sum_{i=1}^{N}\mathbf{y}_{i,:}\right)^{\top} = \mathbf{Y} - \begin{bmatrix}\bar{\mathbf{y}}^{\top}\\ \vdots\\ \bar{\mathbf{y}}^{\top}\end{bmatrix},$
i.e. the mean row $\bar{\mathbf{y}}^{\top}$ is subtracted from every row of $\mathbf{Y}$.

132 Feature Selection Derivation
The squared distance can be re-expressed as
$d_{ij}^{2} = \sum_{k=1}^{D}\left(y_{i,k} - y_{j,k}\right)^{2}.$
We can re-order the columns of $\mathbf{Y}$ without affecting the distances. Choose the ordering so that the first q columns of $\mathbf{Y}$ are those that will best represent the distance matrix, and substitute $\mathbf{x}_{:,k} = \mathbf{y}_{:,k}$ for $k = 1, \dots, q$. The distance in the latent space is then given by
$\delta_{ij}^{2} = \sum_{k=1}^{q}\left(x_{i,k} - x_{j,k}\right)^{2} = \sum_{k=1}^{q}\left(y_{i,k} - y_{j,k}\right)^{2}.$

133 Feature Selection Derivation II
Can rewrite the error as
$E(\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\left(d_{ij}^{2} - \delta_{ij}^{2}\right) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=q+1}^{D}\left(y_{i,k} - y_{j,k}\right)^{2}.$
Introduce the mean of each dimension, $\bar{y}_{k} = \frac{1}{N}\sum_{i=1}^{N}y_{i,k}$:
$E(\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=q+1}^{D}\left(\left(y_{i,k} - \bar{y}_{k}\right) - \left(y_{j,k} - \bar{y}_{k}\right)\right)^{2}.$
Expanding the brackets,
$E(\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=q+1}^{D}\left[\left(y_{i,k} - \bar{y}_{k}\right)^{2} + \left(y_{j,k} - \bar{y}_{k}\right)^{2} - 2\left(y_{j,k} - \bar{y}_{k}\right)\left(y_{i,k} - \bar{y}_{k}\right)\right].$

134 Feature Selection Derivation III
Bringing the sums inside,
$E(\mathbf{X}) = \sum_{k=q+1}^{D}\left[N\sum_{i=1}^{N}\left(y_{i,k} - \bar{y}_{k}\right)^{2} + N\sum_{j=1}^{N}\left(y_{j,k} - \bar{y}_{k}\right)^{2} - 2\sum_{j=1}^{N}\left(y_{j,k} - \bar{y}_{k}\right)\sum_{i=1}^{N}\left(y_{i,k} - \bar{y}_{k}\right)\right],$
and the cross term vanishes because each of its factors sums to zero. We recognise the result as the sum of the variances of the discarded columns of $\mathbf{Y}$,
$E(\mathbf{X}) = 2N^{2}\sum_{k=q+1}^{D}\sigma_{k}^{2}.$
We should therefore compose $\mathbf{X}$ by extracting the columns of $\mathbf{Y}$ which have the largest variance.
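
A quick MATLAB check of this identity (illustrative; sigma_k^2 here is the biased, 1/N-normalised variance of each discarded column, matching the derivation).

  N = 50; D = 6; q = 2;
  Y = randn(N, D);
  sqY = sum(Y.^2, 2);
  D2 = repmat(sqY, 1, N) + repmat(sqY', N, 1) - 2*(Y*Y');         % d_ij^2 using all columns
  Xq = Y(:, 1:q);
  sqX = sum(Xq.^2, 2);
  Delta2 = repmat(sqX, 1, N) + repmat(sqX', N, 1) - 2*(Xq*Xq');   % delta_ij^2 using first q columns
  E = sum(sum(D2 - Delta2));
  rhs = 2*N^2*sum(var(Y(:, q+1:D), 1));          % 2 N^2 times the sum of discarded (biased) variances
  % E and rhs agree up to numerical rounding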


More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Learning a kernel matrix for nonlinear dimensionality reduction

Learning a kernel matrix for nonlinear dimensionality reduction University of Pennsylvania ScholarlyCommons Departmental Papers (CIS) Department of Computer & Information Science 7-4-2004 Learning a kernel matrix for nonlinear dimensionality reduction Kilian Q. Weinberger

More information

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model

Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model Tutorial on Gaussian Processes and the Gaussian Process Latent Variable Model (& discussion on the GPLVM tech. report by Prof. N. Lawrence, 06) Andreas Damianou Department of Neuro- and Computer Science,

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang. Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

More information

Machine Learning (Spring 2012) Principal Component Analysis

Machine Learning (Spring 2012) Principal Component Analysis 1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Dimensionality Reduction: A Comparative Review

Dimensionality Reduction: A Comparative Review Tilburg centre for Creative Computing P.O. Box 90153 Tilburg University 5000 LE Tilburg, The Netherlands http://www.uvt.nl/ticc Email: ticc@uvt.nl Copyright c Laurens van der Maaten, Eric Postma, and Jaap

More information

Dimensionality Reduction. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Dimensionality Reduction. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Dimensionality Reduction CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Visualize high dimensional data (and understand its Geometry) } Project the data into lower dimensional spaces }

More information

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi Overview Introduction Linear Methods for Dimensionality Reduction Nonlinear Methods and Manifold

More information

Data Mining II. Prof. Dr. Karsten Borgwardt, Department Biosystems, ETH Zürich. Basel, Spring Semester 2016 D-BSSE

Data Mining II. Prof. Dr. Karsten Borgwardt, Department Biosystems, ETH Zürich. Basel, Spring Semester 2016 D-BSSE D-BSSE Data Mining II Prof. Dr. Karsten Borgwardt, Department Biosystems, ETH Zürich Basel, Spring Semester 2016 D-BSSE Karsten Borgwardt Data Mining II Course, Basel Spring Semester 2016 2 / 117 Our course

More information

CS281 Section 4: Factor Analysis and PCA

CS281 Section 4: Factor Analysis and PCA CS81 Section 4: Factor Analysis and PCA Scott Linderman At this point we have seen a variety of machine learning models, with a particular emphasis on models for supervised learning. In particular, we

More information

Advances in Manifold Learning Presented by: Naku Nak l Verm r a June 10, 2008

Advances in Manifold Learning Presented by: Naku Nak l Verm r a June 10, 2008 Advances in Manifold Learning Presented by: Nakul Verma June 10, 008 Outline Motivation Manifolds Manifold Learning Random projection of manifolds for dimension reduction Introduction to random projections

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation

Laplacian Eigenmaps for Dimensionality Reduction and Data Representation Introduction and Data Representation Mikhail Belkin & Partha Niyogi Department of Electrical Engieering University of Minnesota Mar 21, 2017 1/22 Outline Introduction 1 Introduction 2 3 4 Connections to

More information

7. Variable extraction and dimensionality reduction

7. Variable extraction and dimensionality reduction 7. Variable extraction and dimensionality reduction The goal of the variable selection in the preceding chapter was to find least useful variables so that it would be possible to reduce the dimensionality

More information

Kernel methods for comparing distributions, measuring dependence

Kernel methods for comparing distributions, measuring dependence Kernel methods for comparing distributions, measuring dependence Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Principal component analysis Given a set of M centered observations

More information

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA

Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In

More information

Exploring model selection techniques for nonlinear dimensionality reduction

Exploring model selection techniques for nonlinear dimensionality reduction Exploring model selection techniques for nonlinear dimensionality reduction Stefan Harmeling Edinburgh University, Scotland stefan.harmeling@ed.ac.uk Informatics Research Report EDI-INF-RR-0960 SCHOOL

More information

Missing Data in Kernel PCA

Missing Data in Kernel PCA Missing Data in Kernel PCA Guido Sanguinetti, Neil D. Lawrence Department of Computer Science, University of Sheffield 211 Portobello Street, Sheffield S1 4DP, U.K. 19th June 26 Abstract Kernel Principal

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Neil D. Lawrence GPSS 10th June 2013 Book Rasmussen and Williams (2006) Outline The Gaussian Density Covariance from Basis Functions Basis Function Representations Constructing

More information

Learning gradients: prescriptive models

Learning gradients: prescriptive models Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Dimensionality Reduction with Principal Component Analysis

Dimensionality Reduction with Principal Component Analysis 10 Dimensionality Reduction with Principal Component Analysis Working directly with high-dimensional data, such as images, comes with some difficulties: it is hard to analyze, interpretation is difficult,

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Introduction to Matrix Algebra August 18, 2010 1 Vectors 1.1 Notations A p-dimensional vector is p numbers put together. Written as x 1 x =. x p. When p = 1, this represents a point in the line. When p

More information

Machine Learning. Data visualization and dimensionality reduction. Eric Xing. Lecture 7, August 13, Eric Xing Eric CMU,

Machine Learning. Data visualization and dimensionality reduction. Eric Xing. Lecture 7, August 13, Eric Xing Eric CMU, Eric Xing Eric Xing @ CMU, 2006-2010 1 Machine Learning Data visualization and dimensionality reduction Eric Xing Lecture 7, August 13, 2010 Eric Xing Eric Xing @ CMU, 2006-2010 2 Text document retrieval/labelling

More information

Bayesian Manifold Learning: The Locally Linear Latent Variable Model

Bayesian Manifold Learning: The Locally Linear Latent Variable Model Bayesian Manifold Learning: The Locally Linear Latent Variable Model Mijung Park, Wittawat Jitkrittum, Ahmad Qamar, Zoltán Szabó, Lars Buesing, Maneesh Sahani Gatsby Computational Neuroscience Unit University

More information

Manifold Regularization

Manifold Regularization 9.520: Statistical Learning Theory and Applications arch 3rd, 200 anifold Regularization Lecturer: Lorenzo Rosasco Scribe: Hooyoung Chung Introduction In this lecture we introduce a class of learning algorithms,

More information

MATH 829: Introduction to Data Mining and Analysis Principal component analysis

MATH 829: Introduction to Data Mining and Analysis Principal component analysis 1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional

More information

1 Principal Components Analysis

1 Principal Components Analysis Lecture 3 and 4 Sept. 18 and Sept.20-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Principal Components Analysis Principal components analysis (PCA) is a very popular technique for

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

CS 143 Linear Algebra Review

CS 143 Linear Algebra Review CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method

CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method CS168: The Modern Algorithmic Toolbox Lecture #8: PCA and the Power Iteration Method Tim Roughgarden & Gregory Valiant April 15, 015 This lecture began with an extended recap of Lecture 7. Recall that

More information

Machine Learning - MT Clustering

Machine Learning - MT Clustering Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:

More information

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS VIKAS CHANDRAKANT RAYKAR DECEMBER 5, 24 Abstract. We interpret spectral clustering algorithms in the light of unsupervised

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer Overview Principal Component Analysis (PCA) Kernel-PCA Fisher Linear Discriminant Analysis t-sne 2 PCA: Motivation

More information

Machine Learning. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Machine Learning. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Machine Learning Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1395 1 / 47 Table of contents 1 Introduction

More information

Neuroscience Introduction

Neuroscience Introduction Neuroscience Introduction The brain As humans, we can identify galaxies light years away, we can study particles smaller than an atom. But we still haven t unlocked the mystery of the three pounds of matter

More information

GWAS V: Gaussian processes

GWAS V: Gaussian processes GWAS V: Gaussian processes Dr. Oliver Stegle Christoh Lippert Prof. Dr. Karsten Borgwardt Max-Planck-Institutes Tübingen, Germany Tübingen Summer 2011 Oliver Stegle GWAS V: Gaussian processes Summer 2011

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS 121 CHAPTER 11 PRINCIPAL COMPONENTS ANALYSIS We now have the tools necessary to discuss one of the most important concepts in mathematical statistics: Principal Components Analysis (PCA). PCA involves

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information