Data Mining Techniques


1 Data Mining Techniques, CS 6220 - Section 3 - Fall 2016, Lecture 12. Jan-Willem van de Meent (credit: Yijun Zhao, Percy Liang)

2 DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford)

3 Linear Dimensionality Reduction. Idea: project a high-dimensional vector onto a lower-dimensional space, $z = U^\top x$, mapping $x \in \mathbb{R}^d$ down to $z \in \mathbb{R}^k$ with $k \ll d$ (e.g. $k = 10$).

7 Problem Setup. Given $n$ data points in $d$ dimensions: $x_1, \ldots, x_n \in \mathbb{R}^d$, collected as $X = (x_1 \cdots x_n) \in \mathbb{R}^{d \times n}$. Want to reduce the dimensionality from $d$ to $k$. Choose $k$ directions $u_1, \ldots, u_k$ and set $U = (u_1 \cdots u_k) \in \mathbb{R}^{d \times k}$. For each $u_j$, compute the similarity $z_j = u_j^\top x$. Project $x$ down to $z = (z_1, \ldots, z_k)^\top = U^\top x$. How do we choose $U$?
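As a concrete illustration (not part of the original slides), here is a minimal numpy sketch of the encode and decode maps $z = U^\top x$ and $\tilde{x} = Uz$; the data and the orthonormal basis U are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 50, 200, 5                          # ambient dim, number of points, target dim

X = rng.normal(size=(d, n))                   # data matrix, one column per point
U = np.linalg.qr(rng.normal(size=(d, k)))[0]  # some orthonormal d x k basis

Z = U.T @ X                                   # encode: k x n matrix of codes z_i = U^T x_i
X_hat = U @ Z                                 # decode: reconstructions back in R^d
print(Z.shape, X_hat.shape)                   # (5, 200) (50, 200)
```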

8 Principal Component Analysis. With $z = U^\top x$, how do we choose $U$? Two objectives: 1. minimize the reconstruction error; 2. maximize the projected variance.

12 PCA Objective 1: Reconstruction Error. $U$ serves two functions. Encode: $z = U^\top x$, with $z_j = u_j^\top x$. Decode: $\tilde{x} = Uz = \sum_{j=1}^{k} z_j u_j$. Want the reconstruction error $\|x - \tilde{x}\|$ to be small. Objective: minimize the total squared reconstruction error $\min_{U \in \mathbb{R}^{d \times k}} \sum_{i=1}^{n} \|x_i - UU^\top x_i\|^2$.
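A small helper (my own sketch, hypothetical names) that evaluates this objective for a given orthonormal U:

```python
import numpy as np

def reconstruction_error(X, U):
    """Total squared reconstruction error sum_i ||x_i - U U^T x_i||^2.

    X is d x n (one column per data point); U is d x k with orthonormal columns.
    """
    X_hat = U @ (U.T @ X)              # decode(encode(X))
    return float(np.sum((X - X_hat) ** 2))
```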

17 PCA Objective 2: Projected Variance. Empirical distribution: uniform over $x_1, \ldots, x_n$. Expectation (think sum over data points): $\hat{\mathbb{E}}[f(x)] = \frac{1}{n}\sum_{i=1}^{n} f(x_i)$. Variance (think sum of squares if centered): $\widehat{\mathrm{var}}[f(x)] + (\hat{\mathbb{E}}[f(x)])^2 = \hat{\mathbb{E}}[f(x)^2] = \frac{1}{n}\sum_{i=1}^{n} f(x_i)^2$. Assume the data is centered: $\hat{\mathbb{E}}[x] = 0$ (what is $\hat{\mathbb{E}}[U^\top x]$?). Objective: maximize the variance of the projected data, $\max_{U \in \mathbb{R}^{d \times k},\ U^\top U = I} \hat{\mathbb{E}}[\|U^\top x\|^2]$.

20 Equivalence of the two objectives. Key intuition: variance of data (fixed) = captured variance (want large) + reconstruction error (want small). Pythagorean decomposition: $x = UU^\top x + (I - UU^\top)x$, where the two terms are orthogonal with lengths $\|UU^\top x\|$ and $\|(I - UU^\top)x\|$. Take expectations; note that the rotation $U$ does not affect length: $\hat{\mathbb{E}}[\|x\|^2] = \hat{\mathbb{E}}[\|U^\top x\|^2] + \hat{\mathbb{E}}[\|x - UU^\top x\|^2]$. Minimizing the reconstruction error $\Leftrightarrow$ maximizing the captured variance.
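A quick numerical check of this identity (my own sketch, with random data and a random orthonormal U):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, k = 10, 100, 3
X = rng.normal(size=(d, n))
U = np.linalg.qr(rng.normal(size=(d, k)))[0]                 # orthonormal columns

total    = np.mean(np.sum(X ** 2, axis=0))                   # E[||x||^2]
captured = np.mean(np.sum((U.T @ X) ** 2, axis=0))           # E[||U^T x||^2]
residual = np.mean(np.sum((X - U @ (U.T @ X)) ** 2, axis=0)) # E[||x - U U^T x||^2]
print(np.isclose(total, captured + residual))                # True
```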

27 Finding one principal component. Objective: maximize the variance of the projected data, for input data $X = (x_1 \cdots x_n)$:
$\max_{\|u\|=1} \hat{\mathbb{E}}[(u^\top x)^2] = \max_{\|u\|=1} \frac{1}{n}\sum_{i=1}^{n} (u^\top x_i)^2 = \max_{\|u\|=1} \frac{1}{n}\|u^\top X\|^2 = \max_{\|u\|=1} u^\top \Big(\frac{1}{n}XX^\top\Big) u = $ largest eigenvalue of $C \stackrel{\mathrm{def}}{=} \frac{1}{n}XX^\top$ ($C$ is the covariance matrix of the data).
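A sketch (not from the slides) that checks this numerically: the top eigenvector of C attains a projected variance equal to the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 5, 1000
A = rng.normal(size=(d, d))
X = A @ rng.normal(size=(d, n))            # correlated data, one column per point
X = X - X.mean(axis=1, keepdims=True)      # center

C = (X @ X.T) / n                          # covariance matrix, d x d
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
u = eigvecs[:, -1]                         # top eigenvector = first principal component

projected_var = np.mean((u @ X) ** 2)
print(np.isclose(projected_var, eigvals[-1]))   # True: max variance = largest eigenvalue
```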

30 How many components? Similar to the question of "how many clusters?". The magnitude of the eigenvalues indicates the fraction of variance captured. [Figure: eigenvalues $\lambda_i$ plotted against the index $i$ for a face image dataset.] Eigenvalues typically drop off sharply, so we don't need that many. Of course, variance isn't everything...
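One way to operationalize this (my own sketch using scikit-learn's PCA, whose explained_variance_ratio_ attribute gives the fraction of variance per component; note that scikit-learn expects rows = data points, the transpose of the $d \times n$ convention in these slides):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 50)) @ np.diag(np.linspace(3, 0.1, 50))  # rows = samples

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.95)) + 1   # smallest k capturing 95% of the variance
print(k, round(float(cumulative[k - 1]), 3))
```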

35 Computing PCA. Method 1: eigendecomposition. $U$ are the eigenvectors of the covariance matrix $C = \frac{1}{n}XX^\top$. Computing $C$ already takes $O(nd^2)$ time (very expensive). Method 2: singular value decomposition (SVD). Find $X = U_{d \times d}\,\Sigma_{d \times n}\,V^\top_{n \times n}$, where $U^\top U = I_{d \times d}$, $V^\top V = I_{n \times n}$, and $\Sigma$ is diagonal. Computing the top $k$ singular vectors takes only $O(ndk)$. Relationship between eigendecomposition and SVD: the left singular vectors are the principal components ($C = U\Sigma^2 U^\top$, up to the factor $\frac{1}{n}$).
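A sketch on hypothetical data confirming that the two methods span the same principal subspace:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, k = 20, 300, 4
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)

# Method 1: eigendecomposition of the covariance matrix
C = (X @ X.T) / n
_, eigvecs = np.linalg.eigh(C)
U_eig = eigvecs[:, ::-1][:, :k]            # top-k eigenvectors

# Method 2: SVD of the (centered) data matrix
U_svd, S, Vt = np.linalg.svd(X, full_matrices=False)
U_svd = U_svd[:, :k]                       # top-k left singular vectors

# Compare the projectors onto the two k-dimensional subspaces
print(np.allclose(U_eig @ U_eig.T, U_svd @ U_svd.T))   # True: same subspace
```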

40 Eigen-faces [Turk & Pentland 1991]. $d$ = number of pixels; each $x_i \in \mathbb{R}^d$ is a face image, with $x_{ji}$ = intensity of the $j$-th pixel in image $i$. Factorize $X_{d \times n} \approx U_{d \times k} Z_{k \times n}$, i.e. $(x_1 \cdots x_n) \approx U(z_1 \cdots z_n)$. Idea: $z_i$ is a more meaningful representation of the $i$-th face than $x_i$. Can use $z_i$ for nearest-neighbor classification. Much faster: $O(dk + nk)$ time instead of $O(dn)$ when $n, d \gg k$. Why no time savings for a linear classifier?
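A toy nearest-neighbor search in the projected space (my own sketch; random vectors stand in for face images):

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, k = 4096, 400, 32                      # e.g. 64x64 images, flattened
X = rng.normal(size=(d, n))                  # columns are (made-up) face images
X = X - X.mean(axis=1, keepdims=True)

U, _, _ = np.linalg.svd(X, full_matrices=False)
U = U[:, :k]                                 # top-k "eigenfaces"
Z = U.T @ X                                  # k x n codes

query = X[:, 0] + 0.01 * rng.normal(size=d)  # a slightly noisy copy of image 0
z_query = U.T @ query
nearest = np.argmin(np.sum((Z - z_query[:, None]) ** 2, axis=0))
print(nearest)                               # 0: neighbor found in the k-dim space
```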

44 Latent Semantic Analysis [Deerwester 1990]. $d$ = number of words in the vocabulary; each $x_i \in \mathbb{R}^d$ is a vector of word counts, with $x_{ji}$ = frequency of word $j$ in document $i$ (e.g. counts for words such as "stocks", "chairman", "wins", "game"). Factorize $X_{d \times n} \approx U_{d \times k} Z_{k \times n} = U(z_1 \cdots z_n)$. How to measure the similarity between two documents? $z_1^\top z_2$ is probably better than $x_1^\top x_2$. Applications: information retrieval. Note: no computational savings here; the original $x$ is already sparse.
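A tiny made-up term-document matrix (my own sketch) illustrating that similarities computed in the low-dimensional code space group topically related documents:

```python
import numpy as np

# Hypothetical term-document count matrix (rows = words, columns = documents)
X = np.array([[3., 0., 2., 0.],   # "stocks"
              [2., 0., 3., 0.],   # "chairman"
              [0., 3., 0., 2.],   # "wins"
              [0., 2., 0., 3.]])  # "game"

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
Z = np.diag(S[:k]) @ Vt[:k]       # k x n document codes z_i

# Similarity z_i^T z_j in the reduced space: documents 0 and 2 (finance words)
# come out similar, documents 0 and 3 (sports words) do not.
print(Z[:, 0] @ Z[:, 2], Z[:, 0] @ Z[:, 3])
```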

46 Network anomaly detection [Lakhina 2005]. $x_{ji}$ = amount of traffic on link $j$ in the network during time interval $i$. Model assumption: total traffic is the sum of flows along a few paths. Apply PCA: each principal component intuitively represents a path. Flag an anomaly when the traffic deviates from the first few principal components. [Figure: normal vs. anomalous traffic.]
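A sketch of the idea on synthetic data (everything here is made up): score each time interval by the energy of its traffic vector outside the top-$k$ principal subspace.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n, k = 30, 500, 3
paths = rng.normal(size=(d, k))                  # hypothetical link-path structure
X = paths @ rng.normal(size=(k, n))              # normal traffic: low-rank signal
X += 0.05 * rng.normal(size=(d, n))              # plus small noise
X[:, 100] += 5 * rng.normal(size=d)              # inject an anomaly in interval 100

Xc = X - X.mean(axis=1, keepdims=True)
U, _, _ = np.linalg.svd(Xc, full_matrices=False)
U = U[:, :k]                                     # the "normal" subspace

residual = np.sum((Xc - U @ (U.T @ Xc)) ** 2, axis=0)   # energy outside the subspace
print(int(np.argmax(residual)))                  # 100
```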

49 Multi-task learning [Ando & Zhang 2005]. Have $n$ related tasks (e.g. classify documents for various users). Each task has a linear classifier with weights $x_i$. Want to share structure between the classifiers. One step of their procedure: given $n$ linear classifiers $x_1, \ldots, x_n$, run PCA to identify the shared structure, $X = (x_1 \cdots x_n) \approx UZ$. Each principal component is an "eigen-classifier". Other step of their procedure: retrain the classifiers, regularizing towards the subspace $U$.

50 PCA Summary. Intuition: capture the variance of the data, or minimize the reconstruction error. Algorithm: find the eigendecomposition of the covariance matrix, or the SVD of the data matrix. Impact: reduce storage (from $O(nd)$ to $O(nk)$), reduce time complexity. Advantages: simple, fast. Applications: eigen-faces, eigen-documents, network anomaly detection, etc.

52 Probabilistic Interpretation. Generative model [Tipping and Bishop, 1999]: for each data point $i = 1, \ldots, n$, draw the latent vector $z_i \sim \mathcal{N}(0, I_{k \times k})$ and create the data point $x_i \sim \mathcal{N}(U z_i, \sigma^2 I_{d \times d})$. PCA finds the $U$ that maximizes the likelihood of the data, $\max_U p(X \mid U)$. Advantages: handles missing data (important for collaborative filtering); extension to factor analysis: allow non-isotropic noise (replace $\sigma^2 I_{d \times d}$ with an arbitrary diagonal matrix).
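Sampling from this generative model is straightforward; the sketch below (my own, with made-up $U$ and $\sigma$) also checks the implied marginal covariance $UU^\top + \sigma^2 I$:

```python
import numpy as np

rng = np.random.default_rng(8)
d, k, n, sigma = 10, 2, 1000, 0.1
U = rng.normal(size=(d, k))                       # hypothetical loading matrix

Z = rng.normal(size=(k, n))                       # z_i ~ N(0, I_k)
X = U @ Z + sigma * rng.normal(size=(d, n))       # x_i ~ N(U z_i, sigma^2 I_d)

# The marginal covariance of x under the model is U U^T + sigma^2 I
C_model = U @ U.T + sigma**2 * np.eye(d)
C_empirical = (X @ X.T) / n
print(np.abs(C_model - C_empirical).max())        # sampling error, shrinks as n grows
```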

54 Limitations of Linearity. [Figures: a dataset where PCA is effective and one where PCA is ineffective.] The problem is that the PCA subspace $S = \{x = Uz : z \in \mathbb{R}^k\}$ is linear; in this example, $S = \{(x_1, x_2) : x_2 = \frac{u_2}{u_1} x_1\}$, a line through the origin.

58 Nonlinear PCA. [Figures: broken (linear) solution vs. desired solution.] We want the desired solution $S = \{(x_1, x_2) : x_2 = \frac{u_2}{u_1} x_1^2\}$. We can get this as $S = \{\phi(x) = Uz\}$ with $\phi(x) = (x_1^2, x_2)^\top$. Linear dimensionality reduction in $\phi(x)$ space is nonlinear dimensionality reduction in $x$ space; more generally, $\phi(x)$ can include nonlinear features such as squares or $\sin(\cdot)$ terms. Idea: use kernels.
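A sketch of the explicit-feature-map version on data lying on a parabola (my own toy example): after mapping with $\phi(x) = (x_1^2, x_2)$, the data is essentially one-dimensional.

```python
import numpy as np

rng = np.random.default_rng(9)
x1 = rng.uniform(-2, 2, size=500)
x2 = 0.5 * x1**2 + 0.05 * rng.normal(size=500)    # points on a parabola, plus noise

Phi = np.vstack([x1**2, x2])                      # explicit feature map phi(x) = (x1^2, x2)
Phi = Phi - Phi.mean(axis=1, keepdims=True)

U, S, _ = np.linalg.svd(Phi, full_matrices=False)
print(S[1] / S[0])   # small: in feature space the data is nearly one-dimensional
```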

61 Kernel PCA. Representer theorem: the solution of $XX^\top u = \lambda u$ can be written as $u = X\alpha = \sum_{i=1}^{n} \alpha_i x_i$. Kernel function: $k(x_1, x_2)$ such that $K$, the kernel matrix formed by $K_{ij} = k(x_i, x_j)$, is positive semi-definite. Then $\max_{\|u\|=1} u^\top XX^\top u = \max_{\alpha^\top X^\top X\alpha = 1} \alpha^\top (X^\top X)(X^\top X)\alpha = \max_{\alpha^\top K\alpha = 1} \alpha^\top K^2 \alpha$.

62 Kernel PCA. Direct method: the kernel PCA objective $\max_{\alpha^\top K\alpha = 1} \alpha^\top K^2 \alpha$ leads to the kernel PCA eigenvalue problem $X^\top X\,\alpha = \lambda\alpha$, i.e. $K\alpha = \lambda\alpha$. Modular method (if you don't want to think about kernels): find vectors $x'_1, \ldots, x'_n$ such that $x'^\top_i x'_j = K_{ij} = \phi(x_i)^\top \phi(x_j)$. Key: use any vectors that preserve the inner products. One possibility is the Cholesky decomposition $K = X'^\top X'$ with $X' = (x'_1 \cdots x'_n)$.
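A minimal kernel PCA sketch with an RBF kernel (my own; the kernel width and data are arbitrary), including the feature-space centering step that the slides do not spell out:

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(2, 100))                       # columns are points in R^2

# RBF kernel matrix K_ij = exp(-||x_i - x_j||^2 / 2)
sq = np.sum(X**2, axis=0)
D2 = sq[:, None] + sq[None, :] - 2 * X.T @ X        # pairwise squared distances
K = np.exp(-D2 / 2.0)

# Center the kernel matrix (equivalent to centering in feature space)
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

# Kernel PCA eigenvalue problem: Kc alpha = lambda alpha
eigvals, alphas = np.linalg.eigh(Kc)
alpha = alphas[:, -1] / np.sqrt(eigvals[-1])        # scale so ||u|| = 1 in feature space
z = Kc @ alpha                                      # projections of the training points
print(z[:5])
```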

63 Kernel PCA

64 Canonical Correlation Analysis (CCA)

68 Motivation for CCA [Hotelling 1936]. Often, each data point consists of two views. Image retrieval: for each image we have $x$: pixels (or other visual features), and $y$: the text around the image. Time series: $x$: the signal at time $t$, $y$: the signal at time $t+1$. Two-view learning: divide the features into two sets, $x$: features of a word/object, etc., and $y$: features of the context in which it appears. Goal: reduce the dimensionality of the two views jointly.

70 CCA Example. Setup: input data $(x_1, y_1), \ldots, (x_n, y_n)$ (matrices $X$, $Y$). Goal: find a pair of projections $(u, v)$. [Figure: independent vs. joint dimensionality reduction solutions; the $x$ and $y$ points are paired by brightness.]

71 CCA Definition. Definitions: variance $\widehat{\mathrm{var}}(u^\top x) = u^\top XX^\top u$; covariance $\widehat{\mathrm{cov}}(u^\top x, v^\top y) = u^\top XY^\top v$; correlation $\frac{\widehat{\mathrm{cov}}(u^\top x, v^\top y)}{\sqrt{\widehat{\mathrm{var}}(u^\top x)}\,\sqrt{\widehat{\mathrm{var}}(v^\top y)}}$. Objective: maximize the correlation between the projected views, $\max_{u,v} \widehat{\mathrm{corr}}(u^\top x, v^\top y)$. Properties: focuses on how the variables are related, not on how much they vary; invariant to any rotation and scaling of the data.

74 From PCA to CCA. PCA on the views separately: no covariance term, $\max_{u,v} \frac{u^\top XX^\top u}{u^\top u} + \frac{v^\top YY^\top v}{v^\top v}$. PCA on the concatenation $(X^\top, Y^\top)^\top$: includes a covariance term, $\max_{u,v} \frac{u^\top XX^\top u + 2u^\top XY^\top v + v^\top YY^\top v}{u^\top u + v^\top v}$. Maximum covariance: drop the variance terms, $\max_{u,v} \frac{u^\top XY^\top v}{\sqrt{u^\top u}\,\sqrt{v^\top v}}$. Maximum correlation (CCA): divide out the variance terms, $\max_{u,v} \frac{u^\top XY^\top v}{\sqrt{u^\top XX^\top u}\,\sqrt{v^\top YY^\top v}}$.

77 Importance of Regularization. Extreme examples of degeneracy: if $x = Ay$, then any $(u, v)$ with $u = Av$ is optimal (correlation 1); if $x$ and $y$ are independent, then any $(u, v)$ is optimal (correlation 0). Problem: if $X$ or $Y$ has rank $n$, then any $(u, v)$ with $u = X(X^\top X)^{-1}Y^\top v$ (so that $X^\top u = Y^\top v$) is optimal (correlation 1), and CCA is meaningless! Solution: regularization (interpolate between maximum covariance and maximum correlation), $\max_{u,v} \frac{u^\top XY^\top v}{\sqrt{u^\top (XX^\top + \lambda I)u}\,\sqrt{v^\top (YY^\top + \lambda I)v}}$.
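A compact regularized-CCA sketch (my own implementation via whitening and an SVD; cca_top_pair and inv_sqrt are made-up helper names):

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_top_pair(X, Y, lam=1e-3):
    """Top canonical pair for views X (dx x n) and Y (dy x n), ridge-regularized by lam."""
    n = X.shape[1]
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    Cxx = X @ X.T / n + lam * np.eye(X.shape[0])
    Cyy = Y @ Y.T / n + lam * np.eye(Y.shape[0])
    Cxy = X @ Y.T / n
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)          # whitening transforms
    A, s, Bt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ A[:, 0], Wy @ Bt[0], s[0]          # u, v, canonical correlation

rng = np.random.default_rng(11)
t = rng.normal(size=1000)                          # shared latent signal
X = np.outer(rng.normal(size=5), t) + 0.1 * rng.normal(size=(5, 1000))
Y = np.outer(rng.normal(size=4), t) + 0.1 * rng.normal(size=(4, 1000))
u, v, rho = cca_top_pair(X, Y)
print(round(float(rho), 3))                        # close to 1: shared signal recovered
```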

78 Canonical Correlation Forests [Rainforth and Wood]. Canonical correlation forests (CCFs) are a tree ensemble; example: a random forest that uses CCA to determine the axis for splits. [Figure 1 from the CCF paper: decision surfaces for an artificial spirals dataset; (a) a single unpruned CART, (b) an RF with 200 trees, (c) a single unpruned canonical correlation tree (CCT), (d) a CCF with 200 trees. Averaging over CCTs to give a CCF leads to smoother decision surfaces that better represent the data than the axis-aligned equivalent.]

79 Summary. Framework: $z = U^\top x$, $x \approx Uz$. Criteria for choosing $U$: PCA maximizes the projected variance; CCA maximizes the projected correlation. Algorithm: a (generalized) eigenvalue problem. Extensions: nonlinear using kernels (within the same linear framework); probabilistic, sparse, robust (hard optimization).
