Efficient Linear Algebra methods for Data Mining Yousef Saad Department of Computer Science and Engineering University of Minnesota


1 Efficient Linear Algebra methods for Data Mining Yousef Saad Department of Computer Science and Engineering University of Minnesota NUMAN-08 Kalamata, 08

2 Team members involved in this work. Past: Efi Kokiopoulou [now at the U. of Lausanne]. Current: Jie Chen [grad student], Sofia Sakellaridi [grad student], Haw-Ren Fang [post-doc]. Support: National Science Foundation. Kalamata, p. 2

3 Introduction, background, and motivation Common goal of data mining methods: to extract meaningful information or patterns from data. Very broad area includes: data analysis, machine learning, pattern recognition, information retrieval,... Main tools used: linear algebra; graph theory; approximation theory; optimization;... In this talk: emphasis on dimension reduction techniques and the interrelations between techniques Kalamata, p. 3

4 Focus on two main problems: information retrieval and face recognition; and 3 types of dimension reduction methods: standard subspace methods [SVD, Lanczos], graph-based methods, and multilevel methods. Kalamata, p. 4

5 The problem: given d ≪ m, find a mapping Φ : x ∈ R^m → y ∈ R^d. The mapping may be explicit (e.g., y = V^T x) or implicit (nonlinear). Practically: given X ∈ R^{m×n}, we want to find a low-dimensional representation Y ∈ R^{d×n} of X. Two classes of methods: (1) projection techniques and (2) nonlinear implicit methods. Kalamata, p. 5

6 Example 1: The Swiss-Roll (points in 3-D). [Figure: original data in 3-D.] Kalamata, p. 6

7 2-D reductions: [Figures: PCA, LPP, Eigenmaps, and ONPP embeddings of the Swiss-Roll.] Kalamata, p. 7

8 Example 2: Digit images (a sample of 30) Kalamata, p. 8

9 2-D reductions: [Figures: PCA, LLE, Kernel PCA, and ONPP embeddings of the digit images.] Kalamata, p. 9

10 Projection-based Dimensionality Reduction. Given: a data set X = [x_1, x_2, ..., x_n] and d, the dimension of the desired reduced space Y. Want: a linear transformation from X to Y: X ∈ R^{m×n}, V ∈ R^{m×d}, Y = V^T X ∈ R^{d×n}. The m-dimensional objects (x_i) are flattened to the d-dimensional space (y_i). Constraint: the y_i's must satisfy certain properties → an optimization problem. Kalamata, p.

11 Linear Dimensionality Reduction: PCA. In PCA the projected data must have maximum variance, i.e., we need to maximize over all orthogonal m × d matrices V: Σ_i || y_i - (1/n) Σ_j y_j ||²_2 = Tr [ V^T X̄ X̄^T V ], where X̄ = X(I - (1/n) 1 1^T) is the origin-recentered version of X. Solution: V = { dominant eigenvectors of the covariance matrix } = set of left singular vectors of X̄. The solution V also minimizes the reconstruction error Σ_i || x_i - V V^T x_i ||² = Σ_i || x_i - V y_i ||², and it also maximizes Σ_{i,j} || y_i - y_j ||² [Koren and Carmel 04]. Kalamata, p. 11
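As an illustration of the PCA construction above, here is a minimal numpy sketch (toy data and function names of my own; not the talk's code): it recenters X, takes the d dominant left singular vectors as V, and forms Y = V^T X̄.

```python
import numpy as np

def pca_project(X, d):
    """PCA sketch: X is m x n (one data point per column), d = target dimension.
    Returns V (m x d dominant left singular vectors of the recentered data)
    and the reduced representation Y = V^T Xbar (d x n)."""
    Xbar = X - X.mean(axis=1, keepdims=True)   # recenter: X(I - (1/n) 1 1^T)
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    V = U[:, :d]                               # dominant left singular vectors
    return V, V.T @ Xbar

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 200))         # 50-dimensional points, 200 samples
    V, Y = pca_project(X, d=2)
    print(V.shape, Y.shape)                    # (50, 2) (2, 200)
```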

12 Information Retrieval: the Vector Space Model. Given: 1) a set of documents (the columns of a matrix A); 2) a query vector q. Entry a_ij of A = frequency of term i in document j, plus weighting. [Figure: terms × documents matrix A and the query ("pseudo-document") q.] Queries are represented similarly to columns. Problem: find the columns of A that best match q. Kalamata, p. 12

13 Vector Space Model and the Truncated SVD. Similarity metric: the angle between column A_{:,j} and the query q. Use cosines: q^T A_{:,j} / ( ||A_{:,j}||_2 ||q||_2 ). To rank all documents, compute the similarity vector s = A^T q. Not very effective. Problems: polysemy, synonymy, ... LSI: replace the matrix A by a low-rank approximation: A = U Σ V^T, A_k = U_k Σ_k V_k^T, s_k = A_k^T q. U_k: term space, V_k: document space. Called TSVD. Expensive, hard to update, ... Kalamata, p. 13

14 New similarity vector: s_k = A_k^T q = V_k Σ_k U_k^T q. Issues: How to select k? Computational cost (memory + computation). Problem with updates. Kalamata, p. 14
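To make the TSVD-based scoring concrete, here is a small dense numpy sketch (toy random data of my own; a real term-document matrix would be sparse and TF-IDF scaled): it forms s_k = V_k Σ_k U_k^T q directly from the truncated factors.

```python
import numpy as np

def lsi_scores(A, q, k):
    """LSI sketch: A is the m x n term-by-document matrix, q a length-m query,
    k the truncation rank.  Returns s_k = A_k^T q = V_k Sigma_k U_k^T q."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T      # rank-k factors
    return Vk @ (sk * (Uk.T @ q))                  # one score per document

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((300, 40))                      # 300 terms, 40 documents (toy)
    q = rng.random(300)
    s = lsi_scores(A, q, k=10)
    print(np.argsort(s)[::-1][:5])                 # 5 best-matching documents
```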

15 Alternative: SDD; less memory, but cost is still an issue. Alternative: polynomial approximation, s_k ≈ φ_k(A^T A) A^T q, where φ_k is a polynomial of degree k. Yet another alternative: use Lanczos vectors instead of singular vectors [Blom and Ruhe, 05]. Kalamata, p. 15

16 IR: Use of the Lanczos algorithm. * Joint work with Jie Chen. Lanczos is good at catching large (and small) eigenvalues: one can compute singular vectors with Lanczos and use them in LSI. Can do better: use the Lanczos vectors directly for the projection. First advocated by K. Blom and A. Ruhe [SIMAX, vol. 26, 05], who use Lanczos bidiagonalization. We use a similar approach, but directly with A A^T or A^T A. Kalamata, p. 16

17 IR: Use of the Lanczos algorithm (1). Let A ∈ R^{m×n}. Apply the Lanczos procedure to M = A A^T. Result: Q_k^T A A^T Q_k = T_k, with Q_k orthogonal and T_k tridiagonal. Define s_i = the orthogonal projection of A b onto the subspace span{q_1, ..., q_i}: s_i := Q_i Q_i^T A b. s_i can be easily updated from s_{i-1}: s_i = s_{i-1} + q_i q_i^T A b. Kalamata, p. 17
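Below is a minimal numpy sketch of this idea (not the talk's code): Lanczos with full reorthogonalization is run on M = A A^T through mat-vecs only, using a random starting vector (an assumed choice, since the slide does not fix q_1); the filtered product s_k = Q_k Q_k^T A b is then accumulated term by term as on the slide. Note that Q_k depends only on A, so it can be computed once in a preprocessing phase and reused for every query b.

```python
import numpy as np

def lanczos_basis(A, k, seed=0):
    """k steps of Lanczos on M = A A^T (via mat-vecs only), with full
    reorthogonalization and a random starting vector (an assumed choice).
    Returns Q_k with orthonormal columns; Q_k^T A A^T Q_k is tridiagonal."""
    m = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(m)
    q /= np.linalg.norm(q)
    Q = np.zeros((m, k))
    q_prev = np.zeros(m)
    beta = 0.0
    for i in range(k):
        Q[:, i] = q
        w = A @ (A.T @ q) - beta * q_prev            # mat-vec with M = A A^T
        alpha = q @ w
        w -= alpha * q
        w -= Q[:, :i + 1] @ (Q[:, :i + 1].T @ w)     # full reorthogonalization
        beta = np.linalg.norm(w)
        if beta < 1e-12:                             # invariant subspace found
            return Q[:, :i + 1]
        q_prev, q = q, w / beta
    return Q

def filtered_matvec(Q, A, b):
    """s_k = Q_k Q_k^T A b, accumulated column by column as on the slide."""
    Ab = A @ b
    s = np.zeros_like(Ab)
    for i in range(Q.shape[1]):
        s += Q[:, i] * (Q[:, i] @ Ab)                # s_i = s_{i-1} + q_i q_i^T A b
    return s

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.random((500, 120))        # toy nonnegative matrix (strong dominant direction)
    b = rng.random(120)
    Q = lanczos_basis(A, k=30)
    s = filtered_matvec(Q, A, b)
    print(np.linalg.norm(A @ b - s) / np.linalg.norm(A @ b))   # filtering error
```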

18 IR: Use of the Lanczos algorithm (2). If n < m it may be more economical to apply Lanczos to M = A^T A, which is n × n. Result: Q_k^T A^T A Q_k = T_k. Define t_i := A Q_i Q_i^T b, i.e., project b first, then apply A to the result. Kalamata, p. 18

19 Why does this work? First, recall a result on the Lanczos algorithm [YS 83]. Let {λ_j, u_j} be the j-th eigenpair of M (eigenvalues labeled decreasingly). Then ||(I - Q_k Q_k^T) u_j|| / ||Q_k Q_k^T u_j|| ≤ K_j / T_{k-j}(γ_j), where γ_j = 1 + 2 (λ_j - λ_{j+1}) / (λ_{j+1} - λ_n), K_1 = ||(I - Q_1 Q_1^T) u_1|| / ||Q_1 Q_1^T u_1||, K_j = ( ||(I - Q_1 Q_1^T) u_j|| / ||Q_1 Q_1^T u_j|| ) Π_{i=1}^{j-1} (λ_i - λ_n)/(λ_i - λ_j) for j > 1, and T_l(x) is the Chebyshev polynomial of the first kind of degree l. This has the form ||(I - Q_k Q_k^T) u_j|| ≤ c_j / T_{k-j}(γ_j), where c_j is a constant independent of k. Kalamata, p. 19

20 Result: the distance between a unit eigenvector u_j and the Krylov subspace span(Q_k) decays fast (for small j). Consider the component of the difference A b - s_k along the left singular directions of A. If A = U Σ V^T, then the u_j's (columns of U) are eigenvectors of M = A A^T. So: |⟨A b - s_k, u_j⟩| = |⟨(I - Q_k Q_k^T) A b, u_j⟩| = |⟨(I - Q_k Q_k^T) u_j, A b⟩| ≤ ||(I - Q_k Q_k^T) u_j|| ||A b|| ≤ c_j ||A b|| / T_{k-j}(γ_j). {s_i} converges rapidly to A b in the directions of the major left singular vectors of A. Kalamata, p.

21 A similar result holds for the left projection sequence t_j. [Figure: a typical distribution of the eigenvalues of M in the complex plane, with the leading eigenvalues λ_1, ..., λ_5 marked.] Convergence toward the first few singular vectors is very fast. Kalamata, p. 21

22 Advantages of Lanczos over polynomial filters: (1) no need for eigenvalue estimates; (2) mat-vecs are performed only in the preprocessing. Disadvantages: (1) need to store the Lanczos vectors; (2) the preprocessing must be redone when A changes; (3) need for reorthogonalization, expensive for large k. Kalamata, p. 22

23 Tests: IR. [Table: information retrieval datasets; # terms, # docs, # queries, and sparsity for MED (7,014 terms) and CRAN (3,763 terms).] [Figures: preprocessing times of lanczos vs. tsvd as a function of the number of iterations, for the Med and Cran datasets.] Kalamata, p. 23

24 Average query times. [Figures: average query time versus iterations for the Med and Cran datasets; curves: lanczos, tsvd, lanczos-rf, tsvd-rf.] Kalamata, p. 24

25 Average retrieval precision. [Figures: average precision versus iterations for the Med and Cran datasets; curves: lanczos, tsvd, lanczos-rf, tsvd-rf.] Retrieval precision comparisons. Kalamata, p. 25

26 In summary: results comparable to those of the SVD, at a much lower cost [however, not for the same dimension k]. Thanks: helpful tools and datasets are widely available. We used TMG [developed at the U. of Patras (D. Zeimpekis, E. Gallopoulos)]. Kalamata, p. 26

27 Problem 2: Face Recognition, background. Problem: we are given a database of images [arrays of pixel values] and a test (new) image. Question: does this new image correspond to one of those in the database? Kalamata, p. 27

28 Difficulty: different positions, expressions, lighting, ..., situations. Common approach: the eigenfaces technique, i.e., Principal Component Analysis. Kalamata, p. 28

29 Example: occlusion. See the recent paper by John Wright et al. [Figure, right: 50% of the pixels randomly changed.] See also a recent real-life example: an international man-hunt. Kalamata, p. 29

30 Eigenfaces. Consider each picture as a one-dimensional column of all its pixels. Put the columns together into an array A of size (# pixels) × (# images). Do an SVD of A and perform the comparison with any test image in the low-dimensional space. Similar to LSI in spirit, but the data is not sparse. Idea: replace the SVD by Lanczos vectors (same as for IR). Kalamata, p. 30

31 Tests: Face Recognition. Tests with 2 well-known data sets. ORL: 40 subjects, several sample images each; total # images: 400. AR set: 126 subjects, 4 facial expressions selected for each [natural, smiling, angry, screaming]; total # images: 504. Kalamata, p. 31

32 Tests: Face Recognition. Recognition accuracy of the Lanczos approximation vs. the SVD. [Figures: ORL and AR datasets; curves: lanczos, tsvd. The vertical axis shows the average error rate; the horizontal axis is the subspace dimension.] Kalamata, p. 32

33 GRAPH-BASED TECHNIQUES

34 Laplacean Eigenmaps (Belkin-Niyogi '02). Not a linear (projection) method but a nonlinear method. Starts with a k-nearest-neighbors graph. Defines the graph Laplacean L = D - W. Simplest choice: w_ij = 1 if j ∈ N_i and 0 otherwise, with N_i = the neighborhood of i (excluding i), and D = diag(deg(i)), deg(i) = |N_i|. Kalamata, p. 34
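As a concrete illustration, here is a small numpy sketch (toy data, my own function name) that builds this simplest graph Laplacean from a k-nearest-neighbors graph, with 0/1 weights and a symmetrized W:

```python
import numpy as np

def knn_graph_laplacian(X, k):
    """Build L = D - W for the k-nearest-neighbors graph of the columns of X,
    with 0/1 weights; W is symmetrized (w_ij = 1 if i is a neighbor of j or
    vice versa) and D = diag of the row sums of W."""
    n = X.shape[1]
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                       # exclude i from N_i
    W = np.zeros((n, n))
    nbrs = np.argsort(D2, axis=1)[:, :k]               # k nearest neighbors of each node
    for i in range(n):
        W[i, nbrs[i]] = 1.0
    W = np.maximum(W, W.T)                             # symmetrize
    D = np.diag(W.sum(axis=1))
    return D - W, W

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.standard_normal((3, 100))                  # 100 points in R^3
    L, W = knn_graph_laplacian(X, k=6)
    print(np.allclose(L.sum(axis=1), 0.0))             # Laplacean rows sum to zero
```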

35 A few properties of graph Laplacean matrices. Let L be any matrix such that L = D - W, with D = diag(d_i), w_ij ≥ 0, and d_i = Σ_{j≠i} w_ij. Property 1: for any x ∈ R^n: x^T L x = (1/2) Σ_{i,j} w_ij (x_i - x_j)². Property 2 (generalization): for any Y ∈ R^{d×n}: Tr [Y L Y^T] = (1/2) Σ_{i,j} w_ij ||y_i - y_j||²_2. Kalamata, p. 35

36 Property 3: for the particular choice L = I - (1/n) 1 1^T: X L X^T = X̄ X̄^T = n × (covariance matrix). [Proof: 1) L is a projector: L^T L = L² = L, and 2) X̄ = X L.] Consequence 1: PCA is equivalent to maximizing Σ_{i,j} ||y_i - y_j||². Consequence 2: what about replacing this trivial L with something else? [Viewpoint in Koren-Carmel 04.] Kalamata, p. 36

37 Property 4 (graph partitioning): if x is a vector of signs (±1), then x^T L x = 4 × (number of edge cuts), where an edge cut is a pair (i, j) with x_i ≠ x_j. Consequence: this can be used for partitioning graphs, or for clustering [take p = sign(u_2), where u_2 = the eigenvector associated with the 2nd smallest eigenvalue]. Kalamata, p. 37
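This consequence can be demonstrated in a few lines of numpy (a toy example of my own, two cliques joined by one edge; not the talk's code): the sign pattern of u_2 recovers the two natural groups.

```python
import numpy as np

def spectral_bisection(W):
    """Partition the graph with weight matrix W using the sign of the eigenvector
    u_2 associated with the 2nd smallest eigenvalue of L = D - W."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = np.linalg.eigh(D - W)   # symmetric eigenproblem, ascending eigenvalues
    p = np.sign(vecs[:, 1])              # u_2 ("Fiedler vector")
    p[p == 0] = 1.0
    return p

if __name__ == "__main__":
    # two 4-node cliques joined by a single edge
    W = np.zeros((8, 8))
    W[:4, :4] = 1.0
    W[4:, 4:] = 1.0
    np.fill_diagonal(W, 0.0)
    W[3, 4] = W[4, 3] = 1.0
    p = spectral_bisection(W)
    print(p)                              # one sign for nodes 0-3, the other for 4-7
    # sanity check of Property 4: p^T L p = 4 * (number of cut edges)
    L = np.diag(W.sum(axis=1)) - W
    print(p @ L @ p)                      # the partition cuts exactly one edge -> 4
```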

38 Return to the Laplacean eigenmaps approach. Laplacean Eigenmaps *minimizes* F_EM(Y) = Σ_{i,j=1}^n w_ij ||y_i - y_j||² subject to Y D Y^T = I. Notes: 1. Motivation: if ||x_i - x_j|| is small (original data), we want ||y_i - y_j|| to also be small (low-dimensional data). 2. Note: Min instead of Max as in PCA [counter-intuitive]. 3. The above problem uses the original data only indirectly, through its graph. Kalamata, p. 38

39 The problem translates to: min Tr [ Y (D - W) Y^T ] over Y ∈ R^{d×n} with Y D Y^T = I. Solution (sort eigenvalues increasingly): (D - W) u_i = λ_i D u_i; y_{i,:} = u_i^T; i = 1, ..., d. Note: this is an n × n sparse eigenvalue problem [in sample space]. Note: one can assume D = I; this amounts to rescaling the data. The problem then becomes (I - W) u_i = λ_i u_i; y_{i,:} = u_i^T; i = 1, ..., d. Kalamata, p. 39
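A dense, small-n sketch of this computation (using scipy's generalized symmetric eigensolver; the trivial constant eigenvector at λ = 0 is discarded, which is standard practice although the slide does not spell it out):

```python
import numpy as np
from scipy.linalg import eigh

def laplacean_eigenmaps(W, d):
    """Solve (D - W) u_i = lambda_i D u_i and use the eigenvectors of the
    smallest nonzero eigenvalues as the rows of the d x n embedding Y."""
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)        # generalized symmetric eigenproblem, ascending
    return vecs[:, 1:d + 1].T          # skip the constant eigenvector at lambda = 0

if __name__ == "__main__":
    # toy affinity graph: a cycle on 20 nodes
    n = 20
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0
    Y = laplacean_eigenmaps(W, d=2)
    print(Y.shape)                      # (2, 20)
```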

40 Why the smallest eigenvalues here vs. the largest for PCA? Intuition: the graph Laplacean and the unit Laplacean are very different: one involves a sparse graph (more like a discrete differential operator), the other a dense graph (more like a discrete integral operator). They should be treated as the inverses of each other. This viewpoint is confirmed by what we learn from the kernel approach. Kalamata, p. 40

41 Locally Linear Embedding (Roweis-Saul '00). LLE is very similar to Eigenmaps. Main differences: 1) the graph Laplacean matrix is replaced by an affinity graph; 2) the objective function is changed: we want to preserve the graph. 1. Graph: each x_i is written as a convex combination of its k nearest neighbors: x_i ≈ Σ_j w_ij x_j, with Σ_{j ∈ N_i} w_ij = 1. The optimal weights are computed ("local calculation") by minimizing ||x_i - Σ_j w_ij x_j|| for i = 1, ..., n. Kalamata, p. 41

42 2. Mapping: the y_i's should obey the same affinity as the x_i's. Minimize Σ_i ||y_i - Σ_j w_ij y_j||² subject to Y 1 = 0, Y Y^T = I. Solution: (I - W^T)(I - W) u_i = λ_i u_i; y_{i,:} = u_i^T. The matrix (I - W^T)(I - W) replaces the graph Laplacean of eigenmaps. Kalamata, p. 42
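Here is a compact (dense, unoptimized) numpy sketch of both LLE steps, under the usual practical conventions of a small regularization of the local Gram matrix and dropping the constant bottom eigenvector, which the slides do not spell out:

```python
import numpy as np

def lle_embed(X, k, d, reg=1e-3):
    """LLE sketch: X is m x n (points as columns).  Step 1 computes the local
    reconstruction weights w_ij over the k nearest neighbors of each x_i
    (regularized Gram system, weights summing to 1).  Step 2 reads the d x n
    embedding off the bottom eigenvectors of M = (I - W)^T (I - W)."""
    m, n = X.shape
    sq = np.sum(X**2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(D2, np.inf)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[:k]
        Z = X[:, nbrs] - X[:, [i]]              # neighbors shifted to x_i
        G = Z.T @ Z
        G += reg * np.trace(G) * np.eye(k)      # regularization for stability
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                # enforce sum_j w_ij = 1
    E = np.eye(n) - W
    vals, vecs = np.linalg.eigh(E.T @ E)        # M = (I - W)^T (I - W)
    return vecs[:, 1:d + 1].T                   # skip the constant eigenvector

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    X = rng.standard_normal((3, 80))
    print(lle_embed(X, k=8, d=2).shape)         # (2, 80)
```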

43 Locality Preserving Projections (He-Niyogi '03). LPP is a linear dimensionality reduction technique. Recall the setting: we want V ∈ R^{m×d}, Y = V^T X. Starts with the same neighborhood graph as Eigenmaps: L ≡ D - W = the graph Laplacean, with D ≡ diag( Σ_j w_ij ). Kalamata, p. 43

44 The optimization problem is to solve min Σ_{i,j} w_ij ||y_i - y_j||² over Y ∈ R^{d×n} with Y D Y^T = I and Y = V^T X. Difference with eigenmaps: Y is a projection of the data X. Solution (sort eigenvalues increasingly): X L X^T v_i = λ_i X D X^T v_i; y_{i,:} = v_i^T X. Note: essentially the same method appears in [Koren-Carmel 04], called weighted PCA [viewed from the angle of improving PCA]. Kalamata, p. 44
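A short sketch of the LPP eigenproblem itself (the graph-building step is the same as in the kNN sketch after slide 34). A tiny ridge term is added to X D X^T so the dense generalized solver stays well posed when that matrix is nearly singular; this is a practical safeguard of mine, not part of the slide.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, W, d, eps=1e-8):
    """LPP sketch: solve X L X^T v_i = lambda_i X D X^T v_i and keep the
    eigenvectors of the d smallest eigenvalues.  X is m x n, W is the n x n
    affinity matrix of the neighborhood graph.  Returns V (m x d) and Y = V^T X."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X @ L @ X.T
    B = X @ D @ X.T + eps * np.eye(X.shape[0])   # small ridge for numerical safety
    vals, vecs = eigh(A, B)                      # ascending generalized eigenvalues
    V = vecs[:, :d]
    return V, V.T @ X
```

Calling lpp(X, W, d=2) with W built from X as in the earlier kNN sketch returns the reduced data Y together with the explicit projector V, which can then also be applied to new data points.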

45 ONPP (Kokiopoulou and YS 05): Orthogonal Neighborhood Preserving Projections. Can be viewed as a linear version of LLE. Uses the same graph as LLE. Objective: preserve the affinity graph (as in LLE), *but* by means of an orthogonal projection. Objective function: Φ(Y) = Σ_i ||y_i - Σ_j w_ij y_j||². Constraint: Y = V^T X, V^T V = I. Notice that Φ(Y) = ||Y - Y W^T||²_F = Tr [ V^T X (I - W^T)(I - W) X^T V ]. Kalamata, p. 45

46 Resulting problem: min Tr [ V^T X (I - W^T)(I - W) X^T V ] over V ∈ R^{m×d} with V^T V = I, where M = X (I - W^T)(I - W) X^T. Solution: the columns of V are the eigenvectors of M associated with its smallest d eigenvalues. They can be computed as the d lowest left singular vectors of X (I - W^T). Kalamata, p. 46
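The corresponding computation is only a few lines (a sketch, assuming m ≤ n so that the thin SVD delivers all m left singular vectors; otherwise a preliminary PCA step is commonly used, which the slide does not discuss):

```python
import numpy as np

def onpp(X, W, d):
    """ONPP sketch: V = eigenvectors of M = X (I - W^T)(I - W) X^T for the d
    smallest eigenvalues, computed as the d lowest left singular vectors of
    E = X (I - W^T).  W holds LLE-style weights (rows summing to 1)."""
    n = X.shape[1]
    E = X @ (np.eye(n) - W.T)                # M = E E^T
    U, s, _ = np.linalg.svd(E, full_matrices=False)
    V = U[:, -d:]                            # lowest d left singular vectors
    return V, V.T @ X                        # orthogonal projector and reduced data
```

Because V^T V = I, the projector applies directly to new (out-of-sample) points, unlike the purely nonlinear LLE mapping.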

47 A unified view.

Method     Objective (min)                              Constraint
PCA/MDS    Tr [ V^T X (-I + e e^T) X^T V ]              V^T V = I
LLE        Tr [ Y (I - W^T)(I - W) Y^T ]                Y Y^T = I
Eigenmaps  Tr [ Y (I - W) Y^T ]                         Y Y^T = I
LPP        Tr [ V^T X (I - W) X^T V ]                   V^T X X^T V = I
ONPP       Tr [ V^T X (I - W^T)(I - W) X^T V ]          V^T V = I
LDA        Tr [ V^T X (I - H) X^T V ]                   V^T X X^T V = I

Kalamata, p. 47

48 Let M = I - W (a Laplacean matrix; -I + e e^T for PCA/MDS), or the LLE matrix (I - W^T)(I - W), or the geodesic distance matrix (ISOMAP). All techniques lead to one of two types of problems. The first type is: min Tr [ Y M Y^T ] over Y ∈ R^{d×n} with Y Y^T = I. Y is obtained from solving an eigenvalue problem: LLE, Eigenmaps (normalized), ... Kalamata, p. 48

49 And the second type is: min Tr [ V^T X M X^T V ] over V ∈ R^{m×d} with V^T G V = I. G is either the identity matrix, X D X^T, or X X^T. Low-dimensional data: Y = V^T X. Important observation: the 2nd type is just a projected version of the 1st, i.e., approximate eigenvectors are sought in span{X} [Rayleigh-Ritz procedure]. The problem is of dimension m (the dimension of the data), not n (the # of samples). This difference can be mitigated by resorting to kernels. Kalamata, p. 49

50 Graph-based methods in a supervised setting. The subjects of the training set are known (labeled). Q: given a test image (say), find its label. [Figure: labeled training images grouped by subject (e.g. R. S. Varga, S. R. Smith, J. R. Doe) and an unlabeled test image.] Question: find the label (best match) for the test image. Kalamata, p. 50

51 The methods can be adapted to the supervised mode by building the graph so as to take class labels into account. The idea is simple: build G so that nodes in the same class are neighbors. If c = # classes, G will consist of c cliques. The matrix W will be block-diagonal: W = diag(W_1, W_2, ..., W_c). It is easy to see that rank(W) = n - c. Can be used for LPP, ONPP, etc. Kalamata, p. 51
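One simple way to build such a block-diagonal W from a label vector (a sketch using uniform 0/1 within-class weights and no self-loops; other weightings, e.g. LLE-style normalized weights, fit the same pattern):

```python
import numpy as np

def supervised_affinity(labels):
    """Class-based affinity: nodes with the same label are all connected, so
    (after grouping the samples by class) W is block diagonal,
    W = diag(W_1, ..., W_c), one clique per class."""
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)       # no self-loops
    return W

if __name__ == "__main__":
    print(supervised_affinity([0, 0, 1, 1, 1, 2]))
```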

52 TIME FOR A MATLAB DEMO

53 Supervised learning experiments: digit recognition. Set of 390 images of digits. Each picture has 16 ... pixels. Select any one of the digits and try to recognize it among the remaining images. Methods: KNN, PCA, LPP, ONPP. Kalamata, p. 53

54 One word about KNN classifiers. Simple voting idea: get the k nearest neighbors of the test point; the assigned label is the most common label among these neighbors. Kalamata, p. 54
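A plain numpy version of this vote (toy 2-D example of my own; in the experiments above it would be applied in the reduced space produced by PCA, LPP, or ONPP):

```python
import numpy as np

def knn_classify(X_train, y_train, x, k):
    """kNN vote: return the most common label among the k training points
    (columns of X_train) closest to x."""
    d2 = np.sum((X_train - x[:, None])**2, axis=0)    # squared distances to x
    nbrs = np.argsort(d2)[:k]
    labels, counts = np.unique(y_train[nbrs], return_counts=True)
    return labels[np.argmax(counts)]

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    X0 = rng.normal(0.0, 1.0, size=(2, 50))           # class 0 near the origin
    X1 = rng.normal(4.0, 1.0, size=(2, 50))           # class 1 shifted away
    X_train = np.hstack([X0, X1])
    y_train = np.array([0] * 50 + [1] * 50)
    print(knn_classify(X_train, y_train, np.array([3.8, 4.1]), k=5))   # -> 1
```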

55 MULTILEVEL METHODS

56 Multilevel techniques. Divide-and-conquer paradigms, as well as multilevel methods in the sense of domain decomposition. Main principle: it is very costly to do an SVD [or Lanczos] on the whole set. Why not find a smaller set on which to do the analysis, without too much loss? Tools used: graph coarsening, divide and conquer. Kalamata, p. 56

57 Multilevel techniques: hypergraphs to the rescue. General idea: given X = [x_1, ..., x_n] ∈ R^{m×n}, find another set (the "coarsened" set) ˆX = [ˆx_1, ..., ˆx_k] ∈ R^{m×k}. ˆX should be a good representative of X; then find the projector from R^m to R^d based on this subset. Main tool used: graph coarsening. We will describe hypergraph-based techniques. Kalamata, p. 57

58 Hypergraphs. A hypergraph H = (V, E) is a generalization of a graph: V = set of vertices, E = set of hyperedges; each e ∈ E is a nonempty subset of V. Standard undirected graphs: each e consists of two vertices. Degree of e = |e|; degree of a vertex v = number of hyperedges e such that v ∈ e. Two vertices are neighbors if there is a hyperedge connecting them. Kalamata, p. 58

59 Example of a hypergraph. [Figure: a small hypergraph with hyperedges ("nets") a, ..., e over a few numbered vertices, and its Boolean matrix representation A, one row per hyperedge and one column per vertex.] This is the canonical hypergraph representation for sparse data (e.g., text). Kalamata, p. 59

60 Hypergraph Coarsening. Coarsening a hypergraph H = (V, E) means finding a coarse approximation Ĥ = (V̂, Ê) to H with |V̂| < |V|, which tries to retain as much as possible of the structure of the original hypergraph. Idea: repeat the coarsening recursively. Result: a succession of smaller hypergraphs which approximate the original graph. Several methods exist. We use matching, which successively merges pairs of vertices; it is used in most graph partitioning methods (hMETIS, PaToH, Zoltan, ...). The algorithm successively selects pairs of vertices to merge based on a measure of similarity of the vertices; a simplified sketch is given below. Kalamata, p. 60
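The sketch below illustrates coarsening by matching in its simplest form, applied to the data columns directly with plain cosine similarity as the pairing criterion; this is an illustrative simplification of mine, not the hypergraph-based criterion used in the talk.

```python
import numpy as np

def coarsen_once(X):
    """One level of coarsening by matching: greedily pair each unmatched column
    with its most similar unmatched column (cosine similarity) and replace the
    pair by its average; an unmatched leftover column is kept as it is."""
    m, n = X.shape
    Xn = X / (np.linalg.norm(X, axis=0) + 1e-12)
    C = Xn.T @ Xn                                # cosine similarities
    np.fill_diagonal(C, -np.inf)
    matched = np.zeros(n, dtype=bool)
    reps = []
    for i in np.argsort(-C.max(axis=1)):         # visit vertices, best matches first
        if matched[i]:
            continue
        cand = np.where(~matched)[0]
        cand = cand[cand != i]
        if cand.size == 0:                       # odd vertex left over
            reps.append(X[:, i])
            matched[i] = True
            continue
        j = cand[np.argmax(C[i, cand])]          # most similar unmatched vertex
        reps.append(0.5 * (X[:, i] + X[:, j]))   # merge the pair
        matched[i] = matched[j] = True
    return np.column_stack(reps)

if __name__ == "__main__":
    rng = np.random.default_rng(8)
    X = rng.random((100, 64))
    Xc = coarsen_once(coarsen_once(X))           # two coarsening levels
    print(X.shape, Xc.shape)                     # (100, 64) (100, 16)
```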

61 Application: Multilevel Dimension Reduction. Main idea: coarsen to a certain level, then use the resulting data set ˆX to find the projector from R^m to R^d. This projector can be used to project the original data or any new data. [Figure: X (m × n) is coarsened recursively down to the last level X_r (m × n_r); a projection is computed from X_r, giving Y_r (d × n_r), and the same projector maps X to Y (d × n).] Main gain: the dimension reduction is done with a much smaller set. Hope: not much loss compared to using the whole data. Kalamata, p. 61

62 Application to text mining. Recall the common approach: 1. Scale the data [e.g., TF-IDF scaling]. 2. Perform a (partial) SVD on the resulting matrix: X ≈ U_d Σ_d V_d^T. 3. Process the query by the same scaling (e.g., TF-IDF). 4. Compute similarities in the d-dimensional space: s_i = ⟨ˆq, ˆx_i⟩ / ( ||ˆq|| ||ˆx_i|| ), where [ˆx_1, ˆx_2, ..., ˆx_n] = V_d^T ∈ R^{d×n} and ˆq = Σ_d^{-1} U_d^T q ∈ R^d. Multilevel approach: replace the SVD (or any other dimension reduction) by a dimension reduction on the coarse set. Only difference: the TF-IDF scaling is done on the coarse set, not the original set. Kalamata, p. 62

63 Tests. Three public data sets were used for the experiments: Medline, Cran and NPL (cs.cornell.edu). Coarsening to a maximum of 4 levels. [Table: # documents, # terms, sparsity, # queries, and avg. # relevant/query for each data set; sparsity is 0.74% for Medline, 1.41% for Cran, and 0.27% for NPL.] Kalamata, p. 63

64 Results with NPL. [Table: statistics per coarsening level: coarsening time, # of documents, optimal # of dimensions, and optimal average precision; level #1 (no coarsening) has N/A coarsening time.] Kalamata, p. 64

65 Precision-Recall curves. [Figure: precision vs. recall on NPL for LSI and for multilevel LSI with r = 2, 3, 4 coarsening levels.] Kalamata, p. 65

66 CPU times for preprocessing (dimension reduction part). [Figure: CPU time for the truncated SVD (secs) vs. # dimensions on NPL, for LSI and for multilevel LSI with r = 2, 3, 4.] Kalamata, p. 66

67 Conclusion. So how is this related to the initial title of "efficient algorithms in data mining"? Answer: all these eigenvalue problems are not cheap to solve... and the cost issue does not seem to bother practitioners too much for now. Ingredients that will become mandatory: 1. Avoid the SVD. 2. Fast algorithms that do not sacrifice quality. 3. In particular: multilevel approaches. 4. Multilinear algebra [tensors]. Kalamata, p. 67
