Dimension Reduction and Iterative Consensus Clustering


1 Dimension Reduction and Iterative Consensus Clustering. Southeastern Clustering and Ranking Workshop, August 24, 2009.

2 Outline: 1. Document Clustering (Geometry of the SVD, Centered SVD, Uncentered SVD, Principal Direction Divisive Partitioning); 2. Nonnegative Matrix Factorization; 3. Combination Algorithms and Iterating to Reach Consensus; 4. Results on the Medlars/Cranfield/CISI collection and the Benchmark Data Set by Sinka and Corne; 5. Conclusions.

3 Document Clustering. For document clustering we create an m × n term-document matrix A whose (i, j) entry f_{i,j} is the frequency of term i in document j: rows are indexed by the terms of the dictionary, columns by the documents. Various types of term weighting can be used in place of raw frequencies; for our experiments we simply normalized the columns. Each column of A then gives the coordinates of a document in the m-dimensional "term space", where each standard basis vector represents one term from the dictionary.
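As a concrete illustration, here is a minimal sketch (Python with NumPy, not part of the original slides) of building a normalized term-document matrix; the three toy documents are hypothetical stand-ins for the abstracts used in the experiments.

```python
import numpy as np

# Hypothetical toy corpus standing in for the abstracts used in the experiments.
docs = ["gold silver truck", "shipment of gold", "delivery of silver trucks"]

# Dictionary of terms and the raw frequency matrix f_ij (term i, document j).
terms = sorted({w for d in docs for w in d.split()})
A = np.zeros((len(terms), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[terms.index(w), j] += 1

# Normalize the columns, as done in the experiments (in place of term weighting).
A = A / np.linalg.norm(A, axis=0, keepdims=True)
```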

4-7 Singular Value Decomposition (SVD). The SVD decomposes A = UΣV^T, where U and V are orthogonal matrices and Σ is a diagonal matrix of singular values. The truncated SVD yields the closest rank-r approximation to A in the 2-norm: a_j ≈ ∑_{i=1}^{r} [V^T]_{i,j} σ_i u_i. Thus a column v_j of the truncated V^T gives the coordinates of a_j after projection into the lower-dimensional space spanned by the orthogonal basis (σ_1 u_1, σ_2 u_2, ..., σ_r u_r). We use the columns of the truncated V^T as a lower-dimensional representation of the columns of A for the purposes of clustering.
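A minimal sketch of this reduction, assuming A is the normalized term-document matrix from the previous sketch and r is chosen by the user:

```python
import numpy as np

U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = 3                                    # user-chosen reduced dimension
U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]

# Column j of Vt_r holds the coordinates of document a_j in the space
# spanned by (sigma_1 u_1, ..., sigma_r u_r); these columns are what we cluster.
coords = Vt_r
A_r = U_r @ np.diag(s_r) @ Vt_r          # closest rank-r approximation to A
```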

8 Geometry of the Singular Vectors when A is Centered. The first left singular vector, u_1, of the centered matrix C = A − μe^T is the direction along which the variance of the data is maximal.

9 The second left singular vector of C, u_2, is the direction orthogonal to u_1 along which the variance is maximal.

10 Geometry of the SVD when A is Uncentered. The first left singular vector of A, u_1(A), is the direction of the least-squares line through the origin, i.e. the zero-intercept total least squares line T = α·u_1(A). (The figure contrasts u_1(A) with u_1(C), the first left singular vector of the centered matrix.)
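The two directions can be compared directly; a small sketch, assuming A from the earlier sketches:

```python
import numpy as np

mu = A.mean(axis=1, keepdims=True)
C = A - mu                                               # centered matrix C = A - mu*e^T

u1_A = np.linalg.svd(A, full_matrices=False)[0][:, 0]    # zero-intercept TLS direction
u1_C = np.linalg.svd(C, full_matrices=False)[0][:, 0]    # direction of maximal variance
```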

11-14 Principal Direction Divisive Partitioning (PDDP). An algorithm proposed by Daniel Boley at the University of Minnesota in 2002. The iterative process partitions the data into 2 clusters at each iteration, based upon their projection onto the direction of maximal variance. PDDP can be adapted to use more than just the principal singular vector. We will often use the results from PDDP to seed the k-means algorithm with an initial guess.

15 Illustration of One Iteration of PDDP. The signs of the entries in v_1 tell us whether the projected data fall to the left (v_1(i) < 0) or to the right (v_1(i) > 0) of the origin. Figure: data cloud projected onto the span of u_1(C).
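A minimal sketch of one PDDP split using the sign test just described (the helper name pddp_split is ours, not from the slides):

```python
import numpy as np

def pddp_split(A):
    """Split the columns of A into two clusters by the sign of their
    projection onto the principal direction of the centered data."""
    C = A - A.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    v1 = Vt[0, :]                        # projections of the documents onto u1(C)
    left = np.where(v1 < 0)[0]           # documents projected left of the origin
    right = np.where(v1 >= 0)[0]         # documents projected right of the origin
    return left, right
```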

16-20 Nonnegative Matrix Factorization (NMF). The nonnegative matrix factorization seeks to decompose a nonnegative matrix into the product of two nonnegative matrices, A_{m×n} = W_{m×r} H_{r×n}. The decomposition is created by solving the nonlinear optimization problem min ‖A − WH‖_F^2 subject to W ≥ 0 and H ≥ 0. The inner dimension of the factorization, r, must be supplied by the user. The result is an additive, parts-based approximation to each data column a_j as a linear combination of "feature" vectors w_i: a_j ≈ ∑_{i=1}^{r} h_{i,j} w_i.
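One standard way to attack this optimization problem is Lee and Seung's multiplicative update rules; the sketch below uses them purely for illustration, since the slides do not specify which NMF solver was used in the experiments.

```python
import numpy as np

def nmf(A, r, iters=200, eps=1e-9):
    """Approximate a nonnegative m x n matrix A by W (m x r) times H (r x n),
    reducing ||A - WH||_F^2 with multiplicative updates."""
    rng = np.random.default_rng(0)
    W = rng.random((A.shape[0], r))
    H = rng.random((r, A.shape[1]))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H
```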

21 NMF for Dimension Reduction. The columns of H represent the coordinates of each document after projection into the lower-dimensional "feature space" spanned by the columns of W. We use the columns of H as a lower-dimensional representation of the columns of A for the purposes of clustering.

22 New Algorithms out of Old. Flow diagram: the data matrix A is reduced either with the SVD (of the centered matrix C or of the uncentered matrix A, keeping the first r (or k) columns of V^T) or with the NMF (keeping H); the reduced representation is then clustered with k-means and/or PDDP to produce a cluster solution.

23 One possible clustering algorithm, un.SVDr-PDDP-kmeans, follows a single path through the diagram: reduce the uncentered matrix A with the SVD to the first r columns of V^T, cluster with PDDP, and refine with k-means.

24 A clustering algorithm with two dimension reductions, H-SVDk-PDDP-kmeans: reduce A with the NMF to H, reduce H with the SVD to the first k columns of V^T, cluster with PDDP, and refine with k-means.
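A sketch of one such pipeline, un.SVDr-PDDP-kmeans, assembled from the earlier sketches. The split-the-largest-cluster rule and the use of scikit-learn's KMeans are our simplifying assumptions, not details from the slides.

```python
import numpy as np
from sklearn.cluster import KMeans

def un_svdr_pddp_kmeans(A, r, k):
    """Uncentered SVD to r dimensions, PDDP down to k clusters (splitting the
    largest cluster each time), then k-means seeded with the PDDP centroids."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    X = Vt[:r, :]                                # r x n reduced representation

    clusters = [np.arange(X.shape[1])]
    while len(clusters) < k:
        i = max(range(len(clusters)), key=lambda i: clusters[i].size)
        part = clusters.pop(i)
        left, right = pddp_split(X[:, part])     # helper from the PDDP sketch
        clusters += [part[left], part[right]]

    seeds = np.vstack([X[:, c].mean(axis=1) for c in clusters])
    return KMeans(n_clusters=k, init=seeds, n_init=1).fit_predict(X.T)
```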

25-29 Iterating to Reach Consensus. Since no single algorithm will perform better than all others on a given class of data, we propose using several algorithms to find agreement upon clusters. We create an adjacency matrix for each clustering, whose (i, j) entry is 1 if a_i and a_j were clustered together and 0 otherwise. We sum the adjacency matrices from the various algorithms to create a consensus matrix M, whose (i, j) entry gives the number of times a_i and a_j were clustered together. Entries in the consensus matrix that are below a certain tolerance may be changed to zero. This consensus matrix is then clustered using the same algorithms, to see whether the algorithms agree upon a solution.
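A minimal sketch of the adjacency/consensus construction just described (the function name and the default tolerance are ours):

```python
import numpy as np

def consensus_matrix(labelings, tol=0):
    """Sum the adjacency matrices of several clusterings: M[i, j] counts how
    often documents i and j were clustered together; small entries are zeroed."""
    n = len(labelings[0])
    M = np.zeros((n, n))
    for labels in labelings:
        labels = np.asarray(labels)
        M += (labels[:, None] == labels[None, :]).astype(float)  # adjacency matrix
    M[M < tol] = 0                        # drop entries below the chosen tolerance
    return M
```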

30 Iterating the Consensus Process. Flow diagram: the data matrix is clustered by algorithms 1 through n, giving cluster solutions 1 through n, which are combined into a consensus matrix; if no consensus is reached, the consensus matrix is itself clustered by the same algorithms and the process repeats.
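A sketch of the iteration loop in this diagram, assuming each algorithm is a function mapping a matrix to a label vector and reusing consensus_matrix from the previous sketch; checking agreement by comparing the induced partitions is our choice, a detail the slides leave open.

```python
import numpy as np

def iterate_consensus(A, algorithms, max_rounds=10, tol=0):
    """Repeatedly cluster, form the consensus matrix, and re-cluster it until
    all algorithms induce the same partition (or max_rounds is reached)."""
    data = A
    for _ in range(max_rounds):
        solutions = [alg(data) for alg in algorithms]
        # Compare partitions via their adjacency matrices (label names may differ).
        adj = [np.asarray(s)[:, None] == np.asarray(s)[None, :] for s in solutions]
        if all(np.array_equal(adj[0], a) for a in adj[1:]):
            return solutions[0]                   # consensus reached
        data = consensus_matrix(solutions, tol)   # cluster the consensus matrix next
    return solutions[0]
```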

31 Medlars/Cranfield/CISI: medical and scientific abstracts; 11000 terms; k = 3 clusters. Table: accuracies for Med/Cran/CISI after dimension reduction to r = 3 for the algorithms NMF Basic, PDDP, PDDP-kmeans, SVDr-PDDP-kmeans, un.SVDr-PDDP-kmeans, H-PDDP, H-PDDP-kmeans, H-SVDk-PDDP-kmeans, and H-un.SVDk-PDDP-kmeans, with columns for r = k = 3 and consensus rounds 1, 2, and 3 (the numerical accuracy values are not preserved in this transcription).

32 Medlars/Cranfield/CISI, same collection, k = 3 clusters. Table: accuracies for Med/Cran/CISI after dimension reduction to r = 15 for the same nine algorithms, with columns for r = 15 and consensus rounds 1, 2, and 3 (accuracy values not preserved in this transcription).

33-37 Benchmark Data Set Collected by Sinka and Corne. A collection of documents proposed as a benchmark for document clustering. The documents pertain to 4 broad topics (banking/finance, programming, science, and sport), and each topic contains 2 or 3 subtopics (commercial banks, insurance agencies, Java, astronomy, biology, etc.). The documents were extracted automatically from the web: some are long, detailed articles; others are just lists of words, addresses, or links. Noisy data!

38 Benchmark data, subset BCFG, k = 4 clusters. Table: cluster accuracies for Benchmark BCFG after dimension reduction to r = 4 for the same nine algorithms, with columns for r = 4 and consensus rounds 1, 2, and 3 (accuracy values not preserved in this transcription).

39 Benchmark data, subset BCFG, k = 4 clusters. Table (Experiment 3): cluster accuracies for Benchmark BCFG after dimension reduction to r = 10 for the same nine algorithms, with columns for r = 10 and consensus rounds 1, 2, 3, and 5 (accuracy values not preserved in this transcription).

40 Conclusions. The choices for the size of the dimension reduction, r, and the various combinations of algorithms produce hundreds of clusterings for the consensus approach. Iterative consensus clustering shows potential as a technique for determining a final clustering solution from many different algorithms. Although the final clustering solution determined through consensus is not guaranteed to be optimal, experiments suggest that the technique provides a solution that is well above the average of the algorithms used.
