CS60021: Scalable Data Mining. Dimensionality Reduction
1 CS60021: Scalable Data Mining. Dimensionality Reduction. Sourangshu Bhattacharya. Slides based on J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets.
2 Assumption: the data lies on or near a low d-dimensional subspace. The axes of this subspace are an effective representation of the data.
3 Compress / reduce dimensionality: 10^6 rows; 10^3 columns; no updates. Random access to any cell(s); a small error is OK. The above matrix is really 2-dimensional: all rows can be reconstructed by scaling [ ] or [ ].
4 Q: What is the rank of a matrix A? A: The number of linearly independent columns of A. For example: matrix A = has rank r = 2. Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third), so the rank must be less than 3. Why do we care about low rank? We can write A in terms of two basis vectors: [1 2 1], [-2 -3 1], with new coordinates: [1 0], [0 1], [1 1].
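The rank argument can be checked numerically. A minimal numpy sketch: the matrix below is a hypothetical rank-2 matrix whose third row is the sum of the first two (the slide's actual entries were lost in transcription; only the basis vectors [1 2 1], [-2 -3 1] and the coordinates [1 0], [0 1], [1 1] come from the slide).

```python
import numpy as np

# Hypothetical rank-2 matrix: rows 1 and 2 are the slide's basis vectors,
# row 3 is their sum (so its coordinates in the new basis are [1 1]).
A = np.array([[ 1.0,  2.0, 1.0],
              [-2.0, -3.0, 1.0],
              [-1.0, -1.0, 2.0]])

rank = np.linalg.matrix_rank(A)   # 2: only two linearly independent rows

# Every row of A can be rewritten in the 2-vector basis {row 1, row 2}:
basis = A[:2]
coords = np.array([[1.0, 0.0],    # row 1
                   [0.0, 1.0],    # row 2
                   [1.0, 1.0]])   # row 3 = row 1 + row 2
reconstructed = coords @ basis
```

Three numbers per row become two: this is exactly the saving that low rank buys.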
5 Cloud of points in 3D space: think of the point positions as a matrix, 1 row per point (A, B, C, ...). We can rewrite the coordinates more efficiently! Old basis vectors: [1 0 0], [0 1 0], [0 0 1]. New basis vectors: [1 2 1], [-2 -3 1]. Then A has new coordinates [1 0], B: [0 1], C: [1 1]. Notice: we reduced the number of coordinates!
6 The goal of dimensionality reduction is to discover the axes of the data! Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (its position along the red line). By doing this we incur a bit of error, as the points do not lie exactly on the line.
7 Why reduce dimensions? Discover hidden correlations/topics (words that occur commonly together). Remove redundant and noisy features (not all words are useful). Interpretation and visualization. Easier storage and processing of the data.
8 A [m x n] = U [m x r] S [r x r] (V [n x r])^T. A: input data matrix, m x n (e.g., m documents, n terms). U: left singular vectors, m x r (m documents, r concepts). S: singular values, r x r diagonal matrix (strength of each 'concept'; r = rank of the matrix A). V: right singular vectors, n x r (n terms, r concepts).
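In code the shapes line up as follows. A small sketch using numpy's economy-size SVD; the toy matrix values are assumptions, not the slide's.

```python
import numpy as np

# Toy m x n "document-term" matrix (values assumed for illustration).
A = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])
m, n = A.shape

# Economy SVD: U is m x r', s has r' values, Vt is r' x n, with r' = min(m, n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A is recovered exactly as U S V^T.
A_rebuilt = U @ np.diag(s) @ Vt
```

numpy returns the singular values already sorted in decreasing order, matching the convention on the next slides.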
9 A ≈ U S V^T (figure: the m x n matrix A factored into U, S, and V^T).
10 A ≈ σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... (each σ_i is a scalar, each u_i and v_i a vector).
11 It is always possible to decompose a real matrix A into A = U S V^T, where U, S, V are unique; U, V are column orthonormal: U^T U = I, V^T V = I (I: identity matrix; the columns are orthogonal unit vectors); and S is diagonal, with entries (the singular values) positive and sorted in decreasing order (σ_1 ≥ σ_2 ≥ ... ≥ 0). Nice proof of uniqueness:
12 A = U S V^T, example: users to movies. Movies: Matrix, Alien, Serenity (SciFi); Casablanca, Amelie (Romance). The r factors are 'concepts', aka latent dimensions, aka latent factors.
13 A = U S V^T, example: users to movies (Matrix, Alien, Serenity, Casablanca, Amelie).
14 A = U S V^T, example: users to movies. The two concepts here are a SciFi-concept and a Romance-concept.
15 A = U S V^T, example: U is the user-to-concept similarity matrix.
16 A = U S V^T, example: the corresponding diagonal entry of S gives the strength of the SciFi-concept.
17 A = U S V^T, example: V is the movie-to-concept similarity matrix.
18 Movies, users and concepts. U: user-to-concept similarity matrix. V: movie-to-concept similarity matrix. S: its diagonal elements give the strength of each concept.
19
20 Instead of using two coordinates (x, y) to describe point locations, let's use only one coordinate z: a point's position is its location along the first right singular vector v_1 (figure: users plotted by Movie 1 rating vs. Movie 2 rating, with v_1 drawn through the cloud). How to choose v_1? Minimize the reconstruction error.
21 Goal: minimize the sum of reconstruction errors Σ_i Σ_j (x_ij - z_ij)^2, where the x_ij are the old and the z_ij are the new coordinates. SVD gives the best axis to project on: 'best' = minimizing the reconstruction errors, in other words, minimum reconstruction error.
22 A = U S V^T, example: V is the movie-to-concept matrix and U is the user-to-concept matrix; v_1 is the first right singular vector in the (Movie 1 rating, Movie 2 rating) plane.
23 A = U S V^T, example: the first singular value captures the variance ('spread') on the v_1 axis.
24 A = U S V^T, example: U S gives the coordinates of the points on the projection axes; (U S)^T is the projection of the users on the SciFi axis.
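The claim that U S gives the projected coordinates can be verified directly, since A V = U S V^T V = U S. A sketch with a hypothetical ratings matrix whose points all lie on one line:

```python
import numpy as np

# Hypothetical users x movies ratings, all on the line "rating 2 = rating 1".
A = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0],
              [4.0, 4.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

coords_US = U * s          # same as U @ np.diag(s): coordinates on the singular axes
coords_AV = A @ Vt.T       # A V = U S, so these agree

# Only one axis carries any spread: the second singular value is ~0.
```

Because the data is exactly 1-dimensional, a single coordinate per user reconstructs the matrix with no error at all.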
25 More details. Q: How exactly is dimensionality reduction done?
26 More details. Q: How exactly is dimensionality reduction done? A: Set the smallest singular values to zero.
30 More details. Q: How exactly is dimensionality reduction done? A: Set the smallest singular values to zero. Frobenius norm: ǁMǁ_F = sqrt(Σ_ij M_ij^2). Then ǁA - Bǁ_F = sqrt(Σ_ij (A_ij - B_ij)^2) is small.
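The Frobenius error of the truncated reconstruction is exactly the root of the discarded σ_i^2, which the next slides prove. A quick numerical check first (random matrix, values irrelevant):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep the k largest singular values, zero the rest.
k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

err = np.linalg.norm(A - B, 'fro')
dropped = np.sqrt(np.sum(s[k:] ** 2))   # sqrt(sigma_{k+1}^2 + ... + sigma_r^2)
```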
31 A = U Σ V^T; B = U S V^T, where S is Σ with the smallest singular values set to zero. B is the best approximation of A.
32 Theorem: Let A = U Σ V^T and B = U S V^T, where S is the diagonal r x r matrix with s_i = σ_i (i = 1 ... k) and s_i = 0 otherwise. Then B is a best rank-k approximation to A. What do we mean by 'best': B is a solution to min_B ǁA - Bǁ_F where rank(B) = k, and the minimum achieved is ǁA - Bǁ_F = sqrt(σ_{k+1}^2 + ... + σ_r^2).
33 Details! Theorem: Let A = U Σ V^T (σ_1 ≥ σ_2 ≥ ..., rank(A) = r). Then B = U S V^T, where S is the diagonal r x r matrix with s_i = σ_i (i = 1 ... k) and s_i = 0 otherwise, is a best rank-k approximation to A: B is a solution to min_B ǁA - Bǁ_F where rank(B) = k. We will need 2 facts: ǁMǁ_F^2 = Σ_i q_ii^2, where M = P Q R is the SVD of M; and U Σ V^T - U S V^T = U (Σ - S) V^T.
34 Details! We will need 2 facts: ǁMǁ_F^2 = Σ_i q_ii^2, where M = P Q R is the SVD of M; and U Σ V^T - U S V^T = U (Σ - S) V^T. We apply these with: P column orthonormal, R row orthonormal, Q diagonal.
35 Details! A = U Σ V^T, B = U S V^T (σ_1 ≥ σ_2 ≥ ... ≥ 0, rank(A) = r); S is the diagonal matrix with s_i = σ_i (i = 1 ... k) and s_i = 0 otherwise. Then B is a solution to min_B ǁA - Bǁ_F with rank(B) = k. Why? Using U Σ V^T - U S V^T = U (Σ - S) V^T: min_{rank(B)=k} ǁA - Bǁ_F = min_{s_i} ǁΣ - Sǁ_F = min_{s_i} sqrt(Σ_i (σ_i - s_i)^2). We want to choose the s_i to minimize Σ_i (σ_i - s_i)^2. The solution is to set s_i = σ_i (i = 1 ... k) and the other s_i = 0, giving min [Σ_{i=1}^k (σ_i - s_i)^2 + Σ_{i=k+1}^r σ_i^2] = Σ_{i=k+1}^r σ_i^2.
36 Equivalent: spectral decomposition of the matrix: A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ...
37 Equivalent: spectral decomposition of the matrix: A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... (k terms; A is m x n, each u_i is m x 1, each v_i is n x 1). Assume σ_1 ≥ σ_2 ≥ σ_3 ≥ ... ≥ 0. Why is setting small σ_i to 0 the right thing to do? The vectors u_i and v_i are unit length, so it is σ_i that scales them; zeroing the small σ_i therefore introduces little error.
38 Q: How many σs to keep? A: Rule of thumb: keep enough terms of A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + ... to retain 80-90% of the 'energy' Σ_i σ_i^2 (assuming σ_1 ≥ σ_2 ≥ σ_3 ≥ ...).
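The energy rule takes only a few lines to implement. A sketch on synthetic data, using 0.90 as one choice in the slide's 80-90% range:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data with a few dominant directions (column scales are assumptions).
A = rng.standard_normal((50, 6)) * np.array([10.0, 8.0, 5.0, 1.0, 0.5, 0.2])

s = np.linalg.svd(A, compute_uv=False)        # singular values, descending
energy = np.cumsum(s ** 2) / np.sum(s ** 2)   # fraction of energy in the top i values

k = int(np.searchsorted(energy, 0.90)) + 1    # smallest k retaining >= 90% energy
```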
39 To compute the full SVD: O(nm^2) or O(n^2m) (whichever is less). But there is less work if we just want the singular values, or only the first k singular vectors, or if the matrix is sparse. Implemented in linear algebra packages like LINPACK, Matlab, SPlus, Mathematica, ...
40 SVD: A = U S V^T: unique. U: user-to-concept similarities; V: movie-to-concept similarities; S: strength of each concept. Dimensionality reduction: keep the few largest singular values (80-90% of the 'energy'). SVD picks up linear correlations.
41 SVD gives us A = U S V^T. Eigen-decomposition: A = X Λ X^T (here A is symmetric; U, V, X are orthonormal, U^T U = I; Λ, S are diagonal). Now let's calculate: A A^T = U S V^T (U S V^T)^T = U S V^T (V S^T U^T) = U S S^T U^T, and A^T A = V S^T U^T (U S V^T) = V S S^T V^T.
42 SVD gives us A = U S V^T; eigen-decomposition gives A = X Λ X^T (A symmetric; U, V, X orthonormal; Λ, S diagonal). Calculating as before: A A^T = U S S^T U^T, which has the form X Λ^2 X^T, and A^T A = V S S^T V^T, also of the form X Λ^2 X^T. This shows how to compute the SVD using an eigenvalue decomposition!
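A quick numerical check of A^T A = V S^2 V^T: the eigenvalues of the symmetric matrix A^T A should equal the squared singular values of A (random matrix for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

s = np.linalg.svd(A, compute_uv=False)   # singular values of A, descending
eigvals = np.linalg.eigvalsh(A.T @ A)    # eigenvalues of A^T A, ascending
```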
43 A A^T = U S^2 U^T; A^T A = V S^2 V^T; (A^T A)^k = V S^{2k} V^T. E.g.: (A^T A)^2 = V S^2 V^T V S^2 V^T = V S^4 V^T. So (A^T A)^k ≈ σ_1^{2k} v_1 v_1^T for k ≫ 1.
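The collapse of (A^T A)^k onto σ_1^{2k} v_1 v_1^T is exactly why power iteration recovers the top right singular vector. A minimal sketch; the matrix and the 100-iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((20, 5))

# Power iteration: repeatedly apply A^T A to a random vector and renormalize.
x = rng.standard_normal(5)
for _ in range(100):
    x = A.T @ (A @ x)
    x /= np.linalg.norm(x)

# Compare against the first right singular vector v1 from a full SVD.
v1 = np.linalg.svd(A)[2][0]
alignment = abs(x @ v1)   # ~1 up to sign
```

Each iteration costs only two matrix-vector products, which is one reason computing just the first k singular vectors is cheaper than a full SVD.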
44
45 Q: Find users that like 'Matrix'. A: Map the query into the 'concept space'. How? (Users-to-movies example: Matrix, Alien, Serenity, Casablanca, Amelie.)
46 Q: Find users that like 'Matrix'. A: Map the query into the 'concept space'. How? q is a ratings vector over (Matrix, Alien, Serenity, Casablanca, Amelie). Project q into the concept space: take the inner product with each 'concept' vector v_i (figure: q in the Matrix-Alien plane, with concept axes v_1 and v_2).
47 Q: Find users that like 'Matrix'. A: Map the query into the 'concept space'. Projecting q onto v_1 gives the component q·v_1 (figure: q in the Matrix-Alien plane, with its projection q·v_1 onto v_1).
48 Compactly, we have: q_concept = q V. E.g., for q given over (Matrix, Alien, Serenity, Casablanca, Amelie), multiplying by the movie-to-concept similarities V yields q's strength along the SciFi-concept.
49 How would the user d that rated ('Alien', 'Serenity') be handled? The same way: d_concept = d V, again using the movie-to-concept similarities V.
50 Observation: user d, who rated ('Alien', 'Serenity'), will be similar to user q, who rated ('Matrix'), although d and q have zero ratings in common! In the concept space their similarity is non-zero.
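The observation can be reproduced end to end. A sketch with hypothetical ratings (the slide's exact numbers were not preserved in the transcription): q rated only 'Matrix', d rated only 'Alien' and 'Serenity', yet they agree almost perfectly in concept space.

```python
import numpy as np

# Hypothetical users x movies ratings; columns are
# [Matrix, Alien, Serenity, Casablanca, Amelie].
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

V = np.linalg.svd(A, full_matrices=False)[2].T[:, :2]   # keep the 2 concepts

q = np.array([5.0, 0.0, 0.0, 0.0, 0.0])   # rated only 'Matrix'
d = np.array([0.0, 4.0, 5.0, 0.0, 0.0])   # rated 'Alien' and 'Serenity'

q_concept = q @ V
d_concept = d @ V

raw_sim = q @ d                            # 0: no ratings in common
concept_sim = (q_concept @ d_concept) / (
    np.linalg.norm(q_concept) * np.linalg.norm(d_concept))
```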
51 SVD: + Optimal low-rank approximation in terms of the Frobenius norm. - Interpretability problem: a singular vector specifies a linear combination of all input columns or rows. - Lack of sparsity: singular vectors are dense!
52
53 Frobenius norm: ǁXǁ_F = sqrt(Σ_ij X_ij^2). Goal: express A as a product of matrices C, U, R such that ǁA - C U Rǁ_F is small, subject to constraints on C and R.
54 Frobenius norm: ǁXǁ_F = sqrt(Σ_ij X_ij^2). Goal: express A as a product of matrices C, U, R such that ǁA - C U Rǁ_F is small. Constraints: C consists of actual columns of A, R of actual rows, and U is the pseudo-inverse of the intersection of C and R.
55 Let A_k be the best rank-k approximation to A (that is, A_k is the rank-k SVD of A). Theorem [Drineas et al.]: CUR in O(m n) time achieves ǁA - C U Rǁ_F ≤ ǁA - A_kǁ_F + ε ǁAǁ_F with probability at least 1 - δ, by picking O(k log(1/δ)/ε^2) columns and O(k^2 log^3(1/δ)/ε^6) rows. In practice: pick 4k columns/rows.
56 Sampling columns (similarly for rows). Note: this is a randomized algorithm; the same column can be sampled more than once.
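One standard sampling scheme in the Drineas et al. line of work picks columns with probability proportional to their squared norm and rescales each pick. A sketch; the matrix and the sample count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))

# Column-sampling probabilities proportional to squared column norms.
col_norms_sq = np.sum(A ** 2, axis=0)
probs = col_norms_sq / col_norms_sq.sum()

c = 4  # number of columns to sample
idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)  # duplicates allowed

# Rescale each sampled column by 1/sqrt(c * p_j) so that
# C C^T matches A A^T in expectation.
C = A[:, idx] / np.sqrt(c * probs[idx])
```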
57 Removing duplicates:
58
59 Approximate multiplication:
60 Final algorithm:
61 Why it works:
62 Let W be the intersection of the sampled columns C and rows R, and let the SVD of W be W = X Z Y^T. Then U = W+ = Y Z+ X^T, where Z+ holds the reciprocals of the non-zero singular values: Z+_ii = 1/Z_ii. W+ is the pseudoinverse, and A ≈ C U R with U = W+. Why does the pseudoinverse work? If W = X Z Y^T then W^-1 = Y Z^-1 X^T: by orthonormality X^-1 = X^T and Y^-1 = Y^T, and since Z is diagonal, Z^-1_ii = 1/Z_ii. Thus, if W is nonsingular, the pseudoinverse is the true inverse.
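Putting the pieces together, a minimal CUR sketch: for simplicity the columns and rows are fixed rather than sampled, and A is made exactly rank-2, in which case C W+ R reconstructs A exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))   # rank-2 matrix

cols = [0, 2, 4]              # chosen columns (fixed here; sampled in the real algorithm)
rows = [1, 3, 5]              # chosen rows
C = A[:, cols]
R = A[rows, :]
W = A[np.ix_(rows, cols)]     # intersection of the chosen rows and columns

U = np.linalg.pinv(W)         # W+ = Y Z+ X^T, computed via the SVD of W internally
A_cur = C @ U @ R

err = np.linalg.norm(A - A_cur, 'fro')
```

Note that C and R are literal slices of A, which is the source of both CUR advantages on the next slides: interpretability and sparsity.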
63 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 63
64 Algorithm: CUR error ≤ SVD error + ε ǁAǁ_F. In practice: pick 4k columns/rows for a rank-k approximation.
65 Given matrix A, compute the leverage scores. Algorithm:
66
67 CUR: + Easy interpretation: the basis vectors are actual columns and rows. + Sparse basis: again because the basis vectors are actual columns and rows. - Duplicate columns and rows: columns with large norms will be sampled many times. (Figure: an actual column vs. a singular vector.)
68 If we want to get rid of the duplicates: throw them away, then scale (multiply) the remaining columns/rows by the square root of the number of duplicates. Construct a small U.
69 SVD: A = U S V^T: A is huge but sparse, U and V are big and dense, S is dense but small. CUR: A = C U R: A is huge but sparse, C and R are big but sparse, U is dense but small.
70 DBLP bibliographic data: an author-to-conference big sparse matrix. A_ij: number of papers published by author i at conference j. 428K authors (rows), 3659 conferences (columns); very sparse. We want to reduce dimensionality. How much time does it take? What is the reconstruction error? How much space do we need?
71 (Figure: accuracy = 1 - relative sum of squared errors, space ratio = #output matrix entries / #input matrix entries, and CPU time, comparing SVD, CUR, and CUR with no duplicates.) Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM '07.
72 SVD is limited to linear projections: a lower-dimensional linear projection that preserves Euclidean distances. Non-linear methods: Isomap. Data lies on a nonlinear low-dimensional curve, aka manifold; use the distance as measured along the manifold. How? Build an adjacency graph; the geodesic distance is the graph distance; then apply SVD/PCA to the graph's pairwise distance matrix.
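The three Isomap steps can be sketched end to end: a nearest-neighbor graph, Floyd-Warshall shortest paths for geodesic distances, then a PCA-style embedding of the distance matrix (classical MDS). The data, neighbor count, and embedding dimension are all illustrative assumptions.

```python
import numpy as np

# Points on a 1-D manifold (a half circle) embedded in 2-D.
t = np.linspace(0.0, np.pi, 20)
X = np.column_stack([np.cos(t), np.sin(t)])
n = len(X)

# 1) Adjacency graph: connect each point to its 2 nearest neighbors.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
G = np.full((n, n), np.inf)
np.fill_diagonal(G, 0.0)
for i in range(n):
    for j in np.argsort(D[i])[1:3]:
        G[i, j] = G[j, i] = D[i, j]

# 2) Geodesic distance = shortest path in the graph (Floyd-Warshall).
for k in range(n):
    G = np.minimum(G, G[:, [k]] + G[[k], :])

# 3) SVD/PCA of the pairwise distance matrix (classical MDS, 1-D embedding).
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (G ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
embedding = eigvecs[:, -1] * np.sqrt(eigvals[-1])

# The 1-D embedding should order the points along the curve.
```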
73 References: P. Drineas, R. Kannan, M. W. Mahoney: Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing. J. Sun, Y. Xie, H. Zhang, C. Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007. P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, P. Drineas: Intra- and interpopulation genotype reconstruction from tagging SNPs, Genome Research, 17(1), 2007. M. W. Mahoney, M. Maggioni, P. Drineas: Tensor-CUR Decompositions for Tensor-Based Data, Proc. 12th Annual SIGKDD, 2006.
More informationSVD, PCA & Preprocessing
Chapter 1 SVD, PCA & Preprocessing Part 2: Pre-processing and selecting the rank Pre-processing Skillicorn chapter 3.1 2 Why pre-process? Consider matrix of weather data Monthly temperatures in degrees
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationLECTURE 16: PCA AND SVD
Instructor: Sael Lee CS549 Computational Biology LECTURE 16: PCA AND SVD Resource: PCA Slide by Iyad Batal Chapter 12 of PRML Shlens, J. (2003). A tutorial on principal component analysis. CONTENT Principal
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationNumerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??
Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement
More informationAssignment #10: Diagonalization of Symmetric Matrices, Quadratic Forms, Optimization, Singular Value Decomposition. Name:
Assignment #10: Diagonalization of Symmetric Matrices, Quadratic Forms, Optimization, Singular Value Decomposition Due date: Friday, May 4, 2018 (1:35pm) Name: Section Number Assignment #10: Diagonalization
More informationLinear Algebra for Machine Learning. Sargur N. Srihari
Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two
More informationReview of Linear Algebra
Review of Linear Algebra Definitions An m n (read "m by n") matrix, is a rectangular array of entries, where m is the number of rows and n the number of columns. 2 Definitions (Con t) A is square if m=
More informationRandNLA: Randomized Numerical Linear Algebra
RandNLA: Randomized Numerical Linear Algebra Petros Drineas Rensselaer Polytechnic Institute Computer Science Department To access my web page: drineas RandNLA: sketch a matrix by row/ column sampling
More informationCS 143 Linear Algebra Review
CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see
More informationNotes on Linear Algebra and Matrix Theory
Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a
More informationSingular Value Decomposition and Digital Image Compression
Singular Value Decomposition and Digital Image Compression Chris Bingham December 1, 016 Page 1 of Abstract The purpose of this document is to be a very basic introduction to the singular value decomposition
More informationLinear Algebra, part 3 QR and SVD
Linear Algebra, part 3 QR and SVD Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2012 Going back to least squares (Section 1.4 from Strang, now also see section 5.2). We
More informationFaloutsos, Tong ICDE, 2009
Large Graph Mining: Patterns, Tools and Case Studies Christos Faloutsos Hanghang Tong CMU Copyright: Faloutsos, Tong (29) 2-1 Outline Part 1: Patterns Part 2: Matrix and Tensor Tools Part 3: Proximity
More informationFrom Matrix to Tensor. Charles F. Van Loan
From Matrix to Tensor Charles F. Van Loan Department of Computer Science January 28, 2016 From Matrix to Tensor From Tensor To Matrix 1 / 68 What is a Tensor? Instead of just A(i, j) it s A(i, j, k) or
More informationsublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU)
sublinear time low-rank approximation of positive semidefinite matrices Cameron Musco (MIT) and David P. Woodru (CMU) 0 overview Our Contributions: 1 overview Our Contributions: A near optimal low-rank
More informationLeast Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo
Least Squares Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 26, 2010 Linear system Linear system Ax = b, A C m,n, b C m, x C n. under-determined
More informationLecture 18 Nov 3rd, 2015
CS 229r: Algorithms for Big Data Fall 2015 Prof. Jelani Nelson Lecture 18 Nov 3rd, 2015 Scribe: Jefferson Lee 1 Overview Low-rank approximation, Compression Sensing 2 Last Time We looked at three different
More informationLinear Subspace Models
Linear Subspace Models Goal: Explore linear models of a data set. Motivation: A central question in vision concerns how we represent a collection of data vectors. The data vectors may be rasterized images,
More informationFall TMA4145 Linear Methods. Exercise set Given the matrix 1 2
Norwegian University of Science and Technology Department of Mathematical Sciences TMA445 Linear Methods Fall 07 Exercise set Please justify your answers! The most important part is how you arrive at an
More informationMatrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =
30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can
More informationLatent Semantic Analysis (Tutorial)
Latent Semantic Analysis (Tutorial) Alex Thomo Eigenvalues and Eigenvectors Let A be an n n matrix with elements being real numbers. If x is an n-dimensional vector, then the matrix-vector product Ax is
More informationTHE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR
THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions
More informationLinear Least-Squares Data Fitting
CHAPTER 6 Linear Least-Squares Data Fitting 61 Introduction Recall that in chapter 3 we were discussing linear systems of equations, written in shorthand in the form Ax = b In chapter 3, we just considered
More informationUNIT 6: The singular value decomposition.
UNIT 6: The singular value decomposition. María Barbero Liñán Universidad Carlos III de Madrid Bachelor in Statistics and Business Mathematical methods II 2011-2012 A square matrix is symmetric if A T
More informationProblems. Looks for literal term matches. Problems:
Problems Looks for literal term matches erms in queries (esp short ones) don t always capture user s information need well Problems: Synonymy: other words with the same meaning Car and automobile 电脑 vs.
More information1. What is the determinant of the following matrix? a 1 a 2 4a 3 2a 2 b 1 b 2 4b 3 2b c 1. = 4, then det
What is the determinant of the following matrix? 3 4 3 4 3 4 4 3 A 0 B 8 C 55 D 0 E 60 If det a a a 3 b b b 3 c c c 3 = 4, then det a a 4a 3 a b b 4b 3 b c c c 3 c = A 8 B 6 C 4 D E 3 Let A be an n n matrix
More informationPreliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012
Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.
More informationData Mining and Matrices
Data Mining and Matrices 08 Boolean Matrix Factorization Rainer Gemulla, Pauli Miettinen June 13, 2013 Outline 1 Warm-Up 2 What is BMF 3 BMF vs. other three-letter abbreviations 4 Binary matrices, tiles,
More informationDuke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014
Duke University, Department of Electrical and Computer Engineering Optimization for Scientists and Engineers c Alex Bronstein, 2014 Linear Algebra A Brief Reminder Purpose. The purpose of this document
More informationLinear Algebra. Shan-Hung Wu. Department of Computer Science, National Tsing Hua University, Taiwan. Large-Scale ML, Fall 2016
Linear Algebra Shan-Hung Wu shwu@cs.nthu.edu.tw Department of Computer Science, National Tsing Hua University, Taiwan Large-Scale ML, Fall 2016 Shan-Hung Wu (CS, NTHU) Linear Algebra Large-Scale ML, Fall
More informationLinear Algebra Review. Fei-Fei Li
Linear Algebra Review Fei-Fei Li 1 / 37 Vectors Vectors and matrices are just collections of ordered numbers that represent something: movements in space, scaling factors, pixel brightnesses, etc. A vector
More informationOutline Introduction: Problem Description Diculties Algebraic Structure: Algebraic Varieties Rank Decient Toeplitz Matrices Constructing Lower Rank St
Structured Lower Rank Approximation by Moody T. Chu (NCSU) joint with Robert E. Funderlic (NCSU) and Robert J. Plemmons (Wake Forest) March 5, 1998 Outline Introduction: Problem Description Diculties Algebraic
More informationMobile Robotics 1. A Compact Course on Linear Algebra. Giorgio Grisetti
Mobile Robotics 1 A Compact Course on Linear Algebra Giorgio Grisetti SA-1 Vectors Arrays of numbers They represent a point in a n dimensional space 2 Vectors: Scalar Product Scalar-Vector Product Changes
More informationNonlinear Dimensionality Reduction
Nonlinear Dimensionality Reduction Piyush Rai CS5350/6350: Machine Learning October 25, 2011 Recap: Linear Dimensionality Reduction Linear Dimensionality Reduction: Based on a linear projection of the
More informationThis appendix provides a very basic introduction to linear algebra concepts.
APPENDIX Basic Linear Algebra Concepts This appendix provides a very basic introduction to linear algebra concepts. Some of these concepts are intentionally presented here in a somewhat simplified (not
More information