Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
|
|
- Mavis Briggs
- 5 years ago
- Views:
Transcription
1 Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: Mining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
2 Assumption:Data lies on or near a low d-dimensional subspace Axes of this subspace are effective representation of the data J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 2
3 Compress / reduce dimensionality: 10 6 rows; 10 3 columns; no updates Random access to any cell(s); small error: OK The above matrix is really 2-dimensional. All rows can be reconstructed by scaling [ ] or [ ] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 3
4 Q:What is rankof a matrix A? A:Number of linearly independentcolumns of A For example: Matrix A = has rank r=2 Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third) so the rank must be less than 3. Why do we care about low rank? We can write Aas two basis vectors: [1 2 1] [-2-3 1] And new coordinates of : [1 0] [0 1] [1 1] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 4
5 Cloud of points 3D space: Think of point positions as a matrix: 1 row per point: A B C A We can rewrite coordinates more efficiently! Old basis vectors:[1 00] [010][001] New basis vectors: [121][-2-31] Then Ahas new coordinates: [10]. B: [01], C: [11] Notice: We reduced the number of coordinates! J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 5
6 Goal of dimensionality reduction is to discover the axis of data! Rather than representing every point with 2 coordinates we represent each point with 1 coordinate (corresponding to the position of the point on the red line). By doing this we incur a bit of error as the points do not exactly lie on the line J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 6
7 Why reduce dimensions? Discover hidden correlations/topics Words that occur commonly together Remove redundant and noisy features Not all words are useful Interpretation and visualization Easier storage and processing of the data J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 7
8 A [m x n] = U [m x r] Σ [ r x r] (V [n x r] ) T A: Input data matrix mx nmatrix (e.g., mdocuments, nterms) U: Left singular vectors mx rmatrix (mdocuments, rconcepts) Σ: Singular values rx rdiagonal matrix (strength of each concept ) (r: rank of the matrix A) V: Right singular vectors nx rmatrix (nterms, rconcepts) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 8
9 T n n m A m Σ V T U J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 9
10 T n σ 1 u 1 v 1 σ 2 u 2 v 2 m A + σ i scalar u i vector v i vector J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 10
11 It is alwayspossible to decompose a real matrix Ainto A = U ΣV T, where U, Σ, V: unique U, V: column orthonormal U T U = I; V T V = I (I: identity matrix) (Columns are orthogonal unit vectors) Σ: diagonal Entries (singular values) are positive, and sorted in decreasing order (σ 1 σ ) Nice proof of uniqueness: J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 11
12 SciFi Romnce A = U ΣV T -example: Users to Movies Matrix Alien Serenity Casablanca Amelie = m U Σ n V T Concepts AKA Latent dimensions AKA Latent factors J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 12
13 SciFi Romnce A = U ΣV T -example: Users to Movies Matrix Alien Serenity Casablanca Amelie = x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 13 x
14 SciFi Romnce A = U ΣV T -example: Users to Movies Matrix Alien Serenity Casablanca Amelie = SciFi-concept Romance-concept x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 14 x
15 SciFi Romnce A = U ΣV T -example: Matrix Alien Serenity Casablanca Amelie = SciFi-concept Romance-concept Uis user-to-concept similarity matrix x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 15 x
16 SciFi Romnce A = U ΣV T -example: Matrix Alien Serenity Casablanca Amelie = SciFi-concept x strength of the SciFi-concept J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 16 x
17 SciFi Romnce A = U ΣV T -example: Matrix Alien Serenity Casablanca Amelie = SciFi-concept SciFi-concept V is movie-to-concept similarity matrix x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 17 x
18 movies, users and concepts : U: user-to-concept similarity matrix V: movie-to-concept similarity matrix Σ: its diagonal elements: strength of each concept J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 18
19
20 Movie 2 rating first right singular vector v 1 Movie 1 rating Instead of using two coordinates, to describe point locations, let s use only one coordinate Point s position is its location along vector How to choose? Minimize reconstruction error J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 20
21 Goal: Minimize the sum of reconstruction errors: where are the old and are the new coordinates SVD gives best axis to project on: best = minimizing the reconstruction errors In other words, minimum reconstruction error Movie 2 rating v 1 Movie 1 rating first right singular vector J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 21
22 A = U ΣV T -example: V: movie-to-concept matrix U: user-to-concept matrix Movie 2 rating v 1 first right singular vector Movie 1 rating = x x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 22
23 A = U ΣV T -example: variance ( spread ) on the v 1 axis Movie 2 rating v 1 first right singular vector Movie 1 rating = x x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 23
24 A = U ΣV T -example: U Σ: Gives the coordinates of the points in the projection axis Movie 2 rating v 1 first right singular vector Projection of users on the Sci-Fi axis (U Σ) Τ : Movie 1 rating J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 24
25 More details Q:How exactly is dim. reduction done? = x x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 25
26 More details Q:How exactly is dim. reduction done? A: Set smallest singular values to zero = x x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 26
27 More details Q:How exactly is dim. reduction done? A: Set smallest singular values to zero x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 27 x
28 More details Q:How exactly is dim. reduction done? A: Set smallest singular values to zero x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 28 x
29 More details Q:How exactly is dim. reduction done? A: Set smallest singular values to zero x x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 29
30 More details Q:How exactly is dim. reduction done? A: Set smallest singular values to zero Frobenius norm: ǁMǁ F = Σ ij M ij 2 ǁA-Bǁ F = Σ ij (A ij -B ij ) 2 is small J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 30
31 Sigma A = U V T B is best approximation of A B = U Sigma V T J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 31
32 Theorem: LetA= U ΣV T andb= U SV T where S=diagonal rxrmatrixwith s i =σ i (i=1 k) else s i =0 then Bis abestrank(b)=kapprox. to A What do we mean by best : Bis a solution to min B ǁA-Bǁ F where rank(b)=k Σ J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 32
33 Details! Theorem:LetA= U ΣV T (σ 1 σ 2, rank(a)=r) thenb= U SV T S=diagonal rxrmatrixwhere s i =σ i (i=1 k) else s i =0 is a best rank-kapproximation to A: Bis a solution to min B ǁA-Bǁ F where rank(b)=k We will need 2 facts: where M= PQRis SVD of M U ΣV T -U SV T = U(Σ-S) V T Σ J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 33
34 Details! We will need 2 facts: where M= PQRis SVD of M U ΣV T -U SV T = U(Σ-S) V T We apply: -- P column orthonormal -- R row orthonormal -- Q is diagonal J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 34
35 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 35 A= U ΣV T, B= U SV T (σ 1 σ 2 0, rank(a)=r) S=diagonal nxnmatrix where s i =σ i (i=1 k) else s i =0 then Bis solution to min B ǁA-Bǁ F, rank(b)=k Why? We want to choose s i to minimize Solution is to set s i =σ i (i=1 k) and other s i =0 + = + = = = + = r k i i r k i i k i i i s s i ) ( min σ σ σ = = = Σ = r i i i s F F k B rank B s S B A i 1 2 ) (, ) ( min min min σ We used: U Σ V T - U S V T = U (Σ - S) V T Details!
36 Equivalent: spectral decomposition of the matrix: = x x u 1 u 2 σ 1 σ 2 v 1 v 2 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 36
37 n Equivalent: spectral decomposition of the matrix m = σ 1 u 1 v T 1 + σ 2 u 2 v T n x 1 1 x m k terms Assume: σ 1 σ 2 σ Why is setting small σ i to 0 the right thing to do? Vectors u i and v i are unit length, so σ i scales them. So, zeroing small σ i introduces less error. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 37
38 n Q: How many σsto keep? A:Rule-of-a thumb: keep 80-90% of energy m = u 1 σ 1 v T 1 + σ 2 u 2 v T Assume: σ 1 σ 2 σ 3... J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 38
39 To compute SVD: O(nm 2 )or O(n 2 m)(whichever is less) But: Less work, if we just want singular values or if we want first ksingular vectors or if the matrix is sparse Implemented in linear algebra packages like LINPACK, Matlab, SPlus, Mathematica... J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 39
40 SVD:A= U ΣV T : unique U: user-to-concept similarities V: movie-to-concept similarities Σ : strength of each concept Dimensionality reduction: keep the few largest singular values (80-90% of energy ) SVD: picks up linear correlations J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 40
41 SVD gives us: A = U Σ V T Eigen-decomposition: A = X Λ X T A is symmetric U, V, X are orthonormal(u T U=I), Λ, Σare diagonal Now let s calculate: AA T = UΣ V T (UΣ V T ) T = UΣ V T (VΣ T U T ) = UΣΣ T U T A T A = V Σ T U T (UΣ V T ) = V ΣΣ T V T J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 41
42 SVD gives us: A = U Σ V T Eigen-decomposition: A = X Λ X T A is symmetric U, V, X are orthonormal(u T U=I), Λ, Σare diagonal Now let s calculate: AA T = UΣ V T (UΣ V T ) T = UΣ V T (VΣ T U T ) = UΣΣ T U T A T A = V Σ T U T (UΣ V T ) = V ΣΣ T V T X Λ 2 X T Shows how to compute SVD using eigenvalue decomposition! X Λ 2 X T J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 42
43 AA T = U Σ 2 U T A T A= V Σ 2 V T (A T A) k = V Σ 2k V T E.g.: (A T A) 2 = V Σ 2 V T V Σ 2 V T = V Σ 4 V T (A T A) k ~ v 1 σ 1 2k v 1 T for k>>1 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 43
44
45 SciFi Romnce Q: Find users that like Matrix A: Map query into a concept space how? Matrix Alien Serenity Casablanca Amelie = x J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 45 x
46 Q: Find users that like Matrix A: Map query into a concept space how? q= Matrix Alien Serenity Casablanca Amelie v2 Alien v1 q Project into concept space: Inner product with each concept vector v i Matrix J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 46
47 Q: Find users that like Matrix A: Map query into a concept space how? q= Matrix Alien Serenity Casablanca Amelie v2 Alien v1 q q*v 1 Project into concept space: Inner product with each concept vector v i Matrix J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 47
48 Compactly, we have: q concept = q V E.g.: q= Matrix Alien Serenity Casablanca Amelie SciFi-concept x = movie-to-concept similarities (V) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 48
49 How would the user dthat rated ( Alien, Serenity ) be handled? d concept = d V E.g.: q= Matrix Alien Serenity Casablanca Amelie SciFi-concept x = movie-to-concept similarities (V) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 49
50 Observation:User dthat rated ( Alien, Serenity ) will be similarto user qthat rated ( Matrix ), although dand qhave zero ratings in common! d = Matrix Alien Serenity Casablanca Amelie SciFi-concept q = Zero ratings in common Similarity 0 J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 50
51 + Optimal low-rank approximation in terms of Frobenius norm - Interpretability problem: A singular vector specifies a linear combination of all input columns or rows - Lack of sparsity: Singular vectors are dense! = Σ V T U J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 51
52
53 Frobenius norm: ǁXǁ F = Σ ij X ij 2 Goal: Express A as a product of matrices C,U,R Make ǁA-C U Rǁ F small Constraints on C and R: A C U R J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 53
54 Frobenius norm: ǁXǁ F = Σ ij X ij 2 Goal: Express A as a product of matrices C,U,R Make ǁA-C U Rǁ F small Constraints on C and R: Pseudo-inverse of the intersection of C and R A C U R J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 54
55 Let: A k be the best rank kapproximation to A(that is, A k is SVD of A) Theorem[Drineas et al.] CURin O(m n) time achieves ǁA-CURǁ F ǁA-A k ǁ F + εǁaǁ F with probability at least 1-δ, by picking O(k log(1/δ)/ε 2 )columns, and O(k 2 log 3 (1/δ)/ε 6 )rows In practice: Pick 4k cols/rows J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 55
56 Sampling columns (similarly for rows): Note this is a randomized algorithm, same column can be sampled more than once J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 56
57 Let Wbe the intersection of sampled columns Cand rows R Let SVD of W =X ZY T Then:U= W + =YZ + X T Ζ + : reciprocals of non-zero singular values: Ζ + ii =1/ Ζ ii W + is the pseudoinverse A W C R U = W + Why pseudoinverse works? W = X Z Y then W -1 = X -1 Z -1 Y -1 Due to orthonomality X -1 =X T and Y -1 =Y T Since Z is diagonal Z -1 = 1/Z ii Thus, if W is nonsingular, pseudoinverse is the true inverse J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 57
58 For example: Select columns of A using ColumnSelect algorithm Select rows of A using ColumnSelect algorithm Set CUR error Then: with probability 98% SVD error In practice: Pick 4k cols/rows for a rank-k approximation J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 58
59 + Easy interpretation Since the basis vectors are actual columns and rows + Sparse basis Since the basis vectors are actual columns and rows - Duplicate columns and rows Columns of large norms will be sampled many times Actual column Singular vector J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 59
60 If we want to get rid of the duplicates: Throw them away Scale (multiply) the columns/rows by the square root of the number of duplicates R d R s A C d C s Construct a small U J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 60
61 sparse and small SVD: A = U Σ V T Huge but sparse Big and dense dense but small CUR: A = C U R Huge but sparse Big but sparse J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 61
62 DBLP bibliographic data Author-to-conference big sparse matrix A ij : Number of papers published by author iat conference j 428K authors (rows), 3659 conferences (columns) Very sparse Want to reduce dimensionality How much time does it take? What is the reconstruction error? How much space do we need? J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 62
63 SVD SVD CUR CUR CMDno duplicates 10 3 SVD CUR CMD CUR no dup space ratio accuracy CUR Accuracy: 1 relative sum squared errors Space ratio: #output matrix entries / #input matrix entries CPU time time(sec) accuracy Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 07. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 63
64 SVD is limited to linear projections: Lower-dimensional linear projection that preserves Euclidean distances Non-linear methods: Isomap Data lies on a nonlinear low-dim curve aka manifold Use the distance as measured along the manifold How? Build adjacency graph Geodesic distance is graph distance SVD/PCA the graph pairwise distance matrix J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 64
65 Drineaset al., Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, J. Sun, Y. Xie, H. Zhang, C. Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007 Intra-and interpopulationgenotype reconstruction from tagging SNPs, P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, and P. Drineas, Genome Research, 17(1), (2007) Tensor-CUR Decompositions For Tensor-Based Data, M. W. Mahoney, M. Maggioni, and P. Drineas, Proc. 12-th Annual SIGKDD, (2006) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 65
CS60021: Scalable Data Mining. Dimensionality Reduction
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #21: Dimensionality Reduction Seoul National University 1 In This Lecture Understand the motivation and applications of dimensionality reduction Learn the definition
More informationJeffrey D. Ullman Stanford University
Jeffrey D. Ullman Stanford University 2 Often, our data can be represented by an m-by-n matrix. And this matrix can be closely approximated by the product of two matrices that share a small common dimension
More informationSingular Value Decomposition
Singular Value Decomposition Motivatation The diagonalization theorem play a part in many interesting applications. Unfortunately not all matrices can be factored as A = PDP However a factorization A =
More informationMaths for Signals and Systems Linear Algebra in Engineering
Maths for Signals and Systems Linear Algebra in Engineering Lectures 13 15, Tuesday 8 th and Friday 11 th November 016 DR TANIA STATHAKI READER (ASSOCIATE PROFFESOR) IN SIGNAL PROCESSING IMPERIAL COLLEGE
More informationDimensionality Reduction
394 Chapter 11 Dimensionality Reduction There are many sources of data that can be viewed as a large matrix. We saw in Chapter 5 how the Web can be represented as a transition matrix. In Chapter 9, the
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 22 1 / 21 Overview
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More information15 Singular Value Decomposition
15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationLarge Scale Data Analysis Using Deep Learning
Large Scale Data Analysis Using Deep Learning Linear Algebra U Kang Seoul National University U Kang 1 In This Lecture Overview of linear algebra (but, not a comprehensive survey) Focused on the subset
More informationReview problems for MA 54, Fall 2004.
Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationSingular Value Decompsition
Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost
More information14 Singular Value Decomposition
14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing
More informationLecture Notes to Big Data Management and Analytics Winter Term 2017/2018 Text Processing and High-Dimensional Data
Lecture Notes to Winter Term 2017/2018 Text Processing and High-Dimensional Data Matthias Schubert, Matthias Renz, Felix Borutta, Evgeniy Faerman, Christian Frey, Klaus Arthur Schmid, Daniyal Kazempour,
More informationDimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas
Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx
More informationDATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD
DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary
More informationLinear Algebra and Eigenproblems
Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details
More informationData Mining Lecture 4: Covariance, EVD, PCA & SVD
Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The
More informationLinear Algebra Review. Vectors
Linear Algebra Review 9/4/7 Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa (UCSD) Cogsci 8F Linear Algebra review Vectors
More informationIntroduction to Numerical Linear Algebra II
Introduction to Numerical Linear Algebra II Petros Drineas These slides were prepared by Ilse Ipsen for the 2015 Gene Golub SIAM Summer School on RandNLA 1 / 49 Overview We will cover this material in
More informationCOMP 558 lecture 18 Nov. 15, 2010
Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to
More informationComputational Methods. Eigenvalues and Singular Values
Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations
More informationProperties of Matrices and Operations on Matrices
Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,
More informationCOMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare
COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of
More informationBackground Mathematics (2/2) 1. David Barber
Background Mathematics (2/2) 1 David Barber University College London Modified by Samson Cheung (sccheung@ieee.org) 1 These slides accompany the book Bayesian Reasoning and Machine Learning. The book and
More informationLinear Algebra, part 3 QR and SVD
Linear Algebra, part 3 QR and SVD Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2012 Going back to least squares (Section 1.4 from Strang, now also see section 5.2). We
More information1 Linearity and Linear Systems
Mathematical Tools for Neuroscience (NEU 34) Princeton University, Spring 26 Jonathan Pillow Lecture 7-8 notes: Linear systems & SVD Linearity and Linear Systems Linear system is a kind of mapping f( x)
More informationLeast Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo
Least Squares Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 26, 2010 Linear system Linear system Ax = b, A C m,n, b C m, x C n. under-determined
More informationNumerical Methods I Singular Value Decomposition
Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)
More informationThe Singular Value Decomposition and Least Squares Problems
The Singular Value Decomposition and Least Squares Problems Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo September 27, 2009 Applications of SVD solving
More informationSlides credits: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationNotes on Eigenvalues, Singular Values and QR
Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square
More informationMining of Massive Datasets Jure Leskovec, AnandRajaraman, Jeff Ullman Stanford University
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
More informationSingular Value Decomposition
Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our
More informationCSE 494/598 Lecture-6: Latent Semantic Indexing. **Content adapted from last year s slides
CSE 494/598 Lecture-6: Latent Semantic Indexing LYDIA MANIKONDA HT TP://WWW.PUBLIC.ASU.EDU/~LMANIKON / **Content adapted from last year s slides Announcements Homework-1 and Quiz-1 Project part-2 released
More informationAssignment #10: Diagonalization of Symmetric Matrices, Quadratic Forms, Optimization, Singular Value Decomposition. Name:
Assignment #10: Diagonalization of Symmetric Matrices, Quadratic Forms, Optimization, Singular Value Decomposition Due date: Friday, May 4, 2018 (1:35pm) Name: Section Number Assignment #10: Diagonalization
More informationLinear Algebra (Review) Volker Tresp 2017
Linear Algebra (Review) Volker Tresp 2017 1 Vectors k is a scalar (a number) c is a column vector. Thus in two dimensions, c = ( c1 c 2 ) (Advanced: More precisely, a vector is defined in a vector space.
More informationLinear Algebra, part 3. Going back to least squares. Mathematical Models, Analysis and Simulation = 0. a T 1 e. a T n e. Anna-Karin Tornberg
Linear Algebra, part 3 Anna-Karin Tornberg Mathematical Models, Analysis and Simulation Fall semester, 2010 Going back to least squares (Sections 1.7 and 2.3 from Strang). We know from before: The vector
More informationLinear Least Squares. Using SVD Decomposition.
Linear Least Squares. Using SVD Decomposition. Dmitriy Leykekhman Spring 2011 Goals SVD-decomposition. Solving LLS with SVD-decomposition. D. Leykekhman Linear Least Squares 1 SVD Decomposition. For any
More informationUNIT 6: The singular value decomposition.
UNIT 6: The singular value decomposition. María Barbero Liñán Universidad Carlos III de Madrid Bachelor in Statistics and Business Mathematical methods II 2011-2012 A square matrix is symmetric if A T
More informationInformation Retrieval
Introduction to Information CS276: Information and Web Search Christopher Manning and Pandu Nayak Lecture 13: Latent Semantic Indexing Ch. 18 Today s topic Latent Semantic Indexing Term-document matrices
More informationLecture 9: SVD, Low Rank Approximation
CSE 521: Design and Analysis of Algorithms I Spring 2016 Lecture 9: SVD, Low Rank Approimation Lecturer: Shayan Oveis Gharan April 25th Scribe: Koosha Khalvati Disclaimer: hese notes have not been subjected
More informationLinear Algebra (Review) Volker Tresp 2018
Linear Algebra (Review) Volker Tresp 2018 1 Vectors k, M, N are scalars A one-dimensional array c is a column vector. Thus in two dimensions, ( ) c1 c = c 2 c i is the i-th component of c c T = (c 1, c
More informationImage Registration Lecture 2: Vectors and Matrices
Image Registration Lecture 2: Vectors and Matrices Prof. Charlene Tsai Lecture Overview Vectors Matrices Basics Orthogonal matrices Singular Value Decomposition (SVD) 2 1 Preliminary Comments Some of this
More informationNumerical Methods. Elena loli Piccolomini. Civil Engeneering. piccolom. Metodi Numerici M p. 1/??
Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement
More informationCS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works
CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The
More informationApplied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic
Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationbe a Householder matrix. Then prove the followings H = I 2 uut Hu = (I 2 uu u T u )u = u 2 uut u
MATH 434/534 Theoretical Assignment 7 Solution Chapter 7 (71) Let H = I 2uuT Hu = u (ii) Hv = v if = 0 be a Householder matrix Then prove the followings H = I 2 uut Hu = (I 2 uu )u = u 2 uut u = u 2u =
More information2. Review of Linear Algebra
2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear
More informationLatent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology
Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,
More informationSingular Value Decomposition
Chapter 5 Singular Value Decomposition We now reach an important Chapter in this course concerned with the Singular Value Decomposition of a matrix A. SVD, as it is commonly referred to, is one of the
More informationMultivariate Statistical Analysis
Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions
More informationLatent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology
Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,
More informationLinear Algebra Methods for Data Mining
Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 1. Basic Linear Algebra Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Example
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationFall TMA4145 Linear Methods. Exercise set Given the matrix 1 2
Norwegian University of Science and Technology Department of Mathematical Sciences TMA445 Linear Methods Fall 07 Exercise set Please justify your answers! The most important part is how you arrive at an
More informationFocus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.
Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:
More informationThe Singular Value Decomposition (SVD) and Principal Component Analysis (PCA)
Chapter 5 The Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 5.1 Basics of SVD 5.1.1 Review of Key Concepts We review some key definitions and results about matrices that will
More informationThe Singular Value Decomposition
The Singular Value Decomposition We are interested in more than just sym+def matrices. But the eigenvalue decompositions discussed in the last section of notes will play a major role in solving general
More informationDeep Learning Book Notes Chapter 2: Linear Algebra
Deep Learning Book Notes Chapter 2: Linear Algebra Compiled By: Abhinaba Bala, Dakshit Agrawal, Mohit Jain Section 2.1: Scalars, Vectors, Matrices and Tensors Scalar Single Number Lowercase names in italic
More informationLinear Algebra Massoud Malek
CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5
STAT 39: MATHEMATICAL COMPUTATIONS I FALL 17 LECTURE 5 1 existence of svd Theorem 1 (Existence of SVD) Every matrix has a singular value decomposition (condensed version) Proof Let A C m n and for simplicity
More informationFinal Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson
Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Name: TA Name and section: NO CALCULATORS, SHOW ALL WORK, NO OTHER PAPERS ON DESK. There is very little actual work to be done on this exam if
More informationLinear Least-Squares Data Fitting
CHAPTER 6 Linear Least-Squares Data Fitting 61 Introduction Recall that in chapter 3 we were discussing linear systems of equations, written in shorthand in the form Ax = b In chapter 3, we just considered
More informationSubspace sampling and relative-error matrix approximation
Subspace sampling and relative-error matrix approximation Petros Drineas Rensselaer Polytechnic Institute Computer Science Department (joint work with M. W. Mahoney) For papers, etc. drineas The CUR decomposition
More informationStat 159/259: Linear Algebra Notes
Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the
More informationData Mining and Matrices
Data Mining and Matrices 08 Boolean Matrix Factorization Rainer Gemulla, Pauli Miettinen June 13, 2013 Outline 1 Warm-Up 2 What is BMF 3 BMF vs. other three-letter abbreviations 4 Binary matrices, tiles,
More informationChapter 3 Transformations
Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases
More informationLECTURE 16: PCA AND SVD
Instructor: Sael Lee CS549 Computational Biology LECTURE 16: PCA AND SVD Resource: PCA Slide by Iyad Batal Chapter 12 of PRML Shlens, J. (2003). A tutorial on principal component analysis. CONTENT Principal
More informationVector and Matrix Norms. Vector and Matrix Norms
Vector and Matrix Norms Vector Space Algebra Matrix Algebra: We let x x and A A, where, if x is an element of an abstract vector space n, and A = A: n m, then x is a complex column vector of length n whose
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory
More informationIncremental Pattern Discovery on Streams, Graphs and Tensors
Incremental Pattern Discovery on Streams, Graphs and Tensors Jimeng Sun Thesis Committee: Christos Faloutsos Tom Mitchell David Steier, External member Philip S. Yu, External member Hui Zhang Abstract
More informationIntroduction to Machine Learning
10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what
More informationQuantum Computing Lecture 2. Review of Linear Algebra
Quantum Computing Lecture 2 Review of Linear Algebra Maris Ozols Linear algebra States of a quantum system form a vector space and their transformations are described by linear operators Vector spaces
More informationTHE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR
THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},
More informationEcon Slides from Lecture 7
Econ 205 Sobel Econ 205 - Slides from Lecture 7 Joel Sobel August 31, 2010 Linear Algebra: Main Theory A linear combination of a collection of vectors {x 1,..., x k } is a vector of the form k λ ix i for
More informationPrincipal Component Analysis
Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationSlide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University.
Slide source: Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org #1: C4.5 Decision Tree - Classification (61 votes) #2: K-Means - Clustering
More informationNotes on Linear Algebra and Matrix Theory
Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a
More informationSingular Value Decomposition (SVD)
School of Computing National University of Singapore CS CS524 Theoretical Foundations of Multimedia More Linear Algebra Singular Value Decomposition (SVD) The highpoint of linear algebra Gilbert Strang
More informationLinear Algebra for Machine Learning. Sargur N. Srihari
Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it
More informationSVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices)
Chapter 14 SVD, Power method, and Planted Graph problems (+ eigenvalues of random matrices) Today we continue the topic of low-dimensional approximation to datasets and matrices. Last time we saw the singular
More information(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax =
. (5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? dim N(A), since rank(a) 3. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)
More informationPseudoinverse & Moore-Penrose Conditions
ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego p. 1/1 Lecture 7 ECE 275A Pseudoinverse & Moore-Penrose Conditions ECE 275AB Lecture 7 Fall 2008 V1.0 c K. Kreutz-Delgado, UC San Diego
More informationLatent Semantic Analysis (Tutorial)
Latent Semantic Analysis (Tutorial) Alex Thomo Eigenvalues and Eigenvectors Let A be an n n matrix with elements being real numbers. If x is an n-dimensional vector, then the matrix-vector product Ax is
More informationLecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University
Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization
More informationMatrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =
30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can
More informationRandNLA: Randomized Numerical Linear Algebra
RandNLA: Randomized Numerical Linear Algebra Petros Drineas Rensselaer Polytechnic Institute Computer Science Department To access my web page: drineas RandNLA: sketch a matrix by row/ column sampling
More informationCS168: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA)
CS68: The Modern Algorithmic Toolbox Lecture #7: Understanding Principal Component Analysis (PCA) Tim Roughgarden & Gregory Valiant April 0, 05 Introduction. Lecture Goal Principal components analysis
More information7. Symmetric Matrices and Quadratic Forms
Linear Algebra 7. Symmetric Matrices and Quadratic Forms CSIE NCU 1 7. Symmetric Matrices and Quadratic Forms 7.1 Diagonalization of symmetric matrices 2 7.2 Quadratic forms.. 9 7.4 The singular value
More informationProposition 42. Let M be an m n matrix. Then (32) N (M M)=N (M) (33) R(MM )=R(M)
RODICA D. COSTIN. Singular Value Decomposition.1. Rectangular matrices. For rectangular matrices M the notions of eigenvalue/vector cannot be defined. However, the products MM and/or M M (which are square,
More information4 ORTHOGONALITY ORTHOGONALITY OF THE FOUR SUBSPACES 4.1
4 ORTHOGONALITY ORTHOGONALITY OF THE FOUR SUBSPACES 4.1 Two vectors are orthogonal when their dot product is zero: v w = orv T w =. This chapter moves up a level, from orthogonal vectors to orthogonal
More information