sublinear time low-rank approximation of positive semidefinite matrices
Cameron Musco (MIT) and David P. Woodruff (CMU)

overview

Our Contributions:

A near-optimal low-rank approximation of any positive semidefinite (PSD) matrix can be computed in sublinear time, i.e. without reading the full matrix.

Concrete: Significantly improves on previous, roughly linear time approaches for general matrices, and bypasses a trivial linear time lower bound for general matrices.

High Level: Demonstrates that PSD structure can be exploited in a much stronger way than previously known for low-rank approximation. Opens the possibility of further advances in algorithms for PSD matrices.

low-rank matrix approximation

Low-rank approximation is one of the most widely used methods for general matrix and data compression.

Closely related to principal component analysis, spectral embedding/clustering, and low-rank matrix completion.

Important Special Case: $A$ is positive semidefinite (PSD), i.e. $x^T A x \ge 0$ for all $x \in \mathbb{R}^n$. Includes graph Laplacians, Gram matrices and kernel matrices, covariance matrices, and Hessians of convex functions.

optimal low-rank approximation

An optimal low-rank approximation can be computed via the singular value decomposition (SVD):
$$A_k = \operatorname*{argmin}_{B:\,\operatorname{rank}(B)=k} \|A - B\|_F = \operatorname*{argmin}_{B:\,\operatorname{rank}(B)=k} \sqrt{\sum_{i,j} (A_{ij} - B_{ij})^2}.$$

Unfortunately, computing the SVD takes $O(nd^2)$ time.
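
As a concrete illustration (added here, not part of the original slides), a minimal NumPy sketch of computing $A_k$ via a truncated SVD; the matrix dimensions and rank below are arbitrary placeholders.

import numpy as np

def best_rank_k(A, k):
    # A_k = argmin_{rank(B)=k} ||A - B||_F: keep the top k singular values/vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]

A = np.random.randn(500, 100)
A_5 = best_rank_k(A, 5)
print(np.linalg.norm(A - A_5, "fro"))  # optimal rank-5 Frobenius error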

input sparsity time low-rank approximation

Recent work on matrix sketching gives state-of-the-art runtimes.

Theorem (Clarkson, Woodruff '13): There is an algorithm which in $O(\operatorname{nnz}(A) + n \cdot \operatorname{poly}(k, 1/\epsilon))$ time outputs $N \in \mathbb{R}^{n \times k}$, $M \in \mathbb{R}^{d \times k}$ satisfying with probability 99/100:
$$\|A - NM^T\|_F \le (1+\epsilon)\|A - A_k\|_F.$$

When $k$ and $1/\epsilon$ are not too large, the runtime is linear in the input size. Best known runtime for both general and PSD matrices.

sublinear time low-rank approximation

Theorem (Main Result, Musco, Woodruff '17): There is an algorithm running in $\tilde O(nk^2/\epsilon^4)$ time which, given PSD $A$, outputs $N, M \in \mathbb{R}^{n \times k}$ satisfying with probability 99/100:
$$\|A - NM^T\|_F \le (1+\epsilon)\|A - A_k\|_F.$$

Compare to CW'13, which takes $O(\operatorname{nnz}(A)) + n \cdot \operatorname{poly}(k, 1/\epsilon)$ time.

If $k$ and $1/\epsilon$ are not too large compared to $\operatorname{nnz}(A)$, our runtime is significantly sublinear in the size of $A$.

lower bound for general matrices

For general matrices, $\Omega(\operatorname{nnz}(A))$ time is required.

Randomly place a single entry which dominates $A$'s Frobenius norm. Finding it with constant probability requires reading at least a constant fraction of the non-zero entries in $A$.

The lower bound holds for any approximation factor in the guarantee
$$\|A - NM^T\|_F \le (1+\epsilon)\|A - A_k\|_F,$$
and even rules out $o(\operatorname{nnz}(A))$ time for weaker guarantees such as
$$\|A - NM^T\|_F \le \|A - A_k\|_F + \epsilon\|A\|_F.$$

what about for psd matrices?

Observation: For PSD $A$ and any entry $a_{ij}$ we have
$$a_{ij} \le \max(a_{ii}, a_{jj}),$$
since otherwise $(e_i - e_j)^T A (e_i - e_j) = a_{ii} + a_{jj} - 2a_{ij} < 0$.

So we can find any hidden heavy entry by looking at its corresponding diagonal entries.

Question: How can we exploit additional structure arising from positive semidefiniteness to achieve sublinear runtime?

every psd matrix is a gram matrix

Very Simple Fact: Every PSD matrix $A \in \mathbb{R}^{n \times n}$ can be written as $A = B^T B$ for some $B \in \mathbb{R}^{n \times n}$.

$B$ can be any matrix square root of $A$: e.g., if we let $V \Lambda V^T$ be the eigendecomposition of $A$, we can set $B = \Lambda^{1/2} V^T$.

Letting $b_1, \dots, b_n$ be the columns of $B$, the entries of $A$ contain every pairwise dot product: $a_{ij} = b_i^T b_j$.

every psd matrix is a gram matrix

The fact that $A$ is a Gram matrix places a variety of geometric constraints on its entries. The heavy diagonal observation is just one example: by Cauchy-Schwarz,
$$a_{ij} = b_i^T b_j \le \sqrt{(b_i^T b_i)(b_j^T b_j)} = \sqrt{a_{ii} a_{jj}} \le \max(a_{ii}, a_{jj}).$$

Another View: $A$ contains a lot of information about the column span of $B$ in a very compressed form, with every pairwise dot product stored as $a_{ij}$.
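
A small NumPy check of the Gram-matrix view (added for illustration, not from the slides): build the square root $B = \Lambda^{1/2} V^T$ from the eigendecomposition of a random PSD matrix $A$, then verify $a_{ij} = b_i^T b_j$ and $|a_{ij}| \le \sqrt{a_{ii} a_{jj}}$.

import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((50, 50))
A = G @ G.T                                           # random PSD matrix

evals, V = np.linalg.eigh(A)                          # A = V diag(evals) V^T
B = np.diag(np.sqrt(np.clip(evals, 0, None))) @ V.T   # B = Lambda^{1/2} V^T

print(np.allclose(B.T @ B, A))                        # A = B^T B, i.e. a_ij = b_i^T b_j
d = np.diag(A)
print(np.all(np.abs(A) <= np.sqrt(np.outer(d, d)) + 1e-8))   # |a_ij| <= sqrt(a_ii a_jj)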

factor matrix low-rank approximation

Question: Can we compute a low-rank approximation of $B$ using $o(n^2)$ column dot products, i.e. $o(n^2)$ accesses to $A$?

Why? $B$ has the same (right) singular vectors as $A$, and its singular values are closely related: $\sigma_i(B) = \sqrt{\sigma_i(A)}$.

So the top $k$ singular vectors are the same for the two matrices. An optimal low-rank approximation for $B$ thus gives an optimal low-rank approximation for $A$.

Things will be messier once we introduce approximation, but this simple idea will lead to a sublinear time algorithm for $A$.
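
Continuing the toy example above (again an added illustration), one can check numerically that $\sigma_i(B) = \sqrt{\sigma_i(A)}$ and that the top-$k$ right singular subspaces of $A$ and $B$ coincide.

import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((50, 50))
A = G @ G.T
evals, V = np.linalg.eigh(A)
B = np.diag(np.sqrt(np.clip(evals, 0, None))) @ V.T   # square root of A

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
print(np.allclose(sB, np.sqrt(sA)))                   # sigma_i(B) = sqrt(sigma_i(A))

k = 5                                                 # same top-k right singular subspace
VA = np.linalg.svd(A)[2][:k]
VB = np.linalg.svd(B)[2][:k]
print(np.allclose(VA.T @ VA, VB.T @ VB, atol=1e-6))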

low-rank approximation via adaptive sampling

Theorem (Deshpande, Vempala '06): For any $B \in \mathbb{R}^{n \times n}$, there exists a subset of $\tilde O(k^2/\epsilon)$ columns whose span contains $Z \in \mathbb{R}^{n \times k}$ satisfying:
$$\|B - ZZ^T B\|_F \le (1+\epsilon)\|B - B_k\|_F.$$

Adaptive Sampling:
Initially, start with an empty column subset $S := \{\}$.
For $t = 1, \dots, \tilde O(k^2/\epsilon)$:
    Let $P_S$ be the projection onto the columns in $S$.
    Add $b_i$ to $S$ with probability $\dfrac{\|b_i - P_S b_i\|^2}{\sum_{j=1}^n \|b_j - P_S b_j\|^2}$.

adaptive sampling

Adaptive Sampling:
Initially, start with an empty column subset $S := \{\}$.
For $t = 1, \dots, \tilde O(k^2/\epsilon)$:
    Let $P_S$ be the projection onto the columns in $S$.
    Add $b_i$ to $S$ with probability $\dfrac{\|b_i - P_S b_i\|^2}{\sum_{j=1}^n \|b_j - P_S b_j\|^2}$.

In the first round, $S$ is empty and $P_S = 0$, so this probability is simply
$$\frac{\|b_i\|^2}{\sum_{j=1}^n \|b_j\|^2} = \frac{a_{ii}}{\operatorname{tr}(A)},$$
which can be read directly off the diagonal of $A$.
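
The same is true in later rounds: every quantity in adaptive sampling can be computed from entries of $A = B^T B$ alone, since $\|b_i\|^2 = a_{ii}$ and, for a selected index set $S$, $\|P_S b_i\|^2 = A_{S,i}^T A_{S,S}^{+} A_{S,i}$. Below is a short illustrative Python sketch of this idea (a simplification added here, not the paper's exact procedure: it samples one column per round and ignores the precise number of rounds and failure probabilities).

import numpy as np

def adaptive_column_sample(A, t, seed=0):
    # Sample t column indices of the implicit factor B, using only entries of A = B^T B.
    # Residuals: ||b_i - P_S b_i||^2 = a_ii - A[S,i]^T pinv(A[S,S]) A[S,i].
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    diag = np.diag(A).astype(float)
    S = []
    for _ in range(t):
        if S:
            A_Si = A[S, :]                            # rows A_{S,*}; by symmetry, the sampled columns
            A_SS_pinv = np.linalg.pinv(A[np.ix_(S, S)])
            proj = np.einsum("ij,ik,kj->j", A_Si, A_SS_pinv, A_Si)
            resid = np.maximum(diag - proj, 0.0)      # residual norms ||b_i - P_S b_i||^2
        else:
            resid = diag.copy()                       # first round: ||b_i||^2 = a_ii
        S.append(rng.choice(n, p=resid / resid.sum()))
    return S

# Toy usage: reads the diagonal plus the t sampled columns of a PSD matrix.
rng = np.random.default_rng(1)
G = rng.standard_normal((300, 8))
A = G @ G.T + 1e-2 * np.eye(300)                      # PSD, approximately low-rank
print(adaptive_column_sample(A, t=10))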

sublinear dot product algorithm

Theorem (Factor Matrix Low-Rank Approximation): There is an algorithm using $\tilde O(nk^2/\epsilon)$ accesses to $A = B^T B$ which computes $Z \in \mathbb{R}^{n \times k}$ satisfying with probability 99/100:
$$\|B - ZZ^T B\|_F \le (1+\epsilon)\|B - B_k\|_F.$$

How does this translate to a low-rank approximation of $A$ itself?

boosting to a psd matrix approximation

Lemma: If $\|B - ZZ^T B\|_F^2 \le \left(1 + \frac{\epsilon^{3/2}}{\sqrt{n}}\right)\|B - B_k\|_F^2$, then for $A = B^T B$:
$$\|A - B^T ZZ^T B\|_F^2 \le (1+\epsilon)\|A - A_k\|_F^2.$$

Since $Z$ lies in the span of the sampled columns of $B$, this approximation can be written entirely in terms of sampled columns of $A$: for a sampling matrix $S$ and a small matrix $C$,
$$\|A - AS\,C\,S^T A^T\|_F^2 \le (1+\epsilon)\|A - A_k\|_F^2.$$

This gives a low-rank approximation algorithm which accesses just
$$\tilde O\!\left(\frac{nk^2}{\epsilon^{3/2}/\sqrt{n}}\right) = n^{3/2} \cdot \operatorname{poly}(k, 1/\epsilon)$$
entries of $A$.

Our best algorithm accesses just $\tilde O(nk/\epsilon^{2.5})$ entries of $A$ and runs in $\tilde O(nk^2/\epsilon^4)$ time.

limitations of column sampling

Recall that our algorithm accesses the diagonal of $A$ along with $\tilde O(k^2/\epsilon')$ columns; with the boosted accuracy $\epsilon' = \epsilon^{3/2}/\sqrt{n}$, this is $\tilde O(k^2\sqrt{n})$ columns (suppressing the dependence on $\epsilon$).

If we take fewer columns, we can miss a $\sqrt{n} \times \sqrt{n}$ block which contains a constant fraction of $A$'s Frobenius norm.

column and row sampling

Solution: Sample both rows and columns of $A$.

Instead of adaptive sampling we use ridge leverage scores, which can also be computed using an iterative sampling scheme making $\tilde O(nk)$ accesses to $A$ (Musco, Musco '17).

Same intuition: select a diverse set of columns which span a near-optimal low-rank approximation of the matrix.

The sample $AS$ is a projection-cost-preserving sketch for $A$ [Cohen et al. '15, '17]: for any rank-$k$ projection $P$,
$$\|AS - PAS\|_F^2 = (1 \pm \epsilon)\|A - PA\|_F^2.$$

final algorithm

Recover the low-rank approximation using two-sided sampling and the projection-cost-preserving sketch property.
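
To make the two-sided idea concrete, here is a toy Nyström-style sketch (added for illustration and heavily simplified: columns are sampled by the diagonal weights $a_{ii}/\operatorname{tr}(A)$ rather than by ridge leverage scores, and no accuracy guarantee is claimed). It reads only the diagonal, a set of sampled columns, and their row/column intersection block, and assembles a rank-$k$ approximation from them; it is not the algorithm from the paper.

import numpy as np

def nystrom_rank_k(A, k, m, seed=0):
    # Toy Nystrom-style rank-k approximation of PSD A from a two-sided sample:
    # reads C = A[:, S] (n x m) and W = A[S, S] (m x m), returns C pinv_k(W) C^T.
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    probs = np.diag(A) / np.trace(A)
    S = rng.choice(n, size=m, replace=False, p=probs)
    C = A[:, S]
    W = A[np.ix_(S, S)]
    evals, evecs = np.linalg.eigh(W)                   # W is PSD
    idx = np.argsort(evals)[::-1][:k]                  # top-k eigenpairs of W
    vals, vecs = evals[idx], evecs[:, idx]
    inv_vals = np.where(vals > 1e-10, 1.0 / vals, 0.0)
    return C @ (vecs @ np.diag(inv_vals) @ vecs.T) @ C.T   # rank <= k

# Toy usage on a PSD matrix with a quickly decaying spectrum.
rng = np.random.default_rng(1)
G = rng.standard_normal((400, 10))
A = G @ G.T + 1e-3 * np.eye(400)
A_hat = nystrom_rank_k(A, k=10, m=60)
print(np.linalg.norm(A - A_hat, "fro") / np.linalg.norm(A, "fro"))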

summary of main ideas

View each entry of $A$ as encoding a large amount of information about its square root $B$; in particular $a_{ij} = b_i^T b_j$.

Use this view to find a low-rank approximation to $B$ using sublinear accesses to $A$.

Since $B$ has the same singular vectors as $A$ and $\sigma_i(B) = \sqrt{\sigma_i(A)}$, a low-rank approximation of $B$ can be used to find one for $A$, albeit with a $\sqrt{n}$ factor loss in quality.

Obtain near-optimal complexity using ridge leverage scores to sample both rows and columns of $A$.

open questions

What else can be done for PSD matrices? We give applications to ridge regression, but what other linear algebraic problems require a second look?

Are there other natural classes of matrices that admit sublinear time low-rank approximation? Starting points are matrices that break the $\Omega(\operatorname{nnz}(A))$ time lower bound: e.g. binary matrices, diagonally dominant matrices.

What can we do when we have PSD matrices with additional structure, e.g. kernel matrices?

Thanks! Questions?
