randomized block krylov methods for stronger and faster approximate svd


1 randomized block krylov methods for stronger and faster approximate svd. Cameron Musco and Christopher Musco, December 2, 2015. Massachusetts Institute of Technology, EECS

2–7 singular value decomposition. For an n × d matrix A, A = U Σ V^T, where U holds the left singular vectors, Σ = diag(σ_1, ..., σ_d) the singular values with σ_1 ≥ σ_2 ≥ ... ≥ σ_{d-1} ≥ σ_d, and V the right singular vectors. Extremely important primitive for dimensionality reduction, low-rank approximation, PCA, etc.

u_i = argmax_{x : ‖x‖ = 1, x ⊥ u_1, ..., u_{i-1}} x^T A A^T x

A_k = U_k Σ_k V_k^T = argmin_{B : rank(B) = k} ‖A − B‖_F = argmin_{B : rank(B) = k} ‖A − B‖_2, and this optimum is achieved by U_k U_k^T A.

8–13 standard svd approximation metrics

Frobenius Norm Low-Rank Approximation (weak): ‖A − Ũ_k Ũ_k^T A‖_F ≤ (1 + ϵ) ‖A − U_k U_k^T A‖_F

Note ‖A‖_F^2 = Σ_{i=1}^d σ_i^2 and ‖A − U_k U_k^T A‖_F^2 = ‖A − A_k‖_F^2 = Σ_{i=k+1}^d σ_i^2. [Plot: singular value σ_i vs. index i for the SNAP/Enron graph.] For many datasets literally any Ũ_k would work!

Spectral Norm Low-Rank Approximation (stronger): ‖A − Ũ_k Ũ_k^T A‖_2 ≤ (1 + ϵ) ‖A − U_k U_k^T A‖_2 = (1 + ϵ) σ_{k+1}

Per Vector Principal Component Error (strongest): ũ_i^T A A^T ũ_i ≥ (1 − ϵ) u_i^T A A^T u_i for all i ≤ k.
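To make these three metrics concrete, here is a small NumPy sketch (ours, not from the slides) that measures them for a candidate orthonormal basis Ũ_k; the function name and the normalization of the per vector error by σ_{k+1}^2 are our own choices, following the bound stated later in the talk.

```python
import numpy as np

def svd_error_metrics(A, U_tilde, k):
    """Measure the three approximation metrics for an n x k orthonormal basis U_tilde."""
    # Residual after projecting A onto span(U_tilde), and the optimal rank-k residual.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    R = A - U_tilde @ (U_tilde.T @ A)
    R_opt = A - U[:, :k] @ (U[:, :k].T @ A)

    frob_ratio = np.linalg.norm(R, 'fro') / np.linalg.norm(R_opt, 'fro')   # want <= 1 + eps
    spec_ratio = np.linalg.norm(R, 2) / s[k]                               # s[k] = sigma_{k+1} = ||A - A_k||_2
    # Per vector error: u_tilde_i^T A A^T u_tilde_i should be close to sigma_i^2
    # (assumes the columns of U_tilde are ordered to match u_1, ..., u_k).
    rayleigh = np.sum((A.T @ U_tilde) ** 2, axis=0)
    per_vec = np.max(np.abs(rayleigh - s[:k] ** 2)) / s[k] ** 2            # want <= eps
    return frob_ratio, spec_ratio, per_vec
```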

14–15 main research question. Classic full SVD algorithms (e.g. the QR algorithm) achieve all of these goals in roughly O(nd^2) time (the error dependence is log log(1/ϵ) on lower order terms). Unfortunately, this is much too slow for many data sets. How fast can we approximately compute just u_1, ..., u_k?

16–17 frobenius norm error. Weak approximation algorithms:

Strong Rank Revealing QR (Gu, Eisenstat 1996): ‖A − Ũ_k Ũ_k^T A‖_F ≤ poly(n, k) ‖A − A_k‖_F in time O(ndk)

Sparse Subspace Embeddings (Clarkson, Woodruff 2013): ‖A − Ũ_k Ũ_k^T A‖_F ≤ (1 + ϵ) ‖A − A_k‖_F in time O(nnz(A)) + Õ(nk^2/ϵ^4)

18–20 iterative svd algorithms. Iterative methods are the only game in town for stronger guarantees. Runtime is approximately:

O(nnz(A) · k · #iterations) ≤ O(ndk · #iterations) << O(nd^2)

Power method (Müntz 1913, von Mises 1929). Krylov/Lanczos methods (Lanczos 1950). Stochastic methods?

21–30 power method review

Traditional Power Method: x_0 ∈ R^d, x_{i+1} ← A x_i / ‖A x_i‖

Block Power Method (Simultaneous/Subspace Iteration): X_0 ∈ R^{d×k}, X_{i+1} ← orthonormalize(A X_i)

[Animation: the iterates X_1, X_2, ..., X_9 gradually rotating toward the span of the top singular vectors U_2 of A.]
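As a reference point, here is a minimal NumPy sketch of the block power method described above; since the slides treat A loosely as square/symmetric, this version applies A^T A in each iteration so that it also handles rectangular A (the function name and argument defaults are our own).

```python
import numpy as np

def block_power_method(A, k, iters, seed=0):
    """Simultaneous (block power) iteration: X_{i+1} = orthonormalize(A^T A X_i)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    X = rng.standard_normal((d, k))            # random start block X_0
    for _ in range(iters):
        X, _ = np.linalg.qr(A.T @ (A @ X))     # one pass of A and A^T, then re-orthonormalize
    U_tilde, _ = np.linalg.qr(A @ X)           # approximate top-k left singular vectors
    return U_tilde[:, :k]
```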

31–35 power method runtime. Runtime for the Block Power Method is roughly:

O( nnz(A) · k · log(d/ϵ) / ((σ_k − σ_{k+1})/σ_k) )

Linear dependence on the singular value gap: gap = (σ_k − σ_{k+1})/σ_k. While this gap is traditionally assumed to be constant, it is the dominant factor in the iteration count for many datasets.

36–40 typical gap values. [Plot: singular values σ_i vs. index i for the Slashdot social network graph (Stanford Network Analysis Project).]

Median value of gap_k = (σ_k − σ_{k+1})/σ_k for k ≤ 2: .9. Minimum value of gap_k = (σ_k − σ_{k+1})/σ_k for k ≤ 2: .4, giving Runtime = O(25, · nnz(A) · k · log(d/ϵ)).

41–44 gap independent bounds. Recent work shows the Block Power Method (with randomized start vectors) gives:

‖A − Ũ_k Ũ_k^T A‖_2 ≤ (1 + ϵ) ‖A − A_k‖_2 in time O( nnz(A) · k · log(d) / ϵ )

This improves on classical bounds when ϵ > (σ_k − σ_{k+1})/σ_k. Long series of refinements and improvements: Rokhlin, Szlam, Tygert 2009; Halko, Martinsson, Tropp 2011; Boutsidis, Drineas, Magdon-Ismail 2011; Witten, Candès 2014; Woodruff 2014.

45–48 randomized block power method. The randomized Block Power Method is widely cited and implemented: a simple algorithm with simple bounds. Implementations include redsvd, ScaleNLP (Breeze), libskylark, GNU Octave, and MLlib. But in the numerical linear algebra community, Krylov/Lanczos methods have long been preferred over power iteration.

49–51 lanczos/krylov acceleration

Power Method: O( nnz(A) · k · log(d/ϵ) / ((σ_k − σ_{k+1})/σ_k) ) gap dependent; O( nnz(A) · k · log(d) / ϵ ) gap independent.

Krylov Methods: O( nnz(A) · k · log(d/ϵ) / √((σ_k − σ_{k+1})/σ_k) ) gap dependent; there was no gap independent analysis of Krylov methods. Our contribution: O( nnz(A) · k · log(d) / √ϵ ).

52–56 our main result. A simple randomized Block Krylov Iteration gives all three of our target error bounds in time

O( nnz(A) · k · log(d) / √ϵ ):

‖A − Ũ_k Ũ_k^T A‖_2 ≤ (1 + ϵ) σ_{k+1} and |ũ_i^T A A^T ũ_i − σ_i^2| ≤ ϵ σ_{k+1}^2.

Gives a runtime bound that is independent of A. Beats the runtime of the Block Power Method: a 1/√ϵ dependence instead of 1/ϵ. Improves classic Lanczos bounds when (σ_k − σ_{k+1})/σ_k < ϵ.

57 understanding gap dependence. First step: where does gap dependence actually come from?

58 understanding gap dependence. To prove guarantees like ũ_i^T A A^T ũ_i ≥ (1 − ϵ) σ_i^2, classical analysis argues about convergence to A's true singular vectors. [Figure: the error between u_i and ũ_i.] Traditional objective function: ‖u_i − ũ_i‖_2.

59–61 understanding gap dependence. Traditional objective function: ‖u_i − ũ_i‖_2. A simple potential function, easy to work with. It can be used to prove strong per-vector error or spectral norm guarantees for ‖A − Ũ_k Ũ_k^T A‖_2. But it inherently requires an iteration count that depends on singular value gaps.

62–70 understanding gap dependence. Traditional objective function: ‖u_i − ũ_i‖_2. [Animation: block power iterates X_1, X_2, ..., X_9; the distance between the iterate and the true singular vectors U_2 shrinks only slowly from one iteration to the next.]

71–72 key insight. Convergence becomes less necessary precisely when it is difficult to achieve! [Figure: two visibly different bases that span nearly the same subspace as U_2.] Minimizing ‖u_i − ũ_i‖_2 is sufficient, but far from necessary.

73–79 modern approach to analysis. Iterative methods viewed as denoising procedures for random sketching methods.

Choose G ∼ N(0, 1)^{d×k}. If A has rank k then span(AG) = span(A), so taking Ũ_k = span(AG) = span(A) gives ‖A − Ũ_k Ũ_k^T A‖_F = ‖A − A‖_F = 0.

If A does not have rank k then we have error due to A − A_k: with Ũ_k = span(AG),

‖A − Ũ_k Ũ_k^T A‖_F ≤ poly(d) ‖A − A_k‖_F.

This gives an error bound for a single power method iteration, but it is meaningless unless ‖A − A_k‖_F (the "tail noise") is very small.
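A tiny NumPy experiment (ours) illustrating the two cases above: when A has exact rank k the single sketch AG already captures span(A), and once a tail A − A_k is added the same one-shot sketch only achieves error on the order of ‖A − A_k‖_F (the dimensions and noise level below are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 200, 10
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))   # exactly rank k
G = rng.standard_normal((d, k))                                  # Gaussian sketching matrix

U_tilde, _ = np.linalg.qr(A @ G)                 # span(AG) = span(A) when rank(A) = k
print(np.linalg.norm(A - U_tilde @ (U_tilde.T @ A), 'fro'))      # ~ 0 up to round-off

A_noisy = A + 0.1 * rng.standard_normal((n, d))  # now ||A - A_k||_F > 0 ("tail noise")
U_tilde, _ = np.linalg.qr(A_noisy @ G)
print(np.linalg.norm(A_noisy - U_tilde @ (U_tilde.T @ A_noisy), 'fro'))
```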

80–85 modern approach to analysis. How to avoid tail noise? Apply the sketching method to A^q instead. This is exactly what the Block Power Method does: G, AG, A^2 G, ..., A^q G, with Ũ_k = span(A^q G).

Assuming A is symmetric, if A = U Σ U^T then A^q = U Σ^q U^T, so the singular values σ_1, ..., σ_d become σ_1^q, ..., σ_d^q. [Plot: spectrum of A vs. spectrum of A^q.]

‖A^q − A^q_k‖_F^2 = Σ_{i=k+1}^d σ_i^{2q} is extremely small.
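A quick numeric check (ours, with an arbitrary synthetic spectrum) of why powering kills the tail: the raw tail mass Σ_{i>k} σ_i^2 can be large, while Σ_{i>k} σ_i^{2q} is negligible relative to the top singular values of A^q.

```python
import numpy as np

sigma = 1.0 / np.arange(1, 1001) ** 0.5        # a slowly decaying synthetic spectrum
k, q = 10, 30

tail = np.sum(sigma[k:] ** 2)                  # ||A - A_k||_F^2 : not small
tail_q = np.sum(sigma[k:] ** (2 * q))          # ||A^q - A^q_k||_F^2 : tiny
print(tail / sigma[0] ** 2)                    # a few units relative to sigma_1^2
print(tail_q / sigma[0] ** (2 * q))            # astronomically small relative to sigma_1^(2q)
```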

86–89 modern approach to analysis. q = Õ(1/ϵ) ensures that any singular value below σ_{k+1} becomes extremely small in comparison to any singular value above (1 + ϵ) σ_{k+1}: (1 − ϵ)^{O(1/ϵ)} << 1.

So Ũ_k = span(A^q G) must align well with large (but not the largest!) singular vectors of A^q to achieve even coarse Frobenius norm error:

‖A^q − Ũ_k Ũ_k^T A^q‖_F ≤ poly(d) ‖A^q − A^q_k‖_F

A and A^q have the same singular vectors, so Ũ_k is good for A.

90 modern approach to analysis. We use new tools for converting very small Frobenius norm low-rank approximation error to spectral norm and per vector error, without arguing about convergence of ũ_i to u_i.

91–94 krylov acceleration. There are better polynomials than A^q for denoising A. [Plot: the monomial x^{O(1/ϵ)} vs. the Chebyshev polynomial T_{O(1/√ϵ)}(x).] With Chebyshev polynomials we only need degree q = Õ(1/√ϵ).
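The degree saving can be checked numerically. The sketch below (ours) compares the separation that the monomial x^q achieves at 1 + ϵ against a Chebyshev polynomial of degree only about 1/√ϵ, using the identity T_q(1 + ϵ) = cosh(q · arccosh(1 + ϵ)); both polynomials stay bounded by 1 on [−1, 1].

```python
import numpy as np

eps = 0.01
q_power = int(round(1 / eps))            # degree needed by the monomial x^q
q_cheb = int(round(1 / np.sqrt(eps)))    # degree of the Chebyshev polynomial

growth_power = (1 + eps) ** q_power                    # (1 + eps)^(1/eps) ~ e
growth_cheb = np.cosh(q_cheb * np.arccosh(1 + eps))    # T_q(1 + eps) for q ~ 1/sqrt(eps)

print(q_power, growth_power)    # degree 100, separation ~ 2.7
print(q_cheb, growth_cheb)      # degree 10,  separation ~ 2.2
```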

95–98 krylov acceleration. The Chebyshev polynomial T_q(A) has a very small tail, so returning Ũ_k = span(T_q(A) G) would suffice. Furthermore, block power iteration computes (at intermediate steps) all of the components needed for

T_q(A) G = c_0 G + c_1 AG + ... + c_q A^q G,

namely G, AG, A^2 G, ..., A^q G. Block Krylov Iteration: K = [G, AG, A^2 G, ..., A^q G] (the Krylov subspace).

99–101 implicit use of acceleration polynomial. But... we can't explicitly compute T_q(A), since its parameters depend on A's (unknown) singular values. Solution: returning the best Ũ_k in the span of K can only be better than returning span(T_q(A) G).

102–105 implicit use of acceleration polynomial. What is the best Ũ_k? A surprisingly difficult question. For the Block Power Method we did not need to consider this: Ũ_k = span(A^q G) was the only option. In classical Lanczos/Krylov analysis, convergence to the true singular vectors also lets us avoid this issue. Instead, use the Rayleigh-Ritz procedure.

106–108 rayleigh-ritz post-processing. Project A onto K and take the top k singular vectors (using an accurate classical method): Ũ_k = span((P_K A)_k).

Block Krylov Iteration: K = [G, AG, A^2 G, ..., A^q G] (Krylov subspace), Ũ_k = span((P_K A)_k) (the best solution in the Krylov subspace). Equivalent to the classic Block Lanczos algorithm in exact arithmetic.
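Putting the pieces together, here is a compact NumPy sketch (our own rendering, not the paper's verbatim pseudocode) of randomized Block Krylov Iteration with Rayleigh-Ritz post-processing; for a rectangular A it builds the Krylov blocks with powers of A A^T applied to AG, orthonormalizes the whole basis explicitly, and extracts the top k singular vectors of the projected matrix.

```python
import numpy as np

def block_krylov_svd(A, k, q, seed=0):
    """Randomized Block Krylov Iteration with Rayleigh-Ritz post-processing (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    G = rng.standard_normal((d, k))
    B, blocks = A @ G, []
    for _ in range(q + 1):                       # K = [AG, (AA^T)AG, ..., (AA^T)^q AG]
        blocks.append(B)
        B = A @ (A.T @ B)
    Q, _ = np.linalg.qr(np.hstack(blocks))       # explicit orthonormal basis for the Krylov subspace
    # Rayleigh-Ritz: project A onto span(Q), take the top-k singular vectors of the
    # small projected matrix with an accurate classical SVD, and lift back to R^n.
    U_small, _, _ = np.linalg.svd(Q.T @ A, full_matrices=False)
    return Q @ U_small[:, :k]                    # approximate top-k left singular vectors of A
```

In line with the stability slide at the end of the talk, the basis here is formed and orthonormalized explicitly (a single QR on the stacked blocks) rather than via the three-term Lanczos recurrence; per-block re-orthonormalization can be added if k(q+1) becomes large.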

109–111 rayleigh-ritz post-processing. This post-processing step provably gives an optimal Ũ_k for Frobenius norm low-rank approximation error, and our entire analysis relied on converting very small Frobenius norm error to strong spectral norm and per vector error! Take away: the modern denoising analysis gives new insight into the practical effectiveness of Rayleigh-Ritz projection.

112 final implementation. Similar to the randomized Block Power Method: extremely simple (pseudocode in the paper). [Side-by-side pseudocode of the Block Power Method and Block Krylov Iteration.]

113–114 performance. Block Krylov beats the Block Power Method definitively for small ϵ. [Plots: error ϵ vs. iteration count q, and error ϵ vs. runtime in seconds, for Block Krylov and the Power Method under the Frobenius, spectral, and per vector error metrics; 20 Newsgroups dataset, k = 2.]

115 final comments. Main takeaway: the first gap independent bound for Krylov methods,

O( nnz(A) · k · log(d) / √((σ_k − σ_{k+1})/σ_k) )  vs.  O( nnz(A) · k · log(d) / √ϵ ).

Open questions: Full stability analysis. A master error metric for gap independent results. Gap independent bounds for other methods (e.g. online and stochastic PCA). Analysis for small space/restarted block Krylov methods?

116 Thank you!

117 stability. Lanczos algorithms are often considered to be unstable, largely because a recurrence is used to efficiently compute a basis for the Krylov subspace on the fly. Since our subspace is small, we do not use the recurrence; computing the basis explicitly avoids serious stability issues. There is some loss of orthogonality between blocks, but it only occurs once the algorithm has converged, and we can show that it is not an issue in practice.

118 stability. On poorly conditioned matrices, Randomized Block Krylov Iteration still significantly outperforms the Block Power Method. [Plot: per vector error ϵ vs. iterations q for Block Krylov and Block Power on a poorly conditioned matrix.]
