randomized block krylov methods for stronger and faster approximate svd
1 randomized block krylov methods for stronger and faster approximate svd Cameron Musco and Christopher Musco, December 2, 2015, Massachusetts Institute of Technology, EECS
2–7 singular value decomposition: A = U Σ V^T, where A is n × d, the columns of U are the left singular vectors, Σ = diag(σ_1, σ_2, ..., σ_{d−1}, σ_d) holds the singular values, and the columns of V are the right singular vectors. Extremely important primitive for dimensionality reduction, low-rank approximation, PCA, etc. u_i = argmax_{x : ‖x‖ = 1, x ⊥ u_1, ..., u_{i−1}} x^T A A^T x. The truncated SVD A_k = U_k Σ_k V_k^T is the best rank-k approximation: A_k = argmin_{B : rank(B) = k} ‖A − B‖_F = argmin_{B : rank(B) = k} ‖A − B‖_2, and A_k = U_k U_k^T A.
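As a concrete reference point, here is a minimal NumPy sketch (assumed, not from the slides) of the truncated SVD and a check of the identity A_k = U_k U_k^T A; truncated_svd is an illustrative helper name.

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation A_k = U_k Sigma_k V_k^T (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# A_k is also the projection of A onto its top k left singular vectors.
A = np.random.randn(100, 40)
Uk, sk, Vkt = truncated_svd(A, k=5)
A_k = Uk @ np.diag(sk) @ Vkt
assert np.allclose(A_k, Uk @ (Uk.T @ A))
```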
8–13 standard svd approximation metrics. Frobenius Norm Low-Rank Approximation (weak): ‖A − Ũ_k Ũ_k^T A‖_F ≤ (1+ϵ) ‖A − U_k U_k^T A‖_F. Note that ‖A‖_F^2 = Σ_{i=1}^d σ_i^2 and ‖A − U_k U_k^T A‖_F^2 = ‖A − A_k‖_F^2 = Σ_{i=k+1}^d σ_i^2. [plot: singular value σ_i vs. index i, SNAP/Enron] For many datasets literally any Ũ_k would work! Spectral Norm Low-Rank Approximation (stronger): ‖A − Ũ_k Ũ_k^T A‖_2 ≤ (1+ϵ) ‖A − U_k U_k^T A‖_2 = (1+ϵ) σ_{k+1}. Per Vector Principal Component Error (strongest): ũ_i^T A A^T ũ_i ≥ (1−ϵ) u_i^T A A^T u_i for all i ≤ k.
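A minimal NumPy sketch (not from the paper) that evaluates a candidate orthonormal basis Ũ_k against the three metrics above; svd_error_metrics and its normalizations are illustrative choices, and the exact SVD is computed only for comparison on small test matrices.

```python
import numpy as np

def svd_error_metrics(A, U_tilde, k):
    """Evaluate an orthonormal candidate U_tilde (n x k) against the three
    metrics above, relative to the exact SVD (for small test matrices)."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    resid = A - U_tilde @ (U_tilde.T @ A)            # A - U~_k U~_k^T A
    opt_resid = A - U[:, :k] @ (U[:, :k].T @ A)      # A - U_k U_k^T A

    frob_ratio = np.linalg.norm(resid, 'fro') / np.linalg.norm(opt_resid, 'fro')
    spectral_ratio = np.linalg.norm(resid, 2) / s[k]         # vs. sigma_{k+1}
    rayleigh = np.sum((A.T @ U_tilde) ** 2, axis=0)          # u~_i^T A A^T u~_i
    per_vector = np.max(np.abs(rayleigh - s[:k] ** 2)) / s[k] ** 2
    return frob_ratio, spectral_ratio, per_vector
```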
14–15 main research question. Classic Full SVD Algorithms (e.g. the QR Algorithm) achieve all of these goals in roughly O(nd^2) time (the error dependence is log log(1/ϵ) on lower order terms). Unfortunately, this is much too slow for many data sets. How fast can we approximately compute just u_1, ..., u_k?
16–17 frobenius norm error. Weak Approximation Algorithms: Strong Rank Revealing QR (Gu, Eisenstat 1996): ‖A − Ũ_k Ũ_k^T A‖_F ≤ poly(n, k) ‖A − A_k‖_F in time O(ndk). Sparse Subspace Embeddings (Clarkson, Woodruff 2013): ‖A − Ũ_k Ũ_k^T A‖_F ≤ (1+ϵ) ‖A − A_k‖_F in time O(nnz(A)) + Õ(nk^2/ϵ^4).
18–20 iterative svd algorithms. Iterative methods are the only game in town for stronger guarantees. Runtime is approximately O(nnz(A) · k · #iterations) ≤ O(ndk · #iterations) << O(nd^2). Power method (Müntz 1913, von Mises 1929). Krylov/Lanczos methods (Lanczos 1950). Stochastic methods?
21–30 power method review. Traditional Power Method: x_0 ∈ R^d, x_{i+1} ← A x_i / ‖A x_i‖. Block Power Method (Simultaneous/Subspace Iteration): X_0 ∈ R^{d×k}, X_{i+1} ← orthonormalize(A X_i). [animation: successive iterates X_1, X_2, ..., X_9 gradually aligning with the top singular vectors U.]
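A minimal NumPy sketch of the block power method just described, with a random Gaussian start and QR orthonormalization; block_power_method is an illustrative name, not the authors' code, and it powers A A^T so that it applies to rectangular A.

```python
import numpy as np

def block_power_method(A, k, iters):
    """Randomized block power method (subspace iteration), a sketch of the
    scheme above: random Gaussian start, repeated multiplication by A A^T,
    re-orthonormalizing with QR after every step."""
    n, d = A.shape
    X, _ = np.linalg.qr(A @ np.random.randn(d, k))   # random start in the column space of A
    for _ in range(iters):
        X, _ = np.linalg.qr(A @ (A.T @ X))           # one (A A^T) application, then orthonormalize
    return X                                          # n x k orthonormal, approximates U_k
```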
31–35 power method runtime. Runtime for the Block Power method is roughly O( nnz(A) · k · log(d/ϵ) / ((σ_k − σ_{k+1})/σ_k) ). Linear dependence on the singular value gap: gap = (σ_k − σ_{k+1})/σ_k. While this gap is traditionally assumed to be constant, it is the dominant factor in the iteration count for many datasets.
36–40 typical gap values. Stanford Network Analysis Project, Slashdot social network. [plot: singular value σ_i vs. index i] The median value of gap_k = (σ_k − σ_{k+1})/σ_k across the plotted values of k is already small, and the minimum is roughly .00004, so plugging it into the bound gives Runtime = O(25,000 · nnz(A) k log(d/ϵ)).
41–44 gap independent bounds. Recent work shows the Block Power method (with randomized start vectors) gives ‖A − Ũ_k Ũ_k^T A‖_2 ≤ (1+ϵ) ‖A − A_k‖_2 in time O(nnz(A) · k · (log d)/ϵ). Improves on classical bounds when ϵ > (σ_k − σ_{k+1})/σ_k. Long series of refinements and improvements: Rokhlin, Szlam, Tygert 2009; Halko, Martinsson, Tropp 2011; Boutsidis, Drineas, Magdon-Ismail 2011; Witten, Candès 2014; Woodruff 2014.
45–48 randomized block power method. The Randomized Block Power Method is widely cited and implemented: a simple algorithm with simple bounds. Implementations include redsvd, ScaleNLP (Breeze), libskylark, GNU Octave, and MLlib. But in the numerical linear algebra community, Krylov/Lanczos methods have long been preferred over power iteration.
49–51 lanczos/krylov acceleration. Gap-dependent bounds: Power Method O( nnz(A) k log(d/ϵ) / ((σ_k − σ_{k+1})/σ_k) ), Krylov Methods O( nnz(A) k log(d/ϵ) / √((σ_k − σ_{k+1})/σ_k) ). Gap-independent bounds: Power Method O(nnz(A) k (log d)/ϵ); for Krylov methods there was no gap independent analysis, and O(nnz(A) k (log d)/√ϵ) is our contribution.
52–56 our main result. A simple randomized Block Krylov Iteration gives all three of our target error bounds in time O(nnz(A) · k · (log d)/√ϵ): ‖A − Ũ_k Ũ_k^T A‖_2 ≤ (1+ϵ) σ_{k+1} and |ũ_i^T A A^T ũ_i − σ_i^2| ≤ ϵ σ_{k+1}^2. Gives a runtime bound that is independent of A. Beats the runtime of the Block Power Method: a 1/√ϵ rather than 1/ϵ iteration dependence. Improves classic Lanczos bounds when (σ_k − σ_{k+1})/σ_k < ϵ.
57 understanding gap dependence. First Step: Where does gap dependence actually come from?
58 understanding gap dependence. To prove guarantees like ũ_i^T A A^T ũ_i ≥ (1−ϵ) σ_i^2, classical analysis argues about convergence to A's true singular vectors. [diagram: the error between u_i and ũ_i] Traditional objective function: ‖u_i − ũ_i‖_2.
59–61 understanding gap dependence. Traditional objective function: ‖u_i − ũ_i‖_2. A simple potential function, easy to work with. Can be used to prove strong per-vector error or spectral norm guarantees for ‖A − Ũ_k Ũ_k^T A‖_2. But it inherently requires an iteration count that depends on singular value gaps.
62–70 understanding gap dependence. Traditional objective function: ‖u_i − ũ_i‖_2. [animation: block power iterates X_1 through X_9, with their alignment to the true singular vectors U improving only slowly across iterations.]
71–72 key insight. Convergence becomes less necessary precisely when it is difficult to achieve! [diagram] Minimizing ‖u_i − ũ_i‖_2 is sufficient, but far from necessary.
73–79 modern approach to analysis. Iterative methods are viewed as denoising procedures for Random Sketching methods. Choose G ∼ N(0,1)^{d×k}. If A is rank k then span(AG) = span(A), so taking Ũ_k = span(AG) = span(A) gives ‖A − Ũ_k Ũ_k^T A‖_F = ‖A − A‖_F = 0. If A is not rank k then we have error due to A − A_k: with Ũ_k = span(AG), ‖A − Ũ_k Ũ_k^T A‖_F ≤ poly(d) · ‖A − A_k‖_F. Gives an error bound for a single power method iteration. Meaningless unless ‖A − A_k‖_F (the "tail noise") is very small.
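A sketch of this single-iteration sketch-and-solve step, assuming NumPy; one_shot_sketch is an illustrative name and the coarse poly(d) guarantee above is the only thing it is meant to achieve.

```python
import numpy as np

def one_shot_sketch(A, k):
    """One randomized sketching step: an orthonormal basis for span(AG),
    G ~ N(0,1)^{d x k}. Exact when rank(A) <= k; otherwise the error is
    driven by the tail A - A_k."""
    n, d = A.shape
    G = np.random.randn(d, k)
    Q, _ = np.linalg.qr(A @ G)
    return Q

A = np.random.randn(200, 100)
Q = one_shot_sketch(A, k=10)
frob_err = np.linalg.norm(A - Q @ (Q.T @ A), 'fro')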
80–85 modern approach to analysis. How to avoid tail noise? Apply the sketching method to A^q instead. This is exactly what the Block Power Method does: G → AG → A^2 G → ... → A^q G, Ũ_k = span(A^q G). Assuming A is symmetric, if A = U Σ U^T then A^q = U Σ^q U^T, i.e. A^q has singular values σ_1^q, σ_2^q, ..., σ_{d−1}^q, σ_d^q with the same singular vectors. [plot: spectrum of A vs. spectrum of A^q] ‖A^q − A^q_k‖_F^2 = Σ_{i=k+1}^d σ_i^{2q} is extremely small.
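For completeness, the stated identity uses only the orthonormality U^T U = I of the singular vectors:

```latex
A^2 = (U \Sigma U^T)(U \Sigma U^T) = U \Sigma (U^T U) \Sigma U^T = U \Sigma^2 U^T,
\qquad \text{and by induction} \qquad A^q = U \Sigma^q U^T .
```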
86–89 modern approach to analysis. q = Õ(1/ϵ) ensures that any singular value below σ_{k+1} becomes extremely small in comparison to any singular value above (1+ϵ) σ_{k+1}: (1−ϵ)^{O(1/ϵ)} << 1. Ũ_k = span(A^q G) must align well with large (but not the largest!) singular vectors of A^q to achieve even coarse Frobenius norm error: ‖A^q − Ũ_k Ũ_k^T A^q‖_F ≤ poly(d) ‖A^q − A^q_k‖_F. A and A^q have the same singular vectors, so Ũ_k is good for A.
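A short worked bound behind the q = Õ(1/ϵ) claim, assuming ϵ ≤ 1 (the log d factor is what the Õ hides):

```latex
\left(\frac{\sigma_{k+1}}{(1+\epsilon)\,\sigma_{k+1}}\right)^{q}
  = (1+\epsilon)^{-q}
  \le e^{-\epsilon q / 2}
  \le \frac{1}{\operatorname{poly}(d)}
  \quad \text{for } q = O\!\left(\frac{\log d}{\epsilon}\right).
```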
90 modern approach to analysis. We use new tools for converting very small Frobenius norm low-rank approximation error into spectral norm and per vector error, without arguing about convergence of ũ_i to u_i.
91–94 krylov acceleration. There are better polynomials than A^q for denoising A. [plot: x^{O(1/ϵ)} vs. the Chebyshev polynomial T_{O(1/√ϵ)}(x)] With Chebyshev polynomials we only need degree q = Õ(1/√ϵ).
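A quick numerical check of this separation (a sketch; for x ≥ 1 the Chebyshev polynomial satisfies T_q(x) = cosh(q · arccosh(x)), and both polynomials are bounded by 1 in magnitude on [−1, 1]):

```python
import numpy as np

# Separation achieved at the same degree q: the monomial x^q vs. the
# Chebyshev polynomial T_q(x), which grows much faster just above 1.
eps, q = 0.01, 20
monomial_gain = (1 + eps) ** q                       # ~1.22
chebyshev_gain = np.cosh(q * np.arccosh(1 + eps))    # ~8.5
print(monomial_gain, chebyshev_gain)
```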
95–98 krylov acceleration. The Chebyshev polynomial T_q(A) has a very small tail, so returning Ũ_k = span(T_q(A)G) would suffice. Furthermore, block power iteration computes (at intermediate steps) all of the components needed for T_q(A)G = c_0 G + c_1 AG + ... + c_q A^q G, namely G, AG, A^2 G, ..., A^q G. Block Krylov Iteration: K = [G, AG, A^2 G, ..., A^q G] (the Krylov subspace).
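A minimal sketch of building this Krylov matrix with NumPy, for symmetric A as in the slides; krylov_block is an illustrative helper with no stabilization.

```python
import numpy as np

def krylov_block(A, G, q):
    """Build K = [G, AG, A^2 G, ..., A^q G] for symmetric A, as in the slides.
    Any combination c_0 G + c_1 AG + ... + c_q A^q G, in particular T_q(A) G,
    lies in the column span of K."""
    blocks = [G]
    for _ in range(q):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)              # d x (q+1)k
```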
99–101 implicit use of acceleration polynomial. But... we can't explicitly compute T_q(A), since its parameters depend on A's (unknown) singular values. Solution: returning the best Ũ_k in the span of K can only be better than returning span(T_q(A)G).
102–105 implicit use of acceleration polynomial. What is the best Ũ_k? Surprisingly difficult question. For the Block Power Method we did not need to consider this: Ũ_k = span(A^q G) was the only option. In classical Lanczos/Krylov analysis, convergence to the true singular vectors also lets us avoid this issue. Use the Rayleigh-Ritz procedure.
106–108 rayleigh-ritz post-processing. Project A onto K and take the top k singular vectors (using an accurate classical method): Ũ_k = span((P_K A)_k). Block Krylov Iteration: K = [G, AG, A^2 G, ..., A^q G] (the Krylov subspace), Ũ_k = span((P_K A)_k) (the best solution in the Krylov subspace). Equivalent to the classic Block Lanczos algorithm in exact arithmetic.
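A minimal NumPy sketch of the full pipeline as described: random Gaussian start, block Krylov matrix, explicit QR, and Rayleigh-Ritz extraction. This is an illustrative reconstruction for general rectangular A (it powers A A^T), with block_krylov_svd an assumed name, not the authors' reference implementation.

```python
import numpy as np

def block_krylov_svd(A, k, q):
    """Randomized block Krylov iteration with Rayleigh-Ritz post-processing.
    Returns an n x k orthonormal basis approximating the top left singular vectors."""
    n, d = A.shape
    G = np.random.randn(d, k)
    block = A @ G
    blocks = [block]
    for _ in range(q):
        block = A @ (A.T @ block)         # next block: (A A^T)^i A G
        blocks.append(block)
    K = np.hstack(blocks)                 # n x (q+1)k block Krylov matrix
    Q, _ = np.linalg.qr(K)                # explicit orthonormal basis, no Lanczos recurrence
    B = Q.T @ A                           # P_K A = Q B, so Ritz vectors come from the SVD of B
    Ub, _, _ = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :k]                  # U~_k = span((P_K A)_k)
```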
109–111 rayleigh-ritz post-processing. This post-processing step provably gives an optimal Ũ_k for Frobenius norm low-rank approximation error, and our entire analysis relies on converting very small Frobenius norm error into strong spectral norm and per vector error! Take away: the modern denoising analysis gives new insight into the practical effectiveness of Rayleigh-Ritz projection.
112 final implementation. Similar to the randomized Block Power Method: extremely simple (pseudocode in paper). [side-by-side pseudocode: Block Power Method vs. Block Krylov Iteration]
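A toy usage example combining the hypothetical helpers sketched earlier (block_power_method and block_krylov_svd); the synthetic matrix with a slowly decaying spectrum is illustrative only, not the experiments from the talk.

```python
import numpy as np

n, d, k, q = 500, 300, 10, 5
U, _ = np.linalg.qr(np.random.randn(n, d))
V, _ = np.linalg.qr(np.random.randn(d, d))
s = 1.0 / np.arange(1, d + 1) ** 0.5      # slowly decaying singular values
A = (U * s) @ V.T

for name, Uk in [("block power", block_power_method(A, k, q)),
                 ("block krylov", block_krylov_svd(A, k, q))]:
    spectral_err = np.linalg.norm(A - Uk @ (Uk.T @ A), 2)
    print(name, spectral_err)
```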
113–114 performance. Block Krylov beats Block Power Method definitively for small ϵ. [plots: error ϵ vs. iteration count q, and vs. runtime in seconds, for Block Krylov and the Power Method under Frobenius, spectral, and per vector error; 20 Newsgroups, k = 20]
115 final comments. Main Takeaway: the first gap independent bound for Krylov methods, O(nnz(A) k (log d)/√ϵ), alongside the classic gap-dependent O(nnz(A) k log(d) / √((σ_k − σ_{k+1})/σ_k)). Open Questions: Full stability analysis. A master error metric for gap independent results. Gap independent bounds for other methods (e.g. online and stochastic PCA). Analysis for small space/restarted block Krylov methods?
116 Thank you!
117 stability. Lanczos algorithms are often considered unstable, largely because a recurrence is used to efficiently compute a basis for the Krylov subspace on the fly. Since our subspace is small, we do not use the recurrence; computing the basis explicitly avoids serious stability issues. There is some loss of orthogonality between blocks, but it only occurs once the algorithm has converged, and we can show that it is not an issue in practice.
118 stability. On poorly conditioned matrices, Randomized Block Krylov Iteration still significantly outperforms the Block Power Method. [plot: per vector error ϵ vs. iterations q, Block Krylov vs. Block Power, on a poorly conditioned matrix]