Acceleration of Randomized Kaczmarz Method
Deanna Needell [joint work with Y. Eldar], Stanford University. BIRS, Banff, March 2011.
Problem Background: Setup

Let $Ax = b$ be an overdetermined, consistent system of equations.

Goal: From $A$ and $b$ we wish to recover the unknown $x$. Assume $m \gg n$.
Kaczmarz Method

The Kaczmarz method is an iterative method used to solve $Ax = b$. Due to its speed and simplicity, it is used in a variety of applications.
The Kaczmarz method:

1. Start with an initial guess $x_0$.
2. Set $x_{k+1} = x_k + \frac{b[i] - \langle a_i, x_k \rangle}{\|a_i\|_2^2}\, a_i$, where $i = (k \bmod m)$.
3. Repeat step 2.
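The cyclic update above can be sketched in a few lines of NumPy. This is a minimal illustration (function and variable names are mine, not from the slides):

```python
import numpy as np

def kaczmarz(A, b, x0, sweeps=100):
    """Classical Kaczmarz: cycle through the rows of A, orthogonally
    projecting the iterate onto each row's solution hyperplane."""
    m, _ = A.shape
    x = x0.astype(float).copy()
    for k in range(sweeps * m):
        i = k % m                                  # cyclic row selection
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a          # project onto hyperplane i
    return x

# Consistent overdetermined system: b = A @ x_true, m >> n
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
x_hat = kaczmarz(A, A @ x_true, np.zeros(5))
print(np.linalg.norm(x_hat - x_true))              # essentially zero
```

For a consistent system with reasonably conditioned rows, the iterates converge to the solution regardless of the starting point.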
Kaczmarz Method: Geometrically

Denote $H_i = \{w : \langle a_i, w \rangle = b[i]\}$. Each iteration orthogonally projects the current estimate onto the hyperplane $H_i$.
Kaczmarz Method: But what if...

Denote $H_i = \{w : \langle a_i, w \rangle = b[i]\}$. (A sequence of figures illustrated how the cyclic method can converge slowly when consecutive hyperplanes are nearly parallel.)
Randomized Version: Randomized Kaczmarz

1. Start with an initial guess $x_0$.
2. Set $x_{k+1} = x_k + \frac{b[i] - \langle a_i, x_k \rangle}{\|a_i\|_2^2}\, a_i$, where $i$ is chosen at random.
3. Repeat step 2.
Randomized Version: Randomized Kaczmarz (Strohmer-Vershynin)

1. Start with an initial guess $x_0$.
2. Set $x_{k+1} = x_k + \frac{b[p] - \langle a_p, x_k \rangle}{\|a_p\|_2^2}\, a_p$, where $\mathbb{P}(p = i) = \frac{\|a_i\|_2^2}{\|A\|_F^2}$.
3. Repeat step 2.
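The norm-weighted sampling rule above is easy to sketch in NumPy; the following is a minimal illustration with names of my own choosing:

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters=2000, seed=1):
    """RK with the Strohmer-Vershynin rule: row i is sampled with
    probability ||a_i||_2^2 / ||A||_F^2."""
    rng = np.random.default_rng(seed)
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()      # ||a_i||^2 / ||A||_F^2
    x = x0.astype(float).copy()
    for _ in range(iters):
        i = rng.choice(len(b), p=probs)            # weighted row selection
        x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 10))
x_true = rng.standard_normal(10)
x_hat = randomized_kaczmarz(A, A @ x_true, np.zeros(10))
print(np.linalg.norm(x_hat - x_true))
```

Note that each iteration touches only one row of $A$, which is what makes the per-iteration cost $O(n)$.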
Randomized Version: Randomized Kaczmarz (RK), Strohmer-Vershynin

Let $R = \|A^{-1}\|^2 \|A\|_F^2$, where $\|A^{-1}\| \overset{\mathrm{def}}{=} \inf\{M : M\|Ax\|_2 \ge \|x\|_2 \text{ for all } x\}$. Then

$\mathbb{E}\,\|x_k - x\|_2^2 \le \left(1 - \tfrac{1}{R}\right)^k \|x_0 - x\|_2^2.$

For well-conditioned $A$ this gives convergence in $O(n)$ iterations, hence $O(n^2)$ total runtime: better than the $O(mn^2)$ runtime of Gaussian elimination, and empirically often faster than the conjugate gradient method.
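The rate constant can be checked numerically for a random Gaussian matrix. This sketch uses the equivalent form $R = \|A\|_F^2 / \sigma_{\min}(A)^2$ for full-rank $A$ (a standard identity, not stated on the slides), with my own variable names:

```python
import numpy as np

# For full-rank A, ||A^{-1}|| = 1/sigma_min(A), so R = ||A||_F^2 / sigma_min^2.
# Note R >= n always, since ||A||_F^2 = sum_i sigma_i^2 >= n * sigma_min^2.
rng = np.random.default_rng(3)
A = rng.standard_normal((500, 20))
sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
R = np.linalg.norm(A, "fro") ** 2 / sigma_min**2

# Expected squared error shrinks by a factor (1 - 1/R) per iteration, so the
# number of iterations per decimal digit of accuracy is:
iters_per_digit = np.log(10) / -np.log(1 - 1 / R)
print(R, iters_per_digit)
```

For a tall Gaussian matrix $R$ stays within a modest factor of $n$, which is what makes the $O(n)$-iteration claim plausible.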
Randomized Version: Randomized Kaczmarz (RK) with noise

We now consider the consistent system $Ax = b$ corrupted by noise, forming the possibly inconsistent system $Ax \approx b + z$.
Theorem [N]: Let $Ax = b$ be corrupted with noise: $Ax \approx b + z$. Then

$\mathbb{E}\,\|x_k - x\|_2 \le \left(1 - \tfrac{1}{R}\right)^{k/2} \|x_0 - x\|_2 + \sqrt{R}\,\gamma, \quad \text{where } \gamma = \max_i \frac{|z[i]|}{\|a_i\|_2}.$

This bound is sharp and is attained in simple examples.
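The convergence horizon $\sqrt{R}\,\gamma$ can be observed numerically. This sketch (my own setup, not from the talk) runs RK on a noisy Gaussian system and prints the final error alongside the threshold:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 500, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
z = 0.01 * rng.standard_normal(m)                  # small noise vector
b = A @ x_true + z                                 # noisy right-hand side

row_norms_sq = np.sum(A**2, axis=1)
probs = row_norms_sq / row_norms_sq.sum()
x = np.zeros(n)
for _ in range(3000):                              # RK on the noisy system
    i = rng.choice(m, p=probs)
    x += (b[i] - A[i] @ x) / row_norms_sq[i] * A[i]

sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
R = row_norms_sq.sum() / sigma_min**2              # ||A||_F^2 / sigma_min^2
gamma = np.max(np.abs(z) / np.sqrt(row_norms_sq))
print(np.linalg.norm(x - x_true), np.sqrt(R) * gamma)
```

After the exponential phase dies out, the error stalls near the noise floor rather than converging to zero.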
Figure: Comparison between the actual error (blue) and the predicted threshold (pink) over many trials: Gaussian 2000 x 100 after 800 iterations, partial Fourier 700 x 101 after 1000 iterations, and Bernoulli 2000 x 100 after 750 iterations. A scatter plot shows exponential convergence over several trials.
Modified RK: Even better convergence? Noiseless case revisited

Recall $x_{k+1} = x_k + \frac{b[i] - \langle a_i, x_k \rangle}{\|a_i\|_2^2}\, a_i$.

Since these projections are orthogonal, the optimal projection is the one that maximizes $\|x_{k+1} - x_k\|_2$. Therefore we would choose $i$ maximizing $\frac{|b[i] - \langle a_i, x_k \rangle|}{\|a_i\|_2}$. This is too costly, so instead we project onto a low-dimensional subspace and use the low-dimensional representations to predict the optimal projection.
Modified RK: JL Dimension Reduction

Johnson-Lindenstrauss Lemma: Let $\delta > 0$ and let $S$ be a finite set of points in $\mathbb{R}^n$. Then for any $d$ satisfying
$d \ge \frac{C \log |S|}{\delta^2},$ (1)
there exists a Lipschitz mapping $\Phi : \mathbb{R}^n \to \mathbb{R}^d$ such that
$(1 - \delta)\|s_i - s_j\|_2^2 \le \|\Phi(s_i) - \Phi(s_j)\|_2^2 \le (1 + \delta)\|s_i - s_j\|_2^2$ (2)
for all $s_i, s_j \in S$.
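A quick numerical check of the lemma with a scaled Gaussian map, one standard choice of $\Phi$ (the $1/\sqrt{d}$ scaling is my assumption, chosen so squared lengths are preserved in expectation):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, num_points = 1000, 400, 20                   # d ~ C log|S| / delta^2
S = rng.standard_normal((num_points, n))           # the point set
Phi = rng.standard_normal((d, n)) / np.sqrt(d)     # scaled Gaussian JL map

# Squared-distance distortion over all pairs: should stay close to 1
ratios = []
for i in range(num_points):
    for j in range(i + 1, num_points):
        diff = S[i] - S[j]
        ratios.append(np.sum((Phi @ diff) ** 2) / np.sum(diff**2))
print(min(ratios), max(ratios))                    # both near 1
```

Shrinking $d$ widens the spread of these ratios, which is exactly the $\delta$ trade-off in (1).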
Moreover: In the proof of the JL Lemma, the map $\Phi$ is chosen as the projection onto a random $d$-dimensional subspace of $\mathbb{R}^n$. Many known distributions yield such a projection, and recently transforms with fast multiplies have also been shown to satisfy the JL Lemma [Ailon-Chazelle, Hinrichs-Vybiral, Ailon-Liberty, Krahmer-Ward, ...].

Perform reduction: Choose such a $d \times n$ projector $\Phi$ and, during preprocessing, set $\alpha_i = \Phi a_i$.
Modified RK: RK via Johnson-Lindenstrauss (RKJL) [N-Eldar]

Select: Select $n$ rows so that each row $a_i$ is chosen with probability $\|a_i\|_2^2 / \|A\|_F^2$. For each, set
$\gamma_i = \frac{|b[i] - \langle \alpha_i, \Phi x_k \rangle|}{\|\alpha_i\|_2},$
and set $j = \arg\max_i \gamma_i$.

Test: For $a_j$ and the first row $a_l$ selected, set
$\gamma_j = \frac{|b[j] - \langle a_j, x_k \rangle|}{\|a_j\|_2}$ and $\gamma_l = \frac{|b[l] - \langle a_l, x_k \rangle|}{\|a_l\|_2}$.
If $\gamma_l > \gamma_j$, set $j = l$.

Project: Set $x_{k+1} = x_k + \frac{b[j] - \langle a_j, x_k \rangle}{\|a_j\|_2^2}\, a_j$.

Update: Set $k = k + 1$ and repeat.
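The four steps above can be sketched as follows. This is a rough illustration under my own naming and design choices (e.g., recomputing $\Phi x_k$ each iteration rather than maintaining it incrementally), not the authors' reference implementation:

```python
import numpy as np

def rkjl(A, b, x0, Phi, iters=600, seed=6):
    """One possible RKJL loop: score n sampled rows with the compressed
    vectors alpha_i = Phi a_i, check the winner against the first sampled
    row using an exact residual, then take a Kaczmarz step."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms_sq = np.sum(A**2, axis=1)
    probs = row_norms_sq / row_norms_sq.sum()
    alpha = A @ Phi.T                              # preprocessing: alpha_i = Phi a_i
    alpha_norms = np.linalg.norm(alpha, axis=1)
    x = x0.astype(float).copy()
    for _ in range(iters):
        Phi_x = Phi @ x                            # O(nd)
        cand = rng.choice(m, size=n, p=probs)      # Select: n random rows
        g = np.abs(b[cand] - alpha[cand] @ Phi_x) / alpha_norms[cand]
        j = cand[np.argmax(g)]
        l = cand[0]                                # Test: compare exact residuals
        gj = abs(b[j] - A[j] @ x) / np.sqrt(row_norms_sq[j])
        gl = abs(b[l] - A[l] @ x) / np.sqrt(row_norms_sq[l])
        if gl > gj:
            j = l
        x += (b[j] - A[j] @ x) / row_norms_sq[j] * A[j]   # Project
    return x

rng = np.random.default_rng(7)
m, n, d = 300, 20, 60
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
Phi = rng.standard_normal((d, n)) / np.sqrt(d)     # my choice of JL map
x_hat = rkjl(A, A @ x_true, np.zeros(n), Phi)
print(np.linalg.norm(x_hat - x_true))
```

The Test step is what makes the method safe: since the first selected row is distributed exactly as in standard RK, each iteration makes at least as much progress as an RK step, even when the compressed scores are misleading.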
Modified RK: Runtime

Select: calculating $\Phi x_k$ costs $O(nd)$ in general; calculating $\gamma_i$ for each of the $n$ selected rows costs $O(nd)$.
Test: calculating $\gamma_j$ and $\gamma_l$ costs $O(n)$.
Project: calculating $x_{k+1}$ costs $O(n)$.

Overall runtime: since each iteration takes $O(nd)$, we have convergence in $O(n^2 d)$.
Modified RK: Choosing the parameter d

Lemma (choice of d): Let $\Phi$ be the $d \times n$ Gaussian matrix with $d = C\delta^{-2} \log n$ as in the RKJL method, and set $\gamma_i = \langle \Phi a_i, \Phi x_k \rangle$, also as in the method. Then $|\gamma_i - \langle a_i, x_k \rangle| \le 2\delta$ for all $i$ and $k$ in the first $O(n)$ iterations of RKJL.

Low risk: This shows worst-case expected convergence in at most $O(n^2 \log n)$ time, and of course in most cases one expects far faster convergence.
Justification: Analytical Justification

Theorem [assuming row normalization]: Fix an estimate $x_k$ and denote by $x_{k+1}$ and $x_{k+1}^*$ the next estimates using RKJL and the standard RK method, respectively. Set $\gamma_j^* = |\langle a_j, x_k - x \rangle|^2$ and reorder these so that $\gamma_1^* \ge \gamma_2^* \ge \dots \ge \gamma_m^*$. Then when $d = C\delta^{-2} \log n$,

$\mathbb{E}\,\|x_{k+1} - x\|_2^2 \le \min\left\{ \mathbb{E}\,\|x_{k+1}^* - x\|_2^2 - \sum_{j=1}^m \left(p_j - \tfrac{1}{m}\right)\gamma_j^* + 2\delta,\ \ \mathbb{E}\,\|x_{k+1}^* - x\|_2^2 \right\},$

where
$p_j = \binom{m-j}{n-1} \Big/ \binom{m}{n}$ for $j \le m - n + 1$, and $p_j = 0$ for $j > m - n + 1$,
are non-negative values satisfying $\sum_{j=1}^m p_j = 1$ and $p_1 \ge p_2 \ge \dots \ge p_m = 0$.
Corollary: Fix an estimate $x_k$ and denote by $x_{k+1}$ and $x_{k+1}^*$ the next estimates using RKJL and the standard method, respectively. Set $\gamma_j^* = |\langle a_j, x_k - x \rangle|^2$ and reorder these so that $\gamma_1^* \ge \gamma_2^* \ge \dots \ge \gamma_m^*$. Then when exact geometry is preserved ($\delta \to 0$),

$\mathbb{E}\,\|x_{k+1} - x\|_2^2 \le \mathbb{E}\,\|x_{k+1}^* - x\|_2^2 - \sum_{j=1}^m \left(p_j - \tfrac{1}{m}\right)\gamma_j^*.$
Justification: Empirical Evidence

Figure: $\ell_2$-error (y-axis) as a function of the iteration count (x-axis) for RK and RKJL. The dashed line is standard Randomized Kaczmarz; the solid line is the modified method without a Johnson-Lindenstrauss projection: instead, the best move out of the $n$ randomly chosen rows is computed exactly. Note that we cannot afford to do this computationally.
Figure: $\ell_2$-error (y-axis) as a function of the iteration count (x-axis) for RK and RKJL with various values of $d$ (d = 1000, 500, 100, ...), with m = and n = 1000.
Thank you

References:
- Y. C. Eldar and D. Needell, "Acceleration of randomized Kaczmarz method via the Johnson-Lindenstrauss lemma," Numerical Algorithms, to appear.
- D. Needell, "Randomized Kaczmarz solver for noisy linear systems," BIT Numerical Mathematics, 50(2), 2010.
- T. Strohmer and R. Vershynin, "A randomized Kaczmarz algorithm with exponential convergence," Journal of Fourier Analysis and Applications, 15(2), 2009.
More informationNon-Asymptotic Theory of Random Matrices Lecture 4: Dimension Reduction Date: January 16, 2007
Non-Asymptotic Theory of Random Matrices Lecture 4: Dimension Reduction Date: January 16, 2007 Lecturer: Roman Vershynin Scribe: Matthew Herman 1 Introduction Consider the set X = {n points in R N } where
More informationSparse recovery for spherical harmonic expansions
Rachel Ward 1 1 Courant Institute, New York University Workshop Sparsity and Cosmology, Nice May 31, 2011 Cosmic Microwave Background Radiation (CMB) map Temperature is measured as T (θ, ϕ) = k k=0 l=
More informationWeaker assumptions for convergence of extended block Kaczmarz and Jacobi projection algorithms
DOI: 10.1515/auom-2017-0004 An. Şt. Univ. Ovidius Constanţa Vol. 25(1),2017, 49 60 Weaker assumptions for convergence of extended block Kaczmarz and Jacobi projection algorithms Doina Carp, Ioana Pomparău,
More informationDetecting Sparse Structures in Data in Sub-Linear Time: A group testing approach
Detecting Sparse Structures in Data in Sub-Linear Time: A group testing approach Boaz Nadler The Weizmann Institute of Science Israel Joint works with Inbal Horev, Ronen Basri, Meirav Galun and Ery Arias-Castro
More informationsparse and low-rank tensor recovery Cubic-Sketching
Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru
More informationAn algebraic perspective on integer sparse recovery
An algebraic perspective on integer sparse recovery Lenny Fukshansky Claremont McKenna College (joint work with Deanna Needell and Benny Sudakov) Combinatorics Seminar USC October 31, 2018 From Wikipedia:
More informationWeaker hypotheses for the general projection algorithm with corrections
DOI: 10.1515/auom-2015-0043 An. Şt. Univ. Ovidius Constanţa Vol. 23(3),2015, 9 16 Weaker hypotheses for the general projection algorithm with corrections Alexru Bobe, Aurelian Nicola, Constantin Popa Abstract
More informationConditional Gradient (Frank-Wolfe) Method
Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties
More informationGradient Descent Methods
Lab 18 Gradient Descent Methods Lab Objective: Many optimization methods fall under the umbrella of descent algorithms. The idea is to choose an initial guess, identify a direction from this point along
More informationSome Useful Background for Talk on the Fast Johnson-Lindenstrauss Transform
Some Useful Background for Talk on the Fast Johnson-Lindenstrauss Transform Nir Ailon May 22, 2007 This writeup includes very basic background material for the talk on the Fast Johnson Lindenstrauss Transform
More information7.3 The Jacobi and Gauss-Siedel Iterative Techniques. Problem: To solve Ax = b for A R n n. Methodology: Iteratively approximate solution x. No GEPP.
7.3 The Jacobi and Gauss-Siedel Iterative Techniques Problem: To solve Ax = b for A R n n. Methodology: Iteratively approximate solution x. No GEPP. 7.3 The Jacobi and Gauss-Siedel Iterative Techniques
More informationUniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit
Claremont Colleges Scholarship @ Claremont CMC Faculty Publications and Research CMC Faculty Scholarship 6-5-2008 Uniform Uncertainty Principle and Signal Recovery via Regularized Orthogonal Matching Pursuit
More informationSolving Corrupted Quadratic Equations, Provably
Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin
More informationUsing the Johnson-Lindenstrauss lemma in linear and integer programming
Using the Johnson-Lindenstrauss lemma in linear and integer programming Vu Khac Ky 1, Pierre-Louis Poirion, Leo Liberti LIX, École Polytechnique, F-91128 Palaiseau, France Email:{vu,poirion,liberti}@lix.polytechnique.fr
More information9.1 Linear Programs in canonical form
9.1 Linear Programs in canonical form LP in standard form: max (LP) s.t. where b i R, i = 1,..., m z = j c jx j j a ijx j b i i = 1,..., m x j 0 j = 1,..., n But the Simplex method works only on systems
More informationCombining geometry and combinatorics
Combining geometry and combinatorics A unified approach to sparse signal recovery Anna C. Gilbert University of Michigan joint work with R. Berinde (MIT), P. Indyk (MIT), H. Karloff (AT&T), M. Strauss
More informationCOMPRESSED Sensing (CS) is a method to recover a
1 Sample Complexity of Total Variation Minimization Sajad Daei, Farzan Haddadi, Arash Amini Abstract This work considers the use of Total Variation (TV) minimization in the recovery of a given gradient
More informationThe Johnson-Lindenstrauss Lemma in Linear Programming
The Johnson-Lindenstrauss Lemma in Linear Programming Leo Liberti, Vu Khac Ky, Pierre-Louis Poirion CNRS LIX Ecole Polytechnique, France Aussois COW 2016 The gist Goal: solving very large LPs min{c x Ax
More informationLecture 18: March 15
CS71 Randomness & Computation Spring 018 Instructor: Alistair Sinclair Lecture 18: March 15 Disclaimer: These notes have not been subjected to the usual scrutiny accorded to formal publications. They may
More informationOXPORD UNIVERSITY PRESS
Concentration Inequalities A Nonasymptotic Theory of Independence STEPHANE BOUCHERON GABOR LUGOSI PASCAL MASS ART OXPORD UNIVERSITY PRESS CONTENTS 1 Introduction 1 1.1 Sums of Independent Random Variables
More informationIterative solvers for linear equations
Spectral Graph Theory Lecture 17 Iterative solvers for linear equations Daniel A. Spielman October 31, 2012 17.1 About these notes These notes are not necessarily an accurate representation of what happened
More informationChapter 7 Iterative Techniques in Matrix Algebra
Chapter 7 Iterative Techniques in Matrix Algebra Per-Olof Persson persson@berkeley.edu Department of Mathematics University of California, Berkeley Math 128B Numerical Analysis Vector Norms Definition
More informationThe Dual Lattice, Integer Linear Systems and Hermite Normal Form
New York University, Fall 2013 Lattices, Convexity & Algorithms Lecture 2 The Dual Lattice, Integer Linear Systems and Hermite Normal Form Lecturers: D. Dadush, O. Regev Scribe: D. Dadush 1 Dual Lattice
More informationLecture 11. Fast Linear Solvers: Iterative Methods. J. Chaudhry. Department of Mathematics and Statistics University of New Mexico
Lecture 11 Fast Linear Solvers: Iterative Methods J. Chaudhry Department of Mathematics and Statistics University of New Mexico J. Chaudhry (UNM) Math/CS 375 1 / 23 Summary: Complexity of Linear Solves
More informationarxiv: v4 [math.sp] 19 Jun 2015
arxiv:4.0333v4 [math.sp] 9 Jun 205 An arithmetic-geometric mean inequality for products of three matrices Arie Israel, Felix Krahmer, and Rachel Ward June 22, 205 Abstract Consider the following noncommutative
More informationSPARSE signal representations have gained popularity in recent
6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying
More informationCSCI 1951-G Optimization Methods in Finance Part 01: Linear Programming
CSCI 1951-G Optimization Methods in Finance Part 01: Linear Programming January 26, 2018 1 / 38 Liability/asset cash-flow matching problem Recall the formulation of the problem: max w c 1 + p 1 e 1 = 150
More informationTHE NEWTON BRACKETING METHOD FOR THE MINIMIZATION OF CONVEX FUNCTIONS SUBJECT TO AFFINE CONSTRAINTS
THE NEWTON BRACKETING METHOD FOR THE MINIMIZATION OF CONVEX FUNCTIONS SUBJECT TO AFFINE CONSTRAINTS ADI BEN-ISRAEL AND YURI LEVIN Abstract. The Newton Bracketing method [9] for the minimization of convex
More informationAn Introduction to Expectation-Maximization
An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative
More informationApproximate Message Passing Algorithms
November 4, 2017 Outline AMP (Donoho et al., 2009, 2010a) Motivations Derivations from a message-passing perspective Limitations Extensions Generalized Approximate Message Passing (GAMP) (Rangan, 2011)
More informationAccelerated Dense Random Projections
1 Advisor: Steven Zucker 1 Yale University, Department of Computer Science. Dimensionality reduction (1 ε) xi x j 2 Ψ(xi ) Ψ(x j ) 2 (1 + ε) xi x j 2 ( n 2) distances are ε preserved Target dimension k
More informationPAVED WITH GOOD INTENTIONS: ANALYSIS OF A RANDOMIZED BLOCK KACZMARZ METHOD 1. INTRODUCTION
PAVED WITH GOOD INTENTIONS: ANALYSIS OF A RANDOMIZED BLOCK KACZMARZ METHOD DEANNA NEEDELL AND JOEL A. TROPP ABSTRACT. The block Kaczmarz method is an iterative scheme for solving overdetermined least-squares
More informationSparse Johnson-Lindenstrauss Transforms
Sparse Johnson-Lindenstrauss Transforms Jelani Nelson MIT May 24, 211 joint work with Daniel Kane (Harvard) Metric Johnson-Lindenstrauss lemma Metric JL (MJL) Lemma, 1984 Every set of n points in Euclidean
More informationLecture Notes 5: Multiresolution Analysis
Optimization-based data analysis Fall 2017 Lecture Notes 5: Multiresolution Analysis 1 Frames A frame is a generalization of an orthonormal basis. The inner products between the vectors in a frame and
More informationCompressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles
Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional
More informationarxiv: v1 [cs.it] 21 Feb 2013
q-ary Compressive Sensing arxiv:30.568v [cs.it] Feb 03 Youssef Mroueh,, Lorenzo Rosasco, CBCL, CSAIL, Massachusetts Institute of Technology LCSL, Istituto Italiano di Tecnologia and IIT@MIT lab, Istituto
More informationRandomized Block Kaczmarz Method with Projection for Solving Least Squares
Claremont Colleges Scholarship @ Claremont CMC Faculty Publications and Research CMC Faculty Scholarship 3-17-014 Randomized Block Kaczmarz Method with Projection for Solving Least Squares Deanna Needell
More informationNumerical Methods - Numerical Linear Algebra
Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear
More informationHigh Dimensional Geometry, Curse of Dimensionality, Dimension Reduction
Chapter 11 High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction High-dimensional vectors are ubiquitous in applications (gene expression data, set of movies watched by Netflix customer,
More informationTutorial: Sparse Signal Recovery
Tutorial: Sparse Signal Recovery Anna C. Gilbert Department of Mathematics University of Michigan (Sparse) Signal recovery problem signal or population length N k important Φ x = y measurements or tests:
More informationRobust Sparse Recovery via Non-Convex Optimization
Robust Sparse Recovery via Non-Convex Optimization Laming Chen and Yuantao Gu Department of Electronic Engineering, Tsinghua University Homepage: http://gu.ee.tsinghua.edu.cn/ Email: gyt@tsinghua.edu.cn
More informationElementary maths for GMT
Elementary maths for GMT Linear Algebra Part 2: Matrices, Elimination and Determinant m n matrices The system of m linear equations in n variables x 1, x 2,, x n a 11 x 1 + a 12 x 2 + + a 1n x n = b 1
More information