On the exponential convergence of the Kaczmarz algorithm


On the exponential convergence of the Kaczmarz algorithm

Liang Dai and Thomas B. Schön
Department of Information Technology, Uppsala University, Uppsala, Sweden.
E-mail: {liang.dai, thomas.schon}@it.uu.se

Abstract

The Kaczmarz algorithm (KA) is a popular method for solving a system of linear equations. In this note we derive a new exponential convergence result for the KA. The key allowing us to establish the new result is to rewrite the KA in such a way that its solution path can be interpreted as the output from a particular dynamical system. The asymptotic stability results of the corresponding dynamical system can then be leveraged to prove exponential convergence of the KA. The new bound is also compared to existing bounds.

Index Terms: Kaczmarz algorithm, Stability analysis, Cyclic algorithm.

I. PROBLEM STATEMENT

In this note, we discuss the exponential convergence property of the Kaczmarz algorithm (KA) [1]. Since its introduction, the KA has been applied in many different fields and many new developments have been reported [2], [3]. The KA is used to find the solution to the following system of consistent linear equations

$$Ax = b, \qquad (1)$$

This work was supported by the project Probabilistic modelling of dynamical systems (Contract number: 6-03-554) funded by the Swedish Research Council.

where $x \in \mathbb{R}^n$ denotes the unknown vector, $A \in \mathbb{R}^{m \times n}$, $m \ge n$, $\mathrm{rank}(A) = n$, and $b \in \mathbb{R}^m$. Define the hyperplane $H_i$ as

$$H_i = \{x \mid a_i^T x = b_i\},$$

where the $i$-th row of $A$ is denoted by $a_i^T$ and the $i$-th element of $b$ is denoted by $b_i$. Geometrically, the KA finds the solution by projecting (or approximately projecting) onto the hyperplanes cyclically from an initial approximation $x_0$, which reads as

$$x_{k+1} = x_k + \lambda \, \frac{b_{i(k)} - a_{i(k)}^T x_k}{\|a_{i(k)}\|^2} \, a_{i(k)}, \qquad (2)$$

where $i(k) = \mathrm{mod}(k, m) + 1$. In the update equation (2), $\lambda$ is the relaxation parameter, which satisfies $0 < \lambda < 2$. We use the Matlab convention $\mathrm{mod}(\cdot,\cdot)$ to denote the modulus after division operation and $\|\cdot\|$ to denote the spectral norm of a matrix.

It is well known that the KA is sometimes rather slow to converge. This is especially true when several consecutive row vectors of the matrix $A$ are in some sense "close" to each other. In order to overcome this drawback, the Randomized Kaczmarz Algorithm (RKA) was introduced in [4] for $\lambda = 1$. The key idea of the RKA is that, instead of performing the hyperplane projections cyclically in a deterministic order, the projections are performed in a random order. More specifically, at time $k$, the hyperplane $H_p$ is selected for projection with probability $\|a_p\|^2 / \|A\|_F^2$, for $p = 1, \dots, m$. Note that $\|\cdot\|_F$ is used to denote the Frobenius norm of a matrix. Intuitively speaking, the involved randomization performs a kind of "preconditioning" of the original matrix equation [6], resulting in a faster exponential convergence rate, as established in [4].

The specific and predefined ordering of the projections in the KA makes it challenging to obtain a tight bound on the convergence rate of the method. In [16], the authors establish the convergence rate of the KA by exploiting the Meany inequality [17], which covers the case $\lambda = 1$ in (2). Convergence rates for the KA with $\lambda \in (0, 2)$ have also been established in [18], [20]. In Section III, we compare these results in more detail.

In this note, we present a different way to characterize the convergence property of the KA described in (2). The key underlying our approach is that we interpret the solution path of the KA as the output of a particular dynamical system. By studying the stability property of this related dynamical system, we then obtain new exponential convergence results for the KA. Related to this, it is interesting to note that so-called Integral Quadratic Constraints (IQCs) have recently been used to study the convergence rate of first-order algorithms applied to general convex optimization problems [19].
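As an illustration of the cyclic update (2) and of the RKA sampling rule described above, the following minimal NumPy sketch implements both iterations. It is not taken from [1] or [4]; the function names and default choices are ours.

import numpy as np

def kaczmarz(A, b, x0, n_iter, lam=1.0):
    """Cyclic KA, eq. (2): sweep over the rows in the fixed order i(k) = mod(k, m) + 1."""
    x = x0.astype(float)
    m = A.shape[0]
    for k in range(n_iter):
        i = k % m                        # zero-based counterpart of i(k)
        a = A[i]
        x = x + lam * (b[i] - a @ x) / (a @ a) * a
    return x

def randomized_kaczmarz(A, b, x0, n_iter, rng=None):
    """RKA of [4] (lambda = 1): row p is chosen with probability ||a_p||^2 / ||A||_F^2."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.astype(float)
    p = np.sum(A**2, axis=1) / np.sum(A**2)
    for _ in range(n_iter):
        i = rng.choice(A.shape[0], p=p)
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a
    return x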

The note is organized as follows. In the subsequent section we make use of the subsequence $\{x_{jm} - x\}_{j=0}^{\infty}$ to enable the derivation of the new exponential convergence result. In Section III we discuss its connections and differences to existing results. Conclusions and ideas for future work are provided in Section IV.

II. THE NEW CONVERGENCE RESULT

First, let us introduce the matrix $B \in \mathbb{R}^{m \times n}$, whose $i$-th row $b_i^T$ is defined by $b_i \triangleq a_i / \|a_i\|$ for $i = 1, \dots, m$. Furthermore, let $P_i \triangleq b_i b_i^T$ for $i = 1, \dots, m$, and let $\theta_k \triangleq x_k - x$ for $k \ge 0$. Using this new notation allows us to rewrite (2) according to

$$\theta_{k+1} = (I - \lambda P_{i(k)}) \theta_k, \qquad (3)$$

which can be interpreted as a discrete time-varying linear dynamical system. Hence, this relation inspires us to study the KA by employing techniques for analyzing the stability properties of time-varying linear systems, see e.g. [14], [15]. In what follows we will focus on analyzing the convergence rate of the sub-sequence $\{\|\theta_{jm}\|\}_{j=0}^{\infty}$. Given the fact that $i(k) = \mathrm{mod}(k, m) + 1$, we have

$$\theta_{(j+1)m} = \Big( \prod_{i=1}^{m} (I - \lambda P_i) \Big) \theta_{jm} \triangleq M_m \theta_{jm}.$$

The following theorem provides an upper bound on the spectral norm of $M_m$.

Theorem 1: Let $\rho \triangleq \|M_m\|$ and $0 < \lambda < 2$. Then it holds that

$$\rho^2 \le 1 - \frac{\lambda(2-\lambda)}{(2 + \lambda^2 m^2)\,\|B^\dagger\|^2}, \qquad (4)$$

where $B^\dagger$ denotes the pseudo-inverse of the matrix $B$.

Proof: Let $v_0 \in \mathbb{R}^n$ be a vector satisfying $\|M_m v_0\| = \rho$ and $\|v_0\| = 1$, and let $v_i = (I - \lambda P_i) v_{i-1}$ for $i = 1, \dots, m$. It follows that $v_m = M_m v_0$ and $\|v_m\| = \rho$. Notice that $P_i^2 = P_i$, so we have

$$(I - \lambda P_i)^2 = I - \lambda(2-\lambda) P_i,$$

for $i = 1, \dots, m$.

Hence it holds that

$$\|v_i\|^2 = v_{i-1}^T (I - \lambda P_i)^2 v_{i-1} = v_{i-1}^T \big(I - \lambda(2-\lambda) P_i\big) v_{i-1} = \|v_{i-1}\|^2 - \lambda(2-\lambda)\,\|P_i v_{i-1}\|^2,$$

which in turn implies that

$$\lambda(2-\lambda) \sum_{i=1}^{m} \|P_i v_{i-1}\|^2 = \|v_0\|^2 - \|v_m\|^2 = 1 - \rho^2. \qquad (5)$$

Also, for any $i \in \{1, \dots, m\}$, we have that

$$\|v_i - v_0\|^2 = \Big\| \sum_{k=1}^{i} (v_k - v_{k-1}) \Big\|^2 = \lambda^2 \Big\| \sum_{k=1}^{i} P_k v_{k-1} \Big\|^2 \le \lambda^2 \, i \sum_{k=1}^{i} \|P_k v_{k-1}\|^2 \le \lambda^2 \, i \sum_{k=1}^{m} \|P_k v_{k-1}\|^2.$$

Together with (5), we get

$$\|v_i - v_0\|^2 \le \frac{\lambda \, i}{2-\lambda}\,(1 - \rho^2). \qquad (6)$$

Meanwhile, we have that

$$\lambda v_0^T B^T B v_0 = \lambda \sum_{k=1}^{m} v_0^T P_k v_0 = \lambda \sum_{k=1}^{m} \|P_k v_0\|^2 = \lambda \sum_{k=1}^{m} \big\|P_k [v_{k-1} + (v_0 - v_{k-1})]\big\|^2$$
$$\le 2\lambda \sum_{k=1}^{m} \|P_k v_{k-1}\|^2 + 2\lambda \sum_{k=1}^{m} \|P_k (v_{k-1} - v_0)\|^2 \le 2\lambda \sum_{k=1}^{m} \|P_k v_{k-1}\|^2 + 2\lambda \sum_{k=1}^{m} \|v_{k-1} - v_0\|^2. \qquad (7)$$

Together with (5) and (6), we have that

$$\lambda v_0^T B^T B v_0 \le \frac{2(1-\rho^2)}{2-\lambda} + 2\lambda \sum_{k=1}^{m} \frac{\lambda (k-1)}{2-\lambda}\,(1 - \rho^2),$$

hence it follows that

$$\lambda v_0^T B^T B v_0 \le \frac{1-\rho^2}{2-\lambda}\,\big(2 + \lambda^2 m(m-1)\big). \qquad (8)$$

Since $v_0^T B^T B v_0 \ge 1/\|B^\dagger\|^2$, we conclude that

$$\rho^2 \le 1 - \frac{\lambda(2-\lambda)\, v_0^T B^T B v_0}{2 + \lambda^2 m(m-1)} \le 1 - \frac{\lambda(2-\lambda)}{\big(2 + \lambda^2 m(m-1)\big)\,\|B^\dagger\|^2}. \qquad (9)$$

Finally, notice that $m(m-1) \le m^2$ holds for any natural number $m$, which concludes the proof.

Remark 1: Notice that in the proof of Theorem 1, (7) is the main approximation step, and a better approximation here will lead to an improvement of the bound.

The following corollary characterizes the convergence of the KA under $\lambda = 1$, which will be used in the subsequent section to enable comparison to the results given in [16], [17]. We omit the proof since it is a direct implication of Theorem 1.

Corollary 1: For the KA with $\lambda = 1$ in (2), if $m \ge n$, we have that

$$\rho^2 \le 1 - \frac{1}{2 m^2 \|B^\dagger\|^2}. \qquad (10)$$

Next, we will derive an improvement over the bound (4), enabled by partitioning the matrix $A$ into non-overlapping sub-matrices. Let $q = \lceil m/n \rceil$, where $\lceil x \rceil$ denotes the smallest integer which is greater than or equal to $x$. Define the sets $T_i = \{(i-1)n+1, \dots, in\}$ for $i = 1, \dots, q-1$, and $T_q = \{(q-1)n+1, \dots, m\}$. Further, for $i = 1, \dots, q$, define $B_i$ as the sub-matrix of $B$ with the rows indexed by the set $T_i$, and $N_i = \prod_{j \in T_i} (I - \lambda P_j)$.

Corollary 2: Based on the previous definitions, and assuming further that all the sub-matrices $B_i$, $i = 1, \dots, q$, are of rank $n$, we have that

$$\rho^2 \le \prod_{i=1}^{q} \Big( 1 - \frac{\lambda(2-\lambda)}{\big(2 + \lambda^2 n(n-1)\big)\,\|B_i^\dagger\|^2} \Big). \qquad (11)$$

Proof: Notice that since $M_m = N_q N_{q-1} \cdots N_2 N_1$, we have that

$$\rho = \|M_m\| \le \prod_{i=1}^{q} \|N_i\|. \qquad (12)$$

For each $N_i$, the spectral norm can be bounded analogously to what was done in Theorem 1, resulting in

$$\|N_i\|^2 \le 1 - \frac{\lambda(2-\lambda)}{\big(2 + \lambda^2 n(n-1)\big)\,\|B_i^\dagger\|^2}.$$

Finally, inserting this inequality into (12) concludes the proof.
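Theorem 1 can be checked numerically on small instances by forming $M_m$ explicitly and comparing $\|M_m\|^2$ with the right-hand side of (4). The snippet below is our own sanity-check sketch (random row-normalized $B$, $\lambda = 1$), not code from the note.

import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 30, 3, 1.0

B = rng.standard_normal((m, n))
B /= np.linalg.norm(B, axis=1, keepdims=True)         # rows b_i with unit norm

# M_m = (I - lam*P_m) ... (I - lam*P_1), where P_i = b_i b_i^T, cf. (3)
M = np.eye(n)
for i in range(m):
    b_i = B[i][:, None]
    M = (np.eye(n) - lam * (b_i @ b_i.T)) @ M

rho_sq = np.linalg.norm(M, 2) ** 2                     # squared spectral norm of M_m
Bdag_sq = np.linalg.norm(np.linalg.pinv(B), 2) ** 2    # ||B^dagger||^2
rhs_4 = 1 - lam * (2 - lam) / ((2 + lam**2 * m**2) * Bdag_sq)
print(rho_sq, "<=", rhs_4, ":", rho_sq <= rhs_4)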

III. DISCUSSION AND NUMERICAL ILLUSTRATION

In Sections III-A and III-B we compare our new bound with the bounds provided by the Meany inequality [16], [17] and by the RKA, respectively. In Section III-B we also provide a numerical illustration. Section III-C is devoted to a comparison with the bound provided in [18], and finally Section III-D compares with the result given in [20].

A. Comparison with the bound given by the Meany inequality

In the following, we assume that $m = n$ and $\lambda = 1$. Denote the singular values of $B$ by $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$. The bound in [16], [17] given by the Meany inequality can then be written as

$$\rho^2 \le 1 - \prod_{i=1}^{n} \sigma_i^2,$$

and the bound given in (10) can be written as

$$\rho^2 \le 1 - \frac{\sigma_n^2}{2 n^2}.$$

This implies that when $\frac{\sigma_n^2}{2 n^2} \ge \prod_{i=1}^{n} \sigma_i^2$, i.e. when

$$\prod_{i=1}^{n-1} \sigma_i^2 \le \frac{1}{2 n^2} \qquad (13)$$

holds, the bound in (10) is tighter. In the following lemma, we derive a sufficient condition under which the inequality (13) holds.

Lemma 1: If $\sigma_{n-1}^2 \le \frac{(n-2)^{n-2}}{2 n^n}$ holds, the inequality in (13) is satisfied.

Proof: Notice that

$$\prod_{i=1}^{n-1} \sigma_i^2 = \Big( \prod_{i=1}^{n-2} \sigma_i^2 \Big)\, \sigma_{n-1}^2 \le \Big( \frac{\sum_{i=1}^{n-2} \sigma_i^2}{n-2} \Big)^{n-2} \sigma_{n-1}^2 \le \Big( \frac{n}{n-2} \Big)^{n-2} \sigma_{n-1}^2. \qquad (14)$$

The last inequality in (14) holds since

$$\sum_{i=1}^{n-2} \sigma_i^2 \le \sum_{i=1}^{n} \sigma_i^2 = \|B\|_F^2 = n.$$

Hence, if

$$\Big( \frac{n}{n-2} \Big)^{n-2} \sigma_{n-1}^2 \le \frac{1}{2 n^2}$$

holds, or equivalently if

$$\sigma_{n-1}^2 \le \frac{(n-2)^{n-2}}{2 n^n}$$

holds, then (13) holds, which concludes the proof.

Remark 2: Notice that the right hand side of the inequality in Lemma 1 is of the order $n^{-2}$ for large $n$. Another difference is that the bound provided by Theorem 1 depends explicitly on the size of the matrix, while the bound provided by the Meany inequality does not.
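To get a concrete feel for this comparison, one can draw a random square matrix with normalized rows and evaluate both right-hand sides as well as condition (13). The sketch below is our own illustration and is not part of the original analysis.

import numpy as np

rng = np.random.default_rng(1)
n = 5
B = rng.standard_normal((n, n))
B /= np.linalg.norm(B, axis=1, keepdims=True)     # m = n, rows normalized

s = np.linalg.svd(B, compute_uv=False)            # sigma_1 >= ... >= sigma_n
meany_rhs = 1 - np.prod(s**2)                     # Meany bound of [16], [17]
new_rhs = 1 - s[-1]**2 / (2 * n**2)               # bound (10)
cond_13 = np.prod(s[:-1]**2) <= 1 / (2 * n**2)    # condition (13)
print(meany_rhs, new_rhs, cond_13)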

B. Comparison with the bound given by the RKA

Let us now compare our new results to the results available for the RKA. Note that in this case we set $\lambda = 1$. If $\{\theta_{jm}\}_{j=0}^{\infty}$ denotes the sequence generated by the RKA, then it holds that [4]

$$\mathbb{E}\,\|\theta_{jm}\|^2 \le \Big( 1 - \frac{1}{\|A\|_F^2 \|A^\dagger\|^2} \Big)^{jm} \|\theta_0\|^2, \qquad (15)$$

for $j \ge 1$, where $\mathbb{E}$ denotes the expectation operator with respect to the random operations up to index $jm$. To compare (15) and (10), we make the assumption that $A$ is a matrix with each row normalized, i.e. $A = B$, for simplicity. It follows that $\|B\|_F^2 = m$, and since $m \le 2m^2$,

$$\frac{1}{2 m^2 \|B^\dagger\|^2} \le \frac{1}{\|B\|_F^2 \|B^\dagger\|^2}. \qquad (16)$$

Furthermore, since $\|B\|_F \|B^\dagger\| \ge \|B\| \|B^\dagger\| \ge 1$, we have that

$$\Big( 1 - \frac{1}{\|B\|_F^2 \|B^\dagger\|^2} \Big)^m \le 1 - \frac{1}{\|B\|_F^2 \|B^\dagger\|^2}, \qquad (17)$$

and combining (16) and (17) results in

$$1 - \frac{1}{2 m^2 \|B^\dagger\|^2} \ge \Big( 1 - \frac{1}{\|B\|_F^2 \|B^\dagger\|^2} \Big)^m.$$

The above inequality implies that the bound given by (10) is more conservative than the one given by the RKA.

Next, a numerical illustration is implemented to compare the bounds given by (10), (11) and (15). The setup is as follows. Let $m = 30$ and $n = 3$, generate $A = \mathrm{randn}(30, 3)$ and normalize each row to obtain $B$, generate $x = \mathrm{randn}(3, 1)$ and compute $y = Bx$. In the implementation of the RKA, we run 1000 realizations with the same initial value $x_0$ to obtain an average performance result, which is reported in Fig. 1. From the left panel in Fig. 1, we can see that the bound (15) characterizing the convergence of the RKA is closer to the real performance of the RKA, while the bounds given by (10) and (11) for the convergence of the KA are further away from the real performance of the KA. The right panel in Fig. 1 shows a zoomed illustration of the bounds given by (10) and (11). We can observe that the bound given by (11) improves upon (10), which is enabled by the partitioning of the rows of the matrix.

[Figure 1: two semilogarithmic panels plotting $\|\theta_i\|$ versus the iteration index $i$.]

Fig. 1. In the left panel, the curves with tags KA and RKA illustrate the real performance of the KA and the RKA. The curves with tags KA BD1, KA BD2 and RKA BD illustrate the bounds given by (10), (11) and (15), respectively. In the right panel, a zoomed illustration of the curves KA BD1 and KA BD2 from the left panel is given.
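The experiment can be reproduced along the following lines. This is our own sketch of the setup ($\lambda = 1$, zero initial value, error sequences only, plotting omitted), not the authors' code.

import numpy as np

rng = np.random.default_rng(2)
m, n, n_iter, n_mc = 30, 3, 100, 1000

A = rng.standard_normal((m, n))
B = A / np.linalg.norm(A, axis=1, keepdims=True)       # normalize each row
x_true = rng.standard_normal(n)
y = B @ x_true
x0 = np.zeros(n)

def step(x, i):                                        # one projection, lambda = 1
    return x + (y[i] - B[i] @ x) / (B[i] @ B[i]) * B[i]

# cyclic KA error ||theta_i||
err_ka, xk = [], x0.copy()
for k in range(n_iter):
    xk = step(xk, k % m)
    err_ka.append(np.linalg.norm(xk - x_true))

# RKA error averaged over n_mc realizations
p = np.sum(B**2, axis=1) / np.sum(B**2)
err_rka = np.zeros(n_iter)
for _ in range(n_mc):
    xk = x0.copy()
    for k in range(n_iter):
        xk = step(xk, rng.choice(m, p=p))
        err_rka[k] += np.linalg.norm(xk - x_true)
err_rka /= n_mc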

C. Comparison with the bound given in [18]

To compare with the result given in [18], we assume that $A = B$, and that they are square and invertible matrices. Under these assumptions, the involved quantity $\mu$ in Corollary 4 of [18] can be approximated by $m \|B^\dagger\|^2$, given the results in [18]. Hence, the convergence rate of the KA given by Theorem 3 of [18] can be written as

$$\rho^2 \le 1 - \frac{\lambda(2-\lambda)}{m\,[1 + (m-1)\lambda^2]\,\|B^\dagger\|^2}, \qquad (18)$$

where $\lambda \in (0, 2)$. The result of the current work reads as

$$\rho^2 \le 1 - \frac{\lambda(2-\lambda)}{(2 + \lambda^2 m^2)\,\|B^\dagger\|^2}, \qquad (19)$$

where $\lambda \in (0, 2)$. A closer look at the two bounds (18) and (19) reveals the following:

1) The optimal choice for the right hand side (RHS) of (18) is $\lambda = \frac{\sqrt{4m-3}-1}{2(m-1)}$, resulting in
$$\rho^2 \le 1 - \frac{2}{m(\sqrt{4m-3}+1)\,\|B^\dagger\|^2}.$$
When $m$ is large, $\rho$ decreases with the speed $\frac{1}{m^{1.5}\|B^\dagger\|^2}$.

2) When $\lambda = \frac{1}{m}$ (a suboptimal choice for simplicity), (19) gives that
$$\rho^2 \le 1 - \frac{2m-1}{3 m^2 \|B^\dagger\|^2}.$$
When $m$ is large, $\rho$ decreases with the speed $\frac{1}{m\|B^\dagger\|^2}$, which is faster than the one in [18]. A comparison of both bounds when the optimal $\lambda$ are chosen is given in Fig. 2.

3) When $\lambda$ is chosen to be 1, both bounds decrease in the order of $\frac{1}{m^2}$.

D. Comparison with the bound given in [20]

We will once again assume that each row of $A$ is normalized, i.e. $B = A$. In [20] the authors make use of a subspace correction method in studying the convergence speed of the KA. They show that (see eq. (3) in [20]), when the best relaxation parameter $\lambda$ is chosen, $\rho$ can be bounded from above according to

$$\rho^2 \le 1 - \frac{1}{\log_2(m)\,\|B\|^2 \|B^\dagger\|^2}. \qquad (20)$$

As we discussed in the previous section, when a near-optimal $\lambda$ (i.e. $\lambda$ chosen as $\frac{1}{m}$) is used, the upper bound implied by our analysis gives that

$$\rho^2 \le 1 - \frac{2m-1}{3 m^2 \|B^\dagger\|^2}. \qquad (21)$$

By assumption we have $\|B\|_F^2 = m$, which implies that

$$\|B\|^2 = \|B^T B\| \ge \frac{\mathrm{tr}(B^T B)}{n} = \frac{m}{n}. \qquad (22)$$

[Figure 2: the bounds (18) and (19) plotted against $m$, with a zoomed view for $m$ between 700 and 1000.]

Fig. 2. The bounds (18) and (19) (using the optimal parameters) are plotted for the given $\|B^\dagger\| = 0.5$ and $m$ ranging from 10 to 1000. The result shows that the bound proposed in this work is always lower than the one given in [18] under the experimental settings.

Hence, the bound obtained by [20] will decrease with a speed of $\frac{n}{m \log_2(m)\,\|B^\dagger\|^2}$ as $m$ increases, while the present work gives the decreasing speed of $\frac{1}{m\|B^\dagger\|^2}$.
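A comparison in the spirit of Fig. 2 can be sketched as follows. Here $\|B^\dagger\|$ is treated as a free parameter (0.5 as in the caption), the $\lambda$ minimizing the RHS of (18) is taken in closed form, and a simple grid search stands in for the optimal $\lambda$ of (19); the code is our own illustration, not the authors'.

import numpy as np

def rhs_18(m, bdag, lam):
    # right-hand side of (18)
    return 1 - lam * (2 - lam) / (m * (1 + (m - 1) * lam**2) * bdag**2)

def rhs_19(m, bdag, lam):
    # right-hand side of (19)
    return 1 - lam * (2 - lam) / ((2 + lam**2 * m**2) * bdag**2)

bdag = 0.5
lams = np.linspace(1e-3, 2 - 1e-3, 4000)                 # grid over (0, 2)
for m in (10, 100, 1000):
    lam_18 = (np.sqrt(4 * m - 3) - 1) / (2 * (m - 1))    # minimizer of rhs_18
    best_18 = rhs_18(m, bdag, lam_18)
    best_19 = min(rhs_19(m, bdag, l) for l in lams)      # grid minimum of rhs_19
    print(m, best_18, best_19)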

IV. SUMMARY

By studying the stability property of a time-varying dynamical system that is related to the KA, we have been able to establish some new results concerning the convergence speed of the algorithm. The new results have also been compared to several related, previously available results. Let us end the discussion by noting that the following two ideas can possibly lead to further improvements of the results. One potential idea is to try to improve the inequality in (7), since this step introduces much of the approximation in establishing the main result of the note; another idea is to try to find an optimal partitioning of the rows of the matrix $A$, such that the right hand side of (11) is minimized.

V. ACKNOWLEDGEMENT

The authors would like to thank the reviewers for useful comments and for pointing out the references [18], [20], and Marcus Björk for helpful suggestions.

REFERENCES

[1] S. Kaczmarz, Angenäherte Auflösung von Systemen linearer Gleichungen, Bulletin International de l'Académie Polonaise des Sciences et des Lettres, 35, 355-357, 1937.
[2] P. P. B. Eggermont, G. T. Herman, A. Lent, Iterative algorithms for large partitioned linear systems, with applications to image reconstruction, Linear Algebra and its Applications, 40, 37-67, 1981.
[3] Y. Censor, Row-action methods for huge and sparse systems and their applications, SIAM Review, 23(4), 444-466, 1981.
[4] T. Strohmer, R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, Journal of Fourier Analysis and Applications, 15(2), 262-278, 2009.
[5] F. Natterer, The Mathematics of Computerized Tomography, Wiley, New York, 1986.
[6] T. Strohmer, R. Vershynin, Comments on the randomized Kaczmarz method, Journal of Fourier Analysis and Applications, 15(4), 437-440, 2009.
[7] D. Needell, Randomized Kaczmarz solver for noisy linear systems, BIT Numerical Mathematics, 50(2), 395-403, 2010.
[8] Y. Eldar, D. Needell, Acceleration of randomized Kaczmarz method via the Johnson-Lindenstrauss lemma, Numerical Algorithms, 58(2), 163-177, 2011.
[9] X. Chen, A. Powell, Almost sure convergence for the Kaczmarz algorithm with random measurements, Journal of Fourier Analysis and Applications, 18(6), 1195-1214, 2012.
[10] A. Zouzias, N. M. Freris, Randomized extended Kaczmarz for solving least-squares, SIAM Journal on Matrix Analysis and Applications, 2013.
[11] G. Thoppe, V. Borkar, D. Manjunath, A stochastic Kaczmarz algorithm for network tomography, Automatica, 2013.
[12] D. Needell, J. A. Tropp, Paved with good intentions: Analysis of a randomized block Kaczmarz method, Linear Algebra and its Applications, 441, 199-221, 2014.
[13] L. Dai, M. Soltanalian, K. Pelckmans, On the randomized Kaczmarz algorithm, IEEE Signal Processing Letters, 21(3), 330-333, 2014.
[14] L. Guo, Time-Varying Stochastic Systems: Stability, Estimation and Control, Jilin Science and Technology Press, Changchun, 1993.
[15] L. Guo, L. Ljung, Exponential stability of general tracking algorithms, IEEE Transactions on Automatic Control, 40(8), 1376-1387, 1995.
[16] A. Galántai, On the rate of convergence of the alternating projection method in finite dimensional spaces, Journal of Mathematical Analysis and Applications, 310(1), 30-44, 2005.
[17] R. K. Meany, A matrix inequality, SIAM Journal on Numerical Analysis, 6(1), 104-107, 1969.
[18] J. Mandel, Convergence of the cyclical relaxation method for linear inequalities, Mathematical Programming, 30(2), 218-228, 1984.
[19] L. Lessard, B. Recht, A. Packard, Analysis and design of optimization algorithms via integral quadratic constraints, arXiv preprint, arXiv:1408.3595, 2014.
[20] P. Oswald, W. Zhou, Convergence Estimates for Kaczmarz-Type Methods, preprint, 2015.