Optimal rates of linear convergence of relaxed alternating projections and generalized Douglas-Rachford methods for two subspaces


Optimal rates of linear convergence of relaxed alternating projections and generalized Douglas-Rachford methods for two subspaces

Heinz H. Bauschke, J.Y. Bello Cruz, Tran T.A. Nghia, Hung M. Phan, and Xianfu Wang

November 8, 2015

Abstract. We systematically study the optimal linear convergence rates for several relaxed alternating projection methods and the generalized Douglas-Rachford splitting methods for finding the projection on the intersection of two subspaces. Our analysis is based on a study of the linear convergence rates of the powers of matrices. We show that the optimal linear convergence rate of powers of matrices is attained if and only if all subdominant eigenvalues of the matrix are semisimple. For the convenience of computation, a nonlinear approach to the partially relaxed alternating projection method with at least the same optimal convergence rate is also provided. Numerical experiments validate our convergence analysis.

2010 Mathematics Subject Classification: Primary 65F10, 65K05; Secondary 65F15, 65B05, 15A18, 90C25, 41A25

Keywords: Convergent and semi-convergent matrix, Friedrichs angle, generalized Douglas-Rachford method, linear convergence, principal angle, relaxed alternating projection method.

1 Introduction

The method of alternating projections and the Douglas-Rachford method play important roles in convex optimization; see, e.g., [1, 2, 3, 6, 11, 14, 15, 16, 18, 20]. They are also widely used in differential equations and signal processing [3, 9]. In order to study convergence rates, error bounds have been given for the method of alternating projections in [2, 17, 27], [18, Theorem 9.8] and [2, Section 3.4], and for the Douglas-Rachford method in [5, 16], [30] and [3, Proposition 4]. The purpose of this paper is to give a systematic convergence rate analysis of relaxed alternating projections, partial relaxed alternating projections, and the generalized Douglas-Rachford method for two subspaces in finite dimensional spaces.

Affiliations: Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada, heinz.bauschke@ubc.ca. IME, Federal University of Goiás, Goiânia, G.O., Brazil, yunier@impa.br and yunier@ufg.br. Mathematics & Statistics, Oakland University, Rochester, MI 48309, USA, nttran@oakland.edu. Department of Mathematical Sciences, University of Massachusetts Lowell, 265 Riverside St., Olney Hall, Lowell, MA 01854, USA, Hung_Phan@uml.edu. Mathematics, University of British Columbia, Kelowna, B.C. V1V 1V7, Canada, shawn.wang@ubc.ca.

The optimal convergence rates are explicitly given in terms of the relaxation parameters and the sines or cosines of principal angles. Our results extend the work by Demanet and Zhang [16] and the work by Bauschke, Deutsch, Hundal and Park [6]. Our quantification of the optimal convergence rate in terms of relaxation parameters and principal angles sheds light on how to choose parameters in practical applications.

To this end, we need a study of the optimal linear convergence rate of the powers of a real or complex matrix A. Necessary and sufficient conditions for such convergence were first established by Hensel [26] and later by Oldenburger [41]. The convergence and its asymptotic rate play a central role in many well-known algorithms for solving linear systems such as the Jacobi, Gauss-Seidel, and successive over-relaxation methods; see, e.g., [10, 35, 38, 4]. Furthermore, the convergence of the powers A^k in operator norm is linear, and the rate, which is fundamentally different from the asymptotic one mentioned above, is dominated by the second-largest modulus of the eigenvalues of A, denoted γ(A), i.e., the modulus of the subdominant or controlling eigenvalues [8, 40]. Natural questions thus arising are: What is the optimal (smallest) linear convergence rate? And when is γ(A) exactly the optimal linear convergence rate? In general, the optimal linear convergence rate does not exist (see Example 2.11 below). However, many iterative linear methods such as the method of alternating projections (also known as von Neumann's method) [2, 17] and the Douglas-Rachford splitting algorithm [19, 20, 9, 3] do attain their optimal linear rates of convergence in operator norm; see also [5, 16, 27].

The rest of the paper is organized as follows. In Section 2 we study optimal linear convergence rates of matrices. We give a necessary and sufficient condition for the powers A^k to converge linearly with the optimal rate γ(A) via the semisimpleness of the subdominant eigenvalues. The sufficient part is similar to and can be obtained from [43, Theorem 2.9], in which the norm of the matrix power is computed exactly by the power of the spectral radius when all the dominant eigenvalues are nondefective, i.e., semisimple; see also our Remark 2.19 for further details. However, to obtain the necessary part, we develop a systematic analysis of matrix powers, derive in Lemma 2.14 a new bound for their norms, which is sharper than the one in [43, Theorem 2.9], and apply the results to optimal convergence rates of convergent matrices.

The main contribution of the paper is developed in Sections 3, 4 and 5. In Section 3, using the results of Section 2, we analyze the optimal linear convergence rates of the relaxed alternating projection methods [1, 33] and also the generalized Douglas-Rachford splitting methods [16] with parameters for two linear subspaces. To the best of our knowledge, our optimal rates of the relaxed alternating methods established here via principal angles [9] are new in the literature and they significantly improve the rate of the classical alternating method. Furthermore, our optimal rate of the generalized Douglas-Rachford splitting methods extends the similar result obtained implicitly in Demanet-Zhang [16, Section 2.6], without assuming the trivial intersection of the two subspaces as in [16]. In Section 4 we introduce and study a nonlinear map that helps to accelerate significantly the convergence of alternating projection methods. This map also allows us to overcome the difficulty of computing the principal angles used to determine the parameter in the relaxed/partial relaxed alternating methods mentioned above.
In particular, this generalizes one of the results by Bauschke-Deutsch-Hundal-Park [6, Theorem 3.8]. In Section 5, we provide some numerical results to illustrate the convergence theory developed in the earlier sections. The numerical experiments indicate that the relaxed alternating projection and partial relaxed alternating projection methods perform better than the method of alternating projections in general. Finally, we present our conclusions in Section 6.

Notation. Throughout, we denote by C^{n×n} and R^{n×n} the sets of n×n complex matrices and real matrices, respectively. Let A be a matrix in C^{n×n} (or R^{n×n}). The notation A* stands for the adjoint (complex conjugate transpose) matrix of A. The matrix norm used in this paper is the operator norm, i.e., ‖A‖ = max{‖Ax‖ : x ∈ C^n, ‖x‖ ≤ 1}, the induced matrix norm.

We write ker A, ran A, and rank A for the kernel, the range, and the rank of A, respectively. Moreover, Fix A := ker(A − Id) is the set of fixed points of A, where Id is the identity mapping. We say A is nonexpansive if ‖Ax‖ ≤ ‖x‖ for all x ∈ C^n; furthermore, A is firmly nonexpansive if ‖Ax‖² + ‖x − Ax‖² ≤ ‖x‖² for all x ∈ C^n. For any subspace U of R^n, we use P_U for the orthogonal projection operator onto U, dim U for the dimension of U, and U^⊥ for the orthogonal complement of U. We denote by I_n, 0_n, 0_{m×n} the n×n identity matrix, the n×n zero matrix, and the m×n zero matrix, respectively. N is the set of nonnegative integers {0, 1, ...}.

2 Preliminary results: the optimal convergence rate of matrices

In this section we establish conditions under which convergent matrices attain the optimal convergence rate. Our analysis in later sections hinges on these results on matrices. Let us begin with:

2.1 Some definitions and well-known facts about matrices

Definition 2.1 (convergent matrices) Let A ∈ C^{n×n}. We say A is convergent¹ to A^∞ ∈ C^{n×n} if and only if
(1) ‖A^k − A^∞‖ → 0 as k → ∞.
We say A is linearly convergent to A^∞ with rate² µ ∈ [0, 1) if there are some M, N > 0 such that
(2) ‖A^k − A^∞‖ ≤ Mµ^k for all k > N, k ∈ N.
Then µ is called a linear convergence rate of A. When the infimum of all the convergence rates is also a convergence rate, we say this minimum is the optimal linear convergence rate.

For any A ∈ C^{n×n} we denote by σ(A) the spectrum of A, the set of all eigenvalues. The spectral radius [37, Example 7.1.4] of A is defined by
(3) ρ(A) := max{ |λ| : λ ∈ σ(A) }.
The next fact is the classical formula for the spectral radius.

Fact 2.2 (spectral radius formula) ([37]) Let A ∈ C^{n×n}. Then we have
(4) ρ(A) = lim_{k→∞} ‖A^k‖^{1/k}.

With λ ∈ σ(A), recall from [37, page 587] that index(λ) is the smallest positive integer k satisfying rank(A − λ Id)^k = rank(A − λ Id)^{k+1}. Furthermore, we say λ ∈ σ(A) is semisimple if index(λ) = 1; see, e.g., [37, Exercise 7.8.4].

Fact 2.3 For A ∈ C^{n×n}, λ ∈ σ(A) is semisimple if and only if ker(A − λ Id) = ker(A − λ Id)².

¹ In the literature, A is called convergent if the power A^k converges to 0; moreover, A is semi-convergent whenever the limit of A^k exists. To avoid the confusion of these two terminologies, we just say A is convergent in both cases.
² This is significantly different from the asymptotic convergence rate [10, p. 199].
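The quantities introduced so far are easy to check numerically. The following small Python/NumPy sketch (not part of the paper's development; the helper names, tolerances and the test matrix are our own choices) estimates ρ(A), the subdominant modulus γ(A) introduced in Definition 2.10 below, and tests semisimplicity of an eigenvalue via the rank version of Fact 2.3, namely rank(A − λ Id) = rank(A − λ Id)².

```python
# Illustrative sketch only: spectral radius, subdominant modulus, and a
# numerical semisimplicity test based on Fact 2.3 (equal kernels of
# A - lambda*Id and its square, i.e. equal ranks).
import numpy as np

def gamma(A, tol=1e-10):
    """max{|lam| : lam in {0} U sigma(A) \\ {1}} (cf. Definition 2.10)."""
    eigvals = np.linalg.eigvals(A)
    moduli = [abs(lam) for lam in eigvals if abs(lam - 1.0) > tol]
    return max(moduli, default=0.0)

def is_semisimple(A, lam, tol=1e-10):
    """lambda is semisimple iff ker(A - lam Id) = ker((A - lam Id)^2),
    equivalently the two matrices have the same rank."""
    B = A - lam * np.eye(A.shape[0])
    return np.linalg.matrix_rank(B, tol) == np.linalg.matrix_rank(B @ B, tol)

if __name__ == "__main__":
    A = np.array([[1.0, 0.0, 0.0],
                  [0.0, 0.5, 1.0],
                  [0.0, 0.0, 0.5]])   # eigenvalue 1 simple, eigenvalue 0.5 defective
    print("rho(A)   =", max(abs(np.linalg.eigvals(A))))
    print("gamma(A) =", gamma(A))
    print("0.5 semisimple?", is_semisimple(A, 0.5))   # False: index 2
    print("1.0 semisimple?", is_semisimple(A, 1.0))   # True
```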

Proof. Note that λ ∈ σ(A) is semisimple if and only if dim[ker(A − λ Id)] = n − rank(A − λ Id) = n − rank(A − λ Id)² = dim[ker(A − λ Id)²]. Since ker(A − λ Id) ⊆ ker(A − λ Id)², the equality dim[ker(A − λ Id)] = dim[ker(A − λ Id)²] holds if and only if ker(A − λ Id) = ker(A − λ Id)². This verifies the fact.

The following result, taken from [37], gives us a complete characterization of a convergent matrix.

Fact 2.4 (limits of powers) ([37, pages 629 and 630]) For A ∈ C^{n×n}, lim_{k→∞} A^k exists if and only if
(5) ρ(A) < 1, or else
(6) ρ(A) = 1 and λ = 1 is semisimple and it is the only eigenvalue on the unit circle.
When this happens, we have
(7) lim_{k→∞} A^k = A^∞ = the projector onto ker(A − Id) along ran(A − Id).
In particular, when ρ(A) < 1, we have A^∞ = 0.

The proof of the above fact is indeed based on the spectral resolution of A^k stated below.

Fact 2.5 (spectral resolution of A^k) ([37, pages 603 and 629]) For k ∈ N and A ∈ C^{n×n} with σ(A) = {λ_1, λ_2, ..., λ_s} and k_i = index(λ_i), we have
(8) A^k = Σ_{i=1}^s λ_i^k G_i + Σ_{i=1}^s Σ_{j=1}^{k_i−1} \binom{k}{j} λ_i^{k−j} (A − λ_i Id)^j G_i,
where the spectral projectors G_i have the following properties:
(i) G_i is the projector onto ker((A − λ_i Id)^{k_i}) along ran((A − λ_i Id)^{k_i}).
(ii) G_1 + G_2 + ... + G_s = Id.
(iii) G_i G_j = 0 when i ≠ j.
(iv) N_i = (A − λ_i Id)G_i = G_i(A − λ_i Id) is nilpotent of index k_i, i.e., N_i^{k_i} = 0 and N_i^{k_i−1} ≠ 0.
Furthermore, the second sum in (8) disappears when index(λ_i) = 1 for all i = 1, ..., s.

Remark 2.6 Note from Fact 2.5(i) and (iv) that 0 ≠ N_i^{k_i−1} = (A − λ_i Id)^{k_i−1} G_i^{k_i−1} = (A − λ_i Id)^{k_i−1} G_i if k_i > 1.

The limit A^∞ in Fact 2.4 is an oblique projector, not necessarily an orthogonal projector. However, we have:

Corollary 2.7 Suppose that A ∈ C^{n×n} is convergent to A^∞ ∈ C^{n×n}. Then the following hold:
(i) A^∞ = P_{Fix A} if and only if Fix A = Fix A*.
(ii) If A is nonexpansive or normal, then A^∞ = P_{Fix A}.

Proof. It follows from (7) that A^∞ is equal to the projector onto ker(A − Id) along ran(A − Id). Thanks to the equality [37, (5.9.11)], we have ran(A − Id) = ran(A^∞ − Id). If A^∞ = P_{Fix A}, we obtain
ran(A − Id) = ran(A^∞ − Id) = ran(P_{Fix A} − Id) = ran(P_{(Fix A)^⊥}) = (Fix A)^⊥.
It follows that
Fix A = [ (Fix A)^⊥ ]^⊥ = [ran(A − Id)]^⊥ = ker(A* − Id) = Fix A*.
Conversely, if Fix A = Fix A*, we have ker(A − Id) = Fix A = Fix A* = ker(A* − Id) = [ran(A − Id)]^⊥, which implies in turn that the projector onto ker(A − Id) along ran(A − Id) is exactly the orthogonal projection P_{Fix A}. The first part (i) of the corollary is complete.

To justify the second part (ii), suppose in addition that A is nonexpansive. Then Fix A = Fix A* by [6, Lemma 2.1] and thus A is convergent to P_{Fix A}. Moreover, if A is normal, then A − Id is also normal. Hence, for all x ∈ C^n we have
‖(A − Id)x‖² = ⟨(A − Id)*(A − Id)x, x⟩ = ⟨(A − Id)(A − Id)*x, x⟩ = ‖(A − Id)*x‖².
The latter clearly shows that Fix A = Fix A* and thus A^∞ = P_{Fix A}. The proof is complete.

Remark 2.8 (convergence, firm nonexpansiveness and nonexpansiveness) Let A ∈ R^{n×n}. When A is firmly nonexpansive, A is convergent; see, e.g., [3, Example 5.17]. However, the converse implication fails. Indeed, consider, for n ≥ 2,
A = [ 0  n^{−2} ; n  0 ].
Then A is not (firmly) nonexpansive because Ae_1 = n e_2, where e_1 = (1, 0) and e_2 = (0, 1). On the other hand, the characteristic polynomial is λ² − n^{−1}, which has roots ±n^{−1/2}. Thus A is convergent due to Fact 2.4. Moreover, convergence and nonexpansiveness are independent; e.g., A = −Id is nonexpansive but not convergent.

2.2 Asymptotic convergence rates of convergent matrices

We will prove later in this section that whenever A is convergent to A^∞, it is linearly convergent with rates not smaller than ρ(A − A^∞). To develop this idea, let us now consider the case of diagonalizable matrices.

Example 2.9 (diagonalizable case) Suppose that A ∈ C^{n×n} is diagonalizable and that σ(A) = {λ_1, ..., λ_s} with 1 = λ_1 > |λ_2| ≥ |λ_3| ≥ ... ≥ |λ_s|. Note that all eigenvalues λ_1, ..., λ_s are semisimple when A is diagonalizable.

By Fact 2.5 and Fact 2.4, A is convergent to A^∞ and
A^k = A^∞ + λ_2^k G_2 + ... + λ_s^k G_s,
which yields A^k − A^∞ = λ_2^k G_2 + ... + λ_s^k G_s. It follows that
‖A^k − A^∞‖ ≤ |λ_2|^k [ ‖G_2‖ + (|λ_3|/|λ_2|)^k ‖G_3‖ + ... + (|λ_s|/|λ_2|)^k ‖G_s‖ ] ≤ |λ_2|^k ( ‖G_2‖ + ... + ‖G_s‖ ).
Hence A^k → A^∞ with the linear rate |λ_2|. In general, an eigenvalue having the second-largest modulus after 1 is called a subdominant eigenvalue.

Definition 2.10 (subdominant eigenvalues) ([8, 40]) For A ∈ C^{n×n}, we define
(10) γ(A) := max{ |λ| : λ ∈ {0} ∪ σ(A) \ {1} }.
An eigenvalue λ ∈ σ(A) satisfying |λ| = γ(A) is referred to as a subdominant eigenvalue.

When A is not diagonalizable, γ(A) need not be a convergence rate.

Example 2.11 Let us consider the matrix
A = [ 1/2  0 ; 1  1/2 ],
which gives us γ(A) = 1/2. Note also that A is not diagonalizable. Moreover, by induction it is easy to check that
A^k = [ 1/2^k  0 ; k/2^{k−1}  1/2^k ] for all k ∈ N.
Hence we have A^k → A^∞ := 0 as k → ∞. However, observe that
‖A^k − A^∞‖ / γ(A)^k = ‖ [ 1  0 ; 2k  1 ] ‖ ≥ 2k → ∞ as k → ∞.
Hence γ(A) is not a convergence rate. However, observe further that any µ ∈ (1/2, 1) is a convergence rate of A. Thus A does not attain the optimal linear convergence rate.

The result below shows that whenever a matrix A is convergent, it must be linearly convergent with any rate in (γ(A), 1). One may also use the spectral decomposition [34, Proposition 1] to prove it. Here, our proof is a bit different, but it is necessary for our further study in the paper.
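The behaviour in Example 2.11 is easy to observe numerically. The short NumPy sketch below (illustrative only; the matrix is the reconstruction used above) prints ‖A^k − A^∞‖ / γ(A)^k, which grows roughly like 2k instead of staying bounded, so no bound of the form Mγ(A)^k can hold.

```python
# Quick numerical check of Example 2.11 (sketch): for the defective matrix
# A = [[1/2, 0], [1, 1/2]], gamma(A) = 1/2 is NOT a convergence rate, since
# ||A^k - A^inf|| / gamma(A)^k grows linearly in k.
import numpy as np

A = np.array([[0.5, 0.0],
              [1.0, 0.5]])
A_inf = np.zeros((2, 2))          # rho(A) = 1/2 < 1, so A^k -> 0
gam = 0.5

Ak = np.eye(2)
for k in range(1, 61):
    Ak = Ak @ A
    if k % 10 == 0:
        ratio = np.linalg.norm(Ak - A_inf, 2) / gam**k
        print(f"k = {k:3d}   ||A^k - A_inf|| / gamma^k = {ratio:.2f}")
# The printed ratios increase roughly like 2k, so no bound M*gamma(A)^k exists;
# any rate mu in (1/2, 1) does work, in line with Theorem 2.12(i).
```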

Theorem 2.12 (rate of convergence I) Suppose that A ∈ C^{n×n} is convergent to A^∞ ∈ C^{n×n}. Then we have γ(A) = ρ(A − A^∞) < 1 and
(11) (A − A^∞)^k = A^k − A^∞ for all k ≥ 1.
Moreover, the following two assertions are satisfied:
(i) A is linearly convergent with any rate µ ∈ (γ(A), 1).
(ii) If A is linearly convergent with rate µ ∈ [0, 1), then µ ∈ [γ(A), 1).

Proof. First let us justify that γ(A) = ρ(A − A^∞) < 1 and (11) by considering the following two cases taken from Fact 2.4.

Case 1: ρ(A) < 1. In this case we have A^∞ = 0 by (4). It follows that γ(A) = ρ(A) = ρ(A − A^∞) < 1. Note also that (11) is trivial, since A^∞ = 0.

Case 2: ρ(A) = 1, and λ = 1 is semisimple and the only eigenvalue on the unit circle. Suppose that σ(A) \ {1} = {λ_2, ..., λ_s} with 1 > |λ_2| ≥ ... ≥ |λ_s|. The Jordan decomposition [37, page 590] of A allows us to find an invertible matrix P ∈ C^{n×n} and r > 0 such that
(12) A = PJP^{−1} with J being the Jordan form of A, J = diag( I_r, J(λ_2), ..., J(λ_s) ), J(λ_j) = diag( J_1(λ_j), J_2(λ_j), ..., J_{t_j}(λ_j) ),
where each Jordan block J_k(λ_j) has λ_j on its diagonal and ones on its superdiagonal, index(λ_j) = max{ d_{jk} : k = 1, ..., t_j } with d_{jk} the dimension of the block J_k(λ_j), and t_j = dim(ker(A − λ_j Id)). Moreover, it follows from [37, p. 629] that
(13) A^∞ = P diag( I_r, 0_{n−r} ) P^{−1}.
This together with the Jordan decomposition above gives us
(14) A − A^∞ = P diag( 0_r, J(λ_2), ..., J(λ_s) ) P^{−1},
which readily yields ρ(A − A^∞) = max{0, |λ_2|} = γ(A) < 1. Observe further from (12) and (13) that AA^∞ = A^∞A = (A^∞)² = A^∞. For any k ∈ N the latter gives us
(A^k − A^∞)(A − A^∞) = A^{k+1} − A^kA^∞ − A^∞A + (A^∞)² = A^{k+1} − A^∞ − A^∞ + A^∞ = A^{k+1} − A^∞.

By using this expression, we may prove (11) by induction, and this completes the first part of the theorem.

Now to verify (i), pick any µ ∈ (γ(A), 1) = (ρ(A − A^∞), 1). Employing (4) for the operator A − A^∞ allows us to find some N ∈ N such that
‖A^k − A^∞‖ = ‖(A − A^∞)^k‖ ≤ µ^k for all k ≥ N,
which verifies the linear convergence of A with rate µ.

It remains to prove (ii). Suppose that A is convergent to A^∞ with rate µ ∈ [0, 1). Hence there are some M, N > 0 such that ‖A^k − A^∞‖ ≤ Mµ^k for all k > N, k ∈ N. Combining this with the spectral radius formula (4) and (11) gives us
γ(A) = ρ(A − A^∞) = lim_{k→∞} ‖(A − A^∞)^k‖^{1/k} = lim_{k→∞} ‖A^k − A^∞‖^{1/k} ≤ lim_{k→∞} M^{1/k} µ = µ,
which ensures γ(A) ≤ µ and thus completes the proof of the theorem.

Remark 2.13 We observe from (14) that if λ ∈ σ(A) \ {0, 1}, then λ ∈ σ(A − A^∞) and its index does not change.

2.3 The optimal convergence rate of convergent matrices

A natural question arising from the above theorem is in which case γ(A) is the optimal linear convergence rate of A; see our Definition 2.1. By Theorem 2.12, the actual problem is to describe when γ(A) is a convergence rate of A; see also our Example 2.11. Theorem 2.15 below gives us a complete answer to this question. To prepare, we need the following lemma, which enhances the spectral radius formula in Fact 2.2 and is of its own interest. It is worth noting that one may prove it by using the Jordan decomposition [37, pages 590 and 618] with similar complexity.

Lemma 2.14 Let A ∈ C^{n×n} with spectral radius ρ(A) > 0. Define α := max{ index(λ) : λ ∈ σ(A), |λ| = ρ(A) }. We have
(15) 0 < limsup_{k→∞} ‖A^k‖ / ( \binom{k}{α−1} ρ(A)^k ) < ∞.

Proof. For the matrix A, denote the set of distinct eigenvalues in σ(A) by {λ_1, ..., λ_s} with ρ(A) = |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_s| and k_i = index(λ_i), i = 1, ..., s. We get from (8) that
(16) A^k = Σ_{i=1}^s λ_i^k G_i + Σ_{i=1}^s Σ_{j=1}^{k_i−1} \binom{k}{j} λ_i^{k−j} (A − λ_i Id)^j G_i.
Denote
(17) E := {1, ..., s}, F := { l ∈ N : |λ_l| = ρ(A), 1 ≤ l ≤ s }, and S := { i ∈ F : index(λ_i) = α }.

It is clear that ∅ ≠ S ⊆ F. By (16) we have
(18) A^k = Σ_{i∈S} Σ_{j=0}^{k_i−1} \binom{k}{j} λ_i^{k−j} (A − λ_i Id)^j G_i + Σ_{i∈E\S} Σ_{j=0}^{k_i−1} \binom{k}{j} λ_i^{k−j} (A − λ_i Id)^j G_i =: H + K.
Note that
(19) ‖K‖ / ( \binom{k}{α−1} |λ_1|^k ) ≤ Σ_{i∈(E\S)∩F} Σ_{j=0}^{k_i−1} [ \binom{k}{j} |λ_i|^{k−j} / ( \binom{k}{α−1} |λ_1|^k ) ] ‖(A − λ_i Id)^j G_i‖ + Σ_{i∈E\F} Σ_{j=0}^{k_i−1} [ \binom{k}{j} |λ_i|^{k−j} / ( \binom{k}{α−1} |λ_1|^k ) ] ‖(A − λ_i Id)^j G_i‖.
For i ∈ (E\S)∩F and 0 ≤ j ≤ k_i−1, observe from the definition of α in (17) that j ≤ α − 2. It follows that
(20) \binom{k}{j} |λ_i|^{k−j} / ( \binom{k}{α−1} |λ_1|^k ) = [ (α−1)! (k−α+1)! / ( j! (k−j)! ) ] |λ_1|^{−j} ≤ [ (α−1)! / ( j! (k−α+2)^{α−1−j} ) ] |λ_1|^{−j} =: ε_1(k) → 0 as k → ∞.
For i ∈ E\F and 0 ≤ j ≤ k_i−1, we have
(21) \binom{k}{j} |λ_i|^{k−j} / ( \binom{k}{α−1} |λ_1|^k ) ≤ k^j ( |λ_i| / |λ_1| )^{k−j} |λ_1|^{−j} =: ε_2(k) → 0,
since k^j is polynomial in k and ( |λ_i| / |λ_1| )^{k−j} is exponential with |λ_i| / |λ_1| < 1 for i ∉ F. It follows from (19), (20), and (21) that
(22) ‖K‖ / ( \binom{k}{α−1} |λ_1|^k ) ≤ Σ_{i∈(E\S)∩F} Σ_{j=0}^{k_i−1} ε_1(k) ‖(A − λ_i Id)^j G_i‖ + Σ_{i∈E\F} Σ_{j=0}^{k_i−1} ε_2(k) ‖(A − λ_i Id)^j G_i‖ → 0.

Now note from (18) that
(23) H / ( \binom{k}{α−1} |λ_1|^k ) = Σ_{i∈S} Σ_{j=0}^{α−1} [ \binom{k}{j} λ_i^{k−j} / ( \binom{k}{α−1} |λ_1|^k ) ] (A − λ_i Id)^j G_i = Σ_{i∈S} ( λ_i^k / |λ_1|^k ) λ_i^{−(α−1)} (A − λ_i Id)^{α−1} G_i + Σ_{i∈S} Σ_{j=0}^{α−2} [ \binom{k}{j} λ_i^{k−j} / ( \binom{k}{α−1} |λ_1|^k ) ] (A − λ_i Id)^j G_i =: Σ_{i∈S} ( λ_i^k / |λ_1|^k ) λ_i^{−(α−1)} (A − λ_i Id)^{α−1} G_i + H_1.
Furthermore, for i ∈ S and j ≤ α − 2, similarly to (20) we may prove that
\binom{k}{j} |λ_i|^{k−j} / ( \binom{k}{α−1} |λ_1|^k ) =: ε_3(k) → 0 as k → ∞,
which implies in turn that
(24) ‖H_1‖ ≤ Σ_{i∈S} Σ_{j=0}^{α−2} ε_3(k) ‖(A − λ_i Id)^j G_i‖ → 0 as k → ∞.
By dividing (18) by \binom{k}{α−1} |λ_1|^k and letting k → ∞, we get from (18), (22), (23), and (24) that
(25) limsup_{k→∞} ‖A^k‖ / ( \binom{k}{α−1} ρ(A)^k ) = limsup_{k→∞} ‖ Σ_{i∈S} ( λ_i^k / |λ_1|^k ) λ_i^{−(α−1)} (A − λ_i Id)^{α−1} G_i ‖
(26) ≤ Σ_{i∈S} |λ_i|^{−(α−1)} ‖(A − λ_i Id)^{α−1} G_i‖ < ∞,
which verifies the right-hand inequality in (15). Furthermore, since |λ_i / λ_1| = 1 for all i ∈ S, by passing to subsequences we may assume without loss of generality that for each i ∈ S the sequence [ λ_i / |λ_1| ]^k → x_i with |x_i| = 1 as k → ∞. Hence, it follows from (25) that
(27) limsup_{k→∞} ‖A^k‖ / ( \binom{k}{α−1} ρ(A)^k ) ≥ ‖ Σ_{i∈S} x_i λ_i^{−(α−1)} (A − λ_i Id)^{α−1} G_i ‖.
The left-hand inequality in (15) is justified when Σ_{i∈S} x_i λ_i^{−(α−1)} (A − λ_i Id)^{α−1} G_i ≠ 0. By contradiction, suppose that Σ_{i∈S} x_i λ_i^{−(α−1)} (A − λ_i Id)^{α−1} G_i = 0.

By multiplying this equality by G_l with l ∈ S, we get from Fact 2.5(i) and (iii) that x_l λ_l^{−(α−1)} (A − λ_l Id)^{α−1} G_l = 0, which is impossible since x_l ≠ 0, |λ_l| = |λ_1| ≠ 0, and (A − λ_l Id)^{α−1} G_l ≠ 0 by Remark 2.6. The proof is complete.

Theorem 2.15 (rate of convergence II) Let A ∈ C^{n×n} be convergent to A^∞ ∈ C^{n×n}. Then γ(A) is the optimal linear convergence rate of A if and only if all the subdominant eigenvalues are semisimple.

Proof. By Theorem 2.12, we only need to prove that γ(A) is a linear convergence rate of A if and only if λ is semisimple for every eigenvalue λ ∈ σ(A) satisfying |λ| = γ(A). Define α := max{ index(λ) : λ ∈ σ(A), |λ| = γ(A) }. If γ(A) = 0, then by (8) we have A^k = A^∞ for all k ≥ 1. This means that A = A^∞ and A² = A, which ensures that γ(A) = 0 is semisimple and γ(A) = 0 is a convergence rate. Thus the statement of the theorem is trivial in this case. It remains to prove the theorem when γ(A) > 0.

If A is linearly convergent to A^∞ with the rate γ(A), we find M, N such that ‖A^k − A^∞‖ ≤ Mγ(A)^k for all k > N, k ∈ N. Note from Theorem 2.12 that γ(A) = ρ(A − A^∞). Thanks to Remark 2.13, all subdominant eigenvalues of A are eigenvalues of A − A^∞ and their indices do not change. It follows from (11) and Lemma 2.14 applied to A − A^∞ that
0 < limsup_{k→∞} ‖(A − A^∞)^k‖ / ( \binom{k}{α−1} ρ(A − A^∞)^k ) = limsup_{k→∞} ‖A^k − A^∞‖ / ( \binom{k}{α−1} γ(A)^k ) ≤ limsup_{k→∞} M / \binom{k}{α−1},
which yields α = 1 and thus all subdominant eigenvalues are semisimple.

Conversely, if all subdominant eigenvalues are semisimple, we have α = 1. Applying Lemma 2.14 again to A − A^∞ gives us
limsup_{k→∞} ‖A^k − A^∞‖ / γ(A)^k = limsup_{k→∞} ‖(A − A^∞)^k‖ / ( \binom{k}{α−1} ρ(A − A^∞)^k ) < ∞,
which verifies that γ(A) is a convergence rate of A. The proof is complete.

Remark 2.16 It is worth mentioning that Example 2.9 is also a direct consequence of Theorem 2.15, since all the eigenvalues of A are semisimple when A is diagonalizable. Moreover, γ(A) is not a convergence rate in Example 2.11, since 1/2 = γ(A) is not semisimple in that case.

Next let us summarize Fact 2.4, Theorem 2.12, and Theorem 2.15 in the following result, which provides a complete characterization for obtaining the optimal convergence rate.

Theorem 2.17 (optimal convergence rate) Let A ∈ C^{n×n}. Then A is convergent with the optimal linear convergence rate γ(A) if and only if one of the following holds:
(i) ρ(A) < 1 and all λ ∈ σ(A) satisfying |λ| = γ(A) are semisimple.
(ii) ρ(A) = 1, λ = 1 is the only eigenvalue on the unit circle, λ = 1 is semisimple, and all λ ∈ σ(A) satisfying |λ| = γ(A) are semisimple.

Proof. If A is convergent with the optimal linear convergence rate, Theorem 2.12 tells us that γ(A) is the optimal convergence rate. Moreover, (i) and (ii) follow from Fact 2.4 and Theorem 2.15. Conversely, if (i) or (ii) holds, we also get from Fact 2.4 and Theorem 2.15 that A is convergent with the optimal rate γ(A).

Theorem 2.18 Let A ∈ C^{n×n} be convergent to A^∞. Then we have
(28) ‖A^k − A^∞‖ ≤ ‖A − A^∞‖^k, and thus γ(A) ≤ ‖A − A^∞‖.
Furthermore, if A is normal then we have
(29) ‖A^k − A^∞‖ = ‖A − A^∞‖^k,
and γ(A) = ‖A − A^∞‖ is the optimal convergence rate of A.

Proof. First, observe from (11) in Theorem 2.12 that ‖A^k − A^∞‖ = ‖(A − A^∞)^k‖ ≤ ‖A − A^∞‖^k, which clearly ensures (28) and thus γ(A) = ρ(A − A^∞) ≤ ‖A − A^∞‖. To justify the second part, suppose that A is convergent and normal. We claim that A − A^∞ is also normal. This is trivial when A^∞ = 0. It remains to consider the case A^∞ ≠ 0. Since A is normal, we can find a diagonal matrix J = diag(λ_1, λ_2, ..., λ_n) with |λ_1| ≥ ... ≥ |λ_n| and a unitary matrix P such that A = PJP*. Fact 2.4 tells us that 1 ∈ σ(A) and 1 = λ_1 = ... = λ_r > |λ_{r+1}| ≥ ... ≥ |λ_n| for some r ∈ N. It follows that
(30) A^∞ = lim_{k→∞} A^k = P diag( I_r, 0_{n−r} ) P*.
Hence we obtain
A − A^∞ = P diag( 0_r, λ_{r+1}, ..., λ_n ) P*,
which is a normal matrix. The latter formula together with (11) also gives us
‖A^k − A^∞‖ = ‖(A − A^∞)^k‖ = ‖A − A^∞‖^k = |λ_{r+1}|^k = [ρ(A − A^∞)]^k = γ(A)^k,
which ensures (29) and completes the proof of the theorem.

Remark 2.19 (i). Theorems 2.12 and 2.15 can also be deduced from the spectral radius formula and Jordan factorizations; however, we were not able to find these results explicitly in the literature. The asymptotic convergence bound of matrix powers can be found in [43, Theorem 2.9, p. 33]: Let A be of order n and let ε > 0 be given. Then for any norm there exist σ (depending on the norm) and τ_{A,ε} > 0 (depending on A, ε and the norm) such that for all k ≥ 1:
(31) σρ(A)^k ≤ ‖A^k‖ ≤ τ_{A,ε}(ρ(A) + ε)^k.
If the dominant eigenvalues of A are nondefective, we may take ε = 0. Lemma 2.14 gives a better upper bound for ‖A^k‖ when ρ(A) > 0.

The distinguishing feature of our results given here is the complete characterization under which the convergence rate of the matrix power A^k is exactly γ(A), rather than γ(A) + ε for some ε > 0, or just sufficient conditions.

(ii). Assume that lim_{k→∞} A^k exists and ρ(A) = 1. According to Fact 2.4 and the spectral resolution, A = P + Z where P = G_1, P² = P, PZ = ZP = 0 and ρ(Z) = γ(A) < 1; see also [36, Theorem 2.1]. We have A^k = P + Z^k so that A^k − P = Z^k. Therefore, the rate of convergence of A^k to P is exactly the rate of convergence of Z^k to 0. This observation is exactly Theorem 2.12 with a different proof, where A^∞ there is P in this decomposition. It is well known that ρ(Z) is the asymptotic convergence rate, meaning that for every ε > 0 there exist N ∈ N and M_1, M_2 > 0 such that
(∀ k ≥ N) M_1 ρ(Z)^k ≤ ‖Z^k‖ ≤ M_2 (ρ(Z) + ε)^k.
Our result says that there exists M_3 > 0 such that (∀ k ≥ N) ‖Z^k‖ ≤ M_3 ρ(Z)^k if and only if all eigenvalues of Z with magnitude ρ(Z) are semisimple. In other words, when one of the eigenvalues of Z with magnitude ρ(Z) is not semisimple, the convergence rate of Z^k to 0 can be as close to ρ(Z) as one wishes, but not exactly ρ(Z).

(iii). When all subdominant eigenvalues of A are semisimple, all eigenvalues of Z in (ii) with magnitude ρ(Z) are also semisimple and thus nondefective in the sense of Stewart [43]. It follows from [43, Theorem 2.9, p. 33], or (31) above, that there exists τ > 0 such that ‖Z^k‖ ≤ τρ(Z)^k for all k > 1. This together with the spectral resolution mentioned in (ii) tells us that A^k linearly converges to P with the exact rate ρ(Z) = γ(A). The latter conclusion is indeed the sufficient part of Theorem 2.15, obtained by using a different method.

3 Convergence rate analysis of relaxed alternating projection and generalized Douglas-Rachford methods

In this section, using the results of Section 2 and principal angles between two subspaces, we analyze comprehensively the convergence rates of relaxed alternating projections, partial relaxed alternating projections, and generalized Douglas-Rachford methods for two subspaces. We show how to choose the relaxation parameter to obtain the optimal rates of convergence. It turns out that the matrices associated with these iteration procedures do have semisimple subdominant eigenvalues. Throughout the section we suppose that U and V are two subspaces of R^n with 1 ≤ p := dim U ≤ dim V =: q ≤ n−1. Note that the whole section would be trivial if dim U = 0 or dim V = n. Let us recall the principal angles and the Friedrichs angle between U and V, which are crucial for our quantitative analysis of convergence rates.

Definition 3.1 (principal angles) ([9], [37, page 456]) The principal angles θ_k ∈ [0, π/2], k = 1, ..., p, between U and V are defined by
(32) cos θ_k := ⟨u_k, v_k⟩ = max{ ⟨u, v⟩ : u ∈ U, v ∈ V, ‖u‖ = ‖v‖ = 1, ⟨u, u_j⟩ = ⟨v, v_j⟩ = 0, j = 1, ..., k−1 }
with u_0 = v_0 := 0.

It is worth mentioning that the vectors u_k, v_k are not uniquely defined, but the principal angles θ_k are unique with 0 ≤ θ_1 ≤ θ_2 ≤ ... ≤ θ_p ≤ π/2; see [37, page 456].
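In practice the principal angles are usually computed from the singular values of Q_U^T Q_V, where the columns of Q_U and Q_V are orthonormal bases of U and V; this is the same device used in the proof of Proposition 3.4 below. The following NumPy sketch (illustrative only; the helper name, dimensions and random bases are our own choices) returns the angles in increasing order, from which θ_{s+1} with s = dim(U ∩ V) can be read off.

```python
# Sketch: principal angles of Definition 3.1 via the SVD of Q_U^T Q_V (cf. [9]).
import numpy as np

def principal_angles(BU, BV):
    """Columns of BU, BV span U and V (not necessarily orthonormal).
    Returns theta_1 <= ... <= theta_p in radians, p = min(dim U, dim V)."""
    QU, _ = np.linalg.qr(BU)
    QV, _ = np.linalg.qr(BV)
    svals = np.linalg.svd(QU.T @ QV, compute_uv=False)
    svals = np.clip(svals, -1.0, 1.0)          # guard against round-off
    return np.sort(np.arccos(svals))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    BU = rng.standard_normal((6, 2))           # U: 2-dimensional subspace of R^6
    BV = rng.standard_normal((6, 3))           # V: 3-dimensional subspace of R^6
    theta = principal_angles(BU, BV)
    s = int(np.sum(theta < 1e-12))             # s = dim(U cap V), generically 0 here
    print("principal angles:", theta)
    print("cos(theta_{s+1}) =", np.cos(theta[s]))
```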

Definition 3.2 (Friedrichs angle) The cosine of the Friedrichs angle θ_F ∈ (0, π/2] between U and V is
(33) c_F(U, V) := max{ ⟨u, v⟩ : u ∈ U ∩ (U∩V)^⊥, v ∈ V ∩ (U∩V)^⊥, ‖u‖ = ‖v‖ = 1 }.

In the following proposition we show that the Friedrichs angle is exactly the (s+1)-th principal angle θ_{s+1}, where s := dim(U∩V).

Proposition 3.3 (principal angles and Friedrichs angle) Let s := dim(U∩V). Then we have θ_k = 0 for k = 1, ..., s and θ_{s+1} = θ_F > 0.

Proof. Let x_1, ..., x_s be an orthonormal basis of the subspace U∩V. We may choose u_k = v_k = x_k, k = 1, ..., s, in (32). It follows that cos θ_k = ⟨x_k, x_k⟩ = 1 and thus θ_k = 0 for all k = 1, ..., s. Moreover, since span{u_1, ..., u_s} = span{v_1, ..., v_s} = U∩V, we obtain from (32) that
(34) cos θ_{s+1} = max{ ⟨u, v⟩ : u ∈ U, v ∈ V, ‖u‖ = ‖v‖ = 1, u, v ∈ (U∩V)^⊥ }.
This together with (33) tells us that θ_{s+1} = θ_F. The proof is complete.

The following result follows the ideas of [9, 16] to construct the orthogonal projections P_U and P_V in terms of the principal angles.

Proposition 3.4 (principal angles and orthogonal projections) Suppose that p + q < n. Then we may find an orthogonal matrix D ∈ R^{n×n} such that
(35) P_U = D diag( I_p, 0_p, 0_{q−p}, 0_{n−p−q} ) D^T and P_V = D [ C² CS 0 0 ; CS S² 0 0 ; 0 0 I_{q−p} 0 ; 0 0 0 0_{n−p−q} ] D^T,
where C and S are the p×p diagonal matrices defined by
(36) C := diag( cos θ_1, ..., cos θ_p ) and S := diag( sin θ_1, ..., sin θ_p ),
with the principal angles θ_1, ..., θ_p between U and V from Definition 3.1. Consequently, we have
(37) P_U P_V = D [ C² CS 0 0 ; 0 0_p 0 0 ; 0 0 0_{q−p} 0 ; 0 0 0 0_{n−p−q} ] D^T and P_{U^⊥} P_{V^⊥} = D [ 0_p 0 0 0 ; −CS C² 0 0 ; 0 0 0_{q−p} 0 ; 0 0 0 I_{n−p−q} ] D^T.
Furthermore, the orthogonal projection P_{U∩V} is computed by
(38) P_{U∩V} = D diag( I_s, 0_{n−s} ) D^T with s := dim(U∩V).

Proof. Let Q_U ∈ R^{n×p}, Q_{U^⊥} ∈ R^{n×(n−p)} and Q_V ∈ R^{n×q} be three matrices whose columns form orthonormal bases for U, U^⊥ and V, respectively. It follows from [37, page 430] that P_U = Q_U Q_U^T, Id − P_U = P_{U^⊥} = Q_{U^⊥} Q_{U^⊥}^T and P_V = Q_V Q_V^T.

Furthermore, by [9, Theorem 1], the singular value decomposition (SVD) of the p×q matrix Q_U^T Q_V is
(39) Q_U^T Q_V = A C B^T with C = diag( cos θ_1, ..., cos θ_p ) ∈ R^{p×p},
where A ∈ R^{p×p} and B ∈ R^{q×p} satisfy AA^T = A^TA = B^TB = I_p. Since all p columns of B are orthonormal and p ≤ q, we may find a q×(q−p) matrix B' such that B̂ := (B, B') ∈ R^{q×q} is orthogonal. Define further D_1 := Q_U A ∈ R^{n×p}; we have D_1^T D_1 = A^T Q_U^T Q_U A = A^T A = I_p. Note further from (39) that
(40) P_U Q_V = Q_U Q_U^T Q_V = Q_U A C B^T = D_1 C B^T.
Moreover, we get from (39) that
(41) [Q_{U^⊥}^T Q_V]^T [Q_{U^⊥}^T Q_V] = Q_V^T Q_{U^⊥} Q_{U^⊥}^T Q_V = Q_V^T (Id − P_U) Q_V = Id − Q_V^T P_U Q_V = Id − Q_V^T Q_U Q_U^T Q_V = Id − (ACB^T)^T (ACB^T) = Id − B C² B^T = B̂ B̂^T − B C² B^T = B̂ [ I_p − C² 0 ; 0 I_{q−p} ] B̂^T = B̂ [ S² 0 ; 0 I_{q−p} ] B̂^T.
Hence the columns of B̂ are eigenvectors of [Q_{U^⊥}^T Q_V]^T [Q_{U^⊥}^T Q_V]. It follows that the SVD of Q_{U^⊥}^T Q_V has the form
(42) Q_{U^⊥}^T Q_V = A_1 [ S 0 ; 0 I_{q−p} ] B̂^T
for some A_1 ∈ R^{(n−p)×q} with A_1^T A_1 = I_q. Define D_2 := Q_{U^⊥} A_1 ∈ R^{n×q}; we have D_2^T D_2 = A_1^T Q_{U^⊥}^T Q_{U^⊥} A_1 = A_1^T A_1 = I_q. Moreover, it follows from (42) that
(43) (Id − P_U) Q_V = Q_{U^⊥} Q_{U^⊥}^T Q_V = D_2 [ S 0 ; 0 I_{q−p} ] B̂^T.
Note further that D_1^T D_2 = A^T Q_U^T Q_{U^⊥} A_1 = 0, since the columns of Q_U and Q_{U^⊥} are bases of U and U^⊥, respectively. Thus there is an n×(n−p−q) matrix D_3 such that D := (D_1, D_2, D_3) ∈ R^{n×n} is orthogonal. Combining (40) and (43) gives us
(44) Q_V = P_U Q_V + (Id − P_U) Q_V = D_1 ( C 0_{p×(q−p)} ) B̂^T + D_2 [ S 0 ; 0 I_{q−p} ] B̂^T.
Hence we have
(45) P_V = Q_V Q_V^T = [ D_1 ( C 0_{p×(q−p)} ) B̂^T + D_2 [ S 0 ; 0 I_{q−p} ] B̂^T ] [ D_1 ( C 0_{p×(q−p)} ) B̂^T + D_2 [ S 0 ; 0 I_{q−p} ] B̂^T ]^T
= D_1 C² D_1^T + D_1 ( CS 0_{p×(q−p)} ) D_2^T + D_2 ( SC ; 0_{(q−p)×p} ) D_1^T + D_2 [ S² 0 ; 0 I_{q−p} ] D_2^T
= D [ C² CS 0 0 ; CS S² 0 0 ; 0 0 I_{q−p} 0 ; 0 0 0 0_{n−p−q} ] D^T,
which ensures the second part of (35). Note further that D_1 D_1^T = Q_U A (Q_U A)^T = Q_U A A^T Q_U^T = Q_U Q_U^T = P_U.

It follows that P_U = D diag( I_p, 0_p, 0_{q−p}, 0_{n−p−q} ) D^T, which verifies (35). The formulas for P_U P_V and P_{U^⊥} P_{V^⊥} = (Id − P_U)(Id − P_V) in (37) can be derived easily from (35). It remains to establish (38). Observe from (37) and Proposition 3.3 that
(P_U P_V)^k = D [ C^{2k} C^{2k−1}S 0 ; 0 0_p 0 ; 0 0 0_{n−2p} ] D^T → D diag( I_s, 0_{n−s} ) D^T as k → ∞.
Note further that Fix(P_U P_V) = U∩V = Fix(P_V P_U); see, e.g., [6, Lemma 2.4]. Combining this with Corollary 2.7 tells us that P_{U∩V} = P_{Fix(P_U P_V)} = D diag( I_s, 0_{n−s} ) D^T, which establishes (38).

Remark 3.5 When p + q < n, observe from (37), (33), and Proposition 3.3 that γ(P_U P_V) = γ(P_{U^⊥} P_{V^⊥}) = c_F²(U, V). These equalities are also true when p + q ≥ n by applying the trick used in Case 2 of the proof of Theorem 3.6. It follows that c_F(U, V) = c_F(U^⊥, V^⊥) by replacing U, V by U^⊥, V^⊥, respectively. This equality is known as Solmon's formula; see [17, Theorem 16] and also [39, Theorem 3] for different proofs.

3.1 Convergence rate of relaxed alternating projection methods

Throughout this subsection we denote the classical alternating projection mapping by T := P_U P_V, which is well known to be convergent to P_{U∩V} with the linear rate c_F²(U, V) = cos²θ_{s+1}, where s = dim(U∩V); see [17, 27]. We will study some relaxations of this operator and show that a better optimal rate can be obtained. The first kind of relaxed alternating projection mapping we study is defined by
(46) T_µ := (1−µ) Id + µ P_U P_V with µ ∈ R.
It is worth noting that the case µ = 0 is not interesting, since T_0 = Id is the identity map. Let us analyze the convergence of T_µ in the following result, mainly for the case µ ≠ 0. When µ = 1, it recovers the classical result mentioned above.

Theorem 3.6 (relaxed alternating projection) Let θ_{s+1} = θ_F be as in Proposition 3.3 with s = dim(U∩V). Then the mapping T_µ = (1−µ) Id + µ P_U P_V, µ ∈ R, is convergent if and only if µ ∈ [0, 2). Moreover, the following assertions hold:
(i) If µ ∈ (0, 2/(1+sin²θ_{s+1})], then T_µ is convergent to P_{U∩V} with the optimal linear rate γ(T_µ) = 1 − µ sin²θ_{s+1}.
(ii) If µ ∈ (2/(1+sin²θ_{s+1}), 2), then T_µ is convergent to P_{U∩V} with the optimal linear rate γ(T_µ) = µ − 1.
Consequently, when µ ≠ 0, T_µ is convergent to P_{U∩V} with a linear rate smaller than cos²θ_{s+1} if and only if µ ∈ (1, 2−sin²θ_{s+1}). Furthermore, T_µ attains the smallest convergence rate (1−sin²θ_{s+1})/(1+sin²θ_{s+1}) at µ = 2/(1+sin²θ_{s+1}).

Proof. Let us justify the theorem by considering two main cases as follows.

Case 1: p + q < n, where 1 ≤ p = dim U ≤ q = dim V ≤ n−1. By Proposition 3.4, (35) and (37), we may find an orthogonal matrix D such that
(47) T_µ = (1−µ) Id + µ P_U P_V = D [ (1−µ)I_p + µC² µCS 0 ; 0 (1−µ)I_p 0 ; 0 0 (1−µ)I_{n−2p} ] D^T = D [ I_p − µS² µCS 0 ; 0 (1−µ)I_p 0 ; 0 0 (1−µ)I_{n−2p} ] D^T.
It follows that
(48) σ(T_µ) = { 1 − µ sin²θ_k : k = 1, ..., p } ∪ {1−µ}.
Suppose first that T_µ is convergent; we get from Fact 2.4 that ρ(T_µ) ≤ 1 and −1 ∉ σ(T_µ). Thus we have |1−µ| ≤ 1 and 1−µ ≠ −1, which yield 0 ≤ µ < 2. Conversely, suppose that 0 ≤ µ < 2 and observe from Proposition 3.3 that
1 = 1−µ sin²θ_1 = ... = 1−µ sin²θ_s > 1−µ sin²θ_{s+1} ≥ ... ≥ 1−µ sin²θ_p ≥ 1−µ > −1.
If µ = 0 then T_µ = Id is always convergent. If µ > 0 and s = 0, it is clear that 1 ∉ σ(T_µ) by (48). Thus T_µ is convergent by Fact 2.4. If µ > 0 and s > 0, we claim that 1 ∈ σ(T_µ) is semisimple. Indeed, observe from (47) that
(49) ker(T_µ − Id) = D ( ker(−µS²) × {0}^{n−p} ) = D ( R^s × {0}^{n−s} ).
Similarly we also have
(50) ker(T_µ − Id)² = D ( ker(µ²S⁴) × {0}^{n−p} ) = D ( R^s × {0}^{n−s} ).
It follows from (49) and (50) that 1 is a semisimple eigenvalue of T_µ due to Fact 2.3. This tells us that T_µ is convergent by Fact 2.4. Thus T_µ = (1−µ) Id + µ P_U P_V, µ ∈ R, is convergent if and only if µ ∈ [0, 2).

Next let us justify (i) and (ii) under the assumption that µ ∈ (0, 2). We claim first that T_µ is convergent to P_{U∩V}. Indeed, note that
Fix T_µ = ker[µ(P_U P_V − Id)] = ker(P_U P_V − Id) = Fix(P_U P_V) = U∩V.
Furthermore, we have
Fix T_µ^T = ker[µ(P_V P_U − Id)] = ker(P_V P_U − Id) = Fix(P_V P_U) = V∩U,
which yields in turn the equality Fix T_µ = Fix T_µ^T. By Corollary 2.7, the mapping T_µ is convergent to P_{U∩V}. Now we justify the quantitative characterizations in (i) and (ii). Observe from (48) that the subdominant eigenvalue modulus of T_µ is
(51) γ(T_µ) = max{ |1−µ sin²θ_{s+1}|, |1−µ| }.
Note also that
(52) (1−µ sin²θ_{s+1})² − (1−µ)² = µ cos²θ_{s+1} [ 2 − µ(1+sin²θ_{s+1}) ].

Subcase a: cos θ_{s+1} = 0. Then we have θ_{s+1} = ... = θ_p = π/2 and γ(T_µ) = |1−µ|. In this case it is easy to see that CS = 0 and thus T_µ is diagonalizable by (47). Thanks to Example 2.9, T_µ is convergent with optimal rate |1−µ|. Both (i) and (ii) are valid in this case.

Subcase b: cos θ_{s+1} > 0. Let us consider the following three subsubcases.

Subsubcase b1: µ ∈ (0, 2/(sin²θ_{s+1}+1)). Then we have 1−µ sin²θ_{s+1} > |1−µ| by (52) and thus γ(T_µ) = 1−µ sin²θ_{s+1}. Observe that
(53) 1 > a_µ := 1−µ sin²θ_{s+1} > 1 − 2 sin²θ_{s+1}/(1+sin²θ_{s+1}) = (1−sin²θ_{s+1})/(1+sin²θ_{s+1}) ≥ 0.
Hence we have γ(T_µ) = 1−µ sin²θ_{s+1} > 0. Suppose further that θ_{s+1} = ... = θ_k and θ_k ≠ θ_{k+1} for some k ∈ {s+1, ..., p}; we easily check from (47) that
ker(T_µ − a_µ Id) = ker(T_µ − a_µ Id)² = D ( {0}^s × R^{k−s} × {0}^{n−k} ),
which shows that a_µ is semisimple by Fact 2.3. Thanks to Theorem 2.15, T_µ is convergent with the optimal rate a_µ.

Subsubcase b2: µ = 2/(1+sin²θ_{s+1}) > 1. Then we obtain from (51) that
(54) γ(T_µ) = |1−µ sin²θ_{s+1}| = |1−µ| = 1−µ sin²θ_{s+1} = µ−1 = (1−sin²θ_{s+1})/(1+sin²θ_{s+1}).
Similarly to the above subsubcase, a_µ ∈ σ(T_µ) is semisimple. Furthermore, 1−µ ∈ σ(T_µ) is also semisimple. Indeed, observe that
T_µ − (1−µ) Id = D [ µC² µCS 0 ; 0 0_p 0 ; 0 0 0_{n−2p} ] D^T and (T_µ − (1−µ) Id)² = D [ µ²C⁴ µ²C³S 0 ; 0 0_p 0 ; 0 0 0_{n−2p} ] D^T.
By using these two expressions, we may check that ker(T_µ − (1−µ) Id) = ker(T_µ − (1−µ) Id)², which yields that 1−µ is also semisimple by Fact 2.3. By Theorem 2.15 again, we obtain that (1−sin²θ_{s+1})/(1+sin²θ_{s+1}) is the optimal linear convergence rate of T_µ.

Subsubcase b3: µ > 2/(1+sin²θ_{s+1}). It follows from (52) that |1−µ sin²θ_{s+1}| < |1−µ|, and thus we get from (51) that
(55) γ(T_µ) = |1−µ| = µ−1 > 2/(1+sin²θ_{s+1}) − 1 = (1−sin²θ_{s+1})/(1+sin²θ_{s+1}).
Similarly to the above case, 1−µ ∈ σ(T_µ) is semisimple. Thus Theorem 2.15 tells us that µ−1 is the optimal convergence rate of T_µ in this subcase.

Combining Subsubcase b1 and Subsubcase b2 ensures (i), and (ii) is exactly Subsubcase b3. Thus (i) and (ii) are verified.

Let us complete the proof by verifying the last part of the theorem. When µ ∈ (0, 2/(1+sin²θ_{s+1})], we have 1−µ sin²θ_{s+1} < cos²θ_{s+1} if and only if µ > 1, since sin²θ_{s+1} > 0 by Proposition 3.3.

Furthermore, when µ ∈ (2/(1+sin²θ_{s+1}), 2), we have µ−1 < cos²θ_{s+1} if and only if µ < 1 + cos²θ_{s+1} = 2 − sin²θ_{s+1}. Combining these two observations with (i) and (ii) in the theorem tells us that T_µ is convergent to P_{U∩V} with a rate smaller than cos²θ_{s+1} if and only if µ ∈ (1, 2−sin²θ_{s+1}). Moreover, the optimal rate (1−sin²θ_{s+1})/(1+sin²θ_{s+1}) of T_µ is attained at µ = 2/(1+sin²θ_{s+1}) due to (53), (54), and (55).

Case 2: p + q ≥ n. We may find some k ∈ N such that n' := n + k > p + q. Define U' := U × {0_k} ⊆ R^{n'}, V' := V × {0_k} ⊆ R^{n'}, and T'_µ = (1−µ) Id + µ P_{U'} P_{V'}. It is clear that 1 ≤ p = dim U' ≤ dim V' = q and p + q < n'. Observe from Definition 3.1 that the principal angles between U' and V' are the same as the ones between U and V. Moreover, we have P_{U'} = [ P_U 0 ; 0 0_k ], P_{V'} = [ P_V 0 ; 0 0_k ], and thus
(56) T'_µ = [ T_µ 0 ; 0 (1−µ)I_k ].
Since q ≤ n−1, there is some x ∈ R^n \ {0} such that P_V x = 0. It follows that Tx = 0, and thus we have 0 ∈ σ(T) and then 1−µ ∈ σ(T_µ). If T_µ is convergent, Fact 2.4 tells us that −1 < 1−µ ≤ 1, i.e., µ ∈ [0, 2). Conversely, if µ ∈ [0, 2), then T'_µ is convergent due to Case 1. This together with (56) ensures that T_µ is also convergent. Hence T_µ is convergent if and only if µ ∈ [0, 2).

To verify the convergence rates of T_µ, suppose further that µ ∈ (0, 2). We note that σ(T'_µ) = σ(T_µ), which implies in turn that γ(T_µ) = γ(T'_µ). It follows from Case 1 that T'_µ in (56) is convergent to P_{U'∩V'} = [ P_{U∩V} 0 ; 0 0_k ] with the convergence rate γ(T'_µ). This together with (56) yields
‖T_µ^k − P_{U∩V}‖ ≤ ‖(T'_µ)^k − P_{U'∩V'}‖.
Thus γ(T'_µ) = γ(T_µ) is the convergence rate of T_µ, by also Theorem 2.12. The analysis of γ(T'_µ) in (i) and (ii) in Case 1 also allows us to verify (i) and (ii) for γ(T_µ) in Case 2. Hence the proof is complete.

Next we study another kind of relaxation of the map T = P_U P_V, namely
(57) S_µ := P_U ((1−µ) Id + µ P_V) = (1−µ) P_U + µ P_U P_V;
see also [33] for a similar form, which will give us a better optimal rate. Since the proof is similar to the one of Theorem 3.6 above, we only sketch the main steps.

Theorem 3.7 (partial relaxed alternating projection) The map S_µ := P_U ((1−µ) Id + µ P_V) = (1−µ) P_U + µ P_U P_V is convergent if and only if µ ∈ [0, 2/sin²θ_p), with the convention 2/sin²θ_p := +∞ when sin θ_p = 0. Moreover, the following assertions hold:
(i) If µ ∈ (0, 2/(sin²θ_{s+1}+sin²θ_p)], then S_µ is convergent to P_{U∩V} with the optimal linear convergence rate γ(S_µ) = 1 − µ sin²θ_{s+1}.
(ii) If µ ∈ (2/(sin²θ_{s+1}+sin²θ_p), 2/sin²θ_p), then S_µ is convergent to P_{U∩V} with the optimal linear convergence rate γ(S_µ) = µ sin²θ_p − 1.
Consequently, when µ ≠ 0, S_µ is convergent to P_{U∩V} with an optimal linear convergence rate smaller than cos²θ_{s+1} = c_F²(U, V) if and only if µ ∈ (1, (2−sin²θ_{s+1})/sin²θ_p). Furthermore, S_µ attains the smallest linear convergence rate (sin²θ_p − sin²θ_{s+1})/(sin²θ_{s+1}+sin²θ_p) at µ = 2/(sin²θ_{s+1}+sin²θ_p).
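Before the proof, the rates in Theorems 3.6 and 3.7 can be checked empirically. The NumPy sketch below (a small experiment of our own, not the paper's Section 5 tests; random subspaces, so s = 0 and U ∩ V = {0}) iterates T = P_U P_V, the relaxed map T_µ with its best parameter, and the partially relaxed map S_µ with its best parameter, and compares the observed contraction factor with the stated optimal rates.

```python
# Sketch: observed vs. predicted contraction factors for T, T_mu and S_mu.
import numpy as np

def observed_rate(M, x0, burn=40, m=10):
    """Geometric-mean contraction factor of x_{k+1} = M x_k after a burn-in."""
    x = x0.copy()
    for _ in range(burn):
        x = M @ x
    a = np.linalg.norm(x)
    for _ in range(m):
        x = M @ x
    return (np.linalg.norm(x) / a) ** (1.0 / m)

rng = np.random.default_rng(2)
n, p, q = 20, 4, 6
QU, _ = np.linalg.qr(rng.standard_normal((n, p)))
QV, _ = np.linalg.qr(rng.standard_normal((n, q)))
PU, PV = QU @ QU.T, QV @ QV.T
Id = np.eye(n)

theta = np.sort(np.arccos(np.clip(np.linalg.svd(QU.T @ QV, compute_uv=False), -1, 1)))
sF, sP = np.sin(theta[0])**2, np.sin(theta[-1])**2   # sin^2(theta_{s+1}), sin^2(theta_p), s = 0

muT = 2.0 / (1.0 + sF)          # best parameter for T_mu (Theorem 3.6)
muS = 2.0 / (sF + sP)           # best parameter for S_mu (Theorem 3.7)
tests = [
    ("T = P_U P_V",       PU @ PV,                       np.cos(theta[0])**2),
    ("T_mu, optimal mu",  (1-muT)*Id + muT*(PU @ PV),    (1-sF)/(1+sF)),
    ("S_mu, optimal mu",  (1-muS)*PU + muS*(PU @ PV),    (sP-sF)/(sF+sP)),
]
x0 = rng.standard_normal(n)
for name, M, rate in tests:
    print(f"{name:18s} predicted {rate:.4f}   observed {observed_rate(M, x0):.4f}")
```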

Proof. We separate the proof into two main cases as below.

Case 1: p + q < n with 1 ≤ p = dim U ≤ q = dim V ≤ n−1. It follows from (35) and (37) that there is some orthogonal matrix D ∈ R^{n×n} such that
(58) S_µ = D [ (1−µ)I_p + µC² µCS 0 ; 0 0_p 0 ; 0 0 0_{n−2p} ] D^T = D [ I_p − µS² µCS 0 ; 0 0_p 0 ; 0 0 0_{n−2p} ] D^T.
Hence we have
(59) σ(S_µ) = { 1 − µ sin²θ_k : k = 1, ..., p } ∪ {0}.
Suppose that S_µ is convergent; we get from Fact 2.4 that
(60) −1 < 1 − µ sin²θ_p and 1 − µ sin²θ_{s+1} ≤ 1.
Since θ_{s+1} = θ_F ≠ 0 by Proposition 3.3, these give us µ ∈ [0, 2/sin²θ_p). Conversely, suppose that µ ∈ [0, 2/sin²θ_p); we have
(61) 1 = 1−µ sin²θ_1 = ... = 1−µ sin²θ_s ≥ 1−µ sin²θ_{s+1} ≥ ... ≥ 1−µ sin²θ_p > −1.
If µ = 0 then S_µ = P_U is always convergent. If µ > 0 and s = 0, it is clear that 1 ∉ σ(S_µ) by (59). Thanks to Fact 2.4, S_µ is convergent. If µ > 0 and s > 0, similarly to the corresponding part of the proof of Theorem 3.6, 1 ∈ σ(S_µ) is semisimple. Combining (61) with Fact 2.4 gives us that S_µ is convergent. Thus S_µ is convergent if and only if µ ∈ [0, 2/sin²θ_p).

To verify (i) and (ii), assume further that µ ∈ (0, 2/sin²θ_p). Let us claim that S_µ is convergent to P_{U∩V}. Via the explicit form of S_µ in (58), we can easily check that
Fix S_µ = ker(S_µ − Id) = D( R^s × {0}^{n−s} ) = ker(S_µ^T − Id) = Fix S_µ^T.
Note also from (38) that
U∩V = Fix P_{U∩V} = D( R^s × {0}^{n−s} ).
It follows that Fix S_µ = Fix S_µ^T = U∩V. Thanks to Corollary 2.7, S_µ is convergent to P_{U∩V}. Next we justify the quantitative characterizations in (i) and (ii). Observe from (59) and (61) that
(62) γ(S_µ) = max{ |1−µ sin²θ_{s+1}|, |1−µ sin²θ_p| }.
Note also that
(63) (1−µ sin²θ_{s+1})² − (1−µ sin²θ_p)² = µ(sin²θ_p − sin²θ_{s+1}) [ 2 − µ(sin²θ_{s+1} + sin²θ_p) ].

Subcase a: sin θ_p = sin θ_{s+1}, i.e., θ_{s+1} = θ_{s+2} = ... = θ_p. Hence we have σ(S_µ) ⊆ {1, 1−µ sin²θ_{s+1}, 0} and γ(S_µ) = |1−µ sin²θ_{s+1}|. Moreover, it is easy to check that c_µ := 1−µ sin²θ_{s+1} is semisimple by showing that ker(S_µ − c_µ Id) = ker(S_µ − c_µ Id)².

Subcase b: sin θ_p ≠ sin θ_{s+1}, i.e., sin θ_p > sin θ_{s+1}. We continue the proof by taking into account three different cases as follows.

Subsubcase b1: µ ∈ (0, 2/(sin²θ_{s+1}+sin²θ_p)). Then we have from (63) that |1−µ sin²θ_{s+1}| > |1−µ sin²θ_p|, which gives us γ(S_µ) = 1−µ sin²θ_{s+1} by (62). Moreover, note that
(64) c_µ = 1−µ sin²θ_{s+1} > 1 − 2 sin²θ_{s+1}/(sin²θ_{s+1}+sin²θ_p) = (sin²θ_p − sin²θ_{s+1})/(sin²θ_{s+1}+sin²θ_p) > 0.
Thanks to the structure of S_µ in (58), we may check that c_µ is semisimple. Thus c_µ = γ(S_µ) is the optimal linear convergence rate of S_µ by Theorem 2.15.

Subsubcase b2: µ = 2/(sin²θ_{s+1}+sin²θ_p). Thus
(65) γ(S_µ) = |1−µ sin²θ_{s+1}| = |1−µ sin²θ_p| = (sin²θ_p − sin²θ_{s+1})/(sin²θ_{s+1}+sin²θ_p) > 0.
We can check that c_µ = 1−µ sin²θ_{s+1} and d_µ := 1−µ sin²θ_p are semisimple in this case via Fact 2.3. This together with Theorem 2.15 tells us that γ(S_µ) = 1−µ sin²θ_{s+1} = (sin²θ_p − sin²θ_{s+1})/(sin²θ_{s+1}+sin²θ_p) is the optimal linear rate of S_µ.

Subsubcase b3: µ ∈ (2/(sin²θ_{s+1}+sin²θ_p), 2/sin²θ_p). It follows from (63) that |1−µ sin²θ_{s+1}| < |1−µ sin²θ_p|, which yields γ(S_µ) = µ sin²θ_p − 1 by (62). Moreover, observe that
(66) µ sin²θ_p − 1 > 2 sin²θ_p/(sin²θ_{s+1}+sin²θ_p) − 1 = (sin²θ_p − sin²θ_{s+1})/(sin²θ_{s+1}+sin²θ_p) > 0.
We also have that d_µ = 1−µ sin²θ_p is semisimple via Fact 2.3. Thanks to Theorem 2.15, γ(S_µ) = µ sin²θ_p − 1 is the optimal linear convergence rate of S_µ.

Combining Subsubcase b1 and Subsubcase b2 gives us (i). Furthermore, Subsubcase b3 exactly verifies (ii). The last part of the theorem is indeed a direct consequence of (i) and (ii). The proof of the theorem for Case 1 is complete.

Case 2: p + q ≥ n. Then we find some k ∈ N such that n' := n + k > p + q and define U' := U × {0_k} ⊆ R^{n'}, V' := V × {0_k} ⊆ R^{n'}, and S'_µ = (1−µ) P_{U'} + µ P_{U'} P_{V'}. It is clear that 1 ≤ p = dim U' ≤ dim V' = q and p + q < n'. Moreover, we also have
S'_µ = [ S_µ 0 ; 0 0_k ],
which shows that S_µ is convergent if and only if S'_µ is convergent. The rest of the proof is quite similar to the corresponding one in Theorem 3.6.

Remark 3.8 It is clear that the optimal linear rate (sin²θ_p − sin²θ_{s+1})/(sin²θ_{s+1}+sin²θ_p) of S_µ is smaller than the rate (1−sin²θ_{s+1})/(1+sin²θ_{s+1}) of T_µ in Theorem 3.6. Note further from the above theorem that S_2 = P_U R_V with R_V := 2P_V − Id, which is known as the reflection-projection method [7, 11], is convergent to P_{U∩V} if and only if 2 < 2/sin²θ_p, i.e., θ_p < π/2. When this is fulfilled, the optimal linear rate of the reflection-projection method is max{ |1−2 sin²θ_{s+1}|, |1−2 sin²θ_p| } by (62). Besides the definitions of θ_{s+1}, θ_p in Definition 3.1 and Definition 3.2, we may also obtain θ_{s+1}, θ_p from the formulas
(67) cos θ_{s+1} = ‖P_U P_V − P_{U∩V}‖ and sin²θ_p = ‖P_U − P_U P_V‖² = ‖P_U − P_U P_V P_U‖,
which follow from (35), (37), and (38).
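The formulas in (67) give a practical way to obtain the two angles that the optimal parameter of S_µ needs, using only projector products. The NumPy sketch below is illustrative only: the reconstruction of P_{U∩V} from the kernel of 2 Id − P_U − P_V is a device of this snippet (any method producing P_{U∩V} would do), and the function names and the comparison against SVD-based angles are our own choices.

```python
# Sketch of (67): recover theta_{s+1} and theta_p from projector norms.
import numpy as np

def proj_intersection(PU, PV, tol=1e-10):
    """Orthogonal projector onto U cap V, computed here as ker(2I - PU - PV)."""
    n = PU.shape[0]
    w, W = np.linalg.eigh(2*np.eye(n) - PU - PV)     # symmetric PSD matrix
    B = W[:, w < tol]                                # eigenvectors for eigenvalue 0
    return B @ B.T if B.size else np.zeros_like(PU)

def angles_from_projectors(PU, PV):
    PUV = proj_intersection(PU, PV)
    cos_f = np.linalg.norm(PU @ PV - PUV, 2)         # cos(theta_{s+1}) = c_F(U,V)
    sin_p = np.linalg.norm(PU - PU @ PV, 2)          # sin(theta_p)
    return np.arccos(min(cos_f, 1.0)), np.arcsin(min(sin_p, 1.0))

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    QU, _ = np.linalg.qr(rng.standard_normal((8, 3)))
    QV, _ = np.linalg.qr(rng.standard_normal((8, 4)))
    t_first, t_last = angles_from_projectors(QU @ QU.T, QV @ QV.T)
    ref = np.arccos(np.clip(np.linalg.svd(QU.T @ QV, compute_uv=False), -1, 1))
    print("theta_{s+1}:", t_first, " (SVD:", ref.min(), ")")
    print("theta_p    :", t_last,  " (SVD:", ref.max(), ")")
```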

Remark 3.9 (finite termination) From Theorem 3.6, observe that the map T_µ has the linear convergence rate 0, i.e., it always terminates after finitely many iterations, if and only if θ_{s+1} = π/2 and µ = 1. Similarly, we get from Theorem 3.7 that S_µ has the linear convergence rate 0 if and only if µ = 2/(sin²θ_{s+1}+sin²θ_p) and θ_{s+1} = θ_p. The latter condition is clearly satisfied when dim(U∩V) = p−1, in which case µ = 1/sin²θ_{s+1}; e.g., when U and V are two different lines passing through the origin in R², or U is a line in R³ and V is a hyperplane in R³ with U ⊄ V, or U and V are two different hyperplanes in R³, etc.

3.2 Convergence rate of the generalized Douglas-Rachford method

Convergence rates of many specific matrices related to the Douglas-Rachford operator
(68) R := P_U P_V + P_{U^⊥} P_{V^⊥} = (R_U R_V + Id)/2 = (R_{U^⊥} R_{V^⊥} + Id)/2
have been discussed in [16]. One of the particular cases there is the so-called generalized Douglas-Rachford operator R_µ defined by R_µ := (1−µ) Id + µR. The convergence rate of this mapping has been obtained in Demanet-Zhang [16] under the additional condition U∩V = {0}. In the following result we give a complete characterization of the convergence of this map and also show that the condition U∩V = {0} can be relaxed.

Theorem 3.10 (generalized Douglas-Rachford method) The map R_µ is convergent if and only if µ ∈ [0, 2). Moreover, the following assertions hold:
(i) R_µ is normal.
(ii) If µ ∈ (0, 2) then R_µ is convergent to P_{Fix R} = P_{(U∩V) ⊕ (U^⊥∩V^⊥)} with the optimal linear convergence rate
γ(R_µ) = √( µ(2−µ) cos²θ_{s+1} + (1−µ)² ), where s := dim(U∩V).

Proof. As in the proofs of Theorem 3.6 and Theorem 3.7, we consider two major cases.

Case 1: p + q < n. By using the expressions in (37), we easily establish that
(69) R_µ = D [ C²+(1−µ)S² µCS 0 0 ; −µCS C²+(1−µ)S² 0 0 ; 0 0 (1−µ)I_{q−p} 0 ; 0 0 0 I_{n−p−q} ] D^T = D [ I_p−µS² µCS 0 0 ; −µCS I_p−µS² 0 0 ; 0 0 (1−µ)I_{q−p} 0 ; 0 0 0 I_{n−p−q} ] D^T;
see also a similar form in [16, page 14]. It is easy to check that R_µ^T R_µ = R_µ R_µ^T, i.e., R_µ is normal. Thus (i) is satisfied.

We may get from the above form and the block determinant formula, cf. [37, page 475], that
σ(R_µ) = { cos²θ_k + (1−µ) sin²θ_k ± iµ cos θ_k sin θ_k : k = 1, ..., p } ∪ {1} if q = p, and
σ(R_µ) = { cos²θ_k + (1−µ) sin²θ_k ± iµ cos θ_k sin θ_k : k = 1, ..., p } ∪ {1} ∪ {1−µ} if q > p,
where i := √−1. For any k = 1, ..., p, we have
|1−µ sin²θ_k ± iµ cos θ_k sin θ_k|² = (1−µ sin²θ_k)² + µ² cos²θ_k sin²θ_k = [µ cos²θ_k + (1−µ)]² + µ² cos²θ_k (1−cos²θ_k) = µ(2−µ) cos²θ_k + (1−µ)².
Suppose further that R_µ is convergent. Then we get from Fact 2.4 that µ(2−µ) cos²θ_{s+1} + (1−µ)² ≤ 1, which yields µ(2−µ)(1−cos²θ_{s+1}) ≥ 0 and thus µ ∈ [0, 2], since cos²θ_{s+1} < 1. Next let us consider three particular subcases of µ.

Subcase a: µ = 2. Then all eigenvalues of R_µ have magnitude 1. By Fact 2.4, we must have
(70) 1−µ sin²θ_k ± iµ cos θ_k sin θ_k = 1 for all k = 1, ..., p,
which implies in turn that sin θ_{s+1} cos θ_{s+1} = 0 and thus θ_{s+1} = π/2, since sin θ_{s+1} > 0 by Proposition 3.3. It follows that 1−µ sin²θ_{s+1} ± iµ cos θ_{s+1} sin θ_{s+1} = −1, which contradicts (70). Hence when µ = 2, R_µ is not convergent.

Subcase b: µ = 0. It is obvious that R_µ = Id is convergent to Id with rate 0.

Subcase c: 0 < µ < 2. By Proposition 3.3 we have
(71) 1 = µ(2−µ) cos²θ_1 + (1−µ)² = ... = µ(2−µ) cos²θ_s + (1−µ)² > µ(2−µ) cos²θ_{s+1} + (1−µ)² ≥ µ(2−µ) cos²θ_{s+2} + (1−µ)² ≥ ... ≥ µ(2−µ) cos²θ_p + (1−µ)² ≥ (1−µ)².
Since R_µ is normal, it follows from Fact 2.4 and Corollary 2.7 that R_µ is convergent. Hence R_µ is convergent if and only if µ ∈ [0, 2). It remains to verify (ii) in this case. Suppose that µ ∈ (0, 2); we get from the normality of R_µ and Theorem 2.18 that γ(R_µ) = √( µ(2−µ) cos²θ_{s+1} + (1−µ)² ) (by (71)) is the optimal linear convergence rate of R_µ and that R_µ is convergent to P_{Fix R_µ} = P_{Fix R}. Moreover, we have Fix R = (U∩V) ⊕ (U^⊥∩V^⊥) by [5, Proposition 3.6]. This ensures (ii) and thus completes the proof of the theorem for Case 1.

Case 2: p + q ≥ n. Similarly to the proofs of Theorem 3.6 and Theorem 3.7, we find k > 0 such that n' := n + k > p + q. Define further U' := U × {0_k} ⊆ R^{n'}, V' := V × {0_k} ⊆ R^{n'}, and R'_µ = (1−µ) Id + µ[ P_{U'} P_{V'} + P_{(U')^⊥} P_{(V')^⊥} ]. It is easy to verify that
(72) R'_µ = [ R_µ 0 ; 0 I_k ].
Note from Case 1 that R'_µ is normal, and so is R_µ. Moreover, we get from (72) that R_µ is convergent if and only if R'_µ is convergent, with the same rate. The analysis of the convergence of R'_µ in Case 1 justifies all the statements of the theorem in this case. The proof is complete.
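The rate in Theorem 3.10(ii) can be verified directly on small examples, since γ(R_µ) is just the largest eigenvalue modulus away from 1. The following NumPy sketch (illustrative only; dimensions, seed and parameter values are arbitrary choices, and U ∩ V = {0} for random subspaces, so θ_{s+1} is the smallest principal angle) compares the computed γ(R_µ) with √(µ(2−µ)cos²θ_{s+1} + (1−µ)²).

```python
# Sketch: check gamma(R_mu) against the formula of Theorem 3.10(ii).
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 12, 3, 5
QU, _ = np.linalg.qr(rng.standard_normal((n, p)))
QV, _ = np.linalg.qr(rng.standard_normal((n, q)))
PU, PV = QU @ QU.T, QV @ QV.T
Id = np.eye(n)

theta1 = np.arccos(np.clip(np.linalg.svd(QU.T @ QV, compute_uv=False), -1, 1)).min()
R = PU @ PV + (Id - PU) @ (Id - PV)       # Douglas-Rachford operator (68)

for mu in (0.5, 1.0, 1.5):
    Rmu = (1 - mu)*Id + mu*R
    eigs = np.linalg.eigvals(Rmu)
    gam = max(abs(lam) for lam in eigs if abs(lam - 1) > 1e-10)   # subdominant modulus
    pred = np.sqrt(mu*(2 - mu)*np.cos(theta1)**2 + (1 - mu)**2)
    print(f"mu = {mu:.1f}   gamma(R_mu) = {gam:.6f}   predicted = {pred:.6f}")
```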

Remark 3.11 (1). Unlike the relaxed alternating projection methods studied in Theorems 3.6 and 3.7, the convergence rate of the (over- and under-) relaxation of the Douglas-Rachford algorithm is always at least the original one, due to
γ(R_1) = cos θ_{s+1} ≤ √( µ(2−µ) cos²θ_{s+1} + (1−µ)² ) = γ(R_µ) for all µ ∈ [0, 2).
Moreover, it is worth mentioning here that Theorem 3.10 also tells us that R_2 = R_U R_V, which is known as the reflection-reflection method, is never convergent in the case of two nontrivial subspaces with 1 ≤ dim U, dim V ≤ n−1.
(2). For the linear convergence rate of the Douglas-Rachford method in a general Hilbert space, see [5].

4 A nonlinear approach to the alternating projection method

Throughout this section, we also suppose that U and V are two subspaces of R^n with 1 ≤ p = dim U ≤ dim V = q ≤ n−1. From Theorem 3.7, we know that the map S_µ in (57) attains its smallest rate (sin²θ_p − sin²θ_{s+1})/(sin²θ_{s+1}+sin²θ_p) at µ = 2/(sin²θ_{s+1}+sin²θ_p). This rate is smaller than the optimal rates of T_µ and T. However, it is not trivial to determine θ_{s+1} and θ_p in order to construct µ = 2/(sin²θ_{s+1}+sin²θ_p) for S_µ, especially when the dimensions of U and V are large; see Definition 3.1, Definition 3.2, and (67). In this section we introduce a simple nonlinear mapping, by using the idea of a line search [6, 3, 5] for the map S_µ, so that the iterative sequence generated by this nonlinear mapping is linearly convergent to the projection onto U∩V with at least the same optimal rate mentioned above, that is, with a convergence rate at least as fast as the one obtained with the optimal relaxation parameter. One may think of this mapping as the partial relaxed alternating projection with an adaptive parameter µ(x) depending on each iteration. This is a technique employed for other iterative methods; see, e.g., [3, 4, 11, 1, 13, 1].

Definition 4.1 Define the map B_T with T = P_U P_V by
(73) B_T(x) := P_U ((1−µ_x) x + µ_x P_V x) = (1−µ_x) P_U x + µ_x P_U P_V x,
where
(74) µ_x := ⟨P_U x − P_U P_V x, x⟩ / ‖P_U x − P_U P_V x‖² if P_U x − P_U P_V x ≠ 0, and µ_x := 1 if P_U x − P_U P_V x = 0.

Remark 4.2 In [4, 6, 3], an accelerated mapping of T is introduced by using the line search [5] as
(75) A_T(x) := (1−λ_x) x + λ_x P_U P_V x,
where
(76) λ_x := ⟨x − P_U P_V x, x⟩ / ‖x − P_U P_V x‖² if x − P_U P_V x ≠ 0, and λ_x := 1 if x − P_U P_V x = 0.
It is worth noting that µ_x = λ_x and B_T x = A_T x when x ∈ U.
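Since µ_x in (74) is computed from the current iterate alone, the scheme of Definition 4.1 needs no principal angles at all. The NumPy sketch below (illustrative only; dimensions, seed and the side-by-side comparison with plain alternating projections are our own choices, and U ∩ V = {0} for the random subspaces used, so the common target is 0) implements one step of B_T as in (73)-(74).

```python
# Sketch: the adaptive map B_T of Definition 4.1 versus plain alternating projections.
import numpy as np

def B_T(x, PU, PV):
    """One step of the map B_T in (73)-(74)."""
    a = PU @ x
    b = PU @ (PV @ x)
    d = a - b                                    # P_U x - P_U P_V x
    nd2 = d @ d
    mu = (d @ x) / nd2 if nd2 > 0 else 1.0       # adaptive relaxation parameter (74)
    return (1 - mu) * a + mu * b

rng = np.random.default_rng(5)
n, p, q = 30, 5, 8
QU, _ = np.linalg.qr(rng.standard_normal((n, p)))
QV, _ = np.linalg.qr(rng.standard_normal((n, q)))
PU, PV = QU @ QU.T, QV @ QV.T

x_map = x_bt = rng.standard_normal(n)            # here U cap V = {0}, so the target is 0
for k in range(1, 26):
    x_map = PU @ (PV @ x_map)                    # classical alternating projections T = P_U P_V
    x_bt = B_T(x_bt, PU, PV)
    if k % 5 == 0:
        print(f"k = {k:2d}   ||T^k x|| = {np.linalg.norm(x_map):.2e}"
              f"   ||B_T^k x|| = {np.linalg.norm(x_bt):.2e}")
```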


More information

MATH 583A REVIEW SESSION #1

MATH 583A REVIEW SESSION #1 MATH 583A REVIEW SESSION #1 BOJAN DURICKOVIC 1. Vector Spaces Very quick review of the basic linear algebra concepts (see any linear algebra textbook): (finite dimensional) vector space (or linear space),

More information

Math 408 Advanced Linear Algebra

Math 408 Advanced Linear Algebra Math 408 Advanced Linear Algebra Chi-Kwong Li Chapter 4 Hermitian and symmetric matrices Basic properties Theorem Let A M n. The following are equivalent. Remark (a) A is Hermitian, i.e., A = A. (b) x

More information

Notes on the matrix exponential

Notes on the matrix exponential Notes on the matrix exponential Erik Wahlén erik.wahlen@math.lu.se February 14, 212 1 Introduction The purpose of these notes is to describe how one can compute the matrix exponential e A when A is not

More information

Chapter 3 Transformations

Chapter 3 Transformations Chapter 3 Transformations An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Linear Transformations A function is called a linear transformation if 1. for every and 2. for every If we fix the bases

More information

j=1 x j p, if 1 p <, x i ξ : x i < ξ} 0 as p.

j=1 x j p, if 1 p <, x i ξ : x i < ξ} 0 as p. LINEAR ALGEBRA Fall 203 The final exam Almost all of the problems solved Exercise Let (V, ) be a normed vector space. Prove x y x y for all x, y V. Everybody knows how to do this! Exercise 2 If V is a

More information

Chapter SSM: Linear Algebra. 5. Find all x such that A x = , so that x 1 = x 2 = 0.

Chapter SSM: Linear Algebra. 5. Find all x such that A x = , so that x 1 = x 2 = 0. Chapter Find all x such that A x : Chapter, so that x x ker(a) { } Find all x such that A x ; note that all x in R satisfy the equation, so that ker(a) R span( e, e ) 5 Find all x such that A x 5 ; x x

More information

Final Review Sheet. B = (1, 1 + 3x, 1 + x 2 ) then 2 + 3x + 6x 2

Final Review Sheet. B = (1, 1 + 3x, 1 + x 2 ) then 2 + 3x + 6x 2 Final Review Sheet The final will cover Sections Chapters 1,2,3 and 4, as well as sections 5.1-5.4, 6.1-6.2 and 7.1-7.3 from chapters 5,6 and 7. This is essentially all material covered this term. Watch

More information

Math 113 Final Exam: Solutions

Math 113 Final Exam: Solutions Math 113 Final Exam: Solutions Thursday, June 11, 2013, 3.30-6.30pm. 1. (25 points total) Let P 2 (R) denote the real vector space of polynomials of degree 2. Consider the following inner product on P

More information

MATHEMATICS 217 NOTES

MATHEMATICS 217 NOTES MATHEMATICS 27 NOTES PART I THE JORDAN CANONICAL FORM The characteristic polynomial of an n n matrix A is the polynomial χ A (λ) = det(λi A), a monic polynomial of degree n; a monic polynomial in the variable

More information

MATH 581D FINAL EXAM Autumn December 12, 2016

MATH 581D FINAL EXAM Autumn December 12, 2016 MATH 58D FINAL EXAM Autumn 206 December 2, 206 NAME: SIGNATURE: Instructions: there are 6 problems on the final. Aim for solving 4 problems, but do as much as you can. Partial credit will be given on all

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

18.06 Problem Set 8 - Solutions Due Wednesday, 14 November 2007 at 4 pm in

18.06 Problem Set 8 - Solutions Due Wednesday, 14 November 2007 at 4 pm in 806 Problem Set 8 - Solutions Due Wednesday, 4 November 2007 at 4 pm in 2-06 08 03 Problem : 205+5+5+5 Consider the matrix A 02 07 a Check that A is a positive Markov matrix, and find its steady state

More information

Math 4242 Fall 2016 (Darij Grinberg): homework set 8 due: Wed, 14 Dec b a. Here is the algorithm for diagonalizing a matrix we did in class:

Math 4242 Fall 2016 (Darij Grinberg): homework set 8 due: Wed, 14 Dec b a. Here is the algorithm for diagonalizing a matrix we did in class: Math 4242 Fall 206 homework page Math 4242 Fall 206 Darij Grinberg: homework set 8 due: Wed, 4 Dec 206 Exercise Recall that we defined the multiplication of complex numbers by the rule a, b a 2, b 2 =

More information

Nonlinear Programming Algorithms Handout

Nonlinear Programming Algorithms Handout Nonlinear Programming Algorithms Handout Michael C. Ferris Computer Sciences Department University of Wisconsin Madison, Wisconsin 5376 September 9 1 Eigenvalues The eigenvalues of a matrix A C n n are

More information

= W z1 + W z2 and W z1 z 2

= W z1 + W z2 and W z1 z 2 Math 44 Fall 06 homework page Math 44 Fall 06 Darij Grinberg: homework set 8 due: Wed, 4 Dec 06 [Thanks to Hannah Brand for parts of the solutions] Exercise Recall that we defined the multiplication of

More information

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same.

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same. Introduction Matrix Operations Matrix: An m n matrix A is an m-by-n array of scalars from a field (for example real numbers) of the form a a a n a a a n A a m a m a mn The order (or size) of A is m n (read

More information

1. General Vector Spaces

1. General Vector Spaces 1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

More information

Summary of Week 9 B = then A A =

Summary of Week 9 B = then A A = Summary of Week 9 Finding the square root of a positive operator Last time we saw that positive operators have a unique positive square root We now briefly look at how one would go about calculating the

More information

Review problems for MA 54, Fall 2004.

Review problems for MA 54, Fall 2004. Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on

More information

THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS

THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS THE MINIMAL POLYNOMIAL AND SOME APPLICATIONS KEITH CONRAD. Introduction The easiest matrices to compute with are the diagonal ones. The sum and product of diagonal matrices can be computed componentwise

More information

Characterization of half-radial matrices

Characterization of half-radial matrices Characterization of half-radial matrices Iveta Hnětynková, Petr Tichý Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague 8, Czech Republic Abstract Numerical radius r(a) is the

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

Math Linear Algebra II. 1. Inner Products and Norms

Math Linear Algebra II. 1. Inner Products and Norms Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

More information

Analysis Preliminary Exam Workshop: Hilbert Spaces

Analysis Preliminary Exam Workshop: Hilbert Spaces Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H

More information

On the order of the operators in the Douglas Rachford algorithm

On the order of the operators in the Douglas Rachford algorithm On the order of the operators in the Douglas Rachford algorithm Heinz H. Bauschke and Walaa M. Moursi June 11, 2015 Abstract The Douglas Rachford algorithm is a popular method for finding zeros of sums

More information

1 Math 241A-B Homework Problem List for F2015 and W2016

1 Math 241A-B Homework Problem List for F2015 and W2016 1 Math 241A-B Homework Problem List for F2015 W2016 1.1 Homework 1. Due Wednesday, October 7, 2015 Notation 1.1 Let U be any set, g be a positive function on U, Y be a normed space. For any f : U Y let

More information

Lecture 19: Polar and singular value decompositions; generalized eigenspaces; the decomposition theorem (1)

Lecture 19: Polar and singular value decompositions; generalized eigenspaces; the decomposition theorem (1) Lecture 19: Polar and singular value decompositions; generalized eigenspaces; the decomposition theorem (1) Travis Schedler Thurs, Nov 17, 2011 (version: Thurs, Nov 17, 1:00 PM) Goals (2) Polar decomposition

More information

ISOMETRIES OF R n KEITH CONRAD

ISOMETRIES OF R n KEITH CONRAD ISOMETRIES OF R n KEITH CONRAD 1. Introduction An isometry of R n is a function h: R n R n that preserves the distance between vectors: h(v) h(w) = v w for all v and w in R n, where (x 1,..., x n ) = x

More information

MATH 240 Spring, Chapter 1: Linear Equations and Matrices

MATH 240 Spring, Chapter 1: Linear Equations and Matrices MATH 240 Spring, 2006 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 8th Ed. Sections 1.1 1.6, 2.1 2.2, 3.2 3.8, 4.3 4.5, 5.1 5.3, 5.5, 6.1 6.5, 7.1 7.2, 7.4 DEFINITIONS Chapter 1: Linear

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Linear Algebra Highlights

Linear Algebra Highlights Linear Algebra Highlights Chapter 1 A linear equation in n variables is of the form a 1 x 1 + a 2 x 2 + + a n x n. We can have m equations in n variables, a system of linear equations, which we want to

More information

Where is matrix multiplication locally open?

Where is matrix multiplication locally open? Linear Algebra and its Applications 517 (2017) 167 176 Contents lists available at ScienceDirect Linear Algebra and its Applications www.elsevier.com/locate/laa Where is matrix multiplication locally open?

More information

Foundations of Matrix Analysis

Foundations of Matrix Analysis 1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the

More information

TOEPLITZ OPERATORS. Toeplitz studied infinite matrices with NW-SE diagonals constant. f e C :

TOEPLITZ OPERATORS. Toeplitz studied infinite matrices with NW-SE diagonals constant. f e C : TOEPLITZ OPERATORS EFTON PARK 1. Introduction to Toeplitz Operators Otto Toeplitz lived from 1881-1940 in Goettingen, and it was pretty rough there, so he eventually went to Palestine and eventually contracted

More information

. = V c = V [x]v (5.1) c 1. c k

. = V c = V [x]v (5.1) c 1. c k Chapter 5 Linear Algebra It can be argued that all of linear algebra can be understood using the four fundamental subspaces associated with a matrix Because they form the foundation on which we later work,

More information

Spectral inequalities and equalities involving products of matrices

Spectral inequalities and equalities involving products of matrices Spectral inequalities and equalities involving products of matrices Chi-Kwong Li 1 Department of Mathematics, College of William & Mary, Williamsburg, Virginia 23187 (ckli@math.wm.edu) Yiu-Tung Poon Department

More information

Matrix Theory. A.Holst, V.Ufnarovski

Matrix Theory. A.Holst, V.Ufnarovski Matrix Theory AHolst, VUfnarovski 55 HINTS AND ANSWERS 9 55 Hints and answers There are two different approaches In the first one write A as a block of rows and note that in B = E ij A all rows different

More information

1 Invariant subspaces

1 Invariant subspaces MATH 2040 Linear Algebra II Lecture Notes by Martin Li Lecture 8 Eigenvalues, eigenvectors and invariant subspaces 1 In previous lectures we have studied linear maps T : V W from a vector space V to another

More information

Lecture 19: Polar and singular value decompositions; generalized eigenspaces; the decomposition theorem (1)

Lecture 19: Polar and singular value decompositions; generalized eigenspaces; the decomposition theorem (1) Lecture 19: Polar and singular value decompositions; generalized eigenspaces; the decomposition theorem (1) Travis Schedler Thurs, Nov 17, 2011 (version: Thurs, Nov 17, 1:00 PM) Goals (2) Polar decomposition

More information

Linear Algebra Lecture Notes-II

Linear Algebra Lecture Notes-II Linear Algebra Lecture Notes-II Vikas Bist Department of Mathematics Panjab University, Chandigarh-64 email: bistvikas@gmail.com Last revised on March 5, 8 This text is based on the lectures delivered

More information

REPRESENTATION THEORY WEEK 7

REPRESENTATION THEORY WEEK 7 REPRESENTATION THEORY WEEK 7 1. Characters of L k and S n A character of an irreducible representation of L k is a polynomial function constant on every conjugacy class. Since the set of diagonalizable

More information

LINEAR ALGEBRA KNOWLEDGE SURVEY

LINEAR ALGEBRA KNOWLEDGE SURVEY LINEAR ALGEBRA KNOWLEDGE SURVEY Instructions: This is a Knowledge Survey. For this assignment, I am only interested in your level of confidence about your ability to do the tasks on the following pages.

More information

Chapter 4 Euclid Space

Chapter 4 Euclid Space Chapter 4 Euclid Space Inner Product Spaces Definition.. Let V be a real vector space over IR. A real inner product on V is a real valued function on V V, denoted by (, ), which satisfies () (x, y) = (y,

More information

The value of a problem is not so much coming up with the answer as in the ideas and attempted ideas it forces on the would be solver I.N.

The value of a problem is not so much coming up with the answer as in the ideas and attempted ideas it forces on the would be solver I.N. Math 410 Homework Problems In the following pages you will find all of the homework problems for the semester. Homework should be written out neatly and stapled and turned in at the beginning of class

More information

Notes on nilpotent orbits Computational Theory of Real Reductive Groups Workshop. Eric Sommers

Notes on nilpotent orbits Computational Theory of Real Reductive Groups Workshop. Eric Sommers Notes on nilpotent orbits Computational Theory of Real Reductive Groups Workshop Eric Sommers 17 July 2009 2 Contents 1 Background 5 1.1 Linear algebra......................................... 5 1.1.1

More information

The Jordan Normal Form and its Applications

The Jordan Normal Form and its Applications The and its Applications Jeremy IMPACT Brigham Young University A square matrix A is a linear operator on {R, C} n. A is diagonalizable if and only if it has n linearly independent eigenvectors. What happens

More information

11. Convergence of the power sequence. Convergence of sequences in a normed vector space

11. Convergence of the power sequence. Convergence of sequences in a normed vector space Convergence of sequences in a normed vector space 111 11. Convergence of the power sequence Convergence of sequences in a normed vector space Our discussion of the power sequence A 0,A 1,A 2, of a linear

More information

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial

4.1 Eigenvalues, Eigenvectors, and The Characteristic Polynomial Linear Algebra (part 4): Eigenvalues, Diagonalization, and the Jordan Form (by Evan Dummit, 27, v ) Contents 4 Eigenvalues, Diagonalization, and the Jordan Canonical Form 4 Eigenvalues, Eigenvectors, and

More information

The Fundamental Theorem of Linear Algebra

The Fundamental Theorem of Linear Algebra The Fundamental Theorem of Linear Algebra Nicholas Hoell Contents 1 Prelude: Orthogonal Complements 1 2 The Fundamental Theorem of Linear Algebra 2 2.1 The Diagram........................................

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

j=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.

j=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent. Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u

More information

Tangent spaces, normals and extrema

Tangent spaces, normals and extrema Chapter 3 Tangent spaces, normals and extrema If S is a surface in 3-space, with a point a S where S looks smooth, i.e., without any fold or cusp or self-crossing, we can intuitively define the tangent

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Conceptual Questions for Review

Conceptual Questions for Review Conceptual Questions for Review Chapter 1 1.1 Which vectors are linear combinations of v = (3, 1) and w = (4, 3)? 1.2 Compare the dot product of v = (3, 1) and w = (4, 3) to the product of their lengths.

More information

Linear Algebra March 16, 2019

Linear Algebra March 16, 2019 Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented

More information

OHSx XM511 Linear Algebra: Solutions to Online True/False Exercises

OHSx XM511 Linear Algebra: Solutions to Online True/False Exercises This document gives the solutions to all of the online exercises for OHSx XM511. The section ( ) numbers refer to the textbook. TYPE I are True/False. Answers are in square brackets [. Lecture 02 ( 1.1)

More information

NONCOMMUTATIVE POLYNOMIAL EQUATIONS. Edward S. Letzter. Introduction

NONCOMMUTATIVE POLYNOMIAL EQUATIONS. Edward S. Letzter. Introduction NONCOMMUTATIVE POLYNOMIAL EQUATIONS Edward S Letzter Introduction My aim in these notes is twofold: First, to briefly review some linear algebra Second, to provide you with some new tools and techniques

More information

The Eigenvalue Problem: Perturbation Theory

The Eigenvalue Problem: Perturbation Theory Jim Lambers MAT 610 Summer Session 2009-10 Lecture 13 Notes These notes correspond to Sections 7.2 and 8.1 in the text. The Eigenvalue Problem: Perturbation Theory The Unsymmetric Eigenvalue Problem Just

More information

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors Chapter 7 Canonical Forms 7.1 Eigenvalues and Eigenvectors Definition 7.1.1. Let V be a vector space over the field F and let T be a linear operator on V. An eigenvalue of T is a scalar λ F such that there

More information

Heinz H. Bauschke and Walaa M. Moursi. December 1, Abstract

Heinz H. Bauschke and Walaa M. Moursi. December 1, Abstract The magnitude of the minimal displacement vector for compositions and convex combinations of firmly nonexpansive mappings arxiv:1712.00487v1 [math.oc] 1 Dec 2017 Heinz H. Bauschke and Walaa M. Moursi December

More information

Linear Algebra and Dirac Notation, Pt. 2

Linear Algebra and Dirac Notation, Pt. 2 Linear Algebra and Dirac Notation, Pt. 2 PHYS 500 - Southern Illinois University February 1, 2017 PHYS 500 - Southern Illinois University Linear Algebra and Dirac Notation, Pt. 2 February 1, 2017 1 / 14

More information

Dot Products, Transposes, and Orthogonal Projections

Dot Products, Transposes, and Orthogonal Projections Dot Products, Transposes, and Orthogonal Projections David Jekel November 13, 2015 Properties of Dot Products Recall that the dot product or standard inner product on R n is given by x y = x 1 y 1 + +

More information

SPRING 2006 PRELIMINARY EXAMINATION SOLUTIONS

SPRING 2006 PRELIMINARY EXAMINATION SOLUTIONS SPRING 006 PRELIMINARY EXAMINATION SOLUTIONS 1A. Let G be the subgroup of the free abelian group Z 4 consisting of all integer vectors (x, y, z, w) such that x + 3y + 5z + 7w = 0. (a) Determine a linearly

More information

Math 312 Final Exam Jerry L. Kazdan May 5, :00 2:00

Math 312 Final Exam Jerry L. Kazdan May 5, :00 2:00 Math 32 Final Exam Jerry L. Kazdan May, 204 2:00 2:00 Directions This exam has three parts. Part A has shorter questions, (6 points each), Part B has 6 True/False questions ( points each), and Part C has

More information

Linear Algebra- Final Exam Review

Linear Algebra- Final Exam Review Linear Algebra- Final Exam Review. Let A be invertible. Show that, if v, v, v 3 are linearly independent vectors, so are Av, Av, Av 3. NOTE: It should be clear from your answer that you know the definition.

More information

Topics in linear algebra

Topics in linear algebra Chapter 6 Topics in linear algebra 6.1 Change of basis I want to remind you of one of the basic ideas in linear algebra: change of basis. Let F be a field, V and W be finite dimensional vector spaces over

More information

Topic 1: Matrix diagonalization

Topic 1: Matrix diagonalization Topic : Matrix diagonalization Review of Matrices and Determinants Definition A matrix is a rectangular array of real numbers a a a m a A = a a m a n a n a nm The matrix is said to be of order n m if it

More information

Math 594. Solutions 5

Math 594. Solutions 5 Math 594. Solutions 5 Book problems 6.1: 7. Prove that subgroups and quotient groups of nilpotent groups are nilpotent (your proof should work for infinite groups). Give an example of a group G which possesses

More information

Answers in blue. If you have questions or spot an error, let me know. 1. Find all matrices that commute with A =. 4 3

Answers in blue. If you have questions or spot an error, let me know. 1. Find all matrices that commute with A =. 4 3 Answers in blue. If you have questions or spot an error, let me know. 3 4. Find all matrices that commute with A =. 4 3 a b If we set B = and set AB = BA, we see that 3a + 4b = 3a 4c, 4a + 3b = 3b 4d,

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

Numerical Methods I Eigenvalue Problems

Numerical Methods I Eigenvalue Problems Numerical Methods I Eigenvalue Problems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 2nd, 2014 A. Donev (Courant Institute) Lecture

More information

Spanning and Independence Properties of Finite Frames

Spanning and Independence Properties of Finite Frames Chapter 1 Spanning and Independence Properties of Finite Frames Peter G. Casazza and Darrin Speegle Abstract The fundamental notion of frame theory is redundancy. It is this property which makes frames

More information

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP)

MATH 20F: LINEAR ALGEBRA LECTURE B00 (T. KEMP) MATH 20F: LINEAR ALGEBRA LECTURE B00 (T KEMP) Definition 01 If T (x) = Ax is a linear transformation from R n to R m then Nul (T ) = {x R n : T (x) = 0} = Nul (A) Ran (T ) = {Ax R m : x R n } = {b R m

More information

THE CYCLIC DOUGLAS RACHFORD METHOD FOR INCONSISTENT FEASIBILITY PROBLEMS

THE CYCLIC DOUGLAS RACHFORD METHOD FOR INCONSISTENT FEASIBILITY PROBLEMS THE CYCLIC DOUGLAS RACHFORD METHOD FOR INCONSISTENT FEASIBILITY PROBLEMS JONATHAN M. BORWEIN AND MATTHEW K. TAM Abstract. We analyse the behaviour of the newly introduced cyclic Douglas Rachford algorithm

More information

UNIT 6: The singular value decomposition.

UNIT 6: The singular value decomposition. UNIT 6: The singular value decomposition. María Barbero Liñán Universidad Carlos III de Madrid Bachelor in Statistics and Business Mathematical methods II 2011-2012 A square matrix is symmetric if A T

More information

1 Linear Algebra Problems

1 Linear Algebra Problems Linear Algebra Problems. Let A be the conjugate transpose of the complex matrix A; i.e., A = A t : A is said to be Hermitian if A = A; real symmetric if A is real and A t = A; skew-hermitian if A = A and

More information