A CS decomposition for orthogonal matrices with application to eigenvalue computation
D. Calvetti, L. Reichel, H. Xu

Abstract. We show that a Schur form of a real orthogonal matrix can be obtained from a full CS decomposition. Based on this fact, a CS decomposition-based orthogonal eigenvalue method is developed. We also describe an algorithm for the orthogonal similarity transformation of an orthogonal matrix to a condensed product form, and an algorithm for the full CS decomposition. The latter uses mixed shifted and zero-shift iterations to achieve high accuracy. Numerical examples are presented.

Keywords: orthogonal matrix, eigenvalue problem, full CS decomposition, high accuracy

AMS subject classification: 65F15, 15A3, 15A18, 15B10, 65G50, 65F35

1 Introduction

The eigenvalue problem for unitary and orthogonal matrices has many applications, including time series analysis, signal processing, and numerical quadrature; these applications, as well as the elegant theory, have spurred the development of numerical methods for these eigenvalue problems. Both QR algorithms and divide-and-conquer methods have been developed. Both types of methods require an initial unitary similarity transformation of a given unitary matrix Q in C^{2n x 2n} to an upper Hessenberg matrix H represented in product form

    H = H_1 H_2 ... H_{2n-1} H_{2n},

where

    H_k = diag(I_{k-1}, [γ_k, σ_k; σ_k, -conj(γ_k)], I_{2n-k-1}),   |γ_k|^2 + σ_k^2 = 1,  γ_k in C,  σ_k > 0,   (1)

is a complex Householder matrix for 1 <= k <= 2n-1, and H_{2n} = diag(I_{2n-1}, γ_{2n}), with γ_{2n} in C unimodular, is a truncated complex Householder matrix. The Hessenberg matrix H can be transformed further by unitary similarity transformation into the product form H_o H_e, where

    H_o = H_1 H_3 ... H_{2n-1},   H_e = H_2 H_4 ... H_{2n}.   (2)

The latter transformation was first described by Ammar et al. [1] for orthogonal matrices of even order Q in R^{2n x 2n}, and they applied it to determine the eigenvalues and, if so desired, the eigenvectors of Q. Orthogonal matrices of odd order can be reduced to orthogonal matrices of even order by deflation; see below.
Department of Mathematics, Case Western Reserve University, Cleveland, OH 44106, USA (dxc57@case.edu).

Department of Mathematical Sciences, Kent State University, Kent, OH, USA (reichel@math.kent.edu). Research supported in part by an NSF grant DMS.

Department of Mathematics, University of Kansas, Lawrence, KS 66045, USA (feng@ku.edu). Research supported in part by the Fudan University Key Laboratory Senior Visiting Scholar Project.

Subsequently, Bunse-Gerstner and Elsner developed a
QZ-type algorithm for the matrix pencil {H_o, H_e^*} by exploiting the quasi-diagonal structure of the matrices H_o and H_e. Here and below, the superscript * denotes transposition and complex conjugation; a superscript T denotes transposition only.

This paper describes how an orthogonal matrix Q can be brought into the form H_o H_e directly by an orthogonal similarity transformation. This transformation is related to the transformation used by Bunse-Gerstner and Elsner, but with different reduction targets and a different reduction order. In this way we obtain a new approach for computing the spectral factorization of an orthogonal matrix based on the product form H_o H_e.

Real eigenvalues, i.e., eigenvalues +-1, of an orthogonal matrix Q can be removed by deflation; see [1, 8] for discussions. Therefore, throughout this paper we assume that the orthogonal matrix Q is of even order 2n x 2n and does not have real eigenvalues. Then we can apply an orthogonal similarity transformation to the expression H_o H_e to obtain a matrix of the form ΣZΣZ^T, where

    Σ = diag(I_n, -I_n),   Z = [Z_11, Z_12; Z_21, Z_22],

and the blocks Z_11, Z_21, Z_12^T, Z_22^T are n x n and upper bidiagonal. Suppose that

    Z = [U_1, 0; 0, U_2] [Φ, -Ψ; Ψ, Φ] [V_1, 0; 0, V_2]^T

is a full CS decomposition, where U_1, U_2, V_1, V_2 are orthogonal, Φ = diag(φ_1, ..., φ_n), Ψ = diag(ψ_1, ..., ψ_n), and the diagonal entries satisfy φ_i^2 + ψ_i^2 = 1 and φ_i, ψ_i > 0 for i = 1, ..., n. Then

    ΣZΣZ^T = [U_1, 0; 0, U_2] [Φ^2 - Ψ^2, 2ΦΨ; -2ΦΨ, Φ^2 - Ψ^2] [U_1, 0; 0, U_2]^T,

from which we obtain a real Schur form of Q after a simple permutation. We develop an algorithm to compute a full CS decomposition of the matrix Z and determine a real Schur form of Q from this decomposition. Sutton [16] describes an algorithm for computing a full CS decomposition of a unitary matrix in a block form; his algorithm is more reliable than other CS decomposition methods. Our algorithm follows the approach described in [17] for SVD iterations.

An orthogonal matrix is perfectly well-conditioned, and all its eigenvalues are unimodular. Therefore any backward stable method will compute all eigenvalues with
high accuracy. However, this does not mean that small real and imaginary parts of the eigenvalues are always computed with high relative accuracy. For this reason we apply the Demmel-Kahan zero-shift SVD iteration technique [5] to accurately compute small singular values of blocks of Z.

When an orthogonal similarity transformation needs to be generated for the orthogonal eigenproblem, a CS decomposition-based eigenvalue method only requires one pair of orthogonal matrices, {U_1, U_2} or {V_1, V_2}, to be determined. This results in an efficient algorithm. Moreover, differently from QR-type algorithms, which normally use unimodular shifts, the CS decomposition-based method uses standard shifts. We remark that the initial reduction method can be applied to complex unitary matrices as well. The factorization ΣZΣZ^T provides a new condensed form for real orthogonal matrices and therefore is of independent interest. The full CS decomposition algorithm can be applied to solve real orthogonal eigenvalue problems as well as to compute a CS decomposition of a unitary matrix.

This paper is organized as follows. Section 2 describes the initial orthogonal transformation that determines the representation H_o H_e of the given matrix Q. We explain in Section 3 how a real orthogonal matrix is orthogonally similar to ΣZΣZ^T and why a full CS decomposition of Z gives a Schur form of ΣZΣZ^T. Section 4 contains a full CS decomposition algorithm and some implementation details, and Section 5 presents a new algorithm for the orthogonal eigenvalue problem based on the full CS decomposition. This section also contains a few computed examples. Section 6 contains concluding remarks.
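The identity just described — a full CS decomposition of Z turning ΣZΣZ^T into a real Schur form after a perfect-shuffle permutation — can be checked numerically. The sketch below is an illustration, not part of the paper: it uses SciPy's cossin routine as the full CS decomposition, and the random orthogonal Z is a generic stand-in for the bidiagonal-block matrix constructed later.

```python
import numpy as np
from scipy.linalg import cossin

rng = np.random.default_rng(0)
n = 4                                   # Z is 2n x 2n

# A generic real orthogonal Z (stand-in for the bidiagonal-block matrix of the paper)
Z, _ = np.linalg.qr(rng.standard_normal((2 * n, 2 * n)))

# Full CS decomposition Z = U @ CS @ Vh with U = diag(U1, U2), Vh block diagonal
U, CS, Vh = cossin(Z, p=n, q=n)

Sigma = np.diag([1.0] * n + [-1.0] * n)
W = Sigma @ Z @ Sigma @ Z.T             # the orthogonal matrix Sigma Z Sigma Z^T

# Because U and Vh are block diagonal, W = U (Sigma CS Sigma CS^T) U^T,
# and the middle factor is [[C^2 - S^2, 2CS], [-2CS, C^2 - S^2]] with diagonal C, S.
middle = Sigma @ CS @ Sigma @ CS.T
assert np.allclose(W, U @ middle @ U.T)

# All eigenvalues of W are unimodular
assert np.allclose(np.abs(np.linalg.eigvals(W)), 1.0)

# The perfect shuffle pairing indices k and n+k exposes a real Schur form:
# 2x2 blocks of the form [d1, d2; -d2, d1] on the diagonal, zeros elsewhere.
P = np.zeros((2 * n, 2 * n))
for k in range(n):
    P[k, 2 * k] = 1.0
    P[n + k, 2 * k + 1] = 1.0
T = P.T @ middle @ P
for i in range(2 * n):
    for j in range(2 * n):
        if abs(i - j) > 1 or (i != j and min(i, j) % 2 == 1):
            assert abs(T[i, j]) < 1e-10
```

With φ_k = cos θ and ψ_k = sin θ, each 2 x 2 block of T has eigenvalues e^{+-2iθ}, which is the sense in which the permuted middle factor is a real Schur form.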
In the remainder of this paper, I denotes an identity matrix, e_j is the jth column of I, and ||.|| stands for the Euclidean vector norm or the associated induced matrix norm. We use MATLAB-inspired notation to define submatrices. For instance, for a matrix A = [a_ij] we let A(i : j, k) := [a_ik, ..., a_jk]^T denote the vector made up of the entries of column k of A in rows i through j. Similarly, A(k, i : j) := [a_ki, ..., a_kj] stands for the vector consisting of the entries of row k in columns i through j.

2 Initial reduction to condensed form

This section describes a process for reducing a 2n x 2n unitary matrix Q by unitary similarity transformation to the form H_o H_e, where both H_o and H_e are in product form; see (2). Details of the process are shown in the algorithm below, which uses the following elementary matrices:

(a) H_k(γ, σ), or simply H_k (1 <= k <= p-1), denotes a p x p Householder matrix analogous to (1). Note that for a vector x = [x_1, x_2]^T with x_2 > 0 we have

    γ = conj(x_1) / sqrt(|x_1|^2 + x_2^2),   σ = x_2 / sqrt(|x_1|^2 + x_2^2).

Then [γ, σ; σ, -conj(γ)] x = ||x|| e_1. For k = p we let H_p(γ) = diag(I_{p-1}, γ). When σ is real, H_k(γ, σ)^* = H_k(conj(γ), σ).

(b) H_i(x) is a modified p x p Householder matrix associated with a vector x in C^{p-i+1}. It is defined by H_i(x) = diag(I_{i-1}, H) diag(I_{i-1}, β, I_{p-i}), where H is a Householder matrix such that Hx = β ||x|| e_1 and |β| = 1. Therefore H_i(x)^* [0; x] = ||x|| e_i. When i = p, the vector x is a scalar and H_p(x) = diag(I_{p-1}, β), where β = 1 if x = 0 and β = x/|x| otherwise.

For a given vector x = [y^T, x_i, z^T]^T with y in C^{i-1}, z in C^{p-i}, and [x_i, z^T]^T nonzero, we can determine a Householder matrix H_i(γ, σ) with σ >= 0 and a matrix H_{i+1}(z) such that

    H_i(γ, σ)^* H_{i+1}(z)^* x = [y^T, α, 0]^T,   α = sqrt(|x_i|^2 + ||z||^2) > 0.

Algorithm 1. Given a 2n x 2n unitary matrix Q, the algorithm computes a unitary matrix Θ_0 such that Θ_0^* Q Θ_0 = H_o H_e.

0. Set Θ_0 = I_{2n}.
1. For k = 1, ..., n-1:
   (a) Determine H_{2k} := H_{2k}(Q(2k : 2n, 2k-1)).
   (b) Update Q <- H_{2k}^* Q H_{2k}, Θ_0 <- Θ_0 H_{2k}.
   (c) Determine H_{2k-1} := H_{2k-1}(γ_{2k-1}, σ_{2k-1}) with γ_{2k-1} and σ_{2k-1} >= 0 satisfying
       [γ_{2k-1}, σ_{2k-1}; σ_{2k-1}, -conj(γ_{2k-1})] [q_{2k-1,2k-1}; q_{2k,2k-1}] = e_1.
   (d) Update Q <- H_{2k-1}^* Q.
   (e) Determine H_{2k+1} := H_{2k+1}(Q(2k, 2k+1 : 2n)^*).
   (f) Update Q <- H_{2k+1}^* Q H_{2k+1}, Θ_0 <- Θ_0 H_{2k+1}.
   (g) Determine H_{2k} := H_{2k}(γ_{2k}, σ_{2k}) with γ_{2k} and σ_{2k} >= 0 chosen so that the update in (h) annihilates the (2k, 2k+1) entry of Q and makes the (2k, 2k) entry nonnegative.
   (h) Update Q <- Q H_{2k}.
   End For
2. (a) Determine H_{2n} := H_{2n}(Q(2n, 2n-1)).
   (b) Update Q <- H_{2n}^* Q H_{2n}, Θ_0 <- Θ_0 H_{2n}.
   (c) Determine H_{2n-1} := H_{2n-1}(γ_{2n-1}, σ_{2n-1}) with γ_{2n-1} and σ_{2n-1} >= 0 satisfying
       [γ_{2n-1}, σ_{2n-1}; σ_{2n-1}, -conj(γ_{2n-1})] [q_{2n-1,2n-1}; q_{2n,2n-1}] = e_1.
   (d) Update Q <- H_{2n-1}^* Q.
   (e) Determine H_{2n} := H_{2n}(γ_{2n}) with γ_{2n} = q_{2n,2n}.

We illustrate the process with a 6 x 6 unitary matrix (n = 3). For k = 1 we determine H_2, H_1, H_3, and H_2 such that the first column is reduced to the pattern [x, +, 0, 0, 0, 0]^T and the first two rows are annihilated to the right of the second column. In the pattern diagrams of the original illustration (reproduced here only in condensed form), 0 denotes a zero introduced by a transformation, + stands for a nonnegative entry, an entry 1 arises because the matrices H_i and Q are unitary, and some entries vanish automatically for the same reason. For k = 2 the procedure is repeated with the trailing submatrix.
Finally, application of H_6, H_5, and H_6 reduces the matrix to I_6. In general we have an identity of the form

    (H_{2n-1}^* H_{2n}^* H_{2n-1}^* ... H_3^* H_1^* H_2^*) Q (H_2 H_3 H_4 ... H_{2n-1} H_{2n}) = I.

Using the fact that H_{2k} commutes with H_j and H_j^* for any j > 2k+1, and letting Θ_0 = H_2 H_3 ... H_{2n-1} H_{2n}, the above equation can be written as

    (H_{2n-1}^* H_{2n-3}^* ... H_3^* H_1^*) Θ_0^* Q Θ_0 (H_2 H_4 ... H_{2n-2} H_{2n})^* = I,

which gives

    Θ_0^* Q Θ_0 = H_o H_e,   H_o = H_1 H_3 ... H_{2n-1},   H_e = H_2 H_4 ... H_{2n}.   (3)

Several comments are in order:

1. Algorithm 1 is similar to a previously described algorithm for unitary matrices. The main difference is that the latter takes H_o as the reduction target. Our algorithm computes the matrices H_1, ..., H_{2n-1} explicitly in the elimination process. With H_{2k-1} being involved explicitly, Algorithm 1 avoids the decision-making step for determining the matrices H_{2k+1} and is more straightforward to implement and potentially more reliable.

2. Algorithm 1 follows the same approach used in [17] for the initial reduction of a full CS decomposition. The procedure in [17] differs in that it applies two-sided (non-similarity) block diagonal unitary transformations.

3. If H_{2k} is diagonal for some k during the computations, the unitary matrix H_o H_e decouples and yields two or more submatrices.

4. If Q is real orthogonal, then because det H_k = -1 for all k = 1, ..., 2n-1, it follows that det Q = -det H_{2n}(γ_{2n}) = -γ_{2n}. In this case, if γ_{2n} = 1, then 1 and -1 must be eigenvalues of Q. These eigenvalues can be deflated; see [1, 8].

5. Algorithm 1 requires O(n^3) flops for computing H_o H_e. If the unitary matrix Θ_0 has to be stored, additional O(n^3) flops are needed. Here one flop stands for one of the arithmetic floating-point operations +, -, x, /.

We briefly turn to the stability of Algorithm 1. Let Θ̂_0 and Ĥ_k (k = 1, ..., 2n) be the computed matrices determined by the reduction process. It follows from standard error analysis [6, Chapter 5] that there are unitary matrices Θ̃_0 and H̃_k (k = 1, ..., 2n) such that

    I + Δ = (H̃_{2n-1}^* H̃_{2n-3}^* ... H̃_3^* H̃_1^*) Θ̃_0^* (Q + E_1) Θ̃_0 (H̃_2 H̃_4 ... H̃_{2n})^*,

with ||E_1||, ||Θ̃_0 - Θ̂_0||, ||H̃_1 - Ĥ_1||, ..., ||H̃_{2n} - Ĥ_{2n}|| = O(µ), where I + Δ is
the computed matrix obtained without enforcing the diagonal entries to be 1 and the remaining entries in the illustration to be zero; µ denotes the machine precision. From

    (I + Δ)^* (I + Δ) - I_{2n} = (Q + E_1)^* (Q + E_1) - I_{2n} = O(µ)

and the zero pattern of Δ as shown in the illustration, one can show, similarly as in [17], that ||Δ|| = O(µ).
It is easily seen that

    Ĥ_1 Ĥ_3 ... Ĥ_{2n-1} = H̃_1 H̃_3 ... H̃_{2n-1} (I + F_o),   Ĥ_2 Ĥ_4 ... Ĥ_{2n} = H̃_2 H̃_4 ... H̃_{2n} (I + F_e),

with ||F_o||, ||F_e|| = O(µ). Then for the computed matrices Ĥ_o := Ĥ_1 Ĥ_3 ... Ĥ_{2n-1} and Ĥ_e := Ĥ_2 Ĥ_4 ... Ĥ_{2n} one has

    Ĥ_o Ĥ_e = Θ̃_0^* (Q + E) Θ̃_0,   ||E|| = O(µ).

This shows backward stability of Algorithm 1.

3 Transformation of H_o H_e to ΣZΣZ^T

From now on we assume that Q in R^{2n x 2n} is orthogonal and that there is a real orthogonal matrix Θ_0 such that Θ_0^T Q Θ_0 = H_o H_e, where

    H_o = H_1 H_3 ... H_{2n-1} = diag([γ_1, σ_1; σ_1, -γ_1], [γ_3, σ_3; σ_3, -γ_3], ..., [γ_{2n-1}, σ_{2n-1}; σ_{2n-1}, -γ_{2n-1}]),
    H_e = H_2 H_4 ... H_{2n} = diag(1, [γ_2, σ_2; σ_2, -γ_2], ..., [γ_{2n-2}, σ_{2n-2}; σ_{2n-2}, -γ_{2n-2}], γ_{2n}),

and σ_k > 0 for k = 1, ..., 2n-1. Note that all γ_1, ..., γ_{2n} are real.

Define, for any γ, σ in R with σ^2 + γ^2 = 1 and σ > 0, the numbers a, b in R with a, b > 0 by

    (a, b) = ( sqrt((1+γ)/2),  σ/sqrt(2(1+γ)) )   if γ >= 0,
    (a, b) = ( σ/sqrt(2(1-γ)),  sqrt((1-γ)/2) )   if γ < 0.

Then one obtains the spectral decompositions

    [a, b; b, -a]^T [γ, σ; σ, -γ] [a, b; b, -a] = diag(1, -1)   (4)

and

    [-b, a; a, b]^T [γ, σ; σ, -γ] [-b, a; a, b] = diag(-1, 1).   (5)

In fact, if σ = sin θ and γ = cos θ for some θ in (0, π), then a = cos(θ/2) and b = sin(θ/2). Application of (4) to each block of H_o yields an orthogonal matrix

    Θ_o = diag([a_1, b_1; b_1, -a_1], [a_2, b_2; b_2, -a_2], ..., [a_n, b_n; b_n, -a_n]),

with a_k, b_k > 0 and a_k^2 + b_k^2 = 1 for k = 1, ..., n, such that Θ_o^T H_o Θ_o = diag(1, -1, 1, -1, ..., 1, -1). Similarly, by applying (5) to each block in H_e, we can determine an orthogonal matrix

    Θ_e = diag(1, [-f_1, g_1; g_1, f_1], ..., [-f_{n-1}, g_{n-1}; g_{n-1}, f_{n-1}], 1)
with f_k, g_k > 0 and f_k^2 + g_k^2 = 1 for k = 1, ..., n-1, such that Θ_e^T H_e Θ_e = diag(1, -1, 1, -1, ..., 1, -1). (Here γ_{2n} = -1: since Q has no real eigenvalues, det Q = 1, and by comment 4 after Algorithm 1 we have det Q = -γ_{2n}.)

Introduce the perfect shuffle permutation

    P = [e_1, e_3, ..., e_{2n-1}, e_2, e_4, ..., e_{2n}],   Σ = diag(I_n, -I_n).

Then

    (Θ_o P)^T H_o (Θ_o P) = (Θ_e P)^T H_e (Θ_e P) = Σ.

Now define

    Z := (Θ_o P)^T (Θ_e P) = (P^T Θ_o P)^T (P^T Θ_e P) =: Z_o Z_e,

where

    Z_o = [diag(a_1, ..., a_n), diag(b_1, ..., b_n); diag(b_1, ..., b_n), -diag(a_1, ..., a_n)],
    Z_e = [diag(1, f_1, ..., f_{n-1}), L; L^T, diag(-f_1, ..., -f_{n-1}, 1)],   (6)

and L is the n x n matrix whose only nonzero entries are L(k+1, k) = g_k for k = 1, ..., n-1. Then it follows that

    Θ_0^T Q Θ_0 = H_o H_e = (Θ_o P) Σ (Θ_o P)^T (Θ_e P) Σ (Θ_e P)^T,

so that

    (Θ_0 Θ_o P)^T Q (Θ_0 Θ_o P) = ΣZΣZ^T.   (7)

Because Z is an orthogonal matrix, it has a CS decomposition (see, e.g., [6])

    Z = [U_1, 0; 0, U_2] [Φ, -Ψ; Ψ, Φ] [V_1, 0; 0, V_2]^T,

where U_1, U_2, V_1, V_2 in R^{n x n} are orthogonal and

    Φ = diag(φ_1, φ_2, ..., φ_n),   Ψ = diag(ψ_1, ψ_2, ..., ψ_n),   (8)

with φ_k^2 + ψ_k^2 = 1 and φ_k, ψ_k > 0 for k = 1, ..., n. Finally, let

    Θ := Θ_0 Θ_o P [U_1, 0; 0, U_2] P.   (9)
Then we have the real Schur decomposition

    Θ^T Q Θ = diag([φ_1^2 - ψ_1^2, 2φ_1ψ_1; -2φ_1ψ_1, φ_1^2 - ψ_1^2], ..., [φ_n^2 - ψ_n^2, 2φ_nψ_n; -2φ_nψ_n, φ_n^2 - ψ_n^2]).   (10)

Note that for

    Θ̃ = Θ_0 Θ_e P [V_1, 0; 0, V_2] P

we have another real Schur decomposition Θ̃^T Q Θ̃, with the 2 x 2 diagonal blocks of (10) transposed. In order to generate the orthogonal matrices Θ or Θ̃, one has to compute either the pair {U_1, U_2} or the pair {V_1, V_2}, but not necessarily both of them. We remark that in [1] an SVD-based method was proposed for solving the eigenvalue problem for Q; however, the approach there is different from ours. We finally note that the factorizations of this section do not apply to complex unitary matrices, because their eigenvalues need not appear in complex conjugate pairs.

4 Computation of a full CS decomposition of Z

We turn to the problem of computing a full CS decomposition of Z. The product form Z = Z_o Z_e, with Z_o, Z_e given by (6), yields

    Z = [Z_11, Z_12; Z_21, Z_22],   (11)

where Z_11 and Z_21 are upper bidiagonal and Z_12 and Z_22 are lower bidiagonal. We denote the diagonal and superdiagonal entries of Z_11 by α_{k1} and β_{k1}, those of Z_21 by α_{k2} and β_{k2}, and the diagonal and subdiagonal entries of Z_12 by α_{k3} and β_{k3}, and those of Z_22 by α_{k4} and β_{k4}. In terms of the parameters in (6) one finds (with f_0 := 1)

    α_{k1} = a_k f_{k-1},   β_{k1} = b_k g_k,
    α_{k2} = b_k f_{k-1},   β_{k2} = -a_k g_k,
    α_{k3} = -b_k f_k (k < n),   α_{n3} = b_n,   β_{k3} = a_{k+1} g_k,
    α_{k4} = a_k f_k (k < n),   α_{n4} = -a_n,   β_{k4} = b_{k+1} g_k.

The further discussion requires the following elementary orthogonal matrices:

(a) For x = [x_1, x_2]^T nonzero we define Householder matrices H_ij(x) in R^{2n x 2n} and Givens matrices G_ij(x) in R^{2n x 2n} by

    H_ij(x) = I_{2n} + (γ - 1) e_i e_i^T - (γ + 1) e_j e_j^T + σ (e_i e_j^T + e_j e_i^T),   1 <= i < j <= 2n,
    G_ij(x) = I_{2n} + (γ - 1)(e_i e_i^T + e_j e_j^T) + σ (e_j e_i^T - e_i e_j^T),   1 <= i < j <= 2n,
where

    γ = x_1 / sqrt(x_1^2 + x_2^2),   σ = x_2 / sqrt(x_1^2 + x_2^2)

satisfy

    [γ, σ; σ, -γ] x = [γ, -σ; σ, γ]^T x = [ ||x||, 0 ]^T.

Note that γ, σ >= 0 when x_1, x_2 >= 0. For clarity we sometimes write H_ij(x) and G_ij(x) as H_ij(γ, σ) and G_ij(γ, σ), respectively.

(b) For x = [x_1, x_2]^T nonzero and 1 <= i <= n < j <= 2n, we define 2n x 2n Givens matrices

    G_ij(x) = I_{2n} + (γ - 1)(e_i e_i^T + e_j e_j^T) + σ (e_j e_i^T - e_i e_j^T),

where

    γ = x_2 / sqrt(x_1^2 + x_2^2),   σ = x_1 / sqrt(x_1^2 + x_2^2)

are such that [γ, -σ; σ, γ] x = [0, ||x||]^T. Again γ, σ >= 0 when x_1, x_2 >= 0.

In [16], Sutton proposed a full SVD iteration process that applies the bidiagonal iteration to all four bidiagonal blocks of Z simultaneously. One simultaneous bidiagonal iteration determines real orthogonal matrices U_1, U_2, V_1, V_2 in R^{n x n} such that

    Ẑ = [U_1, 0; 0, U_2]^T Z [V_1, 0; 0, V_2],

with Ẑ of the same form as Z. The iteration is based on the fact that for any pair of real numbers µ_1, µ_2 satisfying µ_1 + µ_2 = 1 we have

    Z_11^T Z_11 - µ_1 I = µ_2 I - Z_21^T Z_21,   Z_11 Z_11^T - µ_1 I = µ_2 I - Z_12 Z_12^T,
    Z_22^T Z_22 - µ_1 I = µ_2 I - Z_12^T Z_12,   Z_22 Z_22^T - µ_1 I = µ_2 I - Z_21 Z_21^T.

In exact arithmetic, the simultaneous bidiagonal iteration is equivalent to one bidiagonal iteration on Z_11 (or Z_22) with a shift µ_1, or on Z_12 (or Z_21) with a shift µ_2. Note that one bidiagonal iteration on a block, say Z_11, with a shift µ_1 is equivalent to one QR iteration on Z_11^T Z_11 with the same shift µ_1 [6, Sec. 8.6]: if Ẑ_11 = U_1^T Z_11 V_1 is upper bidiagonal with U_1, V_1 real orthogonal and V_1 e_1 parallel to (Z_11^T Z_11 - µ_1 I) e_1, then Ẑ_11^T Ẑ_11 = V_1^T (Z_11^T Z_11) V_1 is the symmetric tridiagonal matrix generated from Z_11^T Z_11 by a similarity transformation with V_1.

Here we describe a related simultaneous iteration procedure that is based on the alternative initial reduction procedure considered in [17]. The main difference is that in Algorithm 2 below one does not need to decide which row or column will be used for generating a Givens matrix in each step of the bulge-chasing process. Another difference is that Algorithm 2 computes Z̃_o, Z̃_e of the same forms as Z_o, Z_e defined in (6) by reducing Z to I, and then determines Ẑ = Z̃_o Z̃_e after
the reduction process.

Algorithm 2 (Full CS bidiagonal iteration). Given a 2n x 2n real orthogonal matrix Z of the form (11), the algorithm computes orthogonal matrices U_1, U_2, V_1, V_2 in R^{n x n} such that

    Ẑ = [U_1, 0; 0, U_2]^T Z [V_1, 0; 0, V_2]

by applying one QR iteration step to Z_11^T Z_11 with shift µ_1. The orthogonal matrices U_1, U_2, V_1, V_2 are determined by updating the available matrices.
0. Compute x = (Z_11^T Z_11 - µ_1 I)(1 : 2, 1) and F_12 := G_12(x). Update Z_11 <- Z_11 F_12, Z_21 <- Z_21 F_12, V_1 <- V_1 F_12.
1. For k = 1, ..., n-1:
   (a) Determine G_{k,k+1} := G_{k,k+1}(Z_11(k : k+1, k)) and G'_{k,k+1} := G_{k,k+1}(Z_21(k : k+1, k)).
   (b) Update Z_11 <- G_{k,k+1}^T Z_11, Z_12 <- G_{k,k+1}^T Z_12, U_1 <- U_1 G_{k,k+1}; Z_21 <- G'^T_{k,k+1} Z_21, Z_22 <- G'^T_{k,k+1} Z_22, U_2 <- U_2 G'_{k,k+1}.
   (c) Determine H_{k,n+k} := H_{k,n+k}([Z_11(k, k), Z_12(k, k)]^T).
   (d) Update Z <- H_{k,n+k} Z.
   (e) Determine F_{k+1,k+2} := G_{k+1,k+2}(Z_21(k, k+1 : k+2)^T) (for k = n-1, instead F_{n,n+1} := diag(I_{n-1}, sign(Z_21(n-1, n)))) and F'_{k,k+1} := G_{k,k+1}(Z_22(k, k : k+1)^T).
   (f) Update Z_11 <- Z_11 F_{k+1,k+2}, Z_21 <- Z_21 F_{k+1,k+2}, V_1 <- V_1 F_{k+1,k+2}; Z_12 <- Z_12 F'_{k,k+1}, Z_22 <- Z_22 F'_{k,k+1}, V_2 <- V_2 F'_{k,k+1}.
   (g) Determine G_{k+1,n+k} := G_{k+1,n+k}([Z_21(k, k+1), Z_22(k, k)]^T).
   (h) Update Z <- Z G_{k+1,n+k}^T.
   End For
2. (a) Determine D_1 := diag(I_{n-1}, sign(Z_11(n, n))) and D_2 := diag(I_{n-1}, sign(Z_21(n, n))).
   (b) Update Z_11 <- D_1^T Z_11, Z_12 <- D_1^T Z_12, U_1 <- U_1 D_1; Z_21 <- D_2^T Z_21, Z_22 <- D_2^T Z_22, U_2 <- U_2 D_2.
   (c) Determine H_{n,2n} := H_{n,2n}([Z_11(n, n), Z_12(n, n)]^T).
   (d) Update Z <- H_{n,2n} Z.
   (e) Determine F'_n := diag(I_{n-1}, sign(Z_22(n, n))).
   (f) Update Z_12 <- Z_12 F'_n, Z_22 <- Z_22 F'_n, V_2 <- V_2 F'_n.
3. Form Ẑ = H_{1,n+1} H_{2,n+2} ... H_{n,2n} G_{2,n+1} G_{3,n+2} ... G_{n,2n-1}.

The sign function used in the algorithm is defined for real x by sign(x) = x/|x| if x is nonzero and sign(x) = 1 otherwise.

We illustrate the computations of Algorithm 2 with a 6 x 6 matrix Z. The bulge-chasing pattern diagrams of the original are reproduced here only in condensed form.
In the pattern diagrams, 0 denotes a zero introduced by a transformation, + stands for a nonnegative entry, 1 is an entry equal to one, some entries vanish because Z is orthogonal, and entries marked f are obtained by fill-in; they are generically nonvanishing.

Because H_{k,n+k} commutes with the block diagonal matrix diag(G_{j,j+1}, G'_{j,j+1}) and G_{k+1,n+k}^T commutes with diag(F_{j+1,j+2}, F'_{j,j+1}) for any j > k, we obtain, in the 6 x 6 case,

    (H_{3,6} H_{2,5} H_{1,4}) [U_1, 0; 0, U_2]^T Z [V_1, 0; 0, V_2] G_{2,4}^T G_{3,5}^T = I_6.

In general one has

    (H_{n,2n} ... H_{2,n+2} H_{1,n+1}) [U_1, 0; 0, U_2]^T Z [V_1, 0; 0, V_2] (G_{2,n+1}^T ... G_{n,2n-1}^T) = I_{2n}.

Therefore

    [U_1, 0; 0, U_2]^T Z [V_1, 0; 0, V_2] = Z̃_o Z̃_e = Ẑ,

where

    Z̃_o = H_{1,n+1} H_{2,n+2} ... H_{n,2n},   Z̃_e = G_{2,n+1} G_{3,n+2} ... G_{n,2n-1}.
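As a sanity check on this product structure, the sketch below (an illustration, with block sign conventions that are an assumption of the sketch rather than taken verbatim from (6)) builds Θ_o and Θ_e as direct sums of 2 x 2 blocks and verifies both the reduction (Θ_o P)^T H_o (Θ_o P) = Σ and the bidiagonal block pattern of Z = Z_o Z_e:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
n = 4

# Theta_o: direct sum of 2x2 symmetric reflections [a b; b -a], a = cos(t/2), b = sin(t/2)
t = rng.uniform(0.1, np.pi - 0.1, n)
a, b = np.cos(t / 2), np.sin(t / 2)
Theta_o = block_diag(*[np.array([[a[k], b[k]], [b[k], -a[k]]]) for k in range(n)])

# Theta_e: 1, then n-1 2x2 rotations, then a final sign (convention assumed here)
p = rng.uniform(0.1, np.pi - 0.1, n - 1)
f, g = np.cos(p), np.sin(p)
Theta_e = block_diag(1.0, *[np.array([[f[k], -g[k]], [g[k], f[k]]]) for k in range(n - 1)], -1.0)

# Perfect shuffle P = [e1 e3 ... e_{2n-1} e2 e4 ... e_{2n}]
P = np.eye(2 * n)[:, list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))]

# (Theta_o P)^T H_o (Theta_o P) = Sigma, with gamma = cos t, sigma = sin t, cf. (4)
gam, sig = a**2 - b**2, 2 * a * b
H_o = block_diag(*[np.array([[gam[k], sig[k]], [sig[k], -gam[k]]]) for k in range(n)])
Sigma = np.diag([1.0] * n + [-1.0] * n)
assert np.allclose((Theta_o @ P).T @ H_o @ (Theta_o @ P), Sigma)

# Z = Z_o Z_e is orthogonal with the bidiagonal block pattern of (11)
Z = (Theta_o @ P).T @ (Theta_e @ P)
assert np.allclose(Z @ Z.T, np.eye(2 * n))

Z11, Z12, Z21, Z22 = Z[:n, :n], Z[:n, n:], Z[n:, :n], Z[n:, n:]

def upper_bidiagonal(B):
    # nonzeros confined to the diagonal and first superdiagonal
    return np.allclose(B, np.triu(np.tril(B, 1)))

assert upper_bidiagonal(Z11) and upper_bidiagonal(Z21)
assert upper_bidiagonal(Z12.T) and upper_bidiagonal(Z22.T)
```

The structural conclusion (Z_11, Z_21, Z_12^T, Z_22^T upper bidiagonal) does not depend on the assumed block signs, only on the positions of the 2 x 2 blocks in Θ_o and Θ_e.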
Algorithm 2 requires about 130n flops. If the matrices U_1, U_2, V_1, V_2 have to be updated, additional O(n^2) flops are needed during the whole iteration process. One has to determine a shift µ_1 for the rotation F_12. For instance, one may choose the Wilkinson shift, i.e., the eigenvalue of the trailing 2 x 2 principal submatrix of Z_11^T Z_11 that is closest to the last diagonal entry of Z_11^T Z_11; see [16] for details on the choice of shift. One can show that Algorithm 2 is backward stable, similarly as we did for Algorithm 1. Alternatively, one can proceed in the same manner as in [17].

For the eigenvalue problem of an orthogonal matrix, any backward stable algorithm is also forward stable. Hence the eigenvalues can be computed accurately with a backward stable method. This, however, does not guarantee that small real and imaginary parts of the eigenvalues will be computed accurately. In our case this is equivalent to the fact that small singular values of the blocks of Z might not be computed accurately. We therefore describe another full CS decomposition iteration, based on the Demmel-Kahan zero-shift SVD iteration [5], to achieve high accuracy of small singular values. The iteration procedure from Z to Ẑ is the standard bulge-chasing process. Since µ_1 = 0, the iteration formulas can be derived explicitly.

Theorem 1. Let Ẑ be the matrix generated from Z in (11) by the zero-shift full CS iteration. Then, for k = 1, ..., n,

    α̂_{k1} = r_k,   α̂_{k2} = α_{k3} s_k / p_k,   α̂_{k3} = α_{k4} q_k / r_k,   α̂_{k4} = q_k,
    β̂_{k1} = s_{k1} p_{k+1},   β̂_{k2} = β_{k3} p_{k+1} / s_k,   β̂_{k3} = β_{k4} r_{k+1} / q_k,   β̂_{k4} = z_{k2} s_{k+1},   (12)

where

    p_k = sqrt((t_{k-1,1} α_{k1})^2 + β_{k1}^2),   r_k = sqrt((c_{k-1,1} p_k)^2 + (α_{k+1,1} z_{k1})^2),
    s_k = sqrt((c_{k-1,2} α_{k2})^2 + β_{k2}^2),   q_k = sqrt((t_{k-1,2} s_k)^2 + (s_{k2} α_{k+1,2})^2),
    t_{k1} = t_{k-1,1} α_{k1} / p_k,   z_{k1} = β_{k1} / p_k,
    c_{k1} = c_{k-1,1} p_k / r_k,   s_{k1} = α_{k+1,1} z_{k1} / r_k,
    c_{k2} = c_{k-1,2} α_{k2} / s_k,   s_{k2} = β_{k2} / s_k,
    t_{k2} = t_{k-1,2} s_k / q_k,   z_{k2} = s_{k2} α_{k+1,2} / q_k,   (13)

with t_{01} = c_{01} = c_{02} = t_{02} = 1 and β_{nj} = 0 for j = 1, ..., 4.

Proof. A proof is provided in Appendix A.

We note that the block Z_12 is not involved in the computation of the transformation parameters. The four sets of parameters in (13) stem
from four Givens matrices, each of which can be computed with the following algorithm [6].

Algorithm 3 (Givens matrix). Given a real vector x = [x_1, x_2]^T nonzero, the algorithm computes γ, σ with γ^2 + σ^2 = 1, and r with |r| = ||x||, such that [γ, σ; -σ, γ] x = [r, 0]^T.

    If |x_1| > |x_2|
        compute t = x_2/x_1, s = sqrt(1 + t^2), γ = 1/s, σ = t/s, r = x_1 s;
    else
        compute t = x_1/x_2, s = sqrt(1 + t^2), γ = t/s, σ = 1/s, r = x_2 s.
    end

Theorem 1 suggests the following iterative method.
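As an aside, Algorithm 3 translates directly into code. In the sketch below the handling of the tie |x_1| = |x_2| (which falls into the second branch) is one common convention and an assumption of the transcription:

```python
import math

def givens(x1, x2):
    """Algorithm 3: return (gamma, sigma, r) with gamma^2 + sigma^2 = 1,
    |r| = hypot(x1, x2), and [[gamma, sigma], [-sigma, gamma]] @ [x1, x2] = [r, 0].
    Branching on the larger entry keeps the quotient t bounded by 1."""
    if abs(x1) > abs(x2):
        t = x2 / x1
        s = math.sqrt(1.0 + t * t)
        gamma, sigma, r = 1.0 / s, t / s, x1 * s
    else:
        # also covers x1 == 0; requires (x1, x2) != (0, 0)
        t = x1 / x2
        s = math.sqrt(1.0 + t * t)
        gamma, sigma, r = t / s, 1.0 / s, x2 * s
    return gamma, sigma, r

# The rotation maps (x1, x2) to (r, 0):
g, s_, r = givens(3.0, 4.0)
assert abs(g * 3.0 + s_ * 4.0 - r) < 1e-12
assert abs(-s_ * 3.0 + g * 4.0) < 1e-12
assert abs(abs(r) - 5.0) < 1e-12
```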
Algorithm 4 (Zero-shift CS bidiagonal iteration). Given a 2n x 2n real orthogonal matrix Z of the form (11), the algorithm computes orthogonal matrices U_1, U_2, V_1, V_2 in R^{n x n} such that

    [U_1, 0; 0, U_2]^T Z [V_1, 0; 0, V_2] = Ẑ,

based on one QR iteration step applied to Z_11^T Z_11 with zero shift. The orthogonal matrices U_1, U_2, V_1, V_2 are determined by updating the available matrices.

0. Set t_{01} = t_{02} = c_{01} = c_{02} = 1 and β_{nj} = 0 for j = 1, ..., 4.
1. For k = 1, ..., n-1:
   (a) Compute p_k and G_{k,k+1}(t_{k1}, z_{k1}) with x = [t_{k-1,1} α_{k1}, β_{k1}]^T.
   (b) Compute r_k and G_{k,k+1}(c_{k1}, s_{k1}) with x = [c_{k-1,1} p_k, α_{k+1,1} z_{k1}]^T.
   (c) Compute s_k and G_{k,k+1}(c_{k2}, s_{k2}) with x = [c_{k-1,2} α_{k2}, β_{k2}]^T.
   (d) Compute q_k and G_{k,k+1}(t_{k2}, z_{k2}) with x = [t_{k-1,2} s_k, α_{k+1,2} s_{k2}]^T.
   (e) If k > 1, compute
       β̂_{k-1,1} = s_{k-1,1} p_k,   β̂_{k-1,2} = β_{k-1,3} p_k / s_{k-1},   β̂_{k-1,3} = β_{k-1,4} r_k / q_{k-1},   β̂_{k-1,4} = z_{k-1,2} s_k.
   (f) Compute α̂_{k1} = r_k, α̂_{k2} = α_{k3} s_k / p_k, α̂_{k3} = α_{k4} q_k / r_k, α̂_{k4} = q_k.
   (g) Update U_1 <- U_1 G_{k,k+1}(c_{k1}, s_{k1}), U_2 <- U_2 G_{k,k+1}(c_{k2}, s_{k2}), V_1 <- V_1 G_{k,k+1}(t_{k1}, z_{k1}), V_2 <- V_2 G_{k,k+1}(t_{k2}, z_{k2}).
   End For
2. (a) Compute p_n = α_{n1} t_{n-1,1}, r_n = c_{n-1,1} p_n, s_n = α_{n2} c_{n-1,2}, q_n = t_{n-1,2} s_n.
   (b) Compute β̂_{n-1,1} = s_{n-1,1} p_n, β̂_{n-1,2} = β_{n-1,3} p_n / s_{n-1}, β̂_{n-1,3} = β_{n-1,4} r_n / q_{n-1}, β̂_{n-1,4} = z_{n-1,2} s_n.
   (c) Compute α̂_{n1} = r_n, α̂_{n2} = α_{n3} s_n / p_n, α̂_{n3} = α_{n4} q_n / r_n, α̂_{n4} = q_n.

This algorithm requires O(n) flops. Additional O(n^2) flops are needed to update U_1, U_2, V_1, V_2. The factored form Z̃_o Z̃_e is easily derived from Ẑ if it is required. One may simply use the entries of Ẑ_11 and Ẑ_21 and apply the following code:

0. ã_1 = α̂_{11}, b̃_1 = α̂_{12}.
1. For k = 1, ..., n-1:
   (a) Determine f̃_k, ã_{k+1}, b̃_{k+1} by applying Algorithm 3 to x := [α̂_{k+1,1}, α̂_{k+1,2}]^T.
   (b) Determine g̃_k by applying Algorithm 3 to x := [β̂_{k1}, β̂_{k2}]^T.
   End For

This code demands O(n) flops. We remark that it follows from (11) that one may also use the formulas g̃_k = β̂_{k1}/b̃_k or g̃_k = β̂_{k2}/ã_k to compute g̃_k. The reason for using the formulas in the code is to enforce the relation f̃_k^2 + g̃_k^2 = 1 explicitly. The following first-order error analysis, which is based on the results in [5],
shows that highly accurate singular values can be computed with Algorithm 4. The floating-point arithmetic is assumed to satisfy

    fl(α op β) = (α op β)(1 + δ_1) = (α op β)/(1 + δ_2),   |δ_1|, |δ_2| <= µ,

where op is one of {+, -, x, /} and µ is the machine precision.
Theorem 2. Suppose that Algorithm 4 is applied to the data {α_{kj}} and {β_{kj}} to compute {α̂_{kj}} and {β̂_{kj}} on a computer with machine precision µ, and let {ᾱ_{kj}} and {β̄_{kj}} be the computed data. Then

    ᾱ_{kj} = α̂_{kj}(1 + ε_{α,kj}),   β̄_{kj} = β̂_{kj}(1 + ε_{β,kj}),

where each |ε_{α,kj}| and |ε_{β,kj}| is bounded by a modest multiple of kµ (the original statement gives explicit constants, e.g., bounds of the form (50k + 13)µ). Furthermore, if during the iterations |s_{k1}|, |s_{k2}|, |z_{k1}|, |z_{k2}| <= τ < 1 for all k, then the relative errors admit bounds that depend on τ but not on k, of the form c(τ)µ with c(τ) growing only as τ -> 1.

Proof. A proof is given in Appendix B.

It is pointed out in [5] that typically the β_{kj}'s decrease during the iterations. Therefore the τ-dependent bounds will be more realistic in practice.

We may use Algorithms 2 and 4 together to compute a full CS decomposition of Z, where we apply the latter algorithm to compute small singular values and the former to compute the other ones. Slight modifications of the deflation and stopping criteria proposed in [5] can be used. For an n x n bidiagonal matrix B with diagonal entries {α_j}, j = 1, ..., n, and super(sub)diagonal entries {β_j}, j = 1, ..., n-1, we apply the following algorithms (cf. [5]):

Algorithm A. Set λ_n = |α_n|; for j = n-1 down to 1, compute λ_j = |α_j| / (1 + |β_j| / λ_{j+1}).

Algorithm B. Set ν_1 = |α_1|; for j = 1 to n-1, compute ν_{j+1} = |α_{j+1}| / (1 + |β_j| / ν_j).

Then ||B^{-1}||_inf^{-1} = min_j λ_j and ||B^{-1}||_1^{-1} = min_j ν_j. For each of the blocks of Z we can apply the above algorithms to compute a lower bound for the smallest singular value:

    σ_{ij} := min{ ||Z_ij^{-1}||_inf^{-1}, ||Z_ij^{-1}||_1^{-1} }   for the blocks Z_11, Z_12, Z_21, Z_22.

Let tol be a selected tolerance. We use the following criterion to decide between the zero-shift and the shifted iterations:
    if n min{σ_{11}, σ_{22}} tol <= µ, run the zero-shift iteration with Z;
    elseif n min{σ_{12}, σ_{21}} tol <= µ, run the zero-shift iteration with a permuted Z;
    else run the shifted iteration.   (15)

The permuted matrix is

    [P̂, 0; 0, P̂]^T Z [0, P̂; P̂, 0],   P̂ = [e_n, -e_{n-1}, ..., (-1)^{n-1} e_1],

which has the same pattern as Z (its (1,1) block begins with the reversed entries α_{n3}, β_{n-1,3}, ...). In the situation that n min{σ_{11}, σ_{22}} tol > µ and n min{σ_{12}, σ_{21}} tol <= µ, the blocks Z_11, Z_22 have no small singular values while Z_12, Z_21 do. In this case we apply the zero-shift iteration to the permuted Z to compute these small singular values. The orthogonal matrices have to be changed accordingly:

    U_1 <- U_1 P̂,   U_2 <- U_2 P̂,   V_1 <- V_2 P̂,   V_2 <- V_1 P̂.

For pre-iteration deflation we use the following criterion when applying Algorithms A and B to the blocks of Z: if for some j, β_j/ν_j <= tol or β_j/λ_{j+1} <= tol for all four blocks, set β_{jk} = 0 for k = 1, ..., 4. For the shifted iteration we use the deflation criterion: if

    max_{1<=k<=4} |β_{n-1,k}| / |α_{nk}| <= tol,   set β_{n-1,k} = 0 for k = 1, ..., 4.   (16)

Algorithm 5 (Overall CS decomposition). Given a 2n x 2n matrix Z defined in (11), the algorithm computes a full CS decomposition (8).

Initial Step. Choose tol and M, where M is the maximum number of full CS iterations. Set U_1 = U_2 = V_1 = V_2 = I_n, or use the already existing matrices.

Iteration Step.
(a) Apply Algorithms A and B to each of the four blocks of Z. If for some j, β_j/ν_j < tol or β_j/λ_{j+1} < tol for all four blocks, set β_{j1} = β_{j2} = β_{j3} = β_{j4} = 0 and decouple Z to form submatrices of the same form as Z.
(b) For each submatrix (still denoted by Z) of size 2p x 2p, go to (c).
(c) Apply Algorithms A and B to compute σ_{11}, σ_{12}, σ_{21}, σ_{22}. If p max{σ_{11}, σ_{22}} tol < µ,
(d) While max{ |β_{p-1,1}|/σ_{11}, |β_{p-1,2}|/σ_{21}, |β_{p-1,3}|/σ_{12}, |β_{p-1,4}|/σ_{22} } >= tol and the number of iterations is at most M, apply Algorithm 4 to Z. If the number of iterations exceeds M, report divergence; otherwise set β_{p-1,k} = 0 for k = 1, ..., 4 and repeat (c) with the reduced matrix Z.
   Elseif p max{σ_{12}, σ_{21}} tol < µ, repeat (d) with the permuted Z.
   Else
(e) While max_{1<=k<=4} |β_{p-1,k}| / |α_{pk}| >= tol and the number of iterations is at most M, apply Algorithm 2 to Z. If the number of iterations exceeds M, report divergence; otherwise set β_{p-1,k} = 0 for k = 1, ..., 4 and repeat (e) with the reduced matrix Z.
   End if

We let tol = nµ and M = 3n/2 in the algorithm. Underflow may occur during the iteration process; a vanishing σ_{11}, σ_{12}, σ_{21}, or σ_{22} is a good indication. Since Z is orthogonal, we may scale up the matrix if underflow takes place. The graded structure of Z may cause loss of accuracy. This difficulty can be reduced by chasing the bulge in a suitable direction (up or down), as described in [5].

5 The Schur form and numerical examples

The computations for determining the Schur form of a real orthogonal matrix Q are summarized by the following algorithm.

Algorithm 6 (Orthogonal Schur form). Given a 2n x 2n real orthogonal matrix Q without real eigenvalues, the algorithm computes the real Schur form (10).
Step 1. Apply Algorithm 1 to Q to compute the factorization (3).
Step 2. Compute the factorization (7).
Step 3. Apply Algorithm 5 to Z to compute the full CS decomposition (8) (without computing V_1, V_2).
Step 4. Form the matrix Θ using (9) and determine (10).

With proper choices of the tolerances, the algorithm is backward stable. The main cost is the initial reduction step, provided that not too many zero-shift CS decomposition iterations are carried out; this is the typical situation. We tested the eigenvalue algorithm with several examples. The main purpose of the numerical experiments is to illustrate the accuracy achieved with the algorithm. We remark that, when computing the eigenvalues of a general orthogonal matrix, the zero-shift full CS iteration typically will not
improve the accuracy. This is because the matrix Z is generated by the initial reduction process, and rounding errors from this process pollute Z. All the experiments were carried out with MATLAB version 7.10.0 on an iMac 8,1 computer with an Intel Core 2 Duo processor.

Example 1. We generated a set of random real orthogonal matrices Q of size 30 x 30 using the MATLAB command [Q,R] = qr(randn(30)). For each Q we computed the eigenvalues by using Algorithm 6 in two ways. Method I is simply Algorithm 6. Method II is essentially the same as Algorithm
6, but in Step 3 we use the shifted CS iterations only. Also, for Method II the deflation criterion (16) is changed to

    max_{1<=k<=4} |β_{n-1,k}| / (|α_{n-1,k}| + |α_{nk}|) <= tol.

For each method we report the maximum and minimum eigenvalue errors for each matrix. The results are shown in Figure 1, where the matrices (labeled in the horizontal direction) are sorted according to the magnitude of the maximum errors from Method I. The exact eigenvalues are computed by using eig from the MATLAB Symbolic Math Toolbox. For comparison we also display the extreme errors of the eigenvalues computed by the standard MATLAB function eig. The broken lines in Figure 1 indicate that the minimum errors are numerically zero.

Figure 1: Maximum and minimum eigenvalue errors for the random orthogonal matrices of Example 1 (curves: maximum and minimum errors for Method I, Method II, and eig).

Figure 1 shows that Method II computes the eigenvalues at least as accurately as eig, while Method I may give less accurate eigenvalues. A closer look reveals that the less accurately computed eigenvalues are the ones with small imaginary part, and the large errors are caused by the zero-shift CS iterations. This is because slow convergence demands many more iterations; this increases the error in the real part, while the accuracy of the imaginary part cannot be improved.

Example 2. In this example we tested thirty 20 x 20 orthogonal matrices of the form Q = UDU^T, where D is quasi-diagonal, containing 10 complex conjugate eigenvalue pairs, among which two pairs have very small imaginary parts (one pair being +-i 10^{-7}). The other pairs are of the form +-sqrt(1 - d_i^2) +- i d_i, where d_i is a positive random number generated with the MATLAB function rand, and the sign of the real part is also randomly generated. The 30 orthogonal matrices are determined by the (fixed) matrix D and 30 randomly generated orthogonal matrices U. For each matrix Q the eigenvalues
are computed by Methods I and II, as well as by eig, similarly as in Example 1. For each $Q$, the zero-shift CS iterations, in both the regular and the permuted versions, are used with Method I. The eigenvalue errors are reported in Figure 2. We observe that, similarly as in Example 1, the zero-shift CS iteration yields the largest errors.

Figure 2: Maximum and minimum eigenvalue errors for the orthogonal matrices of Example 2 (legend: maximum and minimum errors for Method I, Method II, and eig).

Example 3. In this example we tested the zero-shift CS decomposition iterations to see if the small imaginary parts of the eigenvalues of an orthogonal matrix can be computed accurately. To this end we constructed orthogonal matrices of the form $H_oH_e$, and for each $H_k$ defined in (1) we let $\gamma_k = \cos\theta_k$ and $\sigma_k = \sin\theta_k$ with a random $\theta_k \in (0, \pi)$. We tested ten such randomly generated orthogonal matrices and observed that the relative errors of the small imaginary parts of the eigenvalues computed with Method I are smaller than for the eigenvalues computed with the other methods, although the corresponding eigenvalue errors are slightly larger. The results for a typical orthogonal matrix are shown in Figure 3, where the errors are arranged according to the magnitude of the imaginary parts (in the horizontal direction).

6 Conclusions and open problems

We have shown that the eigenvalue problem for a real orthogonal matrix can be formulated as a full CS decomposition problem via a simple transformation, and based on this fact we have developed a backward stable eigenvalue method. It is interesting that for real orthogonal matrices the eigenvalue problem and the CS decomposition merge in such a way. There are still many open problems. For instance, $H_oH_e$ is orthogonally similar to the matrix $\Sigma Z\Sigma Z^T$, which is actually a block matrix with all four blocks tridiagonal. Is it possible to develop an algorithm that uses this structure directly?
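The product-form test matrices used in Example 3 above are easy to reproduce. The paper's experiments are in MATLAB; the following NumPy sketch of the product form (1)-(2) is only illustrative, and the helper name `householder_factor`, the random seed, and the choice $\gamma_n = -1$ are our assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def householder_factor(n, k, theta):
    # H_k = diag(I_{k-1}, [[c, s], [s, -c]], I_{n-k-1}) with c = cos(theta),
    # s = sin(theta); the 2x2 block is a symmetric orthogonal (Householder) matrix.
    H = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    H[k - 1:k + 1, k - 1:k + 1] = [[c, s], [s, -c]]
    return H

n = 30                                   # even order, as required for H_o H_e
theta = rng.uniform(0.0, np.pi, n)       # random theta_k in (0, pi), as in Example 3
H = [householder_factor(n, k, theta[k - 1]) for k in range(1, n)]
H.append(np.diag(np.r_[np.ones(n - 1), -1.0]))  # H_n = diag(I_{n-1}, gamma_n), gamma_n = -1 chosen here

Ho = np.linalg.multi_dot(H[0::2])        # H_1 H_3 ... H_{n-1}
He = np.linalg.multi_dot(H[1::2])        # H_2 H_4 ... H_n
Q = Ho @ He

# Q is orthogonal, so all its eigenvalues lie on the unit circle.
assert np.allclose(Q.T @ Q, np.eye(n))
assert np.allclose(np.abs(np.linalg.eigvals(Q)), 1.0)
```

Each factor is orthogonal, so the product $Q = H_oH_e$ is orthogonal by construction; the assertions simply check orthogonality and that the computed spectrum lies on the unit circle.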
Also, there are several implementation options that can be explored. For instance, in the CS decomposition iterations one could keep $Z$ in product form $Z_oZ_e$, which might improve speed and accuracy somewhat, because the product form is determined by only $n$ parameters while $Z$ requires $8n$ parameters.

Figure 3: Eigenvalue errors for a typical orthogonal matrix in Example 3. Left panel: absolute errors of the eigenvalues (Method I, Method II, and eig). Right panel: relative errors of the imaginary parts (Method I, Method II, and eig).

Appendix A. Proof of Theorem 1

We would like to use the form of $Z$ in (11) with $\{a_k\}$, $\{b_k\}$, $\{f_k\}$, $\{g_k\}$. Note that (13) is equivalent to

( ) p a1 k f 1 k 1 k = + (b k g k ) c k1 = a 1 kf 1 k 1 b k g k p 1 k 1 p 1 k p ( ) k ( ) rk p1 k ak+1 b k f k g k = + c k1 = p 1 k s k1 = a k+1b k f k g k r 1 k 1 p k r 1 k p k r ( ) k s a1 k f 1 k k = + (b k+1g k ) c k = a 1 kf 1 k s k = b k+1g k s 1 k 1 s 1 k s ( ) k ( ) qk s1 k ak+1 b k+1 f k+1 g k = + t k = s 1 k s k = a k+1b k+1 f k+1 g k q 1 k 1 s k q 1 k s k q k (17)

with $f_{1:0} = p_{1:0} = r_{1:0} = s_{1:0} = q_{1:0} = 1$ and $f_n = 1$, $g_n = 0$. Here, for a set of numbers $\{x_1,\dots,x_n\}$ and $1 \le k \le n$, $x_{1:k} := x_1x_2\cdots x_k$. We first need the following result.
Lemma 3. For $k = 1,\dots,n-1$,
$$s_{1:k}^2 = (a_{1:k+1}f_{1:k})^2 + (b_{k+1}p_{1:k})^2, \qquad (18)$$
$$p_{1:k}^2 = (a_{1:k}f_{1:k})^2 + (g_ks_{1:k-1})^2 = s_{1:k}^2 + (a_{k+1}g_ks_{1:k-1})^2. \qquad (19)$$

Proof. We show (18) by induction. When $k = 1$, using $f_1^2 + g_1^2 = 1$ and $p_1^2 = a_1^2 + (b_1g_1)^2 = (a_1f_1)^2 + g_1^2$ gives
$$(a_1a_2f_1)^2 + (b_2p_1)^2 = (a_1a_2f_1)^2 + b_2^2\big((a_1f_1)^2 + g_1^2\big) = (a_1f_1)^2 + (b_2g_1)^2 = s_1^2.$$
Assume that (18) holds for $k-1$. Since $p_{1:k}^2 = (a_{1:k}f_{1:k-1})^2 + (b_kg_kp_{1:k-1})^2$ and, from the assumption, $s_{1:k-1}^2 = (a_{1:k}f_{1:k-1})^2 + (b_kp_{1:k-1})^2$, one has
$$s_{1:k}^2 = (a_{1:k}f_{1:k})^2 + (b_{k+1}g_ks_{1:k-1})^2 = (a_{1:k}f_{1:k})^2 + (b_{k+1}g_k)^2\big((a_{1:k}f_{1:k-1})^2 + (b_kp_{1:k-1})^2\big)$$
$$= (a_{1:k}f_{1:k})^2(a_{k+1}^2 + b_{k+1}^2) + (a_{1:k}b_{k+1}f_{1:k-1}g_k)^2 + (b_kb_{k+1}g_kp_{1:k-1})^2$$
$$= (a_{1:k+1}f_{1:k})^2 + (a_{1:k}b_{k+1}f_{1:k-1})^2 + (b_kb_{k+1}g_kp_{1:k-1})^2$$
$$= (a_{1:k+1}f_{1:k})^2 + b_{k+1}^2\big((a_{1:k}f_{1:k-1})^2 + (b_kg_kp_{1:k-1})^2\big) = (a_{1:k+1}f_{1:k})^2 + (b_{k+1}p_{1:k})^2.$$
The relation (18) now follows by induction. From (18), for any $k$ one has
$$(b_{k+1}p_{1:k})^2 = s_{1:k}^2 - (a_{1:k+1}f_{1:k})^2 = (a_{1:k}f_{1:k})^2 + (b_{k+1}g_ks_{1:k-1})^2 - (a_{1:k+1}f_{1:k})^2 = b_{k+1}^2\big((a_{1:k}f_{1:k})^2 + (g_ks_{1:k-1})^2\big).$$
Hence, dividing both sides by $b_{k+1}^2$ and using $s_{1:k}^2 = (a_{1:k}f_{1:k})^2 + (b_{k+1}g_ks_{1:k-1})^2$, we obtain (19).

Proof of Theorem 1. We show (12) and (13) by induction, following the bulge-chasing process. The properties $a_k^2 + b_k^2 = 1$ and $f_k^2 + g_k^2 = 1$ are used throughout the bulge-chasing process. Since $\mu_1 = 0$, from (11) the first column of $Z_{11}Z_{11}^T$ is parallel to $x = [a_1,\ b_1g_1]^T$. From this $x$ we determine a Givens rotation $G_{12}(t_{11}, z_{11})$, with $p_1$, $t_{11}$, $z_{11}$ defined in (17). By post-multiplying this rotation to $Z_{11}$, $Z_{21}$ we create a bulge in the $(2,1)$ entry of each of these two blocks. Since the first round of bulge-chasing only involves the first three rows and columns of the four blocks of $Z$, we focus on the window
p 1 b 1 f 1 a b 1 f 1 g 1 p 1 1 a 1 a f 1 p 1 1 b g a g 1 b f Z 1 := a 3 f a 3 g b 3 f 3 a 1 b 1 f1 p 1 1 g 1 p 1 1 a 1 f 1 b 1 b f 1 g 1 p 1 1 a 1 b f 1 p 1 1 a g b g 1 a f b 3 f b 3 g a
3 f 3 Let $G_{12}(c_{11}, s_{11})$ and $G_{12}(c_{12}, s_{12})$ be the Givens rotations with $r_1$, $c_{11}$, $s_{11}$ and $s_1$, $c_{12}$, $s_{12}$ defined in (17). By pre-multiplying $G_{12}^T(c_{11}, s_{11})$ to $Z_{11}$ and $Z_{12}$, and $G_{12}^T(c_{12}, s_{12})$ to $Z_{21}$ and
21 Z simple calculations yield Z 1 r 1 b 1f 1s 1 p 1 a 1 a b 1f 1 g1 r 1p 1 a 1 f 1 r 1 b g p 1 r 1 a1 af1g1 s 1p 1 b p 1 s 1 a b 1 f 1g 1 b 1f 1s 1 a b 1 f 1 g 1 r 1p 1 r 1p 1 r 1p 1 ag1 r 1p 1 b f p 1 r 1 a 3 f a 3 g b 3 f 3 abg1 a s 1 s b f g 1 1 s 1 a1 f1g s 1 a1 f1 s 1 b 3 f b 3 g a 3 f 3 Let G 1 (t 1 z 1 ) and G 3 (t 1 z 1 ) be the Givens rotations with q 1 t 1 z 1 and p t 1 z 1 defined in (17) By post-multiplying G 1 (t 1 z 1 ) to Z 1 Z and G 3 (t 1 z 1 ) to Z 11 Z 1 one obtains Z 1 r 1 a b 1f 1g 1p r 1p 1 b 1f 1q 1s 1 r 1p 1 p 1 r 1 a 3b f g p a 1 3f 1 p 1 aa3bfg1 q 1s 1 b 1f 1s 1 p 1 ag1p s 1 q 1 a 1 b f 1 f s 1p gp1(p +(bf) ) s 1p b b 3f g a 1 b 3f 1 p p 1 ag1p1 p r 1q 1s 1 b f r 1p 1 q 1s 1 a1 abf1 fg1 q 1s 1 abb3fg1 q 1s 1 a3gs1 q 1 b 3 f 3 a1 f1 q 1 b3gs1 q 1 a 3 f 3 Now the initial bulge is chased from the ( 1) to the (3 ) position in Z 11 and Z 1 and we are ready for the next bulge-chasing step Assume that k steps of bulge-chasing have been carried out Now the zoom-in window is formed by the columns and rows k + 1 k + k + 3 of Z 11 Z 1 denoted by Z (1) k+1 and the columns of k k + 1 k + and rows k + 1 k + k + 3 of Z 1 and Z denoted by Z () Z (1) k+1 = Z () k+1 = p 1 k+1 r 1 k a k+ b k+1 f k+1 g k+1 p k+1 k+1 : a 1 k+ f 1 k+1 p 1 k+1 b k+ g k+ a k+3 f k+ a 1 k+1 b k+1 f 1 k+1 f k+1 s 1 k p k+1 g k+1p 1 k (p k+1 +(b k+1f k+1 ) ) s 1 k p k+1 b k+1 b k+ f k+1 g k+1 p k+1 a k+1g k p 1 k+1 p k+1 q k s k r 1 k a k+1a k+ b k+1 f k+1 g k g k+1 q k s k a 1 k+1a k+1 b k+1 f 1 k+1 f k+1 g k q k s k s 1 k a k+1b k+1 b k+ f k+1 g k g k+1 q k s k a 1 k+1 b k+ f 1 k+1 p 1 k+1 a k+ g k+ b k+3 f k+ b k+1 f k+1 r 1 k p 1 k q 1 k s 1 k a k+g k+1 s 1 k q 1 k b k+ f k+ a k+3 g k+ a 1 k+1f 1 k+1 q 1 k b k+g k+1 s 1 k q 1 k a k+ f k+ b k+3 g k+ At step k+1 in order to annihilate the (k+ k+1) entry of Z 11 and Z 1 as well as the (k+ k) entry of Z 1 and Z we use the Givens rotations G k+1k+ (c k+11 s k+11 ) and G k+1k+ (c k+1 s k+1 ) with r k+1 c k+11 s k+11 and s k+1 c k+1 s 
k+1 defined in (17) After the transformations it is 1
22 straightforward to show that the zoom-in windows are given by Z (1) k+1 Z () k+1 a r 1 k+ a k+ b k+1 f 1 k+1 f k+1 g k+1 a k+ b k+1 b k+ f k+1 g k+1 g k+ k+1 r k+1 p k+1 p 1 k+1 r k+1 p k+1 a 1 k+ f 1 k+1 r 1 k+1 b k+ g k+ p 1 k+1 r 1 k+1 a k+3 f k+ b k+1 f k+1 s k+1 p k+1 a 1 k+a k+ f 1 k+1 g k+1 s k+1 p 1 k+1 a k+b k+ g k+1 g k+ s k+1 b k+ p 1 k+1 s 1 k+1 a 1 k+f 1 k+1 g k+ s 1 k+1 b k+3 f k+ a k+1g k r k+1 p k+1 b k+1 f k+1 s 1 k+1 s k+1 a k+ b k+1 b k+ f k+1 f k+ g k+1 q k s k r k+1 p k+1 q 1 k r k+1 p k+1 a k+1b k+1 f k+1 g k s k+1 q k s k a k+g k+1 p 1 k ((b k+1 f k+1 r 1 k ) +(s 1 k p k+1 ) ) r 1 k+1 p k+1 q 1 k s 1 k b k+ f k+ p 1 k+1 r 1 k+1 a k+3 g k+ s 1 k+1 q 1 k a k+b k+ f k+ g k+1 s k+1 a 1 k+f 1 k+ s 1 k+1 b k+3 g k+ where we used the identity (18) to simplify the (k + 1 k + ) entry of Z 1 Next we annihilate simultaneously the (k + 1 k + 3) entry in both Z 11 and Z 1 and the (k + 1 k + ) entry in both Z 1 and Z with the Givens rotations G k+k+3 (t k+1 z k+1 ) and G k+1k+ (t k+1 z k+1 ) respectively with p k+ t k+1 z k+1 and q k+1 t k+1 z k+1 defined in (17) After these transformations one has Z (1) k+1 a r k+ b k+1 f k+1 g k+1 p k+ k+1 r k+1 p k+1 p 1 k+ r 1 k+1 a k+3 b k+ f k+ g k+ p k+ b k+1 f k+1 s k+1 p k+1 a k+g k+1 p k+ s k+1 a 1 k+ b k+ f 1 k+ f k+ a 1 k+3 f 1 k+ p 1 k+ s 1 k+1 p k+ g k+p 1 k+1 (p k+ +(b k+f k+ ) ) s 1 k+1 p k+ b k+ b k+3 f k+ g k+ a 1 k+ b k+3 f 1 k+ p k+ p 1 k+ Z () k+1 a k+1g k r k+1 p k+1 q k s k b k+1 f k+1 q k+1 s k+1 r k+1 p k+1 a k+g k+1 p 1 k+1 p k+ q k+1 s k+1 r 1 k+1 a k+a k+3 b k+ f k+ g k+1 g k+ q k+1 s k+1 a k+1b k+1 f k+1 g k s k+1 q k s k q k+1 a 1 k+a k+ b k+ f 1 k+ f k+ g k+1 s 1 k+1 q k+1 s k+1 a k+b k+ b k+3 f k+ g k+1 g k+ q k+1 s k+1 b k+ f k+ r 1 k+1 p 1 k+1 q 1 k+1 s 1 k+1 a k+3g k+ s 1 k+1 q 1 k+1 a 1 k+f 1 k+ q 1 k+1 b k+3g k+ s 1 k+1 q 1 k+1 where in order to get the expressions of the (k + 1 k + 1) (k + 1 k + ) and (k + k + ) entries of Z 1 we have used (19) After n steps the zoom-in window of Z 
is the submatrix formed by the last two rows and columns of $Z_{11}$ and $Z_{12}$, denoted by $Z^{(1)}_{n-1}$, and the submatrix formed by the last two rows and
23 three columns of Z 1 and Z denoted by Z () n 1 : Z (1) n 1 = Z () n 1 = p 1 n 1 r 1 n a nb n 1f n 1g n 1 p n 1 a 1 n 1b n 1f 1 n 1f n 1 a 1 nf 1 n 1 p 1 n 1 s 1 n p n 1 gn 1p1 n (p n 1 +(bn 1fn 1) ) s 1 n p n 1 b n 1b nf n 1g n 1 a 1 n 1b nf 1 n 1 p n 1 p 1 n 1 an 1gn p1 n p n 1 q n s n r 1 n an 1anbn 1fn 1gn gn 1 q n s n a1 n a n 1 bn 1f1 n f n 1 gn s 1 n q n s n an 1bn 1bnfn 1gn gn 1 q n s n b n 1f n 1r 1 n p 1 n q 1 n s n angn 1s1 n q 1 n a1 n 1f1 n 1 q 1 n bngn 1s1 n q 1 n b n a n Similarly using the Givens rotations G n 1n (c n 11 s n 11 ) and G n 1n (c n 1 s n 1 ) with r n 1 c n 11 s n 11 and with s n 1 c n 1 s n 1 defined in (17) we annihilate the (n n 1) entry of Z 11 Z 1 as well as the (n n ) entry of Z 1 Z These transformations yield Z (1) n 1 r n 1 b n 1f n 1s n 1 p n 1 a nb n 1f n 1g n 1p n r n 1p n 1 r n angn 1pn s n 1 b np 1 n p ns 1 n 1 where p n := a 1 nf 1 n 1 p 1 n 1 r n := p 1 n r 1 n 1 and Z () n 1 an 1gn rn 1pn 1 q n s n an 1bn 1fn 1gn sn 1 q n s n b n 1f n 1s 1 n s n 1 r n 1p n 1q 1 n angn 1p1 n ((bn 1fn 1r1 n ) +(s 1 n p n 1) ) r 1 n 1p n 1q 1 n s 1 n s1 n 1 q 1 n a nb n 1b nf n 1g n 1 r n 1p n 1 b np 1 n 1 r 1 n 1 anbngn 1 s n 1 a1 nf1 n 1 s 1 n 1 Finally in order to annihilate the (n 1 n) entry in both Z 1 Z we need the Givens rotation G n 1n (t n 1 z n 1 ) with q n 1 t n 1 z n 1 defined in (17) After the transformation Z () n 1 becomes Z () n 1 With g n = 0 one has an 1gn rn 1pn 1 q n s n an 1bn 1fn 1gn sn 1 q n s n b n 1f n 1q n 1s n 1 r n 1p n 1 angn 1p1 n 1p n q n 1s n 1r 1 n 1 q n 1 anbngn 1p1 n s 1 n 1q n 1s n 1 p 1 n = r 1 n = s 1 n = q 1 n = a 1 n f 1 n 1 b nr 1 n 1p 1 n 1 q 1 n 1s 1 n 1 p1 n q 1 n 1 3
Eventually $Z \to \widetilde Z$ with

Z11 Z 1 r 1 a b 1f 1g 1p r 1p 1 r b 1f 1s 1 p 1 ag1p s 1 b f s p a n 1b n f n g n p n 1 r n p n r n 1 a n 1g n p n 1 s n b n 1f n 1s n 1 p n 1 a nb n 1f n 1g n 1p n r n 1p n 1 r n angn 1pn s n 1 b nf ns n p n Z1 Z b 1f 1q 1s 1 r 1p 1 ag1rp q 1s 1 b f q s r p an 1gn rn 1pn 1 q n s n q 1 abfg1s q 1s 1 q an 1bn 1fn 1gn sn 1 q n s n b n 1f n 1q n 1s n 1 r n 1p n 1 angn 1rnpn q n 1s n 1 q n 1 anbngn 1sn q n 1s n 1 b nq ns n r np n q n

It is easily shown that the matrix $Z$ can be expressed as

Z = r 1 α 1β 11p r 1p 1 α 13q 1s 1 r 1p 1 α 13s 1 p 1 r β 13r p q 1s 1 α n1β n 11p n r n 1p n 1 α n 13q n 1s n 1 r n 1p n 1 r n β13p s 1 q 1 βn 13rnpn q n 1s n 1 α 3s p α β 1s q 1s 1 β n 13p n s n 1 qn 1 α n3s n p n αnβn 1sn q n 1s n 1 α n3q ns n r np n q n

The formulas in (12) can be derived using this matrix expression and (13). The sets $\{p_k\}$, $\{r_k\}$, $\{s_k\}$, $\{q_k\}$ have the following properties.

Lemma 4. (a) For all $1 \le k \le n$, $0 < p_k, r_k, s_k, q_k < 1$. (b) For $k = 1,\dots,n-1$,
$$1 > r_{1:k} > p_{1:k} > s_{1:k} \ge a_{1:n}f_{1:n-1} \quad\text{and}\quad 1 > r_{1:k} > q_{1:k} > s_{1:k} \ge a_{1:n}f_{1:n-1}.$$
Proof. (a) Using the fact that $r_1,\dots,r_n$ are the diagonal entries of $Z_{11}$, that $q_1,\dots,q_n$ are the diagonal entries of $Z_{22}$, and that $Z$ is orthogonal, one has $0 < r_k, q_k < 1$. From the formulas in (13), $0 < p_k^2 \le \alpha_{k1}^2 + \beta_{k1}^2 < 1$ and $0 < s_k^2 \le \alpha_{k2}^2 + \beta_{k2}^2 < 1$.
(b) Using the product form $Z = Z_oZ_e$, one obtains $f_k = q_{1:k}/r_{1:k}$. Because $f_k < 1$, we have $q_{1:k} < r_{1:k}$. The inequalities $r_{1:k} > p_{1:k}$ and $q_{1:k} > s_{1:k}$ follow from the formulas in (17). The inequality $p_{1:k} > s_{1:k}$ is from (19). Finally, because $0 < s_k < 1$ for any $k$, one has $s_{1:k} \ge s_{1:n} = a_{1:n}f_{1:n-1}$.

Appendix B. Error analysis for Algorithm 2

In the following, $\delta$ with a subscript is a tiny number of size $O(\mu)$. Let $\hat\alpha$ be the computed value of $\alpha$ (or $\bar\alpha$). We use the notation $\hat\alpha = \alpha(1+\epsilon_\alpha)$ (or $\hat{\bar\alpha} = \bar\alpha(1+\epsilon_{\bar\alpha})$) and consider first-order errors. The following error analysis for the computation of a Givens rotation is from [5, Lemma 5].

Lemma 5. Suppose that $\gamma$, $\sigma$ and $\rho = \|x\|$ are the values computed by Algorithm 3 with $x = [x_1, x_2]^T$ in exact arithmetic, and let $\hat\gamma$, $\hat\sigma$, $\hat\rho$ be the computed values from a slightly perturbed $\tilde x = [x_1(1+\epsilon_{x_1}),\ x_2(1+\epsilon_{x_2})]^T$. Then
$$\hat\gamma = \gamma(1+\epsilon_\gamma), \quad \hat\sigma = \sigma(1+\epsilon_\sigma), \quad \hat\rho = \rho(1+\epsilon_\rho),$$
where
$$\epsilon_\rho = \epsilon_{x_1}\gamma^2 + \epsilon_{x_2}\sigma^2 + \delta_\rho, \quad |\delta_\rho| \le 13\mu,$$
$$\epsilon_\gamma = (\epsilon_{x_1} - \epsilon_{x_2})\sigma^2 + \delta_\gamma, \quad |\delta_\gamma| \le \mu,$$
$$\epsilon_\sigma = (\epsilon_{x_2} - \epsilon_{x_1})\gamma^2 + \delta_\sigma, \quad |\delta_\sigma| \le \mu.$$

Proof of Theorem 2. Consider the computed values at the $k$th iteration of Algorithm 2. We have
$$\hat\beta_{kj} = \beta_{kj}(1+\epsilon_{\beta_{kj}}), \quad \hat\alpha_{kj} = \alpha_{kj}(1+\epsilon_{\alpha_{kj}}), \quad j = 1, 2, 3,$$
where
$$\epsilon_{\alpha_{k1}} = \epsilon_{r_k}, \qquad \epsilon_{\beta_{k1}} = \epsilon_{s_{k1}} + \epsilon_{p_{k+1}} + \delta_{\beta_{k1}},$$
$$\epsilon_{\alpha_{k2}} = \epsilon_{s_k} - \epsilon_{p_k} + \delta_{\alpha_{k2}}, \qquad \epsilon_{\beta_{k2}} = \epsilon_{p_{k+1}} - \epsilon_{s_k} + \delta_{\beta_{k2}},$$
$$\epsilon_{\alpha_{k3}} = \epsilon_{\alpha_{k2}} + \epsilon_{q_k} - \epsilon_{r_k} + \delta_{\alpha_{k3}}, \qquad \epsilon_{\beta_{k3}} = \epsilon_{\beta_{k2}} + \epsilon_{r_{k+1}} - \epsilon_{q_k} + \delta_{\beta_{k3}},$$
$$\epsilon_{\bar\alpha_k} = \epsilon_{q_k}, \qquad \epsilon_{\bar\beta_k} = \epsilon_{z_{k2}} + \epsilon_{s_{k+1}} + \delta_{\bar\beta_k},$$
and
$$|\delta_{\beta_{k1}}|, |\delta_{\bar\beta_k}| \le \mu, \qquad |\delta_{\beta_{k2}}|, |\delta_{\beta_{k3}}|, |\delta_{\alpha_{k2}}|, |\delta_{\alpha_{k3}}| \le 2\mu. \qquad (20)$$
Define $x_{k-1,1} := \alpha_{k1}t_{k-1,1}$. Then $\hat x_{k-1,1} = x_{k-1,1}(1+\epsilon_{x_{k-1,1}})$. By Lemma 5 we have
$$\hat p_k = p_k(1+\epsilon_{p_k}), \quad \hat t_{k1} = t_{k1}(1+\epsilon_{t_{k1}}), \quad \hat z_{k1} = z_{k1}(1+\epsilon_{z_{k1}}),$$
26 where ɛ pk = ɛ xk 11 t k1 + δ pk δ pk 13 µ (1) ɛ tk1 = ɛ xk 11 zk1 + δ tk1 δ tk1 1 µ () ɛ zk1 = ɛ xk 11 t k1 + δ zk1 δ zk1 1 µ (3) Since by () ˆx k1 = x k1 (1 + ɛ xk1 ) ɛ xk1 = ɛ tk1 + δ 1 = ɛ xk 11 zk1 + δ δ 5 µ () Similarly ˆr k = r k (1 + ɛ rk ) ĉ k1 = c k1 (1 + ɛ ck1 ) ŝ k1 = s k1 (1 + ɛ sk1 ) where by (1) (3) and using c k1 + s k1 = 1 ɛ rk = ɛ xk 11 t k1(c k1 s k1) + ɛ ck 11 c k1 + δ rk δ rk 1 µ ɛ ck1 = (ɛ xk 11 t k1 + ɛ ck 11 + δ )s k1 + δ ck1 δ 1 µ δ c k1 1 µ (5) ɛ sk1 = (ɛ xk 11 t k1 + ɛ ck 11 + δ )c k1 + δ sk1 δ sk1 1 µ Combining () with (5) one has ɛxk1 ɛ ck1 zk1 0 t k1 s k1 s k1 ɛxk 11 ɛ ck s k1 µ (6) Using the same trick as in 5 Lemma 7 we get ɛxk1 ɛ ck1 ( k k j=i ( z j1 0 s i1 1 ) k k j=i z j1 i=1 j=i s j1 ) 5 63 µ Hence and therefore ɛ xk1 5k µ ɛ c k1 113k µ ɛ pk 5k 1 µ ɛ tk1 ɛ zk1 5k µ ɛ rk 138k 96 µ ɛ sk1 163k 100 µ In the same way for the errors in s k c k s k q k t k z k with x k 1 = c k 1 α k we obtain ɛxk s k 0 ɛxk 1 ɛ tk c k z k zk + (7) ɛ tk 1 Similarly one has ɛ xk 5k µ and for 5 1+z k ŝ k = s k (1 + ɛ sk ) ĉ k = c k (1 + ɛ ck ) ŝ k = s k (1 + ɛ sk ); ˆq k = q k (1 + ɛ qk ) ˆt k = t k (1 + ɛ tk ) ẑ k = z k (1 + ɛ zk ) 6
we have
5k 1 ɛ sk µ ɛ ck ɛ sk 5k µ 138k 96 ɛ qk µ ɛ tk 113k µ ɛ 163k 100 z k µ
Substituting these bounds into (20) yields (14). If $z_{k1}^2, z_{k2}^2, s_{k1}^2, s_{k2}^2 \le \tau < 1$ for all $k$, then from (26) and (27) we obtain
ɛxk1 τ 0 ɛxk 11 ɛ ck1 τ τ ɛ ck 11 ɛxk 1 ɛ tk τ 5 1+τ µ
Following [5, Lemma 8],
ɛ xk1 ɛ xk ɛ ck1 τ τ ɛxk τ 0 ɛ tk τ τ µ ( ) 5 50τ (1 τ) µ ɛ 1(τ + 1) c k1 ɛ tk + µ (1 τ) (1 τ)
Using the same derivations now yields (15).

References

[1] G. S. Ammar, W. B. Gragg, and L. Reichel, On the eigenproblem for orthogonal matrices, in Proceedings of the 25th Conference on Decision and Control, IEEE, Piscataway, 1986.
[2] G. S. Ammar, W. B. Gragg, and L. Reichel, Determination of Pisarenko frequency estimates as eigenvalues of an orthogonal matrix, in Advanced Algorithms and Architectures for Signal Processing II, ed. F. T. Luk, Proceedings of the Society of Photo-Optical Instrumentation Engineers (SPIE), vol. 826, The International Society for Optical Engineering, Bellingham, WA, 1987.
[3] G. S. Ammar, L. Reichel, and D. C. Sorensen, Algorithm 730: An implementation of a divide and conquer algorithm for the unitary eigenproblem, ACM Trans. Math. Software, 18 (1992), pp. 292-307, and 20 (1994), p. 161.
[4] A. Bunse-Gerstner and L. Elsner, Schur parameter pencils for the solution of the unitary eigenproblem, Linear Algebra Appl., 154/156 (1991), pp. 741-778.
[5] J. Demmel and W. Kahan, Accurate singular values of bidiagonal matrices, SIAM J. Sci. Stat. Comput., 11 (1990), pp. 873-912.
[6] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., Johns Hopkins University Press, Baltimore, 1996.
[7] W. B. Gragg, Positive definite Toeplitz matrices, the Arnoldi process for isometric operators, and Gaussian quadrature on the unit circle, J. Comput. Appl. Math., 46 (1993), pp. 183-198. This is a slightly edited version of a paper published in Russian in Numerical Methods in Linear Algebra, ed. E. S. Nikolaev, Moscow University Press, Moscow.
[8] W. B. Gragg, The QR algorithm for unitary Hessenberg matrices, J. Comput. Appl. Math., 16 (1986), pp. 1-8.
More informationBindel, Fall 2016 Matrix Computations (CS 6210) Notes for
1 Algorithms Notes for 2016-10-31 There are several flavors of symmetric eigenvalue solvers for which there is no equivalent (stable) nonsymmetric solver. We discuss four algorithmic ideas: the workhorse
More informationExponentials of Symmetric Matrices through Tridiagonal Reductions
Exponentials of Symmetric Matrices through Tridiagonal Reductions Ya Yan Lu Department of Mathematics City University of Hong Kong Kowloon, Hong Kong Abstract A simple and efficient numerical algorithm
More informationEfficient and Accurate Rectangular Window Subspace Tracking
Efficient and Accurate Rectangular Window Subspace Tracking Timothy M. Toolan and Donald W. Tufts Dept. of Electrical Engineering, University of Rhode Island, Kingston, RI 88 USA toolan@ele.uri.edu, tufts@ele.uri.edu
More informationDEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix to upper-triangular
form) Given: matrix C = (c i,j ) n,m i,j=1 ODE and num math: Linear algebra (N) [lectures] c phabala 2016 DEN: Linear algebra numerical view (GEM: Gauss elimination method for reducing a full rank matrix
More information1 Number Systems and Errors 1
Contents 1 Number Systems and Errors 1 1.1 Introduction................................ 1 1.2 Number Representation and Base of Numbers............. 1 1.2.1 Normalized Floating-point Representation...........
More informationANONSINGULAR tridiagonal linear system of the form
Generalized Diagonal Pivoting Methods for Tridiagonal Systems without Interchanges Jennifer B. Erway, Roummel F. Marcia, and Joseph A. Tyson Abstract It has been shown that a nonsingular symmetric tridiagonal
More information4.8 Arnoldi Iteration, Krylov Subspaces and GMRES
48 Arnoldi Iteration, Krylov Subspaces and GMRES We start with the problem of using a similarity transformation to convert an n n matrix A to upper Hessenberg form H, ie, A = QHQ, (30) with an appropriate
More informationNumerical Methods - Numerical Linear Algebra
Numerical Methods - Numerical Linear Algebra Y. K. Goh Universiti Tunku Abdul Rahman 2013 Y. K. Goh (UTAR) Numerical Methods - Numerical Linear Algebra I 2013 1 / 62 Outline 1 Motivation 2 Solving Linear
More informationarxiv: v1 [cs.lg] 26 Jul 2017
Updating Singular Value Decomposition for Rank One Matrix Perturbation Ratnik Gandhi, Amoli Rajgor School of Engineering & Applied Science, Ahmedabad University, Ahmedabad-380009, India arxiv:70708369v
More informationm k,k orthogonal on the unit circle are connected via (1.1) with a certain (almost 1 ) unitary Hessenberg matrix
A QUASISEPARABLE APPROACH TO FIVE DIAGONAL CMV AND COMPANION MATRICES T. BELLA V. OLSHEVSKY AND P. ZHLOBICH Abstract. Recent work in the characterization of structured matrices in terms of the systems
More informationPreliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012
Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.
More informationThe QR Algorithm. Chapter The basic QR algorithm
Chapter 3 The QR Algorithm The QR algorithm computes a Schur decomposition of a matrix. It is certainly one of the most important algorithm in eigenvalue computations. However, it is applied to dense (or:
More informationBlock Bidiagonal Decomposition and Least Squares Problems
Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition
More informationThis can be accomplished by left matrix multiplication as follows: I
1 Numerical Linear Algebra 11 The LU Factorization Recall from linear algebra that Gaussian elimination is a method for solving linear systems of the form Ax = b, where A R m n and bran(a) In this method
More informationNumerical methods for solving linear systems
Chapter 2 Numerical methods for solving linear systems Let A C n n be a nonsingular matrix We want to solve the linear system Ax = b by (a) Direct methods (finite steps); Iterative methods (convergence)
More informationAPPLIED NUMERICAL LINEAR ALGEBRA
APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation
More informationProcess Model Formulation and Solution, 3E4
Process Model Formulation and Solution, 3E4 Section B: Linear Algebraic Equations Instructor: Kevin Dunn dunnkg@mcmasterca Department of Chemical Engineering Course notes: Dr Benoît Chachuat 06 October
More informationComputing Eigenvalues and/or Eigenvectors;Part 2, The Power method and QR-algorithm
Computing Eigenvalues and/or Eigenvectors;Part 2, The Power method and QR-algorithm Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo November 13, 2009 Today
More informationRITZ VALUE BOUNDS THAT EXPLOIT QUASI-SPARSITY
RITZ VALUE BOUNDS THAT EXPLOIT QUASI-SPARSITY ILSE C.F. IPSEN Abstract. Absolute and relative perturbation bounds for Ritz values of complex square matrices are presented. The bounds exploit quasi-sparsity
More informationApplied Numerical Linear Algebra. Lecture 8
Applied Numerical Linear Algebra. Lecture 8 1/ 45 Perturbation Theory for the Least Squares Problem When A is not square, we define its condition number with respect to the 2-norm to be k 2 (A) σ max (A)/σ
More informationETNA Kent State University
Electronic Transactions on Numerical Analysis Volume 4, pp 64-74, June 1996 Copyright 1996, ISSN 1068-9613 ETNA A NOTE ON NEWBERY S ALGORITHM FOR DISCRETE LEAST-SQUARES APPROXIMATION BY TRIGONOMETRIC POLYNOMIALS
More informationFundamentals of Engineering Analysis (650163)
Philadelphia University Faculty of Engineering Communications and Electronics Engineering Fundamentals of Engineering Analysis (6563) Part Dr. Omar R Daoud Matrices: Introduction DEFINITION A matrix is
More informationarxiv: v1 [math.na] 24 Jan 2019
ON COMPUTING EFFICIENT DATA-SPARSE REPRESENTATIONS OF UNITARY PLUS LOW-RANK MATRICES R BEVILACQUA, GM DEL CORSO, AND L GEMIGNANI arxiv:190108411v1 [mathna 24 Jan 2019 Abstract Efficient eigenvalue solvers
More informationAM205: Assignment 2. i=1
AM05: Assignment Question 1 [10 points] (a) [4 points] For p 1, the p-norm for a vector x R n is defined as: ( n ) 1/p x p x i p ( ) i=1 This definition is in fact meaningful for p < 1 as well, although
More informationIndex. for generalized eigenvalue problem, butterfly form, 211
Index ad hoc shifts, 165 aggressive early deflation, 205 207 algebraic multiplicity, 35 algebraic Riccati equation, 100 Arnoldi process, 372 block, 418 Hamiltonian skew symmetric, 420 implicitly restarted,
More informationMATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 1 x 2. x n 8 (4) 3 4 2
MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS SYSTEMS OF EQUATIONS AND MATRICES Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a
More informationLecture 12 (Tue, Mar 5) Gaussian elimination and LU factorization (II)
Math 59 Lecture 2 (Tue Mar 5) Gaussian elimination and LU factorization (II) 2 Gaussian elimination - LU factorization For a general n n matrix A the Gaussian elimination produces an LU factorization if
More informationCompression of unitary rank structured matrices to CMV-like shape with an application to polynomial rootfinding arxiv: v1 [math.
Compression of unitary rank structured matrices to CMV-like shape with an application to polynomial rootfinding arxiv:1307.186v1 [math.na] 8 Jul 013 Roberto Bevilacqua, Gianna M. Del Corso and Luca Gemignani
More informationMatrix Algorithms. Volume II: Eigensystems. G. W. Stewart H1HJ1L. University of Maryland College Park, Maryland
Matrix Algorithms Volume II: Eigensystems G. W. Stewart University of Maryland College Park, Maryland H1HJ1L Society for Industrial and Applied Mathematics Philadelphia CONTENTS Algorithms Preface xv xvii
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they
More informationKrylov subspace projection methods
I.1.(a) Krylov subspace projection methods Orthogonal projection technique : framework Let A be an n n complex matrix and K be an m-dimensional subspace of C n. An orthogonal projection technique seeks
More informationIntroduction to Applied Linear Algebra with MATLAB
Sigam Series in Applied Mathematics Volume 7 Rizwan Butt Introduction to Applied Linear Algebra with MATLAB Heldermann Verlag Contents Number Systems and Errors 1 1.1 Introduction 1 1.2 Number Representation
More informationOrthogonalization and least squares methods
Chapter 3 Orthogonalization and least squares methods 31 QR-factorization (QR-decomposition) 311 Householder transformation Definition 311 A complex m n-matrix R = [r ij is called an upper (lower) triangular
More informationBlockMatrixComputations and the Singular Value Decomposition. ATaleofTwoIdeas
BlockMatrixComputations and the Singular Value Decomposition ATaleofTwoIdeas Charles F. Van Loan Department of Computer Science Cornell University Supported in part by the NSF contract CCR-9901988. Block
More informationMaximizing the numerical radii of matrices by permuting their entries
Maximizing the numerical radii of matrices by permuting their entries Wai-Shun Cheung and Chi-Kwong Li Dedicated to Professor Pei Yuan Wu. Abstract Let A be an n n complex matrix such that every row and
More informationClassifications of recurrence relations via subclasses of (H, m) quasiseparable matrices
Classifications of recurrence relations via subclasses of (H, m) quasiseparable matrices T Bella, V Olshevsky and P Zhlobich Abstract The results on characterization of orthogonal polynomials and Szegö
More informationAMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)
AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical
More informationMatrix Factorization and Analysis
Chapter 7 Matrix Factorization and Analysis Matrix factorizations are an important part of the practice and analysis of signal processing. They are at the heart of many signal-processing algorithms. Their
More informationLecture 5 Singular value decomposition
Lecture 5 Singular value decomposition Weinan E 1,2 and Tiejun Li 2 1 Department of Mathematics, Princeton University, weinan@princeton.edu 2 School of Mathematical Sciences, Peking University, tieli@pku.edu.cn
More information5 Selected Topics in Numerical Linear Algebra
5 Selected Topics in Numerical Linear Algebra In this chapter we will be concerned mostly with orthogonal factorizations of rectangular m n matrices A The section numbers in the notes do not align with
More information