arxiv: v1 [math.na] 5 Oct 2018


SHARP ERROR BOUNDS FOR RITZ VECTORS AND APPROXIMATE SINGULAR VECTORS

YUJI NAKATSUKASA

Abstract. We derive sharp bounds for the accuracy of approximate eigenvectors (Ritz vectors) obtained by the Rayleigh-Ritz process for symmetric eigenvalue problems. Using information that is available or easy to estimate, our bounds improve the classical Davis-Kahan sin θ theorem by a factor that can be arbitrarily large, and can give nontrivial information even when the sin θ theorem suggests that a Ritz vector might have no accuracy at all. We also present extensions in three directions, deriving error bounds for invariant subspaces, singular vectors and subspaces computed by a (Petrov-Galerkin) projection SVD method, and eigenvectors of self-adjoint operators on a Hilbert space.

1. Introduction

It is well known that the eigenvector corresponding to a near-multiple eigenvalue is ill-conditioned. Specifically, the classical Davis-Kahan theory [2] implies that the condition number of eigenvectors of symmetric or Hermitian matrices is $1/\delta$, where $\delta$ is the smallest distance between the particular eigenvalue and the other eigenvalues. For example, if $(\hat\lambda,\hat x)$ with $\|\hat x\|=1$ is an approximation to an exact eigenpair $(\lambda,x)$ of a symmetric matrix $A$ with residual $r=A\hat x-\hat\lambda\hat x$, then the Davis-Kahan sin θ theorem gives the error bound for $\hat x$ [2], [17, Ch. 11]:
$$\sin\angle(x,\hat x)\le\frac{\|r\|}{\delta}, \tag{1.1}$$
where $\angle(x,\hat x)=\arccos\frac{|x^T\hat x|}{\|x\|\,\|\hat x\|}$ and $\delta$ is the distance between $\hat\lambda$ and the eigenvalues of $A$ other than $\lambda$. Here and throughout, $\|\cdot\|$ for vectors denotes the standard Euclidean norm. In view of the bound (1.1), it is commonly believed that if $\delta$ is smaller than the residual $\|r\|$, then we cannot guarantee any accuracy in the computed eigenvector $\hat x$. In this work, we partly challenge this belief. Namely, we examine the accuracy of eigenvectors obtained by the Rayleigh-Ritz process (R-R), the most widely used process for computing partial (usually extremal) eigenpairs of large-scale symmetric/Hermitian matrices, and show that (1.1)
can be improved, often significantly and by a factor that can be arbitrarily large, using quantities that are readily available (or can be estimated cheaply) after the computation.

2010 Mathematics Subject Classification. Primary 15A18, 15A42, 65F15.
Key words and phrases. Rayleigh-Ritz, eigenvector, Davis-Kahan, error bounds, singular vector, self-adjoint operator.
This work was supported by JSPS grants No. 17H01699 and 18H

Of course, the classical Davis-Kahan bound is tight in general: in the absence of information other than $\|r\|$ and $\delta$, we cannot improve (1.1), in that there exist examples for which the bound (1.1) is essentially tight. However, when $(\hat\lambda,\hat x)$ is a computed approximate eigenpair (Ritz pair) obtained by R-R, there is usually abundant additional information that (1.1) does not use. Most importantly, the residual $r$ is orthogonal to the trial subspace, which is rich in the eigenspace corresponding not only to $\lambda$ but also to eigenvalues close to $\lambda$. Moreover, since the trial subspace in R-R usually contains approximations to nearby eigenpairs (e.g. when looking for the smallest eigenvalues), a bound can be computed for $\Delta$ (which we call the big $\Delta$), which is roughly the distance between the Ritz value $\hat\lambda$ and the eigenvalues not approximated by the Ritz values; see (2.3) for the precise definition. These are the crucial properties that allow us to improve the Davis-Kahan bound (1.1); in other words, we take into account the matrix structure generated automatically by R-R to derive sharp bounds for the Ritz vector error. Our results essentially show that, up to a modest constant, the $\delta$ in (1.1) can be replaced by $\Delta$, which is usually much wider, thus improving classical results.

Another way to understand our results is via (structured) perturbation theory: while an eigenvector has condition number $1/\delta$ if a general perturbation is allowed, R-R imposes a structure on the perturbation that reduces the structured condition number to $1/\Delta$.

Qualitatively speaking, the fact that the accuracy of Ritz vectors depends on $\Delta$ rather than $\delta$ was pointed out by Ovtchinnikov [16, Thm. 4]. However, the bounds there involve quantities that are unavailable and difficult to estimate, such as the projector onto an exact eigenspace. Our bounds are easy to compute or estimate, using information that is available after a typical computation of approximate eigenpairs via the R-R process. Our bounds are also
tight, in that they cannot be improved without additional information.

In addition, we extend the results in three ways. First, we obtain error bounds for invariant subspaces (spanned by more than one eigenvector) computed by R-R. This gives an answer to one of the open problems suggested in Davis-Kahan's classical paper. Second, we derive their SVD variants, establishing tight bounds for the quality of approximate singular vectors and singular subspaces associated with the largest singular values, obtained by a (Petrov-Galerkin) projection method. Finally, we generalize the error bounds to eigenvectors of self-adjoint operators on a Hilbert space.

Notation. $\lambda(A)$ denotes the spectrum (set of eigenvalues) of a symmetric matrix $A$. $\sigma(A)=\{\sigma_i(A)\}_{i=1}^{\min(m,n)}$ is the set of singular values of $A\in\mathbb{C}^{m\times n}$, where $\sigma_1(A)\ge\sigma_2(A)\ge\cdots\ge\sigma_{\min}(A)=\sigma_{\min(m,n)}(A)\ge0$. $I_n$ denotes the $n\times n$ identity matrix. $Q_\perp\in\mathbb{C}^{n\times(n-k)}$ is the orthogonal complement of $Q\in\mathbb{C}^{n\times k}$. Quantities involved in the R-R process wear a hat (e.g. $\hat\lambda$, $\hat x$, $\hat X$), and those with tildes are auxiliary objects for the analysis. For vectors (lower-case letters), norms always denote the Euclidean norm; for matrices (upper case), $\|\cdot\|_2$ denotes the spectral norm. We use $\|\cdot\|$ to refer to a general matrix norm, and for inequalities that hold for any fixed unitarily invariant norm. Inequalities involving $\|\cdot\|_{2,F}$ hold for the spectral and Frobenius norms, but not necessarily for every unitarily invariant norm. We denote by $\mathcal{A}$ a self-adjoint operator on a Hilbert space, $\lambda(\mathcal{A})$ its spectrum, and $\|\mathcal{A}\|$ its spectral (operator) norm. We drop the subscript $i$ in $\lambda_i,\hat\lambda_i$ when this can

be done without causing confusion. We always normalize eigenvectors and Ritz vectors to have unit norm, $\|x\|=\|\hat x\|=1$. Unless otherwise stated, for definiteness we assume that the Ritz values $\hat\lambda_1,\dots,\hat\lambda_k$ approximate the smallest eigenvalues of $A$ (and accordingly the Ritz values are arranged in increasing order $\hat\lambda_1\le\hat\lambda_2\le\cdots\le\hat\lambda_k$). This is a typical situation in applications, and clearly the discussion covers the case where the largest eigenvalues are sought (if necessary by working with $-A$). A less common but still important case is when interior eigenvalues are desired, for example those lying in an interval (e.g. [13]). Our results are applicable to this case also; one subtlety here is that some care is needed in estimating $\Delta_i$, since the Ritz values tend to contain outliers in this case.

2. Setup

2.1. Big $\Delta$ (good), small $\delta$ (bad). Let $A\in\mathbb{C}^{n\times n}$ be the (large) Hermitian matrix whose partial eigenvalues are sought, and let $Q\in\mathbb{C}^{n\times k}$ ($n\ge k$, usually $n\gg k$) be a trial subspace with orthonormal columns, $Q^*Q=I_k$ (obtained e.g. via Lanczos, LOBPCG, Jacobi-Davidson or the generalized Davidson method [1]). Following standard practice, for a matrix $Q$ with orthonormal columns, we identify the matrix $Q$ with its column space $\mathrm{Span}(Q)$. R-R obtains approximate eigenvalues (Ritz values) and eigenvectors (Ritz vectors) as follows:
(1) Compute the $k\times k$ matrix $Q^*AQ$.
(2) Compute the eigendecomposition $Q^*AQ=\Omega\hat\Lambda\Omega^*$. Then $(\hat\lambda_1,\dots,\hat\lambda_k)=\mathrm{diag}(\hat\Lambda)$ are the Ritz values, and $[\hat x_1,\dots,\hat x_k]=Q\Omega$ are the Ritz vectors.
The Ritz pairs $(\hat\lambda_i,\hat x_i)$ thus obtained satisfy $\hat x_i\in\mathrm{span}(Q)$ for all $i$, and since
$$Q^*(AQ\Omega-Q\Omega\hat\Lambda)=Q^*AQ\Omega-\Omega\hat\Lambda=\Omega\hat\Lambda\Omega^*\Omega-\Omega\hat\Lambda=0$$
by construction, we have, crucially for this work, orthogonality between $Q$ and the residuals: $A\hat x_i-\hat\lambda_i\hat x_i\perp Q$ for every $1\le i\le k$. Throughout we assume $k\ge2$; indeed, when $k=1$ there is no room for improvement upon Davis-Kahan.

Underlying R-R is a matrix of particular structure. Let $Q_\perp$ be the orthogonal complement of $Q$, such that $[Q\ Q_\perp]$ is a square unitary matrix (and hence so is
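$[Q\Omega,\ Q_\perp]$), and consider the unitary transformation applied to $A$:
$$\tilde A:=[Q\Omega,\ Q_\perp]^*A[Q\Omega,\ Q_\perp]=\begin{bmatrix}\hat\lambda_1&&&r_1^*\\&\ddots&&\vdots\\&&\hat\lambda_k&r_k^*\\r_1&\cdots&r_k&A_3\end{bmatrix}. \tag{2.1}$$
Here $R=Q_\perp^*AQ\Omega=[r_1,r_2,\dots,r_k]$; we use the subscript 3 in $A_3$ because later we partition the (1,1) block further into two pieces. Suppose $(\lambda_i,\tilde x_i)$ is an exact eigenpair of $\tilde A$ such that $\tilde A\tilde x_i=\lambda_i\tilde x_i$, and partition $\tilde x_i=\begin{bmatrix}w_i\\y_i\\z_i\end{bmatrix}$ with $w_i\in\mathbb{C}$, $y_i\in\mathbb{C}^{k-1}$, $z_i\in\mathbb{C}^{n-k}$ (the leading $k\times k$ block having been symmetrically permuted so that $\hat\lambda_i$ is its (1,1) element). Then since $x_i=[Q\Omega,\ Q_\perp]\tilde x_i$ is the corresponding eigenvector of $A$, and $(\hat\lambda_i,\hat x_i)$ is a Ritz pair with $\hat x_i=[Q\Omega,\ Q_\perp]e_1$ where $e_1=[1,0,\dots,0]^T$, it follows that $\cos\angle(x_i,\hat x_i)=|e_1^T\tilde x_i|=|w_i|$, and hence
$$\sin\angle(x_i,\hat x_i)=\sqrt{\|y_i\|^2+\|z_i\|^2}. \tag{2.2}$$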
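The following minimal NumPy sketch (an illustration, not the author's code) carries out steps (1) and (2), checks the residual orthogonality $A\hat x_i-\hat\lambda_i\hat x_i\perp Q$, forms $\tilde A$ of (2.1), and confirms identity (2.2) for one Ritz pair. The helper names, the random symmetric test matrix and trial subspace, and the use of a complete QR factorization to produce $Q_\perp$ are assumptions made for the example.

```python
import numpy as np

def rayleigh_ritz(A, Q):
    """Rayleigh-Ritz for Hermitian A on span(Q); Q must have orthonormal columns."""
    H = Q.conj().T @ A @ Q              # step (1): the k x k matrix Q^* A Q
    H = (H + H.conj().T) / 2            # symmetrize away rounding errors
    lam_hat, Omega = np.linalg.eigh(H)  # step (2): Q^* A Q = Omega diag(lam_hat) Omega^*
    return lam_hat, Q @ Omega           # Ritz values (ascending) and Ritz vectors

def transformed_matrix(A, X_hat):
    """Return (A_tilde, Q_perp) with A_tilde = [X_hat, Q_perp]^* A [X_hat, Q_perp]."""
    k = X_hat.shape[1]
    # The trailing columns of a complete QR factorization span span(X_hat)^perp.
    Q_full, _ = np.linalg.qr(X_hat, mode="complete")
    Q_perp = Q_full[:, k:]
    U = np.hstack([X_hat, Q_perp])
    return U.conj().T @ A @ U, Q_perp

# Hypothetical test problem: random symmetric A and a random k-dimensional trial subspace.
rng = np.random.default_rng(0)
n, k = 60, 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))

lam_hat, X_hat = rayleigh_ritz(A, Q)
residuals = A @ X_hat - X_hat * lam_hat       # column i is A x_i - lam_i x_i
A_tilde, Q_perp = transformed_matrix(A, X_hat)
R = A_tilde[k:, :k]                           # R = Q_perp^* A Q Omega = [r_1, ..., r_k]

# Identity (2.2) for the Ritz pair i = 0 and the exact eigenvector best aligned with it.
lam, V = np.linalg.eigh(A)
i = 0
j = np.argmax(np.abs(V.T @ X_hat[:, i]))
x_tilde = np.hstack([X_hat, Q_perp]).T @ V[:, j]   # coordinates of x in the basis of (2.1)
sin_direct = np.sqrt(max(0.0, 1 - (V[:, j] @ X_hat[:, i]) ** 2))
sin_via_22 = np.linalg.norm(np.delete(x_tilde, i))  # sqrt(||y||^2 + ||z||^2)
print(np.abs(Q.T @ residuals).max(), sin_direct, sin_via_22)
```

A complete QR factorization is merely a convenient way to get $Q_\perp$ for a small dense example; for large $n$ one would not form $Q_\perp$, since only the norms $\|r_i\|=\|A\hat x_i-\hat\lambda_i\hat x_i\|$ are needed.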

This is a key fact in the forthcoming analysis.

Fundamental in this work is the distinction between the big $\Delta_i$ and the small $\delta_i$, defined by
$$\Delta_i:=\min|\lambda_i-\lambda(A_3)|,\qquad\delta_i:=\min_{j\in\{1,\dots,k\}\setminus\{i\}}|\lambda_i-\hat\lambda_j|. \tag{2.3}$$
Intuitively, $\Delta_i$ measures the distance between the target $\lambda_i$ and the undesired eigenvalues, whereas $\delta_i$ is that between $\lambda_i$ and all the other eigenvalues, including the desired ones (e.g., $\lambda_2$). For example, when $R\approx0$, we have $\Delta_i\approx\min|\hat\lambda_i-\lambda_{k+1}|$; by contrast, $\delta_i\approx\min(|\hat\lambda_i-\hat\lambda_{i+1}|,|\hat\lambda_i-\hat\lambda_{i-1}|)$. Observe that $\delta_i\le\Delta_i$, and we typically have $\delta_i\ll\Delta_i$. We illustrate this in Figure 2.1 for $i=1$. Throughout the paper, it is helpful to consider the case $i=1$, where the target eigenpair is the smallest one.

[Figure 2.1: the Ritz values $\hat\lambda_1,\hat\lambda_2,\dots,\hat\lambda_k$ and $\mathrm{eig}(A_3)=\mathrm{eig}(Q_\perp^*AQ_\perp)$ on the real line, with the gaps $\delta_1$ and $\Delta_1$ marked.]
Figure 2.1. Illustration of the typical situation when the smallest eigenvalues are sought, and $\|R\|$ is small enough so that $\hat\lambda_i\approx\lambda_i$. While the small $\delta$ is $\delta_i=\min_{j\ne i}|\lambda_i-\hat\lambda_j|\approx\min_{j\ne i}|\hat\lambda_i-\hat\lambda_j|$, the big $\Delta$ is much bigger: $\Delta_i=\min|\lambda_i-\lambda(A_3)|$.

In addition to $\Delta_i$ and $\delta_i$, some of the bounds we derive involve $|\lambda_i-\hat\lambda_j|$ for a fixed $j\in\{1,\dots,k\}\setminus\{i\}$. These lie between $\delta_i$ and $\Delta_i$.

Recalling (2.1), the information clearly available after R-R consists of the Ritz pairs $(\hat\lambda_i,\hat x_i)$ for $i=1,\dots,k$ and the norms of the individual $r_i$, because they are equal to the residual norms $\|r_i\|=\|A\hat x_i-\hat\lambda_i\hat x_i\|$. In addition, one can reasonably expect that an estimate (or better yet, a lower bound) is available for $\Delta_i$ for each $i$, or at least for small $i$: when the smallest $k$ eigenpairs are sought, the trial subspace $Q$ (assuming it has been chosen appropriately by the algorithm used) is expected to be rich in the eigenspace corresponding to those eigenvalues. It then follows from standard eigenvalue perturbation theory that $A_3$ contains only eigenvalues that are roughly at least as large as $\lambda_{k+1}(A)$ (up to $\|R\|$, or indeed $\|R\|^2$ [12]). Therefore, although the exact value of $\lambda_{k+1}(A)$ is unknown, we can use the knowledge of the Ritz values $\hat\lambda_i$ to estimate $\Delta_i$, for example $\Delta_i\approx\min|\hat\lambda_i-\lambda_{k+1}|$ or $\Delta_i\approx\min|\hat\lambda_i-\hat\lambda_k|$; we use the latter, approximate
lower bound in our experiments. Similarly, one can estimate $\delta_i$, for example as $\delta_i\approx\min_{j\ne i}|\hat\lambda_i-\hat\lambda_j|$.

In practice, an important feature of the residuals is that they are typically graded: $\|r_1\|\lesssim\|r_2\|\lesssim\cdots\lesssim\|r_k\|$. This is because the extremal eigenvalues converge much faster than interior ones, a fact deeply connected with polynomial (and rational) approximation theory [20, 33]. We derive bounds (e.g. Theorem 4.1) that respect this property, and hence give sharp bounds in practical situations.

We note that previous bounds exist that involve the big $\Delta_i$ rather than $\delta_i$; most notably (aside from Ovtchinnikov's result [16] mentioned in the introduction) Davis-Kahan's generalized sin θ theorem, where the angles between subspaces of different dimensions are bounded [2, Thm. 6.1]. In this case, however (in addition to comparing e.g. a vector and a subspace rather than two vectors), the numerator

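is replaced by the entire $R$ rather than the $i$th column $r_i$. The bounds we derive essentially show that, up to a small constant, (i) the small $\delta_i$ in (1.1) can be replaced by the big $\Delta_i$, and (ii) the numerator is the norm of the $i$th column, $\|r_i\|$. Combined, these give a massively improved error bound for $\hat x_i$, especially for small values of $i$. The next section illustrates the first aspect, and the second will be covered in Section 4.

3. $2\times2$ block partitioning

We will derive three error bounds for Ritz vectors; the first, obtained in this section, is simple and vividly illustrates the roles of $\Delta$ and $\delta$, but is not sharp in a practical setting. In Section 4 we derive two more bounds that are better in practice.

Here we consider a simplified $2\times2$ block partitioning of (2.1):
$$\tilde A\ \big(=[Q\Omega,\ Q_\perp]^*A[Q\Omega,\ Q_\perp]\big)=\begin{bmatrix}\hat\Lambda_1&R^*\\R&A_3\end{bmatrix}, \tag{3.1}$$
where $\hat\Lambda_1=\mathrm{diag}(\hat\lambda_1,\hat\Lambda_2)\in\mathbb{R}^{k\times k}$ and $R=Q_\perp^*(A\hat X-\hat X\hat\Lambda_1)$, with $\hat X=Q\Omega$, are the computed quantities (note $\|R\|=\|A\hat X-\hat X\hat\Lambda_1\|$). In other words, we do not distinguish the columns $r_i$ of $R$ but treat $R$ as a single residual term. Below, we derive bounds for $\sin\angle(x_i,\hat x_i)$ applicable to $i=1,\dots,k$. In our analysis, we assume that $\hat\lambda_i$ is the (1,1) element of $\tilde A$. This simplifies the discussion and loses no generality, as we can permute the leading $k\times k$ block of $\tilde A$. Moreover, we drop the subscript $i$ in the remainder of this section for simplicity.

Theorem 3.1. Let $A$ be a Hermitian matrix as in (2.1), for which $(\hat\lambda,\hat x)$ is a Ritz pair with $\hat\lambda=\hat\lambda_1$. Let $(\lambda,x)$ be an eigenpair of $A$, and let $\Delta=\min|\lambda-\lambda(A_3)|$ and $\delta=\min|\lambda-\lambda(\hat\Lambda_2)|$. Then, writing $R_2=[r_2,r_3,\dots,r_k]$, we have
$$\sin\angle(x,\hat x)\le\frac{\|R\|_2}{\Delta}\sqrt{1+\left(\frac{\|R_2\|_2}{\delta}\right)^2}. \tag{3.2}$$
Note that clearly $\|R_2\|_2\le\|R\|_2$, so the result implies $\sin\angle(x,\hat x)\le\frac{\|R\|_2}{\Delta}\sqrt{1+\left(\frac{\|R\|_2}{\delta}\right)^2}\le\frac{\|R\|_2}{\Delta}\left(1+\frac{\|R\|_2}{\delta}\right)$.

Proof. Let $\tilde x=\begin{bmatrix}w\\y\\z\end{bmatrix}$ be an eigenvector of $\tilde A$ as in (3.1) such that $\tilde A\tilde x=\lambda\tilde x$, with $w\in\mathbb{C}$, $y\in\mathbb{C}^{k-1}$. Then since $\sin\angle(x,\hat x)=\left\|\begin{bmatrix}y\\z\end{bmatrix}\right\|$ from (2.2), the goal is to bound $\|y\|$ and $\|z\|$. The bottom part of $\tilde A\tilde x=\lambda\tilde x$ gives
$$(\lambda I_{n-k}-A_3)z=R\begin{bmatrix}w\\y\end{bmatrix},$$
from which we obtain
$$\|z\|\le\|(\lambda I_{n-k}-A_3)^{-1}\|_2\|R\|_2\left\|\begin{bmatrix}w\\y\end{bmatrix}\right\|\le\|(\lambda I_{n-k}-A_3)^{-1}\|_2\|R\|_2=\frac{\|R\|_2}{\Delta}. \tag{3.3}$$
Note that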
the denominator is $\Delta$, not $\delta$. We also note that the final bound is Davis-Kahan's generalized sin θ theorem, where subspaces of different sizes ($x$ and

$[\hat x_1,\dots,\hat x_k]$) are compared (and when the perturbation is off-diagonal); in fact, we can also obtain $\|z\|\le\frac{\|R\|_2}{\Delta}\left\|\begin{bmatrix}w\\y\end{bmatrix}\right\|$, which is the generalized tan θ theorem.

From the second block of $\tilde A\tilde x=\lambda\tilde x$ we have $(\lambda I_{k-1}-\hat\Lambda_2)y=R_2^*z$, and since $\|(\lambda I_{k-1}-\hat\Lambda_2)y\|\ge\delta\|y\|$, we obtain the important bound
$$\|y\|\le\frac{\|R_2\|_2}{\delta}\|z\|. \tag{3.4}$$
Combining with (3.3) we obtain
$$\|y\|\le\frac{\|R\|_2\|R_2\|_2}{\Delta\delta}. \tag{3.5}$$
Therefore, we conclude that
$$\sin^2\angle(x,\hat x)=\left\|\begin{bmatrix}y\\z\end{bmatrix}\right\|^2\le\left(\frac{\|R\|_2}{\Delta}\right)^2\left(1+\left(\frac{\|R_2\|_2}{\delta}\right)^2\right),$$
giving (3.2).

We make several remarks regarding the theorem.

Remark 3.1 (Qualitative behavior of bounds). Theorem 3.1 shows that if $\|R\|_2\lesssim\delta$, then $\angle(x,\hat x)\lesssim\|R\|_2/\Delta$, and if $\|R\|_2\gtrsim\delta$, then $\angle(x,\hat x)\lesssim\|R\|_2^2/(\delta\Delta)$. Note how Davis-Kahan's bound is insufficient to explain these. When $\|R\|_2\lesssim\delta$, we improve the bound (1.1) by a factor $\Delta/\delta$, which is typically $\gg1$. Moreover, when $\|R\|_2\gtrsim\delta$, classical results suggest $\hat x$ may have no accuracy at all. Nonetheless, Theorem 3.1 shows that there is still a nontrivial bound for $\sin\angle(x,\hat x)$ as long as $\|R\|_2\lesssim\sqrt{\delta\Delta}$ $(\gg\delta)$. These results are particularly relevant when only low-accuracy solutions are available, so that $\|R\|_2$ is much larger than working precision.

Remark 3.2 (Effect of finite precision arithmetic). Crucial in the above argument is that $\hat\Lambda_1$ has zero off-diagonal elements. In finite-precision arithmetic, the Rayleigh-Ritz process inevitably results in $\hat\Lambda_1$ in (3.1) with off-diagonal elements that are $O(u)$ instead of 0, due to roundoff errors (assuming for simplicity $\|A\|=O(1)$). It is therefore important to address how they affect the bounds. As mentioned in the introduction, classical perturbation theory shows that these $O(u)$ terms will perturb $\hat x$ by up to $O(u/\delta)$. Since the off-diagonal $O(u)$ elements in $\hat\Lambda_1$ indeed lie in the directions that perturb the eigenvector the most (we return to this in Section 4.2), to account for roundoff errors we need to add the term $O(u/\delta)$ to the bound (3.2). This remark becomes important especially when $\|R\|$ is small, so that $O(u/\delta)$ is not negligible relative to $\|R\|_2/\Delta$. In other words, the folklore that eigenvectors cannot be computed with
precision higher than $u/\delta$ is true; what we refute is the belief that the bound $\|R\|/\delta$ (or $\|r\|/\delta$) is sharp. Our result shows that when $\|R\|>\delta$ but $\|R\|<\Delta$, Rayleigh-Ritz computes eigenvectors of much higher accuracy than $\|R\|/\delta$.
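A worked instance of the regime $\delta<\|R\|_2<\Delta$ may help; the numbers $\Delta=1$, $\delta=10^{-3}$ and $\|R_2\|_2=\|R\|_2=10^{-2}$ are illustrative values chosen here, not taken from the paper's experiments.

```latex
\begin{aligned}
\frac{\|R\|_2}{\delta} &= \frac{10^{-2}}{10^{-3}} = 10
  &&\text{(Davis-Kahan type bound: vacuous)}\\
\frac{\|R\|_2}{\Delta}\sqrt{1+\Big(\frac{\|R_2\|_2}{\delta}\Big)^2}
  &= 10^{-2}\sqrt{1+10^{2}} = 10^{-2}\sqrt{101} \approx 0.1005
  &&\text{(bound (3.2))}\\
\frac{\|R\|_2^2}{\delta\Delta} &= \frac{10^{-4}}{10^{-3}} = 0.1
  &&\text{(asymptotic form for } \|R\|_2 \gtrsim \delta\text{)}\\
\sqrt{\delta\Delta} &= \sqrt{10^{-3}} \approx 0.0316 > \|R\|_2
  &&\text{(so (3.2) is nontrivial)}
\end{aligned}
```

So even though $\|R\|_2$ exceeds $\delta$ tenfold, (3.2) still certifies $\sin\angle(x,\hat x)\lesssim0.1$.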

Remark 3.3 (Different partitionings). We can obtain different bounds depending on where to partition; that is, we can invoke the bound (3.2) by taking the leading $\tilde k\times\tilde k$ block for some $\tilde k\le k$. Each choice of $\tilde k$ gives a different bound, since each gives different values of $\|R\|$ and $\Delta$ (along with $\delta$, though its dependence on $\tilde k$ is usually much less significant). If the computational cost is not a concern, one can compute all possible partitionings and take the smallest bound obtained. However, the bounds in Section 4 are often still better in practice.

Remark 3.4 (Proof via generalized Davis-Kahan and Saad). The result (3.2) can also be derived by combining (i) Saad's bound [18, Thm. 4.6], which bounds $\angle(\hat x_i,x_i)$ relative to $\angle(Q,x_i)$, the angle between the desired eigenvector and the trial subspace, and (ii) the generalized Davis-Kahan sin θ theorem [2], in which two subspaces of different dimensions are compared. Here we presented a first-principles derivation, as we use the same line of argument to derive improved and generalized bounds in the forthcoming sections. Also noteworthy is Knyazev's paper [10], which generalizes Saad's bound to subspaces. He also shows that Ritz vectors contain quadratically small components in eigenvectors approximated by the other Ritz vectors. This is essentially captured in (3.4), which indicates $\|y\|_2=O(\|R\|_2^2)$ (absorbing $\delta$ and $\Delta$ into the constant). We revisit this phenomenon for subspaces in Section 5.

While we will not repeat them, Remarks 3.2 and 3.3 are relevant throughout the paper.

3.1. Experiments. To illustrate Theorem 3.1, we conduct the following experiment; throughout, all experiments were carried out in MATLAB version R2017a using IEEE double precision arithmetic with unit roundoff $u=2^{-53}\approx1.1\times10^{-16}$. Let
$$A=\begin{bmatrix}\hat\Lambda_1&R^*\\R&A_3\end{bmatrix}\in\mathbb{R}^{n\times n},$$
where $n=10$ (the precise size of $n$ is insignificant), $\hat\Lambda_1=\mathrm{diag}(1+\delta,1)$ and $A_3=0$, so that $\Delta\approx1$. We take $R\in\mathbb{R}^{(n-2)\times2}$ to be randomly generated matrices using MATLAB's randn function, scaled so that $\|R\|_2$ is fixed to a value $10^{-i}$, for $i=0,\dots,15$. For
each $i$, we generate 100 such matrices $R$, and find the largest value of $\sin\angle(x,\hat x)$ from the 100 runs. These are shown as "observed" in Figure 3.1, along with (i) the classical bound $\|R\|_2/\delta$, (ii) the new bound (3.2), and (iii) the bound $u/\delta$, in view of Remark 3.2.

In view of Remark 3.2, $\sin\angle(x,\hat x)$ is bounded by the maximum of (3.2) and (a small multiple of) $u/\delta$. Of course, we always have the trivial bound $\sin\angle(x,\hat x)\le1$, so putting these together, we have the following bound in finite-precision arithmetic:
$$\sin\angle(x,\hat x)\lesssim\min\left\{1,\ \max\left(O\!\left(\frac{u}{\delta}\right),\ \frac{\|R\|_2}{\Delta}\sqrt{1+\left(\frac{\|R_2\|_2}{\delta}\right)^2}\right)\right\}. \tag{3.6}$$
We observe in Figure 3.1 that this is indeed the case, and the new bound (3.2) gives remarkably sharp bounds for the observed values of $\sin\angle(x,\hat x)$ (when it is not dominated by $u/\delta$, and gives a nontrivial bound $<1$). This is despite the fact that we are plotting the looser bound $\frac{\|R\|_2}{\Delta}\sqrt{1+\left(\frac{\|R\|_2}{\delta}\right)^2}$, with $\|R_2\|_2$ in (3.2) replaced by $\|R\|_2$, since using $\|R_2\|_2$ would make the bound depend on the particular random instance of $R$.
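A scaled-down NumPy sketch of this experiment follows (the paper used MATLAB; this is not the author's script). Assumptions beyond the text: the ordering $\hat\Lambda_1=\mathrm{diag}(1+\delta,1)$, the seed, the helper names, a reduced set of $\|R\|_2$ values, and evaluating (3.2) with the exact $\lambda$ of the selected eigenpair, i.e. $\Delta=|\lambda|$ (since $A_3=0$) and $\delta=|\lambda-1|$, as Theorem 3.1 requires, so that the inequality can be asserted per trial.

```python
import numpy as np

def sin_angle(u, v):
    """sin of the angle between unit vectors u and v (accurate for small angles)."""
    return np.linalg.norm(v - (u @ v) * u)

def one_trial(delta, norm_R, rng, n=10):
    """One random instance of the Section 3.1 test matrix (k = 2, A_3 = 0)."""
    k = 2
    R = rng.standard_normal((n - k, k))
    R *= norm_R / np.linalg.norm(R, 2)           # fix ||R||_2
    A = np.zeros((n, n))
    A[:k, :k] = np.diag([1.0 + delta, 1.0])      # assumed ordering of hat-Lambda_1
    A[k:, :k] = R
    A[:k, k:] = R.T
    lam, V = np.linalg.eigh(A)
    j = np.argmax(np.abs(V[0, :]))               # eigenpair best aligned with x_hat = e_1
    observed = sin_angle(np.eye(n)[:, 0], V[:, j])
    Delta = abs(lam[j])                          # min |lambda - lambda(A_3)| with A_3 = 0
    delta_exact = abs(lam[j] - 1.0)              # min |lambda - lambda(hat-Lambda_2)|
    nR, nR2 = np.linalg.norm(R, 2), np.linalg.norm(R[:, 1:], 2)
    bound32 = nR / Delta * np.sqrt(1 + (nR2 / delta_exact) ** 2)
    return observed, bound32, nR / delta         # last entry: Davis-Kahan type ||R||_2/delta

rng = np.random.default_rng(1)
delta = 1e-3
results = {norm_R: [one_trial(delta, norm_R, rng) for _ in range(100)]
           for norm_R in [1e-1, 1e-2, 1e-4, 1e-6]}
for norm_R, trials in results.items():
    worst = max(t[0] for t in trials)
    worst_bound = max(t[1] for t in trials)
    print(f"||R||={norm_R:.0e}  observed={worst:.2e}  (3.2)={worst_bound:.2e}  DK={trials[0][2]:.0e}")
```

For $\|R\|_2=10^{-2}$ the Davis-Kahan type quantity is 10, while the observed angles stay well below 1, which is the behavior Figure 3.1 reports.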

As discussed above, the new bound (3.2) has two asymptotic behaviors: $\|R\|_2/\Delta$ when $\|R\|_2\ll\delta$, and $\|R\|_2^2/(\delta\Delta)$ when $\|R\|_2\gg\delta$. This can be seen in the plots as the change of slope in the new bound around $\|R\|_2\approx\delta$. From the plots with $\delta=10^{-3}$ and $10^{-5}$, we see that this transition also reflects the observed values of $\sin\angle(x,\hat x)$ quite accurately. In all cases, the classical Davis-Kahan bound $\|R\|_2/\delta$ tends to be a severe overestimate (and $\|r\|_2/\delta$ as in (1.1) is not much different), and the new bound can provide nontrivial information (a bound smaller than 1) even when the Davis-Kahan bound is useless with $\|R\|_2/\delta>1$; the difference between Davis-Kahan and the new bound widens when $\delta$ is small.

[Figure 3.1: four panels plotting $\sin\angle(x,\hat x)$ against $\|R\|$, each with the curves "observed", "Davis-Kahan", "(3.2)" and "$u/\delta$".]
Figure 3.1. Illustration of our bound (3.2) (dashed red; with $\|R_2\|_2$ replaced by $\|R\|$), varying $\delta$ (upper-left: $\delta=10^{-1}$, upper-right: $\delta=10^{-3}$, lower-left: $\delta=10^{-5}$, lower-right: $\delta=\dots$). Observe how sharp (3.2) is, relative to the classical Davis-Kahan bound $\|R\|_2/\delta$. When $\|R\|_2/\Delta\lesssim u/\delta$, the bound in finite-precision arithmetic would be the maximum between the new bound and $u/\delta$ (dashed black, constant line); see (3.6).

4. Improved error bounds for Ritz vectors

The above experiments illustrate the sharpness of the bound (3.2) given the information $R=[r_1,\dots,r_k]$ and $\hat\lambda_i$, along with $\min(\mathrm{eig}(A_3))$. When applied in practice, however, we find that the bound (3.2) is usually a severe overestimate, as we illustrate in Section 4.1. The reason is that it does not distinguish $r_1$ from $r_k$

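(say), while typically we have $\|r_1\|\ll\|r_k\|$, reflecting the difference in speed with which each Ritz pair converges, the extremal ones typically converging first. As noted in Section 2.1, after R-R one also has information on the individual norms $\|r_i\|=\|A\hat x_i-\hat\lambda_i\hat x_i\|$. Here we derive bounds that are essentially sharp using all this information. We shall show that if the $\|r_i\|$ are sufficiently small, then $\sin\angle(x,\hat x)\lesssim\|r_1\|/\Delta_1$. This is usually a massive improvement over (3.2), and essentially sharp: we cannot improve the bound below $\|r_1\|/\Delta_1$. The argument is similar to that of Theorem 3.1 but with more elaborate manipulations. The strategy is the same: bound $\|y\|$ in terms of $\|z\|$, and use this to bound $\left\|\begin{bmatrix}y\\z\end{bmatrix}\right\|$.

Theorem 4.1. In the setting of Theorem 3.1, if $\Delta>\|R_2\|_2^2/\delta$, then
$$\sin\angle(x,\hat x)\le\frac{\|r_1\|}{\Delta-\|R_2\|_2^2/\delta}\sqrt{1+\left(\frac{\|R_2\|_2}{\delta}\right)^2}. \tag{4.1}$$
If $\Delta>\sum_{i=2}^k\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}$, then
$$\sin\angle(x,\hat x)\le\frac{\|r_1\|}{\Delta-\sum_{i=2}^k\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}}\sqrt{1+\left(\sum_{i=2}^k\frac{\|r_i\|}{|\lambda-\hat\lambda_i|}\right)^2}. \tag{4.2}$$

Proof. We first prove (4.1). The main idea is to improve the bound (3.3) on $\|z\|$. As before we have $(\lambda I_{k-1}-\hat\Lambda_2)y=R_2^*z$, so $\|y\|\le\frac{\|R_2\|_2}{\delta}\|z\|$. We also have
$$(\lambda I_{n-k}-A_3)z=[r_1\ R_2]\begin{bmatrix}w\\y\end{bmatrix}.$$
This gives $(\lambda I_{n-k}-A_3)z-R_2y=r_1w$, hence $\|(\lambda I_{n-k}-A_3)z\|-\|R_2\|_2\|y\|\le\|r_1w\|$. Using $\|y\|\le\frac{\|R_2\|_2}{\delta}\|z\|$ we obtain
$$\|(\lambda I_{n-k}-A_3)z\|-\frac{\|R_2\|_2^2}{\delta}\|z\|\le\|r_1w\|.$$
Noting that $\sigma_{\min}(\lambda I_{n-k}-A_3)=\Delta$, we have $\|(\lambda I_{n-k}-A_3)z\|\ge\Delta\|z\|$, hence
$$\left(\Delta-\frac{\|R_2\|_2^2}{\delta}\right)\|z\|\le\|r_1w\|.$$
Using the assumption $\Delta>\|R_2\|_2^2/\delta$ and the trivial bound $|w|\le1$ we obtain $\|z\|\le\frac{\|r_1\|}{\Delta-\|R_2\|_2^2/\delta}$. This together with $\|y\|\le\frac{\|R_2\|_2}{\delta}\|z\|$ yields
$$\sin\angle(x,\hat x)=\left\|\begin{bmatrix}y\\z\end{bmatrix}\right\|\le\frac{\|r_1\|}{\Delta-\|R_2\|_2^2/\delta}\sqrt{1+\left(\frac{\|R_2\|_2}{\delta}\right)^2},$$
giving (4.1).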

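The remaining task is to prove (4.2). The idea is to improve the bound (3.4) on $\|y\|$, or rather on its individual entries, using $(\lambda I_{k-1}-\hat\Lambda_2)y=R_2^*z$. Writing $y=[y_2,\dots,y_k]^T$, the $i$th ($i=2,\dots,k$) element gives $(\lambda-\hat\lambda_i)y_i=r_i^*z$, hence
$$|y_i|=\frac{|r_i^*z|}{|\lambda-\hat\lambda_i|}\le\frac{\|r_i\|\|z\|}{|\lambda-\hat\lambda_i|},\qquad i=2,\dots,k. \tag{4.3}$$
We also have
$$(\lambda I_{n-k}-A_3)z=[r_1,r_2,\dots,r_k]\begin{bmatrix}w\\y_2\\\vdots\\y_k\end{bmatrix}.$$
This gives $(\lambda I_{n-k}-A_3)z-\sum_{i=2}^kr_iy_i=r_1w$, and
$$\|(\lambda I_{n-k}-A_3)z\|-\sum_{i=2}^k\|r_i\||y_i|\le\|r_1w\|, \tag{4.4}$$
so using (4.3) we obtain
$$\|(\lambda I_{n-k}-A_3)z\|-\sum_{i=2}^k\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}\|z\|\le\|r_1w\|.$$
Again using $\|(\lambda I_{n-k}-A_3)z\|\ge\Delta\|z\|$, we therefore obtain
$$\left(\Delta-\sum_{i=2}^k\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}\right)\|z\|\le\|r_1w\|.$$
Hence, using the assumption $\Delta>\sum_{i=2}^k\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}$ and the trivial bound $|w|\le1$, we obtain
$$\|z\|\le\frac{\|r_1\|}{\Delta-\sum_{i=2}^k\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}}.$$
The fact $\sin\angle(x,\hat x)=\left\|\begin{bmatrix}y\\z\end{bmatrix}\right\|$, together with (4.3) (which gives $\|y\|\le\|z\|\sum_{i=2}^k\frac{\|r_i\|}{|\lambda-\hat\lambda_i|}$), completes the proof of (4.2).

Note that since the bounds $\|y\|\le\frac{\|R_2\|_2}{\delta}\|z\|$ and (4.3) are both valid, in both bounds (4.1) and (4.2) the term with the square root can be replaced with the minimum, that is,
$$\sqrt{1+\min\left\{\frac{\|R_2\|_2^2}{\delta^2},\ \left(\sum_{i=2}^k\frac{\|r_i\|}{|\lambda-\hat\lambda_i|}\right)^2\right\}}.$$
This applies also to the bounds to follow, but for brevity we do not repeat this remark.

We also note that the bounds (4.1) and (4.2) are not comparable. The bound (4.1) involves the small $\delta$, which (4.2) avoids to some extent by using the individual residuals $\|r_i\|$; however, the heavy use of triangle inequalities in the bound (4.4) suggests (4.2) can still be a significant overestimate. The sharpest bound one can obtain would be via directly bounding the norm $\|y\|=\|(\lambda I_{k-1}-\hat\Lambda_2)^{-1}R_2^*z\|$. Nonetheless, experiments suggest (4.2) is often a good bound, as we illustrate now.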
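A NumPy sketch of how (4.1) and (4.2) might be evaluated follows. To be able to assert the inequalities, it uses the exact $\lambda$, $\Delta$ and $\delta$ obtained from a full eigendecomposition of a small 1D Laplacian; in practice one would substitute estimates such as those of Section 2.1. The trial subspace (exact eigenvectors plus perturbations of graded size orthogonal to them) is a hypothetical construction that produces graded residuals; it is not the LOBPCG run of Section 4.1, and the helper names are our own.

```python
import numpy as np

def sin_angle(u, v):
    """sin of the angle between unit vectors u and v (accurate for small angles)."""
    return np.linalg.norm(v - (u @ v) * u)

def theorem41_bounds(lam, ritz_vals, R, Delta, delta):
    """Bounds (4.1), (4.2) for the Ritz pair in column 0; None where the assumption fails."""
    r = np.linalg.norm(R, axis=0)                 # ||r_1||, ..., ||r_k||
    nR2 = np.linalg.norm(R[:, 1:], 2)             # ||R_2||_2
    gaps = np.abs(lam - ritz_vals[1:])            # |lambda - hat-lambda_i|, i = 2..k
    b41 = b42 = None
    if Delta > nR2**2 / delta:
        b41 = r[0] / (Delta - nR2**2 / delta) * np.sqrt(1 + (nR2 / delta) ** 2)
    s = np.sum(r[1:] ** 2 / gaps)
    if Delta > s:
        b42 = r[0] / (Delta - s) * np.sqrt(1 + np.sum(r[1:] / gaps) ** 2)
    return b41, b42

n, k = 100, 10
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian tridiag(-1, 2, -1)
lam_all, V = np.linalg.eigh(A)

rng = np.random.default_rng(0)
F = rng.standard_normal((n, k))
F -= V[:, :k] @ (V[:, :k].T @ F)                  # perturbations orthogonal to V_k
F /= np.linalg.norm(F, axis=0)
eps = 1e-8 * 10.0 ** (np.arange(k) / 2)           # graded sizes: 1e-8, ..., about 3e-4
Q, _ = np.linalg.qr(V[:, :k] + F * eps)

H = Q.T @ A @ Q                                   # Rayleigh-Ritz
ritz_vals, Omega = np.linalg.eigh((H + H.T) / 2)
X_hat = Q @ Omega
Q_perp = np.linalg.qr(X_hat, mode="complete")[0][:, k:]
R = Q_perp.T @ A @ X_hat                          # columns r_1, ..., r_k of (2.1)
A3 = Q_perp.T @ A @ Q_perp

lam = lam_all[0]                                  # exact target eigenvalue
Delta = np.min(np.abs(lam - np.linalg.eigvalsh(A3)))
delta = np.min(np.abs(lam - ritz_vals[1:]))
b41, b42 = theorem41_bounds(lam, ritz_vals, R, Delta, delta)
observed = sin_angle(X_hat[:, 0], V[:, 0])
davis_kahan = np.linalg.norm(R[:, 0]) / np.min(np.abs(ritz_vals[0] - lam_all[1:]))
print(observed, b41, b42, davis_kahan)
```

With these graded residuals both assumptions of Theorem 4.1 hold, and (4.2) comes out smaller than the Davis-Kahan bound by roughly the factor $\delta/\Delta$.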

4.1. Experiments. We illustrate Theorem 4.1 with experiments more practical than those in Section 3.1. We let $A\in\mathbb{R}^{n\times n}$ be the classical tridiagonal matrix with 2 on the diagonal and $-1$ on the super- and sub-diagonals. This is a 1D Laplacian matrix, obtained by finite difference discretization. We then run the LOBPCG algorithm [11] to compute the smallest eigenpair with a random initial guess, working with a $k=50$-dimensional subspace. Figure 4.1 (left) shows the convergence of $\sin\angle(\hat x_1,x_1)$ along with four bounds: Davis-Kahan's $\sin\angle(\hat x_1,x_1)\le\|r_1\|/\delta$, (4.1), (4.2), and (3.2) from the previous section. Some data are missing for (4.1) and (4.2) in the early steps, as they violated the assumption $\Delta>\|R_2\|_2^2/\delta$ or $\Delta>\sum_{i=2}^k\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}$; note that these assumptions can be checked inexpensively. To estimate $\Delta$ and $\delta$ we used the available quantities $\Delta\approx\min|\hat\lambda_1-\hat\lambda_k|$ and $\delta\approx\min|\hat\lambda_1-\hat\lambda_2|$; the plots look nearly identical if the exact values are used.

[Figure 4.1: left panel plots "exact", "Davis-Kahan", (3.2), (4.1) and (4.2) against LOBPCG steps; right panel is a scatterplot of Ritz values against residual norms.]
Figure 4.1. Left: convergence of $\sin\angle(\hat x_1,x_1)$ (shown as "exact"), and its bounds (4.1), (4.2) and the Davis-Kahan bound (1.1). Right: scatterplot of $\hat\lambda_i$ vs. residuals $\|r_i\|=\|A\hat x_i-\hat\lambda_i\hat x_i\|$ for $i=1,2,\dots,k=50$, after 20 LOBPCG iterations. Note how the $\|r_i\|$ are graded: $\|r_i\|\lesssim\|r_j\|$ for $i<j$.

We make several observations. First, (4.2) gave sharp bounds for $\sin\angle(\hat x_1,x_1)$ when applicable. For example, after eight LOBPCG iterations, Davis-Kahan's sin θ theorem gives bounds $>1$, suggesting $\hat x_1$ may have no accuracy at all. Nonetheless, (4.2) correctly shows that it has an accuracy of at least $10^{-3}$. Second, the bound (3.2) is poor throughout, because it takes the entire residual matrix norm $\|R\|_2$ in the numerator, without respecting the fact that the residuals $\|r_i\|=\|A\hat x_i-\hat\lambda_i\hat x_i\|$ are typically graded and hence $\|r_1\|\ll\|R\|_2$, as illustrated in Figure 4.1 (right). Finally, the asymptotic behaviors of the bounds as $\|R\|\to0$ (many LOBPCG steps) are also in stark contrast. This is because, up to first order in $\|R\|$, (4.1) and (4.2) are $\|r_1\|/\Delta$, whereas Davis-Kahan involves the
smaller $\delta$: $\|r_1\|/\delta$.

4.2. Structured condition number. Here we interpret Theorem 4.1 from the standpoint of perturbation theory. Namely, we regard $R$ in (2.1) as a perturbation to the block diagonal matrix $\tilde A_0:=\mathrm{diag}(\hat\lambda_1,\dots,\hat\lambda_k,A_3)$, whose eigenvectors include $e_1,\dots,e_k$, the first $k$ canonical vectors. Examining $\angle(x_i,\hat x_i)$ is equivalent to examining how much the eigenvector $e_i$ of $\tilde A_0$ gets perturbed by $R$.
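The contrast between the unstructured condition number $1/\delta_1$ and the structured one $1/\Delta_1$ can be seen on a toy $\tilde A_0$. The sketch below uses illustrative values of our own choosing ($k=3$, Ritz values $0,10^{-3},2\cdot10^{-3}$, $A_3=\mathrm{diag}(1,2,3)$) and perturbs, by the same amount, either the (1,2)/(2,1) pair that exact R-R forces to zero or the $(1,k+1)/(k+1,1)$ pair that belongs to $r_1$, then measures how far the eigenvector near $e_1$ moves.

```python
import numpy as np

def eigvec_shift(d, i, j, eps):
    """sin of angle between e_1 and the eigenvector of diag(d) + eps*(E_ij + E_ji) closest to e_1."""
    A = np.diag(np.asarray(d, dtype=float))
    A[i, j] += eps
    A[j, i] += eps
    _, V = np.linalg.eigh(A)
    v = V[:, np.argmax(np.abs(V[0, :]))]      # eigenvector best aligned with e_1
    return np.linalg.norm(v[1:])              # equals sin(angle(e_1, v)) since ||v|| = 1

k = 3
d = [0.0, 1e-3, 2e-3, 1.0, 2.0, 3.0]          # hat-lambda_1..3, then eig(A_3)
delta1 = d[1] - d[0]                          # small delta_1
Delta1 = d[k] - d[0]                          # big Delta_1
eps = 1e-6
unstructured = eigvec_shift(d, 0, 1, eps)     # (1,2),(2,1) entries: zero in exact R-R
structured = eigvec_shift(d, 0, k, eps)       # (1,k+1),(k+1,1) entries: part of r_1
print(unstructured, eps / delta1)
print(structured, eps / Delta1)
```

The first perturbation moves the eigenvector by about $\epsilon/\delta_1$ and the second by only about $\epsilon/\Delta_1$, i.e. about a thousand times less for these values.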

In the opening we mentioned that the condition number of an eigenvector is $1/\delta_i$. That is, there exists a perturbation $E$ such that $\tilde A_0+E$ has an eigenvector $\hat e_i$ with
$$\angle(e_i,\hat e_i)=\frac{\|E\|}{\delta_i}+O(\|E\|^2). \tag{4.5}$$
Yet the two bounds in Theorem 4.1 show that, writing $R:=\tilde A-\tilde A_0$ (slightly and harmlessly abusing notation), we have
$$\angle(e_i,\hat e_i)\le\frac{\|r_i\|}{\Delta_i}+O(\|R\|^2). \tag{4.6}$$
Note the two changes, both potentially significant: first, $\delta$ is replaced by $\Delta$; second, the norm of the entire perturbation $E$ is replaced by $\|r_i\|$, the perturbation only in the $i$th column of $\tilde A_0$.

An explanation of this effect can be made via structured perturbation analysis. In R-R, the perturbation $R$ in (2.1) is highly structured in two ways: the nonzero pattern, and the grading of the $\|r_i\|$. For example, the perturbation $E$ that would perturb the eigenvector $e_1$ the most is in the (1,2) and (2,1) elements of $\tilde A_0$, as they connect the eigenvalues $\hat\lambda_1$ and $\hat\lambda_2$, resulting in the (unstructured) condition number $1/|\hat\lambda_1-\hat\lambda_2|=1/\delta_1$. However, these elements are forced to be zero by the R-R construction. Within the structured perturbations allowed in R-R, $e_1$ is perturbed most by the $(k+1,1)$ and $(1,k+1)$ elements, assuming for the moment that $A_3$ is diagonalized. These elements connect the eigenvalues $\hat\lambda_1$ and $\min(\lambda(A_3))\approx\lambda_{k+1}$, resulting in the structured condition number $1/\Delta_1$. Regarding the grading of the $\|r_i\|$, the $r_j$ ($j\ne i$) terms have no effect on $\hat e_i$ up to $O(\|r_j\|^2)$, making $r_i$ the only term that affects the leading term in (4.6).

5. Bounds for invariant subspaces

We now turn to bounding errors for invariant subspaces spanned by more than one eigenvector. Besides being the natural object in many applications, subspaces must sometimes be used instead of individual eigenvectors, namely when multiple or near-multiple eigenvalues are present. For example, if $\delta=O(u)$, none of the above bounds would be useful, as the $O(u/\delta)$ term in (3.6) due to roundoff errors is always present. Below we derive bounds that give useful information in
such cases.

We briefly recall the definition of angles between subspaces. The angles $\{\theta_i\}_{i=1}^{k_1}$ between two subspaces spanned by $X\in\mathbb{C}^{n\times k_1}$ and $Y\in\mathbb{C}^{n\times k_1}$ with orthonormal columns are defined by $\theta_i=\arccos(\sigma_i(X^*Y))$; they are known as the canonical angles or principal angles [5, Thm. 6.4.3]. Equivalently, the values $\sin\theta_i$ are the singular values of $X_\perp^*Y$ (as can be verified e.g. via the CS decomposition [5, Thm. 2.5.2]), which is what we use below (and used above for $k_1=1$ to obtain (2.2)).

To clarify the situation, rewrite (2.1) as
$$\tilde A:=[\hat X_1\ \hat X_2\ \hat X_3]^*A[\hat X_1\ \hat X_2\ \hat X_3]=\begin{bmatrix}\hat\Lambda_1&0&R_1^*\\0&\hat\Lambda_2&R_2^*\\R_1&R_2&A_3\end{bmatrix}, \tag{5.1}$$
where $[\hat X_1\ \hat X_2\ \hat X_3]$ is a unitary matrix, with $[\hat X_1\ \hat X_2]=Q\Omega$, $\hat X_1\in\mathbb{C}^{n\times k_1}$, $\hat X_2\in\mathbb{C}^{n\times(k-k_1)}$, and $\hat X_3\in\mathbb{C}^{n\times(n-k)}$. Our goal is to bound $\sin\angle(\hat X_1,X_1)$ from above,

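where $X_1\in\mathbb{C}^{n\times k_1}$ is a matrix of $k_1$ exact eigenvectors of $A$, i.e., $AX_1=X_1\Lambda_1$. Defining $\tilde X_1=[\hat X_1\ \hat X_2\ \hat X_3]^*X_1$, we have $\tilde A\tilde X_1=\tilde X_1\Lambda_1$, so the columns of $\tilde X_1$ are eigenvectors of $\tilde A$. With the partitioning $\tilde X_1=\begin{bmatrix}W\\Y\\Z\end{bmatrix}$, where $W\in\mathbb{C}^{k_1\times k_1}$, $Y\in\mathbb{C}^{(k-k_1)\times k_1}$ and $Z\in\mathbb{C}^{(n-k)\times k_1}$, it therefore follows that
$$\|\sin\angle(\hat X_1,X_1)\|=\|[\hat X_2\ \hat X_3]^*X_1\|=\left\|\begin{bmatrix}Y\\Z\end{bmatrix}\right\|.$$
This extends (2.2), and is a key identity in the forthcoming analysis.

Sometimes we deal with the angles between subspaces of different dimensions, say $[\hat X_1\ \hat X_2]\in\mathbb{C}^{n\times k}$ and $X_1\in\mathbb{C}^{n\times k_1}$ with $k_1\le k$. In this case the angles are defined via $\sin\theta_i=\sigma_i(X_1^*[\hat X_1\ \hat X_2]_\perp)$ for $i=1,\dots,k_1$.

Here is the extension of the previous bounds to invariant subspaces. Note that $\Delta$ and $\delta$ are redefined; we use the same notation as they reduce to the same values when $k_1=1$.

Theorem 5.1. Let $A,\tilde A$ be as in (5.1), with $(\hat\Lambda_1,\hat X_1)$ being $k_1$ Ritz pairs. Let $(\Lambda_1,X_1)$ be a set of $k_1$ exact eigenpairs, $AX_1=X_1\Lambda_1$. Let $\Delta=\min|\lambda(\Lambda_1)-\lambda(A_3)|$ and $\delta=\min|\lambda(\Lambda_1)-\lambda(\hat\Lambda_2)|$. Then, writing $R=[R_1\ R_2]:=[R_1\ r_{k_1+1},\dots,r_k]\in\mathbb{C}^{(n-k)\times k}$ where $R_1\in\mathbb{C}^{(n-k)\times k_1}$, we have
$$\|\sin\angle(X_1,\hat X_1)\|\le\frac{\|R\|}{\Delta}\left(1+\frac{\|R_2\|_2}{\delta}\right),\qquad\|\sin\angle(X_1,\hat X_1)\|_{2,F}\le\frac{\|R\|_{2,F}}{\Delta}\sqrt{1+\left(\frac{\|R_2\|_2}{\delta}\right)^2}. \tag{5.2}$$
Moreover, if $\Delta>\|R_2\|_2^2/\delta$, then
$$\|\sin\angle(X_1,\hat X_1)\|\le\frac{\|R_1\|}{\Delta-\|R_2\|_2^2/\delta}\left(1+\frac{\|R_2\|_2}{\delta}\right), \tag{5.3}$$
$$\|\sin\angle(X_1,\hat X_1)\|_{2,F}\le\frac{\|R_1\|_{2,F}}{\Delta-\|R_2\|_2^2/\delta}\sqrt{1+\left(\frac{\|R_2\|_2}{\delta}\right)^2}, \tag{5.4}$$
and if $\Delta>\sum_{i=k_1+1}^k\frac{\|r_i\|_2^2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}$, then
$$\|\sin\angle(X_1,\hat X_1)\|\le\frac{\|R_1\|}{\Delta-\sum_{i=k_1+1}^k\frac{\|r_i\|_2^2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}}\left(1+\sum_{i=k_1+1}^k\frac{\|r_i\|_2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}\right), \tag{5.5}$$
$$\|\sin\angle(X_1,\hat X_1)\|_{2,F}\le\frac{\|R_1\|_{2,F}}{\Delta-\sum_{i=k_1+1}^k\frac{\|r_i\|_2^2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}}\sqrt{1+\left(\sum_{i=k_1+1}^k\frac{\|r_i\|_2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}\right)^2}. \tag{5.6}$$

Proof. The proof mimics that of Theorem 4.1, extending the discussion from vectors to subspaces. Let $\tilde X_1=[\hat X_1,\hat X_2,\hat X_3]^*X_1=\begin{bmatrix}W\\Y\\Z\end{bmatrix}\in\mathbb{C}^{n\times k_1}$ be an invariant subspace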

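of $\tilde A$ such that
$$\tilde A\begin{bmatrix}W\\Y\\Z\end{bmatrix}=\begin{bmatrix}W\\Y\\Z\end{bmatrix}\Lambda_1. \tag{5.7}$$
Then the bottom part of the equation gives
$$Z\Lambda_1-A_3Z=[R_1\ R_2]\begin{bmatrix}W\\Y\end{bmatrix}=R\begin{bmatrix}W\\Y\end{bmatrix}. \tag{5.8}$$
Using a well-known bound for Sylvester equations (e.g. [14, Lem. 2], [19, Ch. V]), along with the fact $\min|\lambda(\Lambda_1)-\lambda(A_3)|=\Delta$, we obtain
$$\|Z\|\le\frac{1}{\Delta}\left\|R\begin{bmatrix}W\\Y\end{bmatrix}\right\|\le\frac{\|R\|}{\Delta}, \tag{5.9}$$
where for the last inequality we used the fact $\|XY\|\le\|X\|\|Y\|_2$ [7, Cor. 3.5.10] together with $\left\|\begin{bmatrix}W\\Y\end{bmatrix}\right\|_2\le1$. As in (3.3), this is the generalized Davis-Kahan sin θ theorem.

From the second block of (5.7) we have
$$Y\Lambda_1-\hat\Lambda_2Y=R_2^*Z, \tag{5.10}$$
hence, again from the Sylvester equation bound,
$$\|Y\|\le\frac{\|R_2\|_2}{\delta}\|Z\|. \tag{5.11}$$
Together with (5.9) we obtain
$$\|Y\|\le\frac{\|R\|\|R_2\|_2}{\Delta\delta}. \tag{5.12}$$
Therefore, we conclude that
$$\|\sin\angle(X_1,\hat X_1)\|=\left\|\begin{bmatrix}Y\\Z\end{bmatrix}\right\|\le\|Y\|+\|Z\|\le\frac{\|R\|}{\Delta}\left(1+\frac{\|R_2\|_2}{\delta}\right),$$
the first inequality in (5.2). For the spectral and Frobenius norms, using the stronger inequality
$$\left\|\begin{bmatrix}A\\B\end{bmatrix}\right\|_{2,F}\le\sqrt{\|A\|_{2,F}^2+\|B\|_{2,F}^2}, \tag{5.13}$$
we obtain the second inequality in (5.2).

We next prove (5.3). From (5.8) we obtain $\|Z\Lambda_1-A_3Z\|-\|R_2Y\|\le\|R_1W\|$, hence, using (5.11), we have
$$\|Z\Lambda_1-A_3Z\|-\frac{\|R_2\|_2^2}{\delta}\|Z\|\le\|R_1W\|.$$
Again using the Sylvester equation bound $\|Z\Lambda_1-A_3Z\|\ge\Delta\|Z\|$, we therefore obtain
$$\left(\Delta-\frac{\|R_2\|_2^2}{\delta}\right)\|Z\|\le\|R_1W\|.$$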

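Hence, using the assumption $\Delta>\|R_2\|_2^2/\delta$ and the trivial bound $\|W\|_2\le1$ along with $\|R_1W\|\le\|R_1\|\|W\|_2$, we obtain
$$\|Z\|\le\frac{\|R_1\|}{\Delta-\|R_2\|_2^2/\delta}.$$
Finally,
$$\|\sin\angle(X_1,\hat X_1)\|=\left\|\begin{bmatrix}Y\\Z\end{bmatrix}\right\|\le\frac{\|R_1\|}{\Delta-\|R_2\|_2^2/\delta}\left(1+\frac{\|R_2\|_2}{\delta}\right),$$
giving (5.3). The $\|\cdot\|_{2,F}$ version (5.4) is obtained as above using (5.13).

It remains to establish (5.5) and (5.6). Taking the $i$th row of (5.10) gives $y_i\Lambda_1-\hat\lambda_{k_1+i}y_i=r_{k_1+i}^*Z$ for $i=1,\dots,k-k_1$, where $y_i$ is the $i$th row of $Y$. Hence
$$\|y_i\|\le\frac{\|r_{k_1+i}\|\|Z\|}{\min|\lambda(\Lambda_1)-\hat\lambda_{k_1+i}|},\qquad i=1,\dots,k-k_1. \tag{5.14}$$
We also have
$$Z\Lambda_1-A_3Z=[R_1,r_{k_1+1},\dots,r_k]\begin{bmatrix}W\\y_1\\\vdots\\y_{k-k_1}\end{bmatrix},$$
that is,
$$Z\Lambda_1-A_3Z-[r_{k_1+1},\dots,r_k]\begin{bmatrix}y_1\\\vdots\\y_{k-k_1}\end{bmatrix}=R_1W, \tag{5.15}$$
so using (5.14) we obtain
$$\|Z\Lambda_1-A_3Z\|-\sum_{i=k_1+1}^k\frac{\|r_i\|_2^2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}\|Z\|\le\|R_1W\|.$$
Since $\|Z\Lambda_1-A_3Z\|\ge\Delta\|Z\|$ as before, this gives
$$\left(\Delta-\sum_{i=k_1+1}^k\frac{\|r_i\|_2^2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}\right)\|Z\|\le\|R_1W\|.$$
Hence, using the assumption $\Delta>\sum_{i=k_1+1}^k\frac{\|r_i\|_2^2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}$ and the trivial bound $\|W\|_2\le1$ along with $\|R_1W\|\le\|R_1\|\|W\|_2$, we obtain
$$\|Z\|\le\frac{\|R_1\|}{\Delta-\sum_{i=k_1+1}^k\frac{\|r_i\|_2^2}{\min|\lambda(\Lambda_1)-\hat\lambda_i|}}.$$
We use the fact $\|\sin\angle(X_1,\hat X_1)\|=\left\|\begin{bmatrix}Y\\Z\end{bmatrix}\right\|$ together with (5.14) to complete the proof of (5.5). Again, (5.6) follows immediately from (5.13).

Four remarks are in order.

Remark 5.1 (Vector vs. subspace bounds). The $\|\cdot\|_{2,F}$ bounds in Theorem 5.1 reduce to the vector bounds in the previous sections by taking $k_1=1$. Thus they can be regarded as proper generalizations.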
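The following NumPy sketch checks the Frobenius-norm bound (5.4) on a hypothetical example with a near-double eigenvalue (gap $10^{-10}$), where the single-vector bound (1.1) is vacuous but the two-dimensional invariant subspace is well determined. It also computes $\|\sin\angle(X_1,\hat X_1)\|_F$ two ways, from $\hat X_{1,\perp}^*X_1$ and from the cosines $\sigma_i(\hat X_1^*X_1)$; the former is preferable for small angles, because $1-\cos^2\theta$ loses relative accuracy. The spectrum, the graded trial subspace, the seed and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, k1 = 50, 4, 2
# Hypothetical spectrum: a near-double eigenvalue (gap 1e-10) that defeats vector bounds.
d = np.concatenate([[1.0, 1.0 + 1e-10, 1.5, 1.6], np.linspace(3.0, 10.0, n - 4)])
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * d) @ U.T                                 # A = U diag(d) U^T
X1 = U[:, :k1]                                    # exact invariant subspace for {1, 1 + 1e-10}

# Hypothetical trial subspace: exact eigenvectors plus graded orthogonal perturbations.
F = rng.standard_normal((n, k))
F -= U[:, :k] @ (U[:, :k].T @ F)
F /= np.linalg.norm(F, axis=0)
Q, _ = np.linalg.qr(U[:, :k] + F * np.array([1e-6, 1e-6, 1e-3, 1e-3]))

H = Q.T @ A @ Q                                   # Rayleigh-Ritz
ritz, Omega = np.linalg.eigh((H + H.T) / 2)
X_hat = Q @ Omega
X3 = np.linalg.qr(X_hat, mode="complete")[0][:, k:]   # hat-X_3 of (5.1)
R = X3.T @ A @ X_hat                              # R = [R_1, R_2] of (5.1)
R1, R2 = R[:, :k1], R[:, k1:]
A3 = X3.T @ A @ X3

lam1 = d[:k1]                                     # lambda(Lambda_1)
Delta = np.min(np.abs(lam1[:, None] - np.linalg.eigvalsh(A3)[None, :]))
delta = np.min(np.abs(lam1[:, None] - ritz[None, k1:]))
nR2 = np.linalg.norm(R2, 2)
bound54 = (np.linalg.norm(R1, "fro") / (Delta - nR2**2 / delta)
           * np.sqrt(1 + (nR2 / delta) ** 2))

# ||sin angle(X1, hat X1)||_F two ways: via the complement of hat X_1, and via cosines.
X1_hat_perp = np.linalg.qr(X_hat[:, :k1], mode="complete")[0][:, k1:]
sin_F = np.linalg.norm(X1_hat_perp.T @ X1, "fro")
cosines = np.linalg.svd(X_hat[:, :k1].T @ X1, compute_uv=False)
sin_F_from_cos = np.sqrt(np.sum(1 - cosines**2))

# Single-vector Davis-Kahan bound (1.1) for hat x_1, taking delta = |hat lambda_1 - lambda_2|.
dk_vector = np.linalg.norm(R[:, 0]) / abs(ritz[0] - d[1])
print(sin_F, sin_F_from_cos, bound54, dk_vector)
```

Here the vector bound exceeds 1 by orders of magnitude, while (5.4) bounds the subspace error at the level of $\|R_1\|_F/\Delta$.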

Remark 5.2 (Question 10.3 by Davis-Kahan). At the end of their landmark paper, Davis and Kahan [2] suggest four open problems. Among them, Question 10.2 asks for an extension of their theorems to the case where $\mathbb{C}^n$ is split into three (instead of two) subspaces in two ways: $\mathcal X_1,\mathcal X_2,\mathcal X_3$ (exact eigenspaces) and $\hat{\mathcal X}_1,\hat{\mathcal X}_2,\hat{\mathcal X}_3$ (approximate ones). Namely, using information such as the Ritz values and residuals, can one bound the subspace angles? We argue that the above results give an answer: the setting in (5.1) is precisely of this form, and Theorem 5.1 gives sharp bounds for $\|\sin\angle(X_1,\hat X_1)\|$.

Remark 5.3 (On the definition of $\delta$). An astute reader might have noticed an inconsistency in the definition of $\delta$ in (1.1) and in our theorems. For example, (5.2) reduces to $\|r_1\|/\delta$ when $k_1=k=1$ (hence $R_2$ is empty), but in this case $\delta$ is the difference between $\lambda$ (an exact desired eigenvalue) and $\lambda(A_3)$ (approximations to undesired eigenvalues). By contrast, $\delta$ in (1.1) is the distance between the approximate desired eigenvalue $\hat\lambda$ and the exact undesired eigenvalues. This clash of notation was left intentionally; indeed, we can obtain (1.1) from Theorem 5.1. Note that $\sin\angle(x,\hat x)=\|\sin\angle(X_\perp,\hat X_\perp)\|_2$ (where $[x,\ X_\perp]$ and $[\hat x,\ \hat X_\perp]$ are orthogonal), and invoke (5.2) taking $X_1\leftarrow X_\perp$, $\hat X_1\leftarrow\hat X_\perp$, and $\hat\Lambda_2$ empty. Then $\delta$ in (5.2) becomes $\delta$ in (1.1), and we precisely recover (1.1). We have deferred this discussion until now because it requires the subspace bound (5.2), and the inconsistency is, after all, harmless, as just described.

Remark 5.4 (Proof techniques). The reader might also have noticed that the proofs above are basically repeated applications of well-known norm inequalities in matrix analysis. One might then wonder: why do they appear to give stronger results than previous ones?
The answer appears to lie in (2.1), the simple but crucial unitary transformation from $A$ to $\tilde A$ that reduces the task to bounding $Y$ and $Z$, as in (2.2). By contrast, most classical results work with $A$, start from the residual equation $A\hat X_1-\hat X_1\hat\Lambda_1=R$, and derive bounds on $\angle(X_1,\hat X_1)$; for example, the Davis-Kahan $\sin\theta$ theorem can be obtained essentially by left-multiplying by $X_3^T$, taking $X_2$ empty. Knyazev [10, Sec. 4] employed ingenious techniques to obtain (among others) essentially (5.12). Obtaining sharper bounds like (5.15) in a similar manner appears to be highly challenging. Once we reformulate the problem as in (2.1) and (2.2), the derivation becomes significantly simpler (in the author's opinion).

6. SVD

We now present an SVD analogue of Theorem 5.1, deriving bounds for the accuracy of singular vectors and singular subspaces obtained by a Petrov-Galerkin projection method. Such methods proceed as follows: project $A$ onto lower-dimensional trial subspaces spanned by $\hat U\in\mathbb C^{m\times k_m}$ and $\hat V\in\mathbb C^{n\times k_n}$ having orthonormal columns (for how to choose $\hat U,\hat V$, see e.g. [1, 6, 21]), compute the SVD of the small $k_m\times k_n$ matrix $\hat U^*A\hat V=\tilde U\hat\Sigma\tilde V^*$, and obtain an approximate economical SVD $A\approx(\hat U\tilde U)\hat\Sigma(\hat V\tilde V)^*$, which is of rank $\min(k_m,k_n)$. Some of the columns of $\hat U\tilde U$ and $\hat V\tilde V$ then approximate the exact left and right singular vectors of $A$. Our goal is to quantify their accuracy. We focus on the most frequently encountered case where an approximate SVD is sought, that is, the leading singular vectors are being approximated.
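The projection procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration, assuming real arithmetic, a synthetic test matrix with singular values $0.7^{i}$, and trial subspaces from one step of block power iteration (only one of many possible choices of $\hat U,\hat V$); `projected_svd` and `sin_angle` are hypothetical helper names, not part of any library.

```python
import numpy as np

def projected_svd(A, U_hat, V_hat):
    """Petrov-Galerkin projection SVD: A ≈ (U_hat @ Ut) diag(sig) (V_hat @ Vt)^*."""
    B = U_hat.conj().T @ A @ V_hat                 # small k_m x k_n projected matrix
    Ut, sig, Vt_h = np.linalg.svd(B, full_matrices=False)
    return U_hat @ Ut, sig, V_hat @ Vt_h.conj().T

def sin_angle(X, Y):
    """Largest sine of the canonical angles between spans of orthonormal X and Y."""
    return np.linalg.norm(Y - X @ (X.conj().T @ Y), 2)

rng = np.random.default_rng(1)
m, n, k = 300, 200, 10
U, _ = np.linalg.qr(rng.standard_normal((m, n)))   # exact left singular vectors
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # exact right singular vectors
sigma = 0.7 ** np.arange(n)                        # geometrically decaying singular values
A = (U * sigma) @ V.T

# Trial subspaces from one block power step (an illustrative choice).
V_hat, _ = np.linalg.qr(A.T @ (A @ rng.standard_normal((n, k))))
U_hat, _ = np.linalg.qr(A @ V_hat)
Uk, sk, Vk = projected_svd(A, U_hat, V_hat)
print(sk[:3], sigma[:3])
print(sin_angle(U[:, :1], Uk[:, :1]), sin_angle(V[:, :1], Vk[:, :1]))
```

Since $\hat U^*A\hat V$ is a compression of $A$, interlacing guarantees that each computed $\hat\sigma_i$ is at most $\sigma_i(A)$.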

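Theorem 6.1. Let $A\in\mathbb C^{m\times n}$ with $m\ge n$, of the form
$$[\hat U_1\ \hat U_2\ \hat U_3]^*A[\hat V_1\ \hat V_2\ \hat V_3]=\begin{bmatrix}\hat\Sigma_1&0&R_1\\ 0&\hat\Sigma_2&R_2\\ S_1&S_2&A_3\end{bmatrix}=:\tilde A, \tag{6.1}$$
where $[\hat U_1\ \hat U_2\ \hat U_3]$ and $[\hat V_1\ \hat V_2\ \hat V_3]$ are square unitary, and $\hat\Sigma_1\in\mathbb R^{k_1\times k_1}$, $\hat\Sigma_2\in\mathbb R^{(k_m-k_1)\times(k_n-k_1)}$, with $\begin{bmatrix}\hat\Sigma_1&0\\ 0&\hat\Sigma_2\end{bmatrix}$ equal to $\begin{bmatrix}\mathrm{diag}(\hat\sigma_1,\hat\sigma_2,\dots,\hat\sigma_k)\\ 0\end{bmatrix}$ if $k_m\ge k_n=k$, and to $\begin{bmatrix}\mathrm{diag}(\hat\sigma_1,\hat\sigma_2,\dots,\hat\sigma_k)&0_{k\times(k_n-k)}\end{bmatrix}$ if $k=k_m<k_n$. Let $(\Sigma_1,U_1,V_1)$ be the set of $k_1$ leading singular triplets of $A$. Define $\delta=\sigma_{\min}(\Sigma_1)-\|A_3\|_2$ and $\eta=\sigma_{\min}(\Sigma_1)-\|\hat\Sigma_2\|_2$, and suppose that $\delta,\eta>0$. Write $[S_1\ S_2]=S$, $\begin{bmatrix}R_1\\ R_2\end{bmatrix}=R$, and for brevity define $\Theta:=\max(\|\sin\angle(U_1,\hat U_1)\|,\|\sin\angle(V_1,\hat V_1)\|)$. Then we have
$$\Theta\le\frac{\max(\|R\|,\|S\|)}{\delta}\left(1+\frac{\max(\|R_2\|_2,\|S_2\|_2)}{\eta}\right). \tag{6.2}$$
Moreover, provided that $\delta>\max(\|S_2\|_2,\|R_2\|_2)^2/\eta$, we have
$$\Theta\le\frac{\max(\|S_1\|,\|R_1\|)}{\delta-\max(\|S_2\|_2,\|R_2\|_2)^2/\eta}\left(1+\frac{\max(\|R_2\|_2,\|S_2\|_2)}{\eta}\right). \tag{6.3}$$
Finally, define $\tilde k:=\max(k_m-k_1,k_n-k_1)$, and denote by $r_{2i}^T$ the $i$th row of $R_2$ and by $s_{2i}$ the $i$th column of $S_2$ (setting $r_{2i}=0$ for $i>k_m-k_1$ and $s_{2i}=0$ for $i>k_n-k_1$, and $\hat\sigma_{k_1+i}=0$ for $i>\min(k_m,k_n)-k_1$). If $\delta>\sum_{i=1}^{\tilde k}\frac{\max(\|r_{2i}\|,\|s_{2i}\|)^2}{\sigma_{\min}(\Sigma_1)-\hat\sigma_{k_1+i}}$, then
$$\Theta\le\frac{\max(\|S_1\|,\|R_1\|)}{\delta-\sum_{i=1}^{\tilde k}\frac{\max(\|r_{2i}\|,\|s_{2i}\|)^2}{\sigma_{\min}(\Sigma_1)-\hat\sigma_{k_1+i}}}\left(1+\sum_{i=1}^{\tilde k}\frac{\max(\|r_{2i}\|,\|s_{2i}\|)}{\sigma_{\min}(\Sigma_1)-\hat\sigma_{k_1+i}}\right). \tag{6.4}$$

Though not displayed for brevity, slightly improved bounds for $\|\cdot\|_{2,F}$ analogous to those in Theorem 5.1 are available for each bound above. The derivation is again the same, using the inequality $\left\|\begin{bmatrix}A\\ B\end{bmatrix}\right\|_{2,F}\le\sqrt{\|A\|_{2,F}^2+\|B\|_{2,F}^2}$.

Proof. Let $(\Sigma_1,\tilde U_1,\tilde V_1)$ be a set of exact singular triplets of $\tilde A$, i.e., $\tilde A\tilde V_1=\tilde U_1\Sigma_1$ and $\tilde U_1^*\tilde A=\Sigma_1\tilde V_1^*$. Write $\tilde V_1=\begin{bmatrix}\tilde V_{11}\\ \tilde V_{21}\\ \tilde V_{31}\end{bmatrix}$ and $\tilde U_1=\begin{bmatrix}\tilde U_{11}\\ \tilde U_{21}\\ \tilde U_{31}\end{bmatrix}$, so that
$$\begin{bmatrix}\hat\Sigma_1&0&R_1\\ 0&\hat\Sigma_2&R_2\\ S_1&S_2&A_3\end{bmatrix}\begin{bmatrix}\tilde V_{11}\\ \tilde V_{21}\\ \tilde V_{31}\end{bmatrix}=\begin{bmatrix}\tilde U_{11}\\ \tilde U_{21}\\ \tilde U_{31}\end{bmatrix}\Sigma_1, \tag{6.5}$$
and
$$[\tilde U_{11}^*\ \tilde U_{21}^*\ \tilde U_{31}^*]\begin{bmatrix}\hat\Sigma_1&0&R_1\\ 0&\hat\Sigma_2&R_2\\ S_1&S_2&A_3\end{bmatrix}=\Sigma_1[\tilde V_{11}^*\ \tilde V_{21}^*\ \tilde V_{31}^*]. \tag{6.6}$$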

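As in the previous sections, we have the crucial identities
$$\|\sin\angle(U_1,\hat U_1)\|=\left\|\begin{bmatrix}\tilde U_{21}\\ \tilde U_{31}\end{bmatrix}\right\|,\qquad\|\sin\angle(V_1,\hat V_1)\|=\left\|\begin{bmatrix}\tilde V_{21}\\ \tilde V_{31}\end{bmatrix}\right\|. \tag{6.7}$$
To prove the theorem we first bound $\tilde U_{21}$ with respect to $\tilde U_{31}$, and similarly bound $\tilde V_{21}$ with respect to $\tilde V_{31}$. From the second block of (6.5) we obtain
$$\hat\Sigma_2\tilde V_{21}+R_2\tilde V_{31}=\tilde U_{21}\Sigma_1, \tag{6.8}$$
and the second block of (6.6) gives
$$\tilde U_{21}^*\hat\Sigma_2+\tilde U_{31}^*S_2=\Sigma_1\tilde V_{21}^*. \tag{6.9}$$
Taking norms and using the triangle inequality and the fact $\sigma_{\min}(X)\|Y\|\le\|XY\|\le\|X\|_2\|Y\|$ (the lower bound holds if $X\in\mathbb C^{m\times n}$, $m\ge n$) in (6.8) and (6.9), we obtain
$$\|\tilde U_{21}\|\,\sigma_{\min}(\Sigma_1)\le\|\tilde V_{21}\|\,\|\hat\Sigma_2\|_2+\|R_2\tilde V_{31}\|,\qquad\|\tilde V_{21}\|\,\sigma_{\min}(\Sigma_1)\le\|\tilde U_{21}\|\,\|\hat\Sigma_2\|_2+\|\tilde U_{31}^*S_2\|. \tag{6.10}$$
By adding the first inequality times $\sigma_{\min}(\Sigma_1)$ and the second inequality times $\|\hat\Sigma_2\|_2$, we eliminate the $\|\tilde V_{21}\|$ term, and recalling the assumption $\sigma_{\min}(\Sigma_1)>\|\hat\Sigma_2\|_2$ we obtain
$$\|\tilde U_{21}\|\le\frac{\sigma_{\min}(\Sigma_1)\|R_2\tilde V_{31}\|+\|\hat\Sigma_2\|_2\|\tilde U_{31}^*S_2\|}{\sigma_{\min}(\Sigma_1)^2-\|\hat\Sigma_2\|_2^2}.$$
Eliminating $\|\tilde U_{21}\|$ from (6.10) similarly yields
$$\|\tilde V_{21}\|\le\frac{\sigma_{\min}(\Sigma_1)\|\tilde U_{31}^*S_2\|+\|\hat\Sigma_2\|_2\|R_2\tilde V_{31}\|}{\sigma_{\min}(\Sigma_1)^2-\|\hat\Sigma_2\|_2^2}.$$
Combining these two inequalities we obtain
$$\max(\|\tilde U_{21}\|,\|\tilde V_{21}\|)\le\frac{\max(\|\tilde U_{31}^*S_2\|,\|R_2\tilde V_{31}\|)}{\sigma_{\min}(\Sigma_1)-\|\hat\Sigma_2\|_2}\le\frac{\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\max(\|R_2\|_2,\|S_2\|_2)}{\sigma_{\min}(\Sigma_1)-\|\hat\Sigma_2\|_2}=\frac{\max(\|R_2\|_2,\|S_2\|_2)}{\eta}\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|). \tag{6.11}$$
Together with (6.7) it follows that
$$\max(\|\sin\angle(U_1,\hat U_1)\|,\|\sin\angle(V_1,\hat V_1)\|)=\max\left(\left\|\begin{bmatrix}\tilde U_{21}\\ \tilde U_{31}\end{bmatrix}\right\|,\left\|\begin{bmatrix}\tilde V_{21}\\ \tilde V_{31}\end{bmatrix}\right\|\right)\le\max(\|\tilde U_{21}\|+\|\tilde U_{31}\|,\ \|\tilde V_{21}\|+\|\tilde V_{31}\|)\le\left(1+\frac{\max(\|R_2\|_2,\|S_2\|_2)}{\eta}\right)\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|). \tag{6.12}$$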

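The remaining task is to bound $\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)$. The bottom block of (6.5) gives
$$S_1\tilde V_{11}+S_2\tilde V_{21}+A_3\tilde V_{31}=\tilde U_{31}\Sigma_1. \tag{6.13}$$
Hence, recalling that $[S_1\ S_2]=S$, we have
$$\sigma_{\min}(\Sigma_1)\|\tilde U_{31}\|\le\|S\|+\|A_3\|_2\|\tilde V_{31}\|. \tag{6.14}$$
Similarly, from the last block of (6.6),
$$\tilde U_{11}^*R_1+\tilde U_{21}^*R_2+\tilde U_{31}^*A_3=\Sigma_1\tilde V_{31}^*, \tag{6.15}$$
we obtain
$$\sigma_{\min}(\Sigma_1)\|\tilde V_{31}\|\le\|R\|+\|A_3\|_2\|\tilde U_{31}\|. \tag{6.16}$$
We multiply (6.14) by $\sigma_{\min}(\Sigma_1)$ and (6.16) by $\|A_3\|_2$, and add them to eliminate the $\|\tilde V_{31}\|$ terms, to obtain
$$(\sigma_{\min}(\Sigma_1)^2-\|A_3\|_2^2)\|\tilde U_{31}\|\le\|A_3\|_2\|R\|+\sigma_{\min}(\Sigma_1)\|S\|\le(\sigma_{\min}(\Sigma_1)+\|A_3\|_2)\max(\|R\|,\|S\|).$$
Hence, by the assumption $\delta=\sigma_{\min}(\Sigma_1)-\|A_3\|_2>0$, we have $\|\tilde U_{31}\|\le\frac{\max(\|R\|,\|S\|)}{\delta}$. Eliminating the $\|\tilde U_{31}\|$ terms from (6.14) and (6.16) yields the same bound for $\|\tilde V_{31}\|$, hence $\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\le\frac{\max(\|R\|,\|S\|)}{\delta}$. Combine this with (6.11) and (6.7) to obtain (6.2).

We next prove (6.3). From (6.13) we also obtain
$$\sigma_{\min}(\Sigma_1)\|\tilde U_{31}\|\le\|S_1\|+\|S_2\|_2\|\tilde V_{21}\|+\|A_3\|_2\|\tilde V_{31}\|, \tag{6.17}$$
and from (6.15),
$$\sigma_{\min}(\Sigma_1)\|\tilde V_{31}\|\le\|R_1\|+\|\tilde U_{21}\|\,\|R_2\|_2+\|\tilde U_{31}\|\,\|A_3\|_2. \tag{6.18}$$
Again eliminate the $\|\tilde V_{31}\|$ terms by multiplying (6.17) by $\sigma_{\min}(\Sigma_1)$ and (6.18) by $\|A_3\|_2$, and adding them:
$$(\sigma_{\min}(\Sigma_1)^2-\|A_3\|_2^2)\|\tilde U_{31}\|\le\sigma_{\min}(\Sigma_1)(\|S_1\|+\|S_2\|_2\|\tilde V_{21}\|)+\|A_3\|_2(\|R_1\|+\|R_2\|_2\|\tilde U_{21}\|)$$
$$\le(\sigma_{\min}(\Sigma_1)+\|A_3\|_2)\bigl(\max(\|S_1\|,\|R_1\|)+\max(\|S_2\|_2,\|R_2\|_2)\max(\|\tilde U_{21}\|,\|\tilde V_{21}\|)\bigr).$$
Therefore, using (6.11) we obtain
$$\|\tilde U_{31}\|\le\frac{\max(\|S_1\|,\|R_1\|)+\max(\|S_2\|_2,\|R_2\|_2)\max(\|\tilde U_{21}\|,\|\tilde V_{21}\|)}{\sigma_{\min}(\Sigma_1)-\|A_3\|_2}\le\frac{1}{\delta}\left(\max(\|S_1\|,\|R_1\|)+\frac{\max(\|S_2\|_2,\|R_2\|_2)^2}{\eta}\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\right).$$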

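As before, eliminating $\|\tilde U_{31}\|$ from (6.17) and (6.18) yields the same bound for $\|\tilde V_{31}\|$, hence
$$\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\le\frac{1}{\delta}\left(\max(\|S_1\|,\|R_1\|)+\frac{\max(\|S_2\|_2,\|R_2\|_2)^2}{\eta}\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\right).$$
Therefore, using the assumption $\delta>\max(\|S_2\|_2,\|R_2\|_2)^2/\eta$, we obtain
$$\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\le\frac{\max(\|S_1\|,\|R_1\|)}{\delta-\max(\|S_2\|_2,\|R_2\|_2)^2/\eta}.$$
Together with (6.12) we obtain (6.3).

The remaining task is to establish (6.4). For this, we revisit (6.8) and (6.9), and now bound the individual $\tilde U_{21i}$, the $i$th row of $\tilde U_{21}$ (and likewise $\tilde V_{21i}$, the $i$th row of $\tilde V_{21}$). We obtain
$$\|\tilde U_{21i}\|\,\sigma_{\min}(\Sigma_1)\le\|\tilde V_{21i}\|\,\hat\sigma_{k_1+i}+\|r_{2i}^T\tilde V_{31}\|,\qquad\|\tilde V_{21i}\|\,\sigma_{\min}(\Sigma_1)\le\|\tilde U_{21i}\|\,\hat\sigma_{k_1+i}+\|\tilde U_{31}^*s_{2i}\|.$$
Eliminating $\|\tilde V_{21i}\|$ and $\|\tilde U_{21i}\|$ as before gives
$$\|\tilde U_{21i}\|\le\frac{\sigma_{\min}(\Sigma_1)\|r_{2i}^T\tilde V_{31}\|+\hat\sigma_{k_1+i}\|\tilde U_{31}^*s_{2i}\|}{\sigma_{\min}(\Sigma_1)^2-\hat\sigma_{k_1+i}^2},\qquad\|\tilde V_{21i}\|\le\frac{\sigma_{\min}(\Sigma_1)\|\tilde U_{31}^*s_{2i}\|+\hat\sigma_{k_1+i}\|r_{2i}^T\tilde V_{31}\|}{\sigma_{\min}(\Sigma_1)^2-\hat\sigma_{k_1+i}^2}.$$
Note that when $k_m\ne k_n$, $(\tilde U_{21i},r_{2i})$ or $(\tilde V_{21i},s_{2i})$ is empty for large $i$; by taking $\hat\sigma_{k_1+i}=0$ for such $i$ the argument carries over. We therefore have for every $i$
$$\max(\|\tilde U_{21i}\|,\|\tilde V_{21i}\|)\le\frac{\max(\|s_{2i}\|,\|r_{2i}\|)\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)}{\sigma_{\min}(\Sigma_1)-\hat\sigma_{k_1+i}}. \tag{6.19}$$
From (6.13) we also obtain
$$\sigma_{\min}(\Sigma_1)\|\tilde U_{31}\|\le\|A_3\|_2\|\tilde V_{31}\|+\|S_1\|+\sum_{i=1}^{k_n-k_1}\|s_{2i}\|\,\|\tilde V_{21i}\|, \tag{6.20}$$
and from (6.15),
$$\sigma_{\min}(\Sigma_1)\|\tilde V_{31}\|\le\|\tilde U_{31}\|\,\|A_3\|_2+\|R_1\|+\sum_{i=1}^{k_m-k_1}\|\tilde U_{21i}\|\,\|r_{2i}\|. \tag{6.21}$$
Hence eliminating $\|\tilde U_{31}\|$ and then $\|\tilde V_{31}\|$, and using (6.19), gives
$$\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\le\frac{1}{\delta}\left(\max(\|S_1\|,\|R_1\|)+\sum_{i=1}^{\tilde k}\frac{\max(\|r_{2i}\|,\|s_{2i}\|)^2}{\sigma_{\min}(\Sigma_1)-\hat\sigma_{k_1+i}}\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\right).$$
Thus, by the assumption $\delta>\sum_{i=1}^{\tilde k}\frac{\max(\|r_{2i}\|,\|s_{2i}\|)^2}{\sigma_{\min}(\Sigma_1)-\hat\sigma_{k_1+i}}$, we have
$$\max(\|\tilde U_{31}\|,\|\tilde V_{31}\|)\le\frac{\max(\|S_1\|,\|R_1\|)}{\delta-\sum_{i=1}^{\tilde k}\frac{\max(\|r_{2i}\|,\|s_{2i}\|)^2}{\sigma_{\min}(\Sigma_1)-\hat\sigma_{k_1+i}}}.$$
Finally, the bound (6.4) follows from combining this with (6.19) and (6.7).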
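As a sanity check of (6.2), the sketch below builds the block form (6.1) explicitly and evaluates both sides. It is a minimal illustration under stated assumptions: real data, $k_m=k_n=k$, $k_1=1$, the spectral norm, and two-sided trial subspaces obtained by perturbing the exact leading singular vectors with noise of size `eps`, so that both $R$ and $S$ are nonzero. The test matrix and the names `orth`, `sin_angle` and `norm2` are illustrative, not from the paper.

```python
import numpy as np

def orth(X):
    """Orthonormal basis for the column space of X (reduced QR)."""
    return np.linalg.qr(X)[0]

def sin_angle(X, Y):
    """Largest sine of the canonical angles between spans of orthonormal X and Y."""
    return np.linalg.norm(Y - X @ (X.T @ Y), 2)

def norm2(M):
    return np.linalg.norm(M, 2)

rng = np.random.default_rng(2)
m, n, k, eps = 300, 200, 10, 1e-3
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.7 ** np.arange(n)
A = (U * sigma) @ V.T

# Two-sided trial subspaces: perturbed leading singular subspaces (illustrative).
U_trial = orth(U[:, :k] + eps * rng.standard_normal((m, k)))
V_trial = orth(V[:, :k] + eps * rng.standard_normal((n, k)))
Ut, s_hat, Vt_h = np.linalg.svd(U_trial.T @ A @ V_trial)
U12, V12 = U_trial @ Ut, V_trial @ Vt_h.T             # [U1_hat U2_hat], [V1_hat V2_hat]
U3 = np.linalg.qr(U12, mode="complete")[0][:, k:]     # orthonormal complements
V3 = np.linalg.qr(V12, mode="complete")[0][:, k:]

R = U12.T @ A @ V3        # stacked [R_1; R_2], shape k x (n - k)
S = U3.T @ A @ V12        # [S_1 S_2], shape (m - k) x k
A3 = U3.T @ A @ V3

sigma1 = sigma[0]                      # k_1 = 1: the leading exact triplet
delta = sigma1 - norm2(A3)
eta = sigma1 - s_hat[1]                # ||Sigma_2_hat||_2 is the second Ritz singular value
Theta = max(sin_angle(U[:, :1], U12[:, :1]), sin_angle(V[:, :1], V12[:, :1]))
bound_62 = max(norm2(R), norm2(S)) / delta * (1 + max(norm2(R[1:]), norm2(S[:, 1:])) / eta)
print(f"Theta = {Theta:.2e}, bound (6.2) = {bound_62:.2e}")
```

Replacing `U_trial` by `orth(A @ V_trial)` makes $S=0$, which corresponds to the one-sided case mentioned after the theorem.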

We note that $R$ or $S$ in the above theorem is allowed to be empty, as in the case where a one-sided projection is employed. This includes the popular randomized SVD algorithm [6]. We make two more remarks.

Remark 6.1 (Other approaches for the SVD). A standard approach to extending results in symmetric eigenvalue problems to the SVD is to use the Jordan-Wielandt matrix $\begin{bmatrix}0&A\\ A^*&0\end{bmatrix}$, for example as in [12, Sec. 3]. As pointed out in [15], this has the slight downside of introducing spurious eigenvalues at 0. Moreover, the results we obtained via Jordan-Wielandt were less clean and looser than Theorem 6.1. Another approach is to work with the Gram matrix $A^*A$, but this unnecessarily squares the singular values and modifies $\delta$ and $\eta$. For these reasons, we have chosen to work directly with the SVD equations.

Remark 6.2 (Proof of (6.2) via Wedin and [15]). As in Remark 3.4, a proof of (6.2) can be given by combining Wedin's result (the SVD analogue of Davis-Kahan) and [15] (the SVD analogue of Saad's result). The sharper bounds (6.3) and (6.4) cannot be obtained this way.

7. Eigenvectors of a self-adjoint operator

So far we have specialized to finite-dimensional matrices, as the analysis is elementary and the situation is more transparent. In this final section, as in [10, 16], we extend the discussion to the infinite-dimensional case, where the matrix is generalized to a self-adjoint operator $A:\mathcal H\to\mathcal H$ on a Hilbert space $\mathcal H$ with inner product $\langle\cdot,\cdot\rangle$. Unlike the previous studies, which assumed the operators are bounded, our discussion allows $A$ to be unbounded, and is thus applicable, for example, to differential operators $Au=-u''$; in this case, we assume that $A$ is densely defined, as is customary.

Let $\mathcal Q$ be a subspace of $\mathcal H$ of finite dimension $k$ with orthonormal basis $q_1,\dots,q_k$. In the Rayleigh-Ritz process for $A$, we compute the $k\times k$ matrix $A_1$ with $(i,j)$ element $\langle q_i,Aq_j\rangle$ and its eigenvalue decomposition $A_1=\Omega\,\mathrm{diag}(\hat\lambda_1,\dots,\hat\lambda_k)\,\Omega^*$ to obtain the Ritz values $\hat\lambda_1,\dots,\hat\lambda_k$ and Ritz vectors $[q_1,\dots,q_k]\Omega$. Denote by $\mathcal Q_1,\mathcal Q_2\subset\mathcal H$ the resulting Ritz subspaces corresponding to disjoint sets of eigenvalues of $A_1$ (we have $\mathcal Q=\mathcal Q_1\oplus\mathcal Q_2$), and let $\mathcal Q_3$ be the (infinite-dimensional) orthogonal complement of $\mathcal Q$, so that $\mathcal H=\mathcal Q_1\oplus\mathcal Q_2\oplus\mathcal Q_3$ is an orthogonal direct sum. For simplicity, we treat the case where $\mathcal Q_1$ is one-dimensional (subspace versions can be obtained, generalizing Section 5). That is, let $(\hat\lambda,\hat u)$ be a Ritz pair with $\hat u=q_1$, and suppose that $Au=\lambda u$; note that this is an assumption, as a self-adjoint operator may not have any eigenvalue (e.g., [8, Ch. 9]), although the spectrum is always nonempty. The goal is to bound $\sin\angle(\hat u,u)$.

Denote by $P_i$ the orthogonal projector onto $\mathcal Q_i$, and define $A_{ij}:=P_iAP_j$. Then the R-R process forces $A_{12}=0$ and $A_{21}=0$. Note that $A_{13}=A_{31}^*$ and $A_{23}=A_{32}^*$ (where $^*$ denotes the adjoint of an operator), and these terms represent the residuals; hence we write $R_1=A_{31}$ and $R=A_{31}+A_{32}$. Also define $R_2=A_{32}$ $(=A_{23}^*)$, and $r_i=A_{32}P_{2,i}$ $(=(P_{2,i}A_{23})^*)$ for $i=2,\dots,k$, where $P_{2,i}$ is the one-dimensional orthogonal projector onto the $i$th Ritz vector. The quantities $\eta$ and $\delta$ are defined by $\eta=\min|\lambda-\lambda(A_{22})|$ and $\delta=\min|\lambda-\lambda(A_{33})|$, in which $\lambda(A_{ii})$ denotes the spectrum of the restriction of $A_{ii}$ to $\mathcal Q_i$.
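The Rayleigh-Ritz process for an operator can be made concrete for the Sturm-Liouville problem (7.8) treated in Section 7.1, with $\alpha=1$ and $\beta=-1$ (the values for which the eigenvalue equation $\tan\pi\nu=2\nu/(\nu^2-1)$ stated there holds). The sketch below is not the Chebfun code used for the paper's figures; it assumes NumPy, a Legendre basis on $[0,\pi]$ for the polynomials of degree at most $k+1$, 60-point Gauss-Legendre quadrature for the inner products, and bisection for the exact $\nu_1$. Because the constrained basis is not orthonormal, it orthonormalizes through a Cholesky factor of the Gram matrix, which yields the $k\times k$ matrix $A_1$ with entries $\langle q_i,Aq_j\rangle$ described above. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Legendre

ALPHA, BETA = 1.0, -1.0                         # u'(0) = ALPHA u(0), u'(pi) = BETA u(pi)
x_q, w_q = np.polynomial.legendre.leggauss(60)  # Gauss-Legendre rule on [-1, 1]
x_q, w_q = (x_q + 1) * np.pi / 2, w_q * np.pi / 2   # mapped to [0, pi]

def trial_basis(k):
    """A basis of the k-dim space of polynomials of degree <= k+1 satisfying both BCs."""
    P = [Legendre.basis(j, domain=[0, np.pi]) for j in range(k + 2)]
    C = np.array([[p.deriv()(0.0) - ALPHA * p(0.0) for p in P],
                  [p.deriv()(np.pi) - BETA * p(np.pi) for p in P]])
    N = np.linalg.svd(C)[2][2:].T               # null space of the 2 x (k+2) constraints
    return [sum(float(N[j, i]) * P[j] for j in range(k + 2)) for i in range(k)]

def rayleigh_ritz(k):
    """Ritz values of A u = -u'' and Ritz functions sampled at the quadrature nodes."""
    phi = trial_basis(k)
    F = np.array([p(x_q) for p in phi])             # basis functions at the nodes
    AF = np.array([-p.deriv(2)(x_q) for p in phi])  # A applied to the basis
    M = (F * w_q) @ F.T                             # Gram matrix <phi_i, phi_j>
    K = (F * w_q) @ AF.T                            # <phi_i, A phi_j>
    K = (K + K.T) / 2
    Linv = np.linalg.inv(np.linalg.cholesky(M))     # q = Linv @ phi is orthonormal
    theta, W = np.linalg.eigh(Linv @ K @ Linv.T)    # A_1 = Omega diag(theta) Omega^*
    return theta, (Linv.T @ W).T @ F                # row l: l-th Ritz function

def smallest_exact():
    """Smallest eigenpair: root of (nu^2 - 1) sin(pi nu) - 2 nu cos(pi nu) in (0.1, 1)."""
    f = lambda v: (v**2 - 1) * np.sin(np.pi * v) - 2 * v * np.cos(np.pi * v)
    a, b = 0.1, 1.0
    for _ in range(100):                            # plain bisection
        c = (a + b) / 2
        a, b = (c, b) if f(a) * f(c) > 0 else (a, c)
    nu = (a + b) / 2
    return nu**2, nu * np.cos(nu * x_q) + ALPHA * np.sin(nu * x_q)

lam, u = smallest_exact()
results = {}
for k in (3, 5, 7):
    theta, Uhat = rayleigh_ritz(k)
    uh = Uhat[0]
    c = np.sum(w_q * u * uh) / np.sqrt(np.sum(w_q * u * u) * np.sum(w_q * uh * uh))
    results[k] = (theta[0] - lam, np.sqrt(max(0.0, 1 - c**2)))  # (value error, sin angle)
    print(k, results[k])
```

Since the trial functions satisfy the boundary conditions, the smallest Ritz value is an upper bound for $\lambda_1$, so the printed value errors should be nonnegative up to rounding.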

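Theorem 7.1. Under the above assumptions and notation,
$$\sin\angle(u,\hat u)\le\frac{\|R\|}{\delta}\sqrt{1+\left(\frac{\|R_2\|}{\eta}\right)^2}\le\frac{\|R\|}{\delta}\left(1+\frac{\|R_2\|}{\eta}\right). \tag{7.1}$$
Moreover, if $\delta>\|R_2\|^2/\eta$, then
$$\sin\angle(u,\hat u)\le\frac{\|R_1\|}{\delta-\|R_2\|^2/\eta}\sqrt{1+\frac{\|R_2\|^2}{\eta^2}}, \tag{7.2}$$
and if $\delta>\sum_{i=2}^{k}\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}$, then
$$\sin\angle(u,\hat u)\le\frac{\|R_1\|}{\delta-\sum_{i=2}^{k}\frac{\|r_i\|^2}{|\lambda-\hat\lambda_i|}}\sqrt{1+\left(\sum_{i=2}^{k}\frac{\|r_i\|}{|\lambda-\hat\lambda_i|}\right)^2}. \tag{7.3}$$

Proof. Write $u=\sum_{i=1}^3u_i$ with $u_i\in\mathcal Q_i$, normalized so that $\|u\|=1$. Taking the $\mathcal Q_i$-component of $Au=\lambda u$ for each $i$ gives
$$\lambda u_1=A_{11}u_1+A_{13}u_3, \tag{7.4a}$$
$$\lambda u_2=A_{22}u_2+A_{23}u_3, \tag{7.4b}$$
$$\lambda u_3=A_{31}u_1+A_{32}u_2+A_{33}u_3. \tag{7.4c}$$
Our goal is to bound $\sin\angle(u,\hat u)=\sqrt{\|u_2\|^2+\|u_3\|^2}$. We first derive (7.1), an analogue of Theorem 3.1. By (7.4c), we have
$$\|(A_{33}-\lambda)u_3\|=\|A_{31}u_1+A_{32}u_2\|=\|(A_{31}+A_{32})u\|\le\|A_{31}+A_{32}\|=\|R\|.$$
Together with the fact $\|(A_{33}-\lambda)u_3\|\ge\delta\|u_3\|$ ([9, V.3.5]; to see this, note that $v=(A_{33}-\lambda)u_3$ implies $(A_{33}-\lambda)^{-1}v=u_3$, hence $\|u_3\|\le\|(A_{33}-\lambda)^{-1}\|\,\|v\|\le\|v\|/\delta$), we obtain
$$\|u_3\|\le\frac{\|R\|}{\delta}. \tag{7.5}$$
Now (7.4b) gives $(A_{22}-\lambda)u_2=-A_{23}u_3$. Since $\|A_{23}\|=\|A_{23}^*\|=\|A_{32}\|=\|R_2\|$ and $\|(A_{22}-\lambda)u_2\|\ge\eta\|u_2\|$, we thus have $\|u_2\|\le\frac{\|R_2\|}{\eta}\|u_3\|$. Using this and (7.5), we obtain (7.1).

We now turn to (7.3); the proof of (7.2) is similar and omitted. As in Theorem 5.1, the idea is to improve the estimate of $\|u_2\|$ using (7.4b). Projecting it with $P_{2,i}$ gives $P_{2,i}(A_{22}-\lambda)u_2+P_{2,i}A_{23}u_3=0$ for $i=2,\dots,k$, and by assumption $P_{2,i}A_{22}=\hat\lambda_iP_{2,i}$, so $(\hat\lambda_i-\lambda)P_{2,i}u_2+P_{2,i}A_{23}u_3=0$; hence
$$\|P_{2,i}u_2\|=\frac{\|P_{2,i}A_{23}u_3\|}{|\hat\lambda_i-\lambda|}\le\frac{\|P_{2,i}A_{23}\|\,\|u_3\|}{|\hat\lambda_i-\lambda|}=\frac{\|r_i\|\,\|u_3\|}{|\hat\lambda_i-\lambda|}, \tag{7.6}$$
where we used $\|r_i\|=\|P_{2,i}A_{23}\|$ for the final equality. The inequality (7.6) holds for $i=2,\dots,k$.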

Now since $A_{32}=A_{32}P_2=\sum_{i=2}^kA_{32}P_{2,i}$, we can rewrite (7.4c) as $A_{31}u_1+\sum_{i=2}^kA_{32}P_{2,i}u_2=-(A_{33}-\lambda)u_3$. Therefore
$$\|(A_{33}-\lambda)u_3\|\le\|A_{31}u_1\|+\sum_{i=2}^k\|A_{32}P_{2,i}P_{2,i}u_2\|\le\|R_1\|+\sum_{i=2}^k\|A_{32}P_{2,i}\|\,\|P_{2,i}u_2\|\le\|R_1\|+\sum_{i=2}^k\frac{\|r_i\|^2}{|\hat\lambda_i-\lambda|}\|u_3\|,$$
where we used (7.6) and $r_i=A_{32}P_{2,i}$. Together with $\|(A_{33}-\lambda)u_3\|\ge\delta\|u_3\|$ we obtain
$$\|u_3\|\le\frac{\|R_1\|}{\delta-\sum_{i=2}^k\frac{\|r_i\|^2}{|\hat\lambda_i-\lambda|}}. \tag{7.7}$$
Finally, (7.6) together with $\|u_2\|^2=\sum_{i=2}^k\|P_{2,i}u_2\|^2$ gives
$$\|u_2\|^2\le\sum_{i=2}^k\frac{\|r_i\|^2}{|\hat\lambda_i-\lambda|^2}\|u_3\|^2\le\left(\sum_{i=2}^k\frac{\|r_i\|}{|\hat\lambda_i-\lambda|}\right)^2\|u_3\|^2,$$
so
$$\sin\angle(u,\hat u)=\sqrt{\|u_2\|^2+\|u_3\|^2}\le\frac{\|R_1\|}{\delta-\sum_{i=2}^k\frac{\|r_i\|^2}{|\hat\lambda_i-\lambda|}}\sqrt{1+\left(\sum_{i=2}^k\frac{\|r_i\|}{|\hat\lambda_i-\lambda|}\right)^2},$$
completing the proof of (7.3).

7.1. Experiments: Sturm-Liouville eigenvalue problem

We illustrate Theorem 7.1 with a simple Sturm-Liouville eigenvalue problem (e.g., [4, 3.5])
$$Au=-u''=\lambda u,\qquad u'(0)=\alpha u(0),\quad u'(\pi)=\beta u(\pi),\qquad u\in\mathcal H=H^2(0,\pi). \tag{7.8}$$
$A$ is an unbounded self-adjoint operator with a full set of (infinitely many) orthonormal eigenfunctions. Here we take $\alpha=1$, $\beta=-1$. The exact eigenvalues are $\lambda_i=\nu_i^2$, where the $\nu_i$ are the solutions of $\tan\pi\nu=2\nu/(\nu^2-1)$, with corresponding eigenfunctions $\nu_i\cos\nu_ix+\alpha\sin\nu_ix$ [4, 3.5]. We attempt to compute the eigenpairs with the smoothest eigenfunctions, i.e., the eigenpairs closest to 0. To do this, a natural idea is to take low-degree polynomials. We take the trial subspace to be the $k$-dimensional subspace of polynomials $p$ of degree up to $k+1$ that satisfy the two boundary conditions $p'(0)=\alpha p(0)$ and $p'(\pi)=\beta p(\pi)$. Figure 7.1 (left) shows the basis functions obtained in this way for $k=7$. Such computations can be done conveniently using Chebfun [3]. Having defined the subspace $\mathcal Q$, we can perform R-R to obtain the Ritz vectors (which are functions in $\mathcal H$ here), along with the Ritz values.

Figure 7.1 (right) shows the convergence of $\hat u$ to the eigenfunction $u$ for the smallest eigenpair, measured by $\angle(u,\hat u)$, together with its bounds, analogous to Figure 4.1. As in that experiment, our bound (7.2) gives tighter bounds for the actual error, although here Davis-Kahan also performs well, since $\delta$ is not very small.

Finally, in Figure 7.2 we illustrate the behavior of the residual function $A\hat u-\hat\lambda\hat u$ as $k$ varies. Note that $\hat u$ is determined up to a sign flip $\pm1$; here we chose $\hat u(1)>0$. We make two observations. First, the norm $\|A\hat u-\hat\lambda\hat u\|$ evidently decays rapidly as $k$ increases, essentially like the right plot in Figure 7.1. The second and more interesting observation is that the residuals appear to become more and more oscillatory (non-smooth) as $k$ grows. This is a typical phenomenon, and it can be explained as follows. As emphasized repeatedly in this paper, R-R forces the residual to be orthogonal to $\mathcal Q$, which contains the smoothest functions. Consequently, in the Legendre expansion of the residual $A\hat u-\hat\lambda\hat u=\sum_{i=0}^\infty c_ip_i(x)$, the coefficients $c_i$ are small for $i<k$; they are bounded roughly by $\|u_2\|$, which is $O(\|A\hat u-\hat\lambda\hat u\|^2)$ by (7.6). This also reflects the main result in [10]; recall Remark 3.4. As $k$ grows, the residual becomes orthogonal to more and more of these smoothest functions, and therefore becomes more oscillatory.

Figure 7.1. Left: basis functions for the projection subspace $\mathcal Q$, satisfying $u'(0)=u(0)$, $u'(\pi)=-u(\pi)$. Right: convergence of $\angle(u,\hat u)$ and its bounds (curves: Davis-Kahan, exact, (7.3)).

Figure 7.2. Residual function $A\hat u-\hat\lambda\hat u$ for $k=3$, $6$ and $9$.

Acknowledgements

I am grateful to Andrew Knyazev for helpful discussions and for bringing [16] to my attention, and to Mayuko Yamashita for help with Section 7.

References

[1] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst. Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, 2000.
[2] C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7(1):1-46, 1970.
[3] T. A. Driscoll, N. Hale, and L. N. Trefethen. Chebfun Guide. Pafnuty Publications, 2014.
[4] G. B. Folland. Fourier Analysis and its Applications, volume 4. AMS, 1992.


More information

15 Singular Value Decomposition

15 Singular Value Decomposition 15 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

Math 504 (Fall 2011) 1. (*) Consider the matrices

Math 504 (Fall 2011) 1. (*) Consider the matrices Math 504 (Fall 2011) Instructor: Emre Mengi Study Guide for Weeks 11-14 This homework concerns the following topics. Basic definitions and facts about eigenvalues and eigenvectors (Trefethen&Bau, Lecture

More information

Chapter 6: Orthogonality

Chapter 6: Orthogonality Chapter 6: Orthogonality (Last Updated: November 7, 7) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved around.. Inner products

More information

7. Symmetric Matrices and Quadratic Forms

7. Symmetric Matrices and Quadratic Forms Linear Algebra 7. Symmetric Matrices and Quadratic Forms CSIE NCU 1 7. Symmetric Matrices and Quadratic Forms 7.1 Diagonalization of symmetric matrices 2 7.2 Quadratic forms.. 9 7.4 The singular value

More information

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for

Bindel, Fall 2016 Matrix Computations (CS 6210) Notes for 1 Logistics Notes for 2016-08-29 General announcement: we are switching from weekly to bi-weekly homeworks (mostly because the course is much bigger than planned). If you want to do HW but are not formally

More information

Characterization of half-radial matrices

Characterization of half-radial matrices Characterization of half-radial matrices Iveta Hnětynková, Petr Tichý Faculty of Mathematics and Physics, Charles University, Sokolovská 83, Prague 8, Czech Republic Abstract Numerical radius r(a) is the

More information

Numerical Methods in Matrix Computations

Numerical Methods in Matrix Computations Ake Bjorck Numerical Methods in Matrix Computations Springer Contents 1 Direct Methods for Linear Systems 1 1.1 Elements of Matrix Theory 1 1.1.1 Matrix Algebra 2 1.1.2 Vector Spaces 6 1.1.3 Submatrices

More information

The value of a problem is not so much coming up with the answer as in the ideas and attempted ideas it forces on the would be solver I.N.

The value of a problem is not so much coming up with the answer as in the ideas and attempted ideas it forces on the would be solver I.N. Math 410 Homework Problems In the following pages you will find all of the homework problems for the semester. Homework should be written out neatly and stapled and turned in at the beginning of class

More information

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min.

Lecture 1. 1 Conic programming. MA 796S: Convex Optimization and Interior Point Methods October 8, Consider the conic program. min. MA 796S: Convex Optimization and Interior Point Methods October 8, 2007 Lecture 1 Lecturer: Kartik Sivaramakrishnan Scribe: Kartik Sivaramakrishnan 1 Conic programming Consider the conic program min s.t.

More information

Analysis Preliminary Exam Workshop: Hilbert Spaces

Analysis Preliminary Exam Workshop: Hilbert Spaces Analysis Preliminary Exam Workshop: Hilbert Spaces 1. Hilbert spaces A Hilbert space H is a complete real or complex inner product space. Consider complex Hilbert spaces for definiteness. If (, ) : H H

More information

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization General Tools for Solving Large Eigen-Problems

More information

Krylov subspace projection methods

Krylov subspace projection methods I.1.(a) Krylov subspace projection methods Orthogonal projection technique : framework Let A be an n n complex matrix and K be an m-dimensional subspace of C n. An orthogonal projection technique seeks

More information

Diagonalizing Matrices

Diagonalizing Matrices Diagonalizing Matrices Massoud Malek A A Let A = A k be an n n non-singular matrix and let B = A = [B, B,, B k,, B n ] Then A n A B = A A 0 0 A k [B, B,, B k,, B n ] = 0 0 = I n 0 A n Notice that A i B

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

Theorem A.1. If A is any nonzero m x n matrix, then A is equivalent to a partitioned matrix of the form. k k n-k. m-k k m-k n-k

Theorem A.1. If A is any nonzero m x n matrix, then A is equivalent to a partitioned matrix of the form. k k n-k. m-k k m-k n-k I. REVIEW OF LINEAR ALGEBRA A. Equivalence Definition A1. If A and B are two m x n matrices, then A is equivalent to B if we can obtain B from A by a finite sequence of elementary row or elementary column

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

5.3 The Power Method Approximation of the Eigenvalue of Largest Module

5.3 The Power Method Approximation of the Eigenvalue of Largest Module 192 5 Approximation of Eigenvalues and Eigenvectors 5.3 The Power Method The power method is very good at approximating the extremal eigenvalues of the matrix, that is, the eigenvalues having largest and

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 5 Singular Value Decomposition We now reach an important Chapter in this course concerned with the Singular Value Decomposition of a matrix A. SVD, as it is commonly referred to, is one of the

More information

Chapter 5 Eigenvalues and Eigenvectors

Chapter 5 Eigenvalues and Eigenvectors Chapter 5 Eigenvalues and Eigenvectors Outline 5.1 Eigenvalues and Eigenvectors 5.2 Diagonalization 5.3 Complex Vector Spaces 2 5.1 Eigenvalues and Eigenvectors Eigenvalue and Eigenvector If A is a n n

More information

COMP 558 lecture 18 Nov. 15, 2010

COMP 558 lecture 18 Nov. 15, 2010 Least squares We have seen several least squares problems thus far, and we will see more in the upcoming lectures. For this reason it is good to have a more general picture of these problems and how to

More information

LARGE SPARSE EIGENVALUE PROBLEMS

LARGE SPARSE EIGENVALUE PROBLEMS LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization 14-1 General Tools for Solving Large Eigen-Problems

More information

The QR Decomposition

The QR Decomposition The QR Decomposition We have seen one major decomposition of a matrix which is A = LU (and its variants) or more generally PA = LU for a permutation matrix P. This was valid for a square matrix and aided

More information

2. Review of Linear Algebra

2. Review of Linear Algebra 2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear

More information

ANGLES BETWEEN SUBSPACES AND THE RAYLEIGH-RITZ METHOD. Peizhen Zhu. M.S., University of Colorado Denver, A thesis submitted to the

ANGLES BETWEEN SUBSPACES AND THE RAYLEIGH-RITZ METHOD. Peizhen Zhu. M.S., University of Colorado Denver, A thesis submitted to the ANGLES BETWEEN SUBSPACES AND THE RAYLEIGH-RITZ METHOD by Peizhen Zhu M.S., University of Colorado Denver, 2009 A thesis submitted to the Faculty of the Graduate School of the University of Colorado in

More information

MAT 610: Numerical Linear Algebra. James V. Lambers

MAT 610: Numerical Linear Algebra. James V. Lambers MAT 610: Numerical Linear Algebra James V Lambers January 16, 2017 2 Contents 1 Matrix Multiplication Problems 7 11 Introduction 7 111 Systems of Linear Equations 7 112 The Eigenvalue Problem 8 12 Basic

More information

Practical Linear Algebra: A Geometry Toolbox

Practical Linear Algebra: A Geometry Toolbox Practical Linear Algebra: A Geometry Toolbox Third edition Chapter 12: Gauss for Linear Systems Gerald Farin & Dianne Hansford CRC Press, Taylor & Francis Group, An A K Peters Book www.farinhansford.com/books/pla

More information

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors

Chapter 7. Canonical Forms. 7.1 Eigenvalues and Eigenvectors Chapter 7 Canonical Forms 7.1 Eigenvalues and Eigenvectors Definition 7.1.1. Let V be a vector space over the field F and let T be a linear operator on V. An eigenvalue of T is a scalar λ F such that there

More information

6.4 Krylov Subspaces and Conjugate Gradients

6.4 Krylov Subspaces and Conjugate Gradients 6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P

More information

BASIC ALGORITHMS IN LINEAR ALGEBRA. Matrices and Applications of Gaussian Elimination. A 2 x. A T m x. A 1 x A T 1. A m x

BASIC ALGORITHMS IN LINEAR ALGEBRA. Matrices and Applications of Gaussian Elimination. A 2 x. A T m x. A 1 x A T 1. A m x BASIC ALGORITHMS IN LINEAR ALGEBRA STEVEN DALE CUTKOSKY Matrices and Applications of Gaussian Elimination Systems of Equations Suppose that A is an n n matrix with coefficents in a field F, and x = (x,,

More information

Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson

Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Name: TA Name and section: NO CALCULATORS, SHOW ALL WORK, NO OTHER PAPERS ON DESK. There is very little actual work to be done on this exam if

More information

Scientific Computing

Scientific Computing Scientific Computing Direct solution methods Martin van Gijzen Delft University of Technology October 3, 2018 1 Program October 3 Matrix norms LU decomposition Basic algorithm Cost Stability Pivoting Pivoting

More information

7. Dimension and Structure.

7. Dimension and Structure. 7. Dimension and Structure 7.1. Basis and Dimension Bases for Subspaces Example 2 The standard unit vectors e 1, e 2,, e n are linearly independent, for if we write (2) in component form, then we obtain

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors Chapter 1 Eigenvalues and Eigenvectors Among problems in numerical linear algebra, the determination of the eigenvalues and eigenvectors of matrices is second in importance only to the solution of linear

More information

Course Notes: Week 1

Course Notes: Week 1 Course Notes: Week 1 Math 270C: Applied Numerical Linear Algebra 1 Lecture 1: Introduction (3/28/11) We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalues

More information

1. General Vector Spaces

1. General Vector Spaces 1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

More information

On the Modification of an Eigenvalue Problem that Preserves an Eigenspace

On the Modification of an Eigenvalue Problem that Preserves an Eigenspace Purdue University Purdue e-pubs Department of Computer Science Technical Reports Department of Computer Science 2009 On the Modification of an Eigenvalue Problem that Preserves an Eigenspace Maxim Maumov

More information

Chapter 7: Symmetric Matrices and Quadratic Forms

Chapter 7: Symmetric Matrices and Quadratic Forms Chapter 7: Symmetric Matrices and Quadratic Forms (Last Updated: December, 06) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved

More information

Designing Information Devices and Systems II

Designing Information Devices and Systems II EECS 16B Fall 2016 Designing Information Devices and Systems II Linear Algebra Notes Introduction In this set of notes, we will derive the linear least squares equation, study the properties symmetric

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

The Lanczos and conjugate gradient algorithms

The Lanczos and conjugate gradient algorithms The Lanczos and conjugate gradient algorithms Gérard MEURANT October, 2008 1 The Lanczos algorithm 2 The Lanczos algorithm in finite precision 3 The nonsymmetric Lanczos algorithm 4 The Golub Kahan bidiagonalization

More information

arxiv: v1 [math.na] 1 Sep 2018

arxiv: v1 [math.na] 1 Sep 2018 On the perturbation of an L -orthogonal projection Xuefeng Xu arxiv:18090000v1 [mathna] 1 Sep 018 September 5 018 Abstract The L -orthogonal projection is an important mathematical tool in scientific computing

More information

MATH 240 Spring, Chapter 1: Linear Equations and Matrices

MATH 240 Spring, Chapter 1: Linear Equations and Matrices MATH 240 Spring, 2006 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 8th Ed. Sections 1.1 1.6, 2.1 2.2, 3.2 3.8, 4.3 4.5, 5.1 5.3, 5.5, 6.1 6.5, 7.1 7.2, 7.4 DEFINITIONS Chapter 1: Linear

More information

Singular Value Decomposition

Singular Value Decomposition Chapter 6 Singular Value Decomposition In Chapter 5, we derived a number of algorithms for computing the eigenvalues and eigenvectors of matrices A R n n. Having developed this machinery, we complete our

More information

Class notes: Approximation

Class notes: Approximation Class notes: Approximation Introduction Vector spaces, linear independence, subspace The goal of Numerical Analysis is to compute approximations We want to approximate eg numbers in R or C vectors in R

More information

Linear Algebra and Dirac Notation, Pt. 2

Linear Algebra and Dirac Notation, Pt. 2 Linear Algebra and Dirac Notation, Pt. 2 PHYS 500 - Southern Illinois University February 1, 2017 PHYS 500 - Southern Illinois University Linear Algebra and Dirac Notation, Pt. 2 February 1, 2017 1 / 14

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,

More information

orthogonal relations between vectors and subspaces Then we study some applications in vector spaces and linear systems, including Orthonormal Basis,

orthogonal relations between vectors and subspaces Then we study some applications in vector spaces and linear systems, including Orthonormal Basis, 5 Orthogonality Goals: We use scalar products to find the length of a vector, the angle between 2 vectors, projections, orthogonal relations between vectors and subspaces Then we study some applications

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they

More information