Backward perturbation analysis for scaled total least-squares problems


NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2009; 16:627-648
Published online 5 March 2009 in Wiley InterScience (www.interscience.wiley.com)

Backward perturbation analysis for scaled total least-squares problems

X.-W. Chang and D. Titley-Peloquin

School of Computer Science, McGill University, 3480 University Street, McConnell Engineering Building, Room 318, Montreal, Canada H3A 2A7

SUMMARY

The scaled total least-squares (STLS) method unifies the ordinary least-squares (OLS), the total least-squares (TLS), and the data least-squares (DLS) methods. In this paper we perform a backward perturbation analysis of the STLS problem. This also unifies the backward perturbation analyses of the OLS, TLS and DLS problems. We derive an expression for an extended minimal backward error of the STLS problem. This is an asymptotically tight lower bound on the true minimal backward error. If the given approximate solution is close enough to the true STLS solution (as is the goal in practice), then the extended minimal backward error is in fact the minimal backward error. Since the extended minimal backward error is expensive to compute directly, we present a lower bound on it as well as an asymptotic estimate for it, both of which can be computed or estimated more efficiently. Our numerical examples suggest that the lower bound gives good order of magnitude approximations, while the asymptotic estimate is an excellent estimate. We show how to use our results to easily obtain the corresponding results for the OLS and DLS problems in the literature. Copyright © 2009 John Wiley & Sons, Ltd.

Received 8 October 2008; Revised 3 January 2009; Accepted 1 February 2009

KEY WORDS: scaled total least squares; backward perturbation analysis

1. INTRODUCTION

Given an approximate solution to a certain problem, backward perturbation analysis involves finding a perturbation in the data of minimal size such that the approximate solution is an exact solution of the perturbed problem. The size of the minimal perturbation is referred to as the minimal backward error. In matrix computations, backward perturbation analyses are useful in two respects.

Correspondence to: X.-W. Chang, School of Computer Science, McGill University, 3480 University Street, McConnell Engineering Building, Room 318, Montreal, Canada H3A 2A7. E-mail: chang@cs.mcgill.ca

Contract/grant sponsor: NSERC of Canada; contract/grant number: RGPIN
Contract/grant sponsor: NSERC of Canada; contract/grant number: PGS-D3

One is to check whether a computed solution is a backward stable solution. Sometimes we may not know whether an algorithm for solving a problem is numerically stable, but if the relative minimal backward error is of the order of the unit roundoff, then the computed solution is a backward stable solution and we can be satisfied with it. The other is to use backward perturbation analysis results to design effective stopping criteria for the iterative solution of large sparse problems; see, e.g. [1-4].

There has been a lot of work on the backward perturbation analysis of matrix problems, especially in recent years. Interested readers can find some references on backward perturbation analysis of linear systems (including least-squares problems) in [5]. This paper gives a backward perturbation analysis for scaled total least-squares (STLS) problems. STLS unifies ordinary least squares (OLS), total least squares (TLS) and data least squares (DLS); see Paige and Strakoš [6, 7] and Rao [8]. The backward perturbation analysis of STLS given in this paper also unifies the backward perturbation analyses of OLS, TLS and DLS; thus this paper unifies and generalizes the work in [5, 9].

We derive formulas for an extended minimal backward error in Section 3. This extended minimal backward error is at worst a lower bound on the minimal backward error. But we show both in theory (see Section 3) and using numerical tests (see Section 7) that if the given approximate solution is a good enough approximation to the exact solution of the STLS problem, then the extended minimal backward error is the actual minimal backward error. It is time consuming to compute the extended minimal backward error directly, hence in Sections 5 and 6 we derive a lower bound on it and an asymptotic estimate for it, respectively, both of which can be computed or estimated more efficiently. From these results, we show how to easily obtain the groundbreaking minimal backward error results for OLS problems given by Waldén et al. [9], and the extended minimal backward error results for DLS problems given by Chang et al. [5]; see Sections 4-6.

We use I = [e_1, ..., e_n] to denote the unit matrix. For any matrix B \in R^{m \times n}, its 2-norm and Frobenius norm are denoted by \|B\|_2 and \|B\|_F, respectively, its Moore-Penrose generalized inverse by B^\dagger, its smallest singular value (the p-th largest singular value with p = min{m, n}) by \sigma_{\min}(B), its smallest eigenvalue (when B is symmetric) by \lambda_{\min}(B), and its condition number in the 2-norm by \kappa_2(B). For any matrix B = [b_1, ..., b_n], define vec(B) = [b_1^T, ..., b_n^T]^T. For any vector v \in R^n, its 2-norm is denoted by \|v\|_2 and its Moore-Penrose generalized inverse is

    v^\dagger = 0 if v = 0,    v^\dagger = v^T / \|v\|_2^2 if v \neq 0.

2. SCALED TOTAL LEAST-SQUARES PROBLEMS

For given A \in R^{m \times n}, b \in R^m and \gamma > 0, the scaled total least-squares (STLS) problem with data [A, b] can be formulated as (see [8])

    STLS distance:  \sigma_S \equiv \min_{E, f, x} \|[E, f\gamma]\|_F   subject to   (A + E)x = b + f.    (1)

A theoretically equivalent but different formulation can be found in [6]. The STLS problem (1) may not have a solution, but it does have a unique solution if the following condition holds (see [6, (1.11)]):

    rank(A) = n   and   b is not orthogonal to U_min,    (2)

where U_min is the left singular vector subspace of A corresponding to its minimal singular value \sigma_{\min}(A). From now on, we assume that (2) holds.

It was shown in [10] that \sigma_S^2 in (1) is the global minimum of the function

    \sigma^2(x) \equiv \frac{\|b - Ax\|_2^2}{\gamma^{-2} + \|x\|_2^2}.    (3)

It can be proven (see [11, Theorem 2.7]; [6, Section 6]) that when (2) holds, \hat{x} solves (1) if and only if it satisfies

    A^T(b - A\hat{x}) = -\frac{\|b - A\hat{x}\|_2^2}{\gamma^{-2} + \|\hat{x}\|_2^2}\,\hat{x},    (4)

    \frac{\|b - A\hat{x}\|_2^2}{\gamma^{-2} + \|\hat{x}\|_2^2} < \sigma_{\min}^2(A).    (5)

Note that (4) can be obtained by setting the derivative of \sigma^2(x) in (3) equal to zero. All stationary points of \sigma^2(x) therefore satisfy (4). The global minimum must also satisfy (5). It is easy to observe from (4) and (5) (see also [6, Section 6]) that when \gamma = 1, \hat{x} becomes the TLS solution; when \gamma \to 0, \hat{x} converges to the OLS solution (note that in this case the left-hand side of (5) becomes zero and (5) holds automatically); and when \gamma \to \infty, \hat{x} converges to the DLS solution.

3. BACKWARD ERROR ANALYSIS

Suppose we have obtained a nonzero approximation y \in R^n to the solution vector \hat{x} of (1). In order to find the closest STLS problem whose solution is actually y, we see from (3) that we need to solve a backward error problem of the form

    \min_{\Delta A, \Delta b} \|[\Delta A, \Delta b\,\theta]\|_F   subject to   y = \arg\min_x \frac{\|b + \Delta b - (A + \Delta A)x\|_2^2}{\gamma^{-2} + \|x\|_2^2}.    (6)

Here, the chosen scalar \theta > 0 (different, sometimes, from \gamma in (1)) allows a different emphasis on each data error. The above is the objective function that we use here to define "closest". In order to solve (6), we first need to characterize the set of [\Delta A, \Delta b] satisfying the equality constraint in (6). It follows from (4) and (5) that y is the exact STLS solution for the data set [A + \Delta A, b + \Delta b] if and only if [\Delta A, \Delta b] is in the following set:

    C_{A,b}(\gamma) \equiv \Big\{ [\Delta A, \Delta b] : (A + \Delta A)^T[b + \Delta b - (A + \Delta A)y] = -\frac{\|b + \Delta b - (A + \Delta A)y\|_2^2}{\gamma^{-2} + \|y\|_2^2}\,y,
        \frac{\|b + \Delta b - (A + \Delta A)y\|_2^2}{\gamma^{-2} + \|y\|_2^2} < \sigma_{\min}^2(A + \Delta A) \Big\}.    (7)

A natural idea for solving (6) is to find an explicit expression for the set C_{A,b}(\gamma) and then minimize \|[\Delta A, \Delta b\,\theta]\|_F over this set. Unfortunately, the inequality in (7) makes it difficult to derive a general expression for [\Delta A, \Delta b] \in C_{A,b}(\gamma), hence we initially ignore it and consider a larger set

    C^+_{A,b}(\gamma) \equiv \Big\{ [\Delta A, \Delta b] : (A + \Delta A)^T[b + \Delta b - (A + \Delta A)y] = -\frac{\|b + \Delta b - (A + \Delta A)y\|_2^2}{\gamma^{-2} + \|y\|_2^2}\,y \Big\}.    (8)
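Conditions (4) and (5) are easy to test numerically for a candidate solution. The following is a minimal MATLAB sketch; the function name and the tolerance tol are ours, not the paper's.

    % Sketch: test the STLS optimality conditions (4) and (5) for a candidate xhat.
    function ok = check_stls_conditions(A, b, gamma, xhat, tol)
      r   = b - A*xhat;
      rho = norm(r)^2/(gamma^(-2) + norm(xhat)^2);   % common factor in (4) and (5)
      stationary = norm(A'*r + rho*xhat) <= tol*norm(A)*norm(r);   % equation (4)
      interior   = rho < min(svd(A))^2;                            % inequality (5)
      ok = stationary && interior;
    end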

The following result from Theorem 3.2 of [12] gives an explicit relation between \Delta A and \Delta b for any [\Delta A, \Delta b] \in C^+_{A,b}(\gamma).

Lemma 3.1
[\Delta A, \Delta b] \in C^+_{A,b}(\gamma) if and only if

    A + \Delta A = -\gamma^2 w w^\dagger (b + \Delta b) y^T + (I - w w^\dagger)[(b + \Delta b) y^\dagger + Z(I - y y^\dagger)]    (9)

for some w \in R^m and Z \in R^{m \times n}.

Based on Lemma 3.1, we now solve the extended minimal backward error problem

    \mu(y, \gamma, \theta) \equiv \min_{[\Delta A, \Delta b] \in C^+_{A,b}(\gamma)} \|[\Delta A, \Delta b\,\theta]\|_F.    (10)

Theorem 3.1
Suppose we are given A \in R^{m \times n}, b \in R^m, a nonzero approximate STLS solution y \in R^n, \gamma > 0 and \theta > 0; and suppose that (2) holds. Let r \equiv b - Ay and

    N \equiv \Big[ A(I - y y^\dagger),\ \frac{\theta \|r\|_2}{(1 + \theta^2 \|y\|_2^2)^{1/2}} (I - r r^\dagger),\ \frac{\theta}{(\theta^2 \|y\|_2^2 + \gamma^4 \|y\|_2^4)^{1/2}} (Ay + \gamma^2 \|y\|_2^2 b) \Big],    (11)

    M \equiv M(y, \gamma, \theta) \equiv A(I - y y^\dagger) A^T - \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r r^T + \frac{\theta^2}{\theta^2 \|y\|_2^2 + \gamma^4 \|y\|_2^4} (Ay + \gamma^2 \|y\|_2^2 b)(Ay + \gamma^2 \|y\|_2^2 b)^T
        = N N^T - \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} I.    (12)

Then M has at most one negative eigenvalue, and

    \mu^2(y, \gamma, \theta) = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2}   if \lambda_{\min}(M) \ge 0,
    \mu^2(y, \gamma, \theta) = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} + \lambda_{\min}(M) = \sigma_{\min}^2(N)   if \lambda_{\min}(M) < 0.    (13)

Furthermore, \mu(y, \gamma, \theta) is attained by the following backward perturbations in A and b:

    \hat{\Delta A} = \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T   if \lambda_{\min}(M) \ge 0,
    \hat{\Delta A} = \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T - \tilde{w}\tilde{w}^T \Big[ A + \frac{\gamma^2(\theta^2 - \gamma^2)}{\theta^2 + \gamma^4 \|y\|_2^2} b y^T + \frac{\gamma^4 + 2\gamma^4\theta^2\|y\|_2^2 + \theta^4}{(1 + \theta^2 \|y\|_2^2)(\theta^2 + \gamma^4 \|y\|_2^2)} r y^T \Big]   if \lambda_{\min}(M) < 0,    (14)

    \hat{\Delta b} = -\frac{1}{1 + \theta^2 \|y\|_2^2} r   if \lambda_{\min}(M) \ge 0,
    \hat{\Delta b} = -\frac{1}{1 + \theta^2 \|y\|_2^2} r - \tilde{w}\tilde{w}^T \Big[ \frac{\gamma^2(1 + \gamma^2 \|y\|_2^2)}{\theta^2 + \gamma^4 \|y\|_2^2} b - \frac{(\gamma^2 + \theta^2)(1 + \gamma^2 \|y\|_2^2)}{(1 + \theta^2 \|y\|_2^2)(\theta^2 + \gamma^4 \|y\|_2^2)} r \Big]   if \lambda_{\min}(M) < 0,    (15)

where \tilde{w} is the unit eigenvector of M corresponding to \lambda_{\min}(M), or equivalently the unit left singular vector of N corresponding to \sigma_{\min}(N).

Proof
On the right-hand side of the second equality in (12), both the first and the third terms are nonnegative definite, while the second term is a rank-one matrix. By [13, Theorem 4.3.4(b)] with k = 1, M has at most one negative eigenvalue. From Lemma 3.1, we see that \Delta A and \Delta b are functions of w and Z. Therefore, to solve (10), we need to find the optimal w and Z. We discuss two cases separately.

Case 1: The optimal w = 0. The proof is almost the same as the corresponding part in the proof of Theorem 2.2 in [5], but for the reader's convenience we give it here. In this case, from (9) we have

    \Delta A = (b + \Delta b) y^\dagger + Z(I - y y^\dagger) - A.    (16)

Given a nonzero y, we can find Y_\perp \in R^{n \times (n-1)} such that Y = [y/\|y\|_2, Y_\perp] is orthogonal. Thus

    \Delta A\,Y = [(b + \Delta b)/\|y\|_2, 0] + Z[0, Y_\perp] - [Ay/\|y\|_2, AY_\perp] = [(r + \Delta b)/\|y\|_2, (Z - A)Y_\perp].

Therefore, we have

    \|[\Delta A, \Delta b\,\theta]\|_F^2 = \|r + \Delta b\|_2^2/\|y\|_2^2 + \|(Z - A)Y_\perp\|_F^2 + \theta^2 \|\Delta b\|_2^2
      = \frac{1}{\|y\|_2^2} \big\| [I;\ \theta\|y\|_2 I]\,\Delta b + [r;\ 0] \big\|_2^2 + \|(Z - A)Y_\perp\|_F^2.

It follows that the optimal \hat{\Delta b} and \hat{Z} satisfy

    \hat{\Delta b} = -[I;\ \theta\|y\|_2 I]^\dagger [r;\ 0] = -\frac{1}{1 + \theta^2 \|y\|_2^2} r,   \hat{Z} = A.    (17)

Then from (16) the optimal \hat{\Delta A} satisfies

    \hat{\Delta A} = \Big( b - \frac{1}{1 + \theta^2 \|y\|_2^2} r \Big) y^\dagger - A y y^\dagger = \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T,    (18)

leading to

    \|[\hat{\Delta A}, \hat{\Delta b}\,\theta]\|_F^2 = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2}.    (19)

Case 2: The optimal w \neq 0. Let Y be as in Case 1. Since w w^\dagger is independent of the length of w, we can assume that \|w\|_2 = 1 in (9). Construct W = [w, W_\perp] \in R^{m \times m} such that it is orthogonal.

Then we have from (9)

    W^T \Delta A\, Y = \Big[ -\gamma^2 \|y\|_2\, w^T(b + \Delta b) - w^T A y/\|y\|_2,\ \ -w^T A Y_\perp ;\ \ W_\perp^T (b + \Delta b)/\|y\|_2 - W_\perp^T A y/\|y\|_2,\ \ W_\perp^T Z Y_\perp - W_\perp^T A Y_\perp \Big].

Thus, it follows that

    \|[\Delta A, \Delta b\,\theta]\|_F^2 = [\gamma^2 \|y\|_2\, w^T(b + \Delta b) + w^T A y/\|y\|_2]^2 + \|w^T A Y_\perp\|_2^2 + \|W_\perp^T(r + \Delta b)\|_2^2/\|y\|_2^2 + \|W_\perp^T(Z - A)Y_\perp\|_F^2 + \theta^2 \|\Delta b\|_2^2
      = [\gamma^2 \|y\|_2\, w^T(b + \Delta b) + w^T A y/\|y\|_2]^2 + \|w^T A(I - y y^\dagger)\|_2^2 + \|(I - w w^T)(r + \Delta b)\|_2^2/\|y\|_2^2 + \|W_\perp^T(Z - A)Y_\perp\|_F^2 + \theta^2 \|\Delta b\|_2^2
      = \frac{1}{\|y\|_2^2} \big\| [\gamma^2 \|y\|_2^2 w^T;\ I - w w^T;\ \theta \|y\|_2 I]\,\Delta b + [\gamma^2 \|y\|_2^2 w^T b + w^T A y;\ (I - w w^T) r;\ 0] \big\|_2^2 + \|w^T A(I - y y^\dagger)\|_2^2 + \|W_\perp^T(Z - A)Y_\perp\|_F^2.    (20)

Obviously \hat{Z} = A is optimal and, for fixed w, the optimal \hat{\Delta b} is the least-squares solution of the problem in \Delta b appearing in (20). Forming the corresponding normal equations and simplifying the inverse with the Sherman-Morrison-Woodbury formula gives

    \hat{\Delta b} = -\frac{1}{\theta^2 + \gamma^4 \|y\|_2^2} w w^T (\gamma^2 A y + \gamma^4 \|y\|_2^2 b) - \frac{1}{1 + \theta^2 \|y\|_2^2} (I - w w^T) r.    (21)

Thus, with the above \hat{Z} and \hat{\Delta b}, and using w^T w = 1, we have from (20) that

    \|[\Delta A, \hat{\Delta b}\,\theta]\|_F^2 = \frac{\theta^2}{\theta^2 \|y\|_2^2 + \gamma^4 \|y\|_2^4} w^T (Ay + \gamma^2 \|y\|_2^2 b)(Ay + \gamma^2 \|y\|_2^2 b)^T w + \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r^T (I - w w^T) r + w^T A(I - y y^\dagger) A^T w
      = w^T \Big[ A(I - y y^\dagger) A^T + \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} (I - r r^\dagger) + \frac{\theta^2}{\theta^2 \|y\|_2^2 + \gamma^4 \|y\|_2^4} (Ay + \gamma^2 \|y\|_2^2 b)(Ay + \gamma^2 \|y\|_2^2 b)^T \Big] w
      = w^T N N^T w = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} + w^T M w.    (22)

The minimal value is therefore reached when w is equal to \tilde{w}, the unit left singular vector of N corresponding to its smallest singular value \sigma_{\min}(N), or equivalently, the unit eigenvector of M corresponding to its smallest eigenvalue.

If \lambda_{\min}(M) \ge 0, from (19) and (22) we can see that the minimum value of \|[\Delta A, \Delta b\,\theta]\|_F^2 is reached in Case 1, and the top equalities in (13), (14) and (15) follow from (19), (18) and (17), respectively. If \lambda_{\min}(M) < 0, from (19) and (22) we can see that the minimum value of \|[\Delta A, \Delta b\,\theta]\|_F^2 is reached in Case 2, and the bottom equality in (13) follows from (22). In this case, the bottom equality in (15) can be obtained from (21) by some simple algebraic operations, and the bottom equality in (14) can easily be proved from (9) with the optimal w, \Delta b and Z found in Case 2.

Corollary 3.1
With the notation and conditions of Theorem 3.1 and the STLS solution \hat{x} of (4) and (5), define \tilde{M} \equiv M(\hat{x}, \gamma, \theta) (see (12)) and \hat{r} \equiv b - A\hat{x}. Then

    \lim_{y \to \hat{x}} \mu(y, \gamma, \theta) = \mu(\hat{x}, \gamma, \theta) = 0.    (23)

Proof
First, we consider the case \hat{r} \neq 0. Multiplying both sides of (4) by \hat{x}^T from the left, we obtain

    \hat{x}^T A^T \hat{r} = -\frac{\|\hat{x}\|_2^2 (b - A\hat{x})^T \hat{r}}{\gamma^{-2} + \|\hat{x}\|_2^2} = \frac{\|\hat{x}\|_2^2}{\gamma^{-2} + \|\hat{x}\|_2^2}\, \hat{x}^T A^T \hat{r} - \frac{\|\hat{x}\|_2^2}{\gamma^{-2} + \|\hat{x}\|_2^2}\, b^T \hat{r},

leading to \hat{x}^T A^T \hat{r} + \gamma^2 \hat{x}^T \hat{x}\, b^T \hat{r} = 0. Multiplying both sides of (4) by I - \hat{x}\hat{x}^\dagger from the left, we immediately obtain (I - \hat{x}\hat{x}^\dagger) A^T \hat{r} = 0. Therefore, from (12),

    \tilde{M}\hat{r} = A(I - \hat{x}\hat{x}^\dagger) A^T \hat{r} - \frac{\theta^2 \hat{r}^T \hat{r}}{1 + \theta^2 \|\hat{x}\|_2^2}\,\hat{r} + \frac{\theta^2}{\theta^2 \|\hat{x}\|_2^2 + \gamma^4 \|\hat{x}\|_2^4} (A\hat{x} + \gamma^2 \|\hat{x}\|_2^2 b)(\hat{x}^T A^T + \gamma^2 \|\hat{x}\|_2^2 b^T)\hat{r}
      = -\frac{\theta^2 \|\hat{r}\|_2^2}{1 + \theta^2 \|\hat{x}\|_2^2}\,\hat{r} \equiv \hat{\lambda}\hat{r}.

But by Theorem 3.1, \tilde{M} has at most one negative eigenvalue. Thus \lambda_{\min}(\tilde{M}) = \hat{\lambda} < 0, and from the bottom of (13), \mu(\hat{x}, \gamma, \theta) = 0. When y is close enough to \hat{x}, \lambda_{\min}(M) < 0. Therefore,

    \lim_{y \to \hat{x}} \mu(y, \gamma, \theta) = \lim_{y \to \hat{x}} \Big( \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} + \lambda_{\min}(M) \Big)^{1/2} = \Big( \frac{\theta^2 \|\hat{r}\|_2^2}{1 + \theta^2 \|\hat{x}\|_2^2} + \lambda_{\min}(\tilde{M}) \Big)^{1/2} = 0.

Now we consider the case \hat{r} = 0. Since \mu^2(\hat{x}, \gamma, \theta) \ge 0, from (13) we must have \lambda_{\min}(\tilde{M}) \ge 0. Thus from (13)

    \lim_{y \to \hat{x}} \mu(y, \gamma, \theta) = \lim_{y \to \hat{x}} \min\Big\{ \Big( \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} \Big)^{1/2}, \Big( \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} + \lambda_{\min}(M) \Big)^{1/2} \Big\}
      = \min\Big\{ \Big( \frac{\theta^2 \|\hat{r}\|_2^2}{1 + \theta^2 \|\hat{x}\|_2^2} \Big)^{1/2}, \Big( \frac{\theta^2 \|\hat{r}\|_2^2}{1 + \theta^2 \|\hat{x}\|_2^2} + \lambda_{\min}(\tilde{M}) \Big)^{1/2} \Big\} = \mu(\hat{x}, \gamma, \theta) = 0.
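The quantities in Theorem 3.1 translate directly into a simple, though expensive, computation. A minimal MATLAB sketch of formula (13) follows; the function name is ours and r \neq 0 is assumed.

    % Sketch: the extended minimal backward error mu(y,gamma,theta) of
    % Theorem 3.1, evaluated directly from N and M in (11)-(13).
    function mu = stls_extended_backward_error(A, b, y, gamma, theta)
      [m, n] = size(A);
      r   = b - A*y;   ny2 = norm(y)^2;
      Py  = eye(n) - y*y'/ny2;                 % I - y*pinv(y)
      Pr  = eye(m) - r*r'/norm(r)^2;           % I - r*pinv(r), assumes r ~= 0
      s   = A*y + gamma^2*ny2*b;
      N   = [A*Py, (theta*norm(r)/sqrt(1 + theta^2*ny2))*Pr, ...
             (theta/sqrt(theta^2*ny2 + gamma^4*ny2^2))*s];          % equation (11)
      c0  = theta^2*norm(r)^2/(1 + theta^2*ny2);
      M   = N*N' - c0*eye(m);                  % equation (12)
      lmin = min(eig((M + M')/2));             % symmetrize against rounding
      mu  = sqrt(c0 + min(lmin, 0));           % equation (13)
    end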

Since C_{A,b}(\gamma) \subseteq C^+_{A,b}(\gamma),

    \mu(y, \gamma, \theta) = \min_{[\Delta A, \Delta b] \in C^+_{A,b}(\gamma)} \|[\Delta A, \Delta b\,\theta]\|_F \le \min_{[\Delta A, \Delta b] \in C_{A,b}(\gamma)} \|[\Delta A, \Delta b\,\theta]\|_F.

The following theorem, which is analogous to Theorem 2.8 of [5], states that if y is close enough to the true solution \hat{x}, as is the goal in practice, then \mu(y, \gamma, \theta) is in fact the minimal backward error.

Theorem 3.2
With the notation and conditions of Theorem 3.1 and the STLS solution \hat{x} of (4) and (5), there exists \epsilon > 0 such that for all y satisfying \|y - \hat{x}\|_2 \le \epsilon, \mu(y, \gamma, \theta) is the true minimal backward error.

Proof
The proof is similar to that of Theorem 2.8 of [5]. For any given y, Theorem 3.1 shows that \hat{\Delta A} satisfying (14) and \hat{\Delta b} satisfying (15) are the minimizers of (10). Notice that when y \to \hat{x} we have from Corollary 3.1 that \mu(y, \gamma, \theta) \to 0, in other words \hat{\Delta A} \to 0 and \hat{\Delta b} \to 0. Thus

    \lim_{y \to \hat{x}} \Big( \frac{\|b + \hat{\Delta b} - (A + \hat{\Delta A})y\|_2^2}{\gamma^{-2} + \|y\|_2^2} - \sigma_{\min}^2(A + \hat{\Delta A}) \Big) = \frac{\|b - A\hat{x}\|_2^2}{\gamma^{-2} + \|\hat{x}\|_2^2} - \sigma_{\min}^2(A) < 0,

see (5). Hence there must exist \epsilon > 0 such that when \|y - \hat{x}\|_2 < \epsilon,

    \frac{\|b + \hat{\Delta b} - (A + \hat{\Delta A})y\|_2^2}{\gamma^{-2} + \|y\|_2^2} - \sigma_{\min}^2(A + \hat{\Delta A}) < 0.

Therefore, when \|y - \hat{x}\|_2 < \epsilon, [\hat{\Delta A}, \hat{\Delta b}] \in C_{A,b}(\gamma) and thus \mu(y, \gamma, \theta) = \min_{[\Delta A, \Delta b] \in C_{A,b}(\gamma)} \|[\Delta A, \Delta b\,\theta]\|_F, hence \mu(y, \gamma, \theta) is in fact the true minimal backward error.

We will give numerical examples in Section 7 showing that when y is a reasonable approximation to the exact solution of the STLS problem (1), \hat{\Delta A} and \hat{\Delta b} usually satisfy the inequality in (7). In such cases, \mu(y, \gamma, \theta) is the actual minimal backward error.

4. MINIMAL BACKWARD ERRORS FOR OLS AND DLS PROBLEMS

The OLS minimal backward error problem and the DLS extended minimal backward error problem (see [5]) are, respectively,

    \mu_{OLS}(y, \theta) \equiv \min_{[\Delta A, \Delta b] \in C^+_{A,b}(0)} \|[\Delta A, \Delta b\,\theta]\|_F,
    \mu_{DLS}(y, \theta) \equiv \min_{[\Delta A, \Delta b] \in C^+_{A,b}(\infty)} \|[\Delta A, \Delta b\,\theta]\|_F,

where we have used C^+_{A,b}(\infty) to denote \lim_{\gamma \to \infty} C^+_{A,b}(\gamma). From (13) we see that \mu(y, \gamma, \theta) is a continuous function of \gamma. Then from the definition of \mu(y, \gamma, \theta) in (10), we have

    \mu_{OLS}(y, \theta) = \lim_{\gamma \to 0} \mu(y, \gamma, \theta),   \mu_{DLS}(y, \theta) = \lim_{\gamma \to \infty} \mu(y, \gamma, \theta).    (24)

In this section, we show that using Theorem 3.1 we can easily obtain the backward error results for the OLS problem obtained in [9] and the backward error results for the DLS problem obtained in [5].

4.1. Minimal backward error for OLS

From (12) we have

    M_{OLS} \equiv \lim_{\gamma \to 0} M(y, \gamma, \theta) = A A^T - \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r r^T.

Then by the continuity of eigenvalues,

    \lim_{\gamma \to 0} \lambda_{\min}(M(y, \gamma, \theta)) = \lambda_{\min}\big( \lim_{\gamma \to 0} M(y, \gamma, \theta) \big) = \lambda_{\min}(M_{OLS}).

Then from (24) and (13), we obtain

    \mu_{OLS}^2(y, \theta) = \lim_{\gamma \to 0} \mu^2(y, \gamma, \theta) = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2}   if \lambda_{\min}(M_{OLS}) \ge 0,
    \mu_{OLS}^2(y, \theta) = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} + \lambda_{\min}(M_{OLS})   if \lambda_{\min}(M_{OLS}) < 0.

This is identical to the formula for the OLS minimal backward error, with perturbations in both A and b, derived in [9]. Furthermore, the optimal perturbations \hat{\Delta A} and \hat{\Delta b} in (14) and (15) become

    \Delta A_{OLS} \equiv \lim_{\gamma \to 0} \hat{\Delta A} = \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T   if \lambda_{\min}(M_{OLS}) \ge 0,
    \Delta A_{OLS} = \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T - w_{OLS} w_{OLS}^T \Big[ A + \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T \Big]   if \lambda_{\min}(M_{OLS}) < 0,

    \Delta b_{OLS} \equiv \lim_{\gamma \to 0} \hat{\Delta b} = -\frac{1}{1 + \theta^2 \|y\|_2^2} r   if \lambda_{\min}(M_{OLS}) \ge 0,
    \Delta b_{OLS} = -\frac{1}{1 + \theta^2 \|y\|_2^2} (I - w_{OLS} w_{OLS}^T) r   if \lambda_{\min}(M_{OLS}) < 0,

where w_{OLS} is the unit eigenvector of M_{OLS} corresponding to \lambda_{\min}(M_{OLS}). The formulas for \Delta A_{OLS} and \Delta b_{OLS} were also derived in [9].

4.2. Extended minimal backward error for DLS

From (12) we have

    M_{DLS} \equiv \lim_{\gamma \to \infty} M(y, \gamma, \theta) = A(I - y y^\dagger) A^T - \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r r^T + \theta^2 b b^T.

Once again using the continuity of eigenvalues,

    \lim_{\gamma \to \infty} \lambda_{\min}(M(y, \gamma, \theta)) = \lambda_{\min}\big( \lim_{\gamma \to \infty} M(y, \gamma, \theta) \big) = \lambda_{\min}(M_{DLS}).

Then from (24) and (13), we obtain

    \mu_{DLS}^2(y, \theta) \equiv \lim_{\gamma \to \infty} \mu^2(y, \gamma, \theta) = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2}   if \lambda_{\min}(M_{DLS}) \ge 0,
    \mu_{DLS}^2(y, \theta) = \frac{\theta^2 \|r\|_2^2}{1 + \theta^2 \|y\|_2^2} + \lambda_{\min}(M_{DLS})   if \lambda_{\min}(M_{DLS}) < 0.

This is the formula for the DLS extended minimal backward error, with perturbations in both A and b, derived in [5]. It is also straightforward to verify that the optimal perturbations \hat{\Delta A} and \hat{\Delta b} in (14) and (15) become

    \Delta A_{DLS} \equiv \lim_{\gamma \to \infty} \hat{\Delta A} = \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T   if \lambda_{\min}(M_{DLS}) \ge 0,
    \Delta A_{DLS} = \frac{\theta^2}{1 + \theta^2 \|y\|_2^2} r y^T - w_{DLS} w_{DLS}^T \Big[ A - b y^\dagger + \frac{1 + 2\theta^2 \|y\|_2^2}{1 + \theta^2 \|y\|_2^2} r y^\dagger \Big]   if \lambda_{\min}(M_{DLS}) < 0,

    \Delta b_{DLS} \equiv \lim_{\gamma \to \infty} \hat{\Delta b} = -\frac{1}{1 + \theta^2 \|y\|_2^2} r   if \lambda_{\min}(M_{DLS}) \ge 0,
    \Delta b_{DLS} = -\frac{1}{1 + \theta^2 \|y\|_2^2} (I - w_{DLS} w_{DLS}^T) r - w_{DLS} w_{DLS}^T b   if \lambda_{\min}(M_{DLS}) < 0,

where w_{DLS} is the unit eigenvector of the matrix M_{DLS} corresponding to its smallest eigenvalue. These formulas were also given in [5].
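In a computation, both limiting cases amount only to a change of the matrix M in formula (13). A minimal MATLAB sketch (the function name is ours):

    % Sketch: the OLS minimal backward error and the DLS extended minimal
    % backward error, obtained from formula (13) with M_OLS and M_DLS.
    function [muOLS, muDLS] = ols_dls_backward_errors(A, b, y, theta)
      [m, n] = size(A);
      r    = b - A*y;
      d    = 1 + theta^2*norm(y)^2;
      c0   = theta^2*norm(r)^2/d;
      Mols = A*A' - (theta^2/d)*(r*r');
      Mdls = A*(eye(n) - y*y'/norm(y)^2)*A' - (theta^2/d)*(r*r') + theta^2*(b*b');
      muOLS = sqrt(c0 + min(min(eig((Mols + Mols')/2)), 0));
      muDLS = sqrt(c0 + min(min(eig((Mdls + Mdls')/2)), 0));
    end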

5. A LOWER BOUND ON \mu(y, \gamma, \theta)

The formula (13) for \mu(y, \gamma, \theta) involves the smallest singular value of the m \times (m + n + 1) matrix N, which is expensive to compute directly. In this section we suggest a (hopefully good) lower bound on \mu(y, \gamma, \theta), which can easily be estimated.

Theorem 5.1
With the notation and conditions of Theorem 3.1,

    \mu(y, \gamma, \theta) \equiv \min_{[\Delta A, \Delta b] \in C^+_{A,b}(\gamma)} \|[\Delta A, \Delta b\,\theta]\|_F \ge \mu_{lb}(y, \gamma, \theta) \equiv \frac{2\beta_0}{(\beta_1^2 + 4\beta_0)^{1/2} + \beta_1} \ge 0,    (25)

where

    \beta_0 \equiv \frac{\big\| (\gamma^{-2} + \|y\|_2^2) A^T r + \|r\|_2^2\, y \big\|_2}{(\gamma^{-2} + \|y\|_2^2)(\theta^{-2} + \|y\|_2^2)^{1/2} + \|y\|_2 (\theta^{-2} + \|y\|_2^2)},    (26)

    \beta_1 \equiv \frac{(\gamma^{-2} + \|y\|_2^2)(\theta^{-2} + \|y\|_2^2)^{1/2} \|A\|_2 + (\gamma^{-2} + \|y\|_2^2) \|r\|_2 + 2(\theta^{-2} + \|y\|_2^2)^{1/2} \|y\|_2 \|r\|_2}{(\gamma^{-2} + \|y\|_2^2)(\theta^{-2} + \|y\|_2^2)^{1/2} + \|y\|_2 (\theta^{-2} + \|y\|_2^2)}.    (27)

Proof
Obviously the first inequality in (25) holds. For any [\Delta A, \Delta b] \in C^+_{A,b}(\gamma), we have from (8) that

    (\gamma^{-2} + \|y\|_2^2)\,\big( A + [\Delta A, \Delta b\,\theta][I;\ 0] \big)^T \big( r + [\Delta A, \Delta b\,\theta][-y;\ \theta^{-1}] \big) = -\big\| r + [\Delta A, \Delta b\,\theta][-y;\ \theta^{-1}] \big\|_2^2\, y.

Denoting F \equiv [\Delta A, \Delta b\,\theta] and \alpha \equiv \gamma^{-2} + \|y\|_2^2, we get

    \alpha A^T r + \|r\|_2^2\, y = -\alpha A^T F[-y;\ \theta^{-1}] - \alpha [I, 0] F^T \big( r + F[-y;\ \theta^{-1}] \big) - \big( 2 r^T F[-y;\ \theta^{-1}] + [-y^T, \theta^{-1}] F^T F [-y;\ \theta^{-1}] \big)\, y.

Taking the 2-norm of both sides of this equation, we obtain the inequality

    \big\| \alpha A^T r + \|r\|_2^2\, y \big\|_2 \le \big( \alpha \|A\|_2 (\theta^{-2} + \|y\|_2^2)^{1/2} + \alpha \|r\|_2 + 2\|r\|_2 (\theta^{-2} + \|y\|_2^2)^{1/2} \|y\|_2 \big) \|F\|_F + \big( \alpha (\theta^{-2} + \|y\|_2^2)^{1/2} + (\theta^{-2} + \|y\|_2^2) \|y\|_2 \big) \|F\|_F^2,

that is, with (26) and (27), the quadratic inequality in terms of \xi \equiv \|F\|_F:

    \beta_0 \le \beta_1 \xi + \xi^2.

Since \xi and \beta_0 are nonnegative, \xi \ge \xi_+, where \xi_+ is the positive root of \beta_0 = \beta_1 \xi + \xi^2; hence

    \xi \ge \xi_+ = \tfrac{1}{2}\big( (\beta_1^2 + 4\beta_0)^{1/2} - \beta_1 \big) = \frac{2\beta_0}{(\beta_1^2 + 4\beta_0)^{1/2} + \beta_1},

giving the second inequality in (25).

The lower bound \mu_{lb}(y, \gamma, \theta) in (25) can be estimated in O(mn) flops, since \|A\|_2 can usually be estimated by a standard norm estimator in O(mn) flops; see, for example, [14, Section 15.2]. In fact a good estimate of \|A\|_2 might already be available from whatever method is used for obtaining y, and in this case the cost will essentially be the 4mn flops required for computing A^T(b - Ay).

This bound can be compared with the lower bounds for the OLS minimal backward error such as those discussed in [9, 15]. If in (25) we take \gamma \to 0 to obtain a bound for the OLS problem and then take \theta \to \infty to restrict the perturbations to A only, we obtain

    \mu_{OLS}(y, \infty) \equiv \lim_{\theta \to \infty} \mu_{OLS}(y, \theta) = \lim_{\theta \to \infty} \lim_{\gamma \to 0} \mu(y, \gamma, \theta) \ge \frac{2\|A^T r\|_2}{\|A\|_2 \|y\|_2 + \|r\|_2 + \big( (\|A\|_2 \|y\|_2 + \|r\|_2)^2 + 4\|A^T r\|_2 \|y\|_2 \big)^{1/2}}.

This is exactly one of the bounds given in [9]. If in (25) we take \gamma \to \infty to obtain a lower bound for the DLS extended minimal backward error and then take \theta \to \infty to restrict the perturbations to A only, we obtain

    \mu_{DLS}(y, \infty) \equiv \lim_{\theta \to \infty} \mu_{DLS}(y, \theta) = \lim_{\theta \to \infty} \lim_{\gamma \to \infty} \mu(y, \gamma, \theta) \ge \frac{2\hat{\beta}_0}{(\hat{\beta}_1^2 + 4\hat{\beta}_0)^{1/2} + \hat{\beta}_1},

where

    \hat{\beta}_0 = \frac{\big\| \|y\|_2^2 A^T r + \|r\|_2^2\, y \big\|_2}{2\|y\|_2^3},   \hat{\beta}_1 = \frac{\|A\|_2 \|y\|_2 + 3\|r\|_2}{2\|y\|_2}.

This lower bound for the DLS problem was derived in [5].

6. AN ASYMPTOTIC ESTIMATE FOR \mu(y, \gamma, \theta)

Computing \mu(y, \gamma, \theta) exactly is expensive, and the lower bound (25) may not be very tight. In this section, following [5, 16], we give an asymptotic estimate for \mu(y, \gamma, \theta) and show how it is related to other asymptotic estimates in the literature for the OLS and DLS problems. As in the previous sections, denote r \equiv b - Ay and \alpha \equiv \gamma^{-2} + \|y\|_2^2. Let

    h(A, b, y) \equiv A^T r + \frac{\|r\|_2^2}{\gamma^{-2} + \|y\|_2^2}\, y = A^T r + \|r\|_2^2 \alpha^{-1} y.    (28)
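Before turning to the Jacobians of h, we note that the lower bound (25), with \beta_0 and \beta_1 from (26) and (27), is cheap to evaluate. A minimal MATLAB sketch follows; the function name is ours, and normest is used as one possible estimator of \|A\|_2 (any other estimate could be substituted).

    % Sketch: the O(mn) lower bound (25) of Theorem 5.1.
    function mulb = stls_backward_error_lower_bound(A, b, y, gamma, theta)
      r   = b - A*y;
      a   = gamma^(-2) + norm(y)^2;            % alpha
      t   = theta^(-2) + norm(y)^2;
      den = a*sqrt(t) + norm(y)*t;
      b0  = norm(a*(A'*r) + norm(r)^2*y)/den;                                    % (26)
      b1  = (a*sqrt(t)*normest(A) + a*norm(r) + 2*sqrt(t)*norm(y)*norm(r))/den;  % (27)
      mulb = 2*b0/(sqrt(b1^2 + 4*b0) + b1);                                      % (25)
    end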

Recall from Section 3 that if \hat{x} is the true STLS solution, then h(A, b, \hat{x}) = 0. Thus \mu(y, \gamma, \theta) = \|[\hat{\Delta A}, \hat{\Delta b}\,\theta]\|_F, where \{\hat{\Delta A}, \hat{\Delta b}\} is the solution to h(A + \Delta A, b + \Delta b, y) = 0 that minimizes \|[\Delta A, \Delta b\,\theta]\|_F. By Taylor's expansion, for small enough E \in R^{m \times n} and f \in R^m,

    h(A + E, b + f, y) \approx h(A, b, y) + J_A\, vec(E) + J_b\, f = h(A, b, y) + [J_A, \theta^{-1} J_b]\,[vec(E);\ f\theta],

where J_A \in R^{n \times mn} and J_b \in R^{n \times m} are the Jacobian matrices of h with respect to vec(A) and b, respectively. Therefore, an approximation to [\hat{\Delta A}, \hat{\Delta b}\,\theta] is [E, f\theta] such that

    \|[E, f\theta]\|_F = \min   subject to   h(A, b, y) + [J_A, \theta^{-1} J_b]\,[vec(E);\ f\theta] = 0.    (29)

In other words,

    [vec(E);\ f\theta] = -[J_A, \theta^{-1} J_b]^\dagger h(A, b, y).    (30)

Thus, we obtain the following approximation to \mu(y, \gamma, \theta):

    \mu(y, \gamma, \theta) \approx \tilde{\mu}(y, \gamma, \theta) \equiv \|[E, f\theta]\|_F = \|[vec(E);\ f\theta]\|_2 = \|[J_A, \theta^{-1} J_b]^\dagger h(A, b, y)\|_2.    (31)

We say that \tilde{\mu}(y, \gamma, \theta) is an asymptotic estimate of \mu(y, \gamma, \theta) due to the following result.

Theorem 6.1
Using the notation of Theorem 3.1, with \mu(y, \gamma, \theta) defined as in (10) and \tilde{\mu}(y, \gamma, \theta) as in (31),

    \lim_{y \to \hat{x}} \frac{\tilde{\mu}(y, \gamma, \theta)}{\mu(y, \gamma, \theta)} = 1.

Proof
The proof is similar to that of Theorem 4.1 of [5]. A Taylor expansion of h(A + \hat{\Delta A}, b + \hat{\Delta b}, y) gives

    0 = h(A + \hat{\Delta A}, b + \hat{\Delta b}, y) = h(A, b, y) + [J_A, \theta^{-1} J_b]\,[vec(\hat{\Delta A});\ \hat{\Delta b}\,\theta] + O(\|[\hat{\Delta A}, \hat{\Delta b}]\|_F^2).    (32)

Substituting the resulting expression for h(A, b, y) into (30) gives

    [vec(E);\ f\theta] = [J_A, \theta^{-1} J_b]^\dagger [J_A, \theta^{-1} J_b]\,[vec(\hat{\Delta A});\ \hat{\Delta b}\,\theta] + O(\|[\hat{\Delta A}, \hat{\Delta b}]\|_F^2).

Taking the 2-norm of each side and noticing that [J_A, \theta^{-1} J_b]^\dagger [J_A, \theta^{-1} J_b] is an orthogonal projection matrix, we obtain

    \tilde{\mu}(y, \gamma, \theta) \le \mu(y, \gamma, \theta) + O(\|[\hat{\Delta A}, \hat{\Delta b}]\|_F^2).    (33)

On the other hand, from (29) and (32) we obtain

    [J_A, \theta^{-1} J_b]\,[vec(\hat{\Delta A});\ \hat{\Delta b}\,\theta] = [J_A, \theta^{-1} J_b]\,[vec(E);\ f\theta] + O(\|[\hat{\Delta A}, \hat{\Delta b}]\|_F^2).

Since [J_A, \theta^{-1} J_b] has full row rank (this will be demonstrated below), we can write

    [vec(\hat{\Delta A});\ \hat{\Delta b}\,\theta] = [J_A, \theta^{-1} J_b]^\dagger \big( [J_A, \theta^{-1} J_b]\,[vec(E);\ f\theta] \big) + O(\|[\hat{\Delta A}, \hat{\Delta b}]\|_F^2).

Since [vec(\hat{\Delta A});\ \hat{\Delta b}\,\theta] is the vector satisfying the above equality with minimum 2-norm, we must have

    \mu(y, \gamma, \theta) \le \tilde{\mu}(y, \gamma, \theta) + O(\|[\hat{\Delta A}, \hat{\Delta b}]\|_F^2).    (34)

The result follows from (33) and (34) together with Corollary 3.1.

In the following, we derive an explicit expression for \tilde{\mu}(y, \gamma, \theta). Write A = [a_1, ..., a_n] and y^T = [\eta_1, ..., \eta_n]. Given two column vectors f \in R^m and g \in R^n, define the matrix \partial f/\partial g^T \equiv (\partial f_i/\partial g_j) \in R^{m \times n}. With this notation, using (28) we obtain

    J_b = \frac{\partial h}{\partial b^T} = A^T + 2\alpha^{-1} y r^T,   J_A = \Big[ \frac{\partial h}{\partial a_1^T}, ..., \frac{\partial h}{\partial a_n^T} \Big].    (35)

Note that

    \frac{\partial r}{\partial a_k^T} = -\eta_k I,   \frac{\partial \|r\|_2^2}{\partial a_k^T} = 2 r^T \frac{\partial r}{\partial a_k^T} = -2\eta_k r^T,   \frac{\partial (a_i^T r)}{\partial a_k^T} = \delta_{ik} r^T - \eta_k a_i^T,

where \delta_{ik} = 1 if i = k and \delta_{ik} = 0 otherwise. This gives

    \frac{\partial h}{\partial a_k^T} = e_k r^T - \eta_k A^T - 2\alpha^{-1} \eta_k y r^T.    (36)

From (35) and (36), we obtain

    J_b J_b^T = A^T A + 2\alpha^{-1}(A^T r y^T + y r^T A) + 4\|r\|_2^2 \alpha^{-2} y y^T,

    J_A J_A^T = \sum_{k=1}^n (e_k r^T - \eta_k A^T - 2\alpha^{-1}\eta_k y r^T)(e_k r^T - \eta_k A^T - 2\alpha^{-1}\eta_k y r^T)^T
      = \|y\|_2^2 A^T A + \alpha^{-1}(2\|y\|_2^2 - \alpha)(A^T r y^T + y r^T A) + \|r\|_2^2 I + 4\|r\|_2^2 \alpha^{-2}(\|y\|_2^2 - \alpha) y y^T
      = \|y\|_2^2 A^T A + \alpha^{-1}(\|y\|_2^2 - \gamma^{-2})(A^T r y^T + y r^T A) + \|r\|_2^2 I - 4\|r\|_2^2 \alpha^{-2} \gamma^{-2} y y^T.
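These formulas can be checked, and the estimate (31) evaluated directly, by forming the Jacobians explicitly. A minimal MATLAB sketch follows (the function name is ours; since J_A has n x mn entries this is practical only for small m and n):

    % Sketch: form J_b and J_A from (35)-(36) and evaluate the asymptotic
    % estimate directly from its definition (31).
    function mut = stls_backward_error_estimate_jac(A, b, y, gamma, theta)
      [m, n] = size(A);
      r     = b - A*y;
      alpha = gamma^(-2) + norm(y)^2;
      h     = A'*r + (norm(r)^2/alpha)*y;             % equation (28)
      Jb    = A' + (2/alpha)*(y*r');                  % equation (35)
      JA    = zeros(n, m*n);
      for k = 1:n                                     % equation (36), block k
        JA(:, (k-1)*m+1:k*m) = ...
            [zeros(k-1,m); r'; zeros(n-k,m)] - y(k)*A' - (2/alpha)*y(k)*(y*r');
      end
      J   = [JA, Jb/theta];
      mut = norm(pinv(J)*h);                          % equation (31)
    end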

Denoting J \equiv [J_A, \theta^{-1} J_b] and introducing the scalars

    \xi_0 \equiv (\|y\|_2^2 + \theta^{-2})^{1/2},   \xi_1 \equiv \frac{\|y\|_2^2 - \gamma^{-2} + 2\theta^{-2}}{\alpha \xi_0},    (37)

it follows that

    J J^T \equiv J_A J_A^T + \theta^{-2} J_b J_b^T
      = \xi_0^2 A^T A + \xi_0 \xi_1 (A^T r y^T + y r^T A) + \|r\|_2^2 I + 4\|r\|_2^2 \alpha^{-2}(\theta^{-2} - \gamma^{-2}) y y^T
      = (\xi_0 A + \xi_1 r y^T)^T(\xi_0 A + \xi_1 r y^T) + \|r\|_2^2 I + [4\|r\|_2^2 \alpha^{-2}(\theta^{-2} - \gamma^{-2}) - \xi_1^2 \|r\|_2^2]\, y y^T
      = (\xi_0 A + \xi_1 r y^T)^T(\xi_0 A + \xi_1 r y^T) + \|r\|_2^2 I - \frac{\|r\|_2^2}{\xi_0^2} y y^T
      = (\xi_0 A + \xi_1 r y^T)^T(\xi_0 A + \xi_1 r y^T) + \|r\|_2^2 I + \frac{\|r\|_2^2 \|y\|_2^2}{\xi_0^2}(I - y y^\dagger) - \frac{\|r\|_2^2 \|y\|_2^2}{\xi_0^2} I
      = (\xi_0 A + \xi_1 r y^T)^T(\xi_0 A + \xi_1 r y^T) + \frac{\theta^{-2} \|r\|_2^2}{\xi_0^2} I + \frac{\|r\|_2^2 \|y\|_2^2}{\xi_0^2}(I - y y^\dagger)
      = \xi_0^2 B^T B,

where

    B \equiv \Big[ A + \xi_0^{-1}\xi_1 r y^T ;\ \xi_0^{-2} \|r\|_2 \|y\|_2 (I - y y^\dagger) ;\ \theta^{-1} \xi_0^{-2} \|r\|_2 I \Big]
      = \Big[ A + \frac{\gamma^2\theta^2\|y\|_2^2 - \theta^2 + 2\gamma^2}{(1 + \gamma^2\|y\|_2^2)(1 + \theta^2\|y\|_2^2)} r y^T ;\ \frac{\theta^2 \|r\|_2 \|y\|_2}{1 + \theta^2\|y\|_2^2}(I - y y^\dagger) ;\ \frac{\theta \|r\|_2}{1 + \theta^2\|y\|_2^2} I \Big].    (38)

Notice from the last block of B that B has full column rank. Therefore, J \equiv [J_A, \theta^{-1} J_b] has full row rank. Defining

    c \equiv \Big[ \frac{\theta\, r}{(1 + \theta^2\|y\|_2^2)^{1/2}} ;\ 0 ;\ \frac{\|r\|_2 (\theta^2 - \gamma^2)}{(1 + \gamma^2\|y\|_2^2)(1 + \theta^2\|y\|_2^2)^{1/2}}\, y \Big],    (39)

it is straightforward (but tedious) to verify that B^T c = \xi_0^{-1} h(A, b, y). Using (31), we obtain

    \tilde{\mu}(y, \gamma, \theta) = \|J^\dagger h(A, b, y)\|_2 = \|J^T(J J^T)^{-1} h(A, b, y)\|_2 = \|(J J^T)^{-1/2} h(A, b, y)\|_2
      = \|(\xi_0^2 B^T B)^{-1/2} h(A, b, y)\|_2 = \|B(B^T B)^{-1} B^T c\|_2 = \|B B^\dagger c\|_2.    (40)
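A minimal MATLAB sketch of (40) follows; the function name is ours, B and c are taken from (38)-(39), and the result should agree, up to rounding, with the Jacobian-based evaluation of (31) sketched earlier.

    % Sketch: the asymptotic estimate (40) computed from the thin QR
    % factorization of the (m+2n) x n matrix B in (38).
    function mut = stls_backward_error_estimate(A, b, y, gamma, theta)
      [m, n] = size(A);
      r   = b - A*y;   ny2 = norm(y)^2;   nr = norm(r);
      Py  = eye(n) - y*y'/ny2;                          % I - y*pinv(y)
      c1  = (gamma^2*theta^2*ny2 - theta^2 + 2*gamma^2)/ ...
            ((1 + gamma^2*ny2)*(1 + theta^2*ny2));
      B   = [A + c1*(r*y');
             (theta^2*nr*norm(y)/(1 + theta^2*ny2))*Py;
             (theta*nr/(1 + theta^2*ny2))*eye(n)];       % equation (38)
      c   = [theta*r/sqrt(1 + theta^2*ny2);
             zeros(n,1);
             (nr*(theta^2 - gamma^2)/((1 + gamma^2*ny2)*sqrt(1 + theta^2*ny2)))*y];   % (39)
      [Q, ~] = qr(B, 0);                                 % thin QR of B
      mut = norm(Q*(Q'*c));                              % = ||B*pinv(B)*c||_2, equation (40)
    end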

Notice that B B^\dagger is an orthogonal projector onto the range of B, so \tilde{\mu}(y, \gamma, \theta) can be computed by using the QR factorization of B. This computation is more efficient than computing the SVD of N in (11) to obtain \mu(y, \gamma, \theta).

The asymptotic estimate \tilde{\mu}(y, \gamma, \theta) is analogous to an estimate of the minimal backward error for the OLS problem whose various forms have been studied in [16-19], as well as to one for the DLS problem derived in [5]. In the following subsections, we show exactly how these are related.

6.1. Relationship to an OLS asymptotic estimate

Theorem 4.4 in [16] gives an asymptotic estimate for the minimal backward error for the OLS problem when only A is perturbed. This is extended in [17] and [19, Section 2.7] to the case when perturbations to both A and b are allowed. Using our notation, with \xi_0 defined in (37), the estimate is

    \tilde{\mu}_{OLS}(y, \theta) \equiv \|(\xi_0^2 A^T A + \|r\|_2^2 I)^{-1/2} A^T r\|_2.    (41)

It has the property that \lim_{y \to \hat{x}} \tilde{\mu}_{OLS}(y, \theta)/\mu_{OLS}(y, \theta) = 1. To compare our estimate \tilde{\mu}(y, \gamma, \theta) with that in (41), we need to take the limit as \gamma \to 0 to specialize our result to the OLS problem. It can be verified from (38) and (28) that

    \lim_{\gamma \to 0} \xi_0^2 B^T B = \xi_0^2 A^T A - y r^T A - A^T r y^T + \|r\|_2^2 I,   \lim_{\gamma \to 0} h(A, b, y) = A^T r.

Hence, from (40),

    \lim_{\gamma \to 0} \tilde{\mu}(y, \gamma, \theta) = \|(\xi_0^2 A^T A - y r^T A - A^T r y^T + \|r\|_2^2 I)^{-1/2} A^T r\|_2.    (42)

Notice that the terms involving A^T r y^T in (42) are not present in (41). However, in the OLS problem, A^T r \to 0 when y \to \hat{x}. Therefore, in the limit as \gamma \to 0 our estimate \tilde{\mu}(y, \gamma, \theta) in (40) is asymptotically equivalent to \tilde{\mu}_{OLS}(y, \theta) in (41).

6.2. Relationship to a DLS asymptotic estimate

Section 4 in [5] introduced the following asymptotic estimate of the DLS extended minimal backward error with perturbations only in A:

    \tilde{\mu}_{DLS}(y, \infty) \equiv \|B_{DLS} B_{DLS}^\dagger c_{DLS}\|_2,    (43)

where

    B_{DLS} \equiv \Big[ A + r y^\dagger ;\ \frac{\|r\|_2}{\|y\|_2}(I - y y^\dagger) \Big],   c_{DLS} \equiv \Big[ \frac{r}{\|y\|_2} ;\ 0 \Big].

This estimate \tilde{\mu}_{DLS}(y, \infty) has the property that \lim_{y \to \hat{x}} \tilde{\mu}_{DLS}(y, \infty)/\mu_{DLS}(y, \infty) = 1. To compare our estimate \tilde{\mu}(y, \gamma, \theta) with that in (43), we first take the limit as \gamma \to \infty to specialize our result to the DLS problem, and then take the limit as \theta \to \infty to restrict the perturbations to A

only. From (38) and (39), when \gamma \to \infty and then \theta \to \infty, we obtain

    \lim_{\theta \to \infty} \lim_{\gamma \to \infty} B = \Big[ A + r y^\dagger ;\ \frac{\|r\|_2}{\|y\|_2}(I - y y^\dagger) ;\ 0 \Big] = [ B_{DLS} ; 0 ],   \lim_{\theta \to \infty} \lim_{\gamma \to \infty} c = \Big[ \frac{r}{\|y\|_2} ;\ 0 ;\ 0 \Big] = [ c_{DLS} ; 0 ].

It immediately follows from (40) that \lim_{\theta \to \infty} \lim_{\gamma \to \infty} \tilde{\mu}(y, \gamma, \theta) = \tilde{\mu}_{DLS}(y, \infty).

7. NUMERICAL EXAMPLES

To illustrate our results we provide some numerical examples. Recall from Section 3 that the extended minimal backward error \mu(y, \gamma, \theta) in (10) is a lower bound on the true minimal backward error. Furthermore, from Theorem 3.2, if y is close enough to the true solution \hat{x}, then \mu(y, \gamma, \theta) is in fact the true minimal backward error. We will test how close y must be to \hat{x} for this to be the case. We can verify whether or not \mu(y, \gamma, \theta) is the true minimal backward error as follows. If the inequality in (7) holds with the optimal \hat{\Delta A} and \hat{\Delta b} in (14) and (15), then minimizing over C^+_{A,b}(\gamma) in (8) gives the same solution as minimizing over C_{A,b}(\gamma) in (7), hence \mu(y, \gamma, \theta) is indeed the true minimal backward error.

We performed our numerical examples in MATLAB on a 3.0 GHz Intel Pentium 4 processor running Gentoo Linux. The functions randn and rand below are MATLAB built-in functions that generate matrices whose elements are sampled from normal and uniform distributions, respectively. We create our test problems as follows. We create two types of matrices A \in R^{120 \times 80}:

Type 1: A = \tilde{A}/\|\tilde{A}\|_F, \tilde{A} = randn(120,80). Usually \kappa_2(A) \approx 10.

Type 2: A = \tilde{A}/\|\tilde{A}\|_F, \tilde{A} = U \Sigma V^T, where U and V are the Q-factors of the QR factorizations of the random matrices rand(120,120) and rand(80,80), respectively, and \Sigma = diag(\sigma_i) with logarithmically equally spaced singular values \sigma_i between 10^0 and 10^{-4}. Note that here \kappa_2(A) = 10^4.

In both cases we create b as follows: b = Ax, where x = [1, ..., 1]^T. We then perturb the data,

    A := A + \frac{1}{\sqrt{120 \cdot 80}}\,\delta_A\, rand(120,80),   b := b + \frac{1}{\sqrt{120}}\,\delta_b\, \|b\|_2\, rand(120,1),

so that the system is likely no longer compatible, with \delta_A = \delta_b = 10^{-6}, ..., 10^{-1}. For simplicity we use \gamma = \theta = 1. Results with other parameters \gamma and \theta are very similar. We compute the STLS solution \hat{x} by using the SVD of [A, b\gamma] (see [6]) and let the approximate solution y be

    y = \hat{x} + \frac{1}{\sqrt{80}}\,\delta_{\hat{x}}\, \|\hat{x}\|_2\, rand(80,1),

with \delta_{\hat{x}} = 0, 10^{-6}, ..., 10^{-1}. For each pair of \delta_A and \delta_{\hat{x}} and each type of matrix, we generate 1000 sample problems.

In our numerical tests, we use single precision to generate the data and to compute the STLS solution \hat{x}. When verifying whether the inequality in (7) holds with the optimal \hat{\Delta A} and \hat{\Delta b} in (14) and (15), we must take into account the fact that the computations involved in verifying the inequality are themselves subject to rounding errors. To see what effect this has on our results, we first use

single precision to generate the data A and b, and to compute \hat{x} and y. We then use both single and then double precision to compute the quantities

    \frac{\|b + \hat{\Delta b} - (A + \hat{\Delta A})y\|_2^2}{\gamma^{-2} + \|y\|_2^2}   and   \sigma_{\min}^2(A + \hat{\Delta A}).

The numbers of failures to satisfy the inequality in (7) with the optimal \hat{\Delta A} and \hat{\Delta b}, when verified using single precision (S) and double precision (D), are presented in Tables I and II. Tables I and II indicate that the inequality almost always holds when it is verified using double precision, and holds less frequently for the Type 2 matrices, which are more ill conditioned, when it is verified using single precision. This strongly suggests that the cases in which the inequality failed to hold in Tables I and II were likely due to rounding errors incurred in verifying the inequality itself.

Table I. Number of failures to satisfy the inequality in (7) out of 1000 samples for Type 1 A, for \delta_A = \delta_b = 10^{-6}, ..., 10^{-1} (columns, each verified in single (S) and double (D) precision) and \delta_{\hat{x}} = 0, 10^{-6}, ..., 10^{-1} (rows). [The numerical entries of the table are not reproduced here.]

Table II. Number of failures to satisfy the inequality in (7) out of 1000 samples for Type 2 A, with the same layout as Table I. [The numerical entries of the table are not reproduced here.]
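One sample of the Type 1 experiments, together with the check of the inequality in (7), can be set up along the following lines. This MATLAB sketch is indicative only: the variable names are ours, and dA and dB stand for the optimal perturbations of (14)-(15), assumed computed by a routine implementing Theorem 3.1.

    % Sketch: one Type 1 sample with gamma = theta = 1 and the inequality check in (7).
    m = 120;  n = 80;  gamma = 1;  theta = 1;
    dlA = 1e-6;  dlb = 1e-6;  dlx = 1e-6;            % delta_A, delta_b, delta_xhat
    At = randn(m, n);      A = At/norm(At, 'fro');   % Type 1 matrix
    x  = ones(n, 1);       b = A*x;                  % compatible system
    A  = A + (dlA/sqrt(m*n))*rand(m, n);             % perturb the data ...
    b  = b + (dlb/sqrt(m))*norm(b)*rand(m, 1);       % ... so it is likely incompatible
    [~, ~, V] = svd([A, gamma*b], 0);                % STLS solution via the SVD of [A, b*gamma]
    v  = V(:, n+1);        xhat = -v(1:n)/(gamma*v(n+1));
    y  = xhat + (dlx/sqrt(n))*norm(xhat)*rand(n, 1); % approximate solution
    % With the optimal perturbations dA, dB of (14)-(15) for (A, b, y):
    % lhs = norm(b + dB - (A + dA)*y)^2/(gamma^(-2) + norm(y)^2);
    % rhs = min(svd(A + dA))^2;
    % satisfied = lhs < rhs;                         % the inequality in (7)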

Figure 1. Type 1 A: \mu_{lb}(y, \gamma, \theta) (dots) and \tilde{\mu}(y, \gamma, \theta) (stars) versus \mu(y, \gamma, \theta).

Note that \hat{x} in our tests is not the exact STLS solution but a computed solution (computed by a numerically reliable algorithm). Therefore, each y is not a perturbation of the true STLS solution but rather of a computed solution. However, we note that repeating the above experiment with \hat{x} computed in double precision gave almost identical results. We conclude that when the approximate solution y is a reasonable approximation to the true STLS solution, it is usually reasonable to expect that \mu(y, \gamma, \theta) in (10) is indeed the true minimal backward error.

The formula for \mu(y, \gamma, \theta) in (13) involves the smallest singular value of an m \times (m + n + 1) matrix, which is expensive to compute directly. In Sections 5 and 6 we gave two estimates for

Figure 2. Type 2 A: \mu_{lb}(y, \gamma, \theta) (dots) and \tilde{\mu}(y, \gamma, \theta) (stars) versus \mu(y, \gamma, \theta).

\mu(y, \gamma, \theta): the lower bound \mu_{lb}(y, \gamma, \theta) in (25) and the asymptotic estimate \tilde{\mu}(y, \gamma, \theta) in (40). We test how good an approximation each of these quantities is to \mu(y, \gamma, \theta). In Figures 1 and 2 we plot the lower bound \mu_{lb}(y, \gamma, \theta) versus \mu(y, \gamma, \theta) as dots and the asymptotic estimate \tilde{\mu}(y, \gamma, \theta) versus \mu(y, \gamma, \theta) as stars. The diagonal line is plotted for reference. Here, we show four cases for each test problem, with all quantities computed in double precision. Results for all the other test problems were very similar.
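The comparison underlying Figures 1 and 2 can be reproduced for a single sample by combining the earlier sketches; the function names below are ours, not the paper's.

    % Sketch: one point of the Figure 1/2 comparison for data (A, b),
    % approximate solution y and gamma = theta = 1.
    mu   = stls_extended_backward_error(A, b, y, 1, 1);     % formula (13)
    mulb = stls_backward_error_lower_bound(A, b, y, 1, 1);  % lower bound (25)
    mut  = stls_backward_error_estimate(A, b, y, 1, 1);     % asymptotic estimate (40)
    fprintf('mu = %.3e   mu_lb = %.3e   mu_tilde = %.3e\n', mu, mulb, mut);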

These (as well as other) examples suggest that the lower bound \mu_{lb}(y, \gamma, \theta) gives good order of magnitude estimates of \mu(y, \gamma, \theta), while the asymptotic estimate \tilde{\mu}(y, \gamma, \theta) gives an excellent estimate of \mu(y, \gamma, \theta).

8. CONCLUSIONS

Given an approximate STLS solution y, we have found an expression for an extended minimal backward error \mu(y, \gamma, \theta) in (10). This is an asymptotically tight lower bound on the true minimal backward error. Our numerical tests suggest that when the approximate STLS solution y is a reasonable approximation to the true STLS solution \hat{x}, \mu(y, \gamma, \theta) is the true minimal backward error. We therefore believe that \mu(y, \gamma, \theta) can be used in practice as a measure of backward error. Since \mu(y, \gamma, \theta) is very expensive to compute directly, we have given a lower bound on it, \mu_{lb}(y, \gamma, \theta) in (25), as well as an asymptotic estimate for it, \tilde{\mu}(y, \gamma, \theta) in (40). In all our numerical tests the lower bound \mu_{lb}(y, \gamma, \theta) gives good order of magnitude estimates of \mu(y, \gamma, \theta), while the asymptotic estimate \tilde{\mu}(y, \gamma, \theta) gives an excellent estimate of \mu(y, \gamma, \theta). The lower bound can be computed very efficiently. In the future we intend to find efficient and reliable ways to compute the asymptotic estimate \tilde{\mu}(y, \gamma, \theta).

The main contribution of this paper is to provide a backward perturbation analysis for the STLS problem, and in so doing to unify the backward perturbation analyses for the OLS, DLS and TLS problems. In the extreme cases \gamma \to 0 and \gamma \to \infty, the STLS problem becomes the OLS and DLS problems, respectively. We have shown how \mu(y, \gamma, \theta), its lower bound \mu_{lb}(y, \gamma, \theta) and its asymptotic estimate \tilde{\mu}(y, \gamma, \theta) specialize to corresponding estimates of backward error in the literature for the OLS and DLS problems.

ACKNOWLEDGEMENTS

We are grateful to Chris Paige for his many valuable suggestions. We also thank the two referees for their helpful comments.

REFERENCES

1. Arioli M, Duff I, Ruiz D. Stopping criteria for iterative solvers. SIAM Journal on Matrix Analysis and Applications 1992; 13.
2. Chang X-W, Paige CC, Titley-Peloquin D. Stopping criteria for the iterative solution of linear least squares problems. SIAM Journal on Matrix Analysis and Applications, to appear.
3. Paige CC, Saunders MA. LSQR: an algorithm for sparse linear equations and sparse least squares. ACM Transactions on Mathematical Software 1982; 8.
4. Rigal JL, Gaches J. On the compatibility of a given solution with the data of a linear system. Journal of the ACM 1967; 14.
5. Chang X-W, Golub GH, Paige CC. Towards a backward perturbation analysis for data least squares problems. SIAM Journal on Matrix Analysis and Applications 2008; 30(4).
6. Paige CC, Strakoš Z. Scaled total least squares fundamentals. Numerische Mathematik 2002; 91.
7. Paige CC, Strakoš Z. Unifying least squares, total least squares and data least squares. In Total Least Squares and Errors-in-Variables Modeling, Van Huffel S, Lemmerling P (eds). Kluwer Academic Publishers: Dordrecht, 2002.
8. Rao BD. Unified treatment of LS, TLS and truncated SVD methods using a weighted TLS framework. In Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modelling, Van Huffel S (ed.). SIAM: Philadelphia, PA, 1997; 11-20.

9. Waldén B, Karlson R, Sun J-G. Optimal backward perturbation bounds for the linear least squares problem. Numerical Linear Algebra with Applications 1995; 2.
10. Golub GH, Van Loan CF. An analysis of the total least squares problem. SIAM Journal on Numerical Analysis 1980; 17.
11. van Huffel S, Vandewalle J. The Total Least Squares Problem: Computational Aspects and Analysis. SIAM: Philadelphia, PA, 1991.
12. Chang X-W, Paige CC, Titley-Peloquin D. Characterizing matrices that are consistent with given solutions. SIAM Journal on Matrix Analysis and Applications 2008; 30(4).
13. Horn RA, Johnson CR. Matrix Analysis. Cambridge University Press: New York, NY.
14. Higham NJ. Accuracy and Stability of Numerical Algorithms (2nd edn). SIAM: Philadelphia, PA, 2002.
15. Karlson R, Waldén B. Estimation of backward perturbation bounds for the linear least squares problem. BIT 1997; 37.
16. Grcar JF. Optimal sensitivity analysis of linear least squares. Technical Report LBNL-52434, Lawrence Berkeley National Laboratory.
17. Grcar JF, Saunders MA, Su Z. Estimates of optimal backward perturbations for linear least squares problems. Manuscript.
18. Gu M. Backward perturbation bounds for linear least squares problems. SIAM Journal on Matrix Analysis and Applications 1998; 20.
19. Su Z. Computational methods for least squares problems and clinical trials. Ph.D. Thesis, Scientific Computing and Computational Mathematics, Stanford University, 2005.


Jim Lambers MAT 610 Summer Session Lecture 1 Notes Jim Lambers MAT 60 Summer Session 2009-0 Lecture Notes Introduction This course is about numerical linear algebra, which is the study of the approximate solution of fundamental problems from linear algebra

More information

Multiplicative Perturbation Bounds of the Group Inverse and Oblique Projection

Multiplicative Perturbation Bounds of the Group Inverse and Oblique Projection Filomat 30: 06, 37 375 DOI 0.98/FIL67M Published by Faculty of Sciences Mathematics, University of Niš, Serbia Available at: http://www.pmf.ni.ac.rs/filomat Multiplicative Perturbation Bounds of the Group

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

A Smoothing Newton Method for Solving Absolute Value Equations

A Smoothing Newton Method for Solving Absolute Value Equations A Smoothing Newton Method for Solving Absolute Value Equations Xiaoqin Jiang Department of public basic, Wuhan Yangtze Business University, Wuhan 430065, P.R. China 392875220@qq.com Abstract: In this paper,

More information

Least Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo

Least Squares. Tom Lyche. October 26, Centre of Mathematics for Applications, Department of Informatics, University of Oslo Least Squares Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 26, 2010 Linear system Linear system Ax = b, A C m,n, b C m, x C n. under-determined

More information

WHEN MODIFIED GRAM-SCHMIDT GENERATES A WELL-CONDITIONED SET OF VECTORS

WHEN MODIFIED GRAM-SCHMIDT GENERATES A WELL-CONDITIONED SET OF VECTORS IMA Journal of Numerical Analysis (2002) 22, 1-8 WHEN MODIFIED GRAM-SCHMIDT GENERATES A WELL-CONDITIONED SET OF VECTORS L. Giraud and J. Langou Cerfacs, 42 Avenue Gaspard Coriolis, 31057 Toulouse Cedex

More information

We wish to solve a system of N simultaneous linear algebraic equations for the N unknowns x 1, x 2,...,x N, that are expressed in the general form

We wish to solve a system of N simultaneous linear algebraic equations for the N unknowns x 1, x 2,...,x N, that are expressed in the general form Linear algebra This chapter discusses the solution of sets of linear algebraic equations and defines basic vector/matrix operations The focus is upon elimination methods such as Gaussian elimination, and

More information

Linear Algebra (Review) Volker Tresp 2018

Linear Algebra (Review) Volker Tresp 2018 Linear Algebra (Review) Volker Tresp 2018 1 Vectors k, M, N are scalars A one-dimensional array c is a column vector. Thus in two dimensions, ( ) c1 c = c 2 c i is the i-th component of c c T = (c 1, c

More information

Computing least squares condition numbers on hybrid multicore/gpu systems

Computing least squares condition numbers on hybrid multicore/gpu systems Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning

More information

Lecture 2: Numerical linear algebra

Lecture 2: Numerical linear algebra Lecture 2: Numerical linear algebra QR factorization Eigenvalue decomposition Singular value decomposition Conditioning of a problem Floating point arithmetic and stability of an algorithm Linear algebra

More information

Numerical behavior of inexact linear solvers

Numerical behavior of inexact linear solvers Numerical behavior of inexact linear solvers Miro Rozložník joint results with Zhong-zhi Bai and Pavel Jiránek Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic The fourth

More information

Least-Squares Fitting of Model Parameters to Experimental Data

Least-Squares Fitting of Model Parameters to Experimental Data Least-Squares Fitting of Model Parameters to Experimental Data Div. of Mathematical Sciences, Dept of Engineering Sciences and Mathematics, LTU, room E193 Outline of talk What about Science and Scientific

More information

A Note on Simple Nonzero Finite Generalized Singular Values

A Note on Simple Nonzero Finite Generalized Singular Values A Note on Simple Nonzero Finite Generalized Singular Values Wei Ma Zheng-Jian Bai December 21 212 Abstract In this paper we study the sensitivity and second order perturbation expansions of simple nonzero

More information

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly.

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly. C PROPERTIES OF MATRICES 697 to whether the permutation i 1 i 2 i N is even or odd, respectively Note that I =1 Thus, for a 2 2 matrix, the determinant takes the form A = a 11 a 12 = a a 21 a 11 a 22 a

More information

14 Singular Value Decomposition

14 Singular Value Decomposition 14 Singular Value Decomposition For any high-dimensional data analysis, one s first thought should often be: can I use an SVD? The singular value decomposition is an invaluable analysis tool for dealing

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 The Singular Value Decomposition (SVD) continued Linear Algebra Methods for Data Mining, Spring 2007, University

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 1: Course Overview; Matrix Multiplication Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6 CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 6 GENE H GOLUB Issues with Floating-point Arithmetic We conclude our discussion of floating-point arithmetic by highlighting two issues that frequently

More information

Parallel Singular Value Decomposition. Jiaxing Tan

Parallel Singular Value Decomposition. Jiaxing Tan Parallel Singular Value Decomposition Jiaxing Tan Outline What is SVD? How to calculate SVD? How to parallelize SVD? Future Work What is SVD? Matrix Decomposition Eigen Decomposition A (non-zero) vector

More information

Least squares: the big idea

Least squares: the big idea Notes for 2016-02-22 Least squares: the big idea Least squares problems are a special sort of minimization problem. Suppose A R m n where m > n. In general, we cannot solve the overdetermined system Ax

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Numerical Linear Algebra Background Cho-Jui Hsieh UC Davis May 15, 2018 Linear Algebra Background Vectors A vector has a direction and a magnitude

More information

Numerical Methods I Singular Value Decomposition

Numerical Methods I Singular Value Decomposition Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 16 1 / 21 Overview

More information

Bounds for Levinger s function of nonnegative almost skew-symmetric matrices

Bounds for Levinger s function of nonnegative almost skew-symmetric matrices Linear Algebra and its Applications 416 006 759 77 www.elsevier.com/locate/laa Bounds for Levinger s function of nonnegative almost skew-symmetric matrices Panayiotis J. Psarrakos a, Michael J. Tsatsomeros

More information

Algorithm 853: an Efficient Algorithm for Solving Rank-Deficient Least Squares Problems

Algorithm 853: an Efficient Algorithm for Solving Rank-Deficient Least Squares Problems Algorithm 853: an Efficient Algorithm for Solving Rank-Deficient Least Squares Problems LESLIE FOSTER and RAJESH KOMMU San Jose State University Existing routines, such as xgelsy or xgelsd in LAPACK, for

More information

Two Characterizations of Matrices with the Perron-Frobenius Property

Two Characterizations of Matrices with the Perron-Frobenius Property NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2009;??:1 6 [Version: 2002/09/18 v1.02] Two Characterizations of Matrices with the Perron-Frobenius Property Abed Elhashash and Daniel

More information

MATH 5524 MATRIX THEORY Problem Set 4

MATH 5524 MATRIX THEORY Problem Set 4 MATH 5524 MATRIX THEORY Problem Set 4 Posted Tuesday 28 March 217. Due Tuesday 4 April 217. [Corrected 3 April 217.] [Late work is due on Wednesday 5 April.] Complete any four problems, 25 points each.

More information

REORTHOGONALIZATION FOR GOLUB KAHAN LANCZOS BIDIAGONAL REDUCTION: PART II SINGULAR VECTORS

REORTHOGONALIZATION FOR GOLUB KAHAN LANCZOS BIDIAGONAL REDUCTION: PART II SINGULAR VECTORS REORTHOGONALIZATION FOR GOLUB KAHAN LANCZOS BIDIAGONAL REDUCTION: PART II SINGULAR VECTORS JESSE L. BARLOW Department of Computer Science and Engineering, The Pennsylvania State University, University

More information

From Matrix to Tensor. Charles F. Van Loan

From Matrix to Tensor. Charles F. Van Loan From Matrix to Tensor Charles F. Van Loan Department of Computer Science January 28, 2016 From Matrix to Tensor From Tensor To Matrix 1 / 68 What is a Tensor? Instead of just A(i, j) it s A(i, j, k) or

More information