A PARTIAL CONDITION NUMBER FOR LINEAR LEAST SQUARES PROBLEMS


A PARTIAL CONDITION NUMBER FOR LINEAR LEAST SQUARES PROBLEMS
MARIO ARIOLI, MARC BABOULIN, AND SERGE GRATTON
CERFACS Technical Report TR/PA/04/, 2004. Also appeared as Rutherford Appleton Laboratory Technical Report RAL-TR.

Abstract. We consider here the linear least squares problem min_{y in R^n} ||Ay - b||_2, where b in R^m and A in R^{m x n} is a matrix of full column rank n, and we denote by x its solution. We assume that both A and b can be perturbed and that these perturbations are measured using the Frobenius or the spectral norm for A and the Euclidean norm for b. In this paper we are concerned with the condition number of a linear function of x, namely L^T x where L in R^{n x k}, for which we provide a sharp estimate that lies within a small factor of the true condition number. Provided the triangular factor R of A from A^T A = R^T R is available, this estimate can be computed in O(kn^2) flops. We also propose a statistical method that estimates the partial condition number by using the exact condition numbers in random orthogonal directions. If R is available, this statistical approach enables us to obtain a condition estimate at a lower computational cost. In the case of the Frobenius norm, we derive a closed formula for the partial condition number that is based on the singular values and the right singular vectors of the matrix A.

Keywords: linear least squares, normwise condition number, statistical condition estimate, parameter estimation.

1. Introduction. Perturbation theory has been applied to many problems of linear algebra such as linear systems, linear least squares, or eigenvalue problems [, 4, , 8]. In this paper we consider the problem of calculating the quantity L^T x, where x is the solution of the linear least squares problem (LLSP) min_{x in R^n} ||Ax - b||_2, where b in R^m and A in R^{m x n} is a matrix of full column rank n. This estimation is a fundamental problem of parameter estimation in the framework of the Gauss-Markov model [7, p 7]. More precisely, we focus here on the evaluation of the sensitivity of L^T x to small perturbations of the matrix A and/or the right-hand side b, where L in R^{n x k} and x is the solution of the LLSP. The interest in this question stems for instance from parameter estimation, where the parameters of the model can often be divided into two parts: the variables of physical significance and a set of ancillary variables involved in the model. For example, this situation occurs in the determination of positions using the GPS system, where the 3-D coordinates are the quantities of interest but the statistical model involves other parameters such as clock drift and GPS ambiguities [12] that are generally estimated during the solution process. It is then crucial to ensure that the solution components of interest can be computed with satisfactory accuracy. The main goal of this paper is to formalize this problem in terms of a condition number and to describe practical methods to compute or estimate this quantity. Note that, as far as the sensitivity of a subset of the solution components is concerned, the matrix L is a projection whose columns consist of vectors of the canonical basis of R^n.

The condition number of a map g : R^m -> R^n at y_0 measures the sensitivity of g(y_0) to perturbations of y_0. If we assume that the data space R^m and the solution space R^n are equipped respectively with the norms ||.||_D and ||.||_S, the condition number K(y_0) is defined by

   K(y_0) = lim_{delta -> 0}  sup_{0 < ||y_0 - y||_D <= delta}  ||g(y_0) - g(y)||_S / ||y_0 - y||_D,

whereas the relative condition number is defined by K^{(rel)}(y_0) = K(y_0) ||y_0||_D / ||g(y_0)||_S. This definition shows that K(y_0) measures an
asymptotic sensitivity and that this quantity depends on the chosen norms for the data and solution spaces. If g is a Fréchet-differentiable (F-differentiable)
Rutherford Appleton Laboratory, Oxfordshire, England (m.arioli@rl.ac.uk). CERFACS, 42 Avenue Gaspard Coriolis, 31057 Toulouse Cedex, France (baboulin@cerfacs.fr, gratton@cerfacs.fr).

function at y_0, then K(y_0) is the norm |||g'(y_0)||| of the F-derivative g'(y_0) (see [6]), where |||.||| is the operator norm induced by the choice of the norms on the data and solution spaces. For the full rank LLSP, we have g(A, b) = (A^T A)^{-1} A^T b. If we consider the product norm ||(A, b)|| = (||A||_F^2 + ||b||_2^2)^{1/2} for the data space and ||x||_2 for the solution space, then [8] gives an explicit formula for the relative condition number K^{(rel)}(A, b):

   K^{(rel)}(A, b) = ||A^+||_2 ( ||A^+||_2^2 ||r||_2^2 + ||x||_2^2 + 1 )^{1/2} ||(A, b)|| / ||x||_2,

where A^+ denotes the pseudo-inverse of A, r = b - Ax is the residual vector, and ||.||_F and ||.||_2 are respectively the Frobenius and Euclidean norms.

But does the value of K^{(rel)}(A, b) give us useful information about the sensitivity of L^T x? Can it in some cases overestimate the error in components or, on the contrary, be too optimistic? Let us consider the following example:

   A = ( ε ε 0 ε 0 ε ε ε ε ),   x = ( ε ε ε )^T   and   b = ( ε ε + ε ε + ε ε + ε )^T,

where x is the exact solution of the LLSP min_{x in R^3} ||Ax - b||_2. If we take ε = 10^{-8}, then we have x = (10^{-8}, 10^{-8}, 10^{-8})^T and the solution computed in Matlab with a machine precision of about 10^{-16} is x̃ = (1.5 x 10^{-8}, 0.5 x 10^{-8}, 10^{-8})^T. We can evaluate the LLSP condition number K^{(rel)}(A, b) from the formula above; the relative errors on the components of x are |x_1 - x̃_1|/|x_1| = |x_2 - x̃_2|/|x_2| = 0.5, while the error on x_3 is negligible. Then, if L consists of the first two columns of the identity matrix, we expect a large value for the condition number of L^T x because there is a 50% relative error on x_1 and x_2. If now L = (0, 0, 1)^T, then we expect the condition number of L^T x to be small because x̃_3 = x_3. For these two values of L, the LLSP condition number is far from giving a good idea of the sensitivity of L^T x. Note that in this case the perturbations are due to roundoff errors.

Let us now consider a simple example in the framework of parameter estimation where, in addition to roundoff errors, random errors are involved. Let b = {b_i}, i = 1, ..., 10, be a series of observed values depending on data s = {s_i}, where s_i = 10 + i, i = 1, ..., 10. We determine a third-degree polynomial that approximates b in the least squares sense, and we suppose that the following relationship holds:

   b = x_1 + x_2 s + x_3 s^2 + x_4 s^3,   with x_1 = x_2 = x_3 = x_4 = 1.

We assume that the perturbation on each b_i is 10^{-8} multiplied by a normally distributed random number and denote by b̃ = {b̃_i} the perturbed quantity. This corresponds to the LLSP min_{x in R^4} ||Ax - b̃||_2, where A is the Vandermonde matrix defined by A_ij = s_i^{j-1}. Let x̃ and ỹ be the computed solutions corresponding to two perturbed right-hand sides. Then we obtain the following relative errors on each component: |x̃_1 - ỹ_1|/|x_1| is about 10^{-7}, |x̃_2 - ỹ_2|/|x_2| about 6 x 10^{-6}, |x̃_3 - ỹ_3|/|x_3| about 6 x 10^{-5}, and |x̃_4 - ỹ_4|/|x_4| about 10^{-4}, whereas K^{(rel)}(A, b) is about 10^5. Given this disparity between the sensitivities of the components, we need a quantity that evaluates more precisely the sensitivity of each solution component of the LLSP.
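The component-wise sensitivity in the polynomial-fitting example can be reproduced with a few lines of code. The following is a minimal Python/NumPy sketch in the spirit of the experiment above; the grid s_i = 10 + i, the cubic model with unit coefficients and the 10^{-8} perturbation level follow the text, while the number of points, the random seed and all variable names are illustrative choices.

   import numpy as np

   rng = np.random.default_rng(0)
   m, n = 10, 4                          # number of observations and of coefficients (illustrative)
   s = 10.0 + np.arange(1, m + 1)        # data points s_i = 10 + i
   A = np.vander(s, n, increasing=True)  # Vandermonde matrix, A[i, j] = s_i**j
   x_exact = np.ones(n)                  # x_1 = x_2 = x_3 = x_4 = 1
   b = A @ x_exact

   # Two independently perturbed right-hand sides, perturbation 1e-8 * N(0, 1) on each entry
   b1 = b + 1e-8 * rng.standard_normal(m)
   b2 = b + 1e-8 * rng.standard_normal(m)

   x1 = np.linalg.lstsq(A, b1, rcond=None)[0]
   x2 = np.linalg.lstsq(A, b2, rcond=None)[0]

   # Relative difference on each solution component: the higher-degree coefficients
   # are typically far more sensitive than the constant term
   print(np.abs(x1 - x2) / np.abs(x_exact))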

The idea of analyzing the accuracy of some solution components in linear algebra is by no means new. For linear systems Ax = b, A in R^{n x n}, and for LLSP, [3] defines so-called componentwise condition numbers that correspond to amplification factors of the relative errors in solution components due to perturbations in the data A or b, and explains how to estimate them. In our formalism, these quantities are upper bounds of the condition number of L^T x where L is a column of the identity matrix. We also emphasize that the term componentwise refers here to the solution components and must be distinguished from the metric used for matrices, for which [21] provides a condition number for generalized inversion and linear least squares. For LLSP, [14] provides a statistical estimate for componentwise condition numbers due to either relative or structured perturbations. In the case of linear systems, [2] proposes a statistical approach, based on [13], that enables one to compute the condition number of L^T x in O(n^2) operations.

Our approach differs from the previous studies in the following aspects: we are interested in the condition of L^T x where L is a general matrix and not only a canonical vector of R^n, and we are looking for a condition number based on the Fréchet derivative, and not only for an upper bound of this quantity. We present in this paper three ways to obtain information on the condition of L^T x. The first one uses an explicit formula based on the singular value decomposition of A. The second is at the same time an upper bound of this condition number and a sharp estimate of it. The third method supplies a statistical estimate. The choice between these three methods will depend on the size of the problem (computational cost) and on the accuracy desired for this quantity.

This paper is organized as follows. In Section 2, we define the notion of a partial condition number. Then, when perturbations on A are measured using a Frobenius norm, we give a closed formula for this condition number in the general case where L in R^{n x k} and in the particular case where L in R^n. In Section 3, we establish bounds on the partial condition number in the Frobenius as well as in the spectral norm, and we show that these bounds can be considered as sharp estimates of it. In Section 4 we describe a statistical method that enables us to estimate the partial condition number. In Section 5 we present numerical results in order to compare the statistical estimate and the exact condition number on sample matrices A and L. In Section 6 we give a summary comparing the three ways to compute the condition of L^T x, as well as a numerical illustration. Finally, some concluding remarks are given in Section 7.

Throughout this paper we will use the following notation. We use the Frobenius norm ||.||_F and the spectral norm ||.||_2 on matrices and the usual Euclidean norm ||.||_2 on vectors. The matrix I is the identity matrix and e_i is the i-th canonical vector. We also denote by Im(A) the space spanned by the columns of A and by Ker(A) the null space of A.

2. The partial condition number of an LLSP. Let L be an n x k matrix, with k <= n. We consider the function

   g : R^{m x n} x R^m -> R^k,   (A, b) -> g(A, b) = L^T x(A, b) = L^T (A^T A)^{-1} A^T b.

Since A has full rank n, g is continuously F-differentiable in a neighbourhood of (A, b) and we denote by g' its F-derivative. Let α and β be two positive real numbers. In the present paper we consider the Euclidean norm for the solution space R^k. For the data space R^{m x n} x R^m, we use the product norms defined by

   ||(A, b)||_F = ( α^2 ||A||_F^2 + β^2 ||b||_2^2 )^{1/2},   α, β > 0,
and
   ||(A, b)||_2 = ( α^2 ||A||_2^2 + β^2 ||b||_2^2 )^{1/2},   α, β > 0.
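To fix ideas, the following minimal Python/NumPy sketch spells out the map g and the weighted product norm just defined; the function names and the default weights α = β = 1 are illustrative choices.

   import numpy as np

   def g(A, b, L):
       """g(A, b) = L^T x(A, b), with x the full-rank least squares solution."""
       x = np.linalg.lstsq(A, b, rcond=None)[0]
       return L.T @ x

   def product_norm_F(dA, db, alpha=1.0, beta=1.0):
       """Weighted product norm ||(dA, db)||_F = sqrt(alpha^2 ||dA||_F^2 + beta^2 ||db||_2^2)."""
       return np.hypot(alpha * np.linalg.norm(dA, 'fro'), beta * np.linalg.norm(db))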

These norms are very flexible since they allow us to monitor the perturbations on A and b. For instance, large values of α (resp. β) enable us to obtain condition number problems where mainly b (resp. A) is perturbed. A more general weighted Frobenius norm ||(A T, β b)||_F, where T is a positive diagonal matrix, is sometimes chosen. This is for instance the case in [20], which gives an explicit expression for the condition number of rank-deficient linear least squares problems using this norm.

According to [6], the absolute condition numbers of g at the point (A, b) using the two product norms defined above are given by

   κ_{g,F}(A, b) = max_{(ΔA, Δb) ≠ 0} ||g'(A, b).(ΔA, Δb)||_2 / ||(ΔA, Δb)||_F
and
   κ_{g,2}(A, b) = max_{(ΔA, Δb) ≠ 0} ||g'(A, b).(ΔA, Δb)||_2 / ||(ΔA, Δb)||_2.

The corresponding relative condition numbers of g at (A, b) are expressed by

   κ_{g,F}^{(rel)}(A, b) = κ_{g,F}(A, b) ||(A, b)||_F / ||g(A, b)||_2
and
   κ_{g,2}^{(rel)}(A, b) = κ_{g,2}(A, b) ||(A, b)||_2 / ||g(A, b)||_2.

We call the condition numbers related to L^T x(A, b) partial condition numbers of the LLSP with respect to the linear operator L. The partial condition number defined using the product norm ||(., .)||_F is given by the following theorem.

Theorem 1. Let A = U Σ V^T be the thin singular value decomposition of A defined in [7], with Σ = diag(σ_i) and σ_1 ≥ σ_2 ≥ ... ≥ σ_n > 0. The absolute condition number of g(A, b) = L^T x(A, b) is given by

   κ_{g,F}(A, b) = ||S V^T L||_2,

where S in R^{n x n} is the diagonal matrix with diagonal elements

   S_ii = (1/σ_i) ( (||r||_2^2/σ_i^2 + ||x||_2^2)/α^2 + 1/β^2 )^{1/2}.

Proof. The demonstration is divided into three parts. In Part 1, we establish an explicit formula for g'(A, b).(ΔA, Δb). In Part 2, we derive an upper bound for ||g'(A, b)||. In Part 3, we show that this bound is attained for a particular (ΔA, Δb).

Part 1: Let ΔA in R^{m x n} and Δb in R^m. Using the chain rule of composition of derivatives, we get

   g'(A, b).(ΔA, Δb) = -L^T (A^T A)^{-1} (ΔA^T A + A^T ΔA) (A^T A)^{-1} A^T b + L^T (A^T A)^{-1} ΔA^T b + L^T A^+ Δb,

i.e.

   g'(A, b).(ΔA, Δb) = L^T (A^T A)^{-1} ΔA^T r - L^T A^+ ΔA x + L^T A^+ Δb.

We write ΔA = ΔA_1 + ΔA_2 by defining ΔA_1 = A A^+ ΔA (the projection of ΔA onto Im(A)) and ΔA_2 = (I - A A^+) ΔA (the projection of ΔA onto Im(A)^⊥). We have ΔA_1^T r = 0 because r lies in Im(A)^⊥, and A^+ ΔA_2 = 0. Then we obtain

   g'(A, b).(ΔA, Δb) = L^T (A^T A)^{-1} ΔA_2^T r - L^T A^+ ΔA_1 x + L^T A^+ Δb.
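The closed-form directional derivative obtained in Part 1 can be checked numerically. The minimal Python/NumPy sketch below compares the formula L^T (A^T A)^{-1} ΔA^T r - L^T A^+ ΔA x + L^T A^+ Δb with a finite-difference quotient of g; the test dimensions, the step size and the random seed are illustrative choices.

   import numpy as np

   rng = np.random.default_rng(1)
   m, n, k = 8, 4, 2
   A = rng.standard_normal((m, n))
   b = rng.standard_normal(m)
   L = rng.standard_normal((n, k))
   dA = rng.standard_normal((m, n))
   db = rng.standard_normal(m)

   Apinv = np.linalg.pinv(A)              # A^+
   x = Apinv @ b                          # least squares solution
   r = b - A @ x                          # residual

   # Closed-form directional derivative from Part 1
   deriv = (L.T @ (np.linalg.inv(A.T @ A) @ (dA.T @ r))
            - L.T @ (Apinv @ (dA @ x))
            + L.T @ (Apinv @ db))

   # Finite-difference approximation of the same directional derivative
   t = 1e-7
   x_t = np.linalg.lstsq(A + t * dA, b + t * db, rcond=None)[0]
   fd = (L.T @ x_t - L.T @ x) / t

   print(np.linalg.norm(deriv - fd) / np.linalg.norm(deriv))   # should be small, of order t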

5 Part : We now prove that κ g, A, b) SV T L Let u i and v i be the i-th column of respectively U and V rom A = V Σ U T, we get AA = UU T = n u iu T i and since n v ivi T = I, we have A = n u iu T i A and A = I AA ) A n v ivi T Moreover, still using the thin SVD of A and A, it follows that 5 A T A) v i = v i σ i, A u i = v i σ i and A b = v i u T i b σ i 4) Thus ) becomes g A, b) A, b) = L T v i [v T i AT I AA ) r σ i n = L T v i y i, u T i A x σ i + u T i b σ i ] where we set y i = vi T AT I AA ) r u T σi i A x σ i + u T i b σ i R Thus if Y = y, y,, y n ) T, we get g A, b) A, b) = L T V Y and then g A, b) A, b) = L T V SS Y SV T L S Y We denote by w i = vt i AT I AA )r S iiσ i ut i Ax S iiσ i + ut i b S iiσ i the i-th component of S Y Then we have w i α v T i AT I AA ) T r αs ii σ i r α Sii σ4 i + x α S ii σ i + β Sii σ i + α u T i A x αs ii σ i + β u T i b βs ii σ i = S ii S ii α I AA ) Av i + α u T i A + β u T i b ) ) α I AA ) Av i + α u T i A + β u T i b ) Hence S Y n α I AA ) Av i + α u T i A + β u T i b = α I AA ) AV + α U T A + β U T b = α I AA ) A + α U T A + β U T b Since U T A = UU T A = AA A and U T b = UU T b b, we get S Y α A + α A + β b rom A = A + A, we get S Y A, b) and thus g A, b) A, b) SV T L A, b) So we have shown that SV T L is an upper bound for κ g, A, b) Part :

6 6 We now prove that this upper bound can be reached ie that SV T L A,b) A, b) = g A, b) for some A, b) R m n R m Let consider the particular choice of A, b) defined by holds A, b) = A + A, b) = α i α r r v T i + β i α u x T i, x γ i β u i) where α i, β i, γ i are real constants to be chosen in order to achieve the upper bound obtained in Part Since A T r = 0 and A A = 0, it follows from ) and 4) that g A, b) A, b) = L T A T A) = L T n = α i ασ i L T v i α i ασ i n α i α r vt i L T A v i r L T Thus by denoting ξ i = [L T r v i, L T x v ασi i X = α, β, γ,, α n, β n, γ n ) T R n we get n β i ασ i v i x + L T r β i ασ i x + γ i βσ i ) ασ i, LT v i βσ i β i α u i x + L T A γ i βσ i v i γ i β u i ] R k and Γ = [ξ,, ξ n ] R k n, and g A, b) A, b) = ΓX 5) ) ) Since i, j trace r r v T i )T r r v T i ) x = trace u T i x ) T x u T i x ) = δ ij ) where δ ij is the Kronecker symbol and trace r r v T i )T x u T i x ) = 0, then { r r v T i } x,,n and {u T i x },,n form an orthonormal set of matrices for the robenius norm and we get A = n α i + β i ) It follows that and Equation 5) yields We know that Γ = max X ΓX X n A, b) = n α i + n βi + γi = X, g A, b) A, b) A, b) = ΓX X is reached for some X = α, β, γ,, α n, β n, γ n ) T Then for the A, b) corresponding to this X, we have g A,b) A, b) A, b) = Γ urthermore we have ΓΓ T = L T v r α σ 4 + x α σ + β σ )v T L + + L T v n r α σn 4 + x α σn + β σn )vn T L = L T v Sv T L + + L T v n Snnv n T L = L T V S)SV T L) Hence Γ = ΓΓ T = SV T L

In other words, the perturbation (ΔA, Δb) associated with this X and the coefficients α_1, β_1, γ_1, ..., α_n, β_n, γ_n are such that ||g'(A, b).(ΔA, Δb)||_2 / ||(ΔA, Δb)||_F = ||S V^T L||_2. Thus ||S V^T L||_2 ≤ κ_{g,F}(A, b), which concludes the proof.

Remark 1. Let l_j be the j-th column of L, j = 1, ..., k. From

   S V^T L = ( S_11 v_1^T ; ... ; S_nn v_n^T ) ( l_1, ..., l_k ),   whose (i, j) entry is S_ii v_i^T l_j,

it follows that ||S V^T L||_2 is large when there exists at least one large S_ii and an l_j such that v_i^T l_j ≠ 0. In particular, the condition number of L^T x(A, b) is large when A has small singular values and L has components in the corresponding right singular vectors, or when r is large.

Remark 2. In the general case where L is an n x k matrix, the computation of κ_{g,F}(A, b) via the exact formula given in Theorem 1 requires the singular values and the right singular vectors of A, which might be expensive in practice since it involves O(mn^2) operations if we use an R-SVD algorithm and if m >> n (see [7, p 54]). If the LLSP is solved using a direct method, the R factor of the QR decomposition of A (or, equivalently in exact arithmetic, the Cholesky factor of A^T A) might be available. Since the right singular vectors of A are also those of R, the condition number can then be computed in about O(n^3) flops (using the Golub-Reinsch SVD, [7, p 54]). Using R is even more interesting when L lies in R^n, since from

   ||L^T A^+||_2 = ||R^{-T} L||_2   and   ||L^T (A^T A)^{-1}||_2 = ||R^{-1} (R^{-T} L)||_2,   (6)

it follows that the computation of κ_{g,F}(A, b) can be done by solving two successive n-by-n triangular systems, which involves about 2n^2 flops.
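The two norms appearing in (6) can be obtained exactly as Remark 2 suggests, by two successive triangular solves with the R factor. The following is a minimal Python/SciPy sketch of that computation; the function name, the use of scipy.linalg and the random test data are illustrative choices rather than the authors' implementation.

   import numpy as np
   from scipy.linalg import qr, solve_triangular

   def norms_from_R(R, L):
       """Return ||L^T A^+||_2 = ||R^{-T} L||_2 and ||L^T (A^T A)^{-1}||_2 = ||R^{-1} R^{-T} L||_2."""
       y = solve_triangular(R, L, trans='T')   # y = R^{-T} L  (first triangular solve)
       z = solve_triangular(R, y)              # z = R^{-1} y  (second triangular solve)
       return np.linalg.norm(y, 2), np.linalg.norm(z, 2)

   # Illustrative data: economy-size QR of a random full-rank matrix
   rng = np.random.default_rng(2)
   A = rng.standard_normal((20, 5))
   L = rng.standard_normal(5)                  # here L is a single vector, as in Remark 2
   _, R = qr(A, mode='economic')
   print(norms_from_R(R, L))

In practice R would already be available from the direct solution of the LLSP; the QR factorization above only serves to produce an R factor for the example.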

2.1. Special cases and GSVD. In this section, we analyze some special cases of practical relevance. Moreover, we relate the formula given in Theorem 1 for κ_{g,F}(A, b) to the Generalized Singular Value Decomposition (GSVD) ([1, p 57], [7, p 466], and [15, 19]). Using the GSVD of A and L^T, there exist orthogonal matrices U_A in R^{m x m} and U_L in R^{k x k} and an invertible matrix Z in R^{n x n} such that

   U_A^T A = ( D_A ; 0 ) Z   and   U_L^T L^T = ( D_L  0 ) Z,

with D_A = diag(α_1, ..., α_n), D_L = diag(β_1, ..., β_k), α_i^2 + β_i^2 = 1 for i = 1, ..., k, and α_i = 1 for i = k+1, ..., n. The diagonal matrix S can be decomposed into the product of two diagonal matrices, S = Σ^{-1} D, with

   D_ii = ( (||r||_2^2/σ_i^2 + ||x||_2^2)/α^2 + 1/β^2 )^{1/2}.

Then, taking into account the relations

   ||S V^T L||_2 = ||L^T V S||_2,   L^T V S = L^T V Σ^{-1} U^T U D = L^T A^+ U D,   L^T A^+ = U_L ( D_L  0 ) Z Z^{-1} ( D_A^{-1}  0 ) U_A^T,

we can represent κ_{g,F}(A, b) as

   κ_{g,F}(A, b) = ||T H D||_2,

where T in R^{k x k} is the diagonal matrix with T_ii = β_i/α_i, i = 1, ..., k, and H in R^{k x n} is H = ( I  0 ) U_A^T U. Note that ||L^T A^+||_2 = ||T||_2. We also point out that the diagonal entries of T are the nonzero generalized eigenvalues of λ A^T A z = L L^T z.

There are two interesting special cases where the expression of κ_{g,F}(A, b) is simpler. First, when r = 0, i.e. the LLSP is consistent, we have

   D = ( ||x||_2^2/α^2 + 1/β^2 )^{1/2} I   and   κ_{g,F}(A, b) = ||T H||_2 ( ||x||_2^2/α^2 + 1/β^2 )^{1/2}.

Second, if we allow only perturbations on b and if we use the expression of the derivative of g(A, b) obtained in the proof of Theorem 1, we get

   κ_{g,F}(A, b) = ||L^T A^+||_2 / β = ||T||_2 / β

(see Remark 4 in Section 3). Other relevant cases where the expression for κ_{g,F}(A, b) has a special interest are L = I and L a column vector. In the special case where L = I, the formula given by Theorem 1 becomes

   κ_{g,F}(A, b) = ||S V^T||_2 = ||S||_2 = max_i S_ii = (1/σ_n) ( (||r||_2^2/σ_n^2 + ||x||_2^2)/α^2 + 1/β^2 )^{1/2}.

Since ||A^+||_2 = 1/σ_n, we obtain that

   κ_{g,F}(A, b) = ||A^+||_2 ( (||A^+||_2^2 ||r||_2^2 + ||x||_2^2)/α^2 + 1/β^2 )^{1/2}.

This corresponds to the result known from [8] and also to a generalization of the formula of the condition number in Frobenius norm given in [6, p 9] (where only A was perturbed). Finally, let us study the particular case where L is a column vector, i.e. when g is a scalar-valued function.

Corollary 1. In the particular case when L is a vector (L in R^n), the absolute condition number of g(A, b) = L^T x(A, b) is given by

   κ_{g,F}(A, b) = ( ||L^T (A^T A)^{-1}||_2^2 ||r||_2^2/α^2 + ||L^T A^+||_2^2 ( ||x||_2^2/α^2 + 1/β^2 ) )^{1/2}.
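The closed formulas above are straightforward to evaluate once the thin SVD of A is available. The following minimal Python/NumPy sketch implements κ_{g,F}(A, b) = ||S V^T L||_2 from Theorem 1 and, for a single column L, compares it with the expression of Corollary 1; the helper name, the choice α = β = 1 and the random test data are illustrative choices.

   import numpy as np

   def kappa_F(A, b, L, alpha=1.0, beta=1.0):
       """Theorem 1: kappa_{g,F}(A, b) = ||S V^T L||_2, with S_ii as in the theorem."""
       U, sigma, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD, A = U diag(sigma) V^T
       x = Vt.T @ ((U.T @ b) / sigma)                          # least squares solution
       r = b - A @ x
       S = np.sqrt((r @ r / sigma**2 + x @ x) / alpha**2 + 1.0 / beta**2) / sigma
       return np.linalg.norm(S[:, None] * (Vt @ L), 2)

   # Comparison with Corollary 1 when L is a single column (alpha = beta = 1)
   rng = np.random.default_rng(3)
   A = rng.standard_normal((15, 4))
   b = rng.standard_normal(15)
   l = rng.standard_normal((4, 1))
   x = np.linalg.lstsq(A, b, rcond=None)[0]
   r = b - A @ x
   lhs = kappa_F(A, b, l)
   rhs = np.sqrt(np.linalg.norm(np.linalg.inv(A.T @ A) @ l)**2 * (r @ r)
                 + np.linalg.norm(np.linalg.pinv(A).T @ l)**2 * (x @ x + 1.0))
   print(lhs, rhs)   # the two values should agree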

9 9 Proof By replacing A T A) = V Σ V T and A = V Σ U T in the expression of K = L T A T A) r + L T A x + )) we get K = L T V Σ V T = L T V Σ = Σ V T L r α + L T V Σ U T x α + β ) r α + L T V Σ x α + β ) r α + Σ V T L x α + β ) By writing z,, z n ) T the vector V T L R n we obtain K = = = z i σ 4 i z i σ i S ii z i r α + = SV T L, z i σ i x α + β ) σ i r + x α + β ) and Theorem gives the result Sharp estimate of the partial condition number in robenius and spectral norms In many cases, obtaining a lower and/or an upper bound of κ g, A, b) is satisfactory when these bounds are tight enough and significantly cheaper to compute than the exact formula Moreover, many applications use condition numbers expressed in the spectral norm In the following theorem, we give sharp bounds for the partial condition numbers in the robenius and spectral norms Theorem The absolute condition numbers of ga, b) = L T xa, b) L R n k ) in the robenius and spectral norms can be respectively bounded as follows fa, b) κ g, A, b) fa, b) fa, b) κ g, A, b) fa, b) where fa, b) = L T A T A) r α + L T A x α + β ) ) Proof Part : We start by establishing the lower bounds Let w and w resp a and a ) be right resp the

10 0 left) singular vectors corresponding to the largest singular values of respectively L T A T A) and L T A We use a particular perturbation A, b) expressed as x T r A, b) = w T α r + ɛw, ɛ w α x β ), where ɛ = ± By replacing this value of A, b) in ) we get g A, b) A, b) = r α LT A T A) w + ɛ α x L T A T A) xw T r L T A r wt x α r ɛ x α LT A w ɛ β LT A w Since r ImA) we have A r = 0 Moreover we have w KerLT A ) and thus w ImA+T L) and can be written w = A+T Lδ for some δ R k Then w T r = δt L T A r = 0 It follows that g A, b) A, b) = r α LT A T A) w ɛ x α LT A w ɛ β LT A w rom L T A T A) w = L T A T A) a and L T A w = L T A a, we obtain g A, b) A, b) = L T A T A) r α a ɛ x α + β ) L T A a Since a and a are unit vectors, g A, b) A, b) can be be developed as g A, b) A, b) = L T A T A) r α + L T A x α + β ) ɛ L T A T A) r α x α + β ) L T A cosa, a ) By choosing ɛ = signcosa, a )) the third term of the above expression becomes positive urthermore we have x α + β ) x α + β Then we obtain ie g A, b) A, b) L T A T A) g A, b) A, b) fa, b) r α + L T A x α + β ) ) On the other hand, we have A = r w T α r + w x T α x r + ɛ trace w T ) T w x T ) ) α r α x and w β = β with r w T α r = w x T α x = α and trace r w T α r )T w x T ) ) = 0 α x

11 Then A, b) = and thus we have g A,b) A, b) A, b) fa,b) A,b) A, b) A, b) urthermore, from A, b) A, b) we get g A, b) the same particular value of A, b)) Then we obtain κ g, A, b) fa,b) and κ g, A, b) fa,b) for a particular value of fa,b) Part : Let us now establish the upper bound for κ g, A, b) and κ g, A, b) If A = AA A and A = I AA ) A, then it comes from ) that A, b) R m n R m for g A, b) A, b) L T A T A) A r + L T A A x + L T A b = Y X, where L T A T A) Y = r, α L T A x, α L T A β ) and X = α A, α A, β b ) T Hence, from the Cauchy-Schwarz inequality we get g A, b) A, b) Y X, ) with and X = α A + α A + β b α A + α A + β b Y = fa, b) Then, since A = A + A, we have X A, b) and ) yields g A, b) A, b) A, b) Y which implies that κ g, A, b) fa, b) An upper bound of κ g, A, b) can be computed in a similar manner: we get from ) that g A, b) A, b) L T A T A) r + L T A x ) A + L T A b = Y X, ) where Y L = T A T A) r + L T A x α, LT A β and X = α A, β b ) T Since X = A, b) we have κ g, A, b) Y Using then the inequality L T A T A) r + L T A ) x L T A T A) r + L T A ) x

12 we get Y Y and finally obtain κ g, A, b) fa, b) which concludes the proof Theorem shows that fa, b) can be considered as a very sharp estimate of the partial condition number expressed either in robenius or spectral norm Indeed, it lies within a factor of κ g, A, b) or κ g, A, b) Another observation is that we have 6 κ g, A, b) κ g, A, b) Thus even if the robenius and spectral norms of a given matrix can be very different for X R m n, we have X X n X ), the condition numbers expressed in both norms are of same order It results that a good estimate of κ g, A, b) is also a good estimate of κ g, A, b) Moreover 6) shows that if the R factor of A is available, fa, b) can be computed by solving two n-by-n triangular systems with k right-hand sides and thus the computational cost is kn Remark We can check on the following example that κ g, A, b) is not equal to fa, b) Let us consider We have and we get A = , L = 0 0 ) and b = / / x = /, / ) T and x = r =, κ g, A, b) = 45 4 < fa, b) = Remark 4 Using the definition of the condition number and of the product norms, tight estimates for the partial condition number for perturbations of A only resp b only) can be obtained by taking α > 0 and β = + resp β > 0 and α = + ) in Theorem In particular, when we perturb only b we have, with the notations of Section, L T A fa, b) = = T = κ g, A, b) β β Moreover, when r = 0 we have fa, b) = ) L T A x ) α + x β = T α + β Remark 5 In the special case where L = I, we have fa, b) = A T A) Since A T A) = A we obtain that r α + ) A x α + β ) fa, b) = A A r + x α + β

13 In that case κ g, A, b) is exactly equal to fa, b) due to [8] Regarding the condition number in spectral norm, since we have A, b) A, b) we get κ g, A, b) fa, b) This lower bound is similar to that obtained in [6] where only A is perturbed) As mentioned in [6], an upper bound of κ g, A) is κ u g,a) = A r + A x If we take α = and β = +, we notice that fa, b) κ u g,a) fa, b) showing thus that our upper bound and κ u g, A) are essentially the same Remark 6 Generalization to other product norms: Other product norms may have been used for the data space R m n R m If we consider a norm ν on R such that c νx, y) x + y c νx, y) then we can define a product norm A, b),ν = να A, β b ) or instance in [9], ν corresponds to Note that the product norm, ) used throughout this paper corresponds to ν = and that with the above notation we have A, b), = A, b) Then the following inequality holds c A, b),ν A, b) c A, b),ν If we denote κ g,,ν A, b) = max A, b) g A,b) A, b) A, b),ν we obtain κ g,,ν A, b) c κ g, A, b) κ g,,νa, b) c Using the bounds for κ g, given in Theorem we can obtain tight bounds for the partial condition number expressed using the product norm based on ν and when the perturbations on matrices are measured with the robenius norm: c fa, b) κ g,,ν A, b) c fa, b) Similarly, if the perturbations on matrices are measured with the spectral norm, we get c fa, b) κ g,,ν A, b) c fa, b) The bounds obtained for three possible product norms ν =, ν = and ν = ) are given in Table when using the robenius norm for matrices and in Table when using the spectral norm for matrices product norm ν, c, c lower bound upper bound factor of fa, b)) factor of fa, b)) max{α A, β b },, 6 α A + β b,,, α A + β b,, Table Bounds for partial condition number robenius norm on matrices) 4 Statistical estimation of the partial condition number In this section we compute a statistical estimate of the partial condition number We have seen in Section that using the robenius or the spectral norm for the matrices gives condition numbers that are of the same order of magnitude or sake of simplicity, we compute here a statistical estimate of κ g, A, b) Let z, z,, z q ) be an orthonormal basis for a subspace of dimension q q k) that has been randomly and uniformly selected from the space of all q-dimensional subspaces of R k this can be done by choosing q random vectors and then orthogonalizing) Let us denote g i A, b) = Lz i ) T xa, b)

14 4 product norm ν, c, c lower bound upper bound factor of fa, b)) factor of fa, b)) max{α A, β b },, 6 α A + β b,, α A + β b,, Table Bounds for partial condition number spectral norm on matrices) Since Lz i R n, the absolute condition number of g i can be computed via the exact formula given in Corollary ie κ gi, A, b) = Lzi ) T A T A) We define the random variable φq) by r α + Lzi ) T A )) x α + β 4) φq) = k q q κ gi, A, b) ) Let the operator E) denote the expected value The following proposition shows that the root mean squared of φq), defined by Rφq)) = Eφq) ) can be considered as an estimate for the condition number of ga, b) = L T xa, b) Proposition The absolute condition number can be bounded as follows: Rφq)) k κ g, A, b) Rφq)) 4) Proof Let vec be the operator that stacks the columns of a matrix into a long ) vector and M vecα A) be the k-by-mn + ) matrix such that vecg A, b) A, b)) = M Note that M vecβ b) depends on A, b, L and not on the z i Then we have: κ g, A, b) = g A, b) A, b) max vecg A, b) A, b)) = max ) A, b) A, b) A, b) vecα A) vecβ b) M z = max = M z R mn+),z 0 z = M T Let Z = [z, z,, z q ] be the k-by-q random matrix with orthonormal columns z i rom [0] it follows that k q M T Z is an unbiased estimator of the robenius norm of the mn + )-by-k matrix M T ie we have E k q M T Z ) = M T rom M T Z = Z T M z T M = zq T M

15 5 we get, since zi T M is a row vector, M T Z = q z T i M We notice that for all vector u R k, if we consider the function g u A, b) = u T ga, b), then we have u T M = g u A, b) = κ g u, A, b) and therefore z T i M = κ gi, A, b) Eventually we obtain M T = Ek q q κ gi, A, b) ) = Eφq) ) Moreover, considering that M T R mn+) k and using the well-known inequality M T k M T M T, we get the result 4 Then we will consider φq) A,b) L T x as an estimator of κ rel) g, A, b) The root mean squared of φq) is an upper bound of κ g A, b), and estimates κ g, A, b) within a factor k Proposition involves the computation of the the condition number of each g i A, b), i =,, q rom Remark, it follows that the computational cost of each κ gi, A, b) is n if the R factor of the QR decomposition of A is available) Hence, for a given sample of vectors z i, i =,, q, computing φq) requires about qn flops However, Proposition is mostly of theoretical interest, since it relies on the computation of the root mean squared of a random variable, without providing a practical method to obtain it In the next proposition, the use of the small sample estimate theory developed by Kenney and Laub [0] gives a first answer to this question by showing that the evaluation of φq) using only one sample of q vectors z, z,, z q in the unit sphere may provide an acceptable estimate Proposition Using conjecture [0, p 78], we have the following result: or any α > 0, ) φq) P r α k κ g, A, b) αφq) α q This probability approaches very fast as q increases or α = and q = the probability for φq) to estimate κ g, A, b) within a factor k is 999% Proof We define as in the proof of Proposition the matrix M as the matrix related to the vec operation representing the linear operator g A, b) rom [0, 4) p 78 and 9) p 78] we get P r M T α φq) α M T ) α q 4) We have seen in the proof of Proposition that κ g, A, b) = M T Then we have κ g, A, b) M T κ g, A, b) k It follows that, for the random variable φq), we have κg, A, b) P r φq) ακ g, A, b) ) M T k P r α α φq) α M T )

16 6 Then we obtain the result from κg, A, b) P r φq) ακ g, A, b) ) k α ) φq) = P r α k κ g, A, b) αφq) We see from this proposition that it may not be necessary to estimate the root mean squared of φq) using sophisticated algorithms Indeed only one sample of φq) obtained for q = provides an estimate of κ g, A, b) within a factor α k Remark 7 If k = then Z = and the problem is reduced to computing κ g A, b) In this case, φ) is exactly the partial condition number of L T xa, b) Remark 8 Concerning the computation of the statistical estimate in the presence of roundofferrors, the numerical reliability of the statistical estimate relies on an accurate computation of the κ gi, A, b) for a given z i Let A be a 7-by- Vandermonde matrix, b a random vector and L R n the right singular vector v n Using the Mathematica software that computes in exact arithmetic, we obtained κ rel) g, A, b) If the triangular factor R form A T A = R T R is obtained by the QR decomposition of A, we get κ rel) g, A, b) 5 08 If R is computed via a classical Cholesky factorization, we get κ g, A, b) rel) 0 0 Corollary and Remark show that the computation of κ g, A, b) rel) involves linear systems of the type A T Ax = d, which differs from the usual normal equation for least squares in their right-hand side Our observation that for this kind of ill-conditioned systems, a QR factorization is more accurate than a Cholesky factorization is in agreement with [5] 5 Numerical experiments All experiments were performed in Matlab 65 using a machine precision Examples or the examples of Section, we compute the partial condition number using the formula given in Theorem In the first example we have A = ɛ ɛ 0 ɛ 0 ɛ ɛ ɛ ɛ and we assume that only A is perturbed If we consider the values for L that are 0 0 and 0 0 L = 0, 0, ) T then we obtain partial condition numbers κ rel) g, A) that are respectively 04 and, as expected since there is 50% relative error on x and x and there is no error on x In the second example where A is the 0 by 4 Vandermonde matrix defined by A ij = 0+i) j and only b is perturbed, the partial condition numbers κ rel) g, b) with respect to each component x, x, x, x 4 are respectively 45 0, 0 4, 0 5, which is consistent with the error variation given in Section for each component 5 Average behaviour of the statistical estimate We compare here the statistical estimate described in the previous section with the partial condition number obtained via the exact formula given in Theorem We suppose that only A is perturbed and then the partial condition number can be expressed as κ rel) g, A) We use the method described in [6] in order to construct test problems [A, x, r, b] = P m, n, n r, l) with ) D A = Y Z 0 T R m n, Y = I yy T, Z = I zz T,

17 where y R m and z R n are random unit vectors ) and D = n l diagn l, n ) l,, ) 0 x =,,, n ) T is given and r = Y R c m is computed with c R m n random vector ) DZx of norm n r The right-hand side is b = Y By construction, the condition number of A c and D is n l In our experiments, we consider the matrices ) ) A E A = I and L =, E A 0 where A R m n, A R m n, L R n n, m + m = m, n + n = n, and E and E contain the same element e p which defines the coupling between A and A The matrices A and A are randomly generated using respectively P m, n, n r, l ) and P m, n, n r, l ) or each sample matrix, we compute in Matlab: the partial condition number κ rel) g, A) using the exact formula given in Theorem and based on the singular value decomposition of A, the statistical estimate φ) using three random orthogonal vectors and computing each κ gi, A, b), i =, with the R factor of the QR decomposition of A These data are then compared by computing the ratio γ = 7 φ) κ rel) g, A) Table 5 contains the mean γ and the standard deviation s of γ obtained on 000 random matrices l with m =, n = 0, m = 7, n = by varying the condition numbers n l and n of respectively A and A and the coupling coefficient e p The residual norms are set to n r = n r = In all cases, γ is close to and s is about 0 The statistical estimate φ) lies within a factor of κ rel) g, A) which is very accurate in condition number estimation We notice that in two cases, φ) is lower than This is possible because Proposition shows that Eφ) ) is an upper bound of κ g, A) but not necessarily φ) condition e p = 0 5 e p = e p = 0 5 l l γ s γ s γ s Table 5 Ratio between statistical and exact condition number of L T x 6 Estimates vs exact formula We assume that the R factor of the QR decomposition of A is known We gather in Table 6 the results obtained in this paper in terms of accuracy and flops counts for the estimation of the partial condition number for the LLSP Table 6 gives the estimates and flops counts in the particular situation where A = m = 500, n = 000, k = 50, 0 0 ) 0, L = 0 ),

   κ_{g,F}(A, b)               flops                                           accuracy
   exact formula               n^3 (mn^2 if the SVD of A itself is computed)   exact value
   sharp estimate f(A, b)      kn^2                                            f(A, b) lies within a small constant factor of κ_{g,F}(A, b)
   statistical estimate φ(q)   qn^2                                            Pr( φ(q)/(α sqrt(k)) ≤ κ_{g,F}(A, b) ≤ α φ(q) ) close to 1, for α > 0
   Table 6.1. Comparison between the exact formula and the estimates for κ_{g,F}(A, b).

A = ( A_1  0 ; 0  I ) and b = (1, ..., 1)^T, L = ( L_1  0 ; 0  I ). We see here that the statistical estimate may provide information on the condition number using a very small amount of floating point operations compared with the two other methods.

   κ_{g,F}^{(rel)}(A, b)    f(A, b) ||(A, b)||_F / ||L^T x||_2    φ(q) ||(A, b)||_F / ||L^T x||_2
   Gflops                   00 Mflops                             6 Mflops
   Table 6.2. Flops and accuracy: exact formula vs. estimates.

7. Conclusion. We have shown the relevance of the partial condition number for test cases coming from parameter estimation. This partial condition number evaluates the sensitivity of L^T x, where x is the solution of an LLSP, when A and/or b are perturbed. It can be computed via a closed formula, a sharp estimate, or a statistical estimate. The choice will depend on the size of the LLSP and on the accuracy needed. The closed formula requires O(n^3) flops and is affordable for small problems only. The sharp estimate and the statistical estimate will be preferred for larger problems, especially if k << n, since their computational cost is in O(n^2).

REFERENCES

[1] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
[2] Y. Cao and L. Petzold, A subspace error estimate for linear systems, SIAM J. Matrix Analysis and Applications, 24 (2003).
[3] S. Chandrasekaran and I. C. Ipsen, On the sensitivity of solution components in linear systems of equations, Numerical Linear Algebra with Applications, 2 (1995), pp. 271-286.
[4] L. Eldén, Perturbation theory for the least squares problem with linear equality constraints, SIAM Journal on Numerical Analysis, 17 (1980), pp. 338-350.
[5] V. Frayssé, S. Gratton, and V. Toumazou, Structured backward error and condition number for linear systems of the type A*Ax = b, BIT, pp. 74-8.
[6] A. J. Geurts, A contribution to the theory of condition, Numerische Mathematik, 39 (1982).
[7] G. Golub and C. van Loan, Matrix Computations, third edition, The Johns Hopkins University Press, 1996.
[8] S. Gratton, On the condition number of linear least squares problems in a weighted Frobenius norm, BIT, 36 (1996), pp. 523-530.
[9] J. Grcar, Adjoint formulas for condition numbers applied to linear and indefinite least squares, Technical Report LBNL-55, Lawrence Berkeley National Laboratory, 2005.
[10] T. Gudmundsson, C. S. Kenney, and A. J. Laub, Small-sample statistical estimates for matrix norms, SIAM J. Matrix Analysis and Applications, 16 (1995).
[11] N. Higham, Accuracy and Stability of Numerical Algorithms, second edition, SIAM, Philadelphia, 2002.

[12] E. D. Kaplan, Understanding GPS: Principles and Applications, Artech House Publishers, Boston, 1996.
[13] C. S. Kenney and A. J. Laub, Small-sample statistical condition estimates for general matrix functions, SIAM J. Sci. Comput., 15 (1994), pp. 36-61.
[14] C. S. Kenney, A. J. Laub, and M. S. Reese, Statistical condition estimation for linear least squares, SIAM J. Matrix Analysis and Applications, 19 (1998).
[15] C. C. Paige and M. A. Saunders, Towards a generalized singular value decomposition, SIAM Journal on Numerical Analysis, 18 (1981).
[16] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Software, 8 (1982), pp. 43-71.
[17] C. R. Rao and S. K. Mitra, Generalized Inverse of Matrices and Its Applications, Wiley, New York, 1971.
[18] G. W. Stewart and J. Sun, Matrix Perturbation Theory, Academic Press, New York, 1990.
[19] C. Van Loan, Generalizing the singular value decomposition, SIAM Journal on Numerical Analysis, 13 (1976), pp. 76-83.
[20] Y. Wei, H. Diao, and S. Qiao, Condition number for weighted linear least squares problem and its condition number, Technical Report CAS 04-0-SQ, Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada, 2004.
[21] Y. Wei, W. Xu, S. Qiao, and H. Diao, Componentwise condition numbers for generalized matrix inversion and linear least squares, Technical Report CAS 0--SQ, Department of Computing and Software, McMaster University, Hamilton, Ontario, Canada, 2003.


AM 205: lecture 6. Last time: finished the data fitting topic Today s lecture: numerical linear algebra, LU factorization AM 205: lecture 6 Last time: finished the data fitting topic Today s lecture: numerical linear algebra, LU factorization Unit II: Numerical Linear Algebra Motivation Almost everything in Scientific Computing

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition

AM 205: lecture 8. Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition AM 205: lecture 8 Last time: Cholesky factorization, QR factorization Today: how to compute the QR factorization, the Singular Value Decomposition QR Factorization A matrix A R m n, m n, can be factorized

More information

Inverses. Stephen Boyd. EE103 Stanford University. October 28, 2017

Inverses. Stephen Boyd. EE103 Stanford University. October 28, 2017 Inverses Stephen Boyd EE103 Stanford University October 28, 2017 Outline Left and right inverses Inverse Solving linear equations Examples Pseudo-inverse Left and right inverses 2 Left inverses a number

More information

2. Review of Linear Algebra

2. Review of Linear Algebra 2. Review of Linear Algebra ECE 83, Spring 217 In this course we will represent signals as vectors and operators (e.g., filters, transforms, etc) as matrices. This lecture reviews basic concepts from linear

More information

OPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY

OPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY published in IMA Journal of Numerical Analysis (IMAJNA), Vol. 23, 1-9, 23. OPTIMAL SCALING FOR P -NORMS AND COMPONENTWISE DISTANCE TO SINGULARITY SIEGFRIED M. RUMP Abstract. In this note we give lower

More information

Review of Some Concepts from Linear Algebra: Part 2

Review of Some Concepts from Linear Algebra: Part 2 Review of Some Concepts from Linear Algebra: Part 2 Department of Mathematics Boise State University January 16, 2019 Math 566 Linear Algebra Review: Part 2 January 16, 2019 1 / 22 Vector spaces A set

More information

Lanczos tridigonalization and Golub - Kahan bidiagonalization: Ideas, connections and impact

Lanczos tridigonalization and Golub - Kahan bidiagonalization: Ideas, connections and impact Lanczos tridigonalization and Golub - Kahan bidiagonalization: Ideas, connections and impact Zdeněk Strakoš Academy of Sciences and Charles University, Prague http://www.cs.cas.cz/ strakos Hong Kong, February

More information

Properties of Matrices and Operations on Matrices

Properties of Matrices and Operations on Matrices Properties of Matrices and Operations on Matrices A common data structure for statistical analysis is a rectangular array or matris. Rows represent individual observational units, or just observations,

More information

Applied Linear Algebra in Geoscience Using MATLAB

Applied Linear Algebra in Geoscience Using MATLAB Applied Linear Algebra in Geoscience Using MATLAB Contents Getting Started Creating Arrays Mathematical Operations with Arrays Using Script Files and Managing Data Two-Dimensional Plots Programming in

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

Backward perturbation analysis for scaled total least-squares problems

Backward perturbation analysis for scaled total least-squares problems NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 009; 16:67 648 Published online 5 March 009 in Wiley InterScience (www.interscience.wiley.com)..640 Backward perturbation analysis

More information

Problem # Max points possible Actual score Total 120

Problem # Max points possible Actual score Total 120 FINAL EXAMINATION - MATH 2121, FALL 2017. Name: ID#: Email: Lecture & Tutorial: Problem # Max points possible Actual score 1 15 2 15 3 10 4 15 5 15 6 15 7 10 8 10 9 15 Total 120 You have 180 minutes to

More information

UNIFYING LEAST SQUARES, TOTAL LEAST SQUARES AND DATA LEAST SQUARES

UNIFYING LEAST SQUARES, TOTAL LEAST SQUARES AND DATA LEAST SQUARES UNIFYING LEAST SQUARES, TOTAL LEAST SQUARES AND DATA LEAST SQUARES Christopher C. Paige School of Computer Science, McGill University, Montreal, Quebec, Canada, H3A 2A7 paige@cs.mcgill.ca Zdeněk Strakoš

More information

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F is R or C. Definition 1. A linear operator

More information

Rounding error analysis of the classical Gram-Schmidt orthogonalization process

Rounding error analysis of the classical Gram-Schmidt orthogonalization process Cerfacs Technical report TR-PA-04-77 submitted to Numerische Mathematik manuscript No. 5271 Rounding error analysis of the classical Gram-Schmidt orthogonalization process Luc Giraud 1, Julien Langou 2,

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013.

The University of Texas at Austin Department of Electrical and Computer Engineering. EE381V: Large Scale Learning Spring 2013. The University of Texas at Austin Department of Electrical and Computer Engineering EE381V: Large Scale Learning Spring 2013 Assignment Two Caramanis/Sanghavi Due: Tuesday, Feb. 19, 2013. Computational

More information

MULTIPLICATIVE PERTURBATION ANALYSIS FOR QR FACTORIZATIONS. Xiao-Wen Chang. Ren-Cang Li. (Communicated by Wenyu Sun)

MULTIPLICATIVE PERTURBATION ANALYSIS FOR QR FACTORIZATIONS. Xiao-Wen Chang. Ren-Cang Li. (Communicated by Wenyu Sun) NUMERICAL ALGEBRA, doi:10.3934/naco.011.1.301 CONTROL AND OPTIMIZATION Volume 1, Number, June 011 pp. 301 316 MULTIPLICATIVE PERTURBATION ANALYSIS FOR QR FACTORIZATIONS Xiao-Wen Chang School of Computer

More information

ETNA Kent State University

ETNA Kent State University C 8 Electronic Transactions on Numerical Analysis. Volume 17, pp. 76-2, 2004. Copyright 2004,. ISSN 1068-613. etnamcs.kent.edu STRONG RANK REVEALING CHOLESKY FACTORIZATION M. GU AND L. MIRANIAN Abstract.

More information

MOORE-PENROSE INVERSE IN AN INDEFINITE INNER PRODUCT SPACE

MOORE-PENROSE INVERSE IN AN INDEFINITE INNER PRODUCT SPACE J. Appl. Math. & Computing Vol. 19(2005), No. 1-2, pp. 297-310 MOORE-PENROSE INVERSE IN AN INDEFINITE INNER PRODUCT SPACE K. KAMARAJ AND K. C. SIVAKUMAR Abstract. The concept of the Moore-Penrose inverse

More information

Notes on Solving Linear Least-Squares Problems

Notes on Solving Linear Least-Squares Problems Notes on Solving Linear Least-Squares Problems Robert A. van de Geijn The University of Texas at Austin Austin, TX 7871 October 1, 14 NOTE: I have not thoroughly proof-read these notes!!! 1 Motivation

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME MS&E 38 (CME 338 Large-Scale Numerical Optimization Course description Instructor: Michael Saunders Spring 28 Notes : Review The course teaches

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

7. Symmetric Matrices and Quadratic Forms

7. Symmetric Matrices and Quadratic Forms Linear Algebra 7. Symmetric Matrices and Quadratic Forms CSIE NCU 1 7. Symmetric Matrices and Quadratic Forms 7.1 Diagonalization of symmetric matrices 2 7.2 Quadratic forms.. 9 7.4 The singular value

More information

A Note on Inverse Iteration

A Note on Inverse Iteration A Note on Inverse Iteration Klaus Neymeyr Universität Rostock, Fachbereich Mathematik, Universitätsplatz 1, 18051 Rostock, Germany; SUMMARY Inverse iteration, if applied to a symmetric positive definite

More information

LinGloss. A glossary of linear algebra

LinGloss. A glossary of linear algebra LinGloss A glossary of linear algebra Contents: Decompositions Types of Matrices Theorems Other objects? Quasi-triangular A matrix A is quasi-triangular iff it is a triangular matrix except its diagonal

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,

More information

Lecture 3: Review of Linear Algebra

Lecture 3: Review of Linear Algebra ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak, scribe: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters,

More information

On The Belonging Of A Perturbed Vector To A Subspace From A Numerical View Point

On The Belonging Of A Perturbed Vector To A Subspace From A Numerical View Point Applied Mathematics E-Notes, 7(007), 65-70 c ISSN 1607-510 Available free at mirror sites of http://www.math.nthu.edu.tw/ amen/ On The Belonging Of A Perturbed Vector To A Subspace From A Numerical View

More information

B553 Lecture 5: Matrix Algebra Review

B553 Lecture 5: Matrix Algebra Review B553 Lecture 5: Matrix Algebra Review Kris Hauser January 19, 2012 We have seen in prior lectures how vectors represent points in R n and gradients of functions. Matrices represent linear transformations

More information

Lecture 1: Review of linear algebra

Lecture 1: Review of linear algebra Lecture 1: Review of linear algebra Linear functions and linearization Inverse matrix, least-squares and least-norm solutions Subspaces, basis, and dimension Change of basis and similarity transformations

More information

G1110 & 852G1 Numerical Linear Algebra

G1110 & 852G1 Numerical Linear Algebra The University of Sussex Department of Mathematics G & 85G Numerical Linear Algebra Lecture Notes Autumn Term Kerstin Hesse (w aw S w a w w (w aw H(wa = (w aw + w Figure : Geometric explanation of the

More information

Chapter 3. Matrices. 3.1 Matrices

Chapter 3. Matrices. 3.1 Matrices 40 Chapter 3 Matrices 3.1 Matrices Definition 3.1 Matrix) A matrix A is a rectangular array of m n real numbers {a ij } written as a 11 a 12 a 1n a 21 a 22 a 2n A =.... a m1 a m2 a mn The array has m rows

More information

Basic Elements of Linear Algebra

Basic Elements of Linear Algebra A Basic Review of Linear Algebra Nick West nickwest@stanfordedu September 16, 2010 Part I Basic Elements of Linear Algebra Although the subject of linear algebra is much broader than just vectors and matrices,

More information