Matrices and Moments: Least Squares Problems


Matrices and Moments: Least Squares Problems
Gene Golub, SungEun Jo, Zheng Su
(comparative study with David Gleich)
Computer Science Department, Stanford University

Outline of Part I: Review of least squares problems and Gauss quadrature rules

Least Squares Problems
- Ordinary/data/total least squares
- SVD solutions
- Secular equation approaches
- Inverse least squares

Gauss Quadrature Rules
- Gauss quadrature theory
- Tri-diagonalization for orthonormal polynomials
- Inverse eigenvalue problem for the Gauss-Radau rule

Outline of Part II: Application to solving secular equations

Interpolating secular equations
- A common term of the secular functions in TLS/DLS
- Variations of Newton's method
- Seeking the smallest root

Conjugate Gradient method with Quadrature (CGQ)
- Solving TLS and DLS by quadrature and the CG method
- Alternative implementations

Comparative study (with David Gleich)
- Large-scale TLS algorithms
- Comparison of large-scale TLS results

Numerical example
- CG-based algorithms

Conclusion

Part I: Review of Least Squares Problems and Gauss Quadrature Rules

Approximation problem

Approximation problems for a linear system: $Ax \approx b$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m \times 1}$, $m > n$.

Notation:
- $A$ and $b$: given data
- $x$: solution to determine
- $(\cdot)^T$: transpose
- $\mathrm{Tr}[\cdot]$: the sum of the diagonal entries of a matrix
- $\|\cdot\|_2$: two-norm of a vector
- $\|A\|_F = \mathrm{Tr}[A^T A]^{1/2}$: Frobenius norm of a matrix
- $\|\Delta \cdot\|_2$ or $\|\Delta \cdot\|_F$: Euclidean/Frobenius norm of the residual quantities $\Delta A$, $\Delta b$
- OLS/DLS/TLS: ordinary/data/total least squares

Statements and geometric equivalences

Table: Problem statements and geometrically equivalent statements

OLS: $\min_{x, \Delta b} \|\Delta b\|_2$ s.t. $Ax = b + \Delta b$;
geometric equivalence: $\min_x \|Ax - b\|_2^2$.

TLS: $\min_{x, \Delta A, \Delta b} \|[\Delta A, \Delta b]\|_F$ s.t. $(A + \Delta A)x = b + \Delta b$;
geometric equivalence: $\min_x \dfrac{\|Ax - b\|_2^2}{\|x\|_2^2 + 1}$.

DLS: $\min_{x, \Delta A} \|\Delta A\|_F$ s.t. $(A + \Delta A)x = b$;
geometric equivalence: $\min_x \dfrac{\|Ax - b\|_2^2}{\|x\|_2^2}$.

(The TLS/DLS equivalent statements are derived by means of the Lagrange method [GV96, DD93, JK05].)

- Ordinary Least Squares (OLS): correcting with $\Delta b$
- Data Least Squares (DLS): correcting with $\Delta A$
- Total Least Squares (TLS): correcting with $\Delta A$ and $\Delta b$

TLS is also known as errors-in-variables modeling.

Singular value decomposition (SVD) approach

Table: Singular value decomposition approach

TLS: $\sigma_{TLS}^2 = \min_x \dfrac{\|Ax - b\|_2^2}{\|x\|_2^2 + 1}$;  DLS: $\sigma_{DLS}^2 = \min_x \dfrac{\|Ax - b\|_2^2}{\|x\|_2^2}$.

1. Equivalent singular value problem and feasibility condition, where $v_{TLS}$ and $v_{DLS}$ are the right singular vectors associated with the smallest singular values of $[A, b]$ and $P_b^{\perp} A$, respectively, $\sigma_{\min}$ denotes the minimum singular value, and $P_b^{\perp} = I - \frac{1}{b^T b} b b^T$:
   TLS: $\sigma_{\min}([A, b])$ s.t. $v_{TLS}(n+1) \neq 0$;  DLS: $\sigma_{\min}(P_b^{\perp} A)$ s.t. $b^T A v_{DLS} \neq 0$.
2. SVD solution: $x_{TLS} = -\dfrac{1}{v_{TLS}(n+1)}\, v_{TLS}(1:n)$;  $x_{DLS} = \dfrac{b^T b}{b^T A v_{DLS}}\, v_{DLS}$.
3. Minimal residual in terms of the singular vector: $[\Delta A_{TLS}, \Delta b_{TLS}] = -[A, b]\, v_{TLS} v_{TLS}^T$;  $\Delta A_{DLS} = -P_b^{\perp} A\, v_{DLS} v_{DLS}^T$.
4. Norm of minimal residuals: $\|[\Delta A_{TLS}, \Delta b_{TLS}]\|_F = \sigma_{TLS}$;  $\|\Delta A_{DLS}\|_F = \sigma_{DLS}$.

($v_{TLS}(n+1)$ is the last component of $v_{TLS}$.)
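As a concrete illustration, the SVD recipes above fit in a few lines of MATLAB; this is a minimal sketch, and the variable names (A, b, xtls, xdls) are ours rather than the talk's.

% TLS and DLS solutions via the SVD, following the table above.
m = 20; n = 5;
A = randn(m, n); b = randn(m, 1);
[~, S, V] = svd([A, b], 0);          % right singular vectors of [A, b]
v = V(:, n+1);                       % v_TLS, smallest singular value
assert(v(n+1) ~= 0)                  % feasibility condition
xtls = -v(1:n) / v(n+1);
sigma_tls = S(n+1, n+1);
Pb = eye(m) - (b*b') / (b'*b);       % projector orthogonal to b
[~, ~, W] = svd(Pb * A, 0);          % right singular vectors of Pb*A
w = W(:, n);                         % v_DLS, smallest singular value
assert(b' * A * w ~= 0)              % feasibility condition
xdls = ((b'*b) / (b' * A * w)) * w;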

Secular equation approach

Table: Secular equation approach in the generic case

TLS: $\sigma_{TLS}^2 = \min_x \dfrac{\|Ax - b\|_2^2}{\|x\|_2^2 + 1}$;  DLS: $\sigma_{DLS}^2 = \min_x \dfrac{\|Ax - b\|_2^2}{\|x\|_2^2}$.

1. Normal equations (for stationary points):
   TLS: $A^T(Ax - b) = \sigma_{TLS}^2 x$, $b^T(b - Ax) = \sigma_{TLS}^2$;
   DLS: $A^T(Ax - b) = \sigma_{DLS}^2 x$, $b^T(b - Ax) = 0$.
2. Generic condition: $\sigma_{\min}(A) > \sigma_{TLS}$;  $\sigma_{\min}(A) > \sigma_{DLS}$.
3. Secular equation:
   TLS: $b^T b - b^T A (A^T A - \sigma_{TLS}^2 I)^{-1} A^T b = \sigma_{TLS}^2$;
   DLS: $b^T b - b^T A (A^T A - \sigma_{DLS}^2 I)^{-1} A^T b = 0$.
4. De-regularized solution: $x_{TLS} = (A^T A - \sigma_{TLS}^2 I)^{-1} A^T b$;  $x_{DLS} = (A^T A - \sigma_{DLS}^2 I)^{-1} A^T b$.
5. Residuals in terms of the solution $x$ and $r = b - Ax$:
   $[\Delta A_{TLS}, \Delta b_{TLS}] = \dfrac{r_{TLS}\, [x_{TLS}^T, -1]}{\|x_{TLS}\|_2^2 + 1}$;  $\Delta A_{DLS} = \dfrac{r_{DLS}\, x_{DLS}^T}{\|x_{DLS}\|_2^2}$.
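Rows 3 and 4 of the table can be checked numerically; the sketch below continues the variables of the previous snippet: plugging $\sigma_{TLS}^2$ from the SVD into the secular function should give (nearly) zero, and the de-regularized formula should reproduce $x_{TLS}$.

lam = sigma_tls^2;
psi_tls = b'*b - b'*A*((A'*A - lam*eye(n)) \ (A'*b)) - lam;  % should be ~ 0
x_dereg = (A'*A - lam*eye(n)) \ (A'*b);                      % should match xtls
fprintf('psi_TLS = %g, ||x_dereg - x_TLS|| = %g\n', psi_tls, norm(x_dereg - xtls));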

When is the secular equation approach preferable to the SVD approach?

- The problem is sensitive: $\sigma_{TLS/DLS} \approx \sigma_{\min}(A)$.
- A least squares solution or a de-regularized form is needed: OLS provides a good initial guess of the solution, and the de-regularized form can be computed simply by adjusting the amount of the negative shift.
- The problem is large: the SVD of a large matrix is very expensive. Instead, we can approximate the secular equation of the large-scale problem by Gauss quadrature rules.

Riemann-Stieltjes integral

$M = Q \Lambda Q^T$, $\Lambda = \mathrm{diag}(\lambda_i)$, $0 < \lambda_1 \le \lambda_2 \le \dots \le \lambda_n$; $M$ is symmetric positive definite and $Q$ is orthonormal.

$u^T f(M)\, u = \alpha^T f(\Lambda)\, \alpha = \sum_{i=1}^n f(\lambda_i)\, \alpha_i^2 = I[f],$

where $f(M)$ is an analytic function of $M$ defined on $(0, \infty)$ and $\alpha = Q^T u$ for an arbitrary vector $u$.

Riemann-Stieltjes integral $I[f]$:

$I[f] \equiv \int_a^b f(\lambda)\, d\alpha(\lambda), \qquad \alpha(\lambda) = \begin{cases} 0 & \text{if } \lambda < a = \lambda_1 \\ \sum_{j=1}^{i} \alpha_j^2 & \text{if } \lambda_i \le \lambda < \lambda_{i+1} \\ \sum_{j=1}^{n} \alpha_j^2 & \text{if } b = \lambda_n \le \lambda \end{cases}$

where the measure $\alpha(\lambda)$ is piecewise constant.
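The identity $u^T f(M)\, u = \sum_i f(\lambda_i)\alpha_i^2$ is easy to verify directly; here is a minimal MATLAB sketch with $f(x) = 1/x$ and arbitrary symmetric positive definite data (all names are ours).

n = 6;
R = randn(n);
M = R'*R + n*eye(n);               % symmetric positive definite
u = randn(n, 1);
[Q, Lam] = eig(M);                 % M = Q*Lam*Q'
alpha = Q' * u;
I_f = sum(alpha.^2 ./ diag(Lam));  % I[f] for f(x) = 1/x
assert(abs(I_f - u' * (M \ u)) < 1e-8)   % equals u' * inv(M) * u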

Bounds for the Riemann-Stieltjes integral

The Gauss quadrature theory is formulated in terms of finite summations:

$\int_a^b f(\lambda)\, d\alpha(\lambda) = \sum_{j=1}^N w_j f(t_j) + \sum_{k=1}^M v_k f(z_k) + R[f]$

Unknown weights: $[w_j]_{j=1}^N$, $[v_k]_{k=1}^M$; unknown nodes: $[t_j]_{j=1}^N$; prescribed nodes: $[z_k]_{k=1}^M$.

The remainder term $R[f]$ is given by

$R[f] = \frac{f^{(2N+M)}(\xi)}{(2N+M)!} \int_a^b \prod_{k=1}^M (\lambda - z_k) \left[ \prod_{j=1}^N (\lambda - t_j) \right]^2 d\alpha(\lambda), \qquad a < \xi < b.$

Golub and Meurant [GM93] showed that the sign of the remainder term $R[f]$ can be adjusted by the prescribed nodes. Setting $M = 1$, we will use the Gauss-Radau formula to get bounds on (a part of) the secular function.

Orthonormal polynomials

Define a sequence of polynomials $p_0(\lambda), p_1(\lambda), \dots$ that are orthonormal with respect to $\alpha(\lambda)$:

$\int_a^b p_i(\lambda)\, p_j(\lambda)\, d\alpha(\lambda) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$

where $p_k(\lambda)$ is of exact degree $k$. Moreover, the roots of $p_k(\lambda)$ are distinct, real, and lie in the interval $[a, b]$.

Three-term recurrence relationship

If $\int_a^b d\alpha = 1$, the set of orthonormal polynomials satisfies

$\lambda\, p(\lambda) = T_N\, p(\lambda) + \gamma_N\, p_N(\lambda)\, e_N,$

where $p(\lambda) = [p_0(\lambda)\ p_1(\lambda)\ \cdots\ p_{N-1}(\lambda)]^T$, $e_N = (0\ \cdots\ 0\ 1)^T \in \mathbb{R}^N$, and

$T_N = \begin{pmatrix} \omega_1 & \gamma_1 & & & \\ \gamma_1 & \omega_2 & \gamma_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \gamma_{N-2} & \omega_{N-1} & \gamma_{N-1} \\ & & & \gamma_{N-1} & \omega_N \end{pmatrix}$

Lanczos algorithm for quadratures

To obtain the tri-diagonal matrix and hence the Gauss-Radau rule, we use the Lanczos algorithm with $p_1 = u / \|u\|_2$ as the starting vector:

$M \approx P\, T_N\, P^T$;  eigenvalue decomposition of $T_N$: $T_N = Q \Lambda Q^T$.

Function of the matrix:

$u^T f(M)\, u \approx u^T f(P T_N P^T)\, u = \|u\|_2^2\; e_1^T\, Q f(\Lambda) Q^T e_1,$

where $e_1 = (1\ 0\ \cdots\ 0)^T \in \mathbb{R}^N$. Thus the eigenvalues of $T_N$ give us the nodes, and the squares of the first elements of the eigenvectors give the weights:

$\sum_{j=1}^N w_j f(t_j) = \|u\|_2^2 \sum_{i=1}^N (Q_{1i})^2 f(\lambda_i)$
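The following is a minimal Lanczos sketch (no reorthogonalization) that builds $T_N$ from $M$ and $u$ and evaluates the Gauss estimate of $u^T f(M)\, u$ with $f(x) = 1/x$; the function and variable names are ours.

function est = gauss_quad(M, u, N)
% Lanczos tri-diagonalization started from p1 = u/||u||_2.
n = length(u);
omega = zeros(N, 1); gamma = zeros(N-1, 1);
p = u / norm(u); pold = zeros(n, 1); beta = 0;
for k = 1:N
    w = M*p - beta*pold;
    omega(k) = p' * w;
    w = w - omega(k)*p;
    if k < N
        gamma(k) = norm(w);
        pold = p; p = w / gamma(k); beta = gamma(k);
    end
end
T = diag(omega) + diag(gamma, 1) + diag(gamma, -1);
% Nodes = eigenvalues of T_N; weights = squared first components
% of its eigenvectors, scaled by ||u||^2.
[Q, Lam] = eig(T);
est = norm(u)^2 * sum(Q(1, :)'.^2 ./ diag(Lam));  % estimates u' * inv(M) * u
end

For example, est = gauss_quad(M, u, 5) already approximates u'*(M\u) well for modest N on the data of the previous snippet.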

Inverse eigenvalue problem

To obtain the Gauss-Radau rule, we extend the matrix $T_N$ in such a way that it has one prescribed eigenvalue $z_1$.

Lemma. The extended tri-diagonal matrix

$\hat{T}_{N+1} = \begin{pmatrix} T_N & \gamma_N e_N \\ \gamma_N e_N^T & \hat{\omega}_{N+1} \end{pmatrix}$

has $z_1$ as an eigenvalue, where $\hat{\omega}_{N+1} = z_1 + \delta_N$ and $\delta_N$ is the last entry of $\delta$ such that

$(T_N - z_1 I)\, \delta = \gamma_N^2\, e_N. \qquad (1)$

Proof of the extended tri-diagonal matrix lemma

Proof. We can verify that $z_1$ is an eigenvalue of $\hat{T}_{N+1}$ by investigating the relation $\hat{T}_{N+1} d = z_1 d$, where $d$ is a corresponding eigenvector; this yields (1).

Now $\hat{T}_{N+1}$ gives the weights and nodes of the Gauss-Radau rule such that

$\sum_{j=1}^N w_j f(t_j) + v_1 f(z_1) = \|u\|_2^2\; e_1^T f(\hat{T}_{N+1})\, e_1.$

The remainder is

$R[f] = \|u\|_2^2\; \frac{f^{(2N+1)}(\xi)}{(2N+1)!} \int_a^b (\lambda - z_1) \left[ \prod_{j=1}^N (\lambda - t_j) \right]^2 d\alpha(\lambda).$
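The lemma translates directly into a few lines of MATLAB; in this sketch T, gamma_N, and z1 are assumed to hold $T_N$, $\gamma_N$, and the prescribed node.

% Extend T_N so that the (N+1)-by-(N+1) matrix has eigenvalue z1.
N = size(T, 1);
eN = [zeros(N-1, 1); 1];
delta = (T - z1*eye(N)) \ (gamma_N^2 * eN);        % Eq. (1)
That = [T, gamma_N*eN; gamma_N*eN', z1 + delta(N)];
disp(min(abs(eig(That) - z1)))                     % ~ 0: z1 is an eigenvalue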

Part II: Application to Solving Secular Equations

Secular functions

Recall the secular equations:

TLS: $\psi_{TLS}(\lambda) = b^T b - b^T A (A^T A - \lambda I)^{-1} A^T b - \lambda = 0$,
DLS: $\psi_{DLS}(\lambda) = b^T b - b^T A (A^T A - \lambda I)^{-1} A^T b = 0$.

$\psi_{TLS/DLS}(\lambda)$ is referred to as the secular function, and $\lambda$ is an estimate of the minimum squared TLS/DLS distance. The generic condition $\lambda < \sigma_{\min}^2(A)$ keeps $(A^T A - \lambda I)^{-1}$ well defined. Thus, in the domain $0 \le \lambda < \sigma_{\min}^2(A)$, we need to evaluate a matrix function of $\lambda$ which is common to $\psi_{TLS}$ and $\psi_{DLS}$:

$\varphi(\lambda) = b^T A (A^T A - \lambda I)^{-1} A^T b$

Bounds of the common function

Now we evaluate bounds for the scalar quantity $\varphi$:

$\varphi = g^T f_{1/x}(M)\, g, \qquad M = A^T A - \lambda I, \quad g = A^T b, \quad f_{1/x}(x) = 1/x.$

Then the quadrature rule $\hat{\varphi}_{N+1}(z_1) = \|g\|^2\, e_1^T f_{1/x}(\hat{T}_{N+1})\, e_1$ is described in terms of the remainder:

$\hat{\varphi}_{N+1}(z_1) = I[f_{1/x}] + \|g\|^2\, \xi^{-(2N+2)} \int_a^b (\lambda - z_1) \left[ \prod_{j=1}^N (\lambda - t_j) \right]^2 d\alpha(\lambda).$

We note that

$\frac{f_{1/x}^{(2N+1)}(\xi)}{(2N+1)!} = -\xi^{-(2N+2)} < 0, \qquad 0 < a < \xi < b.$

Thus we have the bounds

$\hat{\varphi}_{N+1}(\zeta_b) < I[f_{1/x}] < \hat{\varphi}_{N+1}(\zeta_a), \qquad \zeta_a < a < b < \zeta_b.$

Comments on the bounds

$f_{1/x}(x)$ is well defined on a proper interval $(a, b)$ such that the sign of the derivative $f_{1/x}^{(2N+1)}(\xi)$ does not change for $\xi \in (a, b)$.

Since $\|A\|_1 \|A\|_\infty \ge \sigma_{\max}^2(A) \ge b$, we may use $\zeta_b = \|A\|_1 \|A\|_\infty$. However, a lower bound for $a$ (the smallest eigenvalue of $M$) is not easily obtainable, so $\zeta_a$ is determined only very roughly. This explains why the upper bound of $I[f_{1/x}]$ is usually poorer than the lower bound.

Lanczos process with a shift for efficiency

The tri-diagonalization is independent of the shift:

Tridiag$([g, M]) \Leftrightarrow$ Tridiag$([g, M + \lambda I])$, since $(M + \lambda I) Q_N = Q_N (T_N + \lambda I) \Leftrightarrow M Q_N = Q_N T_N$.

Then we re-define the extended matrix $\hat{T}_{N+1}$ as

$J_{N+1} = \begin{pmatrix} T_N - \lambda I & \gamma_N e_N \\ \gamma_N e_N^T & w \end{pmatrix},$

where $T_N$ and $\gamma_N$ are calculated by tri-diagonalization of $[g, A^T A]$ (not $[g, M]$), and $w$ is determined so that $J_{N+1}$ has the prescribed eigenvalue $z_1$. Thus $w = z_1 + d_N$, where $d_N$ is the last entry of $d$ such that

$(T_N - (z_1 + \lambda) I_N)\, d = \gamma_N^2\, e_N.$

Finally, we have the quadrature for the bounds:

$\hat{\varphi}_{N+1}(z_1) = \|g\|^2\, e_1^T f_{1/x}(J_{N+1})\, e_1 = \|g\|^2\, e_1^T (J_{N+1})^{-1} e_1.$

Once we solve the tri-diagonal system $J_{N+1}\, y = e_1$ for $y$, we have $\hat{\varphi}_{N+1}(z_1) = \|g\|^2\, e_1^T y = \|g\|^2\, y_1$.

For later interpolation, we also need the derivatives of the matrix function $\varphi(\lambda)$ w.r.t. $\lambda$, approximated with $f_{1/x^2}(x) = x^{-2}$ and $f_{1/x^3}(x) = x^{-3}$:

$\hat{\varphi}' = \|g\|^2\, e_1^T f_{1/x^2}(J_{N+1})\, e_1 = \|g\|^2\, e_1^T (J_{N+1})^{-2} e_1,$
$\hat{\varphi}'' = 2 \|g\|^2\, e_1^T f_{1/x^3}(J_{N+1})\, e_1 = 2 \|g\|^2\, e_1^T (J_{N+1})^{-3} e_1.$

By additionally solving $J_{N+1}\, h = y$, we have

$\hat{\varphi}' = \|g\|^2\, \|y\|^2, \qquad \hat{\varphi}'' = 2 \|g\|^2\, y^T (J_{N+1})^{-1} y = 2 \|g\|^2\, y^T h.$

A symmetric, tri-diagonal, positive definite system requires $O(N)$ flops to solve [GV96].
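In code, the three quadrature quantities cost two tri-diagonal solves; a sketch, assuming J holds $J_{N+1}$ and g holds $A^T b$:

e1 = [1; zeros(size(J, 1) - 1, 1)];
y = J \ e1;                        % first tri-diagonal solve, O(N) flops
phi   = norm(g)^2 * y(1);          % phi_hat(z1)
dphi  = norm(g)^2 * (y' * y);      % phi_hat'  = ||g||^2 e1' J^(-2) e1
h = J \ y;                         % second tri-diagonal solve
ddphi = 2 * norm(g)^2 * (y' * h);  % phi_hat'' = 2 ||g||^2 e1' J^(-3) e1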

Lemma (Monotonicity of bound sequences). Along with the Lanczos processes, a sequence of bound estimates of $\varphi = g^T f_{1/x}(M)\, g$, with full-rank symmetric $M \in \mathbb{R}^{n \times n}$ and $f_{1/x}(x) = 1/x$, is generated by the Gauss quadrature rules. The estimated sequence $\hat{\varphi}_{N+1}$ is necessarily monotonic. In other words, given each prescribed node $\zeta_a$ or $\zeta_b$ such that $\zeta_a < \sigma_{\min}(M) < \sigma_{\max}(M) < \zeta_b$, the lower and upper bound sequences for $I[f_{1/x}]$ satisfy

$\cdots < \hat{\varphi}_N(\zeta_b) < \hat{\varphi}_{N+1}(\zeta_b) < \cdots < I[f_{1/x}] < \cdots < \hat{\varphi}_{N+1}(\zeta_a) < \hat{\varphi}_N(\zeta_a) < \cdots.$

Note that the complete Lanczos process yields the exact evaluation: $\varphi = \|g\|^2\, e_1^T f_{1/x}(T_n - \lambda I_n)\, e_1$.

Interpolating the root of the secular equations

We approximate the common function $\varphi(\lambda)$ by the Lanczos process combined with the Gauss quadrature rule, continuing the process until the upper and lower bounds of $\varphi(\lambda)$ match within a tolerance.

Suppose $\lambda_k$ is the current estimate of the minimum distance. In order to interpolate the root of the secular equation $\psi(\lambda)$, we need to evaluate the following:

$\psi_{TLS}(\lambda_k) = \|b\|^2 - \lambda_k - \varphi(\lambda_k)$, $\psi_{DLS}(\lambda_k) = \|b\|^2 - \varphi(\lambda_k)$;
$\psi'_{TLS}(\lambda_k) = -1 - \varphi'(\lambda_k)$, $\psi'_{DLS}(\lambda_k) = -\varphi'(\lambda_k)$;
$\psi''_{TLS}(\lambda_k) = -\varphi''(\lambda_k)$, $\psi''_{DLS}(\lambda_k) = -\varphi''(\lambda_k)$.

Then we consider one-point interpolating methods to obtain $\lambda_{k+1}$ such that $\psi(\lambda_{k+1}) \approx 0$. Note that the roots of the secular equation are the stationary points of the geometrically equivalent cost function. We want to find the smallest root $\lambda_{k+1} \in [0, \sigma_{\min}^2(A))$, by the definition of the TLS/DLS problems. However, we cannot achieve this without additional information on the locations of the poles, such as $\sigma_{\min}^2(A) \le \min_j \sum_i a_{ij}^2$ and $\sigma_{\max}^2(A) \le \|A\|_1 \|A\|_\infty$. In the following sections, we discuss how to use bisection and the upper bound of the smallest pole.

Variations of Newton's method take the form

$\lambda_{k+1} = \lambda_k - \frac{\psi(\lambda_k)}{\psi'(\lambda_k)}\, C_k,$

where $C_k$ denotes a convergence factor [Gan78, Gan85] according to the method: Newton's method, Halley's variation, or the simple rational approximation (SRA) in the following table.

Table: Variations of Newton's method

Interpolating function $h(\lambda) \approx \psi(\lambda)$:
- Newton's: $h(\lambda) = c_0 + c_1 \lambda$
- SRA: $h(\lambda) = \|b\|^2 - \dfrac{c_1}{c_2 - \lambda}$
- Halley's: $h(\lambda) = c_0 - c_1 (c_2 - \lambda)^{-1}$

Convergence factor $C_k$:
- Newton's: $1$
- SRA: $1 - \dfrac{\psi(\lambda_k)}{\|b\|^2}$
- Halley's: $\left(1 - \dfrac{\psi(\lambda_k)\, \psi''(\lambda_k)}{2\, (\psi'(\lambda_k))^2}\right)^{-1}$

Rate of (local) convergence: quadratic (Newton's), quadratic (SRA), cubic (Halley's).
Convergence region (global convergence in root-finding of secular equations): narrow (Newton's), wide (SRA), wider (Halley's).

Algebraic interpretation (equivalently, solve $g(\lambda) = 0$ by Newton's method):
- Newton's: $g(\lambda) = \psi(\lambda)$
- SRA: $g(\lambda) = \dfrac{\psi(\lambda)}{\|b\|^2 - \psi(\lambda)}$
- Halley's: $g(\lambda) = \dfrac{\psi(\lambda)}{\sqrt{\psi'(\lambda)}}$
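All three updates share the Newton direction and differ only in the factor $C_k$; a minimal sketch, with psi, dpsi, ddpsi holding $\psi(\lambda_k)$, $\psi'(\lambda_k)$, $\psi''(\lambda_k)$ and bb holding $\|b\|^2$ (the convergence factors are written as in the table above):

step = -psi / dpsi;                           % Newton direction
C_newton = 1;
C_sra    = 1 - psi / bb;                      % simple rational approximation
C_halley = 1 / (1 - psi*ddpsi / (2*dpsi^2));  % Halley's variation
lam = lam + step * C_halley;                  % e.g., the Halley update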

Mixing with bisection

Whenever we detect that the root estimate is larger than $\sigma_{\min}^2(A)$, we bisect the estimate to ensure that it is less than $\sigma_{\min}^2(A)$. The monotonicity of the sequences of bound estimates from the Gauss quadratures (GQ) is exploited in the following scenario:

1. With the initial guess of the root, obtain the sequences of bounds of the secular function by means of GQ.
2. If the sequences are not monotonic, conclude that the root estimate is larger than the squared smallest singular value of $A$; cut the root estimate in half and go to Step 1 with the modified estimate.
3. Otherwise, interpolate the root of the secular equation using the estimates of the secular function and its derivatives.
4. If the new root estimate is within a tolerance of the previous one, calculate the de-regularized solution and stop the algorithm. Otherwise, go to Step 1.

Although the violation of monotonicity is only a necessary condition, the scheme works well in our numerical simulations.

Stabilizing with the estimated smallest pole

Although the bisection scheme almost always finds the smallest root, it may suffer from a bi-stability problem, in which the estimates alternate between two values. To get around this, we estimate the smallest pole by modifying the previous scenario. If we detect that the current root estimate is larger than $\sigma_{\min}^2(A)$, we cut the estimate in half and also set the upper bound of the smallest pole to the current root estimate:

$\hat{\sigma}_{\min}^2(A) = \lambda_k, \qquad \lambda_{k+1} = \tfrac{1}{2} \lambda_k.$

Then, when we interpolate the next root estimate, we take a harmonic sum between the Newton-based step $\delta_k = -\frac{\psi(\lambda_k)}{\psi'(\lambda_k)} C_k$ and the distance to the upper bound of the smallest pole:

$\lambda_{k+1} = \lambda_k + \left( \frac{1}{\delta_k} + \frac{1}{\hat{\sigma}_{\min}^2(A) - \lambda_k} \right)^{-1} = \lambda_k + \delta_k - \frac{\delta_k^2}{\delta_k + \hat{\sigma}_{\min}^2(A) - \lambda_k}.$
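A sketch of the safeguarded update, where monotone flags whether the GQ bound sequences were monotonic, sig2hat holds the current $\hat{\sigma}_{\min}^2(A)$, and delta the Newton-based step $\delta_k$:

if ~monotone                  % root estimate exceeded sigma_min^2(A)
    sig2hat = lam;            % record an upper bound of the smallest pole
    lam = lam / 2;            % bisect
else
    % harmonic sum of the Newton step and the distance to the pole bound
    lam = lam + 1 / (1/delta + 1/(sig2hat - lam));
end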

CGQ as a secular equation approach

1. Find the smallest root of the secular equation for TLS or DLS:
   1.1 Evaluate the bounds of the secular function by the Gauss-Radau quadrature rule.
   1.2 Interpolate the zero of the function by a variation of Newton's method.
   1.3 Determine a proper interval for the smallest zero by bisection and by the harmonic sum with the upper bound of the smallest pole.
2. Solve the de-regularized system with a shift of the smallest root (a matrix-free sketch follows):
   - solve the symmetric, positive-definite system by the conjugate gradient (CG) method, or
   - solve the tri-diagonal system with the shift.
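Step 2 amounts to a single symmetric positive-definite solve, which can be done matrix-free with MATLAB's pcg; a sketch, with lam holding the converged root:

% Solve (A'A - lam*I) x = A'b without forming A'A explicitly.
op = @(v) A'*(A*v) - lam*v;      % SPD as long as lam < sigma_min(A)^2
x  = pcg(op, A'*b, 1e-10, 500);  % de-regularized solution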

Alternative implementations

- Reuse of Lanczos vectors when memory is sufficient
- Regeneration of Lanczos vectors from knowledge of the tri-diagonal entries
- Avoiding explicit formation of $A^T A$
- Shifting into the Lanczos bi-diagonalization
- Using backward perturbations

Backward perturbations for linear least squares I

$\min_x \|b - Ax\|_2$, $A: m \times n$, $b: m \times 1$.

$\xi$: an arbitrary (computed) vector; $\hat{x} = \xi + e$, with $\hat{x} = A^{+} b$ and $\xi = (A + \delta A)^{+} b$; $\rho = b - A\xi$.

$\mu(\xi) = \min_{\delta A} \|\delta A\|_F$

$\mu(\xi) = \min\left\{ \frac{\|\rho\|_2}{\|\xi\|_2},\ \sigma_{\min}([A,\ B]) \right\}$ (Karlson, Waldén & Sun), where $B = \frac{\|\rho\|_2}{\|\xi\|_2} \left( I - \frac{\rho \rho^T}{\|\rho\|_2^2} \right).$
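The Karlson-Waldén-Sun formula is directly computable for small problems; a sketch, with xi holding the candidate solution $\xi$:

rho = b - A*xi;                             % residual of the candidate solution
eta = norm(rho) / norm(xi);
B   = eta * (eye(m) - (rho*rho')/(rho'*rho));
mu  = min(eta, min(svd([A, B])));           % backward error mu(xi)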

Backward perturbations for linear least squares II

$\tilde{\mu}^2(\xi) = \rho^T A \left( \alpha A^T A + \beta I \right)^{-1} A^T \rho, \qquad \alpha = \|\xi\|_2^2, \quad \beta = \|\rho\|_2^2.$

The ratio $\tilde{\mu}(\xi) / \mu(\xi)$ satisfies $\lim_{\xi \to \hat{x}} \tilde{\mu}(\xi) / \mu(\xi) = 1$ (Grcar).

Björck's algorithm

Solve the system of nonlinear equations

$\begin{pmatrix} A^T A & A^T b \\ b^T A & b^T b \end{pmatrix} \begin{pmatrix} x \\ -1 \end{pmatrix} = \lambda \begin{pmatrix} x \\ -1 \end{pmatrix},$

or equivalently, with $r = b - Ax$, the system

$\begin{pmatrix} f(x, \lambda) \\ g(x, \lambda) \end{pmatrix} = \begin{pmatrix} A^T r + \lambda x \\ b^T r - \lambda \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$

using a Rayleigh-quotient iteration (RQI). This algorithm always converges to a singular value/vector pair, but we might not get $\lambda = \sigma_{n+1}^2$. Björck suggested one initial inverse iteration (i.e. with $\lambda = 0$) to move closer to the desired $\lambda$, and then applying the RQI procedure.

Björck's algorithm details

After some manipulations, Björck's algorithm simplifies greatly. The following presentation emphasizes the computationally intensive steps (a matrix-free sketch follows the list):

1. Solve the least squares system in $A$, $b$ for $x_{LS}$.
2. Perform one inverse iteration (solve $A^T A\, x^{(0)} = x_{LS}$) to get the initial $x^{(0)}$.
3. While not converged, solve two systems with $(A^T A - \lambda I)$ to iterate to $x^{(k+1)}$; but if we detect that $A^T A - \lambda I$ is negative definite, decrease $\lambda$ and repeat the iteration.

Björck suggests using the PCGTLS algorithm to solve each linear system, with the Cholesky factor $R$ of $A^T A$ as a preconditioner. Inside the CG procedure, we detect whether $A^T A - \lambda I$ is negative definite and use the CG vectors to compute a new value of $\lambda$.
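Schematically, the outer loop looks as follows; this is a matrix-free sketch, and the $\lambda$ update via the Rayleigh quotient of $[A, b]$ is our simplification of Björck's two-solve RQI sweep.

op0 = @(v) A'*(A*v);
xls = pcg(op0, A'*b, 1e-8, 500);            % 1. least squares solution
x   = pcg(op0, xls, 1e-8, 500);             % 2. one inverse iteration (lambda = 0)
lam = norm(b - A*x)^2 / (x'*x + 1);
for k = 1:50                                % 3. RQI-style iteration
    op = @(v) A'*(A*v) - lam*v;             % shifted operator
    x  = pcg(op, A'*b, 1e-10, 500);         % shifted solve
    lam_new = norm(b - A*x)^2 / (x'*x + 1); % Rayleigh quotient of [A, b]
    if abs(lam_new - lam) <= 1e-12 * max(1, lam), lam = lam_new; break; end
    lam = lam_new;
end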

Notes on our implementation of Björck's algorithm

- Instead of using PCGTLS with the Cholesky factor of $A^T A$, we use a matrix-free approach for large scales ($m > 800$) and apply the unpreconditioned CGTLS algorithm. A fixed tolerance is used in the CG method.
- To compute the initial least squares solution, we use the LSQR algorithm.
- We detect convergence when $\lambda$ changes by less than a tolerance or the normalized residual increases (theory states that the normalized residual always decreases).
- After we detect convergence, we run one more iteration of the algorithm to ensure that we compute an $x$ consistent with the final $\lambda$.

Details of the matrix-moments-based algorithm

- Algorithm 2 uses the Golub-Kahan bidiagonalization of $A$ and applies the moment algorithm to $T = B^T B$, instead of computing $T$ directly from the Lanczos process on $A^T A$.
- Algorithm 1 restarts the Lanczos process at each iteration.
- Algorithm 2 never restarts the bidiagonalization process; it simply continues the process at each iteration.

Problems

- Jo's problems: $15 \times 8$ and $750 \times 400$ matrices; Björck's problem 1: a $30 \times 15$ matrix.
- Large-scale problems with $10000 \times 5000$ and $100000 \times 60000$ matrices.
- The large-scale problems were generated using random Householder matrices to build the SVD of $[A\ b]$ in product form. Each large-scale matrix was available solely as an operator to all of the algorithms.
- The singular values of $[A\ b]$ are $\sigma_i = \log(i) + N(0, 1)$, where $N(0, 1)$ is a standard normal random variable.

Parameter choices

Table: iteration counts for Algorithm 1 and Algorithm 2 on each test problem, for the Newton, SRA, and Halley interpolations with initial shifts $\lambda^{(0)} = 0$, $\lambda^{(0)} = 1$, and $\lambda^{(0)} = \rho$. (Legend: * wrong root; ++ correct without convergence; blank entry: no convergence.)

Convergence

Table: convergence results (columns: TEST, ALG, ITERS, ERROR, TIME, LANCZOS) comparing Björck's algorithm, Algorithm 1, and Algorithm 2 on the jo problems ($15 \times 8$ and $750 \times 400$), the large-scale problems ($10000 \times 5000$ and $100000 \times 60000$), and Björck's problem ($30 \times 15$). On the larger problems the ITERS entries for Algorithm 1 exceed the reported limits; on Björck's problem, Algorithm 1 takes more than 100 iterations.

Numerical data generation

The numerical data are generated as follows [GvM91]. Let the error-free data matrix be $A_s = U_s \Sigma_s V_s^T$, where

$U_s = I_m - 2 \frac{u_s u_s^T}{u_s^T u_s}$ and $V_s = I_n - 2 \frac{v_s v_s^T}{v_s^T v_s}$

are Householder matrices, and $\Sigma_s = \mathrm{diag}(\sigma_k)$ is an $m$-by-$n$ diagonal matrix with $[\sigma_1, \dots, \sigma_n] = [n, \dots, 1]$. The vectors $u_s$ and $v_s$ consist of pseudo-random numbers generated by the Matlab function randn, which provides normally distributed random numbers. An error-free observation vector is computed by $b_s = A_s x_s$. The error-prone data $A$ and $b$ are generated with a given noise deviation $\sigma_n$ as follows:

randn('state', 108881);                  % seed for the random number generator
us = randn(m, 1);  vs = randn(n, 1);
Us = eye(m) - 2*(us*us')/(us'*us);       % Householder matrices
Vs = eye(n) - 2*(vs*vs')/(vs'*vs);
Sigmas = [diag(n:-1:1); zeros(m-n, n)];  % singular values [n, ..., 1]
As = Us * Sigmas * Vs';
xs = (1./(1:n))';                        % [1, 1/2, ..., 1/n]'
bs = As * xs;
sigma_n = 0.3;
b = bs + sigma_n * randn(m, 1);
A = As + sigma_n * randn(m, n);

CG-based algorithms

We categorize the algorithms in the following example as below.

Table: CG-based algorithms for solving the secular equation.

                                   CGQ-BS-KP  CGQ-BS  CG-BS-KP  CG-BS
Gauss quadrature rules             used       used    not       not
Bisection                          used       used    used      used
Upper bound of the smallest pole   used       not     used      not

The previous CG-based approaches in [GJK06] or [BHM00] employ matrix-vector multiplications to evaluate the secular function and its derivative from the intermediate solution $x(\lambda)$ such that $(A^T A - \lambda I)\, x(\lambda) = A^T b$. The secular functions and derivatives are then obtained as follows (a matrix-free sketch follows):

$\psi_{TLS}(\lambda) = \|b\|^2 - b^T A\, x(\lambda) - \lambda$, $\psi'_{TLS}(\lambda) = -\|x(\lambda)\|^2 - 1$;
$\psi_{DLS}(\lambda) = \|b\|^2 - b^T A\, x(\lambda)$, $\psi'_{DLS}(\lambda) = -\|x(\lambda)\|^2$.
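In matrix-free MATLAB, the CG-based evaluation of the secular function and its derivative reads as follows (a sketch):

op = @(v) A'*(A*v) - lam*v;        % (A'A - lam*I) v
xl = pcg(op, A'*b, 1e-10, 500);    % intermediate solution x(lam)
psi_tls  = b'*b - b'*(A*xl) - lam;
dpsi_tls = -(xl'*xl) - 1;
psi_dls  = b'*b - b'*(A*xl);
dpsi_dls = -(xl'*xl);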

Small problem

Figure: secular function plots with $A \in \mathbb{R}^{15 \times 8}$: (a) TLS secular function $\psi_{TLS}(\lambda)$ and (b) DLS secular function $\psi_{DLS}(\lambda)$, each plotted against the squared distance $\lambda$.

Small problem

Figure: convergence curves (squared distance vs. iterations) in TLS with $A \in \mathbb{R}^{15 \times 8}$: (a) variations of CGQ with bisection and smallest-pole estimation: Newton KP (6), SRA KP (6), Halley KP (6); (b) variations of CGQ with bisection only: Newton (6), SRA (5), Halley (5); (c) CG with two different interpolations, bisection, and smallest-pole estimation: Newton KP (5), HS KP (5); (d) CG with two different inverse iterations and bisection only: RQI (4), HSI (5). Parenthesized numbers are iteration counts.

Small problem

Figure: convergence curves in DLS with $A \in \mathbb{R}^{15 \times 8}$: (a) variations of CGQ with bisection and smallest-pole estimation: Newton KP (6), SRA KP (6), Halley KP (6); (b) variations of CGQ with bisection only: Newton (7), SRA (6), Halley (5); (c) CG with two different interpolations, bisection, and smallest-pole estimation: Newton KP (5), HS KP (5); (d) CG with two different inverse iterations and bisection only: RQI (4), HSI (5).

Large problem

Figure: convergence curves in TLS with $A \in \mathbb{R}^{750 \times 400}$: (a) variations of CGQ with bisection and smallest-pole estimation: Newton KP (15), SRA KP (9), Halley KP (6); (b) variations of CGQ with bisection only: Newton (20), SRA (20), Halley (20); (c) CG with two different interpolations, bisection, and smallest-pole estimation: Newton KP (12), HS KP (12); (d) CG with two different inverse iterations and bisection only: RQI (20), HSI (20).

Large problem

Figure: convergence curves in DLS with $A \in \mathbb{R}^{750 \times 400}$: (a) variations of CGQ with bisection and smallest-pole estimation: Newton KP (20), SRA KP (20), Halley KP (13); (b) variations of CGQ with bisection only: Newton (20), SRA (20), Halley (20); (c) CG with two different interpolations and smallest-pole estimation: Newton KP (9), HS KP (9); (d) CG with two different inverse iterations and bisection only: RQI (20), HSI (20).

Conclusion

- The presented Conjugate Gradient with Quadrature (CGQ) method approximates the secular functions by means of Gauss quadrature (GQ) rules, whereas the previous CG-based approaches exhaustively calculate intermediate solutions to evaluate the secular functions.
- The smallest root is interpolated by variations of Newton's method with stabilizing modifications: bisection to ensure convergence to the smallest root, and approximation with a rational function of the estimated smallest pole to avoid the bi-stability problem.
- CGQ does not pursue the intermediate solution until the root estimate converges, so the overall computational complexity can be reduced by adjusting the accuracy of the GQ rules.

References

[BHM00] Å. Björck, P. Heggernes, and P. Matstoms. Methods for large scale total least squares problems. SIAM J. Matrix Anal. Appl., 22(3), 2000.
[DD93] R. D. DeGroat and E. M. Dowling. The data least squares problem and channel equalization. IEEE Trans. Signal Processing, 41(1), January 1993.
[Gan78] W. Gander. On the linear least squares problem with a quadratic constraint. Technical Report STAN-CS, Stanford University, 1978.
[Gan85] W. Gander. On Halley's iteration method. The American Mathematical Monthly, 92(2), 1985.
[GJK06] G. H. Golub, S. Jo, and S. W. Kim. Solving secular equations for total/data least squares problems. SIAM J. Matrix Anal. Appl., submitted for publication, 2006.
[GM93] G. H. Golub and G. Meurant. Matrices, moments and quadrature. Report SCCM-93-07, Computer Science Department, Stanford University, 1993.
[GV96] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.
[GvM91] G. H. Golub and U. von Matt. Quadratically constrained least squares and quadratic problems. Numerische Mathematik, 59(1), December 1991.
[JK05] S. Jo and S. W. Kim. Consistent normalized least mean square filtering with noisy data matrix. IEEE Trans. Signal Processing, June 2005.
