Metodi Numerici M p. 1/?? Numerical Methods Elena loli Piccolomini Civil Engeneering http://www.dm.unibo.it/ piccolom elena.loli@unibo.it
Metodi Numerici M p. 2/?? Least Squares Data Fitting Measurement errors are inevitable in observational and experimental sciences. Errors can be smoothed out by averaging over many cases, i.e., taking more measurements than are strictly necessary to determine parameters of system. Resulting system is overdetermined, so usually there is no exact solution. For linear problems, we obtain overdetermined linear system Ax = b, with m n matrix A, m > n System is better written Ax b since equality is usually not exactly satisfiable when m > n Least squares solution x minimizes squared Euclidean norm of residual vector r = b Ax min x r 2 2 min x b Ax 2 2
Metodi Numerici M p. 3/?? Data Fitting Given m data points (t i,y i ), i = 1,...,m find the n vector x of parameters that best fits the model function f(t i,x) in the least squares sense, i.e. min x m (y i f(t i,x)) 2 i=1 The problem is linear if the function f is linear in components of x: f(t,x) = x 1 φ 1 (t)+x 2 φ 2 (t)+ +x n φ n (t) where the function φ j are known and depend only on t. In this case the problem can be written in matrix form as Ax = b where A = [a i,j ], b = [y i ] a i,j = φ j (t i ), b i = y i, i = 1,...,m, j = 1,...,n
Metodi Numerici M p. 4/?? Polynomial Fitting Define the basis or shape functions φ j as: The model function f is: φ j (t) = t j 1, j = 1,1,...,n f(t,x) = x 1 +x 2 t+x 3 t 2 + +x n t n 1 Example: n = 3,m = 5, the model function f is a quadratic polynomial: f(t,x) = x 1 +x 2 t+x 3 t 2 A = 1 t 1 t 2 1 1 t 2 t 2 2 1 t 3 t 2 3 1 t 4 t 2 4, x = x 1 x 2 x 3, b = t 1 t 2 t 3 t 4 1 t 5 t 2 5 t 5 Matrix whose columns (or rows) are successive powers of independent variable is called Vandermonde matrix
Metodi Numerici M p. 5/?? Example.. For data: t 1.5.5 1 y 1.5.5 2 we obtain the overdetermined linear system Ax b where: A = 1 1 1 1.5.25 1 1.5.25 1 1 1, b = The computed solution is x = [.86,.4,1.4] t, so approximating polynomial is p(t) =.86+.4t+1.4t 2 1.5.5 2
Metodi Numerici M p. 6/??. 2 1.5 ls fit y y fit 1.5 1.5.5 1 The residual norm is given by r 2 = b Ax 2 =.341.
Metodi Numerici M p. 7/?? Existence and Uniqueness Linear least squares problem Ax b always has solution. Solution is unique if, and only if, columns of A are linearly independent, i.e., rank(a) = n, where A is m n and m n. If rank(a) < n, then A is rank-deficient, and solution of linear least squares problem is not unique. For now, we assume that A has full column rank n. To minimize squared Euclidean norm of residual vector r 2 2 =r t r = (b Ax) t (b Ax) = = b t b 2x t A t b+x t A t Ax Φ(x) derivating for x and setting the value to : x Φ(x) = 2A t Ax 2A t b = obtain the n n linear system of normal equations A t Ax = A t b (1)
Metodi Numerici M p. 8/?? Orthogonality Vectors v 1 and v 2 are orthogonal if their inner product is zero. i.e. v t 1 v 2 =. Space spanned by columns of m n matrix A, R(A) = {Ax : x R n }, is of dimension at most n If m > n, b generally does not lie in R(A), so there is no exact solution to Ax = b Vector y = Ax in R(A) closest to b in 2-norm occurs when residual r = b Ax is orthogonal to R(A), i.e. giving again the normal equations: A t r = A t (b Ax) = A t Ax = A t b
Metodi Numerici M p. 9/??. Geometric relationships among b, r, and R(A). R(A)
Metodi Numerici M p. 1/?? Orthogonal Projectors Matrix P is orthogonal projector if it is idempotent (P 2 = P) and symmetric (P T = P) Orthogonal projector onto orthogonal complement R(P) is given by P = I P For any vector v: v = (P+(I P))v = Pv+P v For least squares problem Ax b, if rank(a) = n, Ax b = P(b Ax)+P (b Ax) = P(b Ax) + P (b Ax) = Since PA = A and P A = O = Pb PAx + P b P Ax Ax b = Pb Ax + P b The second term does not depend on x thus the residual norm is minimized if x makes the first term zero.
Metodi Numerici M p. 11/??. The least squares solution x is given by the overdetermined but consistent linear system: Ax = Pb If rank(a) = n one way to obtain explicitly an orthogonal projector onto R(A) is: P = A(A T A) 1 A T And b is the sum of two mutually orthogonal vectors y R(A) and r R(A) : b = Pb+P b = Ax+b Ax = y+r An alterantive way to obtain an orthogonal projector onto R(A) is that of computing a matrix Q whose columns form an orthonormal basis of R(A). P = QQ t Ax = QQ t b Q t Ax = Q t b
Metodi Numerici M p. 12/?? Pseudoinverse and Condition Number Nonsquare m n matrix A has no inverse in usual sense. If rank(a) = n, pseudoinverse is defined by A + = (A T A) 1 A T and the condition number as: K(A) = A A + Just as condition number of square matrix measures closeness to singularity, condition number of rectangular matrix measures closeness to rank deficiency. Least squares solution of Ax b is given by x = A + b
Metodi Numerici M p. 13/?? Sensitivity and Conditioning Sensitivity of least squares solution to Ax b depends on b as well as A. large residual more sensitive small residual less sensitive Define angle θ between b and y = Ax by cos(θ) = y 2 b 2 = Ax b 2 measures the closedness of b to R(A). We expect greater sensitivity when this ratio is small (θ π/2). Bound on perturbation x in solution x due to perturbation b in b is given by: x 2 1 b K(A) x 2 cos(θ) b 2
Metodi Numerici M p. 14/??. proof. A t A(x+ x) = A t (b+ b) A t A x = A t b x = (A t A) 1 A t b = A + b x 2 A + 2 b 2 = A + 2 b 2 A 2 A 2 x 2 A + A 2 2 b 2 A + 2 A 2 b 2 x 2 A 2 x 2 Ax 2 Since Ax b 2 = cos(θ) then x 2 x 2 A + 2 A 2 b 2 cos(θ) b 2
Metodi Numerici M p. 15/??. Similarly, for perturbation E in matrix A, x 2 x 2 ( ) K(A) 2 E 2 tan(θ)+k(a) A 2 Condition number of least squares solution is about K(A) if residual is small, but can be squared or arbitrarily worse for large residual.
Metodi Numerici M p. 16/?? Normal Equations Method If m n matrix A has rank n, then symmetric n n matrix A t A is positive definite, so its Cholesky factorization A t A = LL t can be used to obtain solution x to system of normal equations A t Ax = A t b which has same solution as linear least squares problems Ax b.
Metodi Numerici M p. 17/?? Example For polynomial data-fitting example given previously, normal equations method gives A t A = 5. 2.5 2.5 2.5 2.125, A t b = 4. 1. 3.25 Cholesky factorization of symmetric positive definite matrix A t A gives: L = 2.2361 1.5811 1.118.9354 Solving lower triangular system Lz = A t b by forward-substitution gives z = (1.7889,.6325,1.3363) t Solving upper triangular system L t x = z by back-substitution gives x = (.857,.4, 1.4286)
Metodi Numerici M p. 18/?? Shortcomings of Normal Equations Information can be lost in forming A t A and A t b, For example A = 1 1 ǫ ǫ, A t A = if ǫ ǫ mach then in floating-point arithmetic A t A = [ 1 1 1 1 Sensitivity of solution is also worsened, since ] [ 1+ǫ 2 1 1 1+ǫ 2, is singular K(A t A) = K(A) 2 ]
Metodi Numerici M p. 19/?? Augmented System Method Definition of residual r + Ax = b together with orthogonality requirement A t r = give (m+n) (m+n) augmented system [ I A t A O ][ Augmented system is not positive definite, is larger than original system, and requires storing two copies of A Augmented system is sometimes useful, but is far from ideal in work and storage required. r x ] = [ b ]
Metodi Numerici M p. 2/?? Orthogonal Transformations We seek alternative method that avoids numerical difficulties of normal equations. We need numerically robust transformation that produces easier problem without changing solution What kind of transformation leaves least squares solution unchanged? Square matrix Q is orthogonal if Q t Q = I Multiplication of vector by orthogonal matrix preserves Euclidean norm Qv 2 2 = (Qv) t (Qv) = v t v = v 2 2 Thus, multiplying both sides of least squares problem by orthogonal matrix does not change its solution.
Metodi Numerici M p. 21/?? Triangular Least Squares Problems As with square linear systems, suitable target in simplifying least squares problems is triangular form. Upper triangular overdetermined (m > n) least squares problem has form [ ] [ ] R b 1 x O where R is n n upper triangular and b is partitioned similarly. Residual is r 2 2 = b 1 Rx 2 2 + b 2 2 2 We have no control over second term, b 2 2 2, but first term becomes zero if x satisfies n n triangular system: Rx = b 1 which can be solved by back-substitution. Resulting x is least squares solution, and minimum sum of squares is r 2 2 = b 2 2 2 So our strategy is to transform general least squares problem to triangular form using orthogonal transformation so that least squares solution is preserved. b 2
Metodi Numerici M p. 22/?? QR Factorization Given m n matrix A, with m > n, we seek m m orthogonal matrix Q such that: [ ] R A = Q O where R is n n and upper triangular. Linear least squares problem Ax b is then transformed into triangular least squares problem: [ ] [ ] Q t R c 1 Ax = x = Q t b O c 2 which has same solution, since r 2 2 = b Ax 2 2 = b Q [ R O ] x 2 2 = Q t b [ R O ] x 2 2
Metodi Numerici M p. 23/?? Orthogonal Bases If we partition m m orthogonal matrix Q = [Q 1 Q 2 ], where Q 1 is m n, then [ ] [ ] R R A = Q = [Q 1,Q 2 ] = Q 1 R O O is called reduced QR factorization of A. Columns of Q 1 are orthonormal basis for R(A), and columns of Q 2 are orthonormal basis for R(A) Q 1 Q 1 is orthogonal projector onto R(A). Solution to least squares problem Ax b is given by solution to square system Q t 1Ax = Rx = c 1 = Q t 1b To compute QR factorization of m n matrix A, with m > n, we annihilate subdiagonal entries of successive columns of A, eventually reaching upper triangular form. Similar to LU factorization by Gaussian elimination, but use orthogonal transformations instead of elementary elimination matrices
Metodi Numerici M p. 24/?? Householder Transformations Householder transformation has form H = I 2 vvt v t v for nonzero vector v H is orthogonal and symmetric:h = H t = H 1 Given vector a, we want to choose v so that α 1 Ha =. = α Substituting into formula for H, we can take. = αe 1 v = a αe 1 and α = ± a 2, with sign chosen to avoid cancellation.
Metodi Numerici M p. 25/??. αe 1 = Ha = ) (I 2 vvt v t a = a 2 vt a v v t v v hence v = (a αe 1 ) vt v 2v t a The scalar factor is irrelevant in determining v so we can take v = a αe 1. Example If a = [2,1,2] t then we take 2 1 2 α v = a αe 1 = 1 α = 1 2 2 where α = ± a 2 = ±3. Since a 1 is positive, we choose negative sign for α to avoid cancellation, so 2 3 5 v = 1 = 1 2 2
Metodi Numerici M p. 26/??.. To confirm that transformation works: Ha = a 2 vt a v t v v = 2 1 2 2 15 3 5 1 2 = 3 To compute QR factorization of a matrix A, use Householder transformations to annihilate subdiagonal entries of each successive columns Each Householder transformation is applied to entire matrix, but does not affect prior columns, so zeros are preserved In applying Householder transformation H to arbitrary vector u, Hu = ) ( ) (I 2 vvt v t u = u 2 vt u v v t v v which is much cheaper than general matrix-vector multiplication and requires only vector v, not full matrix H
Metodi Numerici M p. 27/?? Householder QR Factorization Applying Hauseholder transformations H j to the columns a j (j = 1,...,n) of matrix A, produces factorization H n,h n 1 H 1 A = [ R O ] where R is n n and upper triangular. Setting Q = H 1 H n we have: [ A = Q R O ] To preserve solution of linear least squares problem, right-hand side b is transformed by same sequence of Householder transformations: Ax b = Q ([ R O ] x Q t b )
Metodi Numerici M p. 28/??. Then solve triangular least squares problem: Q ([ R O ] x Q t b ) For solving linear least squares problem, product Q of Householder transformations need not be formed explicitly. R can be stored in upper triangle of array initially containing A Householder vectors v can be stored in (now zero) lower triangular portion of A (almost) Householder transformations most easily applied in this form anyway
Example For polynomial data-fitting example given previously, with A = 1 1 1 1.5.25 1 1.5.25 1 1 1, b = 1.5.5 2 Householder vector v 1 for annihilating subdiagonal entries of first column of A is: v 1 = 1 1 1 1 1 2.2361 = 3.2361 1 1 1 1 Metodi Numerici M p. 29/??
Metodi Numerici M p. 3/??. Applying resulting Householder transformation H 1 yields transformed matrix and right-hand side: H 1 A = 2.2361. 1.118.191.445.39.6545.89.445 1.39.3455, H 1 b = 1.7889.3618.8618.3618 1.1382 Householder vector v 2 for annihilating subdiagonal entries of second column of H 1 A is v 2 =.191.39.89 1.39 1.5811 = 1.7721.39.89 1.39
Metodi Numerici M p. 31/??.. Applying resulting Householder transformation H 2 yields transformed matrix and right-hand side: H 2 H 1 A = 2.2361 1.118 1.5811.7251.5892.467, H 2 H 1 b = 1.7889.6325 1.352.8157.438 Householder vector v 3 for annihilating subdiagonal entries of third column of H 2 H 1 A is v 3 =.7251.5892.467.9354 = 1.665.5892.467
Metodi Numerici M p. 32/??... Applying resulting Householder transformation H 3 yields transformed matrix and right-hand side: H 3 H 2 H 1 A = 2.2361 1.118 1.5811.9354, H 3 H 2 H 1 b = 1.7889.6325 1.3363.258.3371 Now solve upper triangular system Rx = c 1 by back-substitution to obtain x = (.857,.4,1.4286) t
Metodi Numerici M p. 33/?? Rank Deficiency If rank(a) < n, then QR factorization still exists, but yields singular upper triangular factor R, and multiple vectors x give minimum residual norm. Common practice selects minimum residual solution x having smallest norm. Can be computed by QR factorization with column pivoting or by Singular Value Decomposition (SVD) Rank of matrix is often not clear cut in practice, so relative tolerance is used to determine rank
Metodi Numerici M p. 34/?? Example: Near Rank Deficiency Consider the 3 2 matrix A =.641.242.321.121.962.363 Computing QR factorization, R = [ 1.1997.4527.2 ] R is extremely close to singular (exactly singular to 3-digit accuracy of problem statement) If Ris used to solve linear least squares problem, result is highly sensitive to perturbations in right-hand side For practical purposes, rank(a) = 1 rather than 2, because columns are nearly linearly dependent
Metodi Numerici M p. 35/?? Singular value Decomposition Diagonal factorization of a rectangular matrix using orthogonal transformations. Singular value decomposition (SVD) of m n matrix A has form: A = UΣV t where U is m m orthogonal matrix, V is n n orthogonal matrix, and U is m m diagonal matrix, with elements s i,j s i,j = { i j i = j σ i Diagonal entries σ i, called singular values of A, are usually ordered so that σ 1 σ 2 σ n Columns u i of U and v i of V are called left and right singular vectors. The algorithm for computing SVD is iterative and strictly related to the computation of eigenvalues.
Metodi Numerici M p. 36/?? Applications of SVD Minimum norm solution to Ax b is given by x = σ i u i b σ i v i Proof Let s assume: σ 1 σ 2 σ k, and σ k+1 = σ k+2 = = σ n = i.e. rank(a) = k < n. Ax b 2 2 = UΣV t x b 2 2 = ΣV t x U t b 2 2 = Σ 1 y 1 g 1 2 2 + g 2 2 2 where Σ = [ Σ 1 O O O ],y = V t x, g = U t b and Σ 1 is k k diagonal matrix, with elements σ 1,...,σ k, y 1 and g 1 are k elements vectors,g 2 has m k elements.
Metodi Numerici M p. 37/??. In order to minimize the residual norm Ax b 2 2 we find y 1 = (y 1,y 2,...,y k ) t s.t. Σ 1 y 1 g 1 2 2 i.e. y i = g 1(i) σ i = ut i b σ i, i = 1,...,k leaving the elements y k+1,y k+2,...,y n undetermined. Since x = Vy = y, we can find the minimum norm solution by imposing y i =,i = k +1,...,n hence x = Vy = y 1 v 1 + +y k v k = k y i v i = i=1 k i=1 u t i b σ i v i For ill-conditioned or rank deficient problems, small singular values can be omitted from summation to stabilize solution.
Metodi Numerici M p. 38/?? Properties of SVD Euclidean matrix norm: A 2 = UΣV t 2 = Σ 2 = σ 1 Euclidean condition number of matrix: K(A) = A 2 A + 2 = σ 1 σ k, k rank(a) n Rank of matrix : number of nonzero singular values. Pseudoinverse The pseudoinverse of general real m n matrix A is given by A + = VΣ + U t where Σ + is diagonal with elements σ + i = { 1/σ i σ i σ i = If A is square and nonsingular, then A + = A 1. In all cases, minimum-norm solution to Ax b is given by x = A + b
Metodi Numerici M p. 39/?? Comparison of Methods Forming normal equations matrix A t A requires about n 2 m/2 multiplications, and solving resulting symmetric linear system requires about n 3 /6 multiplications Solving least squares problem using Householder QR factorization requires about mn 2 n 3 /3 multiplications If m n, both methods require about same amount of work. If m n, Householder QR requires about twice as much work as normal equations. Cost of SVD is proportional to mn 2 +n 3, with proportionality constant ranging from 4 to 1, depending on algorithm used.
Metodi Numerici M p. 4/?? Matlab: Linear Least Squares Matlab book (Attaway) par. 2.6, chp 8 Matlab functions: house, qr, svd. 1. Computer Problem 3.5 pg. 153 Heath
Metodi Numerici M p. 41/?? SVD Singular value decomposition (SVD) of m n matrix A has form: A = UΣV t where U is m m orthogonal matrix, V is n n orthogonal matrix, and U is m m diagonal matrix, with elements s i,j s i,j = { i j i = j σ i
Metodi Numerici M p. 42/?? Algorithm Use matlab function SVD to compute SVD factorization [U,S,V]=svd(A); Compute Minimum norm solution to Ax b as x = σ i u t i b σ i v i x=zeros(size(v,1),1); for i=1:size(v,1) if COND SIGMA f= ut i b σ i x=x+f*v i end end Write the matlab function svd solve 1 to compute the minimum norm solution to Ax b where COND SIGMA is σ i >