Linear Least Squares
Method of Least Squares
Measurement errors are inevitable in observational and experimental sciences. Errors can be smoothed out by averaging over more measurements than are strictly necessary to determine the parameters of the system. The resulting system is overdetermined, so usually there is no exact solution. In effect, higher-dimensional data are projected into a lower-dimensional space to suppress irrelevant detail. Such projection is most conveniently accomplished by the method of least squares.
Linear Least Squares
For an overdetermined linear system $Ax = b$ with $m > n$, better represented as $Ax \cong b$, the least squares solution $x$ is obtained by minimising the residual vector $r = b - Ax$:
$$\min_x \|r\|_2^2 = \min_x \|b - Ax\|_2^2$$
Data Fitting
Given $m$ data points $(t_i, y_i)$, find the $n$-vector $x$ of parameters that gives the best fit to a model function $f(t, x)$ that is linear in $x$, i.e. minimise $\sum_{i=1}^{m} (y_i - f(t_i, x))^2$. The problem can be written in matrix form as $Ax \cong b$, with $a_{ij} = \phi_j(t_i)$ and $b_i = y_i$ for basis functions $\phi_j(t)$.
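As a concrete illustration, here is a minimal sketch in NumPy of fitting a quadratic polynomial by linear least squares; the data values are made up purely for illustration, and the basis $\phi_j(t) = t^{j-1}$ gives a Vandermonde matrix.

```python
# A minimal sketch of least squares data fitting with NumPy;
# the data values below are made up purely for illustration.
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # m = 5 abscissas t_i
y = np.array([1.0, 0.5, 2.0, 4.5, 9.0])   # observed values y_i

# Quadratic model f(t, x) = x_1 + x_2 t + x_3 t^2, so the basis functions
# phi_j(t) = t^(j-1) give a Vandermonde matrix with a_ij = t_i^(j-1).
A = np.vander(t, N=3, increasing=True)     # m x n with n = 3

# lstsq minimises ||b - A x||_2 for the overdetermined system A x ~= b
x, res, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print("fitted coefficients:", x)
```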
Existence/Uniqueness
The least squares problem $Ax \cong b$ always has a solution. The solution is unique if and only if the columns of $A$ are linearly independent, i.e. $\mathrm{rank}(A) = n$. There is no unique solution if $\mathrm{rank}(A) < n$.
Normal Equations
Minimising the Euclidean norm of the residual vector:
$$\min_x \|r\|_2^2 = r^T r = (b - Ax)^T (b - Ax) = b^T b - 2x^T A^T b + x^T A^T A x$$
Taking the derivative with respect to $x$ and setting it to zero:
$$-2A^T b + 2A^T A x = 0 \;\Rightarrow\; A^T A x = A^T b$$
Orthogonality
At the least squares solution, the residual is orthogonal to the column space of $A$:
$$A^T r = A^T (b - Ax) = 0$$
which again yields the normal equations $A^T A x = A^T b$.
Solving the Normal Equations
If $A$ has rank $n$, then $A^T A$ is symmetric positive definite. Using the Cholesky factorisation $A^T A = L L^T$, the solution to $A^T A x = A^T b$ is obtained by forward and back substitution. The normal equations method thus transforms a rectangular system into a square system, and then into triangular systems; a sketch follows.
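A minimal sketch of this method in NumPy, using a random full-rank $A$ and $b$ purely as stand-ins:

```python
# Sketch of the normal equations method with Cholesky factorisation.
# A and b are random stand-ins for a full-rank least squares problem.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))     # m = 10, n = 3; full rank almost surely
b = rng.standard_normal(10)

M = A.T @ A                          # n x n, symmetric positive definite
c = A.T @ b
L = np.linalg.cholesky(M)            # M = L L^T with L lower triangular

z = np.linalg.solve(L, c)            # L z = c; a triangular solver would be used in practice
x = np.linalg.solve(L.T, z)          # L^T x = z
```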
Shortcomings
Information can be lost in forming $A^T A$ and $A^T b$. For example, with
$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}, \qquad A^T A = \begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
in floating-point arithmetic if $\epsilon < \sqrt{\epsilon_{\mathrm{mach}}}$, and the latter matrix is singular.
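This loss is easy to reproduce numerically; the sketch below uses $\epsilon = 10^{-8}$, just below $\sqrt{\epsilon_{\mathrm{mach}}}$ for IEEE double precision:

```python
# Demonstration of information loss in forming A^T A: for eps just below
# sqrt(eps_mach), A has rank 2 but the computed A^T A rounds to a singular matrix.
import numpy as np

eps = 1e-8                           # eps^2 = 1e-16 < eps_mach ~ 2.2e-16
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

M = A.T @ A
print(M)                             # [[1. 1.] [1. 1.]]: 1 + eps^2 rounds to 1
print(np.linalg.matrix_rank(A))      # 2
print(np.linalg.matrix_rank(M))      # 1: singular in floating point
```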
Shortcomings
Similarly, for a perturbation $E$ in the matrix $A$:
$$\frac{\|\Delta x\|_2}{\|x\|_2} \lesssim \left( [\mathrm{cond}(A)]^2 \tan\theta + \mathrm{cond}(A) \right) \frac{\|E\|_2}{\|A\|_2}$$
where $\theta$ is the angle between $b$ and its projection onto the column space of $A$. The sensitivity of the least squares solution is worsened by the normal equations, since $\mathrm{cond}(A^T A) = [\mathrm{cond}(A)]^2$.
Augmented System Method
The $(m + n) \times (m + n)$ augmented system:
$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}$$
This gives greater freedom in choosing pivots when computing an $LDL^T$ or $LU$ factorisation. Introducing a scaling parameter $\alpha$:
$$\begin{bmatrix} \alpha I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r/\alpha \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}, \qquad \alpha = \max_{i,j} |a_{ij}| / 1000$$
Augmented System Method
However, the augmented system is not positive definite, is larger than the original system, and requires storing two copies of $A$. A minimal sketch of the method is given below.
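The sketch embeds a small random problem in the augmented system and solves it with a general LU-based solver; a real implementation would exploit the symmetric structure.

```python
# Sketch of the augmented system method: embed the m x n problem in an
# (m+n) x (m+n) symmetric indefinite system and solve it by LU with pivoting.
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
A = rng.standard_normal((m, n))      # random stand-in for the data matrix
b = rng.standard_normal(m)

K = np.block([[np.eye(m), A],
              [A.T, np.zeros((n, n))]])
rhs = np.concatenate([b, np.zeros(n)])

sol = np.linalg.solve(K, rhs)        # LDL^T could exploit symmetry instead
r, x = sol[:m], sol[m:]              # residual r = b - A x and solution x
```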
What kind of transformation leaves the least squares solution unchanged?
Orthogonal Transformations
An orthogonal matrix $Q$ satisfies $Q^T Q = I$ and is norm-preserving:
$$\|Qv\|_2^2 = (Qv)^T (Qv) = v^T Q^T Q v = v^T v = \|v\|_2^2$$
Multiplying both sides of the least squares problem by $Q$ therefore does not change its solution.
Triangular Least Squares
For an upper triangular overdetermined least squares problem:
$$\begin{bmatrix} R \\ 0 \end{bmatrix} x \cong \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$
the residual is
$$\|r\|_2^2 = \|b_1 - Rx\|_2^2 + \|b_2\|_2^2$$
so it is enough to solve $Rx = b_1$ by back substitution; the minimum residual norm is $\|b_2\|_2$.
QR Factorisation
We seek an orthogonal matrix $Q$ such that
$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$$
The least squares problem is then transformed into a triangular least squares problem:
$$Q^T A x = \begin{bmatrix} R \\ 0 \end{bmatrix} x \cong \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = Q^T b$$
Orthogonal Bases
Partitioning $Q$ as $Q = [Q_1 \; Q_2]$ in $A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$, we get
$$A = [Q_1 \; Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix} = Q_1 R$$
the reduced QR factorisation of $A$. The solution to the least squares problem is then given by
$$Q_1^T A x = R x = c_1 = Q_1^T b$$
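A minimal sketch of this route using NumPy's built-in reduced QR, again with random stand-ins for $A$ and $b$:

```python
# Sketch of the least squares solution via the reduced QR factorisation
# A = Q_1 R, using NumPy's built-in QR; A and b are random stand-ins.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

Q1, R = np.linalg.qr(A, mode='reduced')   # Q_1 is m x n, R is n x n upper triangular
c1 = Q1.T @ b
x = np.linalg.solve(R, c1)                # R x = c_1; back substitution in practice
```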
Computing the QR Factorisation
Annihilate the subdiagonal entries of successive columns of $A$, eventually reaching upper triangular form. This is similar to LU factorisation by Gaussian elimination, but uses orthogonal transformations:
Householder transformations
Givens rotations
Gram-Schmidt orthogonalisation
Householder Transformations
A Householder transformation has the form
$$H = I - 2\frac{vv^T}{v^T v}$$
$H$ is orthogonal and symmetric: $H = H^T = H^{-1}$.
Householder Transformations
Given a vector $a$, we want to choose $v$ so that
$$Ha = \begin{bmatrix} \alpha \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \alpha \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \alpha e_1$$
Substituting into the formula for $H$ gives
$$v = a - \alpha e_1, \qquad \alpha = \pm\|a\|_2$$
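A small sketch of this construction; the sign of $\alpha$ is chosen opposite to that of $a_1$ so the subtraction $a - \alpha e_1$ avoids cancellation:

```python
# Sketch: construct the Householder vector v for a given a, choosing the
# sign of alpha to avoid cancellation when forming v = a - alpha e_1.
import numpy as np

def householder_vector(a):
    alpha = -np.sign(a[0]) * np.linalg.norm(a)  # assumes a[0] != 0 for simplicity
    v = a.astype(float).copy()
    v[0] -= alpha                               # v = a - alpha e_1
    return v, alpha

a = np.array([2.0, 1.0, 2.0])
v, alpha = householder_vector(a)
Ha = a - 2.0 * (v @ a) / (v @ v) * v            # apply H = I - 2 v v^T / v^T v
print(Ha)                                       # ~ [alpha, 0, 0] = [-3, 0, 0]
```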
Householder QR
Annihilate the subdiagonal entries of each successive column using Householder transformations. Each $H$ is applied to the entire matrix, but does not affect the prior columns. Applying $H$ to an arbitrary vector $u$:
$$Hu = \left( I - 2\frac{vv^T}{v^T v} \right) u = u - 2\frac{v^T u}{v^T v} v$$
Householder QR
The factorisation process gives
$$H_n \cdots H_1 A = \begin{bmatrix} R \\ 0 \end{bmatrix}$$
If $Q = H_1 \cdots H_n$, then $A = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$. Then solve the triangular least squares problem
$$\begin{bmatrix} R \\ 0 \end{bmatrix} x \cong Q^T b$$
Householder QR
The product $Q$ need not be formed explicitly: $R$ can be stored in the upper triangle of $A$, and the Householder vectors can be stored in the (now zero) lower triangular portion of $A$. A sketch of this idea follows.
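A minimal sketch of Householder QR least squares in this spirit: $Q$ is never formed, and each $H_k$ is applied directly to $A$ and $b$ (for simplicity the sketch keeps $v$ in a temporary rather than storing it in the lower triangle).

```python
# Minimal Householder QR least squares sketch: Q is never formed;
# each H_k is applied directly to A and to b.
import numpy as np

def householder_ls(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    m, n = A.shape
    for k in range(n):
        a = A[k:, k]
        norm_a = np.linalg.norm(a)
        alpha = -norm_a if a[0] >= 0 else norm_a   # sign choice avoids cancellation
        v = a.copy()
        v[0] -= alpha                              # v = a - alpha e_1
        beta = v @ v
        if beta == 0.0:
            continue                               # column already zero below diagonal
        # H u = u - 2 (v^T u / v^T v) v, applied to columns k..n-1 of A and to b
        A[k:, k:] -= np.outer(2.0 * v / beta, v @ A[k:, k:])
        b[k:] -= (2.0 * (v @ b[k:]) / beta) * v
    # Back-substitute R x = c_1 with R in the upper triangle of the reduced A
    return np.linalg.solve(np.triu(A[:n, :n]), b[:n])

A = np.array([[1.0, 1.0], [1e-4, 0.0], [0.0, 1e-4]])
b = np.array([2.0, 1e-4, 1e-4])
print(householder_ls(A, b))                        # ~ [1, 1]
```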
Givens Rotations
Introduce zeros one at a time. Choose $c$ and $s$ so that
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \alpha \\ 0 \end{bmatrix}$$
with $c^2 + s^2 = 1$, or $\alpha = \sqrt{a_1^2 + a_2^2}$, giving
$$c = \frac{a_1}{\sqrt{a_1^2 + a_2^2}}, \qquad s = \frac{a_2}{\sqrt{a_1^2 + a_2^2}}$$
Givens QR
More generally, to annihilate a selected component, rotate the target component with another component:
$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & c & 0 & s & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & -s & 0 & c & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} = \begin{bmatrix} a_1 \\ \alpha \\ a_3 \\ 0 \\ a_5 \end{bmatrix}$$
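A sketch of a single such rotation in NumPy, matching the $5 \times 5$ example above (the rotation is formed as a dense matrix only for clarity; in practice only the affected rows are updated):

```python
# Sketch of a single Givens rotation annihilating a[3] against a[1].
import numpy as np

def givens(a1, a2):
    r = np.hypot(a1, a2)           # sqrt(a1^2 + a2^2), overflow-safe
    if r == 0.0:
        return 1.0, 0.0
    return a1 / r, a2 / r          # c, s

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
c, s = givens(a[1], a[3])          # rotate components 2 and 4

G = np.eye(5)
G[1, 1], G[1, 3] = c, s
G[3, 1], G[3, 3] = -s, c
print(G @ a)                       # [1, sqrt(2^2 + 4^2), 3, 0, 5]
```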
Givens QR
Requires about 50% more work than the Householder method, and also requires more storage. These disadvantages can be overcome, but at the cost of a more complicated implementation. Givens rotations are advantageous for computing the QR factorisation of a sparse $A$.
Gram-Schmidt Orthogonalisation
Given vectors $a_1$ and $a_2$, we seek orthonormal vectors $q_1$ and $q_2$ having the same span. Subtracting from $a_2$ its component along $q_1$, namely $a_2 - (q_1^T a_2) q_1$, leaves a vector orthogonal to $q_1$.
Gram-Schmidt Algorithm
Classical Gram-Schmidt, extended to any number of vectors:
for k = 1 to n
    q_k = a_k
    for j = 1 to k-1
        r_jk = q_j^T a_k
        q_k = q_k - r_jk q_j
    end for
    r_kk = ||q_k||_2
    q_k = q_k / r_kk
end for
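A direct NumPy transcription of this loop; columns of $A$ are the vectors $a_k$, columns of $Q$ the orthonormal $q_k$, and $A = QR$:

```python
# Direct NumPy transcription of classical Gram-Schmidt.
import numpy as np

def classical_gram_schmidt(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        q = A[:, k].astype(float).copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]   # r_jk = q_j^T a_k
            q -= R[j, k] * Q[:, j]        # remove component along q_j
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]             # assumes full rank, so r_kk > 0
    return Q, R
```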
Modified Gram-Schmidt
The classical Gram-Schmidt procedure often suffers loss of orthogonality in finite-precision arithmetic, and it requires separate storage for $A$, $Q$, and $R$. Both deficiencies are addressed by modified Gram-Schmidt.
Modified Gram-Schmidt
Modified Gram-Schmidt algorithm:
for k = 1 to n
    r_kk = ||a_k||_2
    q_k = a_k / r_kk
    for j = k+1 to n
        r_kj = q_k^T a_j
        a_j = a_j - r_kj q_k
    end for
end for
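The same computation reorganised in NumPy: each $q_k$ is immediately subtracted from all remaining columns, and $A$ is overwritten in place, so no separate copy of $A$ is needed.

```python
# Modified Gram-Schmidt: A is overwritten in place, and orthogonality
# holds up better in finite precision than with the classical variant.
import numpy as np

def modified_gram_schmidt(A):
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])  # assumes full rank, so r_kk > 0
        Q[:, k] = A[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ A[:, j]    # r_kj = q_k^T a_j
            A[:, j] -= R[k, j] * Q[:, k]   # immediately deflate remaining columns
    return Q, R
```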
Comparison of Methods
Forming the normal equations matrix $A^T A$ requires about $n^2 m / 2$ multiplications, and solving the resulting symmetric linear system requires about $n^3 / 6$ multiplications. Solving the least squares problem using Householder QR factorisation requires about $mn^2 - n^3/3$ multiplications. If $m \approx n$, both methods require about the same amount of work; if $m \gg n$, Householder QR requires about twice as much work as the normal equations. The cost of the SVD is proportional to $mn^2 + n^3$, with a proportionality constant ranging from 4 to 10, depending on the algorithm used.
Comparison of Methods
Householder QR is more accurate and more broadly applicable. The normal equations provide sufficient accuracy when the problem is well conditioned. When $A$ is rank-deficient, Householder QR with column pivoting can still be used, whereas the normal equations method fails outright.
Comparison of Methods
The normal equations method produces a solution whose relative error is proportional to $[\mathrm{cond}(A)]^2$, and its Cholesky factorisation breaks down when $\mathrm{cond}(A) \approx 1/\sqrt{\epsilon_{\mathrm{mach}}}$. The Householder method yields a relative error proportional to the inherent sensitivity of the least squares problem.