Vector and Matrix Norms

Vector Space Algebra / Matrix Algebra: We let $x \leftrightarrow \mathbf{x}$ and $A \leftrightarrow \mathbf{A}$, where, if $x$ is an element of an abstract vector space $\mathcal{V}^n$, and $A = \mathcal{A}: \mathcal{V}^n \to \mathcal{V}^m$, then $\mathbf{x}$ is a complex column vector of length $n$ whose elements are the same as in $x$, and $\mathbf{A}$ is an $m \times n$ matrix such that if $y = \mathcal{A}x$, then $\mathbf{y} = \mathbf{A}\mathbf{x}$. We will refer to the vector spaces for $\mathbf{x}$ and $\mathbf{A}$ as $\mathbb{C}^n$ and $\mathbb{C}^{mn}$, respectively.

Column Vector Norms: Natural norms to use are the $p$-norms,
$$\|x\|_p = \Bigl(\sum_{j=1}^{n} |x_j|^p\Bigr)^{1/p},$$
in particular
$$\|x\|_1 = \sum_{j=1}^{n} |x_j|, \qquad \|x\|_2 = \Bigl(\sum_{j=1}^{n} |x_j|^2\Bigr)^{1/2}, \qquad \|x\|_\infty = \max_j |x_j|.$$

Inner Product: $\langle x, y \rangle = y^\dagger x$. The norm generated by this inner product is the $p = 2$ norm. When we write a norm without a subscript, we usually mean the $p = 2$ norm:
$$\|x\| = \Bigl(\sum_{j=1}^{n} |x_j|^2\Bigr)^{1/2}.$$
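These definitions are easy to check numerically. Below is a short NumPy sketch (Python is used for illustration here; the course's own examples use MATLAB) comparing the formulas against `numpy.linalg.norm`:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])

# p-norms from the definition: ||x||_p = (sum |x_j|^p)^(1/p)
norm1 = np.sum(np.abs(x))               # p = 1
norm2 = np.sqrt(np.sum(np.abs(x)**2))   # p = 2 (the default norm)
norminf = np.max(np.abs(x))             # p = infinity (limit of the p-norms)

# They agree with NumPy's built-in norms
assert np.isclose(norm1, np.linalg.norm(x, 1))
assert np.isclose(norm2, np.linalg.norm(x))        # 2-norm is the default
assert np.isclose(norminf, np.linalg.norm(x, np.inf))

# The 2-norm is the norm generated by the inner product <x, y> = y^H x
assert np.isclose(norm2, np.sqrt(np.vdot(x, x).real))
print(norm1, norm2, norminf)   # → 19.0 13.0 12.0
```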
Matrix Norm: The natural norm to use is the induced norm,
$$\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \max_{\|x\| = 1} \|Ax\|.$$
Remark: This norm always exists (see slide 6.5). The definition implies $\|Ax\| \le \|A\|\,\|x\|$.

Conditioning and Condition Number

Ill-conditioned systems: There are always errors in a matrix due to measurement difficulties and roundoff errors. An ill-conditioned matrix $A$ will turn a small error in $b$ in the equation $Ax = b$ into a large error in $x$.

Example:
$$\begin{pmatrix} 0.9999 & 1.0001 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 + \varepsilon \end{pmatrix}$$
This equation has the solution $x_1 = 0.5 + 5000.5\,\varepsilon$, $x_2 = 0.5 - 4999.5\,\varepsilon$. We see that $\varepsilon$ is multiplied by a factor of around 5000!

How small is small? How large is large? The answer to this question is application-dependent, BUT when we multiply by factors that approach the roundoff limit (about $1/\varepsilon_{\mathrm{mach}} \approx 10^{16}$ in double precision), we will always be in trouble.
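The amplification can be demonstrated directly. The following NumPy sketch (an illustration, using the reconstructed system above) perturbs $b$ and measures how far the solution moves:

```python
import numpy as np

# The nearly degenerate system from the example
A = np.array([[0.9999, 1.0001],
              [1.0,    1.0   ]])

def solve(eps):
    # Solve A x = b with b = [1, 1 + eps]^T
    b = np.array([1.0, 1.0 + eps])
    return np.linalg.solve(A, b)

x0 = solve(0.0)      # unperturbed solution, [0.5, 0.5]
x1 = solve(1e-6)     # perturb b by 1e-6

# The error in b is amplified by a factor of about 5000 in x
amplification = np.abs(x1 - x0).max() / 1e-6
print(x0, amplification)
```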
Conditioning and Condition Number

Ill-conditioned systems, geometric interpretation: An ill-conditioned system is nearly degenerate. A picture in two dimensions: each equation defines a line, and the solution is their intersection.
[Figure: two panels, each plotting a $2 \times 2$ system as a pair of lines. Well-conditioned: the lines cross at a large angle. Ill-conditioned: nearly parallel lines, e.g. $x_1 + x_2 = 2$ and $0.99\,x_1 + 1.01\,x_2 = 2$, whose intersection moves far under a small perturbation.]

Condition Number:
Definition: For a square $n \times n$ matrix, the condition number is $\kappa(A) = \|A\|\,\|A^{-1}\|$.
Definition: The residual $r$ of an approximate solution $\hat{x}$ of $Ax = b$ is $r = b - A\hat{x} = A(x - \hat{x})$.
Theorem:
$$\frac{\|x - \hat{x}\|}{\|x\|} \le \kappa(A)\,\frac{\|r\|}{\|b\|}$$
Proof: From $b = Ax$, we infer $\|b\| \le \|A\|\,\|x\|$. We have $x - \hat{x} = A^{-1}r$, so that $\|x - \hat{x}\| \le \|A^{-1}\|\,\|r\|$. Combining these results, we obtain the theorem.
Remark: With a small condition number and a small residual relative to $b$, the error in $x$ relative to its norm will be small.
Remark: There will be some variation in the condition number, depending on the choice of norm. Generally, they track each other.
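The bound of the theorem can be verified numerically. A minimal NumPy sketch, using the ill-conditioned matrix from the previous example and a hypothetical approximate solution:

```python
import numpy as np

A = np.array([[0.9999, 1.0001],
              [1.0,    1.0   ]])
kappa = np.linalg.cond(A, 2)        # kappa(A) = ||A|| ||A^-1|| in the 2-norm

b = np.array([1.0, 1.0])
x = np.linalg.solve(A, b)           # exact solution, [0.5, 0.5]

# A hypothetical approximate solution and its residual r = b - A x_hat
x_hat = x + np.array([1e-3, -1e-3])
r = b - A @ x_hat

lhs = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
rhs = kappa * np.linalg.norm(r) / np.linalg.norm(b)
assert lhs <= rhs + 1e-12           # the theorem's bound holds
print(kappa, lhs, rhs)
```

Note how a tiny residual (here about $2 \times 10^{-7}$) still permits a relative error of order $10^{-3}$, because $\kappa(A) \approx 2 \times 10^4$.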
The Singular Value Decomposition

Some questions that this discussion leads to: (1) What do we do when the condition number is poor? (2) What about problems where the matrix $A$ is not square? The singular value decomposition (SVD) allows us to deal with these questions and much more!

Theorem: Let $A$ be an $m \times n$ matrix of rank $r$. Then $A$ can be expressed as a product $A = U\Sigma V^\dagger$, where $U$ and $V$ are respectively $m \times m$ and $n \times n$ orthogonal (unitary) matrices, and $\Sigma$ is a non-square diagonal $m \times n$ matrix of rank $r$,
$$\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r, 0, \ldots, 0), \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0.$$

Proof: We proceed inductively:
(1) There is a vector $x_1$ with norm equal to 1 ($p = 2$) that satisfies $\|Ax_1\| = \|A\|\,\|x_1\| = \|A\| \equiv \sigma_1$, and a vector $y_1 = (1/\sigma_1)Ax_1$, whose norm is also equal to 1. We note that $x_1 \in \mathbb{C}^n$ and $y_1 \in \mathbb{C}^m$. We start with a set of orthonormal column vectors that span $\mathbb{C}^n$, for example $e_1, e_2, \ldots, e_n$, where $e_1 = [1\ 0\ 0 \cdots 0]^T$, $e_2 = [0\ 1\ 0 \cdots 0]^T$, ..., $e_n = [0\ 0 \cdots 0\ 1]^T$. We note that $I = [e_1\ e_2 \cdots e_n]$ is the $n \times n$ identity matrix and evidently unitary. We may now use the Gram-Schmidt procedure to create a new orthonormal set of column vectors that span $\mathbb{C}^n$, $v_1, v_2, \ldots, v_n$, where $v_1 = x_1$, and a corresponding matrix $V_1 = [v_1\ v_2 \cdots v_n]$. In a similar way, we create a set of orthonormal vectors that span $\mathbb{C}^m$, $u_1, u_2, \ldots, u_m$, where $u_1 = y_1$, and a corresponding matrix $U_1 = [u_1\ u_2 \cdots u_m]$.
(2) We now construct the matrix $B = U_1^\dagger A V_1$. We see that $AV_1 = [\sigma_1 y_1\ Av_2 \cdots Av_n]$ and $(U_1^\dagger A V_1)_{kl} = u_k^\dagger A v_l$. In particular, we have $(U_1^\dagger A V_1)_{11} = \sigma_1 y_1^\dagger y_1 = \sigma_1$ and $(U_1^\dagger A V_1)_{k1} = \sigma_1 u_k^\dagger y_1 = 0$ when $k > 1$.
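The starting point of the proof, that the maximizing unit vector exists and that $\|A\|$ equals the largest singular value $\sigma_1$, can be checked numerically. A NumPy sketch (illustrative, for a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

# The operator 2-norm ||A|| = max_{||x||=1} ||Ax|| equals the largest
# singular value sigma_1: the starting point of the inductive proof.
sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A, 2), sigma[0])

# The maximizing unit vector x_1 is the first right singular vector v_1,
# and y_1 = A v_1 / sigma_1 has unit norm, as claimed in step (1).
U, s, Vh = np.linalg.svd(A)
v1 = Vh[0]
y1 = A @ v1 / s[0]
assert np.isclose(np.linalg.norm(A @ v1), s[0])
assert np.isclose(np.linalg.norm(y1), 1.0)
print(s)
```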
Proof (continued):
(2) We conclude that $B$ has the form
$$B = \begin{pmatrix} \sigma_1 & w^\dagger \\ 0 & A_2 \end{pmatrix},$$
where $w \in \mathbb{C}^{n-1}$ and $A_2$ is an $(m-1) \times (n-1)$ matrix. We now note that
$$\|Bx\| = \|U_1^\dagger A V_1 x\| = \|A V_1 x\| = \|Ay\| \le \|A\|\,\|y\|, \qquad \text{where } y = V_1 x.$$
Since $\|x\| = \|y\|$, we conclude $\|B\| = \|A\| = \sigma_1$. From (2), applying $B$ to the column vector $[\sigma_1\ \ w]^T$, we find
$$\left\|B\begin{pmatrix}\sigma_1 \\ w\end{pmatrix}\right\| \ge \sigma_1^2 + w^\dagger w = \bigl(\sigma_1^2 + \|w\|^2\bigr)^{1/2}\left\|\begin{pmatrix}\sigma_1 \\ w\end{pmatrix}\right\|,$$
so that $\|B\| \ge (\sigma_1^2 + \|w\|^2)^{1/2}$, which implies $w = 0$. Hence,
$$B = \begin{pmatrix} \sigma_1 & 0 \\ 0 & A_2 \end{pmatrix}.$$

(3) We now proceed with $A_2$ in exactly the same way that we proceeded with $A$ (unless $r = 1$). We have $0 < \sigma_2 \le \sigma_1$. We find $x_2 \in \mathbb{C}^{n-1}$ and $y_2 \in \mathbb{C}^{m-1}$ just like before. As before, we obtain unitary matrices $\hat{U}_2$ and $\hat{V}_2$, and we find
$$\hat{B}_2 = \hat{U}_2^\dagger A_2 \hat{V}_2 = \begin{pmatrix} \sigma_2 & 0 \\ 0 & A_3 \end{pmatrix},$$
where $A_3$ is an $(m-2) \times (n-2)$ matrix. We can now construct the matrices
$$U_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{U}_2 \end{pmatrix}, \qquad V_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{V}_2 \end{pmatrix}, \qquad B_2 = \begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & A_3 \end{pmatrix},$$
and we note $B_2 = U_2^\dagger U_1^\dagger A V_1 V_2$.
Proof (continued):
(4) We continue through $r$ iterations, at which point we will have spanned the range of $A$. Hence, we must have $A_{r+1} = 0$. The desired matrices are:
$$U = U_1 U_2 \cdots U_r, \qquad V = V_1 V_2 \cdots V_r, \qquad \Sigma = B_r.$$

Remark: Suppose $A$ is an $n \times n$ non-singular matrix. We then find that $A^{-1} = V\Sigma^{-1}U^\dagger$, so that $\|A^{-1}\| = 1/\sigma_n$, and $\kappa(A) = \sigma_1/\sigma_n$. In general, a large ratio between singular values implies that a matrix is ill-conditioned.
Remark: The theorem is not constructive, since we have not described how to find the vectors that correspond to the maxima. We will discuss algorithms in connection with eigenvalues and eigenvectors.
Corollary: $A^\dagger = V\Sigma^T U^\dagger$; Rank $A^\dagger$ = Rank $A$ = $r$; Nullity $A = n - r$; Nullity $A^\dagger = m - r$.

What is happening:
$A = \mathcal{A}: \mathbb{C}^n \to \mathbb{C}^m$; $A^\dagger = \mathcal{A}^\dagger: \mathbb{C}^m \to \mathbb{C}^n$. $V$ spans $\mathbb{C}^n$; $U$ spans $\mathbb{C}^m$.
$A$: $v_1 \to u_1,\ v_2 \to u_2,\ \ldots,\ v_r \to u_r$ with multipliers $\sigma_1, \sigma_2, \ldots, \sigma_r$; $Av_k = 0$ for $k > r$.
$A^\dagger$: $u_1 \to v_1,\ u_2 \to v_2,\ \ldots,\ u_r \to v_r$ with multipliers $\sigma_1, \sigma_2, \ldots, \sigma_r$; $A^\dagger u_k = 0$ for $k > r$.
The SVD reveals the entire structure of $A$, including its nearly singular, but not exactly singular, components!
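The structure described above can be seen numerically. A NumPy sketch with a small made-up rank-2 matrix:

```python
import numpy as np

# Rank-2 example: the third column is the sum of the first two
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.],
              [2., 0., 2.]])
U, s, Vh = np.linalg.svd(A)
V = Vh.conj().T

r = int(np.sum(s > 1e-12))      # numerical rank
assert r == 2

# A maps v_k -> sigma_k u_k, and A^H maps u_k -> sigma_k v_k
for k in range(r):
    assert np.allclose(A @ V[:, k], s[k] * U[:, k])
    assert np.allclose(A.conj().T @ U[:, k], s[k] * V[:, k])

# Nullity A = n - r: A annihilates the remaining right singular vectors
assert np.allclose(A @ V[:, r:], 0)
print(s)
```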
Example: Consider the matrix
$$A = \begin{pmatrix} 1/3 & 1/3 & 2/3 \\ 2/3 & 2/3 & 4/3 \\ 1/3 & 2/3 & 3/3 \\ 2/5 & 2/5 & 4/5 \\ 3/5 & 1/5 & 4/5 \end{pmatrix}$$
>> A = [1/3 1/3 2/3; 2/3 2/3 4/3; 1/3 2/3 3/3; 2/5 2/5 4/5; 3/5 1/5 4/5]
This matrix has rank 2 (col. 3 = col. 1 + col. 2). The SVD shows that, sort of:
>> [U,S,V] = svd(A)
Due to roundoff, the third singular value is non-zero. To see that, we change format
>> format short e
MATLAB can detect that the numerical rank is really 2
>> rank(A)
But that does not work with measurement errors that exceed roundoff
>> A1 = A; A1(5,3) = A1(5,3) + 1e-7
>> rank(A1)
>> [U1,S1,V1] = svd(A1)

Example: We can change the tolerance
>> rank(A1,1e-6)
That leads to uncertain results for the solution to $Ax = b$. We take $b$ equal to the first column of $A$,
$$b = \begin{pmatrix} 1/3 \\ 2/3 \\ 1/3 \\ 2/5 \\ 3/5 \end{pmatrix},$$
>> b = [1/3; 2/3; 1/3; 2/5; 3/5];
>> x = A\b, x1 = A1\b
>> b1 = b; b1(5) = b1(5) + 1e-7
>> x = A\b1, x1 = A1\b1
If $b$ is not close to a column, things get very bad
>> b2 = (1/3)*[1 1 1 1 1]'
>> x = A\b2, x1 = A1\b2
MATLAB does better with the first than the second because it knows that the numerical rank is really 2. One way to fix things is to zero out large elements in the SVD of the pseudo-inverse. What is the pseudo-inverse?
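The same experiment can be run in NumPy (a Python analogue of the MATLAB session above; `matrix_rank` plays the role of MATLAB's `rank`, with its `tol` argument as the tolerance):

```python
import numpy as np

A = np.array([[1/3, 1/3, 2/3],
              [2/3, 2/3, 4/3],
              [1/3, 2/3, 3/3],
              [2/5, 2/5, 4/5],
              [3/5, 1/5, 4/5]])

s = np.linalg.svd(A, compute_uv=False)
# Roundoff makes the third singular value tiny but non-zero
assert 0.0 <= s[2] < 1e-12

# matrix_rank uses a roundoff-level default tolerance, so it reports 2
assert np.linalg.matrix_rank(A) == 2

# A measurement-sized error defeats the default tolerance...
A1 = A.copy()
A1[4, 2] += 1e-7                 # the same perturbation as A1(5,3) in MATLAB
assert np.linalg.matrix_rank(A1) == 3
# ...unless we supply a tolerance ourselves
assert np.linalg.matrix_rank(A1, tol=1e-6) == 2
print(s)
```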
The (Moore-Penrose) pseudo-inverse
Definition: Writing $A = U\Sigma V^\dagger$, where $\Sigma = \operatorname{diag}(\sigma_1\ \cdots\ \sigma_r\ 0\ \cdots\ 0)$ [an $m \times n$ matrix], the pseudo-inverse $A^+$ is given by $A^+ = V\Sigma^+ U^\dagger$, where $\Sigma^+ = \operatorname{diag}(1/\sigma_1\ 1/\sigma_2\ \cdots\ 1/\sigma_r\ 0\ \cdots\ 0)$ [an $n \times m$ matrix]. The pseudo-inverse equals the inverse for non-singular square matrices.
Theorem: Given the equation $Ax = b$, we have three cases:
(1) If the system has a unique solution, $x = A^+b$ produces that solution.
(2) If the system is over-determined, then $x = A^+b$ produces the solution from the linear manifold that minimizes $\|Ax - b\|$ (least-squares solution) that also minimizes $\|x\|$.
(3) If the system is under-determined, then $x = A^+b$ produces the solution of $Ax = b$ that minimizes $\|x\|$.
So, it always does something sensible!

Example (continued):
>> Ap = pinv(A)
>> Sp = svd(Ap)
>> x = Ap*b, x = Ap*b1, x = Ap*b2
We get something sensible! Note however that pinv(A)*b is not the same as A\b. MATLAB uses QR for over- or under-determined systems. Like the pseudo-inverse, QR produces the least-squares solution, but, unlike the pseudo-inverse, it produces nullity-$A$ zeros in the solution.
>> Ap1 = pinv(A1) [Many elements are very large: a sign of trouble]
>> x = Ap1*b, x = Ap1*b1, x = Ap1*b2 [Huge, nonsensical numbers in the third case!]
We find the source of the problem by looking at the SVD
>> [Up,Sp,Vp] = svd(Ap1) [Note the large first element]
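The pseudo-inverse construction and its minimum-norm property can be checked in NumPy (an illustration; the tolerance value is an arbitrary choice):

```python
import numpy as np

A = np.array([[1/3, 1/3, 2/3],
              [2/3, 2/3, 4/3],
              [1/3, 2/3, 3/3],
              [2/5, 2/5, 4/5],
              [3/5, 1/5, 4/5]])
b = A[:, 0].copy()      # b equals the first column, so Ax = b is consistent

# Pseudo-inverse built directly from the SVD: A+ = V Sigma+ U^H,
# inverting only the singular values above a tolerance
U, s, Vh = np.linalg.svd(A, full_matrices=False)
tol = 1e-10
s_plus = np.array([1.0 / si if si > tol else 0.0 for si in s])
A_plus = Vh.conj().T @ np.diag(s_plus) @ U.conj().T
assert np.allclose(A_plus, np.linalg.pinv(A, rcond=1e-9), atol=1e-8)

x = A_plus @ b
# x solves the system exactly (b is in the range of A)...
assert np.allclose(A @ x, b)
# ...and is the minimum-norm solution: [1, 0, 0] also solves it but is longer
assert np.linalg.norm(x) < np.linalg.norm([1.0, 0.0, 0.0])
print(x)
```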
Example (continued): We can fix the problem by zeroing out the large element of the SVD
>> Sp(1,1) = 0 [We zero out the first element]
>> Apm = Up*Sp*Vp' [and reconstruct the pseudo-inverse]
>> x = Apm*b2 [and we now get sensible results]
We can avoid this problem by using a tolerance with MATLAB
>> Ap1 = pinv(A1,1e-6) [The tolerance sets small SVD elements to zero]
>> x = Ap1*b2 [and we now get sensible results directly]

Least-Squares Method and the QR Factorization
Problem Statement: We often are trying to fit a limited number of parameters to a large, noisy data set. That leads to an over-determined linear system.
Example: We have a set of carbon resistors in a circuit that must produce 10 A. As the resistors age, the voltage that is required to produce this current increases linearly. Hence, we expect $v = v_0 + \alpha y$, where $v$ is the voltage and $y$ is the number of years. We want to determine $v_0$ and $\alpha$ using measurements on resistors of varying ages. We get a curve like the one below:
[Figure: measured voltage vs. years, scattered points around a rising straight line.]
Least-Squares Method and the QR Factorization
Example (continued): With 20 measured points, we get a problem of the form $Ax = b$, where $x = [v_0\ \ \alpha]^T$, $b = [v_1\ v_2 \cdots v_{20}]^T$, in which $v_1, \ldots, v_{20}$ are the measured voltages, and $a_{k1} = 1$, $a_{k2} = y_k = (k-1)/10$; $k = 1, \ldots, 20$. We may generate the numerical example in the following way:
>> y = 0:0.1:1.9;
>> v = 1 + 0.2*y;
>> doc randn
>> delta = randn(size(v));
>> vr = v + 0.05*delta; b = vr';
>> doc plot
>> plot(y,v,y,vr,'.')
>> A = ones(20,2);
>> A(1:20,1) = 1;
>> A(1:20,2) = y';
>> x = A\b

Remark: Efficient solution of this problem is based on QR factorization. We have already seen that any matrix $A$ may be written in the form $A = QR$, using Gram-Schmidt orthogonalization, where $Q$ is an orthogonal matrix and $R$ is upper triangular. (See slide 6.0)

Theorem: $\|Ax - b\| = \|Rx - Q^\dagger b\|$.
Proof:
$$\|Ax - b\|^2 = (QRx - b)^\dagger(QRx - b) = x^\dagger R^\dagger Q^\dagger Q R x - x^\dagger R^\dagger Q^\dagger b - b^\dagger Q R x + b^\dagger b$$
$$= x^\dagger R^\dagger R x - x^\dagger R^\dagger Q^\dagger b - b^\dagger Q R x + b^\dagger Q Q^\dagger b = \|Rx - Q^\dagger b\|^2.$$
(Here $Q$ is the full $m \times m$ orthogonal matrix, so $Q^\dagger Q = Q Q^\dagger = I$.)

Theorem: If $A$ is an $m \times n$ matrix of rank $r$, and $c = Q^\dagger b$, then we may find a least-squares solution by back-substitution if $r = n$, or by setting $x_{r+1} = \cdots = x_n = 0$ and then using back-substitution if $r < n$. We then find
$$\min_x \|Ax - b\|^2 = \sum_{k=r+1}^{m} |c_k|^2.$$
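The resistor fit and the QR least-squares identities above can be checked in a short NumPy sketch (a Python analogue of the MATLAB session; $v_0 = 1$ and slope $0.2$ are the assumed true values from the generated data):

```python
import numpy as np

rng = np.random.default_rng(42)

# Twenty resistors with ages 0, 0.1, ..., 1.9 years; noisy voltage readings
y = np.arange(0, 2.0, 0.1)
v = 1.0 + 0.2 * y                            # aging law v = v0 + alpha*y
b = v + 0.05 * rng.standard_normal(y.size)   # measured voltages

# Over-determined system A x = b with x = [v0, alpha]
A = np.column_stack([np.ones(y.size), y])
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
assert abs(x_ls[0] - 1.0) < 0.1 and abs(x_ls[1] - 0.2) < 0.1

# The identity ||Ax - b|| = ||Rx - Q^T b||, with the full (20 x 20) Q
Q, R = np.linalg.qr(A, mode='complete')
x_try = np.array([0.9, 0.3])                 # any trial x
assert np.isclose(np.linalg.norm(A @ x_try - b),
                  np.linalg.norm(R @ x_try - Q.T @ b))

# Back-substitution on the top 2x2 block of R gives the least-squares
# solution; the minimum residual is the norm of the remaining part of c
c = Q.T @ b
assert np.allclose(np.linalg.solve(R[:2, :2], c[:2]), x_ls)
assert np.isclose(np.linalg.norm(A @ x_ls - b), np.linalg.norm(c[2:]))
print(x_ls)
```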
Least-Squares Method and the QR Factorization
Remark: Except for the addition of column pivoting, which we will describe shortly, this algorithm is how MATLAB calculates least squares. Column pivoting is needed to ensure stability.
Remark: The QR algorithm is more efficient than the SVD, although less robust, and allows the easy introduction of additional data.
Remark: The QR algorithm plays an important role in finding eigenvalues and eigenvectors. To understand this algorithm (and algorithms for solving eigenproblems), we must first study rotations and reflections!

Rotations and Reflections
Rotations: [Figure: the vector $x = (x_1, x_2) = (r\cos\theta, r\sin\theta)$ is rotated onto $(r, 0)$.] We find that
$$U_1 x = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}, \qquad \text{where } \cos\theta = x_1/r,\ \sin\theta = x_2/r,$$
so that
$$U_1 = \begin{pmatrix} x_1/r & x_2/r \\ -x_2/r & x_1/r \end{pmatrix},$$
and $U_1$, a rotation matrix, is an orthonormal matrix with determinant 1.
Rotations and Reflections
Rotations: We can systematically implement a series of rotation matrices of the form $U_{jk}$ to eliminate the lower triangular elements of $A$ and create $QR$. $U_{jk}$ equals the identity matrix, except for the four elements
$$(U_{jk})_{jj} = c, \qquad (U_{jk})_{jk} = s, \qquad (U_{jk})_{kj} = -s, \qquad (U_{jk})_{kk} = c, \qquad c^2 + s^2 = 1.$$

Example: To eliminate $a_{21}$ of a $3 \times 3$ matrix $A$, we let $\rho_1 = \sqrt{a_{11}^2 + a_{21}^2}$ and
$$Q_1 = \begin{pmatrix} a_{11}/\rho_1 & a_{21}/\rho_1 & 0 \\ -a_{21}/\rho_1 & a_{11}/\rho_1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
$$A_1 = Q_1 A = \begin{pmatrix} \rho_1 & (a_{11}a_{12} + a_{21}a_{22})/\rho_1 & (a_{11}a_{13} + a_{21}a_{23})/\rho_1 \\ 0 & (a_{11}a_{22} - a_{21}a_{12})/\rho_1 & (a_{11}a_{23} - a_{21}a_{13})/\rho_1 \\ a_{31} & a_{32} & a_{33} \end{pmatrix}.$$
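The single elimination step above can be sketched in NumPy (illustrative; `givens` is a helper name introduced here, not a library routine):

```python
import numpy as np

def givens(a, b):
    """2x2 rotation [[c, s], [-s, c]] sending (a, b) to (rho, 0)."""
    rho = np.hypot(a, b)
    return np.array([[a / rho, b / rho],
                     [-b / rho, a / rho]])

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))

# Eliminate A[1,0] by rotating in the (row 0, row 1) plane
G = np.eye(3)
G[:2, :2] = givens(A[0, 0], A[1, 0])
A1 = G @ A

assert abs(A1[1, 0]) < 1e-12               # the target entry is zeroed
assert np.allclose(G @ G.T, np.eye(3))     # G is orthogonal
assert np.isclose(np.linalg.det(G), 1.0)   # and a proper rotation (det = 1)
print(A1)
```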
Rotations and Reflections
Rotations, example (continued): To eliminate $a_{31}$, we rotate in the (1, 3) plane. With $a^{(1)}_{kl}$ the elements of $A_1$ and $\rho_2 = \sqrt{\bigl(a^{(1)}_{11}\bigr)^2 + a_{31}^2}$,
$$Q_2 = \begin{pmatrix} a^{(1)}_{11}/\rho_2 & 0 & a_{31}/\rho_2 \\ 0 & 1 & 0 \\ -a_{31}/\rho_2 & 0 & a^{(1)}_{11}/\rho_2 \end{pmatrix},$$
$$A_2 = Q_2 A_1 = \begin{pmatrix} \rho_2 & \bigl(a^{(1)}_{11}a^{(1)}_{12} + a_{31}a_{32}\bigr)/\rho_2 & \bigl(a^{(1)}_{11}a^{(1)}_{13} + a_{31}a_{33}\bigr)/\rho_2 \\ 0 & a^{(1)}_{22} & a^{(1)}_{23} \\ 0 & \bigl(a^{(1)}_{11}a_{32} - a_{31}a^{(1)}_{12}\bigr)/\rho_2 & \bigl(a^{(1)}_{11}a_{33} - a_{31}a^{(1)}_{13}\bigr)/\rho_2 \end{pmatrix}.$$
We find $Q_3$ and $A_3$ similarly by eliminating $a_{32}$. We then have $R = A_3$ and $Q = Q_1^T Q_2^T Q_3^T$.
Remark: These rotations are called Givens rotations. Better is to use reflections!

Reflections: We wish to map $x = (x_1, x_2) = (r\cos\theta, r\sin\theta)$ onto $(r, 0)$ with a reflection. In vector notation, $x - v$ is the desired new vector, where
$$v = x - \begin{pmatrix} r \\ 0 \end{pmatrix} = \begin{pmatrix} x_1 - r \\ x_2 \end{pmatrix} = 2r\sin(\theta/2)\begin{pmatrix} -\sin(\theta/2) \\ \cos(\theta/2) \end{pmatrix}, \qquad \|v\|^2 = (x_1 - r)^2 + x_2^2 = 2r(r - x_1).$$
Letting $u = v/\|v\|$, we find
$$U_1 x = (I - 2uu^T)x = x - \frac{2v^T x}{\|v\|^2}\,v = x - v = \begin{pmatrix} r \\ 0 \end{pmatrix},$$
since $v^T x = x_1^2 - r x_1 + x_2^2 = r(r - x_1)$, so that $2v^T x/\|v\|^2 = 1$.
Remark: $\det U_1 = -1$, which is a property of reflections.
Remark: Better in general is to use $v = [x_1 + \operatorname{sign}(x_1)\,r \;\; x_2]^T$, so that the first element is never zero. In that case, $U_1 x = [-\operatorname{sign}(x_1)\,r \;\; 0]^T$.
Rotations and Reflections
Reflections: [Figure: the reflection that maps $x = (r\cos\theta, r\sin\theta)$ onto $(r, 0)$, and the alternative choice $v = [x_1 + \operatorname{sign}(x_1)\,r \;\; x_2]^T$, which maps $x$ onto $(-r, 0)$ when $x_1 > 0$.]
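A Householder reflection in arbitrary dimension can be sketched as follows (NumPy illustration; `householder` is a helper name introduced here):

```python
import numpy as np

def householder(a):
    """Reflection Q = I - 2 v v^T/||v||^2 sending a to -sign(a1)*||a||*e1."""
    r = np.linalg.norm(a)
    v = a.copy()
    s = np.sign(a[0]) if a[0] != 0 else 1.0
    v[0] += s * r                    # keeps the first element of v non-zero
    return np.eye(a.size) - 2.0 * np.outer(v, v) / (v @ v)

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))

# One reflection zeroes an entire column below the diagonal at once
Q1 = householder(A[:, 0])
A1 = Q1 @ A

assert np.allclose(A1[1:, 0], 0)
assert np.allclose(Q1 @ Q1, np.eye(5))        # a reflection is its own inverse
assert np.isclose(np.linalg.det(Q1), -1.0)    # det = -1, as for all reflections
print(A1[:, 0])
```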
Rotations and Reflections
Reflections, algorithm: This approach is extended in a very straightforward way to arbitrary dimensions and allows us to zero out an entire column of $A$ at once. We let $r = \|a_1\|$, where $a_1$ is the first column vector in $A$. We then let $v = [r + a_{11} \;\; a_{21} \;\; \cdots \;\; a_{m1}]^T$ and let
$$Q_1 = Q_1^\dagger = I - 2vv^\dagger/\|v\|^2.$$
We now proceed recursively:
$$A_1 = Q_1 A = \begin{pmatrix} -r & w^\dagger \\ 0 & \hat{A} \end{pmatrix},$$
where $w \in \mathbb{C}^{n-1}$ and $\hat{A}$ is an $(m-1) \times (n-1)$ matrix. We let $r_2 = \|\hat{a}_1\|$, where $\hat{a}_1$ is now the first column vector in $\hat{A}$. We now let $v_2 = [r_2 + \hat{a}_{11} \;\; \hat{a}_{21} \;\; \cdots \;\; \hat{a}_{(m-1)1}]^T$, where $\hat{a}_{11}, \ldots, \hat{a}_{(m-1)1}$ are the elements of $\hat{a}_1$, and we let
$$\hat{Q}_2 = \hat{Q}_2^\dagger = I - 2v_2v_2^\dagger/\|v_2\|^2, \qquad Q_2 = \begin{pmatrix} 1 & 0 \\ 0 & \hat{Q}_2 \end{pmatrix}.$$

Algorithm (continued): We now find that $A_2 = Q_2 Q_1 A$ has zeros below the diagonal in its first two columns, with an $(m-2) \times (n-2)$ block remaining. We continue for $r$ iterations, at which point the algorithm terminates. We have $R = A_r$ and $Q = Q_1 Q_2 \cdots Q_r$ (each $Q_k$ is both unitary and Hermitian, so $Q^\dagger A = R$ gives $A = QR$).
One modification: In order to guarantee stability, at each iteration we calculate $\max_j \|a_j\|$, where the $a_j$ are the column vectors of $\hat{A}_k$, and move that column to the front. Permuting the column vectors permutes the rows of the solution $x$, which must be restored.
Remark: These reflections are called Householder reflections.

QR Factorization
Example: We again consider the matrix from the earlier example,
$$A = \begin{pmatrix} 1/3 & 1/3 & 2/3 \\ 2/3 & 2/3 & 4/3 \\ 1/3 & 2/3 & 3/3 \\ 2/5 & 2/5 & 4/5 \\ 3/5 & 1/5 & 4/5 \end{pmatrix}.$$
(1) We have $\|a_1\| = [(1/3)^2 + (2/3)^2 + (1/3)^2 + (2/5)^2 + (3/5)^2]^{1/2} = 1.0893$, $\|a_2\| = 1.0954$, $\|a_3\| = 2.1218$. So, we first permute columns 1 and 3:
$$A' = \begin{pmatrix} 0.667 & 0.333 & 0.333 \\ 1.333 & 0.667 & 0.667 \\ 1.000 & 0.667 & 0.333 \\ 0.800 & 0.400 & 0.400 \\ 0.800 & 0.200 & 0.600 \end{pmatrix}.$$
QR Factorization
(2) We now have $v = [0.667 + 2.1218 \;\; 1.333 \;\; 1.000 \;\; 0.800 \;\; 0.800]^T$. Calculating $Q_1 = I - 2vv^T/\|v\|^2$ and $A_1 = Q_1 A'$, we obtain
$$A_1 = \begin{pmatrix} -2.1218 & -1.0641 & -1.0578 \\ 0.000 & -0.0015 & 0.0015 \\ 0.000 & 0.1655 & -0.1655 \\ 0.000 & -0.0009 & 0.0009 \\ 0.000 & -0.2009 & 0.2009 \end{pmatrix}.$$
(3) The norms of the two columns of $\hat{A}$ are the same by inspection. So, we do not permute. We have $r_2 = \|\hat{a}_1\| = 0.2603$ and, following the sign convention above, $v_2 = [-0.0015 - 0.2603 \;\; 0.1655 \;\; -0.0009 \;\; -0.2009]^T$.
(4) Calculating $Q_2$ from $\hat{Q}_2 = I - 2v_2v_2^T/\|v_2\|^2$ and $R = A_2 = Q_2 Q_1 A'$, we obtain
$$R = \begin{pmatrix} -2.1218 & -1.0641 & -1.0578 \\ 0.000 & 0.2603 & -0.2603 \\ 0.000 & 0.000 & 0.000 \\ 0.000 & 0.000 & 0.000 \\ 0.000 & 0.000 & 0.000 \end{pmatrix}.$$
MATLAB:
>> A = [1/3 1/3 2/3; 2/3 2/3 4/3; 1/3 2/3 3/3; 2/5 2/5 4/5; 3/5 1/5 4/5]
>> [Q,R,E] = qr(A)
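The worked example can be reproduced in NumPy (an illustration; `reflect` is a helper name introduced here, and the reflections follow the sign convention above):

```python
import numpy as np

A = np.array([[1/3, 1/3, 2/3],
              [2/3, 2/3, 4/3],
              [1/3, 2/3, 3/3],
              [2/5, 2/5, 4/5],
              [3/5, 1/5, 4/5]])

# Column pivoting: move the column of largest norm (column 3) to the front
Ap = A[:, [2, 1, 0]]

def reflect(a):
    """Householder reflection zeroing all but the first element of a."""
    v = a.copy()
    v[0] += np.sign(a[0]) * np.linalg.norm(a)   # a[0] is non-zero here
    return np.eye(a.size) - 2.0 * np.outer(v, v) / (v @ v)

Q1 = reflect(Ap[:, 0])
A1 = Q1 @ Ap
Q2 = np.eye(5)
Q2[1:, 1:] = reflect(A1[1:, 1])
R = Q2 @ A1

# The leading entries match the worked example (up to sign)
assert np.isclose(abs(R[0, 0]), 2.1218, atol=1e-4)
assert np.isclose(abs(R[1, 1]), 0.2603, atol=1e-4)
assert np.allclose(np.tril(R, -1), 0, atol=1e-12)   # R is upper triangular
print(R[:2])
```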
Orthogonal and Unitary Matrices
We stated that all our results apply to complex matrices, but all our examples are real. How do reflections and rotations work with complex matrices?
Theorem: The most general $2 \times 2$ orthogonal (unitary) matrix may be written
$$U = \begin{pmatrix} e^{i(\phi+\alpha)}\cos\theta & e^{i(\phi+\beta)}\sin\theta \\ -e^{i(\phi-\beta)}\sin\theta & e^{i(\phi-\alpha)}\cos\theta \end{pmatrix}.$$
Remark: $\det U = e^{2i\phi}$ can equal $+1$, corresponding to rotations, $-1$, corresponding to reflections, and any other complex number of modulus 1.
Remark: Building from this: We may always make the first column vector of $A$ real by multiplying it by the $m \times m$ orthogonal (unitary) matrix
$$U = \operatorname{diag}\bigl(e^{i\theta_1}, e^{i\theta_2}, \ldots, e^{i\theta_m}\bigr),$$
chosen so that $e^{i\theta_1}a_{11}, e^{i\theta_2}a_{21}, \ldots, e^{i\theta_m}a_{m1}$ are real, i.e. $\theta_k = -\arg a_{k1}$. So, the QR algorithm can easily be made to work with complex matrices.
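The phase-matrix trick can be checked in NumPy (illustrative, for a random complex column):

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)   # a complex column

# Diagonal unitary matrix of phases with theta_k = -arg(a_k),
# chosen so that every entry of U a is real
U = np.diag(np.exp(-1j * np.angle(a)))

assert np.allclose(U @ U.conj().T, np.eye(4))   # U is unitary
ra = U @ a
assert np.allclose(ra.imag, 0)                  # the result is real...
assert np.all(ra.real >= 0)                     # ...and equals |a_k| >= 0
print(ra.real)
```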