Applied Linear Algebra

Guido Herweyers
KHBO, Faculty of Industrial Sciences and Technology, Oostende, Belgium
guido.herweyers@khbo.be
Katholieke Universiteit Leuven
guido.herweyers@wis.kuleuven.be

Abstract

In this workshop some fundamental concepts of applied linear algebra are explored through exercises with the symbolic calculator TI-89 Titanium. Keywords are: matrix algebra, systems of linear equations, vector spaces, linear dependence and independence, eigenvalues and eigenvectors, discrete dynamical systems, least-squares problems. The exercises are based on Linear Algebra and Its Applications, third edition update, by David C. Lay, and other books mentioned in the sources.

1 Introduction

Engineering students of the Katholieke Universiteit Leuven and the Faculty of Industrial Sciences and Technology of KHBO Campus Oostende use the book Linear Algebra and Its Applications, third edition update, by David C. Lay. The website http://www.laylinalgebra.com contains useful documents among the student resources and instructor resources. The text includes a copy of the Study Guide (in pdf format) on the companion CD, with useful tips, summaries and solutions of exercises. The CD also contains data files for the numerical exercises in the text; the data are available in formats for Matlab, Maple, Mathematica and the graphic calculators TI-83+/86/89 and HP-48G. The following text illustrates the use of the symbolic calculator TI-89 Titanium for applications in linear algebra.

2 A modern didactical approach of matrix multiplication

The definitions and proofs in the book focus on the columns of a matrix rather than on the matrix entries. A vector is a matrix with one column. The product of an m x n matrix A with a vector x in R^n is a linear combination of the successive columns of A, with as weights the corresponding entries in x:

  Ax = [a1 a2 ... an] [x1, ..., xn]^T = x1 a1 + x2 a2 + ... + xn an

For the product of an m x n matrix A with an n x p matrix B we make sure that (AB)x = A(Bx) for each x in R^p.
This demand leads to the definition

  AB = A [b1 b2 ... bp] = [Ab1 Ab2 ... Abp]

Each column of AB is a linear combination of the columns of A, using weights from the corresponding column of B. This view of matrix multiplication is very valuable in linear algebra; we illustrate it with the following exercise (to solve without calculator):

a) Let A be the given 2 x 4 matrix; find a 4 x 2 matrix B with entries 0 or 1 such that AB = I2. Solution: find linear combinations of the columns of A that give the columns of I2. Are there more possibilities for B if the entries are allowed to be 0, 1 or -1?

b) Does there exist a solution X to the problem

  X [a b c d; e f g h] = I4 ?

The answer is no! If there existed such a left inverse X of the 2 x 4 matrix [a b c d; e f g h], each of the columns of I4 would be a linear combination of the two columns of X. Consequently, the two columns of X would generate the 4-dimensional space R^4, which is impossible because the column space of X is at most 2-dimensional.
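The column-by-column view of the product can also be checked with a short program. The workshop itself uses the TI-89, but the idea translates directly; the sketch below (plain Python, matrices as lists of rows, helper names our own) builds AB column by column as A times the successive columns of B and compares the result with the entry-wise definition.

```python
# Column view of matrix multiplication: column j of AB is the linear
# combination of the columns of A with weights taken from column j of B.
# Matrices are lists of rows; the helper names are illustrative only.

def column(M, j):
    return [row[j] for row in M]

def matvec(A, x):
    # A*x computed as a linear combination of the columns of A
    result = [0] * len(A)
    for j, weight in enumerate(x):
        for i in range(len(A)):
            result[i] += weight * A[i][j]
    return result

def matmul_by_columns(A, B):
    # Build AB column by column: (AB) e_j = A (B e_j)
    cols = [matvec(A, column(B, j)) for j in range(len(B[0]))]
    # transpose the list of columns back into a list of rows
    return [list(row) for row in zip(*cols)]

def matmul_entrywise(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
B = [[1, 0, 2], [0, 1, 3]]        # 2 x 3
assert matmul_by_columns(A, B) == matmul_entrywise(A, B)
```

Both routines compute the same product; the column version is the one that mirrors the definition in the text.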
3 Brief introduction to the TI-89 Titanium

The TI-89 Titanium is a symbolic calculator, with computer algebra based on DERIVE. A brief survey of its possibilities will be given during the workshop. Some advantages are:
- computer algebra in handheld format;
- the immediate availability, the fast switching on and off;
- the versatile features and applications, not only for mathematics and statistics but also for chemistry and physics, such as the DataMate application software for collecting data with the CBL and various sensors.
A summary can be found on http://education.ti.com/educationportal/sites/us/productdetail/us_ti89ti.html

4 Applications of linear algebra

In the following applications a lot of screen shots of the TI-89 are shown; they should be self-explaining. It is not our intention to discuss the TI-89 syntax here.

4.1 Linearly dependent vectors

Let

  A = [1 2 3; 2 4 6; 5 8 13]

The third column of A is the sum of its first two columns. Let's change the last entry of A, giving a matrix B that coincides with A except for the entry in row 3, column 3. Are the columns of B still linearly dependent?

Answer: at first sight one would expect that the columns of B are linearly independent. But the rows of B are linearly dependent (the second row is twice the first row), so the columns of B must be linearly dependent too, because the row space and the column space of B have the same dimension (namely the rank of B). To find a linear dependence relation among the columns b1, b2, b3 of B, we calculate the reduced row echelon form R of B:
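The reduced row echelon form can also be computed exactly in a few lines of code, mirroring the calculator's rref command. A sketch in plain Python with exact rationals; the matrix B below is an assumed example with the stated property (second row twice the first), not necessarily the one from the worksheet.

```python
# Reduced row echelon form over exact rationals, then read off a
# dependence relation among the columns from the free column.
from fractions import Fraction

def rref(M):
    R = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(R), len(R[0])
    lead = 0
    for col in range(n_cols):
        if lead == n_rows:
            break
        pivot = next((r for r in range(lead, n_rows) if R[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        R[lead], R[pivot] = R[pivot], R[lead]
        R[lead] = [x / R[lead][col] for x in R[lead]]
        for r in range(n_rows):
            if r != lead and R[r][col] != 0:
                factor = R[r][col]
                R[r] = [a - factor * b for a, b in zip(R[r], R[lead])]
        lead += 1
    return R

# Assumed stand-in: the second row is twice the first, so the columns are
# dependent even though no column is an obvious multiple of another.
B = [[1, 2, 3], [2, 4, 6], [5, 8, 10]]
R = rref(B)

# Column 3 is the free column; its entries in R are the weights expressing
# b3 in terms of the pivot columns b1 and b2.
c1, c2 = R[0][2], R[1][2]
b1, b2, b3 = ([row[j] for row in B] for j in range(3))
assert all(b3[i] == c1 * b1[i] + c2 * b2[i] for i in range(3))
```

For this B the weights come out as c1 = -2 and c2 = 5/2, i.e. b3 = -2 b1 + (5/2) b2.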
The linear dependence relations among the columns of R and B are the same. From R we can read off the weights c1 and c2 in r3 = c1 r1 + c2 r2, and consequently b3 = c1 b1 + c2 b2.

4.2 Conic sections

a) A circle through three given points: linear system

From analytic geometry we know that there is a unique circle passing through three distinct points that do not all lie on a straight line. The standard equation (x - x0)^2 + (y - y0)^2 = R^2 of the circle with center (x0, y0) and radius R can be written as

  x^2 + y^2 + l x + m y + n = 0    (1)

Substitution of the coordinates of the given points (x1, y1), (x2, y2), (x3, y3) into (1) gives a system of linear equations from which the unknowns l, m, n can be solved:

  l x1 + m y1 + n = -x1^2 - y1^2
  l x2 + m y2 + n = -x2^2 - y2^2    (2)
  l x3 + m y3 + n = -x3^2 - y3^2

Example: find the equation of the circle that passes through the points (1, 7), (6, 2) and (4, 6). The system (2) becomes

  l + 7m + n = -50
  6l + 2m + n = -40
  4l + 6m + n = -52

The reduced row echelon form of the augmented matrix of the system yields the solution (l, m, n) = (-2, -4, -20).
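The same elimination can be carried out in exact arithmetic with a few lines of code; a minimal Gaussian-elimination sketch (plain Python, applied to the three points of the worked example):

```python
# Solve the 3x3 linear system (2) for the circle x^2 + y^2 + l*x + m*y + n = 0
# through three given points, using exact rational Gaussian elimination.
from fractions import Fraction

def solve3(M, rhs):
    # Gauss-Jordan elimination on an augmented 3x3 system.
    A = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for i in range(3):
        piv = next(r for r in range(i, 3) if A[r][i] != 0)
        A[i], A[piv] = A[piv], A[i]
        A[i] = [x / A[i][i] for x in A[i]]
        for r in range(3):
            if r != i:
                A[r] = [a - A[r][i] * b for a, b in zip(A[r], A[i])]
    return [row[3] for row in A]

points = [(1, 7), (6, 2), (4, 6)]
M = [[x, y, 1] for x, y in points]
rhs = [-(x * x + y * y) for x, y in points]
l, m, n = solve3(M, rhs)
assert (l, m, n) == (-2, -4, -20)   # the circle (x-1)^2 + (y-2)^2 = 25
```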
The equation (1) of the circle becomes x^2 + y^2 - 2x - 4y - 20 = 0, or (x - 1)^2 + (y - 2)^2 = 25.

A graphical affirmation:

Remark: it is impossible to plot implicit equations with the TI-89; therefore we have to solve the equation for y and draw the two resulting functions.

When does the system (2) have no solution or infinitely many solutions?

b) A circle through three given points: equation in determinant form

The equation x^2 + y^2 + l x + m y + n = 0 of a circle can, after multiplication with a factor a, be written as

  a (x^2 + y^2) + b x + c y + d = 0    (3)

The coordinates of the given points (x1, y1), (x2, y2), (x3, y3) must satisfy (3). The coordinates of an arbitrary point (x, y) on the circle satisfy (3) too. This gives the following homogeneous system:

  a (x^2 + y^2) + b x + c y + d = 0
  a (x1^2 + y1^2) + b x1 + c y1 + d = 0
  a (x2^2 + y2^2) + b x2 + c y2 + d = 0    (4)
  a (x3^2 + y3^2) + b x3 + c y3 + d = 0
The system (4) has a nontrivial solution (a, b, c, d). Thus the determinant of its coefficient matrix is zero:

  | x^2 + y^2    x    y    1 |
  | x1^2 + y1^2  x1   y1   1 |
  | x2^2 + y2^2  x2   y2   1 |  =  0    (5)
  | x3^2 + y3^2  x3   y3   1 |

This is the equation of the circle in determinant form. For the circle through the points (1, 7), (6, 2) and (4, 6) we get

  | x^2 + y^2  x  y  1 |
  | 50         1  7  1 |
  | 40         6  2  1 |  =  0,
  | 52         4  6  1 |

which simplifies to x^2 + y^2 - 2x - 4y - 20 = 0.

This procedure with determinants is only practical with computer algebra. We store the general coefficient matrix as a function cirkel of the coordinates of the points:

Remark: entering three different points lying on a straight line gives the equation of that line.

c) A conic section through 5 given points

The general equation of a conic section in the plane is

  a x^2 + b y^2 + c x y + d x + e y + f = 0

with the six coefficients not all zero. The number of coefficients can be reduced to 5 by dividing the equation by a coefficient which is not zero. Thus we expect that 5 distinct points are sufficient to determine the equation of the conic section.
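The determinant form (5) lends itself to a direct check without computer algebra: a point lies on the circle exactly when the 4 x 4 determinant vanishes. A small sketch (plain Python, Laplace expansion along the first row; helper names our own):

```python
# Evaluate the determinant form (5) of the circle through (1,7), (6,2), (4,6)
# at a test point: the determinant is zero exactly when the point lies on
# the circle. Determinant via recursive Laplace expansion (fine for a 4x4).
from fractions import Fraction

def det(M):
    if len(M) == 1:
        return M[0][0]
    total = Fraction(0)
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def circle_det(x, y):
    pts = [(1, 7), (6, 2), (4, 6)]
    rows = [[Fraction(x * x + y * y), Fraction(x), Fraction(y), Fraction(1)]]
    rows += [[Fraction(px * px + py * py), Fraction(px), Fraction(py), Fraction(1)]
             for px, py in pts]
    return det(rows)

# (1, 7) is a data point; (-2, -2) satisfies (x-1)^2 + (y-2)^2 = 9 + 16 = 25,
# so it lies on the circle; the origin does not (1 + 4 = 5 != 25).
assert circle_det(1, 7) == 0
assert circle_det(-2, -2) == 0
assert circle_det(0, 0) != 0
```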
Analogous with the last section we find the equation in determinant form:

  | x^2    y^2    x y     x    y    1 |
  | x1^2   y1^2   x1 y1   x1   y1   1 |
  | x2^2   y2^2   x2 y2   x2   y2   1 |
  | x3^2   y3^2   x3 y3   x3   y3   1 |  =  0
  | x4^2   y4^2   x4 y4   x4   y4   1 |
  | x5^2   y5^2   x5 y5   x5   y5   1 |

As an illustration we determine the equation of the conic section through five given points. The result is a quadratic equation in x and y; because the discriminant

  | a    c/2 |
  | c/2  b   |

is negative, the conic section is a hyperbola.

The general coefficient matrix is stored as the function ks of the coordinates of the points:

The conic section through five distinct points lying on the two coordinate axes is x y = 0, a pair of lines (a degenerate conic).
d) The ellipse as a locus of points

An ellipse is the set of points (x, y) such that the sum of the distances from (x, y) to two given points (the foci) is fixed. Choose an orthonormal axis system. Let p = (x, y) be a general point of the ellipse and f = (xf, yf), g = (xg, yg) the foci; then we have:

  d(p, f) + d(p, g) = 2a    (6)

The triangle inequality gives d(f, g) <= d(f, p) + d(p, g), or

  d(f, g) <= 2a    (7)

We demand the strict inequality d(f, g) < 2a, because d(f, g) = 2a results in the segment connecting f with g.

The distance formula gives d(p, f) = ||p - f|| = sqrt((x - xf)^2 + (y - yf)^2).

Our goal is to get rid of the square roots associated with (6): squaring d(p, f) = 2a - d(p, g) yields

  d(p, f)^2 = 4a^2 - 4a d(p, g) + d(p, g)^2

or 4a d(p, g) = 4a^2 + d(p, g)^2 - d(p, f)^2; squaring again gives

  16 a^2 d(p, g)^2 - (4a^2 + d(p, g)^2 - d(p, f)^2)^2 = 0    (8)

The expression (8) is simplified with computer algebra, resulting in the equation of the ellipse. The left-hand side of that equation is stored as the function ellips(xf, yf, xg, yg, a), with as variables the coordinates of the foci and the length a of the half major axis. This is a long expression (see the fourth screen shot below).
As an example, the equation ellips(4, 0, -4, 0, 5) = 0 of the ellipse with the foci (4, 0), (-4, 0) and a = 5 yields

  144 x^2 + 400 y^2 - 3600 = 0  or  x^2/25 + y^2/9 = 1.

In general, ellips(c, 0, -c, 0, a) = 0 becomes

  (a^2 - c^2) x^2 + a^2 y^2 = a^2 (a^2 - c^2);

with b^2 = a^2 - c^2 this gives the standard equation x^2/a^2 + y^2/b^2 = 1.

The choice of the foci is free; for foci that do not lie symmetrically about the axes, ellips produces the equation of a rotated ellipse, containing an x y term:

Remark: we can also choose a value a with d(f, g) > 2a, such that the triangle inequality (7) is not satisfied. Exploring with ellips(5, 0, -5, 0, 4) = 0 gives

  -144 x^2 + 256 y^2 + 2304 = 0,  or the hyperbola  x^2/16 - y^2/9 = 1 !!
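This surprising outcome can be confirmed numerically. The sketch below (plain Python, not the TI-89 session of the text) evaluates the left-hand side of (8) at parametrized points of the ellipse with foci (4, 0), (-4, 0) and a = 5, and of the hyperbola obtained with foci (5, 0), (-5, 0) and a = 4; both vanish to machine precision.

```python
# Check equation (8): 16 a^2 d(p,g)^2 - (4 a^2 + d(p,g)^2 - d(p,f)^2)^2 = 0
# for points on the ellipse x^2/25 + y^2/9 = 1 (foci (4,0), (-4,0), a = 5)
# and for points on the hyperbola x^2/16 - y^2/9 = 1 (foci (5,0), (-5,0), a = 4).
import math

def dist2(p, q):
    # squared distance, so no square roots are needed
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def lhs8(p, f, g, a):
    dpg2, dpf2 = dist2(p, g), dist2(p, f)
    return 16 * a * a * dpg2 - (4 * a * a + dpg2 - dpf2) ** 2

t = 0.7

# ellipse point: x = 5 cos t, y = 3 sin t
p_ell = (5 * math.cos(t), 3 * math.sin(t))
assert abs(lhs8(p_ell, (4, 0), (-4, 0), 5)) < 1e-6

# hyperbola point (right branch): x = 4 cosh t, y = 3 sinh t
p_hyp = (4 * math.cosh(t), 3 * math.sinh(t))
assert abs(lhs8(p_hyp, (5, 0), (-5, 0), 4)) < 1e-6
```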
One can check that a hyperbola with |d(p, f) - d(p, g)| = 2a and d(f, g) > 2a also satisfies the equation (8)! (Hint: d(p, f) - d(p, g) = +-2a is equivalent with d(p, f) = +-2a + d(p, g).)

Thus the function ellips(xf, yf, xg, yg, a) results in an ellipse or a hyperbola with given foci f = (xf, yf) and g = (xg, yg), depending on whether the distance between f and g is less than or greater than 2a.

4.3 Eigenvalues and eigenvectors

The problem of finding a number lambda and a nonzero vector x such that Ax = lambda x has a lot of applications.

a) The Cayley-Hamilton theorem

The Cayley-Hamilton theorem can be introduced as follows: choose a 2 x 2 matrix A and find its characteristic polynomial p(lambda) = det(A - lambda I); replace lambda in p(lambda) by A (with this, the constant term lambda^0 = 1 is replaced by A^0 = I). What is the result? Try also with a 3 x 3 and a 4 x 4 matrix.

We always obtain the zero matrix, and suspect in general that each matrix A satisfies its own characteristic equation: if p(lambda) = 0 is the characteristic equation of A, then p(A) = 0 (the zero matrix). This is the Cayley-Hamilton theorem.

For the 2 x 2 matrix A of the screen shot, the characteristic equation becomes lambda^2 - 2 lambda + 1 = 0, so that
  A^2 - 2A + I = 0  or  A^2 = 2A - I.

Thus A^3 can also be expressed in terms of A and I:

  A^3 = A^2 A = (2A - I) A = 2A^2 - A = 2(2A - I) - A = 3A - 2I

For any natural number k, the matrix A^k can be written as A^k = alpha A + beta I, where alpha and beta are constants whose values depend on k.

b) The Gerschgorin circles

Let A = (aij) be a square matrix of order n; then every eigenvalue of A lies inside (or on) at least one of the circles (called Gerschgorin circles) in the complex plane with center aii and radius

  ri = sum over j <> i of |aij|    (i = 1, 2, ..., n).

Thus all the eigenvalues of A lie in the union of the discs

  Di = { z in C : |z - aii| <= ri }    (i = 1, 2, ..., n).

This first Gerschgorin theorem provides a quick graphic view of the position of the eigenvalues.

Let A be the 3 x 3 matrix of the screen shot. Its three Gerschgorin circles have the diagonal entries as centers and the off-diagonal absolute row sums as radii. We see that the eigenvalues are lying in the union of the discs.

The second Gerschgorin theorem states that if the union of s of the Gerschgorin circles forms a connected region, isolated from the remaining circles, then exactly s of the eigenvalues lie within this region. In our example this is visible in the screen shot.

Since A and its transpose A^T have the same eigenvalues, we can also consider the three circles with radii calculated from the column sums instead of the row sums:
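The disc computation is a one-liner per row; a plain Python sketch (the matrix A below is an assumed stand-in, not the example from the screen shot):

```python
# Gerschgorin discs: each eigenvalue lies in at least one disc with center
# a_ii and radius equal to the off-diagonal absolute row sum.
# The matrix A is an assumed 3x3 example.
A = [[8, 1, 2],
     [0, -5, 1],
     [3, 1, 14]]

def gerschgorin(M):
    return [(M[i][i], sum(abs(M[i][j]) for j in range(len(M)) if j != i))
            for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

row_discs = gerschgorin(A)              # [(8, 3), (-5, 1), (14, 4)]
col_discs = gerschgorin(transpose(A))   # [(8, 3), (-5, 2), (14, 3)]

# Application: 0 lies outside every row disc (|0 - center| > radius),
# so 0 is not an eigenvalue and A is invertible.
assert all(abs(center) > radius for center, radius in row_discs)
```

The eigenvalues lie in the intersection of the union of the row discs and the union of the column discs, so combining both lists sharpens the localization.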
The eigenvalues lie in the intersection of the two unions of three discs.

Applications:
- If the origin doesn't lie in the union of the discs associated with the matrix A, then 0 is not an eigenvalue of A. This means that A has an inverse.
- A system u' = Au of linear first order differential equations, with A diagonalizable, is stable if all the eigenvalues of A have a strictly negative real part. This is certainly the case if all the circles of Gerschgorin are lying in the half plane x < 0.
- All the eigenvalues are lying in the disc |z| <= max over i of sum over j of |aij| (the maximum absolute row sum; analogous for the column sums).

c) Markov chains

Suppose that the annual population migration between three geographical regions A, B and C is given by the following transition diagram (the label on an arrow is the fraction of the population that moves along that arrow each year; the loop at a region is the fraction that stays).

For example, a fixed percentage of the population in region A moves annually to region C. This transition is governed by the following transition matrix:
The transition matrix P has its columns labeled "from: A, B, C" and its rows labeled "to: A, B, C"; the entry pij is the fraction of the population that moves annually from region j to region i, and the diagonal entries .8, .6 and .5 are the fractions that stay in A, B and C respectively.

This matrix is a stochastic matrix (the column sums add up to 1).

1) Suppose that the initial distribution (our first observation) of the population is given by the initial state vector

  x0 = [.4, .5, .1]^T

(a probability vector: its entries are nonnegative and add up to 1). At that moment, 40% of the population lives in A, 50% in B and 10% in C.

The Markov chain is the sequence of state vectors x0, x1 = P x0, x2 = P x1, x3 = P x2, ... (xk is the state vector after k years). This leads to x(k+1) = P xk, or

  xk = P^k x0  for k = 1, 2, 3, ...

Calculate the successive state vectors and study the long-run population distribution. Then choose another initial state vector x0 and observe the long-run distribution. Conjecture?

2) Find a steady-state vector (or equilibrium vector) for P, i.e. a probability vector q such that Pq = q. Conjecture?

3) Observe the evolution of P^k for k = 1, 2, 3, ... Conjecture?

The calculator can do the calculations:

1)
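The same iteration is easy to reproduce outside the calculator. The sketch below uses an assumed column-stochastic stand-in for P (the entries in the text are not reproduced here); the qualitative behaviour it demonstrates — convergence to a steady state independent of x0 — is exactly what the exercise asks the student to observe.

```python
# Iterate the Markov chain x_{k+1} = P x_k for an assumed column-stochastic
# transition matrix P, from two different initial distributions.

P = [[0.8, 0.2, 0.1],
     [0.1, 0.6, 0.3],
     [0.1, 0.2, 0.6]]      # every column sums to 1

def step(P, x):
    # one year of migration: x -> P x
    return [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]

def iterate(P, x, k):
    for _ in range(k):
        x = step(P, x)
    return x

x_a = iterate(P, [0.4, 0.5, 0.1], 100)
x_b = iterate(P, [1.0, 0.0, 0.0], 100)   # a different initial distribution

# Both chains reach (numerically) the same steady state q with P q = q,
# and q is again a probability vector.
assert all(abs(a - b) < 1e-12 for a, b in zip(x_a, x_b))
assert all(abs(v - w) < 1e-12 for v, w in zip(step(P, x_a), x_a))
assert abs(sum(x_a) - 1) < 1e-12
```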
The state vectors converge to a fixed probability vector, independent of the initial distribution (choose another x0 and see what happens)!

2) Remark that Pq = q certainly has a solution, as a stochastic matrix P always has the eigenvalue 1: for the transpose P^T, the vector (1, 1, 1)^T is an eigenvector with eigenvalue 1, and all the eigenvalues lie in the disc |z| <= max over j of sum over i of |pij| = 1 (the maximum absolute column sum).

The eigenspace of P corresponding to the eigenvalue 1 is a line; the calculator produces an eigenvector with integer entries. Dividing this eigenvector by the sum of its entries gives the only steady state vector q with Pq = q.

The Markov chain x0, x1, x2, x3, ... seems to converge to q, independent of the initial state vector x0. Can we prove this conjecture?
Choose a basis {v1, v2, v3} for R^3 consisting of eigenvectors of P, with corresponding eigenvalues lambda1 = 1, lambda2 = .55 and lambda3 = .25. Let v1 be the integer eigenvector corresponding to lambda1 = 1 found above.

The vector x0 can be written as a unique linear combination of the basis vectors:

  x0 = c1 v1 + c2 v2 + c3 v3

Then

  x1 = P x0 = c1 P v1 + c2 P v2 + c3 P v3 = c1 v1 + c2 (.55) v2 + c3 (.25) v3
  x2 = P x1 = c1 P v1 + c2 (.55) P v2 + c3 (.25) P v3 = c1 v1 + c2 (.55)^2 v2 + c3 (.25)^2 v3

and in general we find:

  xk = c1 v1 + c2 (.55)^k v2 + c3 (.25)^k v3  for k = 1, 2, ...

We conclude that lim (k -> oo) xk = c1 v1. The limiting vector belongs to the eigenspace of P corresponding to lambda = 1. Every xk is a probability vector whose entries add up to 1, therefore the limiting vector must be a probability vector too. This forces c1 v1 = q: the entries of c1 v1 must add up to 1, so c1 is the reciprocal of the sum of the entries of v1.

The result is independent of the initial state vector x0 = c1 v1 + c2 v2 + c3 v3. Observe that the first term c1 v1 = q is the same for every initial state vector!

3) We observe successive powers of the transition matrix P:
We suspect that

  lim (k -> oo) P^k = [q q q]

Can we prove this conjecture? Remark that

  P^k = P^k I = P^k [e1 e2 e3] = [P^k e1  P^k e2  P^k e3]

so that

  lim P^k = [lim P^k e1  lim P^k e2  lim P^k e3] = [q q q]

since e1, e2, e3 are probability vectors.

Theorem: If P is an n x n regular stochastic matrix (i.e. there exists a k such that P^k contains only strictly positive elements), then P has a unique steady state vector q (i.e. Pq = q). Moreover, if x0 is any initial state vector and x(k+1) = P xk for k = 0, 1, 2, ..., then the Markov chain {xk} converges to q and P^k converges to [q q ... q] as k -> oo.

d) Diagonalization of a square matrix

An n x n matrix A is diagonalizable if A is similar to a diagonal matrix, that is, if A = P D P^(-1) for some invertible matrix P and some diagonal matrix D. The columns of P are n linearly independent eigenvectors of A, and the diagonal entries of D are the successive corresponding eigenvalues of A. Diagonalization occurs in
- diagonalizing quadratic forms;
- linear discrete and linear continuous dynamical systems (systems of linear first-order difference equations or differential equations);
- the calculation of matrix functions.

As an example we calculate a square root A^(1/2) of the 2 x 2 matrix A of the screen shot. Diagonalization of A results in A = P D P^(-1); we define A^(1/2) = P D^(1/2) P^(-1), where D^(1/2) is the diagonal matrix of the square roots of the eigenvalues. One can check that this matrix has the desired property (A^(1/2))^2 = A.
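The square-root construction is easy to verify by hand or in code. The sketch below uses an assumed 2 x 2 example with eigenvalues 1 and 4 (not the matrix of the screen shot), chosen so that all arithmetic stays in the integers.

```python
# Matrix square root via diagonalization: A^(1/2) = P D^(1/2) P^(-1).
# Assumed example: A = P diag(1, 4) P^(-1) with P = [[1, 2], [1, 3]].

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[1, 2], [1, 3]]
P_inv = [[3, -2], [-1, 1]]        # det P = 1, so the inverse is integer
D = [[1, 0], [0, 4]]
D_sqrt = [[1, 0], [0, 2]]         # square roots of the eigenvalues 1 and 4

A = matmul(matmul(P, D), P_inv)
A_sqrt = matmul(matmul(P, D_sqrt), P_inv)

assert A == [[-5, 6], [-9, 10]]
assert matmul(A_sqrt, A_sqrt) == A    # the desired property (A^(1/2))^2 = A
```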
However, is this a good definition, independent of the eigenvectors chosen in P (corresponding with the successive eigenvalues in D, chosen in some order)? Obviously we find the same matrix A^(1/2) for another choice of P.

Theorem: Let A = P D P^(-1) be a diagonalizable matrix where the eigenvalues in D = diag(lambda1 I, lambda2 I, ..., lambdak I) are grouped by repetition. For a function f(z) that is defined at each eigenvalue lambdai of A, define

  f(A) = P f(D) P^(-1) = P diag(f(lambda1) I, f(lambda2) I, ..., f(lambdak) I) P^(-1)

This definition is independent of the chosen diagonalization of A.

Exercise: prove that sin^2(A) + cos^2(A) = I for every diagonalizable matrix A, and check this for the matrix A of the previous example.
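The exercise can be checked numerically with the same recipe f(A) = P f(D) P^(-1); the sketch below uses an assumed diagonalizable matrix A = P diag(1, 4) P^(-1) (our own example, not from the text).

```python
# Check sin(A)^2 + cos(A)^2 = I for a diagonalizable matrix, using
# f(A) = P f(D) P^(-1) with an assumed A = P diag(1, 4) P^(-1).
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P = [[1, 2], [1, 3]]
P_inv = [[3, -2], [-1, 1]]
eigs = [1, 4]

def func_of_A(f):
    # apply f to the eigenvalues on the diagonal, then undo the similarity
    fD = [[f(eigs[0]), 0], [0, f(eigs[1])]]
    return matmul(matmul(P, fD), P_inv)

S, C = func_of_A(math.sin), func_of_A(math.cos)
total = [[s + c for s, c in zip(rs, rc)]
         for rs, rc in zip(matmul(S, S), matmul(C, C))]

identity = [[1, 0], [0, 1]]
assert all(abs(total[i][j] - identity[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```

The identity holds because sin^2(lambda) + cos^2(lambda) = 1 for each eigenvalue, and the similarity transform carries the diagonal identity back to I.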
e) Exploring eigenvalues of matrices

1) Let the TI-89 generate two matrices A and B with integer values between -9 and 9. Find the eigenvalues of A and B.
i) Find the eigenvalues of A + B. Conjecture?
ii) Find the eigenvalues of AB. Conjecture?
iii) Find the eigenvalues of 5A and of the transpose of B. Conjecture?
iv) Compare the eigenvalues of AB and BA. Conjecture?

2) Construct a matrix A (not diagonal or triangular) with integer entries and eigenvalues 1, 2 and 3.

Solution: it suffices to find a matrix P with integer entries and det(P) = 1; then P^(-1) has integer entries too, and the matrix

  A = P diag(1, 2, 3) P^(-1)

has the desired properties. Such a P can be built as a product of triangular matrices with integer entries and 1s on the diagonal.
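The construction can be carried out exactly in a few lines; the particular triangular factors below are our own choice. The eigenvalues 1, 2, 3 are confirmed through the characteristic polynomial x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3), i.e. trace 6, sum of principal 2 x 2 minors 11, determinant 6.

```python
# Construct an integer matrix with eigenvalues 1, 2, 3:
# A = P diag(1,2,3) P^(-1), where P = L U has determinant 1
# (L lower and U upper triangular, both with 1s on the diagonal),
# so P^(-1) is an integer matrix as well.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

L = [[1, 0, 0], [2, 1, 0], [1, 1, 1]]
U = [[1, 1, 2], [0, 1, 1], [0, 0, 1]]
P = matmul(L, U)
P_inv = [[2, 0, -1], [-3, 2, -1], [1, -1, 1]]    # exact, since det P = 1
assert matmul(P, P_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

D = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
A = matmul(matmul(P, D), P_inv)                  # [[2,-2,3],[1,-3,7],[2,-4,7]]

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

trace = A[0][0] + A[1][1] + A[2][2]
minors = (A[0][0] * A[1][1] - A[0][1] * A[1][0]
          + A[0][0] * A[2][2] - A[0][2] * A[2][0]
          + A[1][1] * A[2][2] - A[1][2] * A[2][1])
# coefficients of the characteristic polynomial match (x-1)(x-2)(x-3)
assert (trace, minors, det3(A)) == (6, 11, 6)
```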
4.4 Orthogonality and least-squares problems

a) Orthogonal bases for a vector space simplify the calculations; they play an important role in numerical analysis, for instance in the QR factorization, where an m x n matrix A with linearly independent columns is factored as A = QR (Q is an m x n matrix whose columns form an orthonormal basis for the column space of A, and R is an n x n invertible upper triangular matrix with positive entries on its diagonal).

b) Orthogonal projections are the key for finding solutions of overdetermined systems. In practice, systems of linear equations with more equations than unknowns and without solutions often appear. For example, how can we find the best line y = ax + b through the points (0, 1), (1, 2), (2, 3), (3, 3)? We desire that yi = a xi + b (i = 1, 2, 3, 4), but the system

  0a + b = 1
  1a + b = 2
  2a + b = 3
  3a + b = 3

or Ax = y with

  A = [0 1; 1 1; 2 1; 3 1],  x = [a, b]^T,  y = [1, 2, 3, 3]^T

has no solution x. A best solution xhat gives a vector A xhat as close as possible to y, in the sense that

  d(A xhat, y) = ||A xhat - y|| <= ||Ax - y||  for each x in R^2.

We call xhat a least-squares solution; the corresponding least-squares line minimizes the sum of the squares of the vertical deviations of the given points to the line.

The vector A xhat belongs to the column space of A (notation Col A). The closest vector in Col A to y is the orthogonal projection of y onto Col A; this is the unique vector proj_{Col A} y where

  y = n + proj_{Col A} y  and  n is orthogonal to Col A.
Such a vector xhat with A xhat = proj_{Col A} y surely exists, because if x runs over R^2, then Ax runs over the complete column space of A.

We demand that y = n + A xhat with n orthogonal to Col A, or y - A xhat orthogonal to Col A. The columns of A generate Col A; thus the vector y - A xhat is orthogonal to Col A if and only if y - A xhat is orthogonal to the column vectors of A = [a1 a2].

This means that a1^T (y - A xhat) = 0 and a2^T (y - A xhat) = 0, or A^T (y - A xhat) = 0. Consequently,

  A^T A xhat = A^T y  (the system of normal equations).

A least-squares solution xhat of Ax = y can be found by solving the system A^T A xhat = A^T y. The matrix A^T A is invertible if and only if the columns of A are linearly independent (as in our example); in that case we find the unique solution

  xhat = (A^T A)^(-1) A^T y.

If the columns of A are linearly dependent, then the system A^T A xhat = A^T y has infinitely many solutions xhat, all satisfying A xhat = proj_{Col A} y.

For our example we find the least-squares solution

  xhat = [a, b]^T = [7/10, 6/5]^T;

the least-squares line is y = 0.7 x + 1.2. The calculator confirms this result.
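The normal equations for the example reduce to a 2 x 2 system that can be solved exactly; a minimal sketch in plain Python (Cramer's rule, since A^T A is only 2 x 2):

```python
# Least-squares line through (0,1), (1,2), (2,3), (3,3) by solving the
# normal equations A^T A xhat = A^T y in exact arithmetic.
from fractions import Fraction

points = [(0, 1), (1, 2), (2, 3), (3, 3)]
A = [[x, 1] for x, _ in points]
y = [v for _, v in points]

# build the 2x2 normal equations
AtA = [[sum(A[i][r] * A[i][c] for i in range(4)) for c in range(2)]
       for r in range(2)]                         # [[14, 6], [6, 4]]
Aty = [sum(A[i][r] * y[i] for i in range(4)) for r in range(2)]   # [17, 9]

# solve the 2x2 system by Cramer's rule
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
a = Fraction(Aty[0] * AtA[1][1] - AtA[0][1] * Aty[1], det)
b = Fraction(AtA[0][0] * Aty[1] - Aty[0] * AtA[1][0], det)

assert (a, b) == (Fraction(7, 10), Fraction(6, 5))   # y = 0.7 x + 1.2
```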
5 Conclusion

With the foregoing examples we have illustrated that the TI-89 can help to
- get the correct answer quickly;
- explore methods that are too time-consuming with manual calculations;
- gain insight when introducing new mathematical concepts;
- explore different situations, leading to conjectures;
- draw graphical representations.

6 Sources

1. H. Anton, C. Rorres, Elementary Linear Algebra, Applications Version, John Wiley & Sons, 1991.
2. S.I. Grossman, Elementary Linear Algebra, fourth edition, Saunders College Publishing, 1991.
3. G. James, Advanced Modern Engineering Mathematics, third edition, Pearson Education, 2004.
4. D.C. Lay, Linear Algebra and Its Applications, third edition update, Pearson Education, 2006.
5. C.D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000.