Linear transformations: the basics


The notion of a linear transformation is much older than matrix notation. Indeed, matrix notation was developed (essentially) for the needs of calculation with linear transformations over finite-dimensional vector spaces. The details of this are in these notes. The dear reader would be advised to take to heart, when getting to the calculational parts of these notes, the slogan for matrix multiplication: column-by-column-on-the-right. That is, if A is a matrix with k columns, listed as C_1, C_2, ..., C_k (in order), and A' is any matrix over the same field (with k rows) whose jth column is (a_1, ..., a_k)^T, then the jth column of AA' is a_1 C_1 + ... + a_k C_k. (I mean, seriously, get used to this.) She would also be well advised to forget until further notice that matrix multiplication is associative. An easy consequence of the column-by-column rule is the distributive law A(A' + A'') = AA' + AA'', whenever it makes sense; just look at the jth columns of both sides.
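To make the slogan concrete, here is a minimal numpy sketch (mine, not part of the notes) checking the column-by-column-on-the-right rule on a small random example; the names A and A2 below stand in for A and A'.

```python
import numpy as np

# Check: the jth column of A @ A2 is a_1*C_1 + ... + a_k*C_k, where C_1,...,C_k
# are the columns of A and (a_1,...,a_k)^T is the jth column of A2.
rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 4))    # 3x4, so k = 4 columns
A2 = rng.integers(-3, 4, size=(4, 2))   # 4x2, over the same field

j = 1
combo = sum(A2[i, j] * A[:, i] for i in range(A.shape[1]))
assert np.array_equal((A @ A2)[:, j], combo)
```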

A linear transformation is a homomorphism between vector spaces (possibly the same one). The algebraic structure of a vector space over the field F consists of vector addition and multiplication of vectors by the scalars in F. Hence we have

DEFINITION 1 (LINEAR TRANSFORMATION). Suppose that V and W are vector spaces over the same field F. A function T : V → W is an F-linear transformation if

1. T(v_1 +_V v_2) = T v_1 +_W T v_2 for all v_1, v_2 ∈ V (T preserves, or respects, vector addition), and

2. T(α ·_V v) = α ·_W T v for any v ∈ V and α ∈ F (T preserves, or respects, scalar multiplication).

(As usual, we will leave off the subscripts whenever we can, which is just about always. I put them on in the definition just to emphasize where the operations were taking place. We will also leave off the F if it is understood. Please do not write "T is closed under addition" or "T is closed under scalar multiplication"; these phrases make no sense.)

In case V = W, a linear transformation from V to itself is usually called a linear operator on V. In case W = F, a linear transformation from V to F is called a linear functional on V. In case T is not only a linear transformation, but is also a bijection (a one-to-one and onto function) from V to W, it is an isomorphism of vector spaces.

The most basic kind of example of a linear transformation is this: Suppose that V = F^n and W = F^m for some field F and positive integers m and n. Let A be any m × n matrix with entries from F and let T_A : F^n → F^m be given by T_A v = Av (matrix multiplication). T_A preserves addition because matrix multiplication is distributive (A(v_1 + v_2) = Av_1 + Av_2). It preserves scalar multiplication since A(αv) = α(Av) for any α ∈ F and v ∈ F^n. Here are some other examples.

EXAMPLES.

1. For any V, W (vector spaces over F), 0_{V,W} (or just 0) is the zero transformation: 0_{V,W} v = 0_W for any v ∈ V. It is obviously linear.

2. For any V, the identity operator on V, denoted I_V (or just I), given by I_V v = v for every v ∈ V, is clearly linear. So is αI for any scalar α, given by (αI)v = αv; in case α ≠ 0, it is an isomorphism of V with itself (a so-called automorphism of V).

3. If V = M_n(F) and T = tr, the map that takes an n × n matrix A to its trace, the sum of its diagonal elements, then T is a linear functional on V. (So tr(A) = Σ_{j=1}^n a_{j,j}.)

4. If V = M_{m,n}(F) and W = M_{n,m}(F), the map A ↦ A^T is plainly linear, and an isomorphism.

5. Say V = M_n(C), where C is the complex numbers. If T(A) = Ā^T (the conjugate transpose, often denoted A*), then T easily preserves addition. But if we regard V as a vector space over C, it does not preserve scalar multiplication and is thus not a linear operator. (E.g. T(iI) = −iI ≠ iT(I) = iI.) But if we regard V as a real vector space, this map is linear, since trivially (rA)* = r(A*) for any real r. (A short numerical check of this appears right after these examples.)

6. Here's a map from R² to itself that preserves scalar multiplication but not addition. Any v ∈ R² that is not on the y-axis is uniquely representable as (a, ma)^T, where m is the slope of the line through the vector and the origin. Suppose we let Tv = mv in case v is not on the y-axis, and Tv = 0 if v is on the y-axis. It is not hard to see that T preserves scalar multiplication. But T((1, 0)^T + (0, 1)^T) = (1, 1)^T, while T(1, 0)^T + T(0, 1)^T = (0, 0)^T, so T does not preserve addition.

7. If V is any finite-dimensional vector space over F and B = (v_1, ..., v_n) is an ordered basis of V, then the map v ↦ [v]_B is an isomorphism of V with F^n. We have already observed that [u + v]_B = [u]_B + [v]_B and [αv]_B = α[v]_B for any u, v ∈ V and α ∈ F; that is, the map is linear. It is clearly one-to-one, since if [u]_B = [v]_B = (a_1, ..., a_n)^T, then u = v = a_1 v_1 + ... + a_n v_n. And it's onto, since if (b_1, ..., b_n)^T ∈ F^n, then v = b_1 v_1 + ... + b_n v_n ∈ V maps to this vector in F^n. (So, in a definite sense, the only n-dimensional vector space over F is F^n; but what the isomorphism is depends just as much on B as it does on V.)

8. Now, a couple of simple geometrical examples. Fix a line l through the origin in V = R². Let l⊥ be the line through the origin perpendicular to l. Evidently, any vector v ∈ V can be decomposed uniquely into Proj_l v + Proj_{l⊥} v, where Proj_l v is on l and Proj_{l⊥} v is on l⊥. It is easy to see that the map v ↦ Proj_l v is linear; it is called the orthogonal projection of v to l. Also, the map T_l sending v to Proj_l v − Proj_{l⊥} v is the reflection of v across l. It is also a linear map. (The notation T_l for this is fairly common, but not quite standard.)

9. Suppose that V is the vector space of differentiable functions on the reals. Let W be the (larger) space of functions which are derivatives of some differentiable function, and let D : V → W be given by Df = f'; as is well known, (f + g)' = f' + g' and (rf)' = r(f') for any f, g ∈ V and r ∈ R, so D is linear. If we instead let V = C^∞(R), the space of functions with derivatives of all orders, and let D still be differentiation, then D is a linear operator on V.

10. Let V be the vector space of continuous functions on the reals (or on some interval containing a), where a ∈ R. The indefinite integral operator on V, defined by T(f)(x) = ∫_a^x f(t) dt, is patently linear, by simple observations from calculus. (Indeed, the facts that ∫_a^x (f_1(t) + f_2(t)) dt = ∫_a^x f_1(t) dt + ∫_a^x f_2(t) dt and ∫_a^x r f(t) dt = r ∫_a^x f(t) dt may have been described to you as the linearity properties of the integral.)

11. A variation on the last example is when V is the space of continuous functions on [a, b] for some fixed a < b in R, and T : V → R takes f ∈ V to ∫_a^b f(t) dt; this is a linear functional on V.
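As promised in example 5, here is a small numerical check (mine, not the notes') that the conjugate-transpose map preserves addition and real scaling but fails complex scaling; the particular matrices are arbitrary stand-ins.

```python
import numpy as np

# T(A) = conjugate transpose of A: additive, R-linear, but not C-linear.
A = np.array([[1 + 2j, 3], [0, 4 - 1j]])
B = np.array([[2, 1j], [5, 1]])
T = lambda M: M.conj().T

assert np.allclose(T(A + B), T(A) + T(B))        # preserves addition
assert not np.allclose(T(1j * A), 1j * T(A))     # fails C-homogeneity
assert np.allclose(T(1j * A), -1j * T(A))        # picks up a conjugate instead
assert np.allclose(T(2.5 * A), 2.5 * T(A))       # real scalars are fine
```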

As I hope you see, linear transformations show up all over the place. (As I hope somebody noticed, there are a lot of ways to say something is obvious in English, and people complain about mathematicians using more than one term for the same idea.) Here are some of the most basic properties of linear transformations.

OBSERVATIONS. Suppose that T : V → W is a linear transformation (where V and W are vector spaces over F). Then T 0_V = 0_W. Also, for any v_1, ..., v_k ∈ V and scalars α_1, ..., α_k ∈ F, we have that T(α_1 v_1 + ... + α_k v_k) = α_1 T v_1 + ... + α_k T v_k. (That is, T preserves dependence relations, and hence span. If V_1 ⊆ V is Span(S), then Span{Tv : v ∈ S} is none other than T(V_1).)

On the other hand, a linear transformation doesn't always preserve independence. It does just in case the transformation is one-to-one. One direction of this is blatant: suppose the set {v_i : i ∈ I} ⊆ V is independent and T : V → W is linear and one-to-one. The claim is that {T v_i : i ∈ I} is also independent. (I will leave this as an exercise, basically in notation.) The other direction is also easy: if T is not one-to-one, say v ≠ w but Tv = Tw, then the singleton {v − w} is independent, but {T(v − w)} = {Tv − Tw} = {0_W} is not.

In an earlier set of notes (on definitions and examples of vector spaces) I went through the details to show that F^X (the set of all functions from X to F) is a vector space over F for any nonempty set X; the operations are defined pointwise. In case W is a vector space over F and X is nonempty, essentially the same work shows that W^X is also a vector space over F, again with operations defined pointwise. (That is, (f + g)(x) = f(x) +_W g(x) and (αf)(x) = α ·_W f(x).)

Now consider the case where X = V is also a vector space over F. Consider the subset of W^V comprising the linear transformations from V to W; we will call this set Hom_F(V, W), although L(V, W) is also used. (The subscript may be omitted if the field is understood.) I claim that Hom_F(V, W) is a subspace of W^V. That is, if T, T_1 and T_2 are linear transformations from V to W and α is a scalar, then T_1 + T_2 and αT are also linear. It is clear that the zero transformation is in Hom_F(V, W) and acts as the zero vector there.

For the first, for any v_1, v_2 ∈ V, (T_1 + T_2)(v_1 + v_2) = T_1(v_1 + v_2) + T_2(v_1 + v_2) by the definition of T_1 + T_2. Because both T_1 and T_2 preserve addition, this is (T_1 v_1 + T_1 v_2) + (T_2 v_1 + T_2 v_2); using commutativity and associativity of addition in W, we see that this is (T_1 v_1 + T_2 v_1) + (T_1 v_2 + T_2 v_2), and by the definition of T_1 + T_2 again, this equals (T_1 + T_2)v_1 + (T_1 + T_2)v_2; hence T_1 + T_2 preserves addition. Now for any v ∈ V and scalar α, (T_1 + T_2)(αv) = T_1(αv) + T_2(αv) by the definition of T_1 + T_2. As T_1 and T_2 preserve scalar multiplication, this is (αT_1 v) + (αT_2 v); by one of the distributive laws in W, this becomes α(T_1 v + T_2 v), and by the definition of T_1 + T_2 again, we see this is α[(T_1 + T_2)v], so T_1 + T_2 preserves scalar multiplication.

For αT, consider (αT)(v_1 + v_2) for v_1, v_2 ∈ V. By definition of αT, this is α(T(v_1 + v_2)), which equals α(Tv_1 + Tv_2) as T preserves addition. By the same distributive law as above, this is α(Tv_1) + α(Tv_2), and by definition of αT, this is (αT)v_1 + (αT)v_2, so αT preserves vector addition. Finally, if v ∈ V and β ∈ F, then (αT)(βv) = α(T(βv)) by definition of αT; this equals α(βTv) as T preserves scalar multiplication, which equals (αβ)Tv by the associative law for scalar multiplication in W. By commutativity of multiplication in F, this is (βα)Tv; by the associative law again, this is β(αTv), and by the definition of αT, this is β((αT)v), showing that αT preserves scalar multiplication. [I did this in gruesome detail to show exactly how the definitions come in, and to emphasize how modular this verification is. Note that we use the vector space properties of W in several places, but never those of V.]
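Here is a small numpy sketch (mine, not the notes') of the pointwise operations on Hom_F(V, W) just verified, under the assumption V = R³, W = R², with T_1 and T_2 given by hypothetical matrices A1 and A2.

```python
import numpy as np

A1 = np.array([[1., 0., 2.], [0., 1., -1.]])
A2 = np.array([[0., 3., 1.], [2., 0., 0.]])
T1 = lambda v: A1 @ v
T2 = lambda v: A2 @ v
S = lambda v: T1(v) + T2(v)       # (T1 + T2)(v), defined pointwise
aT1 = lambda v: 4.0 * T1(v)       # (4*T1)(v), defined pointwise

v, w = np.array([1., 2., 3.]), np.array([-1., 0., 5.])
assert np.allclose(S(v + w), S(v) + S(w))          # T1 + T2 preserves addition
assert np.allclose(aT1(2.0 * v), 2.0 * aT1(v))     # 4*T1 preserves scaling
```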

Now if U, V, and W are all vector spaces over F, and T_1 : U → V and T_2 : V → W are any functions, it makes sense to talk about the composition T_2 ∘ T_1 : U → W defined by (T_2 ∘ T_1)u = T_2(T_1 u) for every u ∈ U. In case T_1 and T_2 are linear, it is routine to see that so is T_2 ∘ T_1; it's like the verifications above, but even easier. I will skip the details here. We usually write T_2 T_1 instead of T_2 ∘ T_1.

It is clear that if T ∈ Hom_F(V, W), then for any F-spaces U and X, T 0_{U,V} = 0_{U,W} and 0_{W,X} T = 0_{V,X}. Also, I_W T = T I_V = T.

Recall that, whenever it makes sense to compose three functions, the composition is associative. For any sets U, V, W and X, and functions f : U → V, g : V → W and h : W → X, we have h ∘ (g ∘ f) = (h ∘ g) ∘ f, just from the definition. In particular, this holds for linear transformations between F-spaces.

We use the following standard abbreviations in case T ∈ Hom_F(V, V) is a linear operator on V: T^0 = I_V, T^1 = T, T^2 = TT, T^3 = TTT, and so on; T^{n+1} = T^n T. With this, for any linear operator T on V and polynomial p(X) ∈ F[X], say p(X) = Σ_{j=0}^n a_j X^j, it makes sense to define p(T) = Σ_{j=0}^n a_j T^j; it will be a linear operator on V.

A word on the distributive laws; there are two, as composition of transformations is not usually commutative. (E.g., if A and B are noncommuting matrices, T_A T_B ≠ T_B T_A.) Consider the case where T_1 : U → V, and T_2, T_3 are any functions from V to W; here W (at least) is assumed to be a vector space. Then for any u ∈ U, [(T_2 + T_3)T_1]u = (T_2 + T_3)(T_1 u) by definition of composition; this is T_2(T_1 u) + T_3(T_1 u) by definition of T_2 + T_3, and again by definition of composition this is (T_2 T_1)u + (T_3 T_1)u = [T_2 T_1 + T_3 T_1]u by the definition of sums of functions. Thus (T_2 + T_3)T_1 = T_2 T_1 + T_3 T_1, and this verification does not use linearity of these functions at all.

Now suppose that T_1, T_2 : U → V and T_3 : V → W, and T_3 preserves addition. Then for any u ∈ U, [T_3(T_1 + T_2)]u = T_3[(T_1 + T_2)u] = T_3(T_1 u + T_2 u) by the definition of composition and addition of functions. Because T_3 preserves addition, this is T_3(T_1 u) + T_3(T_2 u) = (T_3 T_1)u + (T_3 T_2)u = (T_3 T_1 + T_3 T_2)u, again by the definitions of sums and compositions of functions. So we have both distributive laws in case the functions are linear, but only one of them requires any part of linearity.
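Here is a small numpy sketch (mine, not from the notes) of the p(T) construction defined above, for T = T_A with a hypothetical matrix A and hypothetical coefficients; it just checks that applying p(T) vector-by-vector agrees with the matrix 2I + 3A + A².

```python
import numpy as np

A = np.array([[1., 2.], [0., 3.]])
coeffs = [2.0, 3.0, 1.0]                     # p(X) = 2 + 3X + X^2: a_0, a_1, a_2

def p_of_T(v):
    out, Tv = np.zeros_like(v), v.copy()
    for a in coeffs:                         # accumulate a_j * T^j v
        out = out + a * Tv
        Tv = A @ Tv
    return out

v = np.array([1.0, -1.0])
pA = 2.0 * np.eye(2) + 3.0 * A + A @ A       # the matrix of p(T_A)
assert np.allclose(p_of_T(v), pA @ v)
```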

4. T (αt 2 ) = α(t T 2 ) for any T, T 2 in Hom F (V, V ) and any scalar α in F. 5. T (T 2 T 3 ) = (T T 2 )T 3 for any T, T 2, T 3 in Hom F (V, V ). 6. There is an identity I V such that T I V = I V T = T for every T Hom F (V, V ). This algebra will only be commutative (T T 2 = T 2 T for any T, T 2 in Hom F (V, V )) in case dim(v ). There s little to say about the proof at this point. I haven t mention part (4) before; it follows easily from the fact that T preserves scalar multiplication. It also generalizes to any case where the compositions are defined. What about the comment on (non)commutativity? If dim(v ) =, it has only the zero vector and the space of operators is also trivial. If dim(v ) =, it s not hard to see that for any T Hom F (V, V ), there is a scalar α such that T v = α v for all v V. If the dimension is at least 2, say v and v 2 are independent vectors in V. We can find linear operators T and T 2 on V such that T v = v, T v 2 =, T 2 v = and T 2 v 2 = v. Then T T 2 v 2 = and T T 2 v 2 = v, so T T 2 T 2 T. OBSERVATION. Fix a linear operator T on the F -space V. Consider the map Φ : F [X] Hom F (V, V ) where Φ(p(X)) = p(t ) for each polynomial p(x) F [X]. Then Φ is a ring homomorphism. That is, if p (X), p 2 (X) are in F [X], then Φ(p (X) + p 2 (X)) = p (T ) + p 2 (T ) and Φ(p (X)p 2 (X)) = p (T )p 2 (T ). The additive property is kind of obvious. The multiplicative property less so, since the multiplication on the right side is function composition, but it follows easily from the distributive laws. The kernel of this homomorphism will interest us considerably. It will always be nontrivial if V is finite-dimensional, as we shall see. But onto more basic matters. It is also easily seen that it ˆT is any operator that commutes with T (i.e., T ˆT = ˆT T ), then ˆT commutes with p(t ) for any polynomial p(x). In particular, p (T )p 2 (T ) = p 2 (T )p (T ) for any polynomials p and p 2. In case a function f : V W is one-to-one and onto, the inverse function : W V exists it will also be a bijection (i.e., -to- and onto). In case V and W are F -spaces, and T is an isomorphism, then I claim that T is also an isomorphism of vector spaces. We just need to check that it is linear. So suppose that w and w 2 are in W and T w j = v j for j =, 2. Then T v j = w j for j =, 2. So w + w 2 = T v + T v 2 = T ( v + v 2 ), so T w +T w 2 = v + v 2 = T ( w + w 2 ) and thus T preserves addition. Now if w W and α F, say T w = v and then T v = w, so α w = αt v = T (α v) and so T (α w) = α v = αt w. Thus T is linear, and so an isomorphism. Note that if T : U V and T 2 : V W are both isomorphisms, then f (T 2 T ) = T T 2. In case V = W, we will often write T 2 for T T, and so on. 6

DEFINITION(S) 3 (KERNEL and IMAGE). Suppose that T : V → W is a linear transformation between the F-spaces V and W. The set {v ∈ V : Tv = 0_W} is called the kernel of T and denoted ker(T). The set {w ∈ W : w = Tv for some v ∈ V} is called the image (or range) of T and denoted im(T) (or ran(T)).

Note that if V = F^n, W = F^m, and T = T_A for some m × n matrix A, these are nothing new: ker(T_A) is the null space null(A), and im(T_A) = col(A), the column space of A.

PROPOSITION 4. With the notation of the definition, ker(T) is a subspace of V and im(T) is a subspace of W.

PROOF: We know that T 0_V = 0_W, so 0_V ∈ ker(T) and 0_W ∈ im(T). Now if v_1, v_2 ∈ ker(T), then T v_1 = T v_2 = 0_W. So T(v_1 + v_2) = T v_1 + T v_2 = 0_W + 0_W = 0_W, showing that v_1 + v_2 ∈ ker(T), and so ker(T) is closed under addition. If v ∈ ker(T) and α ∈ F, then T(αv) = αTv = α 0_W = 0_W, so αv ∈ ker(T), and thus ker(T) is closed under scalar multiplication. For the image, say w_1, w_2 ∈ im(T) and that T v_j = w_j. Then w_1 + w_2 = T v_1 + T v_2 = T(v_1 + v_2) ∈ im(T), so im(T) is closed under addition. Also, for any w ∈ im(T) and scalar α, w = Tv for some (at least one) v ∈ V, so αw = αTv = T(αv) ∈ im(T), and im(T) is closed under scalar multiplication. That's it.

Obviously, T is onto (a surjection) if and only if im(T) = W. It is also easily seen that T is one-to-one (an injection) if and only if ker(T) is trivial. For if ker(T) is nontrivial, there is a nonzero v ∈ V such that Tv = T 0_V = 0_W. Conversely, if T is not one-to-one, then T v_1 = T v_2 where v_1 ≠ v_2. So the nonzero vector v_1 − v_2 is in the kernel, since T(v_1 − v_2) = T v_1 − T v_2 = 0_W.

In a sense, the size of ker(T) measures how far T is from being one-to-one, and the size of im(T) measures how far it is from being onto. The first isomorphism theorem for vector spaces makes this precise. (Exercise?) So, in a way, does a related result to follow shortly. But first, some examples.

Some of the examples above of transformations were isomorphisms. (This includes the reflection T_l, where l is a line in R².) They all have trivial kernel, and the image is W. But the projection Proj_l is neither one-to-one nor onto. Its kernel is the line l⊥ and its image is just l. If T(A) = tr(A) for A ∈ M_n(F), then the image is F, and for n ≥ 2 the kernel is rather large: it consists of all matrices the sum of whose diagonal entries is zero. Clearly 0_{V,W} has kernel V and trivial image.

If V = C^∞(R) and D : V → V is differentiation, ker(D) consists of all constant functions (so is 1-dimensional) and its image is all of V. This kind of thing can't happen for finite-dimensional V, as we shall see. If V = R[x] (the space of polynomial functions) and Tf = ∫_0^x f(t) dt, then T : V → V has trivial kernel, but is not onto; im(T) = {f ∈ V : f(0) = 0}. Again, this cannot happen in the finite-dimensional case. On the other hand, if a < b and Tf = ∫_a^b f(t) dt, then its image is R and its kernel is huge. Lots of functions have definite integral 0.
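For T = T_A the kernel and image can be computed symbolically; here is a minimal sympy sketch (mine, with an arbitrary rank-deficient A) that reads off bases for null(A) and col(A).

```python
import sympy as sp

# ker(T_A) = null(A) and im(T_A) = col(A), extracted by row reduction.
A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 1, 1]])
kernel_basis = A.nullspace()     # basis for ker(T_A), as column vectors
image_basis = A.columnspace()    # basis for im(T_A)
print(kernel_basis)
print(image_basis)
```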

Here's a basic fact that generalizes "rank + nullity = number of columns" for matrices.

PROPOSITION 5. Suppose that T : V → W is a linear transformation of F-spaces. Suppose that {v_i : i ∈ I} is a basis for ker(T) and {w_j : j ∈ J} is a basis for im(T). Now suppose that for each j ∈ J we choose(!) u_j ∈ V such that T u_j = w_j. Then {v_i : i ∈ I} ∪ {u_j : j ∈ J} is a basis for V (with no redundancies). (We assume the bases for ker(T) and im(T) have no repetitions.) In particular, if V is finite-dimensional, dim(ker(T)) + dim(im(T)) = dim(V).

Proof: By "no redundancies" I mean of course that no u_j can be a v_i (this is clear, since T u_j = w_j ≠ 0_W = T v_i), and that if j ≠ k then u_j ≠ u_k (again clear, since T u_j = w_j ≠ w_k = T u_k).

Let's see that the given union is independent. If there are distinct i_1, ..., i_m ∈ I and j_1, ..., j_n ∈ J and scalars a_{i_1}, ..., a_{i_m}, b_{j_1}, ..., b_{j_n} so that a_{i_1} v_{i_1} + ... + a_{i_m} v_{i_m} + b_{j_1} u_{j_1} + ... + b_{j_n} u_{j_n} = 0_V, we must show that all the a_i's and b_j's are zero. Let v = a_{i_1} v_{i_1} + ... + a_{i_m} v_{i_m} and u = b_{j_1} u_{j_1} + ... + b_{j_n} u_{j_n}. So T(v + u) = 0_W. Since Tv = 0_W, too, we have Tu = 0_W. But Tu = b_{j_1} w_{j_1} + ... + b_{j_n} w_{j_n}; we assumed that the w_j's were independent, so this forces all the b_j's to be zero. So v = 0_V, and since the v_i's are independent, this forces all the a_i's to be zero as well.

Now for spanning. Given any v ∈ V, Tv ∈ im(T), so there are j_1, ..., j_n ∈ J and scalars d_{j_1}, ..., d_{j_n} such that Tv = Σ_{k=1}^n d_{j_k} w_{j_k}. Let v' = Σ_{k=1}^n d_{j_k} u_{j_k}. Then T(v − v') = Tv − Tv' = 0_W, so v − v' ∈ ker(T). Thus v − v' is a linear combo of the v_i's; thus v is a linear combo of the v_i's together with the u_j's. This does it.

I trust the finite-dimensional version is now clear. Once one makes sense of sums of infinite cardinals, it generalizes, using this proposition (including its use of the axiom of choice). Note that it follows that if V and W have the same finite dimension, then T is one-to-one if and only if it is onto. A linear operator on the space V is called nonsingular if it is one-to-one; in case V is finite-dimensional, this is equivalent to T being invertible.
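Here is a quick sympy check (mine, with an arbitrary matrix) of the finite-dimensional statement dim(ker(T)) + dim(im(T)) = dim(V) for T = T_A.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0, 1],
               [0, 1, 1, 1],
               [1, 3, 1, 2]])      # third row = first + second, so rank 2
dim_V = A.cols                     # V = F^4
dim_ker = len(A.nullspace())       # dim ker(T_A)
dim_im = A.rank()                  # dim im(T_A) = dim col(A)
assert dim_ker + dim_im == dim_V   # 2 + 2 == 4
```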

Now we turn explicitly to the finite-dimensional case. Suppose that V and W are finite-dimensional vector spaces over F, and that T : V → W is a linear transformation. Suppose that B = (v_1, ..., v_n) is an ordered basis for V and that C = (w_1, ..., w_m) is an ordered basis for W.

DEFINITION 6 (MATRIX OF A TRANSFORMATION WITH RESPECT TO GIVEN ORDERED BASES). With the notation of the last paragraph, we define the matrix of T with respect to the ordered bases B and C as follows. It will be an m × n matrix, and its jth column (for j = 1, ..., n) will be [T v_j]_C. That is, we apply T to the jth vector in the basis B and express the result in terms of the basis C. We use the notation C[T]_B for this matrix. (Others use different notation.) In the important special case where V = W and B = C, we will just write [T]_B instead of B[T]_B, except occasionally for emphasis.

A trivial example is when T = 0_{V,W}. Whatever ordered bases B and C we choose, we get C[T]_B = 0_{m,n}. Also, if V = W and T = αI_V, then for any B, [T]_B = αI_n. Another simple observation is that if T = T_A for an m × n matrix A, V = F^n, W = F^m, and we choose B and C as the standard bases for each of V and W, then C[T_A]_B is just, aw, you guessed it, A itself. (If we vary the bases, it probably won't be, though.)

Consider the projection Proj_l and reflection T_l (where l is a line through the origin in the real space R²). Instead of choosing the standard basis here, let's suppose B = C is a basis consisting of a vector v_1 on l and a vector v_2 on l⊥. Then Proj_l v_1 = v_1 = 1 v_1 + 0 v_2 and Proj_l v_2 = 0. Also T_l v_1 = v_1 and T_l v_2 = −v_2. We get particularly simple matrices this way; [Proj_l]_B = [[1, 0], [0, 0]] and [T_l]_B = [[1, 0], [0, −1]]. If we want the matrices with respect to the standard basis (e_1, e_2), it still helps to refer to this nonstandard basis; well, it's nonstandard unless l is the x-axis.

Let's do this in the specific case where l is defined by y = (2/3)x. Let v_1 = (3, 2)^T on l. (I'm tempted to divide this by √13 for some reason, but let's forgo that for now.) Let v_2 = (−2, 3)^T on l⊥. Now e_1 = (3/13)v_1 − (2/13)v_2 and e_2 = (2/13)v_1 + (3/13)v_2. So Proj_l e_1 = (3/13)v_1 = (1/13)(9, 6)^T, Proj_l e_2 = (2/13)v_1 = (1/13)(6, 4)^T, T_l e_1 = (3/13)v_1 + (2/13)v_2 = (1/13)(5, 12)^T, and T_l e_2 = (2/13)v_1 − (3/13)v_2 = (1/13)(12, −5)^T. If S = (e_1, e_2), then [Proj_l]_S = (1/13)[[9, 6], [6, 4]] and [T_l]_S = (1/13)[[5, 12], [12, −5]]. Not only are these matrices less pretty, but to actually find them it seemed natural to go through the nonstandard basis. One recurring theme for us will be, given a linear operator on a finite-dimensional vector space, to choose a basis for the vector space so that its matrix with respect to that basis comes out nice; this will give us considerable information about the operator.

Suppose now that V = W = P_3(X), the space of polynomials over the reals with degree at most 3. Let B = (1, X, X², X³) be the standard ordered basis for V and let D : V → V be given by differentiation. As D1 = 0, DX = 1, DX² = 2X and DX³ = 3X², we have [D]_B = [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]. If we instead chose the basis B' = (1, X, (1/2)X², (1/6)X³), we'd get the slightly nicer matrix [D]_{B'} = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]], which is in Jordan canonical form (coming attractions).
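Here is a short numpy sketch (mine, not the notes') that recomputes the standard-basis matrices found above for the line y = (2/3)x, by applying the projection and reflection to e_1 and e_2 column by column.

```python
import numpy as np

v1 = np.array([3., 2.])                    # direction of l

def proj_l(v):
    return (v @ v1) / (v1 @ v1) * v1       # Proj_l v

def refl_l(v):
    return 2 * proj_l(v) - v               # Proj_l v - Proj_{l_perp} v

e1, e2 = np.array([1., 0.]), np.array([0., 1.])
P = np.column_stack([proj_l(e1), proj_l(e2)])   # [Proj_l]_S
R = np.column_stack([refl_l(e1), refl_l(e2)])   # [T_l]_S
assert np.allclose(13 * P, [[9., 6.], [6., 4.]])
assert np.allclose(13 * R, [[5., 12.], [12., -5.]])
```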

We could also regard D as a transformation from V to W = P_2(X), and let B be as above and C = (1, X, X²); then C[D]_B would look like [D]_B except it would miss that last row of zeroes.

What this matrix C[T]_B is good for is that it reduces calculations about T to matrix multiplication. More specifically, if v ∈ V, then C[T]_B [v]_B = [Tv]_C. To see this, suppose that V is k-dimensional, B = (v_1, ..., v_k), [v]_B = (a_1, ..., a_k)^T, and the columns of C[T]_B are, in order, C_1, ..., C_k. Then [Tv]_C = [T(a_1 v_1 + ... + a_k v_k)]_C = [a_1 T v_1 + ... + a_k T v_k]_C = a_1 [T v_1]_C + ... + a_k [T v_k]_C = a_1 C_1 + ... + a_k C_k, which is just the matrix product C[T]_B [v]_B by the column-by-column rule.

For instance, say T = T_l for the line l defined by y = (2/3)x, V = W = R², B = C = (v_1, v_2) = ((3, 2)^T, (−2, 3)^T), and v = (23, −2)^T = 5 v_1 − 4 v_2. C[T]_B (or just [T]_B) is [[1, 0], [0, −1]] and [v]_B = (5, −4)^T. So [Tv]_C = C[T]_B [v]_B = (5, 4)^T, and Tv = 5 v_1 + 4 v_2 = (7, 22)^T. If we use B' = C', the standard basis (e_1, e_2), instead, then C'[T]_{B'} = (1/13)[[5, 12], [12, −5]], and C'[T]_{B'} [v]_{B'} = (1/13)[[5, 12], [12, −5]] (23, −2)^T = (7, 22)^T, which is reassuring.

Here's a second example; again for simplicity we have V = W and B = C, but in this case V = P_3(X), B = (1, X, X², X³) and T = D. We know that [D]_B = [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]. For any polynomial p(X) = a_0 + a_1 X + a_2 X² + a_3 X³ in V, we easily see that [p]_B = (a_0, a_1, a_2, a_3)^T and [D]_B [p]_B = (a_1, 2a_2, 3a_3, 0)^T. This corresponds to the easily verified fact that Dp(X) = a_1 + 2a_2 X + 3a_3 X². (Again, reassuring.)
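Here is a numpy check (mine) of the reflection example above: it recovers [v]_B = (5, −4)^T, applies [T_l]_B, and confirms that both routes give T_l v = (7, 22)^T.

```python
import numpy as np

v1, v2 = np.array([3., 2.]), np.array([-2., 3.])
Bmat = np.column_stack([v1, v2])                  # columns: the basis B = C
T_B = np.diag([1., -1.])                          # [T_l]_B
T_std = np.array([[5., 12.], [12., -5.]]) / 13    # [T_l] in the standard basis

v = np.array([23., -2.])
v_B = np.linalg.solve(Bmat, v)                    # [v]_B
assert np.allclose(v_B, [5., -4.])
Tv_C = T_B @ v_B                                  # [T_l v]_C = (5, 4)
assert np.allclose(Bmat @ Tv_C, [7., 22.])        # back to standard coordinates
assert np.allclose(T_std @ v, [7., 22.])          # same answer directly
```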

One use of the matrix with respect to a basis is to do calculations involving kernels and images. I'll illustrate; suppose that V is the vector space M_n(F) of all n × n matrices over F, and A is any particular n × n matrix. Let T : V → V be defined by TX = AX − XA for any X ∈ V. Let's do the (easy) verification that any such T is linear. First, T(X_1 + X_2) = A(X_1 + X_2) − (X_1 + X_2)A = (AX_1 + AX_2) − (X_1 A + X_2 A) = (AX_1 − X_1 A) + (AX_2 − X_2 A) = TX_1 + TX_2 for any X_1, X_2 ∈ V. Next, T(αX) = A(αX) − (αX)A = α(AX) − α(XA) = α(AX − XA) = αTX for any X ∈ V and scalar α. (We have of course used several basic properties of matrix algebra here.)

Now let's get specific. Say F = R, n = 2 and A = [[0, 1], [1, 0]]. It's not too hard to compute a basis for each of ker(T) and im(T) directly, but let's do it using [T]_B for some B. Let B = (E_{1,1}, E_{1,2}, E_{2,1}, E_{2,2}) be the standard ordered basis for V, where E_{i,j} is the matrix with a 1 in the (i, j) position and 0's elsewhere. Before actually finding [T]_B, notice that it is not itself a 2 × 2 matrix; it's 4 × 4, and I really hope you know why. T E_{1,1} = 0 E_{1,1} − E_{1,2} + E_{2,1} + 0 E_{2,2}, so the first column of [T]_B is (0, −1, 1, 0)^T. And so on: T E_{1,2} = −E_{1,1} + E_{2,2}, T E_{2,1} = E_{1,1} − E_{2,2}, and T E_{2,2} = E_{1,2} − E_{2,1}. So [T]_B = [[0, −1, 1, 0], [−1, 0, 0, 1], [1, 0, 0, −1], [0, 1, −1, 0]].

This row-reduces very easily to [[1, 0, 0, −1], [0, 1, −1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]. A basis for col([T]_B) is {(0, −1, 1, 0)^T, (−1, 0, 0, 1)^T}, and a basis for null([T]_B) is {(1, 0, 0, 1)^T, (0, 1, 1, 0)^T}. These column vectors are not in V, but they correspond to elements of V, and from them we can easily read off bases for ker(T) and im(T). These bases are {[[1, 0], [0, 1]], [[0, 1], [1, 0]]} and {[[0, −1], [1, 0]], [[−1, 0], [0, 1]]}, respectively. (I hope this is clear; I've had students forget to make this re-translation step.)
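Here is a sympy sketch (mine) of the computation just done, assuming A = [[0, 1], [1, 0]], which is consistent with the values of T on the basis listed above: it builds [T]_B column by column and reads off the kernel and image.

```python
import sympy as sp

A = sp.Matrix([[0, 1], [1, 0]])
E = [sp.Matrix(2, 2, lambda i, j: 1 if (i, j) == pos else 0)
     for pos in [(0, 0), (0, 1), (1, 0), (1, 1)]]   # E11, E12, E21, E22

def vec(X):                                         # coordinates of X in the basis E
    return sp.Matrix([X[0, 0], X[0, 1], X[1, 0], X[1, 1]])

T_B = sp.Matrix.hstack(*[vec(A * Eij - Eij * A) for Eij in E])
print(T_B.rref())           # the row-reduced form quoted above
print(T_B.nullspace())      # corresponds to ker(T): the span of I and A
print(T_B.columnspace())    # corresponds to im(T)
```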

OBSERVATION. Assume all the data is fixed: F, V, W, B, C. Then the map T ↦ C[T]_B is an isomorphism between Hom_F(V, W) and M_{m,n}(F). In particular, C[T_1 + T_2]_B = C[T_1]_B + C[T_2]_B and C[αT]_B = α C[T]_B for any T, T_1, T_2 ∈ Hom_F(V, W) and α ∈ F.

At this stage, what I am going to say next is also more of a notational observation than anything else, but it is significant enough to be called a proposition and to earn its rather pompous name.

PROPOSITION 7 (THE REASON MATRIX MULTIPLICATION IS DEFINED THE WAY IT IS). Suppose that U, V and W are finite-dimensional vector spaces over F, B is an ordered basis for U, C is an ordered basis for V, and D is an ordered basis for W. Suppose also that T_1 : U → V and T_2 : V → W are linear transformations. Then D[T_2 T_1]_B = D[T_2]_C C[T_1]_B.

Proof: Say B = (u_1, ..., u_k), C = (v_1, ..., v_l) and D = (w_1, ..., w_m). I trust it is clear that both D[T_2 T_1]_B and D[T_2]_C C[T_1]_B are m × k matrices. We check that their jth columns are the same for each j. Suppose that the jth column of C[T_1]_B is (a_1, ..., a_l)^T; this means that T_1 u_j = a_1 v_1 + ... + a_l v_l. Suppose that the columns of D[T_2]_C are C_1, ..., C_l; for each i ≤ l, this means that C_i = [T_2 v_i]_D. Now the jth column of the product of D[T_2]_C and C[T_1]_B is a_1 C_1 + ... + a_l C_l. The jth column of D[T_2 T_1]_B is [(T_2 T_1) u_j]_D. This is equal to [T_2(T_1 u_j)]_D = [T_2(a_1 v_1 + ... + a_l v_l)]_D = [a_1 T_2 v_1 + ... + a_l T_2 v_l]_D = a_1 [T_2 v_1]_D + ... + a_l [T_2 v_l]_D = a_1 C_1 + ... + a_l C_l. This completes the proof.

As a simple illustration, let U = V = W = P_3(X), B = C = D = (1, X, X², X³) and T_1 = T_2 = differentiation (but I don't want to call it D in this context). Then D[T_2]_C C[T_1]_B = [T_1]_B² = [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]² = [[0, 0, 2, 0], [0, 0, 0, 6], [0, 0, 0, 0], [0, 0, 0, 0]]. This reflects the simple fact that T_1²(a_0 + a_1 X + a_2 X² + a_3 X³) = 2a_2 + 6a_3 X. (T_1² p(X) is of course the second derivative p''(X).)

Another example is provided by T = T_l for the line l above; it is obvious that [T_l]_B² = I for the basis B = (v_1, v_2), and an easy calculation shows that [T_l]_S² = I, too (no accident). This, um, reflects the fact that if we reflect across the same line twice, we get back to where we started.
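Here is a numpy check (mine) of the differentiation illustration: squaring [D]_B for B = (1, X, X², X³) gives the matrix of the second derivative, sending a_0 + a_1 X + a_2 X² + a_3 X³ to 2a_2 + 6a_3 X.

```python
import numpy as np

D_B = np.array([[0., 1., 0., 0.],
                [0., 0., 2., 0.],
                [0., 0., 0., 3.],
                [0., 0., 0., 0.]])
D2 = D_B @ D_B
assert np.allclose(D2, [[0., 0., 2., 0.],
                        [0., 0., 0., 6.],
                        [0., 0., 0., 0.],
                        [0., 0., 0., 0.]])
a = np.array([4., -1., 5., 2.])                  # p = 4 - X + 5X^2 + 2X^3
assert np.allclose(D2 @ a, [10., 12., 0., 0.])   # p'' = 10 + 12X
```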

Note that if T : V → W is an isomorphism of finite-dimensional vector spaces, then it preserves dimension, so for any ordered basis B of V and ordered basis C of W, C[T]_B is a square matrix. So is B[T^{-1}]_C, and it should be no surprise that the product of these is the identity. (C[T]_B B[T^{-1}]_C = C[T T^{-1}]_C = [I]_C = I, and similarly in the other direction.)

Another very instructive illustration of Proposition 7 is this: Suppose that U = F^k, V = F^l, W = F^m and X = F^n, and we give each of these vector spaces its standard basis. Suppose that A_1 is an l × k matrix, A_2 is an m × l matrix, and A_3 is an n × m matrix, all with entries from F. Let T_j be T_{A_j} for j = 1, 2, 3. We leave off the subscripts on [T_j] because the basis is standard in all cases. We have

A_3(A_2 A_1) = [T_3]([T_2][T_1]) = [T_3]([T_2 T_1]) = [T_3(T_2 T_1)] = [(T_3 T_2)T_1] = ([T_3 T_2])[T_1] = ([T_3][T_2])[T_1] = (A_3 A_2)A_1.

I wish to emphasize that this last paragraph, while it illustrates Proposition 7, is not an example or illustration of the fact that matrix multiplication is associative. It is a proof that matrix multiplication is associative. It is almost entirely conceptual; the only serious calculational thing it uses is the much easier column-by-column fact.

I've implicitly (in fact, explicitly in some of the examples) raised the issue of what happens to the matrix of a transformation in case we change the basis on one side or the other or both. Let's deal with this systematically. In the following, T : V → W is assumed to be a linear transformation between the finite-dimensional vector spaces V and W (over F). Also we assume that B and B' are ordered bases for V, and C and C' are ordered bases for W. Say W is m-dimensional and V is n-dimensional.

PROPOSITION 8. With the notation we just set up, C'[T]_{B'} = C'P_C C[T]_B B P_{B'}. In particular, if V = W, B = C, B' = C' and P = B P_{B'}, then [T]_{B'} = P^{-1} [T]_B P.

Proof: Say a = (a_1, ..., a_n)^T is any vector in F^n; if B' = (v_1', ..., v_n'), let v = a_1 v_1' + ... + a_n v_n', so that the given vector a is [v]_{B'}. Then B P_{B'} a = [v]_B, C[T]_B [v]_B = [Tv]_C, and C'P_C [Tv]_C = [Tv]_{C'}. But C'[T]_{B'} a = [Tv]_{C'}, too. That is, multiplying any vector in F^n by C'[T]_{B'} gives the same result as multiplying it by C'P_C C[T]_B B P_{B'}; these two matrices must needs be the same, therefore.

The special case of a linear operator will most concern us in this course. We've done the calculations for the examples of Proj_l and T_l for our line defined by y = (2/3)x. (Incidentally, this line has no particular significance; it's just that it's nice to have a specific example to flog to death.) With B' = C' = (v_1, v_2) = ((3, 2)^T, (−2, 3)^T) and B = C = (e_1, e_2), we have P = B P_{B'} = [[3, −2], [2, 3]], and P^{-1} = B'P_B is its inverse, (1/13)[[3, 2], [−2, 3]]. A straight matrix calculation shows that, if A = [[1, 0], [0, −1]] = [T_l]_{B'}, then PAP^{-1} = (1/13)[[5, 12], [12, −5]] = [T_l]_B. If Â = [Proj_l]_{B'} = [[1, 0], [0, 0]], then PÂP^{-1} = (1/13)[[9, 6], [6, 4]] = [Proj_l]_B, as advertised.
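Here is a numpy check (mine) of Proposition 8 on the reflection example: conjugating the standard-basis matrix of T_l by P recovers the diagonal matrix in the basis (v_1, v_2).

```python
import numpy as np

P = np.array([[3., -2.], [2., 3.]])              # columns v1, v2 in standard coordinates
T_std = np.array([[5., 12.], [12., -5.]]) / 13   # [T_l]_B for the standard basis B
T_prime = np.linalg.inv(P) @ T_std @ P           # P^{-1} [T_l]_B P = [T_l]_{B'}
assert np.allclose(T_prime, np.diag([1., -1.]))
```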

If V = W = P_3(X), B = C = (1, X, X², X³) and B' = C' = (1, X, (1/2)X², (1/6)X³), then P^{-1} = B'P_B = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 6]], [T]_B = [D]_B = [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]], and P^{-1}[T]_B P = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]], the Jordan-form matrix found earlier.

One reason for considering a nonstandard basis for a finite-dimensional vector space is that the matrix of a particular linear operator on the space may come out nicer, and easier to calculate with, if it is expressed in terms of the nonstandard basis. Ideally, we hope that an operator T on V may be diagonalizable, which means that there is some ordered basis B for V such that [T]_B = A is a diagonal matrix. (That is, a_{j,k} = 0 for any j ≠ k. Note that 0's are allowed on the diagonal; they are just required off the diagonal.) As seen above, if l is a line in R², Proj_l is the orthogonal projection to l, and T_l is the reflection across l, then both these transformations are diagonalizable.

Maybe it's not obvious just yet, but the transformation D : P_3(X) → P_3(X) is not diagonalizable. In a sense, the best we can do is the Jordan canonical form mentioned above. But this is more advanced material, and we'll get to it later.
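Here is a sympy sketch (mine, not from the notes) of the claim that D on P_3(X) is not diagonalizable: its matrix is nilpotent, its only eigenvalue is 0, and the eigenspace is only 1-dimensional, so no basis of eigenvectors exists.

```python
import sympy as sp

D_B = sp.Matrix([[0, 1, 0, 0],
                 [0, 0, 2, 0],
                 [0, 0, 0, 3],
                 [0, 0, 0, 0]])
assert all(lam == 0 for lam in D_B.eigenvals())   # the only eigenvalue is 0
assert len(D_B.nullspace()) == 1                  # only one independent eigenvector
assert not D_B.is_diagonalizable()
print(D_B.jordan_form()[1])                       # the nilpotent Jordan block seen above
```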