Dot Products

K. Behrend

April 3, 2008

Abstract. A short review of some basic facts on the dot product. Projections. The spectral theorem.

Contents

1 The dot product
  1.1 Length of a vector
      A few rules
      Unit vectors
  1.2 Orthogonality
  1.3 Pythagorean theorem: vector version

2 Orthogonal matrices
  2.1 Orthogonal bases, orthonormal bases
  2.2 Orthogonal matrices

3 Projections
  3.1 Orthogonal projections
      General case
      One remark on the length of the projection
  3.2 The matrix of a projection

4 Cauchy-Schwarz
  4.1 The Cauchy-Schwarz inequality
  4.2 The angle between two vectors
  4.3 The triangle inequality

5 The spectral theorem
  5.1 Diagonalization of symmetric matrices: the spectral theorem
      The case of simple eigenvalues
      The general case
  5.2 Exercises

1  The dot product

If we have two vectors in $\mathbb{R}^n$,
\[
v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix},
\qquad
w = \begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix},
\]
their dot product (or scalar product, or inner product) is defined to be the scalar
\[
v \cdot w = v_1 w_1 + v_2 w_2 + \ldots + v_n w_n \;\in\; \mathbb{R}.
\]
For example,
\[
\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} -2 \\ -2 \\ 5 \end{pmatrix}
= 1(-2) + 2(-2) + 3(5) = -2 - 4 + 15 = 9,
\]
or
\[
\begin{pmatrix} 3 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ -3 \end{pmatrix}
= 3(1) + 1(-3) = 3 - 3 = 0.
\]
Notice that we can think of the dot product as a matrix product:
\[
v \cdot w = v^T w.
\]
Here $v^T$ is the transpose of the column vector $v$, which is the same vector, but now it's a row vector:
\[
v^T = (v_1 \; v_2 \; \cdots \; v_n).
\]
Therefore, all rules that apply to matrix multiplication also apply to dot products. In addition, the dot product is commutative:
\[
v \cdot w = w \cdot v.
\]
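As a quick numerical aside, here is a minimal NumPy sketch of these two ways of computing the dot product (the vectors are simply the example above):

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([-2, -2, 5])

# dot product as a sum of componentwise products
print(np.dot(v, w))                        # 9

# the same number, viewed as the matrix product v^T w
print(v.reshape(1, 3) @ w.reshape(3, 1))   # [[9]]

# commutativity: v . w == w . v
print(np.dot(w, v))                        # 9
```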

1.1  Length of a vector

The length (or norm) of the vector $v \in \mathbb{R}^n$ is defined to be the real number
\[
\|v\| = \sqrt{v \cdot v}. \tag{1}
\]
For example, the length of $\begin{pmatrix} 2 \\ 4 \\ 4 \end{pmatrix}$ is
\[
\left\| \begin{pmatrix} 2 \\ 4 \\ 4 \end{pmatrix} \right\|
= \sqrt{\begin{pmatrix} 2 \\ 4 \\ 4 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 4 \\ 4 \end{pmatrix}}
= \sqrt{2^2 + 4^2 + 4^2} = \sqrt{4 + 16 + 16} = \sqrt{36} = 6.
\]

Figure 1: Apply the Pythagorean theorem twice to obtain the length of a vector in $\mathbb{R}^3$.

Referring to Figure 1, to calculate the length of the vector $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$, we apply the Pythagorean theorem to the shaded triangle in the $xy$-plane (the blue one), whose hypotenuse has length $\sqrt{a^2 + b^2}$. Then the vector itself is the hypotenuse of the upright shaded triangle (the red one), and has length
\[
\left\| \begin{pmatrix} a \\ b \\ c \end{pmatrix} \right\|
= \sqrt{\left\| \begin{pmatrix} a \\ b \end{pmatrix} \right\|^2 + c^2}
= \sqrt{a^2 + b^2 + c^2}.
\]
So we have used the Pythagorean theorem of plane geometry to justify the definition (1). In $\mathbb{R}^3$, Formula (1) gives the correct answer, namely the answer we get from our knowledge of basic geometry. But in $\mathbb{R}^n$, we consider (1) as a definition. The number that the formula $\sqrt{v \cdot v}$ yields is called the length of the vector $v$.

If we take the dot product of a vector with itself, we get the length squared:
\[
v \cdot v = \|v\|^2. \tag{2}
\]
This is just the square of Equation (1).

A few rules

How does the length change if we multiply a vector by a scalar? For example,
\[
\left\| 2 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right\|
= \sqrt{(2 \cdot 1)^2 + (2 \cdot 2)^2 + (2 \cdot 3)^2}
= \sqrt{2^2 \,(1^2 + 2^2 + 3^2)}
= 2 \left\| \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right\|.
\]
The general formula is
\[
\|\alpha v\| = |\alpha| \, \|v\|.
\]
The second rule is
\[
\|v\| = 0 \quad \text{if and only if} \quad v = 0.
\]

The only way that the length of $\begin{pmatrix} a \\ b \end{pmatrix}$ can be zero is if $a^2 + b^2 = 0$, which means that both $a$ and $b$ have to be zero, which means that the vector $\begin{pmatrix} a \\ b \end{pmatrix}$ has to be zero.

Unit vectors

A vector of length 1 is called a unit vector. For example, the vector $\begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix}$ is a unit vector, because
\[
\left\| \begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix} \right\|
= \sqrt{\frac{3^2}{5^2} + \frac{4^2}{5^2}} = \sqrt{\frac{9 + 16}{25}} = \sqrt{1} = 1.
\]
No need to take square roots! To check that $v$ is a unit vector, it's enough to check that $v \cdot v = 1$:
\[
\begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix} \cdot \begin{pmatrix} 3/5 \\ 4/5 \end{pmatrix}
= \frac{3}{5} \cdot \frac{3}{5} + \frac{4}{5} \cdot \frac{4}{5} = \frac{9 + 16}{25} = 1.
\]
To produce a unit vector pointing in the same direction as $v$, rescale $v$ by its length: if $v \neq 0$, the normalization of $v$ is
\[
u = \frac{v}{\|v\|}.
\]
The vector $u$ obtained in this way is always a unit vector. Let's check that:
\[
u \cdot u = \frac{v}{\|v\|} \cdot \frac{v}{\|v\|} = \frac{v \cdot v}{\|v\|^2} = \frac{\|v\|^2}{\|v\|^2} = 1,
\]
so that, indeed, $u$ is a unit vector. For example, let us normalize the vector $\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$. Calculate its length:
\[
\left\| \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \right\| = \sqrt{1 + 4 + 1} = \sqrt{6},
\]
so the vector
\[
\frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1/\sqrt{6} \\ 2/\sqrt{6} \\ 1/\sqrt{6} \end{pmatrix}
\]
is the normalization of $\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$, the vector obtained by normalizing $\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$.
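A small NumPy sketch of the normalization step just described (the helper name `normalize` is ours, not the notes'):

```python
import numpy as np

def normalize(v):
    """Return v rescaled to unit length; v must be nonzero."""
    length = np.sqrt(np.dot(v, v))   # same as np.linalg.norm(v)
    return v / length

v = np.array([1.0, 2.0, 1.0])
u = normalize(v)
print(u)               # [0.408..., 0.816..., 0.408...] = (1, 2, 1)/sqrt(6)
print(np.dot(u, u))    # 1.0 (up to rounding), so u is a unit vector
```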

1.2  Orthogonality

Two vectors $v, w$ are called orthogonal (or perpendicular), notation $v \perp w$, if $v \cdot w = 0$:
\[
v \perp w \quad \text{if} \quad v \cdot w = 0. \tag{3}
\]
For example, $\begin{pmatrix} 3 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -3 \end{pmatrix}$ are orthogonal, because
\[
\begin{pmatrix} 3 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ -3 \end{pmatrix} = 3 - 3 = 0,
\]
or
\[
\begin{pmatrix} 0 \\ -5 \\ 7 \end{pmatrix} \perp \begin{pmatrix} 3 \\ 7 \\ 5 \end{pmatrix}
\quad \text{because} \quad
\begin{pmatrix} 0 \\ -5 \\ 7 \end{pmatrix} \cdot \begin{pmatrix} 3 \\ 7 \\ 5 \end{pmatrix} = 0 - 35 + 35 = 0.
\]
Consider two vectors in $\mathbb{R}^2$: the vectors $\begin{pmatrix} a \\ b \end{pmatrix}$ and $\begin{pmatrix} -b \\ a \end{pmatrix}$ are perpendicular (the angle that $\begin{pmatrix} a \\ b \end{pmatrix}$ forms with the $x$-axis is the same as the angle that $\begin{pmatrix} -b \\ a \end{pmatrix}$ forms with the $y$-axis, and since the two coordinate axes are perpendicular, the two vectors have to be perpendicular, too). When we calculate their dot product, we get
\[
\begin{pmatrix} a \\ b \end{pmatrix} \cdot \begin{pmatrix} -b \\ a \end{pmatrix} = a(-b) + ba = 0,
\]
so our definition (3) of orthogonality in terms of the dot product agrees with basic geometry in the plane. In higher dimensions, where our geometric intuition fails, we consider (3) a definition: we call two vectors orthogonal if their dot product vanishes.

1.3  Pythagorean theorem: vector version

Suppose $v$ and $w$ are orthogonal vectors. Then we have
\[
\begin{aligned}
\|v + w\|^2 &= (v + w) \cdot (v + w) && \text{by Eqn (2) for } v + w \\
&= v \cdot v + v \cdot w + w \cdot v + w \cdot w && \text{distributive law for dot product} \\
&= v \cdot v + 0 + 0 + w \cdot w && \text{because } v \perp w \\
&= \|v\|^2 + \|w\|^2 && \text{again by (2)}
\end{aligned}
\]
This is the vector form of the theorem of Pythagoras:
\[
\text{if } v \perp w \text{ then } \|v + w\|^2 = \|v\|^2 + \|w\|^2. \tag{4}
\]
Why do we call this the theorem of Pythagoras? Consider this sketch: the three vectors are $v$, $w$ and $v + w$. The shaded triangle is a right triangle. The side lengths of the right triangle are also indicated. Thus, Formula (4) is the Pythagorean theorem for the shaded triangle. We can visualize Formula (4) using this picture, but Formula (4) is really just a property of the dot product of column vectors in $\mathbb{R}^n$. (Note that the above sketch does not assume that $v$ and $w$ are in $\mathbb{R}^2$. They can be in $\mathbb{R}^n$. But they will span a two-dimensional subspace $E = \operatorname{span}(v, w)$ inside $\mathbb{R}^n$. This plane $E$ is what is displayed in the sketch.)

The length squared of any vector is greater than or equal to zero: $\|w\|^2 \geq 0$. Thus, Formula (4) has the easy consequence:
\[
\text{if } v \perp w \text{ then } \|v\|^2 \leq \|v + w\|^2;
\]
if we take the square root of this, we get
\[
\text{if } v \perp w \text{ then } \|v\| \leq \|v + w\|. \tag{5}
\]
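As a quick illustration of Formula (4), here is a one-off numerical check (a sketch of ours, using an orthogonal pair from the examples above):

```python
import numpy as np

v = np.array([0, -5, 7])
w = np.array([3, 7, 5])
assert np.dot(v, w) == 0           # v and w are orthogonal

lhs = np.dot(v + w, v + w)         # ||v + w||^2
rhs = np.dot(v, v) + np.dot(w, w)  # ||v||^2 + ||w||^2
print(lhs, rhs)                    # 157 157 -> Formula (4) holds
```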

We will need Formula (5) to deduce the Cauchy-Schwarz inequality later on.

2  Orthogonal matrices

2.1  Orthogonal bases, orthonormal bases

If two vectors are perpendicular, they cannot point in the same direction, so they are linearly independent. This is true for any number of vectors: if $v_1, \ldots, v_k$ are vectors in $\mathbb{R}^n$, and every one of these vectors is perpendicular to all the others, then we call $v_1, \ldots, v_k$ an orthogonal set of vectors.

Theorem 1 If $v_1, \ldots, v_k$ is an orthogonal set of vectors, then $v_1, \ldots, v_k$ are linearly independent.

Here is why. Suppose we have a linear relation among $v_1, \ldots, v_k$:
\[
\alpha_1 v_1 + \ldots + \alpha_k v_k = 0. \tag{6}
\]
To show that our vectors are linearly independent, we have to show that all the $\alpha_i$ have to be 0. The trick is simple: take the dot product of (6) with $v_i$. This gives:
\[
\alpha_1\, v_1 \cdot v_i + \ldots + \alpha_i\, v_i \cdot v_i + \ldots + \alpha_k\, v_k \cdot v_i = 0 \cdot v_i.
\]
Because $v_i \perp v_1$ and $v_i \perp v_k$ (in fact $v_i$ is perpendicular to all the other vectors on the list), on the left hand side of this equation all the terms $v_1 \cdot v_i$, $v_k \cdot v_i$, etc., are zero, and the only one left is $v_i \cdot v_i$. Thus, we get
\[
\alpha_i\, v_i \cdot v_i = 0,
\]
but then, because $v_i$ is not the zero vector, $v_i \cdot v_i = \|v_i\|^2 \neq 0$, and so we get the desired $\alpha_i = 0$.

For example, the vectors $\begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}$, $\begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix}$ and $\begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix}$ form an orthogonal set:
\[
\begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix} = 2 + 2 - 4 = 0, \qquad
\begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix} = 2 - 4 + 2 = 0, \qquad
\begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix} = 4 - 2 - 2 = 0.
\]
Hence these three vectors are linearly independent, and so they form an orthogonal basis of $\mathbb{R}^3$.

If all the vectors in an orthogonal set are unit vectors, we call it an orthonormal set of vectors (because all the vectors are normalized). If there are $n$ vectors in an orthonormal set in $\mathbb{R}^n$, then it's a basis, an orthonormal basis. For example,
\[
\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}
\quad \text{is an orthonormal basis of } \mathbb{R}^2;
\qquad
\frac{1}{3}\begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \;
\frac{1}{3}\begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix}, \;
\frac{1}{3}\begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix}
\quad \text{is an orthonormal basis of } \mathbb{R}^3.
\]

2.2  Orthogonal matrices

Say $u_1, u_2, u_3$ is an orthonormal basis of $\mathbb{R}^3$. Let $P = \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix}$ be the change of basis matrix. Calculate $P^T P$. The rows of the transpose matrix $P^T$ contain the vectors $u_1, u_2, u_3$:
\[
P^T P
= \begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \end{pmatrix}
\begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix}
\qquad \text{(the rows of } P^T \text{ are the columns of } P\text{)}
\]
\[
= \begin{pmatrix}
u_1 \cdot u_1 & u_1 \cdot u_2 & u_1 \cdot u_3 \\
u_2 \cdot u_1 & u_2 \cdot u_2 & u_2 \cdot u_3 \\
u_3 \cdot u_1 & u_3 \cdot u_2 & u_3 \cdot u_3
\end{pmatrix}
\qquad \text{(matrix multiplication!)}
\]
\[
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\qquad \text{(because } u_1, u_2, u_3 \text{ is an orthonormal set)}.
\]
This is a general fact:

Theorem 2 If $P$ is an $n \times n$ matrix, then the columns of $P$ form an orthonormal basis of $\mathbb{R}^n$ if and only if
\[
P^T P = I_n.
\]
Another way to say $P^T P = I$ is that $P^T$ is the inverse of $P$.

Theorem 3 If the columns of $P$ form an orthonormal basis of $\mathbb{R}^n$, then $P$ is invertible and
\[
P^{-1} = P^T.
\]
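A short numerical illustration of Theorems 2 and 3 (our own check, using the orthonormal basis of $\mathbb{R}^3$ listed above as the columns of $P$):

```python
import numpy as np

# columns of P: the orthonormal basis (1/3)(1,2,2), (1/3)(2,1,-2), (1/3)(2,-2,1)
P = np.array([[1,  2,  2],
              [2,  1, -2],
              [2, -2,  1]]) / 3.0

print(P.T @ P)                               # identity matrix (up to rounding)
print(np.allclose(P.T @ P, np.eye(3)))       # True
print(np.allclose(np.linalg.inv(P), P.T))    # True: P^{-1} = P^T
```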

Matrices with the property of Theorem 3 are called orthogonal matrices (not orthonormal matrices).

Definition 4 If the columns of $P$ form an orthonormal basis of $\mathbb{R}^n$, then $P$ is called an orthogonal matrix.

Suppose $P$ is an orthogonal matrix, so $P^T P = I$. Then $P^{-1} = P^T$. Since $P P^{-1} = I$, we also get $P P^T = I$, which we can rewrite as $(P^T)^T P^T = I$. Thus $P^T$ satisfies the property of Theorem 2, and is therefore also orthogonal. So the columns of $P^T$ are an orthonormal basis, which means that the rows of $P$ are an orthonormal basis. We can summarize everything in the following theorem.

Theorem 5 The following conditions on an $n \times n$ matrix $P$ are all equivalent to each other (they all mean the same thing):
(i) $P$ is orthogonal,
(ii) the columns of $P$ form an orthonormal basis of $\mathbb{R}^n$,
(iii) the rows of $P$ form an orthonormal basis of $\mathbb{R}^n$,
(iv) $P^T P = I_n$,
(v) $P P^T = I_n$,
(vi) $P$ is non-singular, and $P^{-1} = P^T$.

3  Projections

3.1  Orthogonal projections

Let us consider a plane $E$ through the origin in $\mathbb{R}^3$, spanned by two vectors $v$ and $w$, and let us assume that the two spanning vectors are orthogonal: $v \perp w$, $E = \operatorname{span}(v, w)$. These two vectors form a basis of $E$. We consider a further vector $x$ and its orthogonal projection onto $E$. We want to write the projection of $x$ onto $E$ in terms of this basis.

The projection of $x$ into the plane $E$ is a vector inside the plane, notation $\operatorname{proj}_E(x)$. If we subtract the projection from $x$, we get the vector $x - \operatorname{proj}_E(x)$. The important fact is that this vector $x - \operatorname{proj}_E(x)$ is orthogonal to $E$:
\[
\bigl(x - \operatorname{proj}_E(x)\bigr) \perp E. \tag{7}
\]

Figure 2: The rectangle formed by $\operatorname{proj}_E(x)$ and $x - \operatorname{proj}_E(x)$ (which contains the reddish right triangle) is perpendicular to the plane $E$, in blue.

We know that $\operatorname{proj}_E(x)$ is a linear combination of $v$ and $w$, because it is in $E$:
\[
\operatorname{proj}_E(x) = a v + b w. \tag{8}
\]
To find $a$, we take the dot product of this equation with $v$:
\[
\begin{aligned}
\operatorname{proj}_E(x) \cdot v &= (a v + b w) \cdot v && \text{this is (8) dotted with } v \\
&= a\, v \cdot v + b\, w \cdot v && \text{rules for dot products} \\
&= a\, v \cdot v + 0 && \text{because } v \perp w \\
&= a \|v\|^2 && \text{by Equation (2)}
\end{aligned}
\]
Since $v$ is part of a basis it is not the zero vector, so $\|v\|^2$ is not zero, and so we can divide:
\[
a = \frac{\operatorname{proj}_E(x) \cdot v}{\|v\|^2}.
\]
Now we use the fact (7): since $x - \operatorname{proj}_E(x)$ is orthogonal to $E$, it has to be orthogonal to every vector inside $E$, in particular $\bigl(x - \operatorname{proj}_E(x)\bigr) \perp v$, and hence
\[
\begin{aligned}
\bigl(x - \operatorname{proj}_E(x)\bigr) \cdot v &= 0 \\
x \cdot v - \operatorname{proj}_E(x) \cdot v &= 0 \\
\operatorname{proj}_E(x) \cdot v &= x \cdot v.
\end{aligned}
\]
We plug this into our formula for $a$ and get
\[
a = \frac{x \cdot v}{\|v\|^2}.
\]
We can use similar reasoning to find $b$. First take the dot product of (8) with $w$, to get
\[
\operatorname{proj}_E(x) \cdot w = b \|w\|^2,
\]
and solve for $b$:
\[
b = \frac{\operatorname{proj}_E(x) \cdot w}{\|w\|^2}.
\]
Then use (7) again:
\[
\bigl(x - \operatorname{proj}_E(x)\bigr) \cdot w = 0,
\]
which gives us
\[
\operatorname{proj}_E(x) \cdot w = x \cdot w,
\]
which we plug into the formula for $b$ to obtain
\[
b = \frac{x \cdot w}{\|w\|^2}.
\]
We plug the values we found for $a$ and $b$ back into (8) to get our final formula for $\operatorname{proj}_E(x)$ in terms of the orthogonal basis $v, w$ for $E$:
\[
\operatorname{proj}_E(x) = \frac{x \cdot v}{\|v\|^2}\, v + \frac{x \cdot w}{\|w\|^2}\, w
\qquad (v, w \text{: orthogonal basis for } E). \tag{9}
\]
Sometimes people find it easier to memorize if it's written like this:
\[
\operatorname{proj}_E(x) = \frac{x \cdot v}{v \cdot v}\, v + \frac{x \cdot w}{w \cdot w}\, w.
\]

General case

The formula we derived is true in general:

Theorem 6 If $V$ is a subspace of $\mathbb{R}^n$ and $v_1, \ldots, v_k$ is an orthogonal basis of $V$, then for every vector $x \in \mathbb{R}^n$ we have
\[
\operatorname{proj}_V(x) = \frac{x \cdot v_1}{v_1 \cdot v_1}\, v_1 + \frac{x \cdot v_2}{v_2 \cdot v_2}\, v_2 + \ldots + \frac{x \cdot v_k}{v_k \cdot v_k}\, v_k.
\]
If we have an orthonormal basis, the formula simplifies:

Theorem 7 If $V$ is a subspace of $\mathbb{R}^n$ and $u_1, \ldots, u_k$ is an orthonormal basis of $V$, then for every vector $x \in \mathbb{R}^n$ we have
\[
\operatorname{proj}_V(x) = (x \cdot u_1)\, u_1 + (x \cdot u_2)\, u_2 + \ldots + (x \cdot u_k)\, u_k.
\]
The second theorem looks simpler, but in practice it does not save any calculational effort.

If $V = \mathbb{R}^n$, then $\operatorname{proj}_V x = x$, and so we get:

Theorem 8 If $v_1, \ldots, v_n$ is an orthogonal basis of $\mathbb{R}^n$, then for every $x \in \mathbb{R}^n$
\[
x = \frac{x \cdot v_1}{v_1 \cdot v_1}\, v_1 + \frac{x \cdot v_2}{v_2 \cdot v_2}\, v_2 + \ldots + \frac{x \cdot v_n}{v_n \cdot v_n}\, v_n.
\]
If $u_1, \ldots, u_n$ is an orthonormal basis of $\mathbb{R}^n$, then for every vector $x \in \mathbb{R}^n$
\[
x = (x \cdot u_1)\, u_1 + (x \cdot u_2)\, u_2 + \ldots + (x \cdot u_n)\, u_n.
\]

One remark on the length of the projection

Looking one more time at the important fact (7), we see that
\[
\operatorname{proj}_V(x) \perp \bigl(x - \operatorname{proj}_V(x)\bigr),
\]
because $\operatorname{proj}_V(x)$ is in $V$, and $x - \operatorname{proj}_V(x)$ is perpendicular to everything in $V$. Now looking back at Fact (5), we deduce that
\[
\|\operatorname{proj}_V(x)\| \leq \bigl\|\operatorname{proj}_V(x) + \bigl(x - \operatorname{proj}_V(x)\bigr)\bigr\| = \|x\|. \tag{10}
\]
All this is saying is that the length of a vector is at least as big as the length of its projection.
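Here is a small NumPy sketch of Theorem 6 (our own illustration; the subspace and the vector $x$ are made up for the example): projecting a vector onto the plane spanned by an orthogonal pair.

```python
import numpy as np

def proj(x, basis):
    """Orthogonal projection of x onto span(basis), where basis is an
    orthogonal (not necessarily orthonormal) list of vectors -- Theorem 6."""
    return sum((np.dot(x, v) / np.dot(v, v)) * v for v in basis)

v1 = np.array([1.0, 2.0, 2.0])
v2 = np.array([2.0, 1.0, -2.0])      # orthogonal to v1
x  = np.array([3.0, -1.0, 4.0])

p = proj(x, [v1, v2])
print(p)                                        # [0.333..., 1.666..., 2.666...]
print(np.dot(x - p, v1), np.dot(x - p, v2))     # both ~0: x - p is orthogonal to the plane
print(np.linalg.norm(p) <= np.linalg.norm(x))   # True, as in (10)
```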

3.2  The matrix of a projection

So far, we have looked at the projection of one vector at a time. Now, fixing the subspace $V$, we project all vectors. We get a linear map
\[
T : \mathbb{R}^n \to \mathbb{R}^n, \qquad x \mapsto \operatorname{proj}_V(x).
\]
The transformation $T$ takes a vector and maps it to its projection onto $V$. The image of $T$ is the subspace $V$. The kernel of $T$ consists of all vectors perpendicular to all of $V$. This is called the orthogonal complement of $V$, notation $V^\perp$:
\[
\ker T = V^\perp.
\]
An example: Suppose $B = (u_1, u_2, u_3, u_4)$ is an orthonormal basis of $\mathbb{R}^4$. Let $V = \operatorname{span}(u_1, u_2)$ be the plane spanned by the first two of the vectors in $B$. Let $T$ be the orthogonal projection onto $V$. We have
\[
\begin{aligned}
T(u_1) &= u_1 && \text{because } u_1 \in V \\
T(u_2) &= u_2 && \text{because } u_2 \in V \\
T(u_3) &= 0 && \text{because } u_3 \perp V \\
T(u_4) &= 0 && \text{because } u_4 \perp V
\end{aligned}
\]
Therefore, the matrix of $T$ in the basis $B$ is
\[
[T]_B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]
So the matrix of $T$ in the standard basis is
\[
[T]_E = P\, [T]_B\, P^{-1}.
\]
Since $B$ is orthonormal, the matrix $P$ is orthogonal, so $P^{-1} = P^T$. Thus,
\[
[T]_E = P\, [T]_B\, P^T.
\]
Let's calculate:
\[
[T]_E
= \begin{pmatrix} u_1 & u_2 & u_3 & u_4 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \\ u_4^T \end{pmatrix}
= \begin{pmatrix} u_1 & u_2 & 0 & 0 \end{pmatrix}
\begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \\ u_4^T \end{pmatrix}
= \begin{pmatrix} u_1 & u_2 \end{pmatrix}
\begin{pmatrix} u_1^T \\ u_2^T \end{pmatrix}.
\]
Yes, the last expression is a $4 \times 2$ matrix times a $2 \times 4$ matrix, resulting in a $4 \times 4$ matrix. Note also that even though we started with a full basis $(u_1, u_2, u_3, u_4)$ of $\mathbb{R}^4$, in the end only the vectors $u_1$ and $u_2$, which span the subspace $V$, enter into this formula.

Let's do this calculation in a concrete example. The following is an orthonormal basis of $\mathbb{R}^4$:
\[
u_1 = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad
u_2 = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \quad
u_3 = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ 1 \\ -1 \end{pmatrix}, \quad
u_4 = \frac{1}{2}\begin{pmatrix} 1 \\ -1 \\ -1 \\ 1 \end{pmatrix}.
\]
The matrix of the projection $T$ onto the plane spanned by the first two of these vectors is
\[
[T]_E
= \begin{pmatrix} u_1 & u_2 \end{pmatrix}\begin{pmatrix} u_1^T \\ u_2^T \end{pmatrix}
= \frac{1}{4}
\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & -1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{pmatrix}
= \frac{1}{2}
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}.
\]
(The best way to understand the first equality is to convince yourself that when you actually calculate the full product $P\,[T]_B\,P^T$ directly, you end up doing the same calculations, because of all those zeros.)

In the general case, we have the following theorem:

Theorem 9 If $V$ is a subspace of $\mathbb{R}^n$ and $u_1, \ldots, u_k$ is an orthonormal basis of $V$, then the matrix of the orthogonal projection onto $V$ is given by the matrix product
\[
\begin{pmatrix} u_1 & \cdots & u_k \end{pmatrix}
\begin{pmatrix} u_1^T \\ \vdots \\ u_k^T \end{pmatrix}.
\]
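A NumPy cross-check of this projection-matrix computation (our own sketch; the basis is the one assumed in the example above):

```python
import numpy as np

U = 0.5 * np.array([[1,  1,  1,  1],
                    [1,  1, -1, -1],
                    [1, -1,  1, -1],
                    [1, -1, -1,  1]]).T   # columns u1, u2, u3, u4

u1u2 = U[:, :2]                  # the 4x2 matrix (u1 u2)
T = u1u2 @ u1u2.T                # Theorem 9: projection matrix onto span(u1, u2)
print(T)                         # 1/2 * [[1,1,0,0],[1,1,0,0],[0,0,1,1],[0,0,1,1]]

# sanity checks: T is symmetric and idempotent (T @ T == T), as a projection should be
print(np.allclose(T, T.T), np.allclose(T @ T, T))
```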

4  Cauchy-Schwarz

4.1  The Cauchy-Schwarz inequality

The Cauchy-Schwarz inequality derives from the fact that a vector cannot get any longer by projecting it orthogonally. Consider two vectors $v$ and $w$, and project $v$ onto the line spanned by $w$, as in Figure 3.

Figure 3: The vector $v$ and its projection are in red, the vector $w$ and the line it spans in blue.

Recall (10) that a vector is always at least as long as its projection:
\[
\|\operatorname{proj}_{\operatorname{span} w}(v)\| \leq \|v\|.
\]
Plug in Formula (9):
\[
\left\| \frac{v \cdot w}{\|w\|^2}\, w \right\| \leq \|v\|,
\]
or, by using the rules for manipulating lengths:
\[
\frac{|v \cdot w|}{\|w\|^2}\, \|w\| \leq \|v\|,
\]
which simplifies to
\[
|v \cdot w| \leq \|v\| \, \|w\|.
\]
This is the Cauchy-Schwarz inequality.

Theorem 10 (Cauchy-Schwarz Inequality) For any two vectors $v, w$ in $\mathbb{R}^n$, we have
\[
|v \cdot w| \leq \|v\| \, \|w\|.
\]
In words: the absolute value of the dot product is less than or equal to the product of the lengths.

For example, for plane vectors $v = \begin{pmatrix} a \\ b \end{pmatrix}$ and $w = \begin{pmatrix} c \\ d \end{pmatrix}$, the Cauchy-Schwarz inequality says that
\[
\left| \begin{pmatrix} a \\ b \end{pmatrix} \cdot \begin{pmatrix} c \\ d \end{pmatrix} \right|
\leq \left\| \begin{pmatrix} a \\ b \end{pmatrix} \right\| \, \left\| \begin{pmatrix} c \\ d \end{pmatrix} \right\|.
\]

Spelled out in coordinates, this says
\[
|ac + bd| \leq \sqrt{a^2 + b^2} \, \sqrt{c^2 + d^2},
\]
which is equivalent to each of the following statements:
\[
\begin{aligned}
(ac + bd)^2 &\leq (a^2 + b^2)(c^2 + d^2) \\
a^2c^2 + 2acbd + b^2d^2 &\leq a^2c^2 + a^2d^2 + b^2c^2 + b^2d^2 \\
2acbd &\leq a^2d^2 + b^2c^2 \\
0 &\leq a^2d^2 - 2adbc + b^2c^2 \\
0 &\leq (ad - bc)^2.
\end{aligned}
\]
In fact, all of these statements are equivalent. Of course $(ad - bc)^2$ is always greater than or equal to zero. So this reasoning gives another proof of the Cauchy-Schwarz inequality in two dimensions.

4.2  The angle between two vectors

Let $v$ and $w$ be two non-zero vectors in $\mathbb{R}^n$. Dividing the Cauchy-Schwarz inequality by $\|v\| \, \|w\|$, we get
\[
\left| \frac{v \cdot w}{\|v\| \, \|w\|} \right| \leq 1,
\]
or, in other words,
\[
-1 \leq \frac{v \cdot w}{\|v\| \, \|w\|} \leq 1.
\]
So (see Figure 4) there is a unique $\theta \in [0, \pi]$ such that
\[
\cos \theta = \frac{v \cdot w}{\|v\| \, \|w\|}.
\]
This $\theta$ is called the angle between the vectors $v$ and $w$.

Figure 4: for every $x$ between $-1$ and $1$ there is a unique angle $\theta \in [0, \pi]$ such that $\cos \theta = x$.

Definition 11 For any two vectors $v, w$ in $\mathbb{R}^n$, the angle between them is defined to be
\[
\theta = \arccos \left( \frac{v}{\|v\|} \cdot \frac{w}{\|w\|} \right).
\]
Notice that $\frac{v}{\|v\|}$ is the unit vector in the same direction as $v$. Similarly, $\frac{w}{\|w\|}$ is the unit vector in the direction of $w$. So our definition really just says that for unit vectors $v$ and $w$ the angle between them is defined to be $\arccos(v \cdot w)$:
\[
\theta = \arccos(v \cdot w) \qquad \text{if } v \text{ and } w \text{ are unit vectors}.
\]
Let us check that this makes sense for vectors in the plane. Suppose the vectors $v$ and $w$ are unit vectors. The projection of $v$ onto the line spanned by $w$ is then the vector
\[
\operatorname{proj}_{\operatorname{span} w}(v) = (\cos \theta)\, w
\]
by some simple trigonometry. Using Theorem 6, we calculate
\[
\operatorname{proj}_{\operatorname{span} w}(v) = (v \cdot w)\, w,
\]
and comparing these two formulas for the projection, we see that
\[
(\cos \theta)\, w = (v \cdot w)\, w,
\]
and from that we get, because $w$ is not zero:
\[
\cos \theta = v \cdot w,
\]
or
\[
\theta = \arccos(v \cdot w).
\]
So Definition 11 agrees with trigonometry in the plane, and is therefore a reasonable definition in $\mathbb{R}^n$.

We can tell whether an angle is acute or obtuse by looking at its cosine:
\[
\text{if } \cos \theta
\begin{cases}
> 0 & \text{then } \theta \text{ is acute} \\
= 0 & \text{then } \theta \text{ is right} \\
< 0 & \text{then } \theta \text{ is obtuse}
\end{cases}
\]
so if $v$ and $w$ are arbitrary vectors in $\mathbb{R}^n$, we can say
\[
\text{if } v \cdot w
\begin{cases}
> 0 & \text{then the angle between } v \text{ and } w \text{ is acute} \\
= 0 & \text{then the angle between } v \text{ and } w \text{ is right} \\
< 0 & \text{then the angle between } v \text{ and } w \text{ is obtuse}
\end{cases}
\]
Consider one more time the formula for the projection of $v$ onto $w$:
\[
\operatorname{proj}_{\operatorname{span} w}(v) = \frac{v \cdot w}{\|w\|^2}\, w.
\]
If we take the length on both sides, we get
\[
\|\operatorname{proj}_{\operatorname{span} w}(v)\| = \frac{|v \cdot w|}{\|w\|^2}\, \|w\| = \frac{|v \cdot w|}{\|w\|},
\]
which simplifies to
\[
\|\operatorname{proj}_{\operatorname{span} w}(v)\| \, \|w\| = |v \cdot w|,
\]
and from this we deduce
\[
v \cdot w =
\begin{cases}
\phantom{-}\|\operatorname{proj}_{\operatorname{span} w}(v)\| \, \|w\| & \text{if the angle}(v, w) \text{ is acute} \\
\phantom{-}0 & \text{if the angle}(v, w) \text{ is right} \\
-\|\operatorname{proj}_{\operatorname{span} w}(v)\| \, \|w\| & \text{if the angle}(v, w) \text{ is obtuse}
\end{cases}
\]
This is a geometric interpretation of the meaning of the dot product: the dot product of two vectors is (up to sign) the length of the projection of one onto the other, times the length of the other. The sign is determined by whether the angle between the vectors is acute or obtuse.
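A quick numerical illustration of Definition 11 (our own example vectors; NumPy's arccos returns the angle in radians):

```python
import numpy as np

def angle(v, w):
    """Angle between two nonzero vectors, as in Definition 11."""
    c = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    c = np.clip(c, -1.0, 1.0)   # guard against tiny rounding outside [-1, 1]
    return np.arccos(c)

v = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])
print(np.degrees(angle(v, w)))   # 45.0

# Cauchy-Schwarz in action: the cosine is always between -1 and 1
print(abs(np.dot(v, w)) <= np.linalg.norm(v) * np.linalg.norm(w))   # True
```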

4.3  The triangle inequality

Let $v$ and $w$ be two vectors in $\mathbb{R}^n$. Let us calculate $\|v + w\|^2$:
\[
\begin{aligned}
\|v + w\|^2 &= (v + w) \cdot (v + w) && \text{just the definition of length} \\
&= v \cdot v + v \cdot w + w \cdot v + w \cdot w && \text{dot product is distributive} \\
&= \|v\|^2 + 2\, v \cdot w + \|w\|^2 && \text{dot product is commutative} \\
&\leq \|v\|^2 + 2 \|v\| \, \|w\| + \|w\|^2 && \text{Cauchy-Schwarz!} \\
&= \bigl( \|v\| + \|w\| \bigr)^2.
\end{aligned}
\]
Taking the square root of this inequality we get the triangle inequality
\[
\|v + w\| \leq \|v\| + \|w\|.
\]
To see why it's called the triangle inequality, look at Figure 5.

Figure 5: The vectors $w$, $v + w$ and $v$ (translated) form a triangle. The length of $v + w$ cannot be bigger than the sum of the lengths of the other two sides of the triangle.

5  The spectral theorem

5.1  Diagonalization of symmetric matrices: the spectral theorem

The most important source of orthogonal sets of vectors is from eigenvectors of symmetric matrices. Recall that a matrix $A$ is symmetric if $A = A^T$.

Theorem 12 Suppose $A$ is a symmetric matrix. Suppose further that $\lambda_1$ and $\lambda_2$ are two (different) eigenvalues of $A$, and that $v_1$ and $v_2$ are corresponding eigenvectors. Then $v_1 \perp v_2$.

It's easy to explain why this is true. Simply compute the dot product, multiplied by one of the eigenvalues:
\[
\begin{aligned}
\lambda_1 (v_1 \cdot v_2) &= v_2^T (\lambda_1 v_1) && \text{rewrite dot product as matrix multiplication} \\
&= v_2^T A v_1 && \text{because } A v_1 = \lambda_1 v_1 \text{ by the eigenvalue property} \\
&= v_2^T A^T v_1 && \text{use that } A^T = A \\
&= (A v_2)^T v_1 && \text{rules for transposes include } (A v_2)^T = v_2^T A^T \\
&= \lambda_2\, v_2^T v_1 && \text{because } A v_2 = \lambda_2 v_2 \text{ by the eigenvalue property} \\
&= \lambda_2 (v_1 \cdot v_2) && \text{back to dot product}
\end{aligned}
\]
This equality gives rise to
\[
(\lambda_1 - \lambda_2)(v_1 \cdot v_2) = 0,
\]
and because the two eigenvalues are different, we can conclude that $v_1 \cdot v_2 = 0$, so that $v_1 \perp v_2$.

The case of simple eigenvalues

From Theorem 12, we can immediately deduce:

Corollary 13 If $A$ is a symmetric $n \times n$ matrix, which has $n$ distinct eigenvalues, then there exists an orthonormal basis of $\mathbb{R}^n$, consisting of eigenvectors for $A$. (We say that $A$ is orthogonally diagonalizable.)

(Simply take an eigenvector for each eigenvalue; they form an orthogonal set by the theorem; then normalize them to get the orthonormal basis.)

For example, the matrix
\[
A = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 3 & 1 \\ 2 & 1 & 0 \end{pmatrix}
\]
is symmetric. Its characteristic polynomial is
\[
\det \begin{pmatrix} \lambda & -1 & -2 \\ -1 & \lambda - 3 & -1 \\ -2 & -1 & \lambda \end{pmatrix}
= \lambda^2 (\lambda - 3) - \lambda - \lambda - 4(\lambda - 3) - 2 - 2
= \lambda^3 - 3\lambda^2 - 6\lambda + 8
= (\lambda - 4)(\lambda - 1)(\lambda + 2).
\]
So the eigenvalues are $\lambda_1 = 4$, $\lambda_2 = 1$ and $\lambda_3 = -2$. Corresponding eigenvectors are easily found (each of the three homogeneous systems of equations has exactly one free variable). We get
\[
v_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \qquad
v_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \qquad
v_3 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}.
\]
We normalize to get an orthonormal basis:
\[
u_1 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \qquad
u_2 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}, \qquad
u_3 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}.
\]
The change of basis matrix $P$ has these vectors as columns:
\[
P = \begin{pmatrix}
1/\sqrt{6} & 1/\sqrt{3} & 1/\sqrt{2} \\
2/\sqrt{6} & -1/\sqrt{3} & 0 \\
1/\sqrt{6} & 1/\sqrt{3} & -1/\sqrt{2}
\end{pmatrix}.
\]
Now we can write down the diagonalization $A = P D P^T$ of $A$:
\[
\begin{pmatrix} 0 & 1 & 2 \\ 1 & 3 & 1 \\ 2 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix}
1/\sqrt{6} & 1/\sqrt{3} & 1/\sqrt{2} \\
2/\sqrt{6} & -1/\sqrt{3} & 0 \\
1/\sqrt{6} & 1/\sqrt{3} & -1/\sqrt{2}
\end{pmatrix}
\begin{pmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix}
\begin{pmatrix}
1/\sqrt{6} & 2/\sqrt{6} & 1/\sqrt{6} \\
1/\sqrt{3} & -1/\sqrt{3} & 1/\sqrt{3} \\
1/\sqrt{2} & 0 & -1/\sqrt{2}
\end{pmatrix}.
\]
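As a numerical cross-check of this example (our own aside), numpy.linalg.eigh diagonalizes a symmetric matrix and returns an orthonormal eigenbasis directly:

```python
import numpy as np

A = np.array([[0., 1., 2.],
              [1., 3., 1.],
              [2., 1., 0.]])

eigenvalues, P = np.linalg.eigh(A)   # eigh is meant for symmetric matrices
print(eigenvalues)                   # [-2., 1., 4.] (ascending order)
print(np.allclose(P.T @ P, np.eye(3)))                  # True: columns are orthonormal
print(np.allclose(P @ np.diag(eigenvalues) @ P.T, A))   # True: A = P D P^T
```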

The general case

The spectral theorem says that every symmetric matrix admits an orthonormal basis of eigenvectors.

Theorem 14 (Spectral Theorem) Suppose $A$ is a symmetric matrix. Then there exists an orthonormal basis of $\mathbb{R}^n$ consisting of eigenvectors of $A$. So there exists an orthogonal matrix $P$ such that
\[
A = P D P^T,
\]
where $D$ is diagonal.

This theorem says three things: the characteristic polynomial of every symmetric matrix splits completely, all the geometric multiplicities of all the eigenvalues are equal to the algebraic multiplicities, and, on top of this, the eigenvectors can be chosen orthogonal to each other.

Let us explain why the theorem is true for $2 \times 2$ matrices. If $A$ is symmetric, it looks like this:
\[
A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.
\]
The characteristic polynomial is $\lambda^2 - (a + c)\lambda + ac - b^2$, whose roots are
\[
\lambda = \frac{1}{2} \left( (a + c) \pm \sqrt{(a + c)^2 - 4(ac - b^2)} \right).
\]
Let us examine the term under the square root (the discriminant of the quadratic):
\[
(a + c)^2 - 4ac + 4b^2 = a^2 + 2ac + c^2 - 4ac + 4b^2 = a^2 - 2ac + c^2 + 4b^2 = (a - c)^2 + 4b^2.
\]
Notice that this expression is never negative, because it is a sum of two squares. Therefore, the square root exists: when solving this particular quadratic, we never run into square roots of negative numbers. This already shows that the characteristic polynomial of $A$ will split completely, into two linear factors. So we have two real eigenvalues. They can either be equal or distinct. If they are distinct, we can find corresponding eigenvectors, which will automatically be orthogonal. We can normalize them, and then we have our orthonormal basis of eigenvectors.

What happens if the two eigenvalues are equal? In other words, there is only one eigenvalue, but its algebraic multiplicity is 2. Could it be that the geometric multiplicity is only 1? Let's see. The only way the two roots of the quadratic equation can be equal is if the term underneath the square root is 0. So that means that
\[
(a - c)^2 + 4b^2 = 0,
\]
but the only way a sum of two squares can be zero is if both of them are zero. So $(a - c)^2 = 0$ and $4b^2 = 0$, which implies that $a = c$ and $b = 0$. So our matrix is
\[
A = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix},
\]
which is already diagonal! The standard basis is an orthonormal basis which diagonalizes $A$. This justifies the spectral theorem in 2 dimensions.

Let us do an example. Consider the matrix
\[
A = \begin{pmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{pmatrix},
\]
which is symmetric. The characteristic polynomial is $\lambda^3 - 12\lambda^2 + 45\lambda - 54$. After some trial and error, we may find that 3 is a root of this polynomial: $27 - 108 + 135 - 54 = 0$. Factoring out $(\lambda - 3)$ from the characteristic polynomial, we are left with the quadratic $\lambda^2 - 9\lambda + 18$, which has the two roots $\lambda = 3$ and $\lambda = 6$. So our eigenvalues are 3, with multiplicity 2, and 6, with multiplicity 1.

Let us first deal with $\lambda = 6$. The eigenspace $E_6$ is the null space of the matrix
\[
A - 6I = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix},
\]
which has one free variable, and a solution is $v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$.

Now let us deal with $\lambda = 3$. The eigenspace $E_3$ is the null space of the matrix
\[
A - 3I = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix},
\]
which row reduces to
\[
\begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \tag{11}
\]
We see that there are two free variables, and so the eigenspace is two-dimensional, and so the geometric multiplicity of the eigenvalue 3 is equal to the algebraic multiplicity, as it should be, by the spectral theorem. The usual way of finding a basis for the null space of (11) (setting the free variables first equal to $(1, 0)$ and then equal to $(0, 1)$) gives us the two eigenvectors
\[
\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.
\]
These are not orthogonal to each other. We have to work a little harder to find an orthogonal basis of $E_3$. Let us take the first of these as second eigenvector $v_2$:
\[
v_2 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}.
\]

Now to find the third eigenvector, we want a vector which satisfies two properties: first, it is an eigenvector with eigenvalue 3, so it has to be in the null space of (11); second, it has to be perpendicular to $v_2 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}$, which means the dot product with $\begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}$ has to be zero, which we can force by adding this condition as a second row to (11):
\[
\begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 0 \end{pmatrix}.
\]
This matrix now reduces to
\[
\begin{pmatrix} 1 & 0 & 1/2 \\ 0 & 1 & 1/2 \end{pmatrix},
\]
which has the vector
\[
v_3 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}
\]
as solution. Now the vectors
\[
v_2 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}
\]
form an orthogonal basis of the eigenspace $E_3$. Both of these are automatically orthogonal to $v_1$. All together, we now have an orthogonal eigenbasis:
\[
v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad
v_2 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad
v_3 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix},
\]
with corresponding eigenvalues
\[
\lambda_1 = 6, \qquad \lambda_2 = 3, \qquad \lambda_3 = 3.
\]
We normalize these eigenvectors to form an orthonormal basis:
\[
u_1 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad
u_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \qquad
u_3 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}.
\]
So the change of basis matrix is
\[
P = \begin{pmatrix}
1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\
1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\
1/\sqrt{3} & 0 & -2/\sqrt{6}
\end{pmatrix}.
\]
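A quick NumPy verification of this hand computation (our own check): the columns of $P$ are orthonormal and $P^T A P$ is the diagonal matrix of eigenvalues.

```python
import numpy as np

A = np.array([[4., 1., 1.],
              [1., 4., 1.],
              [1., 1., 4.]])

P = np.column_stack([
    np.array([1., 1., 1.]) / np.sqrt(3),
    np.array([-1., 1., 0.]) / np.sqrt(2),
    np.array([1., 1., -2.]) / np.sqrt(6),
])

print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))         # diag(6, 3, 3)
```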

The orthogonal diagonalization of $A$ is then
\[
\begin{pmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{pmatrix}
=
\begin{pmatrix}
1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\
1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\
1/\sqrt{3} & 0 & -2/\sqrt{6}
\end{pmatrix}
\begin{pmatrix} 6 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix}
1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\
-1/\sqrt{2} & 1/\sqrt{2} & 0 \\
1/\sqrt{6} & 1/\sqrt{6} & -2/\sqrt{6}
\end{pmatrix}.
\]
By this method we can orthogonally diagonalize any symmetric matrix (assuming we can manage to find the roots of the characteristic polynomial). For multiple eigenvalues we solve for one eigenvector at a time, each time adding the eigenvectors already found as equations, so as to ensure that the subsequent eigenvectors are perpendicular to the ones already found.

5.2  Exercises

Exercise 15 Find an orthonormal basis of eigenvectors for the matrix
\[
A = \begin{pmatrix} 9 & 1 & 1 \\ 1 & 9 & 1 \\ 1 & 1 & 9 \end{pmatrix}.
\]
Write down the corresponding diagonalization of $A$. (Hint: the eigenvalues are 8 and 11.)