Covariance and Dot Product

1 Introduction.

As you learned in Calculus III and Linear Algebra, the dot product of two vectors $\vec{x} = (x_1, \dots, x_n)$ and $\vec{y} = (y_1, \dots, y_n)$ in $\mathbb{R}^n$ is the number

$\vec{x} \cdot \vec{y} := \sum_{i=1}^n x_i y_i,$

and as you learned in Probability, the covariance of two random variables $X$ and $Y$ [1] is the number

$\mathrm{Cov}(X, Y) := E[(X - \mu_X)(Y - \mu_Y)].$

These two quantities appear to have nothing in common, beyond the fact that each is a function that accepts two inputs of the same type and returns a numerical value. Appearances can be deceptive, though: the dot product and the covariance are actually twins. In this handout, I will show why this is so.

2 Dot Products and Covariance: Elementary Properties.

The table below lists each elementary property [2] of the dot product beside the corresponding elementary property of covariance. As you can see, except for a slight discrepancy between L2 and R2, the properties in each row correspond perfectly.

Dot Product | Covariance
L1. $\vec{x} \cdot \vec{x} \ge 0$ | R1. $\mathrm{Cov}(X, X) \ge 0$
L2. $\vec{x} \cdot \vec{x} = 0 \iff \vec{x} = \vec{0}$ | R2. $\mathrm{Cov}(X, X) = 0 \iff X$ is constant
L3. $\vec{x} \cdot \vec{y} = \vec{y} \cdot \vec{x}$ | R3. $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
L4. $(\alpha \vec{x}) \cdot \vec{y} = \alpha(\vec{x} \cdot \vec{y}) = \vec{x} \cdot (\alpha \vec{y})$ | R4. $\mathrm{Cov}(\alpha X, Y) = \alpha\,\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, \alpha Y)$
L5. $\vec{x} \cdot (\vec{y}_1 + \vec{y}_2) = (\vec{x} \cdot \vec{y}_1) + (\vec{x} \cdot \vec{y}_2)$ | R5. $\mathrm{Cov}(X, Y_1 + Y_2) = \mathrm{Cov}(X, Y_1) + \mathrm{Cov}(X, Y_2)$

[1] $X$ and $Y$ need to be defined on the same sample space. We will assume throughout that this is the case.
[2] These properties are deemed elementary because all of the other properties can be derived from them.
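The parallel between the two columns can be checked numerically. The sketch below is illustrative and not part of the handout's formal development: the concrete vectors and random variables are my own choices, and the random variables are taken to live on an equally likely finite sample space, so $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$ is a plain average.

```python
# Numerical sanity check of rows L1/R1 and L3-L5/R3-R5 (illustrative
# sketch; the data below are my own, not from the handout).

def dot(x, y):
    """Dot product of two coordinate lists."""
    return sum(a * b for a, b in zip(x, y))

def cov(X, Y):
    """Covariance E[(X - mu_X)(Y - mu_Y)] over an equally likely
    finite sample space, given as lists of values."""
    n = len(X)
    mX, mY = sum(X) / n, sum(Y) / n
    return sum((a - mX) * (b - mY) for a, b in zip(X, Y)) / n

x, y1, y2 = [1.0, 3.0, -5.0], [2.0, 7.0, 4.0], [0.5, -1.0, 2.0]
X, Y1, Y2 = [1.0, 2.0, 4.0], [3.0, 1.0, 5.0], [2.0, 2.0, 0.0]

assert dot(x, x) >= 0 and cov(X, X) >= 0                            # L1 / R1
assert dot(x, y1) == dot(y1, x)                                     # L3
assert abs(cov(X, Y1) - cov(Y1, X)) < 1e-12                         # R3
assert dot([2 * a for a in x], y1) == 2 * dot(x, y1)                # L4
assert abs(cov([2 * a for a in X], Y1) - 2 * cov(X, Y1)) < 1e-12    # R4
ysum = [a + b for a, b in zip(y1, y2)]
Ysum = [a + b for a, b in zip(Y1, Y2)]
assert dot(x, ysum) == dot(x, y1) + dot(x, y2)                      # L5
assert abs(cov(X, Ysum) - (cov(X, Y1) + cov(X, Y2))) < 1e-12        # R5
```

Bilinearity holds exactly for the dot product; for covariance the checks use a small tolerance only because of floating-point rounding.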

3 Dot Products and Covariance: Some Derived Properties.

As I mentioned in the introduction, many other properties follow from L1-L5 or, respectively, from R1-R5. In this section, I will discuss several important examples.

3.1 The Pythagorean Relation.

I will begin with a pair of parallel definitions: the length of a vector and the standard deviation of a random variable. [3]

$\|\vec{x}\| := \sqrt{\vec{x} \cdot \vec{x}}$ | $\sigma_X := \sqrt{\mathrm{Cov}(X, X)}$
L6. $\|\vec{x} + \vec{y}\|^2 = \|\vec{x}\|^2 + 2(\vec{x} \cdot \vec{y}) + \|\vec{y}\|^2$ | R6. $\sigma^2_{X+Y} = \sigma^2_X + 2\,\mathrm{Cov}(X, Y) + \sigma^2_Y$
L7. $\vec{x} \cdot \vec{y} = 0 \implies \|\vec{x} + \vec{y}\|^2 = \|\vec{x}\|^2 + \|\vec{y}\|^2$ | R7. $\mathrm{Cov}(X, Y) = 0 \implies \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$

Formulas L6/R6 can be derived by routine parallel calculations from L1-L5 and R1-R5 respectively, and properties L7/R7 follow immediately from formulas L6/R6 respectively. When the hypothesis of L7 (respectively R7) holds, the resulting formula can be interpreted as the Pythagorean Theorem for an appropriately labeled right triangle.

Exercise 1 (a): Derive formula L6 from properties L1-L5 and the definition of $\|\vec{x}\|$. (b): Use L6 to prove L7.

Exercise 2 (a): Derive formula R6 from properties R1-R5 and the definition of $\sigma_X$. (b): Use R6 to prove R7.

3.2 The Cauchy-Schwarz Inequality.

The most important property of the dot product is the formula (for nonzero vectors)

(1) $\vec{x} \cdot \vec{y} = \|\vec{x}\|\,\|\vec{y}\| \cos(\theta),$

[3] It is important to note that we are defining $\|\vec{x}\|$ from the dot product of $\vec{x}$ with itself, not from the coordinates of $\vec{x}$. Similarly, we are using covariance to define $\sigma_X$.

where $0 \le \theta \le \pi$ is the non-reflex angle between $\vec{x}$ and $\vec{y}$. [4] There is an immediate consequence that follows from (1). If you take the absolute value of both sides of (1) and use the fact that $|\cos(\theta)| \le 1$, you arrive very quickly at the Cauchy-Schwarz Inequality: [5]

(2) $|\vec{x} \cdot \vec{y}| \le \|\vec{x}\|\,\|\vec{y}\|.$

At first glance, it does not seem as though either (1) or (2) could possibly correspond to a property of covariance. Consider (1) first. The analogue of this equation for covariance would seem to be

(3) $\mathrm{Cov}(X, Y) = \sigma_X \sigma_Y \cos(\phi),$

for some angle $\phi$. This equation seems utterly meaningless: what angle could $\phi$ possibly represent? Now consider (2). In this case, the analogous inequality for covariance would seem to be

(4) $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y.$

In contrast to equation (3), inequality (4) is definitely not meaningless; however, it is not clear (yet) whether it is true. At this point, we certainly do not have a proof of it: without an angle $\phi$ that makes (3) true, we cannot use $|\cos(\phi)| \le 1$ to deduce (4) from (3).

Remarkably, as it turns out, there is a different way to prove (1) and (2). The trick, as you will see (Theorem 3.1 below), is to reverse the logical order: first prove (2) without using $\cos(\theta)$, then use (2) to prove (1). It also turns out that this alternate approach does have a covariance parallel: the proof of (2) corresponds to a proof of (4), and the proof of (1) from (2) indicates what angle $\phi$ will make (3) true.

The cosine-free proof of (2) is based upon the properties of projections. Recall that if $\vec{x}$ is a nonzero vector and $\vec{y}$ is any vector, then the projection $\vec{p} = \mathrm{proj}_{\vec{x}}(\vec{y})$ of $\vec{y}$ onto $\vec{x}$ is the shadow that $\vec{y}$ casts upon the line $\ell$ containing $\vec{x}$, when the light rays are perpendicular to $\ell$. [6] Recall also that one can calculate $\vec{p}$ with the dot-product-based formula [7]

(5) $\vec{p} = \dfrac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}\,\vec{x}.$

Since (5) is dot-product based, it gives rise to an analogous covariance-based entity, namely the random variable

(6) $P := \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Cov}(X, X)}\,X,$

where $X$ is a nonconstant random variable and $Y$ is any random variable.

Before discussing the details of Theorem 3.1, I will extend the table so as to include definitions (5) and (6) and equations (1), (2), (3), and (4); this should help you keep the larger picture in focus. [8]

[4] $\theta$ is contained in the (usually) unique plane $P$ containing $\vec{x}$ and $\vec{y}$.
[5] The importance of this inequality, which cannot be made clear in a Calculus III course, will emerge in the course of this discussion.
[6] The light rays are in the same plane $P$ mentioned above.
[7] Formula (5) should actually be viewed as the definition of $\mathrm{proj}_{\vec{x}}(\vec{y})$.
[8] You may find the complete table on the last page of this handout.

(5). $\vec{p} := \dfrac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}\,\vec{x}$ | (6). $P := \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Cov}(X, X)}\,X$
L8. $\vec{p} \cdot (\vec{y} - \vec{p}) = 0$ | R8. $\mathrm{Cov}(P, Y - P) = 0$
L9. $|\vec{x} \cdot \vec{y}| \le \|\vec{x}\|\,\|\vec{y}\|$ | R9. $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y$
L10. $\vec{x} \cdot \vec{y} = \|\vec{x}\|\,\|\vec{y}\| \cos(\theta)$ | R10. $\mathrm{Cov}(X, Y) = \sigma_X \sigma_Y \cos(\phi)$

The proofs of R8 and R9, which parallel the proofs of L8 and L9, will be left as exercises. Property R10, which is still meaningless at this point, will be discussed in section 3.3.

Theorem 3.1
L8: $\vec{p} \cdot (\vec{y} - \vec{p}) = 0$, where $\vec{p}$ is the vector $\mathrm{proj}_{\vec{x}}(\vec{y})$.
L9: $|\vec{x} \cdot \vec{y}| \le \|\vec{x}\|\,\|\vec{y}\|$.
L10: $\vec{x} \cdot \vec{y} = \|\vec{x}\|\,\|\vec{y}\| \cos(\theta)$.

Proof of L8. The proof will be easier to follow if I first put $\alpha := \dfrac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}$, which puts equation (5) in the simpler form

(7) $\vec{p} = \alpha \vec{x}.$

The proof is the following calculation:

$\vec{p} \cdot (\vec{y} - \vec{p}) = (\alpha \vec{x}) \cdot (\vec{y} - \alpha \vec{x})$
$= \alpha\,[\vec{x} \cdot (\vec{y} - \alpha \vec{x})]$ (L4)
$= \alpha\,[\vec{x} \cdot \vec{y} - \vec{x} \cdot (\alpha \vec{x})]$ (L5)
$= \alpha\,[\vec{x} \cdot \vec{y} - \alpha (\vec{x} \cdot \vec{x})]$ (L4)
$= \alpha\,\Big[\vec{x} \cdot \vec{y} - \dfrac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}\,(\vec{x} \cdot \vec{x})\Big]$ (definition of $\alpha$)
$= \alpha\,[\vec{x} \cdot \vec{y} - \vec{x} \cdot \vec{y}]$ (cancel)
$= 0.$

Proof of L9. The first step is to apply the $\implies$ direction of L7 to $\vec{p}$ and $\vec{y} - \vec{p}$; this allows me to express

(8) $\vec{p} \cdot (\vec{y} - \vec{p}) = 0 \implies \|\vec{p}\|^2 + \|\vec{y} - \vec{p}\|^2 = \|\vec{y}\|^2.$

Since $\|\vec{y} - \vec{p}\|^2 \ge 0$, (8) leads immediately to the inequality

(9) $\|\vec{p}\|^2 \le \|\vec{y}\|^2$, or, equivalently, $\vec{p} \cdot \vec{p} \le \|\vec{y}\|^2.$

Then, replacing $\vec{p}$ with $\alpha \vec{x}$ and calculating gives

(10) $(\alpha \vec{x}) \cdot (\alpha \vec{x}) \le \|\vec{y}\|^2 \implies \alpha^2 \|\vec{x}\|^2 \le \|\vec{y}\|^2 \implies |\alpha|\,\|\vec{x}\| \le \|\vec{y}\| \implies \dfrac{|\vec{x} \cdot \vec{y}|}{\|\vec{x}\|^2}\,\|\vec{x}\| \le \|\vec{y}\| \implies |\vec{x} \cdot \vec{y}| \le \|\vec{x}\|\,\|\vec{y}\|.$

Proof of L10. From L9, it follows that

$\left|\dfrac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}\right| \le 1,$

so that there is an angle $\hat{\theta}$ such that

$\cos(\hat{\theta}) = \dfrac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}.$

We also know from equation (1) [9] that

$\cos(\theta) = \dfrac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}.$

Hence, $\cos(\hat{\theta}) = \cos(\theta)$, so that $\hat{\theta} = \theta$ (both angles lie in $[0, \pi]$, where cosine is one-to-one).

As mentioned above, one can prove R8 and R9 by the same arguments used to prove L8 and L9. I will therefore leave the proof of Theorem 3.2 as an exercise.

Theorem 3.2
R8: $\mathrm{Cov}(P, Y - P) = 0$, where $P$ is the random variable defined in (6).
R9: $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y$.

Exercise 3 Prove both parts of Theorem 3.2.

Exercise 4 One of the Calculus III handouts uses L9 to prove the triangle inequality

$\|\vec{x} + \vec{y}\| \le \|\vec{x}\| + \|\vec{y}\|.$

By a parallel argument, use R9 to prove

$\sigma_{X+Y} \le \sigma_X + \sigma_Y.$

[9] We know this equation is correct by the Law-of-Cosines proof from Calc III/Linear Algebra.
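The projection argument lends itself to a direct numerical check. The sketch below (my own illustrative data, not part of the handout) verifies the orthogonality properties L8/R8, both Cauchy-Schwarz rows L9/R9, and the covariance triangle inequality of Exercise 4.

```python
import math

# Check L8/R8 (projection orthogonality) and L9/R9 (Cauchy-Schwarz)
# on concrete data (illustrative sketch; data are my own choices).

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cov(X, Y):
    # Covariance over an equally likely finite sample space.
    n = len(X)
    mX, mY = sum(X) / n, sum(Y) / n
    return sum((a - mX) * (b - mY) for a, b in zip(X, Y)) / n

def norm(v):
    return math.sqrt(dot(v, v))

def sd(V):
    return math.sqrt(cov(V, V))

x, y = [1.0, 3.0, -5.0], [2.0, 7.0, 4.0]
X, Y = [1.0, 2.0, 4.0], [3.0, 1.0, 5.0]

p = [dot(x, y) / dot(x, x) * a for a in x]    # (5): p = proj_x(y)
P = [cov(X, Y) / cov(X, X) * a for a in X]    # (6): P = (Cov/Var) X

assert abs(dot(p, [a - b for a, b in zip(y, p)])) < 1e-9    # L8
assert abs(cov(P, [a - b for a, b in zip(Y, P)])) < 1e-9    # R8

assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-9           # L9
assert abs(cov(X, Y)) <= sd(X) * sd(Y) + 1e-9               # R9

# Exercise 4's covariance analogue: sigma_{X+Y} <= sigma_X + sigma_Y
XY = [a + b for a, b in zip(X, Y)]
assert sd(XY) <= sd(X) + sd(Y) + 1e-9
```

The small tolerances only absorb floating-point rounding; the identities themselves hold exactly, as the proofs above show.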

Exercise 5 (a): Show that if you get equality in L9 (that is, if $|\vec{x} \cdot \vec{y}| = \|\vec{x}\|\,\|\vec{y}\|$), then $\vec{y} = c\vec{x}$ for a certain scalar $c$. (Identify $c$.) (Hint: If you have equality in the last line of the proof of L9 [inequality (10)], then, working upwards, you also have equality in (9), so you can replace $\|\vec{p}\|^2$ with $\|\vec{y}\|^2$ in equation (8). Do so, and proceed from there.) (b): Show that if you get equality in R9 (that is, if $|\mathrm{Cov}(X, Y)| = \sigma_X \sigma_Y$), then $Y = mX + b$ for certain scalars $m$ and $b$. (Identify $m$.)

3.3 Property R10 and the Correlation Coefficient.

If we divide R9 through by $\sigma_X \sigma_Y$, we get

(11) $\dfrac{|\mathrm{Cov}(X, Y)|}{\sigma_X \sigma_Y} \le 1;$

there is therefore an angle $\phi$ such that

(12) $\dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \cos(\phi).$

Multiplying (12) through by $\sigma_X \sigma_Y$ then gives

$\mathrm{Cov}(X, Y) = \sigma_X \sigma_Y \cos(\phi),$

so that R10 holds true for this angle $\phi$. This suggests that we define the angle between $X$ and $Y$ to be this angle $\phi$. [10]

As you are aware, the correlation coefficient $\rho(X, Y)$ of two nonconstant random variables $X$ and $Y$ is defined by the formula

(13) $\rho(X, Y) := \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}.$

Compare equations (13) and (12): $\rho(X, Y)$ is the $\cos(\phi)$ of equation (12). Now compare equation (13) and inequality (11): inequality (11) is actually the well-known theorem $|\rho(X, Y)| \le 1$.

As you are also aware, different values of $\rho(X, Y)$ imply different linear relationships between $X$ and $Y$: [11]

$\rho(X, Y)$ close to $1$: $Y$ is almost an increasing linear function of $X$.
$\rho(X, Y)$ close to $-1$: $Y$ is almost a decreasing linear function of $X$.
$\rho(X, Y)$ close to $0$: there is almost no linear relationship between $X$ and $Y$.

If you visualize $\rho(X, Y)$ as the cosine of the angle between $X$ and $Y$, you will form a complementary mental picture:

$\rho(X, Y)$ close to $1$: $X$ and $Y$ point in almost the same direction.
$\rho(X, Y)$ close to $-1$: $X$ and $Y$ point in almost opposite directions.
$\rho(X, Y)$ close to $0$: $X$ and $Y$ are almost perpendicular.

[10] Observe that the collection of random variables over a fixed sample space constitutes a vector space. You can visualize $X$, $Y$, and $\phi$ against this backdrop.
[11] Compare Exercise 5(b).
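For sample data the "angle between $X$ and $Y$" of footnote 10 can be made concrete: centering each sample turns it into a vector in $\mathbb{R}^n$, and $\rho(X, Y)$ is exactly the cosine of the angle between the two centered vectors (the $1/n$ weights in the covariance cancel). A sketch with illustrative data of my own:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cov(X, Y):
    # Covariance over an equally likely finite sample space.
    n = len(X)
    mX, mY = sum(X) / n, sum(Y) / n
    return sum((a - mX) * (b - mY) for a, b in zip(X, Y)) / n

X, Y = [1.0, 2.0, 4.0], [3.0, 1.0, 5.0]
n = len(X)

# (13): rho(X, Y) = Cov(X, Y) / (sigma_X sigma_Y)
rho = cov(X, Y) / math.sqrt(cov(X, X) * cov(Y, Y))

# Center the samples and treat them as vectors in R^n; the cosine of
# the angle between them coincides with rho.
mX, mY = sum(X) / n, sum(Y) / n
xc = [a - mX for a in X]
yc = [b - mY for b in Y]
cos_phi = dot(xc, yc) / math.sqrt(dot(xc, xc) * dot(yc, yc))

assert abs(rho - cos_phi) < 1e-12    # rho is the cos(phi) of (12)
assert -1.0 <= rho <= 1.0            # inequality (11)
```

This is one concrete realization of the vector-space picture in footnote 10; for general random variables the same identity holds with the dot product replaced by the covariance inner product.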

4 The Entire Table.

Dot Product | Covariance
L1. $\vec{x} \cdot \vec{x} \ge 0$ | R1. $\mathrm{Cov}(X, X) \ge 0$
L2. $\vec{x} \cdot \vec{x} = 0 \iff \vec{x} = \vec{0}$ | R2. $\mathrm{Cov}(X, X) = 0 \iff X$ is constant
L3. $\vec{x} \cdot \vec{y} = \vec{y} \cdot \vec{x}$ | R3. $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
L4. $(\alpha \vec{x}) \cdot \vec{y} = \alpha(\vec{x} \cdot \vec{y}) = \vec{x} \cdot (\alpha \vec{y})$ | R4. $\mathrm{Cov}(\alpha X, Y) = \alpha\,\mathrm{Cov}(X, Y) = \mathrm{Cov}(X, \alpha Y)$
L5. $\vec{x} \cdot (\vec{y}_1 + \vec{y}_2) = (\vec{x} \cdot \vec{y}_1) + (\vec{x} \cdot \vec{y}_2)$ | R5. $\mathrm{Cov}(X, Y_1 + Y_2) = \mathrm{Cov}(X, Y_1) + \mathrm{Cov}(X, Y_2)$
$\|\vec{x}\| := \sqrt{\vec{x} \cdot \vec{x}}$ | $\sigma_X := \sqrt{\mathrm{Cov}(X, X)}$
L6. $\|\vec{x} + \vec{y}\|^2 = \|\vec{x}\|^2 + 2(\vec{x} \cdot \vec{y}) + \|\vec{y}\|^2$ | R6. $\sigma^2_{X+Y} = \sigma^2_X + 2\,\mathrm{Cov}(X, Y) + \sigma^2_Y$
L7. $\vec{x} \cdot \vec{y} = 0 \implies \|\vec{x} + \vec{y}\|^2 = \|\vec{x}\|^2 + \|\vec{y}\|^2$ | R7. $\mathrm{Cov}(X, Y) = 0 \implies \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$
(5). $\vec{p} := \dfrac{\vec{x} \cdot \vec{y}}{\vec{x} \cdot \vec{x}}\,\vec{x}$ | (6). $P := \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{Cov}(X, X)}\,X$
L8. $\vec{p} \cdot (\vec{y} - \vec{p}) = 0$ | R8. $\mathrm{Cov}(P, Y - P) = 0$
L9. $|\vec{x} \cdot \vec{y}| \le \|\vec{x}\|\,\|\vec{y}\|$ | R9. $|\mathrm{Cov}(X, Y)| \le \sigma_X \sigma_Y$
L10. $\vec{x} \cdot \vec{y} = \|\vec{x}\|\,\|\vec{y}\| \cos(\theta)$ | R10. $\mathrm{Cov}(X, Y) = \sigma_X \sigma_Y \cos(\phi)$