Correlation and Covariance
Yi Lu, ECE 313
Definition
Let X and Y be random variables with finite second moments.
- the correlation: E[XY]
- the covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
- the correlation coefficient: ρ_{X,Y} = Cov(X, Y) / √(Var(X) Var(Y)) = Cov(X, Y) / (σ_X σ_Y)

Covariance generalizes variance
Var(X) = Cov(X, X).
Shortcut for computing variance: Var(X) = E[X(X − E[X])] = E[X^2] − E[X]^2.

Similar shortcuts exist for computing covariances:
Cov(X, Y) = E[X(Y − E[Y])] = E[(X − E[X])Y] = E[XY] − E[X]E[Y].
In particular, if either X or Y has mean zero, then E[XY] = Cov(X, Y).
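The shortcut can be checked numerically. A minimal sketch (the distributions and sample sizes below are illustrative, not from the slides):

```python
import numpy as np

# Check Cov(X, Y) = E[XY] - E[X]E[Y] on simulated data.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, 200_000)
y = 0.5 * x + rng.normal(0.0, 1.0, 200_000)   # correlated with x

# Definition: E[(X - E[X])(Y - E[Y])]
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
# Shortcut: E[XY] - E[X]E[Y]
cov_short = np.mean(x * y) - x.mean() * y.mean()

print(cov_def, cov_short)   # agree up to floating-point error
```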
Correlated
- If Cov(X, Y) = 0, X and Y are uncorrelated.
- If Cov(X, Y) > 0, X and Y are positively correlated.
- If Cov(X, Y) < 0, X and Y are negatively correlated.
ρ_{X,Y} is a scaled version of Cov(X, Y).

Uncorrelated vs. Independent
If Cov(X, Y) = 0, X and Y are uncorrelated. Cov(X, Y) = 0 is equivalent to E[XY] = E[X]E[Y].
Does independence of X and Y imply E[XY] = E[X]E[Y]? Yes: independence implies uncorrelated.
Does uncorrelated imply independent? No: independence is a much stronger condition. Independence requires a much larger set of equations to hold, namely F_{X,Y}(u, v) = F_X(u) F_Y(v) for every real value of u and v, while uncorrelated requires only a single equation to hold.
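A standard counterexample makes the gap concrete: take X uniform on {−1, 0, 1} and Y = X^2. The sketch below (exact arithmetic, not simulation) shows Cov(X, Y) = 0 even though Y is a function of X:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}, Y = X^2: uncorrelated but clearly dependent.
pmf = {(-1, 1): Fraction(1, 3), (0, 0): Fraction(1, 3), (1, 1): Fraction(1, 3)}

E_X  = sum(p * x     for (x, y), p in pmf.items())
E_Y  = sum(p * y     for (x, y), p in pmf.items())
E_XY = sum(p * x * y for (x, y), p in pmf.items())

cov = E_XY - E_X * E_Y
print(cov)   # 0, so X and Y are uncorrelated
# But P(X=1, Y=0) = 0 while P(X=1)P(Y=0) = (1/3)(1/3) != 0,
# so X and Y are not independent.
```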
Correlation is a pairwise property: a set of random variables is uncorrelated if and only if it is pairwise uncorrelated. In contrast, mutual independence of three or more random variables is a strictly stronger property than pairwise independence.
Play with Covariance

Linearity
Covariance is linear in each of its two arguments:
Cov(X + Y, U + V) = Cov(X, U) + Cov(X, V) + Cov(Y, U) + Cov(Y, V)
Cov(aX + b, cY + d) = ac Cov(X, Y)
for constants a, b, c, d. Recall that Var(aX + b) = a^2 Var(X).
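The affine rule Cov(aX + b, cY + d) = ac Cov(X, Y) can be verified on sample covariances as well (the identity holds exactly for samples, up to floating point; the constants below are arbitrary):

```python
import numpy as np

# Numerical check (not a proof) of Cov(aX+b, cY+d) = ac * Cov(X, Y).
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = x + rng.normal(size=100_000)
a, b, c, d = 8.0, 3.0, 5.0, -2.0

def cov(u, v):
    return np.mean((u - u.mean()) * (v - v.mean()))

lhs = cov(a * x + b, c * y + d)
rhs = a * c * cov(x, y)
print(lhs, rhs)   # equal up to floating-point error
```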
Variance of sum of r.v.
The variance of a sum of uncorrelated random variables equals the sum of their variances. For example, if X and Y are uncorrelated,
Var(X + Y) = Cov(X + Y, X + Y) = Cov(X, X) + Cov(Y, Y) + 2 Cov(X, Y) = Var(X) + Var(Y).
Consider the sum S_n = X_1 + ... + X_n, where X_1, ..., X_n are uncorrelated (so Cov(X_i, X_j) = 0 if i ≠ j) with E[X_i] = μ and Var(X_i) = σ^2 for 1 ≤ i ≤ n. Find E[S_n] and Var(S_n).

E[S_n] = nμ   (1)
and
Var(S_n) = Cov(S_n, S_n) = Cov(Σ_{i=1}^n X_i, Σ_{j=1}^n X_j)
         = Σ_{i=1}^n Σ_{j=1}^n Cov(X_i, X_j)
         = Σ_{i=1}^n Cov(X_i, X_i) + Σ_{i≠j} Cov(X_i, X_j)
         = Σ_{i=1}^n Var(X_i) + 0 = nσ^2.   (2)

Practice! Simplify the following expressions:
(a) Cov(8X + 3, 5Y − 2), (b) Cov(10X − 5, 3X + 15), (c) Cov(X + 2, 10X − 3Y), (d) ρ_{10X, Y+4}.
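Formulas (1) and (2) can be sanity-checked by simulation. A sketch using i.i.d. uniform(0, 1) summands (the choice of n = 10 and the distribution are illustrative):

```python
import numpy as np

# For i.i.d. X_1,...,X_n with mean mu and variance sigma^2:
# E[S_n] = n*mu and Var(S_n) = n*sigma^2.
rng = np.random.default_rng(2)
n, trials = 10, 200_000
mu, sigma2 = 0.5, 1 / 12          # mean and variance of uniform(0, 1)

s = rng.uniform(0, 1, size=(trials, n)).sum(axis=1)
print(s.mean(), n * mu)           # both near 5.0
print(s.var(), n * sigma2)        # both near 10/12
```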
Correlation coefficient
ρ_{X,Y} is a scaled version of Cov(X, Y):
ρ_{X,Y} = Cov(X, Y) / √(Var(X) Var(Y)) = Cov(X, Y) / (σ_X σ_Y).
It is the covariance of the standardized versions of X and Y. Find Cov((X − E[X])/σ_X, (Y − E[Y])/σ_Y).
Cov((X − E[X])/σ_X, (Y − E[Y])/σ_Y) = Cov(X/σ_X, Y/σ_Y) = Cov(X, Y)/(σ_X σ_Y) = ρ_{X,Y}.

Find ρ_{aX+b, cY+d}.
ρ_{aX+b, cY+d} = ρ_{X,Y} for a, c > 0.

|ρ_{X,Y}| ≤ 1; ρ_{X,Y} = 1 if and only if Y = aX + b for some a, b with a > 0, and ρ_{X,Y} = −1 if and only if Y = aX + b for some a, b with a < 0.
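The scale invariance of ρ is easy to check on samples. A sketch (the affine constants are arbitrary; note the sign flip when a < 0):

```python
import numpy as np

# rho_{aX+b, cY+d} = rho_{X,Y} when a, c > 0; the sign flips if a < 0.
rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.7 * x + rng.normal(size=100_000)

def rho(u, v):
    return np.corrcoef(u, v)[0, 1]

print(rho(x, y), rho(4 * x + 1, 2 * y - 7))   # identical
print(rho(-x, y))                              # negated: a = -1 < 0
```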
Unbiased Estimator
Suppose X_1, ..., X_n are independent and identically distributed random variables, with mean μ and variance σ^2. We estimate μ and σ^2 by the sample mean and sample variance, defined as follows:
X̄ = (1/n) Σ_{k=1}^n X_k,   σ̂^2 = (1/(n−1)) Σ_{k=1}^n (X_k − X̄)^2.
Note the perhaps unexpected appearance of n − 1 in the sample variance. Of course, we should have n ≥ 2 to estimate the variance (assuming we don't know the mean), so it is not surprising that the formula is not defined if n = 1.
An estimator is called unbiased if the mean of the estimator is equal to the parameter that is being estimated.

Q1: Why don't we use ML parameter estimation for μ and σ^2?
Q2: Why is σ̂^2 undefined for n = 1?
(a) Is the sample mean an unbiased estimator of μ?
(b) Find the mean square error, E[(μ − X̄)^2], for estimation of the mean by the sample mean.
(c) Is the sample variance an unbiased estimator of σ^2?
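The effect of the n − 1 divisor can be seen by simulation. A sketch (the Gaussian distribution, n = 5, and σ^2 = 4 are illustrative choices): averaging over many samples, the (n−1)-divisor estimator centers on σ^2, while the n-divisor version comes out low by a factor (n−1)/n.

```python
import numpy as np

# Compare the n-1 and n divisors for the sample variance.
rng = np.random.default_rng(4)
n, trials, sigma2 = 5, 200_000, 4.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

var_unbiased = samples.var(axis=1, ddof=1).mean()   # divide by n-1
var_biased   = samples.var(axis=1, ddof=0).mean()   # divide by n

print(var_unbiased)   # near sigma2 = 4.0
print(var_biased)     # near sigma2 * (n-1)/n = 3.2
```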
Minimum mean square error estimation

Constant estimator
Let Y be a random variable with some known distribution. Suppose Y is not observed but that we wish to estimate Y. We use a constant δ to estimate Y.
The mean square error (MSE) for estimating Y by δ is defined by E[(Y − δ)^2].
Q. How do we find a δ that minimizes E[(Y − δ)^2]? What is the resulting MSE?

Unconstrained Estimator
We want to estimate Y. We have an observation X, and the joint distribution is f_{X,Y}. We use the estimator g(X) for some function g. The resulting mean square error (MSE) is E[(Y − g(X))^2].
Q. What is the function g that minimizes the MSE? What is the resulting MSE?
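The constant-estimator question above can be explored numerically: scan candidate constants δ and watch E[(Y − δ)^2] bottom out at δ = E[Y], with minimum value Var(Y). A sketch (the exponential distribution with mean 2 is an arbitrary example):

```python
import numpy as np

# Scan delta and locate the minimizer of the empirical MSE E[(Y - delta)^2].
rng = np.random.default_rng(5)
y = rng.exponential(2.0, 100_000)    # E[Y] = 2, Var(Y) = 4

deltas = np.linspace(0.0, 4.0, 401)
mse = [np.mean((y - d) ** 2) for d in deltas]
best = deltas[np.argmin(mse)]

print(best, y.mean())        # minimizer is near E[Y]
print(min(mse), y.var())     # minimum MSE is near Var(Y)
```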
Suppose you observe X = 10. What do you know about Y? You can derive the conditional pdf of Y given X = 10, denoted by f_{Y|X}(v|10). Which value of Y should you pick?
Based on the fact, discussed above, that the minimum MSE constant estimator of a random variable is its mean, it makes sense to estimate Y by the conditional mean:
E[Y | X = 10] = ∫ v f_{Y|X}(v|10) dv.

In general, we can show that
g*(u) = E[Y | X = u] = ∫ v f_{Y|X}(v|u) dv.
The minimum MSE is
MSE = E[Y^2] − E[(E[Y|X])^2]   (3)
    = Var(Y) − Var(E[Y|X]).   (4)
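Identities (3) and (4) can be verified exactly on a small discrete joint pmf (the pmf values below are made up for illustration; sums replace the integrals):

```python
import numpy as np

# Compute g*(u) = E[Y | X = u] and check MSE identities (3) and (4).
ys = np.array([0, 1, 2])
p = np.array([[0.1, 0.2, 0.1],    # P(X=0, Y=v) for v = 0, 1, 2
              [0.3, 0.1, 0.2]])   # P(X=1, Y=v)

px = p.sum(axis=1)                # marginal pmf of X
g = (p @ ys) / px                 # g*(u) = E[Y | X = u]

EY  = p.sum(axis=0) @ ys          # E[Y]
EY2 = p.sum(axis=0) @ ys ** 2     # E[Y^2]
mse = sum(p[i, j] * (ys[j] - g[i]) ** 2
          for i in range(2) for j in range(3))        # E[(Y - g*(X))^2]
eq3 = EY2 - px @ g ** 2                               # (3)
eq4 = (EY2 - EY ** 2) - (px @ g ** 2 - EY ** 2)       # (4)

print(mse, eq3, eq4)              # all three agree
```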
Linear estimator
In practice it is not always possible to compute g*(u): the conditional density f_{Y|X}(v|u) may not be available or may be difficult to compute. Worse, there might not even be a good way to decide which joint pdf f_{X,Y} to use in the first place. A reasonable alternative is to consider linear estimators of Y given X.

We use a linear estimator L(X) = aX + b, so we only need to find a and b. The resulting mean square error (MSE) is E[(Y − (aX + b))^2].
Q. What are the a and b that minimize the MSE? What is the resulting MSE?
The minimum MSE linear estimator is given by L*(X) = Ê[Y|X], where
Ê[Y|X] = μ_Y + (Cov(Y, X)/Var(X))(X − μ_X) = μ_Y + σ_Y ρ_{X,Y} (X − μ_X)/σ_X.
The minimum MSE for linear estimation is
σ_Y^2 − (Cov(X, Y))^2/Var(X) = σ_Y^2 (1 − ρ_{X,Y}^2).

If X and Y are standard (mean 0, variance 1), then Ê[Y|X] = ρ_{X,Y} X and the MSE is 1 − ρ_{X,Y}^2.
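The formula for L*(X) can be checked against a brute-force least-squares line fit, since both minimize the same quadratic. A sketch on simulated data (the model Y = 3X + noise is illustrative):

```python
import numpy as np

# a* = Cov(Y, X)/Var(X), b* = mu_Y - a* mu_X, versus np.polyfit;
# resulting MSE should match sigma_Y^2 (1 - rho^2).
rng = np.random.default_rng(6)
x = rng.normal(1.0, 2.0, 200_000)
y = 3.0 * x + rng.normal(0.0, 1.0, 200_000)

a_star = np.cov(x, y, ddof=0)[0, 1] / x.var()
b_star = y.mean() - a_star * x.mean()

a_fit, b_fit = np.polyfit(x, y, 1)           # least-squares line
print(a_star, a_fit)                          # agree

mse = np.mean((y - (a_star * x + b_star)) ** 2)
rho = np.corrcoef(x, y)[0, 1]
print(mse, y.var() * (1 - rho ** 2))          # agree
```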
Three estimators
Constant, Linear, Unconstrained. Which is the best?

Unconstrained ≤ Linear ≤ Constant:
E[(Y − g*(X))^2] ≤ σ_Y^2 (1 − ρ_{X,Y}^2) ≤ σ_Y^2,   (5)
where the three terms are the MSE for g*(X) = E[Y|X], the MSE for L*(X) = Ê[Y|X], and the MSE for δ* = E[Y], respectively.
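The ordering in (5) can be made vivid with a setup where the linear estimator is nearly useless. A simulation sketch (the model Y = X^2 + noise with standard normal X is an illustrative choice: ρ is near 0 there, while g*(X) = X^2 captures Y almost exactly):

```python
import numpy as np

# Compare the empirical MSE of the constant, linear, and unconstrained
# estimators; they come out in the order of inequality (5).
rng = np.random.default_rng(7)
x = rng.normal(size=200_000)
y = x ** 2 + rng.normal(0.0, 0.1, 200_000)

mse_const = np.mean((y - y.mean()) ** 2)          # delta* = E[Y]
a = np.cov(x, y, ddof=0)[0, 1] / x.var()
b = y.mean() - a * x.mean()
mse_lin = np.mean((y - (a * x + b)) ** 2)         # L*(X)
mse_unc = np.mean((y - x ** 2) ** 2)              # g*(X) = E[Y|X] = X^2

print(mse_unc, mse_lin, mse_const)                # increasing order
```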
All three estimators are linear as functions of the variable to be estimated:
E[aY + bZ + c] = aE[Y] + bE[Z] + c
E[aY + bZ + c | X] = aE[Y|X] + bE[Z|X] + c
Ê[aY + bZ + c | X] = aÊ[Y|X] + bÊ[Z|X] + c

Noisy observation
Let X = Y + N, where Y has the exponential distribution with parameter λ and N is Gaussian with mean 0 and variance σ_N^2. Suppose the variables Y and N are independent, and the parameters λ and σ_N^2 are known and strictly positive. (Recall that E[Y] = 1/λ and Var(Y) = σ_Y^2 = 1/λ^2.)
(a) Find Ê[Y|X], the minimum MSE linear estimator of Y given X, and also find the resulting MSE.
(b) Find an unconstrained estimator of Y yielding a strictly smaller MSE than Ê[Y|X] does.
Uniform distribution
Suppose (X, Y) is uniformly distributed over the triangular region with vertices at (−1, 0), (0, 1), and (1, 1), shown in Figure 1.

[Figure 1: Support of f_{X,Y}: the triangular region in the (u, v) plane.]

(a) Find and sketch the minimum MSE estimator of Y given X = u, g*(u) = E[Y|X = u], for all u such that it is well defined, and find the resulting minimum MSE for using g*(X) = E[Y|X] to estimate Y.
(b) Find and sketch the function Ê[Y|X = u], used for minimum MSE linear estimation of Y from X, and find the resulting MSE for using Ê[Y|X] to estimate Y.
Questions?