Covariance and Correlation
Class 7, 18.05, Jeremy Orloff and Jonathan Bloom


1 Learning Goals

1. Understand the meaning of covariance and correlation.

2. Be able to compute the covariance and correlation of two random variables.

2 Covariance

Covariance is a measure of how much two random variables vary together. For example, height and weight of giraffes have positive covariance because when one is big the other tends also to be big.

Definition: Suppose X and Y are random variables with means µ_X and µ_Y. The covariance of X and Y is defined as

    Cov(X, Y) = E((X − µ_X)(Y − µ_Y)).

2.1 Properties of covariance

1. Cov(aX + b, cY + d) = ac Cov(X, Y) for constants a, b, c, d.
2. Cov(X_1 + X_2, Y) = Cov(X_1, Y) + Cov(X_2, Y).
3. Cov(X, X) = Var(X).
4. Cov(X, Y) = E(XY) − µ_X µ_Y.
5. Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) for any X and Y.
6. If X and Y are independent then Cov(X, Y) = 0.

Warning: The converse is false: zero covariance does not always imply independence.

Notes.
1. Property 4 is like the similar property for variance. Indeed, if X = Y it is exactly that property: Var(X) = E(X²) − µ_X².
2. By Property 6, the formula in Property 5 reduces to the earlier formula Var(X + Y) = Var(X) + Var(Y) when X and Y are independent.

We give the proofs below. However, understanding and using these properties is more important than memorizing their proofs.
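
A quick numerical check of these properties can be reassuring. The following is a minimal R sketch of our own, not part of the original notes; the data and constants are illustrative choices, and cov() computes the sample covariance, so the matches are approximate.

    # Sketch (our own illustration): check covariance Properties 1 and 4
    # on simulated data.
    set.seed(1)
    x <- rnorm(1e5, mean = 2, sd = 3)
    y <- 0.5 * x + rnorm(1e5)            # Y depends linearly on X

    # Property 4: Cov(X, Y) = E(XY) - E(X)E(Y)
    mean(x * y) - mean(x) * mean(y)      # approximately equal to cov(x, y)
    cov(x, y)

    # Property 1: Cov(aX + b, cY + d) = ac Cov(X, Y)
    a <- 2; b <- 7; c <- -3; d <- 1
    cov(a * x + b, c * y + d)            # approximately a * c * cov(x, y)
    a * c * cov(x, y)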

2.2 Sums and integrals for computing covariance

Since covariance is defined as an expected value we compute it in the usual way as a sum or integral.

Discrete case: If X and Y have joint pmf p(x_i, y_j) then

    Cov(X, Y) = Σ_{i=1}^n Σ_{j=1}^m p(x_i, y_j)(x_i − µ_X)(y_j − µ_Y)
              = ( Σ_{i=1}^n Σ_{j=1}^m p(x_i, y_j) x_i y_j ) − µ_X µ_Y.

Continuous case: If X and Y have joint pdf f(x, y) over the range [a, b] × [c, d] then

    Cov(X, Y) = ∫_c^d ∫_a^b (x − µ_X)(y − µ_Y) f(x, y) dx dy
              = ( ∫_c^d ∫_a^b x y f(x, y) dx dy ) − µ_X µ_Y.

2.3 Examples

Example 1. Flip a fair coin 3 times. Let X be the number of heads in the first 2 flips and let Y be the number of heads in the last 2 flips (so there is overlap on the middle flip). Compute Cov(X, Y).

answer: We'll do this twice, first using the joint probability table and the definition of covariance, and then using the properties of covariance.

With 3 tosses there are only 8 outcomes {HHH, HHT, ...}, so we can create the joint probability table directly.

    X\Y      0     1     2    p(x_i)
    0       1/8   1/8    0     1/4
    1       1/8   2/8   1/8    1/2
    2        0    1/8   1/8    1/4
    p(y_j)  1/4   1/2   1/4     1

From the marginals we compute E(X) = 1 = E(Y). Now we use the definition:

    Cov(X, Y) = E((X − µ_X)(Y − µ_Y)) = Σ_{i,j} p(x_i, y_j)(x_i − 1)(y_j − 1).

We write out the sum leaving out all the terms that are 0, i.e. all the terms where x_i = 1 or y_j = 1 or the probability is 0:

    Cov(X, Y) = (1/8)(0 − 1)(0 − 1) + (1/8)(2 − 1)(2 − 1) = 1/4.
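
The same table computation is easy to carry out in code. Here is a small R sketch of our own (not from the notes); the matrix layout mirrors the table above.

    # Sketch (our own): compute Cov(X, Y) for Example 1 from the joint pmf.
    p <- matrix(c(1/8, 1/8, 0,
                  1/8, 2/8, 1/8,
                  0,   1/8, 1/8),
                nrow = 3, byrow = TRUE)   # rows: X = 0,1,2; cols: Y = 0,1,2
    xs <- 0:2
    ys <- 0:2
    EX <- sum(rowSums(p) * xs)            # E(X) = 1
    EY <- sum(colSums(p) * ys)            # E(Y) = 1
    # Sum of p(x_i, y_j)(x_i - E(X))(y_j - E(Y)) over the whole table
    sum(outer(xs - EX, ys - EY) * p)      # 0.25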

We could also have used Property 4 to do the computation. From the full table we compute

    E(XY) = 1 · (2/8) + 2 · (1/8) + 2 · (1/8) + 4 · (1/8) = 5/4.

So Cov(X, Y) = E(XY) − µ_X µ_Y = 5/4 − 1 = 1/4.

Next we redo the computation of Cov(X, Y) using the properties of covariance. As usual, let X_i be the result of the i-th flip, so X_i ~ Bernoulli(0.5). We have X = X_1 + X_2 and Y = X_2 + X_3. We know E(X_i) = 1/2 and Var(X_i) = 1/4. Therefore, using Property 2 of covariance, we have

    Cov(X, Y) = Cov(X_1 + X_2, X_2 + X_3)
              = Cov(X_1, X_2) + Cov(X_1, X_3) + Cov(X_2, X_2) + Cov(X_2, X_3).

Since the different tosses are independent we know Cov(X_1, X_2) = Cov(X_1, X_3) = Cov(X_2, X_3) = 0. Looking at the expression for Cov(X, Y) there is only one nonzero term:

    Cov(X, Y) = Cov(X_2, X_2) = Var(X_2) = 1/4.

Example 2. (Zero covariance does not imply independence.) Let X be a random variable that takes values −2, −1, 0, 1, 2, each with probability 1/5. Let Y = X². Show that Cov(X, Y) = 0 but X and Y are not independent.

answer: We make a joint probability table:

    Y\X     −2    −1     0     1     2    p(y_j)
    0        0     0    1/5    0     0     1/5
    1        0    1/5    0    1/5    0     2/5
    4       1/5    0     0     0    1/5    2/5
    p(x_i)  1/5   1/5   1/5   1/5   1/5     1

Using the marginals we compute the means E(X) = 0 and E(Y) = 2.

Next we show that X and Y are not independent. To do this all we have to do is find one place where the product rule fails, i.e. where p(x_i, y_j) ≠ p(x_i) p(y_j):

    P(X = −2, Y = 0) = 0   but   P(X = −2) · P(Y = 0) = 1/25.

Since these are not equal, X and Y are not independent. Finally we compute the covariance using Property 4:

    Cov(X, Y) = (1/5)(−8 − 1 + 1 + 8) − µ_X µ_Y = 0.

Discussion: This example shows that Cov(X, Y) = 0 does not imply that X and Y are independent. In fact, X and X² are as dependent as random variables can be: if you know the value of X then you know the value of X² with 100% certainty.

The key point is that Cov(X, Y) measures the linear relationship between X and Y. In the above example X and X² have a quadratic relationship that is completely missed by Cov(X, Y).
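
To see this numerically, here is a short R sketch of our own (not from the notes) that simulates Example 2.

    # Sketch (our own): zero covariance despite complete dependence.
    set.seed(2)
    x <- sample(c(-2, -1, 0, 1, 2), size = 1e5, replace = TRUE)
    y <- x^2                 # Y is a deterministic function of X
    cov(x, y)                # approximately 0
    cor(x, y)                # approximately 0 as well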

2.4 Proofs of the properties of covariance

Properties 1 and 2 follow from the similar properties for expected value.

3. This is the definition of variance:

    Cov(X, X) = E((X − µ_X)(X − µ_X)) = E((X − µ_X)²) = Var(X).

4. Recall that E(X − µ_X) = 0. So

    Cov(X, Y) = E((X − µ_X)(Y − µ_Y))
              = E(XY − µ_X Y − µ_Y X + µ_X µ_Y)
              = E(XY) − µ_X E(Y) − µ_Y E(X) + µ_X µ_Y
              = E(XY) − µ_X µ_Y − µ_X µ_Y + µ_X µ_Y
              = E(XY) − µ_X µ_Y.

5. Using Properties 3 and 2 we get

    Var(X + Y) = Cov(X + Y, X + Y) = Cov(X, X) + 2 Cov(X, Y) + Cov(Y, Y)
               = Var(X) + Var(Y) + 2 Cov(X, Y).

6. If X and Y are independent then f(x, y) = f_X(x) f_Y(y). Therefore

    Cov(X, Y) = ∫∫ (x − µ_X)(y − µ_Y) f_X(x) f_Y(y) dx dy
              = ∫ (x − µ_X) f_X(x) dx · ∫ (y − µ_Y) f_Y(y) dy
              = E(X − µ_X) E(Y − µ_Y) = 0.

3 Correlation

The units of covariance Cov(X, Y) are units of X times units of Y. This makes it hard to compare covariances: if we change scales then the covariance changes as well. Correlation is a way to remove the scale from the covariance.

Definition: The correlation coefficient between X and Y is defined by

    Cor(X, Y) = ρ = Cov(X, Y) / (σ_X σ_Y).

3.1 Properties of correlation

1. ρ is the covariance of the standardizations of X and Y.
2. ρ is dimensionless (it's a ratio!).
3. −1 ≤ ρ ≤ 1. Furthermore, ρ = +1 if and only if Y = aX + b with a > 0, and ρ = −1 if and only if Y = aX + b with a < 0.
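
Properties 1 and 2 are easy to check numerically. Here is a brief R sketch of our own; the data are illustrative only.

    # Sketch (our own): correlation is the covariance of the standardized
    # variables, and it is unchanged by changes of scale and location.
    set.seed(3)
    x <- runif(1e5)
    y <- x + runif(1e5)
    zx <- (x - mean(x)) / sd(x)      # standardize X
    zy <- (y - mean(y)) / sd(y)      # standardize Y
    cov(zx, zy)                      # equals cor(x, y)  (Property 1)
    cor(x, y)
    cor(100 * x + 5, y / 1000)       # same value: rho is dimensionless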

Property 3 shows that ρ measures the linear relationship between variables. If the correlation is positive then when X is large, Y will tend to be large as well. If the correlation is negative then when X is large, Y will tend to be small. Example 2 above shows that correlation can completely miss higher-order relationships.

3.2 Proof of Property 3 of correlation

(This is for the mathematically interested.)

    0 ≤ Var(X/σ_X − Y/σ_Y) = Var(X/σ_X) + Var(Y/σ_Y) − 2 Cov(X/σ_X, Y/σ_Y) = 2 − 2ρ
    ⟹  ρ ≤ 1.

Likewise,

    0 ≤ Var(X/σ_X + Y/σ_Y)  ⟹  −1 ≤ ρ.

If ρ = 1 then 0 = Var(X/σ_X − Y/σ_Y), so X/σ_X − Y/σ_Y = c for some constant c, i.e. Y = aX + b with a > 0. Similarly, ρ = −1 forces Y = aX + b with a < 0.

Example. We continue Example 1. To compute the correlation we divide the covariance by the standard deviations. In Example 1 we found Cov(X, Y) = 1/4 and Var(X) = 2 Var(X_j) = 1/2, so σ_X = 1/√2. Likewise σ_Y = 1/√2. Thus

    Cor(X, Y) = Cov(X, Y) / (σ_X σ_Y) = (1/4) / (1/2) = 1/2.

We see a positive correlation, which means that larger X tend to go with larger Y and smaller X with smaller Y. In Example 1 this happens because toss 2 is included in both X and Y, so it contributes to the size of both.

3.3 Bivariate normal distributions

The bivariate normal distribution has density

    f(x, y) = 1 / (2π σ_X σ_Y √(1 − ρ²))
              · exp( − [ (x − µ_X)²/σ_X² + (y − µ_Y)²/σ_Y² − 2ρ(x − µ_X)(y − µ_Y)/(σ_X σ_Y) ] / (2(1 − ρ²)) ).

For this distribution, the marginal distributions of X and Y are normal and the correlation between X and Y is ρ.

In the figures below we used R to simulate the distribution for various values of ρ. Individually X and Y are standard normal, i.e. µ_X = µ_Y = 0 and σ_X = σ_Y = 1. The figures show scatter plots of the results.

These plots and the next set show an important feature of correlation. We divide the data into quadrants by drawing a horizontal and a vertical line at the means of the y data and x data respectively. A positive correlation corresponds to the data tending to lie in the 1st and 3rd quadrants. A negative correlation corresponds to data tending to lie in the 2nd and 4th quadrants. You can see the data gathering about a line as ρ becomes closer to ±1.

18.05 class 7, Covariance and Correlation, Spring 2014 6 3 2 1 0 1 2 3 3 1 0 1 2 3 rho=0.00 4 3 2 1 0 1 2 3 3 1 0 1 2 3 rho=0.30 3 2 1 0 1 2 3 3 1 0 1 2 rho=0.70 4 2 0 2 4 2 0 2 rho=1.00 3 2 1 0 1 2 3 2 0 1 2 3 rho= 0.50 3 2 1 0 1 2 3 4 4 2 0 2 rho= 0.90

3.4 Overlapping uniform distributions

We ran simulations in R of the following scenario. X_1, X_2, ..., X_20 are i.i.d. and follow a U(0, 1) distribution. X and Y are both sums of the same number of the X_i. We call the number of X_i common to both X and Y the overlap. The notation in the figures below indicates the number of X_i being summed and the number which overlap. For example, (5, 3) indicates that X and Y were each the sum of 5 of the X_i and that 3 of the X_i were common to both sums. (The data was generated using rand(1,1000);)

Using the linearity of covariance it is easy to compute the theoretical correlation: if X and Y are each sums of n of the X_i with k of them in common, then Cov(X, Y) = k Var(X_1) and Var(X) = Var(Y) = n Var(X_1), so ρ = k/n. For each plot we give both the theoretical correlation and the correlation of the data from the simulated sample. A simulation sketch follows the figures.

[Figure: scatter plots of Y against X for (sum, overlap) = (1, 0), cor = 0.00, sample_cor = −0.07; (2, 1), cor = 0.50, sample_cor = 0.48; (5, 1), cor = 0.20, sample_cor = 0.21; (5, 3), cor = 0.60, sample_cor = 0.63.]

[Figure: scatter plots for (10, 5), cor = 0.50, sample_cor = 0.53, and (10, 8), cor = 0.80, sample_cor = 0.81.]
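
Here is a minimal R sketch of our own for the overlap scenario; the function name and indexing scheme are our invention, since the notes do not show their simulation code.

    # Sketch (our own): X sums n i.i.d. U(0,1) draws, Y sums n draws of
    # which k are shared with X, so the theoretical correlation is k/n.
    set.seed(5)
    overlap_cor <- function(n, k, trials = 1000) {
      x <- numeric(trials)
      y <- numeric(trials)
      for (t in 1:trials) {
        u <- runif(2 * n - k)                      # enough independent draws
        x[t] <- sum(u[1:n])                        # X uses the first n draws
        y[t] <- sum(u[(n - k + 1):(2 * n - k)])    # Y shares k draws with X
      }
      cor(x, y)
    }
    overlap_cor(5, 3)     # theoretical rho = 3/5 = 0.60
    overlap_cor(10, 8)    # theoretical rho = 8/10 = 0.80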

MIT OpenCourseWare
https://ocw.mit.edu

18.05 Introduction to Probability and Statistics
Spring 2014

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.