Expectation and Variance
August 22, 2017
STAT 151 Class 3, Slide 1
Outline of Topics
1. Motivation
2. Expectation - discrete
3. Transformations
4. Variance - discrete
5. Continuous variables
6. Covariance
Survival time data

0.67 0.01 0.48 1.06 0.85 0.45 0.19 0.78 1.23 0.18
0.37 2.17 0.15 1.19 0.58 1.22 0.72 0.67 1.42 0.02
0.12 0.50 0.15 0.81 0.64 0.22 0.05 0.55 0.46 0.83
0.46 0.56 0.82 0.07 2.28 0.34 0.64 0.09 0.77 0.26
0.45 0.41 0.23 0.16 0.39 0.29 0.62 1.09 0.14 0.49
0.66 0.89 0.99 0.98 0.95 0.14 0.03 0.01 2.62 0.99
0.08 1.39 1.31 0.50 0.74 1.19 0.15 0.14 1.18 1.53

[Figure: histogram of the 70 survival times with the density $f(x) = 1.5e^{-1.5x}$ overlaid; x-axis: X (time in years), from 0 to 5.]

The PDF $f(x)$ tells us the behavior of X, the survival time of individuals with this kind of cancer. By finding areas under $f(x)$, we can answer questions such as $P(X < 2)$, $P(X > 2 \mid X > 1)$, etc.
Survival time data (2)

If we have n observations $X_1, \ldots, X_n$ of X, a simple way to summarize the data is to take the sample average (mean):
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^n X_i}{n}$$
Using the survival time data, we obtain:
$$\frac{0.67 + 0.01 + \cdots + 1.53}{70} \approx 0.654,$$
which tells us patients in the sample lived on average approximately 0.654 years beyond cancer diagnosis.

Compared to the data (70 numbers) or the histogram, $\bar{X}$ is a much simpler description of X since it is just a single number (0.654).

What is the equivalent of the sample mean if we use a PDF such as $f(x) = 1.5e^{-1.5x}$ to model survival times of similar patients? i.e., what is simpler than $1.5e^{-1.5x}$?
Expectation of a discrete random variable

Suppose we have a discrete random variable X with possible values $a_1, a_2, \ldots, a_k$.

The PDF $P(X = a_1), P(X = a_2), \ldots, P(X = a_k)$ completely describes the behavior of X. However, if k is large, the PDF is tedious. Can we find something simpler that describes the behavior of X?

The expectation, expected value, or mean of X can be written as either $E(X)$ or $\mu_X$ (or simply $\mu$ when there is no risk of confusion), and is
$$E(X) = a_1 P(X = a_1) + a_2 P(X = a_2) + \cdots + a_k P(X = a_k) = \sum_{a_i} a_i P(X = a_i).$$
$E(X)$ is a single numeric quantity that describes the behavior of X. Quantities like these are often called population properties, as opposed to $\bar{X}$, which is a sample property.
Expectation as a weighted average

Suppose
$$X = \begin{cases} 1 & \text{with probability } .9, \\ -1 & \text{with probability } .1. \end{cases}$$
What is the average value of X?

$\frac{1 + (-1)}{2} = 0$ would not be useful, because it ignores the fact that usually $X = 1$, and only occasionally is $X = -1$.
$$E(X) = 1 \cdot P(X = 1) + (-1) \cdot P(X = -1) = 1(.9) + (-1)(.1) = .8$$
$E(X)$ is the average value if we observed X many times. In Statistics, $E(\cdot)$ means the weighted average of the quantity inside the brackets.
Example

Let X be the number of heads in 3 independent tosses of a fair coin. The outcome distribution is

Number of heads, X | 0    | 1    | 2    | 3
P(X)               | .125 | .375 | .375 | .125

What is the average value of X if we repeat the three tosses many times?
$$E(X) = 0 \cdot P(X=0) + 1 \cdot P(X=1) + 2 \cdot P(X=2) + 3 \cdot P(X=3) = 0(.125) + 1(.375) + 2(.375) + 3(.125) = 1.5$$
- $E(X)$ does not equal any of the possible values of X
- $E(X)$ is the long run average of X
- If we call the collection of outcomes from three tosses a population, then $E(X)$ is a population average
- $E(X)$ is a constant; there is nothing random about a population average
- $E(c) = c$ for any constant c
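The weighted-average calculation above can be sketched in a few lines of Python, using the three-toss distribution from the table:

```python
# Distribution of X = number of heads in 3 tosses of a fair coin
values = [0, 1, 2, 3]
probs = [0.125, 0.375, 0.375, 0.125]

# E(X) = sum over a_i of a_i * P(X = a_i)
expectation = sum(a * p for a, p in zip(values, probs))
print(expectation)  # 1.5
```

The same two lists and one-line weighted sum reappear for every discrete expectation on the following slides.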
Transformation of a discrete random variable - Example

We are often interested in transformations of a random variable X. Examples of transformations of X are: $X + 2$, $X^2$, $\sqrt{X}$, $1/X$, $e^X, \ldots$ In general, a transformation of X can be written as $g(X)$, where g is a function of X.

Number of heads, X | 0    | 1    | 2    | 3
P(X)               | .125 | .375 | .375 | .125

What is the probability distribution for $Y = X^2$?

Y = X² | 0² = 0 | 1² = 1 | 2² = 4 | 3² = 9
P(Y)   | .125   | .375   | .375   | .125

Therefore, the values of X are transformed from 0, 1, 2, 3 to 0, 1, 4, 9 but the probabilities are NOT transformed.
Expectation of a transformed discrete random variable - Example

X       | 0    | 1    | 2    | 3
Y = X²  | 0    | 1    | 4    | 9
P(X)    | .125 | .375 | .375 | .125

$$E(Y) = 0 \cdot P(X=0) + 1 \cdot P(X=1) + 4 \cdot P(X=2) + 9 \cdot P(X=3) = 0(.125) + 1(.375) + 4(.375) + 9(.125) = 3$$
Write $Y = g(X) = X^2$, so
$$E\{g(X)\} = g(0)P(X=0) + g(1)P(X=1) + g(2)P(X=2) + g(3)P(X=3).$$
In general,
$$E\{g(X)\} = \sum_{a_i} g(a_i) P(X = a_i).$$
Expectation of a transformed discrete random variable (2)

In general, $E\{g(X)\} \neq g(E(X))$.

Example

X          | 0    | 1    | 2    | 3
g(X) = X²  | 0    | 1    | 4    | 9
P(X)       | .125 | .375 | .375 | .125

$E(X) = 0(.125) + 1(.375) + 2(.375) + 3(.125) = 1.5$
$E\{g(X)\} = 0(.125) + 1(.375) + 4(.375) + 9(.125) = 3$
$g(E(X)) = E(X)^2 = 1.5^2 = 2.25 \neq 3$
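A quick Python check of this inequality, using the same table:

```python
# Three-toss distribution from the slide
values = [0, 1, 2, 3]
probs = [0.125, 0.375, 0.375, 0.125]

g = lambda x: x ** 2
e_g = sum(g(a) * p for a, p in zip(values, probs))   # E{g(X)}
g_e = g(sum(a * p for a, p in zip(values, probs)))   # g(E(X))
print(e_g, g_e)  # 3.0 2.25 -- not equal
```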
Linear property of expectation

For a discrete random variable X and constants c, d:
$$E(cX + d) = \sum_{a_i} (ca_i + d)P(X = a_i) = c \sum_{a_i} a_i P(X = a_i) + d \sum_{a_i} P(X = a_i) = cE(X) + d \cdot 1 = cE(X) + d$$
In general, if g is a function of X, then $E\{cg(X) + d\} = cE\{g(X)\} + d$.
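The linearity identity can be verified numerically on the three-toss distribution; the constants c and d below are arbitrary illustrative choices:

```python
# Check E(cX + d) = cE(X) + d on the three-toss distribution
values = [0, 1, 2, 3]
probs = [0.125, 0.375, 0.375, 0.125]
c, d = 2.0, 5.0  # arbitrary constants for illustration

lhs = sum((c * a + d) * p for a, p in zip(values, probs))  # E(cX + d)
rhs = c * sum(a * p for a, p in zip(values, probs)) + d    # cE(X) + d
print(lhs, rhs)  # 8.0 8.0
```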
Expectation of a sum or product of discrete random variables

For ANY two discrete random variables X and Y,
$$E(X + Y) = E(X) + E(Y)$$
regardless of whether X and Y are independent.

If X and Y are independent, then
$$E(XY) = E(X)E(Y).$$
(The converse is not true, i.e., $E(XY) = E(X)E(Y)$ does not imply independence.) For general X and Y,
$$E(XY) \neq E(X)E(Y).$$
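Both rules can be checked from a joint pmf. The two distributions below are hypothetical (not from the slides); under independence the joint pmf factorizes, so $E(X+Y)$ and $E(XY)$ can be computed by summing over all pairs:

```python
from itertools import product

# Hypothetical independent discrete variables (illustration only)
xs, px = [0, 1], [0.5, 0.5]
ys, py = [1, 2, 3], [0.2, 0.3, 0.5]

ex = sum(a * p for a, p in zip(xs, px))
ey = sum(b * p for b, p in zip(ys, py))

# Under independence: P(X=a, Y=b) = P(X=a)P(Y=b)
pairs = list(product(zip(xs, px), zip(ys, py)))
e_sum = sum((a + b) * pa * pb for (a, pa), (b, pb) in pairs)
e_prod = sum(a * b * pa * pb for (a, pa), (b, pb) in pairs)

print(e_sum, ex + ey)   # E(X+Y) matches E(X) + E(Y)
print(e_prod, ex * ey)  # E(XY) matches E(X)E(Y) under independence
```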
Survival time data (3)

A plot of the data shows the observations are spread out. Is $\bar{X} = 0.654$ too simplistic to describe X?

[Figure: dot plot of the 70 survival times on (0, 2.5), with $\bar{X} = 0.654$ marked.]

Note that:
1. if observations are far from $\bar{X}$, it is not a useful description of the data
2. if observations are not far from $\bar{X}$, it is an adequate description of the data
3. 1 + 2 suggest we need to measure the spread of the observations from $\bar{X}$
Survival time data (4)

A measure of the spread of the observations from $\bar{X}$ is:
$$s^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}.$$
$s^2$ is called a sample variance. Using our data, we obtain:
$$\frac{(0.67 - \bar{X})^2 + (0.01 - \bar{X})^2 + \cdots + (1.53 - \bar{X})^2}{70} \approx 0.291$$
$s^2$ is also a single number summary of X in the sample. Sometimes, we use the sample standard deviation, $s = \sqrt{s^2}$, as a measure of spread. Both are much simpler than the data (70 numbers) or the plot on slide 13.

What is the equivalent of $s^2$ (or s) if we use a PDF such as $f(x) = 1.5e^{-1.5x}$ to model survival times of similar patients? i.e., what is simpler than $1.5e^{-1.5x}$?

Another version of $s^2$ is $\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n-1}$. We discuss the two versions further in class 6.
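The sample mean and sample variance (with divisor n, as on this slide) can be sketched directly; here we use just the first row of survival times from slide 3 rather than all 70 values:

```python
# First ten survival times from slide 3
data = [0.67, 0.01, 0.48, 1.06, 0.85, 0.45, 0.19, 0.78, 1.23, 0.18]
n = len(data)

xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / n  # sample variance, divisor n
s = s2 ** 0.5                                # sample standard deviation
print(xbar, s2, s)
```

Replacing the divisor `n` with `n - 1` gives the alternative version mentioned at the bottom of the slide.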
Variance of a discrete random variable

For a discrete random variable X with possible values $a_1, a_2, \ldots, a_k$ and expected value $\mu$, the variance of X, $\text{var}(X) = \sigma_X^2$ (or simply $\sigma^2$), is
$$\text{var}(X) = (a_1 - \mu)^2 P(X = a_1) + (a_2 - \mu)^2 P(X = a_2) + \cdots + (a_k - \mu)^2 P(X = a_k) = \underbrace{E}_{\text{weighted average}}[\underbrace{(X - \mu)^2}_{\text{squared deviation of } X \text{ from } \mu}],$$
var(X) tells us the average squared deviation between a particular outcome X and its long run average.

A second way to calculate variance is
$$\text{var}(X) = E(X^2) - E(X)^2 = E(X^2) - \mu^2,$$
which is sometimes more convenient, especially if we already know $\mu$. However, the first way allows a better interpretation of the concept of variance.
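Both variance formulas can be computed side by side on the three-toss distribution to confirm they agree:

```python
# Three-toss distribution; compare E[(X - mu)^2] with E(X^2) - mu^2
values = [0, 1, 2, 3]
probs = [0.125, 0.375, 0.375, 0.125]

mu = sum(a * p for a, p in zip(values, probs))
var_def = sum((a - mu) ** 2 * p for a, p in zip(values, probs))       # definition
var_shortcut = sum(a ** 2 * p for a, p in zip(values, probs)) - mu ** 2  # shortcut
print(var_def, var_shortcut)  # both 0.75
```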
Example

$$X = \begin{cases} 1 & \text{with probability } .9, \\ -1 & \text{with probability } .1. \end{cases}$$
$$\text{var}(X) = \overbrace{E}^{\text{weighted average}}[(X - \mu)^2] = (-1 - .8)^2(.1) + (1 - .8)^2(.9) = .324 + .036 = .36$$
Example

The distribution of X = number of heads in three tosses of a coin:

Number of heads, X | 0    | 1    | 2    | 3
P(X)               | .125 | .375 | .375 | .125

What is the average squared difference of X from its long run average if we repeat the three tosses many times?
$$\text{var}(X) = \overbrace{E}^{\text{weighted average}}[(X - \mu)^2] = (0 - 1.5)^2 P(X=0) + (1 - 1.5)^2 P(X=1) + (2 - 1.5)^2 P(X=2) + (3 - 1.5)^2 P(X=3)$$
$$= 1.5^2(.125) + 0.5^2(.375) + 0.5^2(.375) + 1.5^2(.125) = .75$$
Variance of a transformed discrete random variable - Example

X       | 0    | 1    | 2    | 3    | E(X) = 1.5
Y = X²  | 0    | 1    | 4    | 9    | E(Y) = 3
P(X)    | .125 | .375 | .375 | .125

$$\text{var}(Y) = (0-3)^2 P(X=0) + (1-3)^2 P(X=1) + (4-3)^2 P(X=2) + (9-3)^2 P(X=3) = 9(.125) + 4(.375) + 1(.375) + 36(.125) = 7.5$$
Write $Y = g(X) = X^2$, so
$$\text{var}\{g(X)\} = [g(0) - E\{g(X)\}]^2 P(X=0) + [g(1) - E\{g(X)\}]^2 P(X=1) + [g(2) - E\{g(X)\}]^2 P(X=2) + [g(3) - E\{g(X)\}]^2 P(X=3).$$
In general,
$$\text{var}\{g(X)\} = \sum_{a_i} [g(a_i) - E\{g(X)\}]^2 P(X = a_i) = E\left([g(X) - E\{g(X)\}]^2\right).$$
Variance of a transformed discrete random variable (2)

For a discrete random variable X, let $Y = cX + d = g(X)$. Then
$$\text{var}(cX + d) = E\left([g(X) - E\{g(X)\}]^2\right) = E\left[\{cX + d - E(cX + d)\}^2\right] = E\left[\{cX + d - cE(X) - d\}^2\right]$$
$$= E\left[\{cX - cE(X)\}^2\right] = E\left[c^2\{X - E(X)\}^2\right] = c^2 E\left[\{X - E(X)\}^2\right] = c^2 \text{var}(X)$$
In general, if g is a function of X, then $\text{var}\{cg(X) + d\} = c^2 \text{var}\{g(X)\}$.
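This identity can be checked on the two-point distribution of slide 16; c and d are arbitrary illustrative constants:

```python
# Check var(cX + d) = c^2 var(X); note the shift d drops out entirely
values = [1, -1]
probs = [0.9, 0.1]
c, d = 3.0, 7.0  # arbitrary constants for illustration

def variance(vals, ps):
    mu = sum(v * p for v, p in zip(vals, ps))
    return sum((v - mu) ** 2 * p for v, p in zip(vals, ps))

lhs = variance([c * v + d for v in values], probs)  # var(cX + d)
rhs = c ** 2 * variance(values, probs)              # c^2 var(X)
print(lhs, rhs)  # both 3.24, since var(X) = .36
```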
Variance of a sum or product of discrete random variables

For two discrete random variables X and Y,
$$\text{var}(X + Y) = \text{var}(X) + \text{var}(Y)$$
if X and Y are independent. For general X and Y,
$$\text{var}(X + Y) \neq \text{var}(X) + \text{var}(Y).$$
In general,
$$\text{var}(XY) \neq \text{var}(X)\text{var}(Y).$$
Expectation and variance - continuous random variable: Survival data (5)

[Figure: PDF $f(x) = 1.5e^{-1.5x}$, with the contribution $xf(x)\,dx$ at $X = x$ highlighted.]

A continuous random variable X may assume any value in a range (a, b). $E(X) = \mu$ can be loosely interpreted as a weighted average of X over (a, b), where $f(x)\,dx$ gives the weight at $X = x$. For example, the contribution of $X = x$ to $E(X)$ is $xf(x)\,dx$; so
$$E(X) = \int x f(x)\,dx$$
var(X) is similarly interpreted as the weighted average of $(X - \mu)^2$ over (a, b). Both E(X) and var(X) must be evaluated using analytical or numerical methods.
Survival data (6)

Let X have PDF
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & 0 < x,\ 0 < \lambda \\ 0, & \text{otherwise,} \end{cases} \qquad E(X) = \int_0^\infty x \lambda e^{-\lambda x}\,dx$$
Integration by parts:
$$\frac{d}{dx}[u(x)v(x)] = u'(x)v(x) + u(x)v'(x) \implies u(x)v(x) = \int u'(x)v(x)\,dx + \int u(x)v'(x)\,dx$$
$$\implies \int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx$$
Let $u = x$, so $u' = 1$; $v' = \lambda e^{-\lambda x}$, so $v = -e^{-\lambda x}$:
$$E(X) = \underbrace{\int_0^\infty x \lambda e^{-\lambda x}\,dx}_{\int uv'\,dx} = \underbrace{\left[(x)(-e^{-\lambda x})\right]_0^\infty}_{uv} - \underbrace{\int_0^\infty (1)(-e^{-\lambda x})\,dx}_{\int u'v\,dx} = 0 + \left[-\frac{1}{\lambda} e^{-\lambda x}\right]_0^\infty = 0 - \left(-\frac{1}{\lambda}\right)e^0 = \frac{1}{\lambda}$$
Survival data (7)

$$\text{var}(X) = E(X^2) - E(X)^2 = E(X^2) - \left(\frac{1}{\lambda}\right)^2, \qquad E(X^2) = \int_0^\infty x^2 \lambda e^{-\lambda x}\,dx$$
Let $u = x^2$, so $u' = 2x$; $v' = \lambda e^{-\lambda x}$, so $v = -e^{-\lambda x}$:
$$E(X^2) = \underbrace{\int_0^\infty x^2 \lambda e^{-\lambda x}\,dx}_{\int uv'\,dx} = \underbrace{\left[(x^2)(-e^{-\lambda x})\right]_0^\infty}_{uv} - \underbrace{\int_0^\infty (2x)(-e^{-\lambda x})\,dx}_{\int u'v\,dx} = 0 + \frac{2}{\lambda}\int_0^\infty x \lambda e^{-\lambda x}\,dx = \frac{2}{\lambda}E(X) = \frac{2}{\lambda}\cdot\frac{1}{\lambda} = \frac{2}{\lambda^2}$$
$$\text{var}(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}$$
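The closed forms $E(X) = 1/\lambda$ and $\text{var}(X) = 1/\lambda^2$ can be checked numerically with a simple midpoint Riemann sum, taking $\lambda = 1.5$ from the survival example:

```python
import math

# Numerical check of E(X) = 1/lambda and var(X) = 1/lambda^2
# for f(x) = lambda * exp(-lambda * x), with lambda = 1.5
lam = 1.5
dx = 1e-4

f = lambda x: lam * math.exp(-lam * x)

ex = ex2 = 0.0
for i in range(200000):      # midpoint rule over (0, 20); the tail is negligible
    x = (i + 0.5) * dx
    w = f(x) * dx
    ex += x * w
    ex2 += x * x * w

print(ex, 1 / lam)                  # ~0.6667
print(ex2 - ex ** 2, 1 / lam ** 2)  # ~0.4444
```

This is the kind of numerical evaluation the slides allude to when an analytical answer is unavailable.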
Example

Let X have PDF
$$f(x) = \begin{cases} 3x^2, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$
$$E(X) = \int x f(x)\,dx = \int_0^1 x(3x^2)\,dx = \int_0^1 3x^3\,dx = \left[\frac{3x^4}{4}\right]_0^1 = \frac{3}{4}$$
$$E(X^2) = \int x^2 f(x)\,dx = \int_0^1 x^2(3x^2)\,dx = \int_0^1 3x^4\,dx = \left[\frac{3x^5}{5}\right]_0^1 = \frac{3}{5}$$
$$\text{var}(X) = E(X^2) - E(X)^2 = \frac{3}{5} - \left(\frac{3}{4}\right)^2 = \frac{3}{80}$$
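The same midpoint Riemann sum confirms these exact values, $E(X) = 3/4$, $E(X^2) = 3/5$, $\text{var}(X) = 3/80$:

```python
# Numerical check for f(x) = 3x^2 on (0, 1)
dx = 1e-5
ex = ex2 = 0.0
for i in range(100000):
    x = (i + 0.5) * dx     # midpoint of each subinterval of (0, 1)
    w = 3 * x ** 2 * dx    # f(x) dx
    ex += x * w
    ex2 += x * x * w

print(ex, ex2, ex2 - ex ** 2)  # ~0.75, ~0.6, ~0.0375
```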
Properties of expectation and variance

All the properties of expectation and variance for a discrete random variable apply to a continuous random variable. For continuous random variables X, Y:
- $E\{cg(X) + d\} = cE\{g(X)\} + d$
- $E(X + Y) = E(X) + E(Y)$
- $E(XY) = E(X)E(Y)$ if X and Y are independent
- $\text{var}\{g(X)\} = E\left([g(X) - E\{g(X)\}]^2\right)$
- $\text{var}\{cg(X) + d\} = c^2 \text{var}\{g(X)\}$
- $\text{var}(X + Y) = \text{var}(X) + \text{var}(Y)$ if X and Y are independent
Two variables: Survival data (8)

In addition to survival time, suppose the age of each patient is also recorded:

Observation | X (survival time) | Y (age)
1           | 0.67              | 74
2           | 0.01              | 44
3           | 0.48              | 62
...         | ...               | ...
70          | 1.53              | 58

We could study X and Y by E(X), E(Y), var(X), var(Y). These summaries are examples of univariate analysis, cf. class 2. Univariate analysis does not allow us to answer questions such as:
- Do younger patients live longer because they are stronger?
- Do younger patients do worse because they tend to have more aggressive tumours?

These questions can only be answered using a multivariate analysis.
Two variables: Survival data (9)

A simple graphical summary of bivariate data is the scatterplot. This is simply a plot of the observations $(X_i, Y_i)$ in the plane.

[Figure: scatterplot of age Y (50 to 80) against survival time X (0 to 2.5).]

We need a simple summary that captures the relationship between X and Y observed in the scatterplot.
Sample covariance between two random variables

Consider the sample covariance:
$$\frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{n}$$
A typical term in the numerator of the summation is $(X_i - \bar{X})(Y_i - \bar{Y})$, and it has the following characteristics:
- If $X_i$ and $Y_i$ both fall on the same side of their respective means, $X_i > \bar{X}$ and $Y_i > \bar{Y}$, or $X_i < \bar{X}$ and $Y_i < \bar{Y}$, then this term is positive.
- If $X_i$ and $Y_i$ fall on opposite sides of their respective means, $X_i > \bar{X}$ and $Y_i < \bar{Y}$, or $X_i < \bar{X}$ and $Y_i > \bar{Y}$, then this term is negative.

Another version is $\frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$. We discuss the two versions further in class 9.
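The sample covariance (divisor n, as above) can be sketched using the three fully listed (survival time, age) pairs from slide 26; with only three observations this is an illustration of the formula, not a meaningful estimate:

```python
# Three (survival time, age) pairs from the table on slide 26
xs = [0.67, 0.01, 0.48]
ys = [74, 44, 62]
n = len(xs)

xbar = sum(xs) / n
ybar = sum(ys) / n
cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
print(cov)  # positive: each pair sits on the same side of both means
```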
Sample covariance between two random variables (2)

The sign of $(X_i - \bar{X})(Y_i - \bar{Y})$ depends on which quadrant the observation $(X_i, Y_i)$ falls in.

[Figure: scatterplot divided into four quadrants by lines at $\bar{X}$ and $\bar{Y}$; the product is positive (> 0) in the upper-right and lower-left quadrants and negative (< 0) in the upper-left and lower-right quadrants.]
Sample covariance - Survival data (10)

[Figure: scatterplot with quadrant lines at $\bar{X}$ and $\bar{Y}$; points in the upper-right and lower-left quadrants are shown in green, points in the other two quadrants in red.]

The green observations are those for which $X_i$ and $Y_i$ are both larger than or both smaller than their respective means $(\bar{X}, \bar{Y})$. The red observations are those for which $X_i$ and $Y_i$ are on opposite sides of their respective means. The green observations contribute positively to the sample covariance; the red observations contribute negatively. If we sum up the contributions of the green and red observations, the result is approximately 2.238 (> 0), which suggests patients older than average tend to survive longer following diagnosis of the disease.
Covariance

Like the sample mean and sample variance, the sample covariance is also a single number summary. It is a summary of the relationship between X and Y from a sample of pairs $(X_i, Y_i)$. What if we use a joint PDF $f(x, y)$ to model the relationship between X and Y? In that case, we seek the equivalent of the sample covariance, which is the covariance:
$$\text{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
When X and Y are both discrete, such that the possible values of X are $a_1, \ldots, a_k$ and those of Y are $b_1, \ldots, b_l$,
$$\text{cov}(X, Y) = \sum_{i=1}^k \sum_{j=1}^l (a_i - \mu_X)(b_j - \mu_Y) P(X = a_i, Y = b_j)$$
cov(X, Y) is a single number summary of the relationship between X and Y. When cov(X, Y) > 0, X and Y tend to agree in their direction in relation to their respective means; when cov(X, Y) < 0, they tend to disagree.
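The double-sum formula can be sketched on a small joint pmf; the 2x2 joint distribution below is a hypothetical illustration (not from the slides), chosen so that X and Y tend to agree:

```python
# Covariance of discrete X, Y from a joint pmf (hypothetical example)
xs = [0, 1]
ys = [0, 1]
joint = [[0.4, 0.1],   # joint[i][j] = P(X = xs[i], Y = ys[j])
         [0.1, 0.4]]   # most mass where X and Y take the same value

# Marginal means mu_X and mu_Y from row and column sums
mu_x = sum(xs[i] * sum(joint[i]) for i in range(len(xs)))
mu_y = sum(ys[j] * sum(row[j] for row in joint) for j in range(len(ys)))

# Double sum over all (a_i, b_j) pairs
cov = sum((xs[i] - mu_x) * (ys[j] - mu_y) * joint[i][j]
          for i in range(len(xs)) for j in range(len(ys)))
print(cov)  # positive: X and Y tend to move in the same direction
```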