Expectation and Variance

Expectation and Variance
August 22, 2017
STAT 151 Class 3 Slide 1

Outline of Topics
1 Motivation
2 Expectation - discrete
3 Transformations
4 Variance - discrete
5 Continuous variables
6 Covariance
STAT 151 Class 3 Slide 2

Survival time data

0.67 0.01 0.48 1.06 0.85 0.45 0.19 0.78 1.23 0.18
0.37 2.17 0.15 1.19 0.58 1.22 0.72 0.67 1.42 0.02
0.12 0.50 0.15 0.81 0.64 0.22 0.05 0.55 0.46 0.83
0.46 0.56 0.82 0.07 2.28 0.34 0.64 0.09 0.77 0.26
0.45 0.41 0.23 0.16 0.39 0.29 0.62 1.09 0.14 0.49
0.66 0.89 0.99 0.98 0.95 0.14 0.03 0.01 2.62 0.99
0.08 1.39 1.31 0.50 0.74 1.19 0.15 0.14 1.18 1.53

[Figure: density plot of the PDF $f(x) = 1.5e^{-1.5x}$ over X (time in years), 0 to 5.]

The PDF $f(x)$ tells us the behavior of X, the survival time of individuals with this kind of cancer. By finding areas under $f(x)$, we can answer questions such as $P(X < 2)$, $P(X > 2 \mid X > 1)$, etc.
STAT 151 Class 3 Slide 3

Survival time data (2)

If we have n observations $X_1, \ldots, X_n$ of X, a simple way to summarize the data is to take the sample average (mean):

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^n X_i}{n}$$

Using the survival time data, we obtain:

$$\frac{0.67 + 0.01 + \cdots + 1.53}{70} \approx 0.654,$$

which tells us patients in the sample lived on average approximately 0.654 years beyond cancer diagnosis.

Compared to the data (70 numbers) or the histogram, $\bar{X}$ is a much simpler description of X since it is just a single number (0.654).

What is the equivalent of the sample mean if we use a PDF such as $f(x) = 1.5e^{-1.5x}$ to model the survival time of similar patients? i.e., what is simpler than $1.5e^{-1.5x}$?
STAT 151 Class 3 Slide 4
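A minimal sketch of the sample-mean calculation in Python; the `times` list is a hypothetical container that would hold all 70 survival times from slide 3 (only the first ten are shown here):

```python
# Sample mean of the survival times (first 10 of the 70 values from slide 3).
times = [0.67, 0.01, 0.48, 1.06, 0.85, 0.45, 0.19, 0.78, 1.23, 0.18]  # ... plus the remaining 60

sample_mean = sum(times) / len(times)
print(round(sample_mean, 3))  # ~0.654 once all 70 values are included
```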

Expectation of a discrete random variable

Suppose we have a discrete random variable X with possible values $a_1, a_2, \ldots, a_k$. The PDF $P(X = a_1), P(X = a_2), \ldots, P(X = a_k)$ completely describes the behavior of X. However, if k is large, the PDF is tedious. Can we find something simpler that describes the behavior of X?

The expectation (expected value, or mean) of X can be written as either $E(X)$ or $\mu_X$ (or simply $\mu$ when there is no risk of confusion), and is

$$E(X) = a_1 P(X = a_1) + a_2 P(X = a_2) + \cdots + a_k P(X = a_k) = \sum_{a_i} a_i P(X = a_i).$$

E(X) is a single numeric quantity that describes the behavior of X. Such quantities are often called population properties, as opposed to $\bar{X}$, which is a sample property.
STAT 151 Class 3 Slide 5

Expectation as a weighted average

Suppose

$$X = \begin{cases} 1 & \text{with probability } .9, \\ -1 & \text{with probability } .1. \end{cases}$$

What is the average value of X? $\frac{1 + (-1)}{2} = 0$ would not be useful, because it ignores the fact that usually $X = 1$, and only occasionally is $X = -1$.

$$E(X) = 1 \cdot P(X = 1) + (-1) \cdot P(X = -1) = 1(.9) + (-1)(.1) = .8$$

E(X) is the average value if we observed X many times. In Statistics, $E(\cdot)$ means the weighted average of the quantity inside the brackets.
STAT 151 Class 3 Slide 6

Example

Let X be the number of heads in 3 independent tosses of a fair coin. The outcome distribution is

Number of heads, X:   0      1      2      3
P(X):                .125   .375   .375   .125

What is the average value of X if we repeat the three tosses many times?

$$E(X) = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2) + 3 \cdot P(X = 3) = 0(.125) + 1(.375) + 2(.375) + 3(.125) = 1.5$$

E(X) does not equal any of the possible values of X
E(X) is the long run average of X
If we call the collection of outcomes from three tosses a population, then E(X) is a population average
E(X) is a constant; there is nothing random about a population average
E(c) = c for any constant c
STAT 151 Class 3 Slide 7
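A minimal sketch of this calculation, treating the expectation as a probability-weighted sum:

```python
# E(X) for the number of heads in 3 fair tosses: a probability-weighted average.
values = [0, 1, 2, 3]
probs = [.125, .375, .375, .125]

EX = sum(a * p for a, p in zip(values, probs))
print(EX)  # 1.5
```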

Transformation of a discrete random variable

We are often interested in a transformation of a random variable X. Examples of transformations of X are: $X + 2$, $X^2$, $\sqrt{X}$, $1/X$, $e^X$, ... In general, a transformation of X can be written as $g(X)$, where g is a function of X.

Example

Number of heads, X:   0      1      2      3
P(X):                .125   .375   .375   .125

What is the probability distribution for $Y = X^2$?

Y = X²:   $0^2 = 0$   $1^2 = 1$   $2^2 = 4$   $3^2 = 9$
P(Y):     .125        .375        .375        .125

Therefore, the values of X are transformed from 0, 1, 2, 3 to 0, 1, 4, 9, but the probabilities are NOT transformed.
STAT 151 Class 3 Slide 8

Expectation of a transformed discrete random variable

Example

X:        0      1      2      3
Y = X²:   0      1      4      9
P(X):    .125   .375   .375   .125

$$E(Y) = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 4 \cdot P(X = 2) + 9 \cdot P(X = 3) = 0(.125) + 1(.375) + 4(.375) + 9(.125) = 3$$

Write $Y = g(X) = X^2$, so

$$E\{g(X)\} = g(0)P(X = 0) + g(1)P(X = 1) + g(2)P(X = 2) + g(3)P(X = 3)$$

In general,

$$E\{g(X)\} = \sum_{a_i} g(a_i) P(X = a_i)$$

STAT 151 Class 3 Slide 9
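A sketch of the general rule in code: apply g to each value of X but weight by the original probabilities, with $g(x) = x^2$ as in the slide:

```python
# E{g(X)}: transform the values, keep the original probabilities.
values = [0, 1, 2, 3]
probs = [.125, .375, .375, .125]

def g(x):
    return x ** 2

EgX = sum(g(a) * p for a, p in zip(values, probs))
print(EgX)  # 3.0
```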

Expectation of a transformed discrete random variable

In general, $E\{g(X)\} \neq g(E(X))$.

Example

X:            0      1      2      3
g(X) = X²:    0      1      4      9
P(X):        .125   .375   .375   .125

$E(X) = 0(.125) + 1(.375) + 2(.375) + 3(.125) = 1.5$
$E\{g(X)\} = 0(.125) + 1(.375) + 4(.375) + 9(.125) = 3$
$g(E(X)) = E(X)^2 = 1.5^2 = 2.25 \neq 3$
STAT 151 Class 3 Slide 10

Linear property of expectation

For a discrete random variable X and constants c, d:

$$E(cX + d) = \sum_{a_i} (ca_i + d)P(X = a_i) = c \sum_{a_i} a_i P(X = a_i) + d \sum_{a_i} P(X = a_i) = cE(X) + d \cdot 1 = cE(X) + d$$

In general, if g is a function of X, then $E\{cg(X) + d\} = cE\{g(X)\} + d$.
STAT 151 Class 3 Slide 11

Expectation of a sum or product of discrete random variables

For ANY two discrete random variables X and Y,

$$E(X + Y) = E(X) + E(Y)$$

regardless of whether X and Y are independent.

If X and Y are independent, then

$$E(XY) = E(X)E(Y)$$

(The converse is not true, i.e., $E(XY) = E(X)E(Y)$ does not imply independence.) For general X and Y, $E(XY) \neq E(X)E(Y)$.
STAT 151 Class 3 Slide 12
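A Monte Carlo sketch of both facts; the distributions are chosen arbitrarily for illustration (X uniform on {0, 1, 2, 3}, Y exponential with rate 1.5, drawn independently):

```python
# Simulation check: E(X+Y) = E(X) + E(Y) always; E(XY) = E(X)E(Y) under independence.
import random

random.seed(151)
n = 200_000
X = [random.choice([0, 1, 2, 3]) for _ in range(n)]
Y = [random.expovariate(1.5) for _ in range(n)]

def mean(v):
    return sum(v) / len(v)

print(mean([x + y for x, y in zip(X, Y)]), mean(X) + mean(Y))  # nearly identical
print(mean([x * y for x, y in zip(X, Y)]), mean(X) * mean(Y))  # close, since X, Y independent
```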

Survival time data (3)

A plot of the data shows observations are spread out. Is $\bar{X} = 0.654$ too simplistic to describe X?

[Figure: dot plot of the 70 survival times over 0.0-2.5, with the sample mean 0.654 marked.]

Note that:
1 if observations are far from $\bar{X}$, it is not a useful description of the data
2 if observations are not far from $\bar{X}$, it is an adequate description of the data
3 1+2 suggest we need to measure the spread of the observations from $\bar{X}$
STAT 151 Class 3 Slide 13

Survival time data (4)

A measure of the spread of observations from $\bar{X}$ is:

$$s^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}.$$

$s^2$ is called a sample variance. Using our data, we obtain:

$$\frac{(0.67 - \bar{X})^2 + (0.01 - \bar{X})^2 + \cdots + (1.53 - \bar{X})^2}{70} \approx 0.291$$

$s^2$ is also a single number summary of X in the sample. Sometimes, we use the sample standard deviation, $s = \sqrt{s^2}$, as a measure of spread. Both are much simpler than the data (70 numbers) or the plot on slide 13.

What is the equivalent of $s^2$ (or s) if we use a PDF such as $f(x) = 1.5e^{-1.5x}$ to model the survival time of similar patients? i.e., what is simpler than $1.5e^{-1.5x}$?

Another version of $s^2$ is $\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1}$. We discuss the two versions further in class 6.
STAT 151 Class 3 Slide 14
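A sketch covering both versions of the sample variance; as before, the hypothetical `times` list would hold all 70 survival times:

```python
# Sample variance: divide-by-n (slide's version) or divide-by-(n-1) via ddof=1.
def sample_variance(xs, ddof=0):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)

times = [0.67, 0.01, 0.48, 1.06, 0.85]  # ... plus the remaining 65 values
print(sample_variance(times))      # ~0.291 with the full data set
print(sample_variance(times, 1))   # the n-1 version
```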

Variance of a discrete random variable

For a discrete random variable X with possible values $a_1, a_2, \ldots, a_k$ and expected value $\mu$, the variance of X, $var(X) = \sigma_X^2$ (or simply $\sigma^2$), is

$$var(X) = (a_1 - \mu)^2 P(X = a_1) + (a_2 - \mu)^2 P(X = a_2) + \cdots + (a_k - \mu)^2 P(X = a_k) = \underbrace{E}_{\text{weighted average}}\big[\underbrace{(X - \mu)^2}_{\text{deviation of } X \text{ from } \mu}\big].$$

var(X) tells us the average squared deviation of a particular outcome X from its long run average.

A second way to calculate variance is

$$var(X) = E(X^2) - E(X)^2 = E(X^2) - \mu^2,$$

which is sometimes more convenient, especially if we already know $\mu$. However, the first way allows a better interpretation of the concept of variance.
STAT 151 Class 3 Slide 15
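A sketch computing var(X) both ways for the coin-toss distribution from slide 7; the two formulas agree:

```python
# var(X) two ways: E[(X - mu)^2] and E(X^2) - mu^2.
values = [0, 1, 2, 3]
probs = [.125, .375, .375, .125]

mu = sum(a * p for a, p in zip(values, probs))
var1 = sum((a - mu) ** 2 * p for a, p in zip(values, probs))     # definition
var2 = sum(a ** 2 * p for a, p in zip(values, probs)) - mu ** 2  # shortcut
print(var1, var2)  # both 0.75
```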

Example

$$X = \begin{cases} 1 & \text{with probability } .9 \\ -1 & \text{with probability } .1 \end{cases}$$

$$var(X) = \underbrace{E[(X - \mu)^2]}_{\text{weighted average}} = (-1 - .8)^2(.1) + (1 - .8)^2(.9) = (-1.8)^2(.1) + (.2)^2(.9) = .36$$

STAT 151 Class 3 Slide 16

Example

The distribution of X = number of heads in three tosses of a coin:

Number of heads, X:   0      1      2      3
P(X):                .125   .375   .375   .125

What is the average squared difference of X from its long run average if we repeat the three tosses many times?

$$var(X) = \underbrace{E[(X - \mu)^2]}_{\text{weighted average}} = (0 - 1.5)^2 P(X = 0) + (1 - 1.5)^2 P(X = 1) + (2 - 1.5)^2 P(X = 2) + (3 - 1.5)^2 P(X = 3) = 1.5^2(.125) + 0.5^2(.375) + 0.5^2(.375) + 1.5^2(.125) = .75$$

STAT 151 Class 3 Slide 17

Variance of a transformed discrete random variable

Example

X:        0      1      2      3      E(X) = 1.5
Y = X²:   0      1      4      9      E(Y) = 3
P(X):    .125   .375   .375   .125

$$var(Y) = (0 - 3)^2 P(X = 0) + (1 - 3)^2 P(X = 1) + (4 - 3)^2 P(X = 2) + (9 - 3)^2 P(X = 3) = 9(.125) + 4(.375) + 1(.375) + 36(.125) = 7.5$$

Write $Y = g(X) = X^2$, so

$$var\{g(X)\} = [g(0) - E\{g(X)\}]^2 P(X = 0) + [g(1) - E\{g(X)\}]^2 P(X = 1) + [g(2) - E\{g(X)\}]^2 P(X = 2) + [g(3) - E\{g(X)\}]^2 P(X = 3).$$

In general,

$$var\{g(X)\} = \sum_{a_i} [g(a_i) - E\{g(X)\}]^2 P(X = a_i) = E\big([g(X) - E\{g(X)\}]^2\big).$$

STAT 151 Class 3 Slide 18
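The same idea as a code sketch, with $g(x) = x^2$: deviations of g(X) from E{g(X)}, weighted by the probabilities of X:

```python
# var{g(X)} for g(x) = x^2 on the coin-toss distribution.
values = [0, 1, 2, 3]
probs = [.125, .375, .375, .125]

def g(x):
    return x ** 2

EgX = sum(g(a) * p for a, p in zip(values, probs))
var_g = sum((g(a) - EgX) ** 2 * p for a, p in zip(values, probs))
print(EgX, var_g)  # 3.0 7.5
```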

Variance of a transformed discrete random variable (2)

For a discrete random variable X, let $Y = cX + d = g(X)$; then

$$\begin{aligned} var(cX + d) &= E\big([g(X) - E\{g(X)\}]^2\big) \\ &= E[\{cX + d - E(cX + d)\}^2] \\ &= E[\{cX + d - cE(X) - d\}^2] \\ &= E[\{cX - cE(X)\}^2] \\ &= E[c^2\{X - E(X)\}^2] \\ &= c^2 E[\{X - E(X)\}^2] \\ &= c^2 var(X) \end{aligned}$$

In general, if g is a function of X, then $var\{cg(X) + d\} = c^2 var\{g(X)\}$.
STAT 151 Class 3 Slide 19
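A quick numerical sketch of $var(cX + d) = c^2\,var(X)$, with arbitrary illustrative constants c = 2, d = 5:

```python
# Check var(cX + d) = c^2 var(X) on the coin-toss distribution.
values = [0, 1, 2, 3]
probs = [.125, .375, .375, .125]
c, d = 2.0, 5.0  # arbitrary constants for illustration

def var(vals, ps):
    mu = sum(a * p for a, p in zip(vals, ps))
    return sum((a - mu) ** 2 * p for a, p in zip(vals, ps))

print(var([c * a + d for a in values], probs))  # 3.0
print(c ** 2 * var(values, probs))              # 3.0
```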

Variance of a sum or product of discrete random variables

For two discrete random variables X and Y,

$$var(X + Y) = var(X) + var(Y)$$

only if X and Y are independent. For general X and Y,

$$var(X + Y) \neq var(X) + var(Y).$$

In general, $var(XY) \neq var(X)var(Y)$.
STAT 151 Class 3 Slide 20

Expectation and variance - continuous random variable

Survival data (5)

[Figure: PDF $f(x) = 1.5e^{-1.5x}$, with the strip contributing $x f(x)\,dx$ at $X = x$ highlighted.]

A continuous random variable X may assume any value in a range (a, b). $E(X) = \mu$ can be loosely interpreted as a weighted average of X over (a, b), where $f(x)\,dx$ gives the weight at $X = x$. For example, the contribution of $X = x$ to E(X) is $x f(x)\,dx$; so

$$E(X) = \int x f(x)\,dx$$

var(X) is similarly interpreted as the weighted average of $(X - \mu)^2$ over (a, b). Both E(X) and var(X) must be evaluated using analytical or numerical methods.
STAT 151 Class 3 Slide 21

Survival data (6)

Let X have PDF

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & 0 < x,\ 0 < \lambda \\ 0, & \text{otherwise,} \end{cases}$$

$$E(X) = \int_0^\infty x \lambda e^{-\lambda x}\,dx$$

Integration by parts: $\frac{d}{dx}[u(x)v(x)] = u'(x)v(x) + u(x)v'(x)$, so

$$u(x)v(x) = \int u'(x)v(x)\,dx + \int u(x)v'(x)\,dx \quad\Rightarrow\quad \int u(x)v'(x)\,dx = u(x)v(x) - \int u'(x)v(x)\,dx$$

Let $u = x$, so $u' = 1$; $v' = \lambda e^{-\lambda x}$, so $v = -e^{-\lambda x}$.

$$E(X) = \underbrace{\int_0^\infty x\lambda e^{-\lambda x}\,dx}_{\int uv'\,dx} = \underbrace{\Big[(x)(-e^{-\lambda x})\Big]_0^\infty}_{uv} - \underbrace{\int_0^\infty (1)(-e^{-\lambda x})\,dx}_{\int u'v\,dx} = 0 + \Big[-\tfrac{1}{\lambda}e^{-\lambda x}\Big]_0^\infty = 0 - \Big(-\tfrac{1}{\lambda}e^{0}\Big) = \frac{1}{\lambda}$$

STAT 151 Class 3 Slide 22
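A numerical cross-check of $E(X) = 1/\lambda$; a sketch assuming numpy and scipy are available:

```python
# Numerical check: E(X) for an exponential density equals 1/lambda.
import numpy as np
from scipy.integrate import quad

lam = 1.5
EX, _ = quad(lambda x: x * lam * np.exp(-lam * x), 0, np.inf)
print(EX, 1 / lam)  # both ~0.6667
```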

Survival data (7)

$$var(X) = E(X^2) - E(X)^2 = \int_0^\infty x^2 \lambda e^{-\lambda x}\,dx - \Big(\frac{1}{\lambda}\Big)^2$$

Let $u = x^2$, so $u' = 2x$; $v' = \lambda e^{-\lambda x}$, so $v = -e^{-\lambda x}$.

$$E(X^2) = \underbrace{\int_0^\infty x^2 \lambda e^{-\lambda x}\,dx}_{\int uv'\,dx} = \underbrace{\Big[(x^2)(-e^{-\lambda x})\Big]_0^\infty}_{uv} - \underbrace{\int_0^\infty (2x)(-e^{-\lambda x})\,dx}_{\int u'v\,dx} = 0 + \frac{2}{\lambda}\int_0^\infty x\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda}E(X) = \frac{2}{\lambda}\cdot\frac{1}{\lambda} = \frac{2}{\lambda^2}$$

$$var(X) = E(X^2) - E(X)^2 = \frac{2}{\lambda^2} - \Big(\frac{1}{\lambda}\Big)^2 = \frac{1}{\lambda^2}$$

STAT 151 Class 3 Slide 23
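The same numerical sketch extended to the variance:

```python
# Numerical check: var(X) for an exponential density equals 1/lambda^2.
import numpy as np
from scipy.integrate import quad

lam = 1.5
EX, _ = quad(lambda x: x * lam * np.exp(-lam * x), 0, np.inf)
EX2, _ = quad(lambda x: x ** 2 * lam * np.exp(-lam * x), 0, np.inf)
print(EX2 - EX ** 2, 1 / lam ** 2)  # both ~0.4444
```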

Example

Let X have PDF

$$f(x) = \begin{cases} 3x^2, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$

$$E(X) = \int x f(x)\,dx = \int_0^1 x(3x^2)\,dx = \int_0^1 3x^3\,dx = \Big[\frac{3x^4}{4}\Big]_0^1 = \frac{3}{4}$$

$$E(X^2) = \int x^2 f(x)\,dx = \int_0^1 x^2(3x^2)\,dx = \int_0^1 3x^4\,dx = \Big[\frac{3x^5}{5}\Big]_0^1 = \frac{3}{5}$$

$$var(X) = E(X^2) - E(X)^2 = \frac{3}{5} - \Big(\frac{3}{4}\Big)^2 = \frac{3}{80}$$

STAT 151 Class 3 Slide 24
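A symbolic sketch of the same example, assuming sympy is available:

```python
# Symbolic check of E(X), E(X^2), and var(X) for f(x) = 3x^2 on (0, 1).
import sympy as sp

x = sp.symbols('x')
f = 3 * x ** 2
EX = sp.integrate(x * f, (x, 0, 1))        # 3/4
EX2 = sp.integrate(x ** 2 * f, (x, 0, 1))  # 3/5
print(EX, EX2, EX2 - EX ** 2)              # 3/4 3/5 3/80
```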

Properties of expectation and variance

All the properties of expectation and variance for a discrete random variable apply to a continuous random variable. For continuous random variables X, Y:

$E\{cg(X) + d\} = cE\{g(X)\} + d$
$E(X + Y) = E(X) + E(Y)$
$E(XY) = E(X)E(Y)$ if X and Y are independent
$var\{g(X)\} = E\big([g(X) - E\{g(X)\}]^2\big)$
$var\{cg(X) + d\} = c^2 var\{g(X)\}$
$var(X + Y) = var(X) + var(Y)$ if X and Y are independent
STAT 151 Class 3 Slide 25

Two variables: Survival data (8)

In addition to survival time, suppose the age of each patient is also recorded:

Observation   X (survival time)   Y (age)
1             0.67                74
2             0.01                44
3             0.48                62
...           ...                 ...
70            1.53                58

We could study X and Y by E(X), E(Y), var(X), var(Y). These summaries are examples of univariate analysis, cf. class 2. Univariate analysis does not allow us to answer questions such as: Do younger patients live longer because they are stronger? Do younger patients do worse because they tend to have more aggressive tumours? These questions can only be answered using a multivariate analysis.
STAT 151 Class 3 Slide 26

Two variables: Survival data (9)

A simple graphical summary of bivariate data is the scatterplot. This is simply a plot of the observations $(X_i, Y_i)$ in the plane.

[Figure: scatterplot of Y (age, roughly 50-80) against X (survival time, 0.0-2.5).]

We need a simple summary that captures the relationship between X and Y observed in the scatterplot.
STAT 151 Class 3 Slide 27

Sample covariance between two random variables

Consider the sample covariance:

$$\frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{n}$$

A typical term in the numerator is $(X_i - \bar{X})(Y_i - \bar{Y})$, and it has the following characteristics:

If $X_i$ and $Y_i$ both fall on the same side of their respective means ($X_i > \bar{X}$ and $Y_i > \bar{Y}$, or $X_i < \bar{X}$ and $Y_i < \bar{Y}$), then this term is positive.

If $X_i$ and $Y_i$ fall on opposite sides of their respective means ($X_i > \bar{X}$ and $Y_i < \bar{Y}$, or $X_i < \bar{X}$ and $Y_i > \bar{Y}$), then this term is negative.

Another version is $\frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$. We discuss the two versions further in class 9.
STAT 151 Class 3 Slide 28
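A minimal sketch of the sample covariance; the `X` and `Y` lists would hold the full 70 paired observations from slide 26 (only the first three pairs are shown):

```python
# Sample covariance (divide-by-n version from the slide).
def sample_cov(xs, ys):
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / len(xs)

X = [0.67, 0.01, 0.48]  # ... all 70 survival times
Y = [74, 44, 62]        # ... the corresponding ages
print(sample_cov(X, Y))
```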

Sample covariance between two random variables (2)

The sign of $(X_i - \bar{X})(Y_i - \bar{Y})$ depends on which quadrant the observation $(X_i, Y_i)$ falls in.

[Figure: scatterplot divided into four quadrants by the lines $x = \bar{X}$ and $y = \bar{Y}$; the product is > 0 in the upper-right and lower-left quadrants and < 0 in the other two.]

STAT 151 Class 3 Slide 29

Sample covariance Survival data (10)

[Figure: scatterplot of the survival data with points coloured by quadrant relative to $(\bar{X}, \bar{Y})$.]

The green observations are those for which $X_i$ and $Y_i$ are both larger than, or both smaller than, their respective means $(\bar{X}, \bar{Y})$. The red observations are those for which $X_i$ and $Y_i$ fall on opposite sides of their respective means. The green observations contribute positively to the sample covariance; the red observations contribute negatively. If we sum the contributions of the green and red observations, the result is approximately 2.238 (> 0), which suggests patients older than average tend to survive longer following diagnosis of the disease.
STAT 151 Class 3 Slide 30

Covariance

Like the sample mean and sample variance, the sample covariance is also a single number summary. It is a summary of the relationship between X and Y from a sample of pairs $(X_i, Y_i)$.

What if we use a joint PDF $f(x, y)$ to model the relationship between X and Y? In that case, we seek an equivalent of the sample covariance, which is the covariance:

$$cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

When X and Y are both discrete, such that the possible values of X are $a_1, \ldots, a_k$ and those of Y are $b_1, \ldots, b_l$,

$$cov(X, Y) = \sum_{i=1}^k \sum_{j=1}^l (a_i - \mu_X)(b_j - \mu_Y) P(X = a_i, Y = b_j)$$

cov(X, Y) is a single number summary of the relationship between X and Y. When cov(X, Y) > 0, X and Y tend to agree in their direction relative to their respective means; when cov(X, Y) < 0, they tend to disagree.
STAT 151 Class 3 Slide 31
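A sketch of the discrete covariance formula; the joint PMF below is made up purely for illustration and is not data from the slides:

```python
# cov(X, Y) from a hypothetical 2x2 joint PMF.
joint = {(0, 0): .4, (0, 1): .1, (1, 0): .1, (1, 1): .4}  # P(X = a, Y = b)

mu_x = sum(a * p for (a, b), p in joint.items())
mu_y = sum(b * p for (a, b), p in joint.items())
cov = sum((a - mu_x) * (b - mu_y) * p for (a, b), p in joint.items())
print(cov)  # 0.15 > 0: X and Y tend to move in the same direction
```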