ORF 245 Fundamentals of Statistics, Chapter 4: Great Expectations
Robert Vanderbei, Fall 2014
Slides last edited on October 20, 2014
http://www.princeton.edu/~rvdb

Definition: The expectation of a random variable is a measure of its average value.

Discrete Case: E(X) = \sum_i x_i p(x_i)

Caveat: If it is an infinite sum and the x_i's are both positive and negative, then the sum can fail to converge. We restrict our attention to cases where the sum converges absolutely: \sum_i |x_i| p(x_i) < \infty. Otherwise, we say that the expectation is undefined.

Continuous Case: E(X) = \int x f(x) dx

Corresponding Caveat: If \int |x| f(x) dx = \infty, we say that the expectation is undefined.
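
For example, for a fair six-sided die,

E(X) = \sum_{i=1}^{6} i \cdot \tfrac{1}{6} = \tfrac{21}{6} = 3.5,

a value the die never actually shows: the expectation is a probability-weighted average of the possible values, not necessarily one of them.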

Geometric Random Variable

Recall that a geometric random variable takes on positive integer values, 1, 2, ..., and that

p(k) = P(X = k) = q^{k-1} p, where q = 1 - p.

We compute:

E(X) = \sum_{k=1}^\infty k p q^{k-1} = p \sum_{k=1}^\infty k q^{k-1} = p \sum_{k=1}^\infty \frac{d}{dq} q^k = p \frac{d}{dq} \sum_{k=1}^\infty q^k = p \frac{d}{dq} \frac{q}{1-q} = p \frac{(1-q)(1) - q(-1)}{(1-q)^2} = \frac{p}{(1-q)^2} = \frac{1}{p}

(Isn't calculus fun!)
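
As a quick numerical sanity check, in the spirit of the Matlab snippets later in these slides, one can simulate many geometric draws and compare the empirical average with 1/p. This sketch assumes the Statistics Toolbox, whose geometric generator counts failures before the first success, so we add 1 to match the convention used here.

n = 100000; p = 0.3;
X = random('geo', p, [1 n]) + 1;   % shift to the support 1, 2, 3, ...
mean(X)                            % empirical average
1/p                                % theoretical expectation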

Poisson Random Variable

Recall that a Poisson random variable takes on nonnegative integer values, 0, 1, 2, ..., and that

p(k) = P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}

where \lambda is some positive real number. We compute:

E(X) = \sum_{k=0}^\infty k \frac{\lambda^k}{k!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{k=0}^\infty \frac{\lambda^k}{k!} = \lambda e^{-\lambda} e^{\lambda} = \lambda

We now see that \lambda is the mean.

Exponential Random Variable

Recall that an exponential random variable is a continuous random variable with

f(x) = \lambda e^{-\lambda x}, x \ge 0,

where \lambda > 0 is a fixed parameter. We compute:

E(X) = \int_0^\infty x \lambda e^{-\lambda x} dx = \frac{1}{\lambda} \int_0^\infty u e^{-u} du = \frac{1}{\lambda}

(the last integral being done using integration by parts).
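
For completeness, the integration by parts (differentiating the factor u and integrating e^{-u} du) gives

\int_0^\infty u e^{-u} du = \left[ -u e^{-u} \right]_0^\infty + \int_0^\infty e^{-u} du = 0 + 1 = 1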

Normal Random Variable

Recall that a normal random variable is a continuous random variable with

f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2 / 2\sigma^2}

We compute (substituting u = x - \mu):

E(X) = \int x \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2 / 2\sigma^2} dx
     = \int (u + \mu) \frac{1}{\sqrt{2\pi}\sigma} e^{-u^2 / 2\sigma^2} du
     = \int u \frac{1}{\sqrt{2\pi}\sigma} e^{-u^2 / 2\sigma^2} du + \mu \int \frac{1}{\sqrt{2\pi}\sigma} e^{-u^2 / 2\sigma^2} du
     = 0 + \mu \cdot 1 = \mu

The expected value of X is the mean \mu.

Cauchy Random Variable

Recall that a Cauchy random variable is a continuous random variable with

f(x) = \frac{1}{\pi(1 + x^2)}

We compute:

\int |x| f(x) dx = \int \frac{|x|}{\pi(1 + x^2)} dx = \infty

[Figure: Normal vs. Cauchy densities, f(x) plotted against x for the standard normal and the Cauchy density.]

The Cauchy density is symmetric about the origin, so it is tempting to say that the expectation is zero. But the expectation does not exist. The Cauchy distribution is said to have fat tails.

Let X_1, X_2, ... be independent random variables with the same distribution as X, and let S_n = \sum_{k=1}^n X_k. Usually, we expect that S_n / n \to E(X). It's not the case for Cauchy (see next slide).

Empirical Average

n = 5000;

figure(1);
X = random('norm', sqrt(2), 1, [1 n]);    % samples with mean sqrt(2)
S = cumsum(X)./(1:n);                     % running averages S_n/n
plot((1:n), S, 'k-');
xlabel('n'); ylabel('S_n/n');
title('S_n/n for Normal distribution');

figure(2);
U = random('unif', -pi/2, pi/2, [1 n]);
X = tan(U);                               % tangent of a uniform angle is Cauchy
S = cumsum(X)./(1:n);
plot((1:n), S, 'k-');
xlabel('n'); ylabel('S_n/n');
title('S_n/n for Cauchy distribution');

[Figures: S_n/n versus n. In the Normal case the running average settles near E(X) = \sqrt{2}; in the Cauchy case it keeps jumping and does not settle.]

Theorem 4.1.1 A

Let g(.) be some given function.

Discrete Case: E(g(X)) = \sum_{x_j} g(x_j) p(x_j)

Continuous Case: E(g(X)) = \int g(x) f(x) dx

Derivation (Discrete case): Let Y = g(X). Then

E(g(X)) = E(Y) = \sum_i y_i p_Y(y_i)

Let A_i = \{x_j : g(x_j) = y_i\}. Then p_Y(y_i) = \sum_{x_j \in A_i} p(x_j) and so

E(Y) = \sum_i y_i \sum_{x_j \in A_i} p(x_j) = \sum_i \sum_{x_j \in A_i} y_i p(x_j) = \sum_i \sum_{x_j \in A_i} g(x_j) p(x_j) = \sum_{x_j} g(x_j) p(x_j)

Note: Usually E(g(X)) \ne g(E(X)).
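
To see the Note concretely, take the fair-die example from earlier with g(x) = x^2: E(g(X)) = 91/6 \approx 15.17, while g(E(X)) = 3.5^2 = 12.25. A short Matlab check:

x = 1:6; p = ones(1,6)/6;          % pmf of a fair die
Eg = sum((x.^2).*p)                % E(g(X)) with g(x) = x^2, equals 91/6
gE = (sum(x.*p))^2                 % g(E(X)) = 3.5^2 = 12.25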

Theorem 4.1.1 B

Suppose that Y = g(X_1, X_2, ..., X_n) for some given function g(.).

Discrete Case: E(Y) = \sum_{x_1, x_2, ..., x_n} g(x_1, x_2, ..., x_n) p(x_1, x_2, ..., x_n)

Continuous Case: E(Y) = \int \cdots \int g(x_1, x_2, ..., x_n) f(x_1, x_2, ..., x_n) dx_n \cdots dx_2 dx_1

Derivation: Same as before.

Theorem 4.1.2 A

Theorem: E\left( a + \sum_{i=1}^n b_i X_i \right) = a + \sum_{i=1}^n b_i E(X_i)

Proof: We give the proof for the continuous case with n = 2. Other cases are similar.

E(Y) = \int \int (a + b_1 x_1 + b_2 x_2) f(x_1, x_2) dx_1 dx_2
     = a \int \int f(x_1, x_2) dx_1 dx_2 + b_1 \int \int x_1 f(x_1, x_2) dx_1 dx_2 + b_2 \int \int x_2 f(x_1, x_2) dx_1 dx_2
     = a + b_1 \int x_1 \left( \int f(x_1, x_2) dx_2 \right) dx_1 + b_2 \int x_2 \left( \int f(x_1, x_2) dx_1 \right) dx_2
     = a + b_1 \int x_1 f_{X_1}(x_1) dx_1 + b_2 \int x_2 f_{X_2}(x_2) dx_2
     = a + b_1 E(X_1) + b_2 E(X_2)

NOTE: In this class, an integral without limits is an integral from -\infty to \infty. It's not an indefinite integral.

Example 4.1.2 A

Consider a binomial random variable Y representing the number of successes in n independent trials, where each trial has success probability p. Its expectation is defined in terms of the probability mass function as

E(Y) = \sum_{k=0}^n k \binom{n}{k} p^k (1-p)^{n-k}

This sum is tricky to simplify. Here's an easier way. Let X_i denote the Bernoulli random variable that takes the value 1 if the i-th trial is a success and 0 otherwise. Then

Y = \sum_{i=1}^n X_i

and so

E(Y) = \sum_{i=1}^n E(X_i) = \sum_{i=1}^n p = np

Variance and Standard Deviation

Definition: The variance of a random variable X is defined as

\sigma^2 := Var(X) := E\left( (X - E(X))^2 \right)

The standard deviation, denoted by \sigma, is simply the square root of the variance.

Theorem: If Y = a + bX, then Var(Y) = b^2 Var(X).

Proof:

E\left( (Y - E(Y))^2 \right) = E\left( (a + bX - E(a + bX))^2 \right) = E\left( (a + bX - a - bE(X))^2 \right) = E\left( (bX - bE(X))^2 \right) = b^2 E\left( (X - E(X))^2 \right) = b^2 Var(X)

Bernoulli Distribution

Recall that q = 1 - p and

E(X) = 0 \cdot q + 1 \cdot p = p

Hence

Var(X) = E\left( (X - E(X))^2 \right) = (0 - p)^2 q + (1 - p)^2 p = p^2 q + q^2 p = pq(p + q) = pq

Important Note: EX^2 denotes E(X^2), which is not the same as (E(X))^2.

Normal Distribution

Recall that E(X) = \mu. Hence

Var(X) = E\left( (X - \mu)^2 \right) = \frac{1}{\sqrt{2\pi}\sigma} \int (x - \mu)^2 e^{-(x-\mu)^2 / 2\sigma^2} dx

Make a change of variables z = (x - \mu)/\sigma to get

Var(X) = \frac{\sigma^2}{\sqrt{2\pi}} \int z^2 e^{-z^2/2} dz

This last integral evaluates to \sqrt{2\pi}, a fact that can be checked using integration by parts with u = z and dv = everything else. Hence

Var(X) = \sigma^2

An Equivalent Alternate Formula for Variance

Var(X) = E(X^2) - (E(X))^2

Let \mu denote the expected value of X: \mu = E(X). Then

Var(X) = E\left( (X - \mu)^2 \right) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2
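
Applied to the fair-die example from before,

E(X^2) = \sum_{i=1}^{6} i^2 \cdot \tfrac{1}{6} = \tfrac{91}{6}, \qquad Var(X) = \tfrac{91}{6} - (3.5)^2 = \tfrac{35}{12} \approx 2.92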

Poisson Distribution

Let X be a Poisson random variable with parameter \lambda. Recall that E(X) = \lambda.

To compute the variance, we follow a slightly tricky path. First, we compute

E(X(X-1)) = \sum_{n=0}^\infty n(n-1) \frac{\lambda^n}{n!} e^{-\lambda} = \sum_{n=2}^\infty \frac{\lambda^n}{(n-2)!} e^{-\lambda} = \lambda^2 \sum_{n=0}^\infty \frac{\lambda^n}{n!} e^{-\lambda} = \lambda^2

Hence, E(X^2) = \lambda^2 + E(X) = \lambda^2 + \lambda and so

Var(X) = E(X^2) - (E(X))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda

Standard and Poor's 500 Daily Returns

Raw data: R_j, j = 1, 2, ..., n.

[Figure: Real data from S&P500, daily return plotted against days from start.]

Standard and Poor's 500 Daily Returns

\mu = E(R_i) \approx \sum_j R_j / n = 9.86 \times 10^{-4}, \qquad \sigma^2 = Var(R_i) \approx \sum_j (R_j - \mu)^2 / n = 0.0108

[Figure: Cumulative distribution function for the S&P500, the empirical F(x) versus x compared with the Normal(\mu, \sigma) CDF.]

Standard and Poor's 500 Value Over Time

[Figure: Value of investment over time, current value versus days from start for the S&P500 together with paths simulated from the same distribution.]

Standard and Poor's 500 Matlab Code

load -ascii 'sp500.txt'
[n m] = size(sp500);
R = sp500;
mu = sum(R)/n
sigma = std(R)

figure(1);
plot(R);
xlabel('Days from start'); ylabel('Return');
title('Real data from S&P500');

figure(2);
Rsort = sort(R);
x = (-400:400)/10000;
y = cdf('norm', x, mu, sigma);
plot(Rsort, (1:n)/n, 'r-');
hold on;
plot(x, y, 'k-');
hold off;
xlabel('x'); ylabel('F(x)');
title('Cumulative Distribution Function for S&P500');
legend('S&P500', 'Normal(\mu,\sigma)');

figure(3);
P = cumprod(1+R);
plot(P, 'r-');
hold on;
for i = 1:4
    RR = R(randi(n, [n 1]));     % resample the returns with replacement
    PP = cumprod(1+RR);
    plot(PP, 'k-');
end
xlabel('Days from start'); ylabel('Current Value');
title('Value of Investment over Time');
legend('S&P500', 'Simulated from Same Distribution');
hold off;

Standard and Poor's 500: The Data

The data file is called sp500.txt. It is 250 lines of plain text. Each line contains one number R_i. Here are the first 15 lines:

.033199973
-.00048403243
.022474383
-.0065553654
-.014074893
.019397096
-1.0780741e-05
-.0014122923
.0058298966
-.014425864
-.0039424103
-.014017057
-.015702278
-.010432392
.010223599

Covariance

Given two random variables, X and Y, let \mu_X = E(X) and \mu_Y = E(Y). The covariance between X and Y is defined as:

Cov(X, Y) = E((X - \mu_X)(Y - \mu_Y)) = E(XY) - \mu_X \mu_Y

Proof of the equality:

E((X - \mu_X)(Y - \mu_Y)) = E(XY - X\mu_Y - \mu_X Y + \mu_X \mu_Y) = E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y = E(XY) - \mu_X \mu_Y

Comment: If X and Y are independent, then E(XY) = E(X)E(Y) and so Cov(X, Y) = 0. The converse is not true.
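
To see why the converse fails, here is a small sketch: take X uniform on (-1, 1) and Y = X^2. Then Y is completely determined by X, yet Cov(X, Y) = E(X^3) - E(X)E(X^2) = 0 - 0 = 0.

n = 100000;
X = 2*rand(1, n) - 1;              % uniform on (-1, 1)
Y = X.^2;                          % fully dependent on X
mean(X.*Y) - mean(X)*mean(Y)       % sample covariance, close to 0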

Covariance/Variance of Linear Combinations

If U = a + \sum_{i=1}^n b_i X_i and V = c + \sum_{j=1}^m d_j Y_j, then

Cov(U, V) = \sum_{i=1}^n \sum_{j=1}^m b_i d_j Cov(X_i, Y_j)

If the X_i's are independent, then Cov(X_i, X_j) = 0 for i \ne j and so

Var\left( \sum_i X_i \right) = Cov\left( \sum_i X_i, \sum_i X_i \right) = \sum_i Cov(X_i, X_i) = \sum_i Var(X_i)

Variance of a Binomial RV

Recall our representation of a Binomial random variable Y as a sum of independent Bernoullis:

Y = \sum_{i=1}^n X_i

From this we see that

Var(Y) = \sum_i Var(X_i) = np(1-p)

Correlation Coefficient

The correlation coefficient between two random variables X and Y is denoted by \rho and defined as

\rho = \frac{Cov(X, Y)}{\sqrt{Var(X) Var(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}

Let's talk about units. Suppose that X represents a random spatial length measured in meters (m) and that Y represents a random time interval measured in seconds (s). Then the units of Cov(X, Y) are meter-seconds, Var(X) is measured in meters-squared, and Var(Y) has units of seconds-squared. Hence, \rho is unitless: the units in the numerator cancel with the units in the denominator.

One can show that

-1 \le \rho \le 1

always holds.

Conditional Expectation

The following formulas seem self-explanatory...

Discrete case: E(Y | X = x) = \sum_y y \, p_{Y|X}(y | x)

Continuous case: E(Y | X = x) = \int y f_{Y|X}(y | x) dy

Arbitrary function of Y: E(h(Y) | X = x) = \int h(y) f_{Y|X}(y | x) dy

Prediction

Let Y be a random variable. We'd like to give a single deterministic number to represent where this random variable sits on the real line. The expected value E(Y) is one choice that is quite reasonable if the distribution of Y is symmetric about this mean value. But many distributions are skewed, and in such cases the expected value might not be the best choice.

The real question is: how do we quantify what we mean by best choice? One answer to that question involves the mean squared error (MSE):

MSE(\alpha) = E\left( (Y - \alpha)^2 \right)

To find a good estimator, pick the value of \alpha that minimizes the MSE. To find this minimizer, we differentiate and set the derivative to zero:

\frac{d}{d\alpha} MSE(\alpha) = \frac{d}{d\alpha} E\left( (Y - \alpha)^2 \right) = E\left( \frac{d}{d\alpha} (Y - \alpha)^2 \right) = E\left( 2(Y - \alpha)(-1) \right)

Hence, we pick \alpha such that

0 = E(\alpha - Y) = \alpha - E(Y), i.e., \alpha = E(Y)

Conclusion: the mean minimizes the mean squared error.
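
A small numerical sketch of this conclusion: simulate a skewed random variable, evaluate the empirical MSE over a grid of candidate values alpha, and see where it bottoms out. (The exponential with mean 2 is just an illustrative choice; Matlab's 'exp' distribution is parameterized by its mean.)

n = 100000;
Y = random('exp', 2, [1 n]);                   % skewed data with mean 2
alpha = 0:0.05:5;
mse = arrayfun(@(a) mean((Y - a).^2), alpha);  % empirical MSE at each alpha
[~, k] = min(mse);
alpha(k)                                       % close to mean(Y), i.e. about 2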

Least Squares

Suppose we know from some underlying fundamental principle (say, physics) that some parameter y is related linearly to another parameter x:

y = \alpha + \beta x

but we don't know \alpha and \beta. We'd like to do experiments to determine them. A probabilistic model of the experiment has X and Y as random variables. Let's imagine we do the experiment over and over many times and have a good sense of the joint distribution of X and Y. We want to pick \alpha and \beta so as to minimize

E\left( (Y - \alpha - \beta X)^2 \right)

Again, we take derivatives and set them to zero. This time we have two derivatives:

\frac{\partial}{\partial\alpha} E\left( (Y - \alpha - \beta X)^2 \right) = E\left( \frac{\partial}{\partial\alpha} (Y - \alpha - \beta X)^2 \right) = -2 E(Y - \alpha - \beta X) = -2(\mu_Y - \alpha - \beta\mu_X) = 0

and

\frac{\partial}{\partial\beta} E\left( (Y - \alpha - \beta X)^2 \right) = E\left( \frac{\partial}{\partial\beta} (Y - \alpha - \beta X)^2 \right) = -2 E\left( (Y - \alpha - \beta X) X \right) = -2\left( E(XY) - \alpha E(X) - \beta E(X^2) \right) = 0

Least Squares Continued

We get two linear equations in two unknowns:

\alpha + \beta \mu_X = \mu_Y
\alpha \mu_X + \beta E(X^2) = E(XY)

Multiplying the first equation by \mu_X and subtracting it from the second equation, we get

\beta E(X^2) - \beta \mu_X^2 = E(XY) - \mu_X \mu_Y

This equation simplifies to

\beta \sigma_X^2 = \sigma_{XY}

and so

\beta = \frac{\sigma_{XY}}{\sigma_X^2} = \rho \frac{\sigma_Y}{\sigma_X}

Finally, substituting this expression into the first equation, we get

\alpha = \mu_Y - \rho \frac{\sigma_Y}{\sigma_X} \mu_X
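
Here is a sketch of these moment formulas on simulated data; the underlying line y = 1 + 2x plus noise is made up for the illustration, and the answer is compared against Matlab's polyfit.

n = 100000;
X = random('norm', 0, 1, [1 n]);
Y = 1 + 2*X + random('norm', 0, 0.5, [1 n]);   % hypothetical linear model with noise
sXY = mean(X.*Y) - mean(X)*mean(Y);            % sigma_XY
beta = sXY / var(X, 1);                        % slope = sigma_XY / sigma_X^2
alpha = mean(Y) - beta*mean(X);                % intercept
[alpha beta]
polyfit(X, Y, 1)                               % returns [slope intercept] for comparison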

Regression to the Mean

Suppose that a large statistics class has two midterms. Let X denote the score that a random student gets on the first midterm and let Y denote the same student's score on the second midterm. Based on prior use of these two exams, the instructor has figured out how to grade them so that the average and variance of the scores are the same:

\mu_X = \mu_Y = \mu, \qquad \sigma_X = \sigma_Y = \sigma

But those students who do well on the first midterm tend to do well on the second midterm, which is reflected in the fact that \rho > 0.

From the calculations on the previous slide, we can estimate how a student will do on the second midterm based on his/her performance on the first one. Our estimate, denoted \hat{Y}, is

\hat{Y} = \mu - \rho\mu + \rho X

We can rewrite this as

\hat{Y} - \mu = \rho(X - \mu)

In words, we expect the performance of the student on the second midterm to be closer by a factor of \rho to the average than was his/her score on the first midterm. This is a famous effect called regression to the mean.

Moment Generating Function

We will skip the definition and details of moment generating functions. However, we will cover some of the important examples of this section.

Sum of Poissons

Let X and Y be independent Poisson random variables with parameters \lambda and \mu, respectively. Let Z = X + Y. Let's compute the probability mass function:

P(Z = n) = P(X + Y = n)
         = \sum_{k=0}^n P(X = k, Y = n - k)
         = \sum_{k=0}^n P(X = k) P(Y = n - k)
         = \sum_{k=0}^n \frac{\lambda^k}{k!} e^{-\lambda} \frac{\mu^{n-k}}{(n-k)!} e^{-\mu}
         = e^{-(\lambda+\mu)} \sum_{k=0}^n \frac{\lambda^k}{k!} \frac{\mu^{n-k}}{(n-k)!}
         = e^{-(\lambda+\mu)} \frac{1}{n!} \sum_{k=0}^n \binom{n}{k} \lambda^k \mu^{n-k}
         = e^{-(\lambda+\mu)} \frac{(\lambda + \mu)^n}{n!}

Conclusion: The sum is Poisson with parameter \lambda + \mu.

The result can be extended to a sum of any number of independent Poisson random variables:

X_k \sim Poisson(\lambda_k) \implies \sum_k X_k \sim Poisson\left( \sum_k \lambda_k \right)
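
A quick empirical check of this conclusion, comparing the observed frequencies of Z = X + Y with the Poisson(\lambda + \mu) pmf (a sketch using the Statistics Toolbox):

n = 100000; lambda = 2; mu = 3;
X = random('poiss', lambda, [1 n]);
Y = random('poiss', mu, [1 n]);
Z = X + Y;
k = 0:10;
empirical = histc(Z, k) / n;           % observed frequencies of Z = 0, ..., 10
theoretical = pdf('poiss', k, lambda + mu);
[empirical; theoretical]               % the two rows should nearly agree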

Sum of Normals

Let X and Y be independent Normal(0,1) r.v.'s and Z = X + Y. Compute Z's cdf:

P(Z \le z) = P(X + Y \le z) = \int f(x) P(Y \le z - x) dx = \frac{1}{2\pi} \int e^{-x^2/2} \int_{-\infty}^{z-x} e^{-y^2/2} dy \, dx

Differentiating, we compute the density function for Z:

f_Z(z) = \frac{1}{2\pi} \int e^{-x^2/2} e^{-(z-x)^2/2} dx
       = \frac{1}{2\pi} \int e^{-x^2 + xz - z^2/2} dx
       = \frac{1}{2\pi} \int e^{-(x - z/2)^2 + z^2/4 - z^2/2} dx
       = \frac{1}{2\pi} e^{-z^2/4} \int e^{-(x - z/2)^2} dx
       = \frac{1}{2\sqrt{\pi}} e^{-z^2/4}

Conclusion: The sum is Normal with mean 0 and variance 2.

The result can be extended to a sum of any number of independent Normal random variables:

X_k \sim Normal(\mu_k, \sigma_k^2) \implies \sum_k X_k \sim Normal\left( \sum_k \mu_k, \sum_k \sigma_k^2 \right)

Sum of Gammas

Let X and Y be independent r.v.'s having Gamma distributions with parameters (n, \lambda) and (1, \lambda), respectively, and let Z = X + Y. Compute Z's cdf:

P(Z \le z) = P(X + Y \le z) = \int_0^z f(x) P(Y \le z - x) dx = \int_0^z \frac{\lambda^n}{(n-1)!} x^{n-1} e^{-\lambda x} \int_0^{z-x} \lambda e^{-\lambda y} dy \, dx

Differentiating, we compute the density function for Z:

f_Z(z) = \int_0^z \frac{\lambda^n}{(n-1)!} x^{n-1} e^{-\lambda x} \lambda e^{-\lambda(z-x)} dx + \frac{\lambda^n}{(n-1)!} z^{n-1} e^{-\lambda z} \int_0^0 \lambda e^{-\lambda y} dy
       = \frac{\lambda^{n+1}}{(n-1)!} e^{-\lambda z} \int_0^z x^{n-1} dx
       = \frac{\lambda^{n+1} z^n}{n!} e^{-\lambda z}

Conclusion: The sum is Gamma with parameters (n + 1, \lambda).

Induction: A Gamma random variable with parameters (n, \lambda) can always be interpreted as a sum of n independent exponential r.v.'s with parameter \lambda.
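
The induction statement can also be checked numerically: sum n independent exponentials and compare with Gamma(n, \lambda). This sketch uses the Statistics Toolbox, where the exponential is parameterized by its mean 1/\lambda and the gamma by shape n and scale 1/\lambda.

N = 100000; n = 5; lambda = 2;
E = random('exp', 1/lambda, [n N]);    % n independent Exp(lambda) draws per column
Z = sum(E, 1);                         % each column sum should be Gamma(n, lambda)
mean(Z)                                % theoretical mean n/lambda = 2.5
var(Z)                                 % theoretical variance n/lambda^2 = 1.25
cdf('gam', 1.0, n, 1/lambda)           % Gamma cdf at z = 1 ...
mean(Z <= 1.0)                         % ... versus the empirical frequency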

Approximate Methods

We will skip this section.