ACM 116: Lectures 3-4

Agenda: joint distributions; the multivariate normal distribution; conditional distributions; independent random variables; conditional distributions and Monte Carlo (rejection sampling); variance of a random variable; covariance; conditional expectation.

Joint distributions

Two random variables X and Y; we are interested in their joint outcome: (X, Y) = (x, y) means X = x and Y = y.

[Diagram: grid of joint outcomes, axes labeled "Outcomes for X" and "Outcomes for Y".]

Example: a web server. X = # customers hitting your server in the next hour; Y = # customers in the next 10 minutes, or # customers who will purchase an item in the next hour.

Joint frequency function

X, Y discrete r.v.'s taking on values $x_1, x_2, \dots$ and $y_1, y_2, \dots$ respectively.

Joint frequency (distribution) function:
$$p(x_i, y_j) = P(X = x_i, Y = y_j)$$

Marginal probabilities:
$$P(X = x_i) = \sum_j P(X = x_i, Y = y_j)$$

Why? The addition rule.

Joint frequency function: several random variables

Several random variables: similar story. $X_1, \dots, X_m$ r.v.'s defined on the same sample space,
$$p(x_1, \dots, x_m) = P(X_1 = x_1, \dots, X_m = x_m).$$
Marginals are obtained by summing out the other variables:
$$p_{X_1}(x_1) = \sum_{x_2, \dots, x_m} p(x_1, x_2, \dots, x_m), \qquad p_{X_1, X_2}(x_1, x_2) = \sum_{x_3, \dots, x_m} p(x_1, x_2, x_3, \dots, x_m).$$

Example: the multinomial distribution

$n$ fixed independent experiments, each resulting in one of $r$ possible outcomes, with probabilities $p_1, \dots, p_r$. Let $X_i$, $1 \le i \le r$, be the number of outcomes of type $i$. Joint distribution of the $X_i$'s?
$$P(X_1 = x_1, \dots, X_r = x_r) = \frac{n!}{x_1! \cdots x_r!}\, p_1^{x_1} \cdots p_r^{x_r}$$
Why?

Example: the multinomial distribution (continued)

Any particular sequence of experiments with $(X_1 = x_1, \dots, X_r = x_r)$ has probability $p_1^{x_1} \cdots p_r^{x_r}$ (all such sequences are equally likely). Indeed: the first $x_1$ outcomes of type 1 contribute $p_1^{x_1}$, the next $x_2$ of type 2 contribute $p_2^{x_2}$, ..., the last $x_r$ of type $r$ contribute $p_r^{x_r}$.

How many such configurations?
$$\binom{n}{x_1}\binom{n - x_1}{x_2} \cdots \binom{x_r}{x_r} = \frac{n!}{x_1! \cdots x_r!}$$
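The multinomial formula can be checked by direct simulation. Below is a minimal MATLAB sketch (not from the lecture; n, p, and the target counts x are illustrative choices) that estimates $P(X_1 = 2, X_2 = 2, X_3 = 1)$ for $n = 5$ experiments with $p = (0.5, 0.3, 0.2)$ and compares it with the formula.

% Monte Carlo check of the multinomial formula (illustrative parameters).
n = 5; p = [0.5 0.3 0.2]; x = [2 2 1];           % n experiments, r = 3 outcomes
pmf = factorial(n) / prod(factorial(x)) * prod(p.^x);
nsim = 1e5; hits = 0;
edges = cumsum(p);                               % thresholds for the 3 outcomes
for k = 1:nsim
    u = rand(1, n);
    counts = [sum(u <= edges(1)), ...
              sum(u > edges(1) & u <= edges(2)), ...
              sum(u > edges(2))];
    hits = hits + isequal(counts, x);
end
fprintf('formula: %.4f   simulation: %.4f\n', pmf, hits/nsim);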

Joint distributions: the continuous case

X and Y continuous r.v.'s. Joint density function $f(x, y)$ such that
$$P((X, Y) \in A) = \iint_A f(x, y)\, dx\, dy$$
for any reasonable set $A$.

[Figure: $P((X, Y) \in A)$ equals the volume under the graph of $f$ above the set $A$.]

Interpretation

$$P(x \le X \le x + dx,\; y \le Y \le y + dy) \approx f(x, y)\, dx\, dy$$

Marginal density of X:
$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy$$

Joint distributions: the continuous case. Example

Uniform distribution on the disk of radius 1:
$$f(x, y) = \begin{cases} \frac{1}{\pi} & \text{if } x^2 + y^2 \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Let R be the distance from the point (X, Y) to the origin. Density of R?

Joint distributions: the continuous case (continued)

R: distance from the point to the origin. Density of R? X: first coordinate. Density of X?

For $0 \le r \le 1$,
$$P(R \le r) = \frac{\pi r^2}{\pi} = r^2, \qquad f_R(r) = \begin{cases} 2r & \text{if } 0 \le r \le 1 \\ 0 & \text{otherwise} \end{cases}$$

For the first coordinate,
$$f_X(x) = \int f_{XY}(x, y)\, dy = \int_{-\sqrt{1 - x^2}}^{\sqrt{1 - x^2}} \frac{1}{\pi}\, dy = \frac{2}{\pi}\sqrt{1 - x^2},$$
so
$$f_X(x) = \begin{cases} \frac{2}{\pi}\sqrt{1 - x^2} & \text{if } -1 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
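As a quick sanity check (not part of the slides), these densities can be verified by sampling uniformly from the disk, e.g. by rejection from the enclosing square, in MATLAB:

% Monte Carlo check of f_R(r) = 2r for a point uniform on the unit disk.
nsim = 1e5;
pts = 2*rand(nsim, 2) - 1;                 % uniform on the square [-1, 1]^2
pts = pts(sum(pts.^2, 2) <= 1, :);         % keep only points inside the disk
R = sqrt(sum(pts.^2, 2));
r = 0.5;
fprintf('P(R <= %.1f): empirical %.4f, formula r^2 = %.4f\n', r, mean(R <= r), r^2);
fprintf('E(R): empirical %.4f, exact 2/3 = %.4f\n', mean(R), 2/3);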

Bivariate normal density

Five parameters: $-\infty < \mu_x, \mu_y < \infty$, $0 < \sigma_x, \sigma_y < \infty$, $-1 \le \rho \le 1$. Extensively used in modeling. Galton, late 19th century: modeling of the heights of fathers and sons, e.g. X = father's height, Y = son's height.

Density: with $\mathbf{x} = (x, y)$, $\boldsymbol{\mu} = (\mu_x, \mu_y)$ the vector of means, and
$$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}$$
the covariance matrix,
$$f_{XY}(x, y) = \frac{1}{2\pi\sqrt{\det(\Sigma)}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$$

Bivariate normal density (continued)

Level sets = ellipses.

Interesting calculation: the marginals are $X \sim N(\mu_x, \sigma_x^2)$ and $Y \sim N(\mu_y, \sigma_y^2)$.

Independent random variables

Two random variables are independent if
$$P(X \in A, Y \in B) = P(X \in A)\,P(Y \in B)$$
for any subsets A and B (a condition that is complicated to check directly).

Discrete r.v.'s: independence iff $P(X = x, Y = y) = P(X = x)\,P(Y = y)$ for all x, y.

Continuous r.v.'s: independence iff $f_{XY}(x, y) = f_X(x)\,f_Y(y)$.

Example: node of a communication network

If two packets of information arrive within time τ of each other, they collide and have to be retransmitted. The arrival times are independent and uniform on [0, T]. What is the probability that they collide?

Independent random variables (continued)

$T_1, T_2$ independent $U[0, T]$. Joint distribution:
$$f(t_1, t_2) = \frac{1}{T^2},$$
the uniform distribution over the square $[0, T]^2$. A collision occurs when $|T_1 - T_2| \le \tau$; the complementary region consists of two triangles of side $T - \tau$, so
$$P(\text{Collision}) = 1 - \left(1 - \frac{\tau}{T}\right)^2 = 2\frac{\tau}{T} - \left(\frac{\tau}{T}\right)^2 = \frac{\tau}{T}\left(2 - \frac{\tau}{T}\right)$$
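A quick Monte Carlo check of the collision probability (a sketch, not from the slides; T and tau are illustrative values):

% Estimate P(|T1 - T2| <= tau) for T1, T2 independent U[0, T].
T = 1; tau = 0.1; nsim = 1e6;
t1 = T*rand(nsim, 1); t2 = T*rand(nsim, 1);
p_sim = mean(abs(t1 - t2) <= tau);
p_formula = (tau/T) * (2 - tau/T);
fprintf('simulation: %.4f   formula: %.4f\n', p_sim, p_formula);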

Conditional distributions

Discrete case: X and Y jointly discrete r.v.'s. The conditional probability of $X = x_i$ given $Y = y_j$ is
$$P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)}.$$
Notation: $p_{X \mid Y}$.

Continuous case: X and Y jointly continuous r.v.'s. The conditional density of Y given X is defined to be
$$f_{Y \mid X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}.$$
NB:
$$P(y \le Y \le y + dy \mid x \le X \le x + dx) \approx \frac{f_{XY}(x, y)\, dx\, dy}{f_X(x)\, dx} = \frac{f_{XY}(x, y)\, dy}{f_X(x)}$$

Conditional distributions (continued)

Independence: $f_{Y \mid X} = f_Y$.

Joint density = conditional × marginal:
$$f_{XY}(x, y) = f_{Y \mid X}(y \mid x)\, f_X(x)$$

Law of total probability:
$$f_Y(y) = \int f_{Y \mid X}(y \mid x)\, f_X(x)\, dx$$

NB: $p_{X \mid Y}(x \mid Y = y)$ is a distribution: $p_{X \mid Y}(x \mid Y = y) \ge 0$ and $\sum_x p_{X \mid Y}(x \mid Y = y) = 1$, i.e. a conditional distribution.

Example: imperfect particle detector

The detector detects each incoming particle with probability p only (independently). Let N be the true number of particles, which is Poisson distributed:
$$P(N = n) = e^{-\lambda}\frac{\lambda^n}{n!}.$$
Let X be the detected number of particles. What is the distribution of X?

Conditional distributions: the detector example

Given $N = n$, $X$ is binomial: $p_{X \mid N = n} \sim \text{Binomial}(n, p)$. Then
$$P(X = k) = \sum_{n=0}^{\infty} P(N = n)\, P(X = k \mid N = n) = \sum_{n \ge k} \binom{n}{k} p^k (1 - p)^{n - k}\, e^{-\lambda}\frac{\lambda^n}{n!}$$
$$= \frac{(\lambda p)^k}{k!}\, e^{-\lambda} \sum_{n \ge k} \frac{\big(\lambda(1 - p)\big)^{n - k}}{(n - k)!} = \frac{(\lambda p)^k}{k!}\, e^{-\lambda}\, e^{\lambda(1 - p)} = e^{-\lambda p}\frac{(\lambda p)^k}{k!},$$
so $X \sim \text{Poi}(\lambda p)$.
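The thinning result $X \sim \text{Poi}(\lambda p)$ is easy to confirm by simulation. A minimal MATLAB sketch (assuming the Statistics Toolbox for poissrnd and binornd; lambda and p are illustrative values):

% Monte Carlo check that the detected count is Poisson(lambda*p).
lambda = 4; p = 0.3; nsim = 1e5;
N = poissrnd(lambda, nsim, 1);       % true number of particles
X = binornd(N, p);                   % detected number of particles
fprintf('mean of X: %.3f (expected %.3f)\n', mean(X), lambda*p);
fprintf('var  of X: %.3f (expected %.3f)\n', var(X), lambda*p);
fprintf('P(X = 0): %.4f (expected %.4f)\n', mean(X == 0), exp(-lambda*p));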

Conditional distributions. Example: bivariate normal density

If X and Y have a bivariate normal distribution, the distribution of $Y \mid X = x$ is normal too. Linear regression!
$$Y \mid X = x \;\sim\; N\!\left(\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x),\; (1 - \rho^2)\sigma_y^2\right)$$
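The conditional-mean formula (the regression line) can be illustrated numerically by sampling from a bivariate normal and averaging Y over samples with X close to a fixed x. A sketch (assuming the Statistics Toolbox for mvnrnd; the parameter values are illustrative):

% Check E(Y | X = x0) = mu_y + rho*(sigma_y/sigma_x)*(x0 - mu_x).
mux = 0; muy = 1; sx = 2; sy = 1; rho = 0.6;
Sigma = [sx^2, rho*sx*sy; rho*sx*sy, sy^2];
Z = mvnrnd([mux muy], Sigma, 2e5);
x0 = 1;
sel = abs(Z(:,1) - x0) < 0.05;                   % samples with X near x0
fprintf('E(Y | X = %.1f): simulation %.3f, theory %.3f\n', ...
        x0, mean(Z(sel, 2)), muy + rho*(sy/sx)*(x0 - mux));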

Sampling & Monte Carlo

Let f be a density function we wish to sample from. Suppose f is nonzero on an interval [a, b] and zero outside the interval (a and b may be infinite). Let M be a function such that $M(x) \ge f(x)$ on [a, b], and let
$$m(x) = \frac{M(x)}{\int_a^b M(x)\, dx}.$$
The idea is to choose M so that it is easy to generate random variables from m. If [a, b] is finite, m can be chosen to be the uniform distribution.

Rejection sampling algorithm

Step 1: Generate T with the density m.
Step 2: Generate U, uniform on [0, 1] and independent of T. If $M(T)\,U \le f(T)$, accept: set X = T. Otherwise reject and go to Step 1.

[Figure: graph of f on [a, b] with the envelope M; points under the curve are accepted, points above are rejected.]

Remark: the algorithm is efficient if it accepts with high probability, i.e. if M is close to f.

Why does this work?

We want to show $P(X \in A) = \int_A f(t)\, dt$. Write $I = \int_a^b M(t)\, dt$, so that $m(t) = M(t)/I$. We have
$$P(X \in A) = P(T \in A \mid \text{Accept}) = \frac{P(T \in A \text{ and Accept})}{P(\text{Accept})}.$$
Conditioning on T = t,
$$P(T \in A \text{ and Accept}) = \int_a^b P(T \in A \text{ and Accept} \mid T = t)\, m(t)\, dt = \int_A P\big(U \le f(t)/M(t)\big)\, m(t)\, dt = \int_A \frac{f(t)}{M(t)}\, m(t)\, dt = \frac{1}{I}\int_A f(t)\, dt.$$
Similarly,
$$P(\text{Accept}) = \int_a^b P(\text{Accept} \mid T = t)\, m(t)\, dt = \int_a^b \frac{f(t)}{M(t)}\, m(t)\, dt = \frac{1}{I}\int_a^b f(t)\, dt = \frac{1}{I}.$$
Dividing the two gives $P(X \in A) = \int_A f(t)\, dt$.

Example

Suppose we want to sample from a density f supported on [0, 1] whose graph is shown below.

[Figure: plot of the density f(x) on [0, 1], titled "Density f"; the maximum of f is about 1.8.]

In this case we let M be the maximum of f over the interval [0, 1], namely
$$M(x) = \max_{0 \le x \le 1} f(x) \quad \text{(a constant)},$$
so that m is the uniform density over the interval [0, 1].

Implementation

N = 1000;
t = (1:N)/N;
M = max(f(t));              % constant bound: the maximum of f on (0, 1]
n = 2000;
x = rejection_sampling(n, @f, M);

This routine will draw n i.i.d. samples from the density f:

function x = rejection_sampling(n, fun, M)
% Rejection sampling from the density fun on [0, 1] with constant bound M
x = [];
for k = 1:n
    OK = 0;
    while not(OK)
        T = rand(1);             % Generate T, uniform on [0, 1]
        U = rand(1);             % Generate U, uniform on [0, 1]
        if (M*U <= fun(T))
            OK = 1;
            x = [x T];           % Accept
        end
    end
end
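The slides show only the graph of the example density f, not its formula. To make the routine runnable end to end, one hypothetical choice with a similar shape (a peak of about 1.78 near x = 1/3) is the Beta(2, 3) density; this is only an illustrative stand-in, saved as f.m:

function y = f(x)
% Hypothetical example density on [0, 1] (not necessarily the lecture's f):
% the Beta(2, 3) density, f(x) = 12 x (1 - x)^2.
y = 12 * x .* (1 - x).^2;
end

With this f.m on the path, the implementation script above runs as written, and the histogram of the output should reproduce the shape of f.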

Visualization of the results

[Figure 1: Histogram of the sampled data (frequency vs. x on [0, 1]), sample size = 5000.]

Variance and Standard Deviation

The expected value can be viewed as an indication of the central value of the density or frequency function. The variance or standard deviation gives an indication of the dispersion of the probability distribution.

The variance of a random variable X is the mean square deviation from $E[X] = \mu$:
$$\mathrm{Var}(X) = E[(X - \mu)^2] = E[(X - E[X])^2].$$
The standard deviation is the square root of the variance, $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$.

We often write $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}(X)$, so that $\sigma = \mathrm{SD}(X)$. We may also calculate the variance as
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2.$$

Examples

Bernoulli: $X \sim \mathrm{Ber}(p)$:
$$\mathrm{Var}(X) = p(1 - p)$$

Normal: $X \sim N(\mu, \sigma^2)$ (substituting $u = (x - \mu)/\sigma$):
$$\mathrm{Var}(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} (x - \mu)^2\, e^{-(x - \mu)^2/(2\sigma^2)}\, dx = \frac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u^2 e^{-u^2/2}\, du = \sigma^2.$$

Remark: if $Y = a + bX$ then $\mathrm{Var}(Y) = b^2\,\mathrm{Var}(X)$.

Example: X standard normal ($\mu = 0$, $\sigma = 1$) and $Y = \mu + \sigma X \sim N(\mu, \sigma^2)$. Then $\mathrm{Var}(Y) = \sigma^2\,\mathrm{Var}(X) = \sigma^2$.

Covariance and correlation

The variance is a measure of the variability of a r.v.; the covariance is a measure of the joint variability of two r.v.'s. The covariance of two jointly distributed r.v.'s (X, Y) is given by
$$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[(X - \mu_X)(Y - \mu_Y)].$$
Another formula:
$$\mathrm{Cov}(X, Y) = E[XY] - E[X]\,E[Y].$$

Interpretation

Positive covariance: when X is larger than its mean, Y tends to be larger than its mean as well; when X is smaller than its mean, Y tends to be smaller than its mean as well.

Negative covariance: when X is larger than its mean, Y tends to be smaller than its mean; when X is smaller than its mean, Y tends to be larger than its mean.

Properties

1. $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
2. For two r.v.'s X and Y: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + 2\,\mathrm{Cov}(X, Y) + \mathrm{Var}(Y)$ (checked numerically in the sketch after this list).
3. If X and Y are independent, then $\mathrm{Cov}(X, Y) = 0$ (since $E[XY] = E[X]\,E[Y]$).
4. Let $X_1, X_2, \dots, X_m$ be m r.v.'s and $Y_1, Y_2, \dots, Y_n$ be n r.v.'s, and let $S_1 = \sum_{i=1}^m X_i$ and $S_2 = \sum_{i=1}^n Y_i$. Then
$$\mathrm{Var}(S_1) = \mathrm{Var}\Big(\sum_{i=1}^m X_i\Big) = \sum_{i=1}^m \mathrm{Var}(X_i) + 2\sum_{i < j} \mathrm{Cov}(X_i, X_j), \qquad \mathrm{Cov}(S_1, S_2) = \sum_{i=1}^m \sum_{j=1}^n \mathrm{Cov}(X_i, Y_j).$$
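A simulation check of property 2 in MATLAB (a sketch; the correlated pair (X, Y) below is an illustrative construction):

% Check Var(X + Y) = Var(X) + 2 Cov(X, Y) + Var(Y) on simulated data.
nsim = 1e6;
Z1 = randn(nsim, 1); Z2 = randn(nsim, 1);
X = Z1;                          % standard normal
Y = 0.5*Z1 + Z2;                 % correlated with X
C = cov(X, Y);                   % 2x2 sample covariance matrix
lhs = var(X + Y);
rhs = var(X) + 2*C(1,2) + var(Y);
fprintf('Var(X+Y) = %.4f   Var(X) + 2Cov(X,Y) + Var(Y) = %.4f\n', lhs, rhs);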

5. Suppose $X_1, X_2, \dots, X_m$ are independent:
$$\mathrm{Var}(X_1 + \dots + X_m) = \mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_m).$$

Example: the variance of the binomial

$X \sim \mathrm{Bin}(n, p)$. X is the sum of n independent Bernoullis, $X = I_1 + \dots + I_n$, and therefore
$$\mathrm{Var}(X) = \sum_{i=1}^n \mathrm{Var}(I_i) = \sum_{i=1}^n p(1 - p) = np(1 - p).$$

Correlation

The correlation between two r.v.'s X and Y is defined by
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}.$$
The correlation is a dimensionless quantity. The correlation $\rho$ between X and Y obeys $-1 \le \rho \le 1$. It is a measure of the strength of the linear relationship between X and Y: $\rho = \pm 1$ iff $Y = aX + b$ for some constants a and b.

Remark: if (X, Y) is bivariate normal, the correlation between X and Y is the parameter $\rho$ and $\mathrm{Cov}(X, Y) = \rho\,\sigma_X\,\sigma_Y$.

Conditional expectation

The conditional expectation of Y given X = x is the mean of the conditional distribution of Y given X = x. For example, in the discrete case,
$$E[Y \mid X = x] = \sum_y y\, p_{Y \mid X}(y \mid x),$$
and more generally the conditional expectation of h(Y) given X = x is
$$E[h(Y) \mid X = x] = \sum_y h(y)\, p_{Y \mid X}(y \mid x).$$

Example

$X \sim \mathrm{Binomial}(n, p)$. Set $m \le n$ and let Y be the number of successes in the first m trials. What are the conditional distribution and mean of Y given X = x?

Conditional distribution (hypergeometric):
$$P(Y = y \mid X = x) = \frac{\binom{m}{y}\binom{n - m}{x - y}}{\binom{n}{x}}.$$

Conditional mean: $Y = I_1 + \dots + I_m$ and
$$E(Y \mid X = x) = E(I_1 \mid X = x) + \dots + E(I_m \mid X = x) = P(I_1 = 1 \mid X = x) + \dots + P(I_m = 1 \mid X = x) = \frac{x}{n} + \dots + \frac{x}{n} = \frac{m}{n}x.$$
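The conditional mean $E(Y \mid X = x) = \frac{m}{n}x$ can be checked by simulating the n trials directly. A MATLAB sketch (n, m, p, and x are illustrative values):

% Monte Carlo check of E(Y | X = x) = (m/n) x for the binomial example.
n = 10; m = 4; p = 0.3; x = 6; nsim = 2e5;
trials = rand(nsim, n) < p;          % nsim runs of n Bernoulli(p) trials
X = sum(trials, 2);                  % total number of successes
Y = sum(trials(:, 1:m), 2);          % successes in the first m trials
sel = (X == x);
fprintf('E(Y | X = %d): simulation %.3f, formula (m/n)x = %.3f\n', ...
        x, mean(Y(sel)), (m/n)*x);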

Conditional expectation as a random variable

Assuming that the conditional expectation of Y given X = x exists for every x, it is a well-defined function of X and, hence, a random variable. (The expectation of Y given X = x is a function of x: $E[Y \mid X = x] = g(x)$; then $E[Y \mid X] = g(X)$ is a random variable.)

Example: $E(Y \mid X) = \frac{m}{n}X$.

This random variable has an expectation and a variance (provided the relevant sums or integrals converge absolutely, etc.). Its expectation is $E[E(Y \mid X)]$.

Iterated Expectation

Theorem: $E(Y) = E[E(Y \mid X)]$.

Interpretation: the expectation of Y can be calculated by first conditioning on X, finding $E(Y \mid X)$, and then averaging this quantity over X:
$$E[Y] = \sum_x E[Y \mid X = x]\, p_X(x), \qquad \text{where } E[Y \mid X = x] = \sum_y y\, p_{Y \mid X}(y \mid x).$$

Example: $E(Y \mid X) = \frac{m}{n}X$ and $E(X) = np$ give $E(Y) = \frac{m}{n}E(X) = mp$.

Proof

We need to show
$$E(Y) = \sum_x E(Y \mid X = x)\, P(X = x), \qquad \text{where } E(Y \mid X = x) = \sum_y y\, p_{Y \mid X}(y \mid x).$$
We have
$$\sum_x E(Y \mid X = x)\, p_X(x) = \sum_y y \sum_x p_{Y \mid X}(y \mid x)\, p_X(x) = \sum_y y\, p_Y(y) = E(Y).$$

Random Sums

Consider sums of the type
$$T = \sum_{i=1}^{N} X_i,$$
where N is a r.v. with finite expectation $E(N) < \infty$, and the $X_i$'s are independent of N with common mean $E[X_i] = E[X]$ for all i.

Such sums arise in a variety of applications:
- An insurance company might receive N claims in a given period of time, and the amounts of the individual claims may be modeled as r.v.'s $X_1, X_2, \dots$
- N is the number of jobs in a single-server queue and $X_i$ the service time for the i-th job; T is the time to serve all the jobs in the queue.

Random Sums (continued)

$$E(T) = \sum_{\text{all } n} E(T \mid N = n)\, P(N = n), \qquad E(T \mid N = n) = E\Big(\sum_{i=1}^{n} X_i\Big) = n\,E(X),$$
i.e.
$$E(T) = \sum_n n\,E(X)\, P(N = n) = E(N)\,E(X).$$
In other words, the average time to complete N jobs when N is random is the average value of N times the average time to complete one job.
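The identity $E(T) = E(N)\,E(X)$ is easy to confirm by simulation, e.g. with a Poisson number of exponential claims. A sketch (assuming the Statistics Toolbox for poissrnd and exprnd; lambda and mu are illustrative values):

% Monte Carlo check of E(T) = E(N) E(X) for a random sum.
lambda = 5;                 % E(N) for N ~ Poisson(lambda)
mu = 2;                     % E(X) for exponential claim sizes with mean mu
nsim = 1e5;
T = zeros(nsim, 1);
for k = 1:nsim
    N = poissrnd(lambda);
    T(k) = sum(exprnd(mu, N, 1));    % sum of N i.i.d. claim sizes
end
fprintf('E(T): simulation %.3f, formula E(N)E(X) = %.3f\n', mean(T), lambda*mu);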

The Moment Generating Function

The moment generating function (mgf) of a random variable X is given by
$$M(t) = E[e^{tX}].$$
(In the continuous case, $M(t) = \int e^{tx} f(x)\, dx$, essentially the Laplace transform of f.)

The mgf may not exist. If X is Cauchy distributed, with $f(x) = \frac{1}{\pi(1 + x^2)}$, the mgf does not exist for any $t \neq 0$. The mgf of a normal random variable exists for all t.

Important property: if the mgf exists for t in an open interval containing zero, it uniquely determines the probability distribution.

Why the Name MGF?

$$e^{tX} = 1 + tX + \frac{t^2 X^2}{2!} + \dots + \frac{t^r X^r}{r!} + \dots$$
$$E(e^{tX}) = 1 + t\,E(X) + \frac{t^2 E(X^2)}{2!} + \dots + \frac{t^r E(X^r)}{r!} + \dots$$

Suppose that the mgf exists in an open interval containing 0. Then
$$M(0) = 1, \quad M'(0) = E(X), \quad \dots, \quad M^{(r)}(0) = E(X^r).$$
This may be useful to compute all the moments of a distribution.

Example: mgf of the Poisson distribution

$$M(t) = \sum_{k \ge 0} e^{tk}\, e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k \ge 0} \frac{(e^t \lambda)^k}{k!} = e^{-\lambda} e^{\lambda e^t}.$$
This gives
$$M'(t) = \lambda e^t M(t), \qquad M''(t) = \lambda e^t M(t) + \lambda^2 e^{2t} M(t).$$
It follows that if X is Poisson,
$$E(X) = \lambda, \qquad E(X^2) = \lambda^2 + \lambda, \qquad \mathrm{Var}(X) = \lambda.$$
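The closed form can be checked numerically by truncating the defining sum. A MATLAB sketch (lambda, t, and the truncation point are illustrative):

% Numerical check of the Poisson mgf M(t) = exp(-lambda) exp(lambda e^t).
lambda = 3; t = 0.5; kmax = 100;
k = 0:kmax;
M_sum = sum(exp(t*k) .* exp(-lambda) .* lambda.^k ./ factorial(k));
M_closed = exp(-lambda) * exp(lambda*exp(t));
fprintf('M(%.1f): truncated sum %.6f, closed form %.6f\n', t, M_sum, M_closed);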

Mgf of Sums of Independent Random Variables

Very useful property: if X and Y are independent r.v.'s with mgf's $M_X$ and $M_Y$, and $S = X + Y$, then
$$M_S(t) = M_X(t)\, M_Y(t)$$
on the common interval where both mgf's exist.

Why?
$$M_S(t) = E\big(e^{t(X + Y)}\big) = E\big(e^{tX} e^{tY}\big) = E\big(e^{tX}\big)\, E\big(e^{tY}\big) = M_X(t)\, M_Y(t).$$
This extends in the obvious way to sums of more than two independent random variables.

Example

The sum of two independent Poisson r.v.'s is a Poisson r.v.: if $X \sim \mathrm{Poi}(\lambda)$ and $Y \sim \mathrm{Poi}(\mu)$, then $X + Y \sim \mathrm{Poi}(\lambda + \mu)$. Indeed,
$$M_{X+Y}(t) = e^{-\lambda} e^{\lambda e^t}\, e^{-\mu} e^{\mu e^t} = e^{-(\lambda + \mu)} e^{(\lambda + \mu) e^t},$$
and this is the mgf of a Poisson r.v. with parameter $\lambda + \mu$.