ECON 535 Class Notes
Review of Probability and Distribution Theory

1 Random Variables

Definition. Let c represent an element of the sample space C of a random experiment, c ∈ C. A random variable is a function X = X(c) that assigns one and only one real number to each element of C. An outcome of X is denoted x.

Example. Single Coin Toss

$C = \{c_1 = T,\ c_2 = H\}$
$X(c) = 0$ if $c = T$
$X(c) = 1$ if $c = H$

1.1 Probability Density Function (pdf)

Two types:

1. Discrete pdf. A function f(x) such that $f(x) \ge 0$ and $\sum_x f(x) = 1$.
2. Continuous pdf. A function f(x) such that $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

See MATLAB example #1 for an example to calculate the area under a pdf.

1. $\Pr(X = x) = f(x)$ in the discrete case, and $\Pr(X = x) = 0$ in the continuous case.
2. $\Pr(a \le X \le b) = \int_{a}^{b} f(x)\,dx$.

1.2 Cumulative Distribution Function (cdf)

Two types:

1. Discrete cdf. A function F(x) such that $F(x) = \sum_{t \le x} f(t)$.
2. Continuous cdf. A function F(x) such that $F(x) = \int_{-\infty}^{x} f(t)\,dt$.

1. $F(b) - F(a) = \int_{-\infty}^{b} f(t)\,dt - \int_{-\infty}^{a} f(t)\,dt$ where $b \ge a$.
2. $0 \le F(x) \le 1$.
3. $\lim_{x \to -\infty} F(x) = 0$.
4. $\lim_{x \to +\infty} F(x) = 1$.
5. If $x > y$, then $F(x) \ge F(y)$.
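The course's MATLAB example #1 is not reproduced in these notes. As a stand-in, here is a minimal sketch (base MATLAB only, no toolboxes; the standard normal pdf and the interval endpoints are arbitrary illustration choices) that computes areas under a continuous pdf with `integral`:

```matlab
% Sketch: area under a continuous pdf (standard normal chosen for illustration).
f = @(x) exp(-0.5*x.^2) / sqrt(2*pi);   % standard normal pdf

total = integral(f, -Inf, Inf);         % total area: should equal 1 for a pdf
p     = integral(f, -1.96, 1.96);       % Pr(-1.96 <= X <= 1.96), roughly .95

fprintf('total area = %.4f, Pr(-1.96 <= X <= 1.96) = %.4f\n', total, p);
```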
2 Mathematical Expectations

Consider the continuous case only.

2.1 Mean

Definition. The mean or expected value of g(X) is given by

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

1. $E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$ is called the mean of X or the first moment of the distribution.
2. E(·) is a linear operator. Let g(X) = a + bX. Then

$$E[g(X)] = \int (a + bx) f(x)\,dx = \int a f(x)\,dx + \int bx f(x)\,dx = E(a) + E(bX) = a + bE(X).$$

3. Other measures of central tendency: median, mode.

2.2 Variance

Definition. The variance of g(X) is given by

$$Var[g(X)] = E[\{g(X) - E[g(X)]\}^2] = \int \{g(x) - E[g(X)]\}^2 f(x)\,dx.$$

1. Let g(X) = X. We have

$$Var(X) = \sigma^2 = \int (x - \mu)^2 f(x)\,dx = \int x^2 f(x)\,dx - 2\mu \int x f(x)\,dx + \mu^2 \int f(x)\,dx = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2.$$

2. Var(X) is NOT a linear operator. Let g(X) = a + bX. Since $E[g(X)] = a + b\mu = g(\mu)$,

$$Var[g(X)] = \int \{g(x) - g(\mu)\}^2 f(x)\,dx = b^2 \int (x - \mu)^2 f(x)\,dx = b^2 Var(X) = b^2 \sigma^2.$$

3. σ is called the standard deviation of X.
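As a quick numerical illustration of the two operator properties above, the following sketch checks by Monte Carlo that $E(a + bX) = a + bE(X)$ while $Var(a + bX) = b^2 Var(X)$. It assumes base MATLAB only; X ~ N(0, 1) and the constants a and b are arbitrary choices.

```matlab
% Sketch: E is linear, Var is not. Monte Carlo with X ~ N(0,1); a, b arbitrary.
rng(1);                       % reproducibility
x = randn(1e6, 1);            % draws of X
a = 2; b = 3;
w = a + b*x;

fprintf('E(a+bX)   = %.3f  (theory a + b*E(X)   = %.3f)\n', mean(w), a + b*mean(x));
fprintf('Var(a+bX) = %.3f  (theory b^2 * Var(X) = %.3f)\n', var(w), b^2*var(x));
```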
2. V ar(x) is NOT a linear operator. Let g() = a + bx. V ar[g(x)] = {g() g(µ)} 2 f()d = b 2 ( µ) 2 f()d = b 2 V ar(x) = b 2 σ 2. 3. σ is called the standard deviation of X. 2.3 Other Moments The measure E(X r ) is called the r th moment of the distribution while E[(X µ) r ] is called the r th central moment of the distribution. r Central Moment Measure 1 E[(X µ)] = 2 E[(X µ) 2 ] = σ 2 variance (dispersion) 3 E[(X µ) 3 ] skewness (asymmetry) 4 E[(X µ) 4 ] kurtosis (tail thickness). Moment Generating Function (MGF). The MGF uniquely determines a pdf when it eists and is given by M(t) = E(e tx ) = e t f()d. The r th moment of a distribution is given by d r M(t) dt r t=. 2.4 Chebyshev s Inequality Definition. Let X be a random variable with σ 2 <. For any k >, Pr(µ kσ X µ + kσ) 1 1 k 2. Chebyshev s inequality is used to calculate upper (and lower) bounds on a random variable without having to know the eact distribution. Eample. Let X f() where f() = 1 2 3, 3 < < 3 3
3 Specific Probability Distributions

3.1 Normal pdf

If X has a normal distribution, then

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

where $-\infty < x < \infty$. In shorthand notation, X ~ N(µ, σ²).

1. The normal pdf is symmetric.
2. $Z = (X - \mu)/\sigma \sim N(0, 1)$ is called a standardized random variable, and $\phi(z) = \frac{1}{\sqrt{2\pi}} \exp(-.5z^2)$ is called the standard normal distribution.
3. Linear transformations of normal random variables are normal. If Y = a + bX where X ~ N(µ, σ²), then Y ~ N(a + bµ, b²σ²).

3.2 Chi-square pdf

If $Z_i$, i = 1, ..., n, are independently distributed N(0, 1) random variables, then $Y = \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)$, where E(Y) = n and Var(Y) = 2n.

Exercise. Find the MGF for Y = Z² and use it to derive the mean and variance.

Answer. We begin by calculating the MGF for Z², where t < .5:

$$M(t) = E(e^{tZ^2}) = \int e^{tz^2} \phi(z)\,dz = (2\pi)^{-.5} \int e^{(t - .5)z^2}\,dz = (2\pi)^{-.5} \int e^{-.5(1 - 2t)z^2}\,dz.$$

Now using the method of substitution, let $w = (1 - 2t)^{1/2} z$ so that $dw = (1 - 2t)^{1/2}\,dz$. Making the substitution produces

$$M(t) = (1 - 2t)^{-1/2} (2\pi)^{-.5} \int e^{-.5w^2}\,dw = (1 - 2t)^{-1/2}.$$

To calculate the mean, we take the first derivative of M(t) and evaluate at t = 0:

$$\mu = \left.\frac{dM(t)}{dt}\right|_{t=0} = \left.(1 - 2t)^{-3/2}\right|_{t=0} = 1.$$

To calculate the variance, we take the second derivative of M(t), evaluate at t = 0, and subtract µ²:

$$\sigma^2 = \left[\left.\frac{d^2 M(t)}{dt^2}\right|_{t=0}\right] - \mu^2 = \left.3(1 - 2t)^{-5/2}\right|_{t=0} - \mu^2 = 3 - 1 = 2.$$
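A quick simulation check of the chi-square mean and variance, assuming base MATLAB only (`randn`); n = 5 is an arbitrary choice:

```matlab
% Sketch: Y = sum of n squared N(0,1) draws is chi-square(n) with mean n, variance 2n.
rng(1);
n = 5;                          % degrees of freedom (arbitrary choice)
z = randn(1e5, n);              % each row: n independent N(0,1) draws
y = sum(z.^2, 2);               % chi-square(n) draws

fprintf('mean(Y) = %.3f (theory %d), var(Y) = %.3f (theory %d)\n', ...
        mean(y), n, var(y), 2*n);
```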
3.3 F pdf

If $X_1$ and $X_2$ are independently distributed $\chi^2(n_i)$ random variables, then

$$F = \frac{X_1/n_1}{X_2/n_2} \sim F(n_1, n_2).$$

3.4 Student's t pdf

If Z ~ N(0, 1) and X ~ χ²(n) are independent, then

$$T = \frac{Z}{\sqrt{X/n}} \sim t(n).$$

3.5 Lognormal pdf

If X ~ N(µ, σ²), then Y = exp(X) has the distribution

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma y} \exp\left[-.5\left(\frac{\ln(y) - \mu}{\sigma}\right)^2\right]$$

for y > 0. Sometimes this is written as Y ~ LN(µ, σ²). The mean and variance of Y are $E(Y) = \exp(\mu + \sigma^2/2)$ and $Var(Y) = \exp(2\mu + \sigma^2)(\exp(\sigma^2) - 1)$.

1. If $Y_1 \sim LN(\mu_1, \sigma_1^2)$ and $Y_2 \sim LN(\mu_2, \sigma_2^2)$ are independent random variables, then $Y_1 Y_2 \sim LN(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
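A simulation sketch of the lognormal moment formulas, base MATLAB only; the parameter values are arbitrary choices:

```matlab
% Sketch: Y = exp(X) with X ~ N(mu, sigma^2) is lognormal; check the moment formulas.
rng(1);
mu = 0.5; sigma = 0.75;                 % arbitrary parameter choices
y = exp(mu + sigma*randn(1e6, 1));      % lognormal draws

fprintf('mean(Y) = %.3f (theory %.3f)\n', mean(y), exp(mu + sigma^2/2));
fprintf('var(Y)  = %.3f (theory %.3f)\n', ...
        var(y), exp(2*mu + sigma^2)*(exp(sigma^2) - 1));
```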
3.6 Gamma pdf

The gamma distribution is given by

$$f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} \exp(-x/\beta)$$

for $0 < x < \infty$. The mean and variance are $E(X) = \alpha\beta$ and $Var(X) = \alpha\beta^2$.

1. $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} \exp(-y)\,dy$ is called the gamma function, α > 0.
2. $\Gamma(\alpha) = (\alpha - 1)!$ if α is a positive integer.
3. Greene sets β = 1/λ and α = P.
4. When α = 1, you get the exponential pdf.
5. When α = n/2 and β = 2, you get the chi-square pdf.

Example. Gamma distributions are sometimes used to model waiting times. Let W be the waiting time until death for a human. Let W ~ Gamma(α = 1, β = 80) so that the expected waiting time until death is 80 years. (Note: W ~ Exponential(β).) Find Pr(W ≤ 30).

$$\Pr(W \le 30) = \int_0^{30} \frac{1}{\Gamma(1)\,80} \exp(-w/80)\,dw = \frac{1}{80} \int_0^{30} \exp(-w/80)\,dw$$
$$= \frac{1}{80}\left(-80\exp(-w/80)\right)\Big|_{w=0}^{30} = -[\exp(-30/80) - \exp(0)] = 1 - .687 = .313.$$

3.7 Beta pdf

If $X_1$ and $X_2$ are independently distributed gamma random variables, then $Y_1 = X_1 + X_2$ and $Y_2 = X_1/Y_1$ are independently distributed. The marginal distribution $f_2(y_2)$ of $f(y_1, y_2)$ is called the beta pdf. Scaled to the interval $0 \le y \le c$, it is

$$g(y) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} (y/c)^{\alpha - 1} [1 - (y/c)]^{\beta - 1} (1/c)$$

where $0 \le y \le c$. The mean and variance are $E(Y) = c\alpha/(\alpha + \beta)$ and $Var(Y) = c^2\alpha\beta/[(\alpha + \beta)^2(\alpha + \beta + 1)]$.
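The waiting-time calculation can be verified numerically. A minimal sketch, base MATLAB only, comparing the closed form above with numeric integration:

```matlab
% Sketch: the Gamma(alpha=1, beta=80) waiting-time example, i.e. exponential
% with mean 80; compare the closed form 1 - exp(-30/80) with numeric integration.
beta = 80;
f = @(w) exp(-w/beta) / beta;            % exponential pdf

closed  = 1 - exp(-30/beta);             % closed-form Pr(W <= 30)
numeric = integral(f, 0, 30);            % numeric check

fprintf('Pr(W <= 30): closed form = %.3f, numeric = %.3f\n', closed, numeric);
```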
3.8 Logistic pdf

The logistic distribution is

$$f(x) = \Lambda(x)[1 - \Lambda(x)]$$

where $-\infty < x < \infty$ and $\Lambda(x) = (1 + \exp(-x))^{-1}$. The mean and variance are E(X) = 0 and Var(X) = π²/3. A useful property of the logistic distribution is that the cdf has a closed-form solution: F(x) = Λ(x).

3.9 Cauchy pdf

If $X_1$ and $X_2$ are independently distributed N(0, 1), then $Y = X_1/X_2$ has the distribution

$$f(y) = \frac{1}{\pi(1 + y^2)}$$

where $-\infty < y < \infty$. The mean and the variance of the Cauchy pdf do not exist because the tails are too thick. See MATLAB example #2 for an example that graphs the Cauchy and standard normal pdfs.

3.10 Binomial pdf

The distribution for x successes in n trials is

$$b(n, \alpha, x) = \binom{n}{x} \alpha^x (1 - \alpha)^{n - x}$$

where x = 0, 1, ..., n and $0 \le \alpha \le 1$. The mean and variance of the binomial distribution are $E(X) = n\alpha$ and $Var(X) = n\alpha(1 - \alpha)$. The combinatorial formula for the number of ways to choose x objects from a set of n distinct objects is

$$\binom{n}{x} = \frac{n!}{x!(n - x)!}.$$

3.11 Poisson pdf

The Poisson pdf is often used to model the number of changes in a fixed interval. The Poisson pdf is

$$f(x) = \frac{\exp(-\lambda)\lambda^x}{x!}$$

where x = 0, 1, ... and λ > 0. The mean and variance are E(X) = Var(X) = λ.
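The binomial formulas can be checked directly from the combinatorial formula. A minimal sketch, base MATLAB only; n = 10 and α = .3 are arbitrary choices:

```matlab
% Sketch: binomial pmf built from the combinatorial formula; check that it
% sums to 1 and that the mean is n*alpha.
n = 10; alpha = 0.3;
x = 0:n;
pmf = arrayfun(@(k) nchoosek(n, k) * alpha^k * (1 - alpha)^(n - k), x);

fprintf('sum of pmf = %.4f, mean = %.3f (theory %.3f)\n', ...
        sum(pmf), sum(x .* pmf), n*alpha);
```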
4 Distributions of Functions of Random Variables

Let $X_1, X_2, \ldots, X_n$ have joint pdf $f(x_1, \ldots, x_n)$. What is the distribution of $Y = g(X_1, X_2, \ldots, X_n)$? To answer this question, we will use the change-of-variable technique.

Change of Variable Technique. Let $X_1$ and $X_2$ have joint pdf $f(x_1, x_2)$. Let $Y_1 = g_1(X_1, X_2)$ and $Y_2 = g_2(X_1, X_2)$ be the transformed random variables. If A is the set where f > 0, then let B be the set defined by the one-to-one transformation of A to B. Then

$$g(y_1, y_2) = f(h_1(y_1, y_2), h_2(y_1, y_2))\,\mathrm{abs}(J)$$

where $(y_1, y_2) \in B$, $x_1 = h_1(y_1, y_2)$, $x_2 = h_2(y_1, y_2)$, and

$$J = \begin{vmatrix} \partial x_1/\partial y_1 & \partial x_1/\partial y_2 \\ \partial x_2/\partial y_1 & \partial x_2/\partial y_2 \end{vmatrix}.$$

Example. Let $X_1$ and $X_2$ be uniformly distributed on $0 \le X_i \le 1$. The random sample $X_1, X_2$ is jointly distributed $f(x_1, x_2) = f_1(x_1) f_2(x_2) = 1$ over $0 \le x_1, x_2 \le 1$ and zero elsewhere. Find the joint distribution of $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$.

Answer. We know that $x_1 = h_1(y_1, y_2) = .5(y_1 + y_2)$ and $x_2 = h_2(y_1, y_2) = .5(y_1 - y_2)$. We also know that

$$J = \begin{vmatrix} .5 & .5 \\ .5 & -.5 \end{vmatrix} = -.5.$$

Therefore,

$$g(y_1, y_2) = f_1(h_1(y_1, y_2))\, f_2(h_2(y_1, y_2))\,\mathrm{abs}(J) = .5$$

where $(y_1, y_2) \in B$ and zero elsewhere.
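A Monte Carlo check of this example: since $g(y_1, y_2) = .5$ on B, the probability of a small rectangle lying inside B should be .5 times its area. A sketch assuming base MATLAB only; the particular rectangle is an arbitrary choice:

```matlab
% Sketch: Monte Carlo check of the change-of-variable example. With X1, X2
% uniform on (0,1), the density of (Y1, Y2) = (X1 + X2, X1 - X2) is .5 on B,
% so the probability of a small rectangle inside B is .5 times its area.
rng(1);
x1 = rand(1e6, 1); x2 = rand(1e6, 1);
y1 = x1 + x2;      y2 = x1 - x2;

inRect = (y1 > 0.9 & y1 < 1.1) & (y2 > -0.1 & y2 < 0.1);   % rectangle inside B
fprintf('Pr estimate = %.4f (theory .5 * area = %.4f)\n', ...
        mean(inRect), 0.5 * 0.2 * 0.2);
```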
5 Joint Distributions

5.1 Joint pdfs and cdfs

A joint pdf for $X_1$ and $X_2$ gives $\Pr(X_1 = x_1, X_2 = x_2) = f(x_1, x_2)$. A proper joint pdf will have the property $\int\!\int f(x_1, x_2)\,dx_2\,dx_1 = 1$ and $f(x_1, x_2) \ge 0$ for all $x_1$ and $x_2$.

A joint cdf for $X_1$ and $X_2$ is

$$\Pr(X_1 \le x_1, X_2 \le x_2) = F(x_1, x_2) = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2} f(t_1, t_2)\,dt_2\,dt_1.$$

5.2 Marginal Distributions

The marginal pdf of $X_1$ is found by integrating over all $X_2$:

$$f_1(x_1) = \int f(x_1, x_2)\,dx_2$$

and likewise for $X_2$.

Example. Let $X_1$ and $X_2$ have joint pdf

$$f(x_1, x_2) = 2, \quad 0 < x_1 < x_2 < 1$$

and zero elsewhere. Is this a proper pdf?

$$\int_0^1 \int_{x_1}^1 2\,dx_2\,dx_1 = \int_0^1 \left[2x_2\right]_{x_2 = x_1}^{1}\,dx_1 = \int_0^1 2(1 - x_1)\,dx_1 = \left[2x_1 - x_1^2\right]_{x_1 = 0}^{1} = 2 - 1 = 1.$$

So yes, this is a proper pdf. The marginal distribution for $X_1$ is

$$f_1(x_1) = \int_{x_1}^1 2\,dx_2 = \left[2x_2\right]_{x_2 = x_1}^{1} = 2(1 - x_1), \quad 0 < x_1 < 1$$

and zero elsewhere. The marginal distribution for $X_2$ is

$$f_2(x_2) = \int_0^{x_2} 2\,dx_1 = \left[2x_1\right]_{x_1 = 0}^{x_2} = 2x_2, \quad 0 < x_2 < 1$$

and zero elsewhere. See MATLAB example #4 for a graphical example of a joint and marginal pdf.

1. Two random variables are stochastically independent if and only if $f_1(x_1) f_2(x_2) = f(x_1, x_2)$.
2. In our example, $X_1$ and $X_2$ are not independent because $f_1(x_1) f_2(x_2) = 4x_2 - 4x_1 x_2 \ne f(x_1, x_2)$.
3. Moments (e.g., means and variances) in joint distributions are calculated using marginal densities (e.g., $E(X_1) = \int x_1 f_1(x_1)\,dx_1$).
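Both the properness check and the marginals in this example can be reproduced numerically; `integral2` accepts a function handle for the variable inner limit. A minimal sketch, base MATLAB only; the evaluation point $x_1 = .25$ is an arbitrary choice:

```matlab
% Sketch: numeric check of the joint-pdf example f(x1,x2) = 2 on 0 < x1 < x2 < 1.
f = @(x1, x2) 2*ones(size(x1));

total = integral2(f, 0, 1, @(x1) x1, 1);       % integrate x2 from x1 to 1
fprintf('total mass = %.4f (a proper pdf integrates to 1)\n', total);

% Marginal of X1 at a point: integrate out x2; theory gives 2*(1 - x1).
x1pt = 0.25;
m = integral(@(x2) 2*ones(size(x2)), x1pt, 1);
fprintf('f1(%.2f) = %.3f (theory %.3f)\n', x1pt, m, 2*(1 - x1pt));
```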
5.3 Covariance and Correlation

Definition. The covariance between X and Y is

$$\mathrm{cov}(X, Y) = E\left[(X - \mu_x)(Y - \mu_y)\right] = E(XY) - \mu_x \mu_y.$$

Definition. The correlation coefficient between X and Y removes the dependence on the units of measurement:

$$\rho = \mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_x \sigma_y}$$

where $-1 \le \rho \le 1$.

1. If X and Y are independent, then cov(X, Y) = 0:

$$\mathrm{cov}(X, Y) = E(XY) - \mu_x \mu_y = \int\!\int xy\, f_x(x) f_y(y)\,dy\,dx - \mu_x \mu_y = \int x f_x(x)\,dx \int y f_y(y)\,dy - \mu_x \mu_y = \mu_x \mu_y - \mu_x \mu_y = 0.$$

2. However, cov(X, Y) = 0 does not imply stochastic independence. Consider the following joint distribution table, whose entries are f(x, y):

          y = 0   y = 1   f_x(x)
x = -1      0      1/3     1/3
x =  0     1/3      0      1/3
x =  1      0      1/3     1/3
f_y(y)     1/3     2/3

where $\mu_x = 0$, $\mu_y = 2/3$, and

$$\mathrm{cov}(X, Y) = \sum (x - \mu_x)(y - \mu_y) f(x, y) = (-1)(1/3)(1/3) + (0)(-2/3)(1/3) + (1)(1/3)(1/3) = 0.$$

However, X and Y are not independent because for (x, y) = (0, 0) we have $f_x(0) f_y(0) = 1/9 \ne f(0, 0) = 1/3$.

6 Conditional Distributions

Definition. The conditional pdf for X given Y is

$$f(x|y) = \frac{f(x, y)}{f_y(y)}.$$
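The table calculation can be reproduced directly. A minimal sketch, base MATLAB only; rows index x = -1, 0, 1 and columns index y = 0, 1:

```matlab
% Sketch: the zero-covariance-without-independence table, checked numerically.
xv = [-1; 0; 1];
yv = [0, 1];
f  = [  0, 1/3;
      1/3,   0;
        0, 1/3];                                 % joint pmf f(x,y)

fx = sum(f, 2);  fy = sum(f, 1);                 % marginals
mux = xv' * fx;  muy = yv * fy';                 % means
covxy = (xv - mux)' * f * (yv - muy)';           % sum of (x-mux)(y-muy)f(x,y)

fprintf('cov = %.4f, but fx(0)*fy(0) = %.4f vs f(0,0) = %.4f\n', ...
        covxy, fx(2)*fy(1), f(2,1));
```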
1. If X and Y are independent, $f(x|y) = f_x(x)$ and $f(y|x) = f_y(y)$.
2. The conditional mean is $E(X|Y) = \int x f(x|y)\,dx = \mu_{x|y}$.
3. The conditional variance is $Var(X|Y) = \int (x - \mu_{x|y})^2 f(x|y)\,dx$.

7 Multivariate Distributions

Let $X = (X_1, \ldots, X_n)'$ be an $(n \times 1)$ column vector of random variables. The mean and variance of X are $\mu = E(X) = (\mu_1, \ldots, \mu_n)'$ and

$$\Sigma = Var(X) = E[(X - \mu)(X - \mu)'] = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix}.$$

1. Let W = A + BX. Then E(W) = A + BE(X).
2. The variance of W is

$$Var(W) = E[(W - E(W))(W - E(W))'] = E[(BX - BE(X))(BX - BE(X))'] = E[B(X - E(X))(X - E(X))'B'] = B\Sigma B'.$$

7.1 Multivariate Normal Distributions

Let $X = (X_1, \ldots, X_n)' \sim N(\mu, \Sigma)$. The form of the multivariate normal pdf is

$$f(x) = (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp[-.5(x - \mu)'\Sigma^{-1}(x - \mu)].$$

See MATLAB example #5 for an example of a bivariate normal density function.

7.2 Quadratic Form in a Normal Vector

If $(X - \mu)$ is a normal vector with mean 0 and variance Σ, then the quadratic form $Q = (X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(n)$.
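A simulation check of the quadratic-form result, assuming base MATLAB only; the n = 2 mean vector and covariance matrix are arbitrary choices, and draws from N(µ, Σ) are built from a Cholesky factor:

```matlab
% Sketch: Q = (X - mu)' inv(Sigma) (X - mu) is chi-square(n) when X ~ N(mu, Sigma).
rng(1);
mu    = [1; 2];
Sigma = [2, 0.5; 0.5, 1];
R     = chol(Sigma);                       % upper triangular with R'*R = Sigma

z = randn(2, 1e5);                         % standard normal columns
x = R' * z + mu * ones(1, size(z, 2));     % columns are draws from N(mu, Sigma)
d = x - mu * ones(1, size(x, 2));
q = sum(d .* (Sigma \ d), 1);              % quadratic form, one value per column

fprintf('mean(Q) = %.3f (theory 2), var(Q) = %.3f (theory 4)\n', mean(q), var(q));
```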
Proof. The moment generating function of Q is

$$M(t) = E(e^{tQ}) = \int \cdots \int (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp[t(x - \mu)'\Sigma^{-1}(x - \mu) - .5(x - \mu)'\Sigma^{-1}(x - \mu)]\,dx_1 \cdots dx_n$$
$$= \int \cdots \int (2\pi)^{-n/2} |\Sigma|^{-1/2} \exp[-.5(x - \mu)'(1 - 2t)\Sigma^{-1}(x - \mu)]\,dx_1 \cdots dx_n.$$

Next, multiply and divide by $(1 - 2t)^{-n/2}$:

$$M(t) = (1 - 2t)^{-n/2} \int \cdots \int (2\pi)^{-n/2} \left|\Sigma/(1 - 2t)\right|^{-1/2} \exp[-.5(x - \mu)'(1 - 2t)\Sigma^{-1}(x - \mu)]\,dx_1 \cdots dx_n = (1 - 2t)^{-n/2}, \quad t < .5.$$

The remaining integral is the integral of a multivariate normal density with variance $\Sigma/(1 - 2t)$, and so it equals one. M(t) then simplifies to the MGF for a χ²(n) random variable.

7.3 A Couple of Important Theorems

1. Let X ~ N(0, I) and A² = A (i.e., A is idempotent). Then $X'AX \sim \chi^2(r)$, where the rank of A is r.
2. Let X ~ N(0, I). Then $X'AX$ and $X'BX$ are stochastically independent iff AB = 0.
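Theorem 1 can be illustrated by simulation. A sketch assuming base MATLAB only; the idempotent matrix is built as a projection onto the column space of an arbitrary n × r matrix H, which is one convenient construction and not from the notes:

```matlab
% Sketch: theorem 1 checked by simulation. A projection matrix is idempotent
% with rank r, so X'AX should behave like a chi-square(r) random variable.
rng(1);
n = 6; r = 2;
H = randn(n, r);                         % arbitrary full-column-rank matrix
A = H / (H' * H) * H';                   % A = H*inv(H'H)*H', idempotent, rank r

x = randn(n, 1e5);                       % columns are N(0, I) vectors
q = sum(x .* (A * x), 1);                % x'Ax for each column

fprintf('rank(A) = %d, mean(Q) = %.3f (theory %d), var(Q) = %.3f (theory %d)\n', ...
        rank(A), mean(q), r, var(q), 2*r);
```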