Chapters 3 and 4: Random Variables
ENCS6161 - Probability and Stochastic Processes
Concordia University
The Notion of a Random Variable

A random variable $X$ is a function that assigns a real number $X(\omega)$ to each outcome $\omega$ in the sample space of a random experiment. $S$ is the domain, and $S_X = \{X(\omega) : \omega \in S\}$ is the range of the r.v. $X$.

Example: toss a coin. $S = \{H, T\}$, $X(H) = 0$, $X(T) = 1$, so $S_X = \{0, 1\}$ and $P(H) = P(X = 0) = 0.5$, $P(T) = P(X = 1) = 0.5$.

Example: measure the temperature. $S = \{\omega : 10 < \omega < 30\}$ and $X(\omega) = \omega$, so $S_X = \{x : 10 < x < 30\}$. What is $P(X = 25)$?
Cumulative Distribution Function

$F_X(x) = P(X \le x) = P(\{\omega : X(\omega) \le x\})$, for $-\infty < x < \infty$.

Properties of $F_X(x)$:
1. $0 \le F_X(x) \le 1$
2. $\lim_{x \to \infty} F_X(x) = 1$
3. $\lim_{x \to -\infty} F_X(x) = 0$
4. $F_X(x)$ is nondecreasing, i.e., if $a < b$ then $F_X(a) \le F_X(b)$
5. $F_X(x)$ is continuous from the right, i.e., for $h > 0$, $F_X(b) = \lim_{h \to 0} F_X(b + h) = F_X(b^+)$ (see the examples below)
Cumulative Distribution Function

Properties of $F_X(x)$, continued:

6. $P(a < X \le b) = F_X(b) - F_X(a)$ for $a \le b$.
Proof: $\{X \le a\} \cup \{a < X \le b\} = \{X \le b\}$ and $\{X \le a\} \cap \{a < X \le b\} = \emptyset$, so $P\{X \le a\} + P\{a < X \le b\} = P\{X \le b\}$, hence $P\{a < X \le b\} = F_X(b) - F_X(a)$.

7. $P\{X = b\} = F_X(b) - F_X(b^-)$.
Proof: $P\{X = b\} = \lim_{\varepsilon \to 0} P\{b - \varepsilon < X \le b\} = F_X(b) - \lim_{\varepsilon \to 0} F_X(b - \varepsilon) = F_X(b) - F_X(b^-)$.
Cumulative Distribution Function

Example: roll a die. $S = \{1, 2, 3, 4, 5, 6\}$, $X(\omega) = \omega$.

$F_X(x) = \begin{cases} 0 & x < 1 \\ 1/6 & 1 \le x < 2 \\ 2/6 & 2 \le x < 3 \\ \vdots & \\ 5/6 & 5 \le x < 6 \\ 1 & x \ge 6 \end{cases}$

Check properties 5, 6, and 7. (Figure: staircase plot of $F_X(x)$ with jumps of $1/6$ at $x = 1, \dots, 6$.)
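As a quick sanity check (not part of the slides), a short Python sketch of the die CDF verifies properties 5, 6, and 7; the test points chosen below are arbitrary:

```python
from fractions import Fraction

def F(x):
    """CDF of a fair die: F(x) = (number of faces <= x) / 6."""
    return Fraction(sum(1 for face in range(1, 7) if face <= x), 6)

# Property 5: right-continuity at b = 3 -- F(3 + h) -> F(3) as h -> 0+
assert F(3 + 1e-9) == F(3) == Fraction(3, 6)

# Property 6: P(1 < X <= 4) = F(4) - F(1) = 3/6
assert F(4) - F(1) == Fraction(3, 6)

# Property 7: P(X = 4) = F(4) - F(4-) = 1/6 (the height of the jump at 4)
assert F(4) - F(4 - 1e-9) == Fraction(1, 6)
```

Using `Fraction` keeps the comparisons exact rather than relying on floating-point tolerances.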
Cumulative Distribution Function

Example: pick a real number between 0 and 1 uniformly.

$F_X(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}$

$P\{X = 0.5\} = ?$ (By property 7 it is $0$, since $F_X$ is continuous at $0.5$.)
Three Types of Random Variables

1. Discrete random variables: $S_X = \{x_0, x_1, \dots\}$, finite or countable.
Probability mass function (pmf): $P_X(x_k) = P\{X = x_k\}$
$F_X(x) = \sum_k P_X(x_k)\,u(x - x_k)$, where $u(x) = \begin{cases} 0 & x < 0 \\ 1 & x \ge 0 \end{cases}$

For the die-rolling example:
$F_X(x) = \frac{1}{6}u(x-1) + \frac{1}{6}u(x-2) + \cdots + \frac{1}{6}u(x-6)$
Three Types of Random Variables

2. Continuous random variables: $F_X(x)$ is continuous everywhere, so $P\{X = x\} = 0$ for all $x$.

3. Random variables of mixed type: $F_X(x) = pF_1(x) + (1-p)F_2(x)$, where $0 < p < 1$, $F_1(x)$ is the CDF of a discrete r.v., and $F_2(x)$ is the CDF of a continuous r.v.
Example: toss a coin; if H, generate a discrete r.v.; if T, generate a continuous r.v.
Probability density function

The PDF, if it exists, is defined as
$f_X(x) = \frac{dF_X(x)}{dx} \approx \frac{F_X(x + \Delta x) - F_X(x)}{\Delta x} = \frac{P\{x < X \le x + \Delta x\}}{\Delta x}$
so $f_X(x)$ is a probability "density".

Properties of the pdf (assuming a continuous r.v.):
1. $f_X(x) \ge 0$
2. $P\{a \le X \le b\} = \int_a^b f_X(x)\,dx$
3. $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$
4. $\int_{-\infty}^{+\infty} f_X(t)\,dt = 1$
Probability density function

Example: uniform r.v.

$f_X(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$
$F_X(x) = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & a \le x \le b \\ 1 & x > b \end{cases}$

(Figures: rectangular pdf of height $\frac{1}{b-a}$ on $[a, b]$; CDF ramping linearly from 0 to 1 over $[a, b]$.)
Probability density function

Example: $f_X(x) = ce^{-\alpha|x|}$, $-\infty < x < +\infty$.

1. Find the constant $c$:
$\int_{-\infty}^{+\infty} f_X(x)\,dx = 2\int_0^{+\infty} ce^{-\alpha x}\,dx = \frac{2c}{\alpha} = 1 \;\Rightarrow\; c = \frac{\alpha}{2}$

2. Find $P\{|X| < v\}$:
$P\{|X| < v\} = \frac{\alpha}{2}\int_{-v}^{v} e^{-\alpha|x|}\,dx = 2 \cdot \frac{\alpha}{2}\int_0^{v} e^{-\alpha x}\,dx = 1 - e^{-\alpha v}$
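A numerical sketch (not from the slides) can confirm both results; the value $\alpha = 1.5$, the grid, and the truncation at $\pm 20$ are arbitrary choices:

```python
import math

alpha = 1.5  # example rate, chosen arbitrarily

def f(x):
    """Two-sided exponential density with the constant c = alpha/2."""
    return (alpha / 2) * math.exp(-alpha * abs(x))

# Midpoint-rule check that the density integrates to 1 (tails beyond +/-20 are negligible)
n = 40000
dx = 40.0 / n
total = sum(f(-20 + (i + 0.5) * dx) * dx for i in range(n))
assert abs(total - 1.0) < 1e-3

# P{|X| < v} = 1 - exp(-alpha v): compare the formula to numeric integration
v = 2.0
m = 4000
dv = 2 * v / m
numeric = sum(f(-v + (i + 0.5) * dv) * dv for i in range(m))
assert abs(numeric - (1 - math.exp(-alpha * v))) < 1e-3
```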
PDF for a discrete r.v.

For a discrete r.v., $F_X(x) = \sum_k P_X(x_k)u(x - x_k)$. We define the delta function $\delta(x)$ such that
$u(x) = \int_{-\infty}^{x} \delta(t)\,dt$, i.e., $\delta(x) = \frac{du(x)}{dx}$
So the pdf for a discrete r.v. is
$f_X(x) = \sum_k P_X(x_k)\,\delta(x - x_k)$
Example (die roll): (Figure: six impulses, each of weight $1/6$, at $x = 1, \dots, 6$.)
Conditional CDF and PDF

Conditional CDF of $X$ given an event $A$ with $P(A) > 0$:
$F_X(x \mid A) = P\{X \le x \mid A\} = \frac{P(\{X \le x\} \cap A)}{P(A)}$
The conditional pdf:
$f_X(x \mid A) = \frac{dF_X(x \mid A)}{dx}$
Conditional CDF and PDF

Example: the lifetime $X$ of a machine has CDF $F_X(x)$. Find the conditional CDF and pdf given the event $A = \{X > t\}$.
$F_X(x \mid X > t) = P\{X \le x \mid X > t\} = \frac{P(\{X \le x\} \cap \{X > t\})}{P\{X > t\}} = \begin{cases} 0 & x \le t \\ \frac{F_X(x) - F_X(t)}{1 - F_X(t)} & x > t \end{cases}$
$f_X(x \mid X > t) = \begin{cases} 0 & x \le t \\ \frac{f_X(x)}{1 - F_X(t)} & x > t \end{cases}$
Important Random Variables

Discrete r.v.s:
1. Bernoulli r.v.
2. Binomial r.v.
3. Geometric r.v.
4. Poisson r.v.

Continuous r.v.s:
1. Uniform r.v.
2. Exponential r.v.
3. Gaussian (Normal) r.v.
Bernoulli r.v.

$S_X = \{0, 1\}$, $P_X(0) = 1 - p$, $P_X(1) = p$; e.g., tossing a coin.
Let $S$ be a sample space and $A \subset S$ be an event with $P(A) = p$. The indicator function of $A$:
$I_A(\omega) = \begin{cases} 0 & \omega \notin A \\ 1 & \omega \in A \end{cases}$
$I_A$ is a r.v., since it assigns a number to each outcome of $S$. In fact, $I_A$ is a Bernoulli r.v.:
$P_{I_A}(0) = P\{\omega \notin A\} = 1 - P(A) = 1 - p$
$P_{I_A}(1) = P\{\omega \in A\} = P(A) = p$
Binomial r.v.

Repeat a random experiment $n$ times independently and let $X$ be the number of times that an event $A$ with $P(A) = p$ occurs. $S_X = \{0, 1, \dots, n\}$.
Let $I_j$ be the indicator function of $A$ in the $j$th trial. Then $X = I_1 + I_2 + \cdots + I_n$, so $X$ is a sum of the i.i.d. (independent, identically distributed) Bernoulli r.v.s $I_1, I_2, \dots, I_n$.
$X$ is called a binomial r.v.:
$P\{X = k\} = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$
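The pmf can be checked in a few lines of Python (an illustrative sketch, not from the slides); the parameters $n = 10$, $p = 0.3$ are arbitrary, and `math.comb` computes the binomial coefficient:

```python
import math

def binom_pmf(n, p, k):
    """P{X = k} for a binomial r.v.: C(n,k) p^k (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]

assert abs(sum(pmf) - 1.0) < 1e-12   # pmf sums to 1

# E[X] = np, consistent with X being a sum of n Bernoulli(p) r.v.s
mean = sum(k * pmf[k] for k in range(n + 1))
assert abs(mean - n * p) < 1e-12
```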
Geometric r.v.

Count the number $X$ of independent Bernoulli trials until the first occurrence of a success. $X$ is called a geometric r.v., $S_X = \{1, 2, 3, \dots\}$.
Let $p = P(A)$ be the probability of "success" in each Bernoulli trial:
$P\{X = k\} = (1-p)^{k-1}p$, $k = 1, 2, \dots$
The geometric r.v. has the memoryless property:
$P\{X \ge k + j \mid X > j\} = P\{X \ge k\}$ for all $j, k \ge 1$
(prove it by yourself).
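A small Python check of the memoryless property (not in the slides), using the tail formula $P\{X \ge k\} = (1-p)^{k-1}$; the value $p = 0.25$ is arbitrary:

```python
p = 0.25  # example success probability

def tail(k):
    """P{X >= k} = (1-p)^(k-1) for the geometric r.v. on {1, 2, ...}."""
    return (1 - p) ** (k - 1)

# Memoryless: P{X >= k + j | X > j} = P{X >= k}, noting P{X > j} = P{X >= j+1}
for j in range(1, 6):
    for k in range(1, 6):
        cond = tail(k + j) / tail(j + 1)
        assert abs(cond - tail(k)) < 1e-12
```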
Poisson r.v.

The Poisson r.v. has the pmf
$P\{N = k\} = \frac{\alpha^k}{k!}e^{-\alpha}$, $k = 0, 1, 2, \dots$
It is a good model for the number of occurrences of an event in a certain time period; $\alpha$ is the average number of event occurrences in the given time period. Examples: the number of phone calls in 1 hour, or the number of data packets arriving at a router in 10 minutes.
The Poisson probability can be used to approximate a binomial probability when $n$ is large and $p$ is small. Let $\alpha = np$:
$P_k = \binom{n}{k}p^k(1-p)^{n-k} \approx \frac{\alpha^k}{k!}e^{-\alpha}$, $k = 0, 1, \dots, n$
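A quick comparison of the two pmfs (illustrative, with arbitrarily chosen $n = 1000$ and $p = 0.002$, so $\alpha = 2$):

```python
import math

n, p = 1000, 0.002   # large n, small p
alpha = n * p        # alpha = np = 2

for k in range(10):
    binom = math.comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = alpha**k / math.factorial(k) * math.exp(-alpha)
    assert abs(binom - poisson) < 1e-3   # the two pmfs agree closely
```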
Uniform r.v.

Read on your own.
Exponential r.v.

$f_X(x) = \begin{cases} 0 & x < 0 \\ \lambda e^{-\lambda x} & x \ge 0 \end{cases}$
$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0 \end{cases}$
A good model for the lifetime of a device. Memoryless: for $t, h > 0$,
$P\{X > t + h \mid X > t\} = \frac{P(\{X > t + h\} \cap \{X > t\})}{P\{X > t\}} = \frac{P\{X > t + h\}}{P\{X > t\}} = \frac{e^{-\lambda(t+h)}}{e^{-\lambda t}} = e^{-\lambda h} = P\{X > h\}$
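The memoryless property follows directly from the survival function $P\{X > x\} = e^{-\lambda x}$; a minimal check (not from the slides) with arbitrarily chosen $\lambda$, $t$, $h$:

```python
import math

lam = 0.5  # example rate

def survival(x):
    """P{X > x} = exp(-lambda x) for the exponential r.v."""
    return math.exp(-lam * x)

# Memoryless: P{X > t + h | X > t} = P{X > h}
t, h = 3.0, 1.5
assert abs(survival(t + h) / survival(t) - survival(h)) < 1e-12
```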
Gaussian (Normal) r.v.

$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-m)^2}{2\sigma^2}}$, for $-\infty < x < \infty$
where $m$ and $\sigma$ are two parameters (to be discussed later). CDF:
$F_X(x) = P\{X \le x\} = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{x} e^{-\frac{(t-m)^2}{2\sigma^2}}\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{x-m}{\sigma}} e^{-\frac{t^2}{2}}\,dt$
(with the substitution $t = \frac{x-m}{\sigma}$). If we define
$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt$
then $F_X(x) = \Phi\!\left(\frac{x-m}{\sigma}\right)$.
Gaussian (Normal) r.v.

We usually use $Q(x) = 1 - \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-\frac{t^2}{2}}\,dt$.
(Figure: standard normal pdf $f_X(x)$; $\Phi(z)$ is the area to the left of $z$, $Q(z)$ the area to the right.)
$\Phi(-x) = Q(x) = 1 - \Phi(x)$
$Q(-x) = \Phi(x) = 1 - Q(x)$
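Since $\Phi$ relates to the error function by $\Phi(x) = \frac12\left(1 + \mathrm{erf}(x/\sqrt{2})\right)$ (a standard identity, not stated on the slide), the symmetry relations can be verified with Python's `math.erf`:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def Q(x):
    """Tail probability Q(x) = 1 - Phi(x)."""
    return 1 - Phi(x)

assert abs(Phi(0) - 0.5) < 1e-12           # half the mass lies below 0
for x in (0.5, 1.0, 2.0):
    assert abs(Phi(-x) - Q(x)) < 1e-12     # Phi(-x) = Q(x)
    assert abs(Q(-x) - Phi(x)) < 1e-12     # Q(-x) = Phi(x)
```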
Functions of a random variable

If $X$ is a r.v., then $Y = g(X)$ will also be a r.v. The CDF of $Y$:
$F_Y(y) = P\{Y \le y\} = P\{g(X) \le y\}$
When $x = g^{-1}(y)$ exists and is unique, the pdf of $Y$ is
$f_Y(y) = \frac{f_X(x)}{|dy/dx|}\bigg|_{x = g^{-1}(y)} = f_X(x)\left|\frac{dx}{dy}\right|_{x = g^{-1}(y)}$
Note: $f_X(x)|\Delta x| \approx f_Y(y)|\Delta y|$, so $f_Y(y) \approx f_X(x)\left|\frac{\Delta x}{\Delta y}\right|$.
In general, if $y = g(x)$ has $n$ solutions $x_1, x_2, \dots, x_n$, then
$f_Y(y) = \sum_{k=1}^{n}\frac{f_X(x)}{|dy/dx|}\bigg|_{x = x_k} = \sum_{k=1}^{n} f_X(x)\left|\frac{dx}{dy}\right|_{x = x_k}$
Functions of a random variable

Example: find the pdf of $Y = aX + b$ in terms of the pdf of $X$. Assume $a \ne 0$ and $X$ continuous.
$F_Y(y) = P\{Y \le y\} = P\{aX + b \le y\} = \begin{cases} P\{X \le \frac{y-b}{a}\} = F_X(\frac{y-b}{a}) & \text{if } a > 0 \\ P\{X \ge \frac{y-b}{a}\} = 1 - F_X(\frac{y-b}{a}) & \text{if } a < 0 \end{cases}$
$f_Y(y) = \frac{dF_Y(y)}{dy} = \begin{cases} \frac{1}{a}f_X(\frac{y-b}{a}) & \text{if } a > 0 \\ -\frac{1}{a}f_X(\frac{y-b}{a}) & \text{if } a < 0 \end{cases} = \frac{1}{|a|}f_X\!\left(\frac{y-b}{a}\right)$
Example: let $X \sim N(m, \sigma^2)$, i.e., $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-m)^2}{2\sigma^2}}$. If $Y = aX + b$ ($a \ne 0$), then
$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma|a|}e^{-\frac{(y-b-am)^2}{2(a\sigma)^2}}$
Functions of a random variable

Example: $Y = X^2$, where $X$ is a continuous r.v.
$F_Y(y) = P\{Y \le y\} = P\{X^2 \le y\} = \begin{cases} 0 & y < 0 \\ P\{-\sqrt{y} \le X \le \sqrt{y}\} = F_X(\sqrt{y}) - F_X(-\sqrt{y}) & y \ge 0 \end{cases}$
So
$f_Y(y) = \begin{cases} 0 & y < 0 \\ \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}} & y > 0 \end{cases}$
If $X \sim N(0, 1)$, so that $f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$, then
$f_Y(y) = \frac{e^{-y/2}}{\sqrt{2\pi y}}$ for $y > 0$,
the chi-square r.v. with one degree of freedom.
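A numerical sketch (not from the slides) checking that integrating the chi-square(1) density over an interval matches $F_X(\sqrt{y}) - F_X(-\sqrt{y})$; the interval $[0.5, 2]$ and the grid are arbitrary:

```python
import math

def Phi(x):
    """Standard normal CDF via erf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def f_Y(y):
    """Chi-square(1) density: exp(-y/2) / sqrt(2 pi y), for y > 0."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y)

# P{a <= Y <= b} from the density should equal
# [Phi(sqrt(b)) - Phi(-sqrt(b))] - [Phi(sqrt(a)) - Phi(-sqrt(a))], X ~ N(0,1)
a, b, n = 0.5, 2.0, 15000
dy = (b - a) / n
numeric = sum(f_Y(a + (i + 0.5) * dy) * dy for i in range(n))
expected = (Phi(math.sqrt(b)) - Phi(-math.sqrt(b))) - (Phi(math.sqrt(a)) - Phi(-math.sqrt(a)))
assert abs(numeric - expected) < 1e-6
```

The interval avoids $y = 0$, where the density has an (integrable) singularity that a plain midpoint rule handles poorly.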
Expected Value

$E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx$ (may not exist) for a continuous r.v.
$E[X] = \sum_k x_k P_X(x_k)$ for a discrete r.v.
Example: uniform r.v.: $E[X] = \int_a^b x\,\frac{1}{b-a}\,dx = \frac{a+b}{2}$
When $f_X(x)$ is symmetric about $x = m$, then $E[X] = m$ (e.g., the Gaussian r.v.): $(m - x)f_X(x)$ is odd-symmetric about $x = m$, so
$\int_{-\infty}^{\infty}(m - x)f_X(x)\,dx = 0 \;\Rightarrow\; m = m\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} x f_X(x)\,dx = E[X]$
Expected Value

Example: the arrival time of packets to a queue has exponential pdf $f_X(x) = \lambda e^{-\lambda x}$, $x \ge 0$. Integrating by parts:
$E[X] = \int_0^{\infty} x\lambda e^{-\lambda x}\,dx = \left[-xe^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = 0 + \frac{1}{\lambda} = \frac{1}{\lambda}$
Expected Value

For a function $Y = g(X)$ of a r.v. $X$:
$E[Y] = \int_{-\infty}^{\infty} g(x)f_X(x)\,dx$
Example: $Y = a\cos(\omega t + \Phi)$, where $\Phi$ is uniformly distributed in $[0, 2\pi]$. Find $E[Y]$ and $E[Y^2]$.
$E[Y] = \int_0^{2\pi} a\cos(\omega t + \phi)\,\frac{1}{2\pi}\,d\phi = \frac{a}{2\pi}\sin(\omega t + \phi)\Big|_0^{2\pi} = 0$
$E[Y^2] = E[a^2\cos^2(\omega t + \Phi)] = E\left[\frac{a^2}{2} + \frac{a^2}{2}\cos(2\omega t + 2\Phi)\right] = \frac{a^2}{2}$
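Both expectations can be checked by midpoint integration over $\phi$ (a sketch, not from the slides); the amplitude and the fixed phase $\omega t$ below are arbitrary:

```python
import math

a, omega_t = 2.0, 0.7   # arbitrary amplitude and fixed phase omega*t

# Midpoint integration over phi ~ Uniform[0, 2*pi]
n = 10000
dphi = 2 * math.pi / n
EY = sum(a * math.cos(omega_t + (i + 0.5) * dphi) / (2 * math.pi) * dphi
         for i in range(n))
EY2 = sum((a * math.cos(omega_t + (i + 0.5) * dphi)) ** 2 / (2 * math.pi) * dphi
          for i in range(n))

assert abs(EY) < 1e-9                  # E[Y] = 0
assert abs(EY2 - a * a / 2) < 1e-9     # E[Y^2] = a^2 / 2
```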
Variance of a random variable

$\mathrm{Var}[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$
Example: uniform r.v.:
$\mathrm{Var}(X) = \int_a^b \frac{1}{b-a}\left(x - \frac{a+b}{2}\right)^2 dx$
Let $y = x - \frac{a+b}{2}$:
$\mathrm{Var}(X) = \frac{1}{b-a}\int_{-\frac{b-a}{2}}^{\frac{b-a}{2}} y^2\,dy = \frac{(b-a)^2}{12}$
Variance of a random variable

Example: for a Gaussian r.v. $X \sim N(m, \sigma^2)$,
$\mathrm{Var}(X) = \int_{-\infty}^{\infty}(x-m)^2 f_X(x)\,dx = \int_{-\infty}^{\infty}(x-m)^2\,\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx = -\frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty}(x-m)\,d\!\left(e^{-\frac{(x-m)^2}{2\sigma^2}}\right)$
Integrating by parts:
$= -\frac{\sigma(x-m)}{\sqrt{2\pi}}e^{-\frac{(x-m)^2}{2\sigma^2}}\bigg|_{-\infty}^{\infty} + \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx = 0 + \sigma^2\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx = \sigma^2$
Variance of a random variable

Some properties of variance:
$\mathrm{Var}[c] = 0$
$\mathrm{Var}[X + c] = \mathrm{Var}[X]$
$\mathrm{Var}[cX] = c^2\,\mathrm{Var}[X]$
The $n$th moment of a random variable:
$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx$
Markov Inequality

The CDF or pdf determines the mean and variance; but given only the mean and variance, what can we say about probabilities without knowing the CDF or pdf?
The Markov inequality: for a nonnegative r.v. $X$ and $a > 0$,
$P[X \ge a] \le \frac{E[X]}{a}$
Proof:
$E[X] = \int_0^{\infty} x f_X(x)\,dx = \int_0^{a} x f_X(x)\,dx + \int_a^{\infty} x f_X(x)\,dx \ge \int_a^{\infty} x f_X(x)\,dx \ge \int_a^{\infty} a f_X(x)\,dx = aP[X \ge a]$
Chebyshev Inequality

$P\{|X - m| \ge a\} \le \frac{\sigma^2}{a^2}$
Proof:
$P\{|X - m| \ge a\} = P\{(X - m)^2 \ge a^2\} \le \frac{E[(X-m)^2]}{a^2} = \frac{\sigma^2}{a^2}$
by the Markov inequality.
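A small check of both inequalities (illustrative, not from the slides) for an exponential r.v. with $\lambda = 1$, so that $E[X] = \mathrm{Var}[X] = 1$ and $P\{X \ge a\} = e^{-a}$; the test points $a$ are arbitrary:

```python
import math

# Exponential r.v. with lambda = 1: E[X] = 1, Var[X] = 1, P{X >= a} = exp(-a)
m, var = 1.0, 1.0
for a in (1.5, 2.0, 4.0, 8.0):
    # Markov: P{X >= a} <= E[X]/a
    assert math.exp(-a) <= m / a + 1e-12
    # Chebyshev: P{|X - m| >= a} <= var/a^2; for this X and a > 1,
    # P{|X - 1| >= a} = P{X >= 1 + a} = exp(-(1 + a))
    assert math.exp(-(1 + a)) <= var / (a * a) + 1e-12
```

Note how loose the bounds are here: e.g., at $a = 4$ Markov gives $0.25$ while the true tail is $e^{-4} \approx 0.018$.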
Transform Methods

The characteristic function:
$\Phi_X(\omega) = E[e^{j\omega X}] = \int_{-\infty}^{\infty} f_X(x)e^{j\omega x}\,dx$, where $j = \sqrt{-1}$
This is the Fourier transform of $f_X(x)$ (with a reversal in the sign of the exponent). The inverse:
$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi_X(\omega)e^{-j\omega x}\,d\omega$
Example: exponential r.v., $f_X(x) = \lambda e^{-\lambda x}$, $x \ge 0$:
$\Phi_X(\omega) = \int_0^{\infty}\lambda e^{-\lambda x}e^{j\omega x}\,dx = \int_0^{\infty}\lambda e^{-(\lambda - j\omega)x}\,dx = \frac{\lambda}{\lambda - j\omega}$
Characteristic Function

If $X$ is a discrete r.v.:
$\Phi_X(\omega) = \sum_k P_X(x_k)e^{j\omega x_k}$
If the $x_k$ are integer-valued:
$\Phi_X(\omega) = \sum_{k=-\infty}^{\infty}P_X(k)e^{j\omega k}$ and
$P_X(k) = \frac{1}{2\pi}\int_0^{2\pi}\Phi_X(\omega)e^{-j\omega k}\,d\omega$
Characteristic Function

Example: for a geometric r.v. with $P_X(k) = p(1-p)^k$, $k = 0, 1, \dots$:
$\Phi_X(\omega) = \sum_{k=0}^{\infty}p(1-p)^k e^{j\omega k} = p\sum_{k=0}^{\infty}\left[(1-p)e^{j\omega}\right]^k = \frac{p}{1 - (1-p)e^{j\omega}}$
Characteristic Function

Moment theorem:
$E[X^n] = \frac{1}{j^n}\frac{d^n}{d\omega^n}\Phi_X(\omega)\bigg|_{\omega=0}$
Proof:
$\Phi_X(\omega) = \int f_X(x)e^{j\omega x}\,dx = \int f_X(x)\left(1 + j\omega x + \frac{(j\omega x)^2}{2!} + \cdots\right)dx = 1 + j\omega E[X] + \frac{(j\omega)^2}{2!}E[X^2] + \cdots$
So $\Phi_X(\omega)\big|_{\omega=0} = 1$ (note: $\Phi_X(0) = E[e^0] = E[1] = 1$),
$\frac{d}{d\omega}\Phi_X(\omega)\Big|_{\omega=0} = jE[X] \;\Rightarrow\; E[X] = \frac{1}{j}\frac{d}{d\omega}\Phi_X(\omega)\Big|_{\omega=0}$
and in general
$\frac{d^n}{d\omega^n}\Phi_X(\omega)\Big|_{\omega=0} = j^n E[X^n] \;\Rightarrow\; E[X^n] = \frac{1}{j^n}\frac{d^n}{d\omega^n}\Phi_X(\omega)\Big|_{\omega=0}$
Characteristic Function

Example: for an exponential r.v.,
$\Phi_X(\omega) = \frac{\lambda}{\lambda - j\omega}$, so $\frac{d}{d\omega}\Phi_X(\omega) = \frac{\lambda j}{(\lambda - j\omega)^2}$
and
$E[X] = \frac{1}{j}\frac{d}{d\omega}\Phi_X(\omega)\bigg|_{\omega=0} = \frac{1}{\lambda}$
Find $E[X^2]$ and $\sigma^2$ by yourself.
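The moment theorem can also be verified numerically by taking finite differences of $\Phi_X(\omega)$ at $\omega = 0$ (a sketch, not from the slides); $\lambda = 2$ and the step $h$ are arbitrary choices:

```python
lam = 2.0

def Phi_X(w):
    """Characteristic function of the exponential r.v.: lambda / (lambda - j*w)."""
    return lam / (lam - 1j * w)

h = 1e-4

# E[X] = (1/j) * dPhi/dw at w = 0, via a central difference; should be 1/lambda
dPhi = (Phi_X(h) - Phi_X(-h)) / (2 * h)
EX = (dPhi / 1j).real
assert abs(EX - 1 / lam) < 1e-8

# E[X^2] = (1/j^2) * d^2Phi/dw^2 at w = 0 = 2/lambda^2 (second central difference)
d2Phi = (Phi_X(h) - 2 * Phi_X(0) + Phi_X(-h)) / h**2
EX2 = (d2Phi / (1j * 1j)).real
assert abs(EX2 - 2 / lam**2) < 1e-6
```

This also confirms $\sigma^2 = E[X^2] - (E[X])^2 = 1/\lambda^2$, the exercise left to the reader.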
Probability Generating Function

For a nonnegative, integer-valued r.v. $N$:
$G_N(z) = E[z^N] = \sum_{k=0}^{\infty}P_N(k)z^k$
which is the z-transform of the pmf, and
$P_N(k) = \frac{1}{k!}\frac{d^k}{dz^k}G_N(z)\bigg|_{z=0}$
To find the first two moments of $N$:
$\frac{d}{dz}G_N(z)\bigg|_{z=1} = \sum_{k=0}^{\infty}P_N(k)\,k\,z^{k-1}\bigg|_{z=1} = \sum_k kP_N(k) = E[N]$
$\frac{d^2}{dz^2}G_N(z)\bigg|_{z=1} = \sum_{k=0}^{\infty}P_N(k)\,k(k-1)\,z^{k-2}\bigg|_{z=1} = \sum_k k(k-1)P_N(k) = E[N^2] - E[N]$
Probability Generating Function

Example: for a Poisson r.v., $P_N(k) = \frac{\alpha^k}{k!}e^{-\alpha}$:
$G_N(z) = \sum_{k=0}^{\infty}\frac{\alpha^k}{k!}e^{-\alpha}z^k = e^{-\alpha}\sum_{k=0}^{\infty}\frac{(\alpha z)^k}{k!} = e^{-\alpha}e^{\alpha z} = e^{\alpha(z-1)}$
$G_N'(z) = \alpha e^{\alpha(z-1)}$, $G_N''(z) = \alpha^2 e^{\alpha(z-1)}$
$E[N] = G_N'(1) = \alpha$
$E[N^2] - E[N] = G_N''(1) = \alpha^2 \;\Rightarrow\; E[N^2] = \alpha^2 + \alpha$
$\mathrm{Var}[N] = E[N^2] - (E[N])^2 = \alpha^2 + \alpha - \alpha^2 = \alpha$
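These moments can be checked by summing the (truncated) pmf directly (a sketch, not from the slides); $\alpha = 3$ and the truncation point are arbitrary:

```python
import math

alpha = 3.0
# Truncated Poisson pmf; the tail beyond k = 60 is negligible for alpha = 3
pmf = [alpha**k / math.factorial(k) * math.exp(-alpha) for k in range(60)]

EN = sum(k * pk for k, pk in enumerate(pmf))
EN2 = sum(k * k * pk for k, pk in enumerate(pmf))

assert abs(EN - alpha) < 1e-9                   # E[N] = alpha = G'(1)
assert abs(EN2 - (alpha**2 + alpha)) < 1e-9     # E[N^2] = alpha^2 + alpha
assert abs((EN2 - EN**2) - alpha) < 1e-9        # Var[N] = alpha
```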
Entropy

Let $X$ be a r.v. with $S_X = \{1, 2, \dots, K\}$. The uncertainty, or information, of the outcome $X = k$:
$I(X = k) = \log\frac{1}{P_X(k)} = -\log P_X(k)$
Properties:
1. The smaller $P_X(k)$ is, the larger $I(X = k)$: unlikely outcomes carry more information.
2. Additivity: if $X$ and $Y$ are independent, $P(X = k, Y = m) = P(X = k)P(Y = m)$, so
$I(X = k, Y = m) = -\log P(X = k, Y = m) = -\log P(X = k) - \log P(Y = m) = I(X = k) + I(Y = m)$
Entropy

Example: toss coins.
$I(X = H) = -\log_2\frac{1}{2} = 1$ (1 bit of information)
$I(X_1 = H, X_2 = T) = -\log_2\frac{1}{4} = 2$ (bits)
If the base of the log is 2, we call the unit "bits"; if the base is $e$, we call the unit "nats".
Entropy

Entropy:
$H(X) = E\left[\log\frac{1}{P_X(k)}\right] = -\sum_k P_X(k)\log P_X(k)$
For any r.v. $X$ with $S_X = \{1, 2, \dots, K\}$:
$H(X) \le \log K$
with equality iff $P_k = \frac{1}{K}$, $k = 1, 2, \dots, K$. Read the proof by yourself.
Entropy

Example: a r.v. $X$ with $S_X = \{1, 2, 3, 4\}$ and pmf
$P_1 = \frac{1}{2}$, $P_2 = \frac{1}{4}$, $P_3 = \frac{1}{8}$, $P_4 = \frac{1}{8}$
The entropy:
$H(X) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{4}\log_2\frac{1}{4} - \frac{1}{8}\log_2\frac{1}{8} - \frac{1}{8}\log_2\frac{1}{8} = \frac{7}{4}$ bits
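A one-line check of this computation, plus the $H(X) \le \log_2 K$ bound (not on the slide):

```python
import math

pmf = [1/2, 1/4, 1/8, 1/8]
H = -sum(p * math.log2(p) for p in pmf)
assert abs(H - 7/4) < 1e-12              # H(X) = 7/4 bits

# H(X) <= log2(K), with equality for the uniform pmf
assert H <= math.log2(4)
H_uniform = -sum(0.25 * math.log2(0.25) for _ in range(4))
assert abs(H_uniform - math.log2(4)) < 1e-12
```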
Entropy

Source Coding Theorem: the minimum average number of bits required to encode a source $X$ is $H(X)$.
Example (for the pmf above):
Code 1: $1 \to 0$, $2 \to 10$, $3 \to 110$, $4 \to 111$.
Average number of bits $= 1\cdot\frac{1}{2} + 2\cdot\frac{1}{4} + 3\cdot\frac{1}{8} + 3\cdot\frac{1}{8} = \frac{7}{4} = H(X)$.
Code 2: $1 \to 00$, $2 \to 01$, $3 \to 10$, $4 \to 11$.
Average number of bits $= 2 > H(X)$.
Differential Entropy

For a continuous r.v., since $P\{X = x\} = 0$, the entropy is infinite. The differential entropy for a continuous r.v.:
$H(X) = -\int f_X(x)\log f_X(x)\,dx = E[-\log f_X(X)]$
Example: $X$ is uniform in $[a, b]$:
$H(X) = E\left[-\log\frac{1}{b-a}\right] = \log(b-a)$
Differential Entropy

Example: $X \sim N(m, \sigma^2)$, with $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-m)^2}{2\sigma^2}}$:
$H(X) = -E[\log f_X(X)] = -E\left[\log\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{(X-m)^2}{2\sigma^2}\right] = \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2} = \frac{1}{2}\log(2\pi e\sigma^2)$
(using the natural log, so the unit is nats).
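A midpoint-integration check of this closed form (not from the slides); $m$, $\sigma$, and the integration grid are arbitrary choices:

```python
import math

m, sigma = 1.0, 2.0

def f(x):
    """Gaussian pdf with mean m and standard deviation sigma."""
    return math.exp(-(x - m)**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# H(X) = -integral of f log f, via midpoint rule over [m - 10s, m + 10s]
n = 40000
lo, hi = m - 10 * sigma, m + 10 * sigma
dx = (hi - lo) / n
H = 0.0
for i in range(n):
    x = lo + (i + 0.5) * dx
    H -= f(x) * math.log(f(x)) * dx

assert abs(H - 0.5 * math.log(2 * math.pi * math.e * sigma**2)) < 1e-5
```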