Basic concepts of probability theory
- Random variable: discrete / continuous random variables
- Transforms: Z-transform, Laplace transform
- Distributions: Geometric, mixed-geometric, Binomial, Poisson, exponential, Erlang, hyper-exponential, phase-type, Markovian arrival process
Random variable
- X is called a discrete random variable if it takes discrete values, and a continuous random variable if it takes continuous values
- Distribution function (cdf, cumulative distribution function): F(x) = Pr(X ≤ x)
- Probability mass function (pmf), X discrete: f(x) = Pr(X = x)
- Probability density function (pdf), X continuous: f(x) = dF(x)/dx
Random variable
- Mean: E(X)
- Variance: Var(X), or σ²(X); σ²(X) = E{(X − E(X))²} = E(X²) − E²(X)
- Standard deviation: σ(X)
- Covariance of two random variables X, Y: Cov(X,Y) = E{(X − E(X))(Y − E(Y))}
- Correlation coefficient of X, Y: r(X,Y) = Cov(X,Y) / (σ(X)σ(Y)), with −1 ≤ r(X,Y) ≤ 1
Coefficient of variation
- c_X = σ(X)/E(X)
- c_X = 0: deterministic
- c_X < 1: smooth
- c_X = 1: pure random
- c_X > 1: bursty
- c_X says more about burstiness than the variance alone, because it is normalized by the mean
- (Figure: example interarrival-time sequences illustrating smooth, random, and bursty arrivals)
Discrete random variables (suppose the random variable X is positive)
- Probability mass function (pmf), P(n):
  $P(n) = \Pr[X = n]$, where $0 \le P(n) \le 1$ and $\sum_n P(n) = 1$
- Cumulative distribution function (cdf), F(x):
  $F(x) = \Pr[X \le x] = \sum_{n \le x} P(n)$,
  where $F(0) = 0$, $F(\infty) = 1$, $F(b) \ge F(a)$ if $b \ge a$, and $F(b) - F(a) = \Pr[a < X \le b]$
Discrete random variables: mean and variance
- Mean: $\bar{x} = \sum_n n\,P(n)$
- Variance: $\sigma^2 = \sum_n (n - \bar{x})^2 P(n)$
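A minimal sketch of these two formulas in code; the example pmf is an assumption chosen only for illustration.

```python
import numpy as np

# Mean, variance, and coefficient of variation of a discrete r.v. from its pmf.
# Example pmf (assumed for illustration): P(1)=0.2, P(2)=0.5, P(3)=0.3.
values = np.array([1, 2, 3])
pmf = np.array([0.2, 0.5, 0.3])
assert np.isclose(pmf.sum(), 1.0)

mean = np.sum(values * pmf)                    # x_bar = sum_n n P(n)
variance = np.sum((values - mean) ** 2 * pmf)  # sigma^2 = sum_n (n - x_bar)^2 P(n)
c_x = np.sqrt(variance) / mean                 # coefficient of variation

print(mean, variance, c_x)
```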
Continuous random variables
- Probability density function (pdf), f(x):
  $f(x) \ge 0$, $\Pr[a \le X \le b] = \int_a^b f(x)\,dx$, $\int_0^\infty f(x)\,dx = 1$
- Cumulative distribution function (cdf), F(x):
  $F(x) = \Pr[X \le x] = \int_0^x f(y)\,dy$, and $f(x) = \dfrac{dF(x)}{dx}$
Continuous random variables: mean and variance
- Mean: $\bar{x} = \int_0^\infty x f(x)\,dx = \int_0^\infty x\,dF(x) = \int_0^\infty (1 - F(x))\,dx$
- Variance: $\sigma^2 = \int_0^\infty (x - \bar{x})^2 f(x)\,dx = \int_0^\infty x^2 f(x)\,dx - \bar{x}^2$
Z-transform for a discrete distribution
- The Z-transform is also called the (probability) generating function
- P(z): Z-transform of a discrete r.v. X with p(n) = Pr(X = n), n = 0, 1, 2, ...
  $P(z) = E[z^X] = \sum_{n=0}^{\infty} p(n) z^n$
- Properties: P(0) = p(0), P(1) = 1, P'(1) = E(X); P''(1) = ? (exercise)
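A small symbolic illustration of these properties; the Poisson pmf is an assumed example, not part of the slide.

```python
import sympy as sp

# Generating function of a Poisson(lambda) r.v. and the properties
# P(1) = 1, P'(1) = E(X), P''(1) = E[X(X-1)].
z, lam = sp.symbols('z lambda', positive=True)
n = sp.symbols('n', integer=True, nonnegative=True)

p_n = lam**n * sp.exp(-lam) / sp.factorial(n)      # pmf p(n)
P = sp.simplify(sp.summation(p_n * z**n, (n, 0, sp.oo)))   # P(z) = exp(lambda*(z-1))

print(P)
print(sp.simplify(P.subs(z, 1)))                   # P(1) = 1
print(sp.simplify(sp.diff(P, z).subs(z, 1)))       # P'(1) = E(X) = lambda
print(sp.simplify(sp.diff(P, z, 2).subs(z, 1)))    # P''(1) = E[X(X-1)] = lambda**2
```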
Laplace transform for a continuous distribution
- F*(s): Laplace transform of a continuous r.v. X with pdf f(x) and cdf F(x), x ≥ 0
  $F^*(s) = E[e^{-sX}] = \int_0^\infty e^{-sx} f(x)\,dx = \int_0^\infty e^{-sx}\,dF(x) = s \int_0^\infty e^{-sx} F(x)\,dx$
- Property: a shortcut to calculate the k-th moment of X
  $F^*(0) = 1$, $F^{*\prime}(0) = -E(X)$, $F^{*(k)}(0) = (-1)^k E(X^k)$
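A minimal sketch of the moment shortcut, using an exponential(μ) pdf as an assumed example.

```python
import sympy as sp

# Laplace transform of an exponential(mu) r.v. and the shortcut
# F*^(k)(0) = (-1)^k E(X^k).
s, x, mu = sp.symbols('s x mu', positive=True)

f = mu * sp.exp(-mu * x)                                    # pdf of exponential(mu)
F_star = sp.integrate(sp.exp(-s * x) * f, (x, 0, sp.oo))    # F*(s) = mu/(s + mu)

E_X  = -sp.diff(F_star, s).subs(s, 0)                       # E(X)   = 1/mu
E_X2 = sp.diff(F_star, s, 2).subs(s, 0)                     # E(X^2) = 2/mu^2

print(sp.simplify(F_star), sp.simplify(E_X), sp.simplify(E_X2))
```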
Geometric distribution
- p: success probability in a Bernoulli trial
- X: the number of Bernoulli trials needed to get the first success
- g(k; p) = Pr{X = k} = p(1 − p)^{k−1}, k = 1, 2, ...
- Properties: E(X) = 1/p, Var(X) = (1 − p)/p², c_X = (1 − p)^{1/2}
- (Figure: curve of the probability mass function)
- The geometric distribution is the discrete version of the exponential distribution
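A quick simulation check of the mean and coefficient of variation; the value p = 0.25 is an arbitrary assumption.

```python
import numpy as np

# Sample geometric(p) trial counts and compare the empirical mean and c_X
# with the formulas E(X) = 1/p and c_X = sqrt(1 - p).
rng = np.random.default_rng(0)
p = 0.25
samples = rng.geometric(p, size=200_000)   # number of trials until first success

print(samples.mean(), 1 / p)                              # ~4.0 vs 4.0
print(samples.std() / samples.mean(), np.sqrt(1 - p))     # ~0.866 vs 0.866
```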
Mixed geometric distribution
- n sets of independent Bernoulli trials with success probabilities p_i, i = 1, 2, ..., n
- Mixing probabilities θ_i, i = 1, 2, ..., n (one set is selected with probability θ_i)
- X: the number of Bernoulli trials needed to get the first success in the selected set
- pmf: $g_b(k; \theta, p) = \sum_{i=1}^{n} \theta_i p_i (1 - p_i)^{k-1}$,  $E(X) = \sum_{i=1}^{n} \theta_i / p_i$
- The mixed geometric distribution is the discrete version of the hyper-exponential distribution
Binomial distribution
- p: success probability in a Bernoulli trial
- X: the number of successes during n Bernoulli trials
- Probability function: $P(r; n, p) = \Pr[X = r] = \binom{n}{r} p^r (1 - p)^{n-r}$
- Property: E(X) = np
Negative binomial distribution (also called Pascal distribution)
- Definition: run Bernoulli trials with success probability p and stop once a predefined number k of failures has occurred; the random number of successes, X, obeys the NB distribution
- $nb(r; k, p) = \Pr[X = r] = \binom{k + r - 1}{r} p^r (1 - p)^k$
- Not required, for your reference
Poisson distribution
- A Poisson random variable X with parameter λ has probability distribution
  $P(X = n) = \dfrac{\lambda^n}{n!} e^{-\lambda}$, n = 0, 1, 2, ...
- For the Poisson distribution it holds that $\bar{x} = \sigma^2(X) = \lambda$ and $c_X^2 = 1/\lambda$
Exponential distribution (with parameter μ)
- pdf and cdf: $f(x) = \mu e^{-\mu x}$, $F(x) = 1 - e^{-\mu x}$, x ≥ 0
- $\bar{x} = \dfrac{1}{\mu}$, $\sigma_x^2 = \dfrac{1}{\mu^2}$, $c_X = \dfrac{\sigma_x}{\bar{x}} = 1$ (pure random)
- (Figure: pdf f(x) starting at μ and decaying to 0; cdf F(x) rising from 0 to 1)
Exponential distribution
- If X_1, X_2, ..., X_n are independent exponential random variables with parameters μ_1, μ_2, ..., μ_n,
  then Y = min(X_1, X_2, ..., X_n) is an exponential random variable with parameter μ = μ_1 + μ_2 + ... + μ_n (see the simulation sketch below)
- How about Z = max(X_1, X_2, ..., X_n)? Prove it as homework
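A minimal simulation sketch of the minimum property; the three rates are assumptions chosen for illustration.

```python
import numpy as np

# The minimum of independent exponentials with rates mu_1, mu_2, mu_3 should
# again be exponential with rate mu_1 + mu_2 + mu_3, i.e. mean 1/(sum of rates).
rng = np.random.default_rng(1)
rates = np.array([1.0, 2.0, 3.0])
n = 200_000

# Each row holds one draw of (X_1, X_2, X_3); numpy uses scale = 1/rate.
samples = rng.exponential(scale=1.0 / rates, size=(n, len(rates)))
y = samples.min(axis=1)

print(y.mean(), 1.0 / rates.sum())       # ~0.1667 for both
print(y.std() / y.mean())                # ~1.0, as expected for an exponential
```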
Exponential distribution has the memoryless property
$P(X > x + t \mid X > t) = \dfrac{P(X > x + t)}{P(X > t)} = \dfrac{e^{-\mu(x+t)}}{e^{-\mu t}} = e^{-\mu x} = P(X > x)$
$P(t < X \le x + t \mid X > t) = P(X \le x) = F(x) = 1 - e^{-\mu x}$
- Memoryless: the future only depends on the current state, independent of the history
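A rough numerical check of memorylessness; μ = 1, t = 0.5, x = 1.0 are assumed values.

```python
import numpy as np

# For exponential samples, P(X > x + t | X > t) should match P(X > x).
rng = np.random.default_rng(2)
mu, t, x = 1.0, 0.5, 1.0
samples = rng.exponential(scale=1.0 / mu, size=500_000)

survivors = samples[samples > t]                        # condition on X > t
p_conditional = np.mean(survivors > x + t)              # P(X > x + t | X > t)
p_unconditional = np.mean(samples > x)                  # P(X > x)

print(p_conditional, p_unconditional, np.exp(-mu * x))  # all ~0.3679
```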
Erlang distribution
- X: Erlang-k random variable, X = X_1 + X_2 + ... + X_k, where the X_i are independent random variables obeying the exponential distribution with parameter μ
- Denote this distribution as E_k(μ); its pdf is
  $f(x) = \dfrac{\mu (\mu x)^{k-1}}{(k-1)!} e^{-\mu x}$, x ≥ 0
- Its cdf is
  $F(x) = 1 - \sum_{j=0}^{k-1} \dfrac{(\mu x)^j}{j!} e^{-\mu x}$, x ≥ 0
Erlang distribution
- Mean, variance, and squared coefficient of variation:
  $E(X) = \dfrac{k}{\mu}$, $\sigma^2(X) = \dfrac{k}{\mu^2}$, $c_X^2 = \dfrac{1}{k} \le 1$
- Models smooth data traffic (e.g., service made up of k sequential stages, as in a barber shop)
- (Figure: pdf curves of E_k(μ) for different k)
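A small simulation sketch; k = 4 and μ = 2 are assumed for illustration.

```python
import numpy as np

# An Erlang-k sample is a sum of k i.i.d. exponential(mu) samples; its squared
# coefficient of variation should be close to 1/k < 1 ("smooth" traffic).
rng = np.random.default_rng(3)
k, mu, n = 4, 2.0, 200_000

x = rng.exponential(scale=1.0 / mu, size=(n, k)).sum(axis=1)   # Erlang-k samples

print(x.mean(), k / mu)                       # ~2.0 vs 2.0
print((x.std() / x.mean()) ** 2, 1.0 / k)     # ~0.25 vs 0.25
```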
Hyper-exponential distribution
- X selects, with probability α_i, the exponential r.v. X_i with parameter μ_i, i = 1, ..., k; denote this distribution as H_k
- Its pdf is $f(t) = \sum_{i=1}^{k} \alpha_i \mu_i e^{-\mu_i t}$, t ≥ 0
- Its mean is $E(X) = \sum_{i=1}^{k} \dfrac{\alpha_i}{\mu_i}$
- Its variance is $\sigma^2(X) = 2\sum_{i=1}^{k} \dfrac{\alpha_i}{\mu_i^2} - \left(\sum_{i=1}^{k} \dfrac{\alpha_i}{\mu_i}\right)^2$
- Its coefficient of variation satisfies c_X ≥ 1
- Models bursty data traffic
- (Figure: k parallel exponential branches with parameters μ_1, ..., μ_k, entered with probabilities α_1, ..., α_k)
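An illustrative sampling sketch; the branch probabilities and rates are assumptions chosen to make the burstiness visible.

```python
import numpy as np

# Sample a 2-branch hyper-exponential and check that c_X > 1 (bursty).
rng = np.random.default_rng(4)
alpha = np.array([0.9, 0.1])      # branch selection probabilities
mu = np.array([10.0, 0.5])        # branch rates
n = 200_000

branch = rng.choice(len(alpha), size=n, p=alpha)     # pick a branch per sample
x = rng.exponential(scale=1.0 / mu[branch])          # then draw from that branch

mean_theory = np.sum(alpha / mu)
var_theory = 2 * np.sum(alpha / mu**2) - mean_theory**2

print(x.mean(), mean_theory)
print(x.std() / x.mean(), np.sqrt(var_theory) / mean_theory)   # both well above 1
```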
Laplace distribution
- Also called the double exponential distribution: two exponential distributions (with an additional location parameter) spliced together back-to-back
- pdf of Laplace(μ, b): $f(x \mid \mu, b) = \dfrac{1}{2b} \exp\!\left(-\dfrac{|x - \mu|}{b}\right)$
- μ is the location parameter, b is the scale parameter
Phase-type distribution (discrete-time case)
- Discrete PH distribution: the first passage time to the absorbing state of a discrete-time Markov chain
- Two parameters: T, the part of the transition probability matrix among the n transient phases, and α, the initial state distribution over the n phases
- cdf: $F(k) = 1 - \alpha T^{k} e$,  pmf: $f(k) = \alpha T^{k-1} t$, where $Te + t = e$
- Example:
  $P = \begin{pmatrix} T & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0.3 & 0.4 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0 & 0 & 1 \end{pmatrix}$, $\alpha = [0.3, 0.7]$
Phase-type distribution (discrete-time case): examples
- Geometric distribution, 1 phase:
  $P = \begin{pmatrix} T & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1-p & p \\ 0 & 1 \end{pmatrix}$, $\alpha = [1]$
  pmf: $f(k) = \alpha T^{k-1} t = p(1-p)^{k-1}$
- Mixed-geometric distribution, n phases (here n = 2):
  $P = \begin{pmatrix} T & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0.3 & 0 & 0.7 \\ 0 & 0.8 & 0.2 \\ 0 & 0 & 1 \end{pmatrix}$, $\alpha = [0.5, 0.5]$
  pmf: $f(k) = \alpha T^{k-1} t = 0.5 \cdot 0.7 \cdot 0.3^{k-1} + 0.5 \cdot 0.2 \cdot 0.8^{k-1}$
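A minimal sketch of evaluating the discrete PH pmf f(k) = αT^{k−1}t numerically, using the 2-phase mixed-geometric example above.

```python
import numpy as np

alpha = np.array([0.5, 0.5])
T = np.array([[0.3, 0.0],
              [0.0, 0.8]])
t = np.ones(2) - T.sum(axis=1)          # exit vector, since T e + t = e

def ph_pmf(k):
    """f(k) = alpha T^(k-1) t for k = 1, 2, ..."""
    return alpha @ np.linalg.matrix_power(T, k - 1) @ t

for k in range(1, 5):
    closed_form = 0.5 * 0.7 * 0.3**(k - 1) + 0.5 * 0.2 * 0.8**(k - 1)
    print(k, ph_pmf(k), closed_form)    # the two columns agree
```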
Phase-type distribution (continuous-time case)
- Continuous PH distribution: the first passage time to the absorbing state of a continuous-time Markov process
- Two parameters: T, the part of the transition rate matrix among the n transient phases, and α, the initial state distribution over the n phases
- cdf: $F(x) = 1 - \alpha e^{Tx} e$,  pdf: $f(x) = \alpha e^{Tx} t$, where $Te + t = 0$
- Example:
  $B = \begin{pmatrix} T & t \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} -5 & 4 & 1 \\ 2 & -6 & 4 \\ 0 & 0 & 0 \end{pmatrix}$, $\alpha = [0.8, 0.2]$
Phase-type distribution (continuous-time case): examples
- Exponential distribution: $\alpha = [1]$, $T = [-\lambda]$, so $F(x) = 1 - \alpha e^{Tx} e = 1 - e^{-\lambda x}$
- Erlang distribution with (m, λ): $\alpha = [1, 0, 0, \ldots, 0]$,
  $T = \begin{pmatrix} -\lambda & \lambda & & \\ & -\lambda & \lambda & \\ & & \ddots & \ddots \\ & & & -\lambda \end{pmatrix}$
- Hyper-exponential distribution: $\alpha = [\theta_1, \theta_2, \ldots, \theta_n]$,
  $T = \begin{pmatrix} -\lambda_1 & & & \\ & -\lambda_2 & & \\ & & \ddots & \\ & & & -\lambda_n \end{pmatrix}$
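A sketch of evaluating the continuous PH cdf and pdf with a matrix exponential, reusing the example T and α from the earlier slide; the evaluation points are arbitrary.

```python
import numpy as np
from scipy.linalg import expm

alpha = np.array([0.8, 0.2])
T = np.array([[-5.0, 4.0],
              [2.0, -6.0]])
e = np.ones(2)
t = -T @ e                     # exit rate vector, since T e + t = 0

def ph_cdf(x):
    """F(x) = 1 - alpha expm(T x) e"""
    return 1.0 - alpha @ expm(T * x) @ e

def ph_pdf(x):
    """f(x) = alpha expm(T x) t"""
    return alpha @ expm(T * x) @ t

for x in (0.1, 0.5, 1.0, 2.0):
    print(x, ph_cdf(x), ph_pdf(x))
```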
Coxian distribution
- A special case of the PH-type distribution, with k phases and $\alpha = [1, 0, 0, \ldots, 0]$
$B = \begin{pmatrix}
-\mu_1 & p_1\mu_1 & 0 & \cdots & 0 & (1-p_1)\mu_1 \\
0 & -\mu_2 & p_2\mu_2 & \cdots & 0 & (1-p_2)\mu_2 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & \cdots & -\mu_{k-1} & p_{k-1}\mu_{k-1} & (1-p_{k-1})\mu_{k-1} \\
0 & 0 & \cdots & 0 & -\mu_k & \mu_k \\
0 & 0 & \cdots & 0 & 0 & 0
\end{pmatrix}$
Pareto distribution
- Pareto(α): a distribution with a power-law tail
- pdf: $f(x) = \alpha x^{-\alpha - 1}$, x ≥ 1; otherwise f(x) = 0
- cdf: $F(x) = 1 - x^{-\alpha}$, x ≥ 1; otherwise F(x) = 0
- Usually 0 < α < 2
- The Pareto distribution decays much more slowly than the exponential distribution; we say it has a heavy tail (fat tail)
- Heavy tails are very important for modeling practical data traffic
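A quick tail comparison; α = 1.5 and the matching exponential mean are assumptions chosen only to illustrate how slowly the Pareto tail decays.

```python
import numpy as np

# Pareto(alpha=1.5) has mean alpha/(alpha-1) = 3, so compare its tail with an
# exponential distribution of the same mean.
alpha = 1.5
mean = alpha / (alpha - 1)        # = 3

def pareto_tail(x):
    return x ** (-alpha)          # P(X > x) = x^(-alpha), x >= 1

def exp_tail(x):
    return np.exp(-x / mean)      # exponential with the same mean

for x in (10, 100, 1000):
    print(x, pareto_tail(x), exp_tail(x))   # the Pareto tail dominates for large x
```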
Relation to the Central Limit Theorem
- CLT: if the X_i are i.i.d. r.v.'s with mean μ and variance σ², define $Z_n = (X_1 + \cdots + X_n - n\mu)/(\sigma\sqrt{n})$; then Z_n converges to the standard normal distribution
- The CLT explains why many natural phenomena obey the normal distribution
- Binomial(n, p) is a sum of n i.i.d. Bernoulli(p) r.v.'s, so it converges to a normal distribution when n is large
- Poisson(λ) can be approximated by a normal distribution with mean λ and variance λ: for integer λ it is a sum of λ i.i.d. Poisson r.v.'s with rate 1
- Binomial and Poisson can thus be viewed as discrete approximations of the normal distribution; plot their pmfs to see this
Relation to the Central Limit Theorem
- Binomial distribution: sum of n Bernoulli r.v.'s with success probability p
- Poisson distribution: sum of λ Poisson r.v.'s with rate 1
An explanation of Additive White Gaussian Noise (AWGN)
- AWGN setup: transmit a signal over a channel; 100 i.i.d. sources each contribute low-level noise, uniformly distributed on [−1, 1]
- Noise is additive: if the total noise level is greater than 10 or less than −10, it corrupts the signal; otherwise there is no problem
- Calculate the probability that the signal is not corrupted
- Calculation: X ~ Unif(−1, 1), E(X) = 0, Var(X) = 1/3; by the CLT, S_100 ≈ Norm(0, 100/3)
  $P(-10 < S_{100} < 10) = \Phi(\sqrt{3}) - \Phi(-\sqrt{3}) = 2\Phi(\sqrt{3}) - 1 \approx 2(0.9584) - 1 \approx 0.917$
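A check of this example by direct simulation and by the normal approximation; seeds and sample sizes are arbitrary.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
n_sources, trials = 100, 200_000

# S_100: total noise from 100 i.i.d. Unif(-1, 1) sources, per trial.
noise = rng.uniform(-1.0, 1.0, size=(trials, n_sources)).sum(axis=1)
p_sim = np.mean((noise > -10) & (noise < 10))

# Normal approximation: S_100 ~ Norm(0, 100/3), so P(|S| < 10) = 2*Phi(sqrt(3)) - 1.
phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))   # standard normal cdf
p_clt = 2 * phi(sqrt(3)) - 1

print(p_sim, p_clt)    # both around 0.917
```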
A map of the relations among all these distributions (chalk drawing in class)
- The PH-type distribution can approximate almost any distribution
- It converts a stochastic process into a Markovian model
- Widely used in queueing systems, reliability systems, regenerative systems, etc.
- Discrete version vs. continuous version:
  - Geometric vs. exponential
  - Mixed geometric vs. hyper-exponential
  - Binomial, Poisson vs. normal
  - Erlang (a continuous r.v.) likewise approaches the normal (a continuous r.v.) as the number of phases grows