PAS04 - Important discrete and continuous distributions Jan Březina Technical University of Liberec October 30, 2014
Bernoulli trials
An experiment with two possible outcomes (yes/no questions):
  tossing a coin
  a newborn: girl/boy
  quality test of a product: defect/no defect
In general: success/failure. The probability of success is p.
Bernoulli/Alternative Alt(p)
description: value 1 with probability p, value 0 with probability 1 - p
values: 0, 1
parameter: probability p
P(X = 1) = p
EX = 1·p + 0·(1 - p) = p
DX = (1 - p)^2 · p + (0 - p)^2 · (1 - p) = p(1 - p)
Binomial Bi(n, p)
description: number of successes in n independent trials
example: k defects among n products; selection with replacement (non-destructive tests)
values: 0, ..., n
parameters: probability p, number of trials n
P(X = k) = C(n, k) p^k (1 - p)^(n-k),  0 <= k <= n,
where C(n, k) = n!/(k!(n-k)!) is the binomial coefficient
EX = np  (computed by shifting the summation index)
DX = np(1 - p)
Notes: Alt(p) = Bi(1, p)
R> plot( dbinom(0:n, n, p) )
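The pmf above can be checked with a short stdlib-Python sketch (the slide's R one-liner uses dbinom; the function name binom_pmf below is my own, not part of any library):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bi(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The pmf sums to 1 and the mean matches EX = n*p.
n, p = 7, 0.2
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
assert abs(sum(pmf) - 1) < 1e-12
assert abs(sum(k * q for k, q in enumerate(pmf)) - n * p) < 1e-12
```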
Hypergeometric H(N, M, n)
description: number of successes in n draws without replacement from a set of size N containing M successes
example: k cracked eggs among n drawn when the basket holds N eggs, M of them cracked; quality tests, destructive tests
values: max(0, n + M - N), ..., min(M, n)
parameters: number drawn n, total N, total number of successes M
P(X = k) = (n-draws with k successes) / (all n-draws)
         = C(M, k) C(N - M, n - k) / C(N, n)
EX = nM/N
DX = (nM/N) (1 - M/N) (N - n)/(N - 1)
Notes: H(N, M, n) ≈ Bi(n, M/N) for large N/n
R> plot( dhyper( 0:n, M, N-M, n) )
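The binomial approximation in the note can be verified numerically; a stdlib-Python sketch (hyper_pmf is an illustrative name, and the population sizes are arbitrary choices):

```python
from math import comb

def hyper_pmf(k: int, N: int, M: int, n: int) -> float:
    """P(X = k) for X ~ H(N, M, n): k successes in n draws without replacement."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

# For N >> n the draws barely deplete the population, so
# H(N, M, n) is close to Bi(n, M/N).
N, M, n = 10_000, 2_000, 5
p = M / N
for k in range(n + 1):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    assert abs(hyper_pmf(k, N, M, n) - binom) < 1e-3
```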
Geometric G(p)
description: number of trials until the first success (inclusive)
example: number of production cycles until a defect
values: 1, 2, ...
parameter: probability of success p
P(X = k) = (1 - p)^(k-1) p
EX = 1/p
DX = (1 - p)/p^2
Negative binomial NB(k, p)
description: number of trials until the k-th success (inclusive)
values: k, k+1, ...
parameters: probability of success p, number of successes k
P(X = n) = C(n - 1, k - 1) (1 - p)^(n-k) p^k
... the last trial is fixed as the k-th success; the remaining k - 1 successes are chosen among the first n - 1 trials
EX = k/p
DX = k(1 - p)/p^2
NB(k, p) is the sum of k independent G(p) random variables.
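The "sum of k geometric RVs" claim can be illustrated by simulation; a stdlib-Python sketch (nb_pmf and geom_draw are illustrative names, seed and sample size are arbitrary):

```python
import random
from math import comb

def nb_pmf(n: int, k: int, p: float) -> float:
    """P(X = n): the n-th trial yields the k-th success, X ~ NB(k, p)."""
    return comb(n - 1, k - 1) * (1 - p)**(n - k) * p**k

def geom_draw(p: float, rng: random.Random) -> int:
    """One G(p) draw: number of Bernoulli(p) trials until the first success."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t

# The sum of k independent G(p) draws should have mean k/p, matching NB(k, p).
rng = random.Random(1)
k, p = 2, 0.3
draws = [sum(geom_draw(p, rng) for _ in range(k)) for _ in range(50_000)]
mean = sum(draws) / len(draws)
assert abs(mean - k / p) < 0.1   # EX = k/p = 6.67
```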
Example
An oil company's geological study indicates a 0.2 chance of striking oil per well.
  What is the probability of 2 strikes out of 7 wells?
  What is the probability that exactly 7 wells must be drilled to get 2 strikes?
  What is the probability that more than 5 wells must be drilled to get 2 strikes?
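One way to compute the three answers, using the binomial and negative-binomial pmfs from the previous slides (a stdlib-Python sketch):

```python
from math import comb

p = 0.2  # chance of striking oil per well

# (1) exactly 2 strikes out of 7 wells: Bi(7, 0.2)
p1 = comb(7, 2) * p**2 * (1 - p)**5

# (2) exactly 7 wells needed for 2 strikes: NB(2, 0.2),
#     i.e. the 7th well must be the 2nd strike
p2 = comb(6, 1) * (1 - p)**5 * p**2

# (3) more than 5 wells needed for 2 strikes:
#     equivalent to at most 1 strike in the first 5 wells
p3 = sum(comb(5, k) * p**k * (1 - p)**(5 - k) for k in range(2))

print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.2753 0.0786 0.7373
```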
Poisson distribution
Poisson process: number of events during a (time) interval, assuming that:
  events occur evenly with density λ events per unit of time
  events are independent
Examples: number of nuclear decays over a given time, number of defects on a given length of fabric
values: 0, 1, ...
parameters: density λ, period t
P(X = k) = (λt)^k e^(-λt) / k!
Poisson distribution - derivation
Divide the interval t into n pieces, use Bi(n, λt/n), and pass to the limit:

p_k = lim_{n→∞} n!/(k!(n-k)!) (λt/n)^k (1 - λt/n)^(n-k)
    = (λt)^k/k! · [n!/(n^k (n-k)!)] · [(1 - λt/n)^n] · [(1 - λt/n)^(-k)]
    → (λt)^k/k! · 1 · exp(-λt) · 1
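The limit can be checked numerically for a large but finite n; a stdlib-Python sketch (λt = 2 and n = 10000 are arbitrary choices):

```python
from math import comb, exp, factorial

lam_t = 2.0

def poisson_pmf(k: int) -> float:
    """P(X = k) for the Poisson distribution with parameter lam_t."""
    return lam_t**k * exp(-lam_t) / factorial(k)

# Bi(n, lam_t/n) approaches Poisson(lam_t) as n grows.
n = 10_000
p = lam_t / n
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    assert abs(binom - poisson_pmf(k)) < 1e-3
```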
... expectation and variance
Using the expansion of exp(λt), the pmf sums to one:
  Σ_{k=0}^∞ (λt)^k e^(-λt)/k! = e^(-λt) Σ_{k=0}^∞ (λt)^k/k! = e^(-λt) e^(λt) = 1
similarly for the expectation:
  EX = Σ_{k=0}^∞ k (λt)^k/k! e^(-λt) = λt Σ_{k=1}^∞ (λt)^(k-1)/(k-1)! e^(-λt) = λt
and the variance:
  DX = Σ_{k=0}^∞ (k^2 - (EX)^2) p_k = Σ k(k-1) p_k + Σ k p_k - (λt)^2
     = (λt)^2 + λt - (λt)^2 = λt
Exponential distribution E(λ)
X is the time between two events in a Poisson process with density λ; e.g. time until failure.
Consider the random variable N_t ~ Po(λt). The event {X ≤ t} (time until the next Poisson event is smaller than t) is identical to the event {N_t ≥ 1} (at least one Poisson event occurs during time t).
F_X(t) = P(X ≤ t) = 1 - P(N_t < 1) = 1 - (λt)^0 e^(-λt)/0! = 1 - e^(-λt)
... we have to assume t > 0.
f_X(t) = d/dt F_X(t) = λ e^(-λt)
EX = ∫_0^∞ t λ e^(-λt) dt = [-t e^(-λt)]_0^∞ + ∫_0^∞ e^(-λt) dt = 0 + 1/λ = 1/λ
DX = 1/λ^2
Exponential distribution is without memory
Time until failure is independent of the history: the probability of no failure until time a + b, given no failure until time a, equals the probability of no failure until time b.
P(X > a + b | X > a) = (1 - F(a + b))/(1 - F(a)) = e^(-λ(a+b))/e^(-λa)
                     = e^(-λb) = 1 - F(b) = P(X > b)
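The memoryless identity can be verified directly from the survival function 1 - F(t) = e^(-λt); a tiny stdlib-Python check (the parameter values are arbitrary):

```python
from math import exp

lam, a, b = 0.5, 1.0, 2.0

def surv(t: float) -> float:
    """P(X > t) = 1 - F(t) for X ~ E(lam)."""
    return exp(-lam * t)

cond = surv(a + b) / surv(a)        # P(X > a+b | X > a)
assert abs(cond - surv(b)) < 1e-12  # memorylessness: equals P(X > b)
```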
Erlang distribution Erlang(k, λ)
X is the time until the k-th event in a Poisson process with density λ.
A particular case of the more general Gamma distribution (which allows non-integer k).
f_X(t) = λ e^(-λt) (λt)^(k-1)/(k-1)!
F_X(t) = 1 - e^(-λt) Σ_{i=0}^{k-1} (λt)^i/i!
EX = k/λ
DX = k/λ^2
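The closed-form CDF can be cross-checked against a numerical integral of the density; a stdlib-Python sketch (k, λ, and t are arbitrary choices):

```python
from math import exp, factorial

k, lam, t = 3, 1.5, 2.0

def erlang_pdf(x: float) -> float:
    """Density of Erlang(k, lam)."""
    return lam * exp(-lam * x) * (lam * x)**(k - 1) / factorial(k - 1)

def erlang_cdf(x: float) -> float:
    """CDF of Erlang(k, lam): 1 - e^(-lam x) * sum_{i<k} (lam x)^i / i!."""
    return 1 - exp(-lam * x) * sum((lam * x)**i / factorial(i) for i in range(k))

# The closed-form CDF should match a midpoint Riemann sum of the density.
steps = 100_000
dx = t / steps
riemann = sum(erlang_pdf((i + 0.5) * dx) for i in range(steps)) * dx
assert abs(riemann - erlang_cdf(t)) < 1e-6
```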
Relation between Bernoulli and Poisson process
Uniform distribution U(a, b)
density:
  f(x) = 1/(b - a) for x in [a, b], 0 elsewhere
CDF:
  F(x) = 0 for x < a;  (x - a)/(b - a) for x in [a, b];  1 for x > b
mean value:
  EX = ∫_a^b x/(b - a) dx = (b^2 - a^2)/(2(b - a)) = (a + b)/2
variance:
  DX = ∫_a^b x^2/(b - a) dx - ((a + b)/2)^2 = (b - a)^2/12
Properties of the uniform distribution
Theorem. For an arbitrary RV X with continuous increasing CDF F_X, the random variable Y = F_X(X) has the uniform distribution U(0, 1).
proof: P(Y ≤ y) = P(X ≤ F_X^(-1)(y)) = F_X(F_X^(-1)(y)) = y
It also holds in the other direction:
Theorem. Let Y ~ U(0, 1) and let F be a distribution function; then X = F^(-1)(Y) is a random variable with CDF F_X = F.
In the latter theorem, F can be an arbitrary CDF (even discontinuous).
Computer generators of (pseudo)random numbers usually produce numbers with distribution U(0, 1). The second theorem can be used to generate random numbers with a prescribed distribution. (An approximation of F^(-1) is used in practice.)
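The second theorem is the inverse-transform sampling method. A stdlib-Python sketch for E(λ), whose CDF F(t) = 1 - e^(-λt) inverts to F^(-1)(u) = -ln(1 - u)/λ (exp_sample is an illustrative name; seed and sample size are arbitrary):

```python
import random
from math import log

def exp_sample(lam: float, u: float) -> float:
    """Inverse-transform: F^(-1)(u) = -ln(1 - u)/lam maps U(0,1) to E(lam)."""
    return -log(1 - u) / lam

rng = random.Random(42)
lam = 2.0
xs = [exp_sample(lam, rng.random()) for _ in range(100_000)]
mean = sum(xs) / len(xs)
assert abs(mean - 1 / lam) < 0.01   # EX = 1/lam for E(lam)
```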
Weibull distribution W(λ, β)
Time until failure with shape/age parameter β.
F_X(t) = 1 - exp(-(t/λ)^β)  ... similar to the exponential distribution
Meaning of the parameter β:
  β < 1  failure rate decreases over time, "infant mortality"
  β = 1  failure rate is constant over time, exponential distribution
  β > 1  failure rate increases over time, aging process
R> rweibull(n, beta, lambda)
Weibull distribution - influence of parameter β
failure intensity λ(t) = f(t)/(1 - F(t)):
Weibull distribution - influence of parameter β probability density function:
Normal distribution N(µ, σ^2)
Sum of a large number of independent RVs. Natural events; NOT social events.
density:
  f(x) = 1/√(2πσ^2) exp(-(x - µ)^2/(2σ^2))
CDF:
  F(x) = Φ((x - µ)/σ)  ... an integral of the density; no closed formula
  (Φ denotes the standard normal CDF, closely related to the error function)
EX = µ
DX = σ^2
Normal distribution - density
Standard normal distribution N(0, 1)
Standardization of a normal random variable X ~ N(µ, σ^2):
  Y = (X - µ)/σ ~ N(0, 1)
and vice versa.
Log-normal distribution LN(µ, σ^2)
X is log-normal if ln X has a normal distribution, so X = exp(µ + σZ) where Z ~ N(0, 1).
density:
  f(x) = 1/(x√(2πσ^2)) exp(-(ln x - µ)^2/(2σ^2)),  x > 0
Presence of normality
Arises when many random factors add up (central limit theorem):
  velocities of molecules
  measurement errors
  biological quantities (often log-normal; after separating male/female)
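The central limit theorem behind these examples can be illustrated by summing uniform variables; a stdlib-Python sketch (seed, n, and sample count are arbitrary choices):

```python
import random
import statistics

# Sum of n independent U(0,1) variables is approximately normal (CLT):
# mean n/2, variance n/12.
rng = random.Random(0)
n, reps = 48, 20_000
sums = [sum(rng.random() for _ in range(n)) for _ in range(reps)]

assert abs(statistics.mean(sums) - n / 2) < 0.1            # EX = n/2 = 24
assert abs(statistics.stdev(sums) - (n / 12) ** 0.5) < 0.1 # sd = sqrt(n/12) = 2
```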