Formulas for probability theory and linear models (SF2941)

These pages (+ Appendix 2 of Gut) are permitted as assistance at the exam.

11 May 2008

Contents: Selected formulae of probability; Bivariate probability; Transforms; Multivariate normal distribution; Convergence; Poisson process; Series Expansions and Integrals.

1 Probability

1.1 Two inequalities

\[ A \subseteq B \Rightarrow P(A) \le P(B). \]
\[ P(A \cap B) \ge P(A) + P(B) - 1. \]

1.2 Change of variable in a probability density

If $X$ has the probability density $f_X(x)$, $Y = AX + b$, and $A$ is invertible, then $Y$ has the probability density
\[ f_Y(y) = \frac{1}{|\det A|} f_X\!\left(A^{-1}(y - b)\right). \]
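As an illustrative sketch of the change-of-variable formula in 1.2 (assuming NumPy and SciPy are available; the matrix $A$, the vector $b$, and the choice of standard normal components for $X$ are arbitrary example choices, not part of the formula sheet):

    import numpy as np
    from scipy.stats import norm

    A = np.array([[2.0, 0.5], [0.0, 1.0]])   # an invertible matrix (example choice)
    b = np.array([1.0, -1.0])
    Ainv = np.linalg.inv(A)
    det_abs = abs(np.linalg.det(A))

    def f_Y(y):
        """Density of Y = A X + b when X has i.i.d. N(0,1) components."""
        x = Ainv @ (np.asarray(y) - b)
        return norm.pdf(x).prod() / det_abs   # f_X(A^{-1}(y-b)) / |det A|

    print(f_Y([1.0, -1.0]))   # density of Y at the point A*0 + b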
2 Continuous bivariate distributions

2.1 Bivariate densities

2.1.1 Definitions

The bivariate vector $(X, Y)^T$ has a continuous joint distribution with density $f_{X,Y}(x, y)$ if
\[ P(X \le x, Y \le y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{X,Y}(u, v)\, du\, dv, \]
where
\[ f_{X,Y}(x, y) \ge 0, \qquad \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dx\, dy = 1. \]

Marginal distributions:
\[ f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dy, \qquad f_Y(y) = \int_{-\infty}^{+\infty} f_{X,Y}(x, y)\, dx. \]

Distribution function:
\[ F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(u)\, du, \qquad P(a < X \le b) = F_X(b) - F_X(a). \]

Conditional densities, for $X \mid Y = y$ if $f_Y(y) > 0$ and for $Y \mid X = x$ if $f_X(x) > 0$:
\[ f_{X \mid Y=y}(x) := \frac{f_{X,Y}(x, y)}{f_Y(y)}, \qquad f_{Y \mid X=x}(y) := \frac{f_{X,Y}(x, y)}{f_X(x)}. \]
Bayes' formula:
\[ f_{X \mid Y=y}(x) = \frac{f_{Y \mid X=x}(y)\, f_X(x)}{f_Y(y)}, \qquad f_Y(y) = \int_{-\infty}^{+\infty} f_{Y \mid X=x}(y)\, f_X(x)\, dx. \]
Here $f_{X \mid Y=y}(x)$ is the a posteriori density for $X$ and $f_X(x)$ is the a priori density for $X$.

2.1.2 Independence

$X$ and $Y$ are independent iff
\[ f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for every } (x, y). \]

2.1.3 Conditional density of X given an event B

\[ f_{X \mid B}(x) = \begin{cases} \dfrac{f_X(x)}{P(B)} & x \in B \\[1ex] 0 & \text{elsewhere.} \end{cases} \]

2.1.4 Normal distribution

If $X$ has the density $f_X(x; \mu, \sigma)$ defined by
\[ f_X(x; \mu, \sigma) := \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \]
then $X \in N(\mu, \sigma^2)$. $X \in N(0, 1)$ is standard normal with density $\phi(x) = f_X(x; 0, 1)$. The cumulative distribution function of $X \in N(0, 1)$ is, for $x > 0$,
\[ \Phi(x) = \int_{-\infty}^{x} \phi(t)\, dt = \frac{1}{2} + \int_{0}^{x} \phi(t)\, dt, \]
and $\Phi(-x) = 1 - \Phi(x)$.
2.1.5 Numerical computation of Φ(x)

Approximate values of the cumulative distribution function of $X \in N(0, 1)$, $\Phi(x)$, can be calculated for $x > 0$ by $\Phi(x) = 1 - Q(x)$,
\[ Q(x) = \int_{x}^{\infty} \phi(t)\, dt, \]
where we use the following approximation¹:
\[ Q(x) \approx \frac{1}{\left(1 - \frac{1}{\pi}\right) x + \frac{1}{\pi}\sqrt{x^2 + 2\pi}} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]

2.2 Mean and variance

The expectations (or means) $E(X)$, $E(Y)$ are defined (if they exist) by
\[ E(X) = \int_{-\infty}^{+\infty} x f_X(x)\, dx, \qquad E(Y) = \int_{-\infty}^{+\infty} y f_Y(y)\, dy, \]
respectively. Variances $\mathrm{Var}(X)$, $\mathrm{Var}(Y)$ are defined as
\[ \mathrm{Var}(X) = \int_{-\infty}^{+\infty} (x - E(X))^2 f_X(x)\, dx, \qquad \mathrm{Var}(Y) = \int_{-\infty}^{+\infty} (y - E(Y))^2 f_Y(y)\, dy, \]
respectively. We have
\[ \mathrm{Var}(X) = E(X^2) - (E(X))^2. \]
A function $g(X)$ of a random variable has expectation
\[ E(g(X)) = \int_{-\infty}^{+\infty} g(x) f_X(x)\, dx. \]

¹ P.O. Börjesson and C.E.W. Sundberg: Simple Approximations of the Error Function Q(x) for Communication Applications. IEEE Transactions on Communications, March 1979, pp. 639–643.
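A minimal numerical sketch of the $Q(x)$ approximation from Section 2.1.5, assuming SciPy is available for the exact value:

    import numpy as np
    from scipy.stats import norm

    def Q_approx(x):
        """Borjesson-Sundberg approximation of Q(x) = 1 - Phi(x), x > 0."""
        a = (1 - 1/np.pi) * x + (1/np.pi) * np.sqrt(x**2 + 2*np.pi)
        return np.exp(-x**2 / 2) / (a * np.sqrt(2 * np.pi))

    for x in [0.5, 1.0, 2.0, 4.0]:
        exact = norm.sf(x)              # exact Q(x) = 1 - Phi(x)
        print(x, Q_approx(x), exact)

The approximation stays within a few percent of the exact tail probability over this range, which is what makes it useful for hand computation.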
2.3 Chebyshev's inequality

\[ P(|X - E(X)| > \varepsilon) \le \frac{\mathrm{Var}(X)}{\varepsilon^2}. \]

2.4 Conditional expectations

The conditional expectation of $X$ given $Y = y$ is
\[ E(X \mid Y = y) := \int_{-\infty}^{+\infty} x f_{X \mid Y=y}(x)\, dx. \]
Seen as a function of $y$, $y \mapsto E(X \mid Y = y)$ defines $E(X \mid Y)$, a function of the random variable $Y$. We have
\[ E(X) = E(E(X \mid Y)), \]
\[ \mathrm{Var}(X) = \mathrm{Var}(E(X \mid Y)) + E(\mathrm{Var}(X \mid Y)), \]
\[ E\left[(Y - g(X))^2\right] = E\left[\mathrm{Var}[Y \mid X]\right] + E\left[(E[Y \mid X] - g(X))^2\right]. \]

2.5 Covariance

\[ \mathrm{Cov}(X, Y) := E(XY) - E(X)\,E(Y) = E\left([X - E(X)][Y - E(Y)]\right) \]
\[ = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (x - E(X))(y - E(Y))\, f_{X,Y}(x, y)\, dx\, dy. \]
We have
\[ \mathrm{Var}\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i^2\, \mathrm{Var}(X_i) + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_i a_j\, \mathrm{Cov}(X_i, X_j), \]
\[ \mathrm{Cov}\left(\sum_{i=1}^{n} a_i X_i, \sum_{j=1}^{m} b_j Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_i b_j\, \mathrm{Cov}(X_i, Y_j). \]

2.6 Coefficient of correlation

The coefficient of correlation between $X$ and $Y$ is defined as
\[ \rho := \rho_{X,Y} := \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}. \]
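A minimal simulation sketch of the decomposition $\mathrm{Var}(X) = \mathrm{Var}(E(X \mid Y)) + E(\mathrm{Var}(X \mid Y))$ from Section 2.4, assuming NumPy; the hierarchical model $Y \in \mathrm{Exp}(1)$, $X \mid Y = y \in N(y, 1)$ is chosen here only as an example:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10**6
    y = rng.exponential(scale=1.0, size=n)   # Y with mean 1, so Var(Y) = 1
    x = rng.normal(loc=y, scale=1.0)         # X | Y=y ~ N(y, 1)

    # Here E(X|Y) = Y and Var(X|Y) = 1, so the formula predicts
    # Var(X) = Var(Y) + 1 = 2.
    print(x.var(), y.var() + 1.0)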
3 Best linear prediction

The $\alpha$ and $\beta$ that minimize $E\left[(Y - (\alpha + \beta X))^2\right]$ are given by
\[ \alpha = \mu_Y - \frac{\sigma_{XY}}{\sigma_X^2}\, \mu_X = \mu_Y - \rho\, \frac{\sigma_Y}{\sigma_X}\, \mu_X, \qquad \beta = \frac{\sigma_{XY}}{\sigma_X^2} = \rho\, \frac{\sigma_Y}{\sigma_X}, \]
where $\mu_Y = E[Y]$, $\mu_X = E[X]$, $\sigma_Y^2 = \mathrm{Var}[Y]$, $\sigma_X^2 = \mathrm{Var}[X]$, $\sigma_{XY} = \mathrm{Cov}(X, Y)$, and $\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$.

4 Covariance matrix

4.1 Definition

The covariance matrix is
\[ C_X := E\left[(X - \mu_X)(X - \mu_X)^T\right], \]
where the element $(i, j)$,
\[ C_X(i, j) = E[(X_i - \mu_i)(X_j - \mu_j)], \]
is the covariance between $X_i$ and $X_j$. The covariance matrix is nonnegative definite, i.e., for all $x$ we have
\[ x^T C_X x \ge 0. \]
Hence $\det C_X \ge 0$. The covariance matrix is symmetric: $C_X = C_X^T$.
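A minimal sketch, assuming NumPy, that estimates a covariance matrix from samples and checks the symmetry and nonnegative definiteness stated in Section 4.1 (the mixing matrix L below is an arbitrary example):

    import numpy as np

    rng = np.random.default_rng(2)
    Z = rng.standard_normal((3, 100000))
    L = np.array([[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.2, 0.3, 1.0]])          # example mixing matrix
    X = L @ Z                                # rows are the components of X

    C = np.cov(X)                            # sample covariance matrix
    print(np.allclose(C, C.T))               # symmetry: True
    print(np.linalg.eigvalsh(C) >= -1e-12)   # all eigenvalues >= 0 (up to rounding)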
4.2 2 × 2 Covariance Matrix

The covariance matrix of a bivariate random variable $X = (X_1, X_2)^T$ is
\[ C_X = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}, \]
where $\rho$ is the coefficient of correlation of $X_1$ and $X_2$, and $\sigma_1^2 = \mathrm{Var}(X_1)$, $\sigma_2^2 = \mathrm{Var}(X_2)$. $C_X$ is invertible iff $\rho^2 \ne 1$. Then the inverse is
\[ C_X^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho \sigma_1 \sigma_2 \\ -\rho \sigma_1 \sigma_2 & \sigma_1^2 \end{pmatrix}. \]

5 Discrete Random Variables

$X$ is a (discrete) random variable that assumes values in $\mathcal{X}$ and $Y$ is a (discrete) random variable that assumes values in $\mathcal{Y}$.

Remark 5.1 These are measurable maps $X(\omega)$, $\omega \in \Omega$, from a basic probability space $(\Omega, \mathcal{F}, P)$ (= outcomes, a sigma field of subsets of $\Omega$, and a probability measure $P$ on $\mathcal{F}$) to $\mathcal{X}$.

$\mathcal{X}$ and $\mathcal{Y}$ are two discrete state spaces, whose generic elements are called values (or instantiations) and denoted by $x_i$ and $y_j$, respectively:
\[ \mathcal{X} = \{x_1, \ldots, x_L\}, \qquad \mathcal{Y} = \{y_1, \ldots, y_J\}. \]
$|\mathcal{X}|$ (:= the number of elements in $\mathcal{X}$) $= L$, $|\mathcal{Y}| = J$. Unless otherwise stated the alphabets considered here are finite.

5.1 Joint Probability Distributions

A two-dimensional joint (simultaneous) probability distribution is a probability defined on $\mathcal{X} \times \mathcal{Y}$:
\[ p(x_i, y_j) := P(X = x_i, Y = y_j). \tag{5.1} \]
Hence $0 \le p(x_i, y_j)$ and $\sum_{i=1}^{L} \sum_{j=1}^{J} p(x_i, y_j) = 1$.

Marginal distribution for $X$:
\[ p(x_i) = \sum_{j=1}^{J} p(x_i, y_j). \tag{5.2} \]
Marginal distribution for $Y$:
\[ p(y_j) = \sum_{i=1}^{L} p(x_i, y_j). \tag{5.3} \]
These notions can be extended to define the joint (simultaneous) probability distribution and the marginal distributions of $n$ random variables.

5.2 Conditional Probability Distributions

The conditional probability for $X = x_i$ given $Y = y_j$ is
\[ p(x_i \mid y_j) := \frac{p(x_i, y_j)}{p(y_j)}. \tag{5.4} \]
The conditional probability for $Y = y_j$ given $X = x_i$ is
\[ p(y_j \mid x_i) := \frac{p(x_i, y_j)}{p(x_i)}. \tag{5.5} \]
Here we assume $p(y_j) > 0$ and $p(x_i) > 0$. If for example $p(x_i) = 0$, we can make the definition of $p(y_j \mid x_i)$ arbitrarily through $p(x_i)\, p(y_j \mid x_i) = p(x_i, y_j)$. In other words,
\[ p(y_j \mid x_i) = \frac{\text{prob. of the event } \{X = x_i, Y = y_j\}}{\text{prob. of the event } \{X = x_i\}}. \]
Hence
\[ \sum_{i=1}^{L} p(x_i \mid y_j) = 1. \]
Next,
\[ P_X(A) := \sum_{x_i \in A} p(x_i) \tag{5.6} \]
is the probability of the event that $X$ assumes a value in $A$, a subset of $\mathcal{X}$. From (5.6) one easily finds the complement rule
\[ P_X(A^c) = 1 - P_X(A), \tag{5.7} \]
where $A^c$ is the complement of $A$, i.e., those outcomes which do not lie in $A$. Also
\[ P_X(A \cup B) = P_X(A) + P_X(B) - P_X(A \cap B) \tag{5.8} \]
is immediate.
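A minimal sketch, assuming NumPy, computing marginals and conditionals of a small joint table as in (5.2)-(5.5); the table entries are an arbitrary example:

    import numpy as np

    # joint table p[i, j] = P(X = x_i, Y = y_j); entries are an example
    p = np.array([[0.10, 0.20],
                  [0.30, 0.15],
                  [0.05, 0.20]])
    assert np.isclose(p.sum(), 1.0)

    p_X = p.sum(axis=1)              # (5.2): marginal of X
    p_Y = p.sum(axis=0)              # (5.3): marginal of Y
    print(p_X, p_Y)

    p_X_given_Y = p / p_Y            # (5.4): column j is p(. | y_j)
    print(p_X_given_Y.sum(axis=0))   # each column sums to 1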
5.3 Conditional Probability Given an Event

The conditional probability for $X = x_i$ given $X \in A$ is denoted by $P_X(x_i \mid A)$ and given by
\[ P_X(x_i \mid A) = \begin{cases} \dfrac{P_X(x_i)}{P_X(A)} & \text{if } x_i \in A \\[1ex] 0 & \text{otherwise.} \end{cases} \tag{5.9} \]

5.4 Independence

$X$ and $Y$ are independent random variables if and only if
\[ p(x_i, y_j) = p(x_i)\, p(y_j) \tag{5.10} \]
for all pairs $(x_i, y_j)$ in $\mathcal{X} \times \mathcal{Y}$. In other words, all events $\{X = x_i\}$ and $\{Y = y_j\}$ are to be independent.

We say that $X_1, X_2, \ldots, X_n$ are independent random variables if and only if the joint distribution
\[ p_{X_1, X_2, \ldots, X_n}(x_{i_1}, x_{i_2}, \ldots, x_{i_n}) = P(X_1 = x_{i_1}, X_2 = x_{i_2}, \ldots, X_n = x_{i_n}) \tag{5.11} \]
equals
\[ p_{X_1, X_2, \ldots, X_n}(x_{i_1}, x_{i_2}, \ldots, x_{i_n}) = p_{X_1}(x_{i_1})\, p_{X_2}(x_{i_2}) \cdots p_{X_n}(x_{i_n}) \tag{5.12} \]
for every $(x_{i_1}, x_{i_2}, \ldots, x_{i_n}) \in \mathcal{X}^n$. We are here assuming for simplicity that $X_1, X_2, \ldots, X_n$ take values in the same alphabet.

5.5 A Chain Rule

Let $Z$ be a (discrete) random variable that assumes values in $\mathcal{Z} = \{z_k\}_{k=1}^{K}$. If $p(z_k) > 0$,
\[ p(x_i, y_j \mid z_k) = \frac{p(x_i, y_j, z_k)}{p(z_k)}. \]
Then we obtain as an identity
\[ p(x_i, y_j \mid z_k) = \frac{p(x_i, y_j, z_k)}{p(y_j, z_k)} \cdot \frac{p(y_j, z_k)}{p(z_k)}, \]
and again by the definition of conditional probability the right-hand side is equal to $p(x_i \mid y_j, z_k)\, p(y_j \mid z_k)$. In other words,
\[ p_{X,Y \mid Z}(x_i, y_j \mid z_k) = p(x_i \mid y_j, z_k)\, p(y_j \mid z_k). \tag{5.13} \]
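A minimal sketch, assuming NumPy, checking the chain rule (5.13) on a randomly generated joint table $p(x_i, y_j, z_k)$:

    import numpy as np

    rng = np.random.default_rng(6)
    p = rng.random((2, 3, 4))
    p /= p.sum()                          # joint p(x_i, y_j, z_k)

    p_z = p.sum(axis=(0, 1))              # p(z_k)
    p_yz = p.sum(axis=0)                  # p(y_j, z_k)
    lhs = p / p_z                         # p(x_i, y_j | z_k)
    rhs = (p / p_yz) * (p_yz / p_z)       # p(x_i | y_j, z_k) * p(y_j | z_k)
    print(np.allclose(lhs, rhs))          # True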
5.6 Conditional Independence

The random variables $X$ and $Y$ are called conditionally independent given $Z$ if
\[ p(x_i, y_j \mid z_k) = p(x_i \mid z_k)\, p(y_j \mid z_k) \tag{5.14} \]
for all triples $(z_k, x_i, y_j) \in \mathcal{Z} \times \mathcal{X} \times \mathcal{Y}$ (cf. (5.13)).

6 Miscellaneous

6.1 A Marginalization Formula

Let $Y$ be discrete and $X$ be continuous, and let their joint distribution be
\[ P(Y = k, X \le x) = \int_{-\infty}^{x} P(Y = k \mid X = u)\, f_X(u)\, du. \]
Then
\[ P(Y = k) = \int_{-\infty}^{+\infty} P(Y = k \mid X = x)\, f_X(x)\, dx. \]

6.2 Factorial Moments

For $X$ an integer-valued discrete random variable,
\[ \mu_{[r]} \stackrel{\text{def}}{=} E[X(X-1)\cdots(X-r+1)] = \sum_{x \,:\, \text{integer}} x(x-1)\cdots(x-r+1)\, f_X(x) \]
is called the $r$th factorial moment.

6.3 Binomial Moments

For $X$ an integer-valued discrete random variable,
\[ E\binom{X}{r} = E[X(X-1)\cdots(X-r+1)]/r! \]
is called the binomial moment.
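A minimal sketch, assuming NumPy, that estimates the $r$th factorial moment of Section 6.2 by simulation; for $X \in \mathrm{Po}(\lambda)$ the factorial moments are $\mu_{[r]} = \lambda^r$ (a standard fact, used here only as the reference value):

    import numpy as np

    rng = np.random.default_rng(3)
    lam, r = 2.0, 3
    x = rng.poisson(lam, size=10**6).astype(float)

    falling = x * (x - 1) * (x - 2)    # X(X-1)(X-2), the r = 3 falling factorial
    print(falling.mean(), lam**r)      # both should be close to 8.0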
7 Transforms

7.1 Probability Generating Function

7.1.1 Definition

Let $X$ have values $k = 0, 1, 2, \ldots$. Then
\[ g_X(t) = E\left(t^X\right) = \sum_{k=0}^{\infty} t^k f_X(k) \]
is called the probability generating function.

7.1.2 Prob. Gen. Fnct: Properties

\[ \frac{d}{dt} g_X(1) = \sum_{k=1}^{\infty} k t^{k-1} f_X(k) \Big|_{t=1} = E[X], \]
\[ \frac{1}{n!} \frac{d^n}{dt^n} g_X(0) = P(X = n), \]
\[ \mu_{[r]} = E[X(X-1)\cdots(X-r+1)] = \frac{d^r}{dt^r} g_X(1). \]

7.1.3 Prob. Gen. Fnct: Properties

If $Z = X + Y$, with $X$ and $Y$ non-negative integer valued and independent, then
\[ g_Z(t) = E\left(t^Z\right) = E\left(t^{X+Y}\right) = E\left(t^X\right) E\left(t^Y\right) = g_X(t)\, g_Y(t). \]
7.1.4 Prob. Gen. Fnct: Examples

\[ X \in \mathrm{Be}(p): \quad g_X(t) = 1 - p + pt. \]
\[ Y \in \mathrm{Bin}(n, p): \quad g_Y(t) = (1 - p + pt)^n. \]
\[ Z \in \mathrm{Po}(\lambda): \quad g_Z(t) = e^{\lambda(t - 1)}. \]

7.2 Moment Generating Functions

7.2.1 Definition

The moment generating function is, for some $h > 0$,
\[ \psi_X(t) \stackrel{\text{def}}{=} E\left[e^{tX}\right], \quad |t| < h, \]
\[ \psi_X(t) = \begin{cases} \sum_{x_i} e^{t x_i} f_X(x_i) & X \text{ discrete} \\[1ex] \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx & X \text{ continuous.} \end{cases} \]

7.2.2 Moment Gen. Fnctn: Properties

\[ \frac{d}{dt} \psi_X(0) = E[X], \qquad \psi_X(0) = 1, \qquad \frac{d^k}{dt^k} \psi_X(0) = E\left[X^k\right]. \]
Let $S_n = X_1 + X_2 + \ldots + X_n$, with the $X_i$ independent. Then
\[ \psi_{S_n}(t) = E\left(e^{t S_n}\right) = E\left(e^{t(X_1 + X_2 + \ldots + X_n)}\right) = E\left(e^{t X_1} e^{t X_2} \cdots e^{t X_n}\right) \]
\[ = E\left(e^{t X_1}\right) E\left(e^{t X_2}\right) \cdots E\left(e^{t X_n}\right) = \psi_{X_1}(t)\, \psi_{X_2}(t) \cdots \psi_{X_n}(t). \]
If the $X_i$ are i.i.d.,
\[ \psi_{S_n}(t) = (\psi_X(t))^n. \]
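A minimal symbolic sketch, assuming SymPy is available, checking that the product rule in 7.1.3 recovers the $\mathrm{Bin}(n, p)$ generating function from $n$ Bernoulli factors, and that $g'(1) = E[X] = np$:

    import sympy as sp

    t, p = sp.symbols('t p')
    n = 5
    g_Be = 1 - p + p*t                    # PGF of Be(p), as in 7.1.4
    g_sum = g_Be**n                       # product over n independent copies
    g_Bin = (1 - p + p*t)**n              # PGF of Bin(n, p), as in 7.1.4
    print(sp.simplify(g_sum - g_Bin))     # 0

    print(sp.diff(g_sum, t).subs(t, 1))   # g'(1) = 5*p = n*p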
7.2.3 Moment Gen. Fnctn: Examples

\[ X \in N(\mu, \sigma^2): \quad \psi_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}. \]
\[ Y \in \mathrm{Exp}(a): \quad \psi_Y(t) = \frac{1}{1 - at}, \quad at < 1. \]

7.2.4 Characteristic function

The characteristic function
\[ \varphi_X(t) = E\left[e^{itX}\right] \]
exists for all $t$.

7.3 Moment generating function, characteristic function of a vector random variable

The moment generating function of $X$ ($n \times 1$ vector) is defined as
\[ \psi_X(t) \stackrel{\text{def}}{=} E\, e^{t^T X} = E\, e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n}. \]
The characteristic function of $X$ is defined as
\[ \varphi_X(t) \stackrel{\text{def}}{=} E\, e^{i t^T X} = E\, e^{i(t_1 X_1 + t_2 X_2 + \cdots + t_n X_n)}. \]

8 Multivariate normal distribution

An $n \times 1$ random vector $X$ has a normal distribution iff for every $n \times 1$ vector $a$ the one-dimensional random variable $a^T X$ has a normal distribution. When the vector $X = (X_1, X_2, \ldots, X_n)^T$ has a multivariate normal distribution we write
\[ X \in N(\mu, C). \tag{8.1} \]
The moment generating function is
\[ \psi_X(s) = e^{s^T \mu + \frac{1}{2} s^T C s}. \tag{8.2} \]
The characteristic function of $X \in N(\mu, C)$ is
\[ \varphi_X(t) = E\, e^{i t^T X} = e^{i t^T \mu - \frac{1}{2} t^T C t}. \tag{8.3} \]
Let $X \in N(\mu, C)$ and
\[ Y = a + BX \]
for an arbitrary $m \times n$ matrix $B$ and an arbitrary $m \times 1$ vector $a$. Then
\[ Y \in N(a + B\mu,\; BCB^T). \tag{8.4} \]

8.1 χ²-distribution for quadratic form of normal r.v.

Let $X \in N(\mu, C)$ with $C$ invertible. Then
\[ (X - \mu)^T C^{-1} (X - \mu) \in \chi^2(n), \]
where $n$ is the dimension of $X$.

8.2 Four product rule

If $(X_1, X_2, X_3, X_4)^T \in N(0, C)$, then
\[ E[X_1 X_2 X_3 X_4] = E[X_1 X_2]\, E[X_3 X_4] + E[X_1 X_3]\, E[X_2 X_4] + E[X_1 X_4]\, E[X_2 X_3]. \]

8.3 Conditional distributions for bivariate normal random variables

Let $(X, Y)^T \in N(\mu, C)$. The conditional distribution for $Y$ given $X = x$ is normal,
\[ N\left(\mu_Y + \rho\, \frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; \sigma_Y^2 (1 - \rho^2)\right), \]
where $\mu_Y = E(Y)$, $\mu_X = E(X)$, $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$, $\sigma_X = \sqrt{\mathrm{Var}(X)}$ and
\[ \rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}. \]
Let $Z_1$ and $Z_2$ be independent $N(0, 1)$, and set
\[ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad B = \begin{pmatrix} \sigma_1 & 0 \\ \rho \sigma_2 & \sigma_2 \sqrt{1 - \rho^2} \end{pmatrix}. \]
If
\[ \begin{pmatrix} X \\ Y \end{pmatrix} = \mu + B \begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix}, \]
then
\[ \begin{pmatrix} X \\ Y \end{pmatrix} \in N(\mu, C) \]
with
\[ C = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}. \]

9 Poisson process

$X(t)$ = the number of occurrences of some event in $(0, t]$.

Definition 9.1 $\{X(t) \mid t \ge 0\}$ is a Poisson process with parameter $\lambda > 0$ if

(1) $X(0) = 0$.

(2) The increments $X(t_k) - X(t_{k-1})$ are independent stochastic variables, $1 \le k \le n$, for $0 \le t_0 \le t_1 \le t_2 \le \ldots \le t_{n-1} \le t_n$ and all $n$.

(3) $X(t) - X(s) \in \mathrm{Po}(\lambda(t - s))$, $0 \le s < t$.

$T_k$ = the time of occurrence of the $k$th event, $T_0 = 0$. We have
\[ \{T_k \le t\} = \{X(t) \ge k\}. \]
$\tau_k = T_k - T_{k-1}$ is the $k$th interoccurrence time. $\tau_1, \tau_2, \ldots, \tau_k, \ldots$ are independent and identically distributed, $\tau_i \in \mathrm{Exp}\left(\frac{1}{\lambda}\right)$.
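A minimal simulation sketch of Definition 9.1, assuming NumPy: the process is built from i.i.d. exponential interoccurrence times with mean $1/\lambda$, and $X(t) = \#\{k : T_k \le t\}$ is checked against $E[X(t)] = \lambda t$. The parameter values are example choices:

    import numpy as np

    rng = np.random.default_rng(4)
    lam, t, n_paths = 3.0, 2.0, 10**5

    # tau_i ~ Exp with mean 1/lambda; T_k = tau_1 + ... + tau_k
    tau = rng.exponential(scale=1.0/lam, size=(n_paths, 50))
    T = tau.cumsum(axis=1)

    X_t = (T <= t).sum(axis=1)        # X(t) = number of T_k in (0, t]
    print(X_t.mean(), lam * t)        # both close to 6.0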
10 Convergence

10.1 Definitions

We say that
\[ X_n \xrightarrow{P} X, \quad \text{as } n \to \infty, \]
if for all $\varepsilon > 0$
\[ P(|X_n - X| > \varepsilon) \to 0, \quad \text{as } n \to \infty. \]
We say that
\[ X_n \xrightarrow{q} X \]
if
\[ E|X_n - X|^2 \to 0, \quad \text{as } n \to \infty. \]
We say that
\[ X_n \xrightarrow{d} X, \quad \text{as } n \to \infty, \]
if
\[ F_{X_n}(x) \to F_X(x), \quad \text{as } n \to \infty, \]
for all $x$ where $F_X(x)$ is continuous.

10.2 Relations between convergences

\[ X_n \xrightarrow{q} X \Rightarrow X_n \xrightarrow{P} X, \]
\[ X_n \xrightarrow{P} X \Rightarrow X_n \xrightarrow{d} X, \]
as $n \to \infty$. If $c$ is a constant,
\[ X_n \xrightarrow{P} c \Leftrightarrow X_n \xrightarrow{d} c \]
as $n \to \infty$. If $\varphi_{X_n}(t)$ are the characteristic functions of $X_n$, then
\[ X_n \xrightarrow{d} X \Rightarrow \varphi_{X_n}(t) \to \varphi_X(t). \]
If $\varphi_X(t)$ is a characteristic function continuous at $t = 0$, then
\[ \varphi_{X_n}(t) \to \varphi_X(t) \Rightarrow X_n \xrightarrow{d} X. \]
10.3 Law of Large Numbers

Let $X_1, X_2, \ldots$ be independent, identically distributed (i.i.d.) random variables with finite expectation $\mu$. We set
\[ S_n = X_1 + X_2 + \ldots + X_n, \quad n \ge 1. \]
Then
\[ \frac{S_n}{n} \xrightarrow{P} \mu, \quad \text{as } n \to \infty. \]

10.4 Central Limit Theorem

Let $X_1, X_2, \ldots$ be independent, identically distributed (i.i.d.) random variables with finite expectation $\mu$ and finite variance $\sigma^2$. We set
\[ S_n = X_1 + X_2 + \ldots + X_n, \quad n \ge 1. \]
Then
\[ \frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0, 1), \quad \text{as } n \to \infty. \]
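A minimal Monte Carlo sketch of Sections 10.3-10.4, assuming NumPy, using $\mathrm{Exp}(1)$ summands (so $\mu = \sigma = 1$) as an example:

    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 1000, 5000
    X = rng.exponential(scale=1.0, size=(reps, n))   # mu = sigma = 1
    S = X.sum(axis=1)

    print((S / n).mean())            # LLN: close to mu = 1
    Z = (S - n) / np.sqrt(n)         # CLT: approximately N(0, 1)
    print(Z.mean(), Z.std())         # close to 0 and 1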
11 Series Expansions and Integrals

11.1 Exponential Function

\[ e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}, \quad -\infty < x < \infty. \]
If $c_n \to c$, then
\[ \left(1 + \frac{c_n}{n}\right)^n \to e^c. \]

11.2 Geometric Series

\[ \frac{1}{1 - x} = \sum_{k=0}^{\infty} x^k, \quad |x| < 1. \]
\[ \frac{1}{(1 - x)^2} = \sum_{k=0}^{\infty} k x^{k-1}, \quad |x| < 1. \]
\[ \sum_{k=0}^{n} x^k = \frac{1 - x^{n+1}}{1 - x}, \quad x \ne 1. \]

11.3 Logarithm function

\[ \ln(1 - x) = -\sum_{k=1}^{\infty} \frac{x^k}{k}, \quad -1 \le x < 1. \]

11.4 Euler Gamma Function

\[ \Gamma(t) = \int_{0}^{\infty} x^{t-1} e^{-x}\, dx, \quad t > 0. \]
\[ \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}. \]
\[ \Gamma(n) = (n - 1)! \quad (n \text{ a positive integer}). \]
\[ \int_{0}^{\infty} x^t e^{-\lambda x}\, dx = \frac{\Gamma(t + 1)}{\lambda^{t+1}}, \quad \lambda > 0, \; t > -1. \]

11.5 A formula (with a probabilistic proof)

\[ \int_{t}^{\infty} \frac{1}{\Gamma(k)}\, \lambda^k x^{k-1} e^{-\lambda x}\, dx = \sum_{j=0}^{k-1} e^{-\lambda t} \frac{(\lambda t)^j}{j!}. \]
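A minimal numerical sketch of the identity in 11.5, assuming SciPy: the left side is the survival function of a gamma variable with shape $k$ and rate $\lambda$ (this is the waiting time $T_k$ of the Poisson process in Section 9), the right side a Poisson tail sum. The parameter values are example choices:

    import numpy as np
    from scipy.stats import gamma, poisson

    lam, k, t = 2.0, 4, 1.5
    lhs = gamma.sf(t, a=k, scale=1.0/lam)   # integral from t to inf of the Gamma(k, lambda) density
    rhs = poisson.cdf(k - 1, lam * t)       # sum_{j=0}^{k-1} e^{-lam t} (lam t)^j / j!
    print(lhs, rhs)                         # equal up to rounding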