ESSENTIALS OF PROBABILITY THEORY

Michele TARAGNA
Dipartimento di Elettronica e Telecomunicazioni, Politecnico di Torino
michele.taragna@polito.it

Master Course in Mechatronic Engineering
Master Course in Computer Engineering

01RKYQW / 01RKYOV Estimation, Filtering and System Identification
Academic Year 2017/2018
Random experiment and random source of data

S : outcome space, i.e., the set of possible outcomes s of the random experiment;
F : space of events (or results) of interest, i.e., the set of the combinations of interest into which the outcomes in S can be clustered;
P(·) : probability function defined on F that associates to any event in F a real number between 0 and 1.

E = (S, F, P(·)) : random experiment

Example: roll a die with six sides to see if an odd or an even side appears:
S = {1, 2, 3, 4, 5, 6} is the set of the six sides of the die;
F = {A, B, S, ∅}, where A = {2, 4, 6} and B = {1, 3, 5} are the events of interest, i.e., the even and odd number sets;
P(A) = P(B) = 1/2 (if the die is fair), P(S) = 1, P(∅) = 0.
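The fair-die example can be checked numerically. A minimal Monte Carlo sketch (Python is assumed here only for illustration; A and B are the event sets defined above):

```python
import random

random.seed(0)

# Monte Carlo sketch of the random experiment E = (S, F, P(.)):
# roll a fair six-sided die many times and estimate P(A) and P(B),
# where A = {2, 4, 6} (even sides) and B = {1, 3, 5} (odd sides).
S = [1, 2, 3, 4, 5, 6]
A = {2, 4, 6}
B = {1, 3, 5}

N = 100_000
rolls = [random.choice(S) for _ in range(N)]
p_A = sum(s in A for s in rolls) / N
p_B = sum(s in B for s in rolls) / N

print(p_A, p_B)  # both close to 1/2 for a fair die
```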
A random variable of the experiment E is a variable v whose values depend on the outcome s of E through a suitable function ϕ(·) : S → V, where V is the set of possible values of v:

v = ϕ(s)

Example: the random variable depending on the outcome of the roll of a die with six sides can be defined as

v = ϕ(s) = { +1 if s ∈ A = {2, 4, 6}
           { −1 if s ∈ B = {1, 3, 5}

A random source of data produces data that, besides depending on the process under investigation characterized by the unknown true value θᵒ of the variable to be estimated, are also functions of a random variable; in particular, at the time instant t, the datum d(t) depends on the random variable v(t).
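A quick numerical sketch of the random variable v = ϕ(s) above: the map sends even sides to +1 and odd sides to −1, so for a fair die the two values are equally likely and the sample mean of v should be near 0.

```python
import random

random.seed(1)

# The random variable v = phi(s): +1 if s is even (s in A), -1 if s is odd (s in B).
def phi(s: int) -> int:
    return +1 if s % 2 == 0 else -1

rolls = [random.randint(1, 6) for _ in range(100_000)]
values = [phi(s) for s in rolls]

# For a fair die, +1 and -1 each occur with probability 1/2,
# so the sample mean of v should be close to 0.
mean_v = sum(values) / len(values)
print(mean_v)
```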
Probability distribution and density functions

Let us consider a real scalar variable x ∈ R. The (cumulative) probability distribution function F(·) : R → [0, 1] of the scalar random variable v is defined as:

F(x) = P(v ≤ x)

Main properties of the function F(·):
- F(−∞) = 0
- F(+∞) = 1
- F(·) is a monotonic nondecreasing function: F(x_1) ≤ F(x_2), ∀ x_1 < x_2
- F(·) is almost everywhere continuous and, in particular, it is continuous from the right: F(x⁺) = F(x)
- P(x_1 < v ≤ x_2) = F(x_2) − F(x_1)
- F(·) is almost everywhere differentiable
The p.d.f. or probability density function f(·) : R → R of the scalar random variable v is defined as:

\[ f(x) = \frac{dF(x)}{dx} \]

Main properties of the function f(·):
- f(x) ≥ 0, ∀ x ∈ R
- f(x) dx = P(x < v ≤ x + dx)
- \( \int_{-\infty}^{+\infty} f(x)\, dx = 1 \)
- \( F(x) = \int_{-\infty}^{x} f(\xi)\, d\xi \)
- \( P(x_1 < v \le x_2) = F(x_2) - F(x_1) = \int_{x_1}^{x_2} f(x)\, dx \)
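The relation F(x) = ∫ f(ξ) dξ can be verified numerically. A sketch assuming the exponential density f(x) = e^(−x) for x ≥ 0, chosen only for illustration; its exact distribution function is F(x) = 1 − e^(−x):

```python
import math

# Numerically recover the distribution function F(x) by integrating the
# p.d.f. f from 0 to x (f vanishes for x < 0 in this example).
def f(x: float) -> float:
    return math.exp(-x) if x >= 0 else 0.0

def F_numeric(x: float, steps: int = 100_000) -> float:
    # trapezoidal integration of f over [0, x]
    if x <= 0:
        return 0.0
    h = x / steps
    total = 0.5 * (f(0.0) + f(x))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

x = 2.0
print(F_numeric(x), 1 - math.exp(-x))  # the two values should agree closely
```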
Characteristic elements of a probability distribution

Let us consider a scalar random variable v.

Mean or mean value or expected value or expectation:

\[ E[v] = \int_{-\infty}^{+\infty} x\, f(x)\, dx = \bar v \]

Note that E[·] is a linear operator, i.e.: E[αv + β] = αE[v] + β, ∀ α, β ∈ R.

Variance:

\[ \mathrm{Var}[v] = E\!\left[(v - E[v])^2\right] = \int_{-\infty}^{+\infty} (x - E[v])^2 f(x)\, dx = \sigma_v^2 \ge 0 \]

Standard deviation or root mean square deviation:

\[ \sigma_v = \sqrt{\mathrm{Var}[v]} \ge 0 \]
k-th order (raw) moment:

\[ m_k[v] = E\!\left[v^k\right] = \int_{-\infty}^{+\infty} x^k f(x)\, dx \]

In particular: m_0[v] = E[1] = 1, m_1[v] = E[v] = v̄

k-th order central moment:

\[ \mu_k[v] = E\!\left[(v - E[v])^k\right] = \int_{-\infty}^{+\infty} (x - E[v])^k f(x)\, dx \]

In particular: μ_0[v] = E[1] = 1, μ_1[v] = E[v − E[v]] = 0,
μ_2[v] = E[(v − E[v])²] = Var[v] = σ_v²
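Raw and central moments can be estimated by replacing the integrals with sample averages. A sketch with v uniform on (0, 1), a distribution assumed only for illustration (exact values: m_1 = 1/2, μ_1 = 0, μ_2 = 1/12):

```python
import random

random.seed(2)

# Estimate the k-th raw moment E[v^k] and central moment E[(v - E[v])^k]
# of v ~ Uniform(0, 1) from a large sample.
N = 200_000
samples = [random.random() for _ in range(N)]

def raw_moment(xs, k):
    return sum(x**k for x in xs) / len(xs)

def central_moment(xs, k):
    m = raw_moment(xs, 1)
    return sum((x - m)**k for x in xs) / len(xs)

print(raw_moment(samples, 1))      # ~ 0.5  (the mean)
print(central_moment(samples, 1))  # ~ 0    (by construction)
print(central_moment(samples, 2))  # ~ 1/12 (the variance)
```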
Vector random variables

A vector v = [v_1 v_2 ⋯ v_n]^T is a vector random variable if it depends on the outcomes of a random experiment E through a vector function ϕ(·) : S → R^n such that

ϕ⁻¹(v_1 ≤ x_1, v_2 ≤ x_2, …, v_n ≤ x_n) ∈ F, ∀ x = [x_1 x_2 ⋯ x_n]^T ∈ R^n

The joint (cumulative) probability distribution function F(·) : R^n → [0, 1] is defined as:

F(x_1, …, x_n) = P(v_1 ≤ x_1, v_2 ≤ x_2, …, v_n ≤ x_n)

with x_1, …, x_n ∈ R and with all the inequalities simultaneously satisfied.

The i-th marginal probability distribution function F_i(·) : R → [0, 1] is defined as:

F_i(x_i) = F(+∞, …, +∞, x_i, +∞, …, +∞)   (i−1 arguments equal to +∞ before x_i, n−i after)
         = P(v_1 < +∞, …, v_{i−1} < +∞, v_i ≤ x_i, v_{i+1} < +∞, …, v_n < +∞)
The joint p.d.f. or joint probability density function f(·) : R^n → R is defined as:

\[ f(x_1,\ldots,x_n) = \frac{\partial^n F(x_1,\ldots,x_n)}{\partial x_1\, \partial x_2 \cdots \partial x_n} \]

and it is such that:

\[ f(x_1,\ldots,x_n)\, dx_1\, dx_2 \cdots dx_n = P(x_1 < v_1 \le x_1 + dx_1, \ldots, x_n < v_n \le x_n + dx_n) \]

The i-th marginal probability density function f_i(·) : R → R is defined as:

\[ f_i(x_i) = \underbrace{\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty}}_{n-1 \text{ times}} f(x_1,\ldots,x_n)\, dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n \]

The n components of the vector random variable v are (mutually) independent if and only if:

\[ f(x_1,\ldots,x_n) = \prod_{i=1}^{n} f_i(x_i) \]
Mean or mean value or expected value or expectation:

\[ E[v] = \big[\, E[v_1]\ E[v_2]\ \cdots\ E[v_n] \,\big]^T \in R^n, \qquad E[v_i] = \int_{-\infty}^{+\infty} x_i f_i(x_i)\, dx_i \]

Variance matrix or covariance matrix:

\[ \Sigma_v = \mathrm{Var}[v] = E\!\left[(v - E[v])(v - E[v])^T\right] = \int_{R^n} (x - E[v])(x - E[v])^T f(x)\, dx \in R^{n \times n} \]

Main properties of Σ_v:
- it is symmetric, i.e., Σ_v = Σ_v^T
- it is positive semidefinite, i.e., Σ_v ≥ 0, since the quadratic form x^T Σ_v x = E[(x^T (v − E[v]))²] ≥ 0, ∀ x ∈ R^n
- the eigenvalues λ_i(Σ_v) ≥ 0, i = 1, …, n
- det(Σ_v) = ∏_{i=1}^n λ_i(Σ_v) ≥ 0
- [Σ_v]_ii = E[(v_i − E[v_i])²] = σ_{v_i}² = σ_i² = variance of v_i
- [Σ_v]_ij = E[(v_i − E[v_i])(v_j − E[v_j])] = σ_{v_i v_j} = σ_ij = covariance of v_i and v_j
Example: let v = [v_1 v_2]^T be a bidimensional vector random variable.

\[ E[v] = \bar v = \begin{bmatrix} E[v_1] \\ E[v_2] \end{bmatrix} = \begin{bmatrix} \bar v_1 \\ \bar v_2 \end{bmatrix} \]

\[ \Sigma_v = E\!\left[(v - \bar v)(v - \bar v)^T\right] = E\!\left[ \begin{bmatrix} v_1 - \bar v_1 \\ v_2 - \bar v_2 \end{bmatrix} \begin{bmatrix} v_1 - \bar v_1 & v_2 - \bar v_2 \end{bmatrix} \right] = \]

\[ = E \begin{bmatrix} (v_1 - \bar v_1)^2 & (v_1 - \bar v_1)(v_2 - \bar v_2) \\ (v_2 - \bar v_2)(v_1 - \bar v_1) & (v_2 - \bar v_2)^2 \end{bmatrix} = \begin{bmatrix} E\!\left[(v_1 - \bar v_1)^2\right] & E[(v_1 - \bar v_1)(v_2 - \bar v_2)] \\ E[(v_1 - \bar v_1)(v_2 - \bar v_2)] & E\!\left[(v_2 - \bar v_2)^2\right] \end{bmatrix} = \]

\[ = \begin{bmatrix} \sigma_{v_1}^2 & \sigma_{v_1 v_2} \\ \sigma_{v_1 v_2} & \sigma_{v_2}^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \Sigma_v^T \]
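The 2×2 covariance matrix above can be estimated from samples by replacing each expectation with a sample average. A sketch (the construction v_2 = v_1 + noise is an assumption chosen so that the off-diagonal covariance σ_12 is visibly nonzero):

```python
import random

random.seed(3)

# Sample covariance matrix of a bidimensional random variable v = [v1, v2]^T,
# with v2 = v1 + 0.5 * noise: exact values are sigma_1^2 = 1,
# sigma_12 = 1, sigma_2^2 = 1.25.
N = 100_000
v1 = [random.gauss(0.0, 1.0) for _ in range(N)]
v2 = [x + 0.5 * random.gauss(0.0, 1.0) for x in v1]

m1 = sum(v1) / N
m2 = sum(v2) / N
s11 = sum((a - m1) ** 2 for a in v1) / N                    # sigma_1^2
s22 = sum((b - m2) ** 2 for b in v2) / N                    # sigma_2^2
s12 = sum((a - m1) * (b - m2) for a, b in zip(v1, v2)) / N  # sigma_12

Sigma = [[s11, s12], [s12, s22]]  # symmetric by construction
print(Sigma)  # approximately [[1, 1], [1, 1.25]]
```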
Correlation coefficient and correlation matrix

Let us consider any two components v_i and v_j of a vector random variable v. The (linear) correlation coefficient ρ_ij ∈ R of the scalar random variables v_i and v_j is defined as:

\[ \rho_{ij} = \frac{E[(v_i - E[v_i])(v_j - E[v_j])]}{\sqrt{E\!\left[(v_i - E[v_i])^2\right] E\!\left[(v_j - E[v_j])^2\right]}} = \frac{\sigma_{ij}}{\sigma_i \sigma_j} \]

Note that |ρ_ij| ≤ 1, since the vector random variable w = [v_i v_j]^T has:

\[ \Sigma_w = \mathrm{Var}[w] = \begin{bmatrix} \sigma_i^2 & \sigma_{ij} \\ \sigma_{ij} & \sigma_j^2 \end{bmatrix} = \begin{bmatrix} \sigma_i^2 & \rho_{ij}\sigma_i\sigma_j \\ \rho_{ij}\sigma_i\sigma_j & \sigma_j^2 \end{bmatrix} \ge 0 \]

\[ \det(\Sigma_w) = \sigma_i^2\sigma_j^2 - \rho_{ij}^2\sigma_i^2\sigma_j^2 = \left(1 - \rho_{ij}^2\right)\sigma_i^2\sigma_j^2 \ge 0 \;\Rightarrow\; \rho_{ij}^2 \le 1 \]
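A numerical sketch of the definition ρ_ij = σ_ij/(σ_i σ_j); the construction v_j = 0.8·v_i + 0.6·noise is assumed for illustration and has exact correlation 0.8:

```python
import random

random.seed(4)

# Sample correlation coefficient rho = sigma_ij / (sigma_i * sigma_j).
def corrcoef(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return sxy / (sx * sy)

vi = [random.gauss(0, 1) for _ in range(50_000)]
vj = [0.8 * x + 0.6 * random.gauss(0, 1) for x in vi]

rho = corrcoef(vi, vj)
print(rho)            # close to 0.8 for this construction
print(abs(rho) <= 1)  # True: the bound |rho| <= 1 always holds
```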
The random variables v_i and v_j are uncorrelated if and only if ρ_ij = 0, i.e., if and only if σ_ij = E[(v_i − E[v_i])(v_j − E[v_j])] = 0. Note that:

ρ_ij = 0 ⇔ E[v_i v_j] = E[v_i] E[v_j]

since

\[ \sigma_{ij} = E[(v_i - E[v_i])(v_j - E[v_j])] = E\!\left[v_i v_j - v_i E[v_j] - E[v_i] v_j + E[v_i]E[v_j]\right] = \]
\[ = E[v_i v_j] - 2E[v_i]E[v_j] + E[v_i]E[v_j] = E[v_i v_j] - E[v_i]E[v_j] \]

so σ_ij = 0 ⇔ E[v_i v_j] = E[v_i]E[v_j].

If v_i and v_j are linearly dependent, i.e., v_j = αv_i + β with α, β ∈ R and α ≠ 0, then

ρ_ij = α/|α| = sgn(α) = { +1 if α > 0; −1 if α < 0 }

and then |ρ_ij| = 1. In fact:

\[ \sigma_i^2 = E\!\left[(v_i - E[v_i])^2\right] = E\!\left[v_i^2 - 2 v_i E[v_i] + E[v_i]^2\right] = E\!\left[v_i^2\right] - 2E[v_i]^2 + E[v_i]^2 = E\!\left[v_i^2\right] - E[v_i]^2 \]

\[ \sigma_j^2 = E\!\left[(v_j - E[v_j])^2\right] = E\!\left[(\alpha v_i + \beta - E[\alpha v_i + \beta])^2\right] = E\!\left[(\alpha v_i + \beta - \alpha E[v_i] - \beta)^2\right] = E\!\left[\alpha^2 (v_i - E[v_i])^2\right] = \alpha^2 \sigma_i^2 \]

\[ \sigma_{ij} = E[v_i v_j] - E[v_i]E[v_j] = E[v_i(\alpha v_i + \beta)] - E[v_i]E[\alpha v_i + \beta] = \alpha E\!\left[v_i^2\right] + \beta E[v_i] - E[v_i](\alpha E[v_i] + \beta) = \alpha\left(E\!\left[v_i^2\right] - E[v_i]^2\right) = \alpha \sigma_i^2 \]

hence ρ_ij = σ_ij/(σ_i σ_j) = ασ_i²/(σ_i · |α|σ_i) = α/|α|.
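The linear-dependence result can be seen numerically: with v_j = αv_i + β the sample correlation equals sgn(α) up to rounding (the values α = −2, β = 3 are illustrative assumptions):

```python
import random

random.seed(5)

# Sample correlation coefficient rho = sigma_ij / (sigma_i * sigma_j).
def corrcoef(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return sxy / (sx * sy)

alpha, beta = -2.0, 3.0
vi = [random.gauss(0, 1) for _ in range(10_000)]
vj = [alpha * x + beta for x in vi]  # exact linear dependence

r = corrcoef(vi, vj)
print(r)  # sgn(alpha) = -1, up to floating-point rounding
```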
Note that, if the random variables v_i and v_j are mutually stochastically independent, they are also uncorrelated, while the converse is not always true. In fact, if v_i and v_j are mutually stochastically independent, then f(x_i, x_j) = f_i(x_i) f_j(x_j) and:

\[ E[v_i v_j] = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x_i x_j f(x_i, x_j)\, dx_i\, dx_j = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} x_i x_j f_i(x_i) f_j(x_j)\, dx_i\, dx_j = \]
\[ = \left(\int_{-\infty}^{+\infty} x_i f_i(x_i)\, dx_i\right)\left(\int_{-\infty}^{+\infty} x_j f_j(x_j)\, dx_j\right) = E[v_i]E[v_j] \;\Rightarrow\; \rho_{ij} = 0 \]

If v_i and v_j are jointly Gaussian and uncorrelated, they are also mutually independent.
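Why the converse fails can be seen with a classical counterexample (construction assumed for illustration): v_j = v_i² is fully determined by v_i, yet with v_i symmetric around zero σ_ij = E[v_i³] = 0, so the two variables are uncorrelated despite being dependent.

```python
import random

random.seed(6)

# Uncorrelated but NOT independent: v_j = v_i^2 with v_i ~ N(0, 1).
# The covariance reduces to E[v_i^3], which vanishes by symmetry.
N = 200_000
vi = [random.gauss(0, 1) for _ in range(N)]
vj = [x * x for x in vi]

mi = sum(vi) / N
mj = sum(vj) / N
cov = sum((a - mi) * (b - mj) for a, b in zip(vi, vj)) / N

print(cov)  # close to 0: uncorrelated despite full functional dependence
```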
Let us consider a vector random variable v = [v_1 v_2 ⋯ v_n]^T. The correlation matrix or normalized covariance matrix ρ_v ∈ R^{n×n} is defined as:

\[ \rho_v = \begin{bmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{12} & \rho_{22} & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1n} & \rho_{2n} & \cdots & \rho_{nn} \end{bmatrix} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{12} & 1 & \cdots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{1n} & \rho_{2n} & \cdots & 1 \end{bmatrix} \]

Main properties of ρ_v:
- it is symmetric, i.e., ρ_v = ρ_v^T
- it is positive semidefinite, i.e., ρ_v ≥ 0, since x^T ρ_v x ≥ 0, ∀ x ∈ R^n
- the eigenvalues λ_i(ρ_v) ≥ 0, i = 1, …, n
- det(ρ_v) = ∏_{i=1}^n λ_i(ρ_v) ≥ 0
- [ρ_v]_ii = ρ_ii = σ_ii/σ_i² = σ_i²/σ_i² = 1
- [ρ_v]_ij = ρ_ij = correlation coefficient of v_i and v_j, i ≠ j
Relevant case #1: if a vector random variable v = [v_1 v_2 ⋯ v_n]^T is such that all its components are mutually uncorrelated (i.e., σ_ij = ρ_ij = 0, ∀ i ≠ j), then:

\[ \Sigma_v = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{bmatrix} = \mathrm{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\right), \qquad \rho_v = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I_{n \times n} \]

Obviously, the same result holds if all the components of v are mutually independent.
Relevant case #2: if a vector random variable v = [v_1 v_2 ⋯ v_n]^T is such that all its components are mutually uncorrelated (i.e., σ_ij = ρ_ij = 0, ∀ i ≠ j) and have the same standard deviation (i.e., σ_i = σ, ∀ i), then:

\[ \Sigma_v = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2 I_{n \times n}, \qquad \rho_v = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I_{n \times n} \]

Obviously, the same result holds if all the components of v are mutually independent with the same standard deviation.
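Both relevant cases can be checked by normalizing a covariance matrix into its correlation matrix via [ρ_v]_ij = σ_ij/(σ_i σ_j). A sketch (the numeric matrices are illustrative assumptions):

```python
# Build the correlation matrix rho from a covariance matrix Sigma:
# rho_ij = sigma_ij / (sigma_i * sigma_j).
def correlation_matrix(sigma):
    n = len(sigma)
    sd = [sigma[i][i] ** 0.5 for i in range(n)]
    return [[sigma[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]

# Generic example: correlated components.
Sigma = [[4.0, 1.0], [1.0, 1.0]]
rho = correlation_matrix(Sigma)
print(rho)  # [[1.0, 0.5], [0.5, 1.0]]

# Relevant case #2: uncorrelated components with equal variance sigma^2 = 4,
# for which Sigma = sigma^2 * I and rho = I.
Sigma_iso = [[4.0, 0.0], [0.0, 4.0]]
rho_iso = correlation_matrix(Sigma_iso)
print(rho_iso)  # [[1.0, 0.0], [0.0, 1.0]]
```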
Gaussian or normal random variables

A scalar Gaussian or normal random variable v is such that its p.d.f. turns out to be:

\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_v} \exp\!\left(-\frac{(x - \bar v)^2}{2\sigma_v^2}\right), \quad \text{with } \bar v = E[v] \text{ and } \sigma_v^2 = \mathrm{Var}[v] \]

and the notations v ∼ N(v̄, σ_v²) or v ∼ G(v̄, σ_v²) are used.

If w = αv + β, where v is a scalar normal random variable and α, β ∈ R, then:

w ∼ N(w̄, σ_w²) = N(αv̄ + β, α²σ_v²)

Note that, if α = 1/σ_v and β = −v̄/σ_v, then w = (v − v̄)/σ_v ∼ N(0, 1), i.e., w has a normalized p.d.f.:

\[ f(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right) \]
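The standardization w = (v − v̄)/σ_v can be checked by simulation: the transformed samples should have mean near 0 and variance near 1 (v̄ = 5 and σ_v = 2 are illustrative values):

```python
import random

random.seed(7)

# Standardize a normal variable: w = (v - vbar) / sigma ~ N(0, 1).
vbar, sigma = 5.0, 2.0
N = 100_000
v = [random.gauss(vbar, sigma) for _ in range(N)]
w = [(x - vbar) / sigma for x in v]

mean_w = sum(w) / N
var_w = sum((x - mean_w) ** 2 for x in w) / N
print(mean_w, var_w)  # close to 0 and 1
```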
The probability P_k that the outcome of a scalar normal random variable v differs from the mean value v̄ by no more than k times the standard deviation σ_v is equal to:

\[ P_k = P\left(\bar v - k\sigma_v \le v \le \bar v + k\sigma_v\right) = P\left(|v - \bar v| \le k\sigma_v\right) = 1 - 2\int_{k}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{x^2}{2}\right) dx \]

In particular, it turns out that:

k    P_k
1    68.3%
2    95.4%
3    99.7%

and this allows one to define suitable confidence intervals for the random variable v.
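The table above can be reproduced from the identity P_k = erf(k/√2), which follows from the integral expression after the change of variable x → x/√2; a one-line check using the standard library:

```python
import math

# P_k = P(|v - vbar| <= k * sigma_v) for a normal variable equals erf(k / sqrt(2)).
for k in (1, 2, 3):
    P_k = math.erf(k / math.sqrt(2))
    print(k, round(100 * P_k, 1))  # 68.3, 95.4, 99.7 (percent)
```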
A vector normal random variable v = [v_1 v_2 ⋯ v_n]^T is such that its p.d.f. is:

\[ f(x) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma_v}} \exp\!\left(-\frac{1}{2} (x - \bar v)^T \Sigma_v^{-1} (x - \bar v)\right) \]

where v̄ = E[v] ∈ R^n and Σ_v = Var[v] ∈ R^{n×n}.

n scalar normal variables v_i, i = 1, …, n, are said to be jointly Gaussian if the vector random variable v = [v_1 v_2 ⋯ v_n]^T is normal.

Main properties:
- if v_1, …, v_n are jointly Gaussian, then any v_i, i = 1, …, n, is also normal, while the converse is not always true
- if v_1, …, v_n are normal and independent, then they are also jointly Gaussian
- if v_1, …, v_n are jointly Gaussian and uncorrelated, they are also independent
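A sketch of the vector normal p.d.f. in the bivariate case with diagonal covariance: by the last property above, uncorrelated jointly Gaussian components are independent, so the joint density must factor into the product of the two scalar normal densities (the numeric evaluation points and parameters are illustrative):

```python
import math

# Scalar normal p.d.f. N(mean, sigma^2).
def normal_pdf(x, mean, sigma):
    return math.exp(-((x - mean) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# Bivariate normal p.d.f. with diagonal covariance Sigma = diag(s1^2, s2^2):
# the inverse is diag(1/s1^2, 1/s2^2) and det(Sigma) = s1^2 * s2^2.
def bivariate_normal_pdf_diag(x1, x2, m1, m2, s1, s2):
    det = (s1**2) * (s2**2)
    quad = ((x1 - m1) ** 2) / s1**2 + ((x2 - m2) ** 2) / s2**2
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

joint = bivariate_normal_pdf_diag(0.5, -1.0, 0.0, 0.0, 1.0, 2.0)
product = normal_pdf(0.5, 0.0, 1.0) * normal_pdf(-1.0, 0.0, 2.0)
print(joint, product)  # the two values coincide
```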