ESSENTIALS OF PROBABILITY THEORY

Michele TARAGNA
Dipartimento di Elettronica e Telecomunicazioni
Politecnico di Torino
michele.taragna@polito.it

Master Course in Mechatronic Engineering
Master Course in Computer Engineering
01RKYQW / 01RKYOV "Estimation, Filtering and System Identification"
Academic Year 2017/2018

Random experiment and random source of data

S : outcome space, i.e., the set of possible outcomes s of the random experiment;
F : space of events (or results) of interest, i.e., the set of combinations of interest into which the outcomes in S can be clustered;
P(·) : probability function defined on F that associates to any event in F a real number between 0 and 1.

E = (S, F, P(·)) : random experiment

Example: roll a six-sided die to see whether an odd or an even side appears:
S = {1, 2, 3, 4, 5, 6} is the set of the six sides of the die;
F = {A, B, S, ∅}, where A = {2, 4, 6} and B = {1, 3, 5} are the events of interest, i.e., the even and odd number sets;
P(A) = P(B) = 1/2 (if the die is fair), P(S) = 1, P(∅) = 0.

A random variable of the experiment E is a variable v whose values depend on the outcome s of E through a suitable function ϕ(·) : S → V, where V is the set of possible values of v:

v = ϕ(s)

Example: the random variable depending on the outcome of the roll of a six-sided die can be defined as

v = ϕ(s) = +1  if s ∈ A = {2, 4, 6}
           −1  if s ∈ B = {1, 3, 5}

A random source of data produces data that, besides depending on the process under investigation (characterized by the unknown true value θ_o of the variable to be estimated), are also functions of a random variable; in particular, at the time instant t, the datum d(t) depends on the random variable v(t).
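
As a quick numerical illustration of these definitions (an added sketch, not part of the original slides), the following Python snippet simulates the fair-die experiment and the random variable v = ϕ(s) defined above; the fair die, the sample size and the seed are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)
s = rng.integers(1, 7, size=100_000)      # outcomes of the random experiment: die sides 1..6
v = np.where(s % 2 == 0, +1, -1)          # random variable v = phi(s): +1 on A = {2,4,6}, -1 on B = {1,3,5}

print("estimated P(A) =", np.mean(s % 2 == 0))   # should approach 1/2 for a fair die
print("estimated E[v] =", v.mean())              # should approach 0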

Probability distribution and density functions

Let us consider a real scalar variable x ∈ R. The (cumulative) probability distribution function F(·) : R → R of the scalar random variable v is defined as:

F(x) = P(v ≤ x)

Main properties of the function F(·):
- F(−∞) = 0
- F(+∞) = 1
- F(·) is a monotonic nondecreasing function: F(x_1) ≤ F(x_2), ∀ x_1 < x_2
- F(·) is continuous except at (at most countably many) jump points and, in particular, it is continuous from the right: F(x⁺) = F(x)
- P(x_1 ≤ v < x_2) = F(x_2) − F(x_1)
- F(·) is almost everywhere differentiable

The p.d.f. or probability density function f(·) : R → R is defined as:

f(x) = dF(x)/dx

Main properties of the function f(·):
- f(x) ≥ 0, ∀ x ∈ R
- f(x) dx = P(x ≤ v < x + dx)
- ∫_{−∞}^{+∞} f(x) dx = 1
- F(x) = ∫_{−∞}^{x} f(ξ) dξ
- P(x_1 ≤ v < x_2) = F(x_2) − F(x_1) = ∫_{x_1}^{x_2} f(x) dx
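
The relation F(x) = ∫_{−∞}^{x} f(ξ) dξ can be checked numerically; the sketch below (an added illustration, not from the slides) tabulates a standard Gaussian density, chosen purely as an example, and integrates it with a cumulative trapezoidal rule.

import numpy as np

x = np.linspace(-6.0, 6.0, 2401)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # example p.d.f.: standard Gaussian

# cumulative trapezoidal integration: F(x) = integral of f from -inf (approximated by -6) to x
F = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * dx)))

i0 = np.searchsorted(x, -1.0)                  # first grid point >= -1
i1 = np.searchsorted(x, 1.0)                   # first grid point >= +1
print("F(+inf) ~", F[-1])                      # ~ 1: total probability
print("P(-1 <= v < 1) ~", F[i1] - F[i0])       # ~ 0.683 for the standard Gaussian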

Characteristic elements of a probability distribution

Let us consider a scalar random variable v.

Mean or mean value or expected value or expectation:

E[v] = ∫_{−∞}^{+∞} x f(x) dx = v̄

Note that E[·] is a linear operator, i.e.: E[αv + β] = αE[v] + β, ∀ α, β ∈ R.

Variance:

Var[v] = E[(v − E[v])²] = ∫_{−∞}^{+∞} (x − E[v])² f(x) dx = σ_v² ≥ 0

Standard deviation or root mean square deviation:

σ_v = √Var[v] ≥ 0
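
The linearity of E[·] and the definition of variance can be spot-checked on samples. The following sketch is an added illustration: the exponential distribution (with E[v] = 2, Var[v] = 4), the coefficients α = 3, β = −1 and the seed are arbitrary example choices.

import numpy as np

rng = np.random.default_rng(1)
v = rng.exponential(scale=2.0, size=1_000_000)   # example random variable, E[v] = 2, Var[v] = 4

alpha, beta = 3.0, -1.0
print("E[alpha*v + beta] ~", (alpha * v + beta).mean())     # ~ alpha*E[v] + beta = 5
print("alpha*E[v] + beta ~", alpha * v.mean() + beta)
print("Var[v]            ~", ((v - v.mean()) ** 2).mean())  # ~ sigma_v^2 = 4
print("std dev           ~", np.sqrt(((v - v.mean()) ** 2).mean()))   # ~ 2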

k-th order (raw) moment:

m_k[v] = E[v^k] = ∫_{−∞}^{+∞} x^k f(x) dx

In particular: m_0[v] = E[1] = 1, m_1[v] = E[v] = v̄

k-th order central moment:

µ_k[v] = E[(v − E[v])^k] = ∫_{−∞}^{+∞} (x − E[v])^k f(x) dx

In particular: µ_0[v] = E[1] = 1, µ_1[v] = E[v − E[v]] = 0, µ_2[v] = E[(v − E[v])²] = Var[v] = σ_v²
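
A small sketch (added here, not from the slides) relating raw and central moments on sample data; it checks the identity µ_2 = m_2 − m_1², which follows by expanding the square inside the expectation. The Gaussian example distribution and its parameters are assumptions made only for illustration.

import numpy as np

rng = np.random.default_rng(2)
v = rng.normal(loc=1.5, scale=0.5, size=1_000_000)   # example: Gaussian with mean 1.5, std 0.5

m1 = np.mean(v)              # first raw moment  m_1 = E[v]
m2 = np.mean(v**2)           # second raw moment m_2 = E[v^2]
mu2 = np.mean((v - m1)**2)   # second central moment (variance)

print("mu_2        ~", mu2)            # ~ 0.25
print("m_2 - m_1^2 ~", m2 - m1**2)     # identical up to floating-point error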

Vector random variables

A vector v = [v_1 v_2 ... v_n]^T is a vector random variable if it depends on the outcomes of a random experiment E through a vector function ϕ(·) : S → R^n such that

ϕ⁻¹(v_1 ≤ x_1, v_2 ≤ x_2, ..., v_n ≤ x_n) ∈ F,  ∀ x = [x_1 x_2 ... x_n]^T ∈ R^n

The joint (cumulative) probability distribution function F(·) : R^n → [0, 1] is defined as:

F(x_1, ..., x_n) = P(v_1 ≤ x_1, v_2 ≤ x_2, ..., v_n ≤ x_n)

with x_1, ..., x_n ∈ R and with all the inequalities simultaneously satisfied.

The i-th marginal probability distribution function F_i(·) : R → [0, 1] is defined as:

F_i(x_i) = F(+∞, ..., +∞, x_i, +∞, ..., +∞)    (i−1 arguments at +∞ before x_i, n−i after)
         = P(v_1 < +∞, ..., v_{i−1} < +∞, v_i ≤ x_i, v_{i+1} < +∞, ..., v_n < +∞)

The joint p.d.f. or joint probability density function f(·) : R^n → R is defined as:

f(x_1, ..., x_n) = ∂^n F(x_1, ..., x_n) / (∂x_1 ∂x_2 ⋯ ∂x_n)

and it is such that:

f(x_1, ..., x_n) dx_1 dx_2 ⋯ dx_n = P(x_1 < v_1 ≤ x_1 + dx_1, ..., x_n < v_n ≤ x_n + dx_n)

The i-th marginal probability density function f_i(·) : R → R is defined as:

f_i(x_i) = ∫_{−∞}^{+∞} ⋯ ∫_{−∞}^{+∞} f(x_1, ..., x_n) dx_1 ⋯ dx_{i−1} dx_{i+1} ⋯ dx_n    (n−1 integrals)

The n components of the vector random variable v are (mutually) independent if and only if:

f(x_1, ..., x_n) = ∏_{i=1}^{n} f_i(x_i)
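
To make the marginalization formula concrete, the sketch below (an added illustration) tabulates a joint density of two independent components on a grid, integrates out one variable with a simple Riemann sum, and checks the product factorization; the two exponential marginals and the grid are arbitrary example choices, and the small residual error is discretization error only.

import numpy as np

x1 = np.linspace(0.0, 20.0, 2001)
x2 = np.linspace(0.0, 20.0, 2001)
dx1 = x1[1] - x1[0]
dx2 = x2[1] - x2[0]

f1 = np.exp(-x1)                  # example marginal p.d.f. of v1 (exponential, rate 1)
f2 = 0.5 * np.exp(-0.5 * x2)      # example marginal p.d.f. of v2 (exponential, rate 1/2)
f = np.outer(f1, f2)              # joint p.d.f. of independent components: f(x1,x2) = f1(x1) f2(x2)

f1_marg = f.sum(axis=1) * dx2     # marginal of v1: integrate the joint density over x2 (Riemann sum)
print("max |f1_marg - f1| =", np.abs(f1_marg - f1).max())   # small: numerical integration error only
print("total probability  ~", f.sum() * dx1 * dx2)          # ~ 1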

Mean or mean value or expected value or expectation:

E[v] = [E[v_1] E[v_2] ⋯ E[v_n]]^T ∈ R^n,   with E[v_i] = ∫_{−∞}^{+∞} x_i f_i(x_i) dx_i

Variance matrix or covariance matrix:

Σ_v = Var[v] = E[(v − E[v])(v − E[v])^T] = ∫_{R^n} (x − E[v])(x − E[v])^T f(x) dx ∈ R^{n×n}

Main properties of Σ_v:
- it is symmetric, i.e., Σ_v = Σ_v^T
- it is positive semidefinite, i.e., Σ_v ≥ 0, since the quadratic form x^T Σ_v x = E[(x^T (v − E[v]))²] ≥ 0, ∀ x ∈ R^n
- the eigenvalues λ_i(Σ_v) ≥ 0, i = 1, ..., n
- det(Σ_v) = ∏_{i=1}^{n} λ_i(Σ_v) ≥ 0
- [Σ_v]_{ii} = E[(v_i − E[v_i])²] = σ_{v_i}² = σ_i² = variance of v_i
- [Σ_v]_{ij} = E[(v_i − E[v_i])(v_j − E[v_j])] = σ_{v_i v_j} = σ_{ij} = covariance of v_i and v_j
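
The sketch below (added for illustration) estimates Σ_v from samples and checks the listed properties numerically; the 3-dimensional Gaussian data, its parameters and the sample size are assumptions made only for this example.

import numpy as np

rng = np.random.default_rng(3)
true_mean = np.array([1.0, -2.0, 0.5])
true_cov = np.array([[2.0, 0.3, 0.0],
                     [0.3, 1.0, -0.4],
                     [0.0, -0.4, 0.5]])
V = rng.multivariate_normal(true_mean, true_cov, size=200_000)   # rows = realizations of the vector r.v.

d = V - V.mean(axis=0)                # centred data, v - E[v]
Sigma = d.T @ d / V.shape[0]          # sample estimate of E[(v - E[v])(v - E[v])^T]

print("symmetric:", np.allclose(Sigma, Sigma.T))
print("eigenvalues >= 0:", np.all(np.linalg.eigvalsh(Sigma) >= 0))
print("det(Sigma) =", np.linalg.det(Sigma))                          # product of the eigenvalues, >= 0
print("diagonal = variances:", np.allclose(np.diag(Sigma), d.var(axis=0)))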

Example: let v = [v_1 v_2]^T be a bidimensional vector random variable. Then:

E[v] = v̄ = [E[v_1]  E[v_2]]^T = [v̄_1  v̄_2]^T

Σ_v = E[(v − v̄)(v − v̄)^T]
    = E[ [ (v_1 − v̄_1)²             (v_1 − v̄_1)(v_2 − v̄_2) ]
         [ (v_2 − v̄_2)(v_1 − v̄_1)   (v_2 − v̄_2)²            ] ]
    = [ E[(v_1 − v̄_1)²]             E[(v_1 − v̄_1)(v_2 − v̄_2)] ]
      [ E[(v_1 − v̄_1)(v_2 − v̄_2)]   E[(v_2 − v̄_2)²]           ]
    = [ σ_{v_1}²    σ_{v_1 v_2} ]  =  [ σ_1²   σ_12 ]  =  Σ_v^T
      [ σ_{v_1 v_2}  σ_{v_2}²   ]     [ σ_12   σ_2² ]

Correlation coefficient and correlation matrix

Let us consider any two components v_i and v_j of a vector random variable v. The (linear) correlation coefficient ρ_ij ∈ R of the scalar random variables v_i and v_j is defined as:

ρ_ij = E[(v_i − E[v_i])(v_j − E[v_j])] / ( √E[(v_i − E[v_i])²] · √E[(v_j − E[v_j])²] ) = σ_ij / (σ_i σ_j)

|ρ_ij| ≤ 1, since the vector random variable w = [v_i v_j]^T has:

Σ_w = Var[w] = [ σ_i²   σ_ij ]  =  [ σ_i²        ρ_ij σ_i σ_j ]  ≥ 0
               [ σ_ij   σ_j² ]     [ ρ_ij σ_i σ_j  σ_j²       ]

Note that

det(Σ_w) = σ_i² σ_j² − ρ_ij² σ_i² σ_j² = (1 − ρ_ij²) σ_i² σ_j² ≥ 0   ⟹   ρ_ij² ≤ 1
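
The bound |ρ_ij| ≤ 1 can be verified on any 2×2 covariance matrix; the sketch below (an added illustration with arbitrarily chosen numbers) computes ρ_ij and checks that det(Σ_w) = (1 − ρ_ij²) σ_i² σ_j².

import numpy as np

Sigma_w = np.array([[4.0, 1.8],     # example covariance matrix of w = [v_i v_j]^T
                    [1.8, 1.0]])
sigma_i, sigma_j = np.sqrt(np.diag(Sigma_w))
rho_ij = Sigma_w[0, 1] / (sigma_i * sigma_j)

print("rho_ij =", rho_ij)                                           # 1.8 / (2*1) = 0.9, |rho| <= 1
print("det(Sigma_w)              =", np.linalg.det(Sigma_w))        # 4 - 3.24 = 0.76
print("(1 - rho^2) s_i^2 s_j^2   =", (1 - rho_ij**2) * sigma_i**2 * sigma_j**2)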

The random variables v_i and v_j are uncorrelated if and only if ρ_ij = 0, i.e., if and only if σ_ij = E[(v_i − E[v_i])(v_j − E[v_j])] = 0.

Note that:

ρ_ij = 0  ⟺  E[v_i v_j] = E[v_i] E[v_j]

since

σ_ij = E[(v_i − E[v_i])(v_j − E[v_j])] = E[v_i v_j − v_i E[v_j] − E[v_i] v_j + E[v_i] E[v_j]]
     = E[v_i v_j] − 2 E[v_i] E[v_j] + E[v_i] E[v_j] = E[v_i v_j] − E[v_i] E[v_j]

so that σ_ij = 0 ⟺ E[v_i v_j] = E[v_i] E[v_j].

If v_i and v_j are linearly dependent, i.e., v_j = α v_i + β with α, β ∈ R and α ≠ 0, then

ρ_ij = α / |α| = sgn(α) = +1 if α > 0, −1 if α < 0

and hence |ρ_ij| = 1. In fact:

σ_i² = E[(v_i − E[v_i])²] = E[v_i² − 2 v_i E[v_i] + E[v_i]²] = E[v_i²] − 2 E[v_i]² + E[v_i]² = E[v_i²] − E[v_i]²

σ_j² = E[(v_j − E[v_j])²] = E[(α v_i + β − E[α v_i + β])²] = E[(α v_i + β − α E[v_i] − β)²]
     = E[(α v_i − α E[v_i])²] = E[α² (v_i − E[v_i])²] = α² E[(v_i − E[v_i])²] = α² σ_i²

σ_ij = E[v_i v_j] − E[v_i] E[v_j] = E[v_i (α v_i + β)] − E[v_i] E[α v_i + β]
     = α E[v_i²] + β E[v_i] − E[v_i](α E[v_i] + β) = α E[v_i²] − α E[v_i]² = α (E[v_i²] − E[v_i]²) = α σ_i²

so that ρ_ij = σ_ij / (σ_i σ_j) = α σ_i² / (σ_i · |α| σ_i) = α / |α|.

Note that, if the random variables v_i and v_j are mutually stochastically independent, they are also uncorrelated, while the converse is not always true. In fact, if v_i and v_j are mutually stochastically independent, then:

E[v_i v_j] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x_i x_j f(x_i, x_j) dx_i dx_j
           = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x_i x_j f_i(x_i) f_j(x_j) dx_i dx_j
           = ∫_{−∞}^{+∞} x_i f_i(x_i) dx_i · ∫_{−∞}^{+∞} x_j f_j(x_j) dx_j
           = E[v_i] E[v_j]   ⟹   ρ_ij = 0

If v_i and v_j are jointly Gaussian and uncorrelated, they are also mutually independent.
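
To see that uncorrelated does not imply independent (outside the jointly Gaussian case), a classical counterexample can be simulated: with v uniform on [−1, 1] and w = v², E[vw] = E[v]E[w] = 0 although w is a deterministic function of v. The sketch below is an added illustration with arbitrary sample size and seed.

import numpy as np

rng = np.random.default_rng(4)
v = rng.uniform(-1.0, 1.0, size=1_000_000)
w = v**2                                  # completely determined by v, yet uncorrelated with it

cov_vw = np.mean(v * w) - np.mean(v) * np.mean(w)
print("cov(v, w) ~", cov_vw)              # ~ 0: uncorrelated
# conditioning changes the distribution of w, so v and w are clearly NOT independent:
print("P(w <= 0.25)             =", np.mean(w <= 0.25))                      # ~ 0.5
print("P(w <= 0.25 | |v| > 0.5) =", np.mean(w[np.abs(v) > 0.5] <= 0.25))     # = 0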

Let us consider a vector random variable v = [v_1 v_2 ... v_n]^T. The correlation matrix or normalized covariance matrix ρ_v ∈ R^{n×n} is defined as:

ρ_v = [ ρ_11  ρ_12  ...  ρ_1n ]     [ 1     ρ_12  ...  ρ_1n ]
      [ ρ_12  ρ_22  ...  ρ_2n ]  =  [ ρ_12  1     ...  ρ_2n ]
      [  :     :          :   ]     [  :     :          :   ]
      [ ρ_1n  ρ_2n  ...  ρ_nn ]     [ ρ_1n  ρ_2n  ...  1    ]

Main properties of ρ_v:
- it is symmetric, i.e., ρ_v = ρ_v^T
- it is positive semidefinite, i.e., ρ_v ≥ 0, since x^T ρ_v x ≥ 0, ∀ x ∈ R^n
- the eigenvalues λ_i(ρ_v) ≥ 0, i = 1, ..., n
- det(ρ_v) = ∏_{i=1}^{n} λ_i(ρ_v) ≥ 0
- [ρ_v]_{ii} = ρ_ii = σ_ii / σ_i² = σ_i² / σ_i² = 1
- [ρ_v]_{ij} = ρ_ij = correlation coefficient of v_i and v_j, i ≠ j
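
Since ρ_ij = σ_ij / (σ_i σ_j), the correlation matrix can be obtained from the covariance matrix as ρ_v = D⁻¹ Σ_v D⁻¹ with D = diag(σ_1, ..., σ_n); this restatement and the numerical Σ_v below are an added illustration, not part of the slides.

import numpy as np

Sigma_v = np.array([[2.0, 0.3, 0.0],      # example covariance matrix
                    [0.3, 1.0, -0.4],
                    [0.0, -0.4, 0.5]])

D_inv = np.diag(1.0 / np.sqrt(np.diag(Sigma_v)))   # D^{-1}, with D = diag(sigma_1, ..., sigma_n)
rho_v = D_inv @ Sigma_v @ D_inv

print(rho_v)                                                # unit diagonal, correlation coefficients elsewhere
print("symmetric:", np.allclose(rho_v, rho_v.T))
print("eigenvalues >= 0:", np.all(np.linalg.eigvalsh(rho_v) >= -1e-12))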

Relevant case #1: if a vector random variable v = [v_1 v_2 ... v_n]^T is such that all its components are pairwise uncorrelated (i.e., σ_ij = ρ_ij = 0, ∀ i ≠ j), then:

Σ_v = [ σ_1²  0     ...  0    ]
      [ 0     σ_2²  ...  0    ]  =  diag(σ_1², σ_2², ..., σ_n²)
      [  :     :          :   ]
      [ 0     0     ...  σ_n² ]

ρ_v = [ 1  0  ...  0 ]
      [ 0  1  ...  0 ]  =  I_{n×n}
      [ :  :       : ]
      [ 0  0  ...  1 ]

Obviously, the same result holds if all the components of v are mutually independent.

Relevant case #2: if a vector random variable v = [v_1 v_2 ... v_n]^T is such that all its components are pairwise uncorrelated (i.e., σ_ij = ρ_ij = 0, ∀ i ≠ j) and have the same standard deviation (i.e., σ_i = σ, ∀ i), then:

Σ_v = [ σ²  0   ...  0  ]
      [ 0   σ²  ...  0  ]  =  σ² I_{n×n}
      [ :   :        :  ]
      [ 0   0   ...  σ² ]

ρ_v = [ 1  0  ...  0 ]
      [ 0  1  ...  0 ]  =  I_{n×n}
      [ :  :       : ]
      [ 0  0  ...  1 ]

Obviously, the same result holds if all the components of v are mutually independent.

Gaussian or normal random variables

A scalar Gaussian or normal random variable v is such that its p.d.f. turns out to be:

f(x) = 1 / (√(2π) σ_v) · exp( −(x − v̄)² / (2σ_v²) ),   with v̄ = E[v] and σ_v² = Var[v]

and the notations v ∼ N(v̄, σ_v²) or v ∼ G(v̄, σ_v²) are used.

If w = αv + β, where v is a scalar normal random variable and α, β ∈ R, then:

w ∼ N(w̄, σ_w²) = N(αv̄ + β, α²σ_v²)

Note that, if α = 1/σ_v and β = −v̄/σ_v, then w ∼ N(0, 1), i.e., w has a normalized p.d.f.:

f(x) = 1 / √(2π) · exp( −x² / 2 )
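
A quick numerical confirmation (an added sketch with arbitrary example parameters) that the affine map w = (v − v̄)/σ_v yields a standard normal variable:

import numpy as np

rng = np.random.default_rng(5)
v_bar, sigma_v = 3.0, 2.0                          # example mean and standard deviation
v = rng.normal(v_bar, sigma_v, size=1_000_000)

w = (v - v_bar) / sigma_v      # alpha = 1/sigma_v, beta = -v_bar/sigma_v
print("E[w]   ~", w.mean())    # ~ 0
print("Var[w] ~", w.var())     # ~ 1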

The probability P_k that the outcome of a scalar normal random variable v differs from the mean value v̄ by no more than k times the standard deviation σ_v is equal to:

P_k = P(v̄ − kσ_v ≤ v ≤ v̄ + kσ_v) = P(|v − v̄| ≤ kσ_v) = 1 − 2 ∫_{k}^{+∞} 1/√(2π) · exp( −x²/2 ) dx

In particular, it turns out that:

k    P_k
1    68.3%
2    95.4%
3    99.7%

and this allows one to define suitable confidence intervals of the random variable v.
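
The tabulated values follow from the standard normal integral, which can equivalently be written as P_k = erf(k/√2); the short sketch below (added here) reproduces them.

import math

for k in (1, 2, 3):
    P_k = math.erf(k / math.sqrt(2))     # P(|v - v_bar| <= k*sigma_v) for a normal random variable
    print(f"k = {k}:  P_k = {100 * P_k:.1f}%")   # 68.3%, 95.4%, 99.7%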

A vector normal random variable v = [v_1 v_2 ... v_n]^T is such that its p.d.f. is:

f(x) = 1 / ((2π)^(n/2) √(det Σ_v)) · exp( −(1/2) (x − v̄)^T Σ_v⁻¹ (x − v̄) )

where v̄ = E[v] ∈ R^n and Σ_v = Var[v] ∈ R^{n×n}.

n scalar normal variables v_i, i = 1, ..., n, are said to be jointly Gaussian if the vector random variable v = [v_1 v_2 ... v_n]^T is normal.

Main properties:
- if v_1, ..., v_n are jointly Gaussian, then any v_i, i = 1, ..., n, is also normal, while the converse is not always true
- if v_1, ..., v_n are normal and independent, then they are also jointly Gaussian
- if v_1, ..., v_n are jointly Gaussian and uncorrelated, they are also independent
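
As a final illustration (added here, not part of the slides), the multivariate normal density can be evaluated directly from the formula above; the 2-dimensional mean, covariance and the helper name gaussian_pdf are assumptions made only for this example.

import numpy as np

def gaussian_pdf(x, v_bar, Sigma_v):
    """Multivariate normal p.d.f. f(x) evaluated at a single point x (1-D NumPy array)."""
    n = len(v_bar)
    d = x - v_bar
    quad = d @ np.linalg.solve(Sigma_v, d)                   # (x - v_bar)^T Sigma_v^{-1} (x - v_bar)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma_v))
    return np.exp(-0.5 * quad) / norm

v_bar = np.array([1.0, -1.0])
Sigma_v = np.array([[1.0, 0.6],
                    [0.6, 2.0]])

print("f(v_bar)            =", gaussian_pdf(v_bar, v_bar, Sigma_v))          # peak of the density
print("1/(2*pi*sqrt(det))  =", 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(Sigma_v))))   # same value, n = 2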