Random variables. Florence Perronnin. Univ. Grenoble Alpes, LIG, Inria. September 28, PDF Free Download

Random variables Florence Perronnin Univ. Grenoble Alpes, LIG, Inria September 28, 2018 Florence Perronnin (UGA) Random variables September 28, 2018 1 / 42

Variables aléatoires Outline 1 Variables aléatoires 2 Expected value 3 Lois discrètes ultra-classiques 4 Most famous continuous random variables Florence Perronnin (UGA) Random variables September 28, 2018 2 / 42

Variables aléatoires Variables aléatoires Une variable aléatoire X est une application X : Ω F ou F est un ensemble ordonné, t.q. Human langage translation x F, {ω Ω : X (ω) x} F A random variable is a function X : Ω R (or N) characterizing the possible outcomes in Ω such that intervals in R have a probability. X est une v.a. discrète si F est dénombrable (typ. N) X est une v.a. continue si F est indénombrable (typ. R) Random variables are usually written as capital letters: X,... Florence Perronnin (UGA) Random variables September 28, 2018 3 / 42

Variables aléatoires Some examples of random variables X =result of a single die Y =sum of two dice (coin tossing) X = 1 if heads, X = 0 if tails Z = lifetime of a device A n = n-th measurement (collection of random variables) 1 k k n=1 A n is also a random variable (mean) Florence Perronnin (UGA) Random variables September 28, 2018 4 / 42

Variables aléatoires Cumulative Distribution function (C.D.F) Pour toute v.a. X on définit sa fonction de répartition (CDF en anglais): x, F X (x) = P[X x] F X donne les probabilités cumulées. F X décrit la loi ou distribution de probabilité de X. Fn(x) 0.0 0.2 0.4 0.6 0.8 1.0 ecdf(g) 0 100 200 300 400 x P(X<=x) 0.0 0.2 0.4 0.6 0.8 1.0 0 2 4 6 8 10 x Properties: 0 F X (x) 1 x R F X is nondecreasing Florence Perronnin (UGA) Random variables September 28, 2018 5 / 42

Variables aléatoires Cas particulier d une variable aléatoire discrète La fonction de répartition (C.D.F) F X d une variable aléatoire discrète X est une fonction en escalier (sommes cumulées). Figure : Example of CDF : Binom(n,p) Florence Perronnin (UGA) Random variables September 28, 2018 6 / 42

Variables aléatoires Quantiles A quantile is the inverse of the cumulative distribution function. Median The median is the value m such that P[X m] = 1 2. (the value that splits Ω into two equiprobable parts.) It is therefore the 1/2-quantile or the 0.5-quantile. q-quantile The q-quantile is the value x q such that : F X (x q ) = P[X x q ] = q By Erzbischof - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=18 Florence Perronnin (UGA) Random variables September 28, 2018 7 / 42

Variables aléatoires Probability density function (P.D.F) La distribution d une variable aléatoire peut également être définie de façon plus élémentaire par : Cas d une variable discrète Fonction de distribution : f X (k) = P[X = k] pour toute valeur k possible. f (n) = 1 n N Cas d une variable continue (réelle) Densité de probabilité (PDF en anglais) f (x) P[x X y] = y x f X (u)du attention aux bornes d intégration... f X (x)dx = 1 Florence Perronnin (UGA) Random variables September 28, 2018 8 / 42

Variables aléatoires PDF pour les variables aléatoires discrètes Pour une v.a. discrète (à valeurs entières par exemple) on définit la fonction de distribution (ou loi) par f X (k) = P[X = k]. C est la fonction qui donne la probabilité de chaque valeur. Histogram of g Density 0.000 0.001 0.002 0.003 0.004 0.005 0 f X (k) 1 F X (k) = k m= f X (m) f X peut être approchée par l histogramme d un échantillon 0 100 200 300 g Florence Perronnin (UGA) Random variables September 28, 2018 9 / 42

Variables aléatoires PDF for continuous random variables The probability density function f can be defined as the derivative of the CDF F (x) F X (x) F X (a) = P[a X x] = Intuitively: x a f X (u)du => f X (x) = F X x (x) The area below the curve of f between a and b gives the probability that the r.v. falls into the x-axis interval [a, b] The density is the derivative of F X, so: f X (x)dx = df X (x) Florence Perronnin (UGA) Random variables September 28, 2018 10 / 42

Variables aléatoires PDF and CDF Some continuous distribution 0.3 f X (ω) 0.2 0.1 P(3 X 5) = 0.2705 F X (x) = f X (u)du = 1 x f X (v)dv 0.0 total area =1 2.5 0.0 2.5 5.0 7.5 10.0 ω Florence Perronnin (UGA) Random variables September 28, 2018 11 / 42

Variables aléatoires Density and histograms A probability density function f X can also be viewed as the limit of the histogram. Indeed, each box of the histogram gives P(x n X x n+1 ) = F X (x n+1 ) F X (x n ). Gaussian distribution µ = 3, σ = 2 Beta distribution α = 1, β = 5 0.20 5 4 0.15 frequency 0.10 frequency 3 2 0.05 1 0.00 0 5 0 5 10 observations 0.0 0.2 0.4 0.6 0.8 observations Florence Perronnin (UGA) Random variables September 28, 2018 12 / 42

Expected value Outline 1 Variables aléatoires 2 Expected value 3 Lois discrètes ultra-classiques 4 Most famous continuous random variables Florence Perronnin (UGA) Random variables September 28, 2018 13 / 42

Expected value Expected value Espérance La moyenne (mean) ou espérance (expectation en anglais) d une v.a. discrète X à valeurs dans I (dénombrable) est: E [X ] = x I xp[x = x] Example Soit 0 < p < 1. Une variable aléatoire X à valeurs dans {0, 1} suit une loi de Bernoulli B(p) ssi P[X = 1] = p et P[x = 0] = 1 p. Son espérance est E [X ] = 0 (1 p) + 1 p = p. Exercise What is the expected value of the sum of two fair dice? Florence Perronnin (UGA) Random variables September 28, 2018 14 / 42

Expected value Expected value of a continuous r.v. Expectation of a continuous random variable X Let X be a continuous r.v. with values in R. Then (or equivalently : E [x] = E [X ] = xdf (x)) xf (x)dx Florence Perronnin (UGA) Random variables September 28, 2018 15 / 42

Expected value Exercise let Y be an exponential r.v. with parameter λ. The PDF is f (x) = λe λx 1 {x 0}. Compute E [Y ]. E [Y ] = = = = 0 0 xf (x)dx (1) xλe λx dx (2) ] 0 0 [ x( e λx ) [ 1 ] λ e λx = 1 0 λ ( e λx )dx (3) (4) Florence Perronnin (UGA) Random variables September 28, 2018 16 / 42

Expected value Expectation of a function of X Let X be an R-valued r.v. and a function g : R R. Then E [g(x)] = g(x)f (x)dx Exercise Let Y be a uniform r.v. over [a, b] (with 0 < a < b). Compute E [ log(y )] Florence Perronnin (UGA) Random variables September 28, 2018 17 / 42

Expected value Existence If the r.v. x has a infinite number of possible values, the expected value is not always finite. St Petersburg paradox How much should the bank ask for the following game? flip a coin until it lands on tails. If tails happens at the 1st round: reward=2$ 2nd round: reward=4$ 3nd round: reward=8$ and so on... Florence Perronnin (UGA) Random variables September 28, 2018 18 / 42

Expected value Let X be the reward and Y the number of coin tosses until tails. E [X ] = 2 n P[Y = n] = n=1 = (1 + 1 +...) = 2 n 1 2 n (Y Geom(1/2) However in reality no bank has an infinite amount of money. Let s assume the bank cannot pay more than 1, 000, 000$. Then if Y log 2 (10 6 ) 19, the player will only earns 10 6 $ after 19 heads events (when he should earn more). E [X ] = 19 n=1 n=1 2 n P[Y = n] + 10 6 P[Y > 19] = = 19 + 106 = 20, 9$ 524288 n=1 2 n 1 2 n + 106 1 2 19 So if the bank can t pay more than a million $, the expected reward of the player is only 20, 9$. A lot less than infinity... Florence Perronnin (UGA) Random variables September 28, 2018 19 / 42

Expected value Existence The example of the St Petersburg paradox shows that a finite random variable can have an infinite expected value. Florence Perronnin (UGA) Random variables September 28, 2018 20 / 42

Expected value Linearity of the Expectation operator Linearity Given two random variables X and Y we have: E [X + Y ] = E [X ] + E [Y ] even if X and Y are dependent variables. Also for any constant c we have: E [cx ] = ce [X ] More generally, if we have n r.v. X 1, X 2,... X n then: [ n ] n E X i = E [X i ] i=1 i=1 Florence Perronnin (UGA) Random variables September 28, 2018 21 / 42

Expected value Exercise (using the linearity of E [.]) Initially n objects are correctly placed. But someone came by and rearranged all the objects. On average, 1 object is correctly placed. let X i be the Bernoulli r.v. such that X i = 1 of object i is correctly placed X i = 0 otherwise Let Y be the total number of correctly placed objects. Then clearly Y = n i=1 X i. Florence Perronnin (UGA) Random variables September 28, 2018 22 / 42

Expected value Expectation exercises Compute the expectation of a binomial random variable Z. Let Y = X E [X ]. Compute E [Y ] as a function of the moments of X. Florence Perronnin (UGA) Random variables September 28, 2018 23 / 42

Expected value Conditional expectation E [Y Z = z] = y y P[Y = y Z = z] Law of total probability: E [Y ] = E [E [Y Z]] where E [Y Z] is a random variable (f (Z)). Florence Perronnin (UGA) Random variables September 28, 2018 24 / 42

Expected value Variance Définition La variance d une v.a. est définie par : var(x ) = E [ (X E [X ]) 2] = E [ X 2] (E [X ]) 2 Exercise : prove the above equality. Cas discret : var(x ) = ( x 2 P[X = x] xp[x = x] x I x I ( ) 2 Cas continu : var(x ) = x 2 f (x)dx xf (x)dx Question x I Pourquoi ne pas définir E [(X E [X ])] au lieu de l espérance du carré des écarts? x I ) 2 Florence Perronnin (UGA) Random variables September 28, 2018 25 / 42

Expected value Lois jointes Deux variables aléatoires X et Y définies sur un même espace de probabilités ont une fonction de répartition jointe : F (x, y) = P[X x, Y y] = P[(X x) (Y y)] Les distributions de X et Y respectives sont appelées lois marginales: F X (x) = P[X x] = F (x, y). lim y + La densité jointe se calcule comme précédemment : f XY = F XY x y soit P[X x, Y y] = x y f XY (u, v)dudv Densité conditionnelle : f X Y (x, y) = f XY (x, y) f Y (y) Florence Perronnin (UGA) Random variables September 28, 2018 26 / 42

Expected value Variables aléatoires multiples Espérance conditionnelle Cas discret : E [X Y = y] = xp[x = x Y = y]. x I Cas continu: E [X Y = y] = xf X Y (x, y)dx x I Loi de l espérance conditionnelle: E [X ] = y J E [X Y = y] P[Y = y] Soit en continu : E [X ] = E [X Y = y] f Y (y)dy R Florence Perronnin (UGA) Random variables September 28, 2018 27 / 42

Expected value Indépendance Indépendance Deux v.a. X et Y sont indépendantes ssi P[X x, Y y] = P[X x] P[Y y]. Si deux v.a. indépendantes ont une espérance alors : E [XY ] = E [Y ] E [Y ] Attention : la réciproque n est pas vraie. Florence Perronnin (UGA) Random variables September 28, 2018 28 / 42

Lois discrètes ultra-classiques Outline 1 Variables aléatoires 2 Expected value 3 Lois discrètes ultra-classiques 4 Most famous continuous random variables Florence Perronnin (UGA) Random variables September 28, 2018 29 / 42

Lois discrètes ultra-classiques Bernoulli trials Bernoulli A binary random variable X {0, 1} such that P[X = 1] = p is a Bernoulli random variable B(p). Typical example : coin tossing, test result (pass/fail)... Florence Perronnin (UGA) Random variables September 28, 2018 30 / 42

Lois discrètes ultra-classiques Uniform distribution Une variable X {n 1, n 2, n 3,..., n k } pouvant prendre k valeurs est uniforme si chacune de ces valeurs a la même probabilité de se réaliser : P[X = n i ] = 1 k y=sample.int(10,size=n,replace=true) hist(y,breaks=br,freq=false) Histogram of y Histogram of y Density 0.00 0.05 0.10 0.15 Density 0.00 0.04 0.08 2 4 6 8 10 observed values, N=100 2 4 6 8 10 observed values, N=10000 Florence Perronnin (UGA) Random variables September 28, 2018 31 / 42

Lois discrètes ultra-classiques Binomial distribution Repeating bernoulli trials... Binomial La somme de n v.a. de Bernoulli indépendantes de même paramètre p est une v.a. binomiale Binom(n, p). Sa distribution est: P[X 1 +... + X n = k] = C k n p k (1 p) n k Exercice : Binom(57,8/9) 1 Preuve de la distribution? 2 Montrer que n i=0 P[X = k] = 1 3 Montrer que l espérance est E [X ] = np Density 0.00 0.05 0.10 0.15 40 45 50 55 observed values, N=100000 Florence Perronnin (UGA) Random variables September 28, 2018 32 / 42

Lois discrètes ultra-classiques Geometric distribution (1) Sequence of independent Bernoulli trials B(p) until first success: P[X = k] = p (1 p) k 1 Warning The geometric distribution in the R language (rgeom) is slightly different (number of failures before first success) Exercice: Prove that i=0 P[X = k] = 1 Prove that E [X ] = 1 p Florence Perronnin (UGA) Random variables September 28, 2018 33 / 42

Lois discrètes ultra-classiques Geometric distribution (2) Note that the geometric r.v. has infinite support, i.e. can take an infinite number of values. Frequency 0 100 300 500 Geometric samples with p=0.5 0 2 4 6 8 observed values, N=1000 y=rgeom(1000,prob=0.5) 0.00 br=seq(from=min(y)-0.5,to=max(y)+0.5,by=1) hist(y,xlab="observed values, N=1000",main= Geom(0.5),breaks=br) frequency of observations 0.20 0.15 0.10 0.05 Geometric samples with p = 0.2 0 10 20 30 40 Number of trials before success Florence Perronnin (UGA) Random variables September 28, 2018 34 / 42

Lois discrètes ultra-classiques Poisson distribution Limite de la binomiale pour de grandes valeurs de n (et petites valeurs de p) Paramètre: λ Prove that E [X ] = λ P[X = k] = λk k! e λx Link with binomial? (hint: use λ = np) rpois(n,lambda) Florence Perronnin (UGA) Random variables September 28, 2018 35 / 42

Lois discrètes ultra-classiques Other classical distributions that you should know about Hypergeometric Negative binomial distribution Multinomial distribution Florence Perronnin (UGA) Random variables September 28, 2018 36 / 42

Most famous continuous random variables Outline 1 Variables aléatoires 2 Expected value 3 Lois discrètes ultra-classiques 4 Most famous continuous random variables Florence Perronnin (UGA) Random variables September 28, 2018 37 / 42

Most famous continuous random variables Uniform r.v Let X be a continuous uniform random variable over an interval I. Why can t I be R? 1.00 Uniform distribution over [ 3, 8 ] 0.75 f X (u) 0.50 P(5<X<6)= 0.2 0.25 0.00 2.5 5.0 7.5 10.0 u If I = [a, b] then: f (x) = 1 b a 1 {x [a,b]} Florence Perronnin (UGA) Random variables September 28, 2018 38 / 42

Most famous continuous random variables Uniform distribution Exercise What is the CDF of a uniform r.v. U over [a, b]? P[X u] 1.00 0.75 0.50 0.25 0.00 Uniform distribution over [ 3, 8 ] 2.5 5.0 7.5 10.0 u P[U x] = = = 1 {x [a,b]} x a b a 1 b a dx [ x b a = x a b a ] x a dx Florence Perronnin (UGA) Random variables September 28, 2018 39 / 42

Most famous continuous random variables Exponential r.v Let X be a continuous exponential random variable over R+ with parameter λ f (x) = λe λx F (x) = 1 e λx Exponential distribution with λ= 0.3 CDF of Exponential distribution with λ= 0.3 0.3 1.00 0.75 f X(ω) 0.2 0.1 P(3 X 5) = 0.1834 F X (u)=p[x u] 0.50 0.25 0.0 0.00 2.5 0.0 2.5 5.0 7.5 10.0 ω 0 5 10 15 u Florence Perronnin (UGA) Random variables September 28, 2018 40 / 42

Most famous continuous random variables Playing with exponential distribution Memoryless Prove that an exponential r.v. has no memory: P[X > x + y x > x] = P[X > y] The first event Let X and Y be two exponential r.v.s with respective parameters λ and µ. (for instance, the time before failure of two devices). Let Z = min(x, Y ) (the time of first failure). What is the distribution of Z? Florence Perronnin (UGA) Random variables September 28, 2018 41 / 42

Most famous continuous random variables Gaussian r.v N (µ, σ) f (x) = 1 σ 2π e 1 2( x µ σ ) 2 Gaussian distribution with µ=2, σ=3 CDFof Gaussian distribution with µ=2, σ=3 0.3 1.00 0.2 0.75 f X(ω) P(3 X 5) = 0.2108 F X(u) 0.50 0.1 0.25 0.0 0.00 5 0 5 10 ω 5 0 5 10 u Florence Perronnin (UGA) Random variables September 28, 2018 42 / 42

Random variables. Florence Perronnin. Univ. Grenoble Alpes, LIG, Inria. September 28, 2018