Random variables Florence Perronnin Univ. Grenoble Alpes, LIG, Inria September 28, 2018 Florence Perronnin (UGA) Random variables September 28, 2018 1 / 42

Outline
1 Random variables
2 Expected value
3 Ultra-classic discrete distributions
4 Most famous continuous random variables

Random variables
A random variable $X$ is a map $X : \Omega \to F$, where $F$ is an ordered set, such that
$\forall x \in F,\ \{\omega \in \Omega : X(\omega) \le x\} \in \mathcal{F}$, where $\mathcal{F}$ denotes the set of events on $\Omega$.
In plain language: a random variable is a function $X : \Omega \to \mathbb{R}$ (or $\mathbb{N}$) characterizing the possible outcomes in $\Omega$, such that intervals in $\mathbb{R}$ have a probability.
$X$ is a discrete r.v. if $F$ is countable (typically $\mathbb{N}$).
$X$ is a continuous r.v. if $F$ is uncountable (typically $\mathbb{R}$).
Random variables are usually written as capital letters: $X$, ...

Some examples of random variables
$X$ = result of a single die
$Y$ = sum of two dice
(coin tossing) $X = 1$ if heads, $X = 0$ if tails
$Z$ = lifetime of a device
$A_n$ = $n$-th measurement (collection of random variables)
$\frac{1}{k}\sum_{n=1}^{k} A_n$ is also a random variable (mean)

Cumulative distribution function (CDF)
For any r.v. $X$ we define its cumulative distribution function (CDF):
$\forall x,\ F_X(x) = P[X \le x]$
$F_X$ gives the cumulative probabilities.
$F_X$ describes the law, or probability distribution, of $X$.
[Figure: two example CDF plots. Left: empirical CDF ecdf(g), Fn(x) against x. Right: P(X<=x) against x.]
Properties:
$0 \le F_X(x) \le 1$ for all $x \in \mathbb{R}$
$F_X$ is nondecreasing
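
As a rough illustration of these two properties, the following R sketch builds an empirical CDF with `ecdf()`, the function named in the figure. The sample `g` below is an assumption made for this example (simulated die rolls), not the data set used in the slides.

```r
# Hypothetical sample: 1000 rolls of a fair six-sided die (not the slides' data).
set.seed(1)
g <- sample.int(6, size = 1000, replace = TRUE)

# ecdf() returns a step function Fn with Fn(x) = proportion of observations <= x.
Fn <- ecdf(g)

# Evaluate Fn on a grid that extends beyond the observed range.
grid <- seq(0, 7, by = 0.5)
vals <- Fn(grid)

# Properties from the slide: values stay in [0, 1] and never decrease.
in_unit_interval <- all(vals >= 0 & vals <= 1)
nondecreasing <- all(diff(vals) >= 0)

# plot(Fn) would draw the staircase shape typical of a discrete r.v.
```

Calling `plot(Fn)` on such a sample gives a step function, which is what the next slide describes for discrete random variables.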

Special case of a discrete random variable
The cumulative distribution function (CDF) $F_X$ of a discrete random variable $X$ is a step function (cumulative sums).
[Figure: Example of CDF: Binom(n,p)]

Quantiles
The quantile function is the inverse of the cumulative distribution function.
Median: the median is the value $m$ such that $P[X \le m] = \frac{1}{2}$ (the value that splits $\Omega$ into two equiprobable parts). It is therefore the 1/2-quantile or the 0.5-quantile.
q-quantile: the $q$-quantile is the value $x_q$ such that
$F_X(x_q) = P[X \le x_q] = q$
[Figure credit: By Erzbischof - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=18]
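
A small R sketch of the inverse relationship between CDF and quantiles. The choice of an exponential distribution with rate 0.3 is an assumption for illustration only; `qexp()` and `pexp()` are base R's exponential quantile function and CDF.

```r
# Illustrative distribution: exponential with rate 0.3 (an assumption of this example).
rate <- 0.3

# qexp() is the quantile function and pexp() the CDF: each inverts the other.
m <- qexp(0.5, rate = rate)      # median = 0.5-quantile
closed_form <- log(2) / rate     # solves 1 - exp(-rate * m) = 1/2

# Applying the CDF to a q-quantile gives back q.
q <- c(0.1, 0.25, 0.5, 0.9)
round_trip <- pexp(qexp(q, rate = rate), rate = rate)
```

For a sample rather than a known distribution, base R's `quantile()` plays the same role on the empirical CDF.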

Probability density function (PDF)
The distribution of a random variable can also be defined in a more elementary way.
Discrete case: distribution function (probability mass function)
$f_X(k) = P[X = k]$ for every possible value $k$, with $\sum_{n \in \mathbb{N}} f(n) = 1$
Continuous (real-valued) case: probability density function (PDF) $f(x)$
$P[x \le X \le y] = \int_x^y f_X(u)\,du$ (mind the integration bounds...)
$\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$

PDF for discrete random variables
For a discrete r.v. (integer-valued, for example), the distribution function (or law) is defined by $f_X(k) = P[X = k]$. It is the function that gives the probability of each value.
$0 \le f_X(k) \le 1$
$F_X(k) = \sum_{m \le k} f_X(m)$
$f_X$ can be approximated by the histogram of a sample.
[Figure: Histogram of g, density against g.]

PDF for continuous random variables
The probability density function $f$ can be defined as the derivative of the CDF $F(x)$:
$F_X(x) - F_X(a) = P[a \le X \le x] = \int_a^x f_X(u)\,du \implies f_X(x) = \frac{\partial F_X}{\partial x}(x)$
Intuitively:
The area below the curve of $f$ between $a$ and $b$ gives the probability that the r.v. falls into the x-axis interval $[a, b]$.
The density is the derivative of $F_X$, so: $f_X(x)\,dx = dF_X(x)$

PDF and CDF
[Figure: Some continuous distribution, density $f_X(\omega)$ against $\omega$; shaded area $P(3 \le X \le 5) = 0.2705$; total area = 1.]
$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du = 1 - \int_{x}^{+\infty} f_X(v)\,dv$

Density and histograms
A probability density function $f_X$ can also be viewed as the limit of the histogram. Indeed, each box of the histogram gives $P(x_n \le X \le x_{n+1}) = F_X(x_{n+1}) - F_X(x_n)$.
[Figure: two histograms, frequency against observations. Left: Gaussian distribution, $\mu = 3$, $\sigma = 2$. Right: Beta distribution, $\alpha = 1$, $\beta = 5$.]

2 Expected value

Expected value
The mean, or expectation, of a discrete r.v. $X$ with values in $I$ (countable) is:
$E[X] = \sum_{x \in I} x\,P[X = x]$
Example: let $0 < p < 1$. A random variable $X$ with values in $\{0, 1\}$ follows a Bernoulli distribution $B(p)$ iff $P[X = 1] = p$ and $P[X = 0] = 1 - p$. Its expectation is $E[X] = 0 \cdot (1 - p) + 1 \cdot p = p$.
Exercise: what is the expected value of the sum of two fair dice?
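
One possible way to check the dice exercise numerically is to enumerate the 36 equally likely outcomes in R and apply the definition of $E[X]$ directly. The variable names are ours; the cross-check uses the linearity property stated later in the slides.

```r
# All 36 equally likely outcomes of two fair dice, as a 6 x 6 table of sums.
sums <- outer(1:6, 1:6, "+")

# E[Y] = sum over values y of y * P[Y = y], with P[Y = y] = (count of y) / 36.
pmf <- table(sums) / 36
values <- as.numeric(names(pmf))
expected_sum <- sum(values * as.numeric(pmf))

# Cross-check by linearity: E[X1 + X2] = E[X1] + E[X2] = 3.5 + 3.5.
expected_by_linearity <- 2 * mean(1:6)
```

Both computations give 7.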

Expected value of a continuous r.v.
Let $X$ be a continuous r.v. with values in $\mathbb{R}$. Then
$E[X] = \int_{\mathbb{R}} x f(x)\,dx$
(or equivalently: $E[X] = \int_{\mathbb{R}} x\,dF(x)$)
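
Exercise
Let $Y$ be an exponential r.v. with parameter $\lambda$. The PDF is $f(x) = \lambda e^{-\lambda x}\,\mathbf{1}_{\{x \ge 0\}}$. Compute $E[Y]$.
$$
\begin{aligned}
E[Y] &= \int_0^{\infty} x f(x)\,dx && (1)\\
&= \int_0^{\infty} x \lambda e^{-\lambda x}\,dx && (2)\\
&= \left[ x \left(-e^{-\lambda x}\right) \right]_0^{\infty} - \int_0^{\infty} \left(-e^{-\lambda x}\right) dx && (3)\\
&= \left[ -\frac{1}{\lambda} e^{-\lambda x} \right]_0^{\infty} = \frac{1}{\lambda} && (4)
\end{aligned}
$$
Step (3) is an integration by parts; its bracketed boundary term is zero.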
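
The result $E[Y] = 1/\lambda$ can be sanity-checked by numerical integration. This is only a sketch: the value of `rate` is an arbitrary choice for the example, and `integrate()` and `dexp()` are base R functions.

```r
# Arbitrary illustrative parameter (not taken from the exercise).
rate <- 0.3

# Integrand x * f(x), where f is the exponential density.
integrand <- function(x) x * dexp(x, rate = rate)

# Numerical value of the integral of x * f(x) over [0, Inf).
numeric_mean <- integrate(integrand, lower = 0, upper = Inf)$value

# Closed form from the derivation: 1 / lambda.
exact_mean <- 1 / rate
```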

Expectation of a function of X
Let $X$ be an $\mathbb{R}$-valued r.v. and $g : \mathbb{R} \to \mathbb{R}$ a function. Then
$E[g(X)] = \int_{\mathbb{R}} g(x) f(x)\,dx$
Exercise: let $Y$ be a uniform r.v. over $[a, b]$ (with $0 < a < b$). Compute $E[\log(Y)]$.
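
A worked sketch of this exercise, under two assumptions that the slide does not spell out: the logarithm is the natural logarithm, and the density of $Y$ is $f(y) = \frac{1}{b-a}$ on $[a, b]$ (the continuous uniform density given later in the slides). It uses the antiderivative $\int \log y\,dy = y \log y - y$.

```latex
\begin{aligned}
E[\log Y] &= \int_a^b \log(y)\,\frac{1}{b-a}\,dy \\
          &= \frac{1}{b-a}\Big[\, y \log y - y \,\Big]_a^b \\
          &= \frac{b \log b - a \log a}{b-a} - 1
\end{aligned}
```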

Existence
If the r.v. $X$ has an infinite number of possible values, the expected value is not always finite.
St Petersburg paradox: how much should the bank ask for the following game?
Flip a coin until it lands on tails. If tails happens at the
1st round: reward = 2 dollars
2nd round: reward = 4 dollars
3rd round: reward = 8 dollars
and so on...

Let $X$ be the reward and $Y$ the number of coin tosses until tails, so that $Y \sim \mathrm{Geom}(1/2)$.
$E[X] = \sum_{n=1}^{\infty} 2^n P[Y = n] = \sum_{n=1}^{\infty} 2^n \frac{1}{2^n} = (1 + 1 + \dots) = \infty$
However, in reality no bank has an infinite amount of money. Let's assume the bank cannot pay more than 1,000,000 dollars. Then if $Y > \log_2(10^6) \approx 19.9$, i.e. after at least 19 heads, the player only earns $10^6$ dollars (when he should earn more).
$E[X] = \sum_{n=1}^{19} 2^n P[Y = n] + 10^6\,P[Y > 19] = \sum_{n=1}^{19} 2^n \frac{1}{2^n} + 10^6 \frac{1}{2^{19}} = 19 + \frac{10^6}{524288} \approx 20.9$
So if the bank can't pay more than a million dollars, the expected reward of the player is only about 20.9 dollars. A lot less than infinity...
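
The capped expectation can be reproduced with a few lines of R. This is a sketch of the arithmetic on the slide; the variable names are ours, and the cap of one million is the slide's assumption.

```r
cap <- 1e6                        # maximum payout assumed on the slide
n_max <- floor(log2(cap))         # 19: largest n such that 2^n <= cap
n <- 1:n_max
p_n <- 0.5^n                      # P[Y = n] for Y ~ Geom(1/2)

uncapped_part <- sum(2^n * p_n)   # every term equals 1, so this is 19
capped_part <- cap * 0.5^n_max    # 10^6 * P[Y > 19]
expected_reward <- uncapped_part + capped_part
```

Raising `cap` shows how slowly the expected reward grows: each doubling of the bank's funds adds roughly one dollar.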

Existence
The example of the St Petersburg paradox shows that a random variable whose values are all finite can still have an infinite expected value.

Linearity of the expectation operator
Given two random variables $X$ and $Y$ we have:
$E[X + Y] = E[X] + E[Y]$
even if $X$ and $Y$ are dependent variables. Also, for any constant $c$ we have:
$E[cX] = c\,E[X]$
More generally, if we have $n$ r.v.s $X_1, X_2, \dots, X_n$ then:
$E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$

Exercise (using the linearity of $E[\cdot]$)
Initially $n$ objects are correctly placed. But someone came by and rearranged all the objects. Show that, on average, 1 object is correctly placed.
Let $X_i$ be the Bernoulli r.v. such that
$X_i = 1$ if object $i$ is correctly placed
$X_i = 0$ otherwise
Let $Y$ be the total number of correctly placed objects. Then clearly $Y = \sum_{i=1}^{n} X_i$.
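
A sketch of how the argument can be completed, assuming the rearrangement is a uniformly random permutation of the $n$ objects (the slide does not state this explicitly). Object $i$ ends up in its own place in $(n-1)!$ of the $n!$ permutations, and the $X_i$ are dependent, which linearity does not mind.

```latex
\begin{aligned}
E[X_i] &= P[X_i = 1] = \frac{(n-1)!}{n!} = \frac{1}{n} \\
E[Y]   &= E\Big[\sum_{i=1}^{n} X_i\Big] = \sum_{i=1}^{n} E[X_i] = n \cdot \frac{1}{n} = 1
\end{aligned}
```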

Expectation exercises
Compute the expectation of a binomial random variable $Z$.
Let $Y = X - E[X]$. Compute $E[Y]$ as a function of the moments of $X$.

Conditional expectation
$E[Y \mid Z = z] = \sum_{y} y\,P[Y = y \mid Z = z]$
Law of total expectation:
$E[Y] = E[E[Y \mid Z]]$
where $E[Y \mid Z]$ is a random variable (a function $f(Z)$ of $Z$).

Variance
The variance of a r.v. is defined by:
$\mathrm{var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$
Exercise: prove the above equality.
Discrete case: $\mathrm{var}(X) = \sum_{x \in I} x^2 P[X = x] - \left(\sum_{x \in I} x\,P[X = x]\right)^2$
Continuous case: $\mathrm{var}(X) = \int_{x \in I} x^2 f(x)\,dx - \left(\int_{x \in I} x f(x)\,dx\right)^2$
Question: why not define $E[(X - E[X])]$ instead of the expectation of the squared deviations?
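
One way to carry out the exercise is to expand the square and use linearity of the expectation, treating $E[X]$ as a constant. The same linearity argument hints at the question: $E[X - E[X]] = E[X] - E[X] = 0$ for every $X$ with a finite mean.

```latex
\begin{aligned}
E\big[(X - E[X])^2\big]
  &= E\big[X^2 - 2\,X\,E[X] + (E[X])^2\big] \\
  &= E[X^2] - 2\,E[X]\,E[X] + (E[X])^2 \\
  &= E[X^2] - (E[X])^2
\end{aligned}
```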

Joint distributions
Two random variables $X$ and $Y$ defined on the same probability space have a joint cumulative distribution function:
$F(x, y) = P[X \le x, Y \le y] = P[(X \le x) \cap (Y \le y)]$
The respective distributions of $X$ and $Y$ are called marginal distributions:
$F_X(x) = P[X \le x] = \lim_{y \to +\infty} F(x, y)$
The joint density is computed as before:
$f_{XY} = \frac{\partial^2 F_{XY}}{\partial x\,\partial y}$, i.e. $P[X \le x, Y \le y] = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\,dv\,du$
Conditional density: $f_{X \mid Y}(x, y) = \frac{f_{XY}(x, y)}{f_Y(y)}$

Multiple random variables
Conditional expectation
Discrete case: $E[X \mid Y = y] = \sum_{x \in I} x\,P[X = x \mid Y = y]$
Continuous case: $E[X \mid Y = y] = \int_{x \in I} x\,f_{X \mid Y}(x, y)\,dx$
Law of conditional expectation:
$E[X] = \sum_{y \in J} E[X \mid Y = y]\,P[Y = y]$
or, in the continuous case:
$E[X] = \int_{\mathbb{R}} E[X \mid Y = y]\,f_Y(y)\,dy$

Independence
Two r.v.s $X$ and $Y$ are independent iff
$P[X \le x, Y \le y] = P[X \le x]\,P[Y \le y]$ for all $x, y$.
If two independent r.v.s have an expectation, then:
$E[XY] = E[X]\,E[Y]$
Warning: the converse is not true.
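
To see why the converse fails, here is a small counterexample computed in R. The example ($X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$) is our own choice for illustration and does not come from the slides.

```r
# Hypothetical example: X uniform on {-1, 0, 1} and Y = X^2 (so Y depends on X).
x <- c(-1, 0, 1)
p <- rep(1 / 3, 3)
y <- x^2

E_X  <- sum(x * p)        # 0
E_Y  <- sum(y * p)        # 2/3
E_XY <- sum(x * y * p)    # E[X^3] = 0, which equals E_X * E_Y

# Independence criterion from the slide, evaluated at the point (x, y) = (-1, 0):
joint   <- sum(p[x <= -1 & y <= 0])           # P[X <= -1, Y <= 0] = 0
product <- sum(p[x <= -1]) * sum(p[y <= 0])   # (1/3) * (1/3) = 1/9
```

Here $E[XY] = E[X]\,E[Y]$ holds, yet the joint probability differs from the product, so $X$ and $Y$ are not independent.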

3 Ultra-classic discrete distributions

Bernoulli trials
A binary random variable $X \in \{0, 1\}$ such that $P[X = 1] = p$ is a Bernoulli random variable $B(p)$.
Typical examples: coin tossing, test result (pass/fail)...

Uniform distribution
A variable $X \in \{n_1, n_2, n_3, \dots, n_k\}$ that can take $k$ values is uniform if each of these values has the same probability of occurring:
$P[X = n_i] = \frac{1}{k}$
    n <- 100                                       # sample size (N=100 in the first histogram)
    br <- seq(0.5, 10.5, by = 1)                   # assumed breaks: one bin per integer 1..10
    y <- sample.int(10, size = n, replace = TRUE)  # n uniform draws from 1..10
    hist(y, breaks = br, freq = FALSE)             # density-scaled histogram
[Figure: two panels titled "Histogram of y", density against observed values, for N=100 and N=10000.]

Binomial distribution
Repeating Bernoulli trials...
The sum of $n$ independent Bernoulli r.v.s with the same parameter $p$ is a binomial r.v. $\mathrm{Binom}(n, p)$. Its distribution is:
$P[X_1 + \dots + X_n = k] = \binom{n}{k} p^k (1 - p)^{n - k}$
Exercise:
1 Prove the distribution.
2 Show that $\sum_{k=0}^{n} P[X = k] = 1$
3 Show that the expectation is $E[X] = np$
[Figure: Binom(57,8/9), density against observed values, N=100000.]
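
Before proving items 2 and 3, they can be checked numerically for the parameters of the figure, Binom(57, 8/9). The sketch below compares the formula with base R's `dbinom()`; it is a numerical check, not a proof.

```r
n <- 57
p <- 8 / 9
k <- 0:n

# Binomial probabilities from R and from the formula on the slide.
pmf <- dbinom(k, size = n, prob = p)
manual <- choose(n, k) * p^k * (1 - p)^(n - k)

total <- sum(pmf)             # item 2: should be 1
mean_value <- sum(k * pmf)    # item 3: should be n * p
```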

Geometric distribution (1)
Sequence of independent Bernoulli trials $B(p)$ until the first success:
$P[X = k] = p\,(1 - p)^{k - 1}$ for $k = 1, 2, \dots$
Warning: the geometric distribution in the R language (rgeom) is slightly different (number of failures before the first success).
Exercise:
Prove that $\sum_{k=1}^{\infty} P[X = k] = 1$
Prove that $E[X] = \frac{1}{p}$
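
The difference between the two conventions is a shift by one: if $X$ counts trials up to and including the first success, R's geometric functions describe $X - 1$. A short sketch with `dgeom()`, where the value of `p` is an arbitrary choice for the example:

```r
p <- 0.2    # arbitrary illustrative success probability
k <- 1:50   # number of trials up to and including the first success

# Formula from the slide versus R's convention (failures before first success).
pmf_slide <- p * (1 - p)^(k - 1)
pmf_r <- dgeom(k - 1, prob = p)

# Truncated series for E[X]; the neglected tail is numerically negligible.
K <- 1:1000
mean_trials <- sum(K * p * (1 - p)^(K - 1))
```

In practice this means `rgeom(n, prob = p) + 1` produces samples that follow the definition used on the slide.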

Geometric distribution (2)
Note that the geometric r.v. has infinite support, i.e. it can take an infinite number of values.
    y <- rgeom(1000, prob = 0.5)                               # failures before first success
    br <- seq(from = min(y) - 0.5, to = max(y) + 0.5, by = 1)  # one bin per integer value
    hist(y, xlab = "observed values, N=1000", main = "Geom(0.5)", breaks = br)
[Figure: "Geometric samples with p=0.5", frequency against observed values, N=1000.]
[Figure: "Geometric samples with p = 0.2", frequency of observations against number of trials before success.]

Poisson distribution
Limit of the binomial for large values of $n$ (and small values of $p$).
Parameter: $\lambda$
$P[X = k] = \frac{\lambda^k}{k!} e^{-\lambda}$
Prove that $E[X] = \lambda$
Link with binomial? (hint: use $\lambda = np$)
In R: rpois(n,lambda)
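
The link with the binomial can be observed numerically by fixing $\lambda = np$ and letting $n$ grow. The sketch below uses `dbinom()` and `dpois()`; the value of `lambda` and the grid of `n` values are arbitrary choices for the example.

```r
lambda <- 3   # arbitrary illustrative parameter
k <- 0:10
pois <- dpois(k, lambda)

# Largest gap between Binom(n, lambda / n) and Poisson(lambda) over k = 0..10.
max_gap <- function(n) max(abs(dbinom(k, size = n, prob = lambda / n) - pois))
gaps <- sapply(c(10, 100, 1000, 10000), max_gap)

# Expectation of the Poisson law, summed far enough into the tail.
kk <- 0:100
mean_value <- sum(kk * dpois(kk, lambda))
```

The entries of `gaps` shrink as `n` increases, which is the convergence the slide refers to.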

Other classical distributions that you should know about
Hypergeometric distribution
Negative binomial distribution
Multinomial distribution

4 Most famous continuous random variables

Uniform r.v.
Let $X$ be a continuous uniform random variable over an interval $I$. Why can't $I$ be $\mathbb{R}$?
[Figure: Uniform distribution over [3, 8], density $f_X(u)$ against $u$; $P(5 < X < 6) = 0.2$.]
If $I = [a, b]$ then:
$f(x) = \frac{1}{b - a}\,\mathbf{1}_{\{x \in [a, b]\}}$

Uniform distribution
Exercise: what is the CDF of a uniform r.v. $U$ over $[a, b]$?
[Figure: Uniform distribution over [3, 8], CDF $P[X \le u]$ against $u$.]
For $a \le x \le b$:
$P[U \le x] = \int_{-\infty}^{x} \frac{1}{b - a}\,\mathbf{1}_{\{u \in [a, b]\}}\,du = \int_a^x \frac{1}{b - a}\,du = \left[\frac{u}{b - a}\right]_a^x = \frac{x - a}{b - a}$
The CDF equals 0 for $x < a$ and 1 for $x > b$.
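
The closed form can be compared with base R's `punif()` for the interval $[3, 8]$ used in the figures. The helper `cdf` below is a hypothetical name introduced for this sketch.

```r
a <- 3
b <- 8

# CDF of the uniform law on [a, b]: (x - a) / (b - a), clamped to [0, 1].
cdf <- function(x) pmin(pmax((x - a) / (b - a), 0), 1)

# Compare with R's built-in uniform CDF on a grid covering both tails.
x <- seq(0, 10, by = 0.5)
matches_punif <- isTRUE(all.equal(cdf(x), punif(x, min = a, max = b)))

# Probability shown on the density figure: P(5 < X < 6).
prob_5_6 <- cdf(6) - cdf(5)
```

The last line gives 0.2, the value printed on the density plot.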

Exponential r.v.
Let $X$ be a continuous exponential random variable over $\mathbb{R}^+$ with parameter $\lambda$:
$f(x) = \lambda e^{-\lambda x}$
$F(x) = 1 - e^{-\lambda x}$
[Figure: Exponential distribution with $\lambda = 0.3$, density $f_X(\omega)$ against $\omega$; $P(3 \le X \le 5) = 0.1834$.]
[Figure: CDF of Exponential distribution with $\lambda = 0.3$, $F_X(u) = P[X \le u]$ against $u$.]
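
The probability printed on the density figure follows from the CDF, since $P(3 \le X \le 5) = F(5) - F(3)$. A quick check in R, using both the closed form and `pexp()`:

```r
rate <- 0.3   # lambda from the figure

# Closed form F(x) = 1 - exp(-lambda * x).
F <- function(x) 1 - exp(-rate * x)
p_closed <- F(5) - F(3)

# Same probability from R's built-in exponential CDF.
p_builtin <- pexp(5, rate = rate) - pexp(3, rate = rate)
```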

Playing with the exponential distribution
Memoryless: prove that an exponential r.v. has no memory:
$P[X > x + y \mid X > x] = P[X > y]$
The first event: let $X$ and $Y$ be two exponential r.v.s with respective parameters $\lambda$ and $\mu$ (for instance, the times before failure of two devices). Let $Z = \min(X, Y)$ (the time of first failure). What is the distribution of $Z$?
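
A sketch of both answers, using the survival function $P[X > t] = 1 - F(t) = e^{-\lambda t}$ for $t \ge 0$. The second part assumes that $X$ and $Y$ are independent, which the slide does not state explicitly; without that assumption the product step does not hold.

```latex
\begin{aligned}
P[X > x + y \mid X > x]
  &= \frac{P[X > x + y]}{P[X > x]}
   = \frac{e^{-\lambda (x + y)}}{e^{-\lambda x}}
   = e^{-\lambda y} = P[X > y] \\[6pt]
P[Z > t]
  &= P[X > t,\ Y > t]
   = P[X > t]\,P[Y > t]
   = e^{-(\lambda + \mu) t}
\end{aligned}
```

Under the independence assumption, $Z$ is therefore exponential with parameter $\lambda + \mu$.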

Gaussian r.v.
$\mathcal{N}(\mu, \sigma)$:
$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
[Figure: Gaussian distribution with $\mu = 2$, $\sigma = 3$, density $f_X(\omega)$ against $\omega$; $P(3 \le X \le 5) = 0.2108$.]
[Figure: CDF of Gaussian distribution with $\mu = 2$, $\sigma = 3$, $F_X(u)$ against $u$.]
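
The Gaussian CDF has no closed form, so probabilities such as the one on the figure are computed numerically. A minimal R check with `pnorm()`, reading $\sigma$ as the standard deviation (consistent with the density formula above):

```r
mu <- 2
sigma <- 3   # standard deviation, as in the density formula

# P(3 <= X <= 5) = F(5) - F(3) for X ~ N(mu, sigma).
p_gauss <- pnorm(5, mean = mu, sd = sigma) - pnorm(3, mean = mu, sd = sigma)

# Same probability after standardising: Phi(1) - Phi(1/3).
p_standard <- pnorm((5 - mu) / sigma) - pnorm((3 - mu) / sigma)
```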