Lecture 5. Random variable and distribution of probability

Introduction to theory of probability and statistics
Lecture 5. Random variable and distribution of probability
prof. dr hab. inż. Katarzyna Zakrzewska
Katedra Elektroniki, AGH
e-mail: zak@agh.edu.pl
http://home.agh.edu.pl/~zak
Introduction to probability and statistics, Lecture 5

Outline:
Concept of random variable
Quantitative description of random variables
Examples of probability distributions

The concept of random variable
Random variable is a function X that attributes a real value to a certain result of a random experiment:
$\Omega = \{e_1, e_2, \ldots\}$, $X: \Omega \to R$, $X(e_i) = x_i \in R$
Examples:
1) Coin toss: event "heads" takes a value of 1; event "tails" takes 0.
2) Products: event "failure" takes 0, "well-performing" takes 1.
3) Dice: face "1" takes 1, face "2" takes 2, etc.
4) Interval [a, b]: a choice of a point with coordinate x is attributed a value, e.g. $\sin(3x + 7)$, etc.

The concept of random variable
Discrete random variable: the values of random variable X are isolated points on a number line.
Toss of a coin
Transmission errors
Faulty elements on a production line
A number of connections coming in 5 minutes
Continuous random variable: the values of the random variable cover all points of an interval.
Electrical current, I
Temperature, T
Pressure, p

Quantitative description of random variables
Probability distributions and probability mass functions (for discrete random variables)
Probability density functions (for continuous variables)
Cumulative distribution function (distribution function for discrete and continuous variables)
Characteristic quantities (expected value, variance, quantiles, etc.)

Distribution of random variable
Distribution of random variable (probability distribution for discrete variables) is a set of pairs $(x_i, p_i)$, where $x_i$ is a value of random variable X and $p_i$ is the probability that X will take the value $x_i$.
Example 1. Probability mass function for a single toss of a coin. The event corresponding to heads is attributed $x_1 = 1$; tails means $x_2 = 0$.
$p(X = 1) = p(x_1) = \frac{1}{2}$, $p(X = 0) = p(x_2) = \frac{1}{2}$

Distribution of random variable
Example 1 cont. Probability mass function for a single toss of a coin is given by the set of the following pairs: $\{(1, \frac{1}{2}), (0, \frac{1}{2})\}$
[Figure: probability of event p(x) versus X, with bars at X = 0 and X = 1]
A discrete random variable entails a probability distribution that is also discrete.

Probability density function
Probability density function is introduced for continuous variables; it is related to probability in the following way:
$f(x)\,dx = P(x \le X < x + dx)$
Properties of probability density function:
1. $f(x) \ge 0$
2. $f(x)$ is normalized: $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
3. $f(x)$ has a measure of 1/x (the inverse of the unit of x)

Probability density function
Directly from the definition of probability density function f(x) we get a formula for calculating the probability that the random variable will assume a value within an interval [a, b]:
$P(a \le X \le b) = \int_a^b f(x)\,dx$
The question "what is the probability of x = a?" is incorrectly posed: for a continuous variable only intervals carry nonzero probability, and $P(X = a) = 0$.

Probability density function
Example 2. Let the continuous random variable X denote the current measured in a thin copper wire in mA. Assume that the range of X is [0, 20 mA], and assume that the probability density function of X is f(x) = 0.05 for $0 \le x \le 20$. What is the probability that a measured current is less than 10 mA?
$P(0 < X < 10) = \int_0^{10} f(x)\,dx = \int_0^{10} 0.05\,dx = 0.5$
[Figure: probability density f(x) versus X]
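The integral in the current example can be checked numerically. The following is a minimal sketch, assuming the density is 0.05 on [0, 20] mA and zero elsewhere; the helper names `uniform_pdf` and `integrate` are illustrative and not part of the lecture.

```python
def uniform_pdf(x, low=0.0, high=20.0):
    """Density f(x) = 1/(high - low) on [low, high], zero elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Probability that the current is below 10 mA.
p_below_10 = integrate(uniform_pdf, 0.0, 10.0)
# Normalization check: the density must integrate to 1 over its range.
total = integrate(uniform_pdf, 0.0, 20.0)

print(round(p_below_10, 6), round(total, 6))
```

The same `integrate` helper works for any density, which is why the normalization condition is checked with it as well.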

Quantitative description of random variables
Cumulative distribution function (CDF) F(x) is the probability of the event that the random variable X will assume a value smaller than or equal to x (at most x):
$F(x) = P(X \le x)$
Example 1 cont. CDF of coin toss:
$F(x = 0) = P(X \le 0) = \frac{1}{2}$
$F(x = 1) = P(X \le 1) = 1$

Properties of CDF
1. $0 \le F(x) \le 1$
2. $F(+\infty) = 1$
3. $F(-\infty) = 0$
4. $x \le y \Rightarrow F(x) \le F(y)$ (non-decreasing function)
5. F(x) has no unit
6. $f(x) = \frac{dF(x)}{dx}$: relationship between cumulative distribution function and probability density (for continuous variable)

Example 3. CDF of discrete variable
$F(x) = P(X \le x) = \sum_{x_i \le x} f(x_i)$, where $f(x_i)$ is the probability mass function.
Determine the probability mass function of X from the following cumulative distribution function F(x):
F(x) = 0 for x < -2
F(x) = 0.2 for -2 ≤ x < 0
F(x) = 0.7 for 0 ≤ x < 2
F(x) = 1 for 2 ≤ x
From the plot, the only points to receive f(x) ≠ 0 are -2, 0, 2.
f(-2) = 0.2 - 0 = 0.2
f(0) = 0.7 - 0.2 = 0.5
f(2) = 1.0 - 0.7 = 0.3
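The step from a discrete CDF to its probability mass function is a difference of consecutive CDF levels. A minimal sketch, assuming the jump points -2, 0, 2 and the levels 0.2, 0.7, 1.0 of this example (the function name `pmf_from_cdf` is hypothetical):

```python
# Jump points of the step CDF and the value F(x) takes from each point onward.
cdf_steps = [(-2, 0.2), (0, 0.7), (2, 1.0)]


def pmf_from_cdf(steps):
    """Return {x: f(x)} where f(x) is the height of the CDF jump at x."""
    pmf = {}
    previous = 0.0  # F(x) = 0 to the left of the first jump
    for x, cumulative in steps:
        # Rounding removes floating-point noise such as 0.30000000000000004.
        pmf[x] = round(cumulative - previous, 10)
        previous = cumulative
    return pmf


pmf = pmf_from_cdf(cdf_steps)
print(pmf)
```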

CDF for continuous variable
$F(t) = P(X \le t) = \int_{-\infty}^{t} f(x)\,dx$
Cumulative distribution function F(t) of a continuous variable is a non-decreasing continuous function and can be calculated as the area under the probability density function f(x) over the interval from $-\infty$ to t.

Numerical descriptors
Parameters of position:
Quantile (e.g. median, quartile)
Mode
Expected value (average)
Parameters of dispersion:
Variance (standard deviation)
Range

Numerical descriptors
Quantile $x_q$ represents a value of the random variable for which the cumulative distribution function takes a value of q:
$F(x_q) = P(X \le x_q) = \int_{-\infty}^{x_q} f(u)\,du = q$
Median, i.e. $x_{0.5}$, is the most frequently used quantile.
In Example 2, current I = 10 mA is the median of the distribution.
Example 4. For a discrete distribution: 19, 21, 21, 21, 22, 22, 23, 25, 26, 27 the median is 22 (middle value or arithmetic average of two middle values).

Numerical descriptors
Mode represents the most frequently occurring value of the random variable (the x at which the probability distribution attains a maximum).
Unimodal distribution has one mode (multimodal distributions have more than one mode).
In Example 4: $x_i$ = 19, 21, 21, 21, 22, 22, 23, 25, 26, 27, the mode equals 21 (which appears 3 times, i.e., the most frequently).

Average value
Arithmetic average:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $x_i$ belongs to a set of n elements.
In Example 4: $x_i$ = 19, 21, 21, 21, 22, 22, 23, 25, 26, 27, the arithmetic average is 22.7.
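The three position descriptors can be computed with Python's standard `statistics` module. This is a sketch under the assumption that the ten-element sample of the example reads 19, 21, 21, 21, 22, 22, 23, 25, 26, 27; any other list can be substituted.

```python
import statistics

# Assumed sample (ten sorted observations).
sample = [19, 21, 21, 21, 22, 22, 23, 25, 26, 27]

# For an even number of elements the median is the average of the two middle values.
median = statistics.median(sample)
# The mode is the most frequently occurring value.
mode = statistics.mode(sample)
# The arithmetic average is the sum divided by the number of elements.
mean = statistics.mean(sample)

print(median, mode, mean)
```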

Arithmetic average
When many elements have the same value, we divide the set into classes containing identical elements.
Example 5.
x_k     n_k    f_k
10.2    1      0.0357
12.3    4      0.1429
12.4    2      0.0714
13.4    8      0.2857
16.4    4      0.1429
17.5    3      0.1071
19.3    1      0.0357
21.4    2      0.0714
22.4    2      0.0714
25.2    1      0.0357
Sum     28
$\bar{x} = \sum_{k=1}^{p} f_k x_k$, where $f_k = \frac{n_k}{n}$ and p is the number of classes ($p \le n$)
Normalization condition: $\sum_{k=1}^{p} f_k = 1$
$\bar{x} = 10.2 \cdot 0.04 + 12.3 \cdot 0.14 + \ldots + 25.2 \cdot 0.04 = 15.77$

Moments of distribution functions
Moment of the order k with respect to $x_0$:
$m_k(x_0) = \sum_i (x_i - x_0)^k\, p(x_i)$ for discrete variables
$m_k(x_0) = \int (x - x_0)^k f(x)\,dx$ for continuous variables
The most important are the moments calculated with respect to $x_0 = 0$ ($m_k$) and with respect to $x_0 = m_1$, the first moment ($m_1$ is called the expected value); the latter are the central moments $\mu_k$.

Expected value
Symbols: $m_1$, $E(X)$, $\mu$, $\bar{x}$, $\hat{x}$
$E(X) = \sum_i x_i p_i$ for discrete variables
$E(X) = \int x f(x)\,dx$ for continuous variables

Properties of E(X)
E(X) is a linear operator, i.e.:
1. $E\left(\sum_i C_i X_i\right) = \sum_i C_i E(X_i)$
In consequence:
E(C) = C
E(CX) = C E(X)
$E(X_1 + X_2) = E(X_1) + E(X_2)$
2. For independent variables $X_1, X_2, \ldots, X_n$:
$E\left(\prod_i X_i\right) = \prod_i E(X_i)$
Variables are independent when:
$f(X_1, X_2, \ldots, X_n) = f_1(X_1) \cdot f_2(X_2) \cdot \ldots \cdot f_n(X_n)$

Properties of E(X)
3. For a function of X, Y = Y(X), the expected value E(Y) can be found on the basis of the distribution of variable X, without the necessity of looking for the distribution f(y):
$E(Y) = \sum_i y(x_i)\, p_i$ for discrete variables
$E(Y) = \int y(x) f(x)\,dx$ for continuous variables
Any moment $m_k(x_0)$ can be treated as an expected value of the function $Y(X) = (X - x_0)^k$:
$m_k(x_0) = \int (x - x_0)^k f(x)\,dx = E\left((x - x_0)^k\right)$

Variance
VARIANCE (dispersion) symbols: $\sigma^2(X)$, var(X), V(X), D(X). Standard deviation: $\sigma(X)$
$\sigma^2(X) = \sum_i p_i \left(x_i - E(X)\right)^2$ for discrete variables
$\sigma^2(X) = \int f(x) \left(x - E(X)\right)^2 dx$ for continuous variables
Variance (or the standard deviation) is a measure of scatter of random variables around the expected value.
$\sigma^2(X) = E(X^2) - E^2(X)$

Properties of $\sigma^2(X)$
Variance can be calculated using expected values only:
1. $\sigma^2(X) = E(X^2) - E^2(X)$
In consequence we get:
$\sigma^2(C) = 0$
$\sigma^2(CX) = C^2 \sigma^2(X)$
$\sigma^2(C_1 X + C_2) = C_1^2 \sigma^2(X)$
2. For independent variables $X_1, X_2, \ldots, X_n$:
$\sigma^2\left(\sum_i C_i X_i\right) = \sum_i C_i^2 \sigma^2(X_i)$
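The identities for the expected value and the variance can be verified exactly on a small discrete distribution. The sketch below uses a fair six-sided die as a hypothetical example (it is not taken from the lecture) and exact fractions, so that no rounding enters the comparison.

```python
from fractions import Fraction

# Hypothetical example: fair six-sided die, p_i = 1/6 for x_i = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(pmf, g=lambda x: x):
    """E(g(X)) = sum of g(x_i) * p_i over the discrete distribution."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expectation(pmf)
# Variance from the definition: E((X - E(X))^2).
var_definition = expectation(pmf, lambda x: (x - mean) ** 2)
# Variance from expected values only: E(X^2) - E(X)^2.
var_shortcut = expectation(pmf, lambda x: x * x) - mean ** 2

# Check sigma^2(C1*X + C2) = C1^2 * sigma^2(X) for arbitrary constants.
c1, c2 = 3, 7
pmf_scaled = {c1 * x + c2: p for x, p in pmf.items()}
mean_scaled = expectation(pmf_scaled)
var_scaled = expectation(pmf_scaled, lambda y: (y - mean_scaled) ** 2)

print(mean, var_definition, var_shortcut, var_scaled)
```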

Uniform distribution
[Figure: uniform distribution on the interval from a to b]

Chebyshev inequality
Interpretation of variance results from the Chebyshev theorem:
Theorem: $P\left(|X - E(X)| \ge a \cdot \sigma(X)\right) \le \frac{1}{a^2}$
The probability that the random variable X deviates from the expected value E(X) by at least a times the standard deviation is smaller than or equal to $1/a^2$.
This theorem is valid for all distributions that have a variance and an expected value. The number a is any positive real value.
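The inequality can be checked on a concrete distribution by comparing the exact tail probability with the bound. This is a small sketch that again assumes a fair six-sided die as an illustrative distribution; the values of a are arbitrary.

```python
import math
from fractions import Fraction

# Hypothetical example: fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sigma = math.sqrt(variance)


def tail_probability(a):
    """P(|X - E(X)| >= a * sigma), computed exactly from the pmf."""
    return sum(p for x, p in pmf.items() if abs(x - mean) >= a * sigma)


for a in (1.0, 1.2, 1.5, 2.0):
    # The left column never exceeds the Chebyshev bound 1/a^2 on the right.
    print(a, float(tail_probability(a)), 1 / a ** 2)
```

The bound is loose for this distribution, which is typical: it has to hold for every distribution with a finite variance.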

Variance as a measure of data scatter
[Figure: two data sets, one with big scatter of data and one with smaller scatter of data]

Range as a measure of scatter
RANGE = $x_{max} - x_{min}$

Practical ways of calculating variance
Variance of n-element sample:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is the average
Variance of N-element population:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$, where $\mu$ is the expected value

Practical ways of calculating standard deviation
Standard deviation of sample (or: standard uncertainty):
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Standard deviation (population):
$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$
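The difference between the sample and the population formulas is only the divisor, n - 1 versus N. A minimal sketch with made-up data (the list `data` is hypothetical), cross-checked against the standard `statistics` module:

```python
import math
import statistics

# Hypothetical measurements, used only to illustrate the two formulas.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n
squared_deviations = sum((x - mean) ** 2 for x in data)

# Sample variance divides by n - 1, population variance by N.
sample_variance = squared_deviations / (n - 1)
population_variance = squared_deviations / n

sample_std = math.sqrt(sample_variance)
population_std = math.sqrt(population_variance)

# statistics.variance / pvariance implement the same two formulas.
print(sample_variance, statistics.variance(data))
print(population_variance, statistics.pvariance(data))
```

`statistics.pvariance` treats the list as the whole population and uses its mean in place of the expected value.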

Examples of probability distributions: discrete variables
Two-point distribution (zero-one), e.g. coin toss, head = failure x = 0, tail = success x = 1, p is the probability of success. Its distribution:
x_i    0      1
p_i    1-p    p
Binomial (Bernoulli):
$p_k = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \ldots, n$, where 0 < p < 1; X = {0, 1, 2, ..., n} is the number of successes when sampled n times with replacement.
For n = 1 it reduces to the two-point distribution.

Binomial distribution - assumptions
Random experiment consists of n Bernoulli trials:
1. Each trial is independent of others.
2. Each trial can have only two results: "success" and "failure" (binary!).
3. Probability of success p is constant.
Probability $p_k$ of the event that random variable X will be equal to the number of k successes in n trials:
$p_k = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \ldots, n$

Pascal's triangle
Symbol: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
Newton's binomial: $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$

Pascal's triangle
n = 0:  1
n = 1:  1 1
n = 2:  1 2 1
n = 3:  1 3 3 1
n = 4:  1 4 6 4 1
n = 5:  1 5 10 10 5 1
n = 6:  1 6 15 20 15 6 1

Bernoulli distribution
Example 6. The probability that in a company the daily use of water will not exceed a certain level is p = 3/4. We monitor the use of water for 6 days. Calculate the probability that the daily use of water will not exceed the set-up limit in 0, 1, 2, ..., 6 consecutive days, respectively.
Data: $p = \frac{3}{4}$, $q = \frac{1}{4}$, N = 6, k = 0, 1, ..., 6

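Bernoulli distribution
$P(k=0) = \binom{6}{0} \left(\frac{3}{4}\right)^0 \left(\frac{1}{4}\right)^6$
$P(k=1) = \binom{6}{1} \left(\frac{3}{4}\right)^1 \left(\frac{1}{4}\right)^5$
$P(k=2) = \binom{6}{2} \left(\frac{3}{4}\right)^2 \left(\frac{1}{4}\right)^4$
$P(k=3) = \binom{6}{3} \left(\frac{3}{4}\right)^3 \left(\frac{1}{4}\right)^3$
$P(k=4) = \binom{6}{4} \left(\frac{3}{4}\right)^4 \left(\frac{1}{4}\right)^2$
$P(k=5) = \binom{6}{5} \left(\frac{3}{4}\right)^5 \left(\frac{1}{4}\right)^1$
$P(k=6) = \binom{6}{6} \left(\frac{3}{4}\right)^6 \left(\frac{1}{4}\right)^0$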

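Bernoulli distribution
$P(0) = \frac{1}{4^6} = \frac{1}{4096} \approx 0.0002$
$P(1) = 6 \cdot 3 \cdot P(0) = 18\,P(0) \approx 0.004$
$P(2) = 15 \cdot 9 \cdot P(0) = 135\,P(0) \approx 0.033$
$P(3) = 20 \cdot 27 \cdot P(0) = 540\,P(0) \approx 0.132$
$P(4) = 15 \cdot 81 \cdot P(0) = 1215\,P(0) \approx 0.297$
$P(5) = 6 \cdot 243 \cdot P(0) = 1458\,P(0) \approx 0.356$
$P(6) = 729\,P(0) \approx 0.178$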

Bernoulli distribution
[Figure: bar plot of P(k) versus k for k = 0, 1, ..., 6, with the values 0.0002, 0.004, 0.033, 0.132, 0.297, 0.356, 0.178]
Maximum for k = 5


Bernoulli distribution
Expected value: $E(X) = \mu = np$
Variance: $V(X) = \sigma^2 = np(1-p)$
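The water-use example and the formulas for the expected value and variance can be reproduced with exact fractions. The sketch below assumes n = 6 days and p = 3/4 as in that example; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Return the list [P(X = 0), ..., P(X = n)] for n Bernoulli trials."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]


n, p = 6, Fraction(3, 4)
pmf = binomial_pmf(n, p)

# Expected value and variance computed directly from the distribution.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
# Number of successes with the highest probability.
most_likely = max(range(n + 1), key=pmf.__getitem__)

for k, pk in enumerate(pmf):
    print(k, pk, round(float(pk), 4))
print(mean, n * p, variance, n * p * (1 - p), most_likely)
```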

Errors in transmission
Example 7. A digital channel of information transfer is prone to errors in single bits. Assume that the probability of a single bit error is p = 0.1. Consecutive errors in transmission are independent. Let X denote the random variable whose values equal the number of bits in error in a sequence of 4 bits.
E - bit error, O - no error
OEOE corresponds to X = 2; for EEOO also X = 2 (order does not matter)

Errors in transmission
Example 7 cont. For X = 2 we get the following results: {EEOO, EOEO, EOOE, OEEO, OEOE, OOEE}
What is the probability P(X = 2), i.e., that two bits will be sent with error?
Events are independent, thus
$P(EEOO) = P(E)P(E)P(O)P(O) = (0.1)^2 (0.9)^2 = 0.0081$
Events are mutually exclusive and have the same probability, hence
$P(X = 2) = 6\,P(EEOO) = 6\,(0.1)^2 (0.9)^2 = 6\,(0.0081) = 0.0486$

Errors in transmission
Example 7 continued.
$\binom{4}{2} = \frac{4!}{2!\,2!} = 6$
Therefore, $P(X = 2) = 6\,(0.1)^2 (0.9)^2$ is given by the Bernoulli distribution:
$P(X = x) = \binom{4}{x} p^x (1-p)^{4-x}$, x = 0, 1, 2, 3, 4, p = 0.1
P(X = 0) = 0.6561
P(X = 1) = 0.2916
P(X = 2) = 0.0486
P(X = 3) = 0.0036
P(X = 4) = 0.0001
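The counting argument of this example can be reproduced by brute force: list every 4-bit error pattern, add up the probabilities of the patterns with the same number of errors, and compare with the binomial formula. A minimal sketch, assuming p = 0.1 and 4 bits as above:

```python
from itertools import product
from math import comb, isclose

p_error = 0.1
n_bits = 4

# Sum the probabilities of all patterns with the same number of errors.
prob_by_count = [0.0] * (n_bits + 1)
for pattern in product("EO", repeat=n_bits):  # E = bit error, O = no error
    errors = pattern.count("E")
    # Independent bits: multiply the per-bit probabilities.
    prob_by_count[errors] += p_error ** errors * (1 - p_error) ** (n_bits - errors)

# The same probabilities from the binomial formula.
formula = [
    comb(n_bits, x) * p_error ** x * (1 - p_error) ** (n_bits - x)
    for x in range(n_bits + 1)
]

for x in range(n_bits + 1):
    print(x, round(prob_by_count[x], 4), round(formula[x], 4))
```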

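Poisson's distribution
We introduce a parameter λ = np (E(X) = λ):
$P(X = x) = \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$
Let us assume that n increases while p decreases, but λ = np remains constant. The Bernoulli distribution changes to Poisson's distribution:
$\lim_{n \to \infty} P(X = x) = \lim_{n \to \infty} \binom{n}{x} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{e^{-\lambda} \lambda^x}{x!}$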

Poisson's distribution
It is one of the rare cases where the expected value equals the variance:
$E(X) = np = \lambda$
Why?
$V(X) = \sigma^2 = \lim_{n \to \infty,\ p \to 0} (np - np^2) = np = \lambda$

Poisson's distribution
[Figure: Poisson's distribution p(x) versus x for lambda = 1, lambda = 5 and lambda = 10]
x                           0      1      2      3      4      5      6
Bernoulli, n = 50, p = 0.02 0.364  0.372  0.186  0.061  0.015  0.003  0.000
Poisson, λ = 1              0.368  0.368  0.184  0.061  0.015  0.003  0.001
(x is an integer with an infinite range; x ≥ 0)
For big n the Bernoulli distribution resembles Poisson's distribution.
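The comparison between the two distributions can be recomputed directly from the two formulas. This sketch assumes n = 50 and p = 0.02 (so λ = np = 1) and prints both columns side by side; the function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for Poisson's distribution with parameter lam."""
    return exp(-lam) * lam ** k / factorial(k)


n, p = 50, 0.02
lam = n * p  # the Poisson parameter that matches the binomial mean

for k in range(7):
    print(k, round(binomial_pmf(k, n, p), 3), round(poisson_pmf(k, lam), 3))

# Largest absolute difference between the two distributions for k = 0..6.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(7))
print(max_gap)
```

Increasing n while keeping n·p fixed shrinks `max_gap`, which is the limiting behaviour described above.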

Normal distribution (Gaussian)
Limiting case (normal distribution)
The most widely used model for the distribution of a random variable is the normal distribution.
Central limit theorem, formulated in 1733 by De Moivre:
Whenever a random experiment is replicated, the random variable that equals the average (or total) result over the replicas tends to have a normal distribution as the number of replicas becomes large.

Normal distribution (Gaussian)
A random variable X with probability density function f(x):
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$, where $-\infty < x < +\infty$
is a normal random variable with two parameters: $-\infty < \mu < +\infty$, $\sigma > 0$
We can show that E(X) = μ and V(X) = σ².
Notation N(μ, σ) is used to denote this distribution.

Normal distribution (Gaussian)
Expected value, maximum of probability density (mode) and median overlap (x = μ). The curve is symmetric (the Gaussian curve is bell shaped).
Variance is a measure of the width of the distribution. At x = +σ and x = -σ there are the inflection points of N(0, σ).

Normal distribution (Gaussian)
It is used in experimental physics and describes the distribution of random errors. Standard deviation σ is a measure of random uncertainty. Measurements with larger σ correspond to bigger scatter of data around the average value and thus have less precision.

Standard normal distribution
A normal random variable Z with probability density N(z):
$N(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)$, where $-\infty < z < +\infty$
is called a standard normal random variable N(0, 1): E(Z) = 0, V(Z) = 1
Definition of standard normal variable: $Z = \frac{X - \mu}{\sigma}$

Standard normal distribution
[Figure: standard normal density with the significance level and the confidence level marked]
Advantages of standardization:
Tables of values of probability density and CDF can be constructed for N(0, 1). A new variable of the N(µ, σ) distribution can be created by a simple transformation X = σ·Z + µ.
By standardization we shift all original random variables to the region close to zero and we rescale the x-axis. The unit changes to standard deviation. Therefore, we can compare different distributions.

Calculations of probability (Gaussian distribution)
[Figure: Φ(x) with the areas over (-σ, +σ), (-2σ, +2σ) and (-3σ, +3σ); the central band holds 68.2% of the area]
P(μ-σ < X < μ+σ) = 0.6827 (about 2/3 of results)
P(μ-2σ < X < μ+2σ) = 0.9545
P(μ-3σ < X < μ+3σ) = 0.9973 (almost all)
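These three probabilities follow from the CDF of the normal distribution and do not depend on μ and σ, which is the point of standardization. A short sketch using `statistics.NormalDist` from the Python standard library; the values μ = 10 and σ = 2 in the second call are arbitrary assumptions for illustration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for X distributed as N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    # Standard normal versus an arbitrary normal: the results coincide.
    print(k, round(prob_within(k), 4), round(prob_within(k, mu=10.0, sigma=2.0), 4))
```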