Quick Review of Probability
Berlin Chen
Department of Computer Science & Information Engineering
National Taiwan Normal University
References:
1. W. Navidi. Statistics for Engineers and Scientists. Chapter 2 & Teaching Material
2. D. P. Bertsekas, J. N. Tsitsiklis. Introduction to Probability.

Basic Ideas
Definition: An experiment is a process that results in an outcome that cannot be predicted in advance with certainty.
Examples: rolling a die; tossing a coin; weighing the contents of a box of cereal.
Definition: The set of all possible outcomes of an experiment is called the sample space for the experiment.
Examples:
For rolling a fair die, the sample space is {1, 2, 3, 4, 5, 6}.
For a coin toss, the sample space is {heads, tails}.
For weighing a cereal box, the sample space is $(0, \infty)$; a more reasonable sample space is (12, 20) for a 16 oz. box (with an infinite number of outcomes).
Statistics-Berlin Chen 2

More Terminology
Definition: A subset of a sample space is called an event.
The empty set Ø is an event. The entire sample space is also an event.
A given event is said to have occurred if the outcome of the experiment is one of the outcomes in the event. For example, if a die comes up 2, the events {2, 4, 6} and {1, 2, 3} have both occurred, along with every other event that contains the outcome 2.

Combining Events
The union of two events A and B, denoted $A \cup B$, is the set of outcomes that belong either to A, to B, or to both.
In words, $A \cup B$ means "A or B". So the event "A or B" occurs whenever either A or B (or both) occurs.
Example: Let A = {1, 2, 3} and B = {2, 3, 4}. Then $A \cup B$ = {1, 2, 3, 4}.

Intersections
The intersection of two events A and B, denoted by $A \cap B$, is the set of outcomes that belong to A and to B.
In words, $A \cap B$ means "A and B". Thus the event "A and B" occurs whenever both A and B occur.
Example: Let A = {1, 2, 3} and B = {2, 3, 4}. Then $A \cap B$ = {2, 3}.

Complements
The complement of an event A, denoted $A^c$, is the set of outcomes that do not belong to A.
In words, $A^c$ means "not A". Thus the event "not A" occurs whenever A does not occur.
Example: Consider rolling a fair six-sided die. Let A be the event "rolling a six" = {6}. Then $A^c$ = "not rolling a six" = {1, 2, 3, 4, 5}.
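The three event operations map directly onto set operations. The following is a minimal Python sketch using the sets from the examples above; the variable names are illustrative and not part of the original slides.

```python
# Events as Python sets over the sample space of one die roll.
sample_space = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}
B = {2, 3, 4}

union = A | B                     # "A or B": outcomes in A, in B, or in both
intersection = A & B              # "A and B": outcomes in both A and B
complement_A = sample_space - A   # "not A": outcomes of the sample space outside A

print(union, intersection, complement_A)
```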

Mutually Exclusive Events
Definition: The events A and B are said to be mutually exclusive if they have no outcomes in common.
More generally, a collection of events $A_1, A_2, \ldots, A_n$ is said to be mutually exclusive if no two of them have any outcomes in common.
Sometimes mutually exclusive events are referred to as disjoint events.

Example
When you flip a coin, you cannot have the coin come up both heads and tails.
[Figure: Venn diagram illustrating mutually exclusive events]

Probabilities
Definition: Each event in the sample space has a probability of occurring. Intuitively, the probability is a quantitative measure of how likely the event is to occur.
Given any experiment and any event A:
The expression P(A) denotes the probability that the event A occurs.
P(A) is the proportion of times that the event A would occur in the long run, if the experiment were to be repeated over and over again.

Axioms of Probability
1. Let S be a sample space. Then $P(S) = 1$.
2. For any event A, $0 \le P(A) \le 1$.
3. If A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$. More generally, if $A_1, A_2, \ldots$ are mutually exclusive events, then $P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots$

A Few Useful Things
For any event A, $P(A^c) = 1 - P(A)$.
Let Ø denote the empty set. Then $P(Ø) = 0$.
If A is an event, and $A = \{E_1, E_2, \ldots, E_n\}$ (and $E_1, E_2, \ldots, E_n$ are mutually exclusive), then $P(A) = P(E_1) + P(E_2) + \cdots + P(E_n)$.
Addition Rule (for when A and B are not mutually exclusive): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
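As a quick check of the complement and addition rules, here is a small Python sketch. It assumes a fair die with equally likely outcomes, so a probability is a ratio of counts; the helper `prob` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # assumption: fair die, equally likely outcomes

def prob(event):
    """P(event) under equally likely outcomes: favourable count / total count."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2, 3}
B = {2, 3, 4}

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)

# Complement rule: P(not A) = 1 - P(A)
p_not_A = prob(sample_space - A)

print(lhs, rhs, p_not_A)
```

Exact fractions are used instead of floats so that the two sides of each rule can be compared with `==`.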

Conditional Probability and Independence
Definition: A probability that is based on a part of the sample space is called a conditional probability.
E.g., calculate the probability of an event given that the outcomes from a certain part of the sample space occur.
Let A and B be events with $P(B) \neq 0$. The conditional probability of A given B is
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
[Figure: Venn diagram]

More Definitions
Definition: Two events A and B are independent if the probability of each event remains the same whether or not the other occurs.
If $P(A) \neq 0$ and $P(B) \neq 0$, then A and B are independent if $P(B \mid A) = P(B)$ or, equivalently, $P(A \mid B) = P(A)$.
If either $P(A) = 0$ or $P(B) = 0$, then A and B are independent.
Are A and B independent (?)

The Multiplication (Chain) Rule
If A and B are two events and $P(B) \neq 0$, then $P(A \cap B) = P(B) P(A \mid B)$.
If A and B are two events and $P(A) \neq 0$, then $P(A \cap B) = P(A) P(B \mid A)$.
If $P(A) \neq 0$ and $P(B) \neq 0$, then both of the above hold.
If A and B are two independent events, then $P(A \cap B) = P(A) P(B)$.
This result can be extended to more than two events.
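The definitions of conditional probability and independence can be tried out on a single die roll. This is a sketch under the assumption of a fair die; the events and the helper names `prob`, `cond_prob` and `independent` are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # assumption: one roll of a fair die

def prob(event):
    """P(event) under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) != 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be nonzero")
    return prob(a & b) / prob(b)

def independent(a, b):
    """Product form of independence: P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_most_three = {1, 2, 3}

print(cond_prob(even, at_most_four), independent(even, at_most_four))
print(cond_prob(even, at_most_three), independent(even, at_most_three))
```

Knowing that the roll is at most four leaves the chance of an even number unchanged, so those two events are independent; knowing that it is at most three changes it, so those two are not.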

Law of Total Probability
If $A_1, \ldots, A_n$ are mutually exclusive and exhaustive events, and B is any event, then
$P(B) = P(A_1 \cap B) + \cdots + P(A_n \cap B)$
Exhaustive events: the union of the events covers the sample space, $S = A_1 \cup A_2 \cup \cdots \cup A_n$.
Or equivalently, if $P(A_i) \neq 0$ for each $A_i$,
$P(B) = P(B \mid A_1) P(A_1) + \cdots + P(B \mid A_n) P(A_n)$

Example
Customers who purchase a certain make of car can order an engine in any of three sizes. Of all the cars sold, 45% have the smallest engine, 35% have a medium-sized engine, and 20% have the largest. Of cars with the smallest engine, 10% fail an emissions test within two years of purchase, while 12% of those with the medium size and 15% of those with the largest engine fail. What is the probability that a randomly chosen car will fail an emissions test within two years?

Solution
Let B denote the event that a car fails an emissions test within two years. Let $A_1$ denote the event that a car has a small engine, $A_2$ the event that a car has a medium-sized engine, and $A_3$ the event that a car has a large engine. Then $P(A_1) = 0.45$, $P(A_2) = 0.35$, and $P(A_3) = 0.20$. Also, $P(B \mid A_1) = 0.10$, $P(B \mid A_2) = 0.12$, and $P(B \mid A_3) = 0.15$. By the law of total probability,
$P(B) = P(B \mid A_1) P(A_1) + P(B \mid A_2) P(A_2) + P(B \mid A_3) P(A_3) = 0.10(0.45) + 0.12(0.35) + 0.15(0.20) = 0.117$
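The same computation can be written as a short Python sketch. The numbers are those of the example; the variable names are illustrative.

```python
# P(A_i): share of cars with small, medium, and large engines.
p_engine = [0.45, 0.35, 0.20]
# P(B | A_i): failure rate within two years for each engine size.
p_fail_given_engine = [0.10, 0.12, 0.15]

# Law of total probability: P(B) = sum over i of P(B | A_i) * P(A_i)
p_fail = sum(pb * pa for pb, pa in zip(p_fail_given_engine, p_engine))

print(round(p_fail, 3))
```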

Bayes Rule
Let $A_1, \ldots, A_n$ be mutually exclusive and exhaustive events, with $P(A_i) \neq 0$ for each $A_i$. Let B be any event with $P(B) \neq 0$. Then
$P(A_k \mid B) = \frac{P(A_k \cap B)}{P(B)} = \frac{P(B \mid A_k) P(A_k)}{\sum_{i=1}^{n} P(B \mid A_i) P(A_i)}$

Example
The proportion of people in a given community who have a certain disease (D) is 0.005. A test is available to diagnose the disease. If a person has the disease, the probability that the test will produce a positive signal (+) is 0.99. If a person does not have the disease, the probability that the test will produce a positive signal is 0.01. If a person tests positive, what is the probability that the person actually has the disease?

Solution
Let D represent the event that a person actually has the disease.
Let + represent the event that the test gives a positive signal.
We wish to find $P(D \mid +)$.
We know $P(D) = 0.005$, $P(+ \mid D) = 0.99$, and $P(+ \mid D^c) = 0.01$.
Using Bayes rule,
$P(D \mid +) = \frac{P(+ \mid D) P(D)}{P(+ \mid D) P(D) + P(+ \mid D^c) P(D^c)} = \frac{0.99(0.005)}{0.99(0.005) + 0.01(0.995)} = 0.332$
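A minimal Python sketch of this calculation follows. The values come from the example; the variable names are illustrative.

```python
p_disease = 0.005            # P(D)
p_pos_given_disease = 0.99   # P(+ | D)
p_pos_given_healthy = 0.01   # P(+ | D^c)

# Denominator: P(+) by the law of total probability over D and D^c.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes rule: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))
```

Even with an accurate test, the posterior stays near one third because the disease is rare, so false positives from the large healthy group outnumber true positives.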

Random Variables
Definition: A random variable assigns a numerical value to each outcome in a sample space.
We can say a random variable is a real-valued function of the experimental outcome.
Definition: A random variable is discrete if its possible values form a discrete set.

Example
The number of flaws in a 1-inch length of copper wire manufactured by a certain process varies from wire to wire. Overall, 48% of the wires produced have no flaws, 39% have one flaw, 12% have two flaws, and 1% have three flaws. Let X be the number of flaws in a randomly selected piece of wire.
Then P(X = 0) = 0.48, P(X = 1) = 0.39, P(X = 2) = 0.12, and P(X = 3) = 0.01.
The list of possible values 0, 1, 2, and 3, along with the probabilities of each, provides a complete description of the population from which X was drawn.

Probability Mass Function
The description of the possible values of X and the probabilities of each has a name: the probability mass function.
Definition: The probability mass function (denoted as pmf) of a discrete random variable X is the function p(x) = P(X = x). The probability mass function is sometimes called the probability distribution.

Cumulative Distribution Function
The probability mass function specifies the probability that a random variable is equal to a given value.
A function called the cumulative distribution function (cdf) specifies the probability that a random variable is less than or equal to a given value.
The cumulative distribution function of the random variable X is the function $F(x) = P(X \le x)$.

Example
Recall the example of the number of flaws in a randomly chosen piece of wire. The following is the pmf: P(X = 0) = 0.48, P(X = 1) = 0.39, P(X = 2) = 0.12, and P(X = 3) = 0.01.
For any value x, we compute F(x) by summing the probabilities of all the possible values of X that are less than or equal to x:
$F(0) = P(X \le 0) = 0.48$
$F(1) = P(X \le 1) = 0.48 + 0.39 = 0.87$
$F(2) = P(X \le 2) = 0.48 + 0.39 + 0.12 = 0.99$
$F(3) = P(X \le 3) = 0.48 + 0.39 + 0.12 + 0.01 = 1$
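The cdf of the wire-flaw example can be computed from the pmf with a few lines of Python. This is a sketch; the dictionary representation and the function name `cdf` are choices made for this illustration, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# pmf of X = number of flaws, from the example (as exact fractions).
pmf = {
    0: Fraction(48, 100),
    1: Fraction(39, 100),
    2: Fraction(12, 100),
    3: Fraction(1, 100),
}

def cdf(x):
    """F(x) = P(X <= x): sum p(t) over all possible values t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

for x in (0, 1, 2, 3):
    print(x, float(cdf(x)))
```

The function accepts any real x, not only the possible values of X, so `cdf(2.5)` equals `cdf(2)` and `cdf(-1)` is 0.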

More on Discrete Random Variables
Let X be a discrete random variable. Then
The probability mass function (pmf) of X is the function $p(x) = P(X = x)$.
The cumulative distribution function (cdf) of X is the function $F(x) = P(X \le x)$.
$F(x) = \sum_{t \le x} p(t) = \sum_{t \le x} P(X = t)$
$\sum_{x} p(x) = \sum_{x} P(X = x) = 1$, where the sum is over all the possible values of X.

Mean and Variance for Discrete Random Variables
The mean (or expected value) of X is given by
$\mu_X = \sum_{x} x P(X = x)$, also denoted as $E[X]$,
where the sum is over all possible values of X.
The variance of X is given by
$\sigma_X^2 = \sum_{x} (x - \mu_X)^2 P(X = x)$, also denoted as $E[(X - \mu_X)^2]$
$\sigma_X^2 = \sum_{x} x^2 P(X = x) - \mu_X^2$, also denoted as $E[X^2] - (E[X])^2$
The standard deviation is the square root of the variance.
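Applying these formulas to the wire-flaw pmf gives a concrete check that the two variance expressions agree. The following Python sketch uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# pmf of X = number of flaws, from the earlier example.
pmf = {
    0: Fraction(48, 100),
    1: Fraction(39, 100),
    2: Fraction(12, 100),
    3: Fraction(1, 100),
}

# Mean: sum of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Variance by definition: sum of (x - mean)^2 * P(X = x)
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Variance by the shortcut: E[X^2] - (E[X])^2
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

std_dev = float(var_definition) ** 0.5

print(float(mean), float(var_definition), float(var_shortcut))
```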

The Probability Histogram
When the possible values of a discrete random variable are evenly spaced, the probability mass function can be represented by a histogram, with rectangles centered at the possible values of the random variable.
The area of the rectangle centered at a value x is equal to P(X = x).
Such a histogram is called a probability histogram, because the areas represent probabilities.

Example
The following is a probability histogram for the example with the number of flaws in a randomly chosen piece of wire: P(X = 0) = 0.48, P(X = 1) = 0.39, P(X = 2) = 0.12, and P(X = 3) = 0.01.
[Figure 2.8: probability histogram]

Continuous Random Variables
A random variable is continuous if its probabilities are given by areas under a curve.
The curve is called a probability density function (pdf) for the random variable. Sometimes the pdf is called the probability distribution.
Let X be a continuous random variable with probability density function f(x). Then
$\int_{-\infty}^{\infty} f(x)\,dx = 1$

Computing Probabilities
Let X be a continuous random variable with probability density function f(x). Let a and b be any two numbers, with a < b. Then
$P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = \int_{a}^{b} f(x)\,dx$
In addition,
$P(X \le a) = P(X < a) = \int_{-\infty}^{a} f(x)\,dx$
$P(X \ge a) = P(X > a) = \int_{a}^{\infty} f(x)\,dx$

More on Continuous Random Variables
Let X be a continuous random variable with probability density function f(x). The cumulative distribution function (cdf) of X is the function
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
The mean of X is given by
$\mu_X = \int_{-\infty}^{\infty} x f(x)\,dx$, also denoted as $E[X]$
The variance of X is given by
$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx$, also denoted as $E[(X - \mu_X)^2]$
$\sigma_X^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu_X^2$, also denoted as $E[X^2] - (E[X])^2$

Median and Percentiles
Let X be a continuous random variable with probability density function f(x) and cumulative distribution function F(x).
The median of X is the point $x_m$ that solves the equation
$F(x_m) = P(X \le x_m) = \int_{-\infty}^{x_m} f(x)\,dx = 0.5$
If p is any number between 0 and 100, the pth percentile is the point $x_p$ that solves the equation
$F(x_p) = P(X \le x_p) = \int_{-\infty}^{x_p} f(x)\,dx = p/100$
The median is the 50th percentile.
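To make the percentile equation concrete, the sketch below solves $F(x_p) = p/100$ by bisection. The density $f(x) = 2x$ on [0, 1], with cdf $F(x) = x^2$, is an assumption chosen only for illustration; it does not come from the slides, and the function names are hypothetical.

```python
import math

def cdf(x):
    """CDF of the assumed density f(x) = 2x on [0, 1]: F(x) = x**2."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x * x

def percentile(p, lo=0.0, hi=1.0, tol=1e-12):
    """Find x_p with F(x_p) = p/100 by bisection on [lo, hi]."""
    target = p / 100
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid   # x_p lies to the right of mid
        else:
            hi = mid   # x_p lies at or to the left of mid
    return (lo + hi) / 2

median = percentile(50)
print(median, math.sqrt(0.5))
```

For this density the median has the closed form $\sqrt{0.5}$, which the bisection result can be compared against. Bisection works here because a cdf is non-decreasing.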

Linear Functions of Random Variables
If X is a random variable, and a and b are constants, then
$\mu_{aX+b} = a\mu_X + b$
$\sigma_{aX+b}^2 = a^2 \sigma_X^2$
$\sigma_{aX+b} = |a| \sigma_X$
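These identities can be verified numerically by transforming a pmf directly. The sketch below reuses the wire-flaw pmf for X and picks arbitrary constants a = -2 and b = 5; the constants and helper names are assumptions made for illustration.

```python
from fractions import Fraction

pmf_x = {
    0: Fraction(48, 100),
    1: Fraction(39, 100),
    2: Fraction(12, 100),
    3: Fraction(1, 100),
}

def mean(pmf):
    """E[X] = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mean)^2 * p(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

a, b = -2, 5  # arbitrary constants, chosen for illustration

# pmf of Y = aX + b; since a != 0, distinct x values map to distinct y values.
pmf_y = {a * x + b: p for x, p in pmf_x.items()}

print(mean(pmf_y), a * mean(pmf_x) + b)
print(variance(pmf_y), a ** 2 * variance(pmf_x))
```

A negative a is used on purpose: the variance scales by $a^2$ and the standard deviation by $|a|$, so the sign of a does not matter for the spread.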

More Linear Functions
If X and Y are random variables, and a and b are constants, then
$\mu_{aX+bY} = \mu_{aX} + \mu_{bY} = a\mu_X + b\mu_Y$
More generally, if $X_1, \ldots, X_n$ are random variables and $c_1, \ldots, c_n$ are constants, then the mean of the linear combination $c_1 X_1 + \cdots + c_n X_n$ is given by
$\mu_{c_1 X_1 + c_2 X_2 + \cdots + c_n X_n} = c_1 \mu_{X_1} + c_2 \mu_{X_2} + \cdots + c_n \mu_{X_n}$

Two Independent Random Variables
If X and Y are independent random variables, and S and T are sets of numbers, then
$P(X \in S \text{ and } Y \in T) = P(X \in S) P(Y \in T)$
More generally, if $X_1, \ldots, X_n$ are independent random variables, and $S_1, \ldots, S_n$ are sets, then
$P(X_1 \in S_1, X_2 \in S_2, \ldots, X_n \in S_n) = P(X_1 \in S_1) P(X_2 \in S_2) \cdots P(X_n \in S_n)$

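Variance Properties
If $X_1, \ldots, X_n$ are independent random variables, then the variance of the sum $X_1 + \cdots + X_n$ is given by
$\sigma_{X_1 + X_2 + \cdots + X_n}^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \cdots + \sigma_{X_n}^2$
If $X_1, \ldots, X_n$ are independent random variables and $c_1, \ldots, c_n$ are constants, then the variance of the linear combination $c_1 X_1 + \cdots + c_n X_n$ is given by
$\sigma_{c_1 X_1 + c_2 X_2 + \cdots + c_n X_n}^2 = c_1^2 \sigma_{X_1}^2 + c_2^2 \sigma_{X_2}^2 + \cdots + c_n^2 \sigma_{X_n}^2$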

More Variance Properties
If X and Y are independent random variables with variances $\sigma_X^2$ and $\sigma_Y^2$, then the variance of the sum X + Y is
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$
The variance of the difference X - Y is
$\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$

Independence and Simple Random Samples
Definition: If $X_1, \ldots, X_n$ is a simple random sample, then $X_1, \ldots, X_n$ may be treated as independent random variables, all from the same population.
Phrased another way, $X_1, \ldots, X_n$ are independent and identically distributed (i.i.d.).

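Properties of $\bar{X}$ (1/4)
If $X_1, \ldots, X_n$ is a simple random sample from a population with mean $\mu$ and variance $\sigma^2$, then the sample mean
$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
is a random variable with
$\mu_{\bar{X}} = \mu$ (mean of sample mean)
$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$ (variance of sample mean)
The standard deviation of $\bar{X}$ is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.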

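Properties of $\bar{X}$ (2/4)
[Figure: repeated simple random samples of size n drawn from a population with parameters $(\mu, \sigma^2)$, each yielding a sample mean]
Simple random sample of size n: 37, 40, 35, ..., 39; sample mean $\bar{x}_1 = 37.8$
Simple random sample of size n: 41, 38, 42, ..., 38.5; sample mean $\bar{x}_2 = 40.2$
Simple random sample of size n: 37.5, 38, 42, ..., 40.2; sample mean $\bar{x}_3 = 38.6$
$X_1, X_2, \ldots, X_n$ are i.i.d. and follow the same distribution.
The sample mean can be viewed as a random variable $\bar{X}$ with values $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k, \ldots$
$\bar{X}$ can be represented as $\bar{X} = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$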

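Properties of $\bar{X}$ (3/4)
$\mu_{\bar{X}} = E[\bar{X}] = \mu_{\frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n} = \frac{1}{n}\mu_{X_1} + \frac{1}{n}\mu_{X_2} + \cdots + \frac{1}{n}\mu_{X_n} = \frac{1}{n}\mu + \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = \mu$
($X_1, X_2, \ldots, X_n$ are i.i.d. and follow the same distribution with mean $\mu$)
$\sigma_{\bar{X}}^2 = E[(\bar{X} - \mu_{\bar{X}})^2] = \sigma_{\frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n}^2 = \frac{1}{n^2}\sigma_{X_1}^2 + \frac{1}{n^2}\sigma_{X_2}^2 + \cdots + \frac{1}{n^2}\sigma_{X_n}^2 = \frac{1}{n^2}\sigma^2 + \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2 = \frac{\sigma^2}{n}$
($X_1, X_2, \ldots, X_n$ are independent, and are identically distributed, i.e. follow the same distribution with variance $\sigma^2$)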

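Properties of $\bar{X}$ (4/4)
[Figure: distribution of the sample mean, centered at $\mu_{\bar{X}}$, with individual sample means $\bar{x}_i$ and $\bar{x}_j$ marked]
$\mu_{\bar{X}}$: mean of the sample mean (equal to the population mean $\mu$).
The spread of the sample mean is determined by the variance of the sample mean $\sigma_{\bar{X}}^2$ (equal to $\sigma^2 / n$, where $\sigma^2$ is the population variance).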
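The two properties of the sample mean can be checked exactly on a small population by enumerating every possible sample. The sketch below assumes a population consisting of the six faces of a fair die and samples of size n = 3 drawn with replacement (so the draws are i.i.d.); the population, sample size, and the helper `mean_var` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # assumed population: faces of a fair die
n = 3                            # assumed sample size

def mean_var(values):
    """Mean and variance of a list of equally likely values."""
    values = list(values)
    count = Fraction(len(values))
    m = sum(values) / count
    v = sum((x - m) ** 2 for x in values) / count
    return m, v

mu, sigma2 = mean_var(population)

# Every ordered sample of size n is equally likely when drawing with replacement.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mu_xbar, sigma2_xbar = mean_var(sample_means)

print(mu, mu_xbar)               # population mean vs. mean of the sample mean
print(sigma2 / n, sigma2_xbar)   # sigma^2 / n vs. variance of the sample mean
```

Enumeration is used instead of random simulation so that the equalities hold exactly rather than approximately.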

Jointly Distributed Random Variables
If X and Y are jointly discrete random variables:
The joint probability mass function of X and Y is the function
$p(x, y) = P(X = x \text{ and } Y = y)$
The marginal probability mass functions of X and Y can be obtained from the joint probability mass function as follows:
$p_X(x) = P(X = x) = \sum_{y} p(x, y)$
$p_Y(y) = P(Y = y) = \sum_{x} p(x, y)$
where the sums are taken over all the possible values of Y and of X, respectively.
The joint probability mass function has the property that
$\sum_{x} \sum_{y} p(x, y) = 1$
where the sum is taken over all the possible values of X and Y.
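A short Python sketch of how marginal pmfs are obtained from a joint pmf follows. The joint table here is hypothetical, invented only to illustrate the sums; the function name `marginals` is likewise an assumption.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf p(x, y), keyed by (x, y); not taken from the slides.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def marginals(joint_pmf):
    """Return (p_X, p_Y): sum the joint pmf over y and over x, respectively."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint_pmf.items():
        p_x[x] += p   # p_X(x) = sum over y of p(x, y)
        p_y[y] += p   # p_Y(y) = sum over x of p(x, y)
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint)
print(p_x, p_y)
```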

Jointly Continuous Random Variables
If X and Y are jointly continuous random variables, with joint probability density function f(x, y), and a < b, c < d, then
$P(a \le X \le b \text{ and } c \le Y \le d) = \int_{a}^{b} \int_{c}^{d} f(x, y)\,dy\,dx$
The joint probability density function has the property that
$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dy\,dx = 1$

Marginals of X and Y
If X and Y are jointly continuous with joint probability density function f(x, y), then the marginal probability density functions of X and Y are given, respectively, by
$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$
Such a process is called marginalization.

More Than Two Random Variables
If the random variables $X_1, \ldots, X_n$ are jointly discrete, the joint probability mass function is
$p(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n)$
If the random variables $X_1, \ldots, X_n$ are jointly continuous, they have a joint probability density function $f(x_1, x_2, \ldots, x_n)$, where
$P(a_1 \le X_1 \le b_1, \ldots, a_n \le X_n \le b_n) = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$
for any constants $a_1 \le b_1, \ldots, a_n \le b_n$.

Means of Functions of Random Variables (1/2)
If the random variables $X_1, \ldots, X_n$ are jointly discrete, the joint probability mass function is
$p(x_1, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n)$
If the random variables $X_1, \ldots, X_n$ are jointly continuous, they have a joint probability density function $f(x_1, x_2, \ldots, x_n)$, where
$P(a_1 \le X_1 \le b_1, \ldots, a_n \le X_n \le b_n) = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$
for any constants $a_1 \le b_1, \ldots, a_n \le b_n$.

Means of Functions of Random Variables (2/2)
Let X be a random variable, and let h(X) be a function of X. Then:
If X is discrete with probability mass function p(x), the mean of h(X) is given by
$\mu_{h(X)} = \sum_{x} h(x) p(x)$, also denoted as $E[h(X)]$,
where the sum is taken over all the possible values of X.
If X is continuous with probability density function f(x), the mean of h(X) is given by
$\mu_{h(X)} = \int_{-\infty}^{\infty} h(x) f(x)\,dx$, also denoted as $E[h(X)]$

Functions of Joint Random Variables
If X and Y are jointly distributed random variables, and h(X, Y) is a function of X and Y, then
If X and Y are jointly discrete with joint probability mass function p(x, y),
$\mu_{h(X,Y)} = \sum_{x} \sum_{y} h(x, y) p(x, y)$
where the sum is taken over all possible values of X and Y.
If X and Y are jointly continuous with joint probability density function f(x, y),
$\mu_{h(X,Y)} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) f(x, y)\,dx\,dy$

Discrete Conditional Distributions
Let X and Y be jointly discrete random variables, with joint probability mass function p(x, y). Let $p_X(x)$ denote the marginal probability mass function of X and let x be any number for which $p_X(x) > 0$.
The conditional probability mass function of Y given X = x is
$p_{Y \mid X}(y \mid x) = \frac{p(x, y)}{p_X(x)}$
Note that for any particular values of x and y, the value of $p_{Y \mid X}(y \mid x)$ is just the conditional probability $P(Y = y \mid X = x)$.

Continuous Conditional Distributions
Let X and Y be jointly continuous random variables, with joint probability density function f(x, y). Let $f_X(x)$ denote the marginal density function of X and let x be any number for which $f_X(x) > 0$.
The conditional probability density function of Y given X = x is
$f_{Y \mid X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$

Conditional Expectation
Expectation is another term for mean.
A conditional expectation is an expectation, or mean, calculated using the conditional probability mass function or conditional probability density function.
The conditional expectation of Y given X = x is denoted by $E(Y \mid X = x)$ or $\mu_{Y \mid x}$.
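For the discrete case, the conditional pmf and the conditional expectation can be sketched in Python as below. The joint table is hypothetical, invented for illustration, and the function names are assumptions.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), keyed by (x, y); not taken from the slides.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def conditional_pmf_y_given_x(joint_pmf, x):
    """p_{Y|X}(y | x) = p(x, y) / p_X(x), defined only when p_X(x) > 0."""
    p_x = sum(p for (xi, _), p in joint_pmf.items() if xi == x)
    if p_x == 0:
        raise ValueError("p_X(x) must be positive")
    return {y: p / p_x for (xi, y), p in joint_pmf.items() if xi == x}

def conditional_expectation(joint_pmf, x):
    """E(Y | X = x): mean of Y under the conditional pmf."""
    cond = conditional_pmf_y_given_x(joint_pmf, x)
    return sum(y * p for y, p in cond.items())

print(conditional_pmf_y_given_x(joint, 1))
print(conditional_expectation(joint, 1))
```

Dividing by $p_X(x)$ rescales one row of the joint table so that it sums to 1, which is what makes it a valid pmf for Y.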

Independence (1/2)
Random variables $X_1, \ldots, X_n$ are independent, provided that:
If $X_1, \ldots, X_n$ are jointly discrete, the joint probability mass function is equal to the product of the marginals:
$p(x_1, \ldots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n)$
If $X_1, \ldots, X_n$ are jointly continuous, the joint probability density function is equal to the product of the marginals:
$f(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$

Independence (2/2)
If X and Y are independent random variables, then:
If X and Y are jointly discrete, and x is a value for which $p_X(x) > 0$, then $p_{Y \mid X}(y \mid x) = p_Y(y)$.
If X and Y are jointly continuous, and x is a value for which $f_X(x) > 0$, then $f_{Y \mid X}(y \mid x) = f_Y(y)$.

Covariance
Let X and Y be random variables with means $\mu_X$ and $\mu_Y$.
The covariance of X and Y is
$\mathrm{Cov}(X, Y) = \mu_{(X - \mu_X)(Y - \mu_Y)}$
An alternative formula is
$\mathrm{Cov}(X, Y) = \mu_{XY} - \mu_X \mu_Y$

Correlation
Let X and Y be jointly distributed random variables with standard deviations $\sigma_X$ and $\sigma_Y$.
The correlation between X and Y, also called the correlation coefficient, is denoted $\rho_{X,Y}$ and is given by
$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
For any two random variables X and Y,
$-1 \le \rho_{X,Y} \le 1$

Covariance, Correlation, and Independence
If $\mathrm{Cov}(X, Y) = \rho_{X,Y} = 0$, then X and Y are said to be uncorrelated.
If X and Y are independent, then X and Y are uncorrelated.
It is mathematically possible for X and Y to be uncorrelated without being independent. This rarely occurs in practice.

Example
The pair of random variables (X, Y) takes the values (1, 0), (0, 1), (-1, 0), and (0, -1), each with probability 1/4.
Thus, the marginal pmfs of X and Y are symmetric around 0, and E[X] = E[Y] = 0.
Furthermore, for all possible value pairs (x, y), either x or y is equal to 0, which implies that XY = 0 and E[XY] = 0. Therefore,
$\mathrm{cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = 0$,
and X and Y are uncorrelated.
However, X and Y are not independent since, for example, a nonzero value of X fixes the value of Y to zero.
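This example is small enough to verify directly. The Python sketch below computes the covariance from the joint pmf and then tests the product condition for independence at one point; the helper `expect` and the variable names are illustrative.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
# Joint pmf of (X, Y) from the example.
joint = {(1, 0): quarter, (0, 1): quarter, (-1, 0): quarter, (0, -1): quarter}

def expect(h):
    """E[h(X, Y)] = sum of h(x, y) * p(x, y) over all value pairs."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - e_x) * (y - e_y))

# Independence would require p(x, y) = p_X(x) * p_Y(y) for every pair.
p_x1 = sum(p for (x, _), p in joint.items() if x == 1)   # P(X = 1)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)   # P(Y = 0)
p_x1_y0 = joint.get((1, 0), Fraction(0))                 # P(X = 1, Y = 0)

print(cov, p_x1_y0, p_x1 * p_y0)
```

The covariance is zero, yet P(X = 1, Y = 0) differs from P(X = 1) P(Y = 0), so the product condition fails and the variables are dependent.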

Variance of a Linear Combination of Random Variables (1/2)
If $X_1, \ldots, X_n$ are random variables and $c_1, \ldots, c_n$ are constants, then
$\mu_{c_1 X_1 + \cdots + c_n X_n} = c_1 \mu_{X_1} + \cdots + c_n \mu_{X_n}$
$\sigma_{c_1 X_1 + \cdots + c_n X_n}^2 = c_1^2 \sigma_{X_1}^2 + \cdots + c_n^2 \sigma_{X_n}^2 + 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} c_i c_j \mathrm{Cov}(X_i, X_j)$
For the case of two random variables,
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 + 2\,\mathrm{Cov}(X, Y)$

Variance of a Linear Combination of Random Variables (2/2)
If $X_1, \ldots, X_n$ are independent random variables and $c_1, \ldots, c_n$ are constants, then
$\sigma_{c_1 X_1 + \cdots + c_n X_n}^2 = c_1^2 \sigma_{X_1}^2 + \cdots + c_n^2 \sigma_{X_n}^2$
In particular,
$\sigma_{X_1 + \cdots + X_n}^2 = \sigma_{X_1}^2 + \cdots + \sigma_{X_n}^2$
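The two-variable formula with the covariance term can be checked against a direct computation of the variance of X + Y. The sketch below uses a hypothetical joint pmf in which X and Y are dependent; the table and helper names are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y) with dependent X and Y; not from the slides.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def expect(h):
    """E[h(X, Y)] = sum of h(x, y) * p(x, y) over all value pairs."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))

# Direct computation of Var(X + Y) from the joint pmf.
mu_sum = expect(lambda x, y: x + y)
var_sum_direct = expect(lambda x, y: (x + y - mu_sum) ** 2)

# Formula: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
var_sum_formula = var_x + var_y + 2 * cov

print(var_sum_direct, var_sum_formula, cov)
```

Because the covariance here is nonzero, dropping the covariance term would give the wrong variance; the simpler sum-of-variances formula applies only when the covariance term vanishes, as it does for independent variables.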

Summary (1/2)
Probability and rules
Counting techniques
Conditional probability
Independence
Random variables: discrete and continuous
Probability mass functions

Summary (2/2)
Probability density functions
Cumulative distribution functions
Means and variances for random variables
Linear functions of random variables
Mean and variance of a sample mean
Jointly distributed random variables