MATH/STATS 425 : Introduction to Probability. Boaz Slomka


These notes are not proofread and may contain typos and errors. Last update: April 10, 2018.


LECTURE 1

Counting (1.2)

Basic multiplication principle: if there are $m$ possible outcomes in experiment 1 and $n$ possible outcomes in experiment 2, then there is a total of $m \cdot n$ possible outcomes (this works for more than two experiments).

Example 1.1. How many words with 5 letters are there? $26^5 \approx 12$ million. (Actually, probably only about 10,000 of them are real words.) In general: in a $k$-letter alphabet, there are $k^n$ $n$-letter words.

Example 1.2. We roll three different dice (6-sided); how many possible outcomes are there? $6^3 = 216$.

Example 1.3. One die, rolled three times (order of the results is not recorded, so e.g. 1-2-2 and 2-1-2 are considered the same outcome); how many possible outcomes now? A bit more difficult... (Answer: 56.)

Permutations (1.3)

Example 1.4. In how many ways can one arrange 8 people in a line? $8 \cdot 7 \cdot 6 \cdots 1 = 8! = 40{,}320$. Each arrangement is also called a permutation. In general: the number of permutations (ways to order) of $n$ different objects is $n!$.

Example 1.5. How many ways are there to arrange 4 couples in a line (each couple standing next to each other)? There are $4! = 24$ possible ways to order the different couples (as if each couple were one object), and $2^4 = 16$ ways to order each pair between themselves. Total: $24 \cdot 16 = 384$.

General permutations

Next, we would like to count arrangements of objects when some of the objects are indistinguishable:

Example 1.6. How many arrangements of 5 identical red balloons, 3 identical blue balloons, and 2 identical green balloons are there?

Suppose first that all the balloons are distinguishable, for example by imagining that the red balloons are labeled $R_1, \dots, R_5$, the blue balloons are labeled $B_1, B_2, B_3$, and the green balloons are labeled $G_1, G_2$. Then there are $10!$ arrangements of the 10 balloons. Now, consider any specific arrangement, e.g. RRRRRGGBBB. With our imaginary labels, among the $10!$ permutations we have actually counted many arrangements that correspond to this same initial arrangement of RRRRRGGBBB, including $R_1R_2R_3R_4R_5G_1G_2B_1B_2B_3$ and $R_2R_1R_4R_5R_3G_2G_1B_3B_2B_1$. The exact number of labeled arrangements corresponding to RRRRRGGBBB (or to any other initial arrangement of the balloons) is $5!\,3!\,2!$, because there are $5!$ permutations of the red balloons among themselves, $3!$ permutations of the blue balloons among themselves, and $2!$ permutations of the green balloons among themselves. Therefore, we divide by that number to get a total of
$$\frac{10!}{5!\,3!\,2!} = 2520$$
possible arrangements.

In general: the number of possible arrangements in a line of $n$ objects ($n_1$ of type 1, $n_2$ of type 2, ..., $n_r$ of type $r$, where objects of each type are identical and $n_1 + n_2 + \dots + n_r = n$) is
$$\frac{n!}{n_1!\,n_2! \cdots n_r!}.$$
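A minimal Python sketch can confirm this count, comparing the formula with brute-force enumeration of the distinct orderings of the multiset:

```python
import math
from itertools import permutations

# Formula: 10! / (5! 3! 2!)
formula = math.factorial(10) // (math.factorial(5) * math.factorial(3) * math.factorial(2))

# Brute force: number of distinct orderings of the multiset RRRRRBBBGG
brute = len(set(permutations("RRRRRBBBGG")))

print(formula, brute)  # both print 2520
```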

LECTURE 2

Binomial Coefficients (1.4)

Example 2.1. How many combinations of 6 different courses out of 10 can one choose (order is not important)?

Solution. There are $10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5$ possibilities to choose 6 courses with order. Since each selection is counted $6!$ times (the number of permutations of 6 elements), we divide by $6!$ to get
$$\frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5}{6!} = \frac{10!}{6!\,4!} = 210.$$
Note: this is the same as choosing which 4 courses not to take out of 10.

In general: the number of $k$-element subsets of an $n$-set (or ways to pick $k$ different objects out of $n$, where order is not important) is
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
("$n$ choose $k$"). Note that $\binom{n}{k} = \binom{n}{n-k}$. We also define $\binom{n}{0} = \binom{n}{n} = 1$ (recall $0! = 1$), and $\binom{n}{r} = 0$ if $r < 0$ or $r > n$.

Example 2.2. The Detroit Pistons have 19 players. How many teams (of 5 players) can they possibly form? $\binom{19}{5}$.

Exercise 2.3. In how many ways can one arrange 10 balloons and 5 ribbons in a line?

Solution. There are $10 + 5 = 15$ decorations, and we choose 5 spots for the ribbons. In total, there are $\binom{15}{5} = 3003$ ways.

Exercise 2.4. We have 10 balloons and 5 ribbons. In how many ways can one arrange them in a line so that no two ribbons are next to each other?

Solution. We put spaces between (and around) the balloons, so that each ribbon can take one space:

There are 11 spaces and 5 ribbons, hence a total of $\binom{11}{5} = 462$ ways.

Example 2.5. (see 1.6, Proposition 6.2) How many nonnegative integer solutions $a_1, \dots, a_r \ge 0$ are there to the equation $a_1 + \dots + a_r = n$?

Solution. Same idea as the previous example: count orderings of $n$ units of 1 and $r - 1$ imaginary separators (separating $r$ sets of 1's). For example, for $n = 6$, $r = 5$, the string $1\,1 \mid 1 \mid\ \mid 1\,1\,1$ corresponds to the solution $a_1 = 2$, $a_2 = 1$, $a_3 = 0$, $a_4 = 3$, $a_5 = 0$. Therefore, there are $\binom{n+r-1}{r-1}$ such arrangements, corresponding to $\binom{n+r-1}{r-1}$ solutions.

Theorem 2.6 (The binomial theorem).
$$(x+y)^n = \sum_{k=0}^n \binom{n}{k} x^k y^{n-k}.$$

Proof. A purely algebraic proof goes by induction. We will present a combinatorial proof: write $(x+y)^n = (x+y) \cdots (x+y)$ and expand. We now have a long sum (actually, of $2^n$ terms), where each term in the sum is a product of $n$ factors (consisting of $x$'s and $y$'s). For example:
$$(x+y)^2 = (x+y)(x+y) = xx + xy + yx + yy = x^2 + 2xy + y^2,$$
$$(x+y)^3 = (x+y)(x+y)(x+y) = xxx + xxy + \dots + yyy = x^3 + 3x^2y + 3xy^2 + y^3.$$
To obtain the final, shorter sum, we combine like terms, i.e. terms with the same number of $x$ factors and the same number of $y$ factors (such as $xy$ and $yx$ into $2xy$ for $n = 2$, or $xxy$, $xyx$, and $yxx$ into $3x^2y$ for $n = 3$). How many like terms are equivalent to $x^k y^{n-k}$? One visual way to count them is by assigning $n$ different colors to the $n$ pairs of parentheses $(x+y)$. Then, when expanded, the like terms equivalent to $x^k y^{n-k}$ are formed by multiplying $k$ $x$'s of different colors and $n-k$ $y$'s of the remaining colors. The number of such terms is exactly the number of subsets of $k$ colors (to choose for the $x$ factors) out of $n$ possible colors. This number is $\binom{n}{k}$, and therefore the term $x^k y^{n-k}$ has the coefficient $\binom{n}{k}$ in front of it.

Corollary 2.7. One has
$$2^n = \sum_{k=0}^n \binom{n}{k}.$$

Proof. Two proofs:

1. Follows from the binomial theorem, by setting $x = y = 1$.
2. Another interpretation: the RHS and the LHS both count the total number of subsets of a set with $n$ elements, only in two different ways. Indeed, LHS: each element of a subset has 2 possibilities, to be or not to be (in the subset), hence $2 \cdot 2 \cdots 2 = 2^n$ possibilities. RHS: this time we sum (over $k$) the number of $k$-subsets of an $n$-set, which is $\binom{n}{k}$.

Multinomial coefficients (1.5)

Example 2.8. You own 12 cars, of which 5 are identical Ferraris, 3 are identical Bentleys, 2 are identical Lamborghinis and 2 are identical Ford Focuses. In how many ways can you arrange your cars for an exhibition (in one line)?

We already saw that the number of ways to order $n$ items of $r$ types, where $n_1, n_2, \dots, n_r$ are the numbers of items of each type, is $\frac{n!}{n_1!\,n_2! \cdots n_r!}$. We use the notation $\binom{n}{n_1, n_2, \dots, n_r}$ for this expression (also known as the multinomial coefficient). In this example, there are $\binom{12}{5,3,2,2} = \frac{12!}{5!\,3!\,2!\,2!}$ possible arrangements.

Another solution for the above problem: there are $\binom{12}{5}$ possibilities to place the 5 Ferraris; then $\binom{7}{3}$ possibilities to place the 3 Bentleys; then $\binom{4}{2}$ possibilities to place the 2 Lamborghinis; and finally $\binom{2}{2}$ possibilities to place the 2 Fords. By the multiplication principle, there are $\binom{12}{5}\binom{7}{3}\binom{4}{2}\binom{2}{2}$ possible arrangements. It is not hard to check that, indeed, $\binom{12}{5}\binom{7}{3}\binom{4}{2}\binom{2}{2} = \frac{12!}{5!\,3!\,2!\,2!}$.

Theorem 2.9 (The multinomial theorem).
$$(x_1 + x_2 + \dots + x_r)^n = \sum_{k_1 + \dots + k_r = n} \binom{n}{k_1, \dots, k_r} x_1^{k_1} x_2^{k_2} \cdots x_r^{k_r},$$
where the sum is over all $r$-tuples of nonnegative integers $k_1, \dots, k_r$ such that $k_1 + \dots + k_r = n$.

Proof. Similar to the binomial theorem.
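A short Python check of the two counts in Example 2.8 (a minimal sketch using only the standard library):

```python
from math import comb, factorial

# Multinomial coefficient (12 choose 5,3,2,2), computed two ways
direct = factorial(12) // (factorial(5) * factorial(3) * factorial(2) * factorial(2))
stepwise = comb(12, 5) * comb(7, 3) * comb(4, 2) * comb(2, 2)
print(direct, stepwise)  # both print 166320
```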


LECTURE 3

Sample space and events (2.2)

How do we define probability formally? This is not so simple; it was only done rigorously in the 1930s. We consider experiments with unpredictable outcomes:

Example 3.1. (1) Flipping two coins. (2) Rolling a die. (3) The temperature tomorrow at noon. (4) The result of the Super Bowl.

The set of all possible outcomes of an experiment forms its sample space:

Definition. Sample space: $S = \{\text{all possible outcomes}\}$.

Example 3.2. (1) Flipping two coins (order is important): $S = \{HH, HT, TH, TT\}$. (2) Rolling a die: $S = \{1, 2, \dots, 6\}$. (3) The temperature tomorrow at noon: $S = [60, 90]$ °F.

We are interested in understanding the likelihood that certain events will occur. Question: what is an event?

Definition. Any subset $E$ of the sample space $S$ is called an event.

Example 3.3. (1) Flipping T exactly one time: $E = \{HT, TH\}$. (2) Rolling an odd number: $E = \{1, 3, 5\}$. (3) The temperature at noon tomorrow is at least 75 °F: $E = [75, 90]$ °F.

Remark 3.4. Note that each experiment has exactly one outcome, but many different events can occur simultaneously. For example, if the outcome of one die roll is 5, then the events {at least 4} and {an odd number} both occur.

Operations on events (and relations between events)

Let $E, F$ be events in a sample space $S$. Then:

$E \subseteq F$: if $E$ occurs, then $F$ occurs. Example: $E = \{1\} \subseteq F = \{\text{an odd number}\}$.

$E = F$ (same event): $E \subseteq F$ and $F \subseteq E$; $E$ and $F$ both occur, or both don't occur. Example: $E = \{1, 3, 5\}$, $F = \{\text{an odd number}\}$.

Intersection $E \cap F = EF$: all outcomes that are both in $E$ and in $F$; $E \cap F$ occurs iff both $E$ and $F$ occur at the same time. Example: $E = \{\text{an even number}\}$, $F = \{\text{an odd number}\}$: $E \cap F = \emptyset$, never happens.

Union $E \cup F$: all outcomes that are either in $E$ or in $F$ (or both); $E \cup F$ occurs iff $E$ or $F$ occurs (or both). Example: $E = \{\text{an even number}\}$, $F = \{\text{an odd number}\}$: $E \cup F = \{1, 2, \dots, 6\} = S$, always happens.

Complement $E^c$: all outcomes not in $E$; $E^c$ occurs iff $E$ doesn't. Example: $E = \{\text{an even number}\}$, $F = \{\text{an odd number}\}$: $E = F^c$, $E^c = F$.

Difference $E \setminus F = E \cap F^c$: all outcomes in $E$ and not in $F$; $E \setminus F$ occurs iff $E$ occurs and $F$ doesn't. Example: $E = \{1, 3, 4, 6\}$, $F = \{\text{an even number}\}$: $E \setminus F = \{1, 3\}$.

(Each of these operations can be visualized with a Venn diagram.)

Basic Properties. Let $E, F, G$ be events. Then
(1) Commutative laws: $E \cup F = F \cup E$ and $E \cap F = F \cap E$.
(2) Associative laws: $(E \cup F) \cup G = E \cup (F \cup G)$ and $(E \cap F) \cap G = E \cap (F \cap G)$. In particular, this means that $E \cup F \cup G$ (respectively $E \cap F \cap G$) is well defined, since the order in which we take the operations does not matter.
(3) Distributive laws: $(E \cup F) \cap G = (E \cap G) \cup (F \cap G)$ and $(E \cap F) \cup G = (E \cup G) \cap (F \cup G)$.

Definition (Operations on multiple events). Let $E_1, E_2, \dots, E_n$ be events (possibly infinitely many).

Intersection: $\bigcap_{i=1}^n E_i = E_1 \cap E_2 \cap \dots \cap E_n$. Meaning: $\bigcap_{i=1}^n E_i$ occurs if and only if all of the events $E_i$ occur.

Union: $\bigcup_{i=1}^n E_i = E_1 \cup E_2 \cup \dots \cup E_n$. Meaning: $\bigcup_{i=1}^n E_i$ occurs if and only if at least one of the events $E_i$ occurs.

Theorem 3.5 (De Morgan's laws). Let $E, F$ and $E_1, E_2, \dots$ be events. Then
(a) $(E \cap F)^c = E^c \cup F^c$ and (b) $(E \cup F)^c = E^c \cap F^c$;
more generally, (a) $\left(\bigcap_{i=1}^n E_i\right)^c = \bigcup_{i=1}^n E_i^c$ and (b) $\left(\bigcup_{i=1}^n E_i\right)^c = \bigcap_{i=1}^n E_i^c$.

Proof. We prove only (a) (the other cases are similar), by drawing the Venn diagrams of the RHS and the LHS and showing that we end up with the same subset. [Venn diagrams omitted.]

Formal proof (optional): $x \in (E \cap F)^c$ if and only if $x \notin E \cap F$. Moreover, $x \notin E \cap F$ if and only if $x \notin E$ or $x \notin F$. Equivalently, $x \in E^c$ or $x \in F^c$ (or both), which means that $x \in E^c \cup F^c$.

Example 3.6. $S = \{1, 2, \dots, 6\}$, $E = \{\text{an even number}\}$, $F = \{\text{at least 4}\}$; then
$$E^c = \{\text{an odd number}\}, \quad F^c = \{\text{at most 3}\},$$

and
$$E \cap F = \{\text{even and at least 4}\} = \{4, 6\},$$
$$E \cup F = \{\text{even or at least 4}\} = \{2, 4, 5, 6\},$$
$$(E \cap F)^c = E^c \cup F^c = \{\text{an odd number or at most 3}\} = \{1, 2, 3, 5\},$$
$$(E \cup F)^c = E^c \cap F^c = \{\text{an odd number and at most 3}\} = \{1, 3\}.$$
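De Morgan's laws are easy to check mechanically with Python's set operations; here is a minimal sketch for Example 3.6:

```python
S = set(range(1, 7))
E = {2, 4, 6}          # an even number
F = {4, 5, 6}          # at least 4

Ec, Fc = S - E, S - F  # complements
assert S - (E & F) == Ec | Fc  # (E ∩ F)^c = E^c ∪ F^c
assert S - (E | F) == Ec & Fc  # (E ∪ F)^c = E^c ∩ F^c
print(S - (E & F), S - (E | F))  # {1, 2, 3, 5} {1, 3}
```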

LECTURE 4

Probability Spaces (2.3)

Recall: $S$ is the sample space (the set of all possible outcomes).

Definition. Events $E_1, E_2, \dots, E_n$ in $S$ are called mutually exclusive (or disjoint) if $E_j \cap E_k = \emptyset$ for $j \ne k$. The same definition works for infinitely many events.

Definition. A probability function on $S$ is an assignment of a number $P(E)$ to each event $E$, satisfying the following axioms:
(A1) $0 \le P(E) \le 1$.
(A2) $P(S) = 1$.
(A3) If $E_1, E_2, \dots$ are mutually exclusive events, then
$$P\left(\bigcup_{j=1}^\infty E_j\right) = \sum_{j=1}^\infty P(E_j).$$

Example 4.1. Flip a fair coin twice (order important). In this case, the sample space is $S = \{HH, HT, TH, TT\}$. What is the probability of getting tails exactly once, that is, $P(\{HT, TH\})$?

Common mistake: "we have three options, 2 tails, 1 tail or 0 tails, so the probability is 1/3."

"Fair coin" means that all outcomes are equally likely, that is, $P(\{HH\}) = P(\{TH\}) = P(\{HT\}) = P(\{TT\})$. Since the events $\{HH\}, \{HT\}, \{TH\}, \{TT\}$ are disjoint, axioms (A2) and (A3) imply that
$$1 = P(S) = P(\{HH\}) + P(\{HT\}) + P(\{TH\}) + P(\{TT\}) = 4\,P(\{HH\}),$$
and therefore $P(\{HH\}) = P(\{HT\}) = P(\{TH\}) = P(\{TT\}) = \frac14$. By axiom (A3):
$$P(\{HT, TH\}) = P(\{HT\}) + P(\{TH\}) = \frac24 = \frac12.$$

Example 4.2. We roll two fair dice. What is $P(\text{sum of the two dice is at least 11})$?

$S = \{(1,1), (1,2), (2,1), \dots, (5,6), (6,5), (6,6)\}$, or $S = \{1, 2, \dots, 6\}^2 = \{1, \dots, 6\} \times \{1, \dots, 6\}$. Since the dice are fair, for any $i, j \in \{1, \dots, 6\}$, $P((i,j)) = \frac1{36}$. We want to find $P(\text{sum of the two dice is at least 11}) = P(\{(5,6), (6,5), (6,6)\})$. We have
$$P(E) = P((5,6)) + P((6,5)) + P((6,6)) = \frac3{36} = \frac1{12}.$$

Properties of probability (2.4)

Proposition 4.3. $P(E^c) = 1 - P(E)$.

Proof. We have $S = E \cup E^c$. Since $E$ and $E^c$ are disjoint, it follows by axioms (A2) and (A3) that $1 = P(E) + P(E^c)$.

Example 4.4. Roll two dice. What is $P(\text{sum of the two dice is at most 10})$?

Solution. $P(\text{roll at most 10}) = 1 - P(\text{roll at least 11}) = 1 - \frac1{12} = \frac{11}{12}$.

Proposition 4.5. $E \subseteq F \implies P(E) \le P(F)$.

Proof. $F = E \cup (F \setminus E)$ (mutually exclusive), so $P(F) = P(E) + P(F \setminus E) \ge P(E)$ (axioms A1 and A3).

Proposition 4.6 (Inclusion-exclusion principle, basic case). $P(E \cup F) = P(E) + P(F) - P(E \cap F)$.

Proof. First, note that the events $E \setminus F$, $F \setminus E$, and $E \cap F$ are mutually exclusive, and hence (convince yourselves, using a Venn diagram):
(1) $P(E) = P(E \setminus F) + P(EF)$,
(2) $P(F) = P(F \setminus E) + P(EF)$, and
(3) $P(E \cup F) = P(E \setminus F) + P(F \setminus E) + P(EF)$.

Therefore, we have that
$$P(E) + P(F) - P(EF) = \big(P(E \setminus F) + P(EF)\big) + \big(P(F \setminus E) + P(EF)\big) - P(EF) \quad \text{[by (1) and (2)]}$$
$$= P(E \setminus F) + P(F \setminus E) + P(EF) = P(E \cup F) \quad \text{[by (3)]},$$
as claimed.

Proposition 4.7 (Inclusion-exclusion principle for 3 events). For any events $E_1, E_2, E_3$,
$$P(E_1 \cup E_2 \cup E_3) = P(E_1) + P(E_2) + P(E_3) - P(E_1E_2) - P(E_1E_3) - P(E_2E_3) + P(E_1E_2E_3).$$

Proof. Similar to the basic case.

The general inclusion-exclusion principle:
$$P(E_1 \cup \dots \cup E_n) = \sum_{i=1}^n P(E_i) - \sum_{i_1 < i_2} P(E_{i_1}E_{i_2}) + \sum_{i_1 < i_2 < i_3} P(E_{i_1}E_{i_2}E_{i_3}) - \dots + (-1)^{n+1} P(E_1 \cdots E_n),$$
where $\sum_{i_1 < i_2 < \dots < i_k}$ means that we sum over all possible $k$ different, ordered indices.

Example 4.8. Suppose Alice's probabilities of being accepted at UM and MSU are 0.35 and 0.9, respectively, and the probability of being accepted at both is 0.32. What is the probability that she is accepted at neither?

Solution. Let $A, B$ be the events that Alice is accepted at UM and MSU, respectively. Then
$$P((A \cup B)^c) = 1 - P(A \cup B) = 1 - \big(P(A) + P(B) - P(AB)\big) = 1 - (0.35 + 0.9 - 0.32) = 0.07.$$

Exercise 4.9. There are three people with three different hats. If each person takes a random hat from the coatroom, what is the probability that at least one person gets their own hat?

Solution. Let $E_i$ denote the event that person $i$ gets their own hat. The event that someone gets their hat back is $E_1 \cup E_2 \cup E_3$. Notice that $P(E_1) = P(E_2) = P(E_3) = \frac13$ (each person is equally likely to take any of the 3 possible hats, out of which only one is theirs, hence $\frac13$). Also note that if two people take their own hats, then the third person must also take their own hat (only their hat is left). In other words, $E_1E_2 = E_1E_3 = E_2E_3 = E_1E_2E_3$.

Therefore, $P(E_1E_2) = P(E_1E_3) = P(E_2E_3) = P(E_1E_2E_3) = \frac16$ (6 equally likely assignments of hats, out of which exactly one gives each person their own hat). By the inclusion-exclusion principle:
$$P(E_1 \cup E_2 \cup E_3) = 3 \cdot \frac13 - 3 \cdot \frac16 + \frac16 = \frac23.$$

Challenge: same question with $n$ people. (What happens when $n$ is very large, i.e. $n \to \infty$?) Answer: $1 - e^{-1}$.
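The 3-person answer is small enough to verify by enumerating all hat assignments; a minimal Python sketch:

```python
from itertools import permutations
from fractions import Fraction

# P(at least one of 3 people gets their own hat), over all 3! equally likely assignments
perms = list(permutations(range(3)))
hits = sum(any(p[i] == i for i in range(3)) for p in perms)
print(Fraction(hits, len(perms)))  # 2/3, matching inclusion-exclusion
```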

LECTURE 5

Sample spaces with equally likely outcomes (2.5)

So far, in all our examples, the outcomes of an experiment were all equally likely. Such sample spaces satisfy the following assumptions:
(1) The sample space is finite (so the outcomes can be labeled by numbers): $S = \{1, 2, \dots, N\}$.
(2) The elementary outcomes are equally likely: $P(\{1\}) = \dots = P(\{N\}) = \frac1N$.

Therefore, the probability of any event $E = \{i_1, i_2, \dots, i_k\}$ is
$$P(E) = \frac{\#\text{ of outcomes in } E}{\#\text{ of outcomes in } S} = \frac{|E|}{|S|}.$$

Example 5.1. We roll a die 3 times. What is the probability that it lands on 4 exactly once?

Solution. We choose the sample space $S$ of ordered results of three rolls, that is, $S = \{(1,1,1), (1,1,2), (1,2,1), (2,1,1), (1,1,3), \dots\}$. We have $|S| = 6^3$, where each outcome is equally likely. If $E$ is the event that the die lands on 4 exactly once, then $|E| = 3 \cdot 5^2$ (3 possibilities to choose in which roll the die lands on 4, and then $5^2$ possibilities for the remaining rolls). Thus $P(E) = \frac{3 \cdot 5^2}{6^3}$.

Remark 5.2. We could choose another sample space, namely, the set of unordered results, where each outcome indicates only the number of times the die has landed on each one of its sides: $S = \{\{1,1,1\}, \{1,1,2\}, \{1,1,3\}, \dots\}$. In this case, we have $|S| = \binom83 = 56$ (see Lecture 1, and the solution of Homework 1), and $|E| = 5 + \binom52 = 15$ (one roll lands on 4, and either the two remaining rolls are the same, which gives 5 possibilities, or they are different, which gives $\binom52 = 10$; combined we get 15). However, $P(E) \ne \frac{15}{56}$, because in this sample space the outcomes are not equally likely! For example, $P(\text{lands three times on 1}) \ne P(\text{lands once on 1, once on 2, and once on 3})$. Therefore, in this case one cannot use the formula $P(E) = \frac{|E|}{|S|}$.
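Both the count in Example 5.1 and the warning in Remark 5.2 can be checked by enumerating the ordered sample space; a minimal Python sketch:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=3))  # ordered rolls, all equally likely
exactly_one_four = sum(o.count(4) == 1 for o in outcomes)
print(Fraction(exactly_one_four, len(outcomes)))  # 75/216 = 25/72, not 15/56
```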

Exercise 5.3 (Lottery). Suppose that to win the lottery we have to pick 6 correct (and different) numbers out of 49 possible choices. What is the probability that we pick 5 correct numbers and one incorrect one?

Solution. The sample space $S$ consists of all possible subsets of 6 numbers, from the possible 49 choices. Therefore, $|S| = \binom{49}{6}$. We want to find $P(E)$, where $E$ = {5 correct numbers, 1 incorrect}, and all choices of 6 numbers are equally likely. To find $|E|$, we need to choose 5 correct numbers out of the 6 possible correct numbers, for which there are $\binom65$ combinations, and then choose 1 incorrect number out of the possible $43 = 49 - 6$ incorrect numbers, for which there are $\binom{43}1$ combinations. Thus, $|E| = \binom65\binom{43}1$. Finally,
$$P(E) = \frac{\binom65\binom{43}1}{\binom{49}6} = \frac{258}{13{,}983{,}816} \approx 1.8 \cdot 10^{-5}$$
(small, but much more likely than picking the correct 6, whose probability is $\frac1{\binom{49}6} \approx \frac1{14\text{ million}}$).

Example 5.4. If $n$ people are in a room, what is the probability that none of them was born on July 21st? (Assume birthdays are equally likely, and there are 365 days in a year.)
$$S = \{1, \dots, 365\} \times \dots \times \{1, \dots, 365\} = \{1, \dots, 365\}^n, \quad |S| = 365^n;$$
$$E = \{\text{not July 21st}\}^n, \quad |E| = 364^n; \quad P(E) = \left(\frac{364}{365}\right)^n.$$

Exercise 5.5. In a group of $n$ people, what is the probability that no two of them share a birthday? (Assume 365 days, equally likely.)

Solution. Clearly, if $n > 365$, then the probability is 0 (more people than birthdays). If $n \le 365$, the problem is similar to the elevator problem: let $E$ denote the event that no two share a birthday. Then, the number of outcomes in $E$ is $|E| = 365 \cdot 364 \cdots (365 - (n-1))$, and
$$P(E) = \frac{|E|}{|S|} = \frac{365 \cdot 364 \cdots (366 - n)}{365^n}.$$

Remark 5.6. One can check that already for $n = 23$, $P(E)$ becomes smaller than 0.5, which means that in a group of 23 people, it is likely that 2 people will share a birthday. With 50 people, the probability that two share a birthday already exceeds 0.95! (See the short computation at the end of this lecture.)

Exercise 5.7. A market study produces the following results:
(1) 80% of respondents drink coffee or tea or both;
(2) 60% of respondents drink coffee (they may also drink tea);
(3) 30% of respondents drink both coffee and tea.
What is the percentage of respondents drinking tea?

Solution. Pick a random respondent from {all respondents}; that is, all respondents are equally likely to be picked. Define $C$ = {drinks coffee}, $T$ = {drinks tea}. Then $C \cup T$ = {drinks coffee or tea}, $C \cap T$ = {drinks coffee and tea}. According to the given information, we have
$$P(C) = \frac{|C|}{|S|} = 0.6, \quad P(C \cup T) = \frac{|C \cup T|}{|S|} = 0.8, \quad P(C \cap T) = \frac{|C \cap T|}{|S|} = 0.3.$$
By the inclusion-exclusion principle, we have
$$P(C \cup T) = P(C) + P(T) - P(C \cap T) \implies 0.8 = 0.6 + P(T) - 0.3.$$
Therefore $P(T) = 0.5$, or 50%.
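The figures quoted in Remark 5.6 follow from a one-line product; here is a minimal Python sketch:

```python
# Birthday problem: P(no two of n people share a birthday)
def p_no_match(n: int) -> float:
    p = 1.0
    for k in range(n):
        p *= (365 - k) / 365
    return p

print(p_no_match(23))      # ~0.493 < 0.5
print(1 - p_no_match(50))  # ~0.970 > 0.95
```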


LECTURE 6

More nontrivial examples (2.5)

Exercise 6.1 (Quality control). A shipment has 20 items. 5 random items are tested for quality; if one or more defective items are detected, the shipment is rejected. If the shipment is known to have $k$ defective items, what is the probability that it is rejected?

Solution. The sample space is $S$ = {all possible samples of 5 items chosen from the 20}. So, $|S| = \binom{20}5$. The event in question is $E$ = {all samples of 5 items with at least one defective item}. It is easier to consider the complement event: $E^c$ = {all samples consisting of only good items}. To compute $|E^c|$ we count the number of subsets of 5 items, out of the possible $20 - k$ good items. That is, $|E^c| = \binom{20-k}5$. Therefore,
$$P(E) = 1 - P(E^c) = 1 - \frac{\binom{20-k}5}{\binom{20}5}.$$
For example, if $k = 5$, then $P(E) \approx 0.81$, while if $k = 2$, then $P(E) \approx 0.45$.

Exercise 6.2 (Quality control, continued). Now consider testing pixels of LCD screens with a million pixels. If a screen with 10 or more defective pixels is called a bad screen, find a sample size $n$ of pixels to be tested, so that we can guarantee at least a 90% chance of detecting a bad screen (detect = finding one or more defective pixels).

Solution. Suppose we have a bad screen with exactly 10 defective pixels (if we guarantee 90% detection of this bad screen, then we will clearly have at least a 90% chance of detecting bad screens with more than 10 defective pixels). From the solution of the previous problem, the probability of not finding any defective pixel in a sample of $n$ pixels from the screen is
$$P(E^c) = \frac{\binom{999{,}990}n}{\binom{1{,}000{,}000}n} = \frac{\frac{999{,}990!}{n!\,(999{,}990-n)!}}{\frac{1{,}000{,}000!}{n!\,(1{,}000{,}000-n)!}} = \frac{999{,}990 \cdot 999{,}989 \cdots (999{,}990 - (n-1))}{1{,}000{,}000 \cdot 999{,}999 \cdots (1{,}000{,}000 - (n-1))}.$$
We need to find $n$ such that $P(E^c) < 0.1$ (so that the probability $P(E)$ of finding defective pixels in the sample is at least 0.9). Note that
$$P(E^c) = \frac{999{,}990}{1{,}000{,}000} \cdot \frac{999{,}989}{999{,}999} \cdots \frac{999{,}990 - (n-1)}{1{,}000{,}000 - (n-1)} \le \left(\frac{999{,}990}{1{,}000{,}000}\right)^n,$$

and hence it suffices to find $n$ such that
$$\left(\frac{999{,}990}{1{,}000{,}000}\right)^n \le 0.1 \iff n \ln\left(\frac{999{,}990}{1{,}000{,}000}\right) \le \ln(0.1) \iff n \ge \frac{\ln(10)}{-\ln\left(\frac{999{,}990}{1{,}000{,}000}\right)} \approx 230{,}257.4.$$
Therefore, sampling $n = 230{,}258$ pixels guarantees at least a 90% chance of detecting bad screens.

Exercise 6.3 (GMAT practice problem). A fair coin is tossed 10 times. What is the probability that at least two consecutive heads appear?

Solution. We have $S$ = {all results of 10 coin flips, order important} = $\{H, T\}^{10}$, and hence $|S| = 2^{10} = 1024$. Denote the event that two consecutive heads appear by $E$, and consider
$$E^c = \{\text{no heads appear next to each other}\}.$$
We'll see two methods to find $|E^c|$:

Method 1: Put the tails in a row with spaces between them (and at the two ends), and count the ways to place heads in the possible spaces (similar to the balloons and ribbons examples):

# heads   # tails   # spaces   # ways
0         10        11         $\binom{11}0 = 1$
1         9         10         $\binom{10}1 = 10$
2         8         9          $\binom92 = 36$
3         7         8          $\binom83 = 56$
4         6         7          $\binom74 = 35$
5         5         6          $\binom65 = 6$

Note that it is not possible to have more than 5 heads (without two of them being adjacent), since then there would be more heads than spaces.

In total, there are $1 + 10 + 36 + 56 + 35 + 6 = 144$ combinations. Therefore,
$$P(E) = 1 - P(E^c) = 1 - \frac{|E^c|}{|S|} = 1 - \frac{144}{1024} \approx 0.86.$$

Method 2: Denote by $A_n$ the number of ways to order $n$ coins such that no two consecutive heads appear. Observe that:
(1) If the first coin is T, then there are $A_{n-1}$ ways to order the remaining tosses.
(2) If the first coin is H, then the second one must be T, and there are $A_{n-2}$ ways to order the remaining tosses.
In total: $A_n = A_{n-1} + A_{n-2}$. Since $A_1 = 2$ and $A_2 = 3$ (easy to check), one can iteratively find $A_3, A_4, \dots, A_{10}$, or $A_n$ for any $n$.

Remark 6.4. The numbers $A_1, A_2, \dots$ are also known as Fibonacci numbers.
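Method 2's recursion is a natural candidate for a few lines of code; the sketch below also double-checks the count against brute force over all $2^{10}$ sequences:

```python
from itertools import product

def a(n: int) -> int:
    """A_n = A_{n-1} + A_{n-2}, with A_1 = 2 and A_2 = 3."""
    prev, cur = 2, 3
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, cur = cur, prev + cur
    return cur

# Brute force over all 2^10 flip sequences
brute = sum("HH" not in "".join(s) for s in product("HT", repeat=10))
print(a(10), brute, 1 - a(10) / 2**10)  # 144 144 0.859375
```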


LECTURE 7

Conditional probability (3.1, 3.2)

Conditional probability is used in situations where partial information concerning the result of an experiment is available. We will later see that even with no partial information, the concept of conditional probability can be very helpful in computing probabilities.

Definition. Let $E, F$ be events, and suppose $P(F) > 0$. The conditional probability that $E$ occurs given that $F$ has occurred is
$$P(E \mid F) := \frac{P(E \cap F)}{P(F)}.$$
Note that, equivalently, $P(EF) = P(E \mid F)\,P(F)$.

Idea: now $F$ becomes the new sample space, so we normalize by $P(F)$.

Example 7.1. The ELISA test (mid-1980s) was used to screen donated blood for the presence of HIV. Among the people who are given this test, we can expect the following distribution (rows: $A_1$ = test positive, $A_2$ = test negative; columns: $B_1$ = HIV+, $B_2$ = HIV−; with row and column totals). Find $P(A_1 \mid B_2)$ and $P(B_1 \mid A_1)$.

[The numerical entries of the table were lost in extraction.]

Solution. $P(A_1 \mid B_2)$ is the proportion, among HIV− people, of those who test positive. Meaning: given that a person is HIV−, it is the probability that they will be tested positive. Similarly, one computes $P(B_1 \mid A_1) \approx 0.065$, which means that only 6.5% of the people who are tested positive are actually HIV+.

Example 7.2. A family has two children. (a) Given that at least one child is a boy, what is the probability that both children are boys? (b) Given that the older child is a boy, what is the probability that both children are boys?

Solution. The sample space is $S = \{BB, BG, GB, GG\}$ (listing the older child first). We assume all outcomes are equally likely. Let $E$ be the event that both children are boys.

(a) Let $F_1 = \{BB, BG, GB\}$ be the event that at least one child is a boy. Thus,
$$P(F_1) = \frac{|\{BB, BG, GB\}|}{|S|} = \frac34.$$
Hence,
$$P(E \mid F_1) = \frac{P(E \cap F_1)}{P(F_1)} = \frac{P(BB)}{P(F_1)} = \frac{1/4}{3/4} = \frac13.$$
(b) Let $F_2$ be the event that the older child is a boy. Then $F_2 = \{BB, BG\}$, and hence $P(F_2) = \frac24 = \frac12$. Therefore,
$$P(E \mid F_2) = \frac{P(E \cap F_2)}{P(F_2)} = \frac{P(BB)}{P(F_2)} = \frac{1/4}{1/2} = \frac12.$$

Exercise 7.3. Three cards are randomly selected without replacement from an ordinary deck of 52 cards. Compute the conditional probability that the third card selected is a spade, given that the 1st and 2nd cards were spades.

Solution. One can easily convince oneself, without the definition of conditional probability, that the probability is $\frac{11}{50}$ (why?). Using the definition of conditional probability: define $F$ = {1st and 2nd are spades} and $E$ = {third is a spade}. We have that
$$P(E \cap F) = P(\text{all three are spades}) = \frac{\binom{13}3}{\binom{52}3}.$$
$P(F)$ is the probability of getting 2 spades by selecting two cards, which is simply $\frac{\binom{13}2}{\binom{52}2}$. The conditional probability is thus
$$P(E \mid F) = \frac{\binom{13}3 / \binom{52}3}{\binom{13}2 / \binom{52}2} = \frac{11}{50}.$$

Exercise 7.4. The numbers 1, 2, 3, 4, 5, 6 are randomly written on the sides of a blank six-sided die. What is the probability that the sum of the numbers on each pair of opposite sides of the die is equal to 7?

Solution. Define $E_{16}$ = {1 and 6 on opposite sides}, and similarly define $E_{25}$, $E_{34}$. Note that $P(E_{16}E_{25}E_{34}) = P(E_{16}E_{25})$, since once 1 & 6 and 2 & 5 are on opposite sides, 3 & 4 must also be on opposite sides. So
$$P(E_{16}E_{25}E_{34}) = P(E_{16}E_{25}) = P(E_{16})\,P(E_{25} \mid E_{16}).$$

It is not hard to see that $P(E_{16}) = \frac15$. Indeed, put 1 on any side of the die; the opposite side is equally likely to show any number among $\{2, 3, 4, 5, 6\}$. Thus, the probability is $\frac15$. Moreover, to compute $P(E_{25} \mid E_{16})$, note that 2 can be anywhere among the remaining sides, and the side opposite to it is then equally likely to show any number among $\{3, 4, 5\}$; hence $P(E_{25} \mid E_{16}) = \frac13$. Therefore,
$$P(E_{16}E_{25}E_{34}) = P(E_{16}E_{25}) = P(E_{16})\,P(E_{25} \mid E_{16}) = \frac15 \cdot \frac13 = \frac1{15}.$$

A generalization of $P(EF) = P(E)\,P(F \mid E)$ is the following multiplication rule:

Proposition 7.5. We have:
$$P(E_1 E_2 \cdots E_n) = P(E_1)\,P(E_2 \mid E_1) \cdots P(E_n \mid E_1 \cdots E_{n-1}).$$

Proof. We simply write out the definition of conditional probability to get
$$P(E_1)\,P(E_2 \mid E_1) \cdots P(E_n \mid E_1 \cdots E_{n-1}) = P(E_1)\,\frac{P(E_1E_2)}{P(E_1)}\,\frac{P(E_1E_2E_3)}{P(E_1E_2)} \cdots \frac{P(E_1E_2 \cdots E_n)}{P(E_1 \cdots E_{n-1})} = P(E_1 \cdots E_n),$$
since the product telescopes.
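The $\frac1{15}$ in Exercise 7.4 can be confirmed by enumerating all $6!$ labelings of the faces; a minimal Python sketch, with faces paired as opposite in positions (0,1), (2,3), (4,5):

```python
from itertools import permutations
from fractions import Fraction

good = total = 0
for lab in permutations(range(1, 7)):  # all ways to write 1..6 on the six faces
    total += 1
    good += all(lab[i] + lab[j] == 7 for i, j in [(0, 1), (2, 3), (4, 5)])
print(Fraction(good, total))  # 1/15
```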


LECTURE 8

LTP (3.3)

Law of total probability (basic case). We have:
$$P(E) = P(E \mid F)\,P(F) + P(E \mid F^c)\,P(F^c).$$
The proof is straightforward: by the definition of conditional probability, we know that $P(E \mid F)\,P(F) = P(E \cap F)$ and $P(E \mid F^c)\,P(F^c) = P(E \cap F^c)$, hence the right-hand side is equal to $P(E \cap F) + P(E \cap F^c)$. Since $S = F \cup F^c$, the disjoint union of $EF$ and $EF^c$ is equal to $E$, which means that $P(E) = P(E \cap F) + P(E \cap F^c)$ (draw a Venn diagram to convince yourselves).

Theorem 8.1 (general LTP). Let $F_1, F_2, \dots, F_N$ be disjoint events such that
$$S = \bigcup_{k=1}^N F_k.$$
Then for any event $E$,
$$P(E) = \sum_{k=1}^N P(E \mid F_k)\,P(F_k).$$

Remark 8.2. The LTP also works for infinitely many events $F_1, F_2, \dots$.

Example 8.3. What percentage of Delta flights arrive on time, if:
(1) 70% of flights depart on time (same as: 30% depart late);
(2) 80% of flights that depart on time arrive on time;
(3) 90% of flights that depart late arrive late (same as: 10% of them arrive on time)?

We select a random flight, and define the events $D$ = {departs on time} and $A$ = {arrives on time}. In terms of $A$ and $D$, the given information states that
$$P(D) = 0.7, \quad P(D^c) = 0.3, \quad P(A \mid D) = 0.8, \quad P(A^c \mid D^c) = 0.9, \quad P(A \mid D^c) = 0.1.$$

By the LTP, it follows that
$$P(A) = P(A \mid D)\,P(D) + P(A \mid D^c)\,P(D^c) = 0.8 \cdot 0.7 + 0.1 \cdot 0.3 = 0.59.$$
Therefore, the final answer is: 59%.

Bayes' formula (3.3)

Idea: to relate $P(E \mid F)$ with $P(F \mid E)$. How? We know that $P(F \mid E)\,P(E) = P(E \cap F) = P(E \mid F)\,P(F)$, which implies:

Theorem 8.4 (Bayes' formula). If $P(E), P(F) > 0$, then
$$P(F \mid E) = \frac{P(E \mid F)\,P(F)}{P(E)} \overset{\text{LTP}}{=} \frac{P(E \mid F)\,P(F)}{P(E \mid F)\,P(F) + P(E \mid F^c)\,P(F^c)}.$$

Example 8.5. A certain disease affects 1% of the population. 5% of the non-ill population will be tested positive, while 10% of the ill will be tested negative. What is the probability that a person tested positive is actually sick with this disease?

Solution. Define $D$ = {has disease}, $T$ = {tests positive}. We want to find $P(D \mid T)$. The probability that a non-ill person is tested positive: $P(T \mid D^c) = 0.05$. The probability that an ill person is tested negative: $P(T^c \mid D) = 0.1$. Using Bayes' formula,
$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D^c)\,P(D^c)} = \frac{0.9 \cdot 0.01}{0.9 \cdot 0.01 + 0.05 \cdot 0.99} \approx 0.15,$$
which is very small. Intuition: the disease is rare, so almost all positive tests are of non-ill people. For example, among 2000 people, about 20 are ill, and 1980 are not. Among the 20 ill, 18 will be tested positive, and among the healthy, 99 will be tested positive, so the probability is about $\frac{18}{18 + 99} \approx 0.15$.

In some cases, conditional probability is useful, even if partial information is not specified in the problem:

Exercise 8.6. There are 15 tennis balls in a box, only nine of which are new balls. In the first round, three of the balls are randomly selected, played with, and then returned to the box. Later, in the second round, another three balls are randomly selected from the box. Find the probability that all three balls in the second selection are new.

Solution. Let $E$ denote the event that all three balls in the second round are new. For $i = 0, 1, 2, 3$, let $F_i$ denote the event that $i$ new balls have been selected in the first round. Note that

$F_i, F_j$ are disjoint for $i \ne j$, and satisfy $S = \bigcup_{i=0}^3 F_i$. We now use the LTP to write
$$P(E) = \sum_{i=0}^3 P(E \mid F_i)\,P(F_i).$$
Thus, the solution is divided into a few simpler parts, namely, the computations of $P(E \mid F_i)$ and $P(F_i)$:
$$P(F_i) = \frac{\binom9i\binom6{3-i}}{\binom{15}3}$$
(choosing $i$ new balls from the 9 new balls, and $3 - i$ old balls from the 6 old balls).
The probability $P(E \mid F_i)$ is the probability that all three balls in the second selection are new, where given $F_i$ there are $9 - i$ new balls (and $6 + i$ used balls). Therefore,
$$P(E \mid F_i) = \frac{\binom{9-i}3}{\binom{15}3}.$$
Plugging back into the LTP formula, we get
$$P(E) = \sum_{i=0}^3 \frac{\binom{9-i}3}{\binom{15}3} \cdot \frac{\binom9i\binom6{3-i}}{\binom{15}3} = \frac1{\binom{15}3^2} \sum_{i=0}^3 \binom{9-i}3\binom9i\binom6{3-i} \approx 0.089.$$
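The final sum is easy to evaluate with `math.comb`; a minimal Python sketch:

```python
from math import comb

c = comb(15, 3)
p = sum(comb(9 - i, 3) / c                 # P(E | F_i)
        * comb(9, i) * comb(6, 3 - i) / c  # P(F_i)
        for i in range(4))
print(p)  # ~0.0893
```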


LECTURE 9

Independent events (3.4)

Intuitive definition: knowing that $E$ has occurred doesn't affect the probability that $F$ occurs, and vice versa:
$$P(E \mid F) = P(E), \quad \text{or} \quad P(F \mid E) = P(F).$$

Example 9.1. Draw a card from a standard deck of 52 cards:
$$P(\text{king} \mid \text{heart}) = P(\text{king}) = \frac1{13} \quad \text{and} \quad P(\text{heart} \mid \text{king}) = P(\text{heart}) = \frac14.$$

Note that both $P(E \mid F) = P(E)$ and $P(F \mid E) = P(F)$ are equivalent to $P(E)\,P(F) = P(E \cap F)$, since, for example,
$$P(E) = P(E \mid F) \iff P(E) = \frac{P(E \cap F)}{P(F)} \iff P(E)\,P(F) = P(E \cap F).$$
For this reason we define:

Definition 9.2. Events $E, F$ are called independent if $P(E)\,P(F) = P(E \cap F)$.

Exercise 9.3. Bob and Alice do not know each other, but both need to take an advanced math course next semester. The probability that Alice will take MATH 425 is 0.8. On the other hand, Bob has decided that a coin flip will determine whether he takes MATH 425 or not. What is the probability that both of them will be taking MATH 425?

Solution. By independence (they do not know each other),
$$P(\{\text{Alice takes 425}\} \cap \{\text{Bob takes 425}\}) = P(\{\text{Alice takes 425}\})\,P(\{\text{Bob takes 425}\}) = 0.8 \cdot 0.5 = 0.4.$$

We may also define independence for any sequence of events: we say that $E_1, E_2, E_3$ are independent if $P(E_1E_2E_3) = P(E_1)\,P(E_2)\,P(E_3)$ and every pair of events is independent: $P(E_iE_j) = P(E_i)\,P(E_j)$ for all $i \ne j$.

We say that $E_1, \dots, E_4$ are independent if $P(E_1E_2E_3E_4) = P(E_1)\,P(E_2)\,P(E_3)\,P(E_4)$ and every collection of 3 events among $\{E_1, \dots, E_4\}$ is independent. Iteratively, we say that $E_1, \dots, E_n$ are independent if $P(E_1E_2 \cdots E_n) = P(E_1) \cdots P(E_n)$ and every collection of $n - 1$ events among $\{E_1, \dots, E_n\}$ is independent.

Remark 9.4. Sometimes we know only that the pairs are independent; this is called pairwise independence, and it is a weaker property that does not imply independence. For example:

Example 9.5. We roll 2 fair dice. Let
$E_1$ = {first die shows 4}, $E_2$ = {second die shows 4}, $E_3$ = {sum of the dice is 7}.
It is easy to verify that $E_1, E_2$ are independent, $E_1, E_3$ are independent, and $E_2, E_3$ are independent. However, $E_1, E_2, E_3$ are not independent.

Exercise 9.6 (Ross, Example 4g). A system composed of $n$ separate components is said to be a parallel system if it functions when at least one of the components functions. For such a system, if component $i$, which is independent of the other components, functions with probability $p_i$, $i = 1, \dots, n$, what is the probability that the system functions?

Solution. Let $A_i$ denote the event that component $i$ functions. Then
$$P(\text{system functions}) = 1 - P(\text{system does not function}) = 1 - P\left(\bigcap_{i=1}^n A_i^c\right).$$
By independence, we have that
$$P(\text{system functions}) = 1 - \prod_{i=1}^n (1 - p_i).$$
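A minimal Python sketch checking the parallel-system formula against brute-force enumeration of component states (the reliabilities 0.9, 0.8, 0.7 are made-up values for illustration):

```python
from itertools import product

ps = [0.9, 0.8, 0.7]  # assumed component reliabilities

closed = 1.0
for p in ps:
    closed *= 1 - p
closed = 1 - closed  # 1 - prod(1 - p_i)

# Brute force: total probability of all configurations with at least one working component
brute = 0.0
for states in product([0, 1], repeat=len(ps)):
    prob = 1.0
    for s, p in zip(states, ps):
        prob *= p if s else 1 - p
    if any(states):
        brute += prob
print(closed, brute)  # both 0.994
```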

Exercise 9.7. Show that if $A, B$ are independent, then $A^c, B$ are independent (which also implies that $A, B^c$ are independent and $A^c, B^c$ are independent).

Solution. We have that
$$P(B) = P(A^cB) + P(AB) \overset{\text{indep.}}{=} P(A^cB) + P(A)\,P(B),$$
which implies
$$P(A^cB) = (1 - P(A))\,P(B) = P(A^c)\,P(B).$$

Exercise 9.8. Let $A, B, C, D$ be independent. Show that $A \cup B$ and $C \cup D$ are also independent.

Solution. From the discussion above, we know $A^c, B^c, C^c, D^c$ are independent. Therefore, $A^c \cap B^c$ and $C^c \cap D^c$ are also independent (why?). This implies that $A \cup B = (A^c \cap B^c)^c$ and $C \cup D = (C^c \cap D^c)^c$ are also independent.

Applying the ideas from the previous two exercises leads to the following general properties of independence. Let $E_1, E_2, \dots$ be independent events. Then:
(1) Replacing some of the $E_i$ by $E_i^c$ preserves independence.
(2) Intersections of different events among $\{E_1, \dots, E_n\}$ are independent. For example, $E_1 \cap E_2$ and $E_3 \cap E_4 \cap E_5$ are independent.
(3) Unions of different events among $\{E_1, \dots, E_n\}$ are independent. For example, $E_1 \cup E_2$ and $E_3 \cup E_4$ are independent.
(4) Any combinations of the above (where each event $E_i$ appears at most once) are independent. For example, $((E_1 \cup E_2) \cap E_3^c)^c$, $E_4 \cup E_5$, and $E_6$ are independent.

Exercise 9.9. We want to drive from A to C through B. There are two roads connecting A to B and two other roads connecting B to C. Because of road construction, each of the four roads may be closed with probability $p$, independently of the other roads. What is the probability that we can reach C?

Solution. Define the events $R_1, R_2$ = {road 1 (resp. 2) from A to B is open} and $R_3, R_4$ = {road 3 (resp. 4) from B to C is open}. Then
$$P(\text{can reach C}) = P(\text{can go from A to B and then from B to C}) = P((R_1 \cup R_2) \cap (R_3 \cup R_4)).$$
By the previous proposition, we have
$$P((R_1 \cup R_2) \cap (R_3 \cup R_4)) = P(R_1 \cup R_2)\,P(R_3 \cup R_4) = \big(1 - P((R_1 \cup R_2)^c)\big)\big(1 - P((R_3 \cup R_4)^c)\big)$$
$$= \big(1 - P(R_1^c R_2^c)\big)\big(1 - P(R_3^c R_4^c)\big) = \big(1 - P(R_1^c)\,P(R_2^c)\big)\big(1 - P(R_3^c)\,P(R_4^c)\big) = \left(1 - p^2\right)^2.$$
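A Monte Carlo sanity check of the answer $(1 - p^2)^2$; the value $p = 0.3$, the seed, and the trial count are arbitrary choices for the sketch:

```python
import random

p, trials = 0.3, 100_000
random.seed(0)
hits = 0
for _ in range(trials):
    open_ = [random.random() > p for _ in range(4)]  # each road closed with prob. p
    hits += (open_[0] or open_[1]) and (open_[2] or open_[3])
print(hits / trials, (1 - p**2) ** 2)  # both ~0.828
```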

LECTURE 10

Series of independent events (3.4)

Exercise. Independent trials consisting of rolling a pair of fair dice are performed (indefinitely). What is the probability that an outcome of 5 appears before an outcome of 7, when the outcome of a roll is the sum of the dice?

Solution. Since, on any roll, 5 occurs with probability 4/36, and 7 with probability 6/36, it seems intuitive that the odds that a 5 appears before a 7 should be 6 to 4 against. The probability should then be 4/10. Indeed, one can solve the problem directly, by defining the event $E_n$ that neither 5 nor 7 occurs in the first $n - 1$ trials, and 5 occurs in the $n$th trial. Then $P\left(\bigcup_{n=1}^\infty E_n\right) = \sum_n P(E_n)$ is what we are looking for, which can be computed using independence (try!).

Instead, we use conditional probability. Let $E$ denote the event that 5 (that is, the sum 5) appears before 7. Let $F_1$ denote the event that the outcome of the first roll is 5, let $F_2$ denote the event that the outcome of the first roll is 7, and let $F_3$ denote the event that the outcome of the first roll is neither 5 nor 7. By the LTP we have that
$$P(E) = P(E \mid F_1)\,P(F_1) + P(E \mid F_2)\,P(F_2) + P(E \mid F_3)\,P(F_3).$$
Notice that $P(E \mid F_1) = 1$ and $P(E \mid F_2) = 0$, and that, by independence, $P(E \mid F_3) = P(E)$. Moreover, an easy computation shows that $P(F_1) = \frac4{36}$, $P(F_2) = \frac6{36}$, and $P(F_3) = \frac{26}{36}$. Thus,
$$P(E) = \frac4{36} + \frac{26}{36}\,P(E) \implies \frac{10}{36}\,P(E) = \frac4{36} \implies P(E) = \frac4{10} = \frac25.$$

At home: similar arguments show that if $E$ and $F$ are mutually exclusive events of an experiment, then, when independent trials of the experiment are performed, the event $E$ will occur before the event $F$ with probability
$$\frac{P(E)}{P(E) + P(F)}.$$
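A short simulation of the 5-before-7 experiment (seed and trial count are arbitrary):

```python
import random

random.seed(1)
wins, trials = 0, 100_000
for _ in range(trials):
    while True:
        s = random.randint(1, 6) + random.randint(1, 6)
        if s == 5:
            wins += 1
            break
        if s == 7:
            break
print(wins / trials)  # ~0.4 = 2/5
```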

Conditional probability as a probability (3.5)

Conditional probabilities satisfy all the axioms of a probability function. Namely, let $F$ be an event with $P(F) > 0$. Then $P(\,\cdot \mid F)$ is a probability function, satisfying $0 \le P(E \mid F) \le 1$, $P(S \mid F) = 1$, and
$$P\left(\bigcup_{i=1}^n E_i \,\Big|\, F\right) = \sum_{i=1}^n P(E_i \mid F)$$
for any disjoint events $E_1, E_2, \dots, E_n$. This means that all the results/formulas we have seen so far also hold for conditional probabilities. For example, the inclusion-exclusion relation:
$$P(E_1 \cup E_2 \mid F) = P(E_1 \mid F) + P(E_2 \mid F) - P(E_1E_2 \mid F).$$
Another useful relation is the LTP for conditional probabilities. Denote $P_F(E) := P(E \mid F)$. Then, for any event $G$ we have
(4) $P_F(E) = P_F(E \mid G)\,P_F(G) + P_F(E \mid G^c)\,P_F(G^c)$.
To understand what this formula means in terms of the original probability $P$, note that
$$P_F(E \mid G) = \frac{P_F(E \cap G)}{P_F(G)} = \frac{P(EG \mid F)}{P(G \mid F)} = \frac{P(EGF)/P(F)}{P(FG)/P(F)} = \frac{P(EFG)}{P(FG)} = P(E \mid FG),$$
which means that (4) is equivalent to
$$P(E \mid F) = P(E \mid FG)\,P(G \mid F) + P(E \mid FG^c)\,P(G^c \mid F).$$

Exercise. One at a time, we turn over cards from a shuffled standard deck of 52 cards. Given that the first card is a queen, what is the probability that the second card is the queen of spades?

Solution. Let $E$ denote the event that the second card is the queen of spades, let $F$ denote the event that the first card is a queen, and let $G$ denote the event that the first card is the queen of spades. Then
$$P(E \mid F) = P(E \mid FG)\,P(G \mid F) + P(E \mid FG^c)\,P(G^c \mid F).$$
Clearly, we have that $P(E \mid FG) = 0$ (given that the first card is the queen of spades, the second card cannot be the queen of spades). Moreover, $P(E \mid FG^c) = \frac1{51}$ (there are 51 cards left, including the queen of spades), and $P(G^c \mid F) = \frac34$ (3 possible queens out of 4). Thus,
$$P(E \mid F) = 0 \cdot \frac14 + \frac1{51} \cdot \frac34 = \frac1{68}.$$

Conditional independence.

Definition. We say that $E_1$ and $E_2$ are conditionally independent given $F$ if $P_F(E_1 \mid E_2) = P_F(E_1)$ or, equivalently, $P_F(E_1E_2) = P_F(E_1)\,P_F(E_2)$. In terms of $P$: $E_1$ and $E_2$ are conditionally independent given $F$ if $P(E_1 \mid E_2F) = P(E_1 \mid F)$ or, equivalently, if $P(E_1E_2 \mid F) = P(E_1 \mid F)\,P(E_2 \mid F)$.

Exercise (Ross, Example 5a). The probability that a policyholder is an accident-prone person is 0.3. During any given year, an accident-prone person will have an accident with probability 0.4 (independently of other years), whereas a person who is not prone to accidents will have an accident with probability 0.2 (again, independently of other years). What is the conditional probability that a new policyholder will have an accident in their second year of policy ownership, given that they had an accident in the first year?

Solution. Let $A$ be the event that the policyholder is accident prone, and let $A_1$ (resp. $A_2$) be the event that the policyholder has had an accident in the 1st (resp. 2nd) year. We want to find $P(A_2 \mid A_1)$. We condition on whether the policyholder is accident prone or not:
$$P(A_2 \mid A_1) = P(A_2 \mid AA_1)\,P(A \mid A_1) + P(A_2 \mid A^cA_1)\,P(A^c \mid A_1).$$
By Bayes' formula, we have
$$P(A \mid A_1) = \frac{P(A_1 \mid A)\,P(A)}{P(A_1 \mid A)\,P(A) + P(A_1 \mid A^c)\,P(A^c)} = \frac{0.4 \cdot 0.3}{0.4 \cdot 0.3 + 0.2 \cdot 0.7} = \frac{0.12}{0.26} = \frac6{13},$$
and hence $P(A^c \mid A_1) = \frac7{13}$. By conditional independence, we have $P(A_2 \mid AA_1) = P(A_2 \mid A) = 0.4$ and $P(A_2 \mid A^cA_1) = P(A_2 \mid A^c) = 0.2$. Thus,
$$P(A_2 \mid A_1) = 0.4 \cdot \frac6{13} + 0.2 \cdot \frac7{13} = \frac{3.8}{13} \approx 0.29.$$


LECTURE 11

Random Variables (4.1)

Definition. A random variable (r.v.) $X$ is a real-valued function on the sample space $S$.

Example. (a) We toss a fair coin 3 times. Consider the following assignment of numbers to each one of the outcomes in the sample space:
$$HHH \mapsto 3,\ HHT \mapsto 2,\ HTH \mapsto 2,\ HTT \mapsto 1,\ THH \mapsto 2,\ THT \mapsto 1,\ TTH \mapsto 1,\ TTT \mapsto 0.$$
This assignment is the r.v. describing the number of heads tossed.

Example. (b) Roll a fair die once. Then $S = \{1, 2, 3, 4, 5, 6\}$. Let $E$ = {we roll an even number}. Consider the function $X : S \to \mathbb{R}$ defined by
$$X(s) = \begin{cases} 1, & s \in E \\ 0, & s \notin E. \end{cases}$$
This function is the r.v. indicating whether $E$ has occurred or not.

More examples: (c) the age of a randomly chosen person from a group; (d) the delay time of a flight; (e) a molecule's velocity at a given moment.

Notation:
$$P(X = x) = P(\{s \in S : X(s) = x\}), \quad P(X \in A) = P(\{s \in S : X(s) \in A\}),$$
and similarly $P(X < a)$, $P(a < X \le b)$, etc.

Examples: Consider example (a) again.
(1) $P(X = 0) = P(TTT) = \frac18$.
(2) $P(X \in [2, 3]) = P(\{\text{tossed at least two heads}\}) = \frac12$.
(3) $P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = \frac78$.

Discrete r.v.s (4.2)

Definition. A r.v. $X$ is called discrete if $X$ takes on a finite or countable number of values $x_1, x_2, \dots$.

Examples (a), (b) and (c) above describe discrete r.v.s, but the other examples do not.

The probability mass function

Definition (pmf). Let $X$ be a discrete r.v. The probability mass function (pmf) of $X$ is defined as
$$p_X(x) = P(X = x), \quad x \in \mathbb{R}.$$

Properties: Suppose the range of $X$ is $I = \{x_1, x_2, \dots\}$. Then the following holds: if $A \subseteq I$ then
$$P(X \in A) = P\left(\bigcup_{x_i \in A} \{X = x_i\}\right) = \sum_{x_i \in A} p_X(x_i).$$
In particular: $\sum_{x_i \in I} p_X(x_i) = 1$.

Example. Back to example (a): $X$ = # heads tossed. The pmf of $X$ is:
$$p_X(0) = \frac18, \quad p_X(1) = \frac38, \quad p_X(2) = \frac38, \quad p_X(3) = \frac18.$$

Exercise. $n$ tosses of a fair coin; $X$ = # heads in the $n$ coin flips. Find the pmf of $X$.

Solution. We have $p(k) = \binom nk \left(\frac12\right)^n$ for $k = 0, 1, 2, \dots, n$. (Here $\binom nk$ is the number of ways to choose which $k$ tosses out of $n$ are heads, and $1/2^n$ is the probability of each such outcome.)

Cumulative distribution function (CDF)

Definition. The cdf (cumulative distribution function) of a r.v. $X$ is the function $F : \mathbb{R} \to [0, 1]$ defined by
$$F(x) = P(X \le x) = P(\{s : X(s) \le x\}).$$

Properties: The following holds.
(1) $F$ is non-decreasing.
(2) $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$.

Example. Again, 3 coin flips (example (a) above). The CDF is:
$$F(x) = \begin{cases} 0, & x < 0 \\ 1/8, & 0 \le x < 1 \\ 1/2, & 1 \le x < 2 \\ 7/8, & 2 \le x < 3 \\ 1, & 3 \le x. \end{cases}$$

Connection between cdf and pmf

Given one of the functions (pmf or cdf), one can calculate the other function:

pmf → cdf: If one knows the pmf of a r.v., then the cdf can be recovered by:
$$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i) = \sum_{x_i \le x} p_X(x_i).$$

cdf → pmf: Similarly, given the cdf of $X$, one can recover the pmf of $X$ by observing that each point $x_i$ where $F$ jumps (a discontinuity point) is a point for which $p_X(x_i) \ne 0$. The value $p_X(x_i)$ is exactly the height of the jump.

Exercise. Let $X$ be a r.v. with cdf
$$F(x) = \begin{cases} 0, & x < -1 \\ \frac13, & -1 \le x < 1 \\ 1, & x \ge 1. \end{cases}$$
Find the pmf of $X$. Can you come up with an underlying story that explains the sample space?

Solution. There are two jumps, one at $-1$ and one at $1$. The jump at $-1$ is $F(-1) - F(-1^-) = \frac13$, and the jump at $1$ is $F(1) - F(1^-) = \frac23$. We thus have $p_X(-1) = 1/3$, $p_X(1) = 2/3$, and $p_X(x) = 0$ for $x \ne -1, 1$. A suitable story for this r.v. can be: tossing a biased coin.

Remark. The pmf and the cdf of $X$ are also referred to as the distribution of $X$.

Exercise (if time permits). The probability mass function of a random variable $X$ is given by $p(i) = c\,\frac{\lambda^i}{i!}$, $i = 0, 1, 2, \dots$, where $\lambda$ is some positive value. Find (a) $P(X = 0)$ and (b) $P(X > 2)$.

Solution. We have $1 = \sum_{i=0}^\infty p(i) = c \sum_{i=0}^\infty \frac{\lambda^i}{i!} = c\,e^\lambda$, and hence $c = e^{-\lambda}$. Therefore,
$$P(X = 0) = p(0) = e^{-\lambda}$$
and
$$P(X > 2) = 1 - P(X \le 2) = 1 - P(X = 0) - P(X = 1) - P(X = 2) = 1 - e^{-\lambda}\left(1 + \lambda + \frac{\lambda^2}2\right).$$
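The pmf/cdf correspondence is mechanical; a minimal Python sketch for the 3-coin-flip example:

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(x):
    # F(x) = sum of the pmf over values <= x
    return sum(p for xi, p in pmf.items() if xi <= x)

print([cdf(x) for x in range(4)])  # [1/8, 1/2, 7/8, 1]

# Recover the pmf from the jumps of the cdf
recovered = {x: cdf(x) - cdf(x - 1) for x in range(4)}
print(recovered == pmf)  # True
```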

LECTURE 12

Review of practice problems on Ch. 1-3 before Exam 1.


LECTURE 13

Expected Value (4.3)

We start with an example, followed by the definition.

Example. The number of apartments in Ann Arbor, by number of bedrooms, is given in the following table:

Bedrooms:   0       1       2       3       4
Apts:       1,200   2,572   3,307   3,198   1,806

The average number of bedrooms in AA is thus:
$$\frac{\text{total \# of bedrooms}}{\text{total \# of apts}} = \frac{0 \cdot 1200 + 1 \cdot 2572 + 2 \cdot 3307 + 3 \cdot 3198 + 4 \cdot 1806}{12{,}083} = \frac{26{,}004}{12{,}083} \approx 2.15.$$

Probabilistic view: The experiment: select an apartment at random. Define the r.v. $X$ = the number of bedrooms in the apartment. The pmf of $X$ is
$$p_X(0) = \frac{1200}{12{,}083}, \quad p_X(1) = \frac{2572}{12{,}083}, \quad p_X(2) = \frac{3307}{12{,}083}, \quad p_X(3) = \frac{3198}{12{,}083}, \quad p_X(4) = \frac{1806}{12{,}083}.$$
The average above can be written in terms of $p_X$ as:
$$2.15 \approx 0\,p_X(0) + 1\,p_X(1) + 2\,p_X(2) + 3\,p_X(3) + 4\,p_X(4).$$
This leads to the following definition:

Definition. Let $X$ be a discrete r.v. with range $I = \{x_1, x_2, \dots\}$. The expected value of $X$ is then:
$$E(X) = \sum_{x \in I} x\,p(x) = \sum_k x_k\,p(x_k).$$

Remark. If the range of $X$ is $\{x_1, x_2, \dots, x_N\}$, and all the values of $X$ are equally likely, then $p(x_i) = \frac1N$ and
$$E(X) = \frac1N \sum_{i=1}^N x_i = \text{the usual average}.$$
In general, $E(X)$ is the weighted average which puts more weight on likelier values.

Example. Toss a fair coin 3 times. Let $X$ = # heads. Then
$$E(X) = 0 \cdot \frac18 + 1 \cdot \frac38 + 2 \cdot \frac38 + 3 \cdot \frac18 = 1.5.$$

Exercise (Lottery). To win the lottery one needs to guess 6 out of 49 numbers (order irrelevant). Here's a table describing the prizes:

# correct    Prize
6            $1,200,000
5            $800
4            $35
3 or less    $0

Is it advantageous to buy a 14¢ lottery ticket?

Solution. We compute the prize money's expected value. First, we find the pmf:
$$p(1{,}200{,}000) = \frac1{\binom{49}6}, \quad p(800) = \frac{\binom65\binom{43}1}{\binom{49}6}, \quad p(35) = \frac{\binom64\binom{43}2}{\binom{49}6},$$
$$p(0) = 1 - p(35) - p(800) - p(1{,}200{,}000).$$
Now, we have
$$E(X) = 1{,}200{,}000\,p(1{,}200{,}000) + 800\,p(800) + 35\,p(35) + 0\,p(0) \approx \$0.13.$$
Therefore, on average, we lose a cent! (Note that this does not mean that people are not willing to take the risk of losing a cent or two, in the hopes of winning $1 million...)

Properties of Expectation

Expectation has the following basic and very useful properties:
(1) $E(X_1 + X_2 + \dots + X_n) = E(X_1) + \dots + E(X_n)$ for any r.v.s $X_1, \dots, X_n$ (we will prove it in the future).
(2) $E(aX) = a\,E(X)$ for a r.v. $X$, and $a \in \mathbb{R}$.
(3) $E(b) = b$ for a constant r.v. $X \equiv b$, $b \in \mathbb{R}$.
(4) $E(X + c) = E(X) + c$ for any r.v. $X$ and constant $c \in \mathbb{R}$.
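Returning to the lottery exercise, the expected prize is quick to compute exactly; a minimal Python sketch:

```python
from math import comb

c = comb(49, 6)
pmf = {
    1_200_000: 1 / c,
    800: comb(6, 5) * comb(43, 1) / c,
    35: comb(6, 4) * comb(43, 2) / c,
}
expected_prize = sum(prize * p for prize, p in pmf.items())
print(expected_prize)  # ~0.134 dollars: a 14-cent ticket loses about a cent on average
```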

Example (Group testing). We need to test the blood of $n$ people for a rare disease (syphilis, in men drafted during WWII); $p = P(\text{positive})$ (assume that a person has the disease independently of other people).

Method 1: test all people individually. This means we always run exactly $n$ tests.

Method 2: divide the $n$ people into groups of $k$ people, mix the blood samples of each group, and test each mixed sample (group testing). If a test is negative, then no more tests are needed for that group (meaning: only 1 test was needed). If a test is positive, we test all $k$ people in the group individually (meaning: $k + 1$ tests were needed).

Question: what is the expected number of tests in Method 2?

Solution. Divide the $n$ people into $n/k$ groups of $k$ (assume that $k$ divides $n$). Define $X = X_1 + X_2 + \dots + X_{n/k}$, where $X_i$ = # of tests needed for group $i$, with possible values 1 and $k + 1$. We have:
$$P(X_i = 1) = P(\text{all } k \text{ people in group } i \text{ do not have the disease}) = (1 - p)^k$$
and
$$P(X_i = k + 1) = 1 - (1 - p)^k.$$
Thus,
$$E(X_i) = 1 \cdot (1 - p)^k + (k + 1)\left(1 - (1 - p)^k\right),$$
and therefore
$$E(X) = \sum_{i=1}^{n/k} E(X_i) = \frac nk \left((1 - p)^k + (k + 1)\left(1 - (1 - p)^k\right)\right).$$
For example, if $n = 100{,}000$ and $p = 10^{-4}$ (on average, 10 are sick), then for $k = 120$ we get $E(X) \approx 2026$ (compare to Method 1, where the number of tests is 100,000!).

Expected value of a function of a r.v. (4.4)

Example. A box contains 11 disks with radii 1, 2, ..., 11 inches. One disk is chosen at random. What is the expected area $A$ of the chosen disk?

We know that the area of a disk is $A = \pi R^2$, where $R$ = the radius of the disk is a r.v. with
$$E(R) = \frac{1 + 2 + \dots + 11}{11} = 6.$$

Note: $E(A) \ne \pi \cdot 6^2$ in². Since $A$ takes the values $\pi \cdot 1^2, \dots, \pi \cdot 11^2$, each with probability $\frac1{11}$, it follows that
$$E(A) = \frac{\pi \cdot 1^2 + \pi \cdot 2^2 + \dots + \pi \cdot 11^2}{11} = 46\pi \text{ in}^2.$$
In general, the following formula holds:

Proposition. Let $X$ be a discrete r.v. with pmf $p(x)$. Then for any function $g(\cdot)$,
$$E[g(X)] = \sum_x g(x)\,p(x).$$
(Again, note that in general we don't have $E(g(X)) = g(E(X))$.)

Proof. The function $g(X)$ is a r.v. whose values are the distinct numbers among $g(x_1), g(x_2), \dots$ (where $x_1, x_2, \dots$ are the possible values of $X$). If $g$ is one-to-one, the claim is immediate, since then $P(g(X) = g(x_k)) = p(x_k)$. In general, grouping the values of $X$ according to the value of $g$,
$$E[g(X)] = \sum_k g(x_k)\,P(g(X) = g(x_k)) = \sum_k g(x_k) \sum_{j : g(x_j) = g(x_k)} P(X = x_j) = \sum_j g(x_j)\,p(x_j),$$
where the outer sum runs over the distinct values $g(x_k)$, and each $x_j$ appears in exactly one of the inner sums.
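A minimal Python sketch for the disk example, illustrating that $E[g(X)] \ne g(E(X))$ in general:

```python
from fractions import Fraction
import math

radii = range(1, 12)                           # 11 equally likely disks
ER = Fraction(sum(radii), 11)                  # E(R) = 6
ER2 = Fraction(sum(r * r for r in radii), 11)  # E(R^2) = 46

print(math.pi * ER2)    # E(A) = 46*pi ~ 144.5
print(math.pi * ER**2)  # pi*E(R)^2 = 36*pi ~ 113.1: a different number
```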

LECTURE 14

Review of Exam #1.


LECTURE 15

Variance (4.5)

Example. Compare the r.v.s
$$X_a = \begin{cases} +a & \text{with prob. } \frac12 \\ -a & \text{with prob. } \frac12 \end{cases}$$
for different values of $a \ge 0$ (draw a picture). For very small $a$ (even $a = 0$), the values are always very close to $E(X_a) = 0$. However, for large values of $a$, they are always very far from $E(X_a)$.

The following notion of the variance (and standard deviation) of a r.v. $X$ gives an indication of how the values of $X$ are spread around $E(X)$, on average:

Definition. Let $X$ be a r.v. with mean (= expected value) $\mu$. The variance of $X$ is
$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right].$$
The standard deviation of $X$ is $\sigma_X = \sqrt{\mathrm{Var}(X)}$.

Remark. Note that $\sigma_X$ has the same units as $X$ while $\mathrm{Var}(X)$ does not. Typically, one expects to get a result in the range $E(X) \pm \sigma_X$.

(Optional) Another view of $\mathrm{Var}(X)$: suppose one wants to replace $X$ by a best constant (non-random) number $y$. One way to measure what "best" means is to minimize the expected squared difference (squared, to avoid cancellations with negative signs):
$$d(y) = E\left((X - y)^2\right).$$
By differentiating with respect to $y$, one can show that the minimizer is $y = E(X)$, and $\mathrm{Var}(X) = d(E(X))$ is the minimum expected squared difference possible.

Often (but not always; see the above example!), it will be more convenient to use the following formula:

Proposition. We have
$$\mathrm{Var}(X) = E\left(X^2 - 2\mu X + \mu^2\right) = E(X^2) - 2\mu\,E(X) + \mu^2 = E(X^2) - E(X)^2.$$

Example. Compare the following two investment options:
Stock A: 3% return with prob. 0.8; 2% loss with prob. 0.2.
Stock B: 5% return with prob. 0.6; 2% loss with prob. 0.4.
Expected profit in %:
$$E(X_A) = 3 \cdot 0.8 - 2 \cdot 0.2 = 2\% \quad \text{and} \quad E(X_B) = 5 \cdot 0.6 - 2 \cdot 0.4 = 2.2\%.$$
However, let us also compare the variances (or the risk, in this case):
$$E(X_A^2) = 9 \cdot 0.8 + 4 \cdot 0.2 = 8 \quad \text{and} \quad E(X_B^2) = 25 \cdot 0.6 + 4 \cdot 0.4 = 16.6.$$
Therefore,
$$\mathrm{Var}(X_A) = E(X_A^2) - E(X_A)^2 = 8 - 4 = 4, \quad \sigma_{X_A} = \sqrt{\mathrm{Var}(X_A)} = 2\%,$$
and
$$\mathrm{Var}(X_B) = 16.6 - 4.84 = 11.76, \quad \sigma_{X_B} = \sqrt{\mathrm{Var}(X_B)} \approx 3.4\%.$$
So, typically, the profit from A is 2 ± 2% and from B, 2.2 ± 3.4%. Therefore, stock A seems safer.

Properties of variance

Proposition. Let $X$ be a r.v. and let $a, b \in \mathbb{R}$. Then
(1) $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$;
(2) $\mathrm{Var}(X + b) = \mathrm{Var}(X)$;
combined: $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.

Proof. (1) We have
$$\mathrm{Var}(aX) = E\left((aX - E(aX))^2\right) = E\left((aX - a\,E(X))^2\right) = E\left(a^2(X - E(X))^2\right) = a^2\,E\left((X - E(X))^2\right) = a^2\,\mathrm{Var}(X).$$
(2) A shift by a constant does not change how the values are spread around the mean. Indeed,
$$\mathrm{Var}(X + b) = E\left((X + b - E(X + b))^2\right) = E\left((X + b - E(X) - b)^2\right) = E\left((X - E(X))^2\right) = \mathrm{Var}(X).$$

Exercise (The matching problem). $N$ assignments are returned to $N$ students at random.
(1) How many students, on average, will get their own assignment?
(2) What is the standard deviation of the number of students who will get their own HW back?

Solution. (1) Method of indicators: Let $X$ = # of students who got their own HW back. We write $X = \sum_{i=1}^N X_i$, where
$$X_i = \begin{cases} 1, & \text{student } i \text{ gets his/her own HW} \\ 0, & \text{otherwise}. \end{cases}$$
Now, we use the additivity of expected value to get
$$E(X) = \sum_{i=1}^N E(X_i).$$
For each $i$ we have
$$E(X_i) = 1 \cdot P(X_i = 1) + 0 \cdot P(X_i = 0) = P(X_i = 1) = \frac1N.$$
Therefore, $E(X) = N \cdot \frac1N = 1$.

(2) We need to calculate the standard deviation of $X$. We have
$$E(X^2) = E\left(\left(\sum_{i=1}^N X_i\right)^2\right) = E\left(\sum_{i=1}^N X_i^2 + \sum_{i \ne j} X_iX_j\right) = \sum_{i=1}^N E(X_i^2) + \sum_{i \ne j} E(X_iX_j).$$
Note that $E(X_i^2) = E(X_i) = \frac1N$ (the square of an indicator is itself), and
$$E(X_iX_j) = 1 \cdot P(X_iX_j = 1) + 0 \cdot P(X_iX_j = 0) = P(X_i = 1 \text{ and } X_j = 1)$$
$$= P(\text{both students } i \text{ and } j \text{ get their own HW back}) = \frac1N \cdot \frac1{N-1}.$$
Therefore, since there are $N(N-1)$ ordered pairs $i \ne j$,
$$E(X^2) = N \cdot \frac1N + N(N-1) \cdot \frac1{N(N-1)} = 2,$$
and hence
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = 2 - 1 = 1, \quad \sigma_X = \sqrt{\mathrm{Var}(X)} = 1.$$
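A Monte Carlo check that both the mean and the standard deviation of the number of matches are 1 (the value $N = 10$, the seed, and the trial count are arbitrary):

```python
import random
import statistics

random.seed(0)
N, trials = 10, 100_000
counts = []
for _ in range(trials):
    hw = list(range(N))
    random.shuffle(hw)  # random return of the assignments
    counts.append(sum(hw[i] == i for i in range(N)))
print(statistics.mean(counts), statistics.stdev(counts))  # both ~1
```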

Exercise (Optional, if time permits). A fair die is rolled, and you win (or lose) money according to the following rule: if $X$ is the number on which the die lands, then you win/lose $2X - 3$ dollars. Find the expected value and standard deviation of your winnings.

Solution. Since $E(X) = \frac16(1 + 2 + \dots + 6) = \frac72$, the expected winnings are:
$$E(2X - 3) = 2\,E(X) - 3 = 7 - 3 = 4.$$
For the standard deviation, we first calculate
$$E(X^2) = \frac16\left(1^2 + 2^2 + \dots + 6^2\right) = \frac{91}6.$$
Hence,
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{91}6 - \left(\frac72\right)^2 = \frac{35}{12}.$$
The variance of our winnings is $\mathrm{Var}(2X - 3) = 4\,\mathrm{Var}(X) = \frac{35}3$, and the standard deviation is $\sigma_{2X-3} = \sqrt{\frac{35}3} \approx 3.4$.

LECTURE 16

Bernoulli and Binomial distributions (4.6)

Bernoulli r.v.s

Consider an experiment with two outcomes, e.g. "success" and "failure", where $P(\text{success}) = p$ and $P(\text{failure}) = 1 - p$. Let $X$ be the indicator of success:
$$X = \begin{cases} 1, & \text{with prob. } p \\ 0, & \text{with prob. } 1 - p. \end{cases}$$
Then $X$ is called a Bernoulli r.v. with parameter $p$. Formally:

Definition. A r.v. $X$ is said to be a Bernoulli r.v. with parameter $p$ if its pmf is given by $p_X(1) = p$, $p_X(0) = 1 - p$. Notation: $X \sim \mathrm{Bern}(p)$.

The expectation of a Bernoulli r.v.: $E(X) = 1 \cdot p + 0 \cdot (1 - p) = p$.
The variance of a Bernoulli r.v.: $\mathrm{Var}(X) = E(X^2) - E(X)^2 = p - p^2 = p(1 - p)$.

Binomial r.v.s

Now consider an experiment consisting of $n$ independent Bernoulli trials, each having success with probability $p$ and failure with probability $1 - p$, independently of other trials. Let the r.v. $X$ be the number of successes. The pmf of $X$ is then:
$$p_X(k) = P(X = k) = P(k \text{ successes in } n \text{ trials}) = \binom nk p^k (1 - p)^{n-k},$$
where $\binom nk$ stands for the # of possible arrangements of the successes and failures, and $p^k(1-p)^{n-k}$ stands for the probability of each arrangement. Formally:

Definition. A r.v. $X$ is said to be a binomial r.v. with parameters $n, p$ if
$$p_X(k) = \binom nk p^k (1 - p)^{n-k}, \quad k = 0, 1, \dots, n.$$
Notation: $X \sim \mathrm{Bin}(n, p)$.

We have: $E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$ (see the exercise below).

Example. Jack hits his target 70% of the time. What is the probability that he hits his target in at least 8 of 10 shots?

Solution. $X$ = # of hits. Assuming that Jack's shots are independent, $X \sim \mathrm{Bin}(10, 0.7)$. Then
$$P(X \ge 8) = p_X(8) + p_X(9) + p_X(10) = \binom{10}8 0.7^8\,0.3^2 + \binom{10}9 0.7^9\,0.3 + \binom{10}{10} 0.7^{10} \approx 0.38.$$

Exercise. Compute the expectation and variance of a binomial r.v.

Solution. We present two solutions:
1. Direct computation: by definition, we have
$$E(X) = \sum_{k=0}^n k\,p_X(k) = \sum_{k=1}^n k\,p_X(k) = \sum_{k=1}^n k \binom nk p^k (1 - p)^{n-k}.$$
We use the identity $k\binom nk = n\binom{n-1}{k-1}$ to obtain
$$E(X) = n \sum_{k=1}^n \binom{n-1}{k-1} p^k (1 - p)^{n-k} = np \sum_{k=1}^n \binom{n-1}{k-1} p^{k-1} (1 - p)^{(n-1)-(k-1)}.$$
We next change variables by setting $j = k - 1$. Hence,
$$E(X) = np \sum_{j=0}^{n-1} \binom{n-1}j p^j (1 - p)^{n-1-j}.$$
We claim that $\sum_{j=0}^{n-1} \binom{n-1}j p^j (1 - p)^{n-1-j} = 1$. Indeed, note that if $Y \sim \mathrm{Bin}(n-1, p)$ then the pmf of $Y$ is $p_Y(j) = \binom{n-1}j p^j (1 - p)^{n-1-j}$, for $j = 0, \dots, n-1$. Therefore,
$$1 = \sum_{j=0}^{n-1} p_Y(j) = \sum_{j=0}^{n-1} \binom{n-1}j p^j (1 - p)^{n-1-j},$$
which implies that $E(X) = np$.

Remark. The same method can be used to find $E(X^2)$ and $\mathrm{Var}(X)$ (see Ross).

2. Method of indicators: We write $X = \sum_{i=1}^n X_i$, where
$$X_i = \begin{cases} 1, & \text{success at trial } i \\ 0, & \text{failure at trial } i. \end{cases}$$
Note that $X_i \sim \mathrm{Bern}(p)$. Therefore, $E(X) = \sum_{i=1}^n E(X_i) = np$. Moreover,

$$E(X^2) = E\left(\left(\sum_{i=1}^n X_i\right)^2\right) = E\left(\sum_{i=1}^n X_i^2 + \sum_{i \ne j} X_iX_j\right) = \sum_{i=1}^n E(X_i^2) + \sum_{i \ne j} E(X_iX_j),$$
where $E(X_i^2) = E(X_i) = p$ and, by independence,
$$E(X_iX_j) = 1 \cdot P(X_iX_j = 1) + 0 \cdot P(X_iX_j = 0) = P(X_i = 1 \text{ and } X_j = 1) = p^2.$$
Hence $E(X^2) = np + n(n-1)p^2$, which implies that
$$\mathrm{Var}(X) = np + n(n-1)p^2 - (np)^2 = np(1 - p).$$

Exercise (overbooking). Flight A: 10 tickets sold, the plane has 9 seats. Flight B: 20 tickets sold, the plane has 18 seats. Passengers show up with probability 0.9 each, independently. Which flight is more likely to get overbooked?

Solution. We have:
$X_A$ = # passengers who show up for flight A; $X_B$ = # passengers who show up for flight B.
Note that $X_A \sim \mathrm{Bin}(10, 0.9)$ and $X_B \sim \mathrm{Bin}(20, 0.9)$, and hence
$$P(\text{A overbooked}) = P(X_A = 10) = 0.9^{10} \approx 0.35,$$
$$P(\text{B overbooked}) = P(X_B = 19) + P(X_B = 20) = \binom{20}{19} 0.9^{19}\,0.1 + 0.9^{20} \approx 0.39.$$
Therefore, B is more likely to get overbooked.

Exercise. Let $X \sim \mathrm{Bin}(n, p)$, where $p$ is a parameter. For what value of $p$ does $X$ have maximum variance?

Solution. We know that $\mathrm{Var}(X) = np(1 - p)$. We look for the critical points of the variance:
$$\frac{d}{dp}\big(np(1 - p)\big) = n - 2np = 0,$$
and find that the maximum of $np(1 - p)$ is attained at $p = \frac12$.
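Returning to the overbooking exercise, the two probabilities are one-liners with the binomial pmf; a minimal Python sketch:

```python
from math import comb

def binom_pmf(n: int, p: float, k: int) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.9
print(binom_pmf(10, p, 10))                         # ~0.349: flight A overbooked
print(binom_pmf(20, p, 19) + binom_pmf(20, p, 20))  # ~0.392: flight B overbooked
```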


LECTURE 17

Poisson distribution (4.7)

Recall: $X \sim \mathrm{Bin}(n, p)$, where $X$ is the number of successes in $n$ independent trials and $p$ is the probability of success at each trial. The pmf of $X$ is: $p(k) = \binom nk p^k (1 - p)^{n-k}$, $k = 0, 1, \dots, n$. Also: $E(X) = np$, $\mathrm{Var}(X) = np(1 - p)$.

Example. In a class of 40 students, on average 2 students are sick. What is the probability that 4 students are sick?

Solution. $X \sim \mathrm{Bin}(40, p)$, and $E(X) = 40p = 2 \implies p = 1/20$. Therefore,
$$p(4) = \binom{40}4 \left(\frac1{20}\right)^4 \left(\frac{19}{20}\right)^{36} \approx 0.09.$$

An approximation to the binomial distribution

The pmf $p(k)$ of $\mathrm{Bin}(n, p)$ is sometimes difficult to compute, especially for large $n$. We will approximate the pmf of $\mathrm{Bin}(n, p)$ under the following assumptions:
(1) $n$ is large ($n \to \infty$);
(2) $p$ is small, or successes are rare ($p \to 0$);
(3) $np$ is of moderate size: $p = \frac\lambda n$ for a constant $\lambda$, or $np \to \lambda$.

Let us rewrite the pmf of $X$:
$$p(k) = \frac{n!}{k!\,(n-k)!} \left(\frac\lambda n\right)^k \left(1 - \frac\lambda n\right)^{n-k} = \frac{\lambda^k}{k!} \cdot \frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k} \cdot \left(1 - \frac\lambda n\right)^n \left(1 - \frac\lambda n\right)^{-k}.$$
Note that under the above assumptions, as $n \to \infty$ we have, for any fixed $k$:
$$\frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k} \to 1, \quad \left(1 - \frac\lambda n\right)^n \to e^{-\lambda}, \quad \text{and} \quad \left(1 - \frac\lambda n\right)^{-k} \to 1.$$

Combining the above, we get
$$p(k) \approx \frac{\lambda^k}{k!} e^{-\lambda}.$$
This function defines the Poisson distribution:

Definition (Poisson distribution). A r.v. $X$ has the Poisson distribution with parameter $\lambda > 0$ if the pmf of $X$ is given by:
$$p(k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, 3, \dots$$
Notation: $X \sim \mathrm{Pois}(\lambda)$.

Remark. The Poisson distribution was introduced by Poisson in 1837, along with applications in criminal lawsuits (jury decisions). However, it had not attracted much attention until a little book by Bortkiewicz was published in 1898, which included a strange example: deaths by horse kicks in the Prussian army. Using the Poisson distribution, such rare events were shown to be quite regular and predictable, which helped to popularize this distribution.

Example. Let us use the Poisson approximation for our previous example (sick students): here $X \sim \mathrm{Bin}(40, 1/20) \approx \mathrm{Pois}(2)$, and hence
$$P(X = 4) \approx \frac{2^4}{4!} e^{-2} \approx 0.09$$
(compare to the exact answer above).

Remark. One may use the Poisson approximation to the binomial distribution, that is, $\mathrm{Bin}(n, p) \approx \mathrm{Pois}(np)$, whenever the number of independent trials is large, $p$ is small, and $\lambda = np$ is of moderate size.

As a consequence of the fact that a Poisson r.v. is the limit of binomial r.v.s, we get:

Corollary. For $X \sim \mathrm{Pois}(\lambda)$, we have that $E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$. Indeed, the idea is that the pmf of $X$ is the limit of the pmfs of the binomial r.v.s $X_n \sim \mathrm{Bin}(n, \frac\lambda n)$, and hence one expects to have $E(X) \approx np = \lambda$ and $\mathrm{Var}(X) \approx np(1 - p) = \lambda(1 - p) \to \lambda$ (since $p \to 0$). One can also verify these facts directly (try!).

Remark. We know we must have $1 = \sum_{k=0}^\infty \frac{\lambda^k}{k!} e^{-\lambda}$ (probability axiom). This is equivalent to
$$e^\lambda = \sum_{k=0}^\infty \frac{\lambda^k}{k!},$$
which is a known identity (the Taylor series of the exponential function). This is a probabilistic proof of this identity.
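A minimal Python sketch comparing the exact $\mathrm{Bin}(40, 1/20)$ pmf with its $\mathrm{Pois}(2)$ approximation:

```python
from math import comb, exp, factorial

n, p = 40, 1 / 20
lam = n * p  # Poisson parameter = 2
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p) ** (n - k)
    pois = lam**k / factorial(k) * exp(-lam)
    print(k, round(binom, 4), round(pois, 4))  # e.g. k=4: 0.0901 vs 0.0902
```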

Other Poisson distribution models

Besides approximating the binomial distribution, the Poisson distribution is an appropriate model in various situations, and has numerous applications. For example:

(1) The number of accidents on a section of a highway on a given day.
(2) The number of floods at a river over a century.
(3) The number of $\alpha$ particles emitted from a radioactive source during a fixed period of time.

Poisson Process. Often, we consider the number $N(t)$ of occurrences of an event in an interval, say $[0, t]$, where the average number of occurrences is proportional to the length of the interval, that is, $E(N(t)) = \lambda t$. This means that, for each $t$, $N(t) \sim Pois(\lambda t)$. The family of r.v.s $\{N(t)\}$ is called a Poisson process with rate $\lambda$.

Exercise. Births of twins in a certain city are described by a Poisson process with the constant rate of 1.2 births per year.
(a) What is the probability that more than two twin births will occur during the year 2017?
(b) What is the probability that no twin births will occur during the next five years?
(c) If we learn that there was at least one birth of twins during the year 2015, what is the conditional probability that there were no twin births during the first half of that year?

Solution. (a) Let $X$ be the number of twin births during 2017. Then $X \sim Pois(1.2)$, and hence

$P(X > 2) = 1 - P(X \le 2) = 1 - e^{-1.2}\left(1 + 1.2 + \frac{1.2^2}{2}\right) \approx 0.1205.$

(b) Let $Y$ be the number of twin births in the next 5 years. Then $Y \sim Pois(6)$, and hence $P(Y = 0) = e^{-6}$.

(c) Let $N_{[0,\frac{1}{2}]}$, $N_{[\frac{1}{2},1]}$, $N_{[0,1]}$ respectively be the number of twin births during the 1st half of 2015, the 2nd half of 2015, and the entire year of 2015. Then

$P\big(N_{[0,\frac{1}{2}]} = 0 \mid N_{[0,1]} \ge 1\big) = \frac{P\big(N_{[0,\frac{1}{2}]} = 0 \text{ and } N_{[0,1]} \ge 1\big)}{P\big(N_{[0,1]} \ge 1\big)} = \frac{P\big(N_{[0,\frac{1}{2}]} = 0\big)\, P\big(N_{[\frac{1}{2},1]} \ge 1\big)}{P\big(N_{[0,1]} \ge 1\big)} = \frac{e^{-0.6}\,\big(1 - e^{-0.6}\big)}{1 - e^{-1.2}}.$
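The three answers can be evaluated with a short Python sketch (assuming scipy; the snippet just plugs in the formulas above):

```python
# Numerical values for the twin-births exercise (rate 1.2 births/year).
import math
from scipy.stats import poisson

a = 1 - poisson.cdf(2, mu=1.2)  # (a) P(X > 2) for X ~ Pois(1.2), ~0.1205
b = math.exp(-6)                # (b) P(Y = 0) for Y ~ Pois(6), ~0.0025
# (c) conditional probability from the last display
c = math.exp(-0.6) * (1 - math.exp(-0.6)) / (1 - math.exp(-1.2))
print(a, b, c)
```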

Example. The Poisson approximation to the binomial r.v. is very good. In fact, it remains pretty good even when the trials are not independent, provided that their dependence is weak. For example, recall the problem in which $n$ people randomly select $n$ hats (each hat belongs to exactly one person), and let $X$ be the number of people who select their own hat. Then $X = X_1 + \dots + X_n$, where $X_i$ is the indicator of the event that person $i$ selects his own hat. Clearly, we have that $P(X_i = 1) = \frac{1}{n}$ and $P(X_i = 1 \mid X_j = 1) = \frac{1}{n-1}$ for $i \neq j$. Thus, we see that $X_1, \dots, X_n$ are not independent, but for large $n$ their dependence appears to be weak. Let us estimate the probability that at least one person selects his own hat: we approximately have that $X \approx Pois(1)$, and hence $P(X > 0) \approx 1 - e^{-1}$. This is exactly the same result we previously got as $n \to \infty$ (see the Challenge problem at the end of Exercise 4.9 from Lecture 4).
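One can also check this dependent-trials approximation by simulation; the following Monte Carlo sketch (assuming numpy) estimates $P(X > 0)$ for random hat assignments:

```python
# Monte Carlo estimate of P(at least one person gets their own hat).
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 100_000
hits = 0
for _ in range(trials):
    perm = rng.permutation(n)         # random hat assignment
    if np.any(perm == np.arange(n)):  # at least one fixed point
        hits += 1
print(hits / trials, 1 - np.exp(-1))  # both ~0.632
```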

LECTURE 18

The Geometric r.v. (4.8)

Consider an infinite sequence of independent trials, with probability $p$ of success at each trial. Let

$X$ = the trial number of the first success.

The pmf of $X$ is

$p_X(k) = P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, \dots,$

where $(1-p)^{k-1}$ is the probability that the first $k-1$ trials are failures, and $p$ is the probability that the $k$th trial is a success.

Definition. A r.v. $X$ is called a geometric r.v. with parameter $p$ if the pmf of $X$ is $p_X(k) = (1-p)^{k-1} p$, where $k = 1, 2, \dots$. Notation: $X \sim G(p)$.

Moments: $E(X) = \frac{1}{p}$ (see below), $Var(X) = \frac{1-p}{p^2}$ (see Ross). In the future, we will compute these moments using the notion of conditional expectation.

Remark. We have $\sum_{k=1}^{\infty} p_X(k) = \sum_{k=1}^{\infty} (1-p)^{k-1} p = 1$, which is equivalent to $\sum_{k=1}^{\infty} (1-p)^{k-1} = \sum_{k=0}^{\infty} (1-p)^k = \frac{1}{p}$ (the well-known geometric series).

Exercise. Show that the expectation of $X \sim G(p)$ is $E(X) = \frac{1}{p}$.

Solution. By definition,

$E(X) = \sum_{k=1}^{\infty} k (1-p)^{k-1} p = \sum_{k=1}^{\infty} (k-1)(1-p)^{k-1} p + \sum_{k=1}^{\infty} (1-p)^{k-1} p$

$= [\text{change index: } l = k-1] = \sum_{l=0}^{\infty} l (1-p)^{l} p + 1 = (1-p) \sum_{l=1}^{\infty} l (1-p)^{l-1} p + 1 = (1-p)\, E(X) + 1.$

We thus have that $E(X) = (1-p)E(X) + 1$, which yields $E(X) = \frac{1}{p}$.
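A truncated sum of the defining series gives a quick numerical confirmation (plain Python, no extra libraries; a sketch):

```python
# Check E(X) = 1/p for X ~ G(p) by truncating the defining series.
p = 0.3
approx_E = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 10_000))
print(approx_E, 1 / p)  # both ~3.3333
```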

Exercise. A miner is trapped in a mine with three doors. One door leads outside through a tunnel, in 2 hours. The other two doors are connected by another tunnel, which one can walk through in 3 hours. The miner is equally likely to choose any of the three doors. If he finds himself back in the mine, he again chooses one of the 3 doors at random (the miner is tired and disoriented, which makes him forget which doors he had chosen before). What is the expected time it takes the miner to escape the mine?

Solution. The number of trials it takes until the door leading outside is chosen is $X \sim G(\frac{1}{3})$. Therefore $E(X) = 3$. This means that the average time it takes to reach safety is $3 + 3 + 2 = 8$ hours (three hours for each of the first two choices of a connected door, on average, and another two hours in the tunnel leading outside). More formally, the r.v. representing the time passing until reaching safety is $T = 3(X - 1) + 2$, where $X - 1$ is the # of times one of the connected doors is chosen. Thus, $E(T) = 3(E(X) - 1) + 2 = 3 \cdot 2 + 2 = 8$.

Exercise. Let $X \sim G(p)$. Find the probability that $X$ is even.

Solution. One may use a direct computation of $\sum_{i=1}^{\infty} P(X = 2i)$ (try!). We can also use conditional probability as follows. Let $E$ denote the event that $X$ is even. Then

$P(E) = P(E \mid X = 1)\, p + P(E \mid X > 1)(1-p) = P(E \mid X > 1)(1-p).$

To understand $P(E \mid X > 1)$, let $Y$ be the trial of the first success, counting from the second trial onward. Note that $Y \sim G(p)$. Also note that given $X > 1$, $X$ is even if and only if $Y$ is odd (if we don't know that $X > 1$, then it may happen, for example, that $X = 1$ and $Y = 1$, i.e., the first and second trials are both successes). Thus, we have

$P(E \mid X > 1) = P(Y \text{ is odd}) = 1 - P(Y \text{ is even}).$

Since $Y \sim G(p)$, it follows that $P(Y \text{ is even}) = P(X \text{ is even}) = P(E)$. Therefore, $P(E) = (1 - P(E))(1-p)$, which yields $P(E) = \frac{1-p}{2-p}$.

Other discrete distributions (4.8)

The negative Binomial r.v. Similar to the geometric r.v. This time, we are interested in the trial number of the $r$th success. The pmf:

$p_X(r + k) = \binom{k + r - 1}{k} p^r (1-p)^k, \quad k = 0, 1, \dots$

Explanation: $\binom{k+r-1}{k}$ is the number of possibilities for $k$ failures among the first $k + r - 1$ trials (the $(r+k)$th trial is the $r$th success), and $p^r (1-p)^k$ is the probability of each such possibility.
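The even-probability formula can be verified by the direct summation mentioned at the start of the solution; a short Python sketch:

```python
# Check P(X even) = (1-p)/(2-p) for X ~ G(p): sum P(X = 2i) over i >= 1.
p = 0.4
direct = sum((1 - p) ** (2 * i - 1) * p for i in range(1, 5_000))
print(direct, (1 - p) / (2 - p))  # both 0.375
```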

Notation: $X \sim NB(r, p)$.

Moments: $E(X) = \frac{r}{p}$, $Var(X) = \frac{r(1-p)}{p^2}$ (see Ross for a direct computation). One can write $X = X_1 + \dots + X_r$, where the $X_i \sim G(p)$ are independent. This provides another way to compute $E(X)$ and $Var(X)$.

Hypergeometric r.v. (Definition by example) An urn contains $N$ balls, of which $D$ are green and $N - D$ are red. We take a sample of $n$ balls. Let $X$ be the number of green balls in the sample. The pmf:

$p_X(k) = \frac{\binom{D}{k} \binom{N-D}{n-k}}{\binom{N}{n}},$

where $\binom{D}{k}$ counts $k$ greens out of $D$, $\binom{N-D}{n-k}$ counts $n-k$ reds out of $N-D$, and $\binom{N}{n}$ counts samples of $n$ balls out of $N$.

Notation: $X \sim HG(N, D, n)$.

Range of $X$: We have $X \ge 0$ and $X \ge n - (N - D)$ (sample size minus the # of red balls), so the smallest possible value of $X$ is $\max(0,\, n - N + D)$. We have $X \le D$ (total number of green balls) and $X \le n$ (sample size), so the greatest possible value of $X$ is $\min(D, n)$.

Moments of $X$: $E(X) = n\frac{D}{N}$, $Var(X) = n\,\frac{D}{N}\left(1 - \frac{D}{N}\right)\frac{N-n}{N-1}$.

One can calculate these using the indicator method (try!):

option 1: define $X_j = 1$ if the $j$th ball is green and $0$ otherwise, $j = 1, \dots, n$; then $X = \sum X_j$;
option 2: define $Y_i = 1$ if the $i$th green ball is selected and $0$ otherwise, $i = 1, \dots, D$; then $X = \sum Y_i$.

Remark. Read Ross 4.8 for more details and examples.
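The moment formulas can be cross-checked against scipy's hypergeometric distribution; note that scipy's parameters $(M, n, N)$ correspond to our $(N, D, n)$ (a sketch, assuming scipy):

```python
# Compare the hypergeometric moment formulas with scipy's built-ins.
from scipy.stats import hypergeom

N, D, n = 50, 20, 10
rv = hypergeom(M=N, n=D, N=n)  # scipy's (M, n, N) = our (N, D, n)
print(rv.mean(), n * D / N)                                 # both 4.0
print(rv.var(), n * (D/N) * (1 - D/N) * (N - n) / (N - 1))  # both ~1.959
```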


LECTURE 19

Continuous distributions (5.1)

Example. Discrete r.v.s are not able to model certain situations, such as:

- the lifetime of equipment
- the travel time between points (in a city, or on a highway)
- the payoff of an enterprise, etc.

In a continuous model, $P(X = x) = 0$, so a pmf would not make sense. Instead we have the pdf (probability density function):

Definition. A r.v. $X$ is said to be continuous if there exists a function $f(x) \ge 0$ defined on $\mathbb{R}$ such that

$P(X \in B) = \int_B f(x)\,dx$ for each $B \subseteq \mathbb{R}$.

The function $f$ is called the pdf (probability density function), or just the density, of $X$.

Figure 1. Example of a pdf. Here the area is $P(X \in [a, b])$.

Remark. We have:

- For $B = [a, b]$, $P(a \le X \le b) = \int_a^b f(x)\,dx$.
- For any $a \in \mathbb{R}$, $P(X = a) = \int_a^a f(x)\,dx = 0$.
- By the probability axiom: $\int_{-\infty}^{\infty} f(x)\,dx = P(-\infty < X < \infty) = 1$.

Example. Suppose that $X$ is a continuous random variable whose probability density function is given by

$f(x) = C(4x - 2x^2)$ for $0 < x < 2$, and $f(x) = 0$ otherwise.

(1) What is the value of $C$?
(2) Find $P(X > 1)$.

Solution. (1) We set up the equation

$1 = \int_{-\infty}^{\infty} f(x)\,dx = C \int_0^2 (4x - 2x^2)\,dx = C\left[2x^2 - \frac{2x^3}{3}\right]_0^2 = C \cdot \frac{8}{3}.$

Hence $C = \frac{3}{8}$.

(2) We have

$P(X > 1) = \int_1^{\infty} f(x)\,dx = \frac{3}{8}\int_1^2 (4x - 2x^2)\,dx = \frac{1}{2}.$

The cdf of a continuous distribution

As in the discrete case, we define

$F(a) = P(X \le a) = \int_{-\infty}^a f(x)\,dx.$

Furthermore, we have that

$P(a \le X \le b) = P(X \le b) - P(X < a) = F(b) - F(a).$

Connection to the pdf of $X$: by the fundamental theorem of calculus, $F'(a) = f(a)$.

Summary:

pdf: $f(x) \ge 0$, $\int_{-\infty}^{\infty} f(x)\,dx = 1$, $P(a \le X \le b) = \int_a^b f(x)\,dx$, $f(a) = F'(a)$.
cdf: $F(-\infty) = 0$, $F(\infty) = 1$, $F$ is nondecreasing, $P(a \le X \le b) = F(b) - F(a)$, $F(a) = \int_{-\infty}^a f(x)\,dx$.
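Both answers can be confirmed by numerical integration (a sketch, assuming scipy):

```python
# Verify C = 3/8 and P(X > 1) = 1/2 for f(x) = C(4x - 2x^2) on (0, 2).
from scipy.integrate import quad

g = lambda x: 4 * x - 2 * x ** 2
total, _ = quad(g, 0, 2)                        # 8/3, so C = 3/8
prob, _ = quad(lambda x: (3 / 8) * g(x), 1, 2)  # P(X > 1)
print((3 / 8) * total, prob)                    # 1.0 and 0.5
```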

Expectation & Variance of continuous r.v. (5.2)

The expected value of a discrete r.v. was defined as $E(X) = \sum_x x\, p_X(x)$. In the continuous case, the sum is replaced by an integral, and the pmf is replaced by the pdf:

Definition. For a r.v. $X$ with pdf $f(x)$, the expected value is

$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$

Example. Find $E(X)$, where $X$ is a continuous random variable whose probability density function is given by $f(x) = e^{-x}$ for $x > 0$, and $f(x) = 0$ for $x < 0$.

Solution. Integrating by parts ($u = x$, $v' = e^{-x}$), we have that

$E(X) = \int_0^{\infty} x e^{-x}\,dx = \left[-x e^{-x}\right]_0^{\infty} + \int_0^{\infty} e^{-x}\,dx = 1.$

Exercise. The concentration of alcohol in your blood $t$ hours after drinking is $e^{-t}$. The concentration is measured at a random time $X$, whose pdf is given by $f(x) = 1/2$ for $3 < x < 5$, and $f(x) = 0$ otherwise. Find the pdf of $Y = e^{-X}$ (the measured concentration level of alcohol), and the expected concentration of alcohol in the blood.

Solution. Note that

$F_Y(y) = P(Y \le y) = P\big(e^{-X} \le y\big) = P(-X \le \ln y) = P(X \ge -\ln y) = \int_{-\ln y}^{\infty} f(x)\,dx.$

For $e^{-5} < y < e^{-3}$ (i.e., $3 < -\ln y < 5$) this equals $\int_{-\ln y}^{5} \frac{1}{2}\,dx = \frac{1}{2}(5 + \ln y)$; it equals $0$ for $y \le e^{-5}$ and $1$ for $y \ge e^{-3}$. Therefore,

$f_Y(y) = F_Y'(y) = \frac{1}{2y}$ for $e^{-5} < y < e^{-3}$, and $f_Y(y) = 0$ otherwise.
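The expectation in the first example can also be checked numerically (a sketch, assuming numpy and scipy):

```python
# Verify E(X) = 1 for the density f(x) = e^{-x}, x > 0.
import numpy as np
from scipy.integrate import quad

E, _ = quad(lambda x: x * np.exp(-x), 0, np.inf)
print(E)  # 1.0 (up to numerical error)
```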

The expected concentration in blood is thus

$E(Y) = \int_{e^{-5}}^{e^{-3}} y \cdot \frac{1}{2y}\,dy = \frac{1}{2}\left(e^{-3} - e^{-5}\right) \approx 0.0215.$

In fact, there is a way to compute the expected value of $Y$ as a function of $X$ without finding $f_Y$. Namely, we can use the following proposition:

Proposition. (Expectation of a function of a r.v.) For any function $g(x)$,

$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$

Compare to $E[g(X)] = \sum_x g(x)\, p(x)$ for a discrete r.v. $X$.

Exercise. Compute the expected concentration of alcohol in blood from the previous exercise, using the above proposition.

Solution. We have that $Y = e^{-X}$, and hence

$E\big(e^{-X}\big) = \int_{\mathbb{R}} e^{-x} f(x)\,dx = \frac{1}{2}\int_3^5 e^{-x}\,dx = \frac{1}{2}\left(e^{-3} - e^{-5}\right).$

Definition. As in the discrete case, the variance of a continuous r.v. is $Var(X) = E\big((X - E(X))^2\big) = E(X^2) - (E(X))^2$.

Exercise. Find the variance and standard deviation of $Y$ from the previous example.

Solution. We have $Y^2 = e^{-2X}$, and hence

$E(Y^2) = \int_{-\infty}^{\infty} e^{-2x} f(x)\,dx = \frac{1}{2}\int_3^5 e^{-2x}\,dx = \frac{1}{4}\left(e^{-6} - e^{-10}\right).$

Therefore,

$Var(Y) = \frac{1}{4}\left(e^{-6} - e^{-10}\right) - \frac{1}{4}\left(e^{-3} - e^{-5}\right)^2 \approx 0.000145.$

The standard deviation is $\sigma_Y \approx 0.012$.

Remark. The properties we established for $E(X)$ and $Var(X)$ also hold in the continuous case. That is,

$E(aX + b) = aE(X) + b, \quad Var(aX + b) = a^2\, Var(X).$
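Both the change-of-variables computation and the LOTUS computation can be sanity-checked in Python (a sketch, assuming numpy and scipy):

```python
# E(e^{-X}) for X ~ U(3, 5): LOTUS integral vs. a Monte Carlo average.
import numpy as np
from scipy.integrate import quad

lotus, _ = quad(lambda x: np.exp(-x) * 0.5, 3, 5)  # (e^-3 - e^-5)/2
x = np.random.default_rng(1).uniform(3, 5, 1_000_000)
mc = np.exp(-x).mean()
print(lotus, mc)  # both ~0.0215
```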

LECTURE 20

Uniform distributions (5.3)

Definition. We say that a r.v. $X$ has the uniform distribution on $(\alpha, \beta)$ if $X$ has pdf

$f(x) = \frac{1}{\beta - \alpha}$ for $x \in (\alpha, \beta)$, and $f(x) = 0$ otherwise.

Notation: $X \sim U(\alpha, \beta)$. Loosely speaking: $X$ is equally likely to take any value in $[\alpha, \beta]$.

Why $\frac{1}{\beta - \alpha}$? Because one must have $1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_{\alpha}^{\beta} \frac{1}{\beta - \alpha}\,dx$.

Exercise. Suppose $X$ has a uniform distribution on $(3, 8)$.
(1) Find $P(X > 5)$.
(2) Find $P(2.5 \le X \le 7.5)$.

Solution. The pdf of $X$ is $f(x) = \frac{1}{5}$ for $3 < x < 8$ (and $0$ otherwise), and hence:

(1) $P(X > 5) = \int_5^8 \frac{1}{5}\,dx = \frac{3}{5}$.
(2) $P(2.5 \le X \le 7.5) = \int_3^{7.5} \frac{1}{5}\,dx = \frac{4.5}{5} = 0.9$.

The cdf of a uniform r.v. Let $X \sim U(\alpha, \beta)$. By definition, the cdf of $X$ is

$F(x) = P(X \le x) = 0$ for $x < \alpha$; $\ \frac{x - \alpha}{\beta - \alpha}$ for $\alpha < x < \beta$; $\ 1$ for $x > \beta$.
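The same probabilities can be computed with scipy's uniform distribution; note that scipy parametrizes $U(\mathrm{loc}, \mathrm{loc} + \mathrm{scale})$ (a sketch):

```python
# Check the uniform probabilities: U(3, 8) is uniform(loc=3, scale=5) in scipy.
from scipy.stats import uniform

X = uniform(loc=3, scale=5)
print(1 - X.cdf(5))             # P(X > 5) = 3/5
print(X.cdf(7.5) - X.cdf(2.5))  # P(2.5 <= X <= 7.5) = 0.9
```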

In particular, if $X \sim U(0, 1)$, then $F(x) = 0$ for $x \le 0$; $\ x$ for $0 < x < 1$; $\ 1$ for $x \ge 1$.

Example. Let $X \sim U(0, 1)$. Then

$E(X) = \int_0^1 x \cdot 1\,dx = \left[\frac{x^2}{2}\right]_0^1 = \frac{1}{2}, \qquad E(X^2) = \int_{\mathbb{R}} x^2 f(x)\,dx = \int_0^1 x^2\,dx = \frac{1}{3},$

and hence

$Var(X) = E(X^2) - (E(X))^2 = \frac{1}{3} - \left(\frac{1}{2}\right)^2 = \frac{1}{12}.$

Exercise. Let $X \sim U(\alpha, \beta)$. Find $E(X)$ and $Var(X)$.

Solution. Two ways:

Method 1: Let $Y = \frac{X - \alpha}{\beta - \alpha}$. Then $Y \sim U(0, 1)$, and hence

$\frac{1}{2} = E(Y) = E\left(\frac{X - \alpha}{\beta - \alpha}\right) = \frac{E(X) - \alpha}{\beta - \alpha} \implies E(X) = \frac{\beta - \alpha}{2} + \alpha = \frac{\alpha + \beta}{2},$

$\frac{1}{12} = Var(Y) = Var\left(\frac{X - \alpha}{\beta - \alpha}\right) = \frac{Var(X)}{(\beta - \alpha)^2} \implies Var(X) = \frac{(\beta - \alpha)^2}{12}.$

Method 2: Direct computation. We have

$E(X) = \int_{\alpha}^{\beta} \frac{x}{\beta - \alpha}\,dx = \frac{\beta^2 - \alpha^2}{2(\beta - \alpha)} = \frac{\alpha + \beta}{2},$

and hence

$E(X^2) = \int_{\alpha}^{\beta} \frac{x^2}{\beta - \alpha}\,dx = \frac{\beta^3 - \alpha^3}{3(\beta - \alpha)} = \frac{\beta^2 + \alpha\beta + \alpha^2}{3},$

$Var(X) = \frac{\beta^2 + \alpha\beta + \alpha^2}{3} - \left(\frac{\alpha + \beta}{2}\right)^2 = \dots = \frac{(\beta - \alpha)^2}{12}.$

Exercise. (Bus stop problem) Buses arrive at the bus stop every 30 minutes, at 1:00, 1:30, 2:00, etc. Matt arrives at the bus stop at a random time, uniformly distributed between 2 and 3. What is the distribution of his waiting time? Expected wait? Standard deviation?

Solution. Denote Matt's arrival time by $X \sim U(2, 3)$, and his waiting time by $W$. Since $X$ is uniform, we have that $f_X(x) = 1$ for $2 < x < 3$ (and $0$ otherwise). In particular, note that $P(a < X < b) = \int_a^b 1\,dx = b - a$ for any $2 < a < b < 3$.

Next, we find the cdf of Matt's waiting time $W$: if $t \ge \frac{1}{2}$, we clearly have that $F_W(t) = P(W \le t) = 1$ (Matt will wait at most half an hour for a bus). If $0 < t < \frac{1}{2}$, then

$F_W(t) = P(W \le t) = P\left(\tfrac{5}{2} - t \le X \le \tfrac{5}{2}\right) + P(3 - t \le X \le 3) = [2.5 - (2.5 - t)] + [3 - (3 - t)] = 2t.$

Therefore, $F_W(t) = 1$ for $t \ge \frac{1}{2}$; $\ 2t$ for $0 < t < \frac{1}{2}$; $\ 0$ for $t \le 0$, which means that $W \sim U\big(0, \frac{1}{2}\big)$. Thus

$E(W) = \frac{1}{4}, \quad Var(W) = \frac{(1/2)^2}{12} = \frac{1}{48} \implies \sigma_W = \frac{1}{4\sqrt{3}} \text{ hours} \approx 8.7 \text{ min}.$

So the typical waiting time is about $15 \pm 9$ minutes.

Geometric Probability. The following is a multidimensional generalization of $U(\alpha, \beta)$. Given a set $S \subseteq \mathbb{R}^2$ (or $\mathbb{R}^3$, or $\mathbb{R}^n$), the probability that a uniformly distributed random point $X$ in $S$ belongs to a subset $A \subseteq S$ is

$P(X \in A) = \frac{\mathrm{Area}(A)}{\mathrm{Area}(S)} \quad \left(\text{or } \frac{\mathrm{Vol}(A)}{\mathrm{Vol}(S)}\right).$
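A simulation makes the conclusion $W \sim U(0, \frac{1}{2})$ easy to believe (a sketch, assuming numpy):

```python
# Simulate Matt's waiting time: arrival X ~ U(2, 3), buses at 2:00, 2:30, 3:00.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(2, 3, 1_000_000)
w = np.where(x <= 2.5, 2.5 - x, 3.0 - x)  # wait until the next bus
print(w.mean(), w.std())  # ~0.25 h and ~0.144 h, i.e. about 15 and 8.7 minutes
```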

Exercise. A certain city has the shape of a disk with radius $R$. A family builds their home at a random point in the city, uniformly distributed.
(1) What is the probability that the distance from their home to the city center is less than $r$ miles?
(2) Find the expected distance from their home to the city center, and its variance.

Solution. Let $X$ be the distance from their home to the city center. Then

$P(X \le r) = \frac{\pi r^2}{\pi R^2} = \left(\frac{r}{R}\right)^2.$

This means that the pdf of $X$ is $f(r) = \frac{2r}{R^2}$ for $0 < r < R$, and $0$ otherwise. Therefore,

$E(X) = \int_0^R r \cdot \frac{2r}{R^2}\,dr = \frac{2}{3}R,$

and

$Var(X) = \int_0^R r^2 \cdot \frac{2r}{R^2}\,dr - \left(\frac{2}{3}R\right)^2 = \frac{R^2}{2} - \frac{4R^2}{9} = \frac{R^2}{18}.$
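A Monte Carlo check of both answers, sampling uniform points in the disk by rejection (a sketch, assuming numpy):

```python
# Estimate E(X) and Var(X) for the distance X of a uniform point in a disk.
import numpy as np

rng = np.random.default_rng(3)
R = 1.0
pts = rng.uniform(-R, R, size=(2_000_000, 2))
d = np.hypot(pts[:, 0], pts[:, 1])
d = d[d <= R]               # keep only points that fall inside the disk
print(d.mean(), 2 * R / 3)  # both ~0.667
print(d.var(), R ** 2 / 18) # both ~0.0556
```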

LECTURE 21

The normal distribution (5.4)

The standard normal distribution.

Definition. A r.v. $X$ has the standard normal distribution if the pdf of $X$ is

$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad -\infty < x < \infty.$

Notation: $X \sim N(0, 1)$.

Remark. Gauss used normal distributions to model his observations in astronomy. Therefore, normal distributions are often referred to as Gaussian distributions.

Why $\frac{1}{\sqrt{2\pi}}$? As usual, it is the normalizing constant:

Proposition. $f(x)$ above is indeed a pdf, i.e., $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Proof. We need to show that

$I := \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{2\pi}.$

Trick: square the integral,

$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2/2}\,dy\right) = \iint_{\mathbb{R}^2} e^{-\frac{x^2 + y^2}{2}}\,dx\,dy.$

We now pass to polar coordinates: $x = r\cos\theta$, $y = r\sin\theta$, $dx\,dy = r\,dr\,d\theta$, so

$I^2 = \int_0^{2\pi}\!\!\int_0^{\infty} e^{-r^2/2}\, r\,dr\,d\theta = 2\pi \int_0^{\infty} r e^{-r^2/2}\,dr = \left[u = \tfrac{r^2}{2},\ du = r\,dr\right] = 2\pi \int_0^{\infty} e^{-u}\,du = 2\pi \left[-e^{-u}\right]_0^{\infty} = 2\pi.$
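The Gaussian integral can also be confirmed numerically (a sketch, assuming numpy and scipy):

```python
# Numerical check that the integral of e^{-x^2/2} over the real line is sqrt(2*pi).
import numpy as np
from scipy.integrate import quad

I, _ = quad(lambda x: np.exp(-x ** 2 / 2), -np.inf, np.inf)
print(I, np.sqrt(2 * np.pi))  # both ~2.5066
```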

Exercise. Let $X \sim N(0, 1)$. Show that $E(X) = 0$ and $Var(X) = 1$.

Proof. By definition,

$E(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \underbrace{x e^{-x^2/2}}_{\text{odd function}}\,dx = 0.$

Moreover, integrating by parts ($u = x$, $v' = x e^{-x^2/2}$, so $u' = 1$, $v = -e^{-x^2/2}$),

$Var(X) = E(X^2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}}\left(\left[-x e^{-x^2/2}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right) = 0 + 1 = 1.$

The cdf of $N(0, 1)$. The cdf of a standard normal r.v. is denoted by $\phi(a)$. We have

$\phi(a) = P(X \le a) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^a e^{-x^2/2}\,dx, \quad a \in \mathbb{R}.$

The function $\phi(x)$ cannot be expressed in terms of elementary functions, such as $\sin x$, $\cos x$, $e^x$, $\ln x$, $x^n$, etc. It is a special function, tabulated on p. 201 of Ross.

Remark. By the symmetry of the density of $X \sim N(0, 1)$, one has $\phi(-x) = 1 - \phi(x)$.

Figure 1. By symmetry of the density of $N(0, 1)$, $\phi(-z) = 1 - \phi(z)$.

Example. Let $X \sim N(0, 1)$. Find $P(|X| \le 2)$.

Solution. We have

$P(-2 \le X \le 2) = \phi(2) - \phi(-2) = 2\phi(2) - 1 \approx 0.9544.$

In this example, you see how high the probability of a small interval around the origin is. This is a demonstration of the fast tail decay of $f(x)$.
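In practice one can replace the table by a library call; for instance (a sketch, assuming scipy):

```python
# P(|X| <= 2) for X ~ N(0, 1), computed from the cdf instead of the table.
from scipy.stats import norm

print(2 * norm.cdf(2) - 1)  # ~0.9545
```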

General normal distributions. Let $Z \sim N(0, 1)$, and consider the r.v. $X = \mu + \sigma Z$, where $\mu \in \mathbb{R}$ and $\sigma > 0$. Then the cdf of $X$ is

$F(x) = P(X \le x) = P(\mu + \sigma Z \le x) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = \phi\left(\frac{x - \mu}{\sigma}\right).$

By the chain rule, the pdf of $X$ is

$f(x) = F'(x) = \phi'\left(\frac{x - \mu}{\sigma}\right) \cdot \frac{1}{\sigma} = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$

The expected value and variance are:

$E(X) = E(\mu + \sigma Z) = \mu + \sigma \underbrace{E(Z)}_{=0} = \mu, \qquad Var(X) = Var(\mu + \sigma Z) = \sigma^2\, Var(Z) = \sigma^2.$

This leads us to the following definition:

Definition. A r.v. $X$ has the normal distribution with parameters $\mu, \sigma$ if $X$ has pdf

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$

Notation: $X \sim N(\mu, \sigma^2)$. We have $\frac{X - \mu}{\sigma} \sim N(0, 1)$, and

$F_X(x) = P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = \phi\left(\frac{x - \mu}{\sigma}\right).$

Meaning of the parameters: $\mu = E(X)$ is the mean value of $X$, and $\sigma = \sigma_X$ is the standard deviation of $X$. Different values of $\mu$ correspond to horizontal shifts of the density function, and different values of $\sigma$ make the density's graph narrower/broader (see figure below).
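Standardization is exactly what statistical libraries do internally; the identity $F_X(x) = \phi\big(\frac{x - \mu}{\sigma}\big)$ can be seen directly (a sketch, assuming scipy):

```python
# P(X <= x) for X ~ N(mu, sigma^2) equals phi((x - mu)/sigma).
from scipy.stats import norm

mu, sigma, x = 10, 2, 13
print(norm.cdf(x, loc=mu, scale=sigma))  # ~0.9332
print(norm.cdf((x - mu) / sigma))        # same value via standardization
```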
