Notes slides from before lecture CSE 21, Winter 2017, Section A00 Lecture 16 Notes Class URL: http://vlsicad.ucsd.edu/courses/cse21-w17/

Notes slides from before lecture Notes March 8 (1) This week: Days 21, 22, 23 of posted slides: probability distributions, conditional probability, expected value, Bayes' theorem. HW7 is due tonight; HW8 will be due next Wednesday. The MT2 3-day (= 72-hour) window for regrade requests ends tomorrow; please ask questions about MT2 grading in OHs. Discussion section this week will spend some time going over MT2. Poll for final exam reviews: 3 top choices = F 5-7, Sa 1-3, Su 1-3; a second round of polling will choose between F 5-7 and Su 1-3, so please vote! Any logistic or other issues?

Expected Value Examples Rosen p. 460, 478 The expectation (average, expected value) of a random variable X on sample space S is E(X) = Σ_{s∈S} p(s) X(s). Calculate the expected sum of two 6-sided dice. A. 6 B. 7 C. 8 D. 9 E. None of the above.
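
A minimal brute-force check of the dice question (a sketch, assuming two fair, independent dice; the definition above gives the answer B, since E(sum) = 3.5 + 3.5 = 7):

```python
from itertools import product

# All 36 equally likely outcomes of two fair 6-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# E(sum) = sum over outcomes s of p(s) * X(s), with p(s) = 1/36
expected_sum = sum(a + b for a, b in outcomes) / len(outcomes)
print(expected_sum)  # 7.0
```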

Properties of Expectation Rosen p. 460,478 E(X) may not be an actually possible value of X. But m <= E(X) <= M, where m is the minimum possible value of X and M is the maximum possible value of X.

Useful trick 1: Case analysis Rosen p. 460, 478 The expectation can be computed by conditioning on an event and its complement. Theorem: For any random variable X and event A, E(X) = P(A) E(X | A) + P(A^c) E(X | A^c), where A^c is the complement of A and E(X | A) is the conditional expectation of X given A.

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? e.g., X(HHT) = 1, X(HHH) = 2.

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Directly from definition For each of the eight possible outcomes, find the probability and the value of X: HHH (P(HHH) = 1/8, X(HHH) = 2), HHT, HTH, HTT, THH, THT, TTH, TTT, etc.

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Using conditional expectation Let A be the event "The middle flip is H". Which subset of S is A? A. { HHH } B. { THT } C. { HHT, THH} D. { HHH, HHT, THH, THT} E. None of the above.

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Using conditional expectation Let A be the event "The middle flip is H". E(X) = P(A) E(X | A) + P(A^c) E(X | A^c)

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Using conditional expectation Let A be the event "The middle flip is H". P(A) = 1/2, P(A^c) = 1/2 E(X) = P(A) E(X | A) + P(A^c) E(X | A^c)

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Using conditional expectation Let A be the event "The middle flip is H". P(A) = 1/2, P(A^c) = 1/2 E(X) = P(A) E(X | A) + P(A^c) E(X | A^c) E(X | A^c): If the middle flip isn't H, there can't be any pairs of consecutive Hs

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Using conditional expectation Let A be the event "The middle flip is H". P(A) = 1/2, P(A^c) = 1/2 E(X) = P(A) E(X | A) + P(A^c) E(X | A^c) E(X | A^c): If the middle flip isn't H, there can't be any pairs of consecutive Hs E(X | A): If the middle flip is H, # pairs of consecutive Hs = # Hs in the first & last flips

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Using conditional expectation Let A be the event "The middle flip is H". P(A) = 1/2, P(A^c) = 1/2 E(X | A^c) = 0 E(X | A) = ¼·0 + ½·1 + ¼·2 = 1 E(X) = P(A) E(X | A) + P(A^c) E(X | A^c)

Useful trick 1: Case analysis Example: If X is the number of pairs of consecutive Hs when we flip a fair coin three times, what is the expectation of X? Solution: Using conditional expectation Let A be the event "The middle flip is H". P(A) = 1/2, P(A^c) = 1/2 E(X | A^c) = 0 E(X | A) = ¼·0 + ½·1 + ¼·2 = 1 E(X) = P(A) E(X | A) + P(A^c) E(X | A^c) = ½(1) + ½(0) = 1/2
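
A quick enumeration check of this answer (a sketch; the eight outcomes are equally likely, as in the slides):

```python
from itertools import product

def pairs_of_consecutive_heads(flips):
    # Count positions i where flips[i] and flips[i+1] are both 'H'
    return sum(1 for a, b in zip(flips, flips[1:]) if a == b == 'H')

outcomes = list(product('HT', repeat=3))  # 8 equally likely outcomes
ev = sum(pairs_of_consecutive_heads(o) for o in outcomes) / len(outcomes)
print(ev)  # 0.5, matching the conditional-expectation computation
```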

Useful trick 1: Case analysis Examples: Ending condition Each time I play solitaire I have a probability p of winning. I play until I win a game. Each time a child is born, it has probability p of being left-handed. I keep having children until I have a left-handed one. Let X be the number of games [equivalently, # children] until the ending condition is met. What's E(X)? A. 1. B. Some big number that depends on p. C. 1/p. D. None of the above.

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Directly from definition Need to compute E(X) = Σ_i P(X = i) · i over all possible values i.

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Directly from definition Need to compute E(X) = Σ_i P(X = i) · i over all possible values i. P(X = i) = probability that we don't stop in the first i−1 trials and do stop at the i-th trial = (1−p)^(i−1) p

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Directly from definition Need to compute E(X) = Σ_i P(X = i) · i over all possible values i. P(X = i) = probability that we don't stop in the first i−1 trials and do stop at the i-th trial = (1−p)^(i−1) p Math 20B?
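
The series the "Math 20B?" aside points to can be summed with the standard geometric-series derivative trick (a worked step added here, not on the slides):

```latex
E(X) = \sum_{i=1}^{\infty} i\,(1-p)^{i-1}p
     = p \sum_{i=1}^{\infty} i\,q^{i-1}
     = p \,\frac{d}{dq}\Big(\sum_{i=0}^{\infty} q^{i}\Big)
     = p \,\frac{d}{dq}\,\frac{1}{1-q}
     = \frac{p}{(1-q)^{2}}
     = \frac{1}{p}, \qquad q = 1-p.
```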

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Using conditional expectation E(X) = P(A) E(X | A) + P(A^c) E(X | A^c)

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Using conditional expectation Let A be the event "success at first try". E(X) = P(A) E(X | A) + P(A^c) E(X | A^c)

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Using conditional expectation Let A be the event "success at first try". E(X) = P(A) E(X | A) + P(A^c) E(X | A^c) P(A) = p P(A^c) = 1−p

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Using conditional expectation Let A be the event "success at first try". E(X) = P(A) E(X | A) + P(A^c) E(X | A^c) P(A) = p P(A^c) = 1−p E(X | A) = 1 because we stop after the first try

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Using conditional expectation Let A be the event "success at first try". E(X) = P(A) E(X | A) + P(A^c) E(X | A^c) P(A) = p P(A^c) = 1−p E(X | A) = 1 E(X | A^c) = 1 + E(X) because we tried once and are then in the same situation as at the start

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Using conditional expectation Let A be the event "success at first try". E(X) = P(A) E(X | A) + P(A^c) E(X | A^c) P(A) = p P(A^c) = 1−p E(X | A) = 1 E(X | A^c) = 1 + E(X) E(X) = p(1) + (1−p)(1 + E(X))

Useful trick 1: Case analysis Ending condition Let X be the number of games [number of kids] until the ending condition is met. Solution: Using conditional expectation Let A be the event "success at first try". E(X) = p(1) + (1−p)(1 + E(X)) Solving for E(X) gives E(X) = 1/p.
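
A Monte Carlo sanity check of E(X) = 1/p (a sketch; the choice p = 0.25 is mine):

```python
import random

def trials_until_success(p):
    # Run Bernoulli(p) trials until the first success; return the trial count
    n = 1
    while random.random() >= p:
        n += 1
    return n

p = 0.25
samples = [trials_until_success(p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ≈ 1/p = 4
```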

Useful trick 2: Linearity of expectation Theorem: If X_1, ..., X_n are random variables on S and if a and b are real numbers, then E(X_1 + ... + X_n) = E(X_1) + ... + E(X_n) and E(aX + b) = aE(X) + b. Rosen p. 477-484 The expectation of the sum is the sum of the expectations. The random variables X_i can be dependent on each other or otherwise interrelated; the Theorem still holds!

Useful trick 2: Linearity of expectation Example: Expected number of pairs of consecutive heads when we flip a fair coin n times? A. 1. B. 2. C. n. D. None of the above. E. ??

Useful trick 2: Linearity of expectation Example: Expected number of pairs of consecutive heads when we flip a fair coin n times? Solution: Define X_i = 1 if both the i-th and (i+1)-st flips are H; X_i = 0 otherwise. Looking for E(X) where X = X_1 + ... + X_{n−1}. For each i, what is E(X_i)? A. 0. B. ¼. C. ½. D. 1. E. It depends on the value of i.

Useful trick 2: Linearity of expectation Example: Expected number of pairs of consecutive heads when we flip a fair coin n times? Solution: Define X_i = 1 if both the i-th and (i+1)-st flips are H; X_i = 0 otherwise. Looking for E(X) where X = X_1 + ... + X_{n−1}.

Useful trick 2: Linearity of expectation Example: Expected number of pairs of consecutive heads when we flip a fair coin n times? Solution: Define X_i = 1 if both the i-th and (i+1)-st flips are H; X_i = 0 otherwise. Looking for E(X) where X = X_1 + ... + X_{n−1}. Indicator variables: 1 if the pattern occurs, 0 otherwise
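
Each E(X_i) = P(flips i and i+1 are both H) = ¼, so linearity gives E(X) = (n−1)/4; a brute-force check of that conclusion for small n (my arithmetic, consistent with the poll answer B):

```python
from itertools import product

def expected_hh_pairs(n):
    # Brute-force E(X): average # of adjacent HH pairs over all 2^n sequences
    total = count = 0
    for seq in product('HT', repeat=n):
        total += sum(1 for a, b in zip(seq, seq[1:]) if a == b == 'H')
        count += 1
    return total / count

for n in range(2, 7):
    # Linearity of expectation predicts E(X) = (n-1) * 1/4
    print(n, expected_hh_pairs(n), (n - 1) / 4)
```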

Example: Thank-You Notes A child has just had a birthday party. He has written 20 thank-you notes and addressed 20 corresponding envelopes. But then he gets distracted, and puts the letters randomly into the envelopes (one letter per envelope). What is the expected number of letters that are in their correct envelopes? Define r.v. X = # letters in correct envelopes? We want E(X) = Σ_{i=0..20} i · P(X = i) How about: r.v.'s X_j = 1 if letter j is in envelope j, and X_j = 0 otherwise? We want E( Σ_{j=1..20} X_j )
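
By linearity, E(X) = Σ_j E(X_j) = 20 · (1/20) = 1, sidestepping the P(X = i) computation entirely. A simulation sketch (assuming a uniformly random assignment of letters to envelopes):

```python
import random

def correct_matches(n=20):
    # A random permutation models letters placed randomly into envelopes
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for letter, envelope in enumerate(perm) if letter == envelope)

samples = [correct_matches() for _ in range(100_000)]
print(sum(samples) / len(samples))  # ≈ 1.0 (and in fact independent of n)
```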

Example: Baseball Cards A bubble gum company puts one random major league baseball player's card into every pack of its gum. There are N major leaguers in the league this year. If X is the number of packs of gum that a collector must buy until she has at least one of each player's card, what is E(X)?

Other functions? Expectation does not in general commute with other functions. Rosen p. 460, 478 E( f(X) ) ≠ f( E(X) ) For example, let X be a random variable with P(X = 0) = ½, P(X = 1) = ½ What's E(X)? What's E(X²)? What's ( E(X) )²?

Other functions? Expectation does not in general commute with other functions. Rosen p. 460, 478 E( f(X) ) ≠ f( E(X) ) For example, let X be a random variable with P(X = 0) = ½, P(X = 1) = ½ What's E(X)? What's E(X²)? What's ( E(X) )²? E(X) = (½)·0 + (½)·1 = ½ E(X²) = (½)·0² + (½)·1² = ½ ( E(X) )² = (½)² = ¼

One More Example: Hashing Definition: A hash function maps data of variable length to data of a fixed length. The values returned by a hash function are called hash values or simply hashes. Open Hashing: Each item hashes to a particular location in an array, and locations can hold more than one item. Multiple input data values can have the same hash value. These are called collisions!

Number of Items Per Location Suppose: n items, hash to table of size k What is the expected number of items per location? n independent trials with outcome = location in table (i.e., hash value) Let r.v. X = # inputs that hash to location 1 Let's write X as the sum of X_i's: X_i = 1 if the i-th input hashes to location 1 X_i = 0 otherwise X = X_1 + X_2 + ... + X_n E(X) = E(X_1) + E(X_2) + ... + E(X_n) = n · 1/k = n/k
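
A simulation sketch of the n/k result (assuming each item hashes to a uniformly random location, matching the independent-trials model above; names are mine):

```python
import random

def items_at_location(n, k, loc=0):
    # Hash n items uniformly at random into k locations; count hits at one location
    return sum(1 for _ in range(n) if random.randrange(k) == loc)

n, k = 20, 5
samples = [items_at_location(n, k) for _ in range(100_000)]
print(sum(samples) / len(samples), n / k)  # both ≈ 4.0
```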

Number of Empty Locations What is the expected number of empty locations? Define r.v. Y = # empty locations. We want E(Y) Example: 20 items into table of size 5: E(Y) = ? Example: 20 items into table of size 10: E(Y) = ? Apply linearity of expectation! (worked out at the end of the slide deck)

Independence, Concentration, Bayes Theorem CSE21 Winter 2017, Day 23 (B00), Day 15 (A00) March 10, 2017 http://vlsicad.ucsd.edu/courses/cse21-w17

Independent Events Two events E and F are independent iff P( E ∩ F ) = P(E) P(F). Rosen p. 457 Problem: Suppose E is the event that a randomly generated bitstring of length 4 starts with a 1 F is the event that this bitstring contains an even number of 1s. Are E and F independent if all bitstrings of length 4 are equally likely? Are they disjoint? First impressions? A. E and F are independent and disjoint. B. E and F are independent but not disjoint. C. E and F are disjoint but not independent. D. E and F are neither disjoint nor independent.

Independent Events Two events E and F are independent iff P( E ∩ F ) = P(E) P(F). Rosen p. 457 Problem: Suppose E is the event that a randomly generated bitstring of length 4 starts with a 1 F is the event that this bitstring contains an even number of 1s. Are E and F independent if all bitstrings of length 4 are equally likely? Are they disjoint?

Independent Random Variables Rosen p. 485 Let X and Y be random variables over the same sample space. X and Y are called independent random variables if, for all possible values v and u, P( X = v and Y = u ) = P( X = v ) P( Y = u ) Which of the following pairs of random variables on the sample space of sequences of H/T when a coin is flipped four times are independent? A. X_12 = # of H in first two flips, X_34 = # of H in last two flips. B. X = # of H in the sequence, Y = # of T in the sequence. C. X_12 = # of H in first two flips, X = # of H in the sequence. D. None of the above.

Independence Theorem: If X and Y are independent random variables over the same sample space, then E( XY ) = E( X ) E( Y ) Rosen p. 486 Note: This is not necessarily true if the random variables are not independent!

Concentration - Definitions How close (on average) will we be to the average / expected value? Let X be a random variable with E(X) = E. The unexpectedness of X is the random variable U = |X − E| The average unexpectedness of X is AU(X) = E( |X − E| ) = E( U ) The variance of X is V(X) = E( (X − E)² ) = E( U² ) The standard deviation of X is σ(X) = ( E( (X − E)² ) )^(1/2) = V(X)^(1/2) Rosen Section 7.4

Concentration How close (on average) will we be to the average / expected value? Let X be a random variable with E(X) = E. The unexpectedness of X is the random variable U = |X − E| The average unexpectedness of X is AU(X) = E( |X − E| ) = E( U ) The variance of X is V(X) = E( (X − E)² ) = E( U² ) The standard deviation of X is σ(X) = ( E( (X − E)² ) )^(1/2) = V(X)^(1/2) Average unexpectedness weights all differences from the mean equally; variance weights large differences from the mean more

Concentration How close (on average) will we be to the average / expected value? Let X be a random variable with E(X) = E. The unexpectedness of X is the random variable U = |X − E| The average unexpectedness of X is AU(X) = E( |X − E| ) = E( U ) The variance of X is V(X) = E( (X − E)² ) = E( U² ) The standard deviation of X is σ(X) = ( E( (X − E)² ) )^(1/2) = V(X)^(1/2) Does the average unexpectedness equal the standard deviation? A. Yes B. No C. ??

Concentration How close (on average) will we be to the average / expected value? Let X be a random variable with E(X) = E. The variance of X is V(X) = E( (X − E)² ) = E( U² ) Example: X_1 is a random variable with distribution P( X_1 = -2 ) = 1/5, P( X_1 = -1 ) = 1/5, P( X_1 = 0 ) = 1/5, P( X_1 = 1 ) = 1/5, P( X_1 = 2 ) = 1/5. X_2 is a random variable with distribution P( X_2 = -2 ) = 1/2, P( X_2 = 2 ) = 1/2. Which is true? A. E(X_1) ≠ E(X_2) B. V(X_1) < V(X_2) C. V(X_1) > V(X_2) D. V(X_1) = V(X_2)

Concentration How close (on average) will we be to the average / expected value? Let X be a random variable with E(X) = E. The variance of X is V(X) = E( (X − E)² ) = E( U² ) Theorem: V(X) = E(X²) − ( E(X) )²

Concentration How close (on average) will we be to the average / expected value? Let X be a random variable with E(X) = E. The variance of X is V(X) = E( (X − E)² ) = E( U² ) Theorem: V(X) = E(X²) − ( E(X) )² Proof: V(X) = E( (X − E)² ) = E( X² − 2XE + E² ) = E(X²) − 2E·E(X) + E² (linearity of expectation) = E(X²) − 2E² + E² = E(X²) − E² = E(X²) − ( E(X) )²

Concentration How close (on average) will we be to the average / expected value? Let X be a random variable with E(X) = E. The variance of X is V(X) = E( (X − E)² ) = E( U² ) Theorem: V(X) = E(X²) − ( E(X) )² What does it mean for the variance of X to be 0? A. The value of the random variable X must always be 0. B. The value of the random variable must be constant. C. E(X²) = ( E(X) )² D. E(X²) = E(U²) E. E(U) = 0
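
A small check of both variance formulas on the X_1, X_2 example above (helper names are mine; the output confirms poll answer B, V(X_1) < V(X_2)):

```python
def expectation(dist):
    # dist: list of (value, probability) pairs
    return sum(p * x for x, p in dist)

def variance(dist):
    # E((X - E)^2), directly from the definition
    e = expectation(dist)
    return sum(p * (x - e) ** 2 for x, p in dist)

X1 = [(-2, 0.2), (-1, 0.2), (0, 0.2), (1, 0.2), (2, 0.2)]
X2 = [(-2, 0.5), (2, 0.5)]
for X in (X1, X2):
    e_sq = sum(p * x * x for x, p in X)                # E(X^2)
    print(variance(X), e_sq - expectation(X) ** 2)     # same number both ways
# X1 gives 2.0, X2 gives 4.0
```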

Recall: Conditional probabilities Probability of an event may change if we have additional information about outcomes. Suppose E and F are events, and P(F) > 0. Then P( E | F ) = P( E ∩ F ) / P( F ), i.e., P( E ∩ F ) = P( E | F ) P( F ) Rosen p. 456

Conditional Probability → Bayes' Theorem Probability of E and F, i.e., P(E ∩ F): P(E ∩ F) = P(F | E) P(E) probability of F given E, times probability of E P(E ∩ F) = P(E | F) P(F) (symmetrical observation) Rosen p. 470, Theorem 1 P(F | E) P(E) = P(E | F) P(F) P(F | E) = P(E | F) P(F) / P(E) Bayes' Theorem

Bayes' Theorem Rosen Section 7.3 Based on previous knowledge about how probabilities of two events relate to one another, how does knowing that one event occurred impact the probability that the other event occurred?

Bayes' Theorem: Example 1 Rosen Section 7.3 A manufacturer claims that its drug test will detect steroid use 95% of the time. What the company does not tell you is that 15% of all steroid-free individuals also test positive (the false positive rate). 10% of the Tour de France bike racers use steroids. Your favorite cyclist just tested positive. What's the probability that he used steroids? Your first guess? A. Close to 95% B. Close to 85% C. Close to 50% D. Close to 15% E. Close to 10%

Bayes' Theorem: Example 1 Rosen Section 7.3 A manufacturer claims that its drug test will detect steroid use 95% of the time. What the company does not tell you is that 15% of all steroid-free individuals also test positive (the false positive rate). 10% of the Tour de France bike racers use steroids. Your favorite cyclist just tested positive. What's the probability that he used steroids? Define events: we want P( used steroids | tested positive )

Bayes' Theorem: Example 1 Rosen Section 7.3 A manufacturer claims that its drug test will detect steroid use 95% of the time. What the company does not tell you is that 15% of all steroid-free individuals also test positive (the false positive rate). 10% of the Tour de France bike racers use steroids. Your favorite cyclist just tested positive. What's the probability that he used steroids? Define events: we want P( used steroids | tested positive ) so let E = Tested positive F = Used steroids

Bayes' Theorem: Example 1 Rosen Section 7.3 A manufacturer claims that its drug test will detect steroid use 95% of the time. What the company does not tell you is that 15% of all steroid-free individuals also test positive (the false positive rate). 10% of the Tour de France bike racers use steroids. Your favorite cyclist just tested positive. What's the probability that he used steroids? Define events: we want P( used steroids | tested positive ) E = Tested positive P( E | F ) = 0.95 F = Used steroids

Bayes' Theorem: Example 1 Rosen Section 7.3 A manufacturer claims that its drug test will detect steroid use 95% of the time. What the company does not tell you is that 15% of all steroid-free individuals also test positive (the false positive rate). 10% of the Tour de France bike racers use steroids. Your favorite cyclist just tested positive. What's the probability that he used steroids? Define events: we want P( used steroids | tested positive ) E = Tested positive P( E | F ) = 0.95 F = Used steroids P(F) = 0.1 P(F^c) = 0.9

Bayes' Theorem: Example 1 Rosen Section 7.3 A manufacturer claims that its drug test will detect steroid use 95% of the time. What the company does not tell you is that 15% of all steroid-free individuals also test positive (the false positive rate). 10% of the Tour de France bike racers use steroids. Your favorite cyclist just tested positive. What's the probability that he used steroids? Define events: we want P( used steroids | tested positive ) E = Tested positive P( E | F ) = 0.95 P( E | F^c ) = 0.15 F = Used steroids P(F) = 0.1 P(F^c) = 0.9

Bayes' Theorem: Example 1 Rosen Section 7.3 A manufacturer claims that its drug test will detect steroid use 95% of the time. What the company does not tell you is that 15% of all steroid-free individuals also test positive (the false positive rate). 10% of the Tour de France bike racers use steroids. Your favorite cyclist just tested positive. What's the probability that he used steroids? Define events: we want P( used steroids | tested positive ) E = Tested positive P( E | F ) = 0.95 P( E | F^c ) = 0.15 F = Used steroids P(F) = 0.1 P(F^c) = 0.9 Plug in: P( F | E ) = P( E | F ) P( F ) / [ P( E | F ) P( F ) + P( E | F^c ) P( F^c ) ] = (0.95)(0.1) / [ (0.95)(0.1) + (0.15)(0.9) ] = 0.095 / 0.23 ≈ 41%
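
The same plug-in as a small sketch (the function name is mine, not from the slides; it combines Bayes' theorem with the law of total probability for P(E)):

```python
def bayes(p_e_given_f, p_e_given_not_f, p_f):
    # P(F | E) = P(E | F) P(F) / P(E), with P(E) by total probability
    p_e = p_e_given_f * p_f + p_e_given_not_f * (1 - p_f)
    return p_e_given_f * p_f / p_e

# Steroid test: P(E|F) = 0.95, P(E|F^c) = 0.15, P(F) = 0.1
print(bayes(0.95, 0.15, 0.1))  # ≈ 0.413, i.e., about 41%
```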

Bayes' Theorem: Example 2 Rosen Section 7.3 Suppose we have found that the word Rolex occurs in 250 of 2000 messages known to be spam and in 5 out of 1000 messages known not to be spam. Estimate the probability that an incoming message containing the word Rolex is spam, assuming that it is equally likely that an incoming message is spam or not spam. Your first guess? A. Close to 95% B. Close to 85% C. Close to 50% D. Close to 15% E. Close to 10%

Bayes' Theorem: Example 2 Rosen Section 7.3 Suppose we have found that the word Rolex occurs in 250 of 2000 messages known to be spam and in 5 out of 1000 messages known not to be spam. Estimate the probability that an incoming message containing the word Rolex is spam, assuming that it is equally likely that an incoming message is spam or not spam. We want: P( spam | contains "Rolex" ). So define the events E = contains "Rolex" F = spam

Bayes' Theorem: Example 2 Rosen Section 7.3 Suppose we have found that the word Rolex occurs in 250 of 2000 messages known to be spam and in 5 out of 1000 messages known not to be spam. Estimate the probability that an incoming message containing the word Rolex is spam, assuming that it is equally likely that an incoming message is spam or not spam. We want: P( spam | contains "Rolex" ). So define the events E = contains "Rolex" F = spam What is P(E | F)? A. 0.005 B. 0.125 C. 0.5 D. Not enough info

Bayes' Theorem: Example 2 Rosen Section 7.3 Suppose we have found that the word Rolex occurs in 250 of 2000 messages known to be spam and in 5 out of 1000 messages known not to be spam. Estimate the probability that an incoming message containing the word Rolex is spam, assuming that it is equally likely that an incoming message is spam or not spam. We want: P( spam | contains "Rolex" ). Training set: establish probabilities E = contains "Rolex" P( E | F ) = 250/2000 = 0.125 P( E | F^c ) = 5/1000 = 0.005 F = spam

Bayes' Theorem: Example 2 Rosen Section 7.3 Suppose we have found that the word Rolex occurs in 250 of 2000 messages known to be spam and in 5 out of 1000 messages known not to be spam. Estimate the probability that an incoming message containing the word Rolex is spam, assuming that it is equally likely that an incoming message is spam or not spam. We want: P( spam | contains "Rolex" ). E = contains "Rolex" P( E | F ) = 250/2000 = 0.125 P( E | F^c ) = 5/1000 = 0.005 F = spam P( F ) = P( F^c ) = 0.5

Bayes' Theorem: Example 2 Rosen Section 7.3 Suppose we have found that the word Rolex occurs in 250 of 2000 messages known to be spam and in 5 out of 1000 messages known not to be spam. Estimate the probability that an incoming message containing the word Rolex is spam, assuming that it is equally likely that an incoming message is spam or not spam. We want: P( spam | contains "Rolex" ). E = contains "Rolex" P( E | F ) = 250/2000 = 0.125 P( E | F^c ) = 5/1000 = 0.005 F = spam P( F ) = P( F^c ) = 0.5 Plug in: P( F | E ) = (0.125)(0.5) / [ (0.125)(0.5) + (0.005)(0.5) ] = 0.125 / 0.13 ≈ 96%
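
The same plug-in as a self-contained check (numbers from the slide):

```python
# P(F|E) = P(E|F) P(F) / (P(E|F) P(F) + P(E|F^c) P(F^c))
p = (0.125 * 0.5) / (0.125 * 0.5 + 0.005 * 0.5)
print(p)  # ≈ 0.962, i.e., about 96%
```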

Example 3: Market Research An electronics company commissions a marketing report for each new product that predicts the success or failure of the product. Of the new products that have been introduced by the company, 60% have been successes. Furthermore, 70% of the successful products were predicted to be successes, while 40% of the failed products were predicted to be successes. Find the probability that a new smart phone will be successful, given that success is predicted. Define: Event A = Event B = Exercise: Should get P(A | B) ≈ 0.724

Announcements Some extra slides follow OHs and 1-1 sessions: plenty available! HW #8 due Wednesday, March 15, 11:59pm

Your list of important definitions/concepts = ? Sample space, probability space Bernoulli trial Outcome Event Conditional probability P(A | B) Random variable Distribution function of r.v. Binomial distribution Expectation Variance Use of an indicator variable Linearity of expectation (and its application) Independence Bayes' theorem / rule... (Advice: Have such a list!)

More: Independence, Conditional Probability Fact: P(B | S) = P(B) See Rosen p. 486 Conditioned on everything, we don't get any new knowledge Fact: A and B are independent iff P(B | A) = P(B) Three slides ago: A and B are independent iff P(A ∩ B) = P(A) P(B) Also: P(A ∩ B) = P(B | A) P(A) Divide by P(A) to get the Fact Independence means that conditioning on new information doesn't change the probability

Linearity of Expectation: Baseball Cards A bubble gum company puts one random major league baseball player's card into every pack of its gum. There are N major leaguers in the league this year. If X is the number of packs of gum that a collector must buy until she has at least one of each player's card, what is E(X)? Think about the card collecting in phases Phase 1: collect the first unique card Phase 2: collect the second unique card Phase k−1: collect the (k−1)-st unique card Phase k: collect the k-th unique card In Phase k: P(next pack ends this phase) = (N − k + 1)/N, so E(# packs to buy in this phase) = N/(N − k + 1)

Linearity of Expectation: Baseball Cards A bubble gum company puts one random major league baseball player's card into every pack of its gum. There are N major leaguers in the league this year. If X is the number of packs of gum that a collector must buy until she has at least one of each player's card, what is E(X)? Define r.v. X_k = # packs bought in Phase k E(X) = E( Σ_{k=1..N} X_k ) = Σ_{k=1..N} E(X_k) linearity of expectation! = Σ_{k=1..N} N / (N − k + 1) = N (1/1 + 1/2 + 1/3 + 1/4 + ... + 1/N) = N·H_N H_N = Nth harmonic number ~N log N packs
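
A sketch comparing N·H_N against simulation (assuming a uniformly random card in each pack, as the problem states; N = 30 is an arbitrary choice):

```python
import random

def packs_needed(N):
    # Buy packs until all N distinct cards have been seen at least once
    seen, packs = set(), 0
    while len(seen) < N:
        seen.add(random.randrange(N))
        packs += 1
    return packs

N = 30
harmonic = sum(1 / k for k in range(1, N + 1))   # H_N
samples = [packs_needed(N) for _ in range(20_000)]
print(N * harmonic, sum(samples) / len(samples))  # both ≈ 120
```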

Market Research An electronics company commissions a marketing report for each new product that predicts the success or failure of the product. Of the new products that have been introduced by the company, 60% have been successes. Furthermore, 70% of the successful products were predicted to be successes, while 40% of the failed products were predicted to be successes. Find the probability that a new smart phone will be successful, given that success is predicted. Define: Event A = smartphone is a success Event B = success is predicted Exercise: Should get P(A | B) ≈ 0.724

Number of Empty Locations What is the expected number of empty locations? Define r.v. Y = # empty locations. We want E(Y) Example: 20 items into table of size 5: E(Y) = 5 · (4/5)^20 ≈ 0.06 Example: 20 items into table of size 10: E(Y) = 10 · (9/10)^20 ≈ 1.22 Understand why this is an application of linearity of expectation!
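
The indicator computation behind these numbers: let Y_j = 1 if location j stays empty, so E(Y_j) = (1 − 1/k)^n and, by linearity, E(Y) = k (1 − 1/k)^n. A quick check (the helper name is mine):

```python
def expected_empty(n, k):
    # E(Y) = sum over the k locations of P(that location stays empty)
    #      = k * (1 - 1/k)^n
    return k * (1 - 1 / k) ** n

print(expected_empty(20, 5))   # ≈ 0.058, the ≈ 0.06 on the slide
print(expected_empty(20, 10))  # ≈ 1.216, the ≈ 1.22 on the slide
```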

Random/Review Question (see next slide) A student knows 80% of the material on an exam. What is the probability that she answers a given true-false question correctly? Knows material: answers T/F correctly with P = 1 Doesn't know material: answers T/F correctly with P = 0.5

Random/Review Question A student knows 80% of the material on an exam. What is the probability that she answers a given true-false question correctly? Knows material: answers T/F correctly with P = 1 Doesn't know material: answers T/F correctly with P = 0.5 Tree: material she knows (P = 0.8): Right 1.0, Wrong 0; material she doesn't know (P = 0.2): Right 0.5, Wrong 0.5. By the law of total probability, P(correct) = (0.8)(1.0) + (0.2)(0.5) = 0.9