Theorem 1.7 [Bayes' Law]: Assume that $E_1, E_2, \ldots, E_n$ are mutually disjoint events in the sample space $\Omega$ such that $\bigcup_{i=1}^{n} E_i = \Omega$. Then
$$\Pr(E_j \mid B) = \frac{\Pr(E_j \cap B)}{\Pr(B)} = \frac{\Pr(B \mid E_j)\Pr(E_j)}{\sum_{i=1}^{n} \Pr(B \mid E_i)\Pr(E_i)}.$$

We are given three coins and are told that two of the coins are fair and the third coin is biased, landing heads with probability 2/3. We permute the coins randomly and then flip each of the coins. The first and second coins come up heads, and the third comes up tails. What is the probability that the first coin is the biased one?

The coins are in a random order, so before observing the outcomes of the coin flips each of the three coins is equally likely to be the biased one. Let $E_i$ be the event that the $i$th coin flipped is the biased one, and let $B$ be the event that the three coin flips came up heads, heads, and tails.

Before we flip the coins, $\Pr(E_i) = 1/3$ for all $i$. The probability of the event $B$ conditioned on each $E_i$ is
$$\Pr(B \mid E_1) = \Pr(B \mid E_2) = \frac{2}{3}\cdot\frac{1}{2}\cdot\frac{1}{2} = \frac{1}{6}
\quad\text{and}\quad
\Pr(B \mid E_3) = \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{3} = \frac{1}{12}.$$
Applying Bayes' law, we have
$$\Pr(E_1 \mid B) = \frac{\Pr(B \mid E_1)\Pr(E_1)}{\sum_{i=1}^{3}\Pr(B \mid E_i)\Pr(E_i)} = \frac{1/18}{1/18 + 1/18 + 1/36} = \frac{2}{5}.$$
The outcome of the three coin flips thus increases the likelihood that the first coin is the biased one from 1/3 to 2/5.

In the randomized matrix multiplication test, we want to evaluate the increase in confidence in the matrix identity obtained through repeated tests. In the Bayesian approach one starts with a prior model, giving some initial value to the model parameters. This model is then modified, by incorporating new observations, to obtain a posterior model that captures the new information. If we have no information about the process that generated the identity, then a reasonable prior assumption is that the identity is correct with probability 1/2.
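The posterior can also be checked mechanically. Below is a small illustrative sketch (not part of the original notes) that applies Bayes' law with exact rational arithmetic and reproduces the 2/5 posterior.

    from fractions import Fraction as F

    # Prior: each of the three coins is equally likely to be the biased one.
    prior = [F(1, 3)] * 3

    # Likelihood of observing (heads, heads, tails) given that coin i is biased.
    # The biased coin lands heads with probability 2/3; the fair coins with 1/2.
    likelihood = [
        F(2, 3) * F(1, 2) * F(1, 2),  # coin 1 biased: Pr(B | E1) = 1/6
        F(1, 2) * F(2, 3) * F(1, 2),  # coin 2 biased: Pr(B | E2) = 1/6
        F(1, 2) * F(1, 2) * F(1, 3),  # coin 3 biased: Pr(B | E3) = 1/12
    ]

    # Bayes' law: posterior_i = likelihood_i * prior_i / sum_j likelihood_j * prior_j
    evidence = sum(l * p for l, p in zip(likelihood, prior))
    posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

    print(posterior[0])  # Fraction(2, 5)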

Let $E$ be the event that the identity is correct, and let $B$ be the event that the test returns that the identity is correct. We start with $\Pr(E) = \Pr(\bar{E}) = 1/2$, and since the test has a one-sided error bounded by 1/2, we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$. Applying Bayes' law yields
$$\Pr(E \mid B) = \frac{\Pr(B \mid E)\Pr(E)}{\Pr(B \mid E)\Pr(E) + \Pr(B \mid \bar{E})\Pr(\bar{E})} \ge \frac{1/2}{1/2 + (1/2)(1/2)} = \frac{2}{3}.$$

Assume now that we run the randomized test again and it again returns that the identity is correct. After the first test, we may have revised our prior model, so that we believe $\Pr(E) \ge 2/3$ and $\Pr(\bar{E}) \le 1/3$. Now let $B$ be the event that the new test returns that the identity is correct; since the tests are independent, as before we have $\Pr(B \mid E) = 1$ and $\Pr(B \mid \bar{E}) \le 1/2$.

Applying Bayes' law then yields
$$\Pr(E \mid B) \ge \frac{2/3}{2/3 + (1/3)(1/2)} = \frac{4}{5}.$$
In general: if our prior model (before running the test) is that $\Pr(E) \ge 2^i/(2^i+1)$ and if the test returns that the identity is correct (event $B$), then
$$\Pr(E \mid B) \ge \frac{2^{i+1}}{2^{i+1}+1} = 1 - \frac{1}{2^{i+1}+1}.$$
Thus, if all 100 calls to the matrix identity test return that it is correct, our confidence in the correctness of this identity is at least $1 - 1/(2^{100}+1)$.
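The growth of confidence under repeated successful tests can be traced by iterating this worst-case update. The following sketch (an illustration, not from the notes) starts from the prior 1/2 and prints the bounds 2/3, 4/5, 8/9, 16/17, i.e. $2^i/(2^i+1)$ after $i$ tests.

    from fractions import Fraction as F

    def update(confidence):
        """One worst-case Bayes update after a test reports 'identity correct'.

        Pr(B | E) = 1 and Pr(B | not E) <= 1/2, so the posterior is at least
        confidence / (confidence + (1 - confidence) / 2).
        """
        return confidence / (confidence + (1 - confidence) / 2)

    conf = F(1, 2)                 # prior: identity correct with probability 1/2
    for i in range(1, 5):
        conf = update(conf)
        print(i, conf)             # 1 2/3, 2 4/5, 3 8/9, 4 16/17

    # After i successful tests the lower bound is 2^i / (2^i + 1) = 1 - 1/(2^i + 1).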

1.4. A Randomized Min-Cut Algorithm

A cut-set in a graph is a set of edges whose removal breaks the graph into two or more connected components. Given a graph $G = (V, E)$ with $n$ vertices, the minimum cut or min-cut problem is to find a minimum cardinality cut-set in $G$. Minimum cut problems arise in many contexts, including the study of network reliability.

Minimum cuts also arise in clustering problems. For example, if nodes represent Web pages (or any documents in a hypertext-based system) and two nodes have an edge between them if the corresponding documents have a hyperlink between them, then small cuts divide the graph into clusters of documents with few links between clusters. Documents in different clusters are likely to be unrelated.

The main operation of the algorithm is edge contraction. In contracting an edge $\{u, v\}$ we merge the vertices $u$ and $v$ into one, eliminate all edges connecting $u$ and $v$, and retain all other edges in the graph. The new graph may have parallel edges but no self-loops. The algorithm consists of $n - 2$ iterations. Each iteration picks an edge from the existing edges in the graph and contracts that edge. Our randomized algorithm chooses the edge uniformly at random from the remaining edges.

Each iteration reduces the number of vertices by one. After $n - 2$ iterations, there are two vertices left. The algorithm outputs the set of edges connecting the two remaining vertices.

Any cut-set in an intermediate iteration of the algorithm is also a cut-set of the original graph. Not every cut-set of the original graph is a cut-set in an intermediate iteration, since some of its edges may have been contracted in previous iterations. As a result, the output of the algorithm is always a cut-set of the original graph, but not necessarily a minimum cardinality cut-set.
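Here is one possible Python implementation of the contraction algorithm just described (the edge-list plus union-find representation and all names are illustrative choices, not from the notes). It assumes the input graph is connected and has no self-loops.

    import random

    def contract_min_cut(n, edges):
        """One run of the randomized contraction algorithm.

        n     -- number of vertices, labelled 0 .. n-1
        edges -- list of (u, v) pairs; parallel edges are allowed
        Returns the list of original edges crossing the cut produced by this run.
        """
        parent = list(range(n))                    # union-find forest over the vertices

        def find(x):                               # super-vertex currently containing x
            while parent[x] != x:
                parent[x] = parent[parent[x]]      # path halving
                x = parent[x]
            return x

        remaining = list(edges)                    # edges of the contracted multigraph
        for _ in range(n - 2):                     # n - 2 contractions leave 2 vertices
            u, v = random.choice(remaining)        # choose an edge uniformly at random
            parent[find(u)] = find(v)              # contract {u, v}: merge its endpoints
            # remove edges that became self-loops; parallel edges are retained
            remaining = [(a, b) for (a, b) in remaining if find(a) != find(b)]

        return remaining                           # edges between the two super-vertices

    # Example: a 4-cycle 0-1-2-3-0; every cut the algorithm can output has two edges.
    random.seed(1)
    print(contract_min_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))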

Theorem 1.8: The algorithm outputs a min-cut set with probability at least $2/(n(n-1))$.

Proof: Let $k$ be the size of the min-cut set of $G$. The graph may have several cut-sets of minimum size; we compute the probability of finding one specific such set $C$. Since $C$ is a cut-set in the graph, removal of the set $C$ partitions the set of vertices into two sets, $S$ and $V \setminus S$, such that there are no edges connecting vertices in $S$ to those in $V \setminus S$.

Assume that, throughout an execution of the algorithm, we contract only edges that connect two vertices in $S$ or two vertices in $V \setminus S$, but not edges in $C$. In that case, all the edges eliminated throughout the execution will be edges connecting vertices in $S$ or vertices in $V \setminus S$, and after $n - 2$ iterations the algorithm returns a graph with two vertices connected by the edges in $C$. We may conclude that, if the algorithm never chooses an edge of $C$ in its $n - 2$ iterations, then the algorithm returns $C$ as the minimum cut-set.

If the size of the cut is small, then the probability that the algorithm chooses an edge of $C$ is small, at least when the number of edges remaining is large compared to $C$.

Let $E_i$ be the event that the edge contracted in iteration $i$ is not in $C$, and let $F_i = \bigcap_{j=1}^{i} E_j$ be the event that no edge of $C$ was contracted in the first $i$ iterations. We need to compute $\Pr(F_{n-2})$.

Start by computing $\Pr(E_1) = \Pr(F_1)$. Since the minimum cut-set has $k$ edges, all vertices in the graph must have degree $k$ or larger. If each vertex is adjacent to at least $k$ edges, then the graph must have at least $nk/2$ edges.

Since there are at least $nk/2$ edges in the graph and since $C$ has $k$ edges, the probability that we do not choose an edge of $C$ in the first iteration is
$$\Pr(E_1) = \Pr(F_1) \ge 1 - \frac{k}{nk/2} = 1 - \frac{2}{n}.$$
Suppose that the first contraction did not eliminate an edge of $C$; i.e., we condition on the event $F_1$. Then, after the first iteration, we are left with an $(n-1)$-node graph with minimum cut-set of size $k$. Again, the degree of each vertex in the graph must be at least $k$, and the graph must have at least $k(n-1)/2$ edges.

Hence
$$\Pr(E_2 \mid F_1) \ge 1 - \frac{k}{k(n-1)/2} = 1 - \frac{2}{n-1}.$$
Similarly,
$$\Pr(E_i \mid F_{i-1}) \ge 1 - \frac{k}{k(n-i+1)/2} = 1 - \frac{2}{n-i+1}.$$
To compute $\Pr(F_{n-2})$, we use
$$\Pr(F_{n-2}) = \Pr(E_{n-2} \cap F_{n-3}) = \Pr(E_{n-2} \mid F_{n-3})\,\Pr(F_{n-3}) = \cdots = \Pr(E_1)\prod_{i=2}^{n-2}\Pr(E_i \mid F_{i-1})$$
$$\ge \prod_{i=1}^{n-2}\left(1 - \frac{2}{n-i+1}\right) = \prod_{i=1}^{n-2}\frac{n-i-1}{n-i+1} = \left(\frac{n-2}{n}\right)\left(\frac{n-3}{n-1}\right)\cdots\left(\frac{2}{4}\right)\left(\frac{1}{3}\right) = \frac{2}{n(n-1)}.$$

2. Discrete Random Variables and Expectation

Topics: Random Variables and Expectation; The Bernoulli and Binomial Random Variables; Conditional Expectation; The Geometric Distribution; The Expected Run-Time of Quicksort.

In tossing two dice we are often interested in the sum of the dice rather than their separate values. The sample space in tossing two dice consists of 36 events of equal probability, given by the ordered pairs of numbers $\{(1,1), (1,2), \ldots, (6,6)\}$. If the quantity we are interested in is the sum of the two dice, then we are interested in 11 events (of unequal probability). Any such function from the sample space to the real numbers is called a random variable.

2.1. Random Variables and Expectation

Definition 2.1: A random variable (RV) $X$ on a sample space $\Omega$ is a real-valued function on $\Omega$; that is, $X : \Omega \to \mathbb{R}$. A discrete random variable is a RV that takes on only a finite or countably infinite number of values.

For a discrete RV $X$ and a real value $a$, the event "$X = a$" includes all the basic events of the sample space in which $X$ assumes the value $a$; i.e., "$X = a$" represents the set $\{s \in \Omega : X(s) = a\}$.

We denote the probability of that event by
$$\Pr(X = a) = \sum_{s \in \Omega:\, X(s) = a} \Pr(s).$$
If $X$ is the RV representing the sum of the two dice, the event $X = 4$ corresponds to the set of basic events $\{(1,3), (2,2), (3,1)\}$. Hence
$$\Pr(X = 4) = \frac{3}{36} = \frac{1}{12}.$$

Definition 2.2: Two RVs $X$ and $Y$ are independent if and only if
$$\Pr((X = x) \cap (Y = y)) = \Pr(X = x)\cdot\Pr(Y = y)$$
for all values $x$ and $y$. Similarly, RVs $X_1, X_2, \ldots, X_k$ are mutually independent if and only if, for any subset $I \subseteq [1, k]$ and any values $x_i$, $i \in I$,
$$\Pr\left(\bigcap_{i \in I} (X_i = x_i)\right) = \prod_{i \in I} \Pr(X_i = x_i).$$

Definition 2.3: The expectation of a discrete RV $X$, denoted by $\mathrm{E}[X]$, is given by
$$\mathrm{E}[X] = \sum_{i} i \Pr(X = i),$$
where the summation is over all values $i$ in the range of $X$. The expectation is finite if $\sum_i |i| \Pr(X = i)$ converges; otherwise, it is unbounded.

E.g., the expectation of the RV $X$ representing the sum of two dice is
$$\mathrm{E}[X] = \frac{1}{36}\cdot 2 + \frac{2}{36}\cdot 3 + \frac{3}{36}\cdot 4 + \cdots + \frac{1}{36}\cdot 12 = 7.$$

As an example of where the expectation of a discrete RV is unbounded, consider a RV $X$ that takes on the value $2^i$ with probability $1/2^i$ for $i = 1, 2, \ldots$. The expected value of $X$ is
$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \frac{1}{2^i}\, 2^i = \sum_{i=1}^{\infty} 1,$$
which expresses that $\mathrm{E}[X]$ is unbounded.
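Both the point probability $\Pr(X = 4) = 1/12$ from above and the expectation $\mathrm{E}[X] = 7$ can be confirmed by brute-force enumeration of the 36 equally likely outcomes; the snippet below is a small illustrative check, not part of the original notes.

    from fractions import Fraction as F
    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))     # the 36 ordered pairs
    p = F(1, 36)                                         # each outcome has probability 1/36

    # Pr(X = 4), where X is the sum of the two dice
    print(sum(p for (a, b) in outcomes if a + b == 4))   # 1/12

    # E[X] = sum over the range of X of i * Pr(X = i)
    print(sum(p * (a + b) for (a, b) in outcomes))       # 7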

2.1.1. Linearity of Expectations

By this property, the expectation of the sum of RVs is equal to the sum of their expectations.

Theorem 2.1 [Linearity of Expectations]: For any finite collection of discrete RVs $X_1, X_2, \ldots, X_n$ with finite expectations,
$$\mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i].$$

Proof: We prove the statement for two random variables $X$ and $Y$ (the general case follows by induction). The summations that follow are understood to be over the ranges of the corresponding RVs:
$$\mathrm{E}[X + Y] = \sum_{i}\sum_{j} (i + j) \Pr((X = i) \cap (Y = j))$$
$$= \sum_{i}\sum_{j} i \Pr((X = i) \cap (Y = j)) + \sum_{i}\sum_{j} j \Pr((X = i) \cap (Y = j))$$
$$= \sum_{i} i \sum_{j} \Pr((X = i) \cap (Y = j)) + \sum_{j} j \sum_{i} \Pr((X = i) \cap (Y = j))$$
$$= \sum_{i} i \Pr(X = i) + \sum_{j} j \Pr(Y = j) = \mathrm{E}[X] + \mathrm{E}[Y].$$
The first equality follows from Definition 1.2; the penultimate equality uses Theorem 1.6, the law of total probability.

Let us now compute the expected sum of two standard dice. Let $X = X_1 + X_2$, where $X_i$ represents the outcome of die $i$ for $i = 1, 2$. Then
$$\mathrm{E}[X_i] = \sum_{j=1}^{6} \frac{1}{6}\, j = \frac{7}{2}.$$
Applying the linearity of expectations, we have
$$\mathrm{E}[X] = \mathrm{E}[X_1] + \mathrm{E}[X_2] = 7.$$

Linearity of expectations holds for any collection of RVs, even if they are not independent. Consider, e.g., the previous example and let the random variable $Y = X_1 + X_1^2$. We have
$$\mathrm{E}[Y] = \mathrm{E}[X_1 + X_1^2] = \mathrm{E}[X_1] + \mathrm{E}[X_1^2],$$
even though $X_1$ and $X_1^2$ are clearly dependent. Verify the identity by considering the six possible outcomes for $X_1$, as in the sketch below.
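Here is that verification carried out explicitly over the six equally likely outcomes of $X_1$ (an illustrative check, not from the notes).

    from fractions import Fraction as F

    outcomes = range(1, 7)                               # the six faces of the first die
    p = F(1, 6)

    E_Y = sum(p * (x + x * x) for x in outcomes)         # E[X1 + X1^2], computed directly
    E_X1 = sum(p * x for x in outcomes)                  # E[X1] = 7/2
    E_X1sq = sum(p * x * x for x in outcomes)            # E[X1^2] = 91/6
    print(E_Y, E_X1 + E_X1sq, E_Y == E_X1 + E_X1sq)      # 56/3 56/3 True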

Lemma 2.2: For any constant $c$ and discrete RV $X$, $\mathrm{E}[cX] = c\,\mathrm{E}[X]$.

Proof: The lemma is obvious for $c = 0$. For $c \ne 0$,
$$\mathrm{E}[cX] = \sum_{j} j \Pr(cX = j) = c \sum_{j} \frac{j}{c} \Pr\!\left(X = \frac{j}{c}\right) = c \sum_{k} k \Pr(X = k) = c\,\mathrm{E}[X].$$

2.1.2. Jensen's Inequality

Let us choose the length $X$ of a side of a square uniformly at random from the integers $\{1, 2, \ldots, 99\}$. What is the expected value of the area? We can write this as $\mathrm{E}[X^2]$. It is tempting to think of this as being equal to $(\mathrm{E}[X])^2$, but a simple calculation shows that this is not correct. In fact,
$$(\mathrm{E}[X])^2 = 50^2 = 2500, \quad\text{whereas}\quad \mathrm{E}[X^2] = \frac{9950}{3} \approx 3317 > 2500.$$
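Both quantities can be computed exactly by summing over the 99 equally likely side lengths; the following snippet is a small illustrative check, not part of the original notes.

    from fractions import Fraction as F

    sides = range(1, 100)                       # side length uniform on {1, ..., 99}
    p = F(1, 99)

    E_X = sum(p * x for x in sides)             # 50
    E_X2 = sum(p * x * x for x in sides)        # 9950/3, about 3316.7
    print(E_X ** 2, E_X2, E_X2 > E_X ** 2)      # 2500 9950/3 True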

More generally, $\mathrm{E}[X^2] \ge (\mathrm{E}[X])^2$. Consider $Y = (X - \mathrm{E}[X])^2$. The RV $Y$ is nonnegative and hence its expectation must also be nonnegative:
$$0 \le \mathrm{E}[Y] = \mathrm{E}[(X - \mathrm{E}[X])^2] = \mathrm{E}[X^2 - 2X\,\mathrm{E}[X] + (\mathrm{E}[X])^2] = \mathrm{E}[X^2] - 2\,\mathrm{E}[X]\,\mathrm{E}[X] + (\mathrm{E}[X])^2 = \mathrm{E}[X^2] - (\mathrm{E}[X])^2.$$
To obtain the penultimate expression, use the linearity of expectations; to simplify $\mathrm{E}[X\,\mathrm{E}[X]] = \mathrm{E}[X]\,\mathrm{E}[X]$, use Lemma 2.2 with the constant $c = \mathrm{E}[X]$.

The fact that $\mathrm{E}[X^2] \ge (\mathrm{E}[X])^2$ is an example of Jensen's inequality. Jensen's inequality shows that, for any convex function $f$, we have $\mathrm{E}[f(X)] \ge f(\mathrm{E}[X])$.

Definition 2.4: A function $f : \mathbb{R} \to \mathbb{R}$ is said to be convex if, for any $x_1$, $x_2$ and $0 \le \lambda \le 1$,
$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2).$$

Lemma 2.3: If $f$ is a twice differentiable function, then $f$ is convex if and only if $f''(x) \ge 0$.

Theorem 2.4 [Jensen's Inequality]: If $f$ is a convex function, then $\mathrm{E}[f(X)] \ge f(\mathrm{E}[X])$.

Proof: We prove the theorem assuming that $f$ has a Taylor expansion. Let $\mu = \mathrm{E}[X]$. By Taylor's theorem there is a value $c$ such that
$$f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(c)(x - \mu)^2}{2} \ge f(\mu) + f'(\mu)(x - \mu),$$
since $f''(c) \ge 0$ by convexity. Taking expectations and applying linearity of expectations and Lemma 2.2 yields:
$$\mathrm{E}[f(X)] \ge \mathrm{E}[f(\mu) + f'(\mu)(X - \mu)] = f(\mu) + f'(\mu)(\mathrm{E}[X] - \mu) = f(\mu) = f(\mathrm{E}[X]).$$

2.2. The Bernoulli and Binomial Random Variables

Suppose we run an experiment that succeeds with probability $p$ and fails with probability $1 - p$. Let $Y$ be a RV such that $Y = 1$ if the experiment succeeds and $Y = 0$ otherwise. The variable $Y$ is called a Bernoulli or an indicator random variable. Note that, for a Bernoulli RV,
$$\mathrm{E}[Y] = p \cdot 1 + (1 - p) \cdot 0 = p = \Pr(Y = 1).$$
If we, e.g., flip a fair coin and consider heads a success, then the expected value of the corresponding indicator RV is 1/2.

Consider a sequence of $n$ independent coin flips. What is the distribution of the number of heads in the entire sequence? More generally, consider a sequence of $n$ independent experiments, each of which succeeds with probability $p$. If we let $X$ represent the number of successes in the $n$ experiments, then $X$ has a binomial distribution.

Definition 2.5: A binomial RV $X$ with parameters $n$ and $p$, denoted by $B(n, p)$, is defined by the following probability distribution on $j = 0, 1, 2, \ldots, n$:
$$\Pr(X = j) = \binom{n}{j} p^j (1 - p)^{n - j}.$$
I.e., the binomial RV (BRV) equals $j$ when there are exactly $j$ successes and $n - j$ failures in $n$ independent experiments, each of which is successful with probability $p$. Definition 2.5 ensures that the BRV is a valid probability function (Definition 1.2):
$$\sum_{j=0}^{n} \Pr(X = j) = \sum_{j=0}^{n} \binom{n}{j} p^j (1 - p)^{n - j} = 1.$$

As an example, suppose we want to gather data about the packets going through a router, say the approximate fraction of packets from a certain source or of a certain type. We store a random subset, or sample, of the packets for later analysis. If each packet is stored with probability $p$ and $n$ packets go through the router each day, then the number $X$ of sampled packets each day is a BRV with parameters $n$ and $p$. To know how much memory is necessary for such a sample, we determine the expectation of $X$.

If $X$ is a BRV with parameters $n$ and $p$, then $X$ is the number of successes in $n$ trials, where each trial is successful with probability $p$. Define a set of $n$ indicator RVs $X_1, \ldots, X_n$, where $X_i = 1$ if the $i$th trial is successful and 0 otherwise. Clearly, $\mathrm{E}[X_i] = p$ and $X = \sum_{i=1}^{n} X_i$, and so, by the linearity of expectations,
$$\mathrm{E}[X] = \mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \mathrm{E}[X_i] = np.$$
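For the router example, a short simulation illustrates that the average daily sample size is close to $np$. The parameter values below ($n = 10{,}000$ packets per day, $p = 0.01$) are assumed for illustration and are not from the notes.

    import random

    n, p = 10_000, 0.01          # assumed: 10,000 packets per day, each stored w.p. 1%
    days = 200

    random.seed(0)
    # Each day's sample size is a sum of n independent indicator RVs, i.e. B(n, p).
    samples = [sum(random.random() < p for _ in range(n)) for _ in range(days)]

    print(sum(samples) / days)   # empirical mean, close to E[X] = n * p = 100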

2.3. Conditional Expectation

Definition 2.6:
$$\mathrm{E}[Y \mid Z = z] = \sum_{y} y \Pr(Y = y \mid Z = z),$$
where the summation is over all $y$ in the range of $Y$. The conditional expectation of a RV is, like the expectation, a weighted sum of the values it assumes; now each value is weighted by the conditional probability that the variable assumes that value.

Suppose that we independently roll two standard six-sided dice. Let $X_1$ be the number that shows on the first die, $X_2$ the number on the second die, and $X$ the sum of the numbers on the two dice. Then
$$\mathrm{E}[X \mid X_1 = 2] = \sum_{x} x \Pr(X = x \mid X_1 = 2) = \sum_{x=3}^{8} x \cdot \frac{1}{6} = \frac{11}{2}.$$
As another example, consider $\mathrm{E}[X_1 \mid X = 5]$:
$$\mathrm{E}[X_1 \mid X = 5] = \sum_{x=1}^{4} x \Pr(X_1 = x \mid X = 5) = \sum_{x=1}^{4} x \cdot \frac{1/36}{4/36} = \frac{5}{2}.$$
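Both conditional expectations can be recovered by restricting the uniform distribution on the 36 outcomes to the conditioning event (an illustrative check, not from the notes).

    from fractions import Fraction as F
    from itertools import product

    outcomes = list(product(range(1, 7), repeat=2))      # equally likely (X1, X2) pairs

    def cond_exp(value, event):
        """E[value(o) | event(o)] under the uniform distribution on outcomes."""
        hits = [o for o in outcomes if event(o)]
        return F(sum(value(o) for o in hits), len(hits))

    print(cond_exp(lambda o: o[0] + o[1], lambda o: o[0] == 2))         # E[X | X1 = 2] = 11/2
    print(cond_exp(lambda o: o[0],        lambda o: o[0] + o[1] == 5))  # E[X1 | X = 5] = 5/2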

Lemma 2.5: For any RVs $Y$ and $Z$,
$$\mathrm{E}[Y] = \sum_{z} \Pr(Z = z)\, \mathrm{E}[Y \mid Z = z],$$
where the sum is over all values $z$ in the range of $Z$ and all of the expectations exist.

Proof:
$$\sum_{z} \Pr(Z = z)\, \mathrm{E}[Y \mid Z = z] = \sum_{z} \Pr(Z = z) \sum_{y} y \Pr(Y = y \mid Z = z)$$
$$= \sum_{y} \sum_{z} y \Pr(Y = y \mid Z = z) \Pr(Z = z) = \sum_{y} \sum_{z} y \Pr((Y = y) \cap (Z = z))$$
$$= \sum_{y} y \Pr(Y = y) = \mathrm{E}[Y].$$

The linearity of expectations also extends to conditional expectations.

Lemma 2.6: For any finite collection of discrete RVs $X_1, X_2, \ldots, X_n$ with finite expectations and for any RV $Y$,
$$\mathrm{E}\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = y\right] = \sum_{i=1}^{n} \mathrm{E}[X_i \mid Y = y].$$

Confusingly, the term conditional expectation is also used to refer to the following RV.

Definition 2.7: The expression $\mathrm{E}[Y \mid Z]$ is a RV $f(Z)$ that takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$.

Note that $\mathrm{E}[Y \mid Z]$ is not a real value; it is actually a function of the RV $Z$. Hence $\mathrm{E}[Y \mid Z]$ is itself a function from the sample space to the real numbers and can therefore be thought of as a RV.

In the previous example of rolling two dice,
$$\mathrm{E}[X \mid X_1] = \sum_{x} x \Pr(X = x \mid X_1) = \sum_{x = X_1 + 1}^{X_1 + 6} x \cdot \frac{1}{6} = X_1 + \frac{7}{2}.$$
We see that $\mathrm{E}[X \mid X_1]$ is a RV whose value depends on $X_1$.

If $\mathrm{E}[Y \mid Z]$ is a RV, then it makes sense to consider its expectation $\mathrm{E}[\mathrm{E}[Y \mid Z]]$. We found that $\mathrm{E}[X \mid X_1] = X_1 + 7/2$. Thus,
$$\mathrm{E}[\mathrm{E}[X \mid X_1]] = \mathrm{E}\left[X_1 + \frac{7}{2}\right] = \frac{7}{2} + \frac{7}{2} = 7 = \mathrm{E}[X].$$

More generally:

Theorem 2.7: $\mathrm{E}[Y] = \mathrm{E}[\mathrm{E}[Y \mid Z]]$.

Proof: From Definition 2.7 we have $\mathrm{E}[Y \mid Z] = f(Z)$, where $f(Z)$ takes on the value $\mathrm{E}[Y \mid Z = z]$ when $Z = z$. Hence
$$\mathrm{E}[\mathrm{E}[Y \mid Z]] = \mathrm{E}[f(Z)] = \sum_{z} \mathrm{E}[Y \mid Z = z] \Pr(Z = z).$$
The right-hand side equals $\mathrm{E}[Y]$ by Lemma 2.5.

Consider a program that includes one call to a process $S$. Assume that each call to process $S$ recursively spawns new copies of the process $S$, where the number of new copies is a BRV with parameters $n$ and $p$. We assume that these random variables are independent for each call to $S$. What is the expected number of copies of the process $S$ generated by the program?

To analyze this recursive spawning process, we use generations. The initial process is in generation 0; otherwise, we say that a process is in generation $i$ if it was spawned by another process in generation $i - 1$. Let $Y_i$ denote the number of processes in generation $i$. Since we know that $Y_0 = 1$, the number of processes in generation 1 has a binomial distribution; thus, $\mathrm{E}[Y_1] = np$.

Similarly, suppose we knew that the number of processes in generation $i - 1$ was $y_{i-1}$, so $Y_{i-1} = y_{i-1}$. Then
$$\mathrm{E}[Y_i \mid Y_{i-1} = y_{i-1}] = y_{i-1}\, np.$$
Applying Theorem 2.7, we can compute the expected size of the $i$th generation inductively. We have
$$\mathrm{E}[Y_i] = \mathrm{E}[\mathrm{E}[Y_i \mid Y_{i-1}]] = \mathrm{E}[Y_{i-1}\, np] = np\, \mathrm{E}[Y_{i-1}].$$
By induction on $i$, and using the fact that $Y_0 = 1$, we then obtain $\mathrm{E}[Y_i] = (np)^i$.
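A short simulation of the spawning process makes the geometric behaviour of the generation sizes concrete. The sketch below uses assumed parameter values $n = 10$ and $p = 0.05$, so $np = 0.5$; it is an illustration, not part of the notes.

    import random

    def generation_sizes(n, p, max_gen, rng):
        """Simulate the spawning process; return [Y_0, Y_1, ..., Y_max_gen]."""
        sizes = [1]                                   # generation 0: the initial process
        for _ in range(max_gen):
            children = 0
            for _ in range(sizes[-1]):
                # each process spawns a B(n, p) number of children
                children += sum(rng.random() < p for _ in range(n))
            sizes.append(children)
        return sizes

    rng = random.Random(0)
    n, p, trials, max_gen = 10, 0.05, 20_000, 4        # np = 0.5
    totals = [0] * (max_gen + 1)
    for _ in range(trials):
        for i, y in enumerate(generation_sizes(n, p, max_gen, rng)):
            totals[i] += y

    for i, t in enumerate(totals):
        print(i, t / trials, (n * p) ** i)             # empirical E[Y_i] vs (np)^i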

The expected total number of copies of process $S$ generated by the program is given by
$$\mathrm{E}\left[\sum_{i \ge 0} Y_i\right] = \sum_{i \ge 0} \mathrm{E}[Y_i] = \sum_{i \ge 0} (np)^i.$$
If $np \ge 1$ then the expectation is unbounded; if $np < 1$, the expectation is $1/(1 - np)$. The expected number of processes generated by the program is bounded if and only if the expected number of processes spawned by each process is less than 1. This is a simple example of a branching process, a probabilistic paradigm extensively studied in probability theory.

2.4. The Geometric Distribution

Suppose we flip a coin until it lands on heads. What is the distribution of the number of flips? This is an example of a geometric distribution. It arises when we perform a sequence of independent trials until the first success, where each trial succeeds with probability $p$.

Definition 2.8: A geometric RV $X$ with parameter $p$ is given by the following probability distribution on $n = 1, 2, \ldots$:
$$\Pr(X = n) = (1 - p)^{n - 1} p.$$

Geometric RVs are said to be memoryless because the probability that you will reach your first success $n$ trials from now is independent of the number of failures you have already experienced. Informally, one can ignore past failures: they do not change the distribution of the number of future trials until the first success. Formally, we have the following.

Lemma 2.8: For a geometric RV $X$ with parameter $p$ and for $n > 0$,
$$\Pr(X = n + k \mid X > k) = \Pr(X = n).$$

When a RV takes values in the set of natural numbers $\mathbb{N} = \{0, 1, 2, 3, \ldots\}$, there is an alternative formula for calculating its expectation.

Lemma 2.9: Let $X$ be a discrete RV that takes on only nonnegative integer values. Then
$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \ge i).$$

Proof:
$$\sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} \sum_{j=i}^{\infty} \Pr(X = j) = \sum_{j=1}^{\infty} \sum_{i=1}^{j} \Pr(X = j) = \sum_{j=1}^{\infty} j \Pr(X = j) = \mathrm{E}[X].$$
The interchange of the order of summation is justified because all the terms being summed are nonnegative.

For a geometric RV $X$ with parameter $p$,
$$\Pr(X \ge i) = \sum_{n = i}^{\infty} (1 - p)^{n - 1} p = (1 - p)^{i - 1}.$$
Hence
$$\mathrm{E}[X] = \sum_{i=1}^{\infty} \Pr(X \ge i) = \sum_{i=1}^{\infty} (1 - p)^{i - 1} = \frac{1}{1 - (1 - p)} = \frac{1}{p}.$$
Thus, for a fair coin where $p = 1/2$, on average it takes two flips to see the first heads.

We can also find the expectation of a geometric RV $X$ with parameter $p$ using conditional expectations and the memoryless property of geometric RVs. Recall that $X$ corresponds to the number of flips until the first heads, given that each flip is heads with probability $p$. Let $Y = 0$ if the first flip is tails and $Y = 1$ if the first flip is heads. By the identity from Lemma 2.5,
$$\mathrm{E}[X] = \Pr(Y = 0)\,\mathrm{E}[X \mid Y = 0] + \Pr(Y = 1)\,\mathrm{E}[X \mid Y = 1] = (1 - p)\,\mathrm{E}[X \mid Y = 0] + p\,\mathrm{E}[X \mid Y = 1].$$

If $Y = 1$ then $X = 1$, so $\mathrm{E}[X \mid Y = 1] = 1$. If $Y = 0$, then $X > 1$. In this case, let the number of remaining flips (after the first flip) until the first heads be $Z$. Then, by the linearity of expectations,
$$\mathrm{E}[X \mid Y = 0] = \mathrm{E}[Z + 1] = \mathrm{E}[Z] + 1.$$
By the memoryless property of geometric RVs, $Z$ is also a geometric RV with parameter $p$. Hence $\mathrm{E}[Z] = \mathrm{E}[X]$, since they both have the same distribution. We therefore have
$$\mathrm{E}[X] = (1 - p)(\mathrm{E}[X] + 1) + p \cdot 1 = (1 - p)\,\mathrm{E}[X] + 1,$$
which yields $\mathrm{E}[X] = 1/p$.
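As an illustration (not from the notes), the following sketch simulates flips of a fair coin until the first heads and checks both the mean of $1/p = 2$ and the memoryless property.

    import random

    def flips_until_heads(p, rng):
        """Number of flips of a p-biased coin until the first heads (geometric RV)."""
        flips = 1
        while rng.random() >= p:
            flips += 1
        return flips

    rng = random.Random(0)
    p, trials = 0.5, 100_000
    samples = [flips_until_heads(p, rng) for _ in range(trials)]

    print(sum(samples) / trials)                       # close to E[X] = 1/p = 2

    # Memorylessness: Pr(X = 1 + 2 | X > 2) should match Pr(X = 1) = 1/2.
    tail = [x for x in samples if x > 2]
    print(sum(x == 3 for x in tail) / len(tail),       # estimate of Pr(X = 3 | X > 2)
          sum(x == 1 for x in samples) / trials)       # estimate of Pr(X = 1)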

2.4.1. Example: Coupon Collector's Problem

Each box of cereal contains one of $n$ different coupons. Once you obtain one of every type of coupon, you can send in for a prize. The coupon in each box is chosen independently and uniformly at random from the $n$ possibilities, and you do not collaborate with others to collect coupons. How many boxes of cereal must you buy before you obtain at least one of every type of coupon?

Let $X$ be the number of boxes bought until at least one of every type of coupon is obtained. If $X_i$ is the number of boxes bought while you had exactly $i - 1$ different coupons, then clearly $X = \sum_{i=1}^{n} X_i$. The advantage of breaking $X$ into a sum of random variables $X_i$, $i = 1, \ldots, n$, is that each $X_i$ is a geometric RV. When exactly $i - 1$ coupons have been found, the probability of obtaining a new coupon is
$$p_i = 1 - \frac{i - 1}{n} = \frac{n - i + 1}{n}.$$

Hence, $X_i$ is a geometric RV with parameter $p_i$:
$$\mathrm{E}[X_i] = \frac{1}{p_i} = \frac{n}{n - i + 1}.$$
Using the linearity of expectations, we have that
$$\mathrm{E}[X] = \mathrm{E}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \frac{n}{n - i + 1} = n \sum_{i=1}^{n} \frac{1}{i}.$$

The summation $\sum_{i=1}^{n} 1/i$ is known as the harmonic number $H(n)$.

Lemma 2.10: The harmonic number $H(n) = \sum_{i=1}^{n} 1/i$ satisfies $H(n) = \ln n + \Theta(1)$.

Thus, for the coupon collector's problem, the expected number of random coupons required to obtain all $n$ coupons is $nH(n) = n \ln n + \Theta(n)$.
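To close, a small simulation of the coupon collector's problem (with an assumed $n = 50$, not a value from the notes) compares the empirical mean to $nH(n)$ and to $n\ln n$.

    import math
    import random

    def boxes_until_complete(n, rng):
        """Buy boxes until all n coupon types have been seen; return the count."""
        seen, boxes = set(), 0
        while len(seen) < n:
            seen.add(rng.randrange(n))    # each box holds a uniformly random coupon
            boxes += 1
        return boxes

    rng = random.Random(0)
    n, trials = 50, 5_000
    mean = sum(boxes_until_complete(n, rng) for _ in range(trials)) / trials

    harmonic = sum(1 / i for i in range(1, n + 1))
    print(mean, n * harmonic, n * math.log(n))   # empirical mean vs n*H(n) vs n*ln(n)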